
Merge pull request #10319 from QualitativeDataRepository/IQSS/10318-uningest_reingest_UI

IQSS/10318 Uningest/Reingest UI
sekmiller committed Feb 29, 2024
2 parents c90595c + 6480896 commit 2e3b333
Showing 9 changed files with 239 additions and 39 deletions.
3 changes: 3 additions & 0 deletions doc/release-notes/10318-uningest-and-reingest.md
@@ -0,0 +1,3 @@
New Uningest/Reingest options are available in the File Page Edit menu, allowing ingest errors to be cleared (by users who can publish the associated dataset)
and allowing a successful ingest to be undone or retried by superusers (e.g. after a Dataverse version update or if ingest size limits are changed).
The /api/files/<id>/uningest API now also allows users who can publish the dataset to undo an ingest failure.
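As a sketch of the uningest call described above (the server URL, API token, and file id below are placeholder values — substitute your own):

```shell
# Placeholder values -- substitute your installation's URL, your token, and a real file id
SERVER_URL="https://demo.dataverse.org"
API_TOKEN="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
FILE_ID=24

# Compose the uningest request (drop the leading 'echo' to actually send it);
# a user who can publish the dataset may use this to clear a failed ingest.
echo curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$FILE_ID/uningest"
```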
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/api/changelog.rst
@@ -14,6 +14,7 @@ v6.2

- **/api/datasets/{id}/versions/{versionId}**: The includeFiles parameter has been renamed to excludeFiles. The default behavior remains the same, which is to include files. However, when excludeFiles is set to true, the files will be excluded. A bug that caused the API to only return a deaccessioned dataset if the user had edit privileges has been fixed.
- **/api/datasets/{id}/versions**: The includeFiles parameter has been renamed to excludeFiles. The default behavior remains the same, which is to include files. However, when excludeFiles is set to true, the files will be excluded.
- **/api/files/$ID/uningest**: Can now be used by users with the ability to publish the dataset to undo a failed ingest. (Removing a successful ingest still requires superuser privileges.)

v6.1
----
9 changes: 8 additions & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -3017,10 +3017,15 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT -d true "https://demo.dataverse.org/api/files/:persistentId/restrict?persistentId=doi:10.5072/FK2/AAA000"
.. _file-uningest:
Uningest a File
~~~~~~~~~~~~~~~
Reverse the tabular data ingest process performed on a file where ``ID`` is the database id or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file to process. Note that this requires "superuser" credentials.
Reverse the tabular data ingest process performed on a file where ``ID`` is the database id or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file to process.
Note that this requires "superuser" credentials to undo a successful ingest and remove the variable-level metadata and .tab version of the file.
It can also be used by a user who can publish the dataset to clear the error from an unsuccessful ingest.
A curl example using an ``ID``:
@@ -3054,6 +3059,8 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/uningest?persistentId=doi:10.5072/FK2/AAA000"
.. _file-reingest:
Reingest a File
~~~~~~~~~~~~~~~
@@ -32,7 +32,7 @@ format. (more info below)


Tabular Data and Metadata
==========================
=========================

Data vs. Metadata
-----------------
@@ -56,3 +56,22 @@ the Dataverse Software was originally based on the `DDI Codebook
<https://www.ddialliance.org/Specification/DDI-Codebook/2.5/>`_ format.

You can see an example of DDI output under the :ref:`data-variable-metadata-access` section of the :doc:`/api/dataaccess` section of the API Guide.

Uningest and Reingest
=====================

Ingest will only work for files whose content can be interpreted as a table.
Multi-sheet spreadsheets and CSV files with a different number of entries per row are two examples where ingest will fail.
This failure is non-fatal: the Dataverse software will not produce a .tab version of the file, and users who can see
the draft version of the dataset containing the file will see a warning indicating why ingest failed. When the file is published as
part of the dataset, there is no indication that ingest was attempted and failed.

If the warning message is a concern, the Dataverse software provides both an API call (see :ref:`file-uningest` in the :doc:`/api/native-api` guide)
and an Edit/Uningest menu option on the file page that allow a file to be uningested by anyone who can publish the dataset.

Uningest removes the warning. Uningest can also be performed on a file that was successfully ingested, but this is only available to superusers.
In that case, uningest removes the variable-level metadata and the .tab version of the file that was generated.

If a file is in a tabular format but was never ingested (e.g. because the ingest file size limit was lower in the past), or if ingest previously failed
(e.g. in a prior Dataverse version), a reingest API (see :ref:`file-reingest` in the :doc:`/api/native-api` guide) and a file page Edit/Reingest option
in the user interface allow ingest to be tried again. As with uningest of a successful ingest, this functionality is only available to superusers.
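A sketch of the two superuser calls just described (server URL, token, and file id are placeholder values — substitute your own):

```shell
# Placeholder values -- substitute your installation's URL, a superuser token, and a real file id
SERVER_URL="https://demo.dataverse.org"
SUPERUSER_TOKEN="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
FILE_ID=24

# Compose the two requests (drop the leading 'echo' to actually send them):
# uningest removes the generated .tab file and variable-level metadata;
# reingest attempts tabular ingest again.
echo curl -H "X-Dataverse-key:$SUPERUSER_TOKEN" -X POST "$SERVER_URL/api/files/$FILE_ID/uningest"
echo curl -H "X-Dataverse-key:$SUPERUSER_TOKEN" -X POST "$SERVER_URL/api/files/$FILE_ID/reingest"
```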
132 changes: 129 additions & 3 deletions src/main/java/edu/harvard/iq/dataverse/FilePage.java
@@ -21,20 +21,25 @@
import edu.harvard.iq.dataverse.engine.command.impl.CreateNewDatasetCommand;
import edu.harvard.iq.dataverse.engine.command.impl.PersistProvFreeFormCommand;
import edu.harvard.iq.dataverse.engine.command.impl.RestrictFileCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UningestFileCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand;
import edu.harvard.iq.dataverse.export.ExportService;
import io.gdcc.spi.export.ExportException;
import io.gdcc.spi.export.Exporter;
import edu.harvard.iq.dataverse.externaltools.ExternalTool;
import edu.harvard.iq.dataverse.externaltools.ExternalToolHandler;
import edu.harvard.iq.dataverse.externaltools.ExternalToolServiceBean;
import edu.harvard.iq.dataverse.ingest.IngestRequest;
import edu.harvard.iq.dataverse.ingest.IngestServiceBean;
import edu.harvard.iq.dataverse.makedatacount.MakeDataCountLoggingServiceBean;
import edu.harvard.iq.dataverse.makedatacount.MakeDataCountLoggingServiceBean.MakeDataCountEntry;
import edu.harvard.iq.dataverse.privateurl.PrivateUrlServiceBean;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.FileUtil;
import edu.harvard.iq.dataverse.util.JsfHelper;
import edu.harvard.iq.dataverse.util.StringUtil;

import static edu.harvard.iq.dataverse.util.JsfHelper.JH;
import edu.harvard.iq.dataverse.util.SystemConfig;

@@ -45,6 +50,7 @@
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
import jakarta.ejb.EJB;
import jakarta.ejb.EJBException;
@@ -112,10 +118,10 @@ public class FilePage implements java.io.Serializable {
GuestbookResponseServiceBean guestbookResponseService;
@EJB
AuthenticationServiceBean authService;

@EJB
DatasetServiceBean datasetService;

@EJB
IngestServiceBean ingestService;
@EJB
SystemConfig systemConfig;

@@ -209,7 +215,7 @@ public String init() {
// If this DatasetVersion is unpublished and the user doesn't have permissions:
// > Go to the Login page
//
// Check permisisons
// Check permissions
Boolean authorized = (fileMetadata.getDatasetVersion().isReleased())
|| (!fileMetadata.getDatasetVersion().isReleased() && this.canViewUnpublishedDataset());

@@ -475,6 +481,120 @@ public String restrictFile(boolean restricted) throws CommandException{
return returnToDraftVersion();
}

public String ingestFile() throws CommandException{

User u = session.getUser();
if(!u.isAuthenticated() || !u.isSuperuser()) {
//Shouldn't happen (choice not displayed for users who don't have the right permission), but check anyway
logger.warning("User: " + u.getIdentifier() + " tried to ingest a file");
JH.addMessage(FacesMessage.SEVERITY_WARN, BundleUtil.getStringFromBundle("file.ingest.cantIngestFileWarning"));
return null;
}

DataFile dataFile = fileMetadata.getDataFile();
editDataset = dataFile.getOwner();

if (dataFile.isTabularData()) {
JH.addMessage(FacesMessage.SEVERITY_WARN, BundleUtil.getStringFromBundle("file.ingest.alreadyIngestedWarning"));
return null;
}

boolean ingestLock = dataset.isLockedFor(DatasetLock.Reason.Ingest);

if (ingestLock) {
JH.addMessage(FacesMessage.SEVERITY_WARN, BundleUtil.getStringFromBundle("file.ingest.ingestInProgressWarning"));
return null;
}

if (!FileUtil.canIngestAsTabular(dataFile)) {
JH.addMessage(FacesMessage.SEVERITY_WARN, BundleUtil.getStringFromBundle("file.ingest.cantIngestFileWarning"));
return null;

}

dataFile.SetIngestScheduled();

if (dataFile.getIngestRequest() == null) {
dataFile.setIngestRequest(new IngestRequest(dataFile));
}

dataFile.getIngestRequest().setForceTypeCheck(true);

// update the datafile, to save the new ingest request in the database:
datafileService.save(file);

// queue the data ingest job for asynchronous execution:
String status = ingestService.startIngestJobs(editDataset.getId(), new ArrayList<>(Arrays.asList(dataFile)), (AuthenticatedUser) session.getUser());

if (!StringUtil.isEmpty(status)) {
// This most likely indicates some sort of a problem (for example,
// the ingest job was not put on the JMS queue because of the size
// of the file). But we are still returning the OK status - because
// from the point of view of the API, it's a success - we have
// successfully gone through the process of trying to schedule the
// ingest job...

logger.warning("Ingest Status for file: " + dataFile.getId() + " : " + status);
}
logger.fine("File: " + dataFile.getId() + " ingest queued");

init();
JsfHelper.addInfoMessage(BundleUtil.getStringFromBundle("file.ingest.ingestQueued"));
return returnToDraftVersion();
}

public String uningestFile() throws CommandException {

if (!file.isTabularData()) {
//Ingest never succeeded, either there was a failure or this is not a tabular data file
if (file.isIngestProblem()) {
//We allow anyone who can publish to uningest in order to clear a problem
User u = session.getUser();
if (!u.isAuthenticated() || !(permissionService.permissionsFor(u, file).contains(Permission.PublishDataset))) {
logger.warning("User: " + u.getIdentifier() + " tried to uningest a file");
// Shouldn't happen (choice not displayed for users who don't have the right
// permission), but check anyway
JH.addMessage(FacesMessage.SEVERITY_WARN,
BundleUtil.getStringFromBundle("file.ingest.cantUningestFileWarning"));
return null;
}
file.setIngestDone();
file.setIngestReport(null);
} else {
//Shouldn't happen - got called when there is no tabular data or an ingest problem
JH.addMessage(FacesMessage.SEVERITY_WARN,
BundleUtil.getStringFromBundle("file.ingest.cantUningestFileWarning"));
return null;
}
} else {
//Superuser required to uningest after a success
//Uningest command does its own check for isSuperuser
commandEngine.submit(new UningestFileCommand(dvRequestService.getDataverseRequest(), file));
Long dataFileId = file.getId();
file = datafileService.find(dataFileId);
}
editDataset = file.getOwner();
if (editDataset.isReleased()) {
try {
ExportService instance = ExportService.getInstance();
instance.exportAllFormats(editDataset);

} catch (ExportException ex) {
// Something went wrong!
// Just like with indexing, a failure to export is not a fatal
// condition. We'll just log the error as a warning and keep
// going:
logger.log(Level.WARNING, "Uningest: Exception while exporting:{0}", ex.getMessage());
}
}
datafileService.save(file);

// Refresh filemetadata with file title, etc.
init();
JH.addMessage(FacesMessage.SEVERITY_INFO, BundleUtil.getStringFromBundle("file.uningest.complete"));
return returnToDraftVersion();
}

private List<FileMetadata> filesToBeDeleted = new ArrayList<>();

public String deleteFile() {
@@ -948,6 +1068,12 @@ public boolean isPubliclyDownloadable() {
return FileUtil.isPubliclyDownloadable(fileMetadata);
}

public boolean isIngestable() {
DataFile f = fileMetadata.getDataFile();
//Datafile is an ingestable type and hasn't been ingested yet or had an ingest fail
        return (FileUtil.canIngestAsTabular(f) && !(f.isTabularData() || f.isIngestProblem()));
}

private Boolean lockedFromEditsVar;
private Boolean lockedFromDownloadVar;

