IQSS/10318 Uningest/Reingest UI #10319

Merged
Changes from 20 commits
3 changes: 3 additions & 0 deletions doc/release-notes/10318-uningest-and-reingest.md
@@ -0,0 +1,3 @@
New Uningest/Reingest options are available in the File Page Edit menu, allowing ingest errors to be cleared (by users who can publish the associated dataset)
and (by superusers) for a successful ingest to be undone or retried (e.g. after a Dataverse version update or if ingest size limits are changed).
The /api/files/<id>/uningest API also now allows users who can publish the dataset to undo an ingest failure.
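
For illustration, a minimal sketch of that call (placeholder token, server, and file id; not part of the release note itself):

```bash
# Clears a failed ingest if you can publish the dataset; undoing a successful ingest still requires a superuser.
curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$FILE_ID/uningest"
```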
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/api/changelog.rst
@@ -12,6 +12,7 @@ v6.2

- **/api/datasets/{id}/versions/{versionId}**: The includeFiles parameter has been renamed to excludeFiles. The default behavior remains the same, which is to include files. However, when excludeFiles is set to true, the files will be excluded. A bug that caused the API to only return a deaccessioned dataset if the user had edit privileges has been fixed.
- **/api/datasets/{id}/versions**: The includeFiles parameter has been renamed to excludeFiles. The default behavior remains the same, which is to include files. However, when excludeFiles is set to true, the files will be excluded.
- **/api/files/$ID/uningest**: Can now be used by users with the ability to publish the dataset to undo a failed ingest. (Removing a successful ingest still requires superuser privileges.)
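
For illustration, a hedged sketch of the calls above with placeholder values (``excludeFiles`` is assumed to be passed as a query parameter, as in the native API guide):

.. code-block:: bash

  # excludeFiles replaces the old includeFiles parameter
  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/$ID/versions?excludeFiles=true"

  # undo a failed ingest; publish permission on the dataset is now sufficient
  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/uningest"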

v6.1
----
9 changes: 8 additions & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -2854,10 +2854,15 @@ The fully expanded example above (without environment variables) looks like this

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT -d true "https://demo.dataverse.org/api/files/:persistentId/restrict?persistentId=doi:10.5072/FK2/AAA000"

.. _file-uningest:

Uningest a File
~~~~~~~~~~~~~~~

Reverse the tabular data ingest process performed on a file where ``ID`` is the database id or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file to process. Note that this requires "superuser" credentials.
Reverse the tabular data ingest process performed on a file where ``ID`` is the database id or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file to process.

Note that this requires "superuser" credentials to undo a successful ingest and remove the variable-level metadata and .tab version of the file.
It can also be used by a user who can publish the dataset to clear the error from an unsuccessful ingest.

A curl example using an ``ID``:

@@ -2891,6 +2896,8 @@ The fully expanded example above (without environment variables) looks like this

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/uningest?persistentId=doi:10.5072/FK2/AAA000"

.. _file-reingest:

Reingest a File
~~~~~~~~~~~~~~~

@@ -32,7 +32,7 @@ format. (more info below)


Tabular Data and Metadata
==========================
=========================

Data vs. Metadata
-----------------
@@ -56,3 +56,22 @@ the Dataverse Software was originally based on the `DDI Codebook
<https://www.ddialliance.org/Specification/DDI-Codebook/2.5/>`_ format.

You can see an example of DDI output under the :ref:`data-variable-metadata-access` section of the :doc:`/api/dataaccess` section of the API Guide.

Uningest and Reingest
=====================

Ingest will only work for files whose content can be interpreted as a table.
Multi-sheet spreadsheets and CSV files whose rows have different numbers of entries are two examples where ingest will fail.
This is non-fatal. The Dataverse software will not produce a .tab version of the file and will show a warning, indicating why ingest failed,
to users who can see the draft version of the dataset containing the file. When the file is published as
part of the dataset, there will be no indication that ingest was attempted and failed.

If the warning message is a concern, the Dataverse software includes both an API call (see :ref:`file-uningest` in the :doc:`/api/native-api` guide)
and an Edit/Uningest menu option on the file page, either of which allows a file to be uningested by anyone who can publish the dataset.

Uningest removes the warning. Uningest can also be performed on a file that was successfully ingested, but this is only available to superusers;
it removes the variable-level metadata and the .tab version of the file that were generated during ingest.

If a file is in a tabular format but was never ingested, e.g. due to the ingest file size limit being lower in the past, or if ingest had failed,
e.g. in a prior Dataverse version, a reingest API (see :ref:`file-reingest` in the :doc:`/api/native-api` guide) and a file page Edit/Reingest option
in the user interface allow ingest to be tried again. As with Uningest, this functionality is only available to superusers.
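
As a minimal sketch with placeholder values, and assuming the reingest endpoint follows the same POST pattern as uningest (see :ref:`file-uningest` and :ref:`file-reingest` in the :doc:`/api/native-api` guide for the authoritative examples):

.. code-block:: bash

  # Clear a failed ingest (publish permission) or, as a superuser, undo a successful one
  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$FILE_ID/uningest"

  # Ask the Dataverse installation to attempt ingest again (superusers only)
  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$FILE_ID/reingest"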
131 changes: 128 additions & 3 deletions src/main/java/edu/harvard/iq/dataverse/FilePage.java
@@ -21,20 +21,25 @@
import edu.harvard.iq.dataverse.engine.command.impl.CreateNewDatasetCommand;
import edu.harvard.iq.dataverse.engine.command.impl.PersistProvFreeFormCommand;
import edu.harvard.iq.dataverse.engine.command.impl.RestrictFileCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UningestFileCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand;
import edu.harvard.iq.dataverse.export.ExportService;
import io.gdcc.spi.export.ExportException;
import io.gdcc.spi.export.Exporter;
import edu.harvard.iq.dataverse.externaltools.ExternalTool;
import edu.harvard.iq.dataverse.externaltools.ExternalToolHandler;
import edu.harvard.iq.dataverse.externaltools.ExternalToolServiceBean;
import edu.harvard.iq.dataverse.ingest.IngestRequest;
import edu.harvard.iq.dataverse.ingest.IngestServiceBean;
import edu.harvard.iq.dataverse.makedatacount.MakeDataCountLoggingServiceBean;
import edu.harvard.iq.dataverse.makedatacount.MakeDataCountLoggingServiceBean.MakeDataCountEntry;
import edu.harvard.iq.dataverse.privateurl.PrivateUrlServiceBean;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.FileUtil;
import edu.harvard.iq.dataverse.util.JsfHelper;
import edu.harvard.iq.dataverse.util.StringUtil;

import static edu.harvard.iq.dataverse.util.JsfHelper.JH;
import edu.harvard.iq.dataverse.util.SystemConfig;

@@ -45,6 +50,7 @@
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
import jakarta.ejb.EJB;
import jakarta.ejb.EJBException;
@@ -112,10 +118,10 @@ public class FilePage implements java.io.Serializable {
GuestbookResponseServiceBean guestbookResponseService;
@EJB
AuthenticationServiceBean authService;

@EJB
DatasetServiceBean datasetService;

@EJB
IngestServiceBean ingestService;
@EJB
SystemConfig systemConfig;

@@ -209,7 +215,7 @@ public String init() {
// If this DatasetVersion is unpublished and permission is doesn't have permissions:
// > Go to the Login page
//
// Check permisisons
// Check permissions
Boolean authorized = (fileMetadata.getDatasetVersion().isReleased())
|| (!fileMetadata.getDatasetVersion().isReleased() && this.canViewUnpublishedDataset());

@@ -475,6 +481,119 @@ public String restrictFile(boolean restricted) throws CommandException{
return returnToDraftVersion();
}

public String ingestFile() throws CommandException{

User u = session.getUser();
if(!u.isAuthenticated() || !u.isSuperuser()) {
//Shouldn't happen (choice not displayed for users who don't have the right permission), but check anyway
logger.warning("User: " + u.getIdentifier() + " tried to ingest a file");
JH.addMessage(FacesMessage.SEVERITY_WARN, BundleUtil.getStringFromBundle("file.ingest.cantIngestFileWarning"));
return null;
}

DataFile dataFile = fileMetadata.getDataFile();
editDataset = dataFile.getOwner();

if (dataFile.isTabularData()) {
JH.addMessage(FacesMessage.SEVERITY_WARN, BundleUtil.getStringFromBundle("file.ingest.alreadyIngestedWarning"));
return null;
}

boolean ingestLock = dataset.isLockedFor(DatasetLock.Reason.Ingest);

if (ingestLock) {
JH.addMessage(FacesMessage.SEVERITY_WARN, BundleUtil.getStringFromBundle("file.ingest.ingestInProgressWarning"));
return null;
}

if (!FileUtil.canIngestAsTabular(dataFile)) {
JH.addMessage(FacesMessage.SEVERITY_WARN, BundleUtil.getStringFromBundle("file.ingest.cantIngestFileWarning"));
return null;

}

dataFile.SetIngestScheduled();
Reviewer comment:

Huh, it's unconventional that this and other methods start with capital "S" but whatever, something to address another time:

    public void SetIngestScheduled() {
        ingestStatus = INGEST_STATUS_SCHEDULED;
    }
    
    public void SetIngestInProgress() {
        ingestStatus = INGEST_STATUS_INPROGRESS;
    }
    
    public void SetIngestProblem() {
        ingestStatus = INGEST_STATUS_ERROR;
    }


if (dataFile.getIngestRequest() == null) {
dataFile.setIngestRequest(new IngestRequest(dataFile));
}

dataFile.getIngestRequest().setForceTypeCheck(true);

// update the datafile, to save the newIngest request in the database:
save();

// queue the data ingest job for asynchronous execution:
String status = ingestService.startIngestJobs(editDataset.getId(), new ArrayList<>(Arrays.asList(dataFile)), (AuthenticatedUser) session.getUser());

if (!StringUtil.isEmpty(status)) {
// This most likely indicates some sort of a problem (for example,
// the ingest job was not put on the JMS queue because of the size
// of the file). But we are still returning the OK status - because
// from the point of view of the API, it's a success - we have
// successfully gone through the process of trying to schedule the
// ingest job...

logger.warning("Ingest Status for file: " + dataFile.getId() + " : " + status);
}
logger.info("File: " + dataFile.getId() + " ingest queued");
Reviewer comment:

Suggested change
logger.info("File: " + dataFile.getId() + " ingest queued");
logger.fine("File: " + dataFile.getId() + " ingest queued");


init();
JsfHelper.addInfoMessage(BundleUtil.getStringFromBundle("file.ingest.ingestQueued"));
return returnToDraftVersion();
}

public String uningestFile() throws CommandException {

if (!file.isTabularData()) {
//Ingest never succeeded, either there was a failure or this is not a tabular data file
if (file.isIngestProblem()) {
//We allow anyone who can publish to uningest in order to clear a problem
User u = session.getUser();
if (!u.isAuthenticated() || !(permissionService.permissionsFor(u, file).contains(Permission.PublishDataset))) {
logger.warning("User: " + u.getIdentifier() + " tried to uningest a file");
// Shouldn't happen (choice not displayed for users who don't have the right
// permission), but check anyway
JH.addMessage(FacesMessage.SEVERITY_WARN,
BundleUtil.getStringFromBundle("file.ingest.cantUningestFileWarning"));
return null;
}
file.setIngestDone();
file.setIngestReport(null);
} else {
//Shouldn't happen - got called when there is no tabular data or an ingest problem
JH.addMessage(FacesMessage.SEVERITY_WARN,
BundleUtil.getStringFromBundle("file.ingest.cantUningestFileWarning"));
return null;
}
} else {
//Superuser required to uningest after a success
//Uningest command does its own check for isSuperuser
commandEngine.submit(new UningestFileCommand(dvRequestService.getDataverseRequest(), file));
Long dataFileId = file.getId();
file = datafileService.find(dataFileId);
}
editDataset = file.getOwner();
if (editDataset.isReleased()) {
try {
ExportService instance = ExportService.getInstance();
instance.exportAllFormats(editDataset);

} catch (ExportException ex) {
// Something went wrong!
// Just like with indexing, a failure to export is not a fatal
// condition. We'll just log the error as a warning and keep
// going:
logger.log(Level.WARNING, "Uningest: Exception while exporting:{0}", ex.getMessage());
}
}
save();
// Refresh filemetadata with file title, etc.
init();
JH.addMessage(FacesMessage.SEVERITY_INFO, BundleUtil.getStringFromBundle("file.uningest.complete"));
return returnToDraftVersion();
}

private List<FileMetadata> filesToBeDeleted = new ArrayList<>();

public String deleteFile() {
@@ -948,6 +1067,12 @@ public boolean isPubliclyDownloadable() {
return FileUtil.isPubliclyDownloadable(fileMetadata);
}

public boolean isIngestable() {
DataFile f = fileMetadata.getDataFile();
//Datafile is an ingestable type and hasn't been ingested yet or had an ingest fail
return FileUtil.canIngestAsTabular(f) && !(f.isTabularData() || f.isIngestProblem());
}

private Boolean lockedFromEditsVar;
private Boolean lockedFromDownloadVar;

50 changes: 35 additions & 15 deletions src/main/java/edu/harvard/iq/dataverse/api/Files.java
@@ -51,6 +51,7 @@
import edu.harvard.iq.dataverse.util.SystemConfig;
import edu.harvard.iq.dataverse.util.URLTokenUtil;

import static edu.harvard.iq.dataverse.util.JsfHelper.JH;
import static edu.harvard.iq.dataverse.util.json.JsonPrinter.json;
import edu.harvard.iq.dataverse.util.json.JsonUtil;
import edu.harvard.iq.dataverse.util.json.NullSafeJsonBuilder;
@@ -65,6 +66,7 @@
import java.util.logging.Logger;
import jakarta.ejb.EJB;
import jakarta.ejb.EJBException;
import jakarta.faces.application.FacesMessage;
import jakarta.inject.Inject;
import jakarta.json.Json;
import jakarta.json.JsonArray;
@@ -637,23 +639,41 @@ public Response uningestDatafile(@Context ContainerRequestContext crc, @PathPara
if (dataFile == null) {
return error(Response.Status.NOT_FOUND, "File not found for given id.");
}

if (!dataFile.isTabularData()) {
return error(Response.Status.BAD_REQUEST, "Cannot uningest non-tabular file.");
}

try {
DataverseRequest req = createDataverseRequest(getRequestUser(crc));
execCommand(new UningestFileCommand(req, dataFile));
Long dataFileId = dataFile.getId();
dataFile = fileService.find(dataFileId);
Dataset theDataset = dataFile.getOwner();
exportDatasetMetadata(settingsService, theDataset);
return ok("Datafile " + dataFileId + " uningested.");
} catch (WrappedResponse wr) {
return wr.getResponse();
// Ingest never succeeded, either there was a failure or this is not a tabular
// data file
// We allow anyone who can publish to uningest in order to clear a problem
if (dataFile.isIngestProblem()) {
try {
AuthenticatedUser au = getRequestAuthenticatedUserOrDie(crc);
if (!(permissionSvc.permissionsFor(au, dataFile).contains(Permission.PublishDataset))) {
return forbidden(
"Uningesting to remove an ingest problem can only be done by those who can publish the dataset");
}
} catch (WrappedResponse wr) {
return wr.getResponse();
}
dataFile.setIngestDone();
dataFile.setIngestReport(null);
fileService.save(dataFile);
return ok("Datafile " + dataFile.getId() + " uningested.");
} else {
return error(Response.Status.BAD_REQUEST, "Cannot uningest non-tabular file.");
}
} else {
try {
DataverseRequest req = createDataverseRequest(getRequestUser(crc));
execCommand(new UningestFileCommand(req, dataFile));
Long dataFileId = dataFile.getId();
dataFile = fileService.find(dataFileId);
Dataset theDataset = dataFile.getOwner();
exportDatasetMetadata(settingsService, theDataset);
return ok("Datafile " + dataFileId + " uningested.");
} catch (WrappedResponse wr) {
return wr.getResponse();
}
}

}

// reingest attempts to queue an *existing* DataFile
src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UningestFileCommand.java
@@ -47,10 +47,10 @@ public UningestFileCommand(DataverseRequest aRequest, DataFile uningest) {
@Override
protected void executeImpl(CommandContext ctxt) throws CommandException {

// first check if user is a superuser
if ( (!(getUser() instanceof AuthenticatedUser) || !getUser().isSuperuser() ) ) {
throw new PermissionException("Uningest File can only be called by Superusers.",
this, Collections.singleton(Permission.EditDataset), uningest);
// first check if user is a superuser
if ((!(getUser() instanceof AuthenticatedUser) || !getUser().isSuperuser())) {
throw new PermissionException("Uningest File can only be called by Superusers.", this,
Collections.singleton(Permission.EditDataset), uningest);
}

// is this actually a tabular data file?
9 changes: 9 additions & 0 deletions src/main/java/propertyFiles/Bundle.properties
@@ -2203,6 +2203,15 @@ ingest.csv.lineMismatch=Mismatch between line counts in first and final passes!,
ingest.csv.recordMismatch=Reading mismatch, line {0} of the Data file: {1} delimited values expected, {2} found.
ingest.csv.nullStream=Stream can't be null.

file.ingest=Ingest
file.uningest=Uningest
file.ingest.alreadyIngestedWarning=This file has already been ingested
file.ingest.ingestInProgressWarning=Ingestion of this file is already in progress
file.ingest.cantIngestFileWarning=Ingest not supported for this file type
file.ingest.ingestQueued=Ingestion has been requested
file.ingest.cantUningestFileWarning=This file cannot be uningested
file.uningest.complete=Uningestion of this file has been completed

# editdatafile.xhtml

# editFilesFragment.xhtml
16 changes: 16 additions & 0 deletions src/main/webapp/file-edit-button-fragment.xhtml
@@ -77,6 +77,22 @@
</h:outputLink>
</li>
</ui:fragment>

<!-- Single file uningest/reingest -->
<ui:fragment rendered="#{isFilePg and (dataverseSession.user.superuser and FilePage.fileMetadata.dataFile.isTabularData()) or (FilePage.fileMetadata.dataFile.isIngestProblem() and FilePage.canPublishDataset())}">
<li>
<p:commandLink update="@form,:messagePanel" action="#{FilePage.uningestFile()}">
<h:outputText value="#{bundle['file.uningest']}"/>
</p:commandLink>
</li>
</ui:fragment>
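<!-- Reingest: shown only to superusers, and only when FilePage.isIngestable() reports an ingestable file type that has not been ingested and has no ingest error -->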
<ui:fragment rendered="#{isFilePg and dataverseSession.user.superuser and FilePage.isIngestable()}">
<li>
<p:commandLink update="@form,:messagePanel" actionListener="#{FilePage.ingestFile()}">
<h:outputText value="#{bundle['file.ingest']}"/>
</p:commandLink>
</li>
</ui:fragment>

<!-- TO-DO #3488 - ADD FILE TAGS FOR FILE PG AND SINGLE FILE-->
<ui:fragment rendered="#{fileMetadata==null}">