Globus Bug Fixes #10345

Merged · 16 commits · Mar 13, 2024
48 changes: 28 additions & 20 deletions doc/sphinx-guides/source/developers/globus-api.rst
@@ -21,7 +21,7 @@ The first step in preparing for a Globus transfer/reference operation is to requ

.. code-block:: bash

-   curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusUploadParameters?locale=$LOCALE"
+   curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusUploadParameters?persistentId=$PERSISTENT_IDENTIFIER&locale=$LOCALE"

The response will be of the form:

@@ -37,6 +37,8 @@ The response will be of the form:
"dvLocale": "en",
"datasetPid": "doi:10.5072/FK2/ILLPXE",
"managed": "true",
"fileSizeLimit": 100000000000,
"remainingQuota": 1000000000000,
"endpoint": "d8c42580-6528-4605-9ad8-116a61982644"
},
"signedUrls": [
@@ -68,11 +70,16 @@ The response will be of the form:
}
}

- The response includes the id for the Globus endpoint to use along with several signed URLs.
+ The response includes the id for the Globus endpoint to use along with several parameters and signed URLs. The parameters include whether the Globus endpoint is "managed" by Dataverse and,
+ if so, the "fileSizeLimit" (see :ref:`:MaxFileUploadSizeInBytes`) that will be enforced, if any, and, if there is a quota (see :doc:`/admin/collectionquotas`) on the overall size of data
+ that can be uploaded, what the "remainingQuota" is. Both are in bytes.
+
+ Note that while Dataverse will not add files that violate the size or quota rules, Globus itself does not enforce them during the transfer. API users should therefore check the size of the files
+ they intend to transfer before submitting a transfer request to Globus.
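
For example, a client could pre-check a local file against both limits before initiating a transfer. A minimal sketch, assuming ``jq`` and GNU ``stat`` are available, ``$RESPONSE`` holds the JSON response shown above, and the ``.data.queryParameters`` path matches the response shape excerpted there:

.. code-block:: bash

   FILE=file1.txt
   SIZE=$(stat -c%s "$FILE")   # file size in bytes (GNU stat)
   LIMIT=$(echo "$RESPONSE" | jq -r '.data.queryParameters.fileSizeLimit')
   QUOTA=$(echo "$RESPONSE" | jq -r '.data.queryParameters.remainingQuota')
   # Both fields may be absent for unmanaged stores; null checks omitted for brevity.
   if [ "$SIZE" -gt "$LIMIT" ] || [ "$SIZE" -gt "$QUOTA" ]; then
       echo "$FILE would exceed the file size limit or remaining quota" >&2
   fi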

The getDatasetMetadata and getFileListing URLs are just signed versions of the standard Dataset metadata and file listing API calls. The other two are Globus specific.
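
Since these URLs are signed, they can be called without an API token. A sketch of extracting and calling one (assuming ``jq`` and the ``$RESPONSE`` variable above; the ``name`` and ``signedUrl`` field names are assumptions about the signed URL entries elided from the excerpt):

.. code-block:: bash

   METADATA_URL=$(echo "$RESPONSE" | jq -r '.data.signedUrls[] | select(.name == "getDatasetMetadata") | .signedUrl')
   curl "$METADATA_URL"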

- If called for, a dataset using a store that is configured with a remote Globus endpoint(s), the return response is similar but the response includes a
+ If called for a dataset using a store that is configured with remote Globus endpoint(s), the response is similar, but
the "managed" parameter will be false, the "endpoint" parameter is replaced with a JSON array of "referenceEndpointsWithPaths" and the
requestGlobusTransferPaths and addGlobusFiles URLs are replaced with ones for requestGlobusReferencePaths and addFiles. All of these calls are
described further below.
@@ -81,7 +88,7 @@ The call to set up for a transfer out (download) is similar:

.. code-block:: bash

-   curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusDownloadParameters?locale=$LOCALE"
+   curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusDownloadParameters?persistentId=$PERSISTENT_IDENTIFIER&locale=$LOCALE"

Note that this API call supports an additional downloadId query parameter. This is only used when the dataverse-globus app is called from the Dataverse user interface. There is no need to use it when calling the API directly.

@@ -99,10 +106,11 @@ Once the user identifies which files are to be added, the requestGlobusTransferP

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
-   export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
+   export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
export LOCALE=en-US

-   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/requestGlobusUploadPaths"
+   export JSON_DATA="... (SEE BELOW)"
+
+   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/requestGlobusUploadPaths?persistentId=$PERSISTENT_IDENTIFIER"

Note that when using the dataverse-globus app or the return from the previous call, the URL for this call will be signed and no API_TOKEN is needed.
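
The shape of the ``$JSON_DATA`` body is sketched below as a hypothetical illustration — the authoritative example is in the portion of this file not shown in the diff, and the field names here are assumptions:

.. code-block:: bash

   # Hypothetical example body; the principal is the Globus identity that
   # will perform the transfer.
   export JSON_DATA='{ \
       "principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75", \
       "numberOfFiles":2 \
   }'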

@@ -163,12 +171,12 @@ In the managed case, you must initiate a Globus transfer and take note of its ta

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
-   export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
+   export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
export JSON_DATA='{"taskIdentifier":"3f530302-6c48-11ee-8428-378be0d9c521", \
"files": [{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b3972213f-f6b5c2221423", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "1234"}}, \
{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b39722140-50eb7d3c5ece", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "2345"}}]}'

-   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:multipart/form-data" -X POST "$SERVER_URL/api/datasets/:persistentId/addGlobusFiles" -F "jsonData=$JSON_DATA"
+   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:multipart/form-data" -X POST "$SERVER_URL/api/datasets/:persistentId/addGlobusFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

Note that the MIME type is multipart/form-data, matching the /addFiles API call. Also note that the API_TOKEN is not needed when using a signed URL.

@@ -190,18 +198,18 @@ To begin downloading files, the requestGlobusDownload URL is used:

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
-   export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
+   export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV

-   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/requestGlobusDownload"
+   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/requestGlobusDownload?persistentId=$PERSISTENT_IDENTIFIER"

The JSON body sent should include a list of file ids to download and, for a managed endpoint, the Globus principal that will make the transfer:

.. code-block:: bash

-   {
-       "principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75",
-       "fileIds":[60, 61]
-   }
+   export JSON_DATA='{ \
+       "principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75", \
+       "fileIds":[60, 61] \
+   }'

Note that this API call takes an optional downloadId parameter that is used with the dataverse-globus app. When downloadId is included, the list of fileIds is not needed.
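
The transfer itself is then performed on the Globus side, using the endpoint and paths returned by this call. For instance, with the Globus CLI — a sketch in which the endpoint ids and paths are placeholders:

.. code-block:: bash

   # $SOURCE_ENDPOINT and the source path come from the requestGlobusDownload
   # response; $DEST_ENDPOINT is an endpoint the user controls.
   globus transfer "$SOURCE_ENDPOINT:/10.5072/FK2/7U7YBV/file1.txt" "$DEST_ENDPOINT:/downloads/file1.txt"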

@@ -224,16 +232,16 @@ Dataverse will then monitor the transfer and revoke the read permission when the

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
-   export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
+   export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV

-   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/monitorGlobusDownload"
+   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/monitorGlobusDownload?persistentId=$PERSISTENT_IDENTIFIER"

The JSON body sent just contains the task identifier for the transfer:

.. code-block:: bash

-   {
-       "taskIdentifier":"b5fd01aa-8963-11ee-83ae-d5484943e99a"
-   }
+   export JSON_DATA='{ \
+       "taskIdentifier":"b5fd01aa-8963-11ee-83ae-d5484943e99a" \
+   }'
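
The task identifier is the one Globus assigned when the transfer was submitted, so a client can also watch the transfer directly on the Globus side, e.g. with the Globus CLI (a sketch):

.. code-block:: bash

   globus task show b5fd01aa-8963-11ee-83ae-d5484943e99a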


2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -67,6 +67,8 @@ Multiple URLs: when the file must be uploaded in multiple parts. The part size i
"storageIdentifier":"s3://demo-dataverse-bucket:177883b000e-49cedef268ac"
}

+ The call will return a 400 (BAD REQUEST) response if the file is larger than what is allowed by the :ref:`:MaxFileUploadSizeInBytes` setting and/or exceeds the remaining quota (see :doc:`/admin/collectionquotas`).

In the example responses above, the URLs, which are very long, have been omitted. These URLs reference the S3 server and the specific object identifier that will be used, starting with, for example, https://demo-dataverse-bucket.s3.amazonaws.com/10.5072/FK2FOQPJS/177883b000e-49cedef268ac?...

The client must then use the URL(s) to PUT the file, or if the file is larger than the specified partSize, parts of the file.
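
For example, a single presigned URL can be used with a plain HTTP PUT — a sketch in which ``$UPLOAD_URL`` stands in for one of the long presigned URLs omitted above:

.. code-block:: bash

   # Single-part case: PUT the whole file to the presigned URL.
   curl -i -X PUT -T file1.txt "$UPLOAD_URL"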
3 changes: 1 addition & 2 deletions src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -11,7 +11,6 @@
import edu.harvard.iq.dataverse.authorization.users.User;
import edu.harvard.iq.dataverse.branding.BrandingUtil;
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
- import edu.harvard.iq.dataverse.dataaccess.AbstractRemoteOverlayAccessIO;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.GlobusAccessibleStore;
import edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter;
@@ -3372,7 +3371,7 @@ private boolean filterSelectedFiles(){
if(globusDownloadEnabled) {
String driverId = DataAccess.getStorageDriverFromIdentifier(fmd.getDataFile().getStorageIdentifier());
globusTransferable = GlobusAccessibleStore.isGlobusAccessible(driverId);
-             downloadable = downloadable && !AbstractRemoteOverlayAccessIO.isNotDataverseAccessible(driverId);
+             downloadable = downloadable && StorageIO.isDataverseAccessible(driverId);
}
if(downloadable){
getSelectedDownloadableFiles().add(fmd);
@@ -8,6 +8,8 @@
import edu.harvard.iq.dataverse.authorization.Permission;
import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser;
import edu.harvard.iq.dataverse.authorization.users.PrivateUrlUser;
+ import edu.harvard.iq.dataverse.dataaccess.DataAccess;
+ import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.externaltools.ExternalTool;
import edu.harvard.iq.dataverse.globus.GlobusServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
28 changes: 22 additions & 6 deletions src/main/java/edu/harvard/iq/dataverse/FilePage.java
@@ -52,6 +52,8 @@
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
+ import java.util.stream.Collectors;

import jakarta.ejb.EJB;
import jakarta.ejb.EJBException;
import jakarta.faces.application.FacesMessage;
@@ -244,12 +246,10 @@ public String init() {
if (file.isTabularData()) {
contentType=DataFileServiceBean.MIME_TYPE_TSV_ALT;
}
-             configureTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.CONFIGURE, contentType);
-             exploreTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.EXPLORE, contentType);
-             queryTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.QUERY, contentType);
-             Collections.sort(exploreTools, CompareExternalToolName);
-             toolsWithPreviews = sortExternalTools();

+             loadExternalTools();



if (toolType != null) {
if (toolType.equals("PREVIEW")) {
if (!toolsWithPreviews.isEmpty()) {
@@ -282,6 +282,22 @@ public String init() {
return null;
}

+     private void loadExternalTools() {
+         String contentType = file.getContentType();
+         configureTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.CONFIGURE, contentType);
+         exploreTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.EXPLORE, contentType);
+         queryTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.QUERY, contentType);
+         Collections.sort(exploreTools, CompareExternalToolName);
+         toolsWithPreviews = sortExternalTools();
+         // For inaccessible files, only show the tools that have access to aux files (which are currently always accessible)
+         if (!StorageIO.isDataverseAccessible(DataAccess.getStorageDriverFromIdentifier(file.getStorageIdentifier()))) {
+             configureTools = configureTools.stream().filter(tool -> tool.accessesAuxFiles()).collect(Collectors.toList());
+             exploreTools = exploreTools.stream().filter(tool -> tool.accessesAuxFiles()).collect(Collectors.toList());
+             queryTools = queryTools.stream().filter(tool -> tool.accessesAuxFiles()).collect(Collectors.toList());
+             toolsWithPreviews = toolsWithPreviews.stream().filter(tool -> tool.accessesAuxFiles()).collect(Collectors.toList());
+         }
+     }

private void displayPublishMessage(){
if (fileMetadata.getDatasetVersion().isDraft() && canUpdateDataset()
&& (canPublishDataset() || !fileMetadata.getDatasetVersion().getDataset().isLockedFor(DatasetLock.Reason.InReview))){
Expand Down
4 changes: 2 additions & 2 deletions src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java
@@ -6,9 +6,9 @@
package edu.harvard.iq.dataverse;

import edu.harvard.iq.dataverse.branding.BrandingUtil;
- import edu.harvard.iq.dataverse.dataaccess.AbstractRemoteOverlayAccessIO;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.GlobusAccessibleStore;
+ import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.settings.Setting;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
@@ -344,7 +344,7 @@ public boolean isDownloadable(FileMetadata fmd) {
if(isGlobusFileDownload()) {
String driverId = DataAccess.getStorageDriverFromIdentifier(fmd.getDataFile().getStorageIdentifier());

-         downloadable = downloadable && !AbstractRemoteOverlayAccessIO.isNotDataverseAccessible(driverId);
+         downloadable = downloadable && StorageIO.isDataverseAccessible(driverId);
}
return downloadable;
}
63 changes: 27 additions & 36 deletions src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -45,6 +45,7 @@
import edu.harvard.iq.dataverse.search.IndexServiceBean;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
+ import edu.harvard.iq.dataverse.storageuse.UploadSessionQuotaLimit;
import edu.harvard.iq.dataverse.util.*;
import edu.harvard.iq.dataverse.util.bagit.OREMap;
import edu.harvard.iq.dataverse.util.json.*;
@@ -2220,42 +2221,6 @@ public Response deleteCurationStatus(@Context ContainerRequestContext crc, @Path
}
}

- @GET
- @AuthRequired
- @Path("{id}/uploadsid")
- @Deprecated
- public Response getUploadUrl(@Context ContainerRequestContext crc, @PathParam("id") String idSupplied) {
-     try {
-         Dataset dataset = findDatasetOrDie(idSupplied);
-
-         boolean canUpdateDataset = false;
-         canUpdateDataset = permissionSvc.requestOn(createDataverseRequest(getRequestUser(crc)), dataset).canIssue(UpdateDatasetVersionCommand.class);
-         if (!canUpdateDataset) {
-             return error(Response.Status.FORBIDDEN, "You are not permitted to upload files to this dataset.");
-         }
-         S3AccessIO<?> s3io = FileUtil.getS3AccessForDirectUpload(dataset);
-         if (s3io == null) {
-             return error(Response.Status.NOT_FOUND, "Direct upload not supported for files in this dataset: " + dataset.getId());
-         }
-         String url = null;
-         String storageIdentifier = null;
-         try {
-             url = s3io.generateTemporaryS3UploadUrl();
-             storageIdentifier = FileUtil.getStorageIdentifierFromLocation(s3io.getStorageLocation());
-         } catch (IOException io) {
-             logger.warning(io.getMessage());
-             throw new WrappedResponse(io, error(Response.Status.INTERNAL_SERVER_ERROR, "Could not create process direct upload request"));
-         }
-
-         JsonObjectBuilder response = Json.createObjectBuilder()
-                 .add("url", url)
-                 .add("storageIdentifier", storageIdentifier);
-         return ok(response);
-     } catch (WrappedResponse wr) {
-         return wr.getResponse();
-     }
- }

@GET
@AuthRequired
@Path("{id}/uploadurls")
@@ -2274,6 +2239,22 @@ public Response getMPUploadUrls(@Context ContainerRequestContext crc, @PathParam
return error(Response.Status.NOT_FOUND,
"Direct upload not supported for files in this dataset: " + dataset.getId());
}
+ Long maxSize = systemConfig.getMaxFileUploadSizeForStore(dataset.getEffectiveStorageDriverId());
+ if (maxSize != null) {
+     if (fileSize > maxSize) {
+         return error(Response.Status.BAD_REQUEST,
+                 "The file you are trying to upload is too large to be uploaded to this dataset. " +
+                         "The maximum allowed file size is " + maxSize + " bytes.");
+     }
+ }
+ UploadSessionQuotaLimit limit = fileService.getUploadSessionQuotaLimit(dataset);
+ if (limit != null) {
+     if (fileSize > limit.getRemainingQuotaInBytes()) {
+         return error(Response.Status.BAD_REQUEST,
+                 "The file you are trying to upload is too large to be uploaded to this dataset. " +
+                         "The remaining file size quota is " + limit.getRemainingQuotaInBytes() + " bytes.");
+     }
+ }
JsonObjectBuilder response = null;
String storageIdentifier = null;
try {
@@ -3485,6 +3466,16 @@ public Response getGlobusUploadParams(@Context ContainerRequestContext crc, @Pat
params.add(key, substitutedParams.get(key));
});
params.add("managed", Boolean.toString(managed));
+ if (managed) {
+     Long maxSize = systemConfig.getMaxFileUploadSizeForStore(storeId);
+     if (maxSize != null) {
+         params.add("fileSizeLimit", maxSize);
+     }
+     UploadSessionQuotaLimit limit = fileService.getUploadSessionQuotaLimit(dataset);
+     if (limit != null) {
+         params.add("remainingQuota", limit.getRemainingQuotaInBytes());
+     }
+ }
if (transferEndpoint != null) {
params.add("endpoint", transferEndpoint);
} else {
@@ -334,11 +334,5 @@ protected String getStoragePath() throws IOException {
logger.fine("fullStoragePath: " + fullStoragePath);
return fullStoragePath;
}

- public static boolean isNotDataverseAccessible(String storeId) {
-     return Boolean.parseBoolean(StorageIO.getConfigParamForDriver(storeId, FILES_NOT_ACCESSIBLE_BY_DATAVERSE));
- }



}