Merge pull request #10345 from GlobalDataverseCommunityConsortium/globusstore

Globus Bug Fixes
landreev committed Mar 13, 2024
2 parents ec1b174 + 3be8cac commit 5caf436
Showing 13 changed files with 116 additions and 93 deletions.
48 changes: 28 additions & 20 deletions doc/sphinx-guides/source/developers/globus-api.rst
@@ -21,7 +21,7 @@ The first step in preparing for a Globus transfer/reference operation is to requ

.. code-block:: bash
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusUploadParameters?locale=$LOCALE"
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusUploadParameters?persistentId=$PERSISTENT_IDENTIFIER&locale=$LOCALE"
The response will be of the form:

@@ -37,6 +37,8 @@ The response will be of the form:
"dvLocale": "en",
"datasetPid": "doi:10.5072/FK2/ILLPXE",
"managed": "true",
"fileSizeLimit": 100000000000,
"remainingQuota": 1000000000000,
"endpoint": "d8c42580-6528-4605-9ad8-116a61982644"
},
"signedUrls": [
@@ -68,11 +70,16 @@ The response will be of the form:
}
}
The response includes the id for the Globus endpoint to use along with several signed URLs.
The response includes the id for the Globus endpoint to use along with several parameters and signed URLs. The parameters include whether the Globus endpoint is "managed" by Dataverse and,
if so, whether there is a "fileSizeLimit" (see :ref:`:MaxFileUploadSizeInBytes`) that will be enforced and/or, if there is a quota (see :doc:`/admin/collectionquotas`) on the overall size of data
that can be uploaded, what the "remainingQuota" is. Both values are in bytes.

Note that while Dataverse will not add files that violate the size or quota rules, Globus itself doesn't enforce these during the transfer. API users should thus check the size of the files
they intend to transfer before submitting a transfer request to Globus.
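As a sketch only, such a pre-flight check against the "fileSizeLimit" and "remainingQuota" parameters might look like the following (the limit and quota values are the ones from the example response above; the file sizes are illustrative stand-ins for real `stat` output):

```shell
# Hypothetical pre-flight check before submitting a Globus transfer request.
# Limits are taken from the example globusUploadParameters response above;
# the sizes below stand in for `stat -c%s "$f"` on the local files.
fileSizeLimit=100000000000
remainingQuota=1000000000000
total=0
for size in 2000000000 3000000000; do
  if [ "$size" -gt "$fileSizeLimit" ]; then
    echo "a file of $size bytes exceeds the per-file limit" >&2
    exit 1
  fi
  total=$((total + size))
done
if [ "$total" -gt "$remainingQuota" ]; then
  echo "transfer of $total bytes exceeds the remaining quota" >&2
  exit 1
fi
echo "pre-flight check passed: $total bytes to transfer"
```

Only after such a check succeeds would the client go on to request upload paths and start the Globus transfer.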

The getDatasetMetadata and getFileListing URLs are just signed versions of the standard Dataset metadata and file listing API calls. The other two are Globus specific.

If called for, a dataset using a store that is configured with a remote Globus endpoint(s), the return response is similar but the response includes a
If called for a dataset using a store that is configured with remote Globus endpoint(s), the response is similar, but the
"managed" parameter will be false, the "endpoint" parameter is replaced with a JSON array of "referenceEndpointsWithPaths", and the
requestGlobusTransferPaths and addGlobusFiles URLs are replaced with ones for requestGlobusReferencePaths and addFiles. All of these calls are
described further below.
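As an illustration only — this hypothetical fragment is assembled from the field names described above, and the actual ids, paths, and values will differ — the unmanaged-store variant of the queryParameters might take a shape like:

```
"queryParameters": {
    "datasetId": 29,
    "managed": "false",
    "referenceEndpointsWithPaths": ["<endpointId>/<basePath>"]
}
```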
@@ -81,7 +88,7 @@ The call to set up for a transfer out (download) is similar:

.. code-block:: bash
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusDownloadParameters?locale=$LOCALE"
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusDownloadParameters?persistentId=$PERSISTENT_IDENTIFIER&locale=$LOCALE"
Note that this API call supports an additional downloadId query parameter. This is only used when the globus-dataverse app is called from the Dataverse user interface. There is no need to use it when calling the API directly.

@@ -99,10 +106,11 @@ Once the user identifies which files are to be added, the requestGlobusTransferP
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
export LOCALE=en-US
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/requestGlobusUploadPaths"
export JSON_DATA="... (SEE BELOW)"
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/requestGlobusUploadPaths?persistentId=$PERSISTENT_IDENTIFIER"
Note that when using the dataverse-globus app, or the signed URLs returned by the previous call, the URL for this call will already be signed and no API_TOKEN is needed.

@@ -163,12 +171,12 @@ In the managed case, you must initiate a Globus transfer and take note of its ta
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
export JSON_DATA='{"taskIdentifier":"3f530302-6c48-11ee-8428-378be0d9c521", \
"files": [{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b3972213f-f6b5c2221423", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "1234"}}, \
{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b39722140-50eb7d3c5ece", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "2345"}}]}'
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:multipart/form-data" -X POST "$SERVER_URL/api/datasets/:persistentId/addGlobusFiles" -F "jsonData=$JSON_DATA"
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:multipart/form-data" -X POST "$SERVER_URL/api/datasets/:persistentId/addGlobusFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
Note that the mimetype is multipart/form-data, matching the /addFiles API call. Also note that the API_TOKEN is not needed when using a signed URL.

@@ -190,18 +198,18 @@ To begin downloading files, the requestGlobusDownload URL is used:
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/requestGlobusDownload"
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/requestGlobusDownload?persistentId=$PERSISTENT_IDENTIFIER"
The JSON body sent should include a list of file ids to download and, for a managed endpoint, the Globus principal that will make the transfer:

.. code-block:: bash
{
"principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75",
"fileIds":[60, 61]
}
export JSON_DATA='{ \
"principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75", \
"fileIds":[60, 61] \
}'
Note that this API call takes an optional downloadId parameter that is used with the dataverse-globus app. When downloadId is included, the list of fileIds is not needed.

@@ -224,16 +232,16 @@ Dataverse will then monitor the transfer and revoke the read permission when the
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/monitorGlobusDownload"
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/monitorGlobusDownload?persistentId=$PERSISTENT_IDENTIFIER"
The JSON body sent just contains the task identifier for the transfer:

.. code-block:: bash
{
"taskIdentifier":"b5fd01aa-8963-11ee-83ae-d5484943e99a"
}
export JSON_DATA='{ \
"taskIdentifier":"b5fd01aa-8963-11ee-83ae-d5484943e99a" \
}'
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -67,6 +67,8 @@ Multiple URLs: when the file must be uploaded in multiple parts. The part size i
"storageIdentifier":"s3://demo-dataverse-bucket:177883b000e-49cedef268ac"
}
The call will return a 400 (BAD REQUEST) response if the file is larger than what is allowed by :ref:`:MaxFileUploadSizeInBytes` and/or what remains of a quota (see :doc:`/admin/collectionquotas`).

In the example responses above, the URLs, which are very long, have been omitted. These URLs reference the S3 server and the specific object identifier that will be used, starting with, for example, https://demo-dataverse-bucket.s3.amazonaws.com/10.5072/FK2FOQPJS/177883b000e-49cedef268ac?...

The client must then use the URL(s) to PUT the file, or if the file is larger than the specified partSize, parts of the file.
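For example, the number of part URLs the client will need to PUT can be computed from the local file size and the returned partSize. A minimal sketch, with illustrative values:

```shell
# Sketch: how many PUT requests a multipart direct upload needs.
# partSize here is an illustrative stand-in for the value returned
# by the uploadurls call; fileSize would come from `stat` on the file.
fileSize=2500000000        # local file size in bytes
partSize=1073741824        # e.g. a 1 GiB part size from the response
parts=$(( (fileSize + partSize - 1) / partSize ))   # ceiling division
echo "upload requires $parts part(s)"
```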
3 changes: 1 addition & 2 deletions src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -11,7 +11,6 @@
import edu.harvard.iq.dataverse.authorization.users.User;
import edu.harvard.iq.dataverse.branding.BrandingUtil;
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.dataaccess.AbstractRemoteOverlayAccessIO;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.GlobusAccessibleStore;
import edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter;
@@ -3372,7 +3371,7 @@ private boolean filterSelectedFiles(){
if(globusDownloadEnabled) {
String driverId = DataAccess.getStorageDriverFromIdentifier(fmd.getDataFile().getStorageIdentifier());
globusTransferable = GlobusAccessibleStore.isGlobusAccessible(driverId);
downloadable = downloadable && !AbstractRemoteOverlayAccessIO.isNotDataverseAccessible(driverId);
downloadable = downloadable && StorageIO.isDataverseAccessible(driverId);
}
if(downloadable){
getSelectedDownloadableFiles().add(fmd);
@@ -8,6 +8,8 @@
import edu.harvard.iq.dataverse.authorization.Permission;
import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser;
import edu.harvard.iq.dataverse.authorization.users.PrivateUrlUser;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.externaltools.ExternalTool;
import edu.harvard.iq.dataverse.globus.GlobusServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
28 changes: 22 additions & 6 deletions src/main/java/edu/harvard/iq/dataverse/FilePage.java
@@ -52,6 +52,8 @@
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;

import jakarta.ejb.EJB;
import jakarta.ejb.EJBException;
import jakarta.faces.application.FacesMessage;
@@ -244,12 +246,10 @@ public String init() {
if (file.isTabularData()) {
contentType=DataFileServiceBean.MIME_TYPE_TSV_ALT;
}
configureTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.CONFIGURE, contentType);
exploreTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.EXPLORE, contentType);
queryTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.QUERY, contentType);
Collections.sort(exploreTools, CompareExternalToolName);
toolsWithPreviews = sortExternalTools();

loadExternalTools();



if (toolType != null) {
if (toolType.equals("PREVIEW")) {
if (!toolsWithPreviews.isEmpty()) {
@@ -282,6 +282,22 @@ public String init() {
return null;
}

private void loadExternalTools() {
String contentType= file.getContentType();
configureTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.CONFIGURE, contentType);
exploreTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.EXPLORE, contentType);
queryTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.QUERY, contentType);
Collections.sort(exploreTools, CompareExternalToolName);
toolsWithPreviews = sortExternalTools();
//For inaccessible files, only show the tools that have access to aux files (which are currently always accessible)
if(!StorageIO.isDataverseAccessible(DataAccess.getStorageDriverFromIdentifier(file.getStorageIdentifier()))) {
configureTools = configureTools.stream().filter(tool ->tool.accessesAuxFiles()).collect(Collectors.toList());
exploreTools = exploreTools.stream().filter(tool ->tool.accessesAuxFiles()).collect(Collectors.toList());
queryTools = queryTools.stream().filter(tool ->tool.accessesAuxFiles()).collect(Collectors.toList());
toolsWithPreviews = toolsWithPreviews.stream().filter(tool ->tool.accessesAuxFiles()).collect(Collectors.toList());
}
}

private void displayPublishMessage(){
if (fileMetadata.getDatasetVersion().isDraft() && canUpdateDataset()
&& (canPublishDataset() || !fileMetadata.getDatasetVersion().getDataset().isLockedFor(DatasetLock.Reason.InReview))){
4 changes: 2 additions & 2 deletions src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java
@@ -6,9 +6,9 @@
package edu.harvard.iq.dataverse;

import edu.harvard.iq.dataverse.branding.BrandingUtil;
import edu.harvard.iq.dataverse.dataaccess.AbstractRemoteOverlayAccessIO;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.GlobusAccessibleStore;
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.settings.Setting;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
@@ -344,7 +344,7 @@ public boolean isDownloadable(FileMetadata fmd) {
if(isGlobusFileDownload()) {
String driverId = DataAccess.getStorageDriverFromIdentifier(fmd.getDataFile().getStorageIdentifier());

downloadable = downloadable && !AbstractRemoteOverlayAccessIO.isNotDataverseAccessible(driverId);
downloadable = downloadable && StorageIO.isDataverseAccessible(driverId);
}
return downloadable;
}
63 changes: 27 additions & 36 deletions src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -45,6 +45,7 @@
import edu.harvard.iq.dataverse.search.IndexServiceBean;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.storageuse.UploadSessionQuotaLimit;
import edu.harvard.iq.dataverse.util.*;
import edu.harvard.iq.dataverse.util.bagit.OREMap;
import edu.harvard.iq.dataverse.util.json.*;
@@ -2220,42 +2221,6 @@ public Response deleteCurationStatus(@Context ContainerRequestContext crc, @Path
}
}

@GET
@AuthRequired
@Path("{id}/uploadsid")
@Deprecated
public Response getUploadUrl(@Context ContainerRequestContext crc, @PathParam("id") String idSupplied) {
try {
Dataset dataset = findDatasetOrDie(idSupplied);

boolean canUpdateDataset = false;
canUpdateDataset = permissionSvc.requestOn(createDataverseRequest(getRequestUser(crc)), dataset).canIssue(UpdateDatasetVersionCommand.class);
if (!canUpdateDataset) {
return error(Response.Status.FORBIDDEN, "You are not permitted to upload files to this dataset.");
}
S3AccessIO<?> s3io = FileUtil.getS3AccessForDirectUpload(dataset);
if (s3io == null) {
return error(Response.Status.NOT_FOUND, "Direct upload not supported for files in this dataset: " + dataset.getId());
}
String url = null;
String storageIdentifier = null;
try {
url = s3io.generateTemporaryS3UploadUrl();
storageIdentifier = FileUtil.getStorageIdentifierFromLocation(s3io.getStorageLocation());
} catch (IOException io) {
logger.warning(io.getMessage());
throw new WrappedResponse(io, error(Response.Status.INTERNAL_SERVER_ERROR, "Could not create process direct upload request"));
}

JsonObjectBuilder response = Json.createObjectBuilder()
.add("url", url)
.add("storageIdentifier", storageIdentifier);
return ok(response);
} catch (WrappedResponse wr) {
return wr.getResponse();
}
}

@GET
@AuthRequired
@Path("{id}/uploadurls")
@@ -2274,6 +2239,22 @@ public Response getMPUploadUrls(@Context ContainerRequestContext crc, @PathParam
return error(Response.Status.NOT_FOUND,
"Direct upload not supported for files in this dataset: " + dataset.getId());
}
Long maxSize = systemConfig.getMaxFileUploadSizeForStore(dataset.getEffectiveStorageDriverId());
if (maxSize != null) {
if(fileSize > maxSize) {
return error(Response.Status.BAD_REQUEST,
"The file you are trying to upload is too large to be uploaded to this dataset. " +
"The maximum allowed file size is " + maxSize + " bytes.");
}
}
UploadSessionQuotaLimit limit = fileService.getUploadSessionQuotaLimit(dataset);
if (limit != null) {
if(fileSize > limit.getRemainingQuotaInBytes()) {
return error(Response.Status.BAD_REQUEST,
"The file you are trying to upload is too large to be uploaded to this dataset. " +
"The remaing file size quota is " + limit.getRemainingQuotaInBytes() + " bytes.");
}
}
JsonObjectBuilder response = null;
String storageIdentifier = null;
try {
@@ -3485,6 +3466,16 @@ public Response getGlobusUploadParams(@Context ContainerRequestContext crc, @Pat
params.add(key, substitutedParams.get(key));
});
params.add("managed", Boolean.toString(managed));
if (managed) {
Long maxSize = systemConfig.getMaxFileUploadSizeForStore(storeId);
if (maxSize != null) {
params.add("fileSizeLimit", maxSize);
}
UploadSessionQuotaLimit limit = fileService.getUploadSessionQuotaLimit(dataset);
if (limit != null) {
params.add("remainingQuota", limit.getRemainingQuotaInBytes());
}
}
if (transferEndpoint != null) {
params.add("endpoint", transferEndpoint);
} else {
@@ -334,11 +334,5 @@ protected String getStoragePath() throws IOException {
logger.fine("fullStoragePath: " + fullStoragePath);
return fullStoragePath;
}

public static boolean isNotDataverseAccessible(String storeId) {
return Boolean.parseBoolean(StorageIO.getConfigParamForDriver(storeId, FILES_NOT_ACCESSIBLE_BY_DATAVERSE));
}



}
