Merge pull request #10345 from GlobalDataverseCommunityConsortium/globusstore

Globus Bug Fixes
landreev committed Mar 13, 2024
2 parents ec1b174 + 3be8cac commit 5caf436
Showing 13 changed files with 116 additions and 93 deletions.
48 changes: 28 additions & 20 deletions doc/sphinx-guides/source/developers/globus-api.rst
@@ -21,7 +21,7 @@ The first step in preparing for a Globus transfer/reference operation is to requ

.. code-block:: bash
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusUploadParameters?locale=$LOCALE"
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusUploadParameters?persistentId=$PERSISTENT_IDENTIFIER&locale=$LOCALE"
The response will be of the form:

@@ -37,6 +37,8 @@ The response will be of the form:
"dvLocale": "en",
"datasetPid": "doi:10.5072/FK2/ILLPXE",
"managed": "true",
"fileSizeLimit": 100000000000,
"remainingQuota": 1000000000000,
"endpoint": "d8c42580-6528-4605-9ad8-116a61982644"
},
"signedUrls": [
@@ -68,11 +70,16 @@ The response will be of the form:
}
}
The response includes the id for the Globus endpoint to use along with several signed URLs.
The response includes the id for the Globus endpoint to use along with several parameters and signed URLs. The parameters include whether the Globus endpoint is "managed" by Dataverse and,
if so, whether there is a "fileSizeLimit" (see :ref:`:MaxFileUploadSizeInBytes`) that will be enforced and/or, if there is a quota (see :doc:`/admin/collectionquotas`) on the overall size of data
that can be uploaded, what the "remainingQuota" is. Both values are in bytes.

Note that while Dataverse will not add files that violate the size or quota rules, Globus itself doesn't enforce these during the transfer. API users should thus check the size of the files
they intend to transfer before submitting a transfer request to Globus.
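As a sketch only, such a pre-flight check against the "fileSizeLimit" and "remainingQuota" parameters might look like the following (the limit and quota values are the ones from the example response above; the file sizes are illustrative stand-ins for real `stat` output):

```shell
# Hypothetical pre-flight check before submitting a Globus transfer request.
# Limits are taken from the example globusUploadParameters response above;
# the sizes below stand in for `stat -c%s "$f"` on the local files.
fileSizeLimit=100000000000
remainingQuota=1000000000000
total=0
for size in 2000000000 3000000000; do
  if [ "$size" -gt "$fileSizeLimit" ]; then
    echo "a file of $size bytes exceeds the per-file limit" >&2
    exit 1
  fi
  total=$((total + size))
done
if [ "$total" -gt "$remainingQuota" ]; then
  echo "transfer of $total bytes exceeds the remaining quota" >&2
  exit 1
fi
echo "pre-flight check passed: $total bytes to transfer"
```

Only after such a check succeeds would the client go on to request upload paths and start the Globus transfer.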

The getDatasetMetadata and getFileListing URLs are just signed versions of the standard Dataset metadata and file listing API calls. The other two are Globus specific.

If called for, a dataset using a store that is configured with a remote Globus endpoint(s), the return response is similar but the response includes a
If called for a dataset using a store that is configured with remote Globus endpoint(s), the response is similar, but the
"managed" parameter will be false, the "endpoint" parameter is replaced with a JSON array of "referenceEndpointsWithPaths", and the
requestGlobusTransferPaths and addGlobusFiles URLs are replaced with ones for requestGlobusReferencePaths and addFiles. All of these calls are
described further below.
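As an illustration only — this hypothetical fragment is assembled from the field names described above, and the actual ids, paths, and values will differ — the unmanaged-store variant of the queryParameters might take a shape like:

```
"queryParameters": {
    "datasetId": 29,
    "managed": "false",
    "referenceEndpointsWithPaths": ["<endpointId>/<basePath>"]
}
```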
@@ -81,7 +88,7 @@ The call to set up for a transfer out (download) is similar:

.. code-block:: bash
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusDownloadParameters?locale=$LOCALE"
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusDownloadParameters?persistentId=$PERSISTENT_IDENTIFIER&locale=$LOCALE"
Note that this API call supports an additional downloadId query parameter. This is only used when the globus-dataverse app is called from the Dataverse user interface. There is no need to use it when calling the API directly.

@@ -99,10 +106,11 @@ Once the user identifies which files are to be added, the requestGlobusTransferP
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
export LOCALE=en-US
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/requestGlobusUploadPaths"
export JSON_DATA="... (SEE BELOW)"
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/requestGlobusUploadPaths?persistentId=$PERSISTENT_IDENTIFIER"
Note that when using the dataverse-globus app, or the signed URLs returned by the previous call, the URL for this call will already be signed and no API_TOKEN is needed.

@@ -163,12 +171,12 @@ In the managed case, you must initiate a Globus transfer and take note of its ta
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
export JSON_DATA='{"taskIdentifier":"3f530302-6c48-11ee-8428-378be0d9c521", \
"files": [{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b3972213f-f6b5c2221423", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "1234"}}, \
{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b39722140-50eb7d3c5ece", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "2345"}}]}'
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:multipart/form-data" -X POST "$SERVER_URL/api/datasets/:persistentId/addGlobusFiles" -F "jsonData=$JSON_DATA"
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:multipart/form-data" -X POST "$SERVER_URL/api/datasets/:persistentId/addGlobusFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
Note that the mimetype is multipart/form-data, matching the /addFiles API call. Also note that the API_TOKEN is not needed when using a signed URL.

@@ -190,18 +198,18 @@ To begin downloading files, the requestGlobusDownload URL is used:
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/requestGlobusDownload"
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/requestGlobusDownload?persistentId=$PERSISTENT_IDENTIFIER"
The JSON body sent should include a list of file ids to download and, for a managed endpoint, the Globus principal that will make the transfer:

.. code-block:: bash
{
"principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75",
"fileIds":[60, 61]
}
export JSON_DATA='{ \
"principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75", \
"fileIds":[60, 61] \
}'
Note that this API call takes an optional downloadId parameter that is used with the dataverse-globus app. When downloadId is included, the list of fileIds is not needed.

@@ -224,16 +232,16 @@ Dataverse will then monitor the transfer and revoke the read permission when the
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/monitorGlobusDownload"
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/monitorGlobusDownload?persistentId=$PERSISTENT_IDENTIFIER"
The JSON body sent just contains the task identifier for the transfer:

.. code-block:: bash
{
"taskIdentifier":"b5fd01aa-8963-11ee-83ae-d5484943e99a"
}
export JSON_DATA='{ \
"taskIdentifier":"b5fd01aa-8963-11ee-83ae-d5484943e99a" \
}'
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -67,6 +67,8 @@ Multiple URLs: when the file must be uploaded in multiple parts. The part size i
"storageIdentifier":"s3://demo-dataverse-bucket:177883b000e-49cedef268ac"
}
The call will return a 400 (BAD REQUEST) response if the file is larger than what is allowed by :ref:`:MaxFileUploadSizeInBytes` and/or what remains of a quota (see :doc:`/admin/collectionquotas`).

In the example responses above, the URLs, which are very long, have been omitted. These URLs reference the S3 server and the specific object identifier that will be used, starting with, for example, https://demo-dataverse-bucket.s3.amazonaws.com/10.5072/FK2FOQPJS/177883b000e-49cedef268ac?...

The client must then use the URL(s) to PUT the file, or if the file is larger than the specified partSize, parts of the file.
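For example, the number of part URLs the client will need to PUT can be computed from the local file size and the returned partSize. A minimal sketch, with illustrative values:

```shell
# Sketch: how many PUT requests a multipart direct upload needs.
# partSize here is an illustrative stand-in for the value returned
# by the uploadurls call; fileSize would come from `stat` on the file.
fileSize=2500000000        # local file size in bytes
partSize=1073741824        # e.g. a 1 GiB part size from the response
parts=$(( (fileSize + partSize - 1) / partSize ))   # ceiling division
echo "upload requires $parts part(s)"
```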
3 changes: 1 addition & 2 deletions src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -11,7 +11,6 @@
import edu.harvard.iq.dataverse.authorization.users.User;
import edu.harvard.iq.dataverse.branding.BrandingUtil;
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.dataaccess.AbstractRemoteOverlayAccessIO;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.GlobusAccessibleStore;
import edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter;
@@ -3372,7 +3371,7 @@ private boolean filterSelectedFiles(){
if(globusDownloadEnabled) {
String driverId = DataAccess.getStorageDriverFromIdentifier(fmd.getDataFile().getStorageIdentifier());
globusTransferable = GlobusAccessibleStore.isGlobusAccessible(driverId);
downloadable = downloadable && !AbstractRemoteOverlayAccessIO.isNotDataverseAccessible(driverId);
downloadable = downloadable && StorageIO.isDataverseAccessible(driverId);
}
if(downloadable){
getSelectedDownloadableFiles().add(fmd);
@@ -8,6 +8,8 @@
import edu.harvard.iq.dataverse.authorization.Permission;
import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser;
import edu.harvard.iq.dataverse.authorization.users.PrivateUrlUser;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.externaltools.ExternalTool;
import edu.harvard.iq.dataverse.globus.GlobusServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
28 changes: 22 additions & 6 deletions src/main/java/edu/harvard/iq/dataverse/FilePage.java
@@ -52,6 +52,8 @@
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;

import jakarta.ejb.EJB;
import jakarta.ejb.EJBException;
import jakarta.faces.application.FacesMessage;
@@ -244,12 +246,10 @@ public String init() {
if (file.isTabularData()) {
contentType=DataFileServiceBean.MIME_TYPE_TSV_ALT;
}
configureTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.CONFIGURE, contentType);
exploreTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.EXPLORE, contentType);
queryTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.QUERY, contentType);
Collections.sort(exploreTools, CompareExternalToolName);
toolsWithPreviews = sortExternalTools();

loadExternalTools();



if (toolType != null) {
if (toolType.equals("PREVIEW")) {
if (!toolsWithPreviews.isEmpty()) {
@@ -282,6 +282,22 @@ public String init() {
return null;
}

private void loadExternalTools() {
String contentType= file.getContentType();
configureTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.CONFIGURE, contentType);
exploreTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.EXPLORE, contentType);
queryTools = externalToolService.findFileToolsByTypeAndContentType(ExternalTool.Type.QUERY, contentType);
Collections.sort(exploreTools, CompareExternalToolName);
toolsWithPreviews = sortExternalTools();
//For inaccessible files, only show the tools that have access to aux files (which are currently always accessible)
if(!StorageIO.isDataverseAccessible(DataAccess.getStorageDriverFromIdentifier(file.getStorageIdentifier()))) {
configureTools = configureTools.stream().filter(tool ->tool.accessesAuxFiles()).collect(Collectors.toList());
exploreTools = exploreTools.stream().filter(tool ->tool.accessesAuxFiles()).collect(Collectors.toList());
queryTools = queryTools.stream().filter(tool ->tool.accessesAuxFiles()).collect(Collectors.toList());
toolsWithPreviews = toolsWithPreviews.stream().filter(tool ->tool.accessesAuxFiles()).collect(Collectors.toList());
}
}

private void displayPublishMessage(){
if (fileMetadata.getDatasetVersion().isDraft() && canUpdateDataset()
&& (canPublishDataset() || !fileMetadata.getDatasetVersion().getDataset().isLockedFor(DatasetLock.Reason.InReview))){
4 changes: 2 additions & 2 deletions src/main/java/edu/harvard/iq/dataverse/SettingsWrapper.java
@@ -6,9 +6,9 @@
package edu.harvard.iq.dataverse;

import edu.harvard.iq.dataverse.branding.BrandingUtil;
import edu.harvard.iq.dataverse.dataaccess.AbstractRemoteOverlayAccessIO;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.GlobusAccessibleStore;
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.settings.Setting;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
@@ -344,7 +344,7 @@ public boolean isDownloadable(FileMetadata fmd) {
if(isGlobusFileDownload()) {
String driverId = DataAccess.getStorageDriverFromIdentifier(fmd.getDataFile().getStorageIdentifier());

downloadable = downloadable && !AbstractRemoteOverlayAccessIO.isNotDataverseAccessible(driverId);
downloadable = downloadable && StorageIO.isDataverseAccessible(driverId);
}
return downloadable;
}
63 changes: 27 additions & 36 deletions src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -45,6 +45,7 @@
import edu.harvard.iq.dataverse.search.IndexServiceBean;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.storageuse.UploadSessionQuotaLimit;
import edu.harvard.iq.dataverse.util.*;
import edu.harvard.iq.dataverse.util.bagit.OREMap;
import edu.harvard.iq.dataverse.util.json.*;
@@ -2220,42 +2221,6 @@ public Response deleteCurationStatus(@Context ContainerRequestContext crc, @Path
}
}

@GET
@AuthRequired
@Path("{id}/uploadsid")
@Deprecated
public Response getUploadUrl(@Context ContainerRequestContext crc, @PathParam("id") String idSupplied) {
try {
Dataset dataset = findDatasetOrDie(idSupplied);

boolean canUpdateDataset = false;
canUpdateDataset = permissionSvc.requestOn(createDataverseRequest(getRequestUser(crc)), dataset).canIssue(UpdateDatasetVersionCommand.class);
if (!canUpdateDataset) {
return error(Response.Status.FORBIDDEN, "You are not permitted to upload files to this dataset.");
}
S3AccessIO<?> s3io = FileUtil.getS3AccessForDirectUpload(dataset);
if (s3io == null) {
return error(Response.Status.NOT_FOUND, "Direct upload not supported for files in this dataset: " + dataset.getId());
}
String url = null;
String storageIdentifier = null;
try {
url = s3io.generateTemporaryS3UploadUrl();
storageIdentifier = FileUtil.getStorageIdentifierFromLocation(s3io.getStorageLocation());
} catch (IOException io) {
logger.warning(io.getMessage());
throw new WrappedResponse(io, error(Response.Status.INTERNAL_SERVER_ERROR, "Could not create process direct upload request"));
}

JsonObjectBuilder response = Json.createObjectBuilder()
.add("url", url)
.add("storageIdentifier", storageIdentifier);
return ok(response);
} catch (WrappedResponse wr) {
return wr.getResponse();
}
}

@GET
@AuthRequired
@Path("{id}/uploadurls")
@@ -2274,6 +2239,22 @@ public Response getMPUploadUrls(@Context ContainerRequestContext crc, @PathParam
return error(Response.Status.NOT_FOUND,
"Direct upload not supported for files in this dataset: " + dataset.getId());
}
Long maxSize = systemConfig.getMaxFileUploadSizeForStore(dataset.getEffectiveStorageDriverId());
if (maxSize != null) {
if(fileSize > maxSize) {
return error(Response.Status.BAD_REQUEST,
"The file you are trying to upload is too large to be uploaded to this dataset. " +
"The maximum allowed file size is " + maxSize + " bytes.");
}
}
UploadSessionQuotaLimit limit = fileService.getUploadSessionQuotaLimit(dataset);
if (limit != null) {
if(fileSize > limit.getRemainingQuotaInBytes()) {
return error(Response.Status.BAD_REQUEST,
"The file you are trying to upload is too large to be uploaded to this dataset. " +
"The remaing file size quota is " + limit.getRemainingQuotaInBytes() + " bytes.");
}
}
JsonObjectBuilder response = null;
String storageIdentifier = null;
try {
@@ -3485,6 +3466,16 @@ public Response getGlobusUploadParams(@Context ContainerRequestContext crc, @Pat
params.add(key, substitutedParams.get(key));
});
params.add("managed", Boolean.toString(managed));
if (managed) {
Long maxSize = systemConfig.getMaxFileUploadSizeForStore(storeId);
if (maxSize != null) {
params.add("fileSizeLimit", maxSize);
}
UploadSessionQuotaLimit limit = fileService.getUploadSessionQuotaLimit(dataset);
if (limit != null) {
params.add("remainingQuota", limit.getRemainingQuotaInBytes());
}
}
if (transferEndpoint != null) {
params.add("endpoint", transferEndpoint);
} else {
@@ -334,11 +334,5 @@ protected String getStoragePath() throws IOException {
logger.fine("fullStoragePath: " + fullStoragePath);
return fullStoragePath;
}

public static boolean isNotDataverseAccessible(String storeId) {
return Boolean.parseBoolean(StorageIO.getConfigParamForDriver(storeId, FILES_NOT_ACCESSIBLE_BY_DATAVERSE));
}



}
