IQSS/6763-multi-part upload API calls #6995

Merged: 59 commits, merged Aug 31, 2020
Commits (59)
a02d41d
Merge remote-tracking branch 'IQSS/develop' into IQSS/6763
qqmyers Jun 26, 2020
e6fd3c8
add maxpartsize jvm option
qqmyers Jul 7, 2020
c9a03e5
add some test calls with variants of the original
qqmyers Jul 7, 2020
92f1cc2
remove @Singleton
qqmyers Jul 7, 2020
26be2c4
Don't update lastapiusetime
qqmyers Jul 7, 2020
78540d0
Merge branch 'apitesting' into IQSS/6763
qqmyers Jul 8, 2020
2bc7e0d
try RequiresNew transaction and add some addt'l tests
qqmyers Jul 8, 2020
6810c4d
Combine single and MP uploads
qqmyers Jul 9, 2020
0f76d80
complete response outside for loop
qqmyers Jul 9, 2020
9974a38
remove blocking tests
qqmyers Jul 9, 2020
4c1a21b
restore singleton
qqmyers Jul 9, 2020
131e531
change to min-part-size
qqmyers Jul 9, 2020
997f93f
try to cleanup by calling abort on failure of complete upload
qqmyers Jul 9, 2020
bd05496
add release note
djbrooke Jul 10, 2020
b53ef80
method to get one or many upload URLS as needed
qqmyers Jul 10, 2020
8487388
initial convert to use class
qqmyers Jul 10, 2020
4447ba1
change page to match method name change
qqmyers Jul 10, 2020
5758799
pass fileSize correctly
qqmyers Jul 10, 2020
c3bb1b7
Merge pull request #1 from djbrooke/IQSS/6763
qqmyers Jul 10, 2020
c442b47
bug fixes
qqmyers Jul 10, 2020
603b8d9
Merge branch 'IQSS/6763' of https://github.com/GlobalDataverseCommuni…
qqmyers Jul 10, 2020
b7c0c02
Fix for #7060
qqmyers Jul 14, 2020
1e4232b
start mp support
qqmyers Jul 14, 2020
8f207ab
test session restart in login
qqmyers Jul 21, 2020
5b7be84
typos
qqmyers Jul 21, 2020
4b39cca
debug logging
qqmyers Jul 21, 2020
caf5ec8
Merge remote-tracking branch 'IQSS/develop' into IQSS/6763
qqmyers Jul 21, 2020
319b4b3
typos
qqmyers Jul 21, 2020
5ee4f87
copy attributes to new session
qqmyers Jul 22, 2020
e0ebf62
Merge branch 'IQSS/3254' into IQSS/6763
qqmyers Jul 22, 2020
13abba2
support MP direct upload for datasets being created
qqmyers Jul 22, 2020
f5593e8
remove debug logging
qqmyers Jul 22, 2020
54739ed
add tsv to recognized extensions for mimetype determination
qqmyers Jul 22, 2020
ad92b0f
use mimetype determination by extension for direct upload
qqmyers Jul 22, 2020
cdd0db5
enable direct upload/ingest of text/tsv
qqmyers Jul 23, 2020
14beb64
fix file upload form updating in create and edit modes
qqmyers Jul 23, 2020
16a0f77
mp upload fixes
qqmyers Jul 24, 2020
372d83c
script fixes w.r.t. progress
qqmyers Jul 24, 2020
8388ed3
formatting
qqmyers Jul 24, 2020
832d196
fix perms -allow session user to abort/complete mp upload
qqmyers Jul 24, 2020
b18a583
switch to query params and drop dataset.* from logging
qqmyers Jul 24, 2020
9c696bd
fix blob slicing and etag uploads, cleanup
qqmyers Jul 24, 2020
fdbfb37
apikey auth fix in complete call
qqmyers Jul 24, 2020
d2c8c8c
add expose etags header for UI mp upload
qqmyers Jul 24, 2020
6320049
Merge remote-tracking branch 'IQSS/develop' into IQSS/6763
qqmyers Jul 31, 2020
8ee0d07
Merge remote-tracking branch 'IQSS/develop' into IQSS/6763
qqmyers Aug 10, 2020
63aa365
Merge remote-tracking branch 'IQSS/develop' into IQSS/6763
qqmyers Aug 20, 2020
0b2f722
Merge remote-tracking branch 'IQSS/develop' into IQSS/6763
qqmyers Aug 24, 2020
751f0b9
merge issues
qqmyers Aug 24, 2020
04c6f44
up min-part-size to 1 GB, remove debug logging
qqmyers Aug 25, 2020
c37795f
add more documentation
qqmyers Aug 25, 2020
d81b05c
cleanup
qqmyers Aug 25, 2020
f490340
add UI info to release notes/change to 1 GB default
qqmyers Aug 25, 2020
7d673a8
handle when last part upload fails
qqmyers Aug 25, 2020
a08c5d7
fix minimum size logic after 1GB default
qqmyers Aug 26, 2020
35d6735
allow exact minimum to be set (not just greater)
qqmyers Aug 26, 2020
ec7eb04
throttle sending parts to the browser
qqmyers Aug 27, 2020
8b63d56
fix abort call, handle cancels, fix progress, limit parts to 10
qqmyers Aug 27, 2020
c6d06e5
cancel direct uploads in progress in edit mode via 'Done' button
qqmyers Aug 28, 2020
3 changes: 3 additions & 0 deletions doc/release-notes/6763-multipart-uploads.md
@@ -0,0 +1,3 @@
# Large Data Support (continued)

Installations configured for direct S3 upload will be able to use the [Dataverse Uploader](https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader) to upload large (>5 GB) files to Dataverse.
3 changes: 2 additions & 1 deletion doc/sphinx-guides/source/developers/big-data-support.rst
@@ -44,7 +44,8 @@ with the contents of the file cors.json as follows:
{
"AllowedOrigins": ["https://<DATAVERSE SERVER>"],
"AllowedHeaders": ["*"],
"AllowedMethods": ["PUT", "GET"]
"AllowedMethods": ["PUT", "GET"],
"ExposeHeaders": ["ETag"]
}
]
}
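
As a point of reference (not shown in this diff), a CORS configuration like the one above is typically applied to the bucket with the AWS CLI; the bucket name below is a placeholder:

# Apply the CORS rules in cors.json to the S3 bucket used for direct upload.
# <BUCKET_NAME> is a placeholder for the installation's bucket.
aws s3api put-bucket-cors --bucket <BUCKET_NAME> --cors-configuration file://cors.json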
2 changes: 1 addition & 1 deletion src/main/java/META-INF/mime.types
@@ -7,7 +7,7 @@ text/comma-separated-values csv CSV
text/plain txt TXT
text/xml xml XML
# Common statistical data formats
text/tab-separated-values tab TAB tsv TSV
text/tsv tab TAB tsv TSV
text/x-fixed-field dat DAT asc ASC
application/x-rlang-transport Rdata RData rdata RDATA
type/x-r-syntax r R
34 changes: 34 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java
@@ -58,6 +58,7 @@
import org.primefaces.model.file.UploadedFile;
import javax.json.Json;
import javax.json.JsonObject;
import javax.json.JsonObjectBuilder;
import javax.json.JsonArray;
import javax.json.JsonReader;
import org.apache.commons.httpclient.HttpClient;
@@ -1721,6 +1722,7 @@ public String getRsyncScriptFilename() {
return rsyncScriptFilename;
}

@Deprecated
public void requestDirectUploadUrl() {


@@ -1742,6 +1744,38 @@ public void requestDirectUploadUrl() {
PrimeFaces.current().executeScript("uploadFileDirectly('" + url + "','" + storageIdentifier + "')");
}

public void requestDirectUploadUrls() {

Map<String, String> paramMap = FacesContext.getCurrentInstance().getExternalContext().getRequestParameterMap();

String sizeString = paramMap.get("fileSize");
long fileSize = Long.parseLong(sizeString);

S3AccessIO<?> s3io = FileUtil.getS3AccessForDirectUpload(dataset);
if (s3io == null) {
FacesContext.getCurrentInstance().addMessage(uploadComponentId,
new FacesMessage(FacesMessage.SEVERITY_ERROR,
BundleUtil.getStringFromBundle("dataset.file.uploadWarning"),
"Direct upload not supported for this dataset"));
}
JsonObjectBuilder urls = null;
String storageIdentifier = null;
try {
storageIdentifier = FileUtil.getStorageIdentifierFromLocation(s3io.getStorageLocation());
urls = s3io.generateTemporaryS3UploadUrls(dataset.getGlobalId().asString(), storageIdentifier, fileSize);

} catch (IOException io) {
logger.warning(io.getMessage());
FacesContext.getCurrentInstance().addMessage(uploadComponentId,
new FacesMessage(FacesMessage.SEVERITY_ERROR,
BundleUtil.getStringFromBundle("dataset.file.uploadWarning"),
"Issue in connecting to S3 store for direct upload"));
}

PrimeFaces.current().executeScript(
"uploadFileDirectly('" + urls.build().toString() + "','" + storageIdentifier + "','" + fileSize + "')");
}

public void uploadFinished() {
// This method is triggered from the page, by the <p:upload ... onComplete=...
// attribute.
@@ -378,7 +378,7 @@ protected AuthenticatedUser findAuthenticatedUserOrDie() throws WrappedResponse
private AuthenticatedUser findAuthenticatedUserOrDie( String key ) throws WrappedResponse {
AuthenticatedUser authUser = authSvc.lookupUser(key);
if ( authUser != null ) {
authUser = userSvc.updateLastApiUseTime(authUser);
authUser = userSvc.updateLastApiUseTime(authUser);

return authUser;
}
181 changes: 179 additions & 2 deletions src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -78,10 +78,8 @@
import edu.harvard.iq.dataverse.ingest.IngestServiceBean;
import edu.harvard.iq.dataverse.privateurl.PrivateUrl;
import edu.harvard.iq.dataverse.S3PackageImporter;
import static edu.harvard.iq.dataverse.api.AbstractApiBean.error;
import edu.harvard.iq.dataverse.api.dto.RoleAssignmentDTO;
import edu.harvard.iq.dataverse.batch.util.LoggingUtil;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter;
import edu.harvard.iq.dataverse.dataaccess.S3AccessIO;
import edu.harvard.iq.dataverse.engine.command.exception.CommandException;
@@ -128,8 +126,10 @@
import javax.json.Json;
import javax.json.JsonArray;
import javax.json.JsonArrayBuilder;
import javax.json.JsonException;
import javax.json.JsonObject;
import javax.json.JsonObjectBuilder;
import javax.json.JsonReader;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.ws.rs.Consumes;
@@ -153,6 +153,8 @@
import org.glassfish.jersey.media.multipart.FormDataContentDisposition;
import org.glassfish.jersey.media.multipart.FormDataParam;

import com.amazonaws.services.s3.model.PartETag;

@Path("datasets")
public class Datasets extends AbstractApiBean {

@@ -1485,6 +1487,7 @@ public Response returnToAuthor(@PathParam("id") String idSupplied, String jsonBo

@GET
@Path("{id}/uploadsid")
@Deprecated
public Response getUploadUrl(@PathParam("id") String idSupplied) {
try {
Dataset dataset = findDatasetOrDie(idSupplied);
@@ -1494,6 +1497,7 @@ public Response getUploadUrl(@PathParam("id") String idSupplied) {
canUpdateDataset = permissionSvc.requestOn(createDataverseRequest(findUserOrDie()), dataset).canIssue(UpdateDatasetVersionCommand.class);
} catch (WrappedResponse ex) {
logger.info("Exception thrown while trying to figure out permissions while getting upload URL for dataset id " + dataset.getId() + ": " + ex.getLocalizedMessage());
throw ex;
}
if (!canUpdateDataset) {
return error(Response.Status.FORBIDDEN, "You are not permitted to upload files to this dataset.");
@@ -1520,6 +1524,179 @@ public Response getUploadUrl(@PathParam("id") String idSupplied) {
return wr.getResponse();
}
}
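
A rough sketch (not part of this diff) of calling the now-deprecated single-URL endpoint above; SERVER_URL, API_TOKEN, and the dataset id are placeholders:

# Request one pre-signed URL for a single-part direct S3 upload (deprecated in favor of uploadurls).
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/42/uploadsid"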

@GET
@Path("{id}/uploadurls")
public Response getMPUploadUrls(@PathParam("id") String idSupplied, @QueryParam("size") long fileSize) {
try {
Dataset dataset = findDatasetOrDie(idSupplied);

boolean canUpdateDataset = false;
try {
canUpdateDataset = permissionSvc.requestOn(createDataverseRequest(findUserOrDie()), dataset)
.canIssue(UpdateDatasetVersionCommand.class);
} catch (WrappedResponse ex) {
logger.info(
"Exception thrown while trying to figure out permissions while getting upload URLs for dataset id "
+ dataset.getId() + ": " + ex.getLocalizedMessage());
throw ex;
}
if (!canUpdateDataset) {
return error(Response.Status.FORBIDDEN, "You are not permitted to upload files to this dataset.");
}
S3AccessIO<DataFile> s3io = FileUtil.getS3AccessForDirectUpload(dataset);
if (s3io == null) {
return error(Response.Status.NOT_FOUND,
"Direct upload not supported for files in this dataset: " + dataset.getId());
}
JsonObjectBuilder response = null;
String storageIdentifier = null;
try {
storageIdentifier = FileUtil.getStorageIdentifierFromLocation(s3io.getStorageLocation());
response = s3io.generateTemporaryS3UploadUrls(dataset.getGlobalId().asString(), storageIdentifier, fileSize);

} catch (IOException io) {
logger.warning(io.getMessage());
throw new WrappedResponse(io,
error(Response.Status.INTERNAL_SERVER_ERROR, "Could not create process direct upload request"));
}

response.add("storageIdentifier", storageIdentifier);
return ok(response);
} catch (WrappedResponse wr) {
return wr.getResponse();
}
}
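
A minimal sketch (not part of this diff) of calling the new multipart-capable endpoint above; SERVER_URL, API_TOKEN, the dataset id, and the size value are placeholders. Per the code, the response is a JSON object of pre-signed URL information plus the generated storageIdentifier:

# Request pre-signed upload URL(s) for a roughly 6 GB file; size is the file size in bytes.
curl -H "X-Dataverse-key:$API_TOKEN" \
  "$SERVER_URL/api/datasets/42/uploadurls?size=6442450944"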

@DELETE
@Path("mpupload")
public Response abortMPUpload(@QueryParam("globalid") String idSupplied, @QueryParam("storageidentifier") String storageidentifier, @QueryParam("uploadid") String uploadId) {
try {
Dataset dataset = datasetSvc.findByGlobalId(idSupplied);
//Allow the API to be used within a session (e.g. for direct upload in the UI)
User user =session.getUser();
if (!user.isAuthenticated()) {
try {
user = findAuthenticatedUserOrDie();
} catch (WrappedResponse ex) {
logger.info(
"Exception thrown while trying to figure out permissions while getting aborting upload for dataset id "
+ dataset.getId() + ": " + ex.getLocalizedMessage());
throw ex;
}
}
boolean allowed = false;
if (dataset != null) {
allowed = permissionSvc.requestOn(createDataverseRequest(user), dataset)
.canIssue(UpdateDatasetVersionCommand.class);
} else {
/*
* The only legitimate case where a global id won't correspond to a dataset is
* for uploads during creation. Given that this call will still fail unless all
* three parameters correspond to an active multipart upload, it should be safe
* to allow the attempt for an authenticated user. If there are concerns about
* permissions, one could check with the current design that the user is allowed
* to create datasets in some dataverse that is configured to use the storage
* provider specified in the storageidentifier, but testing for the ability to
* create a dataset in a specific dataverse would requiring changing the design
* somehow (e.g. adding the ownerId to this call).
*/
allowed = true;
}
if (!allowed) {
return error(Response.Status.FORBIDDEN,
"You are not permitted to abort file uploads with the supplied parameters.");
}
try {
S3AccessIO.abortMultipartUpload(idSupplied, storageidentifier, uploadId);
} catch (IOException io) {
logger.warning("Multipart upload abort failed for uploadId: " + uploadId + " storageidentifier="
+ storageidentifier + " dataset Id: " + dataset.getId());
logger.warning(io.getMessage());
throw new WrappedResponse(io,
error(Response.Status.INTERNAL_SERVER_ERROR, "Could not abort multipart upload"));
}
return Response.noContent().build();
} catch (WrappedResponse wr) {
return wr.getResponse();
}
}
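
A minimal sketch (not part of this diff) of aborting an in-progress multipart upload via the endpoint above; all three query parameters must identify an active upload, and every value shown is a placeholder:

# Abort a multipart upload and release the parts already stored on S3.
curl -X DELETE -H "X-Dataverse-key:$API_TOKEN" \
  "$SERVER_URL/api/datasets/mpupload?globalid=doi:10.5072/FK2/EXAMPLE&storageidentifier=s3://bucket:17a5bc1f3a8-example&uploadid=EXAMPLEUPLOADID"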

@PUT
@Path("mpupload")
public Response completeMPUpload(String partETagBody, @QueryParam("globalid") String idSupplied, @QueryParam("storageidentifier") String storageidentifier, @QueryParam("uploadid") String uploadId) {
try {
Dataset dataset = datasetSvc.findByGlobalId(idSupplied);
//Allow the API to be used within a session (e.g. for direct upload in the UI)
User user =session.getUser();
if (!user.isAuthenticated()) {
try {
user=findAuthenticatedUserOrDie();
} catch (WrappedResponse ex) {
logger.info(
"Exception thrown while trying to figure out permissions to complete mpupload for dataset id "
+ dataset.getId() + ": " + ex.getLocalizedMessage());
throw ex;
}
}
boolean allowed = false;
if (dataset != null) {
allowed = permissionSvc.requestOn(createDataverseRequest(user), dataset)
.canIssue(UpdateDatasetVersionCommand.class);
} else {
/*
* The only legitimate case where a global id won't correspond to a dataset is
* for uploads during creation. Given that this call will still fail unless all
* three parameters correspond to an active multipart upload, it should be safe
* to allow the attempt for an authenticated user. If there are concerns about
* permissions, one could check with the current design that the user is allowed
* to create datasets in some dataverse that is configured to use the storage
* provider specified in the storageidentifier, but testing for the ability to
* create a dataset in a specific dataverse would requiring changing the design
* somehow (e.g. adding the ownerId to this call).
*/
allowed = true;
}
if (!allowed) {
return error(Response.Status.FORBIDDEN,
"You are not permitted to complete file uploads with the supplied parameters.");
}
List<PartETag> eTagList = new ArrayList<PartETag>();
logger.info("Etags: " + partETagBody);
try {
JsonReader jsonReader = Json.createReader(new StringReader(partETagBody));
JsonObject object = jsonReader.readObject();
jsonReader.close();
for(String partNo : object.keySet()) {
eTagList.add(new PartETag(Integer.parseInt(partNo), object.getString(partNo)));
}
for(PartETag et: eTagList) {
logger.info("Part: " + et.getPartNumber() + " : " + et.getETag());
}
} catch (JsonException je) {
logger.info("Unable to parse eTags from: " + partETagBody);
throw new WrappedResponse(je, error( Response.Status.INTERNAL_SERVER_ERROR, "Could not complete multipart upload"));
}
try {
S3AccessIO.completeMultipartUpload(idSupplied, storageidentifier, uploadId, eTagList);
} catch (IOException io) {
logger.warning("Multipart upload completion failed for uploadId: " + uploadId +" storageidentifier=" + storageidentifier + " globalId: " + idSupplied);
logger.warning(io.getMessage());
try {
S3AccessIO.abortMultipartUpload(idSupplied, storageidentifier, uploadId);
} catch (IOException e) {
logger.severe("Also unable to abort the upload (and release the space on S3 for uploadId: " + uploadId +" storageidentifier=" + storageidentifier + " globalId: " + idSupplied);
logger.severe(io.getMessage());
}

throw new WrappedResponse(io, error( Response.Status.INTERNAL_SERVER_ERROR, "Could not complete multipart upload"));
}
return ok("Multipart Upload completed");
} catch (WrappedResponse wr) {
return wr.getResponse();
}
}
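
A minimal sketch (not part of this diff) of completing the upload via the endpoint above; per the parsing code, the request body is a JSON object mapping each part number to the ETag returned by S3 for that part, and all values shown are placeholders:

# Complete the multipart upload by sending the collected part ETags.
curl -X PUT -H "X-Dataverse-key:$API_TOKEN" \
  -d '{"1":"d41d8cd98f00b204e9800998ecf8427e","2":"6f5902ac237024bdd0c176cb93063dc4"}' \
  "$SERVER_URL/api/datasets/mpupload?globalid=doi:10.5072/FK2/EXAMPLE&storageidentifier=s3://bucket:17a5bc1f3a8-example&uploadid=EXAMPLEUPLOADID"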

/**
* Add a File to an existing Dataset
*