
Configuring file stores (storage drivers) for individual datasets #7272

Merged
merged 10 commits into from
Sep 16, 2020

Conversation

landreev
Contributor

What this PR does / why we need it:
Even though the title of the issue specifically mentions "direct S3 upload", this PR adds something more general: the ability to designate a file store for a specific dataset. The store in question does not have to be S3, but enabling direct S3 upload is the main use case behind this PR. For example, to enable direct upload for a specific dataset in production, without opening it up for everybody, we would:

  1. Create another S3-type store pointing to the same storage bucket (for example, "s3direct").
  2. Enable direct upload on that store.
  3. Configure "s3direct" as the designated storage driver for the dataset in question, using the API added in this PR:
    curl -H "X-Dataverse-key: XXX" -X PUT -d s3direct http://localhost:8080/api/datasets/NNNN/storageDriver

The words "API based" in the issue title refer to the fact that a file store can be configured for a dataset via the API only; i.e., there is no GUI for it (as there is for dataverses). But once a direct-upload-enabled store has been assigned to a dataset, uploads work both via the dataset page and via the API (using the DVUploader utility), just as when direct upload is enabled dataverse-wide.
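The three steps above can be sketched end to end as follows. This is a sketch only: the `dataverse.files.<id>.*` JVM-option names follow the pattern documented in the Dataverse installation guide, and the store id "s3direct", the bucket name, the API token, and the dataset id NNNN are placeholders, not values from this PR.

```shell
# Sketch, assuming the documented dataverse.files.<id>.* option pattern;
# "my-bucket", "XXX", and "NNNN" are placeholders.

# 1. Define a second S3-type store pointing at the same bucket:
./asadmin create-jvm-options '-Ddataverse.files.s3direct.type=s3'
./asadmin create-jvm-options '-Ddataverse.files.s3direct.label=s3direct'
./asadmin create-jvm-options '-Ddataverse.files.s3direct.bucket-name=my-bucket'

# 2. Enable direct upload on that store only:
./asadmin create-jvm-options '-Ddataverse.files.s3direct.upload-redirect=true'

# 3. Designate it as the storage driver for one dataset (the API added here):
curl -H "X-Dataverse-key: XXX" -X PUT -d s3direct \
  http://localhost:8080/api/datasets/NNNN/storageDriver

# Verify the assignment:
curl -H "X-Dataverse-key: XXX" \
  http://localhost:8080/api/datasets/NNNN/storageDriver
```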

Which issue(s) this PR closes:

Closes #6872

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@coveralls

coveralls commented Sep 16, 2020

Coverage Status

Coverage decreased (-0.007%) to 19.476% when pulling 49ebafc on 6872-direct-upload-for-datasets into dbf0bca on develop.

Member

@qqmyers qqmyers left a comment


Looks good to me. The only (minor) issue I see is that there's no UI indicator that a Dataset isn't inheriting the storageDriver from its Dataverse, so it may not be obvious when a Dataset has been switched. That can be handled in a future issue, though.

IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from Code Review 🦁 to QA 🔎✅ Sep 16, 2020
@kcondon kcondon self-assigned this Sep 16, 2020
@landreev
Contributor Author

@qqmyers

Looks good to me. The only (minor) issue I see is that there's no UI indicator that a Dataset isn't inheriting the storageDriver from its Dataverse, so it may not be obvious when a Dataset has been switched. That can be handled in a future issue, though.

We don't otherwise have a UI indicator showing which store the files will go to when it's inherited from the dataverse... do we?
Thanks for bringing this up; it made me check and realize that there were a whole bunch of places where the upload size limit inherited from the dataverse was being used instead of the dataset's own, etc...

(Commit: "… checked... all needed to be changed not to assume that it's inherited from the parent dataverse." #6872)
@qqmyers
Member

qqmyers commented Sep 16, 2020

Good catch! I was just thinking that an admin can check (or change) the Dataverse setting under General Information, but that value may not be what the dataset has.

@kcondon
Contributor

kcondon commented Sep 16, 2020

(I meant to reply to this comment, not to edit it earlier; on my phone, sorry)

  1. Getting the current driver fails silently with a bad id: {} is returned, with a 500 error in the server log. It does report the bad id if the trailing / is dropped.
    This works now.
  2. Getting the available drivers seems to double the driver names:
    curl -H "X-Dataverse-key: xxxx-yyyy-zzz" http://localhost:8080/api/admin/dataverse/storageDrivers
    {"status":"OK","data":{"s31":"s31","s32":"s32","file2":"file2","file1":"file1"}}
    OK, noted: this is because the label and the value are the same, so label:value is what is displayed here.

@landreev
Contributor Author

2. get available drivers seems to double the driver names:
   curl -H "X-Dataverse-key: xxxx-yyyy-zzz" http://localhost:8080/api/admin/dataverse/storageDrivers
   {"status":"OK","data":{"s31":"s31","s32":"s32","file2":"file2","file1":"file1"}}'

Please note that this is not part of my PR; it's an existing API that was merged some months ago.
It doesn't "double" the driver names. These are "id" and "label" pairs describing each driver. We've been using the same value for both in all the practical use cases so far (though one might want something more descriptive for the label); they can be different.
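The apparent doubling can be seen by pulling the pairs out of the sample response quoted above. This self-contained shell sketch (no running server needed) extracts each id:label entry:

```shell
# The sample response from the storageDrivers API, as quoted above:
resp='{"status":"OK","data":{"s31":"s31","s32":"s32","file2":"file2","file1":"file1"}}'

# Extract the "key":"value" pairs; the first match is the status field,
# the rest are the id:label pairs for the four configured drivers.
# The halves only look "doubled" because each id happens to equal its label.
pairs=$(echo "$resp" | grep -o '"[^"]*":"[^"]*"' | tail -n +2)
printf '%s\n' "$pairs"
```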

@qqmyers
Member

qqmyers commented Sep 16, 2020

FWIW: TDL has s3 with label "TDL" and s3tacc with label "TACC". Harvard might want labels like 'Normal' and 'Large Data', etc.
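Distinct labels like these can be set with the per-store label option; this is a sketch using the `dataverse.files.<id>.label` pattern from the Dataverse installation guide, with the ids and labels mirroring the TDL example above:

```shell
# Sketch: the store ids stay "s3" and "s3tacc", but users see "TDL" and "TACC".
./asadmin create-jvm-options '-Ddataverse.files.s3.label=TDL'
./asadmin create-jvm-options '-Ddataverse.files.s3tacc.label=TACC'
```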

@landreev
Contributor Author

FWIW: TDL has s3 with label "TDL" and s3tacc with label "TACC". Harvard might want labels like 'Normal' and 'Large Data', etc.

I was thinking along the same lines: there may be situations where we need to spell out the name of the institution that owns a specific storage location (or even the name of the grant that pays for it)... We may want to display that kind of info on the page later on.

@landreev
Contributor Author

1. get current driver fails silently with bad id: {} and 500 error in server log. 

True; I just checked in a fix. It should now print the same "no such dataset" error message as the PUT and DELETE versions.

@kcondon kcondon merged commit efcd24d into develop Sep 16, 2020
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from QA 🔎✅ to Done 🚀 Sep 16, 2020
@kcondon kcondon deleted the 6872-direct-upload-for-datasets branch September 16, 2020 21:00
@djbrooke djbrooke added this to the 5.1 milestone Sep 17, 2020
Successfully merging this pull request may close these issues.

Direct S3 Upload mechanism for individual datasets (API based)