
Simple support for remote web stores #7325

Merged


@qqmyers (Member) commented Oct 13, 2020

What this PR does / why we need it: Adds a general mechanism that enables a class of remote stores, in which file content is served from a remote web server.

Which issue(s) this PR closes:

Closes #7324

Special notes for your reviewer:

Suggestions on how to test this: The API should allow testing with any web server. Create a file at a remote URL and verify that Dataverse works with it (the file is downloadable and normal aux files can be created). To test security, the signed URL can be retrieved from the remote server and validated by hand: remove the token value, run a SHA512 hash on the remainder with the secret key appended (e.g. using an online tool), and compare the result with the token.

EDIT: Also see the comment below, with the asadmin commands etc., for setting up a sample HTTP store and creating a working Datafile entry with a storage identifier pointing to a remote URL.
There is more information in the issue, and in the design document linked there (https://docs.google.com/document/d/1rDhL2QBY2NhVqan3Mwp11BJ0O9naiFEhvnlbKVDQkBc/edit?usp=sharing), on how this "HTTP overlay" works, how it stores extra files, etc. (L.A.)

Also, since this PR includes a refactoring of classes related to external tools (due to some code sharing with the OpenDP work, which leverages the same URL signing introduced here), some basic regression testing of external tool configuration and launch should be done. Essentially, the code that creates the parameters a previewer (or other tool) needs on its command line has been moved to a new base class. Configuring a previewer and seeing that the preview works would confirm that this move hasn't accidentally broken anything. (This change should not alter how external tools work; it is purely internal refactoring.)

Does this PR introduce a user interface change? If mockups are available, please link/include them here: API only at this point.

Is there a release notes update needed for this change?:

Additional documentation:

@coveralls commented Oct 13, 2020

Coverage increased (+0.2%) to 19.791% when pulling 70a8b3b on GlobalDataverseCommunityConsortium:IQSS/7324_TRSA-HTTP-store into 6bd3ec6 on IQSS:develop.

@landreev landreev self-requested a review October 13, 2020 22:39
@landreev landreev self-assigned this Oct 13, 2020
@qqmyers (Member Author) commented Oct 14, 2020

Verified that this now works with a file backing store. For example, using a default Ansible install of this branch:

cd /usr/local/payara5/glassfish
bin/asadmin create-jvm-options -Ddataverse.files.trsa.type=http
bin/asadmin create-jvm-options -Ddataverse.files.trsa.label=trsa
bin/asadmin create-jvm-options -Ddataverse.files.trsa.secretkey=12345
bin/asadmin create-jvm-options -Ddataverse.files.trsa.baseStore=file
bin/asadmin create-jvm-options -Ddataverse.files.trsa.url-expiration-minutes=120
bin/asadmin create-jvm-options '-Ddataverse.files.trsa.baseUrl=https\://qdr.syr.edu'
service payara restart
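All of the options above share the pattern `dataverse.files.<storeId>.<property>`, with `trsa` as the store id. As an illustration only, the hypothetical Python helper below (not Dataverse code) groups such options into one settings dict per store. It also undoes the `\:` escaping, which is needed on the command line because asadmin treats an unescaped colon as a separator between JVM options.

```python
import re
from collections import defaultdict

# Matches -Ddataverse.files.<storeId>.<property>=<value>
OPTION_RE = re.compile(r"^-Ddataverse\.files\.([^.=]+)\.([^=]+)=(.*)$")


def parse_store_options(options):
    """Group store JVM options into {storeId: {property: value}} (hypothetical helper)."""
    stores = defaultdict(dict)
    for opt in options:
        # Drop surrounding whitespace and the shell quotes used around the baseUrl option.
        m = OPTION_RE.match(opt.strip().strip("'"))
        if m is None:
            continue  # not a dataverse.files.* option
        store_id, prop, value = m.groups()
        # asadmin needs ':' escaped as '\:'; the JVM sees the unescaped value.
        stores[store_id][prop] = value.replace("\\:", ":")
    return dict(stores)


if __name__ == "__main__":
    opts = [
        "-Ddataverse.files.trsa.type=http",
        "-Ddataverse.files.trsa.baseStore=file",
        r"'-Ddataverse.files.trsa.baseUrl=https\://qdr.syr.edu'",
    ]
    print(parse_store_options(opts))
```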

Create a dataset, get your API key, and submit a 'file', e.g.:
curl -H X-Dataverse-key:affc67b6-c57d-430f-b1a7-18b518460817 -X POST -F 'jsonData={"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"trsa://themes/custom/qdr/images/CoreTrustSeal-logo-transparent.png", "checksumType":"MD5", "md5Hash":"509EF88AFA907EAF2C17C1C8D8FDE77E", "label" : "testlogo.png", "fileName":"testlogo.png","mimeType":"image/png"}' "http://localhost:8080/api/datasets/:persistentId/add?persistentId=doi:10.5072/FK2/LLR3PR"

The main difference from uploading a file is that, instead of the file itself, you send fileName, mimeType, storageIdentifier, and md5Hash values in the jsonData parameter (which is also what S3 direct upload does).
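A minimal Python sketch of building that jsonData value, assuming the field names used in the curl example above; `build_remote_file_json` and `md5_hex` are hypothetical names, not part of any Dataverse client. Because only a reference is sent, the client has to supply the checksum itself, for example by hashing a local copy of the file's bytes.

```python
import hashlib
import json
from typing import List, Optional


def md5_hex(content: bytes) -> str:
    """Hex MD5 of the file's bytes, for the md5Hash field."""
    return hashlib.md5(content).hexdigest()


def build_remote_file_json(storage_identifier: str, file_name: str, mime_type: str,
                           md5_hash: str, label: Optional[str] = None,
                           description: Optional[str] = None,
                           directory_label: Optional[str] = None,
                           categories: Optional[List[str]] = None,
                           restrict: bool = False) -> str:
    """Build the jsonData value for registering a file by reference (hypothetical helper)."""
    data = {
        # Appears to be "<storeId>://<path on the remote server>", e.g. "trsa://themes/..."
        "storageIdentifier": storage_identifier,
        "fileName": file_name,
        "mimeType": mime_type,
        "checksumType": "MD5",
        "md5Hash": md5_hash,
        "label": label or file_name,
        "restrict": "true" if restrict else "false",  # the curl example sends a string
    }
    # Optional descriptive fields are only included when given.
    if description is not None:
        data["description"] = description
    if directory_label is not None:
        data["directoryLabel"] = directory_label
    if categories:
        data["categories"] = categories
    return json.dumps(data)
```

The returned string is what goes in the `-F 'jsonData=...'` part of the curl command. Note that `hexdigest()` returns lowercase hex while the curl example uses uppercase; whether Dataverse treats the two the same is not stated here.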

In the database, the storageIdentifier is expanded to:
trsa://1752915917f-637c086e157e//themes/custom/qdr/images/CoreTrustSeal-logo-transparent.png
which includes the storageIdentifier used by the underlying file store.
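Judging from this one example, the stored value appears to have the form `<storeId>://<baseStoreIdentifier>//<remotePath>`, and the remote file's URL appears to be the store's baseUrl joined with that path. The Python sketch below parses that form under this inferred layout; `parse_storage_identifier` and `remote_url` are hypothetical names, not Dataverse methods.

```python
from typing import NamedTuple


class RemoteStorageId(NamedTuple):
    store_id: str        # e.g. "trsa"
    base_store_id: str   # identifier assigned by the underlying (base) store
    path: str            # path on the remote web server, relative to baseUrl


def parse_storage_identifier(sid: str) -> RemoteStorageId:
    """Split '<storeId>://<baseStoreIdentifier>//<path>' (layout inferred from one example)."""
    store_id, sep, rest = sid.partition("://")
    if not sep:
        raise ValueError(f"no '://' in storage identifier: {sid!r}")
    base_store_id, sep, path = rest.partition("//")
    if not sep:
        raise ValueError(f"no '//' separating base identifier and path: {sid!r}")
    return RemoteStorageId(store_id, base_store_id, path)


def remote_url(base_url: str, path: str) -> str:
    """Join the store's baseUrl and the remote path with exactly one '/'."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```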

Normal download works, and setting:
bin/asadmin create-jvm-options -Ddataverse.files.trsa.download-redirect=true
and then restarting results in a redirect to:
https://qdr.syr.edu/themes/custom/qdr/images/CoreTrustSeal-logo-transparent.png?until=2020-10-15T00:58:32.487&method=GET&token=a4d2115eb0b4a6e762455e45e3ed62ffd0e88bcccf1a9d5142f39d34b98b48d194ff561f5b6cec39592032adbc6ece736202008966202fd3f63e1c660951be09
Removing the token and rehashing the URL up to and including "&token=", with the secret key appended, yields the same SHA512 token. Used for a GET before the time shown, this would be a valid URL. (QDR ignores the extra parameters because the URL is public and no validator has been implemented there; it was just a convenient URL to use.)
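The signing scheme described above can be sketched in Python as follows, assuming only what the comment states: the token is the hex SHA512 digest of the URL up to and including `&token=` followed by the secret key, and `until` is a local timestamp in the format shown in the redirect. This is a plain keyed hash, not HMAC. `sign_url` and `validate_url` are hypothetical names, not Dataverse's API; a validator on the remote server would also need to check `method` and `until`, as done here.

```python
import hashlib
import hmac
from datetime import datetime, timedelta
from typing import Optional
from urllib.parse import parse_qs, urlsplit

TOKEN_MARKER = "&token="


def _digest(unsigned_url: str, secret: str) -> str:
    # Plain SHA512 over "<URL up to and including &token=><secret>" (not HMAC).
    return hashlib.sha512((unsigned_url + secret).encode("utf-8")).hexdigest()


def sign_url(url: str, method: str, secret: str, expiration_minutes: int,
             now: Optional[datetime] = None) -> str:
    now = now if now is not None else datetime.now()
    # e.g. 2020-10-15T00:58:32.487, matching the redirect shown above
    until = (now + timedelta(minutes=expiration_minutes)).isoformat(timespec="milliseconds")
    sep = "&" if "?" in url else "?"
    unsigned = f"{url}{sep}until={until}&method={method}{TOKEN_MARKER}"
    return unsigned + _digest(unsigned, secret)


def validate_url(signed_url: str, method: str, secret: str,
                 now: Optional[datetime] = None) -> bool:
    idx = signed_url.rfind(TOKEN_MARKER)
    if idx < 0:
        return False
    unsigned = signed_url[: idx + len(TOKEN_MARKER)]
    token = signed_url[idx + len(TOKEN_MARKER):]
    # Constant-time comparison of the presented token with the recomputed one.
    if not hmac.compare_digest(token.encode("utf-8"), _digest(unsigned, secret).encode("utf-8")):
        return False
    params = parse_qs(urlsplit(unsigned).query)
    if params.get("method", [None])[0] != method:
        return False
    until_values = params.get("until")
    if not until_values:
        return False
    until = datetime.fromisoformat(until_values[0])
    return (now if now is not None else datetime.now()) <= until
```

To repeat the hand check above, pass the redirect URL to `validate_url` with the configured secret and a `now` earlier than its `until` value.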

Testing aux files and S3 next.

@landreev landreev removed their assignment Oct 15, 2020
@qqmyers (Member Author) commented Oct 15, 2020

Aux files work with a file store:
[screenshot]

I think this means the current PR works as intended for file backing stores, if anyone would like to try it out.
Still looking for S3 issues; I expect there are some.

@qqmyers (Member Author) commented Oct 15, 2020

S3 is also working now. That's everything that was originally defined.

@pdurbin (Member) commented Aug 10, 2022

(quoting @akio-sone) "it would be better to add a note"

@akio-sone thanks, I talked to @qqmyers and he plans to add a note.

@kcondon kcondon self-assigned this Aug 15, 2022
Labels:
- HDC (Harvard Data Commons)
- HDC: 1 (Harvard Data Commons Obj. 1)
- NIH OTA: 1.1.1 (1 | 1.1.1 | Minimum Viable Product (MVP) for registering metadata in the repository and connectin...)
- pm.GREI-d-1.1.1 (NIH, yr1, aim1, task1: MVP for registering metadata in the repository)
Linked issue: Simple support for remote web stores (#7324)
10 participants