-
Notifications
You must be signed in to change notification settings - Fork 494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved handling of Globus uploads (experimental async framework) #10781
Conversation
* externally. (?) | ||
*/ | ||
@NamedQueries({ | ||
@NamedQuery( name="ExternalFileUploadInProgress.deleteByTaskId", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [reviewdog] <com.puppycrawl.tools.checkstyle.checks.whitespace.FileTabCharacterCheck> reported by reviewdog 🐶
File contains tab characters (this is the first instance).
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
I'm not sure what's going on with Jenkins/integration tests. But I'll figure that out separately. I'm going to go ahead and un-draft the PR... |
This comment has been minimized.
This comment has been minimized.
src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java
Outdated
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Looks good to me. I made some comments/responded to notes in the code, but other than any open @todos and some minor cleanup, it looks ready for QA to me. Goodbye to exec-ing curl!
I'll put request changes to avoid it switching columns until @landreev is ready, but probably no need for another review of changes.
src/main/java/edu/harvard/iq/dataverse/ExternalFileUploadInProgress.java
Outdated
Show resolved
Hide resolved
@@ -4034,6 +4035,8 @@ public Response addGlobusFilesToDataset(@Context ContainerRequestContext crc, | |||
return wr.getResponse(); | |||
} | |||
|
|||
// @todo check if the dataset is already locked! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense - could there be a constraint when trying to create a lock, e.g. only one lock of a given time per dataset rather than a separate check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You meant "of a given type", right? The addDatasetLock() already has a check for an existing lock of the same type (then it will just return the existing lock instead of creating a new one).
As for implementing this as a hard constraint - are we positive we'll never want multiple locks of the same type - some workflows maybe?
I addressed this specific situation with a separate lock check. We don't check for locks consistently in our APIs, I'm assuming under the assumption that it will be done when the the relevant commands are executed. But in this specific case, there is a potential for doing a huge amount of work before that UpdateDatasetVersionCommand is called in the end.
src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java
Outdated
Show resolved
Hide resolved
src/main/java/edu/harvard/iq/dataverse/globus/TaskMonitoringServiceBean.java
Outdated
Show resolved
Hide resolved
src/main/java/edu/harvard/iq/dataverse/ingest/IngestServiceBean.java
Outdated
Show resolved
Hide resolved
} catch (UnknownHostException ex) { | ||
Logger.getLogger(DataverseTimerServiceBean.class.getName()).log(Level.SEVERE, null, ex); | ||
} | ||
|
||
if (timer.getInfo() instanceof MotherTimerInfo) { | ||
logger.info("Behold! I am the Master Timer, king of all timers! I'm here to create all the lesser timers!"); | ||
logger.fine("Behold! I am the Master Timer, king of all timers! I'm here to create all the lesser timers!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to miss these :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can still have it - at a low price of one FINE logging setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! It's like an old friend at this point. 😄
I resolved the merge conflicts yesterday. Also added some info under "How to Test". |
Resolved conflicts: src/main/java/edu/harvard/iq/dataverse/api/Datasets.java (#10623)
Resolved another merge conflict. |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like my comments have been addressed. I still see the review dog complaint about tab chars in ExternalFileUploadInProgress - can that be fixed so we avoid the style fail?
This comment has been minimized.
This comment has been minimized.
@qqmyers good catch. I put in a commit to quiet down that yappy dog: 7b6f81e I also made a small doc tweak: 2baf62e Then I tested on internal. With the new flag off I was able to upload file. ✅ With the flag on, Globus is telling me "transfer complete" but the file is not in the dataset and the dataset has the "Globus Transfer in Progress" lock. ❌ @landreev is going to take a look. |
Conflicts: doc/sphinx-guides/source/installation/config.rst
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
📦 Pushed preview images as
🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name. |
Turned out to be a doc problem fixed now. Works great! Especially for an experimental feature. 😜 Also, API tests are passing as of the last code change: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10781/7/testReport/ Merging. |
What this PR does / why we need it:
It solves the 2 problems described in the linked issue:
/addFiles
via curl has been replaced with calling the corresponding method in AddReplaceFileHelper. In addition to the problems with the curl implementation already described in the issue, the old code was not making any attempt to parse the output of the api; so if/addFiles
dropped one or more files from the submitted entries, for whatever reason, the Globus service method was still registering that as a complete success. The new code checks the results in a more careful way.The handling of diagnostics and notifications have generally been improved in the PR. Among other things, there are now explicit failure notifications that are sent to the users when uploads fail completely. The existing implementation only logs such events without notifying the user.
Which issue(s) this PR closes:
Special notes for your reviewer:
Note that I don't use the EJB timers for the scheduled execution of the task polling. I used the
jakarta.enterprise.concurrent.ManagedScheduledExecutorService
instead which, from what I understand, is a preferred way now.Suggestions on how to test this:
In order to test this PR, and other Globus-related things going forward, I added a Globus setup to dataverse-internal yesterday (Globus-enabled storage configuration plus the upload/download web app). This volume:
Is connected to remote storage at NESE that was set up for us specifically for testing uploads and downloads. This setup is identical to what was previously configured on demo.dataverse.org. I haven't tested it much myself though.
The instructions on how to upload data via Globus in Datataverse can be found here: https://github.com/IQSS/dataverse.harvard.edu/blob/master/doc/globus/upload.md.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation:
Preview at https://dataverse-guide--10781.org.readthedocs.build/en/10781/installation/config.html#dataverse-globus-taskmonitoringserver