DURACLOUD-1268: refactor the RetrievalTool to create full list of contentIds, including chunked content, when a list-file is used #143

Merged
merged 6 commits into from Jul 21, 2021

Conversation

nwoodward
Contributor

This PR changes the DuraStoreSpecifiedRetrievalSource class so that when a list-file is specified it creates a full list of contentIds to be retrieved, including any chunked content of files in the list-file.


JIRA Ticket: https://duracloud.atlassian.net/browse/DURACLOUD-1268

What does this Pull Request do?

This PR changes the class used by the RetrievalTool when a list-file is specified so that it works more like the default behavior when retrieving a space. As a result of this change, the two classes can now retrieve content with the same function. This PR resolves the case where the RetrievalTool, run with a list-file, fails to transfer content when restarted after some chunked files were only partially retrieved.
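To illustrate the idea of expanding a list-file into a full list of contentIds, here is a minimal sketch. It is not the code merged in this PR: the class `ContentIdExpander`, the method `expand`, and the suffix constants are hypothetical names, and the sketch assumes that chunked content is stored as `<contentId>.dura-manifest` plus `<contentId>.dura-chunk-NNNN` items. Ids from the list-file that match nothing in the space listing are kept rather than dropped, so that content not yet visible in the listing is still attempted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of the DuraCloud code base.
final class ContentIdExpander {

    // Assumed naming convention for chunked content in a space.
    static final String MANIFEST_SUFFIX = ".dura-manifest";
    static final String CHUNK_SUFFIX = ".dura-chunk-";

    /**
     * Expands the ids read from a list-file into every id that has to be
     * retrieved: the item itself, or its chunk manifest and chunk files.
     *
     * @param requestedIds    ids as written in the list-file
     * @param spaceContentIds ids currently listed in the space
     */
    static List<String> expand(List<String> requestedIds,
                               Iterable<String> spaceContentIds) {
        // LinkedHashSet keeps insertion order and removes duplicates.
        Set<String> result = new LinkedHashSet<>();
        for (String id : requestedIds) {
            String manifestId = id + MANIFEST_SUFFIX;
            String chunkPrefix = id + CHUNK_SUFFIX;
            boolean found = false;
            // Linear scan for clarity; a sorted index would be faster for large spaces.
            for (String candidate : spaceContentIds) {
                if (candidate.equals(id)
                        || candidate.equals(manifestId)
                        || candidate.startsWith(chunkPrefix)) {
                    result.add(candidate);
                    found = true;
                }
            }
            if (!found) {
                // Keep unmatched ids so they are still attempted, not silently dropped.
                result.add(id);
            }
        }
        return new ArrayList<>(result);
    }
}
```

With a list like this, a restarted run can see which manifests and chunks are already on disk and which are still missing, instead of treating each list-file entry as a single unchunked item.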

How should this be tested?

  • Create a space in DuraCloud and use the SyncTool to add one or more large files that will be chunked
  • Use version 7.0 of the RetrievalTool, with a list-file containing the contentIds in the space, to start retrieving the space
  • Hit Ctrl-C to stop the process. Restart the process and observe that it fails to retrieve the files
  • Repeat the above process with the RetrievalTool built from this branch, and observe that the process finishes successfully

Interested parties

@duracloud/committers

Member

@bbranan bbranan left a comment

Thanks for putting this together, Nick! This approach seems like a good one. I've made a few suggestions. My primary concern is that we don't end up dropping files from the retrieval list if they haven't yet made it into the space manifest (more details in the specific comment).

It's worth noting that with this change it will no longer be possible to request a list of chunk manifest files (or chunk files) to be downloaded independently by the Retrieval Tool. While this need doesn't come up often, I know that this approach has been used to retrieve all chunk manifests in order to perform a final verification of chunk transfers. I still think this is the right direction; I'm just calling out a change that may otherwise be unexpected.

@bbranan bbranan merged commit 80e8116 into duracloud:develop Jul 21, 2021
2 participants