TDL/7493 improve bag generator failure handling part 2 #8773

@qqmyers qqmyers commented Jun 3, 2022

What this PR does / why we need it: It looks like a merge with develop, made after other PRs had gone in, undid the changes that were supposed to be in the original PR #8609. This PR reapplies those changes and adds a bug fix for the checksum validation performed when using the local archiver: the paths the validator uses to find files were not updated when the RDA work added a top-level directory within the bag structure.

Which issue(s) this PR closes:

Closes

Special notes for your reviewer: Argh - I messed up in breaking the 3A work into PRs. Nominally the main changes here were reviewed before. The only addition is passing the bagName to the BagValidationJob class since the bag now has paths like doi-10-5072-fk2abcdef/data/filepathname and the manifest only includes the data/filepathname part.
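
To illustrate the path mismatch described above, here is a minimal sketch, not the code in this PR. The class and method names (`BagPathSketch`, `entryPath`) are hypothetical, and it assumes paths inside the bag are `/`-separated strings. It only shows why the validator needs the bag name: the manifest lists `data/filepathname`, while the file actually sits under the top-level bag directory.

```java
public class BagPathSketch {

    /**
     * Hypothetical helper: builds the path of a file inside the bag from the
     * top-level bag directory name and the path recorded in the manifest.
     */
    public static String entryPath(String bagName, String manifestPath) {
        // The manifest only records the part after the top-level directory,
        // so the bag name has to be prepended to locate the file.
        return bagName + "/" + manifestPath;
    }

    public static void main(String[] args) {
        // Example values taken from the description above.
        System.out.println(entryPath("doi-10-5072-fk2abcdef", "data/filepathname"));
    }
}
```

Looking up `data/filepathname` directly, without the prefix, would miss the file even though the bag is intact, which matches the validation failure described here.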

Suggestions on how to test this: From issue #8603: the fix is primarily about avoiding hung connections in cases where physical files are missing. Not sure how far to go in trying to test that, but minimally this should be regression tested (i.e., in the normal case, archival bags are still produced with the local archiver, etc.). Similarly, in the normal case there shouldn't be any warning/severe messages from the BagValidationJob class when using the local archiver.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: no

Is there a release notes update needed for this change?: part of #8611

Additional documentation:

@qqmyers qqmyers added HDC Harvard Data Commons HDC: 3a Harvard Data Commons Obj. 3A labels Jun 3, 2022
@qqmyers qqmyers mentioned this pull request Jun 3, 2022

coveralls commented Jul 21, 2022

Coverage Status

Coverage decreased (-0.007%) to 19.729% when pulling 57ec341 on TexasDigitalLibrary:TDL/7493-improve_BagGnerator_failure_handling into a97102f on IQSS:develop.

@pdurbin pdurbin self-assigned this Jul 25, 2022

@pdurbin pdurbin left a comment

I don't see anything weird in this pull request so I'm sending it to QA.

However, I'm not exactly sure how to test it so @qqmyers you might need to explain a bit more.

@pdurbin pdurbin removed their assignment Jul 25, 2022

qqmyers commented Jul 25, 2022

As with other issues where resources are not released, aside from regression testing in the all-good case, the only way to see the problem, and to see that the PR improves things, is to configure an error. In this case, one way to cause the file-retrieval HTTP calls (made when creating the archival bag) to fail is to remove the physical files from a dataset you archive. That is how we discovered the issue at TDL and verified that this fix helps: some old test datasets did not have physical files. Any other way of causing the file-retrieval API calls to fail would also work (e.g., using a proxy of some sort). As I noted in the test instructions, though, I'm not sure how much QA effort on the failure cases makes sense; deleting some test files might be simple enough.
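
For readers unfamiliar with this class of bug, the sketch below shows the general pattern of releasing a connection even when a file retrieval fails. It is an illustration under stated assumptions, not the change made in this PR: `FakeResponse` is a hypothetical stand-in for an HTTP response that holds a pooled connection, and the status code is simulated rather than coming from a real call.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class ReleaseOnFailureSketch {

    /** Hypothetical stand-in for an HTTP response that holds a connection. */
    public static class FakeResponse implements Closeable {
        final int status;
        final AtomicInteger openConnections;

        FakeResponse(int status, AtomicInteger openConnections) {
            this.status = status;
            this.openConnections = openConnections;
            openConnections.incrementAndGet(); // connection leased
        }

        @Override
        public void close() {
            openConnections.decrementAndGet(); // connection returned
        }
    }

    /**
     * Simulates one file retrieval. Returns true on success, false on failure.
     * try-with-resources closes the response on both paths, so a missing file
     * cannot leave a connection leased.
     */
    public static boolean fetch(int simulatedStatus, AtomicInteger openConnections) {
        try (FakeResponse response = new FakeResponse(simulatedStatus, openConnections)) {
            if (response.status != 200) {
                throw new IOException("File retrieval failed with status " + response.status);
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicInteger open = new AtomicInteger();
        System.out.println("ok=" + fetch(200, open) + " open=" + open.get());
        System.out.println("ok=" + fetch(404, open) + " open=" + open.get());
    }
}
```

The failure case is the one to exercise when testing: after a failed retrieval the count of open connections should return to zero, which is what removing physical files from a test dataset would probe in a real deployment.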

@kcondon kcondon self-assigned this Jul 29, 2022
@kcondon kcondon merged commit 751a008 into IQSS:develop Jul 29, 2022
@pdurbin pdurbin added this to the 5.12 milestone Aug 2, 2022