Feat: Check if item version is already preserved before bagging (Issue #102) #103
Before I can continue testing, please address:
- The issues in my test results sheet.
- The comments in the code
The summary log behavior for collections differs from that for articles. For example, when a collection is already in Wasabi, a warning is output, but no such warning is output for articles. The counters and text also differ somewhat. For consistency, they should be the same (minus the parts about matching, obviously).
Your comments have been implemented. You can continue to review.
- The code seems to fail on embargoed content. Please see the spreadsheet.
- I also further cleaned up and prettified the log outputs. The counts haven't changed; I just made them display more consistently (I think). Have a look to see if it makes sense.
Sizes of skipped items have been excluded from size calculation for space check.
Description
During preprocessing, this PR checks whether a bag for the item already exists in AP Trust and in the Wasabi S3 bucket. If the item version has already been preserved, it compares the hash of the item version being prepared for bagging with that item's version hash in AP Trust. If the hashes match, the article version is skipped; otherwise, its bag is updated. All activities are logged.
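The skip-or-update decision described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the names `Action`, `version_hash`, and `decide` are hypothetical, MD5 is assumed as the hash algorithm, and the actual lookups against AP Trust and Wasabi are left out (a missing bag is represented here by `None`).

```python
import hashlib
import json
from enum import Enum
from typing import Optional


class Action(Enum):
    BAG = "bag"        # nothing preserved yet: create a new bag
    SKIP = "skip"      # the same version hash is already preserved
    UPDATE = "update"  # a bag exists, but its hash differs


def version_hash(metadata: dict) -> str:
    """Hash a metadata dict deterministically (assumption: MD5 over sorted-key JSON)."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def decide(current_hash: str, preserved_hash: Optional[str]) -> Action:
    """Compare the hash being bagged with the preserved one, if any."""
    if preserved_hash is None:
        return Action.BAG
    if preserved_hash == current_hash:
        return Action.SKIP
    return Action.UPDATE
```

A caller would look up the preserved hash for the item version, pass it to `decide`, and log the resulting action.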
NOTE: This feature may sometimes put a name other than the first author's name in the eventual preservation package file due to the metadata sorting during metadata hash computation.
PROPOSED SOLUTION: Ignore authors' list during sorting while computing metadata hash. This is not included in this PR.
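One possible shape for the proposed solution, which is not part of this PR, is sketched below. It assumes the metadata is a nested dict/list structure, that the authors' list lives under a key named `authors`, and that MD5 is the hash algorithm. The function names `sort_metadata` and `metadata_hash` are hypothetical.

```python
import hashlib
import json


def sort_metadata(value, skip_keys=("authors",)):
    """Recursively sort dict keys and list items, leaving skip_keys values untouched."""
    if isinstance(value, dict):
        return {
            key: value[key] if key in skip_keys else sort_metadata(value[key], skip_keys)
            for key in sorted(value)
        }
    if isinstance(value, list):
        items = [sort_metadata(item, skip_keys) for item in value]
        # Sort by JSON text so that mixed or nested items stay comparable.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


def metadata_hash(metadata: dict) -> str:
    """Hash the normalized metadata (assumption: MD5 over sorted-key JSON)."""
    canonical = json.dumps(sort_metadata(metadata), sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

With this approach the first author stays first after normalization, while reordering other lists (for example, tags) still yields the same hash. A side effect is that a change in author order produces a different hash.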
See #93
Documentation Update
Implementation Notes
This PR contains Utils.py in the figshare directory which houses utility functions. The following functions are available in this PR:
Bag checks are carried out in Article.py and Collection.py inside the figshare directory. Logging is done in app.py.