Problem: newer pipelines may not be able to handle AIPs made with older pipelines #24
Comments
Progress Update
Three development branches in three different repos have been created to deal with this issue: The most significant of these is the SS one, which introduces the The general methodology employed here was to use Experiment 1:
|
@hakamine, I've updated JD's branches with a number of fixes. The main changes to support old METS:
|
This is the latest re-ingested METS (using AM68's AIP): METS.xml. There is a known issue with the validator:
I haven't found a way to combine in the same document elements using |
Hi @sevein I have a question re the error output shown in the previous comment. Do the errors show in the output of the |
Hi @hakamine! It's good news that it's working for you. The error mentioned above is only seen when validating the METS document generated by reingest of the imported AIP. Unfortunately, the validation issue has not been solved yet. We're still trying to find the best way forward. The root cause seems to be that you can only associate an XML prefix with a namespace once. E.g. if |
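To illustrate the prefix problem described in this comment, here is a minimal sketch using only the Python standard library. It assumes the clash is between the PREMIS 2 and PREMIS 3 namespaces sharing the `premis` prefix; that is my reading of the thread, not something the comment states. The sample document and the `premis_versions` helper are hypothetical. The point is that a parser resolves each prefix to its namespace URI per element, so code that reads such a document can dispatch on the URI rather than on the prefix:

```python
import xml.etree.ElementTree as ET

# Namespace URIs published for PREMIS 2 and PREMIS 3.
PREMIS2 = "info:lc/xmlns/premis-v2"
PREMIS3 = "http://www.loc.gov/premis/v3"

# Hypothetical document: the same prefix is bound to a different
# namespace on each element (assumption made for illustration).
DOC = """<root>
  <premis:object xmlns:premis="info:lc/xmlns/premis-v2"/>
  <premis:object xmlns:premis="http://www.loc.gov/premis/v3"/>
</root>"""


def premis_versions(xml_text):
    """Return the PREMIS version of each premis:object, in document order."""
    root = ET.fromstring(xml_text)
    versions = []
    for el in root.iter():
        # ElementTree exposes tags as "{namespace-uri}localname",
        # so the prefix used in the source text no longer matters.
        if el.tag == "{%s}object" % PREMIS2:
            versions.append(2)
        elif el.tag == "{%s}object" % PREMIS3:
            versions.append(3)
    return versions
```

This only covers reading. Serializing both namespaces with a single prefix map is the part that remains awkward, since the map can bind `premis` to one URI only.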
When trying to index the AM68 AIP, I get the following error:
It seems that the script is looking for an attribute that is not present in the METS file |
@hakamine thanks for the report. I've just added a new commit to handle that scenario where |
Thank you @sevein for the fix. I tested it and the indexing of the AIP "AM68" (using the custom management command When I try to download any of the files, I get an internal server error; the dashboard log shows a message like:
(not sure if this could be caused by the "vintage" METS or by a bug in the reindex command script that affects all AIPs. If you think it could be the latter let me know and I'll open a separate github issue) |
Thanks, I'll take a look! |
Progress update!
Known issues:
|
@sevein would it be ok if I rebase AM dev/issue-24-handle-old-aips on top of the current qa/1.x? I would like to check whether the ES upgrade changes fix the problem with the indexing of AIP files. |
Please do! Thanks. |
It looks like something in AM dev/issue-24-handle-old-aips breaks the indexing of AIP files. I ran the following test on both the AM qa/1.x (c6396b9) and dev/issue-24-handle-old-aips (d4ac060) branches:
While the management command finishes without errors for both branches, the resulting
aipfiles index is attached (dump produced with, e.g., |
@sevein I added some commits to AM branch dev/issue-24-aip-index-premis2-fallback (which is based off dev/issue-24-handle-old-aips) to try to fix the indexing issues mentioned in the previous comment. So far it seems to be working (I'll continue testing with other AIPs). Please let me know if the code looks good and I'll merge these commits into branch dev/issue-24-handle-old-aips.
It looks like we are getting closer to the goal! Should we create PRs for the AM/SS/metsrw dev/issue-24-handle-old-aips branches?
Update 1: while the index worked for the "images" sampledata, I am having some problems with the AM68 AIP files.
Update 2: the problem with the AM68 AIP is detailed in #504.
Update 3: added a commit to dev/issue-24-aip-index-premis2-fallback that fixes #504.
Update 4: tested the AIP import and reindex management commands with a few more AIPs, as shown in the screenshot below. Both the import and reindex commands completed without error messages for all of the AIPs tested. However, there is an issue with the "Date stored" value in some AIPs: instead of the date they were originally ingested, it shows the date of the reindex (red box in the screenshot). |
@hakamine nice job on dev/issue-24-aip-index-premis2-fallback, thanks for your comment and all the updates. It's all working nicely locally, so I'd suggest updating dev/issue-24-handle-old-aips with all your changes. |
@hakamine, FYI Evelyn noticed that after reingest we have
It may be preferable to use LASTMODDATE instead (see https://github.com/artefactual-labs/mets-reader-writer/blob/1f4b7fdbb37512f2c7c42a750b9f08bc73738d9f/metsrw/mets.py#L183-L192), but we won't be addressing that now since it's not a deal breaker. |
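For anyone following along, a minimal sketch of reading the header dates from a METS document with the Python standard library. CREATEDATE and LASTMODDATE are attributes of `metsHdr` in the METS schema; the `header_dates` helper and the sample document are hypothetical. The sketch does not decide which date a "Date stored" value should use; it only shows where both values live:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample header; the dates are made up for illustration.
SAMPLE = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    '<mets:metsHdr CREATEDATE="2015-01-01T00:00:00" '
    'LASTMODDATE="2019-06-01T00:00:00"/>'
    "</mets:mets>"
)


def header_dates(mets_xml):
    """Return the CREATEDATE and LASTMODDATE of the first metsHdr found."""
    root = ET.fromstring(mets_xml)
    for el in root.iter():
        # Compare local names so headers without a namespace also match.
        if el.tag.rsplit("}", 1)[-1] == "metsHdr":
            return {
                "created": el.get("CREATEDATE"),
                "modified": el.get("LASTMODDATE"),
            }
    return None
```

A missing attribute comes back as `None`, so the caller can decide on a fallback.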
Thank you @sevein, dev/issue-24-handle-old-aips has been updated (and dev/issue-24-aip-index-premis2-fallback has been deleted). |
Update:
|
@hakamine can you take a look at this and change the label to Done if you feel confident that this is resolved? |
Tested on AM qa/1.x 50affa (1.10.0-rc.1) SS qa/0.x fbdf31 (0.15.0-rc.1). Import and reindex of CVA old AIPs is working. |
Please describe the problem you'd like to be solved.
Sometimes an institution has been creating AIPs for a long time using various versions of Archivematica and the Storage Service and the institution wants to make sure that those AIPs are usable by their modern (currently installed) Archivematica version, e.g., in order to perform additional preservation actions like adding metadata.
Without modifying the content of such older AIPs, they may need to be compressed and given pointer files that document that compression.
Old AIPs may also have METS files that lack namespaces in their XML elements. It is unclear whether a modern Archivematica pipeline knows how to read such AIPs. Archivematica's METS interpretation functionality (in particular mets-reader-writer) may need to be modified in order to handle such AIPs.
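A minimal sketch of one way to read such files, assuming "lacking namespaces" means the elements appear as bare `fileSec`, `file`, and so on rather than in the METS namespace. This is not how mets-reader-writer works; it is a standard-library illustration, and the helper names and sample documents are hypothetical. The idea is to match on the local element name, so that both variants are read by the same code:

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: the same structure with and without the METS namespace.
NAMESPACED = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    '<mets:fileSec><mets:file ID="file-1"/></mets:fileSec>'
    "</mets:mets>"
)
BARE = '<mets><fileSec><file ID="file-1"/></fileSec></mets>'


def local_name(tag):
    """Strip the "{namespace-uri}" part that ElementTree puts before a tag."""
    return tag.rsplit("}", 1)[-1]


def find_all(root, name):
    """Return every element whose local name matches, in document order."""
    return [el for el in root.iter() if local_name(el.tag) == name]


def file_ids(mets_xml):
    """Return the ID attribute of every file element."""
    return [el.get("ID") for el in find_all(ET.fromstring(mets_xml), "file")]
```

Both `file_ids(NAMESPACED)` and `file_ids(BARE)` return the same list. The trade-off is that matching on local names ignores which namespace an element actually belongs to, which may be too permissive for documents that mix schemas.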
Old AIPs may have accidentally been deleted from a Storage Service database. Their database tables in the SS db may need to be reconstructed from their METS files.
Similarly, old AIPs may have been created on a pipeline that has since been destroyed. Such AIPs may need their SS and MCP (pipeline) database tables reconstructed.
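As a rough sketch of what "reconstructed from their METS files" could start from: the file section of a METS document lists each file with an ID and a location. The code below only extracts that listing with the Python standard library. The record keys (`file_id`, `path`), the helper name, and the sample document are hypothetical, and the actual SS and MCP table layouts are not shown here:

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
# Attributes in a namespace are keyed as "{namespace-uri}name".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical sample; the ID and path are made up for illustration.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="file-0001">
        <mets:FLocat xlink:href="objects/image.jpg" LOCTYPE="OTHER"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def file_records(mets_xml):
    """Return one dict per file listed in the METS file section."""
    root = ET.fromstring(mets_xml)
    records = []
    for file_el in root.iterfind(".//mets:fileSec//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        records.append({
            "file_id": file_el.get("ID"),
            # A file entry without FLocat yields no path.
            "path": flocat.get(XLINK_HREF) if flocat is not None else None,
        })
    return records
```

This assumes a namespaced METS document; an older file without namespaces would need the element lookup adjusted first.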
Describe the solution you'd like to see implemented.
I would like to see a case study of the techniques used (or attempted) to make Archivematica flexible and accommodating to various types of vintage AIP. I expect AIP re-ingest may be useful in compressing and generating pointer files for older AIPs, but API calls, Django management commands, and bespoke import/modernize scripts may all be necessary. I expect to see examples of using the AM and SS GUIs and APIs to interact with older AIPs and to discover whether older AIPs can be re-ingested.
Additional context
Imagine you had this:
How would you get to this?