Have a way of specifying the order by which SIP should be ingested in order to support AIP updates via the ingest process #6

Open
jmaferreira opened this issue Sep 26, 2018 · 4 comments
Labels
DILCIS review Issues that need to go for discussion within the DILCIS Board meetings

Comments

@jmaferreira
Contributor

jmaferreira commented Sep 26, 2018

In scenarios where SIP updates/replacements are a reality, the order in which the SIPs are applied to the existing AIP is important.

The SIP should be able to identify the AIP and its version/revision so that a verification can be made during ingest.

Ingesting SIPs in the incorrect order will produce considerably different results.

@jmaferreira jmaferreira self-assigned this Sep 26, 2018
@jmaferreira jmaferreira changed the title Order of SIP updates How to check the Order of SIP updates Sep 26, 2018
@jmaferreira jmaferreira changed the title How to check the Order of SIP updates Have a way of specifying the order by which SIP should be ingested Dec 19, 2018
@jmaferreira jmaferreira added this to the Unplanned milestone Dec 19, 2018
@jmaferreira jmaferreira removed their assignment Dec 19, 2018
@jmaferreira jmaferreira added the DILCIS review Issues that need to go for discussion within the DILCIS Board meetings label Mar 13, 2019
carlwilson added a commit that referenced this issue Sep 11, 2019
@luis100

luis100 commented Dec 23, 2020

@carlwilson why is this issue closed? We believe this is still an issue.

The current strategy in RODA is to give each SIP a version number: the create is version 1, the first update is version 2, the second update is version 3, and so on. The AIP also carries a version number. When ingesting a SIP update, its version number must be exactly the AIP version plus one. As this cannot currently be expressed in SIP and AIP metadata, we define it in custom descriptive metadata and enforce these rules in the ingest procedure.
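The "AIP version plus one" rule can be sketched roughly as below. This is only an illustration of the check, not RODA's actual code; the names `check_update_version` and `VersionMismatchError` are hypothetical.

```python
class VersionMismatchError(Exception):
    """Raised when a SIP update does not directly follow the AIP's version."""


def check_update_version(aip_version: int, sip_version: int) -> int:
    """Return the new AIP version if the SIP update may be applied.

    Assumes the create SIP is version 1 and every update increments by one,
    as described in the comment above.
    """
    expected = aip_version + 1
    if sip_version != expected:
        raise VersionMismatchError(
            f"SIP version {sip_version} rejected: AIP is at version "
            f"{aip_version}, expected SIP version {expected}"
        )
    return sip_version
```

With this check, an update arriving out of order (for example SIP version 3 against an AIP at version 1) is rejected instead of being silently applied.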

@jmaferreira jmaferreira reopened this Nov 10, 2022
@jmaferreira
Contributor Author

This issue has been discussed on the DILCIS board. It has been agreed that there should be instructions about the UPDATE process both on the SIP and the AIP.

@luis100 Can you provide a few paragraphs about how the SIP needs to be changed to support AIP Updates and what needs to happen at the AIP level so that the AIP spec is updated as well?

The decision to include this change will be discussed at the next DILCIS Board meeting.

@jmaferreira jmaferreira changed the title Have a way of specifying the order by which SIP should be ingested Have a way of specifying the order by which SIP should be ingested in order to support AIP updates via the ingest process Nov 10, 2022
@luis100

luis100 commented Nov 22, 2022

Currently, the E-ARK SIP specification defines RECORDSTATUS as a way to specify how multiple deliveries should be interpreted by the repository.

RECORDSTATUS (string/O): Specifies the status of the METS document. It is used for internal processing purposes.

SIP3: Package status metsHdr/@RECORDSTATUS
A way of indicating the status of the package and to instruct the OAIS on how to properly handle the package. If not set, the expected behaviour is equal to NEW.

The metsHdr is also used to indicate the type of behaviour to be expected from the OAIS when processing a particular SIP. For example, one might indicate that an SIP should be used to "replace" a particular AIP in the repository or that an SIP is meant for "testing" purposes and therefore it should not create an AIP at the end of the ingest process (see attribute metsHdr/@RECORDSTATUS).
From E-ARK SIP specification v2.0

The values include:

  • NEW: A new delivery.
  • SUPPLEMENT: Extends the previous delivery.
  • REPLACEMENT: Replaces a previous delivery.
  • TEST: A test delivery. No AIP should be created.
  • VERSION: A delivery with same content regarding files but one or more files have a new version.
  • DELETE: An order from the Producer to remove an existing AIP.
  • OTHER: Status not in list.
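As a rough sketch of how an ingest tool might read this attribute, the snippet below parses a METS document and falls back to NEW when `metsHdr/@RECORDSTATUS` is absent, as the quoted text describes. The names `RecordStatus` and `read_record_status` are hypothetical, and rejecting unlisted values with an error is an assumption rather than something the specification mandates.

```python
import xml.etree.ElementTree as ET
from enum import Enum

METS_NS = "http://www.loc.gov/METS/"


class RecordStatus(Enum):
    NEW = "NEW"
    SUPPLEMENT = "SUPPLEMENT"
    REPLACEMENT = "REPLACEMENT"
    TEST = "TEST"
    VERSION = "VERSION"
    DELETE = "DELETE"
    OTHER = "OTHER"


def read_record_status(mets_xml: str) -> RecordStatus:
    """Return the package status from metsHdr/@RECORDSTATUS.

    A missing header or attribute is treated as NEW.
    Values outside the list raise ValueError (an assumption of this sketch).
    """
    root = ET.fromstring(mets_xml)
    hdr = root.find(f"{{{METS_NS}}}metsHdr")
    value = hdr.get("RECORDSTATUS") if hdr is not None else None
    if value is None:
        return RecordStatus.NEW
    return RecordStatus(value)
```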

Although it is not clear how SUPPLEMENT and VERSION should affect the AIP, such a feature is commonly needed in production systems, especially when the content transferred to the archive continues to live in the production system. New updates, for example to descriptive metadata, need to be carried over to the AIP accordingly.

But when a series of SIP "deliveries" relate to or affect the same AIP, it is of the utmost importance to know whether we have all the deliveries and whether they are submitted in the correct order.

For example, consider a record that was marked in the production system as ready to be transferred to the archive, producing a first "NEW" delivery. It then received an update to its descriptive metadata and was again marked as ready for transfer, creating a "VERSION" delivery. The descriptive metadata was then updated once more, spawning a second "VERSION" delivery. It is important to ensure that the first "VERSION" delivery is applied before the second, or we will end up with the wrong version of the descriptive metadata. It is equally important to know whether we are really applying the second "VERSION" delivery on top of the first, as we might only receive the first one later on.

The same applies to more complex use cases that use "SUPPLEMENT", "REPLACEMENT", "VERSION" and "DELETE" deliveries.

The recommendation is to add a "SUBMISSION_NUMBER" field, where a "NEW" delivery always gets "SUBMISSION_NUMBER=0" (the default), and each following delivery must increment this number.
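To illustrate how such a number lets the repository detect missing or out-of-order deliveries, here is a minimal sketch. It assumes each delivery increments the number by exactly one, which the proposal does not state explicitly; the function name `applicable_deliveries` is hypothetical.

```python
from typing import List


def applicable_deliveries(last_applied: int, pending: List[int]) -> List[int]:
    """Return the pending submission numbers that can be applied now, in order.

    Processing stops at the first gap, so a later delivery is never applied
    before a missing earlier one. Numbers at or below `last_applied` are
    skipped here; a real system might prefer to reject them (assumption).
    """
    ready = []
    expected = last_applied + 1
    for number in sorted(set(pending)):
        if number < expected:
            continue  # already incorporated into the AIP
        if number != expected:
            break  # gap: an earlier delivery has not arrived yet
        ready.append(number)
        expected += 1
    return ready
```

For the example above, if the AIP holds the "NEW" delivery (number 0) and only the second "VERSION" delivery (number 2) has arrived, nothing is applied until delivery 1 shows up.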

The AIP should keep a record of all submissions incorporated into it, and possibly some additional information about what was affected, for example:

  • Submission number
  • Submission record status (from the RECORDSTATUS vocabulary)
  • Files added, updated or deleted. (or we can delegate this to PREMIS events)
  • Date of incorporation of the submission (or we can delegate this to PREMIS events)
  • Agents (human or machine) involved in the incorporation (or we can delegate this to PREMIS events)
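One possible shape for such a record is sketched below as plain data classes. All class and field names are hypothetical, and the sketch keeps the file lists, date and agents inline rather than delegating them to PREMIS events, which remains an open choice.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SubmissionRecord:
    submission_number: int
    record_status: str  # a value from the RECORDSTATUS vocabulary
    incorporated_at: datetime
    agents: Tuple[str, ...] = ()
    files_added: Tuple[str, ...] = ()
    files_updated: Tuple[str, ...] = ()
    files_deleted: Tuple[str, ...] = ()


@dataclass
class AipSubmissionLog:
    entries: List[SubmissionRecord] = field(default_factory=list)

    def append(self, record: SubmissionRecord) -> None:
        """Add a record for a submission that was incorporated into the AIP."""
        self.entries.append(record)

    def last_submission_number(self) -> Optional[int]:
        """Return the number of the latest incorporated submission, if any."""
        return self.entries[-1].submission_number if self.entries else None


# Example: log the initial "NEW" delivery.
log = AipSubmissionLog()
log.append(
    SubmissionRecord(
        submission_number=0,
        record_status="NEW",
        incorporated_at=datetime(2022, 11, 22, tzinfo=timezone.utc),
        agents=("ingest-workflow",),
    )
)
```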

@jmaferreira
Contributor Author

jmaferreira commented Dec 15, 2022

During the DILCIS Board (2022-12-15) it has been stated that:

  • FGS has the same concept. It is called PACKAGENUMBER.
  • Archivematica also has a similar concept in their SIP.

Action:

  • @luis100 - The SIP has a creation date and a modification date. Could that work as a way to determine the order of ingest?

Decision:

  • This suggestion will remain open until it has more endorsement from the community.


3 participants