
Tutorial@Custom Ingest

Jan Tomášek edited this page Dec 6, 2022 · 12 revisions

This tutorial shows how to set up ARCLib to ingest a new custom package type.

General info on ARCLib ingest

ARCLib can ingest any kind of package (so any kind of data) with at least one XML file which contains an identifier of the SIP package. However, the structure of the input package (SIP) has an impact on ARCLib functionality (for example whether ARCLib is able to do fixity checks). The more an input package follows the recommended format (see SIP), the more ARCLib can do.
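As a rough illustration of the requirement above, the following sketch reads a SIP identifier from an XML metadata file. The sample document, its element names, and the `find_sip_id` helper are hypothetical and are not part of ARCLib; in ARCLib itself the path to the identifier is configured in the SIP profile.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample metadata file; element names are illustrative only.
SAMPLE_XML = """<record>
  <identifier>sip-0001</identifier>
  <title>Example package</title>
</record>"""


def find_sip_id(xml_text, path):
    """Return the stripped text of the first element matching path, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(path)
    if node is None or node.text is None:
        return None
    value = node.text.strip()
    return value or None


# The path is relative to the root element of the metadata file.
sip_id = find_sip_id(SAMPLE_XML, "identifier")
missing_id = find_sip_id(SAMPLE_XML, "doesNotExist")
print(sip_id, missing_id)
```

A package whose metadata yields no identifier at the configured path would not satisfy the minimum requirement described above.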

The process of adopting a new custom package type consists of several steps:

  1. Creating and setting up a SIP profile, especially the SIP XSL that populates or creates the input METS metadata and sets the path to a unique SIP ID: Usage@Sip Profiles
  2. Preparing a validation profile that provides the necessary checks of an input package (e.g. whether metadata or files are missing): Usage@Validation Profiles
  3. Creating or selecting a workflow profile (defining which steps are performed on the input package): Usage@Workflow Definitions
  4. (Filling in the workflow configuration): Usage@Workflow Definitions
  5. (Administrative task: setting up a producer and tying it to the profiles and workflows above): Usage@Producer Profiles

Creating SIP profile (XSL template)

The ARCLib ingest process works with the METS format, which must be prepared via an XSL template. If the SIP itself contains a METS metadata file, creating a new ingest type is easier and more straightforward, because most of the input METS can simply be copied. If not, the METS must be created by transforming an input XML (metadata) file.

Sample XSL templates are provided.

Only very few METS elements and attributes are mandatory.

Mandatory fields (XPath)

  • /METS:mets/METS:metsHdr/METS:agent
  • /METS:mets/METS:metsHdr/METS:agent/@ROLE
  • /METS:mets/METS:metsHdr/METS:agent/@TYPE
  • /METS:mets/METS:metsHdr/METS:agent/METS:name

For the full list of mandatory and optional METS elements, see ARCLib XML Index Config.

The other sections of the METS document (e.g. METS:structMap) can be provided or copied, except METS:fileSec, which must not be provided by the SIP XSL template.
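The mandatory fields and the METS:fileSec restriction above can be checked on the output of an XSL template before it is used in ARCLib. The following is a minimal sketch using only the Python standard library; the helper names and the sample agent values are hypothetical, and the check inspects only the first METS:agent element.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
NS = {"METS": METS_NS}
AGENT = "/METS:mets/METS:metsHdr/METS:agent"

# Minimal hypothetical METS document; the agent values are placeholders.
MINIMAL_METS = """<METS:mets xmlns:METS="http://www.loc.gov/METS/">
  <METS:metsHdr>
    <METS:agent ROLE="CREATOR" TYPE="ORGANIZATION">
      <METS:name>Example Library</METS:name>
    </METS:agent>
  </METS:metsHdr>
</METS:mets>"""


def missing_mandatory_fields(mets_text):
    """Return the mandatory XPaths that are absent or empty in the document."""
    root = ET.fromstring(mets_text)
    agent = None
    if root.tag == "{%s}mets" % METS_NS:
        agent = root.find("METS:metsHdr/METS:agent", NS)
    name = ""
    if agent is not None:
        name = agent.findtext("METS:name", default="", namespaces=NS).strip()
    present = {
        AGENT: agent is not None,
        AGENT + "/@ROLE": agent is not None and bool(agent.get("ROLE")),
        AGENT + "/@TYPE": agent is not None and bool(agent.get("TYPE")),
        AGENT + "/METS:name": bool(name),
    }
    return [path for path, ok in present.items() if not ok]


def has_file_sec(mets_text):
    """Return True if the document contains a METS:fileSec element anywhere."""
    root = ET.fromstring(mets_text)
    return root.find(".//METS:fileSec", NS) is not None


print(missing_mandatory_fields(MINIMAL_METS))
print(has_file_sec(MINIMAL_METS))
```

A template output is a reasonable candidate for ingest when the first call returns an empty list and the second returns `False`.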

It is recommended to provide the descriptive metadata of the digital object in /METS:mets/METS:dmdSec, because ARCLib does not create this section. The remaining metadata sections (e.g. /METS:mets/METS:amdSec/) are added by ARCLib, but the template can add a few sections related to the ARCLib namespace (see the ARCLib XML schema).

Preparing validation profile

A validation profile defines rules for a SIP package. The rules can check that certain files are present in the SIP package, validate XML against a defined XML schema, and check that the value of any XML node matches a defined pattern. See Usage@Validation Profiles for further details.
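To make two of these rule types concrete, the sketch below reimplements a file presence check and a node value pattern check in plain Python. It does not use the ARCLib validation profile syntax; the file names, the `identifier` element, and the pattern are hypothetical. XML schema validation is left out because the Python standard library has no XSD validator.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def file_present(sip_dir, relative_path):
    """Rule type 1: a given file must exist inside the SIP directory."""
    return (Path(sip_dir) / relative_path).is_file()


def node_matches(xml_path, node_path, pattern):
    """Rule type 2: the text of an XML node must fully match a regex."""
    root = ET.parse(str(xml_path)).getroot()
    node = root.find(node_path)
    if node is None or node.text is None:
        return False
    return re.fullmatch(pattern, node.text.strip()) is not None


# Build a tiny hypothetical SIP in a temporary directory and run the rules.
with tempfile.TemporaryDirectory() as sip_dir:
    metadata = Path(sip_dir, "metadata.xml")
    metadata.write_text(
        "<record><identifier>sip-0001</identifier></record>", encoding="utf-8"
    )
    results = {
        "metadata present": file_present(sip_dir, "metadata.xml"),
        "data file present": file_present(sip_dir, "data/image.tif"),
        "identifier format": node_matches(metadata, "identifier", r"sip-\d{4}"),
    }

print(results)
```

Here the second rule fails because the sample package contains no `data/image.tif`, which is the kind of missing-file condition a validation profile is meant to catch.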

Creating/selecting workflow profile

A workflow profile defines the steps that ARCLib performs on the package (check the BPM tasks). For most package types the default profile can be used as is. If the SIP package does not provide METS metadata, a workflow profile with the fixity check switched off must be used. See the DSpace SAF profile, which uses a workflow profile without the fixity check.

Finalizing custom profile

The profiles above are then combined in a producer profile (see Usage@Producer Profiles). A producer profile must be selected at the start of every SIP package ingest. When creating or editing a producer profile, a workflow configuration should be set. The workflow configuration specifies the parameters for the particular BPM tasks in JSON format; for further details see Workflow config. A workflow configuration can also be specified at the start of the ingest of a SIP package, in which case it overrides the defaults. To keep the defaults, set it to {}.
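The relationship between the default workflow configuration and a configuration given at ingest start can be pictured as a merge of two JSON objects. The sketch below assumes a simple top-level override; ARCLib's actual merge rules may differ, and the parameter names `exampleTask` and `exampleLimit` are invented placeholders, not real ARCLib keys (see Workflow config for those).

```python
import json

# Hypothetical default configuration stored with a producer profile.
PRODUCER_PROFILE_CONFIG = '{"exampleTask": {"enabled": true}, "exampleLimit": 10}'


def effective_config(default_json, ingest_json="{}"):
    """Merge an ingest-time configuration over the defaults (top level only)."""
    defaults = json.loads(default_json)
    overrides = json.loads(ingest_json)
    if not isinstance(defaults, dict) or not isinstance(overrides, dict):
        raise ValueError("workflow configuration must be a JSON object")
    # Keys given at ingest start win; everything else keeps its default.
    return {**defaults, **overrides}


unchanged = effective_config(PRODUCER_PROFILE_CONFIG, "{}")
overridden = effective_config(PRODUCER_PROFILE_CONFIG, '{"exampleLimit": 5}')
print(unchanged)
print(overridden)
```

Passing `{}` leaves every default in place, which mirrors the advice above for ingests that need no special parameters.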

Debugging

Switch on the Debugging mode to allow newly created profiles to be changed. By default, profiles cannot be changed once they have been created (this is intended for production environments).
