Predefined Profiles

kerschfilip edited this page Nov 3, 2021 · 9 revisions

Several ingest profiles were created during ARCLib development, each tuned to a particular SIP format produced by Czech libraries. This page describes these predefined profiles and provides SQL insert scripts to get started.

All predefined profiles are set to run in debug mode, so no data are sent to Archival Storage.
All predefined profiles use a default producer. The transfer area of the default producer is set to default, so a folder named default must be created inside the folder defined by the arclib.path.fileStorage property in application.yml.

INSERT INTO public.arclib_producer(id, created, updated, name, transfer_area_path) VALUES ('2873084d-a73e-40e8-85ed-7d77e13b36bb', current_timestamp, current_timestamp, 'DefaultProducer', 'default');
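The transfer area folder described above could be created with a small script like the following. This is only a sketch: the function name ensure_transfer_area is hypothetical, and the path passed to it stands for whatever value arclib.path.fileStorage has in your application.yml.

```python
from pathlib import Path


def ensure_transfer_area(file_storage: str, transfer_area: str = "default") -> Path:
    """Create <file_storage>/<transfer_area> if it does not exist yet.

    file_storage is assumed to be the value of arclib.path.fileStorage;
    transfer_area matches the transfer_area_path of the producer row.
    """
    target = Path(file_storage) / transfer_area
    # parents=True also creates the file storage folder itself if missing;
    # exist_ok=True makes the call safe to repeat.
    target.mkdir(parents=True, exist_ok=True)
    return target


# Example call (the path is a placeholder, not an ARCLib default):
# ensure_transfer_area("/path/to/fileStorage")
```

Calling the function twice is harmless, so it can be run as part of a setup script before the first ingest.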

If you want to edit the predefined profiles, read the tutorial for custom ingest, which explains the rules that the output of the SIP profile XSLT must follow to produce valid ARCLib XML.

NDK

The NDK SIP profile is intended for ingesting packages created according to the specifications of the National Library of the Czech Republic, which are published at https://old.ndk.cz/standardy-digitalizace/metadata. As of February 2021, the predefined NDK profile supports packages created according to the Definition of metadata formats for periodicals, version 1.7.1 (https://old.ndk.cz/standardy-digitalizace/dmf_periodika_1-7-1_opravena_verze_rijen2018) and the Definition of metadata formats for monographs, version 1.3.2 (https://old.ndk.cz/standardy-digitalizace/dmf_monografie_1-3-2). Some packages created according to older versions of these specifications were also successfully ingested during the test period (details probably still to be specified).

The NDK SIP profile in ARCLib extracts the following metadata from the incoming SIP package:

  1. Descriptive metadata in the Dublin Core and MODS schemas, at the levels of Title, Volume, Issue and Article for packages with periodicals and Title, Volume and Chapter for packages with monographs. These metadata are extracted from the main METS document of the package.
  2. Administrative metadata in the PREMIS and MIX schemas, extracted from the secondary METS documents. The aim is to record the file formats and the tools (hardware and software) that were used to create the package.
  3. Copyright metadata in the copyrightMD schema may also be extracted, but they are optional in the source package (see https://github.com/LIBCAS/ARCLib/issues/112). The XSLT instruction that extracts these metadata from an NDK SIP package is <xsl:copy-of select="METS:amdSec"/>.
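To see which elements the copy-of instruction in item 3 would pick up for a given package, the selection can be approximated outside of XSLT. The sketch below is not how ARCLib performs the extraction (the profile uses XSLT); it is only a Python illustration that assumes the context node is the METS root element and that the METS prefix is bound to the standard namespace http://www.loc.gov/METS/. The function name select_amd_sections is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard METS namespace URI (assumed to be what the METS prefix refers to).
METS_NS = "http://www.loc.gov/METS/"


def select_amd_sections(mets_xml: str) -> List[ET.Element]:
    """Return the amdSec elements that are direct children of the METS root.

    This mirrors select="METS:amdSec" evaluated with the root element
    as the context node; nested amdSec elements are not returned.
    """
    root = ET.fromstring(mets_xml)
    return root.findall(f"{{{METS_NS}}}amdSec")


# Small inline document used only for illustration.
SAMPLE = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    '<mets:dmdSec ID="DMD_1"/>'
    '<mets:amdSec ID="AMD_1"/>'
    '<mets:amdSec ID="AMD_2"/>'
    "</mets:mets>"
)

for section in select_amd_sections(SAMPLE):
    print(section.get("ID"))
```

If the list comes back empty for a real package, the source package simply carries no amdSec at that level, which is consistent with the copyright metadata being optional.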

ProArc

This profile is meant for ingesting data created in ProArc (version 3.6 or newer).

Preparation of data for ingestion

Data created in ProArc must be exported using the Archivace type of export.
For the profile to work properly, the resulting package must then be packed according to the BagIt specification. This allows ARCLib to effectively check the fixity of the SIP and all files it consists of.

The structure of SIP.zip must therefore correspond to:

SIP.zip/
\-- SIP
    |-- data
    |   |-- AUDIT
    |   |-- DESCRIPTION
    |   |-- FOXML
    |   |-- FULL
    |   |-- NDK
    |   |   \-- packageid
    |   |       |-- alto
    |   |       |-- amdsec
    |   |       |-- mastercopy
    |   |       |-- txt
    |   |       |-- usercopy
    |   |       |-- info.xml
    |   |       |-- md5.md5
    |   |       \-- mets.xml
    |   |-- PREVIEW
    |   |-- RAW
    |   |-- RAW_MIX
    |   |-- RELS-EXT
    |   |-- THUMBNAIL
    |   \-- mets.xml
    |-- manifest-md5.txt
    \-- bagit.txt
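A package with the layout above could be assembled along the following lines. This is a minimal sketch under several assumptions: the ProArc export folder directly contains the AUDIT, NDK, mets.xml and other entries that belong under data; the bag folder is named SIP; and the BagIt version written to bagit.txt (0.97) is acceptable to ARCLib. None of these details is confirmed by this page, and the function names are hypothetical.

```python
import hashlib
import shutil
import zipfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_bag_zip(export_dir: Path, work_dir: Path, bag_name: str = "SIP") -> Path:
    """Copy export_dir into <work_dir>/<bag_name>/data, add the BagIt tag
    files, and zip the bag folder. work_dir must already exist."""
    bag_dir = work_dir / bag_name
    data_dir = bag_dir / "data"
    shutil.copytree(export_dir, data_dir)  # fails if the bag already exists

    # One manifest line per payload file: "<md5>  data/relative/path".
    payload = sorted(p for p in data_dir.rglob("*") if p.is_file())
    lines = [f"{md5_of(p)}  {p.relative_to(bag_dir).as_posix()}" for p in payload]
    (bag_dir / "manifest-md5.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Bag declaration; the version number here is an assumption.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Zip the bag so that every entry starts with "<bag_name>/".
    # Only files are added, so empty folders are not preserved.
    zip_path = work_dir / f"{bag_name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(bag_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(work_dir).as_posix())
    return zip_path
```

A call such as build_bag_zip(Path("/path/to/proarc_export"), Path("/path/to/work")) would leave SIP.zip in the work folder; both paths are placeholders.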

Metadata extraction

The profile uses the main METS file of the NDK part of the SIP (located in data/NDK/*/mets*.xml) to extract descriptive metadata in Dublin Core and MODS, and the files in data/NDK/*/amdsec to extract administrative metadata in PREMIS and MIX.
The method and amount of data extracted into the ARCLib AIP XML are thus the same as for the NDK SIP profile.
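Before ingesting, it can be useful to check that an unpacked package actually contains files at the locations mentioned above. The following sketch only lists matching files; it assumes the bag has been unzipped to a local folder, and the function name locate_ndk_metadata is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def locate_ndk_metadata(bag_root: Path) -> Tuple[List[Path], List[Path]]:
    """Return (main METS files, amdsec files) found under bag_root.

    bag_root is the unpacked bag folder, i.e. the one that contains data/.
    """
    # Main METS document(s) of the NDK part: data/NDK/<packageid>/mets*.xml
    main_mets = sorted(bag_root.glob("data/NDK/*/mets*.xml"))
    # Secondary METS documents: every file in data/NDK/<packageid>/amdsec
    amd_files = sorted(
        p for p in bag_root.glob("data/NDK/*/amdsec/*") if p.is_file()
    )
    return main_mets, amd_files
```

An empty first list means the descriptive metadata source is missing, and an empty second list means no administrative metadata files were found at the expected location.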

Kramerius

This predefined profile can be used for ingesting FOXML files exported from the Kramerius system. To make sure that ARCLib extracts all metadata correctly (especially the label of the SIP), it is highly recommended to always export FOXML files together with their parents.

Since packages created by exporting FOXML from Kramerius do not contain one METS file with checksums describing all files enclosed in the SIP, the Fixity checker is not used during ingestion.

DSpace

Simple Archive Format

This format is supported by almost all versions of DSpace. The SIP package is generated by the [dspace]/bin/dspace export utility. Only the handle of an Item can be used as the --id option; ARCLib cannot currently ingest a whole Collection. The exported directory must be zipped, and the resulting zip package is then ready to be ingested by ARCLib. The MD5 checksum of this zip package is required by ARCLib.

Since the DSpace Simple Archive Format does not provide metadata in METS, ARCLib does not perform a fixity check for the included Bitstreams.

Export example:

$ [dspace]/bin/dspace export --type=ITEM --id=item_handle --dest=/path/to/destination --number=seq_num
$ zip -r dspace_sip.zip /path/to/destination/seq_num
$ md5sum dspace_sip.zip