ccacciari edited this page Sep 12, 2017 · 12 revisions

B2SAFE metadata component

After the initial design and prototype, we have kept improving the component.

The current version is a proof of concept (PoC), which offers the following features:

  • automatic generation of the manifest, starting from a description of the data-metadata relations expressed in JSON-LD format
  • validation of the manifest
  • upload of the metadata to the graphDB
  • versioning of the manifest
  • comparison of the multiple versions of a manifest and update of the graph according to the differences between them
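
The comparison step in the last item can be pictured as a set difference between two manifest versions. The sketch below is only an illustration under that assumption, not the component's actual code: it treats each version as a plain collection of object paths (the function name and the example paths are hypothetical) and reports what a graph update would have to add or remove.

```python
def diff_manifest_versions(old_objects, new_objects):
    """Compare two manifest versions given as collections of object paths."""
    old, new = set(old_objects), set(new_objects)
    return {
        "added": sorted(new - old),      # objects to create in the graph
        "removed": sorted(old - new),    # objects to delete from the graph
        "unchanged": sorted(old & new),  # objects that need no update
    }


# Hypothetical object paths, for illustration only.
changes = diff_manifest_versions(
    ["/zone/coll/a.txt", "/zone/coll/b.txt"],
    ["/zone/coll/b.txt", "/zone/coll/c.txt"],
)
print(changes)
```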

The current component works as an extension of the B2SAFE core package (https://github.com/EUDAT-B2SAFE/B2SAFE-core) and follows its architecture, so it cannot be deployed independently.

Configuration

The specific rule set for the metadata needs to be added to the iRODS configuration in /etc/irods/server_config.json.
The specific Python scripts need to be linked under the iRODS path /var/lib/irods/msiExecCmd_bin.
Assuming the component is deployed under /opt/eudat/b2safe-metadata, a set of configuration files is placed under /opt/eudat/b2safe-metadata/conf, in particular:

  • mets_factory.conf
  • b2safe_neo4j.conf
  • EudatControlledVocabulary.jsonld
  • metadata.json

These files must be modified according to the documentation.
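
Before editing the files, it can help to confirm that all four are present. The following is a minimal sketch, assuming the example deployment path used on this page; the function name is hypothetical and not part of the component.

```python
from pathlib import Path

# File names taken from the list above; the default directory is the example
# deployment path used on this page and may differ on a real installation.
EXPECTED_CONF_FILES = (
    "mets_factory.conf",
    "b2safe_neo4j.conf",
    "EudatControlledVocabulary.jsonld",
    "metadata.json",
)


def missing_conf_files(conf_dir="/opt/eudat/b2safe-metadata/conf"):
    """Return the expected configuration files that are absent from conf_dir."""
    base = Path(conf_dir)
    return [name for name in EXPECTED_CONF_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_conf_files()
    if missing:
        print("Missing configuration files:", ", ".join(missing))
    else:
        print("All expected configuration files are present.")
```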
Moreover, the path to b2safe_neo4j.conf must be added in the file rulebase/metadata.re:

getMetadataConfParameters(*mdConfPath) { *mdConfPath="/opt/eudat/b2safe-metadata/conf/b2safe_neo4j.conf"; }

Finally, two additional software packages are required:

Quick start

Once configured, the component can be used to publish the metadata to the local metadata store.
A couple of test scripts can be used to verify that the configuration is correct.
A typical workflow could be the following:

  • the user uploads a collection with the manifest,
  • the B2SAFE administrator defines a cron job that periodically executes the rule extracting the system metadata (EUDATPushMetadata(*path, *queue) in the B2SAFE core component) and pushes them to a messaging system,
  • the B2SAFE administrator defines a cron job that periodically executes the rule EUDATStoreMetadata(*collPath, *user), which parses the manifest and creates or updates the graph in the graphDB (the local metadata store).
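
The two cron jobs in the workflow above could invoke the rules through the iRODS `irule` client. The sketch below only builds the command lines; it assumes the usual `irule <rule> <input> <output>` form with `%`-separated, quoted input parameters. The collection path, user name and queue name are hypothetical, and some installations may need additional `irule` options.

```python
import shlex


def irule_command(rule_call, inputs):
    """Build an argument list for the iRODS `irule` client.

    inputs maps rule variable names (e.g. "*path") to plain string values.
    """
    # Assumption: input parameters are passed as *name="value" pairs joined by '%'.
    params = "%".join(f'{name}="{value}"' for name, value in inputs.items())
    return ["irule", rule_call, params, "ruleExecOut"]


# Hypothetical values, for illustration only.
push = irule_command(
    "EUDATPushMetadata(*path, *queue)",
    {"*path": "/tempZone/home/alice/collection", "*queue": "metadata_queue"},
)
store = irule_command(
    "EUDATStoreMetadata(*collPath, *user)",
    {"*collPath": "/tempZone/home/alice/collection", "*user": "alice"},
)

# Print shell-quoted command lines that could be pasted into a crontab entry.
for command in (push, store):
    print(shlex.join(command))
```

The schedule itself is left to the administrator, since the page does not prescribe one.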

The manifest can be written by the user or generated automatically from metadata.json using the script cmd/mets_factory.py (see executables).

Future developments

Known issues

  • The script cmd/mets_factory.py produces a manifest document from two inputs: the description of the metadata relations in JSON-LD format and the iRODS path of the collection. It can link the root manifest with other manifests found in the sub-collections. However, it does not exclude from the root manifest the objects already tracked in the sub-collection manifests, so manifests within the same hierarchy may overlap.
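
Until the script handles this case, overlaps can be detected after the fact. The following is a minimal sketch, assuming the object paths listed by each manifest have already been extracted into plain Python collections; the function name and the example paths are hypothetical and not part of cmd/mets_factory.py.

```python
def overlapping_objects(root_objects, sub_manifests):
    """Map each sub-collection manifest to the objects it shares with the root manifest.

    root_objects: object paths listed in the root manifest.
    sub_manifests: dict of manifest path -> object paths listed in that manifest.
    """
    root = set(root_objects)
    overlaps = {}
    for manifest_path, objects in sub_manifests.items():
        shared = root & set(objects)
        if shared:
            overlaps[manifest_path] = sorted(shared)
    return overlaps


# Hypothetical hierarchy, for illustration only.
root = ["/zone/coll/readme.txt", "/zone/coll/sub/data.csv"]
subs = {"/zone/coll/sub/manifest.xml": ["/zone/coll/sub/data.csv"]}
print(overlapping_objects(root, subs))
```

Objects reported by such a check are the ones a later version of the script would need to drop from the root manifest.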
