Architecture
The following figure shows the metadata flow and the architecture of the metadata component:

The workflow involves three major software components, shown in the figure as the enclosing boxes: the iRODS instance that holds the data; the messaging system, which collects changes to the data and metadata; and the Neo4j graph database, which stores the metadata itself together with the relations between data and metadata.
The starting point is the need to save data in iRODS with B2SAFE, or to save a change to data already stored there. In the figure this is the blue rectangle marked as a collection structure, with the note that the user uploaded it into iRODS. This user action is a trigger detected by a specific iRODS rule, which decides, depending on the conditions of the collection, whether to generate a METS manifest describing the collection. The generation is performed by the mets_factory.py script. If an existing collection has changed, versioning is activated and a copy of the old manifest is saved before the new one is generated.
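The versioning step can be illustrated with a minimal sketch. It is not taken from mets_factory.py: the function name `backup_manifest`, the `.vN` naming scheme, and the use of the local filesystem instead of iRODS are all assumptions made for illustration.

```python
import shutil
from pathlib import Path
from typing import Optional


def backup_manifest(manifest: Path) -> Optional[Path]:
    """Copy an existing manifest to the next free versioned name.

    Hypothetical helper: returns the path of the copy, or None when no
    manifest exists yet (first generation, nothing to preserve).
    """
    if not manifest.exists():
        return None
    version = 1
    while True:
        # Assumed naming scheme: manifest.xml -> manifest.v1.xml, manifest.v2.xml, ...
        candidate = manifest.with_name(f"{manifest.stem}.v{version}{manifest.suffix}")
        if not candidate.exists():
            break
        version += 1
    shutil.copy2(manifest, candidate)  # copy2 also preserves timestamps
    return candidate
```

Calling such a helper immediately before writing the new manifest would keep every earlier version available for a later comparison.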
The creation or modification of a manifest in turn triggers an iRODS rule that pushes a message about the new or changed data collection to the messaging system.
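As a rough illustration of what such a message might carry, the sketch below serializes a change notification as JSON. The field names, the event values `created` and `changed`, and the function `build_change_message` are hypothetical; the actual message format used by the iRODS rule is not specified here.

```python
import json
from datetime import datetime, timezone


def build_change_message(collection: str, manifest: str, event: str) -> str:
    """Serialize a manifest event as a JSON string (assumed format)."""
    if event not in ("created", "changed"):
        raise ValueError(f"unknown event type: {event}")
    payload = {
        "collection": collection,  # path of the affected collection
        "manifest": manifest,      # path of the new or updated manifest
        "event": event,            # "created" or "changed"
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload, sort_keys=True)
```

A consumer would then need only the manifest path and the event type to decide whether to load a whole collection or to look for differences.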
This information is then used by the b2safe_neo4j_client, which a cron job runs periodically. It is the main script: it validates the manifest and uploads the metadata of a new collection into the Neo4j graph database, provided that the manifest is a valid METS document, describes files that physically exist in the collection, and shows no inconsistencies between its fileSec and structMap sections. When a collection has changed, the b2safe_neo4j_client compares the old and the new version of the manifest, analyzes the differences, and decides what needs to be changed in the graph. This incremental update is needed because collections in iRODS are usually very large, and building a graph that represents a whole collection is time- and resource-consuming.
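The consistency check and the manifest comparison can be sketched as follows. This is an illustration under stated assumptions, not the b2safe_neo4j_client code: the function names are hypothetical, files are matched by the `xlink:href` of their `FLocat` element, and a file counts as changed when its `CHECKSUM` attribute differs. Schema validation and the check that files physically exist are not covered.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
XLINK = "{http://www.w3.org/1999/xlink}"


def is_consistent(root):
    """True if the fileSec file IDs and the structMap fptr references match."""
    declared = {f.get("ID") for f in root.iter(f"{METS}file")}
    referenced = {p.get("FILEID") for p in root.iter(f"{METS}fptr")}
    return declared == referenced


def files_by_href(root):
    """Map each file location (xlink:href) to its CHECKSUM attribute."""
    files = {}
    for file_el in root.iter(f"{METS}file"):
        flocat = file_el.find(f"{METS}FLocat")
        if flocat is not None:
            files[flocat.get(f"{XLINK}href")] = file_el.get("CHECKSUM")
    return files


def diff_manifests(old_root, new_root):
    """Return sorted lists of (added, removed, changed) file locations."""
    old, new = files_by_href(old_root), files_by_href(new_root)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(h for h in old.keys() & new.keys() if old[h] != new[h])
    return added, removed, changed
```

With a result of this shape, only the nodes for added, removed, or changed files would need to be touched in the graph, rather than rebuilding the representation of the whole collection.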