Overview

The Metadata Bus was created to automate the management of data loaded into Solr for the Canadiana Access Platform (CAP). The metadata is based on a schema called the Canadiana Metadata Repository (CMR) record.

The Metadata Bus is a series of data processing scripts and tools that let metadata flow between stages, from when an artifact is first acquired by Canadiana to when it is viewable on the platform. The outputs of the Metadata Bus processes are derivatives of the source data collected during the preservation and archive processes, formatted for easy public consumption.
Changes or new additions to the source data are queued, processed, and propagated to the public platforms.

Services

The Metadata Bus includes the following services:

Smelter:

Reads METS records from the repository and generates canvas and manifest records.

Manifest records have a noid for an _id and a slug, which is set by Smelter.

Canvas records have a noid for an _id, and are not tied to any specific manifest.
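
As a rough illustration of the two record shapes described above, the following Python sketch models them as dataclasses. Only the `_id` (a noid) and the `slug` come from the description; the class names, the `canvases` field, and the sample identifiers are hypothetical and are not taken from Smelter itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CanvasRecord:
    """One image. Identified by a noid and not bound to any manifest."""
    noid: str

    def to_document(self) -> Dict[str, object]:
        # The noid is stored as the document's _id.
        return {"_id": self.noid}


@dataclass
class ManifestRecord:
    """A group of canvases. Has a noid _id plus a slug set by Smelter."""
    noid: str
    slug: str
    # Hypothetical field: the noids of the canvases this manifest groups.
    canvas_ids: List[str] = field(default_factory=list)

    def to_document(self) -> Dict[str, object]:
        return {
            "_id": self.noid,
            "slug": self.slug,
            "canvases": list(self.canvas_ids),
        }


# Placeholder identifiers, not real noids.
canvas = CanvasRecord(noid="canvas-noid-1")
first = ManifestRecord("manifest-noid-1", "example.1", [canvas.noid])
second = ManifestRecord("manifest-noid-2", "example.2", [canvas.noid])
```

Because a canvas carries no reference back to a manifest, the same canvas can appear in both `first` and `second` without any change to the canvas record.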

Hammer2:

Handles the processing of individual manifests.

Reads a _view on the manifest and collection documents to pull data from those documents, and potentially from XML descriptive metadata files (in Swift), then updates the cosearch and copresentation databases.

Solrstream:

Streams updates that occur in the search database to individual Solr cores. Solr is an enterprise search engine platform.
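
One way to picture this streaming step: each change in the search database becomes either an add or a delete in a Solr JSON update request. The sketch below only builds the request bodies and does not contact Solr. The shape of the change records (`id`, `deleted`, `doc`), the function name, and the use of `id` as the Solr unique key are assumptions made for illustration; they do not describe how Solrstream is actually implemented.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple

Change = Dict[str, Any]


def build_solr_updates(changes: Iterable[Change]) -> Tuple[List[dict], Dict[str, List[str]]]:
    """Turn a batch of change records into Solr JSON update bodies.

    Returns (adds, deletes): `adds` is a list of documents to index and
    `deletes` is a delete-by-id command.
    """
    # Keep only the most recent change per id so that an add followed by
    # a delete (or the reverse) within one batch resolves to the final state.
    latest: Dict[str, Change] = {}
    for change in changes:
        latest[change["id"]] = change

    adds: List[dict] = []
    delete_ids: List[str] = []
    for doc_id, change in latest.items():
        if change.get("deleted"):
            delete_ids.append(doc_id)
        else:
            adds.append(change["doc"])
    return adds, {"delete": delete_ids}


# Hypothetical batch of changes.
batch = [
    {"id": "a", "doc": {"id": "a", "title": "One"}},
    {"id": "b", "deleted": True},
    {"id": "b", "doc": {"id": "b", "title": "Two"}},
    {"id": "a", "deleted": True},
]
adds, deletes = build_solr_updates(batch)
add_body = json.dumps(adds)        # body for a POST to a core's /update handler
delete_body = json.dumps(deletes)  # sent as a separate request
```

Each body would be posted to the update handler of the relevant Solr core, followed by a commit. Collapsing the batch to the last change per id keeps the result independent of the order in which the two requests are sent.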

Reposync:

Keeps the dipstaging and wipmeta databases up to date with the public availability of replicas of AIP content in repositories.

OCR:

todo

DMD:

todo

Databases

The above services interact with the following 'Access Databases':

Dipstaging:

Derived Data

Ids are AIP IDs

Used by Smelter and Reposync

Used to process data from the repository into manifests

Documents are created by Reposync based on data in the repository

Canvas:

Source Data

Ids are noids

Used to store information about individual images

Analogous to sequences within internalmeta records

Access - Manifest:

Source Data

Ids are noids

Used to store information about groups of canvases

Analogous to internalmeta records

Access - Collection:

Source Data

Ids are noids

Used to store information about groups of manifests and/or other collections

Combines both the concepts of series records and the collection tags in internalmeta

An ordered collection references its child manifests

Previously, an issue pointed to its parent series

COSearch:

Derived Data

Ids are noids or slugs

Analogous to the cosearch database

Streamed to Solr

COPresentation:

Derived Data

Ids are noids or slugs

Analogous to the copresentation database

Read by CAP (Canadiana Access Platform)

OCR:

todo

DMD Task:

todo
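
The properties listed for the documented databases can be collected into a small lookup, which makes the split between source and derived data easier to see. This is only a summary sketch of the descriptions above: the lowercase keys, the field names, and the helper function are illustrative choices, and the OCR and DMD Task databases are left out because they are not yet described.

```python
from typing import Dict, List

# Summary of the documented Access Databases (illustrative keys and fields).
ACCESS_DATABASES: Dict[str, Dict[str, str]] = {
    "dipstaging": {"kind": "derived", "ids": "AIP IDs"},
    "canvas": {"kind": "source", "ids": "noids"},
    "manifest": {"kind": "source", "ids": "noids"},
    "collection": {"kind": "source", "ids": "noids"},
    "cosearch": {"kind": "derived", "ids": "noids or slugs"},
    "copresentation": {"kind": "derived", "ids": "noids or slugs"},
}


def databases_of_kind(kind: str) -> List[str]:
    """Return the names of the databases holding the given kind of data."""
    return [name for name, info in ACCESS_DATABASES.items() if info["kind"] == kind]


source_databases = databases_of_kind("source")
derived_databases = databases_of_kind("derived")
```

Here `source_databases` holds the canvas, manifest, and collection databases, while `derived_databases` holds dipstaging, cosearch, and copresentation.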

crkn-rcdr/cihm-metadatabus
