Anna Bernasconi edited this page Feb 6, 2020 · 28 revisions

What is Metadata-Manager?

Metadata-Manager is a tool used within the GeCo (Data-Driven Genomic Computing) project to manage metadata in several configurable modes.

Through the Metadata-Manager pipeline, metadata are made available for all genomic datasets used in GeCo and can be queried in two ways:

  • From GenoSurf (Canakoglu et al. 2019), a web interface that allows users to locate genomic samples by selecting metadata. GenoSurf relies on the organization of metadata into an integrated relational database built on the Genomic Conceptual Model (GCM, Bernasconi et al. 2017). Metadata queries over GCM are available for general use, independently of the GMQL system, and return the URIs of the relevant data in the source repositories.

  • From the GenoMetric Query Language (GMQL, Masseroli et al. 2015), whose interface, described in Masseroli et al. 2019, is available at http://www.gmql.eu/. When metadata queries are applied to the GMQL repository and system, they return sample identifiers in the Genomic Data Model (GDM, Masseroli et al. 2016).

The Metadata-Manager is structured as a six-step process involving:

  • Downloader: imports metadata from the source repository sites;
  • Transformer: translates metadata to raw attribute-value pairs;
  • Cleaner: produces a collection of clean metadata pairs for each source;
  • Mapper: extracts information from cleaned pairs and adds it to GCM;
  • Normalizer/Enricher: normalizes GCM values (resorting to generic term-ids that may take specific sets of values) and enriches them (by means of external ontologies) [Work in progress can be found at Metadata-Enricher];
  • Constraint Checker: checks the consistency of the database content with respect to integrity constraints [Work in progress].
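The first stages of the pipeline can be pictured as a chain of functions, each consuming the output of the previous one. The sketch below is a minimal Python illustration under stated assumptions, not the project's implementation: the functions transform, clean and map_to_gcm, the (regex, replacement) rule format, the double-underscore key separator and the GCM-like entity and field names are all hypothetical. Normalization, enrichment and constraint checking are omitted.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical types for illustration: a raw or cleaned metadata pair, and a
# cleaning rule given as (regex on the attribute name, replacement).
Pair = Tuple[str, str]
Rule = Tuple[str, str]


def transform(doc: dict, prefix: str = "") -> List[Pair]:
    """Transformer (sketch): flatten a nested metadata document into raw pairs."""
    pairs: List[Pair] = []
    for key, value in doc.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            pairs.extend(transform(value, name))  # recurse into nested sections
        else:
            pairs.append((name, str(value)))
    return pairs


def clean(pairs: List[Pair], rules: List[Rule]) -> List[Pair]:
    """Cleaner (sketch): rename attributes with the first matching rule, trim values."""
    cleaned: List[Pair] = []
    for key, value in pairs:
        for pattern, replacement in rules:
            if re.search(pattern, key):
                key = re.sub(pattern, replacement, key)
                break
        cleaned.append((key, value.strip()))
    return cleaned


def map_to_gcm(pairs: List[Pair], mapping: Dict[str, Tuple[str, str]]) -> Dict[str, Dict[str, str]]:
    """Mapper (sketch): copy selected cleaned pairs into GCM-like entity records."""
    gcm: Dict[str, Dict[str, str]] = {}
    for key, value in pairs:
        if key in mapping:
            entity, field = mapping[key]
            gcm.setdefault(entity, {})[field] = value
    return gcm


# Example run with made-up metadata, rules and mapping.
doc = {"biosample": {"organism": " Homo sapiens ", "term_name": "K562"}, "assay": "ChIP-seq"}
raw = transform(doc)
cleaned = clean(raw, [(r"^biosample__", "")])
gcm = map_to_gcm(cleaned, {
    "organism": ("Donor", "species"),
    "term_name": ("Biosample", "cell"),
    "assay": ("ExperimentType", "technique"),
})
print(gcm)
# → {'Donor': {'species': 'Homo sapiens'}, 'Biosample': {'cell': 'K562'}, 'ExperimentType': {'technique': 'ChIP-seq'}}
```

Keeping each stage a pure function of the previous stage's output mirrors the idea that every step produces its own intermediate result, which is what lets the Loader pick metadata "as they are" after a given step.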

When the Loader loads metadata into the interfaces, three modes are available:

  1. metadata are used as they are after transformation;
  2. metadata are used as they are after cleaning;
  3. cleaned metadata are unified with integrated metadata and converted to a flattened format (using the Flattener) usable with GMQL.
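As a rough illustration of the three Loader modes, the sketch below selects which pairs reach the interfaces and, for mode 3, appends the integrated values to the cleaned pairs before serializing them. The names LoadMode, flatten, select_for_loading and to_meta_lines, as well as the gcm__<entity>__<field> naming convention, are hypothetical; the output assumes one tab-separated attribute/value line per pair, similar to GDM sample metadata.

```python
from enum import Enum
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


class LoadMode(Enum):
    TRANSFORMED = 1  # mode 1: pairs as they are after transformation
    CLEANED = 2      # mode 2: pairs as they are after cleaning
    FLATTENED = 3    # mode 3: cleaned pairs unified with integrated (GCM) values


def flatten(cleaned: List[Pair], gcm: Dict[str, Dict[str, str]]) -> List[Pair]:
    """Flattener (sketch): append GCM values to the cleaned pairs as flat pairs."""
    flat = list(cleaned)
    for entity in sorted(gcm):
        for field in sorted(gcm[entity]):
            # Hypothetical naming convention: gcm__<entity>__<field>, lower-cased entity.
            flat.append((f"gcm__{entity.lower()}__{field}", gcm[entity][field]))
    return flat


def select_for_loading(mode: LoadMode, transformed: List[Pair], cleaned: List[Pair],
                       gcm: Dict[str, Dict[str, str]]) -> List[Pair]:
    """Return the pairs the Loader would hand to the interfaces in the given mode."""
    if mode is LoadMode.TRANSFORMED:
        return transformed
    if mode is LoadMode.CLEANED:
        return cleaned
    return flatten(cleaned, gcm)


def to_meta_lines(pairs: List[Pair]) -> str:
    """Serialize pairs as one tab-separated attribute/value line each."""
    return "\n".join(f"{key}\t{value}" for key, value in pairs)


pairs = select_for_loading(LoadMode.FLATTENED, [], [("organism", "Homo sapiens")],
                           {"Donor": {"species": "Homo sapiens"}})
print(to_meta_lines(pairs))
```

Prefixing the integrated attributes keeps them distinguishable from the source's own cleaned attributes once both live in the same flat list.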

[Figure: overall process]

Configuration

To run the applications included in this project, please refer to the User Manual.

To configure the execution of the Metadata-Manager, please refer to the Configuration page, which describes the required XML configuration file in detail. The tool records the configuration of each run in a database, which makes it easy to track changes to the process.
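A minimal sketch of both ideas follows: reading an XML configuration and recording each run's configuration in a database so that changes between runs can be detected. The element and attribute names (configuration, source, dataset, *_enabled), the run table and the helper functions are hypothetical and do not reproduce the schema described on the Configuration page; SQLite stands in for whatever database the tool actually uses.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical configuration layout, for illustration only.
EXAMPLE_XML = """
<configuration>
  <source name="ENCODE" download_enabled="true" transform_enabled="true">
    <dataset name="dataset_a"/>
  </source>
</configuration>
"""


def parse_config(xml_text: str) -> dict:
    """Read per-source step switches and dataset names from the XML text."""
    root = ET.fromstring(xml_text.strip())
    sources = {}
    for src in root.findall("source"):
        steps = {k: v == "true" for k, v in src.attrib.items() if k.endswith("_enabled")}
        datasets = [d.get("name") for d in src.findall("dataset")]
        sources[src.get("name")] = {"steps": steps, "datasets": datasets}
    return sources


def record_run(conn: sqlite3.Connection, config: dict) -> int:
    """Store the configuration of one run and return the new run id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, started_at TEXT NOT NULL, config_json TEXT NOT NULL)"
    )
    cur = conn.execute(
        "INSERT INTO run (started_at, config_json) VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), json.dumps(config, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid


def config_changed(conn: sqlite3.Connection, run_a: int, run_b: int) -> bool:
    """True if the two recorded runs used different configurations."""
    rows = dict(conn.execute(
        "SELECT id, config_json FROM run WHERE id IN (?, ?)", (run_a, run_b)
    ).fetchall())
    return rows[run_a] != rows[run_b]


conn = sqlite3.connect(":memory:")
first = record_run(conn, parse_config(EXAMPLE_XML))
second = record_run(conn, parse_config(
    EXAMPLE_XML.replace('transform_enabled="true"', 'transform_enabled="false"')))
print(config_changed(conn, first, second))  # True
```

Serializing the parsed configuration with sorted keys gives a stable text form, so comparing two runs reduces to a string comparison.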

Supporting tools

The Metadata-Manager described above is supported by two smaller tools that allow user interaction:

  • The Rule Base Generator helps the integration designer to create rules to be used during the Cleaning phase.
  • The User Feedback tool helps the integration designer assist the Enrichment process by choosing among proposed solutions, adding new annotations, or correcting existing ones. [Work in progress]
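The kind of feedback a rule-authoring tool can give the designer is sketched below: it previews how a rule base rewrites attribute names and lists the names that no rule covers yet. The (regex, replacement) rule form, the first-match-wins policy and the function names are assumptions made for illustration, not the Rule Base Generator's actual rule syntax.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical rule form: (regex matched against the attribute name, replacement).
Rule = Tuple[str, str]


def apply_rules(key: str, rules: List[Rule]) -> str:
    """Rewrite an attribute name with the first rule whose pattern matches it."""
    for pattern, replacement in rules:
        if re.search(pattern, key):
            return re.sub(pattern, replacement, key)
    return key  # no rule matched: the name is kept unchanged


def uncovered_keys(keys: Iterable[str], rules: List[Rule]) -> List[str]:
    """Distinct attribute names that no rule matches yet, sorted."""
    return sorted({k for k in keys if not any(re.search(p, k) for p, _ in rules)})


def preview(keys: Iterable[str], rules: List[Rule]) -> List[Tuple[str, str]]:
    """(original, rewritten) pairs for the designer to inspect before saving rules."""
    return [(k, apply_rules(k, rules)) for k in sorted(set(keys))]


keys = ["biosample__organism", "biosample__term_name", "file__md5sum"]
rules = [(r"^biosample__", "")]
print(uncovered_keys(keys, rules))  # ['file__md5sum']
```

Showing uncovered names lets the designer add rules incrementally until every attribute of a source is handled.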

Note that the Changelog page collects relevant changes to the content of the metadata database produced by the Metadata-Manager project.