# Validating metadata (OMEX Metadata files)

This tutorial illustrates how to validate [OMEX Metadata](https://sys-bio.github.io/libOmexMeta/) files. OMEX Metadata is a format for capturing metadata about COMBINE/OMEX archives and/or individual items inside archives, such as the organism represented by a model, the author of a simulation experiment, or the date when an archive was created. OMEX Metadata represents metadata as sets of [Resource Description Framework (RDF)](https://en.wikipedia.org/wiki/Resource_Description_Framework) [triples](https://en.wikipedia.org/wiki/Semantic_triple) of subjects (e.g., model in an archive), predicates (e.g., organism that the model represents), and objects (e.g., *Escherichia coli*) that represent entities and relationships between them. OMEX Metadata can flexibly capture a broad range of metadata in machine-readable form.
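
To make the triple structure concrete, here is a minimal sketch in plain Python that models triples as named tuples. The `Triple` type, the `objects_for` helper, and the example file name and title are illustrative assumptions; they are not part of OMEX Metadata or libOmexMeta.

```python
from collections import namedtuple

# Hypothetical container for one RDF triple (for illustration only)
Triple = namedtuple('Triple', ['subject', 'predicate', 'object'])

# Two example statements: one about a model file, one about the archive itself ('.')
triples = [
    Triple('./model.xml',
           'http://biomodels.net/biology-qualifiers/hasTaxon',
           'http://identifiers.org/taxonomy/562'),  # NCBI Taxonomy 562 is Escherichia coli
    Triple('.',
           'http://purl.org/dc/terms/title',
           'Example archive'),
]


def objects_for(triples, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_for(triples, '.', 'http://purl.org/dc/terms/title'))
```

Because each statement is an independent triple, new kinds of metadata can be added without changing the structure of the existing ones.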

BioSimulators supports two minimum information conventions for OMEX Metadata files:
* **OMEX Metadata** (`rdf_triples`): No minimum set of information is required. This convention supports anything that can be encoded in RDF. Validation is performed using [libOmexMeta](https://sys-bio.github.io/libOmexMeta/).
* **BioSimulations** (`biosimulations`): We recommend using the BioSimulations minimum information convention. This convention imposes additional requirements beyond the OMEX Metadata convention for minimal metadata about a COMBINE archive, including a title, at least one creator, and a license. More information about this convention is available at [https://docs.biosimulations.org](https://docs.biosimulations.org/concepts/conventions/simulation-project-metadata/).
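
The difference between the two conventions can be pictured with a small stand-alone check. This is only a sketch: the key names `creators` and `license` and the helper `missing_required_fields` are assumptions made for illustration, the list of required fields is not exhaustive, and the real validation is performed by BioSimulators-utils.

```python
# Assumed key names for three of the fields that the BioSimulations convention requires
REQUIRED_FIELDS = ('title', 'creators', 'license')


def missing_required_fields(record):
    """Return the assumed-required fields that are absent or empty in a metadata dictionary."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# A hypothetical record with a title, but no creators and no license
example = {'uri': '.', 'title': 'Example project', 'creators': [], 'license': None}
print(missing_required_fields(example))  # prints ['creators', 'license']
```

Under the `rdf_triples` convention no such check applies, since no minimum set of information is required.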

<div class="alert alert-block alert-info">
    OMEX Metadata files can describe thumbnails for simulation projects. Validating an OMEX Metadata file on its own checks the metadata but not the thumbnail images that it references. More comprehensive validation, including related images, is available through the validation of simulation projects (COMBINE/OMEX archives). Please see the <a href="4.%20Validating%20simulation%20projects%20(COMBINE-OMEX%20archives).ipynb">simulation project validation tutorial</a> for more information.
</div>

<div class="alert alert-block alert-info">
    BioSimulations' minimum metadata conventions are required for publication to BioSimulations. Currently, publication to BioSimulations is also restricted to RDF-XML OMEX Metadata files.
</div>

## 1. Validate an OMEX Metadata file online

The easiest way to validate OMEX Metadata files is to use the web interface at https://run.biosimulations.org. An HTTP API for validating OMEX Metadata files is also available at [https://combine.api.biosimulations.org](https://combine.api.biosimulations.org/).

## 2. Validate an OMEX Metadata file with the BioSimulators command-line application

First, install [BioSimulators-utils](https://github.com/biosimulators/Biosimulators_utils). Installation instructions are available at [https://docs.biosimulators.org](https://docs.biosimulators.org/Biosimulators_utils). A Docker image with BioSimulators-utils and all dependencies is also available ([`ghcr.io/biosimulators/biosimulators`](https://github.com/biosimulators/Biosimulators/pkgs/container/biosimulators)).

In [1]:
!biosimulators-utils --help

usage: biosimulators-utils [-h] [-d] [-q] [-v]
                           {convert,exec,validate-project,validate-metadata,validate-simulation,validate-model,build-project}
                           ...

Utilities for working with containerized biosimulation tools

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           full application debug mode
  -q, --quiet           suppress all console output
  -v, --version         show program's version number and exit

sub-commands:
  {convert,exec,validate-project,validate-metadata,validate-simulation,validate-model,build-project}
    convert             Convert files among formats
    exec                Execute a model project (COMBINE/OMEX archive)
    validate-project    Validate a model project (COMBINE/OMEX archive)
    validate-metadata   Validate metadata (OMEX Metadata file)
    validate-simulation
                        Validate a simulation experiment (SED-ML file)
    validate-model   

In [2]:
!biosimulators-utils validate-metadata --help

usage: biosimulators-utils validate-metadata [-h] filename

Validate metadata (OMEX Metadata file)

positional arguments:
  filename    Path to an OMEX Metadata file

optional arguments:
  -h, --help  show this help message and exit


Next, use the command-line program to validate [metadata about a modeling project](../_data/Ciliberto-J-Cell-Biol-2003-morphogenesis-checkpoint-continuous.rdf).

In [3]:
!OMEX_METADATA_SCHEMA=BioSimulations \
    biosimulators-utils validate-metadata \
        ../_data/Ciliberto-J-Cell-Biol-2003-morphogenesis-checkpoint-continuous.rdf



If the metadata is invalid, a list of errors will be printed to your console.
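
To run the same command from a script, a thin wrapper around `subprocess` is one option. The sketch below reuses only the sub-command and the `OMEX_METADATA_SCHEMA` environment variable shown above; the helper names are hypothetical, and the meaning of the exit code (non-zero for invalid metadata) is an assumption that this tutorial does not confirm.

```python
import os
import subprocess


def build_validation_call(filename, schema='BioSimulations'):
    """Build the command and environment for `biosimulators-utils validate-metadata`."""
    cmd = ['biosimulators-utils', 'validate-metadata', filename]
    # Copy the current environment and select the minimum information convention
    env = dict(os.environ, OMEX_METADATA_SCHEMA=schema)
    return cmd, env


def validate_with_cli(filename, schema='BioSimulations'):
    """Run the command-line validator and return its exit code and combined console output."""
    cmd, env = build_validation_call(filename, schema)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    # Assumption: a non-zero exit code indicates that validation failed
    return result.returncode, result.stdout + result.stderr
```

Calling `validate_with_cli('../_data/Ciliberto-J-Cell-Biol-2003-morphogenesis-checkpoint-continuous.rdf')` requires BioSimulators-utils to be installed and on the `PATH`.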

## 3. Validate an OMEX Metadata file programmatically with Python

First, install [BioSimulators-utils](https://github.com/biosimulators/Biosimulators_utils). Installation instructions are available at [https://docs.biosimulators.org](https://docs.biosimulators.org/Biosimulators_utils). Note that BioSimulators-utils must be installed with the installation options for the model languages that you wish to validate. A Docker image with BioSimulators-utils and all dependencies is also available ([`ghcr.io/biosimulators/biosimulators`](https://github.com/biosimulators/Biosimulators/pkgs/container/biosimulators)).

Next, import BioSimulators-utils' function for reading OMEX Metadata files, along with the configuration classes that it uses.

In [4]:
from biosimulators_utils.config import Config
from biosimulators_utils.omex_meta.data_model import OmexMetadataInputFormat, OmexMetadataSchema
from biosimulators_utils.omex_meta.io import read_omex_meta_file

Next, use the `read_omex_meta_file` function to read the OMEX Metadata file.

In [5]:
# Path to the OMEX Metadata file to validate
omex_metadata_filename = '../_data/Ciliberto-J-Cell-Biol-2003-morphogenesis-checkpoint-continuous.rdf'

# Read the file as RDF-XML and validate it against the BioSimulations convention
config = Config(
    OMEX_METADATA_INPUT_FORMAT=OmexMetadataInputFormat.rdfxml,
    OMEX_METADATA_SCHEMA=OmexMetadataSchema.biosimulations,
)
metadata, errors, warnings = read_omex_meta_file(omex_metadata_filename, config=config)

The second and third outputs (`errors` and `warnings`) are nested lists of error and warning messages. Next, use the `flatten_nested_list_of_strings` function to convert them into human-readable messages.

In [6]:
from biosimulators_utils.utils.core import flatten_nested_list_of_strings
from warnings import warn

if warnings:
    warn(flatten_nested_list_of_strings(warnings), UserWarning)

if errors:
    raise ValueError(flatten_nested_list_of_strings(errors))
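
To give a sense of what "nested" means here, the following stand-alone sketch renders a nested list as an indented bullet list. It assumes that each entry is a list whose first element is a message and whose optional second element is a nested list of more detailed messages. The `render_nested` helper and the example messages are hypothetical; this is not the implementation of `flatten_nested_list_of_strings`.

```python
def render_nested(messages, indent=0):
    """Render an assumed nested message structure as a list of indented bullet lines."""
    lines = []
    for item in messages:
        # First element: the message itself
        lines.append(' ' * indent + '- ' + item[0])
        # Optional second element: more detailed messages, indented one level deeper
        if len(item) > 1:
            lines.extend(render_nested(item[1], indent + 2))
    return lines


# Hypothetical messages, for illustration only
example_errors = [
    ['The file is invalid.', [
        ['A title is missing.'],
        ['A license is missing.'],
    ]],
]
print('\n'.join(render_nested(example_errors)))
```

Nesting lets a general message group the specific problems that caused it.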

The first output of `read_omex_meta_file` (`metadata`) contains the metadata parsed from the OMEX Metadata file. If the BioSimulations schema is used, this is a list of dictionaries that follow this schema. If the RDF schema is used, this is a list of tuples.

In [7]:
metadata

[{'uri': '.',
  'title': 'Morphogenesis checkpoint in budding yeast (continuous) (Ciliberto et al., J Cell Biol, 2003)',
  'abstract': 'Continuous kinetic model of the molecular network that controls Swe1 and the regulation of cyclin-dependent kinase.',
  'keywords': ['cell cycle',
   'cell division',
   'cyclin-dependent kinase',
   'morphogenesis',
   'signal transduction'],
  'description': 'Set of ordinary differential equations that model the molecular network that controls Swe1 and the regulation of cyclin-dependent kinase. The model is based on published observations of budding yeast and analogous control signals in fission yeast. Simulations of the model accurately reproduce the phenotypes of a dozen checkpoint mutants. Among other predictions, the model attributes a new role to Hsl1, a kinase known to play a role in Swe1 degradation: Hsl1 must also be indirectly responsible for potent inhibition of Swe1 activity. The model supports the idea that the morphogenesis checkpoint, l