# Validating simulation projects (COMBINE-OMEX archives)

This tutorial illustrates how to validate simulation projects encapsulated as [COMBINE/OMEX archives](https://combinearchive.org/). By default, this includes the validations listed below. As discussed in the final section, these validations can optionally be disabled.

* Archive is a valid zip file
* [Simulation Experiment Description Markup Language (SED-ML)](http://sed-ml.org/) files
* The source (e.g., BNGL, CellML, RBA, SBML, XPP file) of each model of each SED-ML file
* The targets for these model sources specified in the SED-ML files
* Image files (BMP, GIF, JPEG, PNG, TIFF, WEBP)
* [OMEX Metadata](https://sys-bio.github.io/libOmexMeta/) files
* [OMEX Manifest](https://combinearchive.org/) files

The COMBINE/OMEX archive format uses URIs to describe the format of each file in an archive. `biosimulators_utils.combine.data_model.CombineArchiveContentFormat` enumerates many formats commonly used with COMBINE/OMEX archives.
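
To make the role of these format URIs concrete, the following minimal sketch builds a toy archive in memory, checks that it is a zip file, and lists the location, format URI, and `master` flag of each entry in its manifest. It uses only the Python standard library rather than BioSimulators-utils. The file names and the helper `list_manifest_contents` are hypothetical, and the checks are far less thorough than those performed by the validator.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OMEX manifests, as defined by the COMBINE archive specification
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"

# Toy manifest (hypothetical file names) that declares the format of each file
MANIFEST = f"""<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="{OMEX_NS}">
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <content location="./simulation.sedml" format="http://identifiers.org/combine.specifications/sed-ml" master="true"/>
  <content location="./model.xml" format="http://identifiers.org/combine.specifications/sbml"/>
</omexManifest>
"""


def list_manifest_contents(archive_file):
    """Return (location, format URI, master flag) for each manifest entry."""
    if not zipfile.is_zipfile(archive_file):
        raise ValueError("Archive is not a valid zip file")
    with zipfile.ZipFile(archive_file) as zip_file:
        root = ET.fromstring(zip_file.read("manifest.xml"))
    return [
        (el.get("location"), el.get("format"), el.get("master", "false") == "true")
        for el in root.findall(f"{{{OMEX_NS}}}content")
    ]


# Build a toy archive in memory so that the sketch is self-contained
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zip_file:
    zip_file.writestr("manifest.xml", MANIFEST)
    zip_file.writestr("simulation.sedml", "<sedML/>")
    zip_file.writestr("model.xml", "<sbml/>")
buffer.seek(0)

manifest_contents = list_manifest_contents(buffer)
for location, format_uri, master in manifest_contents:
    print(location, format_uri, master)
```

The same helper accepts a path to an `.omex` file on disk in place of the in-memory buffer.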

<div class="alert alert-block alert-info">
    SED-ML is under active development. The SED-ML community is working toward a single interpretation of SED-ML. BioSimulators supports the interpretation of SED-ML described at <a href="https://docs.biosimulations.org/concepts/conventions/simulation-experiments/">https://docs.biosimulations.org</a>.
</div>

<div class="alert alert-block alert-info">
    BioSimulations relies on community-contributed validation tools for each model language to validate models. The thoroughness of these tools varies. In addition, some tools currently provide limited diagnostic information. We welcome contributions of improved validation tools.
</div>

## 1. Validate a COMBINE/OMEX archive online

The easiest way to validate a COMBINE/OMEX archive file is to use the web interface at https://run.biosimulations.org. An HTTP API for validating COMBINE/OMEX archive files is also available at [https://combine.api.biosimulations.org](https://combine.api.biosimulations.org/).

## 2. Validate a COMBINE/OMEX archive file with the BioSimulators command-line application

First, install [BioSimulators-utils](https://github.com/biosimulators/Biosimulators_utils). Installation instructions are available at [https://docs.biosimulators.org](https://docs.biosimulators.org/Biosimulators_utils). Note that BioSimulators-utils must be installed with the installation options for the model languages that you wish to validate. A Docker image with BioSimulators-utils and all of its dependencies is also available ([`ghcr.io/biosimulators/biosimulators`](https://github.com/biosimulators/Biosimulators/pkgs/container/biosimulators)).

Inline help for the `biosimulators-utils` command-line program is available by running the program with the `--help` option.

In [1]:
!biosimulators-utils --help

usage: biosimulators-utils [-h] [-d] [-q] [-v]
                           {convert,exec,validate-project,validate-metadata,validate-simulation,validate-model,build-project}
                           ...

Utilities for working with containerized biosimulation tools

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           full application debug mode
  -q, --quiet           suppress all console output
  -v, --version         show program's version number and exit

sub-commands:
  {convert,exec,validate-project,validate-metadata,validate-simulation,validate-model,build-project}
    convert             Convert files among formats
    exec                Execute a model project (COMBINE/OMEX archive)
    validate-project    Validate a model project (COMBINE/OMEX archive)
    validate-metadata   Validate metadata (OMEX Metadata file)
    validate-simulation
                        Validate a simulation experiment (SED-ML file)
    validate-model   

In [2]:
!biosimulators-utils validate-project --help

usage: biosimulators-utils validate-project [-h] filename

Validate a modeling project (COMBINE/OMEX archive and its contents)

positional arguments:
  filename    Path to a COMBINE/OMEX archive

optional arguments:
  -h, --help  show this help message and exit


Next, use the command-line program to validate the simulation project (`../_data/Ciliberto-J-Cell-Biol-2003-morphogenesis-checkpoint-continuous.omex`).

In [3]:
!biosimulators-utils validate-project ../_data/Ciliberto-J-Cell-Biol-2003-morphogenesis-checkpoint-continuous.omex

  - Model `Ciliberto2003_Morphogenesis` may be invalid.
    - The model file `BIOMD0000000297_url.xml` may be invalid.
      - The value of the 'sboTerm' attribute on a <species> is expected to be an SBO identifier (http://www.biomodels.net/SBO/). In SBML Level 2 prior to Version 4 it is expected to refer to a participant physical type (i.e., terms derived from SBO:0000236, "participant physical type"); in Versions 4 and above it is expected to refer to a material entity (i.e., terms derived from SBO:0000240, "material entity").
        Reference: L2V4 Section 5
         SBO term 'SBO:0000014' on the <species> is not in the appropriate branch.
        
      - The value of the 'sboTerm' attribute on a <species> is expected to be an SBO identifier (http://www.biomodels.net/SBO/). In SBML Level 2 prior to Version 4 it is expected to refer to a participant physical type (i.e., terms derived from SBO:0000236, "participant physical type"); in Versions 4 and above it is expected to refer to 

If the archive is invalid, a list of errors will be printed to your console.

## 3. Validate a COMBINE/OMEX archive file programmatically with Python

First, install [BioSimulators-utils](https://github.com/biosimulators/Biosimulators_utils). Installation instructions are available at [https://docs.biosimulators.org](https://docs.biosimulators.org/Biosimulators_utils). Note that BioSimulators-utils must be installed with the installation options for the model languages that you wish to validate. A Docker image with BioSimulators-utils and all of its dependencies is also available ([`ghcr.io/biosimulators/biosimulators`](https://github.com/biosimulators/Biosimulators/pkgs/container/biosimulators)).

Next, use the `CombineArchiveReader` class to read and unpack the archive. Note that `CombineArchiveReader` raises exceptions for archives that cannot be read (e.g., corrupt zip files).

In [4]:
import os
import tempfile

from biosimulators_utils.combine.io import CombineArchiveReader

archive_filename = '../_data/Ciliberto-J-Cell-Biol-2003-morphogenesis-checkpoint-continuous.omex'

# Unpack the archive into a fresh temporary directory under `tmp`
os.makedirs('tmp', exist_ok=True)
archive_dirname = tempfile.mkdtemp(dir='tmp')
archive = CombineArchiveReader().run(archive_filename, archive_dirname)

Next, use the `validate` function to validate the contents of the archive. It returns nested lists of errors and warnings.

In [5]:
from biosimulators_utils.combine.validation import validate
errors, warnings = validate(archive, archive_dirname)

  - Model `Ciliberto2003_Morphogenesis` may be invalid.
    - The model file `BIOMD0000000297_url.xml` may be invalid.
      - The value of the 'sboTerm' attribute on a <species> is expected to be an SBO identifier (http://www.biomodels.net/SBO/). In SBML Level 2 prior to Version 4 it is expected to refer to a participant physical type (i.e., terms derived from SBO:0000236, "participant physical type"); in Versions 4 and above it is expected to refer to a material entity (i.e., terms derived from SBO:0000240, "material entity").
        Reference: L2V4 Section 5
         SBO term 'SBO:0000014' on the <species> is not in the appropriate branch.
        
      - The value of the 'sboTerm' attribute on a <species> is expected to be an SBO identifier (http://www.biomodels.net/SBO/). In SBML Level 2 prior to Version 4 it is expected to refer to a participant physical type (i.e., terms derived from SBO:0000236, "participant physical type"); in Versions 4 and above it is expected to refer to 

Human-readable messages can then be printed using the `flatten_nested_list_of_strings` function.

In [6]:
from biosimulators_utils.utils.core import flatten_nested_list_of_strings
from warnings import warn

if warnings:
    warn(flatten_nested_list_of_strings(warnings), UserWarning)

if errors:
    raise ValueError(flatten_nested_list_of_strings(errors))

  - Model `Ciliberto2003_Morphogenesis` may be invalid.
    - The model file `BIOMD0000000297_url.xml` may be invalid.
      - The value of the 'sboTerm' attribute on a <species> is expected to be an SBO identifier (http://www.biomodels.net/SBO/). In SBML Level 2 prior to Version 4 it is expected to refer to a participant physical type (i.e., terms derived from SBO:0000236, "participant physical type"); in Versions 4 and above it is expected to refer to a material entity (i.e., terms derived from SBO:0000240, "material entity").
        Reference: L2V4 Section 5
         SBO term 'SBO:0000014' on the <species> is not in the appropriate branch.
        
      - The value of the 'sboTerm' attribute on a <species> is expected to be an SBO identifier (http://www.biomodels.net/SBO/). In SBML Level 2 prior to Version 4 it is expected to refer to a participant physical type (i.e., terms derived from SBO:0000236, "participant physical type"); in Versions 4 and above it is expected to refer to 
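
As an illustration of what this flattening involves, here is a minimal sketch that renders a nested list of messages as an indented bullet list. It is not the library's implementation. It assumes that each item is a list containing a message, optionally followed by a list of child items of the same shape. The helper name `flatten_messages` and the example messages are hypothetical.

```python
def flatten_messages(nested, prefix="- ", indent="  "):
    """Render a nested list of messages as an indented bullet list.

    Assumes each item is a list whose first element is a message and whose
    optional second element is a list of child items with the same shape.
    """
    lines = []
    for item in nested:
        lines.append(prefix + item[0])
        if len(item) > 1:
            # Recurse into the children and shift them one level to the right
            child_text = flatten_messages(item[1], prefix=prefix, indent=indent)
            lines.extend(indent + line for line in child_text.split("\n"))
    return "\n".join(lines)


# Hypothetical messages shaped like the output shown above
example_warnings = [
    ["Model `model` may be invalid.", [
        ["The model file `model.xml` may be invalid.", [
            ["First detail."],
            ["Second detail."],
        ]],
    ]],
]

print(flatten_messages(example_warnings))
```

Keeping the messages nested until they are displayed lets a caller inspect or filter individual problems before turning them into text.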

The output of the `CombineArchiveReader.run` method (`archive`) is an instance of `biosimulators_utils.combine.data_model.CombineArchive` that represents the contents of the archive. This includes the path to each file, the format of each file (described using URIs as explained above), and a flag (`master`) that indicates whether each file should be considered the primary entrypoint to the archive.

In [7]:
vars(archive)

{'contents': [<biosimulators_utils.combine.data_model.CombineArchiveContent at 0x7f04cea0f220>,
  <biosimulators_utils.combine.data_model.CombineArchiveContent at 0x7f04cea0f1c0>,
  <biosimulators_utils.combine.data_model.CombineArchiveContent at 0x7f04cea0f2b0>,
  <biosimulators_utils.combine.data_model.CombineArchiveContent at 0x7f04cea0f2e0>,
  <biosimulators_utils.combine.data_model.CombineArchiveContent at 0x7f04cea0f310>,
  <biosimulators_utils.combine.data_model.CombineArchiveContent at 0x7f04cea0f340>,
  <biosimulators_utils.combine.data_model.CombineArchiveContent at 0x7f04cea0f370>,
  <biosimulators_utils.combine.data_model.CombineArchiveContent at 0x7f04cea14ee0>],
 'description': None,
 'authors': [],
 'created': None,
 'updated': None}

In [8]:
vars(archive.contents[0])

{'location': './BIOMD0000000297_url.xml',
 'format': 'http://identifiers.org/combine.specifications/sbml',
 'master': False,
 'description': None,
 'authors': [],
 'created': None,
 'updated': None}
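
The attributes shown above are enough to summarize an archive, for example by grouping files by format and finding the entry points. The following is a minimal sketch, assuming objects that expose the `location`, `format`, and `master` attributes. To keep it runnable without BioSimulators-utils, it uses `SimpleNamespace` stand-ins in place of `archive.contents`. The helper `summarize_contents` and the SED-ML file name are hypothetical.

```python
from types import SimpleNamespace

SBML_FORMAT = "http://identifiers.org/combine.specifications/sbml"
SEDML_FORMAT = "http://identifiers.org/combine.specifications/sed-ml"


def summarize_contents(contents):
    """Group content locations by format URI and collect the master files."""
    by_format = {}
    masters = []
    for content in contents:
        by_format.setdefault(content.format, []).append(content.location)
        if content.master:
            masters.append(content.location)
    return by_format, masters


# Stand-ins with the same attributes as the items of `archive.contents`
toy_contents = [
    SimpleNamespace(location="./BIOMD0000000297_url.xml", format=SBML_FORMAT, master=False),
    SimpleNamespace(location="./simulation.sedml", format=SEDML_FORMAT, master=True),
]

by_format, masters = summarize_contents(toy_contents)
print("Master files:", masters)
print("SED-ML files:", by_format[SEDML_FORMAT])
```

With BioSimulators-utils installed, `summarize_contents(archive.contents)` should work in the same way, provided the content objects expose the attributes shown above.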

## 4. Validation options

By default, BioSimulators aims to validate COMBINE/OMEX archives as comprehensively as possible. Optionally, each validation can be disabled. All of the validation interfaces support the same options.

The online validator provides a graphical menu of validation options. The options for the HTTP API are described in its documentation. Validation options for the command-line application can be set via environment variables. Validation options for the Python API can be set by passing an additional `config` keyword argument to the `CombineArchiveReader.run` method and the `validate` function.

| Option                                     | HTTP API query parameter  | CLI environment variable     | Python configuration attribute | Default          | 
| ------------------------------------------ | ------------------------- | ---------------------------- | ------------------------------ | ---------------- |
| OMEX Metadata format                       | `omexMetadataInputFormat` | `OMEX_METADATA_INPUT_FORMAT` | `OMEX_METADATA_INPUT_FORMAT`   | `rdfxml`         |
| OMEX Metadata schema                       | `omexMetadataSchema`      | `OMEX_METADATA_SCHEMA`       | `OMEX_METADATA_SCHEMA`         | `BioSimulations` |
| Validate OMEX Manifest                     | `validateOmexManifest`    | `VALIDATE_OMEX_MANIFESTS`    | `VALIDATE_OMEX_MANIFESTS`      | `True`           |
| Validate SED-ML files                      | `validateSedml`           | `VALIDATE_SEDML`             | `VALIDATE_SEDML`               | `True`           |
| Validate sources of models of SED-ML files | `validateSedmlModels`     | `VALIDATE_SEDML_MODELS`      | `VALIDATE_SEDML_MODELS`        | `True`           |
| Validate OMEX Metadata files               | `validateOmexMetadata`    | `VALIDATE_OMEX_METADATA`     | `VALIDATE_OMEX_METADATA`       | `True`           |
| Validate image files                       | `validateImages`          | `VALIDATE_IMAGES`            | `VALIDATE_IMAGES`              | `True`           |
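
For the command-line application, the environment variables in the table can be set for a single run. The following minimal sketch prepares such a call from Python without executing it. The helper `build_validation_call` is hypothetical, and the encoding of Boolean options as `1` or `0` is an assumption that should be checked against the BioSimulators-utils documentation.

```python
import os


def build_validation_call(archive_filename, **options):
    """Return the command and environment for a `validate-project` run."""
    env = dict(os.environ)
    for name, value in options.items():
        if isinstance(value, bool):
            # Assumption: Boolean options are encoded as '1' or '0'
            env[name] = "1" if value else "0"
        else:
            env[name] = str(value)
    command = ["biosimulators-utils", "validate-project", archive_filename]
    return command, env


command, env = build_validation_call(
    "../_data/Ciliberto-J-Cell-Biol-2003-morphogenesis-checkpoint-continuous.omex",
    VALIDATE_IMAGES=False,
    OMEX_METADATA_SCHEMA="rdf_triples",
)
print(command)
print(env["VALIDATE_IMAGES"], env["OMEX_METADATA_SCHEMA"])

# To run the validation (requires BioSimulators-utils to be installed):
# import subprocess
# subprocess.run(command, env=env, check=False)
```

Passing a copy of the environment to the subprocess keeps the options local to that run and leaves the current process unchanged.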

Two conventions for minimum metadata are supported:
* **OMEX Metadata** (`rdf_triples`): This convention does not require any minimal set of metadata. This convention supports anything that can be encoded in RDF. Validation is performed using [libOmexMeta](https://sys-bio.github.io/libOmexMeta/).
* **BioSimulations** (`BioSimulations`): We recommend using the BioSimulations convention. This convention imposes additional requirements beyond the OMEX Metadata convention for minimal metadata about a simulation project, including a title, at least one creator, and a license. More information about this convention is available at [https://docs.biosimulations.org](https://docs.biosimulations.org/concepts/conventions/simulation-project-metadata/).

In [9]:
from biosimulators_utils.config import Config
from biosimulators_utils.omex_meta.data_model import OmexMetadataSchema
config = Config(
    VALIDATE_OMEX_MANIFESTS=True,
    VALIDATE_SEDML=True,
    VALIDATE_SEDML_MODELS=True,
    VALIDATE_OMEX_METADATA=True,
    OMEX_METADATA_SCHEMA=OmexMetadataSchema.biosimulations,
    VALIDATE_IMAGES=True,
)
archive = CombineArchiveReader().run(archive_filename, archive_dirname, config=config)
errors, warnings = validate(archive, archive_dirname, config=config)

<div class="alert alert-block alert-info">
    Plain zip archives can be validated by setting <code>VALIDATE_OMEX_MANIFESTS=False</code>.
</div>

<div class="alert alert-block alert-info">
    Comprehensive validation of SED-ML requires <code>VALIDATE_SEDML=True</code> and <code>VALIDATE_SEDML_MODELS=True</code>. If the latter is false, the sources of the models in SED-ML files and the targets for these models will not be validated.
</div>

<div class="alert alert-block alert-info">
    OMEX Manifests are required for validating OMEX Metadata files. As a result, <code>VALIDATE_OMEX_METADATA=False</code> should often be used together with <code>VALIDATE_OMEX_MANIFESTS=False</code>.
</div>

<div class="alert alert-block alert-info">
    When OMEX Metadata files describe thumbnail images (e.g., for projects), <code>VALIDATE_IMAGES=True</code> is required to comprehensively validate OMEX Metadata files, including linked images.
</div>

<div class="alert alert-block alert-info">
    BioSimulations' minimum metadata conventions are required for publication to BioSimulations. Currently, publication to BioSimulations is also restricted to RDF-XML OMEX Metadata files.
</div>