Proposal: Different languages for model specification #538

Merged
merged 13 commits into from
Jul 3, 2024

Conversation

dweindl
Member

@dweindl dweindl commented Mar 18, 2022

Motivation

There are a number of formats for specifying models in systems biology, each with their specific strengths and weaknesses. PEtab version 1.0.0 only allows Systems Biology Markup Language (SBML) models. While SBML is supported by a large number of tools, there are good reasons to use other formats. For example, rule-based model formats (e.g., BioNetGenLanguage) permit more abstract and compact specification of models based on rules, which are generalisations of reactions. Therefore, and based on user request (#436), we propose to lift PEtab’s restriction to SBML models and allow arbitrary model formats.

Proposed changes

  • Changes to the PEtab YAML file:

    • Change sbml_files to models
    • models entries will be model IDs (following the existing conventions for PEtab IDs) mapping to:
      • location: path / URL to the model
      • language: model format
        Initial set of model format identifiers (to be extended as needed):
        • SBML: sbml
        • CellML: cellml
        • BNGL: bngl
        • PySB: pysb
    • An additional entry for mapping tables (see below) is added

    Example:

    Before:

    format_version: 1
    parameter_file: parameters.tsv
    problems:
    - condition_files:
      - conditions.tsv
      measurement_files:
      - measurements.tsv
      observable_files:
      - observables.tsv
      sbml_files:
      - model1.xml

    After:

    format_version: 2.0.0
    parameter_file: parameters.tsv
    problems:
    - condition_files:
      - conditions.tsv
      measurement_files:
      - measurements.tsv
      observable_files:
      - observables.tsv
      mapping_file: mappings.tsv # optional 
      models:
        id_for_model1:
          location: model1.xml
          language: sbml
  • Changes to the format of existing tables/files:

    • Condition/Observable/Parameter Table
      All symbols that previously referenced the ID of SBML entities, such as parameter IDs or compartment IDs, now refer to (globally unique) named entities in the model, such as parameters, observables, or expressions. For example, condition table columns may correspond to parameters, states, or species of the referenced model.
      For species, assignments in the condition table set the initial value at the beginning of the simulation for that condition, potentially replacing the initialization from preequilibration. For all other entities, values are statically replaced at all time points. For entities that assign values to other entities, such as SBML AssignmentRules, the value of the target of that rule is statically replaced at all time points.
  • Additional files

    • Mapping Table:
      Maps PEtab entity IDs to entity IDs in the model. This optional file may be used to reference model entities in PEtab files where the ID in the model would not be a valid identifier in PEtab (e.g., because it contains blanks, dots, or other special characters).
      The TSV file has two mandatory columns: petabEntityId and modelEntityId. Additional columns are allowed. Each modelEntityId must be a unique identifier in the model. The mapping table must not map modelEntityIds to petabEntityIds that are also defined in any other part of the PEtab problem. A modelEntityId may not refer to other petabEntityIds, including those defined in the mapping table. petabEntityIds defined in the mapping table may be referenced in the condition, measurement, parameter and observable tables, but cannot be referenced in the model itself.
      For example, in SBML, local parameters may be referenced as $reactionId.$localParameterId, which are not valid PEtab IDs as they contain a "." character. Similarly, this table may be used to reference specific species in a BNGL model, which may contain many unsupported characters such as ",", "(" or ".". However, please note that IDs must exactly match the species names in the BNGL-generated network file; no pattern matching will be performed.
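
The mapping-table rules above can be sketched as a small check in plain Python. This is a minimal sketch under stated assumptions, not libpetab-python code: `check_mapping_table`, `ids_defined_elsewhere`, and the sample rows are hypothetical names, and the ID pattern assumes that PEtab IDs consist of letters, digits and underscores and do not start with a digit.

```python
import csv
import io
import re

# Assumed PEtab ID convention: letters, digits, underscores; no leading digit.
PETAB_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_mapping_table(tsv_text, ids_defined_elsewhere=()):
    """Return {petabEntityId: modelEntityId} or raise ValueError.

    `ids_defined_elsewhere` is a hypothetical argument holding the IDs already
    defined in other PEtab tables (parameters, observables, ...).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = {"petabEntityId", "modelEntityId"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing mandatory column(s): {sorted(missing)}")

    mapping = {}
    for row in reader:
        petab_id, model_id = row["petabEntityId"], row["modelEntityId"]
        if not PETAB_ID.fullmatch(petab_id):
            raise ValueError(f"{petab_id!r} is not a valid PEtab ID")
        if petab_id in mapping or petab_id in ids_defined_elsewhere:
            raise ValueError(f"{petab_id!r} is already defined")
        mapping[petab_id] = model_id

    # One reading of the rule that a modelEntityId may not refer to a
    # petabEntityId: reject targets that are themselves PEtab entity IDs.
    known_petab_ids = set(mapping) | set(ids_defined_elsewhere)
    for petab_id, model_id in mapping.items():
        if model_id in known_petab_ids:
            raise ValueError(f"{petab_id!r} maps to PEtab ID {model_id!r}")
    return mapping


# Sample rows: an SBML local parameter and a BNGL species name, both of which
# contain characters that the assumed ID pattern rejects.
example = (
    "petabEntityId\tmodelEntityId\n"
    "k_local\treaction1.k\n"
    "A_free\tA(b~c)\n"
)
mapping = check_mapping_table(example, ids_defined_elsewhere={"k_global"})
print(mapping["k_local"])
```

Passing the IDs from the other PEtab tables as `ids_defined_elsewhere` lets the same function reject a mapping-table ID that collides with, for example, a parameter ID. Model entity IDs are compared as exact strings, in line with the note that no pattern matching is performed.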

Implications

  • Tools need to check the model format and provide an informative message if the given format cannot be handled
  • Validators will skip model-dependent validation when encountering unknown model types - ideally there would be some plugin mechanisms to provide validation
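
One way a tool might implement the first point is to check every `language` entry before loading anything. The sketch below assumes a hypothetical tool that only reads SBML; `SUPPORTED_LANGUAGES`, `select_models`, and the error wording are illustrative names, and the `problem` dictionary mirrors one entry of `problems` from the YAML example in this proposal.

```python
# Hypothetical: the languages this particular tool can load.
SUPPORTED_LANGUAGES = {"sbml"}
# Initial set of model format identifiers listed in the proposal.
KNOWN_LANGUAGES = {"sbml", "cellml", "bngl", "pysb"}


def select_models(problem):
    """Return {model_id: location} for one `problems` entry, or raise."""
    selected = {}
    for model_id, spec in problem["models"].items():
        language = spec["language"]
        if language not in SUPPORTED_LANGUAGES:
            hint = (
                ""
                if language in KNOWN_LANGUAGES
                else " (not among the initial format identifiers)"
            )
            raise NotImplementedError(
                f"Model {model_id!r} uses language {language!r}{hint}, "
                f"which this tool cannot handle. "
                f"Supported: {sorted(SUPPORTED_LANGUAGES)}"
            )
        selected[model_id] = spec["location"]
    return selected


# Parsed form of the `models` section from the "After" YAML example.
problem = {
    "models": {
        "id_for_model1": {"location": "model1.xml", "language": "sbml"},
    },
}
print(select_models(problem))
```

Failing before any model file is read keeps the message specific to the model ID and language, rather than surfacing later as a parser error.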

Co-authored by @FFroehlich @fbergmann. Also thanks to everybody participating in these discussions during the last COMBINE meeting.

@paulflang
Contributor

no pattern matching will be performed

So just that I understand this correctly, if I want to do pattern matching, I can still create a BNGL observable in the BNGL model like so

begin molecule types
    A(b~c~d)
end molecule types
begin observables
    Species myObservable A()
end observables
...

and use its ID myObservable in PEtab 2.0.0, for instance in the observable table (even without the mapping table, unless for whatever reason, I've decided to use an ID for the BNGL observable that contains characters that are not allowed in PEtab), but not in the condition table (as I would not know how that makes sense, also not if I map it to another petabEntityId in the mapping table, of course).

Anyway, awesome that PEtab will soon allow other model specifications. Thanks and congratulations!

@FFroehlich
Collaborator

no pattern matching will be performed

So just that I understand this correctly, if I want to do pattern matching, I can still create a BNGL observable in the BNGL model like so

begin molecule types
    A(b~c~d)
end molecule types
begin observables
    Species myObservable A()
end observables
...

and use its ID myObservable in PEtab 2.0.0, for instance in the observable table (even without the mapping table, unless for whatever reason, I've decided to use an ID for the BNGL observable that contains characters that are not allowed in PEtab), but not in the condition table (as I would not know how that makes sense, also not if I map it to another petabEntityId in the mapping table, of course).

Yes, that is correct. In principle, you could also assign the value of the observable in the condition table, but there probably are only a few circumstances where this would make sense.

Anyway, awesome that PEtab will soon allow other model specifications. Thanks and congratulations!

Thank you!

Member

@dilpath dilpath left a comment

Just some suggestions, nothing major.

  • "SBML" occurs a few times (e.g. in "doc/documentation_data_format.rst").
  • The order of tables in "doc/documentation_data_format.rst" used to be alphabetical.

Collaborator

@FFroehlich FFroehlich left a comment

👍

dweindl added a commit to PEtab-dev/libpetab-python that referenced this pull request Jun 22, 2022
Add abstraction for models. This helps to keep libsbml code closely together and to potentially accommodate non-SBML models in the future (PEtab-dev/PEtab#538).

Wraps `libsbml.Model` using `petab.models.Model` and replaces the respective function arguments. In what I consider the most relevant function for downstream use, the `sbml_model` argument is kept, but deprecated.


Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com>
@FFroehlich FFroehlich marked this pull request as ready for review January 31, 2023 13:37
dweindl added a commit to PEtab-dev/petab_test_suite that referenced this pull request Feb 28, 2023
including test cases for multilanguage proposal PEtab-dev/PEtab#538
@dweindl dweindl linked an issue Jun 27, 2023 that may be closed by this pull request
@dweindl dweindl linked an issue May 23, 2024 that may be closed by this pull request
@dweindl dweindl linked an issue Jul 3, 2024 that may be closed by this pull request
@dweindl dweindl mentioned this pull request Jul 3, 2024
@dweindl dweindl merged commit 5c66e3d into release/2.0.0 Jul 3, 2024
1 check passed
@dweindl dweindl deleted the multilanguage branch July 3, 2024 15:13
dweindl added a commit that referenced this pull request Jul 3, 2024
# Motivation

There are a number of formats for specifying models in systems biology, each with their specific strengths and weaknesses. PEtab version 1.0.0 only allows Systems Biology Markup Language (SBML) models. While SBML is supported by a large number of tools, there are good reasons to use other formats. For example, rule-based model formats (e.g., BioNetGenLanguage) permit more abstract and compact specification of models based on rules, which are generalisations of reactions. Therefore, and based on user request (#436), we propose to lift PEtab’s restriction to SBML models and allow arbitrary model formats.

# Proposed changes

* Changes to the PEtab YAML file:
  * Change `sbml_files` to `models`
  * `models` entries will be model IDs (following the existing conventions for PEtab IDs) mapping to:
    * `location`: path / URL to the model
    * `language`: model format
      Initial set of model format identifiers (to be extended as needed):
      * SBML: `sbml`
      * CellML: `cellml`
      * BNGL: `bngl`
      * PySB: `pysb`
  * An additional entry for mapping tables (see below) is added

  Example:

  **Before:**
  ```yaml
  format_version: 1
  parameter_file: parameters.tsv
  problems:
  - condition_files:
    - conditions.tsv
    measurement_files:
    - measurements.tsv
    observable_files:
    - observables.tsv
    sbml_files:
    - model1.xml
  ```

  **After:**
  ```yaml
  format_version: 2.0.0
  parameter_file: parameters.tsv
  problems:
  - condition_files:
    - conditions.tsv
    measurement_files:
    - measurements.tsv
    observable_files:
    - observables.tsv
    mapping_file: mappings.tsv # optional 
    models:
      id_for_model1:
        location: model1.xml
        language: sbml
  ```



* Changes to the format of existing tables/files:
  * Condition/Observable/Parameter Table
    All symbols that previously referenced the ID of SBML entities, such as parameter IDs or compartment IDs, now refer to (globally unique) named entities in the model, such as parameters, observables, expressions. For example, condition table columns may correspond to parameters, states, species of the referenced model. 
    For species, assignments in the condition table set the initial value at the beginning of the simulation for that condition, potentially replacing the initialization from preequilibration. For all other entities, values are statically replaced at all time points. For entities that assign values to other entities, such as SBML AssignmentRules, the value of the target of that rule is statically replaced at all time points.    
* Additional files
  * Mapping Table: 
    Mapping PEtab entity IDs to entity IDs in the model. This optional file may be used to reference model entities in PEtab files where the ID in the model would not be a valid identifier in PEtab (e.g., due to containing blanks, dots, or other special characters).
    The tsv file has two mandatory columns: `petabEntityId`, `modelEntityId`. Additional columns are allowed. modelEntityIds must be unique identifiers in the model. The mapping table must not map modelEntityIds to petabEntityIds that are also defined in any other part of the PEtab problem. modelEntityId may not refer to other petabEntityIds, including those defined in the mapping table. petabEntityIds defined in the mapping table may be referenced in condition, measurement, parameter and observable tables, but cannot be referenced in the model itself.
    For example, in SBML, local parameters may be referenced as `$reactionId.$localParameterId`, which are not valid PEtab IDs as they contain a `.` character. Similarly, this table may be used to reference specific species in a BNGL model which may contain many unsupported characters such as `,`, `(` or `.`. However, please note that IDs must exactly match the species names in the BNGL generated network file and no pattern matching will be performed.

# Implications

* Tools need to check the model format and provide an informative message if the given format cannot be handled
* Validators will skip model-dependent validation when encountering unknown model types - ideally there would be some plugin mechanisms to provide validation

--- 

Co-authored by @FFroehlich @fbergmann. Also thanks to everybody participating in these discussions during the last COMBINE meeting.

---------



Co-authored-by: FFroehlich <fabian@schaluck.com>
Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com>
Co-authored-by: Frank T. Bergmann <frank.thomas.bergmann@gmail.com>
Development

Successfully merging this pull request may close these issues.

Update PETab to be model format agnostic
Add support for non-SBML models
Handling of local parameters
5 participants