Add support for non-SBML models #436

Open
dweindl opened this issue May 10, 2020 · 6 comments · Fixed by #538

@dweindl
Member

dweindl commented May 10, 2020

It would be great if PEtab were usable with non-SBML models. Formats to consider include, e.g., CellML, BNGL, PySB, ... Personally, I am interested in PySB support.

This will be a significant implementation effort, but probably worth it. We need to abstract from the model implementation. In preparation, all uses of libsbml.Model should be replaced by a new abstract base class Model. Concrete implementations would be SbmlModel, PysbModel, ...
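A minimal sketch of what such an abstraction layer could look like. The method names (`get_parameter_ids`, `has_entity_with_id`), the `type_id` attribute, the `missing_entities` helper, and the toy `DictModel` class are assumptions made for illustration, not the library's actual API. A real `SbmlModel` or `PysbModel` would wrap a `libsbml.Model` or a `pysb.Model` instead of a dict.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Model(ABC):
    """Format-independent model interface (hypothetical method names)."""

    #: model format identifier, e.g. "sbml" or "pysb"
    type_id: str = ""

    @abstractmethod
    def get_parameter_ids(self) -> list[str]:
        """Return the IDs of all model parameters."""

    @abstractmethod
    def has_entity_with_id(self, entity_id: str) -> bool:
        """Return True if `entity_id` names an entity in the model."""


class DictModel(Model):
    """Toy stand-in for SbmlModel / PysbModel, backed by a plain dict."""

    type_id = "toy"

    def __init__(self, parameters: dict[str, float]):
        self._parameters = dict(parameters)

    def get_parameter_ids(self) -> list[str]:
        return list(self._parameters)

    def has_entity_with_id(self, entity_id: str) -> bool:
        return entity_id in self._parameters


def missing_entities(model: Model, referenced_ids: list[str]) -> list[str]:
    """Library code written against Model works for any concrete format."""
    return [i for i in referenced_ids if not model.has_entity_with_id(i)]
```

Code such as `missing_entities` only depends on the base class, so adding a new model format would mean adding one subclass rather than touching every call site.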

This does not imply any changes in file formats (yet). For the next format update, it would be good to already think about how to handle that e.g. in yaml files. Allow for sbml_model_file, xx_model_file, ... or add model_type?

Anybody interested in joining in for that?

@dweindl dweindl added the enhancement New feature or request label May 10, 2020
@dweindl dweindl added this to the Support for additional model formats [PySB] milestone Jul 23, 2020
@dweindl
Member Author

dweindl commented Jul 28, 2020

Some notes from discussion with @FFroehlich about enabling use of PySB models:

PEtab problem definition:

  • Tables can be used as is, with minor limitations:
    • PySB models don't have meaningful species IDs and species names are a bit of a pain to be used as identifiers in any formulas. Therefore, observables and complex noise models would not be directly specified in observableFormula and noiseFormula in the measurement table, but one would reference pysb.Expression IDs there.
    • For the same reason, setting initial concentrations in the conditions table is problematic. This would be handled by having parameterized pysb.Initials.
  • YAML file entry sbml_files needs to be generalized or an additional field needs to be added. (Would one want to allow combining models of different formats?)

PEtab library:

  • Add model abstraction layer
    • Create base class Model
    • Gather all SBML-specific code in class SbmlModel
    • Create class PysbModel

Misc:

  • Would be good to have a copy of the current PEtab test suite with PySB model instead of SBML model

@paulflang
Contributor

I had the problem that my model was specified in BNGL and I wanted to create a PEtab problem from it to test several optimizers. SBML export from BNGL worked well, but PEtab did not support many of the characters used in the SBML export, so I wrote a script to remove them.

From my perspective the main advantage of rule-based modelling languages is the higher level of abstraction in model formulation. I think it would be good if PEtab for rule-based modelling languages would support the same level of abstraction, i.e. use patterns rather than species (except that a pattern may uniquely specify a species). In particular, this would be the case for initial concentrations in the conditions table and observableFormula and noiseFormula in the observable table (probably the whole observable table could become optional for rule-based models, as they allow explicit observable specification anyway, which is in contrast to SBML as far as I know; a reasonable default could be chosen for the noiseFormula).

For optimisation toolboxes that only work with SBML, one could provide a petab.Problem.to_sbml method that relies on the export functionality of BNGL/PySB. Sure, some auto-converted SBML species names and observableFormulas would be ugly, but (a) I see no need to write the auto-converted PEtab problem to files (other than for debugging purposes) and (b) even if they are inspected by humans, I could not think of a more systematic and concise way to represent the combinatorial complexity arising from rule-based models (I guess the developers of rule-based languages have spent a lot of thought on that).

Like I said above, I would benefit from BNGL or PySB support myself and would be happy to contribute as far as my abilities allow.

@dweindl
Member Author

dweindl commented Feb 8, 2021

Thanks for sharing your views @paulflang.

Agreed, that it would be convenient to allow for patterns in observable/noise formulas and species in the condition table. Might be a bit of a pain to parse, but should be feasible.

> For optimisation toolboxes that only work with SBML, one could provide a petab.Problem.to_sbml method that relies on the export functionality of BNGL/PySB.

Sounds useful.

@alubbock

alubbock commented Mar 4, 2021

Hey, I'm one of the PySB developers, this sounds like something that would be useful. Happy to support this from the PySB side, if anything needs adding there.

Re: Species IDs, you're correct @dweindl that species aren't canonically named in PySB, since they're generated at runtime via network generation. However, observables are named in a PySB model, so one option would be to refer to observables directly. Likewise for initial conditions, which typically refer to a named parameter (or a constant expression, which are also named).

@dweindl
Member Author

dweindl commented Mar 4, 2021

Hi @alubbock, great to hear that. I had some further discussion with @FFroehlich and I think we have a rough plan. We'll post an update here soon. It would be mostly what you suggested, i.e., having initials and observables defined in the PySB model and only referencing those in the other PEtab files.

@dweindl dweindl modified the milestones: Support for additional model formats [PySB], file format version 2 Nov 16, 2021
@dweindl
Member Author

dweindl commented Mar 18, 2022

It took significantly longer than expected, but we finally posted a proposal to allow for different model formats in PEtab -> #538

@dweindl dweindl linked a pull request Jun 27, 2023 that will close this issue
dweindl added a commit that referenced this issue Jul 3, 2024
# Motivation

There are a number of formats for specifying models in systems biology, each with its specific strengths and weaknesses. PEtab version 1.0.0 only allows Systems Biology Markup Language (SBML) models. While SBML is supported by a large number of tools, there are good reasons to use other formats. For example, rule-based model formats (e.g., BioNetGen Language, BNGL) permit more abstract and compact specification of models based on rules, which are generalisations of reactions. Therefore, and based on user request (#436), we propose to lift PEtab’s restriction to SBML models and allow arbitrary model formats.

# Proposed changes

* Changes to the PEtab YAML file:
  * Change `sbml_files` to `models`
  * `models` entries will be model IDs (following the existing conventions for PEtab IDs) mapping to:
    * `location`: path / URL to the model
    * `language`: model format
      Initial set of model format identifiers (to be extended as needed):
      * SBML: `sbml`
      * CellML: `cellml`
      * BNGL: `bngl`
      * PySB: `pysb`
  * An additional entry for mapping tables (see below) is added

  Example:

  **Before:**
  ```yaml
  format_version: 1
  parameter_file: parameters.tsv
  problems:
  - condition_files:
    - conditions.tsv
    measurement_files:
    - measurements.tsv
    observable_files:
    - observables.tsv
    sbml_files:
    - model1.xml
  ```

  **After:**
  ```yaml
  format_version: 2.0.0
  parameter_file: parameters.tsv
  problems:
  - condition_files:
    - conditions.tsv
    measurement_files:
    - measurements.tsv
    observable_files:
    - observables.tsv
    mapping_file: mappings.tsv # optional 
    models:
      id_for_model1:
        location: model1.xml
        language: sbml
  ```



* Changes to the format of existing tables/files:
  * Condition/Observable/Parameter Table
    All symbols that previously referenced the ID of SBML entities, such as parameter IDs or compartment IDs, now refer to (globally unique) named entities in the model, such as parameters, observables, or expressions. For example, condition table columns may correspond to parameters, states, or species of the referenced model.
    For species, assignments in the condition table set the initial value at the beginning of the simulation for that condition, potentially replacing the initialization from preequilibration. For all other entities, values are statically replaced at all time points. For entities that assign values to other entities, such as SBML AssignmentRules, the value of the target of that rule is statically replaced at all time points.
* Additional files
  * Mapping Table: 
    Mapping PEtab entity IDs to entity IDs in the model. This optional file may be used to reference model entities in PEtab files where the ID in the model would not be a valid identifier in PEtab (e.g., due to containing blanks, dots, or other special characters).
    The tsv file has two mandatory columns: `petabEntityId`, `modelEntityId`. Additional columns are allowed. modelEntityIds must be unique identifiers in the model. The mapping table must not map modelEntityIds to petabEntityIds that are also defined in any other part of the PEtab problem. modelEntityId may not refer to other petabEntityIds, including those defined in the mapping table. petabEntityIds defined in the mapping table may be referenced in condition, measurement, parameter and observable tables, but cannot be referenced in the model itself.
    For example, in SBML, local parameters may be referenced as `$reactionId.$localParameterId`, which are not valid PEtab IDs as they contain a `.` character. Similarly, this table may be used to reference specific species in a BNGL model, whose names may contain many unsupported characters such as `,`, `(` or `.`. However, please note that IDs must exactly match the species names in the BNGL generated network file and no pattern matching will be performed.
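The mapping table described above could be generated along these lines. This is only a sketch: the function names, the `mapped_` prefix, and the regular expression for valid PEtab IDs (letters, digits, and underscores, not starting with a digit) are assumptions for illustration. It only covers the uniqueness of `modelEntityId` and clashes with the given model IDs, not clashes with IDs defined elsewhere in a PEtab problem.

```python
import csv
import io
import re

# Assumed rule for valid PEtab IDs: letters, digits, underscores, no leading digit.
_VALID_PETAB_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_mapping_rows(model_entity_ids, prefix="mapped"):
    """Create mapping rows for model IDs that are not valid PEtab IDs."""
    model_ids = list(model_entity_ids)
    if len(set(model_ids)) != len(model_ids):
        raise ValueError("modelEntityIds must be unique")
    taken = set(model_ids)  # generated IDs must not clash with model IDs
    rows = []
    counter = 0
    for model_id in model_ids:
        if _VALID_PETAB_ID.fullmatch(model_id):
            continue  # already usable in PEtab tables, no mapping needed
        counter += 1
        while f"{prefix}_{counter}" in taken:
            counter += 1
        petab_id = f"{prefix}_{counter}"
        taken.add(petab_id)
        rows.append({"petabEntityId": petab_id, "modelEntityId": model_id})
    return rows


def mapping_rows_to_tsv(rows):
    """Serialise mapping rows with the two mandatory columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["petabEntityId", "modelEntityId"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For instance, `build_mapping_rows(["k1", "A(b~P,c!1).B(a!1)", "R1.kcat"])` leaves `k1` unmapped and assigns generated IDs to the BNGL species name and the SBML local parameter. The model-side strings are copied verbatim, in line with the exact-match requirement.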

# Implications

* Tools need to check the model format and provide an informative message if the given format cannot be handled
* Validators will skip model-dependent validation when encountering unknown model types - ideally there would be some plugin mechanisms to provide validation
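Both implications can be sketched as follows, operating on the `models` mapping from the YAML file after it has been loaded into a plain dict. The exception class, the function names, and the validator-registry design are hypothetical and only illustrate one possible plugin mechanism.

```python
class UnsupportedModelFormatError(ValueError):
    """Raised when a tool cannot handle a model's declared format."""


def check_model_formats(models, supported):
    """Fail early with an informative message for unsupported formats."""
    for model_id, entry in models.items():
        language = entry.get("language")
        if language not in supported:
            raise UnsupportedModelFormatError(
                f"Model '{model_id}' ({entry.get('location')}) uses model "
                f"format '{language}', which this tool cannot handle. "
                f"Supported formats: {', '.join(sorted(supported))}."
            )


def validate_models(models, validators):
    """Run format-specific validators; return IDs of models that were skipped.

    `validators` maps a format identifier to a callable taking
    (model_id, entry). Formats without a registered validator are skipped.
    """
    skipped = []
    for model_id, entry in models.items():
        validator = validators.get(entry.get("language"))
        if validator is None:
            skipped.append(model_id)
        else:
            validator(model_id, entry)
    return skipped
```

A simulation tool would call `check_model_formats` before loading anything, whereas a generic validator would call `validate_models` and report the skipped model IDs so that users know which checks did not run.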

--- 

Co-authored by @FFroehlich @fbergmann. Also thanks to everybody participating in these discussions during the last COMBINE meeting.

---------



Co-authored-by: FFroehlich <fabian@schaluck.com>
Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com>
Co-authored-by: Frank T. Bergmann <frank.thomas.bergmann@gmail.com>