Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SBML.jl and SciML integration #31

Closed
anandijain opened this issue Mar 31, 2021 · 17 comments
Closed

SBML.jl and SciML integration #31

anandijain opened this issue Mar 31, 2021 · 17 comments

Comments

@anandijain
Copy link
Collaborator

Hello, I've been working with @paulflang on building an SBML parser that would convert SBML ODEs into ModelingToolkit.jl form, to be used with the SciML ecosystem.

Here is the repo https://github.com/anandijain/SBML.jl/
It would not make sense for us to double up work.

If any maintainers are on slack, we have a chat for building biological model importers (SBML, CellML, bionetgen) into SciML.
It'd be good to see how we can help y'all and still get tight integration with SciML.

@paulflang
Copy link
Collaborator

Thanks @anandijain and hello also from my side!

@laurentheirendt
Copy link
Member

Hello @anandijain and @paulflang 😄

Good to see your efforts! The main reason behind using SBML_jll and libSBML is that it is a well-established library and it was the most straightforward way to interface to that library from Julia without having to resort to writing a parser from scratch.

For our use case, @exaexa and I thought it would not make too much sense to implement a full Julia parser from scratch - of course, this is always welcome, but I doubt we would see any major benefits performance wise.

What I could imagine is to actually support both, a native SBML.jl package and a libSBML.jl package (that would be this one). We should also have a chat with @skeating or @fbergmann from the SBML team to get their input 👍🏼

@exaexa
Copy link
Collaborator

exaexa commented Apr 1, 2021

I originally thought that writing the XML parser from below would be viable too, but given all the complexity in SBML specification this was not the way to go. Technically, it would be best to have the SBML schema published in a machine-accessible format and automatically generate the XML parser from it -- there have been certainly some efforts on writing the schema, but I'm not at all sure how that's going.

@paulflang
Copy link
Collaborator

Great suggestion to hear Sarah Keating's and Frank Bergmann’s thoughts. Would also be interesting to hear were we stand with the schema.

Since you are more interested in FBA and we are more interested in ODEs and SSA, what we do with the SBML model once it is imported into Julia is different. But I see you have a Model type that is somewhat similar to our SbmlModel type. I could imagine that defining a common type would make sense (no matter if we use Julia native parsers or not, and no matter what we do downstream with the model). But I am not familiar with how FBA models are specified in SBML. So I will have to read up on that, as well as having a look at your code.

@exaexa
Copy link
Collaborator

exaexa commented Apr 1, 2021

But I see you have a Model type that is somewhat similar to our SbmlModel type. I could imagine that defining a common type would make sense (no matter if we use Julia native parsers or not, and no matter what we do downstream with the model).

If you can list "features" extracted from SBML that you're missing, we'll be happy to add them. Refer to libSBML manual for extractable items, ideally this API guide: http://sbml.org/Software/libSBML/docs/c-api/annotated.html

But I am not familiar with how FBA models are specified in SBML.

Well not really, SBML is a descriptive rather than computation-oriented format. Despite of that it's pretty easy to get the information relevant to FBA, see e.g. here: https://lcsb-biocore.github.io/SBML.jl/stable/#SBML.getS-Tuple{Model}

@anandijain
Copy link
Collaborator Author

anandijain commented Apr 1, 2021

Foreword: I am a noobie, so any corrections are appreciated. This may provide a bit of context regarding why I am interested in integrating with SciML:

Here's a more complete example of the API we are aiming for: CellMLToolkit.jl. The main functionality is lowering an XML document into the ModelingToolkit symbolic form (ODE, DAE, SDE, PDE, etc). We do this using a native Julia MathML parser. I'm not sure how familiar you all are with the SciML ecosystem, but an overview of MTK is found here.

The value is being able to do a host of analyses, parameter estimation, global sensitivity analysis, etc. These tools are particularly nice in SciML because the DE Solvers are differentiable, allowing for productive use of multiple dispatch. See @ChrisRackauckas blog post on other extensions of differentiable programming in Julia here. I am also interested in hierarchical SBML models and model composition, which is supported by MTK's acausal component modeling system.

I'm not that familiar with FBA, but from my skimming this primer article, it seems largely based on constraint programming. There are plans to eventually support inequality constraints on MTK parameters, but we do not have this yet. FYI.

In terms of constructive ways that I can contribute to your efforts, I could help in testing infrastructure to start with. In CellMLToolkit, we have been tracking how much coverage we get on the entire CellML model repository. issue tracking progress. We have been building similar tooling for testing our SBML package on all of the SBML models on Auckland's Biomodels page and the SBML test-suite.

I've written some scripts to download all of the BIOMD SBML files here. I think it might be useful for you all to also track how much coverage you are getting over these databases.

If this is something you're interested in I'm happy to make a PR. I'm thinking of essentially testing loadmodels.jl on all 2216 BIOMD models found here

and yes, @exaexa I agree, so far my brief experiences with SBML are demonstrating that the schema is quite ad-hoc or requires meticulous parsing. CellML seems to have more constraints when it comes to defining a model, almost always just defines an ODE in MathML.

@exaexa
Copy link
Collaborator

exaexa commented Apr 1, 2021

@anandijain thanks for explanation! In separate points:

  • we're more interested in the linear programming and constraint-ish part now, so we're not extracting any MathML (yet). If you can show a document with MathML contained + some rough illustration how the result shold look like, we will be (eventually) able to implement it, and will happily merge/improve PRs :] Same applies for multiple/hierarchical model loading; we'd likely need a slight API change for that.
  • "FBA" is basically just finding if there's an optimal feasible solution for the constrained linear system.
  • testing: if you can add a few different repository sources of test files for the SBML.jl test suite, please do, a bit of diversity is very welcome :] The testing is quite straightforward now, there's just a list of files + some expected parameters in test/loadmodels.jl, feel free to extend. For more in-depth testing of files, start by copying of test/ecoli_flux.jl and modifying to whatever needs to be tested. (I'm not sure that pulling all 2k BIOMD models into the testing pipeline is feasible, a random choice of 3 should suffice I guess.)

Re SBML, we already had a few discussions about the design rationale there. The XML schema design is mirroring a java-ish class structure, which has both advantages and disadvantages, and becomes slightly uncomfortable for non-OOP tools. It is, for sure, very useful for annotation/curation efforts. Even for programming, it's still much better than e.g. many other java-based RDF-data exchange tools :]

@anandijain
Copy link
Collaborator Author

anandijain commented Apr 1, 2021

If you can show a document with MathML contained + some rough illustration how the result shold look like

Biomodels lists this breakdown of the distribution of Modeling Approaches for their SBML models:

 Ordinary differential equation model (975)
 Constraint-based model (163)
 Logical model (20)
 Petri net (5)

So we are looking at parsing that first category mainly. Here is a model that contains MathML that we want to lower into MTK form: https://www.ebi.ac.uk/biomodels/BIOMD0000000796.

Edit: this is just a short response that I can follow up on later

@anandijain
Copy link
Collaborator Author

I've made a repo for downloading the BioModels SBML dataset (2216 models) here https://github.com/anandijain/SBMLBioModelsRepository.jl.

I'm not sure how much use this will be to you all. I will start doing tests using your package soon for a coverage tracker.

@skeating
Copy link

skeating commented Apr 3, 2021

Hi

This is probably a conversation on its own; but I'll add it here since it was first asked here.

There are schemas for SBML.

For Level 3 we use RNG details here: http://sbml.org/RNG_validation

For levels 1 & 2 there are xsd schemas with each specification.

Just to point (although I know I am biased !) that using libSBML as the parser adds much more validation that creating a parser direct from a schema. libSBML validation checks semantics as well as syntax.

Sarah

@anandijain
Copy link
Collaborator Author

Thanks,

I'm coming around to using libSBML as our parser. If we are to go this route though, it'd be great if someone could explain how we might get the kineticLaws out in order to lower to ModelingToolkit.ReactionSystem and define the differential equations. ReactionSystem docs

@fbergmann
Copy link

LibSBML does include a converter, that can take out the reactions and instead add the rateRule objects to the model. Maybe that would be helpful for you:

https://github.com/sbmlteam/libsbml/blob/development/examples/c%2B%2B/convertReactions.cpp

What it does, is basically to take the speed of the reaction (the formula in the kinetic law) and for substrates multiplies it with the negative of the stoichiometry and for products it will be multiplied with the stoichiometry and keep this in a map fir each reactant / product of all reactions. Then for each species in that map a rate rule (ODE) is created that sums all of those temporary results for each species up.

@paulflang
Copy link
Collaborator

Thank you very much for pointing out the advantages of using LibSBML, @skeating. We’ve used the Python version to convert SBML to ModelingToolkit in SbmlInterface and to convert PEtab optimisation problems to Julia JuMP in SBML2Julia and now wanted to move away from the Python dependency with anandijain/SBML.jl. Since this repository here nicely shows how this ca be done with the cpp version of LibSBML and @anandijain is fine with that, I think this is the way to go.

@fbergmann: The kineticLaw to rateRule conversion may indeed become useful for converting SBML directly to ModelingToolkit.ODESystems. At the moment however, we are converting SBML to ModelingToolkit.ReactionSystems (for later conversion to ODESystems with ModelingToolkit's convert() method), for which the kineticLaws are fine, with two exceptions:

  • We cannot use the string representation, but have to use the MathML representation. The MathML.jl package can convert these to the symbolic Variable and Parameter type object needed in ModelingToolkit.
  • ModelingToolkit.ReactionSystem interprets abundances as extensive quantities. This is fine when all SBML species have hasOnlySubstanceUnits=true, but not otherwise. Is there a LibSBML converter that sets all SBML species to hasOnlySubstanceUnits=true and modifies their initialConcentration/Amount and the kineticLaws accordingly?

@paulflang
Copy link
Collaborator

During today's meeting we've sorted out the first issue with kineticLaws. However the second point still remains. Does LibSBML have such an intensive to extensive unit converter?

@fbergmann
Copy link

fbergmann commented Apr 9, 2021 via email

@paulflang
Copy link
Collaborator

Thanks, @fbergmann ! But I thought if hOSU=false, it means that species have to be plugged in (e.g. kineticLaws) as concentrations rather than amounts. So a species that has initialAmount instead of initialConcentration would have to be divided by the compartment size before plugging it into a kineticLaw. Or what is wrong in this thinking?

Do you know of an example where poeple have created custom converters for C++ LibSBML. I do not know C++, but perhaps one of my colleagues does. Otherwise I would have to handle this further downstream in Julia.

@exaexa
Copy link
Collaborator

exaexa commented Jun 7, 2021

Closing this, the work is either done or will continue in #40.

@exaexa exaexa closed this as completed Jun 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants