
Handling of parameter estimation problems with multiple sbml models #49

Closed
elbaraim opened this issue Feb 8, 2019 · 18 comments
Labels: question (Further information is requested), specification change

Comments

@elbaraim
Collaborator

elbaraim commented Feb 8, 2019

In this model there are two model files and, therefore, two measurement data files and two experimental condition files, but only one parameter file. The model files are named:

  • model_Becker_Science2010__BaF3_Exp.xml
  • model_Becker_Science2010__binding.xml

At the moment, this throws an error in PEtab (as it does not follow the standard naming we established).

How should we circumvent this problem?

  • Should we allow for finding a list of model names in the model folder and then check the respective pairs?
  • Should we re-define the naming convention for such exception cases?
@dweindl
Member

dweindl commented Feb 10, 2019

Yeah, the case of multiple model files has not been discussed in detail yet. I think it is better to have custom and potentially more descriptive names than naming things model_$name_1, model_$name_2. Also, petablint.py currently cannot handle the case of multiple models properly, even if the names are correct. One could add a JSON / YAML file to specify what belongs to what. To me it is also not clear whether in such cases it is mandatory to have only one single parameter file.

@dweindl dweindl added the question (Further information is requested) label Feb 10, 2019
@dweindl dweindl changed the title from "How to handle exception? Example: Becker_Science2010 model" to "Handling of parameter estimation problems with multiple sbml models" Feb 21, 2019
@dweindl
Member

dweindl commented Feb 21, 2019

Some notes from today's discussion:

  • For multi-model problems, there should be exactly one condition table and one measurement table per model, but only a single parameter table

  • The parameter namespace is global: parameters of the same name in different models always have the same value

TODO:

  • Update documentation
  • Proposed naming scheme was...?
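The global parameter namespace rule above could be sketched as follows; note that all parameter names, values, and model names here are hypothetical, for illustration only:

```python
# Sketch of the "global parameter namespace" rule: a parameter that appears
# in several models resolves to one value from the single parameter table.
# All names and values below are made up for illustration.
parameter_table = {"kon": 0.1, "koff": 0.01, "scaling_BaF3": 1.5}

model_parameters = {
    "BaF3_Exp": ["kon", "koff", "scaling_BaF3"],
    "binding": ["kon", "koff"],
}

def resolve(model):
    """Map each parameter of a model to its globally defined value."""
    return {p: parameter_table[p] for p in model_parameters[model]}

# "kon" occurs in both models, so it must resolve to the same value:
assert resolve("BaF3_Exp")["kon"] == resolve("binding")["kon"]
```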

@paulstapor
Contributor

I'm currently going over the PEtab documentation and stumbled across this issue.

We either need a standardized naming scheme or, maybe more error-proof, an additional table which specifies the SBML models and the corresponding files...

In any case: this issue is important (but not urgent), as fixing it would make it possible to translate between PEtab and SED-ML in more cases. So solving this issue may be quite interesting for the SED-ML community...

@yannikschaelte
Member

I would prefer a standard naming scheme. I could imagine that the following should work:

  • All files must follow the same scheme, measurementData_{BLABLA} and experimentalCondition_{BLABLA}, with a common prefix and matching postfixes {BLABLA1}, {BLABLA2}, ...; or
  • if, e.g., the measurementData file is only required once, then there may only be one measurementData_... file in the whole folder.

Alternatively, a table with the different setups as columns and the files as rows would of course also work, just maybe creating a little overhead.
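The postfix-based pairing could be sketched like this; the file names, extensions, and helper are hypothetical, only the measurementData_/experimentalCondition_ prefixes come from the scheme above:

```python
import re

# Hypothetical sketch: pair measurement and condition files that share a
# common postfix, as in the proposed measurementData_{BLABLA} scheme.
# File names and the .tsv extension are assumptions for illustration.
PATTERN = re.compile(r"^(measurementData|experimentalCondition)_(.+)\.tsv$")

def pair_files(filenames):
    """Group files by postfix; return {postfix: {file_kind: filename}}."""
    pairs = {}
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            kind, postfix = match.groups()
            pairs.setdefault(postfix, {})[kind] = name
    return pairs

files = [
    "measurementData_BaF3_Exp.tsv",
    "experimentalCondition_BaF3_Exp.tsv",
    "measurementData_binding.tsv",
    "experimentalCondition_binding.tsv",
]
print(pair_files(files))
```

A linter could then flag any postfix whose group is missing one of the two file kinds.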

@dweindl
Member

dweindl commented Dec 3, 2019

I'd like to revive this discussion on how to handle multiple models and other more complex setups.

I am not too happy with the current or any other naming convention. There may be good reasons why somebody may want to name their files differently.

I would like a YAML file such as:

petab_version: 0.1

problems:
  - measurements:
    - dataset1_1.csv
    - dataset1_2.csv
    conditions: cond1.csv
    model: m.xml
    parameters: p.csv
    visualization: vis.csv
  - measurements:
    - dataset2_1.csv
    conditions: cond2.csv
    model: m2.xml
    parameters: p.csv

Because:

  • This gives people freedom to choose their naming scheme
  • It is easy to include the same model/data file in different parameter estimation problems without replicating the file
  • We can allow to combine multiple measurement files (which one might want to keep separately to have them easily regrouped or edited)
  • It provides a natural place to include a file format version number (something we didn't address anywhere, but I think it is mandatory, as the format is sure to evolve (or die))
  • It can also easily be adapted to anything more complex in the future if necessary
  • It is easy to validate
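A minimal validation sketch; the dict below is what yaml.safe_load would return for the YAML above (shown pre-parsed to keep the example dependency-free), and the required/optional key sets are only an assumption of this sketch, not a settled schema:

```python
# Hypothetical validation sketch for the proposed problem-description file.
# Key names follow the YAML example above; which keys are required vs.
# optional is an assumption for illustration.
REQUIRED = {"measurements", "conditions", "model", "parameters"}
OPTIONAL = {"visualization"}

def validate(config):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if "petab_version" not in config:
        errors.append("missing petab_version")
    for i, problem in enumerate(config.get("problems", [])):
        missing = REQUIRED - problem.keys()
        unknown = problem.keys() - REQUIRED - OPTIONAL
        if missing:
            errors.append(f"problem {i}: missing keys {sorted(missing)}")
        if unknown:
            errors.append(f"problem {i}: unknown keys {sorted(unknown)}")
    return errors

# The structure yaml.safe_load would produce for the YAML sketch above:
config = {
    "petab_version": 0.1,
    "problems": [
        {"measurements": ["dataset1_1.csv", "dataset1_2.csv"],
         "conditions": "cond1.csv", "model": "m.xml",
         "parameters": "p.csv", "visualization": "vis.csv"},
        {"measurements": ["dataset2_1.csv"], "conditions": "cond2.csv",
         "model": "m2.xml", "parameters": "p.csv"},
    ],
}
print(validate(config))
```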

@yannikschaelte
Member

yannikschaelte commented Dec 3, 2019

Fully agree. I don't like making it mandatory, but as you describe it, it does seem quite helpful; thus it might be good to create it for all models, and to require it for all models.

We also wanted to have an (optimizer/simulator) settings yml. Should that be in the same file, or in another one (with a link from this one)? Both solutions would seem sensible.

@JanHasenauer
Contributor

I like the suggestion. I'm only wondering about the separate parameter files. This might require complex handling of conflicts. Then we would have to set up detailed rules.

Should we directly allow for the inclusion of validation data, e.g. as subfield "prediction"?

@dweindl
Member

dweindl commented Dec 3, 2019

I'm only wondering about the separate parameter files. This might require complex handling of conflicts. Then we would have to set up detailed rules.

Yeah, that may need some more elaboration; it was a quick sketch. To be decided whether there should be only one global file, or whether we allow plugging together multiple ones, in which case one could consider having an option for a global or local parameter namespace. Not important for me at this point, but it would be possible to adapt in the future.

Should we directly allow for the inclusion of validation data, e.g. as subfield "prediction"?

Yeah, also talked about that with @LeonardSchmiester today. Would be straightforward to do at least.

@paulstapor
Contributor

I like the idea pretty much. YAML has the very strong advantage of being easily human-readable. Moreover, such a file allows specifying all those things for which we did not have any place so far. I especially like the idea with the version number.

At least for the moment, I would also put things like simulator/optimizer settings in the same file. At the moment, I would not expect this to blow up the YAML file so much that separation makes sense.

This whole idea has the strong benefit of allowing the format to become more similar to SED-ML concerning solvers/optimizers... which is pretty nice. This allows for much less information loss when converting between the two formats...

@dweindl
Member

dweindl commented Dec 3, 2019

We also wanted to have an (optimizer/simulator) settings yml. Should that be in the same file, or in another one (with a link from this one)? Both solutions would seem sensible.

I think that this will be too tool-specific to have a generic interpretation. If we add it here, I would clearly mark it as tool-specific and not put it there as a general optimizer_options or simulator_options but more as something like app_specific_settings['amici']['sensitivity_method'] = 'adjoint'. Maybe it's better to keep it separate right away.
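As a sketch, such tool-namespaced settings could look like the following; only the amici / sensitivity_method / adjoint entry comes from the comment above, all other keys and values are made up for illustration:

```python
# Hypothetical sketch: tool-specific settings namespaced under the tool's
# name, rather than generic optimizer_options / simulator_options.
# Only the amici entry is from the discussion; the rest is made up.
app_specific_settings = {
    "amici": {"sensitivity_method": "adjoint"},
    "pyPESTO": {"n_starts": 100},
}

# Each tool reads only its own namespace and ignores the rest:
amici_settings = app_specific_settings.get("amici", {})
print(amici_settings["sensitivity_method"])
```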

@yannikschaelte
Member

We also wanted to have an (optimizer/simulator) settings yml. Should that be in the same file, or in another one (with a link from this one)? Both solutions would seem sensible.

I think that this will be too tool-specific to have a generic interpretation. If we add it here, I would clearly mark it as tool-specific and not put it there as a general optimizer_options or simulator_options but more as something like app_specific_settings['amici']['sensitivity_method'] = 'adjoint'. Maybe it's better to keep it separate right away.

Will it be tool-specific? Things like optimization options (number of multistarts, number of iterations, tolerances, ...) and simulation options (tolerances, solver, ...) are pretty generic. In the import to each tool, simply the subset of options supported could be selected. Maybe we should make up a list of options planned to be supported.

@paulstapor
Contributor

Classic discussion without an ideal solution...
But I agree to some degree with Yannik: one could specify an optimizer by saying "gauss-newton", and each tool would map that accordingly: "ls-trf" in pyPESTO, "lsqnonlin" in D2D...

@dweindl
Member

dweindl commented Dec 3, 2019

tolerances

Not sure every tool uses the same measure

solver

Unlikely to have a substantial overlap across different tools

Convergence criteria would be nice to have, but they vary so greatly across optimizers.

In the import to each tool, simply the subset of options supported could be selected

Not sure I like that too much.

Maybe we should make up a list of options planned to be supported.

Good idea. Would collect it in ICB-DCM/pyPESTO#117 though.

@dweindl dweindl pinned this issue Dec 3, 2019
@dweindl
Member

dweindl commented Dec 4, 2019

PS: If adding anything like simulation or optimization options there, it would ideally be based on KISAO

@LeonardSchmiester
Collaborator

Would agree with Daniel to keep it separate. For simulation we could use KISAO, but for optimization algorithms nothing like this exists, right?! I have the feeling this could open a lot of potential issues if we want to properly define everything in the yaml file.

@dweindl
Member

dweindl commented Dec 4, 2019

KISAO. But for optimization algorithms nothing like this exists right?!

In terms of algorithms there is something below http://bioportal.bioontology.org/ontologies/KISAO/?p=classes&conceptid=http%3A%2F%2Fwww.biomodels.net%2Fkisao%2FKISAO%23KISAO_0000470. For optimizer settings it looks rather poor though.

@LeonardSchmiester
Collaborator

This is not enough to get a unique mapping to the actual algorithm used, right? There could be several implementations of the same algorithm even within one tool, etc.

I think in the end you would still need a tool-specific file to really run the optimization. Then I would suggest we directly leave these things out of PEtab.

@dweindl dweindl self-assigned this Dec 15, 2019
@dweindl dweindl unpinned this issue Dec 16, 2019
@dweindl
Member

dweindl commented Dec 17, 2019

I consider this closed. Having multiple models is addressed by #183. Further extensions are covered by #185 and #188. To follow up on the discussion of simulator/optimizer options, please create a new issue.

@dweindl dweindl closed this as completed Dec 17, 2019