Observables in SBML should not start with observable_ #201

Closed
matthiaskoenig opened this issue Dec 18, 2019 · 9 comments
matthiaskoenig commented Dec 18, 2019

Id overloading is a very bad idea in general and will break many things. In this case you require that ids which are to be compared to experimental data start with observable_. As a consequence, one has to rewrite the model specifically for the parameter fitting task, which will not work:

  • with every new experiment, new observables could be used, so the model has to be changed as soon as new data becomes available (?!)
  • existing models don't have ids starting with observable_, so PEtab does not work with existing models (?!)
    The only solution is to let every id start with observable_ when creating SBML models, but then you could just as well leave the prefix out right away.

Solution: just specify the ids which are observables for a given parameter fitting experiment; there is no need for an observable_ prefix anywhere.

Similar id overloading occurred in the fbc (flux balance constraints) package, and it basically broke all the tools and made models not exchangeable between them. I see the same happening here with PEtab, because I cannot run a simulation experiment on an existing SBML model. Never force a model to have tool-specific ids!

@JanHasenauer (Contributor)

We discussed this topic a lot, and I see your points.

I agree that "observable_" could be skipped; however, I don't think that this alone addresses your points.

  1. Even if you skip "observable_", we still have "sigma_". We have to be able to assign a specific noise level to an observable. Accordingly, models still need to be adapted.

  2. Removing "observable_" will not address the problem that models need to be extended if new data become available. The only alternative I see is to put the observable specification (e.g. the MathML expressions and corresponding IDs) in the data files. This is possible, but it would mean that the data has to be changed as soon as we exchange the model. That is also not something we would like to do.
    We asked ourselves what happens more frequently: changing the model or changing the dataset. In our experience, the former is more likely.

Am I missing something?

@dweindl (Member) commented Dec 18, 2019

Thanks for your feedback @matthiaskoenig.

> Id overloading is a very bad idea in general and will break many things.

Fully agreed. We discussed at some point specifying observables in a separate table.

Something like:

| observableID          | observableMath                                       | noiseMath       | observableTransformation |
|-----------------------|------------------------------------------------------|-----------------|--------------------------|
| relativeTotalProtein1 | observableParameter1 * (protein1 + phospho_protein1) | noiseParameter1 | lin                      |
| ...                   | ...                                                  | ...             | ...                      |

Would something like that address your concerns?

This would have the advantage of providing a clear separation of the dynamic model and the observation and error model.

A disadvantage could be that, when the model changes frequently, it may be tedious to keep things in sync. It depends a bit on the use case, as Jan mentioned. If we want to parameterize some existing model, it would be preferable to not modify the SBML file at all. During model development it may be more convenient to have everything inside the SBML. (On the other hand, it would be straightforward to convert a model of the current type to this more general format automatically; see the sketch below.)
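For illustration, a minimal sketch of such an automatic conversion, assuming libsbml and pandas are available; the function name is hypothetical, and noise formulas and error handling are ignored:

```python
import libsbml
import pandas as pd

def observable_rules_to_table(sbml_file: str) -> pd.DataFrame:
    """Turn observable_-prefixed AssignmentRules into observation-table rows."""
    model = libsbml.readSBML(sbml_file).getModel()
    rows = []
    for i in range(model.getNumRules()):
        rule = model.getRule(i)
        target = rule.getVariable()
        if rule.isAssignment() and target.startswith("observable_"):
            rows.append({
                "observableID": target[len("observable_"):],
                "observableMath": libsbml.formulaToL3String(rule.getMath()),
            })
    return pd.DataFrame(rows)
```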

Another disadvantage of just using an existing dynamic model and building the observation model around it outside the SBML file would be that we don't have annotations or units for the additional output parameters (unless we make the schema significantly more complex, which I would very much like to avoid).

@dweindl (Member) commented Jan 16, 2020

After some further discussions, here is an updated suggestion with pros and cons, and a request for comments:

Change: Get rid of the imposed AssignmentRule naming by providing a separate observation table like:

| observableId          | observableFormula                                    | observableTransformation | noiseFormula    | noiseDistribution |
|-----------------------|------------------------------------------------------|--------------------------|-----------------|-------------------|
| relativeTotalProtein1 | observableParameter1 * (protein1 + phospho_protein1) | lin                      | noiseParameter1 | normal            |
| ...                   | ...                                                  | ...                      | ...             | ...               |

Description:

  • observableId

    Any identifier which would be valid in SBML

  • observableFormula

    Observation function as plain text, parsable by sympy (is there anything close to a standard definition for that?). May contain any symbol defined in the SBML model or the parameter table; in the simplest case just a species or an AssignmentRule target. May introduce new parameters of the form observableParameter${n}, which are overridden by the observableParameters column in the measurement table (see the sketch after this list).

  • observableTransformation

    As previously specified in the measurement table. Reduces redundancy there.

  • noiseFormula

    Noise model as a plain-text formula (what was previously an SBML AssignmentRule).

  • noiseDistribution

    ['laplace'|'normal']. Replaces noiseDistribution column in the measurement table (reducing redundancy).
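
For illustration, a minimal sketch of parsing and evaluating an observableFormula cell with sympy, as suggested above; the species names come from the example row, and the numeric values are made up:

```python
import sympy as sp

# Parse the plain-text formula from the observableFormula column.
formula = sp.sympify("observableParameter1 * (protein1 + phospho_protein1)")

# Substitute the placeholder override from the measurement table and
# hypothetical simulated species values.
value = formula.subs({
    "observableParameter1": 2.5,
    "protein1": 0.8,
    "phospho_protein1": 0.2,
})
print(value)  # 2.5 * (0.8 + 0.2) = 2.5
```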


Pros/Cons compared to previous definition as AssignmentRules:

Cons

  • Change of format (more work)
  • Formulas in plain text are less flexible than MathML, and there is no nice display as e.g. in COPASI
  • Extra file to keep synced. Changes in models usually require changes here.
  • No unit definitions for observable parameters

Pros

  • Not forcing a naming schema upon users (which may be incompatible with existing models)
  • Enables re-use of existing models, e.g. from biomodels without modifications
  • Separation of dynamics and observations
  • One can still define observables as SBML AssignmentRules and just reference the rule target in observableFormula --> easy to convert existing models
  • Removes redundant entries in measurement table (observableTransformation and noiseDistribution, which had to be repeated for every single data point)

@matthiaskoenig @fbergmann @refisch What do you think?

@fbergmann (Contributor)

Since we still have some reserved names that are being used, I'd be in favor of leaving things as they are right now. The argument that a new model can be used 'as is' will only be true for some tools; others will have the additional work of transforming the model and introducing the observable transformations. This will be harder than renaming a couple of elements in the model is currently. Since both observableTransformation and noiseDistribution are optional columns in the measurement table, the reduced redundancy would also not hold in all cases.

dweindl pinned this issue Jan 17, 2020
@dweindl (Member) commented Jan 19, 2020

Thanks for your feedback @fbergmann.

> we still have some reserved names that are being used

You mean {observable,noise}Parameter${n}_${observableId}? Right, I am not too happy about them either: not because of potential name collisions, but mostly because they make it pretty hard to read the measurement table in case of multiple parameters. Literally every time, I wonder which parameter was parameter1 and which was parameter2. This could be addressed by explicitly stating the observable parameter to override in the {observable|noise}Parameters columns (e.g. offset:400;scaling:2.5). This way it would be more intuitive, and we could get rid of the remaining reserved / imposed names. The downside is that it doesn't increase readability when there is only one parameter (although then the prefix could be made optional). A parsing sketch follows below.
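
For illustration, a minimal sketch of parsing such a cell; the ':' and ';' separators follow the example above, and the string fallback for non-numeric overrides (e.g. parameter ids) is an assumption:

```python
def parse_overrides(cell: str) -> dict:
    """Parse e.g. 'offset:400;scaling:2.5' into a name -> value dict."""
    overrides = {}
    for token in cell.split(";"):
        name, _, value = token.partition(":")
        try:
            overrides[name.strip()] = float(value)
        except ValueError:
            # Assumption: non-numeric overrides (e.g. parameter ids) stay strings.
            overrides[name.strip()] = value.strip()
    return overrides

print(parse_overrides("offset:400;scaling:2.5"))
# {'offset': 400.0, 'scaling': 2.5}
```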

> The argument that a new model can be used 'as is' will only be true for some tools; others will have the additional work of transforming the model and introducing the observable transformations.

That's right. However, above, the "used as is" aspect was directed more at the modeller than at the simulation tool, and from a user perspective I don't care too much about the latter. For the tool it needs to be implemented once; the user would have to change every/many model(s).

> Since both observableTransformation and noiseDistribution are optional columns in the measurement table, the reduced redundancy would also not hold in all cases.

True.

Any other opinions?

@refisch commented Jan 22, 2020

Thanks for the necessary and fruitful discussion. We agree that id overloading should be avoided. In general, we see the following advantages in the approach of @dweindl:

  1. The conceptual benefit of keeping the data/experiment and the model separate

    From our perspective, the decision to keep data/experiment and model separate will enable a clear understanding of the model and deconvolute the format.

  2. SBML is a model format that does not aim to describe data

    The main reason for developing PEtab as a format was the fact that SBML does not aim to describe data but is aimed at describing models and pathways. From our perspective, it thus seems pertinent to leave information about observations and data out of the SBML file. More concretely, assignment rules are in general not meant for error and observable functions. Therefore, we would prefer to externalize the observation and error functions.

    Additionally, this makes the PEtab format more robust with respect to changes made in new SBML releases, which means less maintenance work.

Some additional comments:

> Changes in models usually require changes here.

Most frequently, if you change something in the model, there is no need to change the observation functions. Furthermore, if new data sets for the same observableIds are added, this file does not change.

> One can still define observables as SBML AssignmentRules and just reference the rule target in observableFormula --> easy to convert existing models

We would prefer not to give this option to the user. From our experience, multiple options for where to encode the same information can lead to confusion for both users and developers. Instead, we'd prefer to use only the new observables file.

(after discussions with @adrianhauber, @fgwieland, @JanineEgert)

@yannikschaelte (Member)
Thanks for your feedback @refisch. We would then tentatively accept the introduction of the observables file, since it should solve all of the mentioned problems and modularly separate the model and the experiment.

> We would prefer not to give this option to the user. From our experience, multiple options for where to encode the same information can lead to confusion for both users and developers. Instead, we'd prefer to use only the new observables file.

In practice, the user cannot really be prevented from specifying arbitrary ids as observable targets, including assignment rule targets. This may not be the nicest way in terms of separation of model and experiment, but on the other hand it e.g. enables using visual editors for defining formulas inside the SBML file. Thus, I suggest encouraging the clear separation and changing all benchmark models accordingly, but there is no way of explicitly forbidding the alternative (see the sketch below).
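
For illustration, a minimal libsbml sketch of such an AssignmentRule-based observable; the ids are made up, and observableFormula would then simply reference total_protein1:

```python
import libsbml

document = libsbml.SBMLDocument(3, 1)
model = document.createModel()

# Target parameter for the rule (must be non-constant).
parameter = model.createParameter()
parameter.setId("total_protein1")
parameter.setConstant(False)

# The observation function lives inside the SBML model, e.g. maintained
# via a visual editor; the observables file only references its target id.
rule = model.createAssignmentRule()
rule.setVariable("total_protein1")
rule.setMath(libsbml.parseL3Formula("protein1 + phospho_protein1"))
```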

(after discussion with @dweindl @LeonardSchmiester)

@yannikschaelte (Member)
It remains to decide on the additional naming conventions for observable and noise parameters. Suggestion of @dweindl:

> You mean {observable,noise}Parameter${n}_${observableId}? Right, I am not too happy about them either: not because of potential name collisions, but mostly because they make it pretty hard to read the measurement table in case of multiple parameters. Literally every time, I wonder which parameter was parameter1 and which was parameter2. This could be addressed by explicitly stating the observable parameter to override in the {observable|noise}Parameters columns (e.g. offset:400;scaling:2.5). This way it would be more intuitive, and we could get rid of the remaining reserved / imposed names. The downside is that it doesn't increase readability when there is only one parameter (although then the prefix could be made optional).

As an extension, one could also add columns observableParameters and noiseParameters to the observable file, with content such as scaling;offset. In the measurement file one would then not have to write scaling:2.5;offset:400, but could still use the previous short form 2.5;400, with the order specified by the column in the observable file. This is similar to named arguments in Python. In that approach, one would thus have the possibility to write both forms in the measurement file, at the cost of one additional parsing step; a sketch of that step follows below.
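
For illustration, a minimal sketch of that additional parsing step; the column contents follow the examples above, and the helper name is hypothetical:

```python
def resolve_overrides(order_spec: str, cell: str) -> dict:
    """Resolve a measurement-table cell against the parameter order
    declared in the observable file (e.g. order_spec='scaling;offset')."""
    order = order_spec.split(";")
    resolved = {}
    for position, token in enumerate(cell.split(";")):
        if ":" in token:   # named form, like keyword arguments in Python
            name, _, value = token.partition(":")
        else:              # positional short form, resolved via the order
            name, value = order[position], token
        resolved[name.strip()] = float(value)
    return resolved

# Both forms yield the same overrides:
assert resolve_overrides("scaling;offset", "2.5;400") == \
       resolve_overrides("scaling;offset", "offset:400;scaling:2.5")
```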

@yannikschaelte (Member)
Discussion on the more flexible handling of observable and noise parameters has moved to #242.

dweindl unpinned this issue Jan 25, 2020
dweindl self-assigned this Jan 25, 2020
LeonardSchmiester added a commit that referenced this issue Feb 11, 2020
* Add pylint config

* Fixes ys (#237)

* fix merge error

* add petablint yaml test

* add parameters test

* Parameter mapping should include all model parameters (#235)

Closes #103

* Parameter mapping should include all model parameters
* Known values should be filled in
* Extend and update tests
* Refactor parameter mapping
* properly handle estimated and non-estimated parameters
* Fix wrong parameter scale returned from mapping
* ...

* Allow initial concentrations / sizes in condition table (#238)

* Allow species and compartments in condition table

* Updated doc allowing for states etc in condition file

* Update pylint: allow lower-case constants

* Export __format_version__

* Fix returning floats as strings in case there are parameter names in the condition table

Co-authored-by: LeonardSchmiester <leonard.schmiester@helmholtz-muenchen.de>

* Barplots and Replicates with Simulation data (#214)

Fixes #196, fixes #210, fixes #213

* Cleanup visualization (#240)

* Add constants for visualization field IDs
* .. and some others
* start using them
* formatting, ...

* Observables table instead of SBML assignment rules (#244)

Closes #201, closes #241 

* Update data format doc for observable table

* Add field name constants for observable table

* Add observables table to petab.Problem

* Update YAML schema and CompositeProblem

* Add functions for writing PEtab dataframes to files

* Deprecate SBML-observable functions

* Implement validation for observable table

* Add function for converting SBML-observable models to observable table

* Use constants for PEtab table fields

* Update PEtab files illustration

* Fix most pylint issues (Closes #234)

* Update vis to observable table (Closes #246)

  No need to check for equal NOISE_DISTRIBUTION and OBSERVABLE_TRANSFORMATION anymore, since they are no longer included in the measurement table and cannot differ for the same observableId

* Fix and update flatten_timepoint_specific_output_overrides

  Closes #247

  Was creating wrong observables before

* Address review comments

  Co-authored-by: Yannik Schälte <31767307+yannikschaelte@users.noreply.github.com>

* Release 0.1.0; file format version 1

* Fix parameter mapping: include output parameters not present in SBML model

* Add convenience functions to petab.Problem

* get_optimization_parameter_scales
* get_optimization_to_simulation_scale_mapping
* add tests

* Fix petab/petab_schema.yaml missing in pypi package

* Update pylint ignorelist

* Update README

* Remove obsolete functions

... related to hierarchical optimization, which should be kept outside PEtab

* Let get_placeholders return an (ordered) list of placeholders

because it is much more useful

* Add check for valid identifiers (Closes #179) (#253)

Co-authored-by: Polina Lakrisenko <p.lakrisenko@gmail.com>

* Deprecate petab.problem.from_folder (Closes #245)

... as well as get_default_*_file_name

* Release 0.1.1

* Barplot uniform coloring & yScale=log fix #196 (#255)


* resolves #197

* small fix

* change all barplot colors to blue

* allow to extract only estimate parameters (#256)

* allow to extract only estimate parameters

* add docstrings

* Visu callobs par (#262)

* fix #261

* corrected flake8 error - line too long

* deleted white space

* add F403 to flake8 tests

* Fix handling of numeric observable/noiseFormula in observable table (Fixes #264)

* Add properties for fixed/free and scaled values (#268)

* allow to extract only estimate parameters

* add docstrings

* return scaled versions of arrays

* Update petab/problem.py

Co-Authored-By: Daniel Weindl <dweindl@users.noreply.github.com>

Co-authored-by: Daniel Weindl <dweindl@users.noreply.github.com>

* Observables function (#269)

* allow to extract only estimate parameters

* add docstrings

* return scaled versions of arrays

* Update petab/problem.py

Co-Authored-By: Daniel Weindl <dweindl@users.noreply.github.com>

* add get_observables function

* add observables test; use observables file in petab test

* fix typo

* move get_observables to ..._ids

* remove unused arg

* add docstring

* fix lint

* fix pylint

* fix flake8

Co-authored-by: Daniel Weindl <dweindl@users.noreply.github.com>

* Fix documentation hierarchy

* Add functions to get all of fixed|free, scaled parameter values (#273)

* fix typo

* streamline fixed|free|all, and scaled values

* fix default args

* fix codacy

* address reviewer comment: return empty list

* add docstring

* add more docstrings

* fix var type error

* address reviewer comments

Co-authored-by: Daniel Weindl <dweindl@users.noreply.github.com>

* Default column to look for simulation results should be 'simulation'

* PEtab COMBINE archives (#271)

* Add create_combine_archive for generation of COMBINE archives

* Add support for reading PEtab COMBINE archives

* Add tests for COMBINE archive r/w

* Fix sbml_observables_to_table - got broken in eb5453

* Increase test coverage (#278)

* add parameter properties test

* add tests for get/write_parameter_df

* add measurements tests

* add conditions tests

* fix conditions create function

* add parameter tests

* add observables tests

* fixup

* add docstrings

* address reviewer comments

* random edit to see if codacy is happy

* random stuff to annoy codecov

Co-authored-by: Daniel Weindl <dweindl@users.noreply.github.com>

Co-authored-by: Daniel Weindl <dweindl@users.noreply.github.com>
Co-authored-by: Yannik Schälte <31767307+yannikschaelte@users.noreply.github.com>
Co-authored-by: LeonardSchmiester <leonard.schmiester@helmholtz-muenchen.de>
Co-authored-by: Simon Merkt <49190262+MerktSimon@users.noreply.github.com>
Co-authored-by: Polina Lakrisenko <p.lakrisenko@gmail.com>
Co-authored-by: LaraFuhrmann <55209716+LaraFuhrmann@users.noreply.github.com>