Skip to content

Setting up models

Niklas Neubrand edited this page Jan 16, 2024 · 10 revisions

Setting up models

In the folder /Examples we provide some showcase applications that can be used as a starting point for new modeling projects.

Section 1 describes how a model can be set up and Section 2 how experimental data can be linked to the model.

Please note that the software make heavy use of identifiers for all model and data quantities, for instance for dynamic variables, inputs, parameters, observables, ... It is very important to keep these identifiers consistent throughout one project.

1. Model Definition File

A model definition file is a plain text file with the ending .def located in the subfolder /Models of the current MATLAB working directory. The illustration used in the following stem from the example applications in the /examples subfolder of the main code folder: Becker_2010 and Bachmann_2011. The model definition files are structured in various section:


DESCRIPTION
...

PREDICTOR
...

COMPARTMENTS
...

STATES
...

INPUTS
...

REACTIONS (or ODES)
...

DERIVED
...

OBSERVABLES
...
        
ERRORS
...

SUBSTITUTIONS (optional)
...

CONDITIONS
...

Each section will be explained in the following. Lines in the model definition file can be commented out using //.

1.1 DESCRIPTION section

Can be used to store meta information about the model as quoted lines of text, e.g.


DESCRIPTION
"Model of Epo receptor internalization"
"Assembled by RG Timmer & RG Klingmüller"
"Date: 2010"
"Version: 1.0"

1.2 PREDICTOR section

Is used to define the independent variable / the predictor. For dynamics model, this is usually time, e.g.


PREDICTOR
t               T   "min"     "time"	0	100
  • The 1st argument specifies the unique identifier (here t) that can be used in the mathematical expressions later.
  • The 2nd-4th argument specify the unit type (here T = time), the unit itself (here "min"), and a label used for plots (here "time").
  • The 5th and 6th argument specify the range of the independent variable (here 0-100 minutes).

1.3 Defining COMPARTMENTS

Can be used to define compartments in a model, for instance cytosol and nuclear compartment in the cell


COMPARTMENTS
cyt             V   "pl"      "vol."        0.4
nuc             V   "pl"      "vol."        0.275
...

The concentration dynamics of the model, in particular the transport reactions, are solved with respect to the size of the compartments.

  • The 1st argument specifies the unique identifiers (here cyt and nuc) that can be used in the mathematical expressions later.
  • The 2nd-4th argument specify the unit types (here V = volume), the units itself (here "pl"), and labels used for plots (here "vol."). These labels do not affect the calculations.
  • The 5th argument specifies the numerical size of the compartments. Omitting the 5th argument will expose a parameter called vol_# where # is the compartment name. This parameter can then be estimated or specified in a condition. Note that by default, the parameter is set to constant. If you wish to estimate it, change it to fitted using arSetPars.

If no compartments are required, this section should be left empty. Note that in D2D, rates are always specified relative to the origin of the species. If an in- and out rate of a compartment should be equal and is estimated using a single parameter (e.g. to model passive diffusion), it is important to correct the kinetic rates by the appropriate volumes.

1.4 Defining the dynamic STATES of the model

This section describes the dynamic variables / states that evolve over time as will be specified in the REACTIONS / ODES section later.


STATES
STAT5           C   "nM"      "conc."       cyt 1 "STAT5"                 1
pSTAT5          C   "nM"      "conc."       cyt 1 "phospho STAT5"         1
npSTAT5         C   "nM"      "conc."       nuc 1 "nuclear phospho STAT5" 1
...
  • The 1st argument specifies the unique identifiers (here STAT5, pSTAT5 and npSTAT5) that can be used in the mathematical expressions (reactions / ODE system) later. Initial conditions for each dynamic variable will be generated as free parameters using the prefix init_.
  • The 2nd-4th argument specify the unit types (here C = concentration), the unit itself (here "nM"), and a plain text used for plots (here "conc."). These labels do not affect the calculations.
  • The 5th argument specify the compartment where the state lives in (here either cyt or nuc). This can be left empty if no compartments were specified above. If the latter arguments are to be used but no compartment is present it is possible to create an default compartment in the section before in which all state live in.
  • The 6th argument is a flag if the state should be showed in plots (0=no, 1=yes), default is yes.
  • The 7th argument a clear text label for plotting.
  • The 8th argument specifies if the state is strictly positive (0=no, 1=yes), default is no, but for inherently non-negative states yes is recommended.

1.5 Defining the INPUTS

In this section input function, usually depending on the independent variable, can be specified. The input function can be used in the mathematical expression later, such as the reaction rate equations or the observables. Please note that the input do not belong to a specific compartment. The user has to ensure plausibility of the usage of the input function himself.

INPUTS
Epo             C   "units/cell"   "conc."   "k1*exp(-k2*t)"
SAv             C   "units/cell"   "conc."   "k3"
...
  • The 1st argument specifies the unique identifiers (here Epo and SAv) that can be used in the mathematical expressions later.
  • The 2nd-4th argument specify the unit types (here C = concentration), the unit itself (here "units/cell"), and a plain text used for plots (here "conc.").
  • The 5th argument specify the mathematical expression of the input function (here "k1*exp(-k2*t)" is an exponential decay and "k3" is a constant with unknown value).

Sometimes the input function should account for a change in the value of a dynamic variable, see the bolus injection example. Input estimation is described here

1.5.1 Step functions, cubic splines and user defined functions

Step functions, splines and user defined functions can be used as mathematical expression of the input function.

Step functions can be defined by:

  • "step1(t, level1, switch_time, level2)" giving a simple step function at the value switch_time of the independent variable (usually time).
  • "step2(t, level1, switch_time1, level2, switch_time2, level3)" giving a double step function at the values switch_time1/2 of the independent variable t.

Discontinuous input require events for proper integration, see here and here

Smooth step functions can be defined by:

  • "smoothstep1(t, level1, switch_time, level2, smoothness)" giving a smooth step function at the value switch_time of the independent variable (usually time).
  • "smoothstep2(t, level1, switch_time1, level2, switch_time2, level3, smoothness)" giving a smooth double step function at the values switch_time1/2 of the independent variable t.

The smoothness parameter controls the smoothness of the step. For large values, the step becomes increasingly sigmoidal.

Bolus injections can be defined by:

  • "bolus(t, amount, time_point, duration)" giving a bolus injection at value time point of the independent variable t. This corresponds to a Gaussian curve with area amount and standard deviation duration.

Cubic splines with 3, 4, 5 or 10 knots can be defined by:

  • "spline3(t, t_knot1, p_knot1, t_knot2, p_knot2, t_knot3, p_knot3, q_initial_slope_constraint, initial_slope)"
  • "spline4(t, t_knot1, p_knot1, t_knot2, p_knot2, t_knot3, p_knot3, t_knot4, p_knot4, q_initial_slope_constraint, initial_slope)"
  • "spline5(t, t_knot1, p_knot1, t_knot2, p_knot2, t_knot3, p_knot3, t_knot4, p_knot4, t_knot5, p_knot5, q_initial_slope_constraint, initial_slope)"
  • "spline10(t, t_knot1, p_knot1, t_knot2, p_knot2, t_knot3, p_knot3, t_knot4, p_knot4, t_knot5, p_knot5, t_knot6, p_knot6, t_knot7, p_knot7, t_knot8, p_knot8, t_knot9, p_knot9, t_knot10, p_knot10, q_initial_slope_constraint, initial_slope)"

Here, t denotes the independent variable, t_knot1-5 indicate the locations of the spline knots (these should be fixed to a numeric value), p_knot1-5 denote the spline parameters (these can either be fixed or left as free parameters to be estimated) and q_initial_slope_constraint is a flag (0 = no, 1 = yes) that indicates if the slope of the spline at the first knot is to be constrained to the value given in initial_slope (both values should be fixed to a numeric value). The spline parameter p_knot is defined as the value u(t_knot) of the spline at t_knot. If the spline should be constrained to positive values use the functions spline_pos3, spline_pos4 and spline_pos5 the same way as described above. An example using splines is given in Examples/Swameye_PNAS2003.

Monotonic splines with 3, 4, 5 or 10 knots can be defined by:

  • "monospline3(t, t_knot1, p_knot1, t_knot2, p_knot2, t_knot3, p_knot3)"
  • "monospline4(t, t_knot1, t_knot1, p_knot1, t_knot2, p_knot2, t_knot3, p_knot3, t_knot4, p_knot4)"
  • "monospline5(t, t_knot1, p_knot1, t_knot2, p_knot2, t_knot3, p_knot3, t_knot4, p_knot4, t_knot5, p_knot5)"
  • "monospline10(t, t_knot1, p_knot1, t_knot2, p_knot2, t_knot3, p_knot3, t_knot4, p_knot4, t_knot5, p_knot5, t_knot6, p_knot6, t_knot7, p_knot7, t_knot8, p_knot8, t_knot9, p_knot9, t_knot10, p_knot10)"

Monotonic are splines which are monotonic between knots. Their syntax is similar to that of the regular splines. Monotonic splines have as an extra advantage that they do not suffer from over and undershoot behaviour as much as the regular cubic splines, considering that the interpolating function between two knots is guaranteed to be monotonic. They are useful for modelling input sources which are not noisy, or inputs requiring a large number of knots. Setting the first and last two knot parameters to the same value guarantees that the monotonic spline remains constant outside the spline range. This means that monotonic splines can be combined into splines comprising of more than 10 knots (see Examples/LongSplines).

For an example comparing the spline types see Examples/Splines and Examples/LongSplines (using splines > 10 knots). Note, when the splines are slow, invoke ar.config.turboSplines = 1, to use a more performant spline implementation.

  • "inputspline(t, N, [t_knot1, t_knot2, ... t_knotN], [p_knot1, p_knot2, ... p_knotN])" The inputspline is a special version of the monotonic spline. It allows for a spline of arbitrary length, but cannot be fitted (no parameters are allowed in the time or parameter series).

Custom input function can be defined in the c-files arInputFunctionsC.c and arInputFunctionsC.h.

1.5.2 A general input function

Often, an input consists of transient and sustained parts. Such behaviour can be implemented by the following expression:

"gif_amp_trans*(1-exp(-t/gif_timescale_sust))*exp(-t/(gif_timescale_trans)) + gif_amp_sust*(1-exp(-t/gif_timescale_sust))"

The function has three parameters, two amplitudes (gif_amp_trans and gif_amp_sust) and two time scales (gif_timescale_trans and gif_timescale_sust), that encode the transient and sustained parts.

1.6 Defining the dynamics

This section describes the ODE system that determines the time evolution of the states. One can either specify REACTIONS or ODES directly.

1.6.1 Using REACTIONS

The ODE system can be specified as biochemical reaction network.

REACTIONS
Epo + EpoR      ->  Epo_EpoR    CUSTOM "k1 * Epo * EpoR"    "label1"
STAT5           ->  pSTAT5      CUSTOM "k2 * STAT5"         "label2"
pSTAT5          ->  npSTAT5     CUSTOM "k3 * pSTAT5"        "label3"
...
  • The 1st argument describes the reaction in a simple notation (here Epo + EpoR -> Epo_EpoR is a complex formation, STAT5 -> pSTAT5 a protein modification, and pSTAT5 -> npSTAT5 a transport reaction between compartments).
  • The 2nd argument is a keyword that specifies the reaction type, CUSTOM or MASSACTION. CUSTOM is the recommended usage.
  • In case of a CUSTOM reaction, the 3rd argument specifies the mathematical expression of the reaction rate equation (here "k1 * Epo * EpoR", "k2 * STAT5" and "k3 * pSTAT5"). In case of a MASSACTION reaction, the mathematical expression in the 3rd argument will be just one parameter name that will be the rate constant corresponding to this reaction. In case of a MASSACTION reaction, one can use <-> in the 1st argument to specify a reversible reaction. In this case, there will be two parameters with suffix name_1 for the forward and name_2 for the back reaction.
  • The 4th argument (optional) specifies a label for the reaction used for plotting.

1.6.2 Using ODES

In contrast to the REACTIONS notation, the ordinary differential equations can be specified directly:

ODES
"mathematical expression 1"
"mathematical expression 2"
"mathematical expression 3"
...

where these mathematical expressions define the right-hand sides of the ODEs. Take care that there are as many ODEs as dynamic variables in the model.

1.7 Defining DERIVED variables

In this section variable that are derived from the dynamic and input variables can be defined. Derived variables can be used in the OBSERVABLES and REACTIONS sections.

DERIVED
Epo_ext         C   "pM"   "conc."   "Epo + dEpo_e"
Epo_int         C   "pM"   "conc."   "Epo_EpoR_i + dEpo_i"
...
  • The 1st argument specifies the unique identifiers (here Epo_ext and Epo_int) that can be used in the mathematical expressions later.
  • The 2nd-4th argument specify the unit types (here C = concentration), the unit itself (here "pM"), and a plain text used for plots (here "conc.").
  • The 5th argument specify the mathematical expression of the input function (here "Epo + dEpo_e" and "Epo_EpoR_i + dEpo_i").

1.8 Defining the OBSERVABLES

At this point of the model definition files one can specify the default model observables and their corresponding error model (Section 1.9). While the section in the model definition file are optional the corresponding sections in the data definition file (see Section 2.3) are mandatory and override the default specifications given here.

OBSERVABLES
tSTAT5_au           C   "au"  "conc."   1   1   "scale_tSTAT5 * (STAT5 + pSTAT5)"
pSTAT5_au           C   "au"  "conc."   1   1   "offset_pSTAT5 + scale_pSTAT5 * pSTAT5"
...
  • The 1st argument specifies the unique identifiers (here tSTAT5_au and pSTAT5_au). Please note that this identifier should be the same as the corresponding column header in the data sheet. Also, Observable identifiers are not allowed to be the same as dynamic variable identifiers.
  • The 2nd-4th argument specify the unit types (here C = concentration), the unit itself (here "au"), and a plain text used for plots (here "conc.").
  • The 5th argument corresponds to a flag (0=no, 1=yes) that indicates if the maximal value in the data sheet of the respective observable should be rescaled to one.
  • The 6th argument corresponds to a flag (0=no, 1=yes) that indicates if both the raw data form the spread sheet and the model observables should be compared on a log-10 scale. This is frequently used for concentration data that contains log-normal measurement noise. The transformation is applied after rescaling in case of argument 5 applies.
  • The 7th argument specify the mathematical expression of the observable function (here "scale_tSTAT5 * (STAT5 + pSTAT5)" and "offset_pSTAT5 + scale_pSTAT5 * pSTAT5". It will be places inside the log-10 of argument in case of argument 6 applies.

1.9 Defining the magnitude of the measurement noise in the ERRORS section

In the likelihood function, the measurement noise is modeled as normal or log-normal distribution, see 2.2 argument 6. The magnitude of the measurement noise, i.e. the standard deviation of the normal distribution (or standard deviation of the normal distribution in the log-space for log-normally distributed noise), can be implemented as parameterized function

ERRORS
tSTAT5_au		"sd_STAT5_au"
pSTAT5_au		"sd_STAT5_au"
...
  • The 1st argument corresponds to an unique identifiers as given in the observables section (here tSTAT5_au and pSTAT5_au). It is mandatory to specify the measurement noise for each observable in the data definition file.
  • The 2nd argument specify the mathematical expression of the noise function (here "sd_STAT5_au" and "sd_STAT5_au" indicates that both measurements have the same measurement noise). Relative measurement noise can be implemented by using the observable identifier, e.g. "sqrt(sd_STAT5_abs^2 + sd_STAT5_rel^2 * tSTAT5_au^2)".

1.9.1 Defining the measurement noise in the data sheet

Sometimes, instead of using a parametrized function for the measurement noise, one like to directly specify the amount of noise calculated from replicates in the data sheet. Put these value in the data sheet with column header same as the observable name but with the addition _std (here e.g. tSTAT5_au_std and pSTAT5_au_std). Please note that these values will be interpreted as standard deviation of the data not as variance! An example for this usage can be found in the example application /Example/Swameye_PNAS2003.

The value of ar.config.fiterrors controls how experimental error enter the calculations:

ar.config.fiterrors = -1: Only experimental error bars used for fitting.

ar.config.fiterrors = 0: Use experimental errors by default and revert to the error model for data points that have no experimental error specified (This is the default).

ar.config.fiterrors = 1: Only the error model is used for fitting. Experimental errors specified in the data sheet are ignored.

The visual output is controlled by the following flags:

ar.config.fiterror = 0: Data is plotted as fitted.

ar.config.fiterror = -1: Plot prediction bands as calculated by PPL.

ar.config.fiterror = -2: Do not plot errors

and

ar.config.ploterrors = 0: Observables are plotted as fitted. ar.config.ploterrors = 1: Data uncertainty is plotted as error bar. ar.config.ploterrors = 2: Only error bands are plotted.

In order to use multiple modes, set ar.config.useFitErrorMatrix = 1 and specify the different modes in ar.config.fiterrors_matrix(model index, data index). Accordingly, the different plotting modes can be specified in ar.config.ploterrors_matrix(model index, data index).

1.10 SUBSTITUTIONS section

In this section identifiers can be specified which can subsequently be used in model conditions. They will be iteratively substituted in the expressions in the CONDITIONS sections.

SUBSTITUTIONS
mySteadyState    "kpro/myDegradation"
myDegradation    "2*kdeg"
...
  • The 1st argument specifies a unique identifier, which may not have the same name as any model parameter
  • The 2nd argument specifies a mathematical expression that will replace located instances of the identifier. Note that these will be evaluated repeatedly until there are no further substitutions. Hence, in this example mySteadyState will equate to 0.5 * kpro / kdeg. Cyclic substitutions are not allowed.

1.11 CONDITIONS section

In this section conditions in term of the model parameters can be specified. All expression that are not uniquely defined above as either compartments, states or inputs will be treated as free parameters.

CONDITIONS
kD               "koff/kon"
init_Epo_EpoR    "0"
STAT5ActJAK2     "STAT5ActJAK2/init_EpoRJAK2"
...
  • The 1st argument specifies a target parameter that occurred in the model definition above.
  • The 2nd argument specifies a mathematical expression that will replace the target parameter in the model (here "koff/kon", "0" and "STAT5ActJAK2/init_EpoRJAK2"). This can be any mathematical expression of composed of new and old parameters. Take care not to implement recursive conditions. All replacements are executed once and sequentially, so order matters.

1.12 PARAMETERS section

In this optional section the default settings for parameter can be specified. Please note that the intension of this section is to save final parameter values for documentation purpose. During work in progress parameter values should be manipulate "soft" in the MATLAB workspace variables.

PARAMETERS
#ar.pExternLabels   ar.pExtern    ar.qFitExtern    ar.qLog10Extern    ar.lbExtern    ar.ubExtern
init_PX             0             1                0                  0              1000
...

1.13 The pre-processor

In some cases, you may want to experiment with including different aspects in a single model. It is trivial to set up multiple .def files, which in some cases may be an appropriate approach. Sometimes however, it can be nice to maintain a single model file and selectively comment / uncomment sections of the model. For this, a pre-processor was included in the model parser. The preprocessor can be used to incorporate basic compile time logic by defining or un-defining labels using #define and #undefine. Code blocks can then be bracketed by #ifdef, #else, #endif statements in order to selectively activate/deactivate specific parts of the model. An example use:

#define HAS_HILL

#ifdef HAS_HILL
    // This code will be evaluated
#else
    // This code will not be evaluated
#end

#undefine HAS_HILL

#ifdef HAS_HILL
    // This code will not be evaluated
#else
    // This code will be evaluated
#end

#ifndef HAS_HILL
    // This code will be evaluated
#else
    // This code will not be evaluated
#end

//#define TEST
#ifdef TEST
    // This code will not be evaluated
#end

2. Data Definition File

Similar to the model definition file experiments have data definition files that contain experiment specific information. A data definition file is a plain text file with the ending .def located in the subfolder /Data of the current MATLAB working directory. Each data set, stored in the subfolder /Data as .xls or .csv file, must have its own data definition file with the same name but the suffix .def. The data definition files are structured in various section:


DESCRIPTION
...

PREDICTOR (or PREDICTOR-DOSERESPONSE)
...

INPUTS
...

OBSERVABLES
...

ERRORS
...

SUBSTITUTIONS (optional)
...

CONDITIONS
...

Sections DESCRIPTION and INVARIANTS are same as in the model definition file, see 1.1 and 1.7. The remaining sections will be explained in the following. As in the model definition file lines in the data definition file can be commented out using //.

2.1 PREDICTOR and PREDICTOR-DOSERESPONSE sections

Is used to define the independent variable / the predictor. For the data definition file there are two modes available, PREDICTOR and PREDICTOR-DOSERESPONSE. For using the PREDICTOR mode see description in Section 1.2, the independent variable in this case is usually time. In the PREDICTOR-DOSERESPONSE mode one can use an existing parameter as additional independent variable


PREDICTOR-DOSERESPONSE parameter
t               T   "min"     "time"	0	100

In this mode, there has to be a column named parameter in the data sheet, besides the first column that encodes the standard independent variable (here time t). Please note that formally this alternative mode does not change the results, however the plot of the model fit to the data will be modified. For instance if parameter corresponds to the concentration of a certain input, the result will the a dose response plot.

2.2 Defining the INPUTS

Each experiment can have a different input function, e.g. instance implementing a different treatment scheme. Therefore, in the data definition file the input can be redefined for this specific experiment.

INPUTS
Epo             "10 + sin(k1*t)"
SAv             "1 + 0.1*t"
...
  • The 1st argument corresponds to an unique identifiers as given in the model definition file (here Epo and SAv).
  • The 2nd argument specify the mathematical expression of the input function (here "10 + sin(k1*t)" and "1 + 0.1*t"), see 1.5 for more detailed explanation.

2.3 OBSERVABLES and ERRORS sections

At this point of the model definition files one can specify observables and their corresponding error model that override the default specifications from the model definition file (see Sections 1.8 and 1.9).

2.4 Defining the SUBSTITUTIONS for the specific experiment

Like in the model definition file, see 1.10 for detailed explanation, experiment specific substitutions can be specified.

2.5 Defining the CONDITIONS for the specific experiment

Like in the model definition file, see 1.11 for detailed explanation, experiment specific conditions can be specified.

CONDITIONS
init_Epo_EpoR    "10"
...

Please note that these "data" conditions are applied to the model after the "model" conditions have been applied.

2.6 PARAMETERS section

Like in the model definition file, see 1.12 for detailed explanation, the default settings for parameter can be specified also on the level of the data definition file, i.e. for the additional parameter that might be introduced there.

Clone this wiki locally