# How to analyse metabolic networks with Maud

This document explains how to use [Maud](https://github.com/biosustain/Maud/) to fit Bayesian statistical models of steady-state metabolic networks, and how to investigate the results.

## An example Maud input

To start with we'll look through the files in one of the example datasets we have provided. 

You can find a discussion of the input data format in Maud's documentation [here](https://maud-metabolic-models.readthedocs.io/en/latest/usage/inputting.html), but this guide should be enough to get started.

First let's check out the contents of the folder `data/in/linear` by running the first cell below.

In [1]:
!ls data/in/linear

config.toml             kinetic_model.toml      priors.csv
experiments.csv         linear.json             true_params_linear.json


The most important of these files, and the only one whose name matters, is `config.toml`. This file tells Maud where to look for the other files it needs, and stores some configuration information that is passed on to Stan.

Here is what our example config file looks like:

In [2]:
!cat data/in/linear/config.toml

name = "linear"
kinetic_model = "kinetic_model.toml"
priors = "priors.csv"
experiments = "experiments.csv"
likelihood = true

[cmdstanpy_config]
iter_warmup = 20
iter_sampling = 20
chains = 4
save_warmup = true
show_progress = "notebook"

[ode_config]
abs_tol_forward = 1e-4
rel_tol_forward = 1e-4
abs_tol_backward = 1e-4
rel_tol_backward = 1e-4
abs_tol_quadrature = 1e-4
rel_tol_quadrature = 1e-4
max_num_steps = 1e6
timepoint = 1e3



The first few lines of this file define top-level fields. These tell Maud where to look for the files it needs, and do some high level configuration (`name` picks a name for the input and `likelihood` tells Maud whether or not to run in priors-only mode).
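
As a rough illustration of how these top-level fields can be read, the sketch below parses the same values with Python's TOML parser and builds the paths of the referenced files. It is not Maud's own loading code: the assumption that file names are resolved relative to the folder containing `config.toml`, the reading of `likelihood = true` as "not priors-only", and the variable names are ours.

```python
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

# Top-level fields copied from the example config.toml shown above.
CONFIG_TEXT = """
name = "linear"
kinetic_model = "kinetic_model.toml"
priors = "priors.csv"
experiments = "experiments.csv"
likelihood = true
"""

config = tomllib.loads(CONFIG_TEXT)

# Assumption: file names are relative to the folder that holds config.toml.
input_dir = Path("data/in/linear")
input_files = {
    field: input_dir / config[field]
    for field in ("kinetic_model", "priors", "experiments")
}

for field, path in input_files.items():
    print(f"{field}: {path}")

# Assumption: likelihood = true means the measurements are used,
# and likelihood = false means priors-only mode.
mode = "priors and likelihood" if config["likelihood"] else "priors only"
print(f"mode: {mode}")
```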

The second section defines a table called `cmdstanpy_config`. Each field here specifies a keyword argument to [cmdstanpy's `sample` method](https://cmdstanpy.readthedocs.io/en/v0.9.77/api.html#cmdstanpy.CmdStanModel.sample). In this case we tell Stan to run 4 Markov chains, each of which should use 20 iterations in both its warmup and sampling phases, and to save both warmup and sampling draws in its output CSV files.
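
To make the effect of these settings concrete, the following sketch reads the `cmdstanpy_config` table into a dictionary of keyword arguments and works out how many draws the run should produce. It does not call cmdstanpy; the variable names are illustrative, and the arithmetic assumes cmdstanpy's default of no thinning.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

# The cmdstanpy_config table copied from the example config.toml.
CONFIG_TEXT = """
[cmdstanpy_config]
iter_warmup = 20
iter_sampling = 20
chains = 4
save_warmup = true
show_progress = "notebook"
"""

# Each key/value pair is meant to be forwarded as a keyword argument,
# conceptually like model.sample(data=..., **sample_kwargs).
sample_kwargs = tomllib.loads(CONFIG_TEXT)["cmdstanpy_config"]

# Draws usable for inference come from the sampling phase only.
post_warmup_draws = sample_kwargs["chains"] * sample_kwargs["iter_sampling"]

# With save_warmup = true the output files also contain the warmup draws.
draws_per_chain = sample_kwargs["iter_sampling"]
if sample_kwargs["save_warmup"]:
    draws_per_chain += sample_kwargs["iter_warmup"]
saved_draws = sample_kwargs["chains"] * draws_per_chain

print(f"post-warmup draws: {post_warmup_draws}")
print(f"draws saved to file: {saved_draws}")
```

Runs this short are only useful for checking that everything works; the iteration counts would normally be increased for a real analysis.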

The fields in the final section specify tuning parameters for the Stan function [`ode_adjoint_tol_ctl`](https://mc-stan.org/docs/2_27/functions-reference/functions-ode-solver.html#adjoint-sensitivity-solver), except for `timepoint`, which tells Maud how long to run the ODE simulation for.

Now let's look at the other files in the folder, starting with `kinetic_model.toml`.

In [3]:
!cat data/in/linear/kinetic_model.toml

###### Kinetic model ######
[[compartments]]
id = 'c'
name = 'cytosol'
volume = 1

[[compartments]]
id = 'e'
name = 'external'
volume = 1

[[metabolites]]
id = 'M1'
name = 'External metabolite number 1'
balanced = false
compartment = 'e'

[[metabolites]]
id = 'M2'
name = 'External metabolite number 2'
balanced = false
compartment = 'e'

[[metabolites]]
id = 'M1'
name = 'Metabolite number 1'
balanced = true
compartment = 'c'

[[metabolites]]
id = 'M2'
name = 'Metabolite number 2'
balanced = true
compartment = 'c'

[[reactions]]
id = 'r1'
name = 'reaction number 1'
stoichiometry = { M1_e = -1, M1_c = 1}
mechanism = "reversible_modular_rate_law"
[[reactions.enzymes]]
id = 'r1'
name = 'the enzyme that catalyses reaction r1'
[[reactions.enzymes.modifiers]]
modifier_type = 'allosteric_activator'
mic_id = 'M2_c'

[[reactions]]
id = 'r2'
name = 'reaction number 2'
stoichiometry = { M1_c = -1, M2_c = 1 }
mechanism = "reversible_modular_rate_law"

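This file consists of three arrays of tables: `compartments`, `metabolites` and `reactions`. Each reaction's `stoichiometry` refers to a metabolite in a particular compartment by joining the metabolite id and the compartment id with an underscore, for example `M1_c`.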
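
One way to see what the kinetic model file encodes is to turn it into a stoichiometric matrix. The sketch below does this for a trimmed copy of the file above (three metabolites and the two reactions shown). It is an illustration rather than Maud's own parser: the `<metabolite id>_<compartment id>` naming is inferred from the `stoichiometry` fields, and the variable names are ours.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

# A trimmed copy of kinetic_model.toml: only the fields needed here.
KINETIC_MODEL_TEXT = """
[[metabolites]]
id = 'M1'
balanced = false
compartment = 'e'

[[metabolites]]
id = 'M1'
balanced = true
compartment = 'c'

[[metabolites]]
id = 'M2'
balanced = true
compartment = 'c'

[[reactions]]
id = 'r1'
stoichiometry = { M1_e = -1, M1_c = 1}
mechanism = "reversible_modular_rate_law"

[[reactions]]
id = 'r2'
stoichiometry = { M1_c = -1, M2_c = 1 }
mechanism = "reversible_modular_rate_law"
"""

model = tomllib.loads(KINETIC_MODEL_TEXT)

# Assumed naming: metabolite id and compartment id joined by an underscore.
mics = [f"{m['id']}_{m['compartment']}" for m in model["metabolites"]]
reactions = [r["id"] for r in model["reactions"]]

# Rows are metabolites-in-compartments, columns are reactions.
# A missing entry means the metabolite does not take part in the reaction.
S = [
    [r["stoichiometry"].get(mic, 0) for r in model["reactions"]]
    for mic in mics
]

# Balanced metabolites are the ones whose concentrations must be at steady state.
balanced = [
    mic for mic, m in zip(mics, model["metabolites"]) if m["balanced"]
]

print("reactions:", reactions)
for mic, row in zip(mics, S):
    print(mic, row)
print("balanced:", balanced)
```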

## Step 1: check the initial conditions

Rather than going straight ahead and sampling, we usually prefer to run a single, fixed-parameter HMC iteration at the initial parameter values with the command `maud simulate`, and to inspect the results before proceeding further. By default Maud initialises at the prior mean, but initial values can also be specified.

The main reason for running `maud simulate` before sampling is to quickly catch cases where the sampler has difficulty traversing the posterior distribution early in the run. This might be due to an error in the provided input data, in which case it is quite likely that the printed output will be weird in a way that makes it easier to track down the problem. Alternatively, it could be that the prior mean happens to be near a natural saddle point or other tricky part of the posterior distribution. In this case it might be necessary to specify custom initial values.

In [4]:
!maud simulate data/in/linear --output_dir="data/out"

Creating output directory: data/out/maud_output_sim-linear-20210914220230
Copying user input from data/in/linear to data/out/maud_output_sim-linear-20210914220230/user_input
INFO:cmdstanpy:found newer exe file, not recompiling
INFO:cmdstanpy:compiled model file: /Users/tedgro/Code/Maud/src/maud/model
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:finish chain 1

Simulated concentrations and fluxes:
experiments  mics
condition_1  M1_c    0.086876
             M1_e    0.100000
             M2_c    0.107584
             M2_e    0.100000
condition_2  M1_c    0.086881
             M1_e    0.100000
             M2_c    0.107572
             M2_e    0.100000
Name: conc, dtype: float64
experiments  reactions
condition_1  r1           0.000628
             r2           0.000628
             r3           0.000628
condition_2  r1           0.000628
             r2           0.000629
             r3           0.000627
Name: flux, dtype: float64
experiments  enzymes
condition_1  r1         0.1
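
One quick sanity check on this output is that, in an unbranched pathway at steady state, every reaction should carry the same flux. The sketch below applies that check to the flux values printed above. It assumes the example network really is a simple chain r1, r2, r3 (as its name suggests), and the 1% threshold is an arbitrary choice for illustration, not a Maud setting.

```python
# Fluxes copied from the `maud simulate` output above.
fluxes = {
    "condition_1": {"r1": 0.000628, "r2": 0.000628, "r3": 0.000628},
    "condition_2": {"r1": 0.000628, "r2": 0.000629, "r3": 0.000627},
}

REL_TOL = 1e-2  # hypothetical threshold for "close enough to equal"


def max_relative_spread(values):
    """Return (max - min) / |mean| for a collection of flux values."""
    values = list(values)
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / abs(mean)


spreads = {
    experiment: max_relative_spread(by_reaction.values())
    for experiment, by_reaction in fluxes.items()
}

for experiment, spread in spreads.items():
    status = "ok" if spread <= REL_TOL else "check inputs"
    print(f"{experiment}: relative flux spread {spread:.2e} ({status})")
```

A large spread would suggest that the simulation had not reached a steady state, for example because `timepoint` is too small.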
       

## Step 2: generate posterior samples



In [None]:
!maud sample data/in/linear --output_dir="data/out"

Creating output directory: data/out/maud_output-linear-20210914220232
Copying user input from data/in/linear to data/out/maud_output-linear-20210914220232/user_input
INFO:cmdstanpy:found newer exe file, not recompiling
INFO:cmdstanpy:compiled model file: /Users/tedgro/Code/Maud/src/maud/model
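
Each run writes to a new directory whose name ends in what looks like a `YYYYMMDDHHMMSS` timestamp, so output folders accumulate quickly. The sketch below picks the most recent one from a list of directory names. The timestamp format is inferred from the log lines in this document rather than taken from Maud's documentation, and `run_timestamp` is a hypothetical helper.

```python
from datetime import datetime

# Directory names copied from the log output in this document.
output_dirs = [
    "data/out/maud_output_sim-linear-20210914220230",
    "data/out/maud_output-linear-20210914220232",
]


def run_timestamp(path):
    """Parse the trailing timestamp of an output directory name.

    Assumes the name ends in '-YYYYMMDDHHMMSS'.
    """
    suffix = path.rsplit("-", 1)[1]
    return datetime.strptime(suffix, "%Y%m%d%H%M%S")


latest = max(output_dirs, key=run_timestamp)
print(f"most recent run: {latest}")
print(f"started at: {run_timestamp(latest).isoformat()}")
```

In practice the list could come from something like `Path("data/out").glob("maud_output*")`, converted to strings.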




## Step 3: analyse the samples



## Step 4: out-of-sample predictions