merrin is a Python 3 tool to compute metabolic regulatory rules from time series observations.
This implementation relies on merrinasp, an extension of the Answer Set Programming (ASP) solver clingo with quantified linear constraints.
To install the merrin package from the GitHub repository, run the pip command:
python3.X -m pip install git+https://github.com/bioasp/merrin
merrin can be used in the terminal as follows:
merrin [-h] -sbml SBML -pkn PKN -obj OBJ -obs OBS [-out OUTPUT] [--lpsolver {glpk,gurobi}] [--timelimit TIMELIMIT]
[--optimization {all,subsetmin}] [--projection {network,node}]
Mandatory arguments:
-sbml SBML, --SBML SBML
Metabolic network in SBML file format.
-pkn PKN, --PKN PKN Prior Knowledge Network.
-obj OBJ, --objective-reaction OBJ
Objective reaction.
-obs OBS, --observations OBS
JSON file describing the input timeseries.
Optional arguments:
-out OUTPUT, --output-file OUTPUT
Output CSV file (default: ./merrin-<optimization>-<projection>-<timestamp>.csv)
--lpsolver {glpk,gurobi}
Linear solver to use (default: glpk)
--timelimit TIMELIMIT
Timelimit for each resolution, -1 if none (default: -1)
--optimization {all,subsetmin}
Select optimization mode: all networks or subset minimal ones only (default: subsetmin)
--projection {network,node}
Select project mode (default: network):
- node: enumerate the candidate rules for each node;
- network: enumerate all the rules of each network
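The options above can also be assembled programmatically. The following is a minimal sketch, not part of merrin itself: `build_merrin_command` is a hypothetical helper and the file names and objective reaction name are placeholders. Only the flags and their allowed values are taken from the usage text above.

```python
import shlex


def build_merrin_command(sbml, pkn, objective, observations, output=None,
                         lpsolver="glpk", timelimit=-1,
                         optimization="subsetmin", projection="network"):
    """Build the argument list for a merrin call (hypothetical helper)."""
    # Reject values that the usage text does not list as valid choices.
    if lpsolver not in ("glpk", "gurobi"):
        raise ValueError(f"unknown LP solver: {lpsolver}")
    if optimization not in ("all", "subsetmin"):
        raise ValueError(f"unknown optimization mode: {optimization}")
    if projection not in ("network", "node"):
        raise ValueError(f"unknown projection mode: {projection}")
    cmd = [
        "merrin",
        "-sbml", sbml,
        "-pkn", pkn,
        "-obj", objective,
        "-obs", observations,
        "--lpsolver", lpsolver,
        "--timelimit", str(timelimit),
        "--optimization", optimization,
        "--projection", projection,
    ]
    if output is not None:
        cmd += ["-out", output]
    return cmd


# Placeholder inputs, for illustration only.
cmd = build_merrin_command("model.sbml", "pkn.txt", "Growth",
                           "observations.json", projection="node")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once merrin is installed.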
The metabolic network should be in SBML (Systems Biology Markup Language) version 3 format.
The Prior Knowledge Network (PKN) is a text file where each line has the form:
node_1 sign node_2
with:
- node_1 and node_2 are two components of the regulatory or metabolic systems.
- sign in (0, -1, 1) such that:
  - -1 is an inhibition effect of node_1 on node_2;
  - 1 is an activation effect of node_1 on node_2;
  - 0 is an unknown effect (either activation or inhibition effect) of node_1 on node_2.
Examples
Carbon1 0 RPcl
RPcl 1 Tc2
Tc2 -1 RPcl
In this example, the regulatory rule of RPcl depends on an unknown interaction with Carbon1 and an inhibition effect of Tc2.
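The PKN format described above can be read with a few lines of Python. This is a minimal sketch, not merrin's own parser: `parse_pkn` and `regulators_of` are hypothetical helpers, and the sketch assumes that fields are separated by whitespace and that blank lines can be skipped.

```python
SIGN_LABELS = {-1: "inhibition", 0: "unknown", 1: "activation"}


def parse_pkn(text):
    """Parse PKN lines 'node_1 sign node_2' into (source, sign, target) tuples."""
    edges = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 'node_1 sign node_2'")
        source, sign_text, target = fields
        if sign_text not in ("-1", "0", "1"):
            raise ValueError(f"line {line_number}: sign must be -1, 0 or 1")
        edges.append((source, int(sign_text), target))
    return edges


def regulators_of(edges, target):
    """Map each regulator of `target` to the kind of effect it has."""
    return {source: SIGN_LABELS[sign]
            for source, sign, edge_target in edges
            if edge_target == target}


# The example PKN from the text above.
PKN_TEXT = """Carbon1 0 RPcl
RPcl 1 Tc2
Tc2 -1 RPcl
"""

edges = parse_pkn(PKN_TEXT)
print(regulators_of(edges, "RPcl"))
# prints {'Carbon1': 'unknown', 'Tc2': 'inhibition'}
```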
merrin is compatible with any combination of the following data types: kinetics, fluxomics and transcriptomics.
The observations can be noisy. Note that it is preferable to leave out observations that are uncertain.
Observations are described in a JSON file. Each time series observation is defined as follows:
{
"file": "path/to/the/csv/file",
"type": ["Kinetics","Fluxomics","Transcriptomics"], <- any non-empty subset
"constraints": {
"mutations": {
"node_u": true, <- forced activation
"node_v": false, <- forced inhibition
},
"bounds": {
"reaction": [lower_bound, upper_bound]
}
}
}
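A quick sanity check of one observation entry can catch mistakes before launching merrin. The sketch below is not part of merrin: `validate_observation` is a hypothetical helper, the file path and node names are placeholders, and it assumes that the "constraints" block and its two sub-blocks are optional and that a lower bound should not exceed its upper bound.

```python
import json

ALLOWED_TYPES = {"Kinetics", "Fluxomics", "Transcriptomics"}


def validate_observation(entry):
    """Return a list of problems found in one observation entry."""
    problems = []
    if not isinstance(entry.get("file"), str):
        problems.append("'file' must be a path given as a string")
    types = entry.get("type")
    if not isinstance(types, list) or not types:
        problems.append("'type' must be a non-empty list")
    elif not all(t in ALLOWED_TYPES for t in types):
        problems.append("'type' contains an unknown data type")
    # Assumption: constraints, mutations and bounds may be omitted.
    constraints = entry.get("constraints", {})
    for node, forced in constraints.get("mutations", {}).items():
        if not isinstance(forced, bool):
            problems.append(f"mutation of '{node}' must be true or false")
    for reaction, bounds in constraints.get("bounds", {}).items():
        numeric = (isinstance(bounds, list) and len(bounds) == 2
                   and all(isinstance(b, (int, float)) for b in bounds))
        if not numeric or bounds[0] > bounds[1]:
            problems.append(f"bounds of '{reaction}' must be [lower, upper]")
    return problems


# Placeholder entry, for illustration only.
ENTRY_TEXT = """
{
  "file": "path/to/the/csv/file",
  "type": ["Kinetics", "Transcriptomics"],
  "constraints": {
    "mutations": {"node_u": true, "node_v": false},
    "bounds": {"reaction": [0.0, 10.0]}
  }
}
"""

entry = json.loads(ENTRY_TEXT)
print(validate_observation(entry))
```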
The CSV file describing the observation needs to have a Time column with an integer timestamp for each observed time step.
For kinetics and fluxomics data types:
- Metabolites: real values, modeling the metabolite concentration in the substrate.
- Need to contain a biomass column with the measured value of the biomass.
For fluxomics data types:
- Reactions: real values, modeling the reaction activity rates in the metabolic network.
For transcriptomics data types:
- All values are binary (0 or 1), modeling the activity (1) or inactivity (0) of a component (metabolite, reaction, regulatory nodes).
merrin generates a CSV file describing the inferred regulatory networks.
A rule set to 1 represents a constant value (i.e. always activated) for which no regulatory rules are necessary to explain the component dynamics.
Remark 1: If no regulatory networks are returned, then the instance is unsatisfiable.
Try changing the max_gap and max_error variables before launching merrin again.
Remark 2: For unsatisfiable instances with kinetics and/or fluxomics data, launching merrin with the observation declared as transcriptomics data only can sometimes allow inferring some regulatory networks.
Regulatory rules are returned in disjunctive normal form (DNF) with the following syntax:
R := 1 || C || (C_1 | ... | C_n)
C := L || (L_1 & ... & L_m)
L := N || !N
N := regulatory component name
with ! denoting the negation, & the logical and, and | the logical or.
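To see how a rule in this syntax is read, it can be evaluated against a Boolean state of the regulatory components. The sketch below is not merrin's API: `evaluate_rule` is a hypothetical helper. It relies on the rule being in DNF, so parentheses only group conjunctions and can be dropped before splitting on | and then on &.

```python
def _literal(literal, state):
    """Evaluate a literal 'N' or '!N' in the given state."""
    if literal.startswith("!"):
        return not state[literal[1:].strip()]
    return bool(state[literal])


def evaluate_rule(rule, state):
    """Evaluate a DNF rule such as '((A & !B) | C)' in a Boolean state.

    `state` maps component names to True/False.
    """
    # In DNF, & already binds tighter than |, so parentheses carry no
    # extra information and can be removed.
    flat = rule.replace("(", "").replace(")", "").strip()
    if flat == "1":
        return True  # constant rule: the component is always activated
    for clause in flat.split("|"):
        if all(_literal(lit.strip(), state) for lit in clause.split("&")):
            return True
    return False


state = {"A": True, "B": False, "C": False}
print(evaluate_rule("((A & !B) | C)", state))
```

Component names containing parentheses, & or | would break this sketch. It also does not reject malformed rules.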
An example is provided in ./examples.
The instance ./examples/ecoli-small has been generated from the regulatory metabolic network and the experiments described in (Covert et al., 2001).
To solve the instance using the console command, see the bash file: ./examples/run-merrin.sh.
It can be executed with:
sh ./examples/run-merrin.sh
To solve the instance from a Python script using merrin, see the Jupyter notebook: ./examples/notebook-merrin.ipynb.
Network projection: Infer regulatory networks.
Each row of the output CSV is a regulatory network and each column gives the rule for a given regulatory component.
Example 1: Network projection + All optimization
R2a,R2b,R5a,R5b,R7,R8a,RPO2,RPb,RPcl,RPh,Rres,Tc2
!RPb,1,1,!RPO2,1,!RPh,!Oxygen,R2b,Carbon1,Hext,1,!RPcl
!RPb,1,1,!RPO2,!RPb,!RPh,!Oxygen,R2b,Carbon1,Hext,1,!RPcl
!RPb,1,!RPO2,!RPO2,!RPb,!RPh,!Oxygen,R2b,Carbon1,Hext,1,!RPcl
!RPb,1,!RPO2,!RPO2,!RPb,!RPh,!Oxygen,R2b,Carbon1,Hext,!RPO2,!RPcl
...
Only the first 4 inferred regulatory networks are shown.
The node R2b is always set to 1: it does not have any regulatory rule and so is always activated.
Example 2: Network projection + Subset minimal optimization
R2a,R2b,R5a,R5b,R7,R8a,RPO2,RPb,RPcl,RPh,Rres,Tc2
!RPb,1,1,1,1,!RPh,!Oxygen,R2b,Carbon1,Hext,1,!RPcl
Node projection: Infer possible regulatory rules for each regulatory component. The output file contains a single row. Each cell contains the set of compatible regulatory rules, separated by ';'.
Example 3: Node projection + All optimization
R2a,R2b,R5a,R5b,R7,R8a,RPO2,RPb,RPcl,RPh,Rres,Tc2
!RPb,1,!RPO2;1,!RPO2;1;RPO2,!RPb;1,!RPh,!Oxygen,R2b,Carbon1,Hext,!RPO2;1,!RPcl
The node R5a has 2 possible regulatory rules: !RPO2 or 1 (unregulated).
Example 4: Node projection + Subset minimal optimization
R2a,R2b,R5a,R5b,R7,R8a,RPO2,RPb,RPcl,RPh,Rres,Tc2
!RPb,1,1,1,1,!RPh,!Oxygen,R2b,Carbon1,Hext,1,!RPcl
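The output files shown in the examples can be loaded with Python's standard csv module. This is a minimal sketch, not part of merrin: `read_merrin_output` is a hypothetical helper, and it assumes that ';' never appears inside a rule. It returns one dictionary per row, mapping each component to its list of rules, so a network projection yields single-element lists and a node projection yields all candidate rules.

```python
import csv
import io


def read_merrin_output(text):
    """Parse a merrin output CSV into a list of {component: [rules]} dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{node: cell.split(";") for node, cell in row.items()}
            for row in reader]


# Output of Example 3 (node projection + all optimization).
NODE_OUTPUT = """R2a,R2b,R5a,R5b,R7,R8a,RPO2,RPb,RPcl,RPh,Rres,Tc2
!RPb,1,!RPO2;1,!RPO2;1;RPO2,!RPb;1,!RPh,!Oxygen,R2b,Carbon1,Hext,!RPO2;1,!RPcl
"""

rows = read_merrin_output(NODE_OUTPUT)
candidates = rows[0]
# Components with more than one compatible rule.
ambiguous = sorted(node for node, rules in candidates.items() if len(rules) > 1)
print(candidates["R5a"])
print(ambiguous)
```

To read a file produced by merrin, pass the file content, for example `read_merrin_output(open(path).read())` with `path` pointing to the output CSV.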
To cite this tool:
Kerian Thuillier, Caroline Baroukh, Alexander Bockmayr, Ludovic Cottret, Loïc Paulevé, Anne Siegel, MERRIN: MEtabolic regulation rule INference from time series data, Bioinformatics, Volume 38, Issue Supplement_2, September 2022, Pages ii127–ii133, https://doi.org/10.1093/bioinformatics/btac479