# Genome-scale metabolic modeling

Metabolic reactions underpin cellular function, converting nutrients into energy and cellular building blocks. Genome-scale metabolic models (GEMs) offer a holistic representation of these pathways based on an organism's genome, providing insights into its metabolic potential under various conditions. GEMs represent the collection of metabolic reactions as a [biochemical network](https://en.wikipedia.org/wiki/Metabolic_network), in which nodes represent metabolites (chemical compounds) and edges represent the reactions in which these metabolites participate, either as substrates or as products. Additionally, GEMs include the _stoichiometry_ with which a metabolite participates in each reaction (that is, the number of molecules of that metabolite consumed or produced by the reaction). By convention, positive stoichiometric coefficients represent products and negative stoichiometric coefficients represent substrates.
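
As a worked example of the sign convention, here is a familiar textbook reaction (not one taken from the model discussed below). The formation of water from hydrogen and oxygen contributes one column to the stoichiometric matrix, with three non-zero entries:

$$
2\,\mathrm{H_2} + \mathrm{O_2} \longrightarrow 2\,\mathrm{H_2O}
\qquad\Rightarrow\qquad
s_{\mathrm{H_2}} = -2, \quad s_{\mathrm{O_2}} = -1, \quad s_{\mathrm{H_2O}} = +2
$$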

> As a curiosity, biochemical networks cannot be faithfully represented by simple graphs, but rather by [hypergraphs](https://en.wikipedia.org/wiki/Hypergraph), in which a single edge can connect more than two nodes. This is because biochemical reactions may involve multiple substrates and products.

As with any graph, we can represent a biochemical network by its [incidence matrix](https://en.wikipedia.org/wiki/Incidence_matrix), here called the stoichiometric matrix, $S$, which is a central piece of a GEM. In $S$, columns represent reactions and rows represent metabolites, so each entry $s_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$ (zero if metabolite $i$ does not take part in reaction $j$). Since most metabolites participate in only a few reactions, $S$ is a sparse matrix: most of its entries are 0.

### A toy network example

Here is a toy biochemical network with six metabolites, $A$ to $F$, and five reactions, $R_1$ to $R_5$, to help us visualize how the stoichiometric matrix is constructed. In this case all stoichiometric coefficients are $\pm 1$, but in actual GEMs they can take any integer value (or even a fractional one, as in the biomass pseudo-reaction). Note that column vectors represent reactions, while row vectors represent metabolites:

<div style="width:100%; text-align:center;">
<img src="images/toy_network.png" alt="toy network" style="width: 20%; height: auto; margin: auto">
</div>

The stoichiometric matrix $S$ is constructed by placing these column vectors side by side: the first column of $S$ is the vector representing reaction $R_1$, the first row of $S$ is the vector representing metabolite $A$, and so on. Thus, the first entry of $S$, $s_{11} = -1$, is the stoichiometric coefficient of metabolite $A$ in reaction $R_1$, where $A$ acts as a substrate:

$$
S =
\begin{pmatrix}
-1 & 0 & 0 & -1 & 0 \\
1 & -1 & 0 & 0 & 0 \\
1 & 1 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$
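
The same matrix can be assembled programmatically. The sketch below is a minimal NumPy version, not how `cobrapy` builds it internally. The reaction definitions (for example $R_1: A \rightarrow B + C$) are read directly off the columns of $S$ above, and the helper name `build_stoichiometric_matrix` is hypothetical.

```python
import numpy as np

# Reactions read off the columns of S: metabolite -> stoichiometric coefficient
reactions = {
    "R1": {"A": -1, "B": 1, "C": 1},  # A -> B + C
    "R2": {"B": -1, "C": 1},          # B -> C
    "R3": {"C": -1, "D": 1},          # C -> D
    "R4": {"A": -1, "E": 1},          # A -> E
    "R5": {"E": -1, "F": 1},          # E -> F
}
metabolites = ["A", "B", "C", "D", "E", "F"]  # fixes the row order of S

def build_stoichiometric_matrix(metabolites, reactions):
    """Hypothetical helper: one column per reaction, one row per metabolite."""
    row_of = {met: i for i, met in enumerate(metabolites)}
    S = np.zeros((len(metabolites), len(reactions)))
    for j, coefficients in enumerate(reactions.values()):
        for met, coef in coefficients.items():
            S[row_of[met], j] = coef
    return S

S = build_stoichiometric_matrix(metabolites, reactions)
print(S)
print(f"{np.count_nonzero(S)} of {S.size} entries are non-zero")  # 11 of 30 entries are non-zero
```

Keeping an explicit `metabolites` list is what makes the row index $i$ in $s_{ij}$ well defined; the column order follows the insertion order of the `reactions` dictionary.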

Reactions transform substrates into products at a given rate, or _flux_, typically expressed per unit time per gram of dry weight, most commonly in $\mathrm{mmol\,gDW^{-1}\,h^{-1}}$. The interesting thing about $S$ is that it allows us to model these reaction fluxes. Let $v$ be the flux vector, a vector of real numbers in which each entry $v_j$ is the flux through reaction $j$, and let $x$ be the vector of metabolite concentrations. Then $S$ relates fluxes to changes in concentration through the following equation:

$$\frac{d}{dt}x = Sv,$$

and under steady-state conditions, where the concentration of each metabolite is constant ($\frac{d}{dt}x = 0$), we have:

$$Sv = 0.$$
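
Both equations can be explored numerically on the toy network. In this minimal NumPy sketch, the flux vector with every reaction at unit flux is a hypothetical choice; the resulting $Sv$ is the net production rate of each metabolite. The helper `nullspace` is a name introduced here (`scipy.linalg.null_space` does the same job) and returns a basis of all solutions of $Sv = 0$ from the singular value decomposition. For this closed toy network the basis is empty: $D$ and $F$ each appear in a single reaction, so steady state forces $v_3 = v_5 = 0$, and the remaining balances then force every other flux to zero.

```python
import numpy as np

# Toy stoichiometric matrix: rows A-F, columns R1-R5
S = np.array([
    [-1,  0,  0, -1,  0],  # A
    [ 1, -1,  0,  0,  0],  # B
    [ 1,  1, -1,  0,  0],  # C
    [ 0,  0,  1,  0,  0],  # D
    [ 0,  0,  0,  1, -1],  # E
    [ 0,  0,  0,  0,  1],  # F
], dtype=float)

# Hypothetical flux vector: every reaction carries one unit of flux
v = np.ones(S.shape[1])
dxdt = S @ v  # rate of change of each concentration, dx/dt = S v
print(dxdt)   # values (-2, 0, 1, 1, 0, 1): A is depleted, C, D and F accumulate

def nullspace(S, rtol=1e-10):
    """Hypothetical helper: orthonormal basis (as columns) of {v : S v = 0}."""
    _, sing_vals, vt = np.linalg.svd(S)  # full_matrices=True, so vt is n x n
    tol = rtol * max(S.shape) * (sing_vals.max() if sing_vals.size else 0.0)
    rank = int(np.sum(sing_vals > tol))
    return vt[rank:].T  # right-singular vectors beyond the rank span the nullspace

N = nullspace(S)
print(N.shape)  # (5, 0): only v = 0 satisfies S v = 0 in this closed network
```

Real GEMs avoid this degenerate situation by including exchange reactions that connect the network with its environment.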

The above is typically an underdetermined system of linear equations, because GEMs usually contain more reactions than metabolites. Thus, instead of a single solution, we end up with a whole space of solutions, the _nullspace_ of $S$. However, reaction fluxes are also subject to capacity and thermodynamic constraints. The former limit the maximum flux through each reaction, while the latter restrict the direction of fluxes (an irreversible reaction can only carry flux in one direction). These constraints are typically represented by vectors of lower and upper bounds, $v_{min}$ and $v_{max}$, respectively. Thus, the space of possible solutions is further restricted to the _feasible flux space_ $F$, often called the flux cone (strictly speaking, once finite upper bounds are imposed it is a convex polytope), which is the set of all flux vectors satisfying:

$$F = \{v: Sv = 0, v_{min} \leq v \leq v_{max}\}.$$ 
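
A sketch of the feasible flux space on an extended toy network. To let mass flow through the network at steady state, three hypothetical exchange reactions are added (`EX_A` supplying $A$, `EX_D` and `EX_F` removing $D$ and $F$); they are illustrative and not part of the figure. The bounds are assumptions too: every reaction is treated as irreversible ($v_{min} = 0$), the uptake of $A$ is capped at an arbitrary 10 units and every other flux at 1000. The helper `in_feasible_space` is hypothetical; the second part uses `scipy.optimize.linprog` to find the largest secretion flux of metabolite $F$ compatible with all constraints, which is the kind of linear program flux balance analysis solves.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network plus hypothetical exchange reactions
# columns: R1-R5, EX_A (uptake of A), EX_D, EX_F (secretion of D and F)
S_open = np.array([
    [-1,  0,  0, -1,  0,  1,  0,  0],  # A
    [ 1, -1,  0,  0,  0,  0,  0,  0],  # B
    [ 1,  1, -1,  0,  0,  0,  0,  0],  # C
    [ 0,  0,  1,  0,  0,  0, -1,  0],  # D
    [ 0,  0,  0,  1, -1,  0,  0,  0],  # E
    [ 0,  0,  0,  0,  1,  0,  0, -1],  # F
], dtype=float)

# Assumed bounds: all reactions irreversible, uptake of A capped at 10
v_min = np.zeros(S_open.shape[1])
v_max = np.full(S_open.shape[1], 1000.0)
v_max[5] = 10.0  # EX_A

def in_feasible_space(S, v, v_min, v_max, tol=1e-9):
    """Hypothetical helper: True if S v = 0 and v_min <= v <= v_max (up to tol)."""
    steady = np.allclose(S @ v, 0.0, atol=tol)
    within_bounds = np.all(v >= v_min - tol) and np.all(v <= v_max + tol)
    return bool(steady and within_bounds)

# Two steady-state routes: A -> E -> F, and A -> B + C, B -> C, 2 C -> 2 D
route_via_E = np.array([0, 0, 0, 1, 1, 1, 0, 1], dtype=float)
route_via_C = np.array([1, 1, 2, 0, 0, 1, 2, 0], dtype=float)

v = 5 * route_via_E + 5 * route_via_C  # uses exactly the 10 units of A allowed
print(in_feasible_space(S_open, v, v_min, v_max))      # True
print(in_feasible_space(S_open, 2 * v, v_min, v_max))  # False: EX_A = 20 > 10

# Maximize secretion of F over the feasible space (linprog minimizes, so negate)
c = np.zeros(S_open.shape[1])
c[7] = -1.0  # EX_F
res = linprog(c, A_eq=S_open, b_eq=np.zeros(S_open.shape[0]),
              bounds=list(zip(v_min, v_max)), method="highs")
print(res.status, -res.fun)  # status 0 means an optimum was found
```

In this toy setting the optimum sends all 10 units of $A$ through the $E$ route, since it is the only route that ends in $F$.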


# Genome-scale models in cobrapy

`cobrapy` (COnstraints-Based Reconstruction and Analysis for Python) is a Python package for constraint-based modeling of metabolic networks. It provides a set of tools for the analysis of GEMs, including reading and writing models in the Systems Biology Markup Language (SBML) format and performing flux balance analysis (FBA) and flux variability analysis (FVA). In this tutorial, we will use `cobrapy` to analyze a GEM of the marine diatom _Thalassiosira pseudonana_.

```python
import numpy as np
import cobra
```

```python
model = cobra.io.read_sbml_model("../data/models/iTps1432_high.xml")
model
```

| Attribute | Value |
|---|---|
| Name | iTps1432 |
| Memory address | 7fcad64e1490 |
| Number of metabolites | 2792 |
| Number of reactions | 6079 |
| Number of genes | 1432 |
| Number of groups | 93 |
| Objective expression | `4.0*DM_biomass_c - 4.0*DM_biomass_c_reverse_c49c1` |
| Compartments | plastid, mitochondria, cytosol, peroxisome, thylakoid membrane, extracellular |


In the summary above, we see that the model contains over 6000 biochemical reactions and more than 2700 metabolites, which are partitioned into 6 compartments: plastid, mitochondria, cytosol, peroxisome, thylakoid membrane and extracellular space. Note that _Thalassiosira_ is a phototrophic eukaryote (a microalga), thus it contains chloroplasts alongside mitochondria. A special reaction within the model is the _biomass_ reaction, a pseudo-reaction that represents the growth of the organism (more on this later), and which is set as the objective function of the model.

We can extract the stoichiometric matrix of the model as follows:

```python
S = cobra.util.create_stoichiometric_matrix(model)
print(f"The shape of S is {S.shape}")
print(f"S contains {100 * np.count_nonzero(S) / S.size}% non-zero elements")
print(" ")
print(S)
```

```
The shape of S is (2792, 6079)
S contains 0.1577486683217295% non-zero elements

[[ 0.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.          0.          0.         ...  0.          0.
   0.        ]
 ...
 [ 0.          0.          0.         ... -0.04010769  0.
   0.        ]
 [ 0.          0.          0.         ...  1.          0.
   0.        ]
 [ 0.          0.          0.         ... -0.01311877  0.
   1.        ]]
```


As expected, the number of rows of the stoichiometric matrix $S$ coincides with the number of metabolites in the model, while the number of columns is the number of reactions. Moreover, only about 0.16% of its entries (stoichiometric coefficients) are non-zero, roughly 26,800 out of nearly 17 million, so $S$ is indeed very sparse: each metabolite participates in only a tiny fraction of the reactions.
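
For matrices this sparse, a sparse storage format keeps only the non-zero coefficients. The sketch below uses `scipy.sparse` on the toy matrix from earlier (the same calls would apply to a GEM-sized array). Note that `nnz / size` is already a fraction, so it is multiplied by 100 exactly once to obtain a percentage. Counting non-zeros per row also gives the number of reactions each metabolite takes part in.

```python
import numpy as np
from scipy import sparse

# Toy stoichiometric matrix: rows A-F, columns R1-R5
S = np.array([
    [-1,  0,  0, -1,  0],
    [ 1, -1,  0,  0,  0],
    [ 1,  1, -1,  0,  0],
    [ 0,  0,  1,  0,  0],
    [ 0,  0,  0,  1, -1],
    [ 0,  0,  0,  0,  1],
], dtype=float)

# Compressed sparse row (CSR) storage keeps only the non-zero coefficients
S_sparse = sparse.csr_matrix(S)
density = S_sparse.nnz / (S.shape[0] * S.shape[1])  # a fraction, not a percentage
print(f"{S_sparse.nnz} non-zeros, {100 * density:.1f}% of entries")  # 11 non-zeros, 36.7% of entries

# Non-zeros per row (from the CSR row pointer) = reactions per metabolite
reactions_per_metabolite = np.diff(S_sparse.indptr)
print(dict(zip("ABCDEF", reactions_per_metabolite.tolist())))  # {'A': 2, 'B': 2, 'C': 3, 'D': 1, 'E': 2, 'F': 1}
```

`cobra.util.create_stoichiometric_matrix` also appears to accept an `array_type` argument for sparse output; check the cobrapy API reference for the options supported by your installed version.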

## Flux balance analysis

## Analyzing the flux cone: flux variability analysis and flux sampling