<h1 align="center" style="font-variant: small-caps"><strong>GBApy</strong> Tutorial: How to build a genome-scale GBA model</h1>
<h2 align="center">PART 4: Build a first version of the GBA model</h2>
<h5 align="center"><code>Version 4</code> (January 2025, Charles Rocabert)</h5>

## Table of contents

- [1) Objective](#objective)
- [2) Load the model builder](#load)
- [3) Check and adjust mass balance for every reaction](#mass_balance)
- [4) Convert model units](#convert)
- [5) Build the GBA variables](#build_GBA)
- [6) Create external conditions](#conditions)
- [7) Add activator metabolites](#activators)
- [8) Make water transport constant](#water)
- [9) Enforce the directionality of reactions based on the FBA solution](#direction)
- [10) Export the GBA model](#export)
- [11) Test for an initial solution](#initial_solution)

## 1) Objective <a id="objective"></a>

Now that we have checked the structure of the model and made a first round of simplifications, we can convert it into a GBA model. <strong>GBApy</strong> provides automated methods for this task. The conversion steps are fully described in Dourado et al. (2023).
See also [section 4](#convert).


## 2) Load the model builder <a id="load"></a>

The model builder "GBA_MMSYN" is loaded from the backup created in the previous tutorial.
We also re-load the original SBML model of a minimal cell.

In [1]:
import os
import sys
import cobra

sys.path.append('./src/gba/')

from GbaBuilder import *
from GbaModel import *

builder    = load_GBA_builder("./backup/mmsyn_build_3.gbabuild")
sbml_model = cobra.io.read_sbml_model("./data/manual_curation/MMSYN_annotated.xml")



## 3) Check and adjust mass balance for every reaction <a id="mass_balance"></a>

Before converting the model, we need to check the mass balance of each reaction, <em>i.e.</em> that the total mass of the reactants equals the total mass of the products. By default, <strong>GBApy</strong> uses a threshold of <strong>0.1 Da</strong>, which is about one tenth of a proton mass.

In [2]:
builder.check_mass_balance(verbose=True)



We must correct the molecular mass of a few metabolites to reach a mass balance:

In [3]:
# Amino-acid and tRNA metabolism
builder.metabolites["x_AA"].mass   += 17.007345999998506
builder.metabolites["AA"].mass     += 17.007345999998506
builder.metabolites["tRNA"].mass   -= 94.58300315200177
builder.metabolites["AAtRNA"].mass -= 94.58300315200177

# Macro-molecules
builder.metabolites["rRNA"].mass         -= 1546.343817033805
builder.metabolites["dUTPase"].mass      += 2600.639
builder.metabolites["apoACP"].mass       += 1752.813
builder.metabolites["ACP"].mass          += 1752.813
builder.metabolites["ACP_R"].mass        += 1757.909
builder.metabolites["PdhC"].mass         += 12940.79
builder.metabolites["lpl_PdhC"].mass     += 12934.08
builder.metabolites["acdhlpl_PdhC"].mass += 12934.08
builder.metabolites["dhlpl_PdhC"].mass   += 12934.08
builder.metabolites["Protein"].mass      += 8932.936

# Small molecules
builder.metabolites["x_5fthf"].mass  += 292.2431019999999
builder.metabolites["5fthf"].mass    += 292.2431019999999
builder.metabolites["fmettrna"].mass -= 96.15291
builder.metabolites["lgltn"].mass    += 7.5569999999988795
builder.metabolites["cdpdag"].mass   += 0.13765799800012246
builder.metabolites["dnad"].mass     += 0.31494099599990477
builder.metabolites["thfglu3"].mass  += 30.026026999999885

builder.check_mass_balance(verbose=True)

> Info: Model build 'GBA_MMSYN' is mass balanced


The model is now mass-balanced (with a threshold of 0.1 Da).
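
As an illustration of what such a check involves, here is a minimal standalone sketch. It is not GBApy's implementation: the function names, the toy reaction and the masses are hypothetical, and only the 0.1 Da threshold comes from the text above.

```python
# Toy mass-balance check (illustrative only; not GBApy's implementation).

def mass_imbalance(stoichiometry, masses):
    """Return the signed mass difference (products minus reactants), in Da."""
    return sum(coef * masses[met] for met, coef in stoichiometry.items())

def is_mass_balanced(stoichiometry, masses, threshold=0.1):
    """A reaction is considered balanced when |imbalance| <= threshold."""
    return abs(mass_imbalance(stoichiometry, masses)) <= threshold

# Hypothetical reaction A + B -> C with made-up molecular masses (Da).
masses = {"A": 180.0, "B": 18.0, "C": 197.95}
reaction = {"A": -1.0, "B": -1.0, "C": 1.0}

print("Imbalance (Da):", mass_imbalance(reaction, masses))
print("Balanced:", is_mass_balanced(reaction, masses))
```

In this toy case the residual is about 0.05 Da in magnitude, so the reaction passes the 0.1 Da threshold. A larger residual is what the manual mass corrections in the cell above are meant to remove.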

## 4) Convert model units <a id="convert"></a>

<strong>GBApy</strong> provides automated methods to convert a model into mass units (this implies the conversion of stoichiometric coefficients and kinetic parameters; see Dourado et al. 2023 for a description).

- <strong>(i)</strong> The stoichiometry of each reaction is converted. For each metabolite $i$ and reaction $j$:
  - Each stoichiometric coefficient is multiplied by the mass of the metabolite $m_i$,
    $$S_{i,j} = S_{i,j} \times m_i$$
  - The calculated mass stoichiometry is then normalized (the sum of reactant coefficients is equal to -1, the sum of product coefficients is equal to 1),
    $$
    M_{i,j} =
    \left\{
    \begin{array}{rl}
    \dfrac{S_{i,j}}{\sum_{i \in \text{reactants}} \lvert S_{i,j} \rvert} & \text{ if $i$ reactant}\\\\
    \dfrac{S_{i,j}}{\sum_{i \in \text{products}} S_{i,j}} & \text{ if $i$ product}
    \end{array}
    \right.
    $$
  - At each step, <strong>GBApy</strong> checks the calculations.

- <strong>(ii)</strong> $K_\text{M}$ values are converted to mass units:
  - Each $K_\text{M}$ value is multiplied by the mass of the metabolite
    $$K_\text{M,i} = K_\text{M,i} \times m_i \text{ for each metabolite }i$$

- <strong>(iii)</strong> $k_\text{cat}$ values are converted to mass units:
  - Each $k_\text{cat}$ value is multiplied by the sum of the product masses weighted by their stoichiometric coefficients, and divided by the mass of the enzyme $e_j$. For the forward $k_\text{cat,j}^f$ of reaction $j$:
    $$k_\text{cat,j}^f = k_\text{cat,j}^f \times \dfrac{\sum_{i \in \text{products}} S_{i,j} \times m_i}{e_j}$$
  - And for the backward $k_\text{cat,j}^b$ (reactant coefficients taken in absolute value):
    $$k_\text{cat,j}^b = k_\text{cat,j}^b \times \dfrac{\sum_{i \in \text{reactants}} \lvert S_{i,j} \rvert \times m_i}{e_j}$$

- <strong>(iv)</strong> We also need to parametrize the mass $k_\text{cat}$ of the ribosomal reaction `Ribosome`. By default, this value is set to $4.55\ h^{-1}$. 
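
Step (i) can be illustrated on a single reaction with the following sketch. It is a standalone toy example, not the code run by the builder: the function name, the reaction A + B → C and the masses are assumptions made for illustration.

```python
# Toy illustration of step (i): molar stoichiometry -> normalized mass fractions.
# Not GBApy code; names and numbers are hypothetical.

def mass_fraction_column(stoichiometry, masses):
    """Convert one reaction's molar coefficients into normalized mass fractions."""
    # Multiply each coefficient by the mass of its metabolite.
    mass_stoich = {met: coef * masses[met]
                   for met, coef in stoichiometry.items() if coef != 0}
    # Normalize reactants and products separately.
    reactant_total = sum(-v for v in mass_stoich.values() if v < 0)
    product_total = sum(v for v in mass_stoich.values() if v > 0)
    return {met: (v / reactant_total if v < 0 else v / product_total)
            for met, v in mass_stoich.items()}

# Hypothetical reaction A + B -> C with made-up masses (Da).
masses = {"A": 100.0, "B": 50.0, "C": 150.0}
column = mass_fraction_column({"A": -1.0, "B": -1.0, "C": 1.0}, masses)

for met, value in column.items():
    print(f"{met}: {value:+.4f}")
```

After normalization, the reactant entries sum to -1 and the product entries sum to 1, which is the property stated in the list above.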

In [4]:
builder.convert(ribosome_byproducts=False)
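
The kinetic conversions (ii) and (iii) can be sketched in the same spirit. This is a hedged illustration under stated assumptions: the helper names, the enzyme mass and the parameter values are hypothetical, and the reactant masses are taken in absolute value for the backward direction.

```python
# Toy illustration of steps (ii) and (iii); not GBApy code.

def km_to_mass_units(km, metabolite_mass):
    """Step (ii): a K_M value is multiplied by the mass of its metabolite."""
    return km * metabolite_mass

def kcat_to_mass_units(kcat, stoichiometry, masses, enzyme_mass, forward=True):
    """Step (iii): scale k_cat by the product (forward) or reactant (backward)
    mass per reaction event, divided by the enzyme mass."""
    if forward:
        side_mass = sum(c * masses[m] for m, c in stoichiometry.items() if c > 0)
    else:
        side_mass = sum(-c * masses[m] for m, c in stoichiometry.items() if c < 0)
    return kcat * side_mass / enzyme_mass

# Hypothetical reaction A + B -> C, enzyme of 30,000 Da (made-up values).
masses = {"A": 100.0, "B": 50.0, "C": 150.0}
stoich = {"A": -1.0, "B": -1.0, "C": 1.0}

km_A_mass = km_to_mass_units(0.002, masses["A"])
kcat_f_mass = kcat_to_mass_units(100.0, stoich, masses, 30000.0, forward=True)
kcat_b_mass = kcat_to_mass_units(10.0, stoich, masses, 30000.0, forward=False)

print(km_A_mass, kcat_f_mass, kcat_b_mass)
```

Because the toy reaction is mass balanced, the forward and backward scaling factors are identical (150 / 30000 here).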

## 5) Build the GBA variables <a id="build_GBA"></a>

A GBA model is a set of matrices and vectors:
- The mass fraction matrix $M$, which is the counterpart, in normalized mass units, of the stoichiometric matrix,
- The forward and backward $k_\text{cat}$ vectors,
- The forward and backward $K_\text{M}$ matrices,
- The activation matrix $K_\text{A}$,
- The inhibition matrix $K_\text{I}$.

All these objects are created through the method `builder.build_GBA_variables()`.

In [5]:
builder.build_GBA_variables()

## 6) Create external conditions <a id="conditions"></a>

To run a GBA model, it is necessary to define the external conditions, that is, the concentration of external metabolites. We also need to define the total density of the cell $\rho$.

As stated in Breuer et al. (2019), we will assume that all concentrations are saturating, except for glucose, which is the only sugar source in the medium:
- In the SP4 medium, 5 g/L of glucose are added. However, the broth also contains glucose, leading to an additional amount of glucose that must be quantified. For our modeling approach, we will create a range of glucose concentrations between 0 and 6 g/L.
- As water is included in our model, we will assume a concentration of 1,000 g/L for water.
- We will also assume a pH of 7, that is, a concentration of $10^{-7}$ g/L of protons.
- All other external metabolites are set to the saturating concentration of 1 g/L.

Regarding the total density, as water is included in the model, we must define the buoyant density. This value is unknown for the minimal cell. We will use the mean buoyant density of <em>E. coli</em>, 1,100 g/L.



In [6]:
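glucose   = 6.0     # highest glucose concentration (g/L)
step      = 0.1     # decrement between two conditions (g/L)
condition = 1       # identifier of the first condition
rho       = 1100.0  # buoyant density (g/L)
builder.clear_conditions()
# Stop at step/2 so that floating-point drift does not affect the last condition
while glucose > step/2:
    builder.add_condition(condition_id=condition, rho=rho, default_concentration=1.0,
                          metabolites={"x_glc__D": glucose, "x_h2o": 1000.0, "x_h": 1e-7})
    glucose -= step
    condition += 1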
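
For reference, the same grid of conditions can be sketched without the builder. This is a standalone illustration: the helper `glucose_grid` is hypothetical, and it uses an integer index rather than repeated subtraction so that the values are free of accumulated floating-point error.

```python
# Standalone sketch of the glucose grid (hypothetical helper, not part of GBApy).

def glucose_grid(start=6.0, step=0.1):
    """Return glucose concentrations from `start` down to `step`, in g/L."""
    n = round(start / step)
    return [round((n - k) * step, 10) for k in range(n)]

# One dictionary of external concentrations per condition, keyed by condition id.
conditions = {
    cid: {"x_glc__D": glc, "x_h2o": 1000.0, "x_h": 1e-7}
    for cid, glc in enumerate(glucose_grid(), start=1)
}

print(len(conditions), "conditions")
print("first:", conditions[1]["x_glc__D"], "last:", conditions[len(conditions)]["x_glc__D"])
```

With the default arguments this yields 60 conditions, from 6.0 g/L (condition 1) down to 0.1 g/L (condition 60). A concentration of exactly 0 g/L is not included, as in the loop above.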

## 7) Add activator metabolites <a id="activators"></a>

To make the model more realistic, we will designate specific metabolites as activators of important reactions in the model:
- DNA is an activator of its own production, as well as an activator of the transcription of RNA molecules (<em>i.e.</em> rRNAs and tRNAs).
- rRNA is an activator of the translation of proteins through the ribosomal reaction `Ribosome`.

According to Breuer et al. (2019), the mass fractions of these metabolites are:
- DNA: 5.5% of dry mass,
- rRNA: 16.274% of dry mass.

As the buoyant density of the cell is $\rho = 1100$ g/L, and minimal cells contain 72% water, the expected mass concentrations of DNA and rRNA in the cell are:
- DNA = $\rho \times (1-0.72) \times 0.055 = 16.94$ g/L,
- rRNA = $\rho \times (1-0.72) \times 0.16274 = 50.12392$ g/L.

We can therefore set the $K_\text{A}$ constants with these mass concentrations, as an approximation:

In [7]:
# DNA is a self-activator
builder.add_activation_constant("DNA", "DNAP", 16.94)

# DNA activates the transcription of rRNAs
builder.add_activation_constant("DNA", "rRNAP", 16.94)

# DNA activates the transcription of tRNA
#builder.add_activation_constant("DNA", "tRNAP", 16.94)

# rRNAs activate the translation of proteins
builder.add_activation_constant("rRNA", "Ribosome", 50.12392)
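
The two constants can be reproduced with a few lines of arithmetic. The variable names below are illustrative; the numerical inputs are the ones quoted from Breuer et al. (2019) in this section.

```python
# Reproduce the K_A values from the buoyant density and the dry-mass fractions.
rho = 1100.0             # buoyant density (g/L)
water_fraction = 0.72    # water content of the cell
dry_mass_fraction = {"DNA": 0.055, "rRNA": 0.16274}

dry_density = rho * (1.0 - water_fraction)   # dry mass per unit volume (g/L)
ka = {name: dry_density * frac for name, frac in dry_mass_fraction.items()}

for name, value in ka.items():
    print(f"K_A({name}) = {value:.5f} g/L")
```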

## 8) Make water transport constant <a id="water"></a>

According to Breuer et al. (2019), minimal cells contain 72% water. We therefore make water transport (reaction `H2Ot`) constant, so that this water content is maintained. As the buoyant density is $\rho = 1100$ g/L, the influx of water should correspond to:

$$f_{H_2O} = \rho \times 0.72 = 792 \text{ g/L}$$

In [8]:
builder.define_constant_reaction("H2Ot", 0.72)

## 9) Enforce the directionality of reactions based on the FBA solution <a id="direction"></a>

To make it easier to find an initial solution, it is usually better to define irreversible reactions (where fluxes are non-negative). A majority of JCVI-syn3A reactions are forward irreversible (<strong>62.99%</strong>), only one is backward irreversible, and <strong>36.36%</strong> of reactions are reversible:

In [9]:
builder.summary()

| Category | Count |
|---|---|
| Proteins | 460 |
| Metabolites | 156 |
| Reactions | 154 |

| Category | Count | Percentage |
|---|---|---|
| Small molecules | 133 | 85.26% |
| Macro-molecules | 19 | 12.18% |
| DNA(s) | 1 | 0.64% |
| RNA(s) | 3 | 1.92% |
| Proteins | 0 | 0.00% |
| Unknown | 0 | 0.00% |

| Category | Count | Percentage |
|---|---|---|
| Metabolic | 119 | 77.27% |
| Transport | 35 | 22.73% |
| Exchange | 0 | 0.00% |

| Category | Count | Percentage |
|---|---|---|
| Forward | 97 | 62.99% |
| Backward | 1 | 0.65% |
| Reversible | 56 | 36.36% |


To define the direction of reversible reactions, we will use the flux directions from the FBA solution. The backward reaction will also be converted into a forward reaction:

In [10]:
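# Solve the FBA problem on the original SBML model (kept for reference only)
sbml_sol = sbml_model.optimize()
# Use the FBA solution stored in the builder to orient the reactions
sol = builder.FBA_solution
builder.enforce_directionality(sol.fluxes)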
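
To make the idea concrete, here is a toy sketch of orienting reactions from a flux vector. It is an assumption about the general principle, not a description of what `enforce_directionality` does internally: the data layout, the function `orient_reactions`, and the choice of setting the backward $k_\text{cat}$ to zero are all hypothetical.

```python
# Toy sketch: orient reactions according to the sign of their FBA flux.
# Hypothetical data layout; not GBApy's internal representation.

def orient_reactions(reactions, fluxes):
    """Return forward-irreversible copies of `reactions`.

    A reaction carrying a negative flux is flipped: its stoichiometry changes
    sign and its forward/backward k_cat values are swapped.
    """
    oriented = {}
    for rid, rxn in reactions.items():
        stoich = dict(rxn["stoich"])
        kcat_f, kcat_b = rxn["kcat_f"], rxn["kcat_b"]
        if fluxes.get(rid, 0.0) < 0.0:
            stoich = {met: -coef for met, coef in stoich.items()}
            kcat_f, kcat_b = kcat_b, kcat_f
        # Assumption: the backward direction is disabled once oriented.
        oriented[rid] = {"stoich": stoich, "kcat_f": kcat_f, "kcat_b": 0.0}
    return oriented

# Two made-up reversible reactions and a made-up flux vector.
reactions = {
    "R1": {"stoich": {"A": -1.0, "B": 1.0}, "kcat_f": 10.0, "kcat_b": 2.0},
    "R2": {"stoich": {"B": -1.0, "C": 1.0}, "kcat_f": 5.0, "kcat_b": 8.0},
}
fluxes = {"R1": 1.5, "R2": -0.7}

oriented = orient_reactions(reactions, fluxes)
print(oriented["R2"])
```

In this example `R1` is left unchanged apart from losing its backward rate, while `R2` is rewritten as C → B with its former backward $k_\text{cat}$ as the new forward one.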

## 10) Export the GBA model <a id="export"></a>

We will now save the builder, and export the GBA model we just built in the standard format (a set of CSV files).

In [11]:
backup_GBA_builder(builder, "mmsyn_build_4", "./backup")
builder.export_GBA_model("./models/")

## 11) Test for an initial solution <a id="initial_solution"></a>

To test for an initial solution, we will load the freshly created GBA model as a `GbaModel` object.
This class is optimized for GBA algorithms, and is separated from the GBA builder (class `GbaBuilder`) for practical reasons.

First, we load the GBA model from the CSV files:

In [12]:
model = GbaModel(builder.name)
model.read_from_csv(path="./models/")

Then, we look for an initial solution. This is done in three steps:
- 1) A linear program is solved to generate an initial flux fraction vector `GbaModel.LP_solution`,
- 2) The state of the model is calculated from this solution,
- 3) The consistency of the model is checked (mainly, positivity of the main variables and a total density of 1). If the model is consistent, we save the initial solution.

#### Note
The RHS factor (`rhs_factor`) is a weighting factor used to adjust the right-hand side of the linear problem. One can adjust its value to find a solution (usually, a smaller RHS factor increases the chance of finding a consistent solution). Here, we choose `rhs_factor = 6` after testing different values by hand.

In [13]:
solved = model.solve_local_linear_problem(rhs_factor=6)
if solved:
    model.set_condition("1")
    model.set_f0(model.LP_solution)
    model.calculate_state()
    model.check_model_consistency()
    if model.consistent:
        print("> Model is consistent with mu =", model.mu)
        model.save_f0(path="./models/")
    else:
        print("> Model is inconsistent")
else:
    print("> Impossible to find an initial solution")

> Model is consistent with mu = 0.00035896830087001346
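
The manual search over `rhs_factor` values mentioned in the note could be automated along the following lines. This is a hedged sketch: the helper `find_rhs_factor` is hypothetical, and it only chains the `GbaModel` calls already used in the cell above, assuming they behave as shown there.

```python
# Hypothetical helper: try several rhs_factor values and keep the first one
# that yields a consistent initial solution. Uses only the calls shown above.

def find_rhs_factor(model, factors, condition="1"):
    """Return the first factor giving a consistent model, or None if none does."""
    for factor in factors:
        # Skip factors for which the linear problem has no solution.
        if not model.solve_local_linear_problem(rhs_factor=factor):
            continue
        model.set_condition(condition)
        model.set_f0(model.LP_solution)
        model.calculate_state()
        model.check_model_consistency()
        if model.consistent:
            return factor
    return None
```

A call such as `find_rhs_factor(model, [10, 8, 6, 4, 2])` would then return the first value in the list that leads to a consistent state, leaving `model` in that state so that `model.save_f0(path="./models/")` can be called afterwards.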



We see that our linear routine finds a consistent initial solution, with a very low growth rate ($\mu \approx 3.6 \times 10^{-4}$).