<h1 align="center" style="font-variant: small-caps"><strong>GBApy</strong> Tutorial: How to build a genome-scale GBA model</h1>
<h2 align="center">PART 5: Model size reduction</h2>

## Table of contents

- [1) Objective](#objective)
- [2) Load the model builder](#load)

## 1) Objective <a id="objective"></a>

In the previous tutorial (part 4), we finalized a near-complete version of the minimal cell and obtained what appears to be a correct initial solution. However, such a model cannot be optimized easily, because its mass fraction matrix is not full column rank (see below).
It has been shown that when this matrix is full column rank, the optimization problem is expected to be convex. This is usually the case when the matrix represents a single elementary flux mode (EFM).
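
As a minimal illustration on a toy matrix (not the MMSYN model), a matrix loses full column rank as soon as one column is a linear combination of the others, for instance when two reactions are exact duplicates. The sketch below uses `np.linalg.matrix_rank`, the same NumPy call used later in this tutorial. The helper name `column_rank_deficit` and the matrix values are illustrative assumptions.

```python
import numpy as np

def column_rank_deficit(M):
    """Return the number of columns minus the numerical rank of M (0 = full column rank)."""
    M = np.asarray(M, dtype=float)
    return M.shape[1] - np.linalg.matrix_rank(M)

# Hypothetical 4x3 matrix with linearly independent columns
M_full = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

# Same size, but the third column duplicates the first one ("duplicate reaction")
M_dup = np.column_stack([M_full[:, 0], M_full[:, 1], M_full[:, 0]])

print(column_rank_deficit(M_full))  # 0
print(column_rank_deficit(M_dup))   # 1
```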

In this tutorial, we will therefore reduce the model to obtain a full column rank and consistent GBA model. Many approaches can be used to obtain an accurate EFM:
- Standard FBA optimization,
- More elaborate linear approaches, such as Gecko or resource balance analysis (RBA).

As a first approach, we will use our reconstructed FBA model, with the original biomass reaction, to generate an EFM.
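
Whether a given flux distribution (for example one returned by FBA) is elementary can be checked with a standard rank test: a steady-state flux vector v (S v = 0) is an EFM if and only if the stoichiometric submatrix restricted to its active reactions has rank equal to the number of active reactions minus one. The sketch below applies this test with NumPy to a hypothetical four-reaction toy network; `is_elementary` is not a GBApy function.

```python
import numpy as np

def is_elementary(S, v, tol=1e-9):
    """Rank test: v is elementary iff rank(S[:, supp(v)]) == |supp(v)| - 1."""
    S = np.asarray(S, dtype=float)
    v = np.asarray(v, dtype=float)
    if np.linalg.norm(S @ v) > tol:
        raise ValueError("v is not a steady-state flux vector (S v != 0)")
    support = np.flatnonzero(np.abs(v) > tol)  # indices of active reactions
    if support.size == 0:
        return False
    return bool(np.linalg.matrix_rank(S[:, support]) == support.size - 1)

# Hypothetical toy network; rows = metabolites A, B
# columns = R1 (-> A), R2 (A -> B), R3 (A -> B, parallel route), R4 (B ->)
S = np.array([[1.0, -1.0, -1.0,  0.0],
              [0.0,  1.0,  1.0, -1.0]])

v_single = np.array([1.0, 1.0, 0.0, 1.0])  # uses only R2 between A and B
v_mixed  = np.array([2.0, 1.0, 1.0, 2.0])  # splits the flux between R2 and R3

print(is_elementary(S, v_single))  # True
print(is_elementary(S, v_mixed))   # False
```

The mixed vector is the sum of two EFMs (one through R2, one through R3), which is exactly the kind of redundancy the reduction below aims to remove.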

## 2) Load the model builder <a id="load"></a>

The model builder "GBA_MMSYN" is loaded from the backup created in the previous tutorial, together with the manually curated SBML model.

In [1]:
import os
import sys
import cobra
import numpy as np

# Make the GBApy sources importable
sys.path.append('./src/gba/')

from GbaBuilder import *
from GbaModel import *

# Load the builder saved at the end of part 4, and the curated SBML model
builder    = load_GBA_builder("./backup/mmsyn_build_4.gbabuild")
sbml_model = cobra.io.read_sbml_model("./data/manual_curation/MMSYN_annotated.xml")

Set parameter Username
Academic license - for non-commercial use only - expires 2025-06-06
Read LP format model from file /var/folders/5h/n0xqh1xj2zg6yyd747k95bc80000gn/T/tmp3c8n6dt1.lp
Reading time = 0.00 seconds
: 156 rows, 376 columns, 1558 nonzeros


## 3) Create a routine to check the model structure, the FBA growth rate and the column rank of the mass fraction matrix

In [2]:
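def check_compression(builder):
    """Check the model structure, the FBA growth rate and the column rank of the
    internal mass fraction matrix, then list the inactive non-exchange reactions."""
    # Check the structure and rebuild the FBA and GBA representations
    builder.check_model(test_structure=True)
    builder.build_FBA_model()
    builder.convert(ribosome_byproducts=False)
    builder.build_GBA_variables()
    # FBA growth rate and column rank of the internal mass fraction matrix
    growth = builder.FBA_solution.objective_value
    rank   = np.linalg.matrix_rank(builder.GBA_intM)
    nbcol  = builder.GBA_intM.shape[1]
    print("> Growth rate: ", growth)
    print("> Rank: ", rank, "/", nbcol, "(diff = %d)" % (nbcol-rank))
    print("(M shape: %s)" % str(builder.GBA_intM.shape))
    # Non-exchange reactions with zero FBA flux are candidates for removal
    sol       = builder.FBA_solution.fluxes.to_dict()
    to_remove = [r_id for r_id in sol if sol[r_id] == 0.0 and not r_id.startswith("EX_")]
    print(">", len(to_remove), "reactions to remove")
    print(to_remove)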
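
The same summary can be reproduced outside GBApy on plain NumPy and dictionary data, which is convenient for testing. The sketch below also replaces the exact `== 0.0` test of `check_compression` with a small tolerance, since LP solvers may return tiny nonzero values. The function name, the tolerance value and the toy data are illustrative assumptions.

```python
import numpy as np

def summarize_reduction(M, fluxes, tol=1e-12, exchange_prefix="EX_"):
    """Return the column rank deficit of M and the inactive non-exchange reactions.

    M      : 2D array (e.g. an internal mass fraction matrix)
    fluxes : dict mapping reaction IDs to flux values
    """
    M = np.asarray(M, dtype=float)
    rank_deficit = M.shape[1] - np.linalg.matrix_rank(M)
    inactive = [r_id for r_id, flux in fluxes.items()
                if abs(flux) <= tol and not r_id.startswith(exchange_prefix)]
    return int(rank_deficit), inactive

# Hypothetical toy data: 3 columns of rank 2, one exchange and three internal reactions
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
fluxes = {"EX_glc__D": 0.0, "R_active": 2.0, "R_tiny": 1e-15, "R_zero": 0.0}

deficit, inactive = summarize_reduction(M, fluxes)
print(deficit)   # 1
print(inactive)  # ['R_tiny', 'R_zero']
```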

## 4) Detect inactive reactions in the FBA solution

In [3]:
sol       = builder.FBA_solution.fluxes.to_dict()
to_remove = [r_id for r_id in sol if sol[r_id] == 0.0 and not r_id.startswith("EX_")]
print(">", len(to_remove), "reactions to remove")
to_remove

> 40 reactions to remove


['ACALDt',
 'ADEt2',
 'ADNabc',
 'CTPDP',
 'CTPSDUMP',
 'DCDPMP',
 'DCMPDA',
 'DCTPDP',
 'DCTPMP',
 'DRPA',
 'DURIK1',
 'DURIabc',
 'DUTPDP',
 'GAPDP',
 'GUAt2',
 'LDH_L',
 'LIPTA',
 'L_LACt2r',
 'NADHK',
 'NAt3',
 'NTD1',
 'NTD5',
 'NTD6',
 'NTD8',
 'NTD9',
 'PDH_acald',
 'PGK3',
 'PGK4',
 'PPM2',
 'PUNP1',
 'PYK',
 'PYK9',
 'PdhC_transl',
 'RNDR1',
 'RNDR2',
 'RNDR3',
 'RNDR4',
 'rRNAP',
 'rRNA_degrad',
 'tRNAP']

We see that we should, in principle, remove 40 reactions to reach an EFM. However, we must proceed carefully, because some reactions that are inactive in FBA are production reactions and cannot be removed from the GBA model (<em>e.g.</em> RNA and tRNA production reactions).

There are thus some inconsistencies here. To get a more realistic estimate, we will enforce the production of trdox and RNA molecules by imposing a small positive lower bound on the corresponding FBA fluxes, and recalculate the solution. We will then proceed pathway by pathway, using the map provided in Breuer et al. (2019).
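
Why a tiny lower bound is enough to activate a reaction can be seen on a toy LP solved with SciPy's `linprog` (HiGHS method), assuming SciPy is installed. The one-metabolite network and reaction names below are hypothetical: maximizing a biomass flux leaves a competing production reaction at zero, whereas a lower bound of 1e-8 (the `min_flux` value used in the next cell) forces it to carry flux.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network with one metabolite A:
#   v_up: -> A (uptake), v_bio: A -> biomass, v_prod: A -> product
S = np.array([[1.0, -1.0, -1.0]])
c = np.array([0.0, -1.0, 0.0])  # linprog minimizes, so we minimize -v_bio

def solve_fba(prod_lb=0.0):
    """Maximize v_bio subject to S v = 0 and simple flux bounds."""
    bounds = [(0.0, 10.0),      # uptake capacity
              (0.0, None),      # biomass
              (prod_lb, None)]  # production reaction
    res = linprog(c, A_eq=S, b_eq=np.zeros(1), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x

v_free   = solve_fba()              # production reaction stays at zero
v_forced = solve_fba(prod_lb=1e-8)  # production reaction is forced to be active
print(v_free)
print(v_forced)
```

The cost of the forced flux on growth is negligible here (about 1e-8), which is why such a bound can reveal active pathways without noticeably changing the optimum.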

In [4]:
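# Production reactions that must remain active in the GBA model
production_reactions = ['tRNAP',
                        'rRNAP',
                        'PdhC_transl',
                        'dUTPase_transl']
# Force a small positive flux through each of them, then re-run FBA
min_flux = 1e-8
for r_id in production_reactions:
    builder.reactions[r_id].lb = min_flux
builder.build_FBA_model()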

sol       = builder.FBA_solution.fluxes.to_dict()
to_remove = [r_id for r_id in sol if sol[r_id] == 0.0 and not r_id.startswith("EX_")]
print(">", len(to_remove), "reactions to remove")
to_remove

> 34 reactions to remove


['ACALDt',
 'ADEt2',
 'CTPDP',
 'CTPSDUMP',
 'DCDPMP',
 'DCMPDA',
 'DCTPDP',
 'DCTPMP',
 'DRPA',
 'DURIK1',
 'DURIabc',
 'DUTPDP',
 'GAPDP',
 'GUAt2',
 'NADK',
 'NAt3',
 'NTD1',
 'NTD5',
 'NTD6',
 'NTD8',
 'NTD9',
 'PDH_acald',
 'PGK4',
 'PPM2',
 'PUNP2',
 'PUNP4',
 'PYK3',
 'PYK5',
 'PYK9',
 'PYRt2r',
 'RNDR1',
 'RNDR2',
 'RNDR3',
 'RNDR4']

We see that although PDH_acald and trdoxt still carry zero flux (we want them to be active), there are now far fewer inactive reactions to consider.
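
To see exactly what changed between the two FBA runs, the two lists printed above can be compared with plain Python sets. This is a small helper written for this tutorial (not a GBApy function); the reaction IDs are copied from the outputs above.

```python
# Inactive reactions reported by the two FBA runs above
inactive_before = {
    "ACALDt", "ADEt2", "ADNabc", "CTPDP", "CTPSDUMP", "DCDPMP", "DCMPDA",
    "DCTPDP", "DCTPMP", "DRPA", "DURIK1", "DURIabc", "DUTPDP", "GAPDP",
    "GUAt2", "LDH_L", "LIPTA", "L_LACt2r", "NADHK", "NAt3", "NTD1", "NTD5",
    "NTD6", "NTD8", "NTD9", "PDH_acald", "PGK3", "PGK4", "PPM2", "PUNP1",
    "PYK", "PYK9", "PdhC_transl", "RNDR1", "RNDR2", "RNDR3", "RNDR4",
    "rRNAP", "rRNA_degrad", "tRNAP",
}
inactive_after = {
    "ACALDt", "ADEt2", "CTPDP", "CTPSDUMP", "DCDPMP", "DCMPDA", "DCTPDP",
    "DCTPMP", "DRPA", "DURIK1", "DURIabc", "DUTPDP", "GAPDP", "GUAt2",
    "NADK", "NAt3", "NTD1", "NTD5", "NTD6", "NTD8", "NTD9", "PDH_acald",
    "PGK4", "PPM2", "PUNP2", "PUNP4", "PYK3", "PYK5", "PYK9", "PYRt2r",
    "RNDR1", "RNDR2", "RNDR3", "RNDR4",
}

def compare_inactive(before, after):
    """Split the difference between two sets of inactive reactions."""
    return {
        "activated": sorted(before - after),    # inactive before, active now
        "deactivated": sorted(after - before),  # active before, inactive now
        "still_inactive": sorted(before & after),
    }

diff = compare_inactive(inactive_before, inactive_after)
print(len(diff["activated"]), len(diff["deactivated"]), len(diff["still_inactive"]))  # 12 6 28
print(diff["deactivated"])  # ['NADK', 'PUNP2', 'PUNP4', 'PYK3', 'PYK5', 'PYRt2r']
```

Besides the forced production reactions, the new solution also reroutes some fluxes between alternative reactions (for instance, PYK becomes active while PYK3 and PYK5 become inactive).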

## 5) Remove reactions pathway by pathway

In [5]:
builder.reset_conversion()

# Reactions and metabolites removed pathway by pathway (inactive in FBA)
FBA_inactive_reactions = ["ADEt2", "NAt3", "GUAt2",
                          "RNDR1", "RNDR2", "RNDR3",
                          "PUNP2", "PUNP4",
                          "PYK", "PYK5",
                          "NTD5", "NTD6", "NTD8", "NTD9",
                          "CTPDP", "DCDPMP", "DCTPDP",
                          "NADK", "PGK3",
                          "GAPDP",
                          "FMETTRS", "GHMT", "MTHFD", "MTHFC",
                          "RNDR4", "PYK9", "DUTPDP",
                          "GHMT2",
                          "FMNAT", "Ht", "TRDR", "NADHK",
                          "PDH_acald", "PPM2", "DRPA", "ACALDt",
                          "DCTPMP", "trdoxt", "trdrdt",
                          "NAt"]
FBA_inactive_metabolites = ["x_ade", "x_gua",
                            "2dr1p", "trdrd", "trdox", "x_trdox", "x_trdrd",
                            "fmettrna", "thfglu3", "mlthfglu3", "10fthfglu3",
                            "dudp", "dump", "dutp",
                            "fad", "nadp", "nadph",
                            "x_acald", "acald", "2dr5p",
                            "x_na1", "na1"]

# Reactions and metabolites that are not independent in the GBA model
GBA_non_independent_reactions   = ["ATPase"]
GBA_non_independent_metabolites = []

# Apply the removals, then check the resulting model structure
builder.remove_reactions(FBA_inactive_reactions)
builder.remove_metabolites(FBA_inactive_metabolites)
builder.remove_reactions(GBA_non_independent_reactions)
builder.remove_metabolites(GBA_non_independent_metabolites)
builder.check_model(test_structure=True)

> Info: No missing mass in the model
> Info: No missing kinetic parameters in the model
> Info: No connectivity issues in the model
> Info: No unproduced metabolites in the model
> Info: No isolated metabolites in the model
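
Removing reactions can leave metabolites that no longer take part in any reaction, which is why metabolites are removed together with reactions in the cell above. The idea can be sketched on a plain NumPy stoichiometric matrix. The helper `drop_reactions` and the toy network are hypothetical, and GBApy's `remove_reactions` / `remove_metabolites` may work differently internally; this sketch only drops fully disconnected metabolites, whereas `check_model` also reports unproduced ones.

```python
import numpy as np

def drop_reactions(S, met_ids, rxn_ids, to_remove):
    """Remove reaction columns, then drop metabolite rows left without any reaction."""
    removed = set(to_remove)
    keep_cols = [j for j, r_id in enumerate(rxn_ids) if r_id not in removed]
    S_red = S[:, keep_cols]
    keep_rows = [i for i in range(S_red.shape[0]) if np.any(S_red[i] != 0)]
    return (S_red[keep_rows, :],
            [met_ids[i] for i in keep_rows],
            [rxn_ids[j] for j in keep_cols])

# Hypothetical toy network: rows = metabolites, columns = reactions
met_ids = ["A", "B", "C"]
rxn_ids = ["UPT", "R1", "R2", "OUT"]
S = np.array([[1.0, -1.0, -1.0,  0.0],   # A
              [0.0,  1.0,  0.0, -1.0],   # B
              [0.0,  0.0,  1.0,  0.0]])  # C, only involved in R2

S_red, mets, rxns = drop_reactions(S, met_ids, rxn_ids, ["R2"])
print(mets)  # ['A', 'B']
print(rxns)  # ['UPT', 'R1', 'OUT']
```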


In [6]:
check_compression(builder)

> Info: No missing mass in the model
> Info: No missing kinetic parameters in the model
> Info: No connectivity issues in the model
> Info: No unproduced metabolites in the model
> Info: No isolated metabolites in the model
> Growth rate:  None
> Rank:  106 / 113 (diff = 7)
(M shape: (106, 113))
> 14 reactions to remove
['DCYTK', 'DCYTabc', 'DURIK1', 'DURIabc', 'LIPTA', 'NACabc', 'NADS', 'NCTPPRT', 'NNATr', 'NTD1', 'PYRt2r', 'PdhC_transl', 'RBFK', 'RIBFLVabc']




## 6) Additional reactions to remove to reach full column rank

We see that the internal mass fraction matrix is now almost full column rank (rank 106 for 113 columns).
We need to remove 4 more duplicate reactions:

In [7]:
builder.check_mass_balance(verbose=True)
builder.convert(ribosome_byproducts=False)
builder.build_GBA_variables()
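
Exact duplicate reactions show up as columns that are scalar multiples of each other, and each such pair lowers the column rank by one. A simple pairwise rank test can list them. The sketch below runs on a hypothetical toy matrix; applying it to `builder.GBA_intM` assumes that object can be converted to a NumPy array, and it misses dependencies that involve three or more columns.

```python
import numpy as np

def find_parallel_columns(M, names, tol=1e-12):
    """Return pairs of nonzero columns that are scalar multiples of each other."""
    M = np.asarray(M, dtype=float)
    pairs = []
    for i in range(M.shape[1]):
        for j in range(i + 1, M.shape[1]):
            if np.linalg.norm(M[:, i]) <= tol or np.linalg.norm(M[:, j]) <= tol:
                continue  # skip empty columns
            # Two nonzero columns are parallel iff the n x 2 submatrix has rank 1
            if np.linalg.matrix_rank(M[:, [i, j]]) == 1:
                pairs.append((names[i], names[j]))
    return pairs

# Hypothetical toy matrix: R2 is R1 scaled by 2, R3 is independent
names = ["R1", "R2", "R3"]
M = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 2.0, 0.0]])

print(find_parallel_columns(M, names))  # [('R1', 'R2')]
```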



In [8]:
sol = builder.FBA_solution
# Alternatively, use the FBA solution of the original SBML model:
#sbml_model = cobra.io.read_sbml_model("./data/manual_curation/MMSYN_annotated.xml")
#sol = sbml_model.optimize()

# Make every internal reaction forward-irreversible, following the sign of its FBA flux
for r_id, f_val in sol.fluxes.items():
    if r_id != "BIOMASS" and not r_id.startswith("EX_") and r_id in builder.reactions:
        # H2Ot is always kept in the forward direction
        if f_val >= 0.0 or r_id == "H2Ot":
            builder.convert_GBA_reaction_to_forward_irreversible(r_id, ReactionDirection.Forward)
        else:
            # Negative flux: the reaction runs backward in the FBA solution
            print(r_id, f_val)
            builder.convert_GBA_reaction_to_forward_irreversible(r_id, ReactionDirection.Backward)

# Define 50 conditions with external glucose decreasing from 5.0 to 0.1 (step 0.1)
glucose_max  = 5.0
step         = 0.1
rho          = 1100.0
n_conditions = int(round(glucose_max / step))
builder.clear_conditions()
for condition, glucose in enumerate(np.linspace(glucose_max, step, n_conditions), start=1):
    builder.add_condition(condition_id=condition, rho=rho, default_concentration=1.0,
                          metabolites={"x_glc__D": float(glucose), "x_h2o": 1000.0, "x_h": 1e-7})

builder.clear_constant_reactions()
builder.define_constant_reaction("H2Ot", 0.72)

# DNA is a self-activator
builder.add_activation_constant("DNA", "DNAP", 16.94)

# DNA activates the transcription of rRNAs
builder.add_activation_constant("DNA", "rRNAP", 16.94)

# DNA activates the transcription of tRNAs
builder.add_activation_constant("DNA", "tRNAP", 16.94)

# rRNAs activate the translation of proteins
builder.add_activation_constant("rRNA", "Ribosome", 50.12392)

CO2t -6.9
FORt -1.5873771676905557e-07
NH3t -0.7945049848876744
PGK4 -3.0844392418210775
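
The loop above makes every reaction forward-irreversible and reverses those with a negative FBA flux (here CO2t, FORt, NH3t and PGK4). On a stoichiometric matrix, reversing a reaction amounts to negating both its column and its flux, so that S v is unchanged and all fluxes become nonnegative. The sketch below illustrates this idea with NumPy on a hypothetical toy network; it is not the implementation of `convert_GBA_reaction_to_forward_irreversible`, whose internals may differ.

```python
import numpy as np

def orient_forward(S, v):
    """Negate columns with negative flux so that all fluxes become nonnegative.

    Returns the reoriented matrix, the nonnegative flux vector and the indices
    of the reversed reactions. The product S @ v is left unchanged.
    """
    S = np.array(S, dtype=float)  # work on copies, inputs are not modified
    v = np.array(v, dtype=float)
    backward = np.flatnonzero(v < 0)
    S[:, backward] *= -1.0
    v[backward] *= -1.0
    return S, v, backward

# Hypothetical toy network; rows = metabolites A, B
# columns = R1 (-> B), R2 (A -> B, running backward), R3 (A ->)
S = np.array([[0.0, -1.0, -1.0],
              [1.0,  1.0,  0.0]])
v = np.array([1.0, -1.0, 1.0])

S_fwd, v_fwd, reversed_idx = orient_forward(S, v)
print(v_fwd)         # [1. 1. 1.]
print(reversed_idx)  # [1]
```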


In [9]:
# Export the GBA model to CSV files and reload it as a GbaModel object
builder.export_GBA_model("./models/")
model = GbaModel(builder.name)
model.read_from_csv(path="./models/")
# Compute an initial solution, then check its consistency in condition "1"
solved = model.solve_local_linear_problem(rhs_factor=7)
if solved:
    model.set_condition("1")
    model.set_f0(model.LP_solution)
    model.calculate_state()
    model.check_model_consistency()
    if model.consistent:
        print("> Model is consistent with mu =", model.mu)
        # Save the consistent initial solution f0
        model.save_f0(path="./models/")
    else:
        print("> Model is inconsistent")
else:
    print("> Model could not be solved")

> Model is consistent with mu = 0.0003693449915986038
