### What is flux-balance analysis?

Flux-balance analysis (FBA) is a computational method widely used in systems biology. Its applications include, among others:

1. Culture medium optimization,
2. Identifying gene knockout targets (e.g., to remove competing pathways),
3. Metabolic engineering,
4. Neuroscience.

Many iGEM teams have cited [this paper](https://www.nature.com/articles/nbt.1614) to explain how flux-balance analysis works. In a nutshell, FBA starts from the idea that the reactions and metabolites of a metabolic network can be written as a matrix.

Let's take an arbitrary reaction, for example:

$$2\,\mathrm{Na} + 2\,\mathrm{HCl} \rightarrow 2\,\mathrm{NaCl} + \mathrm{H_2}$$

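This reaction can be represented as a column vector of stoichiometric coefficients, with one entry per metabolite: negative entries for metabolites that are consumed and positive entries for metabolites that are produced.

$s = \begin{pmatrix} -2 \\ -2 \\ 2 \\ 1 \end{pmatrix}$

where

- $s_1$ is the coefficient of $\mathrm{Na}$ (2 consumed),
- $s_2$ that of $\mathrm{HCl}$ (2 consumed),
- $s_3$ that of $\mathrm{NaCl}$ (2 produced), and
- $s_4$ that of $\mathrm{H_2}$ (1 produced).

Applying this to the thousands of metabolites and reactions in an organism, and placing each reaction's vector side by side as a column, gives the stoichiometric matrix $S$: one row per metabolite and one column per reaction.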
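
To make this concrete, here is a minimal sketch in plain Python (no COBRApy) that stacks reaction vectors into a stoichiometric matrix. Each reaction is written as a `{metabolite: coefficient}` dictionary; the function name is illustrative, and the second reaction, `R_toy`, is hypothetical and only there so the matrix has more than one column. Tools such as COBRApy build and store this matrix for you.

```python
# Minimal sketch: build a stoichiometric matrix S from reactions written as
# {metabolite: coefficient} dictionaries (negative = consumed, positive = produced).

def build_stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_ids, S) where S[i][j] is the coefficient
    of metabolite i in reaction j, or 0 if it does not take part."""
    metabolites = []
    for coefficients in reactions.values():
        for met in coefficients:
            if met not in metabolites:
                metabolites.append(met)  # keep first-seen order for the rows
    reaction_ids = list(reactions)
    S = [[reactions[rxn].get(met, 0) for rxn in reaction_ids] for met in metabolites]
    return metabolites, reaction_ids, S


reactions = {
    "R_example": {"Na": -2, "HCl": -2, "NaCl": 2, "H2": 1},  # the reaction above
    "R_toy": {"H2": -1, "H2_out": 1},  # hypothetical H2 export, for illustration
}

mets, rxns, S = build_stoichiometric_matrix(reactions)
print("columns:", rxns)
for met, row in zip(mets, S):
    print(f"{met:>7}", row)
```

The first column reproduces the vector for the sodium reaction, and `H2` has a nonzero entry in both columns because one reaction produces it and the other consumes it.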


### What is a GSMM?

Genome-scale metabolic models (GSMMs, also called GEMs) are structured data files that contain the essential information about an organism's metabolism. Beyond the set of reactions and metabolites, they hold much more: gene-protein-reaction (GPR) associations (which link genes to the reactions they enable, so genes can be switched off in silico), model compartments, flux constraints, and so on. Building comprehensive and accurate GSMMs for non-model organisms remains one of the major bottlenecks in systems biology.
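
GPR associations are boolean rules over gene IDs, such as `b0001 and (b0002 or b0003)`: `and` means every gene is needed (e.g. enzyme subunits), `or` means any one is enough (e.g. isozymes). A reaction can still carry flux after a knockout only if its rule still evaluates to true. The sketch below evaluates such a rule with a small hand-written parser; it is an illustration, not COBRApy's own implementation, and the function name and gene IDs are hypothetical.

```python
import re


def reaction_active(rule, knocked_out):
    """Return True if a reaction with GPR rule `rule` can still carry flux
    after deleting the genes in `knocked_out`. An empty rule has no gene
    dependence, so it stays active."""
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    if not tokens:
        return True
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"unexpected end of rule {rule!r}")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()  # parse first so every token is consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_atom()
        while peek() == "and":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError(f"missing ')' in rule {rule!r}")
            return value
        return tok not in knocked_out  # a gene is present unless deleted

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r} in rule {rule!r}")
    return result


rule = "b0001 and (b0002 or b0003)"  # hypothetical gene IDs
print(reaction_active(rule, set()))               # True
print(reaction_active(rule, {"b0002"}))           # True: b0003 still covers it
print(reaction_active(rule, {"b0002", "b0003"}))  # False: no isozyme left
```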

Most GSMMs are published and used in SBML, short for **Systems Biology Markup Language**, an XML-based format. JSON, a format originally created for data exchange in web applications, is also used for the same purpose, and COBRApy can read and write both.
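
To show what the JSON variant looks like, here is a hand-written, heavily simplified fragment in the style of the files COBRApy writes with `cobra.io.save_json_model`: metabolites and reactions are lists of objects, and each reaction stores its metabolites as `{metabolite_id: coefficient}` together with its flux bounds and GPR rule. The model ID, the `GLCt_toy` reaction, its genes, and all bound values are hypothetical, and a real file carries more fields (genes, compartments, annotations). Only Python's standard `json` module is used.

```python
import json

# Simplified, hand-written fragment in the style of a COBRApy JSON model.
# All IDs, genes, and bounds here are hypothetical.
MODEL_JSON = """
{
  "id": "toy_model",
  "metabolites": [
    {"id": "glc__D_e", "name": "D-Glucose", "compartment": "e"},
    {"id": "glc__D_c", "name": "D-Glucose", "compartment": "c"}
  ],
  "reactions": [
    {"id": "EX_glc__D_e", "name": "D-Glucose exchange",
     "metabolites": {"glc__D_e": -1.0},
     "lower_bound": -10.0, "upper_bound": 1000.0, "gene_reaction_rule": ""},
    {"id": "GLCt_toy", "name": "Hypothetical glucose transport",
     "metabolites": {"glc__D_e": -1.0, "glc__D_c": 1.0},
     "lower_bound": 0.0, "upper_bound": 1000.0, "gene_reaction_rule": "g1 or g2"}
  ]
}
"""

model_data = json.loads(MODEL_JSON)

# Each reaction carries its stoichiometry (one column of S) and its flux bounds
for rxn in model_data["reactions"]:
    bounds = (rxn["lower_bound"], rxn["upper_bound"])
    print(rxn["id"], rxn["metabolites"], bounds)
```

With COBRApy installed, `cobra.io.load_json_model` turns a complete file of this kind directly into a `cobra.Model`.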

### Practice

The [iJO1366](http://bigg.ucsd.edu/models/iJO1366) model is a GSMM of *Escherichia coli* K-12 MG1655 metabolism. In the following practice, you'll learn the basics of COBRApy, a Python library for loading, inspecting, and simulating these models, and run a very basic flux-balance analysis.

First, install COBRApy and make the necessary imports in the notebook.

```python
# %pip installs into the environment of the running notebook kernel
%pip install cobra

from cobra import io, Model, Solution
```



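Now, we read our model file into a `cobra.Model` object. This assumes the SBML file `iJO1366.xml` has been downloaded from the BiGG page linked above into the notebook's working directory.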

```python
# Load the SBML file into a cobra.Model
model: Model = io.read_sbml_model('./iJO1366.xml')

# Collect the IDs and names of the first five metabolites and reactions
five_met_ids = [met.id for met in model.metabolites[:5]]
five_met_names = [met.name for met in model.metabolites[:5]]
five_rxns_ids = [rxn.id for rxn in model.reactions[:5]]
five_rxns_names = [rxn.name for rxn in model.reactions[:5]]
print(f'First 5 metabolites: {five_met_ids}')
print(f'First 5 metabolite names: {five_met_names}')
print(f'First 5 reactions: {five_rxns_ids}')
print(f'First 5 reaction names: {five_rxns_names}')
```

```text
First 5 metabolites: ['10fthf_c', '12dgr120_c', '12dgr140_c', '12dgr141_c', '12dgr160_c']
First 5 metabolite names: ['10-Formyltetrahydrofolate', '1,2-Diacyl-sn-glycerol (didodecanoyl, n-C12:0)', '1,2-Diacyl-sn-glycerol (ditetradecanoyl, n-C14:0)', '1,2-Diacyl-sn-glycerol (ditetradec-7-enoyl, n-C14:1)', '1,2-Diacyl-sn-glycerol (dihexadecanoyl, n-C16:0)']
First 5 reactions: ['EX_cm_e', 'EX_cmp_e', 'EX_co2_e', 'EX_cobalt2_e', 'DM_4crsol_c']
First 5 reaction names: ['Chloramphenicol exchange', 'CMP exchange', 'CO2 exchange', 'Co2+ exchange', 'Sink needed to allow p-Cresol to leave system']
```


The code above reads the SBML file into a `Model` object and prints the IDs and names of its first five metabolites and reactions. To understand how data is stored in a `Model`, we can look further into the `Metabolite` and `Reaction` classes. Their documentation is long, so the calls in the cell below are commented out; uncomment them if you want to read it.

```python
from cobra import Metabolite, Reaction

# help() prints the documentation itself, so no print() is needed
# help(Metabolite)
# help(Reaction)
```

Flux-balance analysis makes one major assumption: every internal metabolite is produced and consumed at the same rate, so its concentration does not change over time. This is the steady-state assumption (a steady state, not a chemical equilibrium, since fluxes keep flowing). Under it, and within each reaction's lower and upper flux bounds, we look for a distribution of fluxes that gives the highest (maximum) flux through our target reaction (the optimization objective).
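
Written out, this is the standard FBA linear program, with $S$ the stoichiometric matrix, $v$ the vector of reaction fluxes, $c$ a weight vector that selects the objective reaction, and $l_j$, $u_j$ the lower and upper bounds of reaction $j$:

$$
\begin{aligned}
\max_{v}\quad & Z = c^{\top} v \\
\text{subject to}\quad & S\,v = 0 \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j
\end{aligned}
$$

Each row of $S\,v = 0$ states that one metabolite's total production equals its total consumption. For a single biomass objective, $c$ has a 1 in the biomass reaction's position and 0 elsewhere, so $Z$ is simply the biomass flux. COBRApy builds and solves this linear program when `model.optimize()` is called.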

Let's assume we want to optimize the **biomass** objective function. Its reaction ID varies with the GSMM and the database used to build it, but it usually contains the keyword **biomass** somewhere.

With the following script, we can identify the biomass reaction.

```python
# Find reactions whose ID contains "biomass" (case-insensitive)
biomass_rxns = [rxn for rxn in model.reactions if 'biomass' in rxn.id.lower()]

print('Biomass Reactions: ')
for rxn in biomass_rxns:
    print(f'- {rxn.id}: {rxn.name}')
```

```text
Biomass Reactions: 
- BIOMASS_Ec_iJO1366_WT_53p95M: E. coli biomass objective function (iJO1366) - WT - with 53.95 GAM estimate
- BIOMASS_Ec_iJO1366_core_53p95M: E. coli biomass objective function (iJO1366) - core - with 53.95 GAM estimate
```


Some models have multiple biomass functions. Different biomass functions usually represent

1. estimates from different experiments, or
2. predictions under different medium conditions, corresponding to different growth modes of the organism.

As an example, in our [iGEM 2025 (Sterosaurus)](https://2025.igem.wiki/mcmasteru/) project, we used the iCre1355 model of the green alga *C. reinhardtii*. The model has three biomass functions, one each for the autotrophic, mixotrophic, and heterotrophic modes. These modes describe the organism's growth under different medium conditions (e.g., O2 stress, CO2 stress). The paper describing the model is available [here](https://onlinelibrary.wiley.com/doi/10.1111/tpj.13059).

For our optimization purposes, we'll use the core biomass function `BIOMASS_Ec_iJO1366_core_53p95M`. Let's now optimize our model and get the flux results.

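```python
biomass_obj = 'BIOMASS_Ec_iJO1366_core_53p95M'

# Make the biomass reaction the objective and solve the linear program
model.objective = biomass_obj
solution: Solution = model.optimize()

# Sort a copy of the flux Series (largest first) instead of mutating the solution
top_fluxes = solution.fluxes.sort_values(ascending=False)
print('Top 5 fluxes in optimal solution:')
print(top_fluxes.head(5))

print(f'\nOptimal growth rate: {solution.objective_value} 1/hr')
```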

```text
Top 5 fluxes in optimal solution:
ATPS4rpp      55.815247
EX_h2o_e      45.619430
CYTBO3_4pp    35.149539
NADH16pp      31.021723
EX_co2_e      19.675223
Name: fluxes, dtype: float64

Optimal growth rate: 0.9823718127269743 1/hr
```
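
The growth rate $\mu$ is in h$^{-1}$. Assuming exponential growth, $N(t) = N_0 e^{\mu t}$, the population doubles after $t_d = \ln 2 / \mu$, which can make the number easier to compare with doubling times measured in the lab. A quick conversion of the value printed above:

```python
import math

# Growth rate reported by the optimization above (1/hr)
mu = 0.9823718127269743

# N(t) = N0 * exp(mu * t) doubles when mu * t = ln 2
doubling_time_h = math.log(2) / mu
print(f"Doubling time: {doubling_time_h:.3f} h ({doubling_time_h * 60:.1f} min)")
```

This gives a doubling time of about 0.706 h, roughly 42 minutes.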

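A descending sort only surfaces the largest positive fluxes. In BiGG-style models, exchange reactions such as `EX_co2_e` are written as consuming an extracellular metabolite, so a positive flux is secretion and nutrient uptake shows up as a negative flux that never reaches the top of such a list. Ranking by absolute value shows both directions; with pandas 1.1 or later this could be `solution.fluxes.sort_values(key=abs, ascending=False)`. The sketch below shows the same idea in plain Python; the reaction IDs, flux values, and helper name are hypothetical and not taken from iJO1366.

```python
# Hypothetical flux values (mmol/gDW/h), for illustration only
fluxes = {
    "R_uptake": -10.0,
    "R_internal_a": 7.5,
    "R_internal_b": 2.0,
    "R_reverse": -0.5,
    "R_biomass": 0.8,
}


def top_fluxes_by_magnitude(fluxes, n=3):
    """Return the n (reaction, flux) pairs with the largest |flux|, keeping the sign."""
    return sorted(fluxes.items(), key=lambda item: abs(item[1]), reverse=True)[:n]


for rxn_id, flux in top_fluxes_by_magnitude(fluxes):
    print(f"{rxn_id:<14}{flux:>8.2f}")
```

Keeping the sign matters: a large negative value on an exchange reaction tells you what the cell is taking up, not what it produces.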

In this notebook, we've explored what flux-balance analysis is and how to use COBRApy to load GSMMs from SBML files and optimize for a chosen objective function.

In most applications of flux-balance analysis, it is worth reading the paper that accompanies the GSMM's release, as it contains information relevant to how the model should be used for our purpose.

In the future

### Supplementary Material

You can check out last year's [GitHub repo](https://github.com/FarukEfe/McMasterU-iGEM-2025-Modelling) for flux-balance analysis. It gives a good idea of how an FBA project can be structured.