## Overview

Welcome to the first notebook of the FBA workshops! The learning objectives of this notebook are as follows:

- Understanding flux-balance analysis
- Understanding the use of GEMs/GSMMs in flux-balance analysis
- Getting familiar with the COBRApy framework
    - Importing a GSMM
    - Finding reactions & metabolites
    - Objective functions and running the analysis
    - Altering reaction upper and lower bounds

The take-home practice section will then try to give you intuition on:

- hybrid objective functions
- competing pathways
- knock-outs

## What is flux-balance analysis?

Flux-balance analysis (FBA) is a predictive modeling method that is widely used in systems biology. Its applications include (but are not limited to):

1. Media optimization for the culture
2. Finding knock-out targets
3. Understanding metabolic constraints
4. Neuroscience
5. Biomanufacturing
6. Cancer Research

Many iGEM teams have cited [this paper](https://www.nature.com/articles/nbt.1614) to explain how flux-balance analysis works. In a nutshell, it builds on the idea that the reactions and metabolites of a metabolic network can be represented as a matrix, as the following **example** shows:

Let's take an arbitrary reaction:

$$2Na + 2HCl \rightarrow 2NaCl + H_2$$

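This reaction can be represented as a column vector of stoichiometric coefficients, which forms one column $S_{\cdot j}$ of the matrix $S$ introduced below:

$$S_{\cdot j} = \begin{pmatrix} -2 & -2 & 2 & 1 \end{pmatrix}^T$$

Each entry is the coefficient of one metabolite in the reaction (here in the order Na, HCl, NaCl, H$_2$), negative when the metabolite is consumed and positive when it is produced. Applying this principle to thousands of metabolites and reactions gives a linear optimization problem whose core constraint is $Sv = 0$, where: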

- $S \in \mathbb{R}^{M \times N}$ is the stoichiometric matrix, with one column per reaction and one row per metabolite.
- $v \in \mathbb{R}^{N}$ holds the fluxes (rates) of the reactions, usually in units of $\mathrm{mmol/gDW/h}$.
- the metabolic network is assumed to be at steady state, meaning each metabolite is produced and consumed at the same rate.

Given a metabolite concentration $[x]$ with stoichiometric coefficient $s_i$ in the $i$-th reaction, and time $t$, the same condition can be written for each metabolite as $$ \sum_{i=1}^N s_i v_i = \frac{d[x]}{dt} = 0 $$
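
To make the steady-state condition concrete, here is a minimal sketch in plain Python. The three-reaction chain, the function name `flux_balance`, and the flux values are made up for illustration and are not taken from iJO1366.

```python
# Toy network (hypothetical): R1: -> A, R2: A -> B, R3: B ->
# Rows are metabolites (A, B); columns are reactions (R1, R2, R3).
S = [
    [1, -1, 0],   # A: produced by R1, consumed by R2
    [0, 1, -1],   # B: produced by R2, consumed by R3
]

def flux_balance(S, v):
    """Return S @ v, i.e. the net production rate d[x]/dt of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

balanced = flux_balance(S, [2.0, 2.0, 2.0])    # all fluxes equal
unbalanced = flux_balance(S, [2.0, 1.0, 1.0])  # R1 outpaces R2

print(balanced)    # [0.0, 0.0] -> steady state
print(unbalanced)  # [1.0, 0.0] -> A accumulates
```

Only flux vectors of the first kind, where every entry of $Sv$ is zero, are candidates in flux-balance analysis.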

As simple as this linear optimization looks, real-world flux-balance analysis is often paired with [multiomics data](https://en.wikipedia.org/wiki/Multiomics) to carry out powerful analyses and infer more about the metabolism.

### What are genome-scale metabolic models?

Genome-scale metabolic models (GSMMs, or GEMs for short) are structured data files that contain the essential information about an organism's metabolism. Beyond the set of reactions and metabolites, they hold further information such as gene associations (for switching phenotypes on or off), model compartments, and flux constraints. Developing comprehensive and accurate GSMMs for non-model organisms is one of the greatest bottlenecks of systems biology.

Most GSMMs are published and used in SBML, the **Systems Biology Markup Language**. JSON, a file format originally created for data transfer in web applications, is also used for the same purpose.
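
As a rough illustration of what such a file holds, the sketch below parses a tiny hand-written JSON fragment with Python's standard `json` module. The fragment is hypothetical and only loosely modelled on the layout of JSON model files. Real files carry many more fields, so treat the key names as assumptions rather than a specification.

```python
import json

# Hypothetical, hand-written fragment (not copied from iJO1366)
toy_model_json = """
{
  "id": "toy_model",
  "compartments": {"c": "cytosol", "e": "extracellular space"},
  "metabolites": [
    {"id": "a_e", "name": "A (external)", "compartment": "e"},
    {"id": "a_c", "name": "A (internal)", "compartment": "c"}
  ],
  "reactions": [
    {
      "id": "A_transport",
      "metabolites": {"a_e": -1.0, "a_c": 1.0},
      "lower_bound": -1000.0,
      "upper_bound": 1000.0,
      "gene_reaction_rule": "geneA or geneB"
    }
  ]
}
"""

toy = json.loads(toy_model_json)

# Each reaction stores its stoichiometry, flux bounds, and gene association
for rxn in toy["reactions"]:
    consumed = [m for m, coef in rxn["metabolites"].items() if coef < 0]
    produced = [m for m, coef in rxn["metabolites"].items() if coef > 0]
    print(rxn["id"], consumed, "->", produced,
          "| bounds:", (rxn["lower_bound"], rxn["upper_bound"]),
          "| genes:", rxn["gene_reaction_rule"])
```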

### The COBRA Toolbox

The [iJO1366](http://bigg.ucsd.edu/models/iJO1366) model was developed to mimic the metabolism of *Escherichia coli*. In the following practice, you'll be introduced to the basics of COBRApy, the Python library used to load these models and run basic forms of flux-balance analysis.

Start by installing COBRApy and making the necessary imports in the notebook.

In [72]:
!pip3 install cobra -qqq

from cobra import io, Model, Solution
import argparse, os, sys

Now, we read our model file into a `cobra.Model` object.

In [73]:
model: Model = io.read_sbml_model('./iJO1366.xml')

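# Build one formatted line per entry for the first five metabolites and reactions
five_mets = [f'\n({i+1}) ID: {met.id} - Name: {met.name}' for i, met in enumerate(model.metabolites[:5])]
five_rxns = [f'\n({i+1}) ID: {rxn.id} - Name: {rxn.name}' for i, rxn in enumerate(model.reactions[:5])]
# Outer double quotes keep the inner '' valid on Python versions before 3.12
print(f"First 5 metabolites: {''.join(five_mets)}\n")
print(f"First 5 reactions: {''.join(five_rxns)}")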

First 5 metabolites: 
(1) ID: 10fthf_c - Name: 10-Formyltetrahydrofolate
(2) ID: 12dgr120_c - Name: 1,2-Diacyl-sn-glycerol (didodecanoyl, n-C12:0)
(3) ID: 12dgr140_c - Name: 1,2-Diacyl-sn-glycerol (ditetradecanoyl, n-C14:0)
(4) ID: 12dgr141_c - Name: 1,2-Diacyl-sn-glycerol (ditetradec-7-enoyl, n-C14:1)
(5) ID: 12dgr160_c - Name: 1,2-Diacyl-sn-glycerol (dihexadecanoyl, n-C16:0)

First 5 reactions: 
(1) ID: EX_cm_e - Name: Chloramphenicol exchange
(2) ID: EX_cmp_e - Name: CMP exchange
(3) ID: EX_co2_e - Name: CO2 exchange
(4) ID: EX_cobalt2_e - Name: Co2+ exchange
(5) ID: DM_4crsol_c - Name: Sink needed to allow p-Cresol to leave system


The above code reads the SBML file into a `Model` object, and then prints the names and IDs of the first five metabolites and reactions in the model. To understand how data is stored in the `Model` object, we can look further into the `Metabolite` and `Reaction` classes.

Here's a neater way to display the model information in a Jupyter notebook, where it renders as a table:

In [74]:
model

| Attribute | Value |
|---|---|
| Name | iJO1366 |
| Memory address | 1fde355b860 |
| Number of metabolites | 1805 |
| Number of reactions | 2583 |
| Number of genes | 1367 |
| Number of groups | 0 |
| Objective expression | 1.0*BIOMASS_Ec_iJO1366_core_53p95M - 1.0*BIOMASS_Ec_iJO1366_core_53p95M_reverse_5c8b1 |
| Compartments | cytosol, extracellular space, periplasm |


The output is too long to show here, but if you want to read the documentation of the `Metabolite` and `Reaction` classes, feel free to uncomment the lines below. Alternatively, you can visit the [COBRApy documentation page](https://cobrapy.readthedocs.io/en/latest/) for more information.

In [75]:
from cobra import Metabolite, Reaction

# help() prints the documentation itself, so no print() call is needed
# help(Metabolite)
# help(Reaction)

As previously discussed, flux-balance analysis makes one major assumption: every metabolite is produced and consumed at the same rate. This describes a steady state, in which metabolite concentrations stay constant even though the reactions keep carrying flux. Under this assumption, we look for a set of fluxes that yields the highest (maximum) flux through our target reaction (the objective).
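
Written out, the optimization described above takes the standard form of a linear program. The symbols $c$, $v^{\text{lb}}$, and $v^{\text{ub}}$ are notation introduced here for illustration: $c$ selects the objective reaction, and the two bound vectors limit each flux.

$$
\begin{aligned}
\max_{v \in \mathbb{R}^{N}} \quad & c^{T} v \\
\text{subject to} \quad & S v = 0, \\
& v^{\text{lb}} \le v \le v^{\text{ub}}
\end{aligned}
$$

For a single objective reaction, $c$ is zero everywhere except for a 1 in that reaction's position, so $c^{T} v$ is simply the flux through it.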

Let's assume we want to optimize for the **biomass** objective function. The name of this function may vary based on the GSMM and the database that was used to construct it, but the **biomass** keyword is usually in there somewhere.

With the following script, we can identify the biomass reaction by looking for the **biomass** keyword in all reaction IDs.

In [76]:
biomass_rxns = [rxn for rxn in model.reactions if 'biomass' in rxn.id.lower()]

print(f'Biomass Reactions: ')
for rxn in biomass_rxns:
    print(f'- {rxn.id}: {rxn.name}')

Biomass Reactions: 
- BIOMASS_Ec_iJO1366_WT_53p95M: E. coli biomass objective function (iJO1366) - WT - with 53.95 GAM estimate
- BIOMASS_Ec_iJO1366_core_53p95M: E. coli biomass objective function (iJO1366) - core - with 53.95 GAM estimate


Some models have multiple biomass functions. These usually represent either:

1. estimates derived from different experiments, or
2. predictions under different medium conditions, representing different growth modes of the organism.

As an example, in our [iGEM 2025 (Sterosaurus)](https://2025.igem.wiki/mcmasteru/) project, we used the iCre1355 model of the alga *C. reinhardtii*. It has three biomass functions, representing the autotrophic, mixotrophic, and heterotrophic modes. Each mode describes the growth of the organism under different medium conditions, e.g. O2 stress or CO2 stress. You can find the paper for the model [here](https://onlinelibrary.wiley.com/doi/10.1111/tpj.13059).

For our optimization purposes, we'll use the core biomass function `BIOMASS_Ec_iJO1366_core_53p95M`. Let's now optimize our model and get the flux results.

In [77]:
biomass_obj = 'BIOMASS_Ec_iJO1366_core_53p95M'

# Set the objective by reaction ID, then solve the linear program
model.objective = biomass_obj
solution: Solution = model.optimize()

# Sort a copy of the fluxes so the solution object itself stays untouched
top_fluxes = solution.fluxes.sort_values(ascending=False)
print('Top 5 fluxes in optimal solution:')
print(top_fluxes.head(5))

print(f'\nOptimal growth rate: {solution.objective_value} 1/hr')

Top 5 fluxes in optimal solution:
ATPS4rpp      55.815247
EX_h2o_e      45.619430
CYTBO3_4pp    35.149539
NADH16pp      31.021723
EX_co2_e      19.675223
Name: fluxes, dtype: float64

Optimal growth rate: 0.9823718127269743 1/hr
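
To put the growth rate in more familiar terms, it can be converted to a doubling time. This is a small sketch that assumes simple exponential growth, where the doubling time is $\ln 2 / \mu$. The variable names are illustrative.

```python
import math

growth_rate = 0.9823718127269743  # 1/hr, the objective value printed above

# For exponential growth N(t) = N0 * exp(mu * t), the population doubles every ln(2)/mu
doubling_time_hr = math.log(2) / growth_rate
doubling_time_min = doubling_time_hr * 60

print(f'Doubling time: {doubling_time_hr:.2f} hr ({doubling_time_min:.0f} min)')
```

Under that assumption the predicted doubling time comes out at roughly 42 minutes.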


Additionally, reactions can be given upper and lower bounds within the model. We do this by retrieving a reaction of our choice from the model and changing its `upper_bound` and `lower_bound` attributes. Let's try this on the medium uptake of our model.

To find a reaction of interest, it may seem that we need to iterate through every reaction in the model. However, there are simple ways to reduce the workload. For instance, the cobra `Model` object keeps a record of each metabolite's compartment. We can first list the compartments available in the model, and then filter for the exchange reactions that have extracellular metabolites as reactants.

We start by listing out the compartments of our model.

In [78]:
model.compartments

{'c': 'cytosol', 'e': 'extracellular space', 'p': 'periplasm'}

Notice that the `'e': 'extracellular space'` compartment fits our use case. Now, let's search for metabolites whose ID contains `o2` and print out those found in this compartment.

In [79]:
# Query for all metabolites whose ID matches the regex '.*o2.*'
mets = model.metabolites.query('.*o2.*')
# Print only those in the extracellular compartment
output = []
for met in mets:
    if met.compartment != 'e':
        continue
    output.append((met.id, met.name))

output

[('co2_e', 'CO2 CO2'),
 ('h2o2_e', 'Hydrogen peroxide'),
 ('kdo2lipid4_e', 'KDO(2)-lipid IV(A)'),
 ('so2_e', 'Sulfur dioxide'),
 ('no2_e', 'Nitrite'),
 ('o2_e', 'O2 O2'),
 ('o2s_e', 'Superoxide anion')]

Now, let's try to constrain the model based on its oxygen uptake. First, we find the reactions that consume extracellular oxygen.

In [80]:
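# Pick extracellular oxygen ('o2_e') from the list printed above
o2e_id, o2e_name = output[5]

# Collect every reaction that has o2_e with a negative coefficient (i.e. as a reactant)
rxns = []
for rxn in model.reactions:
    if o2e_id in [met.id for met, coef in rxn.metabolites.items() if coef < 0]:
        rxns.append(rxn)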

for r in rxns:
    print(f'- {r.id}: {r.name}: {r.lower_bound} to {r.upper_bound} mmol/gDW/hr')

- EX_o2_e: O2 exchange: -1000.0 to 1000.0 mmol/gDW/hr
- O2tex: Oxygen transport via diffusion (extracellular to periplasm): -1000.0 to 1000.0 mmol/gDW/hr
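
The IDs printed in this notebook hint at a naming convention: exchange reactions appear to start with `EX_` and end with the compartment suffix of the metabolite they exchange. Assuming that convention holds, a plain string filter is enough to separate exchange reactions from transport reactions such as `O2tex`. The list below is typed in by hand from the outputs above, so the sketch runs without the model.

```python
# Reaction IDs that appear in the outputs of this notebook
reaction_ids = ['EX_cm_e', 'EX_cmp_e', 'EX_co2_e', 'EX_cobalt2_e',
                'DM_4crsol_c', 'EX_o2_e', 'O2tex']

# Assumed convention: exchange reactions start with 'EX_' and end with the
# compartment suffix of the exchanged metabolite ('_e' = extracellular)
exchange_ids = [rid for rid in reaction_ids
                if rid.startswith('EX_') and rid.endswith('_e')]

print(exchange_ids)
```

Naming conventions differ between databases, so it is worth checking a model's IDs before relying on a filter like this.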


In the context of a metabolic model, a range of -1000 to 1000 is practically unbounded. Let's set this much lower. By convention, a negative flux through an exchange reaction means uptake, so the lower bound is what limits how much oxygen the cell can take in. We work on a copy of the model in the code below because Python variables hold references to objects: `oxygen_uptake_rxn` refers to the very `Reaction` stored in the model, so changing its bounds would change the model itself. Making a copy keeps the original data intact.
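
The aliasing behaviour described above is ordinary Python and can be reproduced without COBRApy. The sketch below uses a hypothetical stand-in class, `ToyReaction`, and the standard `copy` module. It only mimics the idea and is not how `Model.copy()` is implemented.

```python
import copy
from dataclasses import dataclass

@dataclass
class ToyReaction:
    """Hypothetical stand-in for a reaction with flux bounds."""
    id: str
    lower_bound: float = -1000.0
    upper_bound: float = 1000.0

toy_model = {'EX_o2_e': ToyReaction('EX_o2_e')}

# Editing a deep copy leaves the original untouched
toy_copy = copy.deepcopy(toy_model)
toy_copy['EX_o2_e'].lower_bound = -5.0
after_copy_edit = toy_model['EX_o2_e'].lower_bound
print(after_copy_edit)   # still -1000.0

# A plain assignment is only a second name for the same object
alias = toy_model['EX_o2_e']
alias.lower_bound = -5.0
after_alias_edit = toy_model['EX_o2_e'].lower_bound
print(after_alias_edit)  # now -5.0
```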

In [82]:
# Original optimization
solution = model.optimize()

print(f'\nOptimal growth rate: {solution.objective_value} 1/hr')

# With new boundaries
copy_model = model.copy()
oxygen_uptake_rxn = copy_model.reactions.get_by_id('EX_o2_e')
oxygen_uptake_rxn.lower_bound = -5.0
oxygen_uptake_rxn.upper_bound = 5.0 

solution = copy_model.optimize()

print(f'\nOptimal growth rate: {solution.objective_value} 1/hr')


Optimal growth rate: 0.9823718127269705 1/hr

Optimal growth rate: 0.49154514798795895 1/hr
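
The drop in growth rate has a simple intuition: at steady state, a tighter bound on one reaction caps every flux that depends on it. The sketch below shows this for a hypothetical linear chain (uptake, conversion, biomass), where steady state forces all three fluxes to be equal. It is not a general LP solver, the reaction names and bounds are made up, and the real network is far more branched, so it only conveys the idea.

```python
def max_chain_flux(upper_bounds):
    """Largest common flux through a linear chain whose fluxes must all be equal.

    Assumes every lower bound is 0, so the tightest upper bound is the bottleneck.
    """
    return min(upper_bounds.values())

bounds = {'uptake': 10.0, 'conversion': 1000.0, 'biomass': 1000.0}
before = max_chain_flux(bounds)
print(before)  # 10.0, limited by the uptake bound

bounds['uptake'] = 5.0  # tighten the uptake bound
after = max_chain_flux(bounds)
print(after)   # 5.0, the objective drops with it
```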


Now that we've covered some of the basics of COBRApy, we can get started with some take-home practice questions.

## Take-Home Practice

### Part 1: Hybrid Objective for Optimization

In this notebook, we've explored what flux-balance analysis is, and how we can use COBRApy to load GSMMs from SBML files and optimize for a certain objective function.

In most applications of flux-balance analysis, it is valuable to read the paper that accompanied the GSMM's release, as it contains important information relevant to our purpose.

In the future

### Supplementary Material

You can check out last year's [GitHub repo](https://github.com/FarukEfe/McMasterU-iGEM-2025-Modelling) for flux-balance analysis. It can give you a good idea of how an FBA project can be structured.

As previously mentioned, feel free to check out the [COBRApy documentation](https://cobrapy.readthedocs.io/en/latest/) on the use of the library.

If you're curious, you can also check out [our software repository](https://gitlab.igem.org/2025/software-tools/mcmasteru) from iGEM 2025, where the flux-balance analysis code can be found under the `fba-codebase` folder.