# Genome-scale metabolic models

## Preparation

```python
from cameo import load_model

# Load the iJO1366 genome-scale model of E. coli from a compressed SBML file
model = load_model('data/iJO1366.xml.gz')
```

## Model content

### Metabolites

The model contains a list of metabolites. Here are the first ten.

```python
model.metabolites[0:10]
```

```
[<Metabolite 10fthf_c at 0x11b219438>,
 <Metabolite 12dgr120_c at 0x11b0c7a90>,
 <Metabolite 12dgr140_c at 0x11b0c7d30>,
 <Metabolite 12dgr141_c at 0x11b0c7fd0>,
 <Metabolite 12dgr160_c at 0x11b0ca2b0>,
 <Metabolite 12dgr161_c at 0x11b0ca550>,
 <Metabolite 12dgr180_c at 0x11b0ca748>,
 <Metabolite 12dgr181_c at 0x11b0caa90>,
 <Metabolite 12ppd__R_c at 0x11adfa550>,
 <Metabolite 12ppd__S_c at 0x11adfa780>]
```

There are 1805 metabolites in the model.

```python
len(model.metabolites)
```

```
1805
```

One can access a specific metabolite using dot notation.

```python
model.metabolites.g3p_c
```

| Field | Value |
|---|---|
| Id | g3p_c |
| Name | Glyceraldehyde 3-phosphate |
| Formula | C3H5O6P |


<div class="alert alert-warning">

**Warning:** Dot notation cannot be used to access metabolites, reactions, or genes whose identifiers are not valid Python identifiers, for example identifiers that start with a digit.

</div>

```python
model.metabolites.10fthf_c
```

```
SyntaxError: invalid syntax (<ipython-input-7-2772b042e9ed>, line 1)
```

<div class="alert alert-success">

**Solution:** Use the method `get_by_id` instead!

</div>

```python
model.metabolites.get_by_id('10fthf_c')
```

| Field | Value |
|---|---|
| Id | 10fthf_c |
| Name | 10-Formyltetrahydrofolate |
| Formula | C20H21N7O7 |

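Dot notation works only when an identifier is a valid Python identifier and is not a reserved keyword. The following stand-alone sketch checks this with the standard library. The helper name `can_use_dot_notation` is hypothetical and is not part of cameo.

```python
import keyword


def can_use_dot_notation(identifier: str) -> bool:
    """Return True if `identifier` could appear after a dot, e.g. model.metabolites.<identifier>."""
    # A valid attribute name must be a Python identifier (letters, digits, underscores,
    # not starting with a digit) and must not be a reserved keyword such as `in` or `class`.
    return identifier.isidentifier() and not keyword.iskeyword(identifier)


for metabolite_id in ["g3p_c", "10fthf_c", "glc__D_c"]:
    print(metabolite_id, can_use_dot_notation(metabolite_id))
```

Identifiers for which this returns `False`, such as `10fthf_c`, need `get_by_id`.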

Metabolites are associated with compartments in the cell. Glyceraldehyde 3-phosphate (`g3p_c`) is associated with the `c` (cytosol) compartment.

```python
model.metabolites.g3p_c.compartment
```

```
'c'
```

The _E. coli_ model has three compartments.

```python
model.compartments
```

```
{'c': 'cytosol', 'e': 'extracellular space', 'p': 'periplasm'}
```

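Some compounds, such as glucose, occur in more than one compartment. Each occurrence is modeled as a separate metabolite with its own identifier and compartment, for example `glc__D_c` (cytosol) and `glc__D_p` (periplasm).

```python
model.metabolites.glc__D_c.compartment
```

```
'c'
```

```python
model.metabolites.glc__D_p.compartment
```

```
'p'
```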

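In this model the compartment also shows up as a suffix on the metabolite identifier (`glc__D_c` vs. `glc__D_p`). The sketch below groups identifiers by compound. It assumes the compartment is the text after the last underscore, which holds for the identifiers shown here. The helper names are hypothetical, and in practice the `.compartment` attribute is the authoritative source.

```python
from collections import defaultdict

# Compartment table as returned by model.compartments above.
COMPARTMENTS = {"c": "cytosol", "e": "extracellular space", "p": "periplasm"}


def split_compartment(metabolite_id: str) -> tuple:
    """Split an identifier such as 'glc__D_p' into ('glc__D', 'p')."""
    # Assumption: the compartment suffix follows the LAST underscore.
    base, _, compartment = metabolite_id.rpartition("_")
    return base, compartment


def group_by_compound(metabolite_ids) -> dict:
    """Map each base compound identifier to the names of the compartments it occurs in."""
    groups = defaultdict(list)
    for metabolite_id in metabolite_ids:
        base, compartment = split_compartment(metabolite_id)
        groups[base].append(COMPARTMENTS[compartment])
    return dict(groups)


print(group_by_compound(["glc__D_c", "glc__D_p", "g3p_c"]))
```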
The full name of a metabolite is available via the `.name` attribute.

```python
model.metabolites.glc__D_c.name
```

```
'D-Glucose'
```

One can also look up the molecular formula, here for glyceraldehyde 3-phosphate.

```python
model.metabolites.g3p_c.formula
```

```
'C3H5O6P'
```

The `.elements` attribute returns a dictionary representation of the formula.

```python
model.metabolites.g3p_c.elements
```

```
{'C': 3, 'H': 5, 'O': 6, 'P': 1}
```

Furthermore, one can look up the molecular weight (in g/mol) of a metabolite.

```python
model.metabolites.g3p_c.formula_weight
```

```
168.041961
```

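A small stand-alone parser shows how `formula`, `elements`, and `formula_weight` relate. This sketch handles simple formulas only, with no parentheses, charges, or placeholder groups such as `R`. The atomic-weight table is an assumption (approximate standard atomic weights), not necessarily the table cameo uses.

```python
import re

# Approximate standard atomic weights in g/mol (assumed values, not the library's table).
ATOMIC_WEIGHTS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "P": 30.973762, "S": 32.065}


def parse_formula(formula: str) -> dict:
    """Turn a simple formula such as 'C3H5O6P' into a dict of element counts."""
    elements = {}
    # Each match is an element symbol (capital letter plus optional lowercase letter)
    # followed by an optional count; a missing count means 1.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        elements[symbol] = elements.get(symbol, 0) + (int(count) if count else 1)
    return elements


def formula_weight(formula: str) -> float:
    """Sum the atomic weights of all atoms; raises KeyError for elements not in the table."""
    return sum(ATOMIC_WEIGHTS[symbol] * n for symbol, n in parse_formula(formula).items())


print(parse_formula("C3H5O6P"))
print(round(formula_weight("C3H5O6P"), 3))
```

With these weights the result rounds to 168.042, which agrees with the value above to three decimal places.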
Additional information about a metabolite, such as references to external databases, is available through the `annotation` attribute.

```python
model.metabolites.g3p_c.annotation
```

```
{'SBO': 'SBO:0000247',
 'bigg.metabolite': 'g3p',
 'biocyc': 'META:GAP',
 'chebi': ['CHEBI:12983',
  'CHEBI:12984',
  'CHEBI:14333',
  'CHEBI:17138',
  'CHEBI:181',
  'CHEBI:18324',
  'CHEBI:21026',
  'CHEBI:29052',
  'CHEBI:5446',
  'CHEBI:58027',
  'CHEBI:59776'],
 'hmdb': 'HMDB01112',
 'kegg.compound': ['C00118', 'C00661'],
 'metanetx.chemical': 'MNXM2378',
 'reactome': '29578',
 'seed.compound': 'cpd00102',
 'unipathway.compound': ['UPC00118', 'UPC00661']}
```

One can use these annotations to look up the compound on [KEGG](http://www.genome.jp/dbget-bin/www_bget?cpd:C00118) for example.

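In the annotation above, each value is either a single string or a list of strings. The following sketch normalizes that difference and builds KEGG compound links, reusing the URL pattern of the KEGG link above. The helper names are hypothetical, and the dictionary is a subset copied from the output above.

```python
# A subset of the annotation dictionary printed above.
annotation = {
    "bigg.metabolite": "g3p",
    "kegg.compound": ["C00118", "C00661"],
    "metanetx.chemical": "MNXM2378",
}


def as_list(value) -> list:
    """Annotation values can be a single string or a list of strings; normalize to a list."""
    return value if isinstance(value, list) else [value]


def kegg_urls(annotation: dict) -> list:
    """Build KEGG compound page URLs from the 'kegg.compound' entry, if present."""
    base_url = "http://www.genome.jp/dbget-bin/www_bget?cpd:"
    return [base_url + kegg_id for kegg_id in as_list(annotation.get("kegg.compound", []))]


for url in kegg_urls(annotation):
    print(url)
```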
Metabolites participate in reactions as substrates and products. The `reactions` attribute returns the set of all reactions a metabolite takes part in.

```python
model.metabolites.g3p_c.reactions
```

```
frozenset({<Reaction TRPS3 at 0x11b20b828>,
           <Reaction DRPA at 0x11f006630>,
           <Reaction F6PA at 0x11fb870b8>,
           <Reaction GAPD at 0x11fba72b0>,
           <Reaction TRPS1 at 0x11b20b4e0>,
           <Reaction TPI at 0x11b208518>,
           <Reaction EDA at 0x11fb8e320>,
           <Reaction TGBPA at 0x11b1ff128>,
           <Reaction TKT2 at 0x11b204d68>,
           <Reaction DDPGALA at 0x11f367978>,
           <Reaction FBA at 0x11fb95198>,
           <Reaction DXPS at 0x11ef95390>,
           <Reaction TALA at 0x11b1fb3c8>,
           <Reaction TKT1 at 0x11b204be0>})
```

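Conceptually, a metabolite's `reactions` set inverts each reaction's stoichiometry. A reaction belongs to the set if the metabolite appears in that reaction with a non-zero coefficient. The pure-Python sketch below builds such an index for three reactions, using the GAPD, TPI, and FBA equations listed in the next section. It illustrates the idea and does not describe how cameo stores these links internally.

```python
from collections import defaultdict

# Toy stoichiometries (negative = consumed, positive = produced) for GAPD, TPI, and FBA.
reactions = {
    "GAPD": {"g3p_c": -1, "nad_c": -1, "pi_c": -1, "13dpg_c": 1, "h_c": 1, "nadh_c": 1},
    "TPI": {"dhap_c": -1, "g3p_c": 1},
    "FBA": {"fdp_c": -1, "dhap_c": 1, "g3p_c": 1},
}

# Invert reaction -> metabolites into metabolite -> set of reaction identifiers.
reactions_of = defaultdict(set)
for reaction_id, stoichiometry in reactions.items():
    for metabolite_id in stoichiometry:
        reactions_of[metabolite_id].add(reaction_id)

print(sorted(reactions_of["g3p_c"]))
print(sorted(reactions_of["dhap_c"]))
```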
### Reactions

The model contains a list of reactions. Here are the first 10 of them.

```python
model.reactions[0:10]
```

```
[<Reaction DM_4crsol_c at 0x1118cf4e0>,
 <Reaction DM_5drib_c at 0x1118cfa90>,
 <Reaction DM_aacald_c at 0x11f0dbc50>,
 <Reaction DM_amob_c at 0x11fb64e10>,
 <Reaction DM_mththf_c at 0x11fb64b70>,
 <Reaction DM_oxam_c at 0x11fb64e48>,
 <Reaction BIOMASS_Ec_iJO1366_WT_53p95M at 0x11fb64d30>,
 <Reaction BIOMASS_Ec_iJO1366_core_53p95M at 0x11fb52588>,
 <Reaction EX_12ppd__R_e at 0x11fb3fd68>,
 <Reaction EX_12ppd__S_e at 0x11fb3fd30>]
```

There are 2583 reactions in the model.

```python
len(model.reactions)
```

```
2583
```

Let's take a closer look at the reactions associated with glyceraldehyde 3-phosphate (`g3p_c`), printing each reaction's identifier, equation, and name.

```python
for reaction in model.metabolites.g3p_c.reactions:
    # Printing a reaction object shows its identifier followed by its equation
    print(reaction.id, reaction, reaction.name)
```

```
TRPS3 TRPS3: 3ig3p_c --> g3p_c + indole_c Tryptophan synthase (indoleglycerol phosphate)
DRPA DRPA: 2dr5p_c --> acald_c + g3p_c Deoxyribose-phosphate aldolase
F6PA F6PA: f6p_c <=> dha_c + g3p_c Fructose 6-phosphate aldolase
GAPD GAPD: g3p_c + nad_c + pi_c <=> 13dpg_c + h_c + nadh_c Glyceraldehyde-3-phosphate dehydrogenase
TRPS1 TRPS1: 3ig3p_c + ser__L_c --> g3p_c + h2o_c + trp__L_c Tryptophan synthase (indoleglycerol phosphate)
TPI TPI: dhap_c <=> g3p_c Triose-phosphate isomerase
EDA EDA: 2ddg6p_c --> g3p_c + pyr_c 2-dehydro-3-deoxy-phosphogluconate aldolase
TGBPA TGBPA: tagdp__D_c <=> dhap_c + g3p_c Tagatose-bisphosphate aldolase
TKT2 TKT2: e4p_c + xu5p__D_c <=> f6p_c + g3p_c Transketolase
DDPGALA DDPGALA: 2dh3dgal6p_c <=> g3p_c + pyr_c 2-dehydro-3-deoxy-6-phosphogalactonate aldolase
FBA FBA: fdp_c <=> dhap_c + g3p_c Fructose-bisphosphate aldolase
DXPS DXPS: g3p_c + h_c + pyr_c --> co2_c + dxyl5p_c 1-deoxy-D-xylulose 5-phosphate synthase
TALA TALA: g3p_c + s7p_c <=> e4p_c + f6p_c Transaldolase
```

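Each printed equation combines a reaction's stoichiometry with its bounds. Substrates (negative coefficients) go on the left, products go on the right, and the arrow shows reversibility. Below is a simplified sketch of that formatting. The helper is hypothetical and orders metabolites alphabetically, which matches the examples above. It is not claimed to reproduce every rule of the library's own formatter, and the bounds passed in are illustrative values consistent with the arrows shown.

```python
def reaction_string(stoichiometry: dict, lower_bound: float, upper_bound: float) -> str:
    """Format a reaction equation in the style shown above, e.g. 'dhap_c <=> g3p_c'."""

    def side(metabolites) -> str:
        # Omit unit coefficients; show other coefficients as e.g. '2 atp_c'.
        terms = []
        for metabolite_id in sorted(metabolites):
            coefficient = abs(stoichiometry[metabolite_id])
            terms.append(metabolite_id if coefficient == 1 else f"{coefficient:g} {metabolite_id}")
        return " + ".join(terms)

    substrates = [m for m, c in stoichiometry.items() if c < 0]
    products = [m for m, c in stoichiometry.items() if c > 0]
    # The arrow encodes directionality through the flux bounds.
    if lower_bound < 0 < upper_bound:
        arrow = "<=>"  # reversible
    elif lower_bound < 0:
        arrow = "<--"  # can only run backwards
    else:
        arrow = "-->"  # can only run forwards
    return f"{side(substrates)} {arrow} {side(products)}"


gapd = {"g3p_c": -1, "nad_c": -1, "pi_c": -1, "13dpg_c": 1, "h_c": 1, "nadh_c": 1}
dxps = {"g3p_c": -1, "h_c": -1, "pyr_c": -1, "co2_c": 1, "dxyl5p_c": 1}
print(reaction_string(gapd, -1000, 1000))
print(reaction_string(dxps, 0, 1000))
```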
One of these reactions is Glyceraldehyde-3-phosphate dehydrogenase (GAPD). The order of the set above is arbitrary.

```python
model.reactions.GAPD.name
```

```
'Glyceraldehyde-3-phosphate dehydrogenase'
```

## Objective

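The model's objective is to maximize flux through the core biomass reaction `BIOMASS_Ec_iJO1366_core_53p95M`.

```python
print(model.objective)
```

```
Maximize
-1.0*BIOMASS_Ec_iJO1366_core_53p95M_reverse_5c8b1 + 1.0*BIOMASS_Ec_iJO1366_core_53p95M
```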


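The objective has two terms because the optimization problem that cameo builds (using the optlang library) represents each reaction by two non-negative variables, a forward part and a reverse part. The net flux is forward minus reverse, so maximizing the expression above maximizes the net biomass flux. A minimal sketch of that split follows. The function names are hypothetical.

```python
def split_flux(net_flux: float) -> tuple:
    """Split a net flux into non-negative (forward, reverse) parts."""
    forward = max(0.0, net_flux)
    reverse = max(0.0, -net_flux)
    return forward, reverse


def net_flux(forward: float, reverse: float) -> float:
    """Recover the net flux, mirroring the objective expression forward - reverse."""
    return forward - reverse


for v in (2.0, -3.5, 0.0):
    fwd, rev = split_flux(v)
    print(v, (fwd, rev), net_flux(fwd, rev))
```

In the actual optimization problem, any pair with the same difference gives the same net flux. The split above is the one with the smallest values.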
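## ATP maintenance requirement

The `ATPM` reaction represents the non-growth-associated ATP maintenance requirement. Its lower bound of 3.15 forces the model to hydrolyze at least 3.15 mmol ATP per gram dry weight per hour, independent of growth.

```python
model.reactions.ATPM
```

| Field | Value |
|---|---|
| Id | ATPM |
| Name | ATP maintenance requirement |
| Stoichiometry | atp_c + h2o_c --> adp_c + h_c + pi_c |
| GPR | |
| Lower bound | 3.150000 |
| Upper bound | 1000.000000 |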


## The math (scary!)

```python
# Print the first 1550 characters of the underlying optimization problem (LP format)
print(str(model.solver)[:1550])
```

```
\* Problem: Unknown *\

Maximize
 obj: + BIOMASS_Ec_iJO1366_core_53p95M
 - BIOMASS_Ec_iJO1366_core_53p95M_reverse_5c8b1

Subject To
 r_1: + FTHFLi + FTHFD_reverse_44321 + ULA4NFT_reverse_07217
 - 0.000223 BIOMASS_Ec_iJO1366_WT_53p95M - AICART
 + FMETTRS_reverse_3b6c6 + MTHFC - FTHFD + AICART_reverse_b7b59
 - FMETTRS - ULA4NFT - FTHFLi_reverse_a6dc7 - GARFT
 + 0.000223 BIOMASS_Ec_iJO1366_WT_53p95M_reverse_06c4a
 - MTHFC_reverse_f6fcc - 0.000223 BIOMASS_Ec_iJO1366_core_53p95M
 + 0.000223 BIOMASS_Ec_iJO1366_core_53p95M_reverse_5c8b1
 + GARFT_reverse_7ecb6 = 0
 r_2: - PAPA120_reverse_75d70 + DAGK120_reverse_7cd00 + PAPA120 - x_666
 - DAGK120 + x_665 = 0
 r_3: + PAPA140 - DAGK140 + x_667 + DAGK140_reverse_87f8f - x_668
 - PAPA140_reverse_255f5 = 0
 r_4: - DAGK141 + PAPA141 + DAGK141_reverse_f6e5f - x_670 + x_669
 - PAPA141_reverse_945ab = 0
 r_5: - PAPA160_reverse_c64df - x_672 + PAPA160 + DAGK160_reverse_0238d
 + x_671 - DAGK160 = 0
 r_6: - x_674 + PAPA161 - DAGK161 - PAPA161_reverse_1bc33
```

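The printout above is the linear program behind the model, in LP file format. Constraints such as `r_1` appear to be steady-state mass balances, one per metabolite. Each one requires the net production of that metabolite to be zero, with every flux entering as its forward variable minus its reverse variable. Together with the flux bounds (such as those of `ATPM`), this is the standard flux balance analysis problem. Here $S$ is the stoichiometric matrix (one row per metabolite, one column per reaction), $v$ is the vector of net fluxes, and $c$ holds the objective coefficients (1 for the biomass reaction, 0 elsewhere):

$$
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0 \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \text{for every reaction } j
\end{aligned}
$$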
## Gene-Protein-Reaction associations

Glyceraldehyde-3-phosphate dehydrogenase is associated with a single gene.

```python
model.reactions.GAPD.gene_reaction_rule
```

```
'b1779'
```

Phosphofructokinase (PFK), on the other hand, is associated with two isozymes, encoded by two different genes.

```python
model.reactions.PFK.gene_reaction_rule
```

```
'b3916 or b1723'
```

One can display the gene names (typically four-letter gene symbols) instead of the identifiers (Blattner numbers in this case).

```python
model.reactions.PFK.gene_name_reaction_rule
```

```
'pfkA or pfkB'
```

Here is a much more complex gene-to-reaction mapping, for ATP synthase (`ATPS4rpp`). Genes joined by `and` encode subunits that are all required, and `or` separates alternative complexes.

```python
model.reactions.ATPS4rpp.gene_name_reaction_rule
```

```
'(atpA and atpD and atpC and atpG and atpH and atpE and atpF and atpB and atpI) or (atpB and atpF and atpE and atpH and atpG and atpC and atpD and atpA)'
```

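Because gene-reaction rules are Boolean expressions, they can predict whether a reaction can still carry flux after gene deletions. The sketch below evaluates a rule with Python's `ast` module, treating knocked-out genes as `False` and all other genes as `True`. The function name is hypothetical. It assumes every gene identifier in the rule is a valid Python identifier, which holds for the rules shown here.

```python
import ast


def is_reaction_active(rule: str, knocked_out: set) -> bool:
    """Evaluate a gene-reaction rule, treating knocked-out genes as False and all others as True."""
    if not rule.strip():
        # Empty rule (like the GPR of ATPM above): the reaction does not depend on any gene.
        return True

    def evaluate(node) -> bool:
        if isinstance(node, ast.BoolOp):
            results = [evaluate(value) for value in node.values]
            # `and`: every gene or group is required; `or`: any alternative suffices.
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.Name):
            return node.id not in knocked_out
        raise ValueError(f"Unsupported element in rule: {ast.dump(node)}")

    return evaluate(ast.parse(rule, mode="eval").body)


atps = ("(atpA and atpD and atpC and atpG and atpH and atpE and atpF and atpB and atpI) or "
        "(atpB and atpF and atpE and atpH and atpG and atpC and atpD and atpA)")
print(is_reaction_active("pfkA or pfkB", {"pfkA"}))          # one isozyme remains
print(is_reaction_active("pfkA or pfkB", {"pfkA", "pfkB"}))  # both isozymes gone
print(is_reaction_active(atps, {"atpI"}))                    # second complex lacks atpI
print(is_reaction_active(atps, {"atpA"}))                    # atpA is in both complexes
```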
## Finding things

One can use `.query('search term', 'attribute_to_search_in')` to search in model metabolites, reactions, and genes. For example, one can search metabolites that contain the term _glucose_ in their name.

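```python
for metabolite in model.metabolites.query('glucose', 'name'):
    print(metabolite.name)
```

```
6-Acetyl-D-glucose
ADPglucose
DTDP-4-dehydro-6-deoxy-D-glucose
DTDPglucose
UDPglucose
UDPglucose
UDPglucose
```

The search is case-sensitive. `D-Glucose` (the name of `glc__D_c`, shown earlier) is not listed because it contains `Glucose` with a capital G. `UDPglucose` appears three times, most likely because the compound exists as a separate metabolite in each of the three compartments.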


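Whether a search for `glucose` also matches `D-Glucose` depends on case sensitivity. A stand-alone sketch with Python's `re` module shows the difference on a plain list of names. The list and the helper `search_names` are illustrative only and are not the library's implementation.

```python
import re

names = ["D-Glucose", "ADPglucose", "UDPglucose", "L-Alanine"]


def search_names(names, pattern: str, ignore_case: bool = False) -> list:
    """Return the names that contain a match for the regular expression `pattern`."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [name for name in names if regex.search(name)]


print(search_names(names, "glucose"))                    # case-sensitive
print(search_names(names, "glucose", ignore_case=True))  # also finds D-Glucose
```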
## The stoichiometric matrix S

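```python
# Work on a copy of the model, convert it to an array-based representation,
# and extract the stoichiometric matrix S as a dense matrix
S = model.copy().to_array_based_model().S.todense()
S
```

```
matrix([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ...,
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ..., -1., -1.,  1.]])
```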

This is what the stoichiometric matrix S looks like when visualized as a sparsity plot, where each dot marks a non-zero entry.

```python
import matplotlib.pyplot as plt

# Plot every entry of S whose absolute value exceeds 0.01
plt.spy(S, precision=0.01, markersize=.1)
plt.show()
```

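Each row of S corresponds to a metabolite and each column to a reaction. Entry S[i][j] is the stoichiometric coefficient of metabolite i in reaction j, which is why the matrix is almost entirely zeros. The steady-state constraint in the linear program is S·v = 0. The sketch below makes this concrete in pure Python with a hypothetical three-reaction toy network.

```python
# Toy network: uptake of A, conversion A -> 2 B, secretion of B.
metabolites = ["A", "B"]
reactions = {
    "uptake_A": {"A": 1},
    "convert": {"A": -1, "B": 2},
    "secrete_B": {"B": -1},
}
reaction_ids = list(reactions)

# S[i][j] is the coefficient of metabolite i in reaction j (rows: metabolites, columns: reactions).
S = [[reactions[r].get(m, 0) for r in reaction_ids] for m in metabolites]


def is_steady_state(S, fluxes, tol=1e-9) -> bool:
    """Check S @ v == 0, i.e. no metabolite accumulates or is depleted."""
    return all(abs(sum(coef * v for coef, v in zip(row, fluxes))) <= tol for row in S)


print(S)
print(is_steady_state(S, [1.0, 1.0, 2.0]))  # balanced
print(is_steady_state(S, [1.0, 1.0, 1.0]))  # B accumulates
```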
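## Simulating models

Calling `solve()` optimizes the objective subject to the model's constraints (flux balance analysis). It returns a solution that lists the flux and reduced cost of every reaction.

```python
model.solve()
```

| Reaction | Flux | Reduced cost |
|---|---|---|
| DM_4crsol_c | 0.000219 | 0.000000 |
| DM_5drib_c | 0.000221 | 0.000000 |
| DM_aacald_c | 0.000000 | 0.000000 |
| DM_amob_c | 0.000002 | 0.000000 |
| ... | ... | ... |
| ZN2t3pp | 0.000000 | -0.002074 |
| ZN2tpp | 0.000335 | 0.000000 |
| ZNabcpp | 0.000000 | -0.008295 |
| Zn2tex | 0.000335 | 0.000000 |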


Sometimes a solution cannot be found. For example, setting the lower bound of the biomass reaction (the objective) to a value the model cannot achieve makes the problem infeasible, and trying to solve it raises an exception.

```python
# Work on a copy so the original model keeps its bounds
infeasible_model = model.copy()
infeasible_model.reactions.BIOMASS_Ec_iJO1366_core_53p95M.lower_bound = 100000
infeasible_model.solve()
```

```
Infeasible: Solving model iJO1366 did not return an optimal solution. The returned solution status is "infeasible"
```

One can use `try` and `except` statements to handle those cases.

```python
from cameo.exceptions import Infeasible
```

```python
try:
    infeasible_model.solve()
except Infeasible:
    print('Oh no! The model is infeasible!')
```

```
Oh no! The model is infeasible!
```
