# ecModeling for biotin production

This notebook contains an analysis of the ecModel of _E. coli_ for biotin production.

Benjamín J. Sánchez, 2019-10-29

## 1. Loading & pre-processing models

We will use:

* The batch version of `eciML1515`, i.e. without proteomics and letting the model choose the required enzyme amounts (from a shared pool).
* The original `iML1515` (as reference).

The models are available at:

* iML1515: http://bigg.ucsd.edu/models/iML1515
* eciML1515_batch: https://github.com/SysBioChalmers/ecModels/blob/chore/updateiML1515/eciML1515/model

(Temporary location; the ecModel will eventually be available in the master branch.)

In [1]:
# Metabolic model:
import cameo
met_model = cameo.load_model("iML1515")

# ecModel:
import cobra
! [ ! -f "eciML1515_batch.xml" ] && curl -O -L "https://raw.githubusercontent.com/SysBioChalmers/ecModels/chore/updateiML1515/eciML1515/model/eciML1515_batch.xml"
ec_model = cobra.io.read_sbml_model("eciML1515_batch.xml")



Model modifications for biotin production:

In [2]:
import math

# Replacing any infinite bound by 1000 in the ecModel:
for reaction in ec_model.reactions:
    if math.isinf(reaction.upper_bound):
        reaction.upper_bound = 1000

# Remove repetition in ecModel met ids:
for metabolite in ec_model.metabolites:
    metabolite.id = metabolite.id.split("[")[0]

# Close biotin uptake:
met_model.reactions.EX_btn_e.lower_bound = 0
ec_model.reactions.EX_btn_e_REV.upper_bound = 0

# Open biotin transport between periplasm and cytoplasm:
met_model.reactions.BTNt2ipp.lower_bound = -1000
ec_model.reactions.BTNt2ipp.lower_bound = -1000
# TODO: insert "_REV" reaction instead for ecModel (for consistency with rest of the model)

## 2. Simulating the model

For the ecModel we first run standard FBA to maximize the target reaction, then fix that optimum and run pFBA with minimal glucose uptake as the objective:

In [3]:
def simulate_ec_model(model, rxn_id):
    # First optimization: maximize the flux through rxn_id
    model.objective = rxn_id
    solution = model.optimize()
    
    # Second optimization: keep rxn_id at its optimum and minimize glucose uptake
    # (note that the new lower bound stays in place after the function returns)
    model.reactions.get_by_id(rxn_id).lower_bound = solution.fluxes[rxn_id]
    glc_rxn = model.reactions.get_by_id("EX_glc__D_e_REV")
    model.objective = {glc_rxn: -1}
    solution = cobra.flux_analysis.pfba(model)
    
    return solution

def print_fluxes(model, solution, bio_rxn_id):
    # Show growth:
    print("growth: " + str(solution.fluxes[bio_rxn_id]))
    
    # Show non-zero boundary reactions (exchanges, demands, protein pool):
    for rxn in model.reactions:
        if len(rxn.metabolites) == 1 and solution.fluxes[rxn.id] != 0:
            print(rxn.id + ": " + str(solution.fluxes[rxn.id]))
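
Written out, the two steps in `simulate_ec_model` can be read as the sequence of problems below. This is a sketch in our own notation ($S$ is the stoichiometric matrix, $v$ the flux vector, $t$ the target reaction `rxn_id`, and $v_{\mathrm{glc}}$ the flux through `EX_glc__D_e_REV`). It assumes that `cobra.flux_analysis.pfba` first optimizes the current objective, then constrains it to that optimum and minimizes the total flux.

```latex
\begin{aligned}
\text{(1)}\quad & v_t^{*} = \max_{v}\; v_t
  && \text{s.t. } S v = 0,\quad lb \le v \le ub \\
\text{(2)}\quad & g^{*} = \min_{v}\; v_{\mathrm{glc}}
  && \text{s.t. } S v = 0,\quad lb \le v \le ub,\quad v_t \ge v_t^{*} \\
\text{(3)}\quad & \min_{v}\; \sum_i |v_i|
  && \text{s.t. } S v = 0,\quad lb \le v \le ub,\quad v_t \ge v_t^{*},\quad v_{\mathrm{glc}} \le g^{*}
\end{aligned}
```

Steps (2) and (3) both happen inside the single `pfba` call, because the glucose objective is set just before it.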
            

### 2.1 Wild type conditions:

Growth is the only objective here; no carbon is redirected towards biotin.

In [4]:
ec_model.objective.expression.args[0]

1.0*BIOMASS_Ec_iML1515_core_75p37M

In [5]:
# Metabolic model:
biomass_rxn_id = "BIOMASS_Ec_iML1515_core_75p37M"
met_sol_biomass = cobra.flux_analysis.pfba(met_model)
print_fluxes(met_model, met_sol_biomass, biomass_rxn_id)

growth: 0.8769972144269648
EX_pi_e: -0.8459567750195919
EX_co2_e: 24.00329327297595
EX_h_e: 8.058200328043592
EX_mn2_e: -0.0006060050751690326
EX_fe2_e: -0.014085452260911482
EX_glc__D_e: -10.0
EX_zn2_e: -0.00029905605011959496
DM_amob_c: 1.7539944288539295e-06
EX_mg2_e: -0.00760795083515392
EX_ca2_e: -0.004564770501092352
EX_ni2_e: -0.0002832701002599096
EX_meoh_e: 1.7539944288539295e-06
EX_cu2_e: -0.000621791025028718
EX_cobalt2_e: -2.192493036067412e-05
EX_h2o_e: 47.16236480869456
EX_mobd_e: -6.138980500988753e-06
EX_so4_e: -0.22084543853699787
EX_nh4_e: -9.471495371048116
EX_k_e: -0.17118371727564255
EX_cl_e: -0.004564770501092352
DM_5drib_c: 0.00019732437324606708
EX_o2_e: -22.13176323894597
DM_4crsol_c: 0.00019557037881721314


In [6]:
# ecModel:
ec_sol_biomass = simulate_ec_model(ec_model, biomass_rxn_id)
print_fluxes(ec_model, ec_sol_biomass, biomass_rxn_id)

growth: 0.5743255542929773
EX_h_e: 134.23049520072192
EX_co2_e: 0.44381983561439686
EX_pyr_e: 128.95364197420142
EX_glyclt_e: 0.0002561491972145913
DM_amob_c: 1.1486511085859545e-06
EX_meoh_e: 1.1486511085859545e-06
EX_h2o_e: 143.6072291162812
EX_etoh_e: 0.9549506261917605
DM_5drib_c: 0.00025729784832325385
DM_mththf_c: 0.00025729784832325385
EX_pi_e_REV: 0.5539978756242818
EX_mn2_e_REV: 0.0003968589580164473
EX_fe2_e_REV: 0.009224242727499507
EX_glc__D_e_REV: 68.79801305099797
EX_thm_e_REV: 0.00012807459860733394
EX_zn2_e_REV: 0.00019584501401390524
EX_mg2_e_REV: 0.004982274183491578
EX_ca2_e_REV: 0.0029893645100949467
EX_ni2_e_REV: 0.00018550715403663166
EX_cobalt2_e_REV: 1.4358138857324433e-05
EX_nh4_e_REV: 6.202154870297497
EX_mobd_e_REV: 4.0202788800508405e-06
EX_so4_e_REV: 0.14449858648345054
EX_k_e_REV: 0.11210432791910911
EX_o2_e_REV: 62.74068178758716
EX_cu_e_REV: 0.0004071968179937209
EX_cl_e_REV: 0.0029893645100949467
prot_pool_exchange: 0.0909113178074544


### 2.2 Increased biotin conditions:

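Growth is constrained to at least 80% of the wild-type optimum, so that up to 20% of the carbon previously used for biomass can be shifted towards biotin.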

In [7]:
# Metabolic model:
biotin_rxn_id = "EX_btn_e"
met_model.reactions.get_by_id(biomass_rxn_id).lower_bound = 0.8 * met_sol_biomass.fluxes[biomass_rxn_id]
met_model.objective = biotin_rxn_id
met_sol_biotin = cobra.flux_analysis.pfba(met_model)
print_fluxes(met_model, met_sol_biotin, biomass_rxn_id)

growth: 0.7015977715415719
EX_pi_e: -0.6767654200156121
EX_co2_e: 23.030377374564207
EX_h_e: 7.501045068088627
EX_mn2_e: -0.00048480406013522615
EX_btn_e: 0.2636212014134408
EX_fe2_e: -0.011268361808729185
EX_glc__D_e: -10.0
EX_zn2_e: -0.000239244840095676
DM_amob_c: 0.2636226046089839
EX_mg2_e: -0.0060863606681231365
EX_ca2_e: -0.003651816400873882
EX_ni2_e: -0.00022661608020792772
EX_meoh_e: 0.2636226046089839
EX_cu2_e: -0.0004974328200229745
EX_cobalt2_e: -1.75399442885393e-05
EX_hacolipa_e: 5.958036727449863e-31
EX_h2o_e: 46.56643742999425
EX_mobd_e: -4.911184400791003e-06
EX_so4_e: -0.7039187536564803
EX_nh4_e: -9.422544706732523
EX_k_e: -0.13694697382051405
EX_cl_e: -0.003651816400873882
DM_5drib_c: 0.2637790609120377
EX_o2_e: -20.742289743099874
DM_4crsol_c: 0.00015645630305377052


In [8]:
# ecModel:
ec_model.reactions.get_by_id(biomass_rxn_id).lower_bound = 0.8 * ec_sol_biomass.fluxes[biomass_rxn_id]
ec_sol_biotin = simulate_ec_model(ec_model, biotin_rxn_id)
print_fluxes(ec_model, ec_sol_biotin, biomass_rxn_id)

growth: 0.45946044343438186
EX_h_e: 110.46048967943926
EX_co2_e: 0.3254633767493096
EX_pyr_e: 106.22585487967065
EX_btn_e: 0.0032880546380194774
EX_glyclt_e: 0.00020491935777169524
DM_amob_c: 0.0032889735589063463
EX_meoh_e: 0.0032889735589063463
EX_h2o_e: 117.97174097580061
EX_etoh_e: 0.7738246648674736
DM_5drib_c: 0.0034938929166780804
DM_mththf_c: 0.0034938929166780804
EX_pi_e_REV: 0.44319830049939624
EX_mn2_e_REV: 0.00031748716641315786
EX_fe2_e_REV: 0.0073793941819996055
EX_glc__D_e_REV: 56.58796539146224
EX_thm_e_REV: 0.00010245967888586715
EX_zn2_e_REV: 0.00015667601121112422
EX_mg2_e_REV: 0.003985819346793263
EX_ca2_e_REV: 0.0023914916080759576
EX_ni2_e_REV: 0.00014840572322930533
EX_cobalt2_e_REV: 1.1486511085859547e-05
EX_nh4_e_REV: 4.984740278704186
EX_mobd_e_REV: 3.216223104040673e-06
EX_so4_e_REV: 0.12217497846279869
EX_k_e_REV: 0.0896834623352873
EX_o2_e_REV: 51.67469526065419
EX_cu_e_REV: 0.00032575745439497676
EX_cl_e_REV: 0.0023914916080759576
prot_pool_exchange: 0.090
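
In both models the growth rate under biotin production sits at the imposed lower bound, which suggests that the constraint is active and that growth and biotin production compete for resources. A quick arithmetic check, using growth values copied by hand from the outputs of cells 5 to 8 (the variable names are ours):

```python
import math

# Growth rates copied from the outputs of cells 5-8 above.
met_wt, met_btn = 0.8769972144269648, 0.7015977715415719
ec_wt, ec_btn = 0.5743255542929773, 0.45946044343438186

# Ratio of growth under biotin production to wild-type growth:
met_ratio = met_btn / met_wt
ec_ratio = ec_btn / ec_wt
print(met_ratio, ec_ratio)

# Both ratios should match the 0.8 factor used for the lower bound.
print(math.isclose(met_ratio, 0.8, rel_tol=1e-9),
      math.isclose(ec_ratio, 0.8, rel_tol=1e-9))
```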

## 3. Comparing enzyme usage

First, let's build a dataframe with the usage of every enzyme under both conditions. Enzyme usages are the fluxes of the `draw_prot_XXXXXX` reactions, which are in units of `mmol/gDW`.

In [9]:
import pandas as pd

ids, rows = [], []
for reaction in ec_model.reactions:
    if "draw_prot_" in reaction.id:
        gene_id = reaction.gene_reaction_rule
        # Index each row by the protein ID (last 6 characters of the reaction ID):
        ids.append(reaction.id[-6:])
        rows.append({"gene": met_model.genes.get_by_id(gene_id).name,
                     "biomass_usage": ec_sol_biomass.fluxes[reaction.id],
                     "biotin_usage": ec_sol_biotin.fluxes[reaction.id]})

# DataFrame.append was removed in pandas 2.0, so build the frame in one step:
usage_df = pd.DataFrame(rows, index=ids)

print(usage_df)

        gene  biomass_usage  biotin_usage
O32583  thiS   0.000000e+00  0.000000e+00
P00350   gnd   0.000000e+00  0.000000e+00
P00363  frdA   0.000000e+00  0.000000e+00
P00370  gdhA   4.731316e-08  3.804181e-08
P00393   ndh   0.000000e+00  0.000000e+00
...      ...            ...           ...
Q59385  copA   0.000000e+00  0.000000e+00
Q6BEX0  ytfR   0.000000e+00  0.000000e+00
Q6BF16  dgoA   0.000000e+00  0.000000e+00
Q6BF17  dgoD   0.000000e+00  0.000000e+00
Q93K97  nudF   0.000000e+00  0.000000e+00

[1259 rows x 3 columns]


Let's make sure all values are non-negative:

In [10]:
usage_df = usage_df.sort_values(by=['biomass_usage'])
print(usage_df.head(n=5))
usage_df = usage_df.sort_values(by=['biotin_usage'])
print(usage_df.head(n=5))

        gene  biomass_usage  biotin_usage
O32583  thiS            0.0           0.0
P32674  pflD            0.0           0.0
P32669  fsaB            0.0           0.0
P32664  nudC            0.0           0.0
P32662   gph            0.0           0.0
        gene  biomass_usage  biotin_usage
P06999  pfkB   4.740742e-16 -5.645324e-16
O32583  thiS   0.000000e+00  0.000000e+00
P0AE12   amn   0.000000e+00  0.000000e+00
P0AE22  aphA   0.000000e+00  0.000000e+00
P09158  speE   0.000000e+00  0.000000e+00


One value is slightly negative, most likely because of the numerical tolerance of the solver. Let's set it to zero:

In [11]:
usage_df.loc[usage_df["biotin_usage"] < 0, "biotin_usage"] = 0
print(usage_df.head(n=5))

        gene  biomass_usage  biotin_usage
P06999  pfkB   4.740742e-16           0.0
O32583  thiS   0.000000e+00           0.0
P0AE12   amn   0.000000e+00           0.0
P0AE22  aphA   0.000000e+00           0.0
P09158  speE   0.000000e+00           0.0
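
Clipping only the negative side leaves the matching positive noise (4.7e-16 for pfkB) in place, so that row later survives the zero filter. A symmetric cutoff is one alternative. The sketch below is plain Python; the helper name `clean_usage` and the `1e-14` tolerance are assumptions chosen for illustration, not values taken from the solver configuration.

```python
def clean_usage(value, tol=1e-14):
    """Return 0.0 for values within `tol` of zero (either sign).

    `tol` is an assumed cutoff and should be tuned to the solver tolerance.
    Values beyond the cutoff, including clearly negative ones, are returned
    unchanged so that real problems are not hidden.
    """
    return 0.0 if abs(value) < tol else value

# pfkB usages copied from the output above; gdhA is a genuinely used enzyme.
pfkB = [clean_usage(v) for v in (4.740742e-16, -5.645324e-16)]
gdhA = [clean_usage(v) for v in (4.731316e-08, 3.804181e-08)]
print(pfkB)
print(gdhA)
```

With this cutoff both pfkB entries become zero, while the gdhA usages are left untouched.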


There are a lot of rows with zero usage under both conditions, so let's filter them out:

In [12]:
# Copy the filtered frame so that columns can be added later without pandas warnings:
usage_df = usage_df.query("biomass_usage > 0 or biotin_usage > 0").copy()
print(usage_df)

        gene  biomass_usage  biotin_usage
P06999  pfkB   4.740742e-16  0.000000e+00
P76536  yfeX   1.368349e-12  1.094679e-12
P09833  modC   3.918405e-12  3.134724e-12
P37329  modA   3.918405e-12  3.134724e-12
P0AF01  modB   3.918405e-12  3.134724e-12
...      ...            ...           ...
P0AC62  grxC   1.177509e-04  9.965358e-05
P0A953  fabB   1.551985e-04  1.355756e-04
P21151  fadA   2.255829e-04  1.804663e-04
P0A9B2  gapA   2.695114e-04  2.217190e-04
P13000  bioD   2.279039e-07  6.525652e-04

[300 rows x 3 columns]


Now let's compute usage changes. We will look at both absolute changes (the difference between both conditions) and relative changes (the fold change or ratio between them).

In [13]:
usage_df["abs_changes"] = usage_df["biotin_usage"] - usage_df["biomass_usage"]
usage_df["rel_changes"] = usage_df["biotin_usage"] / usage_df["biomass_usage"]
print(usage_df)

        gene  biomass_usage  biotin_usage   abs_changes  rel_changes
P06999  pfkB   4.740742e-16  0.000000e+00 -4.740742e-16     0.000000
P76536  yfeX   1.368349e-12  1.094679e-12 -2.736698e-13     0.800000
P09833  modC   3.918405e-12  3.134724e-12 -7.836810e-13     0.800000
P37329  modA   3.918405e-12  3.134724e-12 -7.836810e-13     0.800000
P0AF01  modB   3.918405e-12  3.134724e-12 -7.836810e-13     0.800000
...      ...            ...           ...           ...          ...
P0AC62  grxC   1.177509e-04  9.965358e-05 -1.809734e-05     0.846308
P0A953  fabB   1.551985e-04  1.355756e-04 -1.962292e-05     0.873562
P21151  fadA   2.255829e-04  1.804663e-04 -4.511658e-05     0.800000
P0A9B2  gapA   2.695114e-04  2.217190e-04 -4.779232e-05     0.822670
P13000  bioD   2.279039e-07  6.525652e-04  6.523373e-04  2863.335554

[300 rows x 5 columns]
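
As a worked example of the two measures, the sketch below recomputes them for bioD from the (rounded) values printed in the table. The function name `usage_changes` is ours. The explicit handling of a zero reference is an assumption about how one might want to treat enzymes that are unused under biomass conditions; in the dataframe above such rows would simply get an infinite fold change.

```python
import math

def usage_changes(biomass_usage, biotin_usage):
    """Return (absolute change, fold change) between the two conditions."""
    abs_change = biotin_usage - biomass_usage
    if biomass_usage == 0:
        # No reference usage: the fold change is undefined (nan) or unbounded (inf).
        rel_change = math.inf if biotin_usage > 0 else math.nan
    else:
        rel_change = biotin_usage / biomass_usage
    return abs_change, rel_change

# bioD values copied (rounded) from the table above.
abs_change, rel_change = usage_changes(2.279039e-07, 6.525652e-04)
print(abs_change, rel_change)
```

Because the inputs are rounded to seven significant digits, the fold change agrees with the table value only to about that precision.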


We can now sort the dataframe and look at the 10 enzymes that:

* Increased their absolute usage the most:

In [14]:
usage_df = usage_df.sort_values(by=['abs_changes'])
print(usage_df.tail(n=10).iloc[::-1])

        gene  biomass_usage  biotin_usage   abs_changes  rel_changes
P13000  bioD   2.279039e-07  6.525652e-04  6.523373e-04  2863.335554
P12998  bioF   5.317795e-09  1.522663e-05  1.522131e-05  2863.335554
P45578  luxS   1.786779e-07  2.426299e-06  2.247621e-06    13.579177
P0A817  metK   1.192288e-07  1.922128e-06  1.802899e-06    16.121332
P12995  bioA   3.988346e-10  1.141997e-06  1.141599e-06  2863.335554
P76194  sufE   2.112517e-07  1.330013e-06  1.118762e-06     6.295871
P77444  sufS   2.112517e-07  1.330013e-06  1.118762e-06     6.295871
P0AF12   mtn   4.535749e-08  6.159174e-07  5.705599e-07    13.579177
P0AEK4  fabI   5.362593e-11  1.535490e-07  1.534954e-07  2863.335554
P0A6R0  fabH   4.991923e-11  1.429355e-07  1.428856e-07  2863.335554


* Decreased their absolute usage the most:

In [15]:
print(usage_df.head(n=10))

        gene  biomass_usage  biotin_usage  abs_changes  rel_changes
P0A9B2  gapA       0.000270      0.000222    -0.000048     0.822670
P21151  fadA       0.000226      0.000180    -0.000045     0.800000
P0A953  fabB       0.000155      0.000136    -0.000020     0.873562
P0AC62  grxC       0.000118      0.000100    -0.000018     0.846308
P17854  cysH       0.000118      0.000100    -0.000018     0.846308
P00934  thrC       0.000088      0.000071    -0.000017     0.807782
P0A9C5  glnA       0.000083      0.000067    -0.000016     0.807312
P0A6E4  argG       0.000069      0.000055    -0.000014     0.800000
P36683  acnB       0.000065      0.000052    -0.000013     0.800000
P0A6C5  argA       0.000060      0.000048    -0.000012     0.800000


* Increased their relative usage the most:

In [16]:
usage_df = usage_df.sort_values(by=['rel_changes'])
print(usage_df.tail(n=10).iloc[::-1])

        gene  biomass_usage  biotin_usage   abs_changes  rel_changes
P0A6Q6  fabZ   3.303061e-13  9.457772e-10  9.454469e-10  2863.335554
P0AEK4  fabI   5.362593e-11  1.535490e-07  1.534954e-07  2863.335554
P12998  bioF   5.317795e-09  1.522663e-05  1.522131e-05  2863.335554
P12999  bioC   1.844389e-12  5.281105e-09  5.279260e-09  2863.335554
P13001  bioH   1.724700e-11  4.938394e-08  4.936669e-08  2863.335554
P12996  bioB   2.295464e-11  6.572685e-08  6.570389e-08  2863.335554
P0A6R0  fabH   4.991923e-11  1.429355e-07  1.428856e-07  2863.335554
P13000  bioD   2.279039e-07  6.525652e-04  6.523373e-04  2863.335554
P12995  bioA   3.988346e-10  1.141997e-06  1.141599e-06  2863.335554
P0A817  metK   1.192288e-07  1.922128e-06  1.802899e-06    16.121332


* Decreased their relative usage the most:

In [17]:
print(usage_df.head(n=10))

        gene  biomass_usage  biotin_usage   abs_changes  rel_changes
P06999  pfkB   4.740742e-16  0.000000e+00 -4.740742e-16     0.000000
P07003  poxB   2.359266e-06  1.871006e-06 -4.882603e-07     0.793046
P77218  eutD   1.173126e-06  9.363002e-07 -2.368261e-07     0.798124
P0A6A3  ackA   3.245753e-07  2.590514e-07 -6.552398e-08     0.798124
P0A790  panD   1.654252e-07  1.323402e-07 -3.308505e-08     0.800000
P0A6I6  coaD   2.215657e-08  1.772525e-08 -4.431314e-09     0.800000
P0A9J4  panE   4.301017e-09  3.440814e-09 -8.602035e-10     0.800000
P0A6I9  coaE   2.235476e-06  1.788381e-06 -4.470952e-07     0.800000
P0ABQ0   dfp   1.592685e-07  1.274148e-07 -3.185370e-08     0.800000
P31663  panC   7.680360e-08  6.144288e-08 -1.536072e-08     0.800000


## 4. Blocking the Fe-S assembly machinery

Creating biotin requires one sulfur atom, which is taken from a 2Fe-2S cluster.

In [18]:
print(met_model.reactions.BTS5)

BTS5: 2fe2s_c + amet_c + dtbt_c --> 2fe1s_c + btn_c + dad_5_c + h_c + met__L_c


This cluster therefore has to be regenerated, which can be achieved by either the iron-sulfur cluster (ISC) or the sulfur mobilization (SUF) machinery.

##### ISC:

In [19]:
print(met_model.reactions.I2FE2SR)
print(met_model.reactions.I2FE2ST)
print(met_model.reactions.ICYSDS)

I2FE2SR: 2fe1s_c + iscssh_c + iscu_c --> 4.0 h_c + iscs_c + iscu_2fe2s_c
I2FE2ST: 4.0 h_c + iscu_2fe2s_c --> 2fe2s_c + iscu_c
ICYSDS: cys__L_c + iscs_c --> ala__L_c + iscssh_c


Note that adding up all three reactions gives the net reaction:

```
NET RXN: 2fe1s_c + cys__L_c --> 2fe2s_c + ala__L_c
```

##### SUF:

In [20]:
print(met_model.reactions.S2FE2SR)
print(met_model.reactions.S2FE2ST)
print(met_model.reactions.SCYSDS)

S2FE2SR: 2fe1s_c + atp_c + h2o_c + sufbcd_c + sufsesh_c --> adp_c + 5.0 h_c + pi_c + sufbcd_2fe2s_c + sufse_c
S2FE2ST: 4.0 h_c + sufbcd_2fe2s_c --> 2fe2s_c + sufbcd_c
SCYSDS: cys__L_c + sufse_c --> ala__L_c + sufsesh_c


Adding up these three reactions instead gives the net reaction:

```
NET RXN: 2fe1s_c + atp_c + h2o_c + cys__L_c --> adp_c + h_c + pi_c + 2fe2s_c + ala__L_c
```
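
Both net reactions can be checked by summing the stoichiometric coefficients of the three steps and dropping every species that cancels. The sketch below does this in plain Python, with the stoichiometries transcribed by hand from the printed reactions (negative means consumed); the helper `net_reaction` is ours and not part of cobra.

```python
from collections import defaultdict

def net_reaction(reactions):
    """Sum stoichiometries and drop species whose coefficients cancel."""
    total = defaultdict(float)
    for stoich in reactions:
        for met, coeff in stoich.items():
            total[met] += coeff
    return {met: coeff for met, coeff in total.items() if abs(coeff) > 1e-9}

# Stoichiometries transcribed from the reactions printed above.
isc = [
    {"2fe1s_c": -1, "iscssh_c": -1, "iscu_c": -1,
     "h_c": 4, "iscs_c": 1, "iscu_2fe2s_c": 1},                      # I2FE2SR
    {"h_c": -4, "iscu_2fe2s_c": -1, "2fe2s_c": 1, "iscu_c": 1},      # I2FE2ST
    {"cys__L_c": -1, "iscs_c": -1, "ala__L_c": 1, "iscssh_c": 1},    # ICYSDS
]
suf = [
    {"2fe1s_c": -1, "atp_c": -1, "h2o_c": -1, "sufbcd_c": -1, "sufsesh_c": -1,
     "adp_c": 1, "h_c": 5, "pi_c": 1, "sufbcd_2fe2s_c": 1, "sufse_c": 1},  # S2FE2SR
    {"h_c": -4, "sufbcd_2fe2s_c": -1, "2fe2s_c": 1, "sufbcd_c": 1},  # S2FE2ST
    {"cys__L_c": -1, "sufse_c": -1, "ala__L_c": 1, "sufsesh_c": 1},  # SCYSDS
]

isc_net = net_reaction(isc)
suf_net = net_reaction(suf)
print(isc_net)
print(suf_net)
```

In both results the carrier species (`iscu_c`, `iscs_c`, `sufbcd_c`, `sufse_c` and their loaded forms) cancel out, which is why the machinery itself carries no cost in the model.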

In both cases, the model does not associate biotin production with any cost for the assembly machinery itself, because the ISC and SUF complexes are released again at the end of each cycle. This may be a reasonable assumption under normal conditions (where biotin production is minor compared to other costs in the cell), but under biotin over-production the cost of producing this machinery could become relevant. However, as both machineries are protein complexes, accounting for their synthesis is out of the scope of our modeling approach: GEMs have no explicit representation of protein production.

We can confirm that the model cannot synthesize these machineries by modifying reactions `I2FE2ST` and `S2FE2ST` so that the complexes are consumed instead of recycled:

In [21]:
# Metabolic model:
met_model.reactions.I2FE2ST.subtract_metabolites({met_model.metabolites.iscu_c: +1})
met_model.reactions.S2FE2ST.subtract_metabolites({met_model.metabolites.sufbcd_c: +1})
print(met_model.reactions.I2FE2ST)
print(met_model.reactions.S2FE2ST)

I2FE2ST: 4.0 h_c + iscu_2fe2s_c --> 2fe2s_c
S2FE2ST: 4.0 h_c + sufbcd_2fe2s_c --> 2fe2s_c


Let's see if the model can grow now:

In [22]:
# WT conditions - metabolic model:
met_model.objective = biomass_rxn_id
met_model.reactions.get_by_id(biomass_rxn_id).lower_bound = 0
met_model.reactions.get_by_id(biotin_rxn_id).lower_bound = 0
met_sol_biomass = cobra.flux_analysis.pfba(met_model)
print_fluxes(met_model, met_sol_biomass, biomass_rxn_id)

growth: 0.0
EX_co2_e: 1.7514893617021292
EX_ac_e: -1.8332837034655592e-15
EX_tyr__L_e: 2.1633015706022764e-15
EX_glc__D_e: -0.2919148936170214
EX_nac_e: 5.720157415115007e-17
EX_ala_B_e: -1.1102230246251578e-16
EX_fad_e: -9.244463733058732e-33
EX_h2o_e: 1.7514893617021274
EX_nh4_e: -4.13888245527365e-16
EX_tyrp_e: -2.1633015706022764e-15
EX_o2_e: -1.7514893617021288
EX_cys__D_e: -1.1102230246251583e-16


The model cannot grow, because it can synthesize neither the ISC nor the SUF complex. This can also be seen by trying to maximize production of either species in the original model:

In [23]:
def test_production(model, met_id):
    # Add exchange reaction:
    ex_rxn_id = "EX_" + met_id
    ex_rxn = cobra.Reaction(ex_rxn_id)
    ex_rxn.lower_bound = 0
    ex_rxn.upper_bound = +1000
    ex_rxn.add_metabolites({model.metabolites.get_by_id(met_id): -1})
    model.add_reactions([ex_rxn])
    print(model.reactions.get_by_id(ex_rxn_id).reaction)
    
    # Test production:
    model.objective = ex_rxn_id
    solution = cobra.flux_analysis.pfba(model)
    print("Maximum production: " + str(solution.fluxes[ex_rxn_id]))

# Reset model:
met_model = cameo.load_model("iML1515")

# ISC:
test_production(met_model, "iscu_c")

# SUF:
test_production(met_model, "sufbcd_c")

iscu_c --> 
Maximum production: 0.0
sufbcd_c --> 
Maximum production: 0.0


## 5. Model cleanup

In [24]:
import os
os.remove("eciML1515_batch.xml")