# ecModeling for biotin production

This notebook contains an analysis of the ecModel of _E. coli_ for biotin production.

Benjamín J. Sánchez, 2019-10-29

## 1. Loading & pre-processing models

We will use:

* The batch version of `eciML1515`, i.e. without proteomics and letting the model choose the required enzyme amounts (from a shared pool).
* The original `iML1515` (as reference).

The models are available at:

https://github.com/SysBioChalmers/ecModels/blob/chore/updateiML1515/eciML1515/model

(temporary location; eventually they will be available in the master branch)

In [1]:
import cobra

! [ ! -f "iML1515.xml" ] && curl -O -L "https://raw.githubusercontent.com/SysBioChalmers/ecModels/chore/updateiML1515/eciML1515/model/iML1515.xml"
! [ ! -f "eciML1515_batch.xml" ] && curl -O -L "https://raw.githubusercontent.com/SysBioChalmers/ecModels/chore/updateiML1515/eciML1515/model/eciML1515_batch.xml"

met_model = cobra.io.read_sbml_model("iML1515.xml")
ec_model = cobra.io.read_sbml_model("eciML1515_batch.xml")
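The two `curl` one-liners above rely on a notebook running on top of a Unix shell. As a minimal sketch, the same "download only if missing" guard could be written in plain Python. `fetch_if_missing` and `BASE_URL` are names made up for this example; the URL is the one used in the shell commands.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL copied from the curl commands above.
BASE_URL = ("https://raw.githubusercontent.com/SysBioChalmers/ecModels/"
            "chore/updateiML1515/eciML1515/model/")


def fetch_if_missing(file_name, base_url=BASE_URL):
    """Download file_name from base_url unless it already exists locally.

    Returns True if a download was performed, False if the path was already there.
    """
    path = Path(file_name)
    if path.exists():
        return False
    # Only the file name (not any local directory) is appended to the URL.
    urlretrieve(base_url + path.name, str(path))
    return True
```

Calling `fetch_if_missing("iML1515.xml")` and `fetch_if_missing("eciML1515_batch.xml")` before the two `read_sbml_model` calls would then replace the shell lines.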





Model modifications for biotin production:

In [2]:
import math

# Replacing any infinite bound by 1000 in the ecModel:
for reaction in ec_model.reactions:
    if math.isinf(reaction.upper_bound):
        reaction.upper_bound = 1000

# Close biotin uptake:
met_model.reactions.EX_btn_e.lower_bound = 0
ec_model.reactions.EX_btn_e_REV.upper_bound = 0

# Open biotin transport between periplasm and cytoplasm:
met_model.reactions.BTNt2ipp.lower_bound = -1000
ec_model.reactions.BTNt2ipp.lower_bound = -1000
# TODO: insert "_REV" reaction instead for ecModel (for consistency with rest of the model)
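The ecModel stores reversible reactions as a forward reaction plus a `_REV` reaction, both with non-negative flux (compare `EX_btn_e` and `EX_btn_e_REV` above). The sketch below shows how a reversible pair of bounds could be split under that convention. `split_reversible_bounds` is a hypothetical helper, `BTNt2ipp_REV` is a hypothetical reaction ID, and the upper bound of 1000 is an assumption; none of this modifies the model.

```python
def split_reversible_bounds(lower_bound, upper_bound):
    """Return ((fwd_lb, fwd_ub), (rev_lb, rev_ub)) for an irreversible split.

    The forward reaction keeps the non-negative part of the flux range, and the
    reverse ("_REV") reaction carries the negative part with its sign flipped.
    """
    forward = (max(lower_bound, 0.0), max(upper_bound, 0.0))
    reverse = (max(-upper_bound, 0.0), max(-lower_bound, 0.0))
    return forward, reverse


# BTNt2ipp opened in both directions (assumed bounds: -1000 and 1000):
forward, reverse = split_reversible_bounds(-1000, 1000)
print("BTNt2ipp bounds:", forward)
print("BTNt2ipp_REV bounds (hypothetical reaction):", reverse)
```

With this split, no reaction in the ecModel would need a negative lower bound, which is what the TODO above is asking for.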

## 2. Simulating the model

For the ecModel we will use standard FBA to maximize the target reaction, then fix that optimum and run pFBA while minimizing glucose uptake:

In [3]:
def simulate_ec_model(model, rxn_id):
    # First optimization: maximize flux through rxn_id
    model.objective = rxn_id
    solution = model.optimize()

    # Second optimization: fix rxn_id at its optimum and minimize glucose uptake.
    # Note: the new lower bound stays in the model after the function returns.
    model.reactions.get_by_id(rxn_id).lower_bound = solution.fluxes[rxn_id]
    glc_rxn = model.reactions.get_by_id("EX_glc__D_e_REV")
    model.objective = {glc_rxn: -1}
    solution = cobra.flux_analysis.pfba(model)

    return solution

def print_fluxes(model, solution, bio_rxn_id):
    # Show growth:
    print("growth: " + str(solution.fluxes[bio_rxn_id]))

    # Show non-zero boundary fluxes (exchange and demand reactions have a single metabolite):
    for rxn in model.reactions:
        if len(rxn.metabolites) == 1 and solution.fluxes[rxn.id] != 0:
            print(rxn.id + ": " + str(solution.fluxes[rxn.id]))
            

### 2.1 Wild-type conditions

Growth is the objective, so 100% of the carbon goes towards biomass.

In [4]:
ec_model.objective.expression.args[0]

1.0*BIOMASS_Ec_iML1515_core_75p37M

In [5]:
# Metabolic model:
biomass_rxn_id = "BIOMASS_Ec_iML1515_core_75p37M"
met_sol_biomass = cobra.flux_analysis.pfba(met_model)
print_fluxes(met_model, met_sol_biomass, biomass_rxn_id)

growth: 0.8769972144269816
EX_pi_e: -0.8459567750195397
EX_h_e: 8.058200328043696
EX_mn2_e: -0.0006060050751690443
EX_co2_e: 24.003293272975615
EX_fe2_e: -0.014085452260911752
EX_glc__D_e: -10.0
EX_zn2_e: -0.00029905605011960076
DM_amob_c: 1.7539944288539631e-06
EX_mg2_e: -0.007607950835154066
EX_ca2_e: -0.00456477050109244
EX_ni2_e: -0.0002832701002599151
EX_meoh_e: 1.7539944288539631e-06
EX_cu2_e: -0.00062179102502873
EX_cobalt2_e: -2.1924930360674543e-05
EX_h2o_e: 47.16236480869426
EX_nh4_e: -9.471495371048157
EX_mobd_e: -6.138980500988871e-06
EX_so4_e: -0.22084543853700256
EX_k_e: -0.17118371727564582
EX_o2_e: -22.131763238945563
EX_cl_e: -0.00456477050109244
DM_5drib_c: 0.00019732437324607087
DM_4crsol_c: 0.0001955703788172169


In [6]:
# ecModel:
ec_sol_biomass = simulate_ec_model(ec_model, biomass_rxn_id)
print_fluxes(ec_model, ec_sol_biomass, biomass_rxn_id)

growth: 0.5743255542929773
EX_h_e: 134.23049520072192
EX_co2_e: 0.44381983561439686
EX_pyr_e: 128.95364197420142
EX_glyclt_e: 0.0002561491972145913
DM_amob_c: 1.1486511085859545e-06
EX_meoh_e: 1.1486511085859545e-06
EX_h2o_e: 143.6072291162812
EX_etoh_e: 0.9549506261917605
DM_5drib_c: 0.00025729784832325385
DM_mththf_c: 0.00025729784832325385
EX_pi_e_REV: 0.5539978756242818
EX_mn2_e_REV: 0.0003968589580164473
EX_fe2_e_REV: 0.009224242727499507
EX_glc__D_e_REV: 68.79801305099797
EX_thm_e_REV: 0.00012807459860733394
EX_zn2_e_REV: 0.00019584501401390524
EX_mg2_e_REV: 0.004982274183491578
EX_ca2_e_REV: 0.0029893645100949467
EX_ni2_e_REV: 0.00018550715403663166
EX_cobalt2_e_REV: 1.4358138857324433e-05
EX_nh4_e_REV: 6.202154870297497
EX_mobd_e_REV: 4.0202788800508405e-06
EX_so4_e_REV: 0.14449858648345054
EX_k_e_REV: 0.11210432791910911
EX_o2_e_REV: 62.74068178758716
EX_cu_e_REV: 0.0004071968179937209
EX_cl_e_REV: 0.0029893645100949467
prot_pool_exchange: 0.0909113178074544


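### 2.2 Increased biotin conditions

Growth is constrained to at least 80% of its wild-type value and biotin production is then maximized, so that roughly 20% of the carbon is shifted towards biotin.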

In [7]:
# Metabolic model:
biotin_rxn_id = "EX_btn_e"
met_model.reactions.get_by_id(biomass_rxn_id).lower_bound = 0.8 * met_sol_biomass.fluxes[biomass_rxn_id]
met_model.objective = biotin_rxn_id
met_sol_biotin = cobra.flux_analysis.pfba(met_model)
print_fluxes(met_model, met_sol_biotin, biomass_rxn_id)

growth: 0.7015977715415853
EX_pi_e: -0.6767654200155779
EX_h_e: 7.501045068088468
EX_mn2_e: -0.00048480406013523547
EX_co2_e: 23.03037737456458
EX_btn_e: 0.2636212014134208
EX_fe2_e: -0.0112683618087294
EX_glc__D_e: -10.0
EX_zn2_e: -0.00023924484009568058
DM_amob_c: 0.2636226046089639
EX_mg2_e: -0.006086360668123253
EX_ca2_e: -0.0036518164008739517
EX_ni2_e: -0.00022661608020793205
EX_meoh_e: 0.2636226046089639
EX_cu2_e: -0.000497432820022984
EX_cobalt2_e: -1.7539944288539634e-05
EX_h2o_e: 46.56643742999425
EX_nh4_e: -9.422544706732372
EX_mobd_e: -4.911184400791097e-06
EX_so4_e: -0.7039187536564432
EX_k_e: -0.13694697382051665
EX_o2_e: -20.74228974310021
EX_cl_e: -0.0036518164008739517
DM_5drib_c: 0.2637790609120177
DM_4crsol_c: 0.00015645630305377353


In [8]:
# ecModel:
ec_model.reactions.get_by_id(biomass_rxn_id).lower_bound = 0.8 * ec_sol_biomass.fluxes[biomass_rxn_id]
ec_sol_biotin = simulate_ec_model(ec_model, biotin_rxn_id)
print_fluxes(ec_model, ec_sol_biotin, biomass_rxn_id)

growth: 0.45946044343438186
EX_h_e: 110.46048967943926
EX_co2_e: 0.3254633767493096
EX_pyr_e: 106.22585487967065
EX_btn_e: 0.0032880546380194774
EX_glyclt_e: 0.00020491935777169524
DM_amob_c: 0.0032889735589063463
EX_meoh_e: 0.0032889735589063463
EX_h2o_e: 117.97174097580061
EX_etoh_e: 0.7738246648674736
DM_5drib_c: 0.0034938929166780804
DM_mththf_c: 0.0034938929166780804
EX_pi_e_REV: 0.44319830049939624
EX_mn2_e_REV: 0.00031748716641315786
EX_fe2_e_REV: 0.0073793941819996055
EX_glc__D_e_REV: 56.58796539146224
EX_thm_e_REV: 0.00010245967888586715
EX_zn2_e_REV: 0.00015667601121112422
EX_mg2_e_REV: 0.003985819346793263
EX_ca2_e_REV: 0.0023914916080759576
EX_ni2_e_REV: 0.00014840572322930533
EX_cobalt2_e_REV: 1.1486511085859547e-05
EX_nh4_e_REV: 4.984740278704186
EX_mobd_e_REV: 3.216223104040673e-06
EX_so4_e_REV: 0.12217497846279869
EX_k_e_REV: 0.0896834623352873
EX_o2_e_REV: 51.67469526065419
EX_cu_e_REV: 0.00032575745439497676
EX_cl_e_REV: 0.0023914916080759576
prot_pool_exchange: 0.090
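
The two models take up very different amounts of glucose in these simulations (10 in the metabolic model, about 56.6 in the ecModel), so the raw biotin fluxes are not directly comparable. One way to put them on a common footing is the biotin yield on glucose. The sketch below uses values copied from the two outputs above and assumes both exchange fluxes share the same units, so the ratio is in mmol biotin per mmol glucose.

```python
# Values copied from the outputs of the two biotin simulations above.
biotin_flux = {"met_model": 0.2636212014134208, "ec_model": 0.0032880546380194774}
glucose_uptake = {"met_model": 10.0, "ec_model": 56.58796539146224}

biotin_yield = {}
for name in biotin_flux:
    # Yield = biotin secreted per glucose consumed
    biotin_yield[name] = biotin_flux[name] / glucose_uptake[name]
    print(f"{name}: {biotin_yield[name]:.2e} mmol biotin / mmol glucose")
```

Under these numbers the ecModel predicts a much lower yield than the metabolic model, which is consistent with the enzyme pool limiting biotin synthesis.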

## 3. Comparing enzyme usage

Let's build a dataframe with the usage of every enzyme under both conditions. For this, we will look at the `draw_prot_XXXXXX` reactions, which are in units of `mmol/gDW`.

In [9]:
import pandas as pd

index, rows = [], []
for reaction in ec_model.reactions:
    if "draw_prot_" in reaction.id:
        # The last 6 characters of the reaction ID are the protein ID:
        index.append(reaction.id[-6:])
        rows.append({"gene": reaction.gene_reaction_rule,
                     "biomass_usage": ec_sol_biomass.fluxes[reaction.id],
                     "biotin_usage": ec_sol_biotin.fluxes[reaction.id]})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step:
usage_df = pd.DataFrame(rows, index=index)

print(usage_df)

         gene  biomass_usage  biotin_usage
O32583  b4407   0.000000e+00  0.000000e+00
P00350  b2029   0.000000e+00  0.000000e+00
P00363  b4154   0.000000e+00  0.000000e+00
P00370  b1761   4.731316e-08  3.804181e-08
P00393  b1109   0.000000e+00  0.000000e+00
...       ...            ...           ...
Q59385  b0484   0.000000e+00  0.000000e+00
Q6BEX0  b4485   0.000000e+00  0.000000e+00
Q6BF16  b4477   0.000000e+00  0.000000e+00
Q6BF17  b4478   0.000000e+00  0.000000e+00
Q93K97  b3034   0.000000e+00  0.000000e+00

[1259 rows x 3 columns]


Let's make sure all values are non-negative by looking at the smallest values in each column:

In [10]:
usage_df = usage_df.sort_values(by=['biomass_usage'])
print(usage_df.head(n=5))
usage_df = usage_df.sort_values(by=['biotin_usage'])
print(usage_df.head(n=5))

         gene  biomass_usage  biotin_usage
O32583  b4407            0.0           0.0
P32674  b3951            0.0           0.0
P32669  b3946            0.0           0.0
P32664  b3996            0.0           0.0
P32662  b3385            0.0           0.0
         gene  biomass_usage  biotin_usage
P06999  b1723   4.740742e-16 -5.645324e-16
O32583  b4407   0.000000e+00  0.000000e+00
P0AE12  b1982   0.000000e+00  0.000000e+00
P0AE22  b4055   0.000000e+00  0.000000e+00
P09158  b0121   0.000000e+00  0.000000e+00


One value is slightly negative (on the order of 1e-16), most likely due to the solver's numerical tolerance. Let's set it to zero:

In [11]:
usage_df.loc[usage_df["biotin_usage"] < 0, "biotin_usage"] = 0
print(usage_df.head(n=5))

         gene  biomass_usage  biotin_usage
P06999  b1723   4.740742e-16           0.0
O32583  b4407   0.000000e+00           0.0
P0AE12  b1982   0.000000e+00           0.0
P0AE22  b4055   0.000000e+00           0.0
P09158  b0121   0.000000e+00           0.0


There are a lot of rows with zero usage under both conditions, so let's filter them out:

In [12]:
# .copy() avoids a SettingWithCopyWarning when new columns are added afterwards:
usage_df = usage_df.query("biomass_usage > 0 or biotin_usage > 0").copy()
print(usage_df)

         gene  biomass_usage  biotin_usage
P06999  b1723   4.740742e-16  0.000000e+00
P76536  b2431   1.368349e-12  1.094679e-12
P09833  b0765   3.918405e-12  3.134724e-12
P37329  b0763   3.918405e-12  3.134724e-12
P0AF01  b0764   3.918405e-12  3.134724e-12
...       ...            ...           ...
P0AC62  b3610   1.177509e-04  9.965358e-05
P0A953  b2323   1.551985e-04  1.355756e-04
P21151  b3845   2.255829e-04  1.804663e-04
P0A9B2  b1779   2.695114e-04  2.217190e-04
P13000  b0778   2.279039e-07  6.525652e-04

[300 rows x 3 columns]


Now let's compute usage changes. We will look at both absolute changes (the difference between both conditions) and relative changes (the fold change or ratio between them).

In [13]:
usage_df["abs_changes"] = usage_df["biotin_usage"] - usage_df["biomass_usage"]
usage_df["rel_changes"] = usage_df["biotin_usage"] / usage_df["biomass_usage"]
print(usage_df)

         gene  biomass_usage  biotin_usage   abs_changes  rel_changes
P06999  b1723   4.740742e-16  0.000000e+00 -4.740742e-16     0.000000
P76536  b2431   1.368349e-12  1.094679e-12 -2.736698e-13     0.800000
P09833  b0765   3.918405e-12  3.134724e-12 -7.836810e-13     0.800000
P37329  b0763   3.918405e-12  3.134724e-12 -7.836810e-13     0.800000
P0AF01  b0764   3.918405e-12  3.134724e-12 -7.836810e-13     0.800000
...       ...            ...           ...           ...          ...
P0AC62  b3610   1.177509e-04  9.965358e-05 -1.809734e-05     0.846308
P0A953  b2323   1.551985e-04  1.355756e-04 -1.962292e-05     0.873562
P21151  b3845   2.255829e-04  1.804663e-04 -4.511658e-05     0.800000
P0A9B2  b1779   2.695114e-04  2.217190e-04 -4.779232e-05     0.822670
P13000  b0778   2.279039e-07  6.525652e-04  6.523373e-04  2863.335554

[300 rows x 5 columns]
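
Note that the plain ratio is fragile at the edges: the filter above keeps any row where either usage is positive, so an enzyme used only in the biotin condition would get an infinite fold change, and a value that is pure solver noise (such as the `4.74e-16` of `P06999`) produces a ratio with no real meaning. A minimal sketch of a more defensive fold change follows. `fold_change` is a hypothetical helper, and the tolerance of `1e-15` is an arbitrary assumption chosen to sit just above that noise value.

```python
import math


def fold_change(reference, condition, tol=1e-15):
    """Return condition / reference, treating magnitudes below tol as zero.

    NaN means the enzyme is unused in both conditions; inf means it is
    only used in the new condition.
    """
    ref = reference if abs(reference) > tol else 0.0
    cond = condition if abs(condition) > tol else 0.0
    if ref == 0.0 and cond == 0.0:
        return math.nan
    if ref == 0.0:
        return math.inf
    return cond / ref


# Two rows copied from the table above (biomass_usage, biotin_usage):
print("P06999:", fold_change(4.740742e-16, 0.0))
print("P13000:", fold_change(2.279039e-07, 6.525652e-04))
```

The tolerance needs care: several real usages in the table are as small as `1e-13`, so a looser threshold would discard genuine values.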


We can now sort and take a look at the enzymes that:

* Increased their absolute usage the most:

In [14]:
usage_df = usage_df.sort_values(by=['abs_changes'])
print(usage_df.tail(n=10).iloc[::-1])

         gene  biomass_usage  biotin_usage   abs_changes  rel_changes
P13000  b0778   2.279039e-07  6.525652e-04  6.523373e-04  2863.335554
P12998  b0776   5.317795e-09  1.522663e-05  1.522131e-05  2863.335554
P45578  b2687   1.786779e-07  2.426299e-06  2.247621e-06    13.579177
P0A817  b2942   1.192288e-07  1.922128e-06  1.802899e-06    16.121332
P12995  b0774   3.988346e-10  1.141997e-06  1.141599e-06  2863.335554
P76194  b1679   2.112517e-07  1.330013e-06  1.118762e-06     6.295871
P77444  b1680   2.112517e-07  1.330013e-06  1.118762e-06     6.295871
P0AF12  b0159   4.535749e-08  6.159174e-07  5.705599e-07    13.579177
P0AEK4  b1288   5.362593e-11  1.535490e-07  1.534954e-07  2863.335554
P0A6R0  b1091   4.991923e-11  1.429355e-07  1.428856e-07  2863.335554


* Decreased their absolute usage the most:

In [15]:
print(usage_df.head(n=10))

         gene  biomass_usage  biotin_usage  abs_changes  rel_changes
P0A9B2  b1779       0.000270      0.000222    -0.000048     0.822670
P21151  b3845       0.000226      0.000180    -0.000045     0.800000
P0A953  b2323       0.000155      0.000136    -0.000020     0.873562
P0AC62  b3610       0.000118      0.000100    -0.000018     0.846308
P17854  b2762       0.000118      0.000100    -0.000018     0.846308
P00934  b0004       0.000088      0.000071    -0.000017     0.807782
P0A9C5  b3870       0.000083      0.000067    -0.000016     0.807312
P0A6E4  b3172       0.000069      0.000055    -0.000014     0.800000
P36683  b0118       0.000065      0.000052    -0.000013     0.800000
P0A6C5  b2818       0.000060      0.000048    -0.000012     0.800000


* Increased their relative usage the most:

In [16]:
usage_df = usage_df.sort_values(by=['rel_changes'])
print(usage_df.tail(n=10).iloc[::-1])

         gene  biomass_usage  biotin_usage   abs_changes  rel_changes
P0A6Q6  b0180   3.303061e-13  9.457772e-10  9.454469e-10  2863.335554
P0AEK4  b1288   5.362593e-11  1.535490e-07  1.534954e-07  2863.335554
P12998  b0776   5.317795e-09  1.522663e-05  1.522131e-05  2863.335554
P12999  b0777   1.844389e-12  5.281105e-09  5.279260e-09  2863.335554
P13001  b3412   1.724700e-11  4.938394e-08  4.936669e-08  2863.335554
P12996  b0775   2.295464e-11  6.572685e-08  6.570389e-08  2863.335554
P0A6R0  b1091   4.991923e-11  1.429355e-07  1.428856e-07  2863.335554
P13000  b0778   2.279039e-07  6.525652e-04  6.523373e-04  2863.335554
P12995  b0774   3.988346e-10  1.141997e-06  1.141599e-06  2863.335554
P0A817  b2942   1.192288e-07  1.922128e-06  1.802899e-06    16.121332


* Decreased their relative usage the most:

In [17]:
print(usage_df.head(n=10))

         gene  biomass_usage  biotin_usage   abs_changes  rel_changes
P06999  b1723   4.740742e-16  0.000000e+00 -4.740742e-16     0.000000
P07003  b0871   2.359266e-06  1.871006e-06 -4.882603e-07     0.793046
P77218  b2458   1.173126e-06  9.363002e-07 -2.368261e-07     0.798124
P0A6A3  b2296   3.245753e-07  2.590514e-07 -6.552398e-08     0.798124
P0A790  b0131   1.654252e-07  1.323402e-07 -3.308505e-08     0.800000
P0A6I6  b3634   2.215657e-08  1.772525e-08 -4.431314e-09     0.800000
P0A9J4  b0425   4.301017e-09  3.440814e-09 -8.602035e-10     0.800000
P0A6I9  b0103   2.235476e-06  1.788381e-06 -4.470952e-07     0.800000
P0ABQ0  b3639   1.592685e-07  1.274148e-07 -3.185370e-08     0.800000
P31663  b0133   7.680360e-08  6.144288e-08 -1.536072e-08     0.800000
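
Two values keep recurring in the `rel_changes` column: `0.800000` and `2863.335554`. A plausible reading, not verified against the reaction list here, is that enzymes serving only growth scale with the growth rate, while enzymes serving only biotin synthesis scale with the biotin pathway flux. The sketch below checks this with fluxes copied from the ecModel outputs in section 2, using `DM_amob_c` as a proxy for the pathway flux. That choice is an assumption, based on its flux tracking `EX_btn_e` closely in the biotin simulation.

```python
# Fluxes copied from the ecModel outputs in sections 2.1 and 2.2.
wild_type = {"growth": 0.5743255542929773, "DM_amob_c": 1.1486511085859545e-06}
biotin = {"growth": 0.45946044343438186, "DM_amob_c": 0.0032889735589063463}

growth_ratio = biotin["growth"] / wild_type["growth"]
pathway_ratio = biotin["DM_amob_c"] / wild_type["DM_amob_c"]

print(f"growth ratio:    {growth_ratio:.6f}")
print(f"DM_amob_c ratio: {pathway_ratio:.6f}")
```

Both ratios agree with the recurring values in the tables, which suggests that most of the fold changes are set by the two constraints of the simulation rather than by enzyme-specific effects. The enzymes with intermediate ratios (for example `13.579177` or `0.846308`) are the ones where the two demands overlap and may deserve a closer look.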


## 4. Model cleanup

In [18]:
import os
os.remove("iML1515.xml")
os.remove("eciML1515_batch.xml")