# Introduction
In this notebook, I try to validate the model against the 13C flux data presented in Tang et al. 2009 (DOI 10.1002/bit.22181).

To do this, I will fix the internal fluxes to match those measured with the 13C data and then check how well the predicted exchange fluxes match the measured data.

The paper reports 13C data for the aerobic and microaerobic conditions, and also has some data on mixed-acid fermentation. Table 2 provides most of the experimental data to validate against, but these are reported as yields, so I will have to make do with that.

Also, oxygen consumption rates aren't really defined; they seem to be inferred from the observed growth rate, so I will need to find a way around this.

In the paper, the glucose uptake rate is often set to 5 mmol h-1 g-1 biomass, so I will do the same here to make the values more comparable.

N.B.: the paper uses a somewhat different strain, M10EXG.

To compare the 13C-derived fluxes with our model, I will first summarize them in a data frame. For each reaction defined in the 13C model, I will record the flux (relative to the 100 units of glucose shown) and map it to the corresponding reaction in our model whose flux should be set to that value.
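The rescaling from the paper's relative units to absolute model fluxes is a single multiplication. A minimal sketch, assuming the fluxes are reported relative to a glucose uptake of 100 units and that the model takes up 5 mmol/gCDW/h (the function name `to_model_flux` is illustrative and not part of the notebook):

```python
def to_model_flux(relative_flux, glucose_uptake=5.0, reference=100.0):
    """Convert a flux given per `reference` units of glucose into mmol/gCDW/h."""
    return relative_flux * glucose_uptake / reference


# A reaction carrying 18 relative units corresponds to 18 / 20 mmol/gCDW/h.
print(to_model_flux(18))
```

This is the same factor of 20 by which the `UB` and `LB` columns are divided in the cells further down.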

In [25]:
import cameo
import pandas as pd
import cobra.io
from escher import Builder
import copy
from cobra.flux_analysis import flux_variability_analysis
from cobra import Metabolite, Reaction

In [26]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

# Aerobic condition
I will first check whether we can validate the aerobic condition that was tested. I import the data frame from the file in the 'databases' folder of the repo, where I coupled the reactions and fluxes shown in the Tang paper to the reactions in our model.

In [27]:
#set the glucose consumption to 5 mmol/gCDW/h
model.reactions.EX_glc__D_e.bounds = (-5,1000)

In [28]:
model.optimize().objective_value

0.3275225692669498

Now I need to write a script that fixes the fluxes as defined here and then checks whether optimization is still possible.
Then I will also check what the predicted products are and whether they match the experimental data.

When I do this with the exact fluxes determined, I get an infeasible solution. This is because no standard deviation is given for many of the reactions, which leaves the model very little flexibility. For the reactions with a standard deviation, we use it to set the bounds as mean ± 2 × SD.
For reactions without any standard deviation, we add an error of 10%. This isn't ideal but may be necessary.
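The bounds rule described above can be written as a small helper. This is only a sketch: the function name and the example numbers are illustrative, and treating a standard deviation of 0 as "not reported" is my assumption based on the note column in the CSV.

```python
def flux_bounds(mean, sd=None, n_sd=2.0, rel_error=0.10):
    """Return (lower, upper) bounds for a measured flux.

    With a standard deviation: mean +/- n_sd * sd.
    Without one (None or 0): mean +/- rel_error * |mean|, so that
    negative fluxes still end up with lower <= upper.
    """
    margin = n_sd * sd if sd else rel_error * abs(mean)
    return mean - margin, mean + margin


print(flux_bounds(36.0, sd=3.0))  # standard deviation reported
print(flux_bounds(-8.0))          # no standard deviation: 10% fallback
```

Using the absolute value in the fallback matters for reactions that run in the reverse direction; without it the lower and upper bound would be swapped.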

In [29]:
df_aer = pd.read_csv('../databases/Tang 2009 fluxes-aerobic.csv')

In [192]:
# note: the aerobic table holds the fluxes from the paper, without GLUSy (i.e. the flux to glutamate in the biomass)

In [193]:
del df_aer['N.B.: 0 means no info available']

In [194]:
model_copy = copy.deepcopy(model)
for index, row in df_aer.iterrows():
    rct_id = row['model_id']
    rct = model_copy.reactions.get_by_id(rct_id)
    ub = row['UB']/20 #normalize to the 5 mmol glucose that is being supplied
    lb = row['LB']/20 #normalize to the 5 mmol glucose that is being supplied
    rct.bounds = (lb,ub)
    pfba_solution = cobra.flux_analysis.pfba(model_copy)
    print(rct_id,pfba_solution['biomass'])


GLC__Dtpts 0.32861626365989793
GALUi 0.3286162636599081
G6PDH2r 0.3269833815743214
PRPPS 0.2902505656145208
TKT2 0.2902505656145299
TALA 0.2902505656145228
TKT1 0.2902505656144517
DDPA 0.29025056561478846
PGI 0.29025056561478857
FBA 0.29025056554757445
PGK 0.2902505656145884
ENO 0.2902505656145466
PGCD 0.29025056561456464
GHMT 0.29025056561464996


Infeasible: None (infeasible)

We can see that the solver fails right after the GHMT reaction has been constrained. So I will fix all the other reactions by hand and then look at what is going on in metabolism to explain where this issue comes from.
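Instead of reading the failing reaction off the printed output, the loop can report it directly. The sketch below is deliberately independent of COBRApy: `apply_bounds` and `solve` are hypothetical callables, and the toy "model" is just a chain of reactions that must all carry the same flux. In the notebook, `apply_bounds` would set `model_copy.reactions.get_by_id(rct_id).bounds` and `solve` would call `cobra.flux_analysis.pfba(model_copy)`. Catching every `Exception` is a simplification.

```python
def first_failing_constraint(constraints, apply_bounds, solve):
    """Apply (reaction_id, lb, ub) one at a time.

    Returns the id of the first reaction after which solve() raises,
    or None if every constraint could be applied.
    """
    for rct_id, lb, ub in constraints:
        apply_bounds(rct_id, lb, ub)
        try:
            solve()
        except Exception:
            return rct_id
    return None


# Toy stand-in for a model: all reactions share one flux value, so the
# problem is feasible only while the bound intervals overlap.
bounds = {}


def apply_bounds(rct_id, lb, ub):
    bounds[rct_id] = (lb, ub)


def solve():
    lowest = max(lb for lb, _ in bounds.values())
    highest = min(ub for _, ub in bounds.values())
    if lowest > highest:
        raise ValueError("infeasible")
    return lowest


constraints = [("R1", 0.0, 10.0), ("R2", 4.0, 6.0), ("R3", 7.0, 8.0)]
print(first_failing_constraint(constraints, apply_bounds, solve))
```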



In [233]:
model.reactions.GALUi.bounds = (0.045,0.055)
model.reactions.G6PDH2r.bounds = (-1.7,-1.3)
model.reactions.PRPPS.bounds = (0.18,0.22)
model.reactions.TKT2.bounds = (-0.44, -0.36)
model.reactions.TALA.bounds = (0.405,0.495)
model.reactions.TKT1.bounds = (-0.495,-0.405)
model.reactions.DDPA.bounds = (0.09,0.11)
model.reactions.PGI.bounds = (-3.795,-3.105)
model.reactions.FBA.bounds = (3.87,4.73)
model.reactions.PGK.bounds = (-9.35,-8.55)
model.reactions.ENO.bounds = (7.605,9.295)
model.reactions.PGCD.bounds = (0.45,0.55)
model.reactions.GHMT.bounds = (0.315,0.385)


In [234]:
model.reactions.PDH.bounds = (7.515,9.185)
model.reactions.PFL.bounds = (0,0)

In [235]:
model.reactions.LDH_L.bounds = (-0.44,-0.36)
model.reactions.ALCD2x.bounds = (0,0)
model.reactions.ASPTA.bounds = (-0.44,-0.36)
model.reactions.CS.bounds = (-3.5,-2.9)
model.reactions.ICDHyr.bounds = (1.8,2.2)
model.reactions.ICL.bounds = (0.6,1.8)
model.reactions.SUCOAS.bounds = (-2.2,-1.2)
model.reactions.SUCDi.bounds = (2.61,3.19)
model.reactions.FUM.bounds = (2.61,3.19)
model.reactions.MDH.bounds = (-4.51,-3.69)


In [236]:
pfba_solution = cobra.flux_analysis.pfba(model)
pfba_solution['biomass']

0.1796160053852448

In [40]:
aa =['EX_his__L_e', 'EX_lys__L_e', 'EX_met__L_e', 'EX_phe__L_e', 'EX_thr__L_e', 'EX_trp__L_e', 'EX_asp__L_e', 'EX_asn__L_e', 'EX_ala__L_e', 'EX_arg__L_e','EX_cys__L_e', 'EX_gln__L_e', 'EX_glu__L_e', 'EX_gly_e', 'EX_pro__L_e','EX_ser__L_e', 'EX_tyr__L_e','EX_ile__L_e', 'EX_leu__L_e','EX_val__L_e']

In [238]:
#Are we producing any amino acids?
for acid in aa:
    if pfba_solution[acid] > 0:
        print(acid,pfba_solution[acid])


EX_phe__L_e 0.026745818132085496
EX_pro__L_e 0.04599100831919913


So, we are producing some proline and phenylalanine, but the amounts are small, so I think it is okay for now.

In this setting I've fit all fluxes except the PSCVT, PPCK and PC reactions (and GLUSy and PTAr, as we decided).
Now I need to figure out why I can't fix these fluxes.

First: what are these fluxes in the best fit, when all other fluxes are fixed?

In [239]:
print('PSCVT',pfba_solution['PSCVT'])
print('PPCK',pfba_solution['PPCK'])
print('PC',pfba_solution['PC'])


PSCVT 0.09
PPCK 0.0
PC 0.0


In [240]:
model.reactions.PC.bounds = (0,5.3)
model.reactions.PPCK.bounds = (0,6.5)

So the only flux that I can't fix is that of PSCVT, which was mapped to the flux that produces phenylalanine. Maybe it should be mapped to a different reaction in the model?

It was mapped to PSCVT, which is the first branch into the shikimate pathway. Maybe I should map it to a reaction further downstream? When it is mapped to PHETA1, the problem can be solved. This is unexpected, but it does seem to work.

In [241]:
model.reactions.PHETA1.bounds = (-0.22,-0.18)

In [242]:
pfba_solution = cobra.flux_analysis.pfba(model)
pfba_solution['biomass']

0.17961600538522435

In [243]:
for acid in aa:
    if pfba_solution[acid] > 0:
        print(acid,pfba_solution[acid])

EX_phe__L_e 0.026745818132092675
EX_pro__L_e 0.045991008319333074


Once the two changes above (the relaxed PC and PPCK bounds, and PHETA1 in place of PSCVT) are made in the table, the issue should be solved.

In [18]:
df_aer = pd.read_csv('../databases/Tang 2009 fluxes-aerobic.csv')

To be able to fit the glutamate being pulled into biomass, we need to add a reaction to the model. The GLUSy reaction that forms glutamate is also used in the formation of glutamine, so fixing that reaction would be incorrect (and infeasible).

Instead, I will add a new metabolite that pulls a specific amount from the glutamate pool and from there goes into biomass.
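The rerouting relies on how coefficients are combined when metabolites are added to an existing reaction. As far as I know, COBRApy's `Reaction.add_metabolites` adds new coefficients to the existing ones by default, so adding +x for `glu__L_c` cancels a biomass coefficient of -x. The sketch below imitates that bookkeeping with plain dictionaries; it assumes the biomass coefficient of glutamate is -0.322004464285714, and the `ala__L_c` entry is a made-up placeholder.

```python
def combine(stoichiometry, to_add, tol=1e-12):
    """Add coefficients to a stoichiometry dict, dropping entries that cancel to zero."""
    result = dict(stoichiometry)
    for met, coef in to_add.items():
        new_coef = result.get(met, 0.0) + coef
        if abs(new_coef) < tol:
            result.pop(met, None)
        else:
            result[met] = new_coef
    return result


GLU = 0.322004464285714  # assumed glutamate coefficient in the biomass reaction
biomass = {"glu__L_c": -GLU, "ala__L_c": -0.5}  # ala__L_c value is a placeholder

# Swap direct glutamate consumption for consumption of the new pool metabolite.
rerouted = combine(biomass, {"glu__L_c": GLU, "glu__L_c_biomass": -GLU})
print(rerouted)
```

After the update, biomass no longer draws on `glu__L_c` directly, so all glutamate going into biomass has to pass through the new pulling reaction, whose flux can then be constrained.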

In [30]:
model_copy = copy.deepcopy(model)

In [31]:
#add metabolite
model_copy.add_metabolites(Metabolite(id='glu__L_c_biomass'))

In [32]:
model_copy.metabolites.glu__L_c_biomass.formula = model_copy.metabolites.glu__L_c.formula

In [33]:
#add pulling reaction
model_copy.add_reaction(Reaction(id='glu_biomass'))

In [34]:
model_copy.reactions.glu_biomass.add_metabolites({
    model_copy.metabolites.glu__L_c:-1,
    model_copy.metabolites.glu__L_c_biomass:1
})

In [35]:
#change biomass consumption to the new metabolites
model_copy.reactions.biomass.add_metabolites({
    model_copy.metabolites.glu__L_c:0.322004464285714,
    model_copy.metabolites.glu__L_c_biomass:-0.322004464285714
})

In [36]:
#also need to add a sink reaction in case the flux doesn't match the amino acid composition defined in the biomass reaction
#the model would be infeasible otherwise.
model_copy.add_boundary(model_copy.metabolites.glu__L_c_biomass,type = 'sink', reaction_id = 'glu_sink')

0,1
Reaction identifier,glu_sink
Name,sink
Memory address,0x02c3e40ba188
Stoichiometry,glu__L_c_biomass <=> <=>
GPR,
Lower bound,-1000.0
Upper bound,1000.0


In [37]:
for index, row in df_aer.iterrows():
    rct_id = row['model_id']
    rct = model_copy.reactions.get_by_id(rct_id)
    ub = row['UB']/20 #normalize to the 5 mmol glucose that is being supplied
    lb = row['LB']/20 #normalize to the 5 mmol glucose that is being supplied
    rct.bounds = (lb,ub)
    pfba_solution = cobra.flux_analysis.pfba(model_copy)
    print(rct_id,pfba_solution['biomass'])


GLC__Dtpts 0.33334548886819815
GALUi 0.33334548886820503
G6PDH2r 0.3316891074370969
PRPPS 0.2902505656145606
TKT2 0.29025056561454443
TALA 0.2902505656145458
TKT1 0.29025056561459855
PHETA1 0.2902505656141614
PGI 0.2902505656141613
FBA 0.2902505656145251
PGK 0.2902505656145146
ENO 0.29025056561460155
PGCD 0.290250565614308
GHMT 0.29025056561459556
PPCK 0.29025056561459556
PC 0.29025056561459556
PDH 0.290250565614577
PFL 0.290250565614577
LDH_L 0.2902505656145655
ALCD2x 0.29025056561459484
ASPTA 0.17961600538523256
CS 0.17961600538522104
ICDHyr 0.17961600538523634
ICL 0.17961600538521652
SUCOAS 0.17961600538520786
SUCDi 0.17961600538519773
FUM 0.1796160053851866
MDH 0.1796160053852685
PTAr 0.17961600538528685
glu_biomass 0.17961600538520311


Now to check which amino acids are being made in overflow as a result of the defined fluxes.

In [38]:
pfba_solution = cobra.flux_analysis.pfba(model_copy)
pfba_solution['biomass']

0.17961600538520311

In [41]:
for acid in aa:
    if pfba_solution[acid] > 0:
        print(acid,pfba_solution[acid])

So we do not seem to be producing any amino acids in excess, but quite a bit of surplus glutamate is being produced.

In [42]:
model_copy.optimize()['glu_sink']

0.21216284440879773

Now that the fluxes have been fit, we want to see what the predicted exchange rates are and compare them to the published data.
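Since the paper reports yields, the exchange fluxes have to be expressed per unit of glucose consumed. A minimal sketch, assuming the usual convention that uptake fluxes are negative and secretion fluxes are positive; the flux values are illustrative placeholders, not results from this model:

```python
def yields_per_glucose(fluxes, glucose_id="EX_glc__D_e"):
    """Express exchange fluxes per mol of glucose taken up.

    Secretion stays positive and uptake stays negative because the
    fluxes are divided by the (positive) glucose uptake rate.
    """
    uptake = -fluxes[glucose_id]
    if uptake <= 0:
        raise ValueError("glucose is not being consumed")
    return {rid: v / uptake for rid, v in fluxes.items() if rid != glucose_id}


# Placeholder exchange fluxes in mmol/gCDW/h (not model output).
example = {"EX_glc__D_e": -5.0, "EX_lac__L_e": 0.5, "EX_ac_e": 4.0, "EX_o2_e": -12.0}
for rid, y in yields_per_glucose(example).items():
    print(f"{rid}: {y:+.2f} mol/mol glucose")
```

Dividing by the raw glucose exchange flux instead, which is negative, flips every sign: secreted products then show up as negative ratios and oxygen uptake as a positive one.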


In [43]:
prod = ['EX_glc__D_e', 'EX_lac__L_e','EX_etoh_e', 'EX_ac_e', 'EX_for_e', 'glu_sink','EX_o2_e']

In [44]:
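pfba_solution = cobra.flux_analysis.pfba(model_copy) #solve once and reuse the solution for every compound
for comp in prod:
    #dividing by the (negative) glucose exchange flux flips the sign: secreted products come out negative
    print(comp,':  ',pfba_solution[comp]/pfba_solution['EX_glc__D_e'])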

EX_glc__D_e :   1.0
EX_lac__L_e :   -0.07477074060758418
EX_etoh_e :   -0.0
EX_ac_e :   -0.7724155030819773
EX_for_e :   -0.036605789443882666
glu_sink :   -0.04406548057182627
EX_o2_e :   2.6079921617276867


In [45]:
with model_copy:
    growth = 0.99*model_copy.optimize().objective_value
    print('growth rate:',growth)
    model_copy2 = copy.deepcopy(model_copy)
    model_copy2.reactions.biomass.bounds = (growth,growth) #fix growth rate at 99% of max:
    for rct in prod:
        print(flux_variability_analysis(model_copy2, rct))

growth rate: 0.1778198453313512
             minimum   maximum
EX_glc__D_e     -5.0 -4.758416
             minimum   maximum
EX_lac__L_e     0.36  0.923168
           minimum  maximum
EX_etoh_e      0.0      0.0
          minimum   maximum
EX_ac_e  1.052856  4.334929
          minimum   maximum
EX_for_e      0.0  0.507997
           minimum   maximum
glu_sink  0.212741  0.272741
           minimum    maximum
EX_o2_e -15.185555 -11.816017


In [46]:
pfba_solution['biomass']

0.17961600538520311

In [47]:
#what is the oxygen consumption rate here?

In [48]:
pfba_solution = cobra.flux_analysis.pfba(model_copy)
pfba_solution['EX_o2_e']

-12.556745734931702

In [49]:
flux_variability_analysis(model_copy, model_copy.reactions.EX_o2_e)

Unnamed: 0,minimum,maximum
EX_o2_e,-15.156511,-11.846128


In [50]:
#check no succinate production
flux_variability_analysis(model_copy2, 'EX_succ_e')

Unnamed: 0,minimum,maximum
EX_succ_e,0.0,0.506275


## Microaerobic
Unfortunately, we lack information about the amount of oxygen that was supplied in the microaerobic experiment. So I will fit the fluxes as they are defined and then see what oxygen consumption the model predicts.


In [51]:
#set the glucose consumption to 5 mmol/gCDW/h
model.reactions.EX_glc__D_e.bounds = (-5,1000)

In [52]:
df_micaer = pd.read_csv('../databases/Tang 2009 fluxes-microaerobic.csv')

In [53]:
del df_micaer['N.B.: 0 means no info available']

In [54]:
model_copy = copy.deepcopy(model)

In [55]:
#add metabolite
model_copy.add_metabolites(Metabolite(id='glu__L_c_biomass'))

In [56]:
model_copy.metabolites.glu__L_c_biomass.formula = model_copy.metabolites.glu__L_c.formula

In [57]:
#add pulling reaction
model_copy.add_reaction(Reaction(id='glu_biomass'))

In [58]:
model_copy.reactions.glu_biomass.add_metabolites({
    model_copy.metabolites.glu__L_c:-1,
    model_copy.metabolites.glu__L_c_biomass:1
})

In [59]:
#change biomass consumption to the new metabolites
model_copy.reactions.biomass.add_metabolites({
    model_copy.metabolites.glu__L_c:0.322004464285714,
    model_copy.metabolites.glu__L_c_biomass:-0.322004464285714
})

In [60]:
#also need to add a sink reaction in case the flux doesn't match the amino acid composition defined in the biomass reaction
#the model would be infeasible otherwise.
model_copy.add_boundary(model_copy.metabolites.glu__L_c_biomass,type = 'sink', reaction_id = 'glu_sink')

0,1
Reaction identifier,glu_sink
Name,sink
Memory address,0x02c3e0eb8bc8
Stoichiometry,glu__L_c_biomass <=> <=>
GPR,
Lower bound,-1000.0
Upper bound,1000.0


In [61]:
for index, row in df_micaer.iterrows():
    rct_id = row['model_id']
    rct = model_copy.reactions.get_by_id(rct_id)
    ub = row['UB']/20 #normalize to the 5 mmol glucose that is being supplied
    lb = row['LB']/20 #normalize to the 5 mmol glucose that is being supplied
    rct.bounds = (lb,ub)
    pfba_solution = cobra.flux_analysis.pfba(model_copy)
    print(rct_id,pfba_solution['biomass'])

GLC__Dtpts 0.33334548886819815
GALUi 0.33334548886820503
G6PDH2r 0.3322624702401776
PRPPS 0.14512528280727083
TKT2 0.14512528280725986
TALA 0.14512528280731227
TKT1 0.1451252828072659
PHETA1 0.14512528280722953
PGI 0.14512528280725792
FBA 0.14512528280727846
PGK 0.14512528280726963
ENO 0.14512528280714834
PGCD 0.14512528280728135
GHMT 0.1451252828072371
PPCK 0.145125282807318
PC 0.145125282807258
PDH 0.145125282807258
PFL 0.14512528280725775
LDH_L 0.1451252828073684
ALCD2x 0.14512528280725065
ASPTA 0.11226000336576793
CS 0.11226000336576794
ICDHyr 0.11226000336575401
ICL 0.1122600033657715
SUCOAS 0.11226000336577098
SUCDi 0.11226000336577387
FUM 0.11226000336577523
MDH 0.11226000336575691
PTAr 0.11226000336791238
glu_biomass 0.11226000336576913


In [62]:
pfba_solution = cobra.flux_analysis.pfba(model_copy)
pfba_solution['EX_o2_e']

-5.8962789549615255

In [63]:
flux_variability_analysis(model_copy, model_copy.reactions.EX_o2_e)

Unnamed: 0,minimum,maximum
EX_o2_e,-7.189232,-5.723964


In [64]:
#ratio between PFL and PDH?
print(pfba_solution['PDH'])
print(pfba_solution['PFL'])

4.80265288377786
-0.9429140800607146


In [65]:
prod = ['EX_glc__D_e', 'EX_lac__L_e','EX_etoh_e', 'EX_ac_e', 'EX_for_e', 'glu_sink','EX_o2_e']

In [66]:
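pfba_solution = cobra.flux_analysis.pfba(model_copy) #solve once and reuse the solution for every compound
for comp in prod:
    #dividing by the (negative) glucose exchange flux flips the sign: secreted products come out negative
    print(comp,':  ',pfba_solution[comp]/pfba_solution['EX_glc__D_e'])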

EX_glc__D_e :   1.0
EX_lac__L_e :   -0.6720805600781031
EX_etoh_e :   -0.25778432441351895
EX_ac_e :   -0.4534973741668363
EX_for_e :   -0.21124621345685488
glu_sink :   -0.020224157734742145
EX_o2_e :   1.2063240372685766


In [67]:
with model_copy:
    growth = 0.99*model_copy.optimize().objective_value
    print('growth rate:',growth)
    model_copy2 = copy.deepcopy(model_copy)
    model_copy2.reactions.biomass.bounds = (growth,growth) #fix growth rate at 99% of max:
    for rct in prod:
        print(flux_variability_analysis(model_copy2, rct))

growth rate: 0.11113740333211163
             minimum   maximum
EX_glc__D_e     -5.0 -4.885519
             minimum   maximum
EX_lac__L_e    3.285  4.097161
           minimum  maximum
EX_etoh_e     1.26     1.54
          minimum   maximum
EX_ac_e  0.962507  2.888154
          minimum   maximum
EX_for_e      0.0  2.660788
           minimum   maximum
glu_sink  0.099213  0.112792
          minimum   maximum
EX_o2_e -7.203493 -5.656054


In [68]:
model_copy.optimize().objective_value

0.11226000336576931

In [69]:
#check no succinate production
flux_variability_analysis(model_copy2, 'EX_succ_e')

Unnamed: 0,minimum,maximum
EX_succ_e,0.0,0.153922


## Anaerobic
These fluxes were fit based on the data in Figure 3. They cover far fewer reactions than the other two data sets. Because the fluxes to all the main fermentation products were measured and are fixed here, the model is not really expected to deviate from the measured data.

I will restrict oxygen uptake to a very low value so we are sure that no OxPhos is active (at most 0.01 mmol/gCDW/h, as set in the code below).

In [3]:
#set the glucose consumption to 5 mmol/gCDW/h
model.reactions.EX_glc__D_e.bounds = (-5,1000)

In [4]:
df_anaer = pd.read_csv('../databases/Tang 2009 fluxes-anaerobic.csv')

In [5]:
model_copy = copy.deepcopy(model)

In [7]:
#make (nearly) anaerobic: allow at most 0.01 mmol/gCDW/h of oxygen uptake
model_copy.reactions.EX_o2_e.bounds = (-0.01,0)

In [8]:
#add metabolite
model_copy.add_metabolites(Metabolite(id='glu__L_c_biomass'))

In [9]:
model_copy.metabolites.glu__L_c_biomass.formula = model_copy.metabolites.glu__L_c.formula

In [10]:
#add pulling reaction
model_copy.add_reaction(Reaction(id='glu_biomass'))

In [11]:
model_copy.reactions.glu_biomass.add_metabolites({
    model_copy.metabolites.glu__L_c:-1,
    model_copy.metabolites.glu__L_c_biomass:1
})

In [12]:
#change biomass consumption to the new metabolites
model_copy.reactions.biomass.add_metabolites({
    model_copy.metabolites.glu__L_c:0.322004464285714,
    model_copy.metabolites.glu__L_c_biomass:-0.322004464285714
})

In [13]:
#also need to add a sink reaction in case the flux doesn't match the amino acid composition defined in the biomass reaction
#the model would be infeasible otherwise.
model_copy.add_boundary(model_copy.metabolites.glu__L_c_biomass,type = 'sink', reaction_id = 'glu_sink')

0,1
Reaction identifier,glu_sink
Name,sink
Memory address,0x02c3e0c7ba08
Stoichiometry,glu__L_c_biomass <=> <=>
GPR,
Lower bound,-1000.0
Upper bound,1000.0


In [14]:
for index, row in df_anaer.iterrows():
    rct_id = row['model_id']
    rct = model_copy.reactions.get_by_id(rct_id)
    ub = row['UB']/20 #normalize to the 5 mmol glucose that is being supplied
    lb = row['LB']/20 #normalize to the 5 mmol glucose that is being supplied
    rct.bounds = (lb,ub)
    pfba_solution = cobra.flux_analysis.pfba(model_copy)
    print(rct_id,pfba_solution['biomass'])

PGI 0.05334588870220029
G6PDH2r 0.053089938358466676
GAPD 0.05289780218293767
PFL 0.046899310829507955
PDH 0.046899310829507955
PTAr 0.046899310829507816
ACALD 0.04689931083077083
ALCD2x 0.04689931082951051
LDH_L 0.04398303247688881


In [15]:
pfba_solution = cobra.flux_analysis.pfba(model_copy)
pfba_solution['EX_o2_e']

-0.01

In [16]:
flux_variability_analysis(model_copy, model_copy.reactions.EX_o2_e)

Unnamed: 0,minimum,maximum
EX_o2_e,-0.01,-0.01


In [17]:
pfba_solution['biomass']

0.04398303247688881

In [20]:
with model_copy:
    growth = 0.99*model_copy.optimize().objective_value
    print('growth rate:',growth)
    model_copy2 = copy.deepcopy(model_copy)
    model_copy2.reactions.biomass.bounds = (growth,growth) #fix growth rate at 99% of max:
    for rct in prod:
        print(flux_variability_analysis(model_copy2, rct))

growth rate: 0.04354320215212022
             minimum   maximum
EX_glc__D_e     -5.0 -4.949913
             minimum   maximum
EX_lac__L_e     3.85  4.280865
            minimum  maximum
EX_etoh_e  2.384568      2.6
          minimum   maximum
EX_ac_e  2.258283  2.613621
           minimum   maximum
EX_for_e  4.489245  4.936296
           minimum   maximum
glu_sink -0.014021  0.006176
         minimum   maximum
EX_o2_e    -0.01 -0.000642


In [24]:
flux_variability_analysis(model_copy2, 'EX_succ_e')

Unnamed: 0,minimum,maximum
EX_succ_e,0.254079,0.713852


Note:
- For the microaerobic conversion of pyruvate to acetyl-CoA, we need to distinguish PDH from PFL. For the aerobic condition it is clearly PDH. For the microaerobic condition it is hard to discern the exact ratio, so I think we should set the bounds so that either reaction can carry flux within the range of the measurement, and from there estimate the ratio of activity between PDH and PFL.




- Maybe I can plot the measured vs. simulated fluxes/exchanges?
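For the first note, once both reactions are allowed to carry flux, the split can be summarised as the share of the total pyruvate-to-acetyl-CoA flux that goes through PDH. A small sketch using the pFBA values printed in the microaerobic section; taking absolute values is my assumption, since the sign of the PFL flux depends on the direction in which the reaction is written in the model.

```python
def pdh_share(pdh_flux, pfl_flux):
    """Fraction of the combined PDH + PFL flux (by magnitude) carried by PDH."""
    total = abs(pdh_flux) + abs(pfl_flux)
    if total == 0:
        raise ValueError("neither PDH nor PFL carries flux")
    return abs(pdh_flux) / total


# pFBA fluxes printed for the microaerobic condition.
share = pdh_share(4.80265288377786, -0.9429140800607146)
print(f"PDH carries {share:.1%} of the pyruvate-to-acetyl-CoA flux")
```

Because pFBA returns only one of possibly many optimal flux distributions, the FVA ranges of PDH and PFL would be needed to turn this single number into a range.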


In [300]:
solution = model_copy.optimize()

In [301]:
Builder(model = model_copy, map_json = '../map/g-thermo-map.json', reaction_data = solution.fluxes.to_dict())

Builder(reaction_data={'IDPh': 3.452796984101684, 'CAT': 3.699779753750112e-05, 'PDHam1hi': 0.0, 'HYDA': 0.0, …