# Introduction
In this notebook, I will try to validate the model with the 13C data presented in Tang et al. 2009 (DOI 10.1002/bit.22181).

To do this, I will fix the internal fluxes to match those measured with the 13C data and from there see how well the predicted exchange fluxes match the measured data.

__The paper reports 13C flux data for aerobic and microaerobic conditions, and also has some data on mixed-acid fermentation. Table 2 provides most of the experimental data to validate against, but these aren't really exchange rates, so I'll need to see what to do with this.__

__Also, the oxygen consumption rates aren't really defined; they seem to be defined based on the observed growth rate. So maybe oxygen consumption becomes a kind of model output to check?__

In the paper, the glucose uptake rate is often set to 5 mmol h-1 g-1 biomass. So I will do the same here to make the values more comparable.

N.B.: the paper uses a slightly different strain, M10EXG.

To compare the calculated 13C fluxes with our model, I will first summarize them in a data frame. For each reaction defined by the 13C model, I will record the flux (relative to the 100 units shown) and map it to the reaction in our model whose flux it should constrain.
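
A minimal sketch of what such a mapping table could look like and how a relative flux is converted to model units. The column names `model_id`, `LB` and `UB` are the ones read further down in this notebook; the two rows are back-calculated from the GALUi and TKT2 bounds used later and are only illustrative. The helper `to_model_units` is hypothetical, and it assumes the 100 units refer to glucose uptake (which is what the division by 20 below implies).

```python
import csv
import io

# Illustrative excerpt of the mapping table (fluxes relative to 100 units of glucose).
TABLE = """model_id,LB,UB
GALUi,0.9,1.1
TKT2,-8.8,-7.2
"""


def to_model_units(relative_flux, glucose_uptake=5.0, reference=100.0):
    """Scale a flux given per `reference` units of glucose to mmol/gCDW/h."""
    return relative_flux * glucose_uptake / reference


# Print the bounds each reaction would get in the model.
for row in csv.DictReader(io.StringIO(TABLE)):
    lb = to_model_units(float(row["LB"]))
    ub = to_model_units(float(row["UB"]))
    print(row["model_id"], round(lb, 3), round(ub, 3))
```

With a glucose uptake of 5 mmol/gCDW/h the scaling factor is 5/100, which is the same as dividing by 20.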

In [187]:
import cameo
import pandas as pd
import cobra.io
from escher import Builder
import copy

In [230]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

### Notes
- GLUSy is used to make all ammonia-containing compounds. The flux in the paper only covers the glutamine that goes into biomass, so constraining GLUSy to it would be a problem and we neglect it.

- Also neglect PTAr: the measured flux carries the carbon going to acetate, leucine and fatty acids. As we don't know the distribution between these, constraining it could cause a problem, so I've neglected that too.
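
The exclusions above could be applied when loading the table, roughly as sketched here. Only the column names come from this notebook; `load_constraints` is a hypothetical helper, the PGI and PTAr rows are back-calculated from bounds used further down, and the GLUSy row is a placeholder.

```python
import csv
import io

# Hypothetical excerpt of the flux table; the GLUSy values are placeholders.
TABLE = """model_id,LB,UB
PGI,-75.9,-62.1
GLUSy,0,0
PTAr,71.1,86.9
"""


def load_constraints(handle, excluded=frozenset({"GLUSy", "PTAr"})):
    """Return {model_id: (LB, UB)} for every row whose reaction is not excluded."""
    return {
        row["model_id"]: (float(row["LB"]), float(row["UB"]))
        for row in csv.DictReader(handle)
        if row["model_id"] not in excluded
    }


constraints = load_constraints(io.StringIO(TABLE))
print(constraints)
```

Keeping the excluded set as a parameter makes it easy to bring a reaction back in later without editing the file.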

# Aerobic condition
I will first check whether we can validate the aerobic condition that was tested. I will import the data frame from the file made in the 'databases' folder of the repo. There I coupled the reactions and fluxes shown in the Tang paper to our model.

In [231]:
#set the glucose consumption to 5 mmol/gCDW/h
model.reactions.EX_glc__D_e.bounds = (-5,1000)

In [232]:
model.optimize().objective_value

0.32861628193475195

Now I need to write a script that fixes the fluxes as defined here and then checks whether optimization is still possible.
Then I will also check what the predicted products are and whether this matches the experimental data.

When I do this with the exact fluxes determined, I get an infeasible solution. This is because for many of the reactions no standard deviation is given, which leaves very little flexibility in the model. For the reactions with a standard deviation, we will include it as flexibility in the bounds: mean ± 2 × standard deviation.
For reactions without any standard deviation, we will add an error of 10%. This isn't ideal but may be necessary.
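
A minimal sketch of this rule. `flux_bounds` is a hypothetical helper (the bounds in the CSV were prepared by hand), and the example numbers are back-calculated from the ICL and TKT2 bounds used below, assuming symmetric intervals.

```python
def flux_bounds(mean, sd=None, n_sd=2.0, rel_error=0.10):
    """Return (lb, ub) around a measured flux.

    Uses mean ± n_sd * sd when a standard deviation is available,
    otherwise mean ± rel_error * |mean|.
    """
    margin = n_sd * sd if sd is not None else rel_error * abs(mean)
    return mean - margin, mean + margin


print(flux_bounds(24.0, sd=6.0))  # flux with a reported standard deviation
print(flux_bounds(-8.0))          # no standard deviation: fall back to 10%
```

Taking the absolute value of the mean keeps the lower bound below the upper bound for fluxes that run in the reverse direction.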

In [191]:
df_aer = pd.read_csv('../databases/Tang 2009 fluxes-aerobic.csv')

In [192]:
#note: aerobic is the fluxes from the paper, without GLUSy (i.e. the flux to glutamate in the biomass)

In [193]:
del df_aer['N.B.: 0 means no info available']

In [194]:
model_copy = copy.deepcopy(model)
for index, row in df_aer.iterrows():
    rct_id = row['model_id']
    rct = model_copy.reactions.get_by_id(rct_id)
    ub = row['UB']/20 #normalize to the 5 mmol glucose that is being supplied
    lb = row['LB']/20 #normalize to the 5 mmol glucose that is being supplied
    rct.bounds = (lb,ub)
    pfba_solution = cobra.flux_analysis.pfba(model_copy)
    print(rct_id,pfba_solution['biomass'])


GLC__Dtpts 0.32861626365989793
GALUi 0.3286162636599081
G6PDH2r 0.3269833815743214
PRPPS 0.2902505656145208
TKT2 0.2902505656145299
TALA 0.2902505656145228
TKT1 0.2902505656144517
DDPA 0.29025056561478846
PGI 0.29025056561478857
FBA 0.29025056554757445
PGK 0.2902505656145884
ENO 0.2902505656145466
PGCD 0.29025056561456464
GHMT 0.29025056561464996


Infeasible: None (infeasible)

We can see that the solver fails right after the GHMT reaction, i.e. when the next reaction in the table is constrained. So I will fix all the other reactions and then see what is going on in metabolism to explain where this issue comes from.
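
The general pattern for pinpointing the offending constraint is to apply the bounds one at a time and stop at the first one that breaks feasibility. The sketch below shows this on a toy linear pathway, where steady state forces every step to carry the same flux, so a solution exists only while all intervals overlap. `first_infeasible` and `chain_is_feasible` are hypothetical helpers and the reaction names and numbers are made up; with the real model the feasibility check would be a pFBA call wrapped in a try/except.

```python
def chain_is_feasible(bounds):
    """Toy check: in a linear pathway all steps carry the same flux,
    so a solution exists only if the intervals share a common point."""
    lows, highs = zip(*bounds.values())
    return max(lows) <= min(highs)


def first_infeasible(constraints, is_feasible):
    """Apply constraints cumulatively; return the first name that breaks feasibility."""
    applied = {}
    for name, interval in constraints:
        applied[name] = interval
        if not is_feasible(applied):
            return name
    return None  # every constraint could be applied


toy = [("R1", (0.9, 1.1)), ("R2", (1.0, 1.2)), ("R3", (1.3, 1.5))]
print(first_infeasible(toy, chain_is_feasible))
```

The toy also illustrates why narrow intervals are risky: the tighter each interval, the less room there is for measurements on connected reactions to agree.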



In [233]:
model.reactions.GALUi.bounds = (0.045,0.055)
model.reactions.G6PDH2r.bounds = (-1.7,-1.3)
model.reactions.PRPPS.bounds = (0.18,0.22)
model.reactions.TKT2.bounds = (-0.44, -0.36)
model.reactions.TALA.bounds = (0.405,0.495)
model.reactions.TKT1.bounds = (-0.495,-0.405)
model.reactions.DDPA.bounds = (0.09,0.11)
model.reactions.PGI.bounds = (-3.795,-3.105)
model.reactions.FBA.bounds = (3.87,4.73)
model.reactions.PGK.bounds = (-9.35,-8.55)
model.reactions.ENO.bounds = (7.605,9.295)
model.reactions.PGCD.bounds = (0.45,0.55)
model.reactions.GHMT.bounds = (0.315,0.385)


In [234]:
model.reactions.PDH.bounds = (7.515,9.185)
model.reactions.PFL.bounds = (0,0)

In [235]:
model.reactions.LDH_L.bounds = (-0.44,-0.36)
model.reactions.ALCD2x.bounds = (0,0)
model.reactions.ASPTA.bounds = (-0.44,-0.36)
model.reactions.CS.bounds = (-3.5,-2.9)
model.reactions.ICDHyr.bounds = (1.8,2.2)
model.reactions.ICL.bounds = (0.6,1.8)
model.reactions.SUCOAS.bounds = (-2.2,-1.2)
model.reactions.SUCDi.bounds = (2.61,3.19)
model.reactions.FUM.bounds = (2.61,3.19)
model.reactions.MDH.bounds = (-4.51,-3.69)


In [236]:
pfba_solution = cobra.flux_analysis.pfba(model)
pfba_solution['biomass']

0.1796160053852448

In [237]:
aa =['EX_his__L_e', 'EX_lys__L_e', 'EX_met__L_e', 'EX_phe__L_e', 'EX_thr__L_e', 'EX_trp__L_e', 'EX_asp__L_e', 'EX_asn__L_e', 'EX_ala__L_e', 'EX_arg__L_e','EX_cys__L_e', 'EX_gln__L_e', 'EX_glu__L_e', 'EX_gly_e', 'EX_pro__L_e','EX_ser__L_e', 'EX_tyr__L_e','EX_ile__L_e', 'EX_leu__L_e','EX_val__L_e']

In [238]:
# Are any amino acids being secreted?
for acid in aa:
    if pfba_solution[acid] > 0:
        print(acid,pfba_solution[acid])


EX_phe__L_e 0.026745818132085496
EX_pro__L_e 0.04599100831919913


So, we are producing some proline and phenylalanine, but the amounts are small, so I think it is okay for now.
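
To put "small" into numbers, the secreted amino acids can be expressed as a fraction of the carbon taken up. This is a rough sketch: it assumes glucose uptake runs at the full 5 mmol/gCDW/h allowed by the bound set above, and it uses the carbon counts of glucose (6), phenylalanine (9) and proline (5).

```python
CARBONS = {"glucose": 6, "EX_phe__L_e": 9, "EX_pro__L_e": 5}

glucose_uptake = 5.0  # mmol/gCDW/h, assumed equal to the uptake bound
secreted = {  # fluxes printed above, mmol/gCDW/h
    "EX_phe__L_e": 0.026745818132085496,
    "EX_pro__L_e": 0.04599100831919913,
}

carbon_in = glucose_uptake * CARBONS["glucose"]
carbon_out = sum(flux * CARBONS[rxn] for rxn, flux in secreted.items())
fraction = carbon_out / carbon_in
print(f"{100 * fraction:.2f}% of the glucose carbon leaves as amino acids")
```

Under these assumptions the two amino acids account for roughly 1.6% of the carbon supplied.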

In this setting I've fit all fluxes except those of the PSCVT, PPCK and PC reactions (and GLUSy and PTAr, as we decided).
Now I need to figure out why I can't fix these fluxes.

First: what are the fluxes in the best fit when all other fluxes are fit?

In [239]:
print('PSCVT',pfba_solution['PSCVT'])
print('PPCK',pfba_solution['PPCK'])
print('PC',pfba_solution['PC'])


PSCVT 0.09
PPCK 0.0
PC 0.0


In [240]:
model.reactions.PC.bounds = (0,5.3)
model.reactions.PPCK.bounds = (0,6.5)

So the only flux that I can't fix is that of PSCVT, which was matched to the reaction that produces phenylalanine. Maybe it should be fit to a different reaction in the model?

It was fit to PSCVT, which is the first branch to the shikimate pathway. Maybe I should fix it to a reaction further downstream instead? When it is fit to PHETA1, the model can be solved. This is unexpected, but it does seem to work.

In [241]:
model.reactions.PHETA1.bounds = (-0.22,-0.18)

In [242]:
pfba_solution = cobra.flux_analysis.pfba(model)
pfba_solution['biomass']

0.17961600538522435

In [243]:
for acid in aa:
    if pfba_solution[acid] > 0:
        print(acid,pfba_solution[acid])

EX_phe__L_e 0.026745818132092675
EX_pro__L_e 0.045991008319333074


When the two changes above are made in the table, it should solve the issue.

In [217]:
df_aer = pd.read_csv('../databases/Tang 2009 fluxes-aerobic.csv')

In [218]:
#note: aerobic is the fluxes from the paper, without GLUSy (i.e. the flux to glutamate in the biomass)

In [220]:
model_copy = copy.deepcopy(model)
for index, row in df_aer.iterrows():
    rct_id = row['model_id']
    rct = model_copy.reactions.get_by_id(rct_id)
    ub = row['UB']/20 #normalize to the 5 mmol glucose that is being supplied
    lb = row['LB']/20 #normalize to the 5 mmol glucose that is being supplied
    rct.bounds = (lb,ub)
    pfba_solution = cobra.flux_analysis.pfba(model_copy)
    print(rct_id,pfba_solution['biomass'])


GLC__Dtpts 0.17961639818529673
GALUi 0.17961600538522537
G6PDH2r 0.17961600538522537
PRPPS 0.17961600538522537
TKT2 0.17961600538522537
TALA 0.17961600538522537
TKT1 0.1796160053852252
DDPA 0.1796160053852252
PGI 0.1796160053852252
FBA 0.1796160053852252
PGK 0.1796160053852252
ENO 0.17961600538522413
PGCD 0.17961600538522413
GHMT 0.17961600538522413
PHETA1 0.17961600538522413
PPCK 0.17961600538522413
PC 0.17961600538522413
PDH 0.17961600538522568
PFL 0.17961600538522568
LDH_L 0.17961600538522568
ALCD2x 0.17961600538522568
ASPTA 0.17961600538522568
CS 0.17961600538522568
ICDHyr 0.17961600538522568
ICL 0.17961600538522568
SUCOAS 0.17961600538522568
SUCDi 0.17961600538522562
FUM 0.17961600538522562
MDH 0.17961600538522562


Now that the fluxes have been fit, we would want to see what the predicted exchange rates are and then compare this to the published data.


In [226]:
prod = ['EX_glc__D_e', 'EX_lac__L_e','EX_etoh_e', 'EX_ac_e', 'EX_for_e']

In [228]:
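pfba_solution = cobra.flux_analysis.pfba(model)  # solve once, then report every exchange
glc_flux = pfba_solution['EX_glc__D_e']  # negative, because glucose is taken up
for comp in prod:
    print(comp,':  ',pfba_solution[comp]/glc_flux)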

EX_glc__D_e :   1.0
EX_lac__L_e :   -0.0751844087035775
EX_etoh_e :   -0.0
EX_ac_e :   -0.03838256641316192
EX_for_e :   -0.03680831047146421


## Should the PTAr reaction also be fixed?
Hmm, the values are quite different from the measured data. Fixing the PTAr flux as well seems to work alright, and with it the result is quite okay.

So fix this in the Excel sheet: we only neglect GLUSy!

In [244]:
model.reactions.PTAr.bounds = (3.555,4.345)

In [246]:
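pfba_solution = cobra.flux_analysis.pfba(model)  # solve once, then report every exchange
glc_flux = pfba_solution['EX_glc__D_e']  # negative, because glucose is taken up
for comp in prod:
    print(comp,':  ',pfba_solution[comp]/glc_flux)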

EX_glc__D_e :   1.0
EX_lac__L_e :   -0.07518440870357829
EX_etoh_e :   -0.0
EX_ac_e :   -0.7794643183170624
EX_for_e :   -0.03680831047148083
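
The ratios printed above are negative for secreted products because they are divided by the glucose exchange flux, which is negative for uptake. To compare with published yields it is easier to express them as positive mol product per mol glucose consumed. A minimal sketch, where `product_yields` is a hypothetical helper and the fluxes are illustrative values rather than a model run:

```python
def product_yields(exchange_fluxes, substrate="EX_glc__D_e"):
    """Return mol product per mol substrate consumed (positive = secreted)."""
    uptake = -exchange_fluxes[substrate]  # uptake is a negative exchange flux
    if uptake <= 0:
        raise ValueError("substrate is not being consumed")
    return {
        rxn: flux / uptake
        for rxn, flux in exchange_fluxes.items()
        if rxn != substrate
    }


# Illustrative exchange fluxes in mmol/gCDW/h (not taken from a model run).
fluxes = {"EX_glc__D_e": -5.0, "EX_lac__L_e": 0.4, "EX_ac_e": 3.9, "EX_etoh_e": 0.0}
print(product_yields(fluxes))
```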


# Microaerobic condition
Unfortunately, we lack information about the amount of oxygen that was supplied in the microaerobic experiment. So I will fit the fluxes as they are defined and then see what oxygen consumption the model predicts.

__Alternatively, I could set the oxygen supply to an arbitrarily chosen 'microaerobic' level?__


In [33]:
#set the glucose consumption to 5 mmol/gCDW/h
model.reactions.EX_glc__D_e.bounds = (-5,1000)

In [42]:
df_anaer = pd.read_csv('../../databases/Tang 2009 fluxes-microaerobic.csv')

In [43]:
#note: microaerobic is the fluxes from the paper, without GLUSy (i.e. the flux to glutamate in the biomass)

In [44]:
del df_anaer['N.B.: 0 means no info available']

In [45]:
model_copy = copy.deepcopy(model)
for index, row in df_anaer.iterrows():
    rct_id = row['model_id']
    rct = model_copy.reactions.get_by_id(rct_id)
    ub = row['UB']/20 #normalize to the 5 mmol glucose that is being supplied
    lb = row['LB']/20 #normalize to the 5 mmol glucose that is being supplied
    rct.bounds = (lb,ub)
    pfba_solution = cobra.flux_analysis.pfba(model_copy)
    print(rct_id,pfba_solution['biomass'])

GLC__Dtpts 0.328616263659893
GALUi 0.32861626365989305
G6PDH2r 0.32754860998855234
PRPPS 0.1451252828072703
TKT2 0.14512528280726988
TALA 0.1451252828072707
TKT1 0.1451252828072714
DDPA 0.14512528280727274
PGI 0.14512528280728362
FBA 0.14512528280728362
PGK 0.14512528280726728
ENO 0.14512528280726422
PGCD 0.1451252828072635
GHMT 0.14512528280726392
PSCVT 0.14512528280726392
PPCK 0.1451252828072688



solver status is 'infeasible'



PC 0.14512528280726877
PDH 0.14512528280726877
PFL 0.14512528280727252
LDH_L 7.712686799789064e-16
ALCD2x -7.61338181953427e-17
ASPTA 4.490791884855722e-16
CS 5.6744648992952885e-31
ACONTa 0.0
ICL 0.0
SUCOAS 0.0
SUCDi 0.0
FUM 1.469056239416829e-17
MDH 1.469056239416669e-17


In [39]:
model_copy.optimize()


solver status is 'infeasible'



Note:
- For the microaerobic conversion of pyruvate to acetyl-CoA, we need to distinguish PDH from PFL. For the aerobic condition it is clearly PDH. For the microaerobic condition it is hard to discern the exact ratio, so I think we should set the bounds so that either reaction can carry flux within the range of the measurement, and from there estimate the ratio of activity between PDH and PFL.




- Maybe I can plot the measured vs. simulated fluxes/exchanges?
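
The PDH/PFL note could be approached roughly as follows: each reaction alone is allowed anywhere between zero and the measured upper bound, the measurement is checked against their sum, and the split is read from the solution afterwards. The helpers and numbers below are hypothetical, and both reactions are assumed to run only in the forward direction. In the model itself, the sum would additionally have to be enforced with an extra linear constraint, which is not shown here.

```python
def individual_bounds(measured_ub):
    """Bounds for PDH and PFL when only their combined flux is measured."""
    return {"PDH": (0.0, measured_ub), "PFL": (0.0, measured_ub)}


def sum_within_measurement(pdh_flux, pfl_flux, measured_lb, measured_ub):
    """Check that the combined flux falls inside the measured interval."""
    return measured_lb <= pdh_flux + pfl_flux <= measured_ub


def pyruvate_split(pdh_flux, pfl_flux):
    """Fraction of the pyruvate-to-acetyl-CoA flux carried by each reaction."""
    total = pdh_flux + pfl_flux
    if total <= 0:
        raise ValueError("no flux from pyruvate to acetyl-CoA")
    return {"PDH": pdh_flux / total, "PFL": pfl_flux / total}


# Made-up example: a measured interval of 7.5 to 9.0 and a solution of 6.0 + 2.0.
print(individual_bounds(9.0))
print(sum_within_measurement(6.0, 2.0, 7.5, 9.0))
print(pyruvate_split(6.0, 2.0))
```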
