# Introduction
In this notebook, I will try to validate the model with the 13C data presented in Tang et al. 2009 (DOI 10.1002/bit.22181).

To do this, I will fix the internal fluxes to match those measured with the 13C data and from there see how well the predicted exchange fluxes match the measured data.

__The paper has 13C data for the aerobic and microaerobic conditions, and also some data on the mixed acid fermentation. Will Table 2 provide most of the experimental data to validate against? These aren't really exchange rates, though, so I'll need to see what to do with this...__

__Also, oxygen consumption rates aren't really defined. They seem to be defined based on the observed growth rate, so maybe this becomes my kind of output?__

In the paper, the glucose uptake rate is often set to 5 mmol h-1 g-1 biomass. So I will do the same here to make the values more comparable.

N.B.: this is a slightly different strain! It is the M10EXG strain.

To compare the 13C-derived fluxes with our model, I will first summarize them in a data frame. For each reaction defined in the 13C model, I will record the flux (relative to the 100 units shown) and map it to the reaction in our model whose flux should be set to that value.
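As a small worked example of the normalization used further down: the loops below divide the paper's values by 20, which corresponds to rescaling fluxes given relative to 100 units of glucose uptake to an uptake of 5 mmol/gCDW/h. The function name and the example value of 60 units are illustrative assumptions, not taken from the paper.

```python
def to_absolute_flux(relative_flux, glucose_uptake=5.0, basis=100.0):
    """Convert a flux given relative to `basis` units of glucose uptake
    into mmol/gCDW/h for the chosen absolute glucose uptake rate."""
    return relative_flux * glucose_uptake / basis


# Hypothetical example: a reaction carrying 60 relative units
print(to_absolute_flux(60))  # 3.0
```

With the default arguments this is the same as dividing by 20.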

In [11]:
import cameo
import pandas as pd
import cobra.io
from escher import Builder
import copy

In [12]:
model = cobra.io.read_sbml_model('../../model/g-thermo.xml')

# Aerobic condition
I will first see whether we can validate the aerobic condition that was tested. I will import the data frame from the file made in the 'databases' folder of the repo. There I mapped the reactions and fluxes shown in the Tang paper to our model.

In [13]:
# Limit glucose uptake to 5 mmol/gCDW/h (a negative exchange flux means uptake)
model.reactions.EX_glc__D_e.bounds = (-5,1000)

In [14]:
model.optimize().objective_value

0.32861628193475195

Now I need to write a script that fixes the fluxes as defined here and then checks whether optimization is still possible.
Then I will also check what the predicted products are and whether they match the experimental data.

When I do this with the exact fluxes determined, I get an infeasible solution. This is because for many of the reactions no standard deviation is given, which leaves very little flexibility in the model. For the reactions with a standard deviation, we will use it to widen the bounds to mean ± 2 × SD.
For reactions without any standard deviation, we will allow an error of 10%. This isn't ideal but may be necessary.
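The bound rule described above can be sketched as a small helper. This is only a sketch: the function and parameter names are made up here, and treating a standard deviation of 0 (or a missing value) as "not reported" is an assumption.

```python
def flux_bounds(mean, sd=None, n_sd=2.0, fallback_rel_err=0.10):
    """Return (lower, upper) bounds for a measured flux.

    Uses mean ± n_sd * sd when a standard deviation is reported, and
    mean ± fallback_rel_err * |mean| otherwise.
    """
    if sd is not None and sd > 0:
        margin = n_sd * sd
    else:
        # Assumption: None, 0 or NaN means no standard deviation was reported
        margin = fallback_rel_err * abs(mean)
    return mean - margin, mean + margin


# Hypothetical values, for illustration only
print(flux_bounds(100, 5))  # (90.0, 110.0)
```

Using `abs(mean)` for the fallback keeps the lower bound below the upper bound for reactions that run in the reverse direction.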

In [15]:
df_aer = pd.read_csv('../../databases/Tang 2009 fluxes-aerobic.csv')

In [16]:
# Note: the aerobic file contains the fluxes from the paper, without GLUSy (i.e. the flux to glutamate in the biomass)

In [17]:
del df_aer['N.B.: 0 means no info available']

In [19]:
# Work on a copy so the original model keeps its default bounds
model_copy = copy.deepcopy(model)
for index, row in df_aer.iterrows():
    rct_id = row['model_id']
    rct = model_copy.reactions.get_by_id(rct_id)
    # Paper fluxes are relative to 100 units; dividing by 20 scales them
    # to the 5 mmol/gCDW/h of glucose that is being supplied
    ub = row['UB'] / 20
    lb = row['LB'] / 20
    rct.bounds = (lb, ub)
    # Re-solve after every added constraint to see which one breaks feasibility
    pfba_solution = cobra.flux_analysis.pfba(model_copy)
    print(rct_id, pfba_solution['biomass'])


GLC__Dtpts 0.32861626365989793
GALUi 0.3286162636599081
G6PDH2r 0.3269833815743214
PRPPS 0.2902505656145208
TKT2 0.2902505656145299
TALA 0.2902505656145228
TKT1 0.2902505656144517
DDPA 0.29025056561478846
PGI 0.29025056561478857
FBA 0.29025056554757445
PGK 0.2902505656145884
ENO 0.2902505656145466
PGCD 0.29025056561456464
GHMT 0.29025056561464996


Infeasible: None (infeasible)
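The loop above stops with an exception at the first constraint that makes the model infeasible. A possible variant that returns the offending reaction instead is sketched below. It relies on cobrapy's `Model.slim_optimize()`, which returns NaN by default when no optimal solution is found; note that it runs plain FBA rather than pFBA, which should not matter for a feasibility check. The function name and the `(reaction_id, lb, ub)` tuple format are assumptions made for this sketch.

```python
import math


def first_infeasible_constraint(model, constraints):
    """Apply (reaction_id, lb, ub) constraints one at a time.

    Returns the id of the first reaction whose bounds make the model
    infeasible, or None if all constraints can be satisfied together.
    The model is modified in place, so pass in a copy.
    """
    for rxn_id, lb, ub in constraints:
        model.reactions.get_by_id(rxn_id).bounds = (lb, ub)
        # slim_optimize returns NaN (its default error value) instead of
        # raising when the solver does not reach an optimal solution
        if math.isnan(model.slim_optimize()):
            return rxn_id
    return None
```

With the data frame used here, the constraints could be built as `[(r['model_id'], r['LB'] / 20, r['UB'] / 20) for _, r in df_aer.iterrows()]` and passed in together with `model.copy()`.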

- GLUSy is used to make all ammonia-containing compounds, which is a problem because the flux in the paper is only the amount for the glutamine that goes into biomass. So maybe we should neglect it?

- Also neglect PTAr: the measured flux here carries the carbon going to acetate, leucine and fatty acids. As we don't know the distribution between these, this can cause a problem. So maybe I should neglect it too?

In [18]:
model_copy.optimize()

In [11]:
model_copy.optimize().objective_value

0.0

It is strange: the optimization now seems to be infeasible, but for some reason there is still an objective value?
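A likely explanation is that cobrapy's `optimize()` does not raise an error on an infeasible problem by default. It returns a solution object whose `status` has to be checked, and the objective value reported alongside a non-optimal status is not a meaningful growth rate. A minimal sketch of a guard (the helper name is hypothetical):

```python
def growth_or_none(model):
    """Return the objective value only if the solver reports 'optimal'."""
    solution = model.optimize()
    if solution.status != "optimal":
        # Value reported for an infeasible problem should not be interpreted
        return None
    return solution.objective_value
```

Calling `growth_or_none(model_copy)` would then return `None` here instead of 0.0, which makes the infeasibility harder to overlook.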

# Microaerobic condition
Unfortunately, we lack information about the amount of oxygen supplied in the microaerobic experiment. So what I will do is constrain the fluxes as they are defined, and then see what oxygen consumption the model predicts.

__Alternatively, I could set the oxygen supply to an arbitrarily selected 'microaerobic' level...?__
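For the first option (constrain the fluxes, then read off the predicted oxygen consumption), the value can be taken from the exchange flux of the solution. The sketch below assumes the oxygen exchange reaction is called `EX_o2_e`, by analogy with `EX_glc__D_e`; this id has not been checked against the model. `fluxes` can be any mapping from reaction id to flux, such as the `fluxes` attribute of a cobrapy solution.

```python
def uptake_rate(fluxes, exchange_id="EX_o2_e"):
    """Return the uptake rate (as a positive number) for an exchange reaction.

    Exchange fluxes are negative when the compound is taken up, so the
    sign is flipped; net secretion is reported as 0.
    """
    return max(0.0, -fluxes[exchange_id])


# Hypothetical flux values, for illustration only
print(uptake_rate({"EX_o2_e": -7.5}))  # 7.5
```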


In [33]:
# Limit glucose uptake to 5 mmol/gCDW/h (a negative exchange flux means uptake)
model.reactions.EX_glc__D_e.bounds = (-5,1000)

In [42]:
df_anaer = pd.read_csv('../../databases/Tang 2009 fluxes-microaerobic.csv')

In [43]:
# Note: the microaerobic file contains the fluxes from the paper, without GLUSy (i.e. the flux to glutamate in the biomass)

In [44]:
del df_anaer['N.B.: 0 means no info available']

In [45]:
# Work on a copy so the original model keeps its default bounds
model_copy = copy.deepcopy(model)
for index, row in df_anaer.iterrows():
    rct_id = row['model_id']
    rct = model_copy.reactions.get_by_id(rct_id)
    # Paper fluxes are relative to 100 units; dividing by 20 scales them
    # to the 5 mmol/gCDW/h of glucose that is being supplied
    ub = row['UB'] / 20
    lb = row['LB'] / 20
    rct.bounds = (lb, ub)
    # Re-solve after every added constraint to see which one breaks feasibility
    pfba_solution = cobra.flux_analysis.pfba(model_copy)
    print(rct_id, pfba_solution['biomass'])

GLC__Dtpts 0.328616263659893
GALUi 0.32861626365989305
G6PDH2r 0.32754860998855234
PRPPS 0.1451252828072703
TKT2 0.14512528280726988
TALA 0.1451252828072707
TKT1 0.1451252828072714
DDPA 0.14512528280727274
PGI 0.14512528280728362
FBA 0.14512528280728362
PGK 0.14512528280726728
ENO 0.14512528280726422
PGCD 0.1451252828072635
GHMT 0.14512528280726392
PSCVT 0.14512528280726392
PPCK 0.1451252828072688



solver status is 'infeasible'



PC 0.14512528280726877
PDH 0.14512528280726877
PFL 0.14512528280727252
LDH_L 7.712686799789064e-16
ALCD2x -7.61338181953427e-17
ASPTA 4.490791884855722e-16
CS 5.6744648992952885e-31
ACONTa 0.0
ICL 0.0
SUCOAS 0.0
SUCDi 0.0
FUM 1.469056239416829e-17
MDH 1.469056239416669e-17


In [39]:
model_copy.optimize()


solver status is 'infeasible'



Note:
- For the microaerobic conversion of pyruvate to acetyl-CoA, we need to distinguish PDH from PFL. For the aerobic case it is clearly PDH. For the microaerobic case it is hard to discern the exact ratio, so I think we should set the bounds to allow either reaction to carry flux within the measured range, and from there estimate the ratio of activity between PDH and PFL.
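Once both reactions are allowed to carry flux, the split between them could be estimated from a solution as sketched below. The helper name is hypothetical, `fluxes` is any mapping from reaction id to flux, and both reactions are assumed to run in the forward direction. Since FBA solutions are generally not unique, the split from a single solution may be arbitrary, so it is probably worth checking the allowed ranges with flux variability analysis as well.

```python
def flux_fraction(fluxes, rxn_a="PDH", rxn_b="PFL"):
    """Fraction of the combined flux through rxn_a and rxn_b carried by rxn_a.

    Returns None when neither reaction carries flux.
    """
    total = fluxes[rxn_a] + fluxes[rxn_b]
    if total == 0:
        return None
    return fluxes[rxn_a] / total


# Hypothetical flux values, for illustration only
print(flux_fraction({"PDH": 3.0, "PFL": 1.0}))  # 0.75
```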




- Maybe I can plot the measured vs. simulated fluxes/exchanges?
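Before plotting, the comparison could first be collected in a simple table. The sketch below pairs each measured range with the simulated flux and flags whether the simulation falls inside the range. The function name, the input formats and the tolerance are assumptions made for illustration.

```python
def compare_fluxes(measured, simulated, tol=1e-9):
    """Pair measured flux ranges with simulated fluxes.

    measured:  {reaction_id: (lower, upper)}
    simulated: {reaction_id: flux}
    Returns a list of (reaction_id, lower, upper, flux, within_range) rows.
    """
    rows = []
    for rxn_id, (lb, ub) in measured.items():
        flux = simulated[rxn_id]
        # Small tolerance so solver round-off does not count as a mismatch
        within = (lb - tol) <= flux <= (ub + tol)
        rows.append((rxn_id, lb, ub, flux, within))
    return rows
```

The rows can be passed to `pd.DataFrame` with matching column names, and the measured midpoints can then be plotted against the simulated fluxes.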
