# Introduction
From previous work, I have noticed that the model contains quite a few 'duplicate' metabolites, especially among the sugars, e.g. f6p_k_c and f6p_B_c. The problem is that each duplicate metabolite comes with its own set of reactions, which makes it difficult to visualize what is really happening in the model. It can also lead to duplicate fluxes that do not connect properly to the rest of the model. In addition, the reactions and metabolites have been given IDs that are close to, but not exactly, BiGG compliant, and this should be fixed too.

So in this notebook, I will take a look at these conversions. Specifically for glycolysis and the TCA cycle, it would be good to be able to observe how the flux distribution changes when we modify the medium, make knockouts, etc.
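A first pass at spotting such duplicates could be to group metabolite IDs by a shared stem. The sketch below is only a heuristic: the helper names `id_stem` and `group_candidate_duplicates` are hypothetical, and the variant tags it strips (`_A`, `_B`, `_k`, `__aD`, `__bD`) are simply the ones that show up in this notebook. It also assumes every ID ends in a compartment suffix such as `_c`. Each candidate group still needs a manual check.

```python
import re
from collections import defaultdict


def id_stem(met_id):
    """Map a metabolite ID to a stem that ignores a few variant tags.

    Assumes the ID ends in a compartment suffix such as '_c'.
    """
    base, compartment = met_id.rsplit("_", 1)
    base = re.sub(r"__[ab]D$", "__D", base)  # glc__aD / glc__bD -> glc__D
    base = re.sub(r"_(A|B|k)$", "", base)    # g6p_A / f6p_k -> g6p / f6p
    return f"{base}_{compartment}"


def group_candidate_duplicates(met_ids):
    """Return {stem: [ids]} for every stem shared by more than one ID."""
    groups = defaultdict(list)
    for met_id in met_ids:
        groups[id_stem(met_id)].append(met_id)
    return {stem: ids for stem, ids in groups.items() if len(ids) > 1}


# Illustrative ID list; with a loaded model this could be
# [m.id for m in model.metabolites]
ids = ["f6p_k_c", "f6p_B_c", "g6p_c", "g6p_A_c", "g6p_B_c",
       "pyr_c", "glc__D_c", "glc__aD_c"]
candidates = group_candidate_duplicates(ids)
```

A purely ID-based grouping like this will miss duplicates with unrelated names (such as glyald_c vs g3p_c further down), so it only complements the manual walk through the pathways.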

In [1]:
import cameo
import pandas as pd
import cobra.io
import escher

In [3]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [4]:
model_e_coli = cameo.load_model('iML1515')

In [5]:
solution = model.optimize()

In [6]:
escher.Builder('e_coli_core.Core metabolism', reaction_data=solution.fluxes.to_dict())

Downloading Map from https://escher.github.io/1-0-0/6/maps/Escherichia%20coli/e_coli_core.Core%20metabolism.json


Builder(reaction_data={'IDPh': 1.0165234238192156, 'PYRACTT': 0.0, 'CAT': 0.0, 'PDHam1hi': 0.0, 'CCP': 0.0, 'H…

In [7]:
model.metabolites.f1p_c.name = 'D-Fructose 1-phosphate'

For example, in the above map you can see that quite a few labels/reactions are improperly defined. Even though this is mapped to an E. coli model, the EMP, PPP and TCA pathways are quite conserved, so it makes sense to modify the reaction and sugar IDs to match the skeleton provided in this map. This conservation has been shown and verified in the literature (e.g. Tang et al. 2009, DOI 10.1002/bit.22181, and Cripps et al. 2009, doi:10.1016/j.ymben.2009.08.005). These papers will be used as a guideline, together with bioinformatic and other literature research about certain reactions.

When deciding whether and how to combine the different metabolites/reactions, one should keep in mind the physiological relevance of having separate sugars.
E.g. P. thermoglucosidasius can grow on both starch and cellulose. The only difference between the two is the alpha vs. beta D-glucose monomers. If both are consumed and metabolized, then maybe they can be summarized as just D-glucose.

__Approach__:

1) Glycolysis: start from glucose and move the way down through glycolysis until pyruvate

2) PPP: go through the pentose phosphate pathway and make sure there are no duplicates here.

3) TCA: go through this cycle finally to make sure it is complete.

__Glucose__: Right now, there are three different types of glucose: glc__D_c, glc__aD_c and glc__bD_c, but there is only transport and exchange for glc__D_c. The main difference between the alpha and beta types comes from the different polysaccharides and their breakdown. For example, starch hydrolysis leads to alpha-D-glucose, whereas cellulose hydrolysis leads to beta-D-glucose. *P. thermoglucosidasius* can grow on both, so for modelling purposes it is probably safe to 'lump' the two types together into just D-glucose, also because the downstream pathways are the same for the different sugar types. This would greatly simplify the model and its analysis without really losing much of the physiology of the organism.

Also there is already a reaction in the model (GLCEPI) that just interconverts the a and b type, so it makes sense to just merge them all. 

In [8]:
# reaction GLCDPT should be fixed to include ATP and its name changed to HEX1 to be more BiGG compliant
model.reactions.GLCDPT.id = 'HEX1'

In [9]:
model.reactions.HEX1.add_metabolites({model.metabolites.atp_c:-1, model.metabolites.adp_c:1, model.metabolites.g6p_c:1, model.metabolites.g6p_A_c:-1, model.metabolites.h_c:1})

In [10]:
model.reactions.HEX1.name = 'Hexokinase (D-glucose:ATP)'

In [11]:
#remove the GLCEPI rct, it is not needed anymore
#(remove_reactions expects a list; a bare reaction triggers a 'need to pass in a list' warning)
model.remove_reactions([model.reactions.GLCEPI])

In [12]:
#GLCAPT can also be removed, as it will be the same as HEX1 when we get rid of the alpha and beta.
model.remove_reactions(model.reactions.GLCAPT)

In [13]:
#GLCISO also becomes unnecessary (same as XYLI2)
model.remove_reactions([model.reactions.GLCISO])

In [14]:
#GLUKA can be removed, becomes the same as HEX1
model.remove_reactions(model.reactions.GLUKA)

For the rest of the reactions still associated with glc__aD_c or glc__bD_c, I can just change the reactions to include glc__D_c instead, and the issue at the glucose node should be resolved.
Then finally, I can remove glc__aD_c and glc__bD_c metabolites.
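The substitution itself can be illustrated without cobra, using plain `{metabolite_id: coefficient}` dictionaries. This is only a sketch of the idea: `substitute_metabolite` is a hypothetical helper and the two example reactions are made up. It shows why interconversion reactions have to be removed before (or after) merging: once both forms map to the same metabolite, such a reaction cancels out completely.

```python
def substitute_metabolite(stoichiometry, wrong, correct):
    """Return a copy of the stoichiometry with `wrong` replaced by `correct`.

    Coefficients are summed if `correct` is already present; a metabolite
    whose net coefficient becomes zero is dropped from the reaction.
    """
    merged = dict(stoichiometry)
    coefficient = merged.pop(wrong, 0)
    if coefficient:
        total = merged.get(correct, 0) + coefficient
        if total:
            merged[correct] = total
        else:
            merged.pop(correct, None)
    return merged


# Hypothetical reactions, written as {metabolite_id: coefficient}
kinase = {"glc__aD_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1, "h_c": 1}
epimerase = {"glc__aD_c": -1, "glc__D_c": 1}

merged_kinase = substitute_metabolite(kinase, "glc__aD_c", "glc__D_c")
merged_epimerase = substitute_metabolite(epimerase, "glc__aD_c", "glc__D_c")
```

Here `merged_kinase` simply consumes glc__D_c instead of glc__aD_c, while `merged_epimerase` ends up empty, which is the dictionary equivalent of a reaction that no longer does anything.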

In [15]:
glc = model.metabolites.glc__D_c
glc_a = model.metabolites.glc__aD_c
# swap glc__aD_c for glc__D_c in every reaction, keeping the coefficient
for rct in glc_a.reactions:
    stoich = rct.metabolites[glc_a]
    rct.add_metabolites({glc: stoich, glc_a: -stoich})

In [16]:
glc = model.metabolites.glc__D_c
glc_b = model.metabolites.glc__bD_c
# swap glc__bD_c for glc__D_c in every reaction, keeping the coefficient
for rct in glc_b.reactions:
    stoich = rct.metabolites[glc_b]
    rct.add_metabolites({glc: stoich, glc_b: -stoich})

In [17]:
# remove_metabolites takes a list; a second positional argument would be read as `destructive`
model.remove_metabolites([model.metabolites.glc__aD_c, model.metabolites.glc__bD_c])

In [18]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

__Glucose-6-phosphate__: there are three g6p metabolites currently in the model: g6p_c, g6p_A_c and g6p_B_c. These should also be merged into just the g6p_c metabolite.
I'll also check all the reactions associated to these metabolites to make sure we do not create faulty or duplicate reactions.

We need to make sure there is the PGI reaction and the G6PDH2r reaction too. 
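One way to check for reactions that become duplicates after such a merge is to compare stoichiometries directly. The following is a minimal sketch on plain dictionaries, not on the cobra model; `stoichiometry_key` and `find_duplicates` are hypothetical helpers, and the stoichiometries (including the direction of each reaction) are assumptions made for illustration. A reaction and its reverse are treated as the same conversion.

```python
from collections import defaultdict


def stoichiometry_key(stoichiometry):
    """Direction-independent key: a reaction and its reverse share one key."""
    forward = frozenset(stoichiometry.items())
    reverse = frozenset((met, -coeff) for met, coeff in stoichiometry.items())
    return frozenset((forward, reverse))


def find_duplicates(reactions):
    """Return sorted lists of reaction IDs that share the same stoichiometry."""
    groups = defaultdict(list)
    for rxn_id, stoichiometry in reactions.items():
        groups[stoichiometry_key(stoichiometry)].append(rxn_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Assumed stoichiometries after replacing g6p_A_c / g6p_B_c with g6p_c
reactions = {
    "PGI": {"g6p_c": -1, "f6p_c": 1},
    "PGIA": {"g6p_c": -1, "f6p_c": 1},
    "G6PI2": {"f6p_c": -1, "g6p_c": 1},
    "PFK": {"f6p_c": -1, "atp_c": -1, "fdp_c": 1, "adp_c": 1, "h_c": 1},
}
duplicates = find_duplicates(reactions)
```

Identical stoichiometry does not say which ID to keep; that choice still follows the BiGG naming on the map.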

In [18]:
model.metabolites.g6p_c.name = 'Glucose 6-phosphate'

In [19]:
model.reactions.G6PDH2er.id = 'G6PDH2r'

In [20]:
#remove PGCM, as it becomes the same as PGMT. 
#also, PGMT should be reversible.
model.remove_reactions(model.reactions.PGCM)

In [21]:
model.reactions.PGMT.bounds = (-1000,1000)

In [22]:
#G6PI_1 remove: it interchanges the A and B form
model.remove_reactions(model.reactions.G6PI_1)

In [23]:
#rename G6PISO to PGI
model.reactions.G6PISO.id = 'PGI'

In [24]:
#remove PGIA, same as PGI
model.remove_reactions(model.reactions.PGIA)

In [25]:
#remove G6PI2, same as PGI
model.remove_reactions(model.reactions.G6PI2)


In [26]:
# remove G6PBDH as it will be the same as G6PDH2r
model.remove_reactions(model.reactions.G6PBDH)

In [27]:
#ARBT6PH should be renamed AB6PGH, and then the metabolite changed later.
model.reactions.ARBT6PH.id = 'AB6PGH'

In [28]:
#for the reactions left, just change metabolites
g6p = model.metabolites.g6p_c
g6p_a = model.metabolites.g6p_A_c
for rct in g6p_a.reactions:
    stoich = rct.metabolites[g6p_a]
    rct.add_metabolites({g6p: stoich, g6p_a: -stoich})


In [29]:
g6p = model.metabolites.g6p_c
g6p_b = model.metabolites.g6p_B_c
for rct in g6p_b.reactions:
    stoich = rct.metabolites[g6p_b]
    rct.add_metabolites({g6p: stoich, g6p_b: -stoich})

In [30]:
# remove_metabolites takes a list; a second positional argument would be read as `destructive`
model.remove_metabolites([model.metabolites.g6p_A_c, model.metabolites.g6p_B_c])

In [31]:
model.metabolites.salcn6p_c.name = 'Salicin 6-phosphate'

In [32]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

__Fructose-6-phosphate__: our model currently has two fructose-6-phosphate metabolites: f6p_k_c and f6p_B_c. Here f6p_k_c is the metabolite we want to keep, but I will rename it to just f6p_c, to be more BiGG compliant.


In [33]:
model.metabolites.f6p_k_c.id = 'f6p_c'

In [34]:
model.metabolites.f6p_c.name = 'D-Fructose 6-phosphate'

In [35]:
#rename PFKh to PFK
model.reactions.PFKh.id = 'PFK'

In [36]:
#remove S7PG3PT as it is the same as TALA
model.remove_reactions(model.reactions.S7PG3PT)

In [37]:
#remove MANISO, it is the same as MAN6PI
model.remove_reactions(model.reactions.MANISO)

In [38]:
#remove MNLOR, becomes same as M1PD
model.remove_reactions(model.reactions.MNLOR)

In [39]:
model.metabolites.fru_c.name = 'D-Fructose'

In [40]:
#remove FRUPT, same as HEX7
model.remove_reactions(model.reactions.FRUPT)

In [41]:
#rename BFBP to FBP to be more BiGG compliant
model.reactions.BFBP.id = 'FBP'

In [42]:
# remove AH6PISO, same as AH6PI
model.remove_reactions(model.reactions.AH6PISO)

In [43]:
#remove FDPPH, will be same as FBP
model.remove_reactions(model.reactions.FDPPH)

In [44]:
#remove F6PPT, same as PFK
model.remove_reactions(model.reactions.F6PPT)

In [45]:
#remove F6PGT, same as TKT2
model.remove_reactions(model.reactions.F6PGT)

In [46]:
#replace f6p_B_c with f6p_c in the leftover reactions
corr_met = model.metabolites.f6p_c
wrong_met = model.metabolites.f6p_B_c
for rct in wrong_met.reactions:
    stoich = rct.metabolites[wrong_met]
    rct.add_metabolites({corr_met: stoich, wrong_met: -stoich})

In [47]:
model.remove_metabolites(model.metabolites.f6p_B_c)

Comment: for the conversion of f6p to fructose-1,6-bisphosphate, several reactions are possible, using ATP, UTP, GTP, etc. The model currently uses UTP, because this metabolite can be made 'for free' in the model at the moment. But should it not be ATP? As ATP is the main energy currency, it makes more sense to use it here.
For now I will enforce this by setting the bounds of the reactions that use the other nucleoside triphosphates to 0.

In [48]:
model.reactions.F6PUPT.bounds = (0,0)
model.reactions.F6PCPT.bounds = (0,0)
model.reactions.F6PIPT.bounds = (0,0)

In [49]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

__Fructose-1,6-bisphosphate__: we have two metabolites in the model currently: fdp_c and fdp_B_c. We will merge them together into just fdp_c. 

In [51]:
#remove FBAf, it is the same as FBA.
model.remove_reactions(model.reactions.FBAf)

In [52]:
model.reactions.F1PPT.id = 'FRUK'

In [53]:
#merge the fdp_B_c metabolite to fdp_c
corr_met = model.metabolites.fdp_c
wrong_met = model.metabolites.fdp_B_c
for rct in wrong_met.reactions:
    stoich = rct.metabolites[wrong_met]
    rct.add_metabolites({corr_met: stoich, wrong_met: -stoich})

In [54]:
#remove fdp_B_c metabolite
model.remove_metabolites(model.metabolites.fdp_B_c)

In [55]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

__D-glyceraldehyde & Glyceronephosphate__ Initially, there seems to be only one G3P and one DHAP metabolite in the model. This is to be expected, as they are not known to have various stereoisomers. I've nevertheless checked their flux balance to see if anything strange appears. There I found reaction FBA2, which contains glyald_c, which turns out to also be g3p. So this metabolite should be merged with g3p_c.

For DHAP, we see the ALKP reaction, which phosphorylates dha into dhap. This is thermodynamically very unfavourable, and looking into it more shows that the reaction should run in the reverse direction, so the bounds are changed accordingly.

Also, because so many of the bounds are restricted to 20, the model decides to produce fructose 1-phosphate from DHAP instead of converting it to g3p and moving down glycolysis. The bounds should be modified anyway; this will be done in another notebook. I could change them all by hand, but that doesn't make much sense.
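As a reminder of how the (lower, upper) bound pairs used throughout this notebook translate into reaction direction, here is a small sketch. The helper `directionality` is hypothetical and not part of cobra; the example values are the bounds set in this notebook.

```python
def directionality(lower, upper):
    """Classify a reaction by the sign of its flux bounds."""
    if lower == 0 and upper == 0:
        return "blocked"
    if lower < 0 and upper > 0:
        return "reversible"
    if lower >= 0:
        return "forward only"
    return "reverse only"


# Bounds as set in this notebook
examples = {
    "PGMT": (-1000, 1000),
    "GLYCK": (0, 1000),
    "ALKP": (-1000, 0),
    "F6PUPT": (0, 0),
}
classified = {rxn: directionality(*bounds) for rxn, bounds in examples.items()}
```

So setting ALKP to (-1000, 0) keeps the reaction as written but only lets flux run from right to left, and (0, 0) switches a reaction off without deleting it from the model.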

In [56]:
#merge the glyald_c metabolite to g3p_c
corr_met = model.metabolites.g3p_c
wrong_met = model.metabolites.glyald_c
for rct in wrong_met.reactions:
    stoich = rct.metabolites[wrong_met]
    rct.add_metabolites({corr_met: stoich, wrong_met: -stoich})

In [57]:
model.reactions.ALKP.bounds=(-1000,0)

In [58]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

__3-Phospho-D-glyceroyl__: Here we only have the metabolite 13dpg_c. Here one consideration is the GAPD(H) reactions. We know from Beata's thesis that the *P. thermoglucosidasius* NCIB11955 uses an NADP(H) dependent GAPD, which has some side affinity for NAD(H). I will discuss this reaction and then decide how to process it. 

In [59]:
model.reactions.S7PG3PGT.id = 'TKT1'

__3-phosphoglycerate__ This is metabolite 3pg_c. The ACPh and PGK reactions are very similar, except that ACPh works with water and free phosphate, whereas PGK involves ATP and ADP. PGK is the reaction one would expect. Additionally, I cannot find any evidence for the ACPh conversion or any reason to expect it here. Therefore I will remove it.

In [60]:
#this reaction should not be reversible
model.reactions.GLYCK.bounds = (0,1000)

In [61]:
model.remove_reactions(model.reactions.ACPh)

In [63]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

__2-phosphoglycerate__ : This metabolite seems completely correct, along with all reactions associated to it. 

__Phosphoenolpyruvate__ (pep_c) At first glance it seems the only issue here is renaming of the PSCIT rct to PSCVT. Also it seems that our strain does not contain a phosphoenolpyruvate carboxylase. 

In [64]:
#PSCIT should be renamed PSCVT
model.reactions.PSCIT.id = 'PSCVT'

__Pyruvate__
Probably remove PYRPT rct, but should be resolved from Martyn's mass balancing. 

Lactate dehydrogenase: *P. thermoglucosidasius* NCIB 11955 has been shown to produce L-lactate instead of D-lactate at >99% optical purity (Tang 2009). So I will not change this reaction to match the standard BiGG map used.

ME2: doesn't seem to be present in the strain at all. Also Tang and Cripps do not report it. 

Hard to really spot anything wrong here right now. 

__Ethanol__: currently in the map we do not have matching IDs for the conversion of acetaldehyde to ethanol. So I will check this.

In [65]:
model.reactions.ALCD2ir.id = 'ALCD2x'

In [66]:
model.reactions.ALCD2x.name = 'Alcohol dehydrogenase (ethanol)'

In [67]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

## TCA cycle
It seems the TCA cycle is not really complete when trying to visualize the model. So I'll go through the cycle to check whether there are mismatched IDs that prevent the visualization of the full cycle.
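Finding such mismatches comes down to a set comparison between the reaction IDs on the map and those in the model. The sketch below uses short, illustrative ID lists rather than the real map or model, and `unmatched_ids` is a hypothetical helper. With a loaded model, the model side could be taken from `[r.id for r in model.reactions]`.

```python
def unmatched_ids(map_reaction_ids, model_reaction_ids):
    """Return (on map but not in model, in model but not on map), both sorted."""
    map_ids = set(map_reaction_ids)
    model_ids = set(model_reaction_ids)
    return sorted(map_ids - model_ids), sorted(model_ids - map_ids)


# Illustrative TCA-related IDs only, not the full map or model
map_ids = ["CS", "ACONTa", "ACONTb", "ICDHyr", "AKGDH",
           "SUCOAS", "SUCDi", "FUM", "MDH"]
model_ids = ["CS", "ACONT", "ACONTa", "ACONTb", "ICDHhr", "AKGDH",
             "SUCLm", "QUINOR", "FUM", "MDH"]

not_in_model, not_on_map = unmatched_ids(map_ids, model_ids)
```

The first list gives the gaps seen on the map, the second gives candidate model reactions that may just need renaming. Matching them up still requires checking the actual stoichiometry of each reaction.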

In [68]:
#Succinate to fumarate should be renamed (QUINOR to SUCDi)
#I will also add the quinone/quinol pair of our model to the reaction to capture the change in redox state,
#and then change the reversibility of the reaction
model.reactions.QUINOR.add_metabolites({model.metabolites.qh2_c:-1, model.metabolites.ubiquin_c:1, model.metabolites.h_c:2})

In [69]:
model.reactions.QUINOR.id = 'SUCDi'

In [70]:
model.reactions.SUCDi.bounds = (0,1000)

In [71]:
#search for SUCOAS
model.reactions.SUCLm.id = 'SUCOAS'

In [72]:
model.metabolites.HC01434_c.id = 'osuc_c'

In [73]:
#search for ICDHyr
model.reactions.ICDHhr.id = 'ICDHyr'

In [74]:
#remove ACONT, as ACONTa and ACONTb are present already
model.remove_reactions(model.reactions.ACONT)

__Oxoglutarate__: the thesis of Charlotte Ward (2015, https://doi.org/10.25560/25514), looks into flux distribution of the TCA. She mentions that 'the genome contains both an oxoglutarate dehydrogenase complex (OGDC) and an oxoglutarate synthase (EC 1.2.7.3). Unlike the OGDC, the latter does not require NAD+ as a cofactor and does not rely upon the dihydrolipoamide dehydrogenase subunit (EC. 1.8.1.4) of the pyruvate dehydrogenase complex and instead mediates electron transport using a ferredoxin.'
So here I will just check this is correctly maintained in the model. The AKGDH is the oxoglutarate dehydrogenase complex, and so is properly present in the model. 

In [75]:
#the oxoglutarate synthase is currently named AKGOR. I will rename it OOR3r to be more BiGG compliant, though it is not such a frequent reaction.
model.reactions.AKGOR.id = 'OOR3r'

In [76]:
#what is this reaction?
model.reactions.R_SUCOAACTr.id = 'SUCOAACTr'

__Lactate__
From experimental data, *P. thermoglucosidasius* NCIB11955 produces L-lactate at >99% optical purity. There is still a D-lactate metabolite in our model, so I wanted to check exactly which reactions it is still involved in and decide whether it should be removed. According to the genome annotation, these reactions should be present. I cannot really find an indication as to whether the counterpart reactions with L-lactate would be possible, so I will leave them for now.

In [77]:
model.metabolites.lac__D_c.summary()

| RXN_STAT | ID | PERCENT | FLUX | REACTION_STRING |
|---|---|---|---|---|
| PRODUCING | ACM6PH | | 0.0 | acmum6p_c + h2o_c <=> acgam6p_c + 2.0 h_c + la... |
| PRODUCING | GLYOX | | 0.0 | gthrd_c + h_c + lac__D_c --> h2o_c + lgt__S_c |
| PRODUCING | LAC__Dt | | 0.0 | lac__D_c --> lac__D_e |


In [78]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

## PPP
Now I will look into the PPP pathways, though most have likely already been fixed through fixing the initial sugar problems above. 

__6-phosphogluconolactonase (EC 3.1.1.31).__

This enzyme is not annotated in the *P. thermoglucosidasius* NCIB11955 genome, and various data indicate that it really is not present. It is generally accepted (from thermodynamic and in vitro data) that the conversion performed by this enzyme can happen spontaneously at sufficient flux at the high temperatures at which this organism grows. Without this reaction, the pentose phosphate pathway cannot run, and previous C13 labelling experiments have shown that the other reactions do carry flux, so this conversion must occur.

So, the model contains a spontaneous version of this reaction. I will modify the reaction ID to match the map, and then also add a clear note in the reaction name to make users aware that it is spontaneous.


In [183]:
model.reactions.PGLHY.id = 'PGL'

In [185]:
model.reactions.PGL.name = 'Spontaneous 6-Phospho-D-glucono-1,5-lactone lactonohydrolase'

Aside from this, it seems the PPP is correctly annotated and functional. So for now I will not modify it any further.

__GAPDH__
Beata's thesis has shown that *P. thermoglucosidasius* NCIB11955 has an NADPH dependent glyceraldehyde-3-phosphate dehydrogenase. The enzyme has some promiscuous activity with NADH, though its catalytic efficiency with NADPH is about 30-fold higher in vitro (Table 3.1). Currently the model contains both versions of the reaction. However, there is no way for the model to take this higher catalytic efficiency into account.

After discussion with Niko Sonnenschein, as the ratio of efficiency is so different, it makes sense that the vast majority of the flux will work with NADPH as co-factor. Therefore, to prevent the model from picking the NADH dependent reaction due to balancing preferences, I will remove it from the model, per his recommendation. 

Of course this is not ideal, but it comes closer to recapitulating the real biology than the way the model is currently set up.

In [84]:
model.remove_reactions([model.reactions.GAPD])

In [87]:
model.reactions.GAPDH_nadp_hi.id = 'GAPDH'

In [90]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

__GAPDH__
Beata's thesis shows that the 3-phosphoglyceraldehyde dehydrogenase (GAPDH) found in *P. thermoglucosidasius* NCIB11955 is NADP(H) dependent, whereas many organisms have an NAD(H) dependent version of the enzyme. In vitro enzyme assays show that the enzyme has a 30-fold higher catalytic efficiency with NADP(H) than with NAD(H) (see Table 3.1 of the thesis). So the enzyme can in theory work with both NAD and NADP.

In the model, there are currently two reactions: one per co-factor used. However, the model cannot distinguish between the efficiencies of the two reactions and so will just choose one of the two, depending on what suits the stoichiometry of the model better. (This is a general drawback of stoichiometric models.) This of course does not reflect physiology very well.

After discussion with Niko, he recommended removing the NADH dependent reaction. As the ratio between the two appears to be quite high, the majority of the flux will probably be carried by the NADP(H) dependent reaction, so removing the NADH reaction brings the model closer to physiology. Of course this is not perfect, but it is currently the best solution for this problem in the model.

In [6]:
model.remove_reactions([model.reactions.GAPD])

In [7]:
model.reactions.GAPDH_nadp_hi.id = 'GAPDH'

In [None]:
#save & commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

In [89]:
solution = model.optimize()
solution.objective_value

0.5171233345023867

In [86]:
escher.Builder('e_coli_core.Core metabolism', reaction_data=solution.fluxes.to_dict())

Downloading Map from https://escher.github.io/1-0-0/6/maps/Escherichia%20coli/e_coli_core.Core%20metabolism.json


Builder(reaction_data={'IDPh': 9.611769293410603, 'PYRACTT': 0.0, 'CAT': 0.0, 'PDHam1hi': 0.0, 'CCP': -0.00773…