# Introduction
In this notebook, we take a closer look at NADPH metabolism, because we've observed that our model currently doesn't use the oxidative pentose phosphate pathway (PPP) when an FBA is run.

In [1]:
import cameo
import pandas as pd
import cobra.io
import escher
from escher import Builder
from cobra import Metabolite,Reaction

In [2]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [3]:
model_e_coli = cameo.load_model('iML1515')

In [4]:
model_b_sub = cameo.load_model('iYO844')

In [21]:
model.metabolites.nadph_c.summary()

RXN_STAT,ID,PERCENT,FLUX,REACTION_STRING
PRODUCING,GADPH,65.359281,17.815294,g3p_c + nadp_c + pi_c <=> 13dpg_c + nadph_c
PRODUCING,ICDHyr,32.933477,8.976837,icit_c + nadp_c <=> akg_c + co2_c + nadph_c
PRODUCING,MTHFD,1.707242,0.465351,mlthf_c + nadp_c <=> methf_c + nadph_c
CONSUMING,P5CR,58.944041,16.066661,nadp_c + pro__L_c <-- h_c + nadph_c + pyr5c_c
CONSUMING,GLUSy,18.964368,5.169209,akg_c + gln__L_c + h_c + nadph_c --> 2.0 glu__...
CONSUMING,ASAD,2.600234,0.708758,4pasp_c + nadph_c <=> aspsa_c + nadp_c + pi_c
CONSUMING,KARA1,2.289112,0.623954,23dhmb_c + nadp_c <=> alac__S_c + 2.0 h_c + na...
CONSUMING,HSDy,1.734226,0.472706,aspsa_c + h_c + nadph_c <=> hom__L_c + nadp_c
CONSUMING,SULR,1.572852,0.42872,3.0 h_c + 3.0 nadph_c + so3_c --> 3.0 h2o_c + ...
CONSUMING,ACCR2,1.403486,0.382555,acc2_c + nadph_c --> hacc2_c + nadp_c


The majority of NADPH is made through the GAPDH enzyme. Previously, the model had two reactions carrying the EC 1.2.1.59 annotation: one with NADH and one with NADPH. Considering data from Beata's thesis, we decided to remove the NADH-dependent reaction to prevent redundancy and force the correct cofactor to be used. At the time, we made the reaction reversible, as this step is known to be important for both glycolysis and gluconeogenesis. However, looking into this further, it seems an annotation was missing in our model. EC 1.2.1.12 is also present in the strain: this is a G3P dehydrogenase that uses NADH.

Again, this would re-introduce the question of redundancy and the possibility of cycles in the model. But looking further into the literature (e.g. Fillinger et al. 2000, doi: 10.1074/jbc.275.19.14031), we see that in B. subtilis the NADH-dependent enzyme is critical for glycolysis, whereas the NADPH-dependent enzyme is restricted to gluconeogenesis. As we have no further data or indications that this works differently in our strain, it is best to assume a physiology similar to B. subtilis. Therefore, here I will add the EC 1.2.1.12 reaction and fix the bounds of the NADPH-dependent reaction too.

After that, we will re-evaluate where the model's NADPH comes from and whether there are more issues that went unidentified.
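To make the redundancy concern concrete, here is a minimal sketch in plain Python (no cobra needed) that adds up the stoichiometry of the two dehydrogenase reactions when they run in opposite directions. The dictionaries are transcribed from the reaction strings in this notebook; the names `GAPD_NADH`, `GAPD_NADPH` and `net_stoichiometry` are hypothetical and only used for illustration.

```python
from collections import defaultdict

# Stoichiometries as written in the model (negative = consumed, positive = produced).
GAPD_NADH = {"g3p_c": -1, "nad_c": -1, "pi_c": -1, "13dpg_c": 1, "nadh_c": 1}
GAPD_NADPH = {"g3p_c": -1, "nadp_c": -1, "pi_c": -1, "13dpg_c": 1, "nadph_c": 1}


def net_stoichiometry(*weighted_reactions):
    """Sum (flux, stoichiometry) pairs and drop metabolites that cancel out."""
    total = defaultdict(float)
    for flux, stoich in weighted_reactions:
        for met, coeff in stoich.items():
            total[met] += flux * coeff
    return {met: value for met, value in total.items() if value != 0}


# NADH-dependent reaction running backward, NADPH-dependent reaction running forward.
loop = net_stoichiometry((-1, GAPD_NADH), (1, GAPD_NADPH))
print(loop)
```

All carbon metabolites and phosphate cancel, and what remains is a net conversion of NADH and NADP into NAD and NADPH. If the NADPH-dependent reaction is limited to the reverse direction (upper bound 0), the second term cannot have a positive flux, so this particular NADPH-generating loop is excluded.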

In [22]:
# add the EC 1.2.1.12 reaction (add_reactions expects a list)
model.add_reactions([Reaction(id='GAPD', name='glyceraldehyde-3-phosphate dehydrogenase')])

In [23]:
# copy the annotation so later edits do not also change the E. coli model
model.reactions.GAPD.annotation = dict(model_e_coli.reactions.GAPD.annotation)

In [24]:
model.reactions.GAPD.annotation['ec-code'] = '1.2.1.12'

In [25]:
model.reactions.GAPD.add_metabolites({
    model.metabolites.g3p_c:-1,
    model.metabolites.nad_c:-1,
    model.metabolites.pi_c:-1,
    model.metabolites.get_by_id('13dpg_c'):1,
    model.metabolites.nadh_c:1
})

In [26]:
# fix the typo in the reaction ID (GADPH -> GAPDH)
model.reactions.GADPH.id = 'GAPDH'

In [27]:
#fix bounds of the NADPH reaction to only allow gluconeogenesis
model.reactions.GAPDH.bounds = (-1000,0)

In [28]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

Now that we've changed this, we should check the new flux distribution in the map we've made. But first I will modify the map to include both reactions at the G3P node.

In [4]:
solution = model.optimize()

In [5]:
Builder(model = model, map_json = '../map/g-thermo-map.json', reaction_data = solution.fluxes.to_dict())

Builder(reaction_data={'IDPh': 1.7578569544053102, 'CAT': 0.018911036200955408, 'PDHam1hi': 0.0, 'HYDA': 0.0, …

After the changes, we see that there is still no flux through the PPP. Most NADPH is now made through the ICDHyr reaction. The same is seen in the B. subtilis model, which could mean that something is wrong in both, so it makes sense to investigate the isocitrate dehydrogenase step further. It turns out that this step is NADH-associated only in eukaryotes; in prokaryotes the ICDHyr reaction uses NADPH. So I have no reason to believe that it would use NADH in our case.

Ben then recommended trying to increase the NADPH demand in our model. To test this, one can artificially increase the NADPH needed to make biomass. At some point we would expect the PPP to be turned on, as this is the most efficient NADPH-producing pathway. However, doing so didn't change the flux through the PPP at all. That is quite strange, so the problem must lie elsewhere.
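The demand test described above could be sketched roughly as follows. This is a hedged outline, not the exact code that was run: the function name `scan_nadph_demand` is made up, and the biomass and PPP reaction IDs are parameters because they are not named in this notebook. It relies on cobra's model context manager, which reverts changes made inside the `with` block.

```python
def scan_nadph_demand(model, biomass_id, watch_id, extra_levels):
    """Return {extra NADPH demand: flux through `watch_id`} from repeated FBA runs."""
    nadph = model.metabolites.get_by_id("nadph_c")
    nadp = model.metabolites.get_by_id("nadp_c")
    fluxes = {}
    for extra in extra_levels:
        # Changes made inside the `with` block are reverted when it exits,
        # so the model itself is left untouched.
        with model:
            biomass = model.reactions.get_by_id(biomass_id)
            # Consume `extra` additional NADPH (and release NADP) per unit biomass.
            biomass.add_metabolites({nadph: -extra, nadp: extra})
            solution = model.optimize()
            fluxes[extra] = solution.fluxes[watch_id]
    return fluxes
```

A call would look like `scan_nadph_demand(model, 'biomass', 'G6PDH2r', [1, 5, 10])`, where both IDs are placeholders for whatever the biomass reaction and the first oxidative PPP step are called in the model.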

I've also re-checked the complete PPP: all reactions and reversibilities are correct, so this cannot be the cause of the problem.
Instead, I spotted another reaction that I thought might have the wrong cofactor associated with it: HBCOAOR. This reaction converts 3-hydroxybutanoyl-CoA into acetoacetyl-CoA and generates some NADPH in the process. It does carry some flux in the optimization. Just to check, I turned off the reaction by setting its bounds to (0,0); in the resulting FBA, the PPP now does carry flux. Now I need to figure out why blocking this reaction turns the PPP back on, and whether it really should be turned off or changed in our model.

In [6]:
solution['HBCOAOR']

0.10155164141945267

In [7]:
model.reactions.HBCOAOR

0,1
Reaction identifier,HBCOAOR
Name,(S)-3-Hydroxybutanoyl-CoA:NADP+ oxidoreductase
Memory address,0x014639cc5048
Stoichiometry,hbcoa_c + nadp_c --> aacoa_c + 4.0 h_c + nadph_c  3-Hydroxybutanoyl-CoA + NADP+ --> Acetoacetyl-CoA + 4.0 H+ + NADPH
GPR,RTMO04827 or RTMO02612 or RTMO04473 or RTMO01626
Lower bound,0.0
Upper bound,1000.0


In [80]:
# model.reactions.HBCOAOR.bounds = (0,0)
# solution = model.optimize()
# Builder(model = model, map_json = '../map/g-thermo-map.json', reaction_data = solution.fluxes.to_dict())
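The knockout check in the commented-out cell above can also be done without permanently changing the model. The helper below is a small sketch under that assumption; `flux_after_knockout` is a hypothetical name, and it uses the model context manager together with `Reaction.knock_out()` so that the bounds are restored afterwards.

```python
def flux_after_knockout(model, knockout_id, watch_ids):
    """Run FBA with one reaction blocked and return the fluxes of `watch_ids`."""
    with model:  # the knockout is undone when the block exits
        model.reactions.get_by_id(knockout_id).knock_out()
        solution = model.optimize()
        return {rid: solution.fluxes[rid] for rid in watch_ids}
```

For example, `flux_after_knockout(model, 'HBCOAOR', ['G6PDH2r', 'GND'])` would report the oxidative PPP fluxes with HBCOAOR blocked, assuming those BiGG-style IDs are the ones used for the PPP steps in the model.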

Looking into it a bit, I see the issue! There is a cycle through these metabolites that can convert NADH into NADPH. Of course, this is not expected to happen in metabolism. So I will check each reaction separately and make sure the reversibility is fixed to prevent this cycling.

Overall, the enzyme for the HBCOAOR reaction (EC 1.1.1.157) is not well studied. But in the organisms where it has been investigated, acetoacetyl-CoA is the substrate and NADPH is consumed. Also, based on the thermodynamics of the reaction, it is expected to be favored in the NADPH-consuming direction. As this is all the information we have, it makes sense to fix the reaction in the direction of NADPH consumption. This would resolve the cycle that is forming.

In [127]:
model.reactions.HBCOAOR.bounds = (-1000,0)

In [133]:
solution = model.optimize()

In [134]:
solution.objective_value

0.7997303205948462

In [135]:
Builder(model = model, map_json = '../map/g-thermo-map.json', reaction_data = solution.fluxes.to_dict())

Builder(reaction_data={'IDPh': 1.7576614951303577, 'CAT': 0.018908933448842837, 'PDHam1hi': 0.0, 'HYDA': 0.0, …

In [12]:
#save & commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

Now we've fixed the issue that the oxidative PPP is turned off! However, the flux through it is still very low. This makes me think there may be more cycles like the one I spotted above that can convert NADH into NADPH. To check this, I will look at all the stoichiometrically balanced cycles that memote predicts. I will make a small script that checks whether NADPH is involved in those reactions and prints the reactions where it is. This should help narrow down the reactions where a cycle that generates NADPH may be forming. I can then tackle them all separately.

In [17]:
#list of stoichiometrically balanced cycles as predicted by memote
cycles = ["IDPh","ATPM","NADDP","GLUSy","ADK1","NMNAT","NDPK2","FADDP","AFAT","NTD7","ADNK1","ADPT","PYK","ME1","OAADC","SERD_L","ACALD","PTAr","ACS","GLU5K","GDH","PUTA3","GLNS","ACGS","GLUR","GALUi","XYLI2","ACKr","ACYP_2","NDPK1","NTP3","PPCK","MDH","PC","ALAD_L","ALAR","UAG2EMA","UAG2E","PYK3","GLYCLTDy","ASNS2","ASNN","GALT","NTD4","CYTDK1","UTCY","CYTDK2","NDPK3","PYK4","PSP_L","SERHL","TDP","TMPK","ENO","PYK2","ORNTA","ACODA","PHEOR","PHETA1","P5CD","ALDD2x","GLYCLTDx","NDPK9","PYK5","G6PDA","F6PCPT","F6PISO","F6PUPT","F6PIPT","PGI","HEX7","PTA2","PAMPT","MMSDH","FTHFLi","FTHFD","GLGC","ITCY","NTD2","URIK1","UTUP","URIK2","URIK3","PRPPS","PPM","UGLT","LLEUDr","LEUTAi","GALKr","ALATA_D","BNOR","BTCOAOR","VALTA","NTD9","GUAPRT","PPAPPT","PRPNTEAT","VALDHr","PGCD","PGM","PUNP1","TMDK1","NTD5","XYLK","NTD3","ADADir","PFK_3","SBP","ATDGDm","AGPOP","PPCOAC","DGNSK","NTD8","HACD1","NTD6","DADNK","DURIK1","NTD1","XPPT","PUNP3","ILEOR","ILETA","ORNTAC","ACOTA","PUNP7","RBK_L1","MANPT","ACGK","ACMANAHY","NTD10","HEX1","MME","DNADDP","NNATr","G5SD","G5SADs","MMTSAOR","MMALLIG","AGPR","ALCD4","ALCD4y","ALDH","PSERT","SALCNPT","PFK","FBP","SALCN6PH","HBCOAOR","HBCOAH","FMNRx","FLDO","NODOx","NODOy","MANNHY","M_6DGHY","STACHY","STACHY2","H2MB4POR","DMPPS","PGMT","2AMACHYD","EAR40x","EAR60y","EAR80x","EAR100x","EAR120x","EAR140x","EAR160x","BTNLIG","BTN5AMPL","ECOAH1","HCO3E","ATPS4r","Kt","Kt2","RU5PHY","XU5PHY","M6PHY","VALt","VALtabc","LEUtabc","LEUt","ILEt","ILEtabc","ILEt2","THRt","THRt2","CYStabc","CYSt","METabc","METt2","ARGtabc","ARGt","ARGt2","HISt","HIStabc","ASPtabc","ASPt","ASPt2","GLUtabc","GLUt","GLNtabc","GLNt","PROt","PROtabc","PROPIRE","EAR60x","EAR40y","EAR80y","EAR100y","EAR120y","EAR140y","EAR160y","GLYBtabc","GLYBt","SALGT"]
#see report '2020-06-08'

In [19]:
# print every reaction that is part of a predicted cycle and involves NADPH
for rct in model.reactions:
    if rct.id in cycles and model.metabolites.nadph_c in rct.metabolites:
        print(rct.id)

GLUSy
GLYCLTDy
BTCOAOR
G5SD
AGPR
ALCD4y
HBCOAOR
FLDO
NODOy
H2MB4POR
EAR60y
EAR40y
EAR80y
EAR100y
EAR120y
EAR140y
EAR160y


So the reactions above are involved in a stoichiometrically balanced cycle and contain NADPH. I will go through each to make sure they don't cycle unnecessarily and create NADPH incorrectly.
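For this manual check, it can help to put the bounds and reaction strings of the flagged reactions side by side. A minimal sketch, where `describe_reactions` is a hypothetical helper that only reads standard cobra `Reaction` attributes (`id`, `lower_bound`, `upper_bound`, `reaction`):

```python
def describe_reactions(model, reaction_ids):
    """Return (id, lower bound, upper bound, reaction string) rows for inspection."""
    rows = []
    for rid in reaction_ids:
        rct = model.reactions.get_by_id(rid)
        rows.append((rct.id, rct.lower_bound, rct.upper_bound, rct.reaction))
    return rows
```

The rows can be passed to `pd.DataFrame(rows, columns=['id', 'lb', 'ub', 'reaction'])` for a compact overview; the column names here are arbitrary.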

After doing so, I saw that no other cycle that can make NADPH is running. Some of the cycles identified here are not actually cycles, but arise because some reactions are written in the reverse order, for example EAR60y and EAR60x. They are read by memote as a cycle even though the bounds show that they are not. So we can now assume this issue has been fixed.

In [64]:
model.reactions.EAR60x

0,1
Reaction identifier,EAR60x
Name,hexanoyl-[acp]:NAD+ trans-2-oxidoreductase
Memory address,0x014639f98248
Stoichiometry,hexACP_c + nad_c <-- h_c + nadh_c + thex2eACP_c  Hexanoyl-ACP (n-C6:0ACP) + NAD+ <-- H+ + NADH + Trans-Hex-2-enoyl-[acyl-carrier protein]
GPR,
Lower bound,-1000.0
Upper bound,0.0


In [62]:
model.reactions.EAR60y

0,1
Reaction identifier,EAR60y
Name,hexanoyl-[acp]:NAD+ trans-2-oxidoreductase
Memory address,0x014639dc3fc8
Stoichiometry,h_c + nadph_c + thex2eACP_c --> hexACP_c + nadp_c  H+ + NADPH + Trans-Hex-2-enoyl-[acyl-carrier protein] --> Hexanoyl-ACP (n-C6:0ACP) + NADP+
GPR,
Lower bound,0.0
Upper bound,1000.0


So with this, we will assume this issue is now solved, even though the flux through the oxidative PPP is still quite small. In Tang et al., 30% of the carbon fed goes towards the PPP, though the fluxes don't really add up in the way they are shown in the paper...
It seems that this is because the non-oxidative part still carries flux in the direction towards rup_c instead of away from it. This still warrants some extra investigation.

## Non-oxidative PPP
The reason this is going in the reverse direction is the currently minimal NADPH demand in metabolism. When we increase the NADPH consumed in the biomass reaction (and the NADP formed), flux through the PPP increases and runs in the direction one would expect. So we need to investigate further where NADPH demand in anabolism is too low.

It seems that alanine is made via a reaction one wouldn't expect: SELCYSLY. The seln that is made is reconverted into selcys by reacting with acser. These reactions are normally used for the degradation of selenocysteine compounds, not for alanine production.

I've spotted the problem: the ACSERL reaction is incorrect. It should reflect the EC 2.5.1.47 reaction (R03601), cysteine synthase. Looking into it, I see that KEGG and BRENDA list different reactions for this EC number. According to BRENDA, and also the name of the enzyme, it should convert O-acetyl-L-serine + hydrogen sulfide = L-cysteine + acetate. In our model, we have the KEGG version of the reaction, where hydrogen selenide is incorporated and selenocysteine is made instead of cysteine.
We already have the correct EC 2.5.1.47 reaction in our model as CYSS. After looking into it, the ACSERL reaction seems to have been added incorrectly. So I will remove it and check whether alanine is then made as it should be.


In [155]:
# remove_reactions expects a list of reactions
model.remove_reactions([model.reactions.ACSERL])



Hm, fixing this reaction shut off the oxidative PPP again... It removes some of the NADPH requirement, and so the PPP is shut off. So again, we mostly need to revisit the NADPH demand in the model to be able to fix this issue.

__Methionine biosynthesis__
I noticed that methionine is currently synthesized through the degradation of SAM, instead of via cysteine as one would expect, so I looked into this a bit. It turns out that we have the SHSL1, SHSL2 and SHSL4h reactions in the model. All three correspond to the annotation EC 2.5.1.48, though they all have different KEGG IDs.
SHSL2 has the additional annotation EC 2.5.1.-. Looking into these reactions more, SHSL1 is generally the main reaction of this enzyme.

I then checked whether the other two reactions are side reactions of the enzyme, or whether there is any other indication as to why they should be present in the model.
About the enzyme, the IUBMB states: 'A pyridoxal-phosphate protein. Also reacts with hydrogen sulfide and methanethiol as replacing agents, producing homocysteine and methionine, respectively. In the absence of thiol, can also catalyse beta,gamma-elimination to form 2-oxobutanoate, succinate and ammonia.'
This means that SHSL4h would only be possible in the absence of a thiol, i.e. when SHSL1 and SHSL2 cannot run. But currently we have flux through both SHSL2 and SHSL4h, which should not be possible. Considering that cysteine will always be present in the cell, the chance that SHSL4h carries flux is almost zero. So to mimic this and prevent it from carrying flux, I will fix the bounds of the reaction to (0,0). I will not remove it, so that the reaction remains an option in case someone needs to investigate it.

Kanzaki et al. 1985 (https://academic.oup.com/femsle/article/33/1/65/522488 ) studied various cystathionine γ-synthase enzymes from different bacteria and investigated their substrate preference. They saw that Gram-positive, aerobic bacteria from the Bacillus genus generally use O-acetyl-L-homoserine as substrate to form cystathionine (the B. subtilis enzyme even does so exclusively). E. coli, on the other hand, is known (and shown) to predominantly use O-succinyl-L-homoserine as substrate.

Auger et al. 2002 (https://www.microbiologyresearch.org/docserver/fulltext/micro/148/2/1480507a.pdf?expires=1591795358&id=id&accname=guest&checksum=2A5A2C8AF1ABE005358418AC1F5C373F ) investigated methionine biosynthesis in B. subtilis further and saw that the MetI protein has both cystathionine γ-synthase and O-acetylhomoserine thiolyase activities. The cystathionine γ-synthase activity proposed here requires O-acetylhomoserine as substrate, in agreement with the Kanzaki et al. 1985 paper.

One issue is that our model currently has no reaction producing O-acetylhomoserine. This is done by a homoserine O-acetyltransferase enzyme (EC 2.3.1.31). It is annotated in our genome, just not present in the model. So I will add it here.

So overall:
- block SHSL4h by setting its bounds to (0,0)
- remove the SHSL2 and SHSL1 reactions
- the cystathionine γ-synthase activity with acetylhomoserine is captured in reaction AHSL
- the acetylhomoserine thiolyase activity is captured in reaction AHHST
- add the homoserine transacetylase reaction (EC 2.3.1.31) to produce acetylhomoserine

In [4]:
model.reactions.SHSL4h.bounds = (0,0)

In [5]:
# remove_reactions expects a list of reactions
model.remove_reactions([model.reactions.SHSL1])



In [6]:
model.remove_reactions([model.reactions.SHSL2])

In [7]:
#should be irreversible based on thermodynamics
model.reactions.AHHST.bounds = (0,1000)

In [8]:
#should be irreversible based on thermodynamics
model.reactions.AHSL.bounds = (0,1000)

In [9]:
# add homoserine transacetylase (add_reactions expects a list)
model.add_reactions([Reaction(id='HSERTA', name='homoserine O-acetyltransferase')])

In [10]:
model.reactions.HSERTA.annotation['sbo'] = 'SBO:0000176'
model.reactions.HSERTA.annotation['kegg.reaction'] = 'R01776'
model.reactions.HSERTA.annotation['ec-code'] = '2.3.1.31'
model.reactions.HSERTA.annotation['rhea'] = '13704'

In [11]:
# add_members expects a list of members
model.groups.get_by_id('00270 - Cysteine and methionine metabolism').add_members([model.reactions.HSERTA])



In [12]:
model.reactions.HSERTA.add_metabolites({
    model.metabolites.accoa_c:-1,
    model.metabolites.hom__L_c:-1,
    model.metabolites.coa_c:1,
    model.metabolites.achms_c:1
})
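Before saving, it is worth confirming that the new HSERTA reaction is elementally balanced. In cobra, `model.reactions.HSERTA.check_mass_balance()` returns an empty dict for a balanced reaction. The stand-alone sketch below does the same bookkeeping in plain Python. The formulas are assumptions (the charged forms commonly listed for these BiGG-style IDs) and should be verified against the formulas stored in the model; `element_counts` and `element_imbalance` are hypothetical helper names.

```python
import re
from collections import Counter, defaultdict


def element_counts(formula):
    """Parse a simple formula such as 'C4H9NO3' into {element: count}."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def element_imbalance(stoichiometry, formulas):
    """Return {element: net count} for elements that do not cancel; {} means balanced."""
    total = defaultdict(int)
    for met, coeff in stoichiometry.items():
        for element, count in element_counts(formulas[met]).items():
            total[element] += coeff * count
    return {element: n for element, n in total.items() if n != 0}


# Stoichiometry as added in the cell above.
HSERTA = {"accoa_c": -1, "hom__L_c": -1, "coa_c": 1, "achms_c": 1}
# Assumed formulas; verify against the metabolite formulas in the model.
FORMULAS = {
    "accoa_c": "C23H34N7O17P3S",
    "hom__L_c": "C4H9NO3",
    "coa_c": "C21H32N7O16P3S",
    "achms_c": "C6H11NO4",
}
imbalance = element_imbalance(HSERTA, FORMULAS)
print(imbalance)
```

With these formulas every element cancels, so the transfer of the acetyl group from acetyl-CoA to homoserine is balanced as written. Charge balance is not covered by this sketch.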

In [13]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

Finally I need to modify the map of AA metabolism so that it is correctly shown.

In [78]:
Builder(model = model, map_json = '../map/amino-acid-biosynthesis.json')

Builder()