# Introduction
In previous work, Martyn identified an excessive number of transport reactions in Matteo's version of the model. He went through these reactions by hand and decided which should be removed and which should be kept. He provided his conclusions in an Excel file, stored in '../databases/Transports allowed to remain_Martyn_Bennett'.
Here I will look into this further and remove the unnecessary transport reactions from the working model.

In [1]:
import cameo
import pandas as pd
import cobra.io
import escher

In [2]:
model = cobra.io.read_sbml_model('../model/g-thermo.xml')

In [3]:
#generate a list of transport reactions (IDs ending in 't')
transport = []
for rct in model.reactions:
    #endswith() tests the suffix explicitly; `rct.id[-1:] in 't'` was a substring check
    if rct.id.endswith('t'):
        transport.append(rct)
len(transport)

152

So the model currently contains 152 transport reactions. I will try removing all the reactions that Martyn recommended removing, and use their effect on the biomass prediction to decide whether each should be removed or not.

Again, we have the problem that the reaction IDs in the file Martyn provided are numerical and not BiGG compliant, which makes the script a bit more complex. The transport reactions don't have KEGG IDs, so we cannot map the IDs that way. Instead, we use the transported metabolite: its KEGG ID gives the new metabolite ID, and hence the new transport reaction ID.
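The mapping described above can be sketched with plain dictionaries standing in for the two models. This is only an illustration: the helper name `old_transport_to_new`, the dictionaries `old_kegg` and `new_id_by_kegg`, and their contents are invented for the example and are not taken from either model.

```python
def old_transport_to_new(old_rct_id, old_kegg, new_id_by_kegg):
    """Map an old transport ID such as 'M_14_e_out' to a new-style ID, or None."""
    # 'M_14_e_out' -> '14_e': drop the 'M_' prefix and the '_out' suffix.
    old_met = old_rct_id[len("M_"):-len("_out")]
    kegg = old_kegg.get(old_met)
    new_met = new_id_by_kegg.get(kegg)
    if new_met is None:
        return None  # no KEGG annotation, or no match in the working model
    # 'glc__D_e' -> 'glc__D' -> 'GLC__Dt': drop compartment, upper-case, add 't'.
    return new_met[:-2].upper() + "t"


# Invented example data (hypothetical pairing of old and new IDs).
old_kegg = {"14_e": "C00031"}
new_id_by_kegg = {"C00031": "glc__D_e"}

print(old_transport_to_new("M_14_e_out", old_kegg, new_id_by_kegg))
print(old_transport_to_new("M_Biomass_e_out", old_kegg, new_id_by_kegg))
```

Returning `None` for an unmapped ID, rather than raising, makes it easy to collect the entries that need manual attention.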

In [4]:
matteo = cobra.io.read_sbml_model('../databases/g-thermo-Matteo.xml')

In [5]:
#first convert the list of transport reactions to keep from martyns file to the model IDs.
transport_keep = ['M_14_e_out','M_29_e_out','M_1754_e_out','M_7_e_out','M_11_e_out','M_71_e_out','M_280_e_out','M_Biomass_e_out','M_154_e_out','M_320_e_out','M_222_e_out','M_31_e_out','M_214_e_out','M_38_e_out','M_204_e_out','M_79_e_out','M_229_e_out','M_21_e_out','M_3200_e_out','M_163_e_out','M_1_e_out']
len(transport_keep)

21

In [6]:
#dataframe of all metabolites and kegg IDs in working model
#make lists for all metabolite names from the working model
model_met_ID = []
model_met_name = []
model_met_kegg = []
for met in model.metabolites:
    model_met_ID.append(met.id)
    model_met_name.append(met.name)
    try: 
        model_met_kegg.append(met.notes['KEGG'])
    except KeyError:
        model_met_kegg.append('--')   

In [7]:
#make into a dataframe
model_met_df = pd.DataFrame({'Model ID' : model_met_ID, 'Model name' : model_met_name, 'Model Kegg':model_met_kegg})
model_met_df[0:5]

| | Model ID | Model name | Model Kegg |
|---|---|---|---|
| 0 | pyridoxal_c | Pyridoxal_C4H5N2O3R2 | C00030 |
| 1 | pydx5p_c | Pyridoxal_phosphate_C4H5N2O3R2 | C00018 |
| 2 | co2dam_c | Cob(II)yrinate-a,c-diamide | C06504 |
| 3 | adhlam_c | S-Acetyldihydrolipoamide-E | C16255 |
| 4 | selmethtrna_c | Selenomethionyl-tRNA(Met) | C05336 |


In [8]:
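#make list of metabolites that should be transported
transported_mets = []
for rct in transport_keep:
    #strip the 'M_' prefix and the '_out' suffix, e.g. 'M_14_e_out' -> '14_e'
    met = rct[2:-4]
    try:
        met_kegg = matteo.metabolites.get_by_id(met).notes['KEGG']
        met_id_model = model_met_df.loc[model_met_df['Model Kegg'] == met_kegg,'Model ID'].values[0]
        #drop the compartment suffix, e.g. '_c' or '_e'
        transported_mets.append(met_id_model[:-2])
    except KeyError:
        #metabolite ID or KEGG annotation not found in Matteo's model
        print(met)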
    

Biomass_e


So all metabolites except Biomass_e could be mapped to the new ID system of the working model. Now I will gather these into a list of transport reactions that should be kept.

In [9]:
transported_mets_rct =[]
for met in transported_mets:
    rct = met.upper() + 't'
    transported_mets_rct.append(rct)
len(transported_mets_rct)

20

In [10]:
#now test what happens when we remove each transport individually, except those Martyn identified:
for rct in transport:
    if rct.id in transported_mets_rct:
        continue
    #the context manager reverts the removal on exit, so each reaction is tested on its own
    with model:
        #remove_reactions expects a list
        model.remove_reactions([rct])
        biomass = model.optimize().objective_value
        #report anything outside the 0.72-0.75 range
        if biomass > 0.75 or biomass < 0.72:
            print('removing', rct, 'gives biomass', biomass)


removing CLt: cl_e --> cl_c gives biomass 0.0
removing ASN__Lt: asn__L_e --> asn__L_c gives biomass 0.7149572143742057
removing PYDX5Pt: pydx5p_c --> pydx5p_e gives biomass 0.7149572143742056
removing QH2t: qh2_e --> qh2_c gives biomass 0.0
removing GTHRDt: gthrd_e --> gthrd_c gives biomass 0.0
removing BIOMASSt: Biomass_c --> Biomass_e gives biomass 0.0
removing THMTPt: thmtp_c --> thmtp_e gives biomass 0.5721288617563675


The above analysis shows that removing any one of these transport reactions individually decreases biomass formation. For some of these reactions one would not expect any effect, so this is something we should look into further later.

I will also check what happens when we remove the reactions cumulatively. (Here I will not remove the reactions that already lower or kill biomass individually.)
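The difference between individual and cumulative removal can be illustrated with a toy example that does not use the model at all. Here `grows` is a hypothetical stand-in for the biomass optimisation and the reaction names are invented: growth needs at least one of two redundant uptake routes, so either can be removed alone, but not both.

```python
def grows(reactions):
    """Toy stand-in for an FBA run: growth needs route At or route Bt."""
    return "At" in reactions or "Bt" in reactions


all_reactions = {"At", "Bt", "Ct"}

# Individual removal: every reaction is tested against the full set.
individual = {r: grows(all_reactions - {r}) for r in sorted(all_reactions)}

# Cumulative removal: each removal stays in place before the next is tested.
remaining = set(all_reactions)
cumulative = {}
for r in sorted(all_reactions):
    remaining.discard(r)
    cumulative[r] = grows(remaining)

print(individual)
print(cumulative)
```

In this toy case every reaction looks dispensable when tested alone, while the cumulative pass loses growth as soon as the second redundant route is removed.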


In [11]:
additional_trans = ['CLt', 'ASN__Lt', 'PYDX5Pt', 'QH2t', 'GTHRDt','BIOMASSt','THMTPt']

In [12]:
#note: this assignment does not copy, so more_transport and transported_mets_rct are the same list
more_transport = transported_mets_rct
more_transport.extend(additional_trans)
len(more_transport)

27

In [15]:
with model:
    for rct in transport:
        if rct.id in more_transport:
            continue
        #removals accumulate inside this single context and are reverted when it exits
        model.remove_reactions([rct])
        biomass = model.optimize().objective_value
        if biomass > 0.75 or biomass < 0.70:
            print('removing', rct, 'gives biomass', biomass)

removing AACOAt: aacoa_c --> aacoa_e gives biomass 0.648639997443846
removing CAPRAt: capra_e --> capra_c gives biomass 0.6486399974438469
removing GTHOXt: gthox_c --> gthox_e gives biomass 0.6486399974438464
removing ACETOLt: acetol_c --> acetol_e gives biomass 0.6486399974438459
removing SELMETHTRNAt: selmethtrna_c --> selmethtrna_e gives biomass 0.6486399974438465
removing MTHPGLUt: mthpglu_e --> mthpglu_c gives biomass 0.6486399974438465
removing M_5FTHFt: 5fthf_c --> 5fthf_e gives biomass 0.6486399974438464
removing FTETHPLGt: ftethplg_c --> ftethplg_e gives biomass 0.6486399974438466
removing HQNt: hqn_c --> hqn_e gives biomass 0.64863999744366
removing CPPPG1t: cpppg1_c --> cpppg1_e gives biomass 0.6486399974438466
removing APPLt: appl_e --> appl_c gives biomass 0.6486399974438466
removing THRPt: thrp_e --> thrp_c gives biomass 0.6486399974438463
removing DLGLUt: glu__DL_c --> dlglu_c gives biomass 0.6486399974438463
removing TCYNTt: tcynt_c --> tcynt_e gives biomass 0.648639997

In [14]:
more_transport.append('NADPt')

Each pass shows the first reaction whose removal kills biomass formation. That reaction can then be added to the list of transports to keep and the pass re-run, until the total list of transports to maintain is found.

The only other transport whose removal totally kills biomass is NADPt. This is weird, as NADP should not normally be supplied to a cell. Keep this in mind for further analysis.
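The pass-by-pass procedure described above can be sketched as a loop. This is a toy version under stated assumptions: `grows` is a hypothetical stand-in for checking that the optimised biomass is above zero, the function name `find_transports_to_keep` and the reaction names are invented, and in this notebook the passes were run by hand rather than automated.

```python
def find_transports_to_keep(candidates, keep, grows):
    """Repeat cumulative removal, keeping each reaction whose removal stops growth."""
    keep = set(keep)
    while True:
        remaining = set(candidates)
        culprit = None
        for rct in sorted(candidates):
            if rct in keep:
                continue
            remaining.discard(rct)
            if not grows(remaining):
                culprit = rct  # first removal in this pass that kills growth
                break
        if culprit is None:
            return keep  # a full pass completed without losing growth
        keep.add(culprit)


def grows(reactions):
    # Toy rule: growth needs Xt, plus at least one of At or Bt.
    return "Xt" in reactions and ("At" in reactions or "Bt" in reactions)


result = find_transports_to_keep({"At", "Bt", "Ct", "Xt"}, {"Ct"}, grows)
print(sorted(result))
```

Note that the outcome depends on the order in which reactions are removed: with two redundant routes, whichever is removed second ends up on the keep list.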

With the total list of transports to maintain, we can remove all the others from the actual model.

In [16]:
additional_trans = ['CLt', 'ASN__Lt', 'PYDX5Pt', 'QH2t', 'GTHRDt','BIOMASSt','THMTPt', 'NADPt']

In [17]:
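#note: again no copy is made; the list already contains these 8 IDs, so they now appear twice
#(harmless for the membership test below, but len() counts the duplicates)
tot_transport = transported_mets_rct
tot_transport.extend(additional_trans)
len(tot_transport)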

36

In [18]:
#permanently remove every transport that is not on the keep list
for rct in transport:
    if rct.id not in tot_transport:
        model.remove_reactions([rct])
model.optimize().objective_value

0.5710737454824805

In [19]:
#save&commit
cobra.io.write_sbml_model(model,'../model/g-thermo.xml')

In [1]:
# TODO: what about other sugars, e.g. xylose? See the figure showing which sugars Geobacillus grows on.