# Refinement notebook

This Notebook aims to refine the *L. starkeyi* (lst) GSM using biolog experimental data - which determines metabolic activity on various carbon, nitrogen, phosphorus, and sulfur substrates. More information on the biolog phenotypic assays can be found at the company website (here is the link to their [informative brochure](https://www.biolog.com/wp-content/uploads/2020/05/00A-037rA-PM-Microbiology-2011.pdf)). Essentially, there is a dye in each microarray well and if there is metabolic activity, the dye is reduced by NADH which leads to a measurable change in signal. This signal is measured at one wavelength (750 nm) while biomass optical density is measured at a second wavelenght (590 nm). For the purposes of validating the GSM, The biolog and GSM prediction data are classified as "Growth" or "No-growth". Instances where the GSM predictions were incorrect are subjectived to manual refinement and gap filling procedures. 

The code has been adapted from [the *R. toruloides* process](https://www.frontiersin.org/articles/10.3389/fbioe.2020.612832/full) detailing a similar process for *R. toruloides*. 


import needed libraries

In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
from matplotlib import colors
import numpy as np
import pandas as pd
import cobra
import seaborn as sns
from sklearn.metrics import confusion_matrix, matthews_corrcoef
import core
from cobra.flux_analysis import gapfill

Load the *Lipomyces starkeyi* (Lst) GSM created in the **LST_draft_model_from_Rt_IFO0880_** notebook.

In [2]:
# load cobra LST model. 
# the v0.1 model has only been gap filled to enable growth. It includes reactions that were annotated in the sceModel but not the Lst model 
model = cobra.io.load_json_model("../models/lst_v0_3b_forPub.json")

Set parameter TokenServer to value "leghorn.emsl.pnl.gov"


load models to use as scaffolds. 

In [3]:

rto = cobra.io.load_json_model('../models/Rt_IFO0880.json')
sce = cobra.io.load_json_model('../models/iMM904.json')
ylip = cobra.io.load_matlab_model('../models/twoModels/iYLI647_corr.mat')
eco  = cobra.io.load_json_model('../models/iJO1366.json')
yeast8 = cobra.io.load_matlab_model('../models/yeast8_modifiedwBIGGnames.mat')


This model seems to have confidenceScores instead of rxnConfidenceScores field. Will use confidenceScores for what rxnConfidenceScores represents.
This model seems to have metCharge instead of metCharges field. Will use metCharge for what metCharges represents.
No defined compartments in model model. Compartments will be deduced heuristically using regular expressions.
Using regular expression found the following compartments:c, e, g, m, n, r, v, x


In [4]:
# adjust iYLI647 parameters for consistent annotation. 
for m in ylip.metabolites:
    if ('[' in m.id):
        ylip.metabolites.get_by_id(m.id).id = m.id.replace('[','_').replace(']','')
    if ('_L_' in m.id):
        ylip.metabolites.get_by_id(m.id).id = m.id.replace('_L_','__L_')
    if ('_D_' in m.id):
        ylip.metabolites.get_by_id(m.id).id = m.id.replace('_D_','__D_')
for r in ylip.reactions:
    if ('(e)' in r.id):
        ylip.reactions.get_by_id(r.id).id = r.id.replace('(e)','_e')
    if(('_L_') in r.id):
        ylip.reactions.get_by_id(r.id).id = r.id.replace('_L_','__L_')

    if(('_D_') in r.id):
        ylip.reactions.get_by_id(r.id).id = r.id.replace('_D_','__D_')


# Biolog data set up. 

load biolog data.  

Biolog plate assays were conducted with triplicate plates and dynamic measuring of 590 and 750 nm wavelengths. To determine growth significance, a threshold was set above the values recorded for the negative control. If the wavelength surpassed the highest negative control value for multiple timepoints, it was considered a true growth condition. The 590 and 750 wavelengths gave consistent results in terms of growth. 


The biolog phenotypic plates set ups can be viewed [here](https://www.biolog.com/wp-content/uploads/2020/04/00A-042-Rev-C-Phenotype-MicroArrays-1-10-Plate-Maps.pdf).

Plates PM1-4 were used.

## Biolog Assay - Data loading and prep.

The biolog data was collected from **Insert plate reader name here**. Two wavelengths, 590 & 750 nm were collected which represented biomass growth and metabolic activity, respectively. 


In [5]:
# well names for each plate. 
well_rows = 'ABCDEFGH'
well_columns = ['1','2','3','4','5','6','7','8','9','10','11','12']
well_names = [f'{row}{column}' for row in well_rows  for column in well_columns ]

In [6]:
# read in the data. 
Biolog = pd.read_csv('../data/round2_2023_dataForSim.csv',
#                     skiprows=10,index_col=0)
                        usecols=list(range(0,7)))

# # create the index out of the plate (PM1-4) and the well number.
Biolog.index = Biolog['PlateType']+'_'+Biolog['Well']
Biolog.head()

Unnamed: 0,PlateType,Experiment,Well,Row,Column,Compound,SigGrowth?
PM1_A1,PM1,Carbon,A1,1,1,Negative Control,False
PM1_A2,PM1,Carbon,A2,1,2,L-Arabinose,False
PM1_A3,PM1,Carbon,A3,1,3,N-Acetyl-DGlucosamine,False
PM1_A4,PM1,Carbon,A4,1,4,D-Saccharic Acid,False
PM1_A5,PM1,Carbon,A5,1,5,Succinic Acid,True


In [7]:
Biolog[Biolog['Compound']=='Negative Control']

Unnamed: 0,PlateType,Experiment,Well,Row,Column,Compound,SigGrowth?
PM1_A1,PM1,Carbon,A1,1,1,Negative Control,False
PM2_A1,PM2,Carbon,A1,1,1,Negative Control,False
PM3B_A1,PM3B,Nitrogen,A1,1,1,Negative Control,False
PM4A_A1,PM4A,Phosphorus,A1,1,1,Negative Control,False
PM4A_F1,PM4A,Sulfur,F1,6,1,Negative Control,False


#### correcting known growth compounds. 

In [8]:
Biolog[Biolog['Compound']=='Glycerol  ']

Unnamed: 0,PlateType,Experiment,Well,Row,Column,Compound,SigGrowth?
PM1_B3,PM1,Carbon,B3,2,3,Glycerol,False


In [9]:
Biolog.loc['PM1_B3','SigGrowth?']=True

In [10]:
Biolog.loc['PM1_B3']['SigGrowth?']

True

In [11]:
Biolog[Biolog['Compound']=='Inulin']

Unnamed: 0,PlateType,Experiment,Well,Row,Column,Compound,SigGrowth?
PM2_A9,PM2,Carbon,A9,1,9,Inulin,False


In [12]:
Biolog.loc['PM2_A9','SigGrowth?']=True

In [13]:
Biolog[Biolog['Compound']=='Citric Acid']

Unnamed: 0,PlateType,Experiment,Well,Row,Column,Compound,SigGrowth?
PM1_F2,PM1,Carbon,F2,6,2,Citric Acid,False


In [14]:
Biolog.loc['PM1_F2','SigGrowth?']=True

# Correlate Biolog chemicals being assessed with their corresponding metabolites in the lst GSM. 

In order to test the GSM predictions - we need to take the biolog compound (i.e., Succinate) and test the model ability to grow on it as a carbon source. To do so, we need to know the uptake reaction ID in the GSM. Because the lst model was built using rto as a scaffold, we have the same reaction names and conventions. Thus, we can load that from the information which was provided in the supporting information of [Joonhoon's paper](https://www.frontiersin.org/articles/10.3389/fbioe.2020.612832/full). 

#### Load in the metadata about each biolog experiment, including the model metabolite names needed for growth. 


In [15]:
# read in the media names. 
Biolog_Media = pd.read_csv('../data/biolog_medium_csv.csv',
                           usecols=list(range(0,6)),index_col=0)
Biolog_Media = Biolog_Media.loc[Biolog_Media[['Carbon','Nitrogen','Phosphorus','Sulfur']].dropna(axis=0, how='all').index]
Biolog_Media.head()

Unnamed: 0_level_0,Source,Carbon,Nitrogen,Phosphorus,Sulfur
biolog,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
PM1_A2,Carbon,EX_arab__L_e,EX_nh4_e,EX_pi_e,EX_so4_e
PM1_A3,Carbon,EX_acgam_e,EX_nh4_e,EX_pi_e,EX_so4_e
PM1_A4,Carbon,EX_glcr_e,EX_nh4_e,EX_pi_e,EX_so4_e
PM1_A5,Carbon,EX_succ_e,EX_nh4_e,EX_pi_e,EX_so4_e
PM1_A6,Carbon,EX_gal_e,EX_nh4_e,EX_pi_e,EX_so4_e


We now generate a mapping between metabolites and carbon source.

In [16]:
print('Biolog assayed {} metabolites'.format(len(Biolog)))
print('GSM has {} metabolites that are a part of the model'.format(len(Biolog_Media)))
print('A total coverage of {:.1f}%'.format(len(Biolog_Media)/len(Biolog)*100))

Biolog assayed 384 metabolites
GSM has 285 metabolites that are a part of the model
A total coverage of 74.2%


### Preparing for model predictions. 

Create a dataframe to store our information on predictions. 

In [17]:
# create a dataframe to hold our information. 
Biolog_in_model = pd.DataFrame(columns=['Biolog','Model','Exchange','Metabolite',
                                        'Internal','External','SigGrowth?'])

# across the biolog data. 
for i, row in Biolog.iterrows():
    
    # grab the corresponding metabolic reaction. 
    if i in Biolog_Media.index:   
        
        # obtain the biolog medium.
        x = Biolog_Media.loc[i]
        
        # obtain the exchange reaction corresponding to the medium.
        Biolog_in_model.loc[i,'Exchange'] = x[x['Source']]
        
        # grab the metabolite name (i.e., get rid of the EX_ part of reaction).
        Biolog_in_model.loc[i,'Metabolite'] = x[x['Source']].replace('EX_','')
        
        # perform further splitting if there are more than one source in the biolog assay.
        if ',' in Biolog_in_model.loc[i,'Metabolite']:
                  
            # detrmine external metabolite in the model corresponding to the reaction name, look to see if there is a internal metabolite by replacing the '_e' tail with '_c'.
            Biolog_in_model.at[i,'Model'] = [model.metabolites.get_by_id(x).name if x in model.metabolites \
                else next((model.metabolites.get_by_id(x.rsplit('_',1)[0]+'_'+c).name for c in model.compartments \
                    if x.rsplit('_',1)[0]+'_'+c in model.metabolites), None) \
                for x in Biolog_in_model.loc[i,'Metabolite'].split(',')]
            
            # store the internal metabolites (with the '_c' tail). 
            Biolog_in_model.loc[i,'Internal'] = all(any(x.rsplit('_',1)[0]+'_'+c in model.metabolites
                                                        for c in model.compartments) \
                                                     for x in Biolog_in_model.loc[i,'Metabolite'].split(','))
            
            # store the external metaoblites (with the '_e' tail).
            Biolog_in_model.loc[i,'External'] = all(x in model.metabolites \
                                                    for x in Biolog_in_model.loc[i,'Metabolite'].split(','))
        
        # if there is only one source in the biolog assay. 
        else:
            # detrmine external metabolite in the model corresponding to the reaction name, look to see if there is a internal metabolite by replacing the '_e' tail with '_c'.
            Biolog_in_model.loc[i,'Model'] = model.metabolites.get_by_id(Biolog_in_model.loc[i,'Metabolite']).name \
                if Biolog_in_model.loc[i,'Metabolite'] in model.metabolites \
                else next((model.metabolites.get_by_id(Biolog_in_model.loc[i,'Metabolite'].rsplit('_',1)[0]+'_'+c).name \
                    for c in model.compartments 
                          if Biolog_in_model.loc[i,'Metabolite'].rsplit('_',1)[0]+c in model.metabolites), None)
            
            # store the internal metabolites (with the '_c' tail). 
            Biolog_in_model.loc[i,'Internal'] = any(Biolog_in_model.loc[i,'Metabolite'].rsplit('_',1)[0]+'_'+c
                                                    in model.metabolites for c in model.compartments)
            
            # store the external metaoblites (with the '_e' tail).
            Biolog_in_model.loc[i,'External'] = Biolog_in_model.loc[i,'Metabolite'] in model.metabolites
    
    # case where there is no uptake/metabolite reactions corresponding to the sources tested in biolog (i.e, Tween80). 
    else:
        Biolog_in_model.loc[i] = None
    
    # store the growth data from the biolog assay in the storage dataframe. 
    Biolog_in_model.loc[i,'Biolog'] = row['Compound']
#     Biolog_in_model.loc[i,'Average'] = row['Average']
    Biolog_in_model.loc[i,'SigGrowth?'] = row['SigGrowth?']

  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog_in_model.loc[i] = None
  Biolog

In [18]:
Biolog_in_model.head(50)

Unnamed: 0,Biolog,Model,Exchange,Metabolite,Internal,External,SigGrowth?
PM1_A1,Negative Control,,,,,,False
PM1_A2,L-Arabinose,L-Arabinose,EX_arab__L_e,arab__L_e,True,True,False
PM1_A3,N-Acetyl-DGlucosamine,N-Acetyl-D-glucosamine,EX_acgam_e,acgam_e,True,True,False
PM1_A4,D-Saccharic Acid,,EX_glcr_e,glcr_e,False,False,False
PM1_A5,Succinic Acid,Succinate,EX_succ_e,succ_e,True,True,True
PM1_A6,D-Galactose,D-Galactose,EX_gal_e,gal_e,True,True,True
PM1_A7,L-Aspartic Acid,L-Aspartate,EX_asp__L_e,asp__L_e,True,True,False
PM1_A8,L-Proline,L-Proline,EX_pro__L_e,pro__L_e,True,True,False
PM1_A9,D-Alanine,D-Alanine,EX_ala__D_e,ala__D_e,True,True,False
PM1_A10,D-Trehalose,Trehalose,EX_tre_e,tre_e,True,True,False


# Running GSM predictions.

create a dataframe to hold our GSM prediction results. 

In [19]:
Biolog_Prediction = pd.DataFrame(columns=['PlateType','Experiment','Row','Column',
                                          'Data','Data_TF', 'Prediction','Prediction_TF'])

### Perform predictions.

The model carbon, nitrogen, phosphorus and sulfur sources uptake rates are set to zero unless used in the model. 


In [20]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)


In [21]:
Biolog_Prediction.head()

Unnamed: 0,PlateType,Experiment,Row,Column,Data,Data_TF,Prediction,Prediction_TF
PM1_A1,PM1,Carbon,1,1,,False,,
PM1_A2,PM1,Carbon,1,2,,False,0.724581,True
PM1_A3,PM1,Carbon,1,3,,False,0.0,False
PM1_A4,PM1,Carbon,1,4,,False,,
PM1_A5,PM1,Carbon,1,5,,True,0.0,False


In [22]:
Biolog_Prediction.Prediction = Biolog_Prediction.Prediction.astype(float)

In [23]:
# convert to float.
# Biolog_Prediction['590_avg'] = Biolog_Prediction['590_avg'].astype(float)
# Biolog_Prediction['750_avg'] = Biolog_Prediction['750_avg'].astype(float)
# Biolog_Prediction['Prediction'] = Biolog_Prediction['Prediction'].astype(float)

# if prediction less than 1e-6 - set as zero. 
Biolog_Prediction.loc[abs(Biolog_Prediction.Prediction) < 1e-6, 'Prediction'] = 0

In [24]:
Biolog_Prediction.head()

Unnamed: 0,PlateType,Experiment,Row,Column,Data,Data_TF,Prediction,Prediction_TF
PM1_A1,PM1,Carbon,1,1,,False,,
PM1_A2,PM1,Carbon,1,2,,False,0.724581,True
PM1_A3,PM1,Carbon,1,3,,False,0.0,False
PM1_A4,PM1,Carbon,1,4,,False,,
PM1_A5,PM1,Carbon,1,5,,True,0.0,False


Visualize the accuracy of the model classifications and look at classification metrics. 

In [25]:
# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))

52 38 56 67 213


In [26]:
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,56,67


Obtain performance metrics.

In [27]:
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
# Matthew's
MCC = matthews_corrcoef(y_data, y_pred)
print('Recall:', TPR.round(3))
print('Precision:', PPV.round(3))
print('Accuracy:', ACC.round(3))
print('Matthew\'s correlation:', MCC.round(3))

Recall: 0.545
Precision: 0.638
Accuracy: 0.559
Matthew's correlation: 0.121


In [28]:
print('number of reactions in model\t\t\t\t\t',sum(~Biolog_in_model['Exchange'].isna()))
print('number of metabolites missing from internal\t\t\t', len(Biolog_in_model.query('External == True and Internal == False')))
print('number of metabolites missing from external\t\t\t',len(Biolog_in_model.query('External == False and Internal == True')))
print('number of metabolites in both internal and external\t\t', len(Biolog_in_model.query('External == True and Internal == True')))
print('number of metabolites missing/incorrectly annotated exchange\t',len(Biolog_in_model.query('External == False and Internal == False')))

number of reactions in model					 285
number of metabolites missing from internal			 0
number of metabolites missing from external			 0
number of metabolites in both internal and external		 213
number of metabolites missing/incorrectly annotated exchange	 72


In [29]:
Biolog_in_model.query('External == False and Internal == False').to_csv('notInModel_needsAdditions.csv')

metabolites not currently in model but are able to grow.

In [30]:
Biolog_in_model.query('External == False and Internal == False')[Biolog_in_model.query('External == False and Internal == False')['SigGrowth?']==True]

Unnamed: 0,Biolog,Model,Exchange,Metabolite,Internal,External,SigGrowth?
PM1_B11,D--Mannitol,,EX_mnl_e,mnl_e,False,False,True
PM1_C2,D--Galactonic Acid-g-Lactone,,EX_galctn__D_e,galctn__D_e,False,False,True
PM1_C6,L-Rhamnose,,EX_rmn_e,rmn_e,False,False,True
PM1_G9,Mono Methyl Succinate,,EX_methsucc_e,methsucc_e,False,False,True
PM2_A6,Dextrin,,EX_dextrin_e,dextrin_e,False,False,True
PM2_A12,Pectin,,EX_pect_e,pect_e,False,False,True
PM3B_C7,D-Lysine,,EX_lys__D_e,lys__D_e,False,False,True
PM3B_C9,D-Valine,,EX_val__D_e,val__D_e,False,False,True
PM3B_E9,D-Galactosamine,,EX_galam_e,galam_e,False,False,True
PM3B_E12,N-Acetyl-DGalactosamine,,EX_acgal_e,acgal_e,False,False,True


Examine the biolog data in predictions in more details. 

In [31]:
# create a new data frame copy. 
Biolog_Prediction_Normalized = Biolog_Prediction.copy()


In [32]:
Predictions_Biolog_Data = pd.concat([Biolog_in_model,Biolog_Prediction_Normalized],axis=1)
Predictions_Biolog_Data.drop(['Internal','External','Row','Column','PlateType','SigGrowth?'],axis=1,inplace=True)
# Predictions_Biolog_Data

In [33]:
incorrect_should_grow = Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index]
incorrect_shouldnot_grow = Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==1)].index,:].index]


In [34]:
incorrect_should_grow.head(4)

Unnamed: 0,PlateType,Experiment,Row,Column,Data,Data_TF,Prediction,Prediction_TF
PM1_A5,PM1,Carbon,1,5,,True,0.0,False
PM1_D6,PM1,Carbon,4,6,,True,0.0,False
PM1_F2,PM1,Carbon,6,2,,True,0.0,False
PM1_F3,PM1,Carbon,6,3,,True,0.0,False


In [35]:
len(incorrect_should_grow)

56

In [36]:
len(incorrect_shouldnot_grow)

38

In [37]:
# append the source to the incorrect growth.
biolog_source = []
for PMcondition in incorrect_should_grow.index:
#     print(PMcondition)
    biolog_source.append(Biolog_in_model.loc[PMcondition,:].Biolog)
incorrect_should_grow['source']=biolog_source
# incorrect_should_grow.head(4)

biolog_source = []
for PMcondition in incorrect_shouldnot_grow.index:
#     print(PMcondition)
    biolog_source.append(Biolog_in_model.loc[PMcondition,:].Biolog)
incorrect_shouldnot_grow['source']=biolog_source

In [38]:
incorrect_shouldnot_grow

Unnamed: 0,PlateType,Experiment,Row,Column,Data,Data_TF,Prediction,Prediction_TF,source
PM1_A2,PM1,Carbon,1,2,,False,0.724581,True,L-Arabinose
PM1_A7,PM1,Carbon,1,7,,False,0.435112,True,L-Aspartic Acid
PM1_A9,PM1,Carbon,1,9,,False,0.139329,True,D-Alanine
PM1_A10,PM1,Carbon,1,10,,False,1.732412,True,D-Trehalose
PM1_B1,PM1,Carbon,2,1,,False,0.346945,True,D-Serine
PM1_B6,PM1,Carbon,2,6,,False,0.822642,True,D-Gluconic Acid
PM1_B9,PM1,Carbon,2,9,,False,0.372447,True,L-Lactic Acid
PM1_B12,PM1,Carbon,2,12,,False,0.645823,True,L-Glutamic Acid
PM1_C4,PM1,Carbon,3,4,,False,0.747856,True,D--Ribose
PM1_E1,PM1,Carbon,5,1,,False,0.6468,True,L-Glutamine


## Which carbon sources should grow but were not predicted too.

In [39]:
# index of carbon sources that were predicted to NOT grow but SHOULD grow. 
indexCarbon = Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Carbon'

indexCarbon = indexCarbon[indexCarbon==True]

# print out number incorrect.
print(len(Biolog_in_model.loc[indexCarbon.index,:]))

# display the carbon sources. 
Biolog_in_model.loc[indexCarbon.index,:].Biolog



8


PM1_A5            Succinic Acid
PM1_D6     a-Keto-Glutaric Acid
PM1_F2              Citric Acid
PM1_F3             m-Inositol  
PM1_F5             Fumaric Acid
PM1_G12            L-Malic Acid
PM2_A8                 Glycogen
PM2_D4                L-Sorbose
Name: Biolog, dtype: object

let's examine the transport reactions to see if they allow uptake. 

In [40]:
# hold the reaction names.
reactionsToInvestigate = Biolog_in_model.loc[indexCarbon.index,:].Exchange.values

In [41]:
# iterate through the reactions.
for r in reactionsToInvestigate:
    
    # get the metabolites.
    met = model.reactions.get_by_id(r).metabolites
    for m in met:
        
        # print all of the reactions. 
        for reac in model.metabolites.get_by_id(m.id).reactions:
            print(m.id,reac.id,reac.bounds)
        print('\n')

succ_e EX_succ_e (0.0, 1000.0)


akg_e EX_akg_e (0.0, 1000.0)


cit_e EX_cit_e (0.0, 1000.0)


inost_e INSTt2 (0.0, 1000.0)
inost_e PHYTSe (0.0, 1000.0)
inost_e EX_inost_e (0.0, 1000.0)


fum_e EX_fum_e (0.0, 1000.0)


mal__L_e EX_mal__L_e (0.0, 1000.0)


glycogen_e EX_glycogen_e (0.0, 1000.0)


srb__L_e EX_srb__L_e (0.0, 1000.0)
srb__L_e SRB_Lt (-1000.0, 1000.0)




## Nitrogen sources that should grow but were predicted not to. 

In [42]:

indexN2 = Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Nitrogen'

indexN2 = indexN2[indexN2==True]
# Biolog_in_model.loc[Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Carbon',:]
print(len(Biolog_in_model.loc[indexN2.index,:]))
Biolog_in_model.loc[indexN2.index,:].Biolog


27


PM3B_A9                   L-Asparagine
PM3B_B3                    L-Histidine
PM3B_B4                   L-Isoleucine
PM3B_B5                      L-Leucine
PM3B_B6                       L-Lysine
PM3B_B8                L-Phenylalanine
PM3B_B9                      L-Proline
PM3B_C5                D-Aspartic Acid
PM3B_D1     N-Acetyl-L- Glutamic Acid 
PM3B_D9                   Ethanolamine
PM3B_D11                    Putrescine
PM3B_E1                      Histamine
PM3B_E2             b-Phenylethylamine
PM3B_E3                       Tyramine
PM3B_E8                  D-Glucosamine
PM3B_E11         N-Acetyl-DGlucosamine
PM3B_F2                        Adenine
PM3B_F4                       Cytidine
PM3B_F5                       Cytosine
PM3B_F8                        Thymine
PM3B_F9                      Thymidine
PM3B_F10                        Uracil
PM3B_F11                       Uridine
PM3B_F12                       Inosine
PM3B_G2                     Xanthosine
PM3B_G3                  

## Examine the reaction bounds of key metabolites.

In [43]:
reactionsToInvestigate = Biolog_in_model.loc[indexN2.index,:].Exchange.values

In [44]:
for r in reactionsToInvestigate:
    met = model.reactions.get_by_id(r).metabolites
    for m in met:
        for reac in model.metabolites.get_by_id(m.id).reactions:
            print(m.id,reac.id,reac.bounds)
        

asn__L_e EX_asn__L_e (0.0, 1000.0)
asn__L_e ASNtN1 (-1000.0, 1000.0)
asn__L_e ASNt2r (-1000.0, 1000.0)
his__L_e HISt2r (-1000.0, 1000.0)
his__L_e HIStN1 (-1000.0, 1000.0)
his__L_e EX_his__L_e (0.0, 1000.0)
ile__L_e EX_ile__L_e (0.0, 1000.0)
ile__L_e ILEt2r (-1000.0, 1000.0)
leu__L_e EX_leu__L_e (0.0, 1000.0)
leu__L_e LEUt2r (-1000.0, 1000.0)
lys__L_e SERLYSNaex (0.0, 1000.0)
lys__L_e EX_lys__L_e (0.0, 1000.0)
lys__L_e LYSt2r (-1000.0, 1000.0)
phe__L_e EX_phe__L_e (0.0, 1000.0)
phe__L_e PHEt2r (-1000.0, 1000.0)
pro__L_e PROt2r (-1000.0, 1000.0)
pro__L_e EX_pro__L_e (0.0, 1000.0)
asp__D_e EX_asp__D_e (0.0, 1000.0)
asp__D_e ASPDTDe (-1000.0, 1000.0)
acglu_e ACGLUtd (-1000.0, 1000.0)
acglu_e EX_acglu_e (0.0, 1000.0)
etha_e ETHAt (-1000.0, 1000.0)
etha_e EX_etha_e (0.0, 1000.0)
ptrc_e EX_ptrc_e (0.0, 1000.0)
ptrc_e PTRCt3i (0.0, 1000.0)
hista_e HISTAtu (-1000.0, 1000.0)
hista_e EX_hista_e (0.0, 1000.0)
peamn_e EX_peamn_e (0.0, 1000.0)
tym_e EX_tym_e (0.0, 1000.0)
tym_e TYMte (-1000.0, 1000.

## Which P didnt grow but should?

In [45]:

indexP = Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Phosphorus'
indexP = indexP[indexP==True]
# Biolog_in_model.loc[Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Carbon',:]
print(len(Biolog_in_model.loc[indexP.index,:]))
Biolog_in_model.loc[indexP.index,:].Biolog


21


PM4A_A10              Adenosine- 5’- monophosphate
PM4A_A12    Adenosine- 3’,5’- Cyclic monophosphate
PM4A_B3                   D,L-a-Glycerol Phosphate
PM4A_B4                       b-Glycerol Phosphate
PM4A_B6                   D-2-PhosphoGlyceric Acid
PM4A_B7                   D-3-PhosphoGlyceric Acid
PM4A_B10              Guanosine- 5’- monophosphate
PM4A_B12    Guanosine- 3’,5’- Cyclic monophosphate
PM4A_C1                       Phosphoenol Pyruvate
PM4A_C3                     D-Glucose-1- Phosphate
PM4A_C4                     D-Glucose-6- Phosphate
PM4A_C5              2-Deoxy-DGlucose 6- Phosphate
PM4A_C6                   D-Glucosamine6-Phosphate
PM4A_C7                     6-PhosphoGluconic Acid
PM4A_C10               Cytidine- 5’- monophosphate
PM4A_C12     Cytidine- 3’,5’- Cyclic monophosphate
PM4A_D1                     D-Mannose-1- Phosphate
PM4A_D2                     D-Mannose-6- Phosphate
PM4A_D10                Uridine- 5’- monophosphate
PM4A_E4                        

## Examine P source reactions bounds.

In [46]:
reactionsToInvestigate = Biolog_in_model.loc[indexP.index,:].Exchange.values

In [47]:
for r in reactionsToInvestigate:
    met = model.reactions.get_by_id(r).metabolites
    for m in met:
        for reac in model.metabolites.get_by_id(m.id).reactions:
            print(m.id,reac.id,reac.bounds)
        

amp_e EX_amp_e (0.0, 1000.0)
camp_e CAMPt2 (-1000.0, 1000.0)
camp_e CAMPt (0.0, 1000.0)
camp_e EX_camp_e (0.0, 1000.0)
glyc3p_e EX_glyc3p_e (0.0, 1000.0)
glyc2p_e EX_glyc2p_e (0.0, 1000.0)
2pg_e EX_2pg_e (0.0, 1000.0)
3pg_e EX_3pg_e (0.0, 1000.0)
gmp_e EX_gmp_e (0.0, 1000.0)
35cgmp_e EX_35cgmp_e (0.0, 1000.0)
35cgmp_e CGMPt (0.0, 1000.0)
35cgmp_e CGMPt2 (-1000.0, 1000.0)
pep_e EX_pep_e (0.0, 1000.0)
g1p_e EX_g1p_e (0.0, 1000.0)
g6p_e EX_g6p_e (0.0, 1000.0)
2doxg6p_e EX_2doxg6p_e (0.0, 1000.0)
gam6p_e EX_gam6p_e (0.0, 1000.0)
6pgc_e EX_6pgc_e (0.0, 1000.0)
cmp_e EX_cmp_e (0.0, 1000.0)
35ccmp_e EX_35ccmp_e (0.0, 1000.0)
35ccmp_e CCMPt2 (-1000.0, 1000.0)
man1p_e EX_man1p_e (0.0, 1000.0)
man6p_e EX_man6p_e (0.0, 1000.0)
ump_e EX_ump_e (0.0, 1000.0)
cholp_e EX_cholp_e (0.0, 1000.0)
cholp_e CHOLPtr (-1000.0, 1000.0)
ethamp_e EX_ethamp_e (0.0, 1000.0)
ethamp_e ETHAMPtr (-1000.0, 1000.0)


## Which Sulfur didnt work?

In [48]:

indexS = Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Sulfur'

indexS = indexS[indexS==True]
# Biolog_in_model.loc[Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Carbon',:]
print(len(Biolog_in_model.loc[indexS.index,:]))
Biolog_in_model.loc[indexS.index,:].Biolog


0


Series([], Name: Biolog, dtype: object)

## Which S sources didnt grow?

In [49]:
reactionsToInvestigate = Biolog_in_model.loc[indexS.index,:].Exchange.values

In [50]:
for r in reactionsToInvestigate:
    met = model.reactions.get_by_id(r).metabolites
    for m in met:
        for reac in model.metabolites.get_by_id(m.id).reactions:
            print(m.id,reac.id,reac.bounds)
        

# Which carbon sources were incorrectly predicted to grow.  

In [51]:
t_index = Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==1)].index,:].index,'Experiment']=='Carbon'

t_index = t_index[t_index==True]
# Biolog_in_model.loc[Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Carbon',:]
print(len(Biolog_in_model.loc[t_index.index,:]))
Biolog_Prediction.loc[t_index.index,:]


29


Unnamed: 0,PlateType,Experiment,Row,Column,Data,Data_TF,Prediction,Prediction_TF
PM1_A2,PM1,Carbon,1,2,,False,0.724581,True
PM1_A7,PM1,Carbon,1,7,,False,0.435112,True
PM1_A9,PM1,Carbon,1,9,,False,0.139329,True
PM1_A10,PM1,Carbon,1,10,,False,1.732412,True
PM1_B1,PM1,Carbon,2,1,,False,0.346945,True
PM1_B6,PM1,Carbon,2,6,,False,0.822642,True
PM1_B9,PM1,Carbon,2,9,,False,0.372447,True
PM1_B12,PM1,Carbon,2,12,,False,0.645823,True
PM1_C4,PM1,Carbon,3,4,,False,0.747856,True
PM1_E1,PM1,Carbon,5,1,,False,0.6468,True


In [52]:
Biolog_in_model.loc[t_index.index,:]

Unnamed: 0,Biolog,Model,Exchange,Metabolite,Internal,External,SigGrowth?
PM1_A2,L-Arabinose,L-Arabinose,EX_arab__L_e,arab__L_e,True,True,False
PM1_A7,L-Aspartic Acid,L-Aspartate,EX_asp__L_e,asp__L_e,True,True,False
PM1_A9,D-Alanine,D-Alanine,EX_ala__D_e,ala__D_e,True,True,False
PM1_A10,D-Trehalose,Trehalose,EX_tre_e,tre_e,True,True,False
PM1_B1,D-Serine,D-Serine,EX_ser__D_e,ser__D_e,True,True,False
PM1_B6,D-Gluconic Acid,D-Gluconate,EX_glcn_e,glcn_e,True,True,False
PM1_B9,L-Lactic Acid,L-Lactate,EX_lac__L_e,lac__L_e,True,True,False
PM1_B12,L-Glutamic Acid,L-Glutamate,EX_glu__L_e,glu__L_e,True,True,False
PM1_C4,D--Ribose,D-Ribose,EX_rib__D_e,rib__D_e,True,True,False
PM1_E1,L-Glutamine,L-Glutamine,EX_gln__L_e,gln__L_e,True,True,False


# Which nitrogen sources were incorrectly predicted to grow.  

In [54]:

t_index = Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==1)].index,:].index,'Experiment']=='Nitrogen'

t_index = t_index[t_index==True]
# Biolog_in_model.loc[Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Carbon',:]
print(len(Biolog_in_model.loc[t_index.index,:]))
Biolog_Prediction.loc[t_index.index,:]


8


Unnamed: 0,PlateType,Experiment,Row,Column,Data,Data_TF,Prediction,Prediction_TF
PM3B_A7,PM3B,Nitrogen,1,7,,False,0.103191,True
PM3B_A11,PM3B,Nitrogen,1,11,,False,0.671609,True
PM3B_B11,PM3B,Nitrogen,2,11,,False,0.883352,True
PM3B_B12,PM3B,Nitrogen,2,12,,False,0.397694,True
PM3B_C11,PM3B,Nitrogen,3,11,,False,0.863178,True
PM3B_C12,PM3B,Nitrogen,3,12,,False,0.355552,True
PM3B_H1,PM3B,Nitrogen,8,1,,False,0.987791,True
PM3B_H12,PM3B,Nitrogen,8,12,,False,0.777929,True


In [56]:
Biolog_in_model.loc[t_index.index,:]

Unnamed: 0,Biolog,Model,Exchange,Metabolite,Internal,External,SigGrowth?
PM3B_A7,L-Alanine,L-Alanine,EX_ala__L_e,ala__L_e,True,True,False
PM3B_A11,L-Cysteine,L-Cysteine,EX_cys__L_e,cys__L_e,True,True,False
PM3B_B11,L-Threonine,L-Threonine,EX_thr__L_e,thr__L_e,True,True,False
PM3B_B12,L-Tryptophan,L-Tryptophan,EX_trp__L_e,trp__L_e,True,True,False
PM3B_C11,L-Homoserine,L-Homoserine,EX_hom__L_e,hom__L_e,True,True,False
PM3B_C12,L-Ornithine D-1 N-Acetyl-LGlutamic Acid,Ornithine,EX_orn_e,orn_e,True,True,False
PM3B_H1,Ala-Asp,"[L-Alanine, L-Aspartate]","EX_ala__L_e,EX_asp__L_e","ala__L_e,asp__L_e",True,True,False
PM3B_H12,Met-Ala,"[L-Methionine, L-Alanine]","EX_met__L_e,EX_ala__L_e","met__L_e,ala__L_e",True,True,False


# Which P sources were incorrectly predicted to grow.  

In [57]:

t_index = Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==1)].index,:].index,'Experiment']=='Phosphorus'

t_index = t_index[t_index==True]
# Biolog_in_model.loc[Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Carbon',:]
print(len(Biolog_in_model.loc[t_index.index,:]))
Biolog_Prediction.loc[t_index.index,:]


0


Unnamed: 0,PlateType,Experiment,Row,Column,Data,Data_TF,Prediction,Prediction_TF


# Which Sulfur sources were incorrectly predicted to grow.  

In [58]:

t_index = Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==1)].index,:].index,'Experiment']=='Phosphorous'

t_index = t_index[t_index==True]
# Biolog_in_model.loc[Biolog_Prediction.loc[Biolog_Prediction.loc[y_pred[(y_data!=y_pred) & (y_pred==0)].index,:].index,'Experiment']=='Carbon',:]
print(len(Biolog_in_model.loc[t_index.index,:]))
Biolog_Prediction.loc[t_index.index,:]


0


Unnamed: 0,PlateType,Experiment,Row,Column,Data,Data_TF,Prediction,Prediction_TF


In [59]:
# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))

52 38 56 67 213


In [60]:
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,56,67


Obtain performance metrics.

In [61]:
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
# Matthew's
MCC = matthews_corrcoef(y_data, y_pred)
print('Recall:', TPR.round(3))
print('Precision:', PPV.round(3))
print('Accuracy:', ACC.round(3))
print('Matthew\'s correlation:', MCC.round(3))

Recall: 0.545
Precision: 0.638
Accuracy: 0.559
Matthew's correlation: 0.121


## Tracking down errors in growing conditions/non-growing predictions

## GAP fill with other GSM (i.e., *S. cerevisaie*, *R. toruloides*, *E. coli*). 


In [62]:
%%time

# Lets keep our original model object as a copy. 
# model_copy = model.copy()

# nutrient sources to examine. 
Nutrients = ['Carbon','Nitrogen','Phosphorus','Sulfur']
model_test, rxns_neededToFix_noGrowth_01 = core.gapfil_reactions(model,rto,Biolog_Prediction,Nutrients,Biolog_in_model)



Carbon 


----
 EX_succ_e
template model growth failed (0.0)
----
 EX_akg_e
template model growth failed (0.0)
----
 EX_cit_e
template model growth failed (0.0)
----
 EX_inost_e
template model growth failed (0.0)
----
 EX_fum_e
template model growth failed (0.0)
----
 EX_mal__L_e
template model growth failed (0.0)
----
 EX_glycogen_e
template model growth failed (0.0)
----
 EX_srb__L_e
template model growth failed (0.0)
Nitrogen 


----
 EX_asn__L_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpr653_3zr.lp
Reading time = 0.01 seconds
: 1908 rows, 4346 columns, 17652 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmphvy0h2jc.lp
Reading time = 0.01 seconds
: 2051 rows, 4796 columns, 19762 nonzeros
gapfiller initialized
gapfiller succeeded
ASNNe: asn__L_e + h2o_e --> asp__L_e + nh4_e 	 L asparaginase  extracellular


----
 EX_his__L_e
template model growth failed (0.0)
----
 EX_ile__L_e
Read LP format model fro

Ignoring reaction 'NTD10' since it already exists.
Ignoring reaction 'NTD10' since it already exists.


gapfiller succeeded
NTD10: h2o_c + xmp_c --> pi_c + xtsn_c 	 5'-nucleotidase (XMP)


----
 EX_glyc3p_e
template model growth failed (0.0)
----
 EX_glyc2p_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp63s0qudu.lp
Reading time = 0.01 seconds
: 1908 rows, 4364 columns, 17728 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpko94dffr.lp
Reading time = 0.01 seconds
: 2051 rows, 4796 columns, 19762 nonzeros
gapfiller initialized
gapfiller succeeded
G2PPe: glyc2p_e + h2o_e --> glyc_e + pi_e 	 Glycerol-2-phosphate phosphatase, extracellular


----
 EX_2pg_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmphbuil80w.lp
Reading time = 0.01 seconds
: 1908 rows, 4366 columns, 17736 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpgz29euli.lp
Reading time = 0.01 seconds
: 2051 rows, 4796 columns, 19762 nonzeros
gapfiller initialized
gapfiller s

In [63]:
rxns_neededToFix_noGrowth_01.T

Unnamed: 0,0,1,2,3,4
0,ASNNe,10043 or 13627,asn__L_e + h2o_e --> asp__L_e + nh4_e,L asparaginase extracellular,EX_asn__L_e
1,HACD9m,13606 or 16284,3hmbcoa_m + nad_m <=> 2maacoa_m + h_m + nadh_m,"(2S,3S)-3-hydroxy-2-methylbutanoyl-CoA:NAD+ ox...",EX_ile__L_e
2,ACOAD8m,10012,fad_m + ivcoa_m --> 3mb2coa_m + fadh2_m,Isovaleryl-coa dehydrogenase,EX_leu__L_e
3,MGCHrm,16128,3mgcoa_m + h2o_m <=> hmgcoa_m,Methylglutaconyl-coa hydratase,EX_leu__L_e
4,HCO3Em,10985,co2_m + h2o_m <=> h_m + hco3_m,HCO3 equilibration reaction,EX_leu__L_e
5,PTRCOX1,13959,h2o_c + o2_c + ptrc_c --> 4abutn_c + h2o2_c + ...,Putrescine:oxygen oxidoreductase (deaminating),EX_ptrc_e
6,NTD10,16648,h2o_c + xmp_c --> pi_c + xtsn_c,5'-nucleotidase (XMP),EX_ade_e
7,ABUTt2r,11269,4abut_e + h_e <=> 4abut_c + h_c,4 aminobutyrate reversible transport in via pr...,EX_4abut_e
8,NTD7e,15429 or 9531,amp_e + h2o_e --> adn_e + pi_e,"5'-nucleotidase (AMP), extracellular",EX_amp_e
9,NTD10,16648,h2o_c + xmp_c --> pi_c + xtsn_c,5'-nucleotidase (XMP),EX_camp_e


In [57]:
rxns_neededToFix_noGrowth_01.T.to_csv('annotationsNeededToFixGrowthSituations_rto_forpub.csv')

## Re-run model predictions with the added reactions to see how well the model predict with the gapfilled model.

In [64]:
Biolog_Prediction = core.PerformFBAPredictions(model,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model)


In [59]:
# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan(Biolog_Prediction.Prediction)]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))

45 47 34 87 213


In [60]:
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,45,47
Experiment,Growth,34,87


Obtain performance metrics.

In [61]:
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
# Matthew's
MCC = matthews_corrcoef(y_data, y_pred)
print('Recall:', TPR.round(3))
print('Precision:', PPV.round(3))
print('Accuracy:', ACC.round(3))
print('Matthew\'s correlation:', MCC.round(3))

Recall: 0.719
Precision: 0.649
Accuracy: 0.62
Matthew's correlation: 0.213


## yeast 8 gap fill.

load gsm again.

In [62]:
# load cobra LST model. 
model = cobra.io.load_json_model("../models/lst_v0_3b_forPub.json")

In [63]:

Biolog_Prediction = core.PerformFBAPredictions(model,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model)


gapfilling is stuck on the following reactions. drop and run without these reactions. 

In [64]:
Biolog_in_model[Biolog_in_model.Exchange == 'EX_ethamp_e']

Unnamed: 0,Biolog,Model,Exchange,Metabolite,Internal,External,SigGrowth?
PM4A_E5,O-PhosphorylEthanolamine,Ethanolamine phosphate C2H7NO4P,EX_ethamp_e,ethamp_e,True,True,True


In [65]:
Biolog_Prediction_copy = Biolog_Prediction.copy()
Biolog_Prediction_copy = Biolog_Prediction_copy.drop(['PM4A_E5'])

In [66]:
Biolog_in_model[Biolog_in_model.Exchange == 'EX_cholp_e']

Unnamed: 0,Biolog,Model,Exchange,Metabolite,Internal,External,SigGrowth?
PM4A_E4,Phosphoryl Choline,Choline phosphate C5H13NO4P,EX_cholp_e,cholp_e,True,True,True


In [67]:
Biolog_Prediction_copy = Biolog_Prediction_copy.drop(['PM4A_E4'])

In [68]:
%%time

# Lets keep our original model object as a copy. 
# model_copy = model.copy()

# nutrient sources to examine. 
Nutrients = ['Carbon','Nitrogen','Phosphorus','Sulfur']
model_test, rxns_neededToFix_noGrowth_01 = core.gapfil_reactions(model,yeast8,Biolog_Prediction_copy,Nutrients,Biolog_in_model)
 


Carbon 


----
 EX_succ_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpiq2u474g.lp
Reading time = 0.01 seconds
: 1908 rows, 4336 columns, 17622 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpd994v31j.lp
Reading time = 0.02 seconds
: 2806 rows, 8262 columns, 31164 nonzeros
gapfiller initialized
gapfiller succeeded
SUCCt: succ_e <=> succ_c 	 succinate transport


----
 EX_akg_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpxs4tejb_.lp
Reading time = 0.01 seconds
: 1908 rows, 4338 columns, 17626 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpmy_48a3k.lp
Reading time = 0.02 seconds
: 2806 rows, 8262 columns, 31164 nonzeros
gapfiller initialized
gapfiller succeeded
r_1588: akg_e <=> akg_c 	 2-oxoglutarate transport


----
 EX_inost_e
template model growth failed (0.0)
----
 EX_fum_e
Read LP format model from file /var/folders/kn

Ignoring reaction 'r_4454' since it already exists.
Ignoring reaction 'r_4454' since it already exists.


gapfiller succeeded
r_4454: ump_e <=> ump_c 	 UMP transport


----
 EX_thymd_e
template model growth failed (0.0)
----
 EX_ura_e
template model growth failed (0.0)
----
 EX_uri_e
template model growth failed (0.0)
----
 EX_ins_e
template model growth failed (0.0)
----
 EX_xtsn_e
template model growth failed (0.0)
----
 EX_4abut_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpzqf_6xhd.lp
Reading time = 0.01 seconds
: 1912 rows, 4370 columns, 17706 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpjij1rsbd.lp
Reading time = 0.01 seconds
: 2806 rows, 8262 columns, 31164 nonzeros
gapfiller initialized
gapfiller succeeded
ABUTt2r: 4abut_e + h_e --> 4abut_c + h_c 	 4-aminobutyrate transport


Phosphorus 


----
 EX_amp_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp4h25_klm.lp
Reading time = 0.01 seconds
: 1912 rows, 4372 columns, 17714 nonzeros
Read LP format model from file 

Ignoring reaction 'NTD10' since it already exists.
Ignoring reaction 'NTD10' since it already exists.


gapfiller succeeded
NTD10: h2o_c + xmp_c --> pi_c + xtsn_c 	 5'-nucleotidase (XMP)
AMPt6: amp_e + h_e <=> amp_c + h_c 	 AMP transport in/out via proton symport


----
 EX_glyc2p_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpx_yze6nj.lp
Reading time = 0.01 seconds
: 1912 rows, 4374 columns, 17722 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpmc1m9e2h.lp
Reading time = 0.02 seconds
: 2806 rows, 8262 columns, 31164 nonzeros
gapfiller initialized
gapfiller succeeded
r_4341: glyc2p_e <=> glyc2p_c 	 glycerol 2-phosphate(2-) transport


----
 EX_2pg_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp_aify74a.lp
Reading time = 0.01 seconds
: 1912 rows, 4376 columns, 17726 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpb1gezfrq.lp
Reading time = 0.02 seconds
: 2806 rows, 8262 columns, 31164 nonzeros
gapfiller initialized
gapfiller su

Ignoring reaction 'G6PDA' since it already exists.
Ignoring reaction 'G6PDA' since it already exists.


gapfiller succeeded
G6PDA: gam6p_c + h2o_c --> f6p_c + nh4_c 	 glucosamine-6-phosphate deaminase
GAM6Pt: gam6p_e <=> gam6p_c 	 D-glucosamine 6-phosphate uniport


----
 EX_6pgc_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpuxw6bv9n.lp
Reading time = 0.01 seconds
: 1912 rows, 4392 columns, 17778 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpf35gol9e.lp
Reading time = 0.01 seconds
: 2806 rows, 8262 columns, 31164 nonzeros
gapfiller initialized
gapfiller succeeded
6PGCt6: 6pgc_e + h_e <=> 6pgc_c + h_c 	 6-Phospho-D-gluconate transport in/out via proton symport


----
 EX_cmp_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpk6sih9qg.lp
Reading time = 0.01 seconds
: 1912 rows, 4394 columns, 17786 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpcjlc92qj.lp
Reading time = 0.02 seconds
: 2806 rows, 8262 columns, 31164 nonzeros
gapf

Ignoring reaction 'r_4454' since it already exists.
Ignoring reaction 'DURIK1' since it already exists.
Ignoring reaction 'r_4454' since it already exists.
Ignoring reaction 'DURIK1' since it already exists.


gapfiller succeeded
r_4454: ump_e <=> ump_c 	 UMP transport
DURIK1: atp_c + duri_c --> adp_c + dump_c + h_c 	 deoxyuridine kinase (ATP:deoxyuridine)
PYNP2r: pi_c + uri_c <=> r1p_c + ura_c 	 pyrimidine-nucleoside phosphorylase (uracil)


Sulfur 


CPU times: user 4min 53s, sys: 4.53 s, total: 4min 58s
Wall time: 4min 53s


In [69]:
rxns_neededToFix_noGrowth_01.T

Unnamed: 0,0,1,2,3,4
0,SUCCt,,succ_e <=> succ_c,succinate transport,EX_succ_e
1,r_1588,,akg_e <=> akg_c,2-oxoglutarate transport,EX_akg_e
2,FUMtr,,fum_e <=> fum_c,formate transport,EX_fum_e
3,MALt,,mal__L_e <=> mal__L_c,L-malate transport,EX_mal__L_e
4,ASNN,YDR321W,asn__L_c + h2o_c --> asp__L_c + nh4_c,L-asparaginase,EX_asn__L_e
5,EX_2mbald_e,,2mbald_e -->,2-methylbutanal exchange,EX_ile__L_e
6,2MBALDt,,2mbald_c <=> 2mbald_e,2-methylbutanal transport,EX_ile__L_e
7,EX_3mbald_e,,3mbald_e -->,3-methylbutanal exchange,EX_leu__L_e
8,3MBALDt,,3mbald_c <=> 3mbald_e,3-methylbutanal transport,EX_leu__L_e
9,EX_2phetoh_e,,2phetoh_e -->,2-phenylethanol exchange,EX_phe__L_e


In [70]:
rxns_neededToFix_noGrowth_01.T.to_csv('annotationsNeededToFixGrowthSituations_yeast8_forpub.csv')

## Re-run model predictions with the added reactions to see how well the model predict with the gapfilled model.

In [71]:
Biolog_Prediction = core.PerformFBAPredictions(model,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model)


In [72]:
# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan(Biolog_Prediction.Prediction)]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))

42 50 24 97 213


In [73]:
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,42,50
Experiment,Growth,24,97


Obtain performance metrics.

In [74]:
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
# Matthew's
MCC = matthews_corrcoef(y_data, y_pred)
print('Recall:', TPR.round(3))
print('Precision:', PPV.round(3))
print('Accuracy:', ACC.round(3))
print('Matthew\'s correlation:', MCC.round(3))

Recall: 0.802
Precision: 0.66
Accuracy: 0.653
Matthew's correlation: 0.277


## Gap Fill with E.coli

In [75]:
model = cobra.io.load_json_model("../models/lst_v0_3b_forPub.json")

In [76]:

Biolog_Prediction = core.PerformFBAPredictions(model,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model)


In [77]:
# Lets keep our original model object as a copy. 
# model_copy = model.copy()

# nutrient sources to examine. 
Nutrients = ['Carbon','Nitrogen','Phosphorus','Sulfur']
model_test, rxns_neededToFix_noGrowth_02 = core.gapfil_reactions(model,eco,Biolog_Prediction,Nutrients,Biolog_in_model)
 


Carbon 


----
 EX_succ_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpnt_6rr48.lp
Reading time = 0.02 seconds
: 1908 rows, 4336 columns, 17622 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmplkugwwec.lp
Reading time = 0.02 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized
gapfiller succeeded
ATPS4rpp: adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0 h_c 	 ATP synthase (four protons for one ATP) (periplasm)
SUCCt2_2pp: 2.0 h_p + succ_p --> 2.0 h_c + succ_c 	 Succinate transport via proton symport (2 H) (periplasm)
SUCCtex: succ_e <=> succ_p 	 Succinate transport via diffusion (extracellular to periplasm)


----
 EX_akg_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpu3mt2p1a.lp
Reading time = 0.01 seconds
: 1910 rows, 4342 columns, 17646 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpyw4zcrsg.lp
Reading t

Ignoring reaction 'ATPS4rpp' since it already exists.
Ignoring reaction 'ATPS4rpp' since it already exists.


gapfiller succeeded
ATPS4rpp: adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0 h_c 	 ATP synthase (four protons for one ATP) (periplasm)
AKGt2rpp: akg_p + h_p <=> akg_c + h_c 	 2-oxoglutarate reversible transport via symport (periplasm)
AKGtex: akg_e <=> akg_p 	 Alpha-ketoglutarate transport via diffusion (extracellular to periplasm)


----
 EX_inost_e
template model growth failed (0.0)
----
 EX_fum_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp0kffkeet.lp
Reading time = 0.02 seconds
: 1911 rows, 4346 columns, 17658 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp48wpk7j7.lp
Reading time = 0.01 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized
gapfiller succeeded
FUMt2_2pp: fum_p + 2.0 h_p --> fum_c + 2.0 h_c 	 Fumarate transport via proton symport (2 H) (periplasm)
FUMtex: fum_e <=> fum_p 	 Fumarate transport via diffusion (extracellular to periplasm)
Htex: h_e <=> h_p 	 Proton trans

Ignoring reaction 'ATPS4rpp' since it already exists.
Ignoring reaction 'ATPS4rpp' since it already exists.


gapfiller succeeded
ATPS4rpp: adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0 h_c 	 ATP synthase (four protons for one ATP) (periplasm)
MALt2_2pp: 2.0 h_p + mal__L_p --> 2.0 h_c + mal__L_c 	 Malate transport via proton symport (2 H) (periplasm)
MALtex: mal__L_e <=> mal__L_p 	 Malate transport via diffusion (extracellular to periplasm)


Nitrogen 


----
 EX_asn__L_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp425_zhrt.lp
Reading time = 0.02 seconds
: 1913 rows, 4356 columns, 17686 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpy8mi3g_m.lp
Reading time = 0.01 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized
gapfiller succeeded
ASNN: asn__L_c + h2o_c --> asp__L_c + nh4_c 	 L-asparaginase


----
 EX_his__L_e
template model growth failed (0.0)
----
 EX_ile__L_e
template model growth failed (0.0)
----
 EX_leu__L_e
template model growth failed (0.0)
----
 EX_lys__L_e
template model growt

Ignoring reaction 'H2O2tex' since it already exists.
Ignoring reaction 'H2O2tex' since it already exists.


gapfiller succeeded
H2O2tex: h2o2_e <=> h2o2_p 	 Hydrogen peroxide transport via diffusion (external)
EX_4hoxpacd_e: 4hoxpacd_e -->  	 4-Hydroxyphenylacetaldehyde exchange
4HOXPACDtex: 4hoxpacd_e <=> 4hoxpacd_p 	 4-hydroxyphenylacetaldehyde transport via diffusion (extracellular to periplasm)
H2Otex: h2o_e <=> h2o_p 	 H2O transport via diffusion (extracellular to periplasm)
NH4tpp: nh4_p <=> nh4_c 	 Ammonia reversible transport (periplasm)
O2tpp: o2_p <=> o2_c 	 O2 transport via diffusion (periplasm)
TYMtex: tym_e <=> tym_p 	 Tyramine transport via diffusion (extracellular to periplasm)
TYROXDApp: h2o_p + o2_p + tym_p --> 4hoxpacd_p + h2o2_p + nh4_p 	 Tyramine:oxygen oxidoreductase(deaminating)(flavin-containing) (periplasm)


----
 EX_gam_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmptv82jziv.lp
Reading time = 0.02 seconds
: 1924 rows, 4398 columns, 17812 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp

Ignoring reaction 'G6PDA' since it already exists.
Ignoring reaction 'G6PDA' since it already exists.


gapfiller succeeded
G6PDA: gam6p_c + h2o_c --> f6p_c + nh4_c 	 Glucosamine-6-phosphate deaminase
ACGAptspp: acgam_p + pep_c --> acgam6p_c + pyr_c 	 N-Acetyl-D-glucosamine transport via PEP:Pyr PTS  (periplasm)
ACGAtex: acgam_e <=> acgam_p 	 N-Acetyl-D-glucosamine transport via diffusion (extracellular to periplasm)


----
 EX_ade_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp76e6bb6m.lp
Reading time = 0.01 seconds
: 1925 rows, 4404 columns, 17832 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpgybbv18f.lp
Reading time = 0.01 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized
gapfiller succeeded
NTD10: h2o_c + xmp_c --> pi_c + xtsn_c 	 5'-nucleotidase (XMP)


----
 EX_cytd_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpe8kgfdf5.lp
Reading time = 0.01 seconds
: 1925 rows, 4406 columns, 17840 nonzeros
Read LP format model from file /var/folders/kn/z

Ignoring reaction 'DURIK1' since it already exists.


gapfiller initialized
gapfiller succeeded
DURIK1: atp_c + duri_c --> adp_c + dump_c + h_c 	 Deoxyuridine kinase (ATP:Deoxyuridine)


Ignoring reaction 'DURIK1' since it already exists.




----
 EX_thym_e
template model growth failed (0.0)
----
 EX_thymd_e
template model growth failed (0.0)
----
 EX_ura_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpxkbzr4hy.lp
Reading time = 0.01 seconds
: 1925 rows, 4410 columns, 17858 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpk_ym9z3y.lp
Reading time = 0.01 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized


Ignoring reaction 'ATPS4rpp' since it already exists.
Ignoring reaction 'ATPS4rpp' since it already exists.


gapfiller succeeded
ATPS4rpp: adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0 h_c 	 ATP synthase (four protons for one ATP) (periplasm)
EX_3hpp_e: 3hpp_e -->  	 3-Hydroxypropanoate exchange
3AMACHYD: 3amac_c + h2o_c + h_c --> msa_c + nh4_c 	 3-aminoacrylate hydrolase
3HPPtex: 3hpp_e <=> 3hpp_p 	 3-hydroxypropionate transport via diffusion (extracellular to periplasm)
3HPPtpp: 3hpp_c + h_c --> 3hpp_p + h_p 	 3-hydroxypropionate transport via proton symport (periplasm)
CBMD: cbm_c + 2.0 h_c --> co2_c + nh4_c 	 Carbamate deaminase
POAACR: nadh_c + poaac_c --> 3amac_c + h2o_c + nad_c 	 Peroxyaminoacrylate reductase
PYROX: h_c + nadh_c + o2_c + ura_c --> nad_c + uracp_c 	 Pyrimidine oxygenase
URACPAH: h2o_c + uracp_c --> cbm_c + h_c + poaac_c 	 Peroxyureidoacrylate hydrolase


----
 EX_uri_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpjlmatdqq.lp
Reading time = 0.01 seconds
: 1931 rows, 4426 columns, 17922 nonzeros
Read LP format model from file /var/fo

Ignoring reaction 'ATPS4rpp' since it already exists.
Ignoring reaction 'EX_3hpp_e' since it already exists.
Ignoring reaction '3AMACHYD' since it already exists.
Ignoring reaction '3HPPtex' since it already exists.
Ignoring reaction '3HPPtpp' since it already exists.
Ignoring reaction 'CBMD' since it already exists.
Ignoring reaction 'POAACR' since it already exists.
Ignoring reaction 'PYROX' since it already exists.
Ignoring reaction 'URACPAH' since it already exists.
Ignoring reaction 'ATPS4rpp' since it already exists.
Ignoring reaction 'EX_3hpp_e' since it already exists.
Ignoring reaction '3AMACHYD' since it already exists.
Ignoring reaction '3HPPtex' since it already exists.
Ignoring reaction '3HPPtpp' since it already exists.
Ignoring reaction 'CBMD' since it already exists.
Ignoring reaction 'POAACR' since it already exists.
Ignoring reaction 'PYROX' since it already exists.
Ignoring reaction 'URACPAH' since it already exists.


gapfiller succeeded
ATPS4rpp: adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0 h_c 	 ATP synthase (four protons for one ATP) (periplasm)
EX_3hpp_e: 3hpp_e -->  	 3-Hydroxypropanoate exchange
3AMACHYD: 3amac_c + h2o_c + h_c --> msa_c + nh4_c 	 3-aminoacrylate hydrolase
3HPPtex: 3hpp_e <=> 3hpp_p 	 3-hydroxypropionate transport via diffusion (extracellular to periplasm)
3HPPtpp: 3hpp_c + h_c --> 3hpp_p + h_p 	 3-hydroxypropionate transport via proton symport (periplasm)
CBMD: cbm_c + 2.0 h_c --> co2_c + nh4_c 	 Carbamate deaminase
POAACR: nadh_c + poaac_c --> 3amac_c + h2o_c + nad_c 	 Peroxyaminoacrylate reductase
PYROX: h_c + nadh_c + o2_c + ura_c --> nad_c + uracp_c 	 Pyrimidine oxygenase
URACPAH: h2o_c + uracp_c --> cbm_c + h_c + poaac_c 	 Peroxyureidoacrylate hydrolase
CYTDH: cytd_c + h2o_c --> csn_c + rib__D_c 	 Cytidine hydrolase


----
 EX_ins_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp7z8vavob.lp
Reading time = 0.01 seconds
: 1931 rows, 442

Ignoring reaction 'ATPS4rpp' since it already exists.
Ignoring reaction 'ATPS4rpp' since it already exists.


gapfiller succeeded
ATPS4rpp: adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0 h_c 	 ATP synthase (four protons for one ATP) (periplasm)
INStex: ins_e <=> ins_p 	 Inosine transport via diffusion (extracellular to periplasm)
HXAND: h2o_c + hxan_c + nad_c --> h_c + nadh_c + xan_c 	 Hypoxanthine dehydrogenase
INSt2pp_copy2: h_p + ins_p <=> h_c + ins_c 	 Inosine transport in via proton symport (periplasm)
URIC: 2.0 h2o_c + o2_c + urate_c --> alltn_c + co2_c + h2o2_c 	 Uricase
XAND: h2o_c + nad_c + xan_c --> h_c + nadh_c + urate_c 	 Xanthine dehydrogenase


----
 EX_xtsn_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpu3rhz9qz.lp
Reading time = 0.01 seconds
: 1932 rows, 4438 columns, 17978 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpscg_g_sl.lp
Reading time = 0.01 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized


Ignoring reaction 'URIC' since it already exists.
Ignoring reaction 'XAND' since it already exists.
Ignoring reaction 'URIC' since it already exists.
Ignoring reaction 'XAND' since it already exists.


gapfiller succeeded
URIC: 2.0 h2o_c + o2_c + urate_c --> alltn_c + co2_c + h2o2_c 	 Uricase
XAND: h2o_c + nad_c + xan_c --> h_c + nadh_c + urate_c 	 Xanthine dehydrogenase


----
 EX_4abut_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpt51516q8.lp
Reading time = 0.02 seconds
: 1932 rows, 4438 columns, 17978 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpjw_y87vc.lp
Reading time = 0.01 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized


Ignoring reaction 'ATPS4rpp' since it already exists.
Ignoring reaction 'ATPS4rpp' since it already exists.


gapfiller succeeded
ATPS4rpp: adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0 h_c 	 ATP synthase (four protons for one ATP) (periplasm)
ABUTt2pp: 4abut_p + h_p --> 4abut_c + h_c 	 4-aminobutyrate transport in via proton symport (periplasm)
ABUTtex: 4abut_e <=> 4abut_p 	 4-aminobutyrate transport via diffusion (extracellular to periplasm)


Phosphorus 


----
 EX_amp_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpzr2m_ybp.lp
Reading time = 0.01 seconds
: 1933 rows, 4442 columns, 17990 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp1o2pcmx3.lp
Reading time = 0.02 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized


Ignoring reaction 'H2Otex' since it already exists.
Ignoring reaction 'H2Otex' since it already exists.


gapfiller succeeded
H2Otex: h2o_e <=> h2o_p 	 H2O transport via diffusion (extracellular to periplasm)
ADNtex: adn_e <=> adn_p 	 Adenosine transport via diffusion (extracellular to periplasm)
AMPtex: amp_e <=> amp_p 	 AMP transport via diffusion (extracellular to periplasm)
NTD7pp: amp_p + h2o_p --> adn_p + pi_p 	 5'-nucleotidase (AMP)
PItex: pi_e <=> pi_p 	 Phosphate transport via diffusion (extracellular to periplasm)


----
 EX_glyc3p_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpw0u3mqye.lp
Reading time = 0.01 seconds
: 1936 rows, 4450 columns, 18010 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpif5f80d1.lp
Reading time = 0.01 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized
gapfiller succeeded
GLYC3Pabcpp: atp_c + glyc3p_p + h2o_c --> adp_c + glyc3p_c + h_c + pi_c 	 Sn-Glycerol 3-phosphate transport via ABC system (periplasm)
GLYC3Ptex: glyc3p_e <=> glyc3p_p 	 Glycerol-3-pho

Ignoring reaction 'H2Otex' since it already exists.
Ignoring reaction 'H2Otex' since it already exists.


gapfiller succeeded
H2Otex: h2o_e <=> h2o_p 	 H2O transport via diffusion (extracellular to periplasm)
GMPtex: gmp_e <=> gmp_p 	 GMP transport via diffusion (extracellular to periplasm)
GSNtex: gsn_e <=> gsn_p 	 Guanosine transport via diffusion (extracellular to periplasm)
NTD9pp: gmp_p + h2o_p --> gsn_p + pi_p 	 5'-nucleotidase (GMP)
PIuabcpp: atp_c + h2o_c + pi_p --> adp_c + h_c + 2.0 pi_c 	 Phosphate transport via ABC system (uptake, periplasm)


----
 EX_g1p_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpfb_hr8e2.lp
Reading time = 0.01 seconds
: 1940 rows, 4466 columns, 18074 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpzjn5wx18.lp
Reading time = 0.02 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized


Ignoring reaction 'H2Otex' since it already exists.
Ignoring reaction 'PIuabcpp' since it already exists.
Ignoring reaction 'H2Otex' since it already exists.
Ignoring reaction 'PIuabcpp' since it already exists.


gapfiller succeeded
H2Otex: h2o_e <=> h2o_p 	 H2O transport via diffusion (extracellular to periplasm)
PIuabcpp: atp_c + h2o_c + pi_p --> adp_c + h_c + 2.0 pi_c 	 Phosphate transport via ABC system (uptake, periplasm)
G1PPpp: g1p_p + h2o_p --> glc__D_p + pi_p 	 Glucose-1-phosphatase
G1Ptex: g1p_e <=> g1p_p 	 D-glucose 1-phosphate transport via diffusion
GLCtex_copy1: glc__D_e <=> glc__D_p 	 Glucose transport via diffusion (extracellular to periplasm)


----
 EX_g6p_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpumc91o7c.lp
Reading time = 0.01 seconds
: 1942 rows, 4472 columns, 18090 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpta9n1xvt.lp
Reading time = 0.01 seconds
: 1805 rows, 5166 columns, 20366 nonzeros


Ignoring reaction 'PItex' since it already exists.


gapfiller initialized


Ignoring reaction 'PItex' since it already exists.


gapfiller succeeded
PItex: pi_e <=> pi_p 	 Phosphate transport via diffusion (extracellular to periplasm)
G6Pt6_2pp: g6p_p + 2.0 pi_c --> g6p_c + 2.0 pi_p 	 Glucose-6-phosphate transport via phosphate antiport (periplasm)
G6Ptex: g6p_e <=> g6p_p 	 Glucose 6-phosphate transport via diffusion (extracellular to periplasm)


----
 EX_gam6p_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp54hucafj.lp
Reading time = 0.01 seconds
: 1943 rows, 4476 columns, 18102 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp1ob9xkda.lp
Reading time = 0.01 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized


Ignoring reaction 'G6PDA' since it already exists.
Ignoring reaction 'PIuabcpp' since it already exists.
Ignoring reaction 'G6PDA' since it already exists.
Ignoring reaction 'PIuabcpp' since it already exists.


gapfiller succeeded
G6PDA: gam6p_c + h2o_c --> f6p_c + nh4_c 	 Glucosamine-6-phosphate deaminase
PIuabcpp: atp_c + h2o_c + pi_p --> adp_c + h_c + 2.0 pi_c 	 Phosphate transport via ABC system (uptake, periplasm)
GAM6Pt6_2pp: gam6p_p + 2.0 pi_c --> gam6p_c + 2.0 pi_p 	 D-Glucosamine 6-phosphate transport via phosphate antiport (periplasm)
GAMAN6Ptex: gam6p_e <=> gam6p_p 	 D-glucosamine 6-phosphate transport via diffusion (extracellular to periplasm)


----
 EX_cmp_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp_go7fzjm.lp
Reading time = 0.01 seconds
: 1944 rows, 4480 columns, 18114 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmp2ask2bm_.lp
Reading time = 0.02 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized


Ignoring reaction 'H2Otex' since it already exists.
Ignoring reaction 'PItex' since it already exists.
Ignoring reaction 'H2Otex' since it already exists.
Ignoring reaction 'PItex' since it already exists.


gapfiller succeeded
H2Otex: h2o_e <=> h2o_p 	 H2O transport via diffusion (extracellular to periplasm)
PItex: pi_e <=> pi_p 	 Phosphate transport via diffusion (extracellular to periplasm)
CMPtex: cmp_e <=> cmp_p 	 CMP transport via diffusion (extracellular to periplasm)
CYTDtex: cytd_e <=> cytd_p 	 Cytidine transport via diffusion (extracellular to periplasm)
NTD4pp: cmp_p + h2o_p --> cytd_p + pi_p 	 5'-nucleotidase (CMP)


----
 EX_man6p_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpsp_1eb9p.lp
Reading time = 0.01 seconds
: 1946 rows, 4486 columns, 18130 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpg5gc8ddp.lp
Reading time = 0.02 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized


Ignoring reaction 'PItex' since it already exists.
Ignoring reaction 'PItex' since it already exists.


gapfiller succeeded
PItex: pi_e <=> pi_p 	 Phosphate transport via diffusion (extracellular to periplasm)
MAN6Pt6_2pp: man6p_p + 2.0 pi_c --> man6p_c + 2.0 pi_p 	 Mannose-6-phosphate transport via phosphate antiport (periplasm)
MAN6Ptex: man6p_e <=> man6p_p 	 Mannose 6-phosphate transport via diffusion (extracellular to periplasm)


----
 EX_ump_e
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpftu7rvdn.lp
Reading time = 0.01 seconds
: 1947 rows, 4490 columns, 18142 nonzeros
Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpxessd4t9.lp
Reading time = 0.01 seconds
: 1805 rows, 5166 columns, 20366 nonzeros
gapfiller initialized


Ignoring reaction 'H2Otpp' since it already exists.
Ignoring reaction 'PIuabcpp' since it already exists.
Ignoring reaction 'H2Otpp' since it already exists.
Ignoring reaction 'PIuabcpp' since it already exists.


gapfiller succeeded
H2Otpp: h2o_p <=> h2o_c 	 H2O transport via diffusion (periplasm)
PIuabcpp: atp_c + h2o_c + pi_p --> adp_c + h_c + 2.0 pi_c 	 Phosphate transport via ABC system (uptake, periplasm)
NTD2pp: h2o_p + ump_p --> pi_p + uri_p 	 5'-nucleotidase (UMP)
URItex: uri_e <=> uri_p 	 Uridine transport via diffusion (extracellular to periplasm)
UMPtex: ump_e <=> ump_p 	 UMP transport via diffusion (extracellular to periplasm)


Sulfur 




In [78]:
rxns_neededToFix_noGrowth_02.T

Unnamed: 0,0,1,2,3,4
0,ATPS4rpp,((b3736 and b3737 and b3738) and (b3731 and b3...,adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0...,ATP synthase (four protons for one ATP) (perip...,EX_succ_e
1,SUCCt2_2pp,b3528,2.0 h_p + succ_p --> 2.0 h_c + succ_c,Succinate transport via proton symport (2 H) (...,EX_succ_e
2,SUCCtex,b0241 or b0929 or b1377 or b2215,succ_e <=> succ_p,Succinate transport via diffusion (extracellul...,EX_succ_e
3,ATPS4rpp,((b3736 and b3737 and b3738) and (b3731 and b3...,adp_c + 4.0 h_p + pi_c <=> atp_c + h2o_c + 3.0...,ATP synthase (four protons for one ATP) (perip...,EX_akg_e
4,AKGt2rpp,b2587,akg_p + h_p <=> akg_c + h_c,2-oxoglutarate reversible transport via sympor...,EX_akg_e
...,...,...,...,...,...
106,H2Otpp,s0001 or b0875,h2o_p <=> h2o_c,H2O transport via diffusion (periplasm),EX_ump_e
107,PIuabcpp,b3726 and b3725 and b3727 and b3728,atp_c + h2o_c + pi_p --> adp_c + h_c + 2.0 pi_c,"Phosphate transport via ABC system (uptake, pe...",EX_ump_e
108,NTD2pp,b0480 or b4055,h2o_p + ump_p --> pi_p + uri_p,5'-nucleotidase (UMP),EX_ump_e
109,URItex,b0411,uri_e <=> uri_p,Uridine transport via diffusion (extracellular...,EX_ump_e


In [79]:
rxns_neededToFix_noGrowth_02.T.to_csv('annotationsNeededToFixGrowthSituations_eco_forPUB.csv')

## Re-run model predictions with the added reactions to see how well the model predict with the gapfilled model.

In [80]:
Biolog_Prediction = core.PerformFBAPredictions(model,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model)


In [81]:
# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))

35 57 22 99 213


In [82]:
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,35,57
Experiment,Growth,22,99


Obtain performance metrics.

In [83]:
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
# Matthew's
MCC = matthews_corrcoef(y_data, y_pred)
print('Recall:', TPR.round(3))
print('Precision:', PPV.round(3))
print('Accuracy:', ACC.round(3))
print('Matthew\'s correlation:', MCC.round(3))

Recall: 0.818
Precision: 0.635
Accuracy: 0.629
Matthew's correlation: 0.222
