# testingGapReactions

This notebook refines the *L. starkeyi* (lst) genome-scale metabolic model (GSM) using Biolog experimental data, which measure metabolic activity on various carbon, nitrogen, phosphorus, and sulfur substrates. Following the gapfilling performed in the Round 4a notebook, we determine which reactions can be added to the *Lipomyces starkeyi* draft GSM.



In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
from matplotlib import colors
import numpy as np
import pandas as pd
import cobra
import seaborn as sns
from sklearn.metrics import confusion_matrix, matthews_corrcoef
import core

from cobra.flux_analysis import gapfill

In [2]:
# load cobra LST model. 
# the v0.1 model has only been gap filled to enable growth. It includes reactions that were annotated in the sceModel but not the Lst model 
model = cobra.io.load_json_model("../models/lst_v0_3b_forPub.json")

Set parameter TokenServer to value "leghorn.emsl.pnl.gov"


In [3]:

rto = cobra.io.load_json_model('../models/Rt_IFO0880.json')
sce = cobra.io.load_json_model('../models/iMM904.json')
ylip = cobra.io.load_matlab_model('../models/twoModels/iYLI647_corr.mat')
eco  = cobra.io.load_json_model('../models/iJO1366.json')
yeast8 = cobra.io.load_matlab_model('../models/yeast8_modifiedwBIGGnames.mat')


This model seems to have confidenceScores instead of rxnConfidenceScores field. Will use confidenceScores for what rxnConfidenceScores represents.
This model seems to have metCharge instead of metCharges field. Will use metCharge for what metCharges represents.
No defined compartments in model model. Compartments will be deduced heuristically using regular expressions.
Using regular expression found the following compartments:c, e, g, m, n, r, v, x


In [4]:
# adjust iYLI647 parameters for consistent annotation. 
for m in ylip.metabolites:
    if ('[' in m.id):
        ylip.metabolites.get_by_id(m.id).id = m.id.replace('[','_').replace(']','')
    if ('_L_' in m.id):
        ylip.metabolites.get_by_id(m.id).id = m.id.replace('_L_','__L_')
    if ('_D_' in m.id):
        ylip.metabolites.get_by_id(m.id).id = m.id.replace('_D_','__D_')
for r in ylip.reactions:
    if ('(e)' in r.id):
        ylip.reactions.get_by_id(r.id).id = r.id.replace('(e)','_e')
    if(('_L_') in r.id):
        ylip.reactions.get_by_id(r.id).id = r.id.replace('_L_','__L_')

    if(('_D_') in r.id):
        ylip.reactions.get_by_id(r.id).id = r.id.replace('_D_','__D_')
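
The renaming above can be checked on plain strings before touching the model. The sketch below is a minimal illustration and not part of the original notebook: the helper names and the example IDs (`ala_L[c]`, `EX_glc_D(e)`) are assumptions chosen to show the substitutions.

```python
def normalize_metabolite_id(met_id: str) -> str:
    """Apply the same substitutions as the metabolite loop above to one ID."""
    # Move the compartment out of square brackets: 'ala_L[c]' -> 'ala_L_c'.
    met_id = met_id.replace('[', '_').replace(']', '')
    # Mark stereoisomers with a double underscore: '_L_' -> '__L_', '_D_' -> '__D_'.
    return met_id.replace('_L_', '__L_').replace('_D_', '__D_')


def normalize_reaction_id(rxn_id: str) -> str:
    """Apply the same substitutions as the reaction loop above to one ID."""
    # Rewrite the extracellular tag: '(e)' -> '_e'.
    rxn_id = rxn_id.replace('(e)', '_e')
    return rxn_id.replace('_L_', '__L_').replace('_D_', '__D_')


# Hypothetical example IDs, not taken from iYLI647.
print(normalize_metabolite_id('ala_L[c]'))   # ala__L_c
print(normalize_reaction_id('EX_glc_D(e)'))  # EX_glc__D_e
```

These substitutions are not idempotent: an ID that already contains `__L_` still matches `_L_` and would gain a third underscore, so the renaming cell should be run only once per loaded model.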


## Biolog Assay - Data loading and prep.

The Biolog data were collected on **Insert plate reader name here**. Absorbance was read at two wavelengths: 590 nm, which reports metabolic activity (reduction of the tetrazolium dye), and 750 nm, which reports biomass growth (turbidity).


In [5]:
# well names for each plate. 
well_rows = 'ABCDEFGH'
well_columns = ['1','2','3','4','5','6','7','8','9','10','11','12']
well_names = [f'{row}{column}' for row in well_rows  for column in well_columns ]

In [6]:
# read in the data. 
Biolog = pd.read_csv('../data/round2_2023_dataForSim.csv',
                     usecols=list(range(0,7)))

# create the index out of the plate (PM1-4) and the well number.
Biolog.index = Biolog['PlateType']+'_'+Biolog['Well']
Biolog.head()

Unnamed: 0,PlateType,Experiment,Well,Row,Column,Compound,SigGrowth?
PM1_A1,PM1,Carbon,A1,1,1,Negative Control,False
PM1_A2,PM1,Carbon,A2,1,2,L-Arabinose,False
PM1_A3,PM1,Carbon,A3,1,3,N-Acetyl-DGlucosamine,False
PM1_A4,PM1,Carbon,A4,1,4,D-Saccharic Acid,False
PM1_A5,PM1,Carbon,A5,1,5,Succinic Acid,True


In [7]:
Biolog[Biolog['Compound']=='Negative Control']

Unnamed: 0,PlateType,Experiment,Well,Row,Column,Compound,SigGrowth?
PM1_A1,PM1,Carbon,A1,1,1,Negative Control,False
PM2_A1,PM2,Carbon,A1,1,1,Negative Control,False
PM3B_A1,PM3B,Nitrogen,A1,1,1,Negative Control,False
PM4A_A1,PM4A,Phosphorus,A1,1,1,Negative Control,False
PM4A_F1,PM4A,Sulfur,F1,6,1,Negative Control,False


In [8]:
Biolog.loc['PM1_B3','SigGrowth?']=True

In [9]:
Biolog.loc['PM2_A9','SigGrowth?']=True

In [10]:
Biolog.loc['PM1_F2','SigGrowth?']=True

# Correlate Biolog chemicals being assessed with their corresponding metabolites in the lst GSM. 

To test the GSM predictions, we take each Biolog compound (e.g., succinate) and test whether the model can grow on it as the corresponding nutrient source (e.g., as a carbon source). To do so, we need the ID of the uptake reaction in the GSM. Because the lst model was built using rto as a scaffold, the two models share reaction names and conventions. We can therefore load this mapping from the supporting information of [Joonhoon's paper](https://www.frontiersin.org/articles/10.3389/fbioe.2020.612832/full).

#### Load in the metadata about each biolog experiment, including the model metabolite names needed for growth. 


In [11]:
# read in the media names. 
Biolog_Media = pd.read_csv('../data/biolog_medium_csv.csv',
                           usecols=list(range(0,6)),index_col=0)
Biolog_Media = Biolog_Media.loc[Biolog_Media[['Carbon','Nitrogen','Phosphorus','Sulfur']].dropna(axis=0, how='all').index]
Biolog_Media.head()

Unnamed: 0_level_0,Source,Carbon,Nitrogen,Phosphorus,Sulfur
biolog,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
PM1_A2,Carbon,EX_arab__L_e,EX_nh4_e,EX_pi_e,EX_so4_e
PM1_A3,Carbon,EX_acgam_e,EX_nh4_e,EX_pi_e,EX_so4_e
PM1_A4,Carbon,EX_glcr_e,EX_nh4_e,EX_pi_e,EX_so4_e
PM1_A5,Carbon,EX_succ_e,EX_nh4_e,EX_pi_e,EX_so4_e
PM1_A6,Carbon,EX_gal_e,EX_nh4_e,EX_pi_e,EX_so4_e


We now generate a mapping between metabolites and carbon source.

In [12]:
print('Biolog assayed {} metabolites'.format(len(Biolog)))
print('GSM has {} metabolites that are a part of the model'.format(len(Biolog_Media)))
print('A total coverage of {:.1f}%'.format(len(Biolog_Media)/len(Biolog)*100))

Biolog assayed 384 metabolites
GSM has 285 metabolites that are a part of the model
A total coverage of 74.2%


### Preparing for model predictions. 

Create a dataframe to store our information on predictions. 

In [13]:
# create a dataframe to hold our information. 
Biolog_in_model = pd.DataFrame(columns=['Biolog','Model','Exchange','Metabolite',
                                        'Internal','External','SigGrowth?'])

# across the biolog data. 
for i, row in Biolog.iterrows():
    
    # grab the corresponding metabolic reaction. 
    if i in Biolog_Media.index:   
        
        # obtain the biolog medium.
        x = Biolog_Media.loc[i]
        
        # obtain the exchange reaction corresponding to the medium.
        Biolog_in_model.loc[i,'Exchange'] = x[x['Source']]
        
        # grab the metabolite name (i.e., get rid of the EX_ part of reaction).
        Biolog_in_model.loc[i,'Metabolite'] = x[x['Source']].replace('EX_','')
        
        # perform further splitting if there are more than one source in the biolog assay.
        if ',' in Biolog_in_model.loc[i,'Metabolite']:
                  
            # determine the name of each external metabolite in the model; if it is absent, look for the same metabolite in another compartment by swapping the '_e' suffix.
            Biolog_in_model.at[i,'Model'] = [model.metabolites.get_by_id(x).name if x in model.metabolites \
                else next((model.metabolites.get_by_id(x.rsplit('_',1)[0]+'_'+c).name for c in model.compartments \
                    if x.rsplit('_',1)[0]+'_'+c in model.metabolites), None) \
                for x in Biolog_in_model.loc[i,'Metabolite'].split(',')]
            
            # record whether every metabolite exists in at least one model compartment.
            Biolog_in_model.loc[i,'Internal'] = all(any(x.rsplit('_',1)[0]+'_'+c in model.metabolites
                                                        for c in model.compartments) \
                                                     for x in Biolog_in_model.loc[i,'Metabolite'].split(','))
            
            # record whether every external metabolite (with the '_e' suffix) exists in the model.
            Biolog_in_model.loc[i,'External'] = all(x in model.metabolites \
                                                    for x in Biolog_in_model.loc[i,'Metabolite'].split(','))
        
        # if there is only one source in the biolog assay. 
        else:
            # determine the name of the external metabolite in the model; if it is absent, look for the same metabolite in another compartment by swapping the '_e' suffix.
            Biolog_in_model.loc[i,'Model'] = model.metabolites.get_by_id(Biolog_in_model.loc[i,'Metabolite']).name \
                if Biolog_in_model.loc[i,'Metabolite'] in model.metabolites \
                else next((model.metabolites.get_by_id(Biolog_in_model.loc[i,'Metabolite'].rsplit('_',1)[0]+'_'+c).name \
                    for c in model.compartments 
                          if Biolog_in_model.loc[i,'Metabolite'].rsplit('_',1)[0]+'_'+c in model.metabolites), None)
            
            # record whether the metabolite exists in at least one model compartment.
            Biolog_in_model.loc[i,'Internal'] = any(Biolog_in_model.loc[i,'Metabolite'].rsplit('_',1)[0]+'_'+c
                                                    in model.metabolites for c in model.compartments)
            
            # record whether the external metabolite (with the '_e' suffix) exists in the model.
            Biolog_in_model.loc[i,'External'] = Biolog_in_model.loc[i,'Metabolite'] in model.metabolites
    
    # case where the model has no uptake reaction or metabolite for the source tested in Biolog (e.g., Tween 80).
    else:
        Biolog_in_model.loc[i] = None
    
    # store the growth data from the biolog assay in the storage dataframe. 
    Biolog_in_model.loc[i,'Biolog'] = row['Compound']
#     Biolog_in_model.loc[i,'Average'] = row['Average']
    Biolog_in_model.loc[i,'SigGrowth?'] = row['SigGrowth?']

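
The compartment lookup in the loop above relies on rebuilding an ID as base name, underscore, compartment letter. The sketch below isolates that step using a plain set of IDs instead of a COBRA model; the helper names and the toy IDs (`metA_e`, `metB_c`, ...) are hypothetical.

```python
def compartment_variants(met_id, compartments):
    """List the IDs obtained by swapping the compartment suffix of met_id."""
    base = met_id.rsplit('_', 1)[0]  # 'arab__L_e' -> 'arab__L'
    return [f'{base}_{c}' for c in compartments]


def first_known_variant(met_id, known_ids, compartments):
    """Return met_id if known, else its first known compartment variant, else None."""
    if met_id in known_ids:
        return met_id
    return next((v for v in compartment_variants(met_id, compartments)
                 if v in known_ids), None)


# Hypothetical toy "model": metA exists in e and c, metB only in c.
known_ids = {'metA_e', 'metA_c', 'metB_c'}
compartments = ['c', 'e', 'm']

print(first_known_variant('metA_e', known_ids, compartments))  # metA_e
print(first_known_variant('metB_e', known_ids, compartments))  # metB_c
print(first_known_variant('metC_e', known_ids, compartments))  # None
```

Building the candidate ID in one place keeps the separator consistent between the value that is looked up and the condition that guards the lookup.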

In [14]:
Biolog_in_model.head(50)

Unnamed: 0,Biolog,Model,Exchange,Metabolite,Internal,External,SigGrowth?
PM1_A1,Negative Control,,,,,,False
PM1_A2,L-Arabinose,L-Arabinose,EX_arab__L_e,arab__L_e,True,True,False
PM1_A3,N-Acetyl-DGlucosamine,N-Acetyl-D-glucosamine,EX_acgam_e,acgam_e,True,True,False
PM1_A4,D-Saccharic Acid,,EX_glcr_e,glcr_e,False,False,False
PM1_A5,Succinic Acid,Succinate,EX_succ_e,succ_e,True,True,True
PM1_A6,D-Galactose,D-Galactose,EX_gal_e,gal_e,True,True,True
PM1_A7,L-Aspartic Acid,L-Aspartate,EX_asp__L_e,asp__L_e,True,True,False
PM1_A8,L-Proline,L-Proline,EX_pro__L_e,pro__L_e,True,True,False
PM1_A9,D-Alanine,D-Alanine,EX_ala__D_e,ala__D_e,True,True,False
PM1_A10,D-Trehalose,Trehalose,EX_tre_e,tre_e,True,True,False


# Running GSM predictions.

Create a dataframe to hold our GSM prediction results.

In [15]:
Biolog_Prediction = pd.DataFrame(columns=['PlateType','Experiment','Row','Column',
                                          'Data','Data_TF', 'Prediction','Prediction_TF'])

### Perform predictions.

For each Biolog condition, the uptake rates of the model's carbon, nitrogen, phosphorus, and sulfur sources are set to zero unless the source is part of the medium being tested.


In [16]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)
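
The internals of `core.PerformFBAPredictions` are not shown in this notebook. As a rough sketch of the idea described above, the snippet below builds exchange lower bounds for one medium and applies the growth threshold. The function names, the list of candidate sources, and the uptake rate of 10 are all assumptions; the only convention relied on is that a negative lower bound on an exchange reaction allows uptake.

```python
def medium_lower_bounds(all_source_exchanges, medium_row, uptake_rate=10.0):
    """Return {exchange_id: lower_bound}, allowing uptake only for the medium."""
    in_medium = {medium_row[k] for k in ('Carbon', 'Nitrogen', 'Phosphorus', 'Sulfur')}
    # Negative lower bound = uptake allowed; 0.0 = source switched off.
    return {ex: (-uptake_rate if ex in in_medium else 0.0)
            for ex in all_source_exchanges}


def predicts_growth(growth_rate, threshold=1e-3):
    """Classify a simulated growth rate as growth (True) or no growth (False)."""
    return growth_rate > threshold


# Hypothetical list of candidate sources; the medium is the PM1_A5 row.
all_sources = ['EX_succ_e', 'EX_glc__D_e', 'EX_nh4_e', 'EX_pi_e', 'EX_so4_e']
medium = {'Carbon': 'EX_succ_e', 'Nitrogen': 'EX_nh4_e',
          'Phosphorus': 'EX_pi_e', 'Sulfur': 'EX_so4_e'}

bounds = medium_lower_bounds(all_sources, medium)
print(bounds['EX_succ_e'], bounds['EX_glc__D_e'])  # -10.0 0.0
print(predicts_growth(0.0889), predicts_growth(1e-6))  # True False
```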


In [17]:

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))

52 38 56 67 213
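
The four printed numbers follow the order TN, FP, FN, TP, which is what `confusion_matrix(...).ravel()` returns for binary labels. The sketch below counts the same four cells in plain Python; the label vectors are reconstructed from the printed counts rather than taken from the actual data.

```python
def confusion_counts(y_true, y_pred):
    """Count (TN, FP, FN, TP) for two equal-length sequences of 0/1 labels."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1      # grew experimentally, growth predicted
        elif truth:
            fn += 1      # grew experimentally, no growth predicted
        elif pred:
            fp += 1      # no growth experimentally, growth predicted
        else:
            tn += 1      # no growth in either
    return tn, fp, fn, tp


# Label vectors rebuilt from the counts printed above (52, 38, 56, 67).
y_true = [0] * 52 + [0] * 38 + [1] * 56 + [1] * 67
y_pred = [0] * 52 + [1] * 38 + [0] * 56 + [1] * 67

print(confusion_counts(y_true, y_pred))  # (52, 38, 56, 67)
```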


In [18]:

df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,56,67


In [19]:
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
# Matthews correlation coefficient (MCC)
MCC = matthews_corrcoef(y_data, y_pred)
print('Recall:', TPR.round(3))
print('Precision:', PPV.round(3))
print('Accuracy:', ACC.round(3))
print('Matthew\'s correlation:', MCC.round(3))

Recall: 0.545
Precision: 0.638
Accuracy: 0.559
Matthew's correlation: 0.121
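
The reported values can be reproduced from the four confusion counts alone. The sketch below uses the standard closed form of the Matthews correlation coefficient, $\mathrm{MCC} = (TP \cdot TN - FP \cdot FN) / \sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}$; the function name is an assumption, and returning 0 for a zero denominator is a convention chosen here.

```python
import math


def binary_metrics(tn, fp, fn, tp):
    """Recall, precision, accuracy, and MCC from the four confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        'recall': tp / (tp + fn),
        'precision': tp / (tp + fp),
        'accuracy': (tp + tn) / (tn + fp + fn + tp),
        # Convention used here: MCC is 0 when any marginal total is empty.
        'mcc': (tp * tn - fp * fn) / denom if denom else 0.0,
    }


baseline = binary_metrics(tn=52, fp=38, fn=56, tp=67)
print({name: round(value, 3) for name, value in baseline.items()})
# {'recall': 0.545, 'precision': 0.638, 'accuracy': 0.559, 'mcc': 0.121}
```

An MCC of 0.121 is only slightly above the value of 0 expected from random predictions, which is the motivation for testing additional reactions.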


In [20]:
model_reactions_added = model.copy()

Read LP format model from file /var/folders/kn/zzns_smn1q79xdnf1tgqk__r0000gn/T/tmpjf2x362w.lp
Reading time = 0.01 seconds
: 1908 rows, 4346 columns, 17652 nonzeros


In [21]:
model_reactions_added

0,1
Name,Lst
Memory address,29e5c6850
Number of metabolites,1908
Number of reactions,2173
Number of genes,996
Number of groups,0
Objective expression,1.0*BIOMASS_Ls - 1.0*BIOMASS_Ls_reverse_27d85
Compartments,"c, m, r, e, x, v, n, g, d"


In [22]:
r = yeast8.reactions.get_by_id('MALt').copy()
model_reactions_added.add_reactions([r])

In [23]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

52 38 55 68 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,55,68


In [24]:
r = yeast8.reactions.get_by_id('SUCCt').copy()
model_reactions_added.add_reactions([r])

In [25]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

52 38 54 69 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,54,69


In [26]:
r = yeast8.reactions.get_by_id('FUMtr').copy()
model_reactions_added.add_reactions([r])

In [27]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

52 38 53 70 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,53,70


In [28]:
r = sce.reactions.get_by_id('AKGt2r').copy()
model_reactions_added.add_reactions([r])

In [29]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

52 38 52 71 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,52,71


In [30]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

52 38 52 71 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,52,71


In [31]:
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
# Matthews correlation coefficient (MCC)
MCC = matthews_corrcoef(y_data, y_pred)
print('Recall:', TPR.round(3))
print('Precision:', PPV.round(3))
print('Accuracy:', ACC.round(3))
print('Matthew\'s correlation:', MCC.round(3))

Recall: 0.577
Precision: 0.651
Accuracy: 0.577
Matthew's correlation: 0.153


In [32]:
model_reactions_added.optimize().objective_value

0.08888212291591462

In [33]:
model.optimize().objective_value

0.08888212290636534
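
The two objective values differ only around the eleventh decimal place, which is the kind of difference solver numerics can produce. A tolerance-based comparison makes that explicit; the relative tolerance of `1e-6` below is an assumption, not a setting taken from the notebook.

```python
import math

# Objective values printed above for the original and the expanded model.
growth_original = 0.08888212290636534
growth_expanded = 0.08888212291591462

difference = abs(growth_expanded - growth_original)
same_growth = math.isclose(growth_original, growth_expanded, rel_tol=1e-6)

print(f'{difference:.1e}', same_growth)
```

Under this tolerance the added transport reactions leave the predicted growth rate on the default medium unchanged, so their effect shows up only in the Biolog conditions.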

In [34]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

52 38 52 71 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,52,71


In [35]:
# r = sce.reactions.get_by_id('PACALDt').copy()
# model_reactions_added.add_reactions([r])

In [36]:
model_reactions_added.optimize().objective_value

0.08888212291591348

In [37]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

52 38 52 71 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,52,71


In [38]:
model_reactions_added.genes.Lipst1_1_55185

0,1
Gene identifier,Lipst1_1_55185
Name,Lipst1_1_55185
Memory address,0x29dad4110
Functional,True
In 2 reaction(s),"H2CO3D, HCO3E"


In [39]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

52 38 52 71 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,52,38
Experiment,Growth,52,71


In [40]:
model_reactions_added.optimize().objective_value

0.08888212291591388

In [41]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('NTD9').copy()
r.gene_reaction_rule = 'Lipst1_1_4357'
model_reactions_added.add_reactions([r])


In [42]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 49 74 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,49,74
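
Comparing the counts before and after adding `NTD9` shows that this is the first addition in the notebook that also changes the no-growth row. The sketch below computes the per-cell change from the two printed count tuples; the helper name is an assumption.

```python
def count_deltas(before, after):
    """Per-cell change between two (TN, FP, FN, TP) tuples."""
    labels = ('TN', 'FP', 'FN', 'TP')
    return {label: new - old for label, old, new in zip(labels, before, after)}


before_ntd9 = (52, 38, 52, 71)  # counts printed before adding NTD9
after_ntd9 = (51, 39, 49, 74)   # counts printed after adding NTD9

deltas = count_deltas(before_ntd9, after_ntd9)
print(deltas)  # {'TN': -1, 'FP': 1, 'FN': -3, 'TP': 3}
```

Three false negatives become true positives while one true negative becomes a false positive, a net gain of two correctly predicted conditions out of 213.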


In [43]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('NTD7e').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])


In [44]:
model_reactions_added.optimize().objective_value

0.08888212291591359

In [45]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 48 75 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,48,75


In [46]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('G2PPe').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])


In [47]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 47 76 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,47,76


In [48]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('2PGPe').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])


In [49]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 46 77 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,46,77


In [50]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('3PGPe').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])


In [51]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 45 78 213


Unnamed: 0_level_0,Unnamed: 1_level_0,Prediction,Prediction
Unnamed: 0_level_1,Unnamed: 1_level_1,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,45,78


In [52]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('NTD9e').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])


In [53]:
# threshold for being able to grow.
threshold = 1e-3

# run the FBA growth predictions for every Biolog condition.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 44 79 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,44,79


In [54]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('PEPPe').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])


In [55]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 43 80 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,43,80


In [56]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('GAM6PPe').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])

In [57]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 42 81 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,42,81


In [58]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('GNPe').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])

In [59]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 41 82 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,41,82


In [60]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('NTD4e').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])

In [61]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 40 83 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,40,83


In [62]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('MAN1PPe').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])

In [63]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 39 84 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,39,84


In [64]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('MAN6PPe').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])

In [65]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 38 85 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,38,85


In [66]:
# gene may not be targeting mitochondria.
r = rto.reactions.get_by_id('NTD2e').copy()
r.gene_reaction_rule = 'Lipst1_1_2594'
model_reactions_added.add_reactions([r])

In [67]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 37 86 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,37,86


In [68]:
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
# Matthews correlation coefficient (MCC)
MCC = matthews_corrcoef(y_data, y_pred)
print('Recall:', TPR.round(3))
print('Precision:', PPV.round(3))
print('Accuracy:', ACC.round(3))
print('Matthew\'s correlation:', MCC.round(3))

Recall: 0.699
Precision: 0.688
Accuracy: 0.643
Matthew's correlation: 0.267
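
For reference, the Matthews correlation printed above can be reproduced by hand from the four counts of the most recent confusion matrix (TN = 51, FP = 39, FN = 37, TP = 86). The final value is rounded to three decimals:

$$
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
= \frac{86 \cdot 51 - 39 \cdot 37}{\sqrt{125 \cdot 123 \cdot 90 \cdot 88}}
= \frac{2943}{\sqrt{121\,770\,000}} \approx 0.267
$$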


In [69]:
model_reactions_added.optimize().objective_value

0.08888212291591358

In [70]:
# # gene may not be targeting mitochondria.
# r = yeast8.reactions.get_by_id('AMPt6').copy()
# # r.gene_reaction_rule = 'Lipst1_1_2594'
# model_reactions_added.add_reactions([r])

In [71]:
# model_reactions_added.reactions.AMPt6

In [72]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 37 86 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,37,86


In [73]:
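# collect gene IDs that do not follow the L. starkeyi 'Lipst1' naming scheme.
genes = [g.id for g in model.genes]
wrong = [g for g in genes if 'Lipst1' not in g]
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model.genes.get_by_id(g).reactions
    if not reac:
        # orphan gene: no reaction references it, so drop it from the model.
        print(f'{g} has no associated reaction')
        cobra.manipulation.delete.remove_genes(model,[g])
    for rx in reac:
        if rx:
            # record each non-Lst gene together with the reaction that still uses it.
            wrong_g2.append(g)
            wrong_rx2.append(rx)
# number of distinct non-Lst genes and of distinct reactions that reference them.
print(len(set(wrong_g2)))
print(len(set(wrong_rx2)))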

1
1
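
The check above splits the genes that lack the `Lipst1` prefix into two groups: orphans that can simply be deleted, and genes still referenced by a reaction, which need a manual fix. The same partition is sketched below on a toy mapping, without COBRApy. The dictionary `gene_to_reactions`, the helper `partition_foreign_genes`, and the gene `GENE_X` are hypothetical stand-ins for `model.genes`; only the prefix convention and the IDs that appear elsewhere in this notebook come from the source.

```python
# Toy stand-in for model.genes: gene ID -> IDs of reactions whose GPR uses it.
# The mapping is illustrative; it is not read from the actual model.
gene_to_reactions = {
    "Lipst1_1_2594": ["3PGPe", "NTD9e"],
    "9912": ["FA220tp"],
    "GENE_X": [],  # hypothetical orphan gene with no reactions
}


def partition_foreign_genes(gene_to_reactions, prefix="Lipst1"):
    """Split genes lacking `prefix` into orphans and genes still used by reactions."""
    orphans = []
    in_use = {}
    for gene, reactions in gene_to_reactions.items():
        if prefix in gene:
            continue  # native gene, nothing to do
        if reactions:
            in_use[gene] = list(reactions)
        else:
            orphans.append(gene)
    return orphans, in_use


orphans, in_use = partition_foreign_genes(gene_to_reactions)
print("orphans:", orphans)
print("still referenced:", in_use)
```

Genes in the second group cannot be removed blindly, because removing a gene can also affect the reactions that depend on it.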


#### Glycogen is a carbon source that should enable growth.

In [74]:
for rx in model.metabolites.glycogen_e.reactions:
    print(rx,rx.bounds)

EX_glycogen_e: glycogen_e -->  (0.0, 1000.0)


In [75]:
for rx in rto.metabolites.glycogen_e.reactions:
    print(rx)


EX_glycogen_e: glycogen_e --> 


In [76]:
for rx in model.metabolites.glycogen_e.reactions:
    print(rx)

EX_glycogen_e: glycogen_e --> 


In [77]:
for rx in model.metabolites.glycogen_c.reactions:
    print(rx)

GLCP: glycogen_c + pi_c --> g1p_c
BIOMASS_Ls: 0.6542899047921068 13BDglcn_c + 0.12116467064111869 16BDglcn_c + 0.0008767124745935498 5mthf_c + 0.45552 alatrna_c + 0.141799 argtrna_c + 0.132431 asntrna_c + 0.204341 asptrna_c + 96.233892 atp_c + 0.0016522920994288411 btn_m + 0.0005685306148572357 ca2_c + 0.001224527478154046 camp_c + 0.021186648694289174 chitin_c + 1.1e-05 clpn_LS_m + 0.0005261641507693166 coa_c + 0.045479 ctp_c + 0.0003587482846154432 cu2_c + 0.040268 cystrna_c + 0.003483 datp_c + 0.003166 dctp_c + 0.00278 dgtp_c + 0.003587 dttp_c + 0.005033 ergst_r + 1e-05 ergstest_LS_r + 0.0005131808795165673 fad_c + 0.00040794804936270397 fe2_c + 0.00040794804936270397 fe3_c + 0.115471 glntrna_c + 0.199788 glutrna_c + 0.3540121239284534 glycogen_c + 0.442117 glytrna_c + 0.0013119937265936208 gthrd_c + 0.050529 gtp_c + 92.341901 h2o_c + 0.0004721810755605166 hemeA_m + 0.060743 histrna_c + 0.201923 iletrna_c + 0.4002284696078455 k_c + 0.293113 leutrna_c + 0.0021231065148574896 lipopb_m

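The model is missing a glycogen transport reaction: `glycogen_e` participates only in the exchange reaction `EX_glycogen_e`, so extracellular glycogen cannot reach `glycogen_c`.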

[BiGG](http://bigg.ucsd.edu/search?query=glycogen_c) lists only a limited set of glycogen reactions, including one [transport reaction](http://bigg.ucsd.edu/universal/reactions/GLYCOGENt), GLYCOGENt.

In [78]:
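# build the glycogen transport reaction (glycogen_e --> glycogen_c).
# cobra's default bounds (lower bound 0) make it irreversible.
r = cobra.Reaction("GLYCOGENt")
r.add_metabolites({
    model_reactions_added.metabolites.glycogen_e: -1,
    model_reactions_added.metabolites.glycogen_c: 1,
})
model_reactions_added.add_reactions([r])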

In [79]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 36 87 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,36,87


In [80]:
with model_reactions_added:
    model_reactions_added.reactions.EX_glc__D_e.bounds=0,0
    model_reactions_added.reactions.EX_glycogen_e.bounds=-10,-10
    print(model_reactions_added.optimize().objective_value)


0.9193531810600025


In [81]:
model_reactions_added.optimize().objective_value

0.08888212291591259

In [82]:
model_reactions_added.metabolites.cytd_e

Metabolite identifier,cytd_e
Name,Cytidine
Memory address,0x29d79ac10
Formula,C9H13N3O5
Compartment,e
In 3 reaction(s),"NTD4e, EX_cytd_e, CYTDt2"


In [83]:
yeast8.reactions.DTTPt

Reaction identifier,DTTPt
Name,dTTP uniport
Memory address,0x29d367dd0
Stoichiometry,dttp_e <=> dttp_c  dTTP <=> dTTP
GPR,
Lower bound,-1000.0
Upper bound,1000.0


Enable growth on cytidine.

In [84]:
# copy the dTTP uniport (dttp_e <=> dttp_c) from the yeast8 model.
r = yeast8.reactions.DTTPt.copy()
model_reactions_added.add_reactions([r])

In [85]:
# copy the matching dTTP exchange reaction from the yeast8 model.
r = yeast8.reactions.EX_dttp_e.copy()
model_reactions_added.add_reactions([r])

In [86]:
# threshold for being able to grow.
threshold = 1e-3

# remove the lower bound.
Biolog_Prediction = core.PerformFBAPredictions(model_reactions_added,Biolog,Biolog_Prediction,Biolog_Media,Biolog_in_model,threshold=threshold)

# determine biolog conditions that were able to be tested in the model. 
temp = Biolog_Prediction.index[~np.isnan((Biolog_Prediction.Prediction).astype(float))]

# experimental data.
y_data = Biolog_Prediction.Data_TF[temp].astype(int)

# model prediction.
y_pred = Biolog_Prediction.Prediction_TF[temp].astype(int)
TN, FP, FN, TP = confusion_matrix(y_data, y_pred).ravel()
print(TN, FP, FN, TP, sum([TN, FP, FN, TP]))
df_confusion = pd.DataFrame(confusion_matrix(y_data, y_pred),
                            index = pd.MultiIndex.from_product([['Experiment'],['No growth', 'Growth']]),
                            columns = pd.MultiIndex.from_product([['Prediction'],['No growth', 'Growth']]))
df_confusion

51 39 34 89 213


,,Prediction,Prediction
,,No growth,Growth
Experiment,No growth,51,39
Experiment,Growth,34,89


In [92]:
model_reactions_added.optimize().objective_value

0.08888212291591345

In [108]:
34+89

123

In [90]:
89/123

0.7235772357723578
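
The two cells above recompute recall by hand for the latest confusion matrix. The short sketch below applies the same arithmetic to every (FN, TP) pair printed earlier in this section, so the effect of each addition is visible in one place. The counts are transcribed from the printed outputs; the step labels are only descriptive, and `steps` and `recall` are names introduced here.

```python
# (label, FN, TP) transcribed from the printed confusion-matrix outputs.
# TN and FP stay at 51 and 39 throughout, so only FN and TP are tracked.
steps = [
    ("before 3PGPe", 46, 77),
    ("3PGPe", 45, 78),
    ("NTD9e", 44, 79),
    ("PEPPe", 43, 80),
    ("GAM6PPe", 42, 81),
    ("GNPe", 41, 82),
    ("NTD4e", 40, 83),
    ("MAN1PPe", 39, 84),
    ("MAN6PPe", 38, 85),
    ("NTD2e", 37, 86),
    ("GLYCOGENt", 36, 87),
    ("DTTPt + EX_dttp_e", 34, 89),
]


def recall(fn, tp):
    """True positive rate: fraction of experimental growth cases predicted to grow."""
    return tp / (tp + fn)


recalls = [(label, round(recall(fn, tp), 3)) for label, fn, tp in steps]
for label, value in recalls:
    print(f"{label:>18}: {value}")
```

Every row sums to the same 123 experimental growth conditions, so each converted false negative raises recall by 1/123.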

In [93]:
genes = model_reactions_added.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model_reactions_added.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model_reactions_added,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
print(len(set(wrong_g2)))
print(len(set(wrong_rx2)))

1
1


In [94]:
wrong_rx2

[<Reaction FA220tp at 0x29eb28310>]

In [89]:
wrong_g2

['9912']

In [90]:
# cobra.manipulation.delete.remove_genes(model_reactions_added,['9912'])

In [95]:
# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)
# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
# Matthews correlation coefficient (MCC)
MCC = matthews_corrcoef(y_data, y_pred)
print('Recall:', TPR.round(3))
print('Precision:', PPV.round(3))
print('Accuracy:', ACC.round(3))
print('Matthew\'s correlation:', MCC.round(3))

Recall: 0.724
Precision: 0.695
Accuracy: 0.657
Matthew's correlation: 0.293
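
As a cross-check, the same metrics can be recomputed from the four confusion counts alone, without `y_data` and `y_pred`. This is a minimal sketch using only the standard library; `metrics_from_counts` is a helper name introduced here and is not part of `core` or scikit-learn. The two sets of counts are the ones printed before and after the glycogen and dTTP additions.

```python
import math


def metrics_from_counts(tn, fp, fn, tp):
    """Recall, precision, accuracy and Matthews correlation from confusion counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0 when a row or column of the matrix is empty, to avoid dividing by zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "accuracy": accuracy, "mcc": mcc}


before = metrics_from_counts(tn=51, fp=39, fn=37, tp=86)
after = metrics_from_counts(tn=51, fp=39, fn=34, tp=89)
for name in before:
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f}")
```

Because TN and FP do not change between the two matrices, the gain in every metric comes from the three false negatives that became true positives.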


In [96]:
cobra.manipulation.delete.remove_genes(model_reactions_added,['9912'])

In [97]:
model.reactions.FA220tp

Reaction identifier,FA220tp
Name,Fatty acid peroxisomal transport
Memory address,0x298d3e9d0
Stoichiometry,"docosac_c --> docosac_x  Behenate, Docosanoate --> Behenate, Docosanoate"
GPR,9912
Lower bound,0.0
Upper bound,1000.0


In [98]:
model.reactions.FA220tp.gene_reaction_rule = ''
model.reactions.FA220tp

Reaction identifier,FA220tp
Name,Fatty acid peroxisomal transport
Memory address,0x298d3e9d0
Stoichiometry,"docosac_c --> docosac_x  Behenate, Docosanoate --> Behenate, Docosanoate"
GPR,
Lower bound,0.0
Upper bound,1000.0


In [99]:
model_reactions_added.objective='BIOMASS_Ls'
model_reactions_added.optimize()

,fluxes,reduced_costs
ALCD25yi,0.0,-0.001716
MTHFCm,0.0,-0.000000
AMPN,0.0,-0.004414
DAGCPTer_LS,0.0,0.000000
PYRt2,0.0,0.000000
...,...,...
MAN6PPe,0.0,0.000000
NTD2e,0.0,0.000000
GLYCOGENt,0.0,0.000000
DTTPt,0.0,0.000000


In [96]:
model_reactions_added.objective='TAGL_LS'
model_reactions_added.optimize()

,fluxes,reduced_costs
ALCD25yi,0.0,0.0
MTHFCm,0.0,-0.0
AMPN,0.0,0.0
DAGCPTer_LS,0.0,0.0
PYRt2,0.0,0.0
...,...,...
MAN6PPe,0.0,0.0
NTD2e,0.0,0.0
GLYCOGENt,0.0,0.0
DTTPt,0.0,-0.0


In [100]:
r = model.reactions.FA220tp.copy()
model.reactions.FA220tp.gene_reaction_rule = ''
model.reactions.FA220tp


Reaction identifier,FA220tp
Name,Fatty acid peroxisomal transport
Memory address,0x298d3e9d0
Stoichiometry,"docosac_c --> docosac_x  Behenate, Docosanoate --> Behenate, Docosanoate"
GPR,
Lower bound,0.0
Upper bound,1000.0


In [101]:
model_reactions_added.add_reactions([r])

In [102]:
model_reactions_added.objective='TAGL_LS'
model_reactions_added.optimize()

,fluxes,reduced_costs
ALCD25yi,0.000000,-8.881784e-16
MTHFCm,0.000000,-0.000000e+00
AMPN,0.000000,-2.266546e-01
DAGCPTer_LS,0.000000,-1.387779e-17
PYRt2,0.000000,0.000000e+00
...,...,...
NTD2e,0.000000,0.000000e+00
GLYCOGENt,0.000000,0.000000e+00
DTTPt,0.000000,0.000000e+00
EX_dttp_e,0.000000,-6.180115e+00


In [103]:
model_reactions_added.objective='BIOMASS_Ls'
model_reactions_added.optimize()

,fluxes,reduced_costs
ALCD25yi,0.0,-0.001716
MTHFCm,0.0,-0.000000
AMPN,0.0,-0.004414
DAGCPTer_LS,0.0,0.000000
PYRt2,0.0,0.000000
...,...,...
NTD2e,0.0,0.000000
GLYCOGENt,0.0,0.000000
DTTPt,0.0,0.000000
EX_dttp_e,0.0,-0.317042


In [104]:
genes = model_reactions_added.genes
genes = [g.id for g in genes]
wrong = [g for g in genes if 'Lipst1' not in g]
len(set(wrong))
wrong_g2 = []
wrong_rx2 = []
for g in wrong:
    reac = model_reactions_added.genes.get_by_id(g).reactions
    if not reac:
        print(f'{g} has no associated reaction')
        
        cobra.manipulation.delete.remove_genes(model_reactions_added,[g])
    for rx in reac:
        if rx:
#             print(rx,g)
            wrong_g2.append(g)
            wrong_rx2.append(rx)
print(len(set(wrong_g2)))
print(len(set(wrong_rx2)))

0
0


In [105]:
model_reactions_added.objective='TAGL_LS'
model_reactions_added.optimize()

,fluxes,reduced_costs
ALCD25yi,0.000000,1.776357e-15
MTHFCm,0.000000,-0.000000e+00
AMPN,0.000000,-2.266546e-01
DAGCPTer_LS,0.000000,-7.105427e-15
PYRt2,0.000000,0.000000e+00
...,...,...
NTD2e,0.000000,0.000000e+00
GLYCOGENt,0.000000,0.000000e+00
DTTPt,0.000000,0.000000e+00
EX_dttp_e,0.000000,-1.426413e+01


In [106]:
model_reactions_added.objective='BIOMASS_Ls'
model_reactions_added.optimize()

,fluxes,reduced_costs
ALCD25yi,0.0,-1.716392e-03
MTHFCm,0.0,-0.000000e+00
AMPN,0.0,-4.413579e-03
DAGCPTer_LS,0.0,-1.110223e-16
PYRt2,0.0,0.000000e+00
...,...,...
NTD2e,0.0,0.000000e+00
GLYCOGENt,0.0,0.000000e+00
DTTPt,0.0,0.000000e+00
EX_dttp_e,0.0,-3.170421e-01


In [107]:
cobra.io.save_json_model(model_reactions_added,'../models/Lst_v0_4_forPUB.json')