## **Notebook to make predictions with reprocessed Yoneda data and EFLUX2** 

By Christina Schenk and Garrett Roell

Tested on biodesign_3.7 kernel on jprime

### EFLUX2 predictions and evaluations
This notebook predicts fluxes for R. opacus cultures growing with glucose.

#### **Data Labels**:

#### Henson transcriptomics data: 
* WT 1.0 g/L Glucose (**WT-G**) (3 trials at 2 timepoints, 6 total trials)

#### Combined with Rhiannon 2018 metabolomics and OD data:
* Metabolomics and OD data for WT Glucose (**WT-G**)
                                               

### Method: 
<ol>
<li>Predict fluxes with EFLUX2</li>
<li>Compare predictions with 13CMFA: Scatter plots and flux maps</li>
<li>Load file with observed growth rates (Notebook E)</li>
<li>Save growth rate predictions to csv file</li>
</ol>



#### **Set up imports**

In [1]:
import os
import sys
import pandas as pd
import numpy as np
import cobra
import scipy.stats
#import cplex
%matplotlib inline

import matplotlib
from matplotlib import pyplot as plt
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
                                  AnnotationBbox)
from matplotlib.cbook import get_sample_data
import matplotlib.image as mpimg
import matplotlib.cm as cm

from edd_utils import login, export_study, export_metadata

from sklearn.metrics import r2_score

output_dir = '../plots/'
source_dir = '../src'
sys.path.append(source_dir)
from ensemblemethods import *


from plot import *
from utils import *

#### **Load Genome Scale Model**

In [2]:
model = cobra.io.read_sbml_model("../models/Ropacus_annotated.xml")

#### **Load Transcript Data**

In [3]:
transcript_df = pd.read_csv('../data/transcripts/csv/henson_CPM_melted.csv')

# isolate wildtype glucose data
transcript_df = transcript_df[transcript_df['Line Name'].str.contains("WT-Glu")]

# separate data from different time points
transcript_df_T1 = transcript_df[transcript_df['Time'] == 10]
transcript_df_T2 = transcript_df[transcript_df['Time'] == 13]

# preview data
display(transcript_df_T1.head())
display(transcript_df_T2.head())

Unnamed: 0,Line Name,Measurement Type,Time,Count,Units
6,WT-Glu-R1,WP_005263480_1,10,27.446117,CPM
7,WT-Glu-R2,WP_005263480_1,10,27.543845,CPM
8,WT-Glu-R3,WP_005263480_1,10,26.797758,CPM
60,WT-Glu-R1,WP_005249107_1,10,355.352526,CPM
61,WT-Glu-R2,WP_005249107_1,10,349.533393,CPM


Unnamed: 0,Line Name,Measurement Type,Time,Count,Units
9,WT-Glu-R1,WP_005263480_1,13,15.584245,CPM
10,WT-Glu-R2,WP_005263480_1,13,22.444427,CPM
11,WT-Glu-R3,WP_005263480_1,13,20.43331,CPM
63,WT-Glu-R1,WP_005249107_1,13,219.79857,CPM
64,WT-Glu-R2,WP_005249107_1,13,328.518773,CPM


#### **Run EFlux2 for Each Condition**

In [4]:
# create dictionary to hold eflux solutions
eflux_solutions = {}

# loop over trials
for time in ['T1', 'T2']:
    for trial in ['WT-Glu-R1', 'WT-Glu-R2','WT-Glu-R3']:
        
        # display trial name
        trial_number = trial.split('-R')[1]
        trial_name = f'glucose_eflux_{time.lower()}_{trial_number}'
        print(trial_name)
        
        # get the data for that trial
        transcriptomics = []
        if time == 'T1':
            transcriptomics = transcript_df_T1[transcript_df_T1['Line Name'] == trial]
        else:
            transcriptomics = transcript_df_T2[transcript_df_T2['Line Name'] == trial]

        # map the transcripts to genome scale reactions
        trans_data = construct_trans_df(transcriptomics, trial)
        
        # run eflux to get genome scale fluxes
        eflux2_sol = eflux2_pred(model, trans_data, trial, 'glucose', sub_uptake_rate=100)
        
        # add eflux solution to dictionary
        eflux_solutions[trial_name] = eflux2_sol

glucose_eflux_t1_1
FBA status optimal
FBA solution 4.486259051815459


<optlang.cplex_interface.Objective at 0x7fa3c8557410>

{'EX_ca2_e': 1000.0,
 'EX_cl_e': 1000.0,
 'EX_cobalt2_e': 1000.0,
 'EX_cu2_e': 1000.0,
 'EX_fe2_e': 1000.0,
 'EX_fe3_e': 1000.0,
 'EX_glc__D_e': 1000.0,
 'EX_h2o_e': 1000.0,
 'EX_h_e': 1000.0,
 'EX_k_e': 1000.0,
 'EX_mg2_e': 1000.0,
 'EX_mn2_e': 1000.0,
 'EX_mobd_e': 1000.0,
 'EX_nh4_e': 1000.0,
 'EX_o2_e': 1000.0,
 'EX_pi_e': 1000.0,
 'EX_so4_e': 1000.0,
 'EX_zn2_e': 1000.0}



EFlux2 status infeasible
EFlux2 solution 2118327.1302549117
glucose_eflux_t1_2
FBA status optimal
FBA solution 5.659970353495425


<optlang.cplex_interface.Objective at 0x7fa3be299d10>

{'EX_ca2_e': 1000.0,
 'EX_cl_e': 1000.0,
 'EX_cobalt2_e': 1000.0,
 'EX_cu2_e': 1000.0,
 'EX_fe2_e': 1000.0,
 'EX_fe3_e': 1000.0,
 'EX_glc__D_e': 1000.0,
 'EX_h2o_e': 1000.0,
 'EX_h_e': 1000.0,
 'EX_k_e': 1000.0,
 'EX_mg2_e': 1000.0,
 'EX_mn2_e': 1000.0,
 'EX_mobd_e': 1000.0,
 'EX_nh4_e': 1000.0,
 'EX_o2_e': 1000.0,
 'EX_pi_e': 1000.0,
 'EX_so4_e': 1000.0,
 'EX_zn2_e': 1000.0}

EFlux2 status infeasible
EFlux2 solution 5225758.379350671
glucose_eflux_t1_3
FBA status optimal
FBA solution 4.773732727995189


<optlang.cplex_interface.Objective at 0x7fa3affaf6d0>

{'EX_ca2_e': 1000.0,
 'EX_cl_e': 1000.0,
 'EX_cobalt2_e': 1000.0,
 'EX_cu2_e': 1000.0,
 'EX_fe2_e': 1000.0,
 'EX_fe3_e': 1000.0,
 'EX_glc__D_e': 1000.0,
 'EX_h2o_e': 1000.0,
 'EX_h_e': 1000.0,
 'EX_k_e': 1000.0,
 'EX_mg2_e': 1000.0,
 'EX_mn2_e': 1000.0,
 'EX_mobd_e': 1000.0,
 'EX_nh4_e': 1000.0,
 'EX_o2_e': 1000.0,
 'EX_pi_e': 1000.0,
 'EX_so4_e': 1000.0,
 'EX_zn2_e': 1000.0}

EFlux2 status optimal
EFlux2 solution 2156997.8698322186
glucose_eflux_t2_1
FBA status optimal
FBA solution 5.449962846220861


<optlang.cplex_interface.Objective at 0x7fa3aa57ef90>

{'EX_ca2_e': 1000.0,
 'EX_cl_e': 1000.0,
 'EX_cobalt2_e': 1000.0,
 'EX_cu2_e': 1000.0,
 'EX_fe2_e': 1000.0,
 'EX_fe3_e': 1000.0,
 'EX_glc__D_e': 1000.0,
 'EX_h2o_e': 1000.0,
 'EX_h_e': 1000.0,
 'EX_k_e': 1000.0,
 'EX_mg2_e': 1000.0,
 'EX_mn2_e': 1000.0,
 'EX_mobd_e': 1000.0,
 'EX_nh4_e': 1000.0,
 'EX_o2_e': 1000.0,
 'EX_pi_e': 1000.0,
 'EX_so4_e': 1000.0,
 'EX_zn2_e': 1000.0}

EFlux2 status infeasible
EFlux2 solution 4686564.305391339
glucose_eflux_t2_2
FBA status optimal
FBA solution 5.863649535600088


<optlang.cplex_interface.Objective at 0x7fa3af909110>

{'EX_ca2_e': 1000.0,
 'EX_cl_e': 1000.0,
 'EX_cobalt2_e': 1000.0,
 'EX_cu2_e': 1000.0,
 'EX_fe2_e': 1000.0,
 'EX_fe3_e': 1000.0,
 'EX_glc__D_e': 1000.0,
 'EX_h2o_e': 1000.0,
 'EX_h_e': 1000.0,
 'EX_k_e': 1000.0,
 'EX_mg2_e': 1000.0,
 'EX_mn2_e': 1000.0,
 'EX_mobd_e': 1000.0,
 'EX_nh4_e': 1000.0,
 'EX_o2_e': 1000.0,
 'EX_pi_e': 1000.0,
 'EX_so4_e': 1000.0,
 'EX_zn2_e': 1000.0}

EFlux2 status optimal
EFlux2 solution 3472726.738958462
glucose_eflux_t2_3
FBA status optimal
FBA solution 7.160901262016411


<optlang.cplex_interface.Objective at 0x7fa3964fb950>

{'EX_ca2_e': 1000.0,
 'EX_cl_e': 1000.0,
 'EX_cobalt2_e': 1000.0,
 'EX_cu2_e': 1000.0,
 'EX_fe2_e': 1000.0,
 'EX_fe3_e': 1000.0,
 'EX_glc__D_e': 1000.0,
 'EX_h2o_e': 1000.0,
 'EX_h_e': 1000.0,
 'EX_k_e': 1000.0,
 'EX_mg2_e': 1000.0,
 'EX_mn2_e': 1000.0,
 'EX_mobd_e': 1000.0,
 'EX_nh4_e': 1000.0,
 'EX_o2_e': 1000.0,
 'EX_pi_e': 1000.0,
 'EX_so4_e': 1000.0,
 'EX_zn2_e': 1000.0}

EFlux2 status optimal
EFlux2 solution 5838880.895318156


#### **Save Glucose E-Flux2 Genome Scale Fluxes**

In [5]:
# loop over solution dictionary. Convert the flux solutions to 
for trial_name, flux_solution in eflux_solutions.items():

    flux_df = flux_solution_to_df(model, flux_solution)
    flux_df.to_csv(f'../data/genome_scale_fluxes/{trial_name}.csv', index=False)

    display(flux_df.head(5))

Unnamed: 0,reaction_id,reaction_name,reaction_reaction,flux
0,12DGR140tipp,"1,2 diacylglycerol transport via flipping (per...",12dgr140_p --> 12dgr140_c,0.0
1,13PPDH,"1,3-propanediol dehydrogenase",3hppnl_c + h_c + nadh_c <=> 13ppd_c + nad_c,0.0
2,1P2CBXLCYCL,1 Pyrroline 2 carboxylate cyclation,5a2opntn_c <=> 1p2cbxl_c + h2o_c + h_c,0.0
3,1P2CBXLR,Delta1 piperideine 2 carboxylate reductase,1p2cbxl_c + 2.0 h_c + nadph_c --> nadp_c + pro...,0.0
4,23CTI1,Decenyl coa cis trans isomerization cis dec 3...,decoa_c --> dc2coa_c + h_c,0.0


Unnamed: 0,reaction_id,reaction_name,reaction_reaction,flux
0,12DGR140tipp,"1,2 diacylglycerol transport via flipping (per...",12dgr140_p --> 12dgr140_c,0.0
1,13PPDH,"1,3-propanediol dehydrogenase",3hppnl_c + h_c + nadh_c <=> 13ppd_c + nad_c,0.0
2,1P2CBXLCYCL,1 Pyrroline 2 carboxylate cyclation,5a2opntn_c <=> 1p2cbxl_c + h2o_c + h_c,0.0
3,1P2CBXLR,Delta1 piperideine 2 carboxylate reductase,1p2cbxl_c + 2.0 h_c + nadph_c --> nadp_c + pro...,0.0
4,23CTI1,Decenyl coa cis trans isomerization cis dec 3...,decoa_c --> dc2coa_c + h_c,0.0


Unnamed: 0,reaction_id,reaction_name,reaction_reaction,flux
0,12DGR140tipp,"1,2 diacylglycerol transport via flipping (per...",12dgr140_p --> 12dgr140_c,0.0
1,13PPDH,"1,3-propanediol dehydrogenase",3hppnl_c + h_c + nadh_c <=> 13ppd_c + nad_c,0.0
2,1P2CBXLCYCL,1 Pyrroline 2 carboxylate cyclation,5a2opntn_c <=> 1p2cbxl_c + h2o_c + h_c,0.0
3,1P2CBXLR,Delta1 piperideine 2 carboxylate reductase,1p2cbxl_c + 2.0 h_c + nadph_c --> nadp_c + pro...,0.0
4,23CTI1,Decenyl coa cis trans isomerization cis dec 3...,decoa_c --> dc2coa_c + h_c,0.0


Unnamed: 0,reaction_id,reaction_name,reaction_reaction,flux
0,12DGR140tipp,"1,2 diacylglycerol transport via flipping (per...",12dgr140_p --> 12dgr140_c,0.0
1,13PPDH,"1,3-propanediol dehydrogenase",3hppnl_c + h_c + nadh_c <=> 13ppd_c + nad_c,0.0
2,1P2CBXLCYCL,1 Pyrroline 2 carboxylate cyclation,5a2opntn_c <=> 1p2cbxl_c + h2o_c + h_c,0.0
3,1P2CBXLR,Delta1 piperideine 2 carboxylate reductase,1p2cbxl_c + 2.0 h_c + nadph_c --> nadp_c + pro...,1.41675
4,23CTI1,Decenyl coa cis trans isomerization cis dec 3...,decoa_c --> dc2coa_c + h_c,0.0


Unnamed: 0,reaction_id,reaction_name,reaction_reaction,flux
0,12DGR140tipp,"1,2 diacylglycerol transport via flipping (per...",12dgr140_p --> 12dgr140_c,0.0
1,13PPDH,"1,3-propanediol dehydrogenase",3hppnl_c + h_c + nadh_c <=> 13ppd_c + nad_c,0.0
2,1P2CBXLCYCL,1 Pyrroline 2 carboxylate cyclation,5a2opntn_c <=> 1p2cbxl_c + h2o_c + h_c,0.0
3,1P2CBXLR,Delta1 piperideine 2 carboxylate reductase,1p2cbxl_c + 2.0 h_c + nadph_c --> nadp_c + pro...,0.0
4,23CTI1,Decenyl coa cis trans isomerization cis dec 3...,decoa_c --> dc2coa_c + h_c,0.0


Unnamed: 0,reaction_id,reaction_name,reaction_reaction,flux
0,12DGR140tipp,"1,2 diacylglycerol transport via flipping (per...",12dgr140_p --> 12dgr140_c,0.0
1,13PPDH,"1,3-propanediol dehydrogenase",3hppnl_c + h_c + nadh_c <=> 13ppd_c + nad_c,0.0
2,1P2CBXLCYCL,1 Pyrroline 2 carboxylate cyclation,5a2opntn_c <=> 1p2cbxl_c + h2o_c + h_c,0.0
3,1P2CBXLR,Delta1 piperideine 2 carboxylate reductase,1p2cbxl_c + 2.0 h_c + nadph_c --> nadp_c + pro...,0.0
4,23CTI1,Decenyl coa cis trans isomerization cis dec 3...,decoa_c --> dc2coa_c + h_c,0.0


#### **Get average and standard deviation of genome scale fluxes**

In [37]:
# can curate trials to use
relevant_trials = [
    'glucose_eflux_t1_1', 
    'glucose_eflux_t1_2', 
    'glucose_eflux_t1_3',
    'glucose_eflux_t2_1', 
    'glucose_eflux_t2_2', 
    'glucose_eflux_t2_3'
]

# get list of relevant flux vectors
flux_vectors = [eflux_solutions[trial_name].fluxes for trial_name in relevant_trials]

# combine into a single dataframe
all_eflux_solutions = pd.concat(fluxes_list, axis=1)

# calculate average and standard deviation of flux vectors
average_eflux_solution = pd.DataFrame(all_eflux_solutions.mean(axis=1), columns=['fluxes'])
std_eflux_solution = eflux2sol_all.std(axis=1)

average_eflux_solution.head()

Unnamed: 0,fluxes
12DGR140tipp,0.0
13PPDH,0.0
1P2CBXLCYCL,0.0
1P2CBXLR,0.236125
23CTI1,0.0


#### **Load Glucose 13C MFA Data**

In [7]:
glucose_fluxes = pd.read_csv('../data/central_fluxes/glucose_13C.csv')

# # Remove rows that do not have a mapping to the GSM
glucose_fluxes.dropna(subset = ["Forward Reactions"], inplace=True)
print(f'There are {len(glucose_fluxes)} fluxes that can be compared between the MFA and FBA')

glucose_fluxes.head()

There are 44 fluxes that can be compared between the MFA and FBA


Unnamed: 0,Pathway,Forward Reactions,Reaction,Location on map,Flux,90% Confidence Lower Bound,90% Confidence Upper Bound
0,Substrate Uptake,reverse_EX_glc__D_e,Gluc.ext + ATP -> G6P,"(50, 460)",100.0,100.0,100.0
1,EMP Pathway,PGI,G6P <-> F6P,"(-150, 430)",-1.61,-2.09,1.42
2,EMP Pathway,PFK or reverse_FBP,F6P + ATP -> FBP,"(-220, 195)",0.0,0.0,1.91
3,EMP Pathway,FBA,FBP <-> DHAP + GAP,"(-140, 115)",0.0,0.0,1.91
4,EMP Pathway,TPI,DHAP <-> GAP,"(-270, 150)",0.0,0.0,1.91


#### **Map Glucose E-Flux2 fluxes to 13C MFA Reactions**

In [38]:
# probably need to get average and std dev for each genome scale model reaction
#EFLUX2 calculations:


glucose_fluxes = add_pred_fluxes_to_13c_df(glucose_fluxes, average_eflux_solution, std_eflux_solution , 'glucose', 'E-Flux2', 'WT')
# glucose_fluxes = add_pred_fluxes_to_13c_df_without_std(glucose_fluxes, glucose_fba_solution, 'FBA', 'WT')

glucose_fluxes.head()

AttributeError: 'Series' object has no attribute 'stds'

#### **Save Glucose E-Flux2 Central Flux Predictions**

In [None]:
glucose_fluxes.to_csv('../data/central_fluxes/glucose_FBA_pFBA.csv', index=False)

#### **Plot Glucose FBA Fluxes vs 13C MFA Fluxes**

In [None]:
obs_vs_pred_scatter_plot_with_std(glucose_fluxes, substrate='glucose', method='FBA', strain='WT', output_dir=output_dir)

#### **Display Glucose E-Flux2 Flux Map**

In [None]:
generate_flux_map(glucose_fluxes, 'Flux', substrate='glucose', method='13C_MFA', output_dir=output_dir)

#### **Load experimental growth parameters from Notebook E**

In [None]:
consumption_and_growth_data = pd.read_csv('../data/growth_rates/experimental_growth_parameters.csv', index_col=0)
consumption_and_growth_data

#### **Calculate FBA and pFBA growth rates and add to data frame**

In [None]:
fba_growth_rates = []
pfba_growth_rates = []

# loop over strains
for strain in ['WT-P', 'PVHG-P', 'WT-G']:
    
    # get the growth rate per 100 mmol of substrate uptake 
    if '-P' in strain:
        fba_growth_per_100 = phenol_fba_solution['Growth_Phenol']
        pfba_growth_per_100 = phenol_pfba_solution['Growth_Phenol']
    elif '-G' in strain:
        fba_growth_per_100 = glucose_fba_solution['Growth_Glucose']
        pfba_growth_per_100 = glucose_pfba_solution['Growth_Glucose']
        
    # get the experimental uptake rate
    uptake_rate = consumption_and_growth_data.loc[strain,'substrate consumption rate']
        
    # calculate the growth rate adjusted for substrate uptake rate
    fba_growth_rate = (fba_growth_per_100 / 100) * uptake_rate
    pfba_growth_rate = (pfba_growth_per_100 / 100) * uptake_rate
    
    fba_growth_rates.append(fba_growth_rate)
    pfba_growth_rates.append(fba_growth_rate)
    
# add 
consumption_and_growth_data['FBA growth rate'] = fba_growth_rates
consumption_and_growth_data['pFBA growth rate'] = pfba_growth_rates

consumption_and_growth_data

In [None]:
#### **Save FBA and pFBA Growth Rates**

In [None]:
consumption_and_growth_data.to_csv('../data/growth_rates/fba_pfba_growth_rates.csv', index=True, header= True)

In [None]:
#### **Plot FBA Growth Rates**

In [None]:
selectedlist = ['WT-P', 'PVHG-P', 'WT-G']
comparison_scatter_plot(
    consumption_and_growth_data.loc[selectedlist, 'growth rate'], 
    consumption_and_growth_data.loc[selectedlist, 'FBA growth rate'], 
    selectedlist, 
    'FBA',
    output_dir=output_dir
)