## **Notebook to make predictions with reprocessed Yoneda data and EFLUX2** 

By Christina Schenk and Garrett Roell

Tested with the biodesign_3.7 kernel on jprime

### EFLUX2 predictions and evaluations
This notebook predicts fluxes for *R. opacus* cultures grown on glucose. The transcriptomics data were published in [Yoneda (2016)](https://academic.oup.com/nar/article/44/5/2240/2465306).



#### **Data Labels**:

#### Yoneda transcriptomics data: 
* WT 1.0 g/L glucose, 0.05 g/L ammonium sulfate (**WT-LN-G**) (3 trials)

#### Combined with Rhiannon 2018 metabolomics and OD data:
* Metabolomics and OD data for WT glucose (previously labeled **WT-G**, now **WT-LN-G**)
                                               

### Method: 
<ol>
<li>Predict fluxes with EFLUX2</li>
<li>Compare predictions with 13C MFA: scatter plots and flux maps</li>
<li>Load file with observed growth rates (Notebook E)</li>
<li>Save growth rate predictions to csv file</li>
</ol>



##### **Import python packages**

In [1]:
import os
import sys
import pandas as pd
import numpy as np
import cobra
import scipy.stats
#import cplex
%matplotlib inline

import matplotlib
from matplotlib import pyplot as plt
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
                                  AnnotationBbox)
from matplotlib.cbook import get_sample_data
import matplotlib.image as mpimg
import matplotlib.cm as cm

from edd_utils import login, export_study, export_metadata

from sklearn.metrics import r2_score

SourceDir = '../src'
sys.path.append(SourceDir)
from ensemblemethods import EFlux2
from utils import *

output_dir = '../plots/CPM/'
from plot import *

##### **Load data**

In [2]:
# # Study to Download
# study_slug = 'biodesign_yoneda_set3_reprocessed'
# # EDD server
# edd_server = 'public-edd.jbei.org'
# user       = 'schenkch'

In [3]:
# session = login(edd_server=edd_server, user=user)

# df = export_study(session, study_slug, edd_server=edd_server)
# #df.head()

##### **Filter transcriptomics data from all EDD data into different dataframes**

In [4]:
# df_Trans = df[df['Protocol'].str.contains('Transcriptomics')]
# df_Trans.head()

In [5]:
# df_Trans = transcript_measurements['glucose_cpm'] #fpkm
# scratch/OpacusBiodesign/transcript_data/csv/henson_CPM_melted.csv
df_trans = pd.read_csv('../transcript_data/csv/henson_CPM_melted.csv')
df_trans.head()

Unnamed: 0,Line Name,Measurement Type,Time,Count,Units
0,WT-M-R1,WP_005263480_1,20,19.00342,CPM
1,WT-M-R2,WP_005263480_1,20,16.800668,CPM
2,WT-M-R3,WP_005263480_1,20,12.95522,CPM
3,WT-M-R1,WP_005263480_1,32,10.984975,CPM
4,WT-M-R2,WP_005263480_1,32,9.520107,CPM


In [6]:
# Keep a single sampling time point (candidate values are 10 or 13; 13 is used here)
df_trans = df_trans[df_trans['Time'] == 13 ]
df_trans.head()

Unnamed: 0,Line Name,Measurement Type,Time,Count,Units
9,WT-Glu-R1,WP_005263480_1,13,15.584245,CPM
10,WT-Glu-R2,WP_005263480_1,13,22.444427,CPM
11,WT-Glu-R3,WP_005263480_1,13,20.43331,CPM
35,PVHG-Glu-R1,WP_005263480_1,13,22.993761,CPM
36,PVHG-Glu-R2,WP_005263480_1,13,23.797461,CPM
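
The time-point filter above can be extended to keep only the wild-type glucose replicates and to reshape the melted table so that each replicate becomes a column. The following is a minimal sketch on a toy table, not the notebook's own preprocessing: the column names follow the preview above, while the gene ID `GENE_B` and its counts are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the melted CPM table. The WP_005263480_1 counts are copied
# from the previews above; GENE_B and its counts are hypothetical.
toy = pd.DataFrame({
    "Line Name": ["WT-Glu-R1", "WT-Glu-R2", "WT-Glu-R3", "WT-M-R1",
                  "WT-Glu-R1", "WT-Glu-R2", "WT-Glu-R3"],
    "Measurement Type": ["WP_005263480_1"] * 4 + ["GENE_B"] * 3,
    "Time": [13, 13, 13, 20, 13, 13, 13],
    "Count": [15.584245, 22.444427, 20.43331, 19.00342, 1.0, 2.0, 3.0],
})

# Keep one time point and only the three wild-type glucose replicates.
wt_lines = ["WT-Glu-R1", "WT-Glu-R2", "WT-Glu-R3"]
subset = toy[(toy["Time"] == 13) & (toy["Line Name"].isin(wt_lines))]

# Reshape to one row per gene and one column per replicate.
wide = subset.pivot(index="Measurement Type", columns="Line Name", values="Count")
print(wide)
```

A wide layout like this makes it easy to check that every gene has a count in all three replicates before the values are passed on.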


##### **Load Genome Scale Model**

In [7]:
model = cobra.io.read_sbml_model("../GSMs/Ropacus_annotated.xml")

#### **1. EFLUX2 Predictions for Wild type**

In [8]:
eflux2sol, eflux2sol_std = eflux2_pred_for_three_reps(model, df_trans, 'WT-Glu-R1', 'WT-Glu-R2','WT-Glu-R3', 'glucose')

running first replicate
FBA status optimal
FBA solution 5.449962846220861


<optlang.cplex_interface.Objective at 0x7fb0cb870210>

{'EX_ca2_e': 1000.0,
 'EX_cl_e': 1000.0,
 'EX_cobalt2_e': 1000.0,
 'EX_cu2_e': 1000.0,
 'EX_fe2_e': 1000.0,
 'EX_fe3_e': 1000.0,
 'EX_glc__D_e': 1000.0,
 'EX_h2o_e': 1000.0,
 'EX_h_e': 1000.0,
 'EX_k_e': 1000.0,
 'EX_mg2_e': 1000.0,
 'EX_mn2_e': 1000.0,
 'EX_mobd_e': 1000.0,
 'EX_nh4_e': 1000.0,
 'EX_o2_e': 1000.0,
 'EX_pi_e': 1000.0,
 'EX_so4_e': 1000.0,
 'EX_zn2_e': 1000.0}



EFlux2 status infeasible
EFlux2 solution 4686564.305391339
running second replicate
FBA status optimal
FBA solution 5.863649535600088


<optlang.cplex_interface.Objective at 0x7fb0bf63ea10>

{'EX_ca2_e': 1000.0,
 'EX_cl_e': 1000.0,
 'EX_cobalt2_e': 1000.0,
 'EX_cu2_e': 1000.0,
 'EX_fe2_e': 1000.0,
 'EX_fe3_e': 1000.0,
 'EX_glc__D_e': 1000.0,
 'EX_h2o_e': 1000.0,
 'EX_h_e': 1000.0,
 'EX_k_e': 1000.0,
 'EX_mg2_e': 1000.0,
 'EX_mn2_e': 1000.0,
 'EX_mobd_e': 1000.0,
 'EX_nh4_e': 1000.0,
 'EX_o2_e': 1000.0,
 'EX_pi_e': 1000.0,
 'EX_so4_e': 1000.0,
 'EX_zn2_e': 1000.0}

EFlux2 status optimal
EFlux2 solution 3472726.739228936
running third replicate
FBA status optimal
FBA solution 7.160901262016422


<optlang.cplex_interface.Objective at 0x7fb0b32a40d0>

{'EX_ca2_e': 1000.0,
 'EX_cl_e': 1000.0,
 'EX_cobalt2_e': 1000.0,
 'EX_cu2_e': 1000.0,
 'EX_fe2_e': 1000.0,
 'EX_fe3_e': 1000.0,
 'EX_glc__D_e': 1000.0,
 'EX_h2o_e': 1000.0,
 'EX_h_e': 1000.0,
 'EX_k_e': 1000.0,
 'EX_mg2_e': 1000.0,
 'EX_mn2_e': 1000.0,
 'EX_mobd_e': 1000.0,
 'EX_nh4_e': 1000.0,
 'EX_o2_e': 1000.0,
 'EX_pi_e': 1000.0,
 'EX_so4_e': 1000.0,
 'EX_zn2_e': 1000.0}

EFlux2 status infeasible
EFlux2 solution 5838880.895646621
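
`eflux2_pred_for_three_reps` returns a flux vector and a matching standard-deviation vector. One plausible way to obtain such a pair is to average the per-replicate flux vectors reaction by reaction, as sketched below. This is an assumption about the helper, not its actual code: the reaction IDs and flux values are hypothetical, and whether the helper uses the sample standard deviation (`ddof=1`) or treats replicates with a non-optimal status differently is not shown in this notebook.

```python
import numpy as np

# Hypothetical per-replicate flux vectors (rows: replicates, columns: reactions).
reaction_ids = ["R_a", "R_b", "R_c"]
rep_fluxes = np.array([
    [1.0, 0.0, 4.0],
    [2.0, 0.0, 6.0],
    [3.0, 0.0, 8.0],
])

# Solver statuses as reported in the log above, one per replicate.
statuses = ["infeasible", "optimal", "infeasible"]
n_optimal = sum(status == "optimal" for status in statuses)

# Reaction-wise mean and sample standard deviation across replicates.
mean_flux = rep_fluxes.mean(axis=0)
std_flux = rep_fluxes.std(axis=0, ddof=1)

for rid, m, s in zip(reaction_ids, mean_flux, std_flux):
    print(f"{rid}: mean={m:.3f}, std={s:.3f}")
print(f"replicates with optimal status: {n_optimal} of {len(statuses)}")
```

Counting the optimal statuses alongside the averages is a cheap way to see how many replicates the reported mean actually rests on.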


##### **Save solution to data frame**

In [19]:
eflux2soldf = pd.DataFrame(eflux2sol, columns=['fluxes'])
eflux2solstddf = pd.DataFrame(eflux2sol_std, columns=['stds'])

display(eflux2soldf)

                      fluxes
12DGR140tipp        0.000000
13PPDH              0.000000
1P2CBXLCYCL         0.000000
1P2CBXLR            0.472250
23CTI1              0.000000
...                      ...
EX_guaiacol_e       0.000000
guaiacol_transport  0.000000
GUADEM              0.000000
Growth_Phenol       0.000000
Growth_Glucose      6.158171

[3019 rows x 1 columns]

#### **2. Plot solutions: Comparison of EFLUX2 WT predictions and 13C measurements**

##### **Load 13C data**

##### **Get 13C MFA measured fluxes for glucose**

In [10]:
# glucose_fluxes = pd.read_csv('../13C_flux_data/13C_glucose_flux_data.csv')

# # Remove rows that do not have a mapping to the GSM
# glucose_fluxes.dropna(subset = ["Forward Reactions"], inplace=True)
# print(f'There are {len(glucose_fluxes)} fluxes that can be compared between the MFA and FBA')
# glucose_fluxes

##### **Add glucose EFLUX2 flux values to the glucose 13C MFA fluxes dataframe**

In [11]:
# obspred_fluxes = add_pred_fluxes_to_13c_df(glucose_fluxes, eflux2soldf, eflux2solstddf, 'glucose', 'E-Flux2', 'WT')
# obspred_fluxes.to_csv('../13C_flux_data/obspredfluxes_Glucose_EFLUX2_CPM.csv', index=True, header= True)
# obspred_fluxes.head()

### **Plot EFLUX2 vs 13C MFA**

##### **Plot 13C MFA observations vs. EFLUX2 predictions**

##### Scatter plot: 13C MFA vs. EFLUX2

In [12]:
# scatterplotcomp_obs_vs_pred(obspred_fluxes, substrate='glucose', method='E-Flux2', strain='WT')
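
A scatter plot of observed against predicted fluxes is usually summarized with a coefficient of determination. The sketch below computes R² directly against the observed values, which is the same definition `sklearn.metrics.r2_score(observed, predicted)` uses. Whether `scatterplotcomp_obs_vs_pred` reports this exact metric is an assumption, and the flux values are made up.

```python
import numpy as np

# Made-up fluxes for four reactions (not taken from the 13C MFA data set).
observed = np.array([100.0, 80.0, 60.0, 20.0])
predicted = np.array([90.0, 85.0, 50.0, 30.0])

# R^2 = 1 - SS_res / SS_tot, evaluated against the observations themselves
# (i.e. relative to the identity line, not a fitted regression line).
ss_res = np.sum((observed - predicted) ** 2)
ss_tot = np.sum((observed - observed.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"R^2 = {r2:.4f}")
```

With this definition R² can become negative when the predictions fit worse than simply using the mean of the observations.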

##### **Plot 13C MFA observations vs. EFLUX2 predictions with standard deviations**

##### Scatter plot with standard deviations: 13C MFA vs. EFLUX2

In [13]:
# scatterplotcomp_obs_vs_pred_withstd(obspred_fluxes, substrate='glucose', method='E-Flux2', strain='WT')

#### **Glucose EFlux2 WT Flux Map**

In [14]:
# map_flux_results(obspred_fluxes, 'E-Flux2 WT Value')

### **3. Load File with observed growth rates**

##### **Load observed growth rates and plot glucose growth rates**

In [15]:
# consumption_and_growth_data = pd.read_csv('../consumption_and_growth_data/consumption_and_growth_data.csv', index_col=0)
# consumption_and_growth_data

##### **For comparison of predicted and observed growth rates: scale predicted growth rate by multiplying with (observed substrate uptake / predicted substrate uptake)**

In [16]:
# scaledgrowthrate_wtlng = scale_growth_to_sub(
#     eflux2soldf.loc['Growth_Glucose',:].values[0], 
#     eflux2soldf.loc['EX_glc__D_e',:].values[0],
#     consumption_and_growth_data.loc['WT-G', 'substrate consumption rate']
# )
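
The scaling described in the heading above amounts to `scaled growth = predicted growth × (observed uptake / predicted uptake)`. The function below is a minimal sketch of that arithmetic under stated assumptions: `scale_growth_sketch` is a hypothetical stand-in, not the `scale_growth_to_sub` helper from `utils`, and it takes absolute values because exchange fluxes for uptake are often negative in COBRA models. The numbers are made up.

```python
def scale_growth_sketch(predicted_growth, predicted_uptake, observed_uptake):
    """Rescale a predicted growth rate to the observed substrate uptake rate.

    Hypothetical illustration; sign conventions are normalized with abs().
    """
    if predicted_uptake == 0:
        raise ValueError("predicted substrate uptake is zero; cannot scale")
    return predicted_growth * abs(observed_uptake) / abs(predicted_uptake)


# Made-up example: the model predicts growth 6.0 at an uptake flux of -100,
# while the observed uptake rate is 5.0.
scaled = scale_growth_sketch(6.0, -100.0, 5.0)
print(scaled)
```

Because the scaling is linear, it assumes that predicted growth is proportional to substrate uptake over the range of interest.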

##### **Add scaled values to new dataframe**

In [17]:
# allgrowthrates=pd.DataFrame(index=['WT-G'], columns=['Growth_Glucose_EFLUX2'], dtype=float)
# allgrowthrates.at['WT-G','Growth_Glucose_EFLUX2'] = scaledgrowthrate_wtlng
# allgrowthrates

### **4. Save growth rate predictions to csv file**

##### **Save growth rates as csv file**

In [18]:
# allgrowthrates.to_csv('../consumption_and_growth_data/allgrowthratesGlucoseEFLUX2_CPM.csv', index=True, header= True)