## **Notebook to make predictions with reprocessed Yoneda data and EFLUX2** 

By Christina Schenk and Garrett Roell

Tested on biodesign_3.7 kernel on jprime

### EFLUX2 predictions and evaluations
This notebook predicts fluxes for *R. opacus* cultures growing on glucose and on phenol. The transcriptomics data were published in [Yoneda (2016)](https://academic.oup.com/nar/article/44/5/2240/2465306).



#### **The data uses the following mapping as introduced in Notebook E:**

#### Yoneda transcriptomics data: 
* WT 1.0 g/L Glucose, 0.05 g/L ammonium sulfate (**WT-LN-G**) (3 trials)

#### Combined with Rhiannon 2018 metabolomics and OD data:
* Metabolomics and OD data for WT Glucose (previously labeled **WT-G**, now **WT-LN-G**)


### Method: 
<ol>
<li>Predict fluxes with EFLUX2</li>
<li>Compare predictions with 13C-MFA fluxes: scatter plots and flux maps</li>
<li>Load file with observed growth rates (Notebook E)</li>
<li>Compare growth rate predictions with growth rate observations</li>
</ol>



##### **Import python packages**

In [1]:
import os
import sys
import pandas as pd
import numpy as np
import cobra
import scipy.stats
#import cplex
%matplotlib inline

import matplotlib
from matplotlib import pyplot as plt
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
                                  AnnotationBbox)
from matplotlib.cbook import get_sample_data
import matplotlib.image as mpimg
import matplotlib.cm as cm

from statistics import stdev

from edd_utils import login, export_study, export_metadata

from sklearn.metrics import r2_score

from optlang.symbolics import add

##### **Load data**

In [2]:
# define a blank dictionary and fill it with dataframes mapping transcript measurements to GSM reaction ids
reaction_transcripts = {}

# add the dataframes that map reaction ids to transcript values to the dictionary
reaction_transcripts['glucose_cpm'] = pd.read_csv('../transcript_mapping/glucose_cpm.csv', index_col=0)
reaction_transcripts['glucose_fpkm'] = pd.read_csv('../transcript_mapping/glucose_fpkm.csv', index_col=0)
reaction_transcripts['glucose_mr'] = pd.read_csv('../transcript_mapping/glucose_mr.csv', index_col=0)
reaction_transcripts['glucose_tmm'] = pd.read_csv('../transcript_mapping/glucose_tmm.csv', index_col=0)

reaction_transcripts['phenol_cpm'] = pd.read_csv('../transcript_mapping/phenol_cpm.csv', index_col=0)
reaction_transcripts['phenol_fpkm'] = pd.read_csv('../transcript_mapping/phenol_fpkm.csv', index_col=0)
reaction_transcripts['phenol_mr'] = pd.read_csv('../transcript_mapping/phenol_mr.csv', index_col=0)
reaction_transcripts['phenol_tmm'] = pd.read_csv('../transcript_mapping/phenol_tmm.csv', index_col=0)
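The eight `read_csv` calls above follow one naming pattern, `<carbon source>_<normalization>`. A minimal sketch, assuming that pattern and the `../transcript_mapping` directory hold for every combination, builds the same keys and paths in a loop; `transcript_mapping_paths` is a hypothetical helper, not part of the original notebook.

```python
from itertools import product

# Hypothetical helper: assumes every file follows '<base_dir>/<source>_<method>.csv'
CARBON_SOURCES = ['glucose', 'phenol']
NORMALIZATIONS = ['cpm', 'fpkm', 'mr', 'tmm']


def transcript_mapping_paths(base_dir='../transcript_mapping'):
    """Return {key: csv_path} for every carbon source / normalization pair."""
    return {
        f'{source}_{method}': f'{base_dir}/{source}_{method}.csv'
        for source, method in product(CARBON_SOURCES, NORMALIZATIONS)
    }


paths = transcript_mapping_paths()
for key, path in paths.items():
    print(f'{key} -> {path}')
```

Each path could then be passed to `pd.read_csv(path, index_col=0)` in a dictionary comprehension to rebuild `reaction_transcripts` with the same keys in the same order.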

#### Calculate the percentage of reactions with zero transcripts

In [3]:
# loop over the dictionary containing the reaction to transcript mapping dataframes
for df_name, reaction_transcript_df in reaction_transcripts.items():

    # get a list of all the values in the dataframe (every reaction in every trial)
    all_values = list(reaction_transcript_df.values.flatten())
    
    # count the reaction-trial entries with zero transcripts
    number_zeros = all_values.count(0)

    print(f'{df_name}: {100 * number_zeros/len(all_values):.2f}% of reactions have zero transcripts')

glucose_cpm: 4.97% of reactions have zero transcripts
glucose_fpkm: 4.97% of reactions have zero transcripts
glucose_mr: 4.97% of reactions have zero transcripts
glucose_tmm: 7.51% of reactions have zero transcripts
phenol_cpm: 32.11% of reactions have zero transcripts
phenol_fpkm: 32.11% of reactions have zero transcripts
phenol_mr: 32.11% of reactions have zero transcripts
phenol_tmm: 32.58% of reactions have zero transcripts
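Because the cell above flattens every trial column before counting, the reported percentages are the share of reaction-trial entries with zero transcripts, not the share of reactions that are never transcribed. The sketch below contrasts the two measures on a small made-up table; `toy`, `percent_zero_entries` and `percent_reactions_all_zero` are illustrative names only.

```python
# Made-up reaction-by-trial table: keys are reaction ids, values are transcript levels per trial
toy = {
    'RXN_A': [0.0, 0.0, 0.0],
    'RXN_B': [5.0, 0.0, 7.0],
    'RXN_C': [3.0, 4.0, 2.0],
    'RXN_D': [1.0, 2.0, 9.0],
}


def percent_zero_entries(table):
    """Share of all (reaction, trial) entries equal to zero, as computed in the cell above."""
    values = [v for row in table.values() for v in row]
    return 100 * values.count(0) / len(values)


def percent_reactions_all_zero(table):
    """Share of reactions with zero transcripts in every trial."""
    all_zero = sum(all(v == 0 for v in row) for row in table.values())
    return 100 * all_zero / len(table)


print(f'{percent_zero_entries(toy):.2f}% of entries are zero')
print(f'{percent_reactions_all_zero(toy):.2f}% of reactions are zero in every trial')
```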


##### **Load Genome Scale Model**

In [4]:
file_name = '../GSMs/Ropacus_annotated_curated.xml'
model = cobra.io.read_sbml_model(file_name)

#### Implementation of E-Flux2 with reactions and transcripts already mapped

In [5]:
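def eflux2(reaction_transcripts_df, carbon_source, sub_uptake_rate):
    """Predict fluxes with E-Flux2 and scale them to a measured substrate uptake rate.

    reaction_transcripts_df: transcript levels indexed by reaction id for a single trial
        (a pandas Series, e.g. reaction_transcripts['phenol_cpm']['WT-P-R1']).
    carbon_source: 'glucose' or 'phenol'.
    sub_uptake_rate: measured substrate uptake rate (positive, mmol / g dry cell weight / h).

    Returns a pandas Series of scaled fluxes indexed by reaction id.
    """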
    
    # assign important reaction names based on carbon source
    if carbon_source == 'glucose':
        uptake_reaction_id = 'EX_glc__D_e'
        objective_reaction_id = 'Growth_Glucose'
    elif carbon_source == 'phenol':
        uptake_reaction_id = 'EX_phenol_e'
        objective_reaction_id = 'Growth_Phenol'
    else:
        raise ValueError(f'carbon source {carbon_source!r} is not compatible with this E-Flux2 implementation')
        
    # Work on a copy of the global model so that it is never modified; the context manager
    # below already reverts most changes, so the copy may be redundant
    eflux2_model = model.copy()
    
    # open context manager to run FBA with eflux2 constraints
    with eflux2_model:
        
        # ensure that the default +/-1000 reaction bounds never become limiting. This is important
        # because it makes the function robust to data with high transcript counts
        for reaction in eflux2_model.reactions:
            if reaction.lower_bound <= -1000:
                reaction.lower_bound = -np.inf
            if reaction.upper_bound >= 1000:
                reaction.upper_bound = np.inf
        
        # set the upper and lower limits of reactions with transcript data
        for reaction_id in reaction_transcripts_df.index:
            
            # get the reaction object using the id
            reaction = eflux2_model.reactions.get_by_id(reaction_id)
            
            # get transcript level for reaction
            transcript_level = reaction_transcripts_df[reaction_id]
            
            # if a reaction can carry negative flux, its reverse flux is capped at the transcript level
            # Note: reactions with 0 transcripts are not bounded because when reactions with zero transcripts
            # were set to zero flux, the model was not able to grow
            if reaction.lower_bound < 0.0 and transcript_level != 0:
                reaction.lower_bound = -1 * transcript_level

            # if a reaction can carry positive flux, its forward flux is capped at the transcript level
            # (reactions with 0 transcripts are left unbounded for the same reason)
            if reaction.upper_bound > 0.0 and transcript_level != 0:
                reaction.upper_bound = transcript_level
                
        # set up the medium so that it never limits the reaction network
        medium = {k: np.inf for k,v in eflux2_model.medium.items()}
        
        # E-Flux2 does not set a bound on the uptake of its carbon source:
        # the carbon source may have unlimited uptake flux, which is eventually
        # limited by a transcript-derived bound elsewhere in the network
        
        # All unused carbon source uptake fluxes are set to zero
        
        # for glucose, remove phenol uptake flux
        if carbon_source == 'glucose':
            medium['EX_phenol_e'] = 0
        
        # for phenol, remove glucose uptake flux
        if carbon_source == 'phenol':
            medium['EX_glc__D_e'] = 0
        
        # for phenol and glucose, remove all other carbon sources
        medium['EX_guaiacol_e'] = 0
        medium['EX_vanlt_e'] = 0
        medium['EX_tag'] = 0

        # set the model medium to the customized medium dictionary
        eflux2_model.medium = medium

        # check medium
        # print(medium)
        
        # set the objective to the biomass reaction chosen above for this carbon source
        eflux2_model.objective = objective_reaction_id

        # fix the unused biomass reactions to zero
        if carbon_source == 'glucose':
            eflux2_model.reactions.get_by_id('Growth_Phenol').upper_bound = 0
            eflux2_model.reactions.get_by_id('Growth_Phenol').lower_bound = 0
        if carbon_source == 'phenol':
            eflux2_model.reactions.get_by_id('Growth_Glucose').upper_bound = 0
            eflux2_model.reactions.get_by_id('Growth_Glucose').lower_bound = 0
            
        # always fix the B. subtilis biomass reaction ('Growth') to zero
        eflux2_model.reactions.get_by_id('Growth').upper_bound = 0
        eflux2_model.reactions.get_by_id('Growth').lower_bound = 0
                
        # this prevents the solver from failing
        eflux2_model.tolerance = 1e-9
        
        # solve the network
        fba_sol = eflux2_model.optimize()
    
        # determine the growth and uptake rate of the transcript derived solution
        transcript_derived_sub_uptake_rate = fba_sol.fluxes[uptake_reaction_id]
        transcript_derived_growth_rate = fba_sol.fluxes[objective_reaction_id]
        
        print(f'The transcript derived substrate uptake rate is {transcript_derived_sub_uptake_rate:.4f}')
        print(f'The transcript derived growth rate is {transcript_derived_growth_rate:.4f}')

        # Constrain the biomass to the optimal value
        eflux2_model.reactions.get_by_id(objective_reaction_id).upper_bound = fba_sol.objective_value
        eflux2_model.reactions.get_by_id(objective_reaction_id).lower_bound = fba_sol.objective_value

        # Minimize the sum of squared flux values (second step of E-Flux2)
        # Note: this quadratic objective needs a solver that supports quadratic
        # objectives (for example CPLEX or Gurobi); GLPK cannot solve it
        eflux2_model.objective = eflux2_model.problem.Objective(
            add([rxn.flux_expression**2 for rxn in eflux2_model.reactions]),
            direction='min',
        )
        eflux2_sol = eflux2_model.optimize()
        # print('EFlux2 status', eflux2_sol.status)
        # print('EFlux2 solution', eflux2_sol.objective_value)
        
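        # Scale the solution by the ratio of the measured to the transcript-derived substrate uptake rate
        # Note: the transcript-derived rate comes from the first (FBA) solution; if the minimal-norm
        # solution uses a different uptake flux, the scaled uptake will not exactly equal sub_uptake_rate
        scale_factor = -1 * sub_uptake_rate / transcript_derived_sub_uptake_rate
        
        # multiply every flux of the E-Flux2 solution by the same factor
        scaled_eflux2_sol = eflux2_sol.fluxes * scale_factor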
        
        print(f'The scaled substrate uptake rate is {scaled_eflux2_sol[uptake_reaction_id]:.4f}')
        print(f'The scaled growth rate is {scaled_eflux2_sol[objective_reaction_id]:.4f}')
        
        # spacer
        print()

        # return the fluxes as a series with reaction ids as the index and fluxes as values
        return scaled_eflux2_sol 
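The bound-setting part of `eflux2()` can be read as a per-reaction rule. The sketch below restates it for one reaction that has transcript data (reactions absent from the transcript table only go through step 1), using plain floats instead of COBRA reaction objects. `eflux2_bounds` is a hypothetical helper written for illustration, and it assumes the model uses the ±1000 default bounds.

```python
import math

DEFAULT_BOUND = 1000.0  # magnitude of COBRA's default reaction bounds


def eflux2_bounds(lower, upper, transcript_level):
    """Hypothetical helper: (lower, upper) bounds eflux2() gives a reaction with transcript data."""
    # step 1: default +/-1000 bounds are opened to infinity so they never limit the network
    if lower <= -DEFAULT_BOUND:
        lower = -math.inf
    if upper >= DEFAULT_BOUND:
        upper = math.inf
    # step 2: reactions with zero transcripts keep their (possibly infinite) bounds
    if transcript_level != 0:
        if lower < 0.0:
            lower = -transcript_level  # reverse flux capped at the transcript level
        if upper > 0.0:
            upper = transcript_level   # forward flux capped at the transcript level
    return lower, upper


print(eflux2_bounds(-1000, 1000, 12.5))  # reversible reaction
print(eflux2_bounds(0, 1000, 12.5))      # irreversible reaction
print(eflux2_bounds(-1000, 1000, 0))     # no transcripts detected
print(eflux2_bounds(-5, 5, 10))          # curated finite bounds
```

The last call shows that the transcript level replaces an existing finite bound rather than intersecting with it, so a transcript level larger than a curated bound loosens that bound.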

#### Calculate E-Flux2 fluxes for WT phenol

In [6]:
# calculated in notebook E (units mmol phenol / g dry cell weight / h)
wt_phenol_uptake_rate = 1.352072

# this could be made into a loop for other normalization methods
data_source = 'phenol_cpm'

# define dataframe to hold WT phenol E-Flux2 flux values
wt_phenol_eflux2_df = pd.DataFrame()

# loop over WT phenol trials
for trial in ['WT-P-R1', 'WT-P-R2', 'WT-P-R3']:
    
    # for each trial, calculate the fluxes
    trial_fluxes = eflux2(reaction_transcripts[data_source][trial], 'phenol', wt_phenol_uptake_rate)
    
    # add flux series to results dataframe
    wt_phenol_eflux2_df[trial] = trial_fluxes
    
# show the last five rows of the dataframe
wt_phenol_eflux2_df.tail()

The transcript derived substrate uptake rate is -6.3347
The transcript derived growth rate is 0.5249
The scaled substrate uptake rate is -1.3521
The scaled growth rate is 0.1120

The transcript derived substrate uptake rate is -5.0643
The transcript derived growth rate is 0.4244
The scaled substrate uptake rate is -1.3521
The scaled growth rate is 0.1133

The transcript derived substrate uptake rate is -7.3496
The transcript derived growth rate is 0.5481
The scaled substrate uptake rate is -1.3521
The scaled growth rate is 0.1008



| Reaction | WT-P-R1 | WT-P-R2 | WT-P-R3 |
|---|---|---|---|
| GUADEM | 0.0 | 0.0 | 0.0 |
| tag_production | 0.0 | 0.0 | 0.0 |
| EX_tag | 0.0 | 0.0 | 0.0 |
| Growth_Phenol | 0.11204 | 0.113316 | 0.100822 |
| Growth_Glucose | 0.0 | 0.0 | 0.0 |
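Because the biomass flux is fixed between the FBA step and the sum-of-squares step in `eflux2()`, each scaled growth rate should equal the transcript-derived growth rate times `scale_factor = -sub_uptake_rate / transcript_derived_sub_uptake_rate`. The short check below recomputes the WT phenol values from the rounded numbers printed above; it uses no solver and only reproduces the arithmetic.

```python
# Values copied from the printed output of the WT phenol cell above (rounded to 4 decimals)
measured_uptake = 1.352072  # wt_phenol_uptake_rate, mmol / g dry cell weight / h
trials = {
    'WT-P-R1': (-6.3347, 0.5249),  # (transcript-derived uptake, transcript-derived growth)
    'WT-P-R2': (-5.0643, 0.4244),
    'WT-P-R3': (-7.3496, 0.5481),
}

scaled = {}
for trial, (uptake, growth) in trials.items():
    scale_factor = -1 * measured_uptake / uptake  # same formula as in eflux2()
    scaled[trial] = (uptake * scale_factor, growth * scale_factor)
    print(f'{trial}: scale {scale_factor:.4f}, '
          f'uptake {scaled[trial][0]:.4f}, growth {scaled[trial][1]:.4f}')
```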


#### Calculate the average and standard deviation of WT phenol E-Flux2 fluxes

In [7]:
# define blank lists to hold average and standard deviation data
average_col = []
std_dev_col = []

# loop over the rows of dataframe to fill the lists
for _, row in wt_phenol_eflux2_df.iterrows():
    
    # add average and standard deviation of each row to the lists
    average_col.append((row['WT-P-R1'] + row['WT-P-R2'] + row['WT-P-R3']) / 3)
    std_dev_col.append(stdev([row['WT-P-R1'], row['WT-P-R2'], row['WT-P-R3']]))
    
# add the lists as columns to the dataframe
wt_phenol_eflux2_df['WT-P-Average'] = average_col
wt_phenol_eflux2_df['WT-P-Std-Dev'] = std_dev_col

wt_phenol_eflux2_df.tail(5)

| Reaction | WT-P-R1 | WT-P-R2 | WT-P-R3 | WT-P-Average | WT-P-Std-Dev |
|---|---|---|---|---|---|
| GUADEM | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| tag_production | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| EX_tag | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Growth_Phenol | 0.11204 | 0.113316 | 0.100822 | 0.108726 | 0.006875 |
| Growth_Glucose | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
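The `WT-P-Std-Dev` column uses `statistics.stdev`, the sample standard deviation (n - 1 denominator). With only three trials this choice visibly changes the value, so the sketch below recomputes the `Growth_Phenol` row from the table above and contrasts it with the population value from `statistics.pstdev`. In pandas, `mean(axis=1)` and `std(axis=1)` (default `ddof=1`) applied to the three trial columns would give the same two columns without the row loop.

```python
from statistics import mean, pstdev, stdev

# Growth_Phenol fluxes of the three WT phenol trials, copied from the table above
growth_phenol = [0.11204, 0.113316, 0.100822]

avg = mean(growth_phenol)
sample_sd = stdev(growth_phenol)       # n - 1 denominator, as used in the cell above
population_sd = pstdev(growth_phenol)  # n denominator, shown only for comparison

print(f'average {avg:.6f}, sample sd {sample_sd:.6f}, population sd {population_sd:.6f}')
```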


#### Calculate E-Flux2 fluxes for PVHG phenol

In [8]:
# calculated in notebook E (units mmol phenol / g dry cell weight / h)
pvhg_phenol_uptake_rate = 1.967485

# this could be made into a loop for other normalization methods
data_source = 'phenol_cpm'

# define dataframe to hold PVHG phenol E-Flux2 flux values
pvhg_phenol_eflux2_df = pd.DataFrame()

# loop over PVHG phenol trials
for trial in ['PVHG-P-R1', 'PVHG-P-R2', 'PVHG-P-R3']:
    
    # for each trial, calculate the fluxes
    trial_fluxes = eflux2(reaction_transcripts[data_source][trial], 'phenol', pvhg_phenol_uptake_rate)
    
    # add flux series to results dataframe
    pvhg_phenol_eflux2_df[trial] = trial_fluxes
    
# show the last five rows of the dataframe
pvhg_phenol_eflux2_df.tail()

The transcript derived substrate uptake rate is -6.3685
The transcript derived growth rate is 0.5250
The scaled substrate uptake rate is -1.9675
The scaled growth rate is 0.1622

The transcript derived substrate uptake rate is -3.2695
The transcript derived growth rate is 0.2813
The scaled substrate uptake rate is -1.9675
The scaled growth rate is 0.1693

The transcript derived substrate uptake rate is -5.7536
The transcript derived growth rate is 0.4773
The scaled substrate uptake rate is -1.9675
The scaled growth rate is 0.1632



| Reaction | PVHG-P-R1 | PVHG-P-R2 | PVHG-P-R3 |
|---|---|---|---|
| GUADEM | 0.0 | 0.0 | 0.0 |
| tag_production | 0.0 | 0.0 | 0.0 |
| EX_tag | 0.0 | 0.0 | 0.0 |
| Growth_Phenol | 0.1622 | 0.169269 | 0.163222 |
| Growth_Glucose | 0.0 | 0.0 | 0.0 |


#### Calculate the average and standard deviation of PVHG phenol E-Flux2 fluxes

In [9]:
# define blank lists to hold average and standard deviation data
average_col = []
std_dev_col = []

# loop over the rows of dataframe to fill the lists
for _, row in pvhg_phenol_eflux2_df.iterrows():
    
    # add average and standard deviation of each row to the lists
    average_col.append((row['PVHG-P-R1'] + row['PVHG-P-R2'] + row['PVHG-P-R3']) / 3)
    std_dev_col.append(stdev([row['PVHG-P-R1'], row['PVHG-P-R2'], row['PVHG-P-R3']]))
    
# add the lists as columns to the dataframe
pvhg_phenol_eflux2_df['PVHG-P-Average'] = average_col
pvhg_phenol_eflux2_df['PVHG-P-Std-Dev'] = std_dev_col

pvhg_phenol_eflux2_df.tail(5)

| Reaction | PVHG-P-R1 | PVHG-P-R2 | PVHG-P-R3 | PVHG-P-Average | PVHG-P-Std-Dev |
|---|---|---|---|---|---|
| GUADEM | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| tag_production | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| EX_tag | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Growth_Phenol | 0.1622 | 0.169269 | 0.163222 | 0.164897 | 0.00382 |
| Growth_Glucose | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
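The WT phenol, PVHG phenol and WT glucose summary cells repeat the same loop with different column names. A minimal sketch of a shared helper, assuming the fluxes are held as nested dictionaries (`{trial: {reaction id: flux}}`) rather than a dataframe; `summarize_trials` is hypothetical and not part of the notebook. On the values copied from the table above it reproduces the PVHG `Growth_Phenol` summary.

```python
from statistics import mean, stdev


def summarize_trials(fluxes_by_trial, prefix):
    """Hypothetical helper: per-reaction average and sample standard deviation across trials.

    fluxes_by_trial: {trial_name: {reaction_id: flux}}, all trials sharing the same reaction ids.
    prefix: label such as 'PVHG-P', used to name the summary columns.
    """
    trials = list(fluxes_by_trial)
    reaction_ids = fluxes_by_trial[trials[0]].keys()
    summary = {}
    for rxn in reaction_ids:
        values = [fluxes_by_trial[t][rxn] for t in trials]
        summary[rxn] = {
            f'{prefix}-Average': mean(values),
            f'{prefix}-Std-Dev': stdev(values),  # n - 1 denominator, like the notebook
        }
    return summary


pvhg = {
    'PVHG-P-R1': {'Growth_Phenol': 0.1622, 'EX_tag': 0.0},
    'PVHG-P-R2': {'Growth_Phenol': 0.169269, 'EX_tag': 0.0},
    'PVHG-P-R3': {'Growth_Phenol': 0.163222, 'EX_tag': 0.0},
}
print(summarize_trials(pvhg, 'PVHG-P')['Growth_Phenol'])
```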


#### Calculate E-Flux2 fluxes for WT glucose

In [None]:
# calculated in notebook E (units mmol glucose / g dry cell weight/ hr)
wt_glucose_uptake_rate = 3.582471

# this could be made into a loop for other normalization methods
data_source = 'glucose_cpm'

# define dataframe to hold WT glucose E-Flux2 flux values
glucose_eflux2_df = pd.DataFrame()

# loop over WT glucose trials
for trial in ['WT-G-R1', 'WT-G-R2', 'WT-G-R3']:
    
    # for each trial, calculate the fluxes
    trial_fluxes = eflux2(reaction_transcripts[data_source][trial], 'glucose', wt_glucose_uptake_rate)
    
    # add flux series to results dataframe
    glucose_eflux2_df[trial] = trial_fluxes
    
# show the last five rows of the dataframe
glucose_eflux2_df.tail()

The transcript derived substrate uptake rate is -6.0820
The transcript derived growth rate is 0.4762
The scaled substrate uptake rate is -3.1659
The scaled growth rate is 0.2805

The transcript derived substrate uptake rate is -1.5991
The transcript derived growth rate is 0.1619
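The first WT glucose trial prints a scaled uptake of -3.1659 rather than the target of -3.5825, while its scaled growth rate is consistent with the scale factor. One reading, which is an assumption, is that the sum-of-squares step returned a solution with a smaller glucose uptake than the first FBA solution, while `eflux2()` computes `scale_factor` from the FBA uptake. The arithmetic below checks this reading using only the printed numbers. Scaling by `eflux2_sol.fluxes[uptake_reaction_id]` instead would make the scaled uptake match the measured rate, but whether that is the intended normalization is not settled here.

```python
# Numbers copied from the first WT glucose trial in the output above
measured_uptake = 3.582471      # wt_glucose_uptake_rate
fba_uptake = -6.0820            # transcript-derived uptake (first FBA solution)
fba_growth = 0.4762             # transcript-derived growth rate
printed_scaled_uptake = -3.1659
printed_scaled_growth = 0.2805

scale_factor = -1 * measured_uptake / fba_uptake  # formula used in eflux2()

# growth is fixed between the two optimizations, so it scales exactly
expected_scaled_growth = fba_growth * scale_factor
# uptake of the minimal-norm (second) solution implied by the printed scaled value
implied_qp_uptake = printed_scaled_uptake / scale_factor

print(f'scale factor {scale_factor:.4f}')
print(f'expected scaled growth {expected_scaled_growth:.4f} (printed {printed_scaled_growth})')
print(f'implied uptake in the second solution {implied_qp_uptake:.4f} (first solution {fba_uptake})')
```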


#### Calculate the average and standard deviation of WT glucose E-Flux2 fluxes

In [None]:
# define blank lists to hold average and standard deviation data
average_col = []
std_dev_col = []

# loop over the rows of dataframe to fill the lists
for _, row in glucose_eflux2_df.iterrows():

    # add average and standard deviation of each row to the lists
    average_col.append((row['WT-G-R1'] + row['WT-G-R2'] + row['WT-G-R3']) / 3)
    std_dev_col.append(stdev([row['WT-G-R1'], row['WT-G-R2'], row['WT-G-R3']]))
    
# add the lists as columns to the dataframe
glucose_eflux2_df['WT-G-Average'] = average_col
glucose_eflux2_df['WT-G-Std-Dev'] = std_dev_col

glucose_eflux2_df.tail(5)