# S2M2

## Integrating Omics data into *Genome-Scale Metabolic Models (GEMs)* with *Troppo*


High-throughput data from various omics fields can be integrated into *GEMs* to improve phenotype prediction. The integration either further constrains the solution space of a model or extracts a context-specific model from a generic *GEM*.
Of the omics data types that can be integrated, transcriptomics is the most widely used. A transcript profile describes the expression, under specific conditions, of all known genes of the studied organism. It is also the most accessible data type to obtain experimentally, with well-established techniques such as microarrays and RNA-seq.

## *Troppo*

Troppo (Tissue-specific RecOnstruction and Phenotype Prediction using Omics data) is a Python package containing methods for tissue-specific reconstruction with constraint-based models.

The current methods implemented are:
- FastCORE
- CORDA
- GIMME
- (t)INIT
- iMAT

## How to use *Troppo*

### Imports

In [None]:
import cobra
import re
import pandas as pd
from troppo.omics.readers.generic import TabularReader
from troppo.methods_wrappers import ReconstructionWrapper

### Initial Setup

#### Load Model

In [2]:
model = cobra.io.load_matlab_model('models/redHUMAN_recon2_smin.mat')
model

This model seems to have metCharge instead of metCharges field. Will use metCharge for what metCharges represents.
No defined compartments in model redHUMAN_recon2_smin_02Sep2019_135437. Compartments will be deduced heuristically using regular expressions.
Using regular expression found the following compartments:c, e, l, m, n, r, x


Name,redHUMAN_recon2_smin_02Sep2019_135437
Memory address,18b3eec2408
Number of metabolites,469
Number of reactions,1396
Number of genes,699
Number of groups,49
Objective expression,1.0*biomass - 1.0*biomass_reverse_01e59
Compartments,"c, m, x, e, r, l, n"


#### Load Transcriptomics Dataset

Download the .csv file containing gene expression for the breast cancer cell lines in the CCLE panel. The gene identifiers have been normalized to match those used in the metabolic model of this exercise.

In [3]:
df_expression = pd.read_csv('data/CCLE_breast_cancer_expression.csv', index_col=0)
df_expression.head()

DepMap_ID,10165,6514,51557,47,6563,3421,6898,760,9123,501,...,594,4728,8781,39,2639,5160,4724,7352,3945,2876
ACH-000017,5.380591,0.0,0.056584,7.664483,0.111031,6.11624,0.014355,0.124328,7.890021,5.376082,...,2.948601,6.971199,0.0,4.661065,4.419539,7.67638,5.77663,0.863938,8.376386,0.594549
ACH-000019,4.358256,0.0,0.014355,6.361944,0.097611,6.730776,0.056584,4.918863,4.339137,5.962086,...,3.161888,7.469886,0.0,5.056584,5.475085,7.593951,6.753952,0.678072,1.550901,0.650765
ACH-000028,4.808385,0.0,0.0,6.352441,0.070389,6.647171,0.214125,3.030336,3.320485,6.005625,...,2.869871,7.241173,0.0,5.128458,4.892877,7.466546,6.880808,0.895303,2.014355,0.748461
ACH-000044,3.555816,0.0,0.0,5.169123,2.505891,5.816856,1.22033,3.313246,3.75809,5.824513,...,3.815575,6.265474,0.0,5.954196,5.322649,6.504144,7.633431,3.558268,1.744161,7.530055
ACH-000097,4.288359,0.0,0.014355,6.190417,0.137504,5.934517,0.056584,3.266037,5.743892,5.65392,...,3.385431,8.584399,0.0,5.744161,4.541019,7.117487,7.41312,0.765535,4.484138,2.901108


Note that the dataset used as input for *Troppo* must have the samples as rows and the gene IDs as columns.
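
If a dataset arrives in the opposite orientation, it can be transposed before it is handed to *Troppo*. The sketch below is only an illustration: the helper `samples_as_rows` is hypothetical, and it assumes that sample IDs start with `ACH-`, as the DepMap IDs in the preview above do.

```python
import pandas as pd

# Toy table in the wrong orientation: genes as rows, samples as columns.
# The values are the first entries of the preview shown above.
genes_by_samples = pd.DataFrame(
    {"ACH-000017": [5.380591, 0.0], "ACH-000019": [4.358256, 0.0]},
    index=["10165", "6514"],
)

def samples_as_rows(df, sample_prefix="ACH-"):
    """Return df with samples on the index, transposing if they are on the columns.

    Hypothetical helper; the 'ACH-' prefix is an assumption about the sample IDs.
    """
    index_has_samples = df.index.astype(str).str.startswith(sample_prefix).all()
    columns_have_samples = df.columns.astype(str).str.startswith(sample_prefix).all()
    if index_has_samples:
        return df
    if columns_have_samples:
        return df.T
    raise ValueError("could not find sample IDs on either axis")

oriented = samples_as_rows(genes_by_samples)
print(oriented.shape)
print(list(oriented.index))
```

After this step the samples sit on the index and the gene IDs on the columns, which is the layout described above.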

In [4]:
df_expression.shape

(56, 579)

#### *Troppo* set up

In [5]:
patt = re.compile('__COBAMPGPRDOT__[0-9]{1}')
replace_alt_transcripts = lambda x: patt.sub('',x)
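
To see what this helper does, it can be applied to a few strings. The gene identifiers below are made up for illustration; `__COBAMPGPRDOT__` appears to be a placeholder standing in for the dot that separates a gene ID from an alternative-transcript suffix, but that reading is an assumption.

```python
import re

# Same pattern and helper as in the cell above.
patt = re.compile('__COBAMPGPRDOT__[0-9]{1}')
replace_alt_transcripts = lambda x: patt.sub('', x)

# Hypothetical identifiers, not taken from the model.
with_suffix = replace_alt_transcripts('10165__COBAMPGPRDOT__1')
without_suffix = replace_alt_transcripts('6514')
two_digit_suffix = replace_alt_transcripts('10165__COBAMPGPRDOT__12')

print(with_suffix)       # suffix stripped, plain gene ID left
print(without_suffix)    # unchanged, nothing to strip
print(two_digit_suffix)  # only the first digit is consumed by [0-9]{1}
```

Because the pattern matches exactly one digit, a two-digit suffix would leave a stray digit attached to the gene ID. Whether such suffixes occur in this model is not something the notebook shows.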

In [6]:
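def integration_fx(data_map):
    # Keep every reaction whose score exceeds the global `threshold` (defined in the next cell).
    # 'biomass' is always kept so that the objective reaction stays in the core set.
    return [[k for k, v in data_map.get_scores().items() if (v is not None and v > threshold) or k in ['biomass']]]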

In [7]:
threshold = 15
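
The behaviour of `integration_fx` can be checked on a toy input. In this sketch `ToyDataMap` is a hypothetical stand-in that only provides the `get_scores()` method used above, and the reaction names and scores are invented.

```python
class ToyDataMap:
    """Hypothetical stand-in exposing only the get_scores() method used above."""

    def __init__(self, scores):
        self._scores = scores

    def get_scores(self):
        return self._scores


threshold = 15

def integration_fx(data_map):
    # Same logic as the notebook cell: keep reactions scoring above the threshold,
    # and always keep 'biomass'.
    return [[k for k, v in data_map.get_scores().items()
             if (v is not None and v > threshold) or k in ['biomass']]]

# Invented scores: one above the threshold, one below, one missing, plus biomass.
toy = ToyDataMap({'R1': 20.0, 'R2': 3.5, 'R3': None, 'biomass': None})
core = integration_fx(toy)
print(core)  # [['R1', 'biomass']]
```

Reactions without a score (`None`) are dropped unless they are explicitly protected, which is why `biomass` is listed by name.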

### Data Integration with *Troppo*

In [8]:
ocs = TabularReader(path_or_df=df_expression, nomenclature='entrez_id', omics_type='transcriptomics').to_containers()
samples = ['ACH-000017']
oc_sample = [oc for oc in ocs if oc.get_Condition() in samples]

In [9]:
rw = ReconstructionWrapper(model, ttg_ratio=9999, gpr_gene_parse_function = replace_alt_transcripts)



In [10]:
r_models = {}
for sample in oc_sample:
    r_models[sample.get_Condition()] = rw.run_from_omics(omics_data = sample, algorithm = 'fastcore', and_or_funcs=(min,sum), 
                                                         integration_strategy=('custom', [integration_fx]), solver='CPLEX')

J size24
[  39   45   74   92  101  102  314  315  332  347  350  364  375  389
  401  427  456  463  468  470  478  485 1271 1336]
before LP7
LP7
-0.0024
done LP7
LP9
3195.5103888167373
done LP9
7 235
before LP7
LP7
-0.0005
done LP7
LP9
82.50000000000455
done LP9
7 249
before LP7
LP7
0.0
done LP7
2 249
Flipped
before LP7
LP7
-0.0002
done LP7
LP9
80.0
done LP9
2 259
0 259
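
The `and_or_funcs=(min, sum)` argument in the call above suggests that gene scores are combined along each gene-protein-reaction (GPR) rule, with the first function used for AND and the second for OR. The following is a minimal sketch of that idea under that assumption, not *Troppo*'s actual implementation; the nested-tuple rule format, the function `score_gpr`, and the expression values are all invented for illustration.

```python
def score_gpr(rule, expression, and_func=min, or_func=sum):
    """Score a GPR given as a gene ID string or a nested (operator, operands) tuple."""
    if isinstance(rule, str):
        return expression[rule]
    operator, operands = rule
    values = [score_gpr(op, expression, and_func, or_func) for op in operands]
    return and_func(values) if operator == 'and' else or_func(values)

# Invented expression values and rule: (g1 AND g2) OR g3
expression = {'g1': 7.0, 'g2': 5.0, 'g3': 6.0}
rule = ('or', [('and', ['g1', 'g2']), 'g3'])

score_sum = score_gpr(rule, expression)               # min(7, 5) + 6
score_max = score_gpr(rule, expression, or_func=max)  # max(min(7, 5), 6)
print(score_sum, score_max)  # 11.0 6.0
```

With `sum` as the OR function, a reaction catalysed by several isozymes can score higher than any single gene. This may explain how a threshold of 15 can be exceeded even though the expression values in the preview stay below 9.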


In [11]:
for sample in r_models.keys():
    print(sample, ':', len([name_reaction for name_reaction,value in r_models[sample].items() if value is True]))

ACH-000017 : 259
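
Judging from how it is used in this notebook, each entry of `r_models` maps reaction IDs to a boolean that says whether the reaction was kept. A toy dictionary of the same shape (with invented reaction names) shows how the kept and removed sets can be separated:

```python
# Hypothetical result in the same shape as r_models[sample]: reaction ID -> kept?
toy_result = {'R1': True, 'R2': False, 'R3': True, 'R4': False, 'biomass': True}

kept = [r for r, keep in toy_result.items() if keep]
removed = [r for r, keep in toy_result.items() if not keep]

print(len(kept), 'kept:', kept)
print(len(removed), 'removed:', removed)
```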


In [12]:
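with model as test_model:  # changes made inside the context are reverted on exit
    # Reactions that the reconstruction marked as absent from the tissue-specific model
    reactions_to_deactivate = [name_reaction for name_reaction,value in r_models['ACH-000017'].items() if value is False]
    for r in reactions_to_deactivate:
        # Block the reaction by fixing both bounds to zero
        test_model.reactions.get_by_id(r).bounds = (0.0, 0.0)
    solution_fba = test_model.optimize()
    solution_pfba = cobra.flux_analysis.pfba(test_model)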

In [13]:
model.summary(solution_fba)

Metabolite,Reaction,Flux,C-Number,C-Flux
arg_L_e,EX_arg_L_e,0.001338,6,1.50%
chol_e,EX_chol_e,0.0006406,5,0.60%
cit_e,EX_cit_e,0.0006654,6,0.75%
glc_D_e,EX_glc_e,0.0543,6,61.01%
gln_L_e,EX_gln_L_e,0.005439,5,5.09%
his_L_e,EX_his_L_e,0.0004709,6,0.53%
ile_L_e,EX_ile_L_e,0.002099,6,2.36%
leu_L_e,EX_leu_L_e,0.002032,6,2.28%
lys_L_e,EX_lys_L_e,0.002662,6,2.99%
met_L_e,EX_met_L_e,0.0007435,5,0.70%

Metabolite,Reaction,Flux,C-Number,C-Flux
ala_B_e,EX_ala_B_e,-0.005417,3,4.51%
for_e,EX_for_e,-7.6e-05,1,0.02%
h2o_e,EX_h2o_e,-6.108e-05,0,0.00%
h_e,EX_h_e,-0.09683,0,0.00%
hco3_e,EX_hco3_e,-0.0232,1,6.44%
lac_L_e,EX_lac_L_e,-0.09749,3,81.17%
nh4_e,EX_nh4_e,-0.005479,0,0.00%
succ_e,EX_succ_e,-0.007079,4,7.86%


In [14]:
model.summary(solution_pfba)

Metabolite,Reaction,Flux,C-Number,C-Flux
arg_L_e,EX_arg_L_e,0.001338,6,1.59%
chol_e,EX_chol_e,0.0006406,5,0.64%
cit_e,EX_cit_e,0.0006654,6,0.79%
glc_D_e,EX_glc_e,0.05202,6,61.91%
gln_L_e,EX_gln_L_e,0.005439,5,5.39%
his_L_e,EX_his_L_e,0.0004709,6,0.56%
ile_L_e,EX_ile_L_e,0.003874,6,4.61%
leu_L_e,EX_leu_L_e,0.002488,6,2.96%
lys_L_e,EX_lys_L_e,0.002206,6,2.63%
met_L_e,EX_met_L_e,0.0007435,5,0.74%

Metabolite,Reaction,Flux,C-Number,C-Flux
ala_B_e,EX_ala_B_e,-0.002982,3,2.71%
co2_e,EX_co2_e,-0.01436,1,4.35%
for_e,EX_for_e,-7.6e-05,1,0.02%
h2o_e,EX_h2o_e,-0.007466,0,0.00%
h_e,EX_h_e,-0.09592,0,0.00%
hco3_e,EX_hco3_e,-0.004141,1,1.25%
lac_L_e,EX_lac_L_e,-0.09301,3,84.45%
nh4_e,EX_nh4_e,-0.003908,0,0.00%
succ_e,EX_succ_e,-0.005964,4,7.22%
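
One quick way to compare the two solutions is the amount of lactate secreted per unit of glucose taken up. The sketch below copies the relevant exchange fluxes from the two summaries above and uses absolute values; the dictionary layout is only for illustration.

```python
# Exchange fluxes copied from the FBA and pFBA summaries above (absolute values).
fluxes = {
    'FBA': {'glucose_uptake': 0.0543, 'lactate_secretion': 0.09749},
    'pFBA': {'glucose_uptake': 0.05202, 'lactate_secretion': 0.09301},
}

ratios = {}
for method, f in fluxes.items():
    # Lactate secreted per glucose consumed
    ratios[method] = f['lactate_secretion'] / f['glucose_uptake']
    print(f"{method}: {ratios[method]:.2f} lactate per glucose")
```

Both ratios come out at about 1.8, close to the maximum of 2 lactate per glucose that full fermentation would give. The summaries differ mainly in the remaining by-products.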


## Exercise 13

We have shown how omics data can be used to reconstruct a tissue-specific model. For this exercise, we will use the breast cancer cell lines present in the CCLE panel.

a) Select the samples 'ACH-000019', 'ACH-000028' and 'ACH-000349'. Reconstruct a tissue-specific model for each with the FastCORE algorithm, then perform FBA and pFBA on all three reconstructed models. Highlight the main differences between them.

b) Try to reproduce the Warburg Effect (if not already present). Use the `escher` library to view the metabolic pathway.

Hint: Set the uptake through the oxygen drain to a small value. Also, our model is the redHUMAN reconstruction based on Recon2, but some of its reaction names overlap with Recon1, so we will use the Recon1 central carbon metabolism map.

c) Select another sample at random. Reconstruct three models with different thresholds. What are the main differences?
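
When comparing reconstructions, whether across samples or across thresholds, it can help to look at which reactions are shared and which are unique. The sketch below works on invented dictionaries in the same reaction-to-boolean shape as the entries of `r_models`; the names `toy_models` and `jaccard` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Invented reconstruction results: reaction ID -> kept?
toy_models = {
    'sample_A': {'R1': True, 'R2': True, 'R3': False},
    'sample_B': {'R1': True, 'R2': False, 'R3': True},
}

# Set of kept reactions per model
active = {name: {r for r, keep in result.items() if keep}
          for name, result in toy_models.items()}

shared = active['sample_A'] & active['sample_B']
only_a = active['sample_A'] - active['sample_B']
only_b = active['sample_B'] - active['sample_A']
similarity = jaccard(active['sample_A'], active['sample_B'])

print('shared:', sorted(shared))
print('only in A:', sorted(only_a))
print('only in B:', sorted(only_b))
print('Jaccard similarity:', round(similarity, 3))
```

The same comparison applies to the real results by replacing `toy_models` with `r_models`.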