# Integration of omics data: tINIT
### Exercise 1

Let's start by reading the model with the COBRApy package.

a) Open the model in Python using the `cobra.io.load_matlab_model()` function

In [1]:
import cobra
model = cobra.io.load_matlab_model("models/redHUMAN_recon2_smin.mat")

Scaling...
 A: min|aij| =  1.000e+00  max|aij| =  1.000e+00  ratio =  1.000e+00
Problem data seem to be well scaled


b) Get information about the biomass and PFK reactions, the ATP metabolite in the cytosol (`atp_c`) and the glucose metabolite in the extracellular space (`glc_D_e`)

In [2]:
model.reactions.biomass

| Attribute | Value |
|---|---|
| Reaction identifier | biomass |
| Name | |
| Memory address | 0x07f8bdcb79dd0 |
| Stoichiometry | 0.505626 ala_L_c + 0.35926 arg_L_c + 0.279425 asn_L_c + 0.352607 asp_L_c + 20.704451 atp_c + 0.020401 chsterol_c + 0.011658 clpn_hs_c + 0.039036 ctp_c + 0.046571 cys_L_c + 0.013183 datp_n +... 0.505626 L-alanine + 0.35926 L-argininium(1+) + 0.279425 L-asparagine + 0.352607 L-aspartate(1-) + 20.704451 ATP + 0.020401 cholesterol + 0.011658 cardiolipin + 0.039036 CTP + 0.046571 L-cysteine +... |
| GPR | |
| Lower bound | 0.0 |
| Upper bound | 0.03536465206938496 |


In [3]:
model.reactions.PFK

| Attribute | Value |
|---|---|
| Reaction identifier | PFK |
| Name | |
| Memory address | 0x07f8bdce80150 |
| Stoichiometry | atp_c + f6p_c --> adp_c + fdp_c + h_c  ATP + D-Fructose 6-phosphate --> ADP + D-Fructose 1,6-bisphosphate + proton |
| GPR | (5214.1) or (5211.2) or (5214.1 and 5211.1) or (5214.1 and 5213.1) or (5213.1) or (5214.1 and... |
| Lower bound | 0.0 |
| Upper bound | 100.0 |


In [4]:
model.metabolites.atp_c

| Attribute | Value |
|---|---|
| Metabolite identifier | atp_c |
| Name | ATP |
| Memory address | 0x07f8bdcff9390 |
| Formula | C10H14N5O13P3 |
| Compartment | c |
| In 86 reaction(s) | LMPD_pglyc_hs_c_4_11, ACS, CTPS2, GLNS, LMPD_pglyc_hs_c_5_11, LMPD_pglyc_hs_c_6_11, r0301, LMPD_pglyc_hs_c_7_11, PFK, LMPD_clpn_hs_c_1_12, PRPPS, LMPD_clpn_hs_c_2_12, CBPS, LMPD_clpn_hs_c_3_12,... |


In [5]:
model.metabolites.glc_D_e

0,1
Metabolite identifier,glc_D_e
Name,D-glucose
Memory address,0x07f8bdcfb6450
Formula,C6H12O6
Compartment,e
In 6 reaction(s),"GLCt4, GLCt2_2, EX_glc_e, GLCt1r, GLCMter, GLCSGLT1le"


c) Run FBA for the model

In [6]:
sim_fba = model.optimize()
model.summary(sim_fba)

**Uptake**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| arg_L_e | EX_arg_L_e | 0.0752 | 6 | 12.87% |
| asn_L_e | EX_asn_L_e | 0.009882 | 4 | 1.13% |
| chol_e | EX_chol_e | 0.006081 | 5 | 0.87% |
| glc_D_e | EX_glc_e | 0.2188 | 6 | 37.45% |
| gln_L_e | EX_gln_L_e | 0.02529 | 5 | 3.61% |
| gly_e | EX_gly_e | 0.003116 | 2 | 0.18% |
| his_L_e | EX_his_L_e | 0.06782 | 6 | 11.61% |
| ile_L_e | EX_ile_L_e | 0.01012 | 6 | 1.73% |
| leu_L_e | EX_leu_L_e | 0.05076 | 6 | 8.69% |
| lys_L_e | EX_lys_L_e | 0.03106 | 6 | 5.32% |

**Secretion**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| co2_e | EX_co2_e | -0.1153 | 1 | 6.21% |
| for_e | EX_for_e | -0.0007215 | 1 | 0.04% |
| lac_L_e | EX_lac_L_e | -0.1971 | 3 | 31.86% |
| mal_L_e | EX_mal_L_e | -0.00122 | 4 | 0.26% |
| mthgxl_e | EX_mthgxl_e | -0.1457 | 3 | 23.54% |
| nh4_e | EX_nh4_e | -0.3066 | 0 | 0.00% |
| succ_e | EX_succ_e | -0.1611 | 4 | 34.72% |
| urea_e | EX_urea_e | -0.06249 | 1 | 3.37% |


d) Run pFBA for the model. What are the main differences between the two optimizations (fluxes, reactions, metabolites)?

In [7]:
sim_pfba = cobra.flux_analysis.pfba(model)
model.summary(sim_pfba)

**Uptake**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| arg_L_e | EX_arg_L_e | 0.02576 | 6 | 8.23% |
| asn_L_e | EX_asn_L_e | 0.01202 | 4 | 2.56% |
| asp_L_e | EX_asp_L_e | 0.009781 | 4 | 2.08% |
| chol_e | EX_chol_e | 0.006081 | 5 | 1.62% |
| glc_D_e | EX_glc_e | 0.05401 | 6 | 17.25% |
| gln_L_e | EX_gln_L_e | 0.02517 | 5 | 6.70% |
| his_L_e | EX_his_L_e | 0.00447 | 6 | 1.43% |
| ile_L_e | EX_ile_L_e | 0.01012 | 6 | 3.23% |
| leu_L_e | EX_leu_L_e | 0.05076 | 6 | 16.21% |
| lys_L_e | EX_lys_L_e | 0.02094 | 6 | 6.69% |

**Secretion**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| for_e | EX_for_e | -0.0007215 | 1 | 0.32% |
| h2o_e | EX_h2o_e | -0.04415 | 0 | 0.00% |
| hco3_e | EX_hco3_e | -0.2048 | 1 | 89.46% |
| nh4_e | EX_nh4_e | -0.04218 | 0 | 0.00% |
| succ_e | EX_succ_e | -0.002587 | 4 | 4.52% |
| urea_e | EX_urea_e | -0.01305 | 1 | 5.70% |
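The C-Flux column appears to be each exchange's carbon flux (|Flux| × C-Number) expressed as a percentage of the total carbon flux in the same direction. The minimal sketch below assumes that definition and reproduces the pFBA secretion percentages above; `carbon_flux_shares` is a hypothetical helper, not part of COBRApy.

```python
def carbon_flux_shares(rows):
    """Return each metabolite's share (%) of the total carbon flux.

    `rows` maps a metabolite ID to (flux, carbon_number). Hypothetical helper
    that assumes C-Flux = |flux| * C-Number / sum(|flux| * C-Number) * 100.
    """
    carbon = {met: abs(flux) * n_carbon for met, (flux, n_carbon) in rows.items()}
    total = sum(carbon.values())
    if total == 0:
        return {met: 0.0 for met in carbon}
    return {met: 100 * c / total for met, c in carbon.items()}


# pFBA secretion fluxes and carbon numbers copied from the table above
pfba_secretion = {
    "for_e": (-0.0007215, 1),
    "h2o_e": (-0.04415, 0),
    "hco3_e": (-0.2048, 1),
    "nh4_e": (-0.04218, 0),
    "succ_e": (-0.002587, 4),
    "urea_e": (-0.01305, 1),
}

for met, share in carbon_flux_shares(pfba_secretion).items():
    print(f"{met}: {share:.2f}%")
```

With a COBRApy model, the carbon numbers could instead be read from each metabolite's formula via `metabolite.elements.get("C", 0)`.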


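The biomass flux is the same in both simulations, since pFBA keeps the objective at its FBA optimum and only then minimises the total flux. Uptake fluxes and their share of carbon differ: glucose uptake drops from 0.2188 to 0.05401 (from 37.45% to 17.25% of the carbon uptake), aspartate is taken up only in pFBA and glycine only in FBA. Carbon secretion shifts markedly: in FBA, carbon leaves mainly as succinate (34.72%), lactate (31.86%), methylglyoxal (23.54%) and CO2 (6.21%), whereas in pFBA it leaves mostly as HCO3 (89.46%), and lactate, methylglyoxal, malate and CO2 are no longer secreted.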
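For reference, pFBA is commonly written as a two-step problem. Here $S$ is the stoichiometric matrix, $v$ the flux vector, $l$ and $u$ the flux bounds and $c$ the objective vector. FBA first finds the optimal objective value $Z^{*}$; the total absolute flux is then minimised with the objective held at that optimum. This explains why biomass is unchanged while the rest of the flux distribution can differ. In COBRApy, `cobra.flux_analysis.pfba` exposes this through its `fraction_of_optimum` argument, which defaults to 1.0.

$$
\begin{aligned}
Z^{*} &= \max_{v}\; c^{\top} v \quad \text{s.t.} \quad S v = 0,\; l \le v \le u \\
v^{\text{pFBA}} &= \arg\min_{v}\; \sum_{j} |v_j| \quad \text{s.t.} \quad S v = 0,\; l \le v \le u,\; c^{\top} v = Z^{*}
\end{aligned}
$$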

e) Knock out the exchange (drain) reaction associated with **glucose**. Optimize the model with FBA and pFBA. What happens?

In [8]:
model_wo_glc = model.copy()
model_wo_glc.reactions.EX_glc_e.bounds = [0.0,0.0]
model_wo_glc.reactions.EX_glc_e.bounds

(0.0, 0.0)

In [9]:
model_wo_glc.summary(model_wo_glc.optimize())

**Uptake**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| ala_L_e | EX_ala_L_e | 0.01375 | 3 | 1.55% |
| arg_L_e | EX_arg_L_e | 0.01271 | 6 | 2.87% |
| asn_L_e | EX_asn_L_e | 0.009882 | 4 | 1.49% |
| chol_e | EX_chol_e | 0.006081 | 5 | 1.14% |
| gln_L_e | EX_gln_L_e | 0.1046 | 5 | 19.67% |
| his_L_e | EX_his_L_e | 0.05729 | 6 | 12.93% |
| ile_L_e | EX_ile_L_e | 0.04197 | 6 | 9.47% |
| leu_L_e | EX_leu_L_e | 0.05076 | 6 | 11.46% |
| lys_L_e | EX_lys_L_e | 0.04091 | 6 | 9.23% |
| met_L_e | EX_met_L_e | 0.01862 | 5 | 3.50% |

**Secretion**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| ade_e | EX_ade_e | -0.03757 | 5 | 18.62% |
| for_e | EX_for_e | -0.0007215 | 1 | 0.07% |
| hco3_e | EX_hco3_e | -0.2116 | 1 | 20.97% |
| mal_L_e | EX_mal_L_e | -0.00122 | 4 | 0.48% |
| nh4_e | EX_nh4_e | -0.1716 | 0 | 0.00% |
| orn_e | EX_orn_e | -0.04983 | 5 | 24.70% |
| pyr_e | EX_pyr_e | -0.1182 | 3 | 35.15% |
| so4_e | EX_so4_e | -0.01156 | 0 | 0.00% |


In [10]:
model_wo_glc.summary(cobra.flux_analysis.pfba(model_wo_glc))

**Uptake**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| ala_L_e | EX_ala_L_e | 0.01375 | 3 | 2.24% |
| arg_L_e | EX_arg_L_e | 0.01783 | 6 | 5.81% |
| asn_L_e | EX_asn_L_e | 0.009882 | 4 | 2.15% |
| asp_L_e | EX_asp_L_e | 0.009781 | 4 | 2.12% |
| chol_e | EX_chol_e | 0.006081 | 5 | 1.65% |
| cit_e | EX_cit_e | 0.0006654 | 6 | 0.22% |
| gln_L_e | EX_gln_L_e | 0.02529 | 5 | 6.86% |
| h_e | EX_h_e | 0.1097 | 0 | 0.00% |
| his_L_e | EX_his_L_e | 0.00447 | 6 | 1.46% |
| ile_L_e | EX_ile_L_e | 0.01012 | 6 | 3.29% |

**Secretion**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| creat_e | EX_creat_e | -0.001647 | 4 | 3.41% |
| hco3_e | EX_hco3_e | -0.1832 | 1 | 94.79% |
| nh4_e | EX_nh4_e | -0.1148 | 0 | 0.00% |
| urea_e | EX_urea_e | -0.003481 | 1 | 1.80% |


The biomass flux is again the same for FBA and pFBA and, as expected, glucose uptake is absent from both solutions. Carbon secretion differs: FBA spreads it over pyruvate (35.15%), ornithine (24.70%), HCO3 (20.97%) and adenine (18.62%), whereas pFBA again secretes almost all carbon as HCO3 (94.79%). Aspartate, citrate, protons (`h_e`) and threonine are taken up in the pFBA solution but not in the FBA one.

f) Find the essential reactions of the model after knocking out the glucose exchange reaction. How do they differ from those of the "regular" model?

In [11]:
essential_normal = cobra.flux_analysis.find_essential_reactions(model)
essential_wo_glc = cobra.flux_analysis.find_essential_reactions(model_wo_glc)

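(Reaction IDs are compared rather than the reaction objects themselves, because the two models are separate copies whose reactions are different Python objects with different memory addresses.)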

In [12]:
set([e.id for e in essential_wo_glc]) - set([e.id for e in essential_normal])

{'ENO', 'FBA', 'FBP', 'GAPD', 'PGI', 'PGK', 'PGM', 'TPI'}

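There are 8 reactions that are essential only in the model without glucose: ENO, FBA, FBP, GAPD, PGI, PGK, PGM and TPI. These are glycolysis/gluconeogenesis steps, including fructose-bisphosphatase (FBP), presumably because without external glucose the hexose phosphates must be synthesised through gluconeogenesis.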
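The set difference above only looks in one direction, i.e. at reactions that become essential. A small sketch that also reports reactions that stop being essential and those shared by both models; `compare_essential` is a hypothetical helper that works on plain reaction IDs.

```python
def compare_essential(ids_reference, ids_perturbed):
    """Split two collections of essential reaction IDs into gained/lost/shared sets."""
    reference, perturbed = set(ids_reference), set(ids_perturbed)
    return {
        "gained": perturbed - reference,  # essential only after the perturbation
        "lost": reference - perturbed,    # essential only in the reference model
        "shared": reference & perturbed,  # essential in both models
    }


# Hypothetical reaction IDs, only to illustrate the output structure
result = compare_essential(["A", "B", "C"], ["B", "C", "D"])
print(result["gained"], result["lost"], sorted(result["shared"]))
```

In the notebook, it would be called as `compare_essential({r.id for r in essential_normal}, {r.id for r in essential_wo_glc})`.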

### Visualising flux distributions

The `escher` library can be used to view metabolic pathways for certain models.

In [13]:
import escher

Although maps can be created with this tool, there are pre-built maps that can be explored. You can list these maps and check if your model can use them.

In [14]:
escher.list_available_maps()

[{'organism': 'Saccharomyces cerevisiae',
  'map_name': 'iMM904.Central carbon metabolism'},
 {'organism': 'Homo sapiens',
  'map_name': 'RECON1.Inositol retinol metabolism'},
 {'organism': 'Homo sapiens', 'map_name': 'RECON1.Glycolysis TCA PPP'},
 {'organism': 'Homo sapiens', 'map_name': 'RECON1.Tryptophan metabolism'},
 {'organism': 'Homo sapiens', 'map_name': 'RECON1.Carbohydrate metabolism'},
 {'organism': 'Homo sapiens',
  'map_name': 'RECON1.Amino acid metabolism (partial)'},
 {'organism': 'Escherichia coli', 'map_name': 'iJO1366.Nucleotide metabolism'},
 {'organism': 'Escherichia coli',
  'map_name': 'iJO1366.Fatty acid biosynthesis (saturated)'},
 {'organism': 'Escherichia coli',
  'map_name': 'iJO1366.Nucleotide and histidine biosynthesis'},
 {'organism': 'Escherichia coli', 'map_name': 'e_coli_core.Core metabolism'},
 {'organism': 'Escherichia coli', 'map_name': 'iJO1366.Central metabolism'},
 {'organism': 'Escherichia coli',
  'map_name': 'iJO1366.Fatty acid beta-oxidation'}]

Our model is the redHUMAN reconstruction, which is based on Recon2. Some of its reaction identifiers are shared with Recon1, so we can use the Recon1 central carbon metabolism map (`RECON1.Glycolysis TCA PPP`). The map is stored in a variable named `builder`, and we associate our COBRApy `model` with it by assigning it to `builder.model`.
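To check how well a pre-built map fits the model, one option is to measure how many of the map's reaction IDs exist in the model. The sketch below assumes Escher's map JSON layout (a two-element list `[header, body]`, where `body["reactions"]` holds entries with a `"bigg_id"` field), for example loaded with `json.load` from a local copy of the map file downloaded in the next cell. `map_coverage` is a hypothetical helper, and the toy map is illustrative only.

```python
def map_coverage(map_json, model_reaction_ids):
    """Fraction of map reactions present in the model, plus the missing IDs.

    Assumes Escher's map JSON layout: [header, body], where body["reactions"]
    maps segment keys to dicts that carry a "bigg_id" field.
    """
    body = map_json[1]
    map_ids = {r["bigg_id"] for r in body["reactions"].values()}
    found = map_ids & set(model_reaction_ids)
    coverage = len(found) / len(map_ids) if map_ids else 0.0
    return coverage, sorted(map_ids - found)


# Toy map with four reactions (hypothetical content)
toy_map = [
    {"map_name": "toy"},
    {"reactions": {
        "1": {"bigg_id": "PFK"},
        "2": {"bigg_id": "PGI"},
        "3": {"bigg_id": "HEX1"},
        "4": {"bigg_id": "ENO"},
    }},
]
coverage, missing = map_coverage(toy_map, ["PFK", "PGI", "ENO", "biomass"])
print(coverage, missing)
```

With the real model, the second argument would be `[r.id for r in model.reactions]`.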

In [15]:
builder = escher.Builder(map_name='RECON1.Glycolysis TCA PPP')
builder.model = model

Downloading Map from https://escher.github.io/1-0-0/6/maps/Homo%20sapiens/RECON1.Glycolysis%20TCA%20PPP.json


The map can be displayed within Jupyter by evaluating the `builder` variable. We will then overlay the flux distribution obtained with FBA, which we named `sim_fba`.

In [16]:
builder

Builder()

As you can see, the map represents what is encoded in the metabolic model. We can visualise flux distributions in this map with arrow colors by modifying the `reaction_data` attribute.

In [17]:
builder.reaction_data = sim_fba.fluxes
builder

Builder(reaction_data={'2AMACHYD': 0.0, '2AMADPTm': 0.0, '2DR1PP': 0.0, '2HCO3_NAt': 0.0, '2OXOADOXm': 0.01011…

We have looked at the absolute flux values of this flux distribution, but we can also use `escher` to animate a series of flux states. Let's display the map again in the next cell.

In [18]:
builder

Builder(reaction_data={'2AMACHYD': 0.0, '2AMADPTm': 0.0, '2DR1PP': 0.0, '2HCO3_NAt': 0.0, '2OXOADOXm': 0.01011…

Pay close attention to the map we just displayed. We are going to visualise how the flux distribution changes as we alter the oxygen uptake rate of the model. We first define a set of rates to test; in this example, the uptake rate increases by $10^{-3}$ in each successive simulation. Then, for each rate, we:
- use a `with` block, so that changes to the model are reverted when the block exits;
- set the lower bound of the oxygen exchange reaction `EX_o2_e` to -`rate` (a negative exchange flux denotes uptake);
- optimise the model and pass the resulting fluxes to `builder.reaction_data`, pausing between simulations so that each state can be seen.
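Instead of typing the rates by hand, they can be generated from the step size. A minimal sketch; `uptake_rates` is a hypothetical helper, and `numpy.linspace(0, 0.005, 6)` would give equivalent values.

```python
def uptake_rates(step, n_steps):
    """Return n_steps rates 0, step, 2*step, ...; rounding removes float drift."""
    return [round(i * step, 12) for i in range(n_steps)]


oxygen_rates = uptake_rates(1e-3, 6)
print(oxygen_rates)  # [0.0, 0.001, 0.002, 0.003, 0.004, 0.005]
```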

In [19]:
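from time import sleep

time_between_simulations = 1  # seconds to pause between map updates
oxygen_rates = [0, 0.001, 0.002, 0.003, 0.004, 0.005]

for rate in oxygen_rates:
    # Changes made inside the `with` block are reverted when it exits
    with model as mutant_model:
        # Negative lower bound: allow oxygen uptake of at most `rate`
        mutant_model.reactions.get_by_id('EX_o2_e').bounds = (-rate, 1000)
        solution = mutant_model.optimize()
    print(solution)
    builder.reaction_data = solution.fluxes  # update the map with the new fluxes
    sleep(time_between_simulations)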

<Solution 0.000 at 0x7f8bd196f790>
<Solution 0.005 at 0x7f8bd18402d0>
<Solution 0.011 at 0x7f8bd18d4f50>
<Solution 0.016 at 0x7f8bd196f910>
<Solution 0.022 at 0x7f8bd18405d0>
<Solution 0.027 at 0x7f8bd1840a10>
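The printed objective values rise almost linearly with the oxygen uptake rate, which suggests that oxygen limits growth over this range. A quick least-squares fit on the rounded values printed above gives a rough estimate of the extra biomass flux gained per unit of extra oxygen uptake; `least_squares_slope` is a hypothetical helper.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


oxygen_rates = [0, 0.001, 0.002, 0.003, 0.004, 0.005]
# Objective values as printed above (rounded to three decimals)
growth = [0.000, 0.005, 0.011, 0.016, 0.022, 0.027]

slope = least_squares_slope(oxygen_rates, growth)
print(f"{slope:.2f}")  # 5.46
```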


### Exercise 2

Download the .csv files containing the expression data and sample information for the breast cancer cell lines in the CCLE panel. The gene nomenclature has been normalized to match the one used in the metabolic model of this exercise (Entrez gene IDs).

a) Read the files containing this information with the *pandas* package

In [20]:
import pandas as pd
df_expression = pd.read_csv('data/CCLE_breast_cancer_expression.csv', index_col=0)
df_info = pd.read_csv('data/CCLE_breast_cancer_info.csv', index_col=0)

In [21]:
df_expression.index

Index(['ACH-000017', 'ACH-000019', 'ACH-000028', 'ACH-000044', 'ACH-000097',
       'ACH-000111', 'ACH-000117', 'ACH-000147', 'ACH-000148', 'ACH-000196',
       'ACH-000212', 'ACH-000223', 'ACH-000248', 'ACH-000258', 'ACH-000276',
       'ACH-000277', 'ACH-000288', 'ACH-000330', 'ACH-000349', 'ACH-000352',
       'ACH-000374', 'ACH-000536', 'ACH-000554', 'ACH-000568', 'ACH-000573',
       'ACH-000621', 'ACH-000624', 'ACH-000643', 'ACH-000668', 'ACH-000691',
       'ACH-000699', 'ACH-000711', 'ACH-000721', 'ACH-000725', 'ACH-000755',
       'ACH-000759', 'ACH-000768', 'ACH-000783', 'ACH-000818', 'ACH-000828',
       'ACH-000849', 'ACH-000856', 'ACH-000857', 'ACH-000859', 'ACH-000876',
       'ACH-000902', 'ACH-000910', 'ACH-000927', 'ACH-000930', 'ACH-000934',
       'ACH-001390', 'ACH-001391', 'ACH-001392', 'ACH-001394', 'ACH-001396',
       'ACH-001819'],
      dtype='object', name='DepMap_ID')

b) Calculate the mean expression of 3 different genes (here, the first three columns of the expression table) across all cell lines

In [22]:
df_expression.loc[:,df_expression.columns[:3]].mean()

10165    4.292774
6514     0.007298
51557    0.081124
dtype: float64

c) Select the samples 'ACH-000019', 'ACH-000028' and 'ACH-000349', and reconstruct a tissue-specific model for each of them with the FastCORE algorithm. Perform FBA and pFBA on all three reconstructed models and highlight the main differences between them.

In [23]:
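from troppo.omics.readers.generic import TabularReader
from troppo.methods_wrappers import ReconstructionWrapper
import re

# GPR gene IDs such as '5214.1' carry an alternative-transcript suffix, which appears
# as '__COBAMPGPRDOT__<n>' after GPR parsing; strip it to recover the Entrez ID
patt = re.compile('__COBAMPGPRDOT__[0-9]{1}')
replace_alt_transcripts = lambda x: patt.sub('', x)

t = 10  # score threshold: reactions scoring above t form the core set
def integration_fx(data_map):
    # Keep reactions with a score above t, and always keep the biomass reaction
    return [[k for k, v in data_map.get_scores().items() if
             (v is not None and v > t) or k in ['biomass']]]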
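The gene-ID cleanup and the core-set selection can be tried on toy data without troppo. This sketch mirrors the regular expression and the filter inside `integration_fx`; the reaction scores are hypothetical, and `select_core` is a hypothetical helper. Note that `integration_fx` wraps its list in another list, presumably because the custom integration strategy expects a list of core sets.

```python
import re

# Same pattern as above: strips a '__COBAMPGPRDOT__<digit>' suffix
patt = re.compile('__COBAMPGPRDOT__[0-9]{1}')


def replace_alt_transcripts(gene_id):
    return patt.sub('', gene_id)


def select_core(scores, threshold, always_keep=('biomass',)):
    """Reaction IDs scoring above threshold, plus reactions that are always kept."""
    return [k for k, v in scores.items()
            if (v is not None and v > threshold) or k in always_keep]


print(replace_alt_transcripts('5214__COBAMPGPRDOT__1'))  # 5214

# Hypothetical reaction scores; None marks reactions without a GPR-derived score
toy_scores = {'PFK': 25.0, 'PGI': 3.2, 'ENO': None, 'biomass': None}
print(select_core(toy_scores, threshold=10))  # ['PFK', 'biomass']
```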

In [24]:
tab_rdr = TabularReader(path_or_df=df_expression, nomenclature='entrez_id', omics_type='transcriptomics', cache_df=True)
ocs = tab_rdr.to_containers()

In [25]:
samples = ['ACH-000019', 'ACH-000028', 'ACH-000349']

In [26]:
oc_sample = [oc for oc in ocs if oc.get_Condition() in samples]

In [27]:
rw = ReconstructionWrapper(model, ttg_ratio=9999, gpr_gene_parse_function = replace_alt_transcripts)

In [28]:
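solver = 'GLPK'
# One reconstruction per sample; each result maps reaction IDs to True (kept) or False (removed)
r_models = {}
for sample in oc_sample:
    r_models[sample.get_Condition()] = rw.run_from_omics(
        omics_data=sample,
        algorithm='fastcore',
        and_or_funcs=(min, sum),  # combine gene scores: AND -> min, OR -> sum
        integration_strategy=('custom', [integration_fx]),
        solver=solver
    )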

J size47
[  68   72   74   76   87   94  101  102  141  147  148  151  168  279
  303  314  315  332  337  346  347  364  373  389  427  444  456  463
  467  470  477  478  485  496  498  504  505  508  517  526  528  533
  534  595  596 1271 1336]
before LP7
LP7
Could not set parameters with this solver
-0.004699999999999999
done LP7
LP9
Could not set parameters with this solver
3376.9561179142484
done LP9
11 260
before LP7
LP7
Could not set parameters with this solver
-0.0011000000000000003
done LP7
LP9
Could not set parameters with this solver
352.50000000186265
done LP9
11 305
0 305
J size32
[  74   87   94  101  102  168  279  303  314  315  332  346  347  364
  373  389  427  444  456  463  467  470  477  478  485  504  517  534
  595  596 1271 1336]
before LP7
LP7
Could not set parameters with this solver
-0.0031999999999999984
done LP7
LP9
Could not set parameters with this solver
3306.583400794722
done LP9
9 246
before LP7
LP7
Could not set parameters with this solver
-0.00090
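The `and_or_funcs=(min, sum)` argument controls how gene expression values are combined into a reaction score. Assuming, as the parameter name suggests, that the first function handles AND and the second OR, a protein complex (AND) is limited by its least expressed subunit, while isoenzymes (OR) add up. The self-contained sketch below evaluates a GPR string that way; `gpr_score` is a hypothetical helper, not troppo's implementation.

```python
import re


def gpr_score(rule, expression, and_func=min, or_func=sum):
    """Score a GPR string: AND groups use and_func, OR groups use or_func."""
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_or():
        nonlocal pos
        values = [parse_and()]
        while peek() == "or":
            pos += 1
            values.append(parse_and())
        return values[0] if len(values) == 1 else or_func(values)

    def parse_and():  # AND binds more tightly than OR
        nonlocal pos
        values = [parse_atom()]
        while peek() == "and":
            pos += 1
            values.append(parse_atom())
        return values[0] if len(values) == 1 else and_func(values)

    def parse_atom():
        nonlocal pos
        token = peek()
        if token == "(":
            pos += 1
            value = parse_or()
            if peek() != ")":
                raise ValueError("unbalanced parentheses in GPR: " + rule)
            pos += 1
            return value
        pos += 1
        return expression[token]  # gene ID -> expression value

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token in GPR: " + rule)
    return result


# Hypothetical expression values for genes appearing in the PFK GPR
expr = {"5214.1": 10.0, "5211.1": 3.0, "5213.1": 4.0}
print(gpr_score("(5214.1 and 5211.1) or 5213.1", expr))  # 7.0
```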

In [29]:
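with model as test_model:
    # Close every reaction that FastCORE removed for sample ACH-000028 (samples[1])
    reactions_to_deactivate = [k for k, v in r_models[samples[1]].items() if v is False]
    for r in reactions_to_deactivate:
        test_model.reactions.get_by_id(r).bounds = (0.0, 0.0)
    # The solutions remain available after the context exits and the model is restored
    solution_fba = test_model.optimize()
    solution_pfba = cobra.flux_analysis.pfba(test_model)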

In [30]:
model.summary(solution_fba)

**Uptake**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| arg_L_e | EX_arg_L_e | 0.01997 | 6 | 3.30% |
| asn_L_e | EX_asn_L_e | 0.005671 | 4 | 0.62% |
| chol_e | EX_chol_e | 0.00349 | 5 | 0.48% |
| cit_e | EX_cit_e | 0.0006654 | 6 | 0.11% |
| glc_D_e | EX_glc_e | 0.4016 | 6 | 66.38% |
| gln_L_e | EX_gln_L_e | 0.04267 | 5 | 5.88% |
| his_L_e | EX_his_L_e | 0.02853 | 6 | 4.72% |
| ile_L_e | EX_ile_L_e | 0.007129 | 6 | 1.18% |
| leu_L_e | EX_leu_L_e | 0.01107 | 6 | 1.83% |
| lys_L_e | EX_lys_L_e | 0.0145 | 6 | 2.40% |

**Secretion**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| 5mta_e | EX_5mta_e | -0.01457 | 11 | 5.97% |
| CE1936_e | EX_CE1936_e | -0.007286 | 10 | 2.72% |
| creat_e | EX_creat_e | -0.0009451 | 4 | 0.14% |
| h2o_e | EX_h2o_e | -0.02574 | 0 | 0.00% |
| h_e | EX_h_e | -0.718 | 0 | 0.00% |
| hco3_e | EX_hco3_e | -0.1194 | 1 | 4.45% |
| lac_L_e | EX_lac_L_e | -0.7153 | 3 | 79.96% |
| nh4_e | EX_nh4_e | -0.1043 | 0 | 0.00% |
| succ_e | EX_succ_e | -0.04239 | 4 | 6.32% |
| urea_e | EX_urea_e | -0.01173 | 1 | 0.44% |


In [31]:
model.summary(solution_pfba)

**Uptake**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| arg_L_e | EX_arg_L_e | 0.01946 | 6 | 3.22% |
| asn_L_e | EX_asn_L_e | 0.005671 | 4 | 0.62% |
| chol_e | EX_chol_e | 0.00349 | 5 | 0.48% |
| cit_e | EX_cit_e | 0.0006654 | 6 | 0.11% |
| glc_D_e | EX_glc_e | 0.4015 | 6 | 66.38% |
| gln_L_e | EX_gln_L_e | 0.04267 | 5 | 5.88% |
| his_L_e | EX_his_L_e | 0.02751 | 6 | 4.55% |
| ile_L_e | EX_ile_L_e | 0.007129 | 6 | 1.18% |
| leu_L_e | EX_leu_L_e | 0.01107 | 6 | 1.83% |
| lys_L_e | EX_lys_L_e | 0.0145 | 6 | 2.40% |

**Secretion**

| Metabolite | Reaction | Flux | C-Number | C-Flux |
|---|---|---|---|---|
| 5mta_e | EX_5mta_e | -0.01457 | 11 | 5.97% |
| CE1936_e | EX_CE1936_e | -0.007286 | 10 | 2.72% |
| creat_e | EX_creat_e | -0.0009451 | 4 | 0.14% |
| h2o_e | EX_h2o_e | -0.005711 | 0 | 0.00% |
| h_e | EX_h_e | -0.7682 | 0 | 0.00% |
| hco3_e | EX_hco3_e | -0.1194 | 1 | 4.45% |
| lac_L_e | EX_lac_L_e | -0.7153 | 3 | 79.98% |
| nh4_e | EX_nh4_e | -0.1023 | 0 | 0.00% |
| succ_e | EX_succ_e | -0.04239 | 4 | 6.32% |
| urea_e | EX_urea_e | -0.01122 | 1 | 0.42% |
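For this sample (ACH-000028), the FBA and pFBA summaries are nearly identical, so it helps to rank reactions by how much their flux changes. A minimal sketch using exchange fluxes copied from the two summaries above; `flux_differences` is a hypothetical helper. With COBRApy, all reactions could be compared by passing `solution_fba.fluxes.to_dict()` and `solution_pfba.fluxes.to_dict()`.

```python
def flux_differences(fluxes_a, fluxes_b, tol=1e-6):
    """Reactions whose flux differs by more than tol, largest difference first."""
    ids = set(fluxes_a) | set(fluxes_b)
    diffs = {r: fluxes_b.get(r, 0.0) - fluxes_a.get(r, 0.0) for r in ids}
    return sorted(
        ((r, d) for r, d in diffs.items() if abs(d) > tol),
        key=lambda item: abs(item[1]),
        reverse=True,
    )


# Exchange fluxes copied from the FBA and pFBA summaries above
fba = {"EX_glc_e": 0.4016, "EX_his_L_e": 0.02853, "EX_h2o_e": -0.02574,
       "EX_h_e": -0.718, "EX_nh4_e": -0.1043, "EX_lac_L_e": -0.7153}
pfba = {"EX_glc_e": 0.4015, "EX_his_L_e": 0.02751, "EX_h2o_e": -0.005711,
        "EX_h_e": -0.7682, "EX_nh4_e": -0.1023, "EX_lac_L_e": -0.7153}

for rxn, diff in flux_differences(fba, pfba, tol=1.5e-3):
    print(f"{rxn}: {diff:+.4f}")
```

Here the largest changes are in proton and water secretion, while glucose uptake and lactate secretion are essentially unchanged.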
