# Libraries

In [1]:
import pandas as pd
from cobra.io.json import load_json_model
import numpy as np
from cobra.flux_analysis import flux_variability_analysis
import math
from tqdm import tqdm

# Model

In [2]:
model = load_json_model('./data/Recon3D.json')
model.solver = 'gurobi'  # cplex or glpk also work, but gurobi and cplex are usually faster
model

Set parameter Username
Set parameter LicenseID to value 2732828
Academic license - for non-commercial use only - expires 2026-11-04


0,1
Name,Recon3D
Memory address,31524fc70
Number of metabolites,5835
Number of reactions,10600
Number of genes,2248
Number of groups,0
Objective expression,1.0*BIOMASS_maintenance - 1.0*BIOMASS_maintenance_reverse_5b3f9
Compartments,"cytosol, lysosome, mitochondria, endoplasmic reticulum, extracellular space, peroxisome/glyoxysome, nucleus, golgi apparatus, inner mitochondrial compartment"


In [3]:
model.slim_optimize()

755.003215550663

# Exofluxomics data

- We use exofluxomics data for the 7 ovarian cancer cell lines of the NCI60 panel (https://dctd.cancer.gov/data-tools-biospecimens/data and https://www.revvity.com/ask/nci-60-cancer-cell-lines#kb-section-0).

- The exofluxomics analysis was published in "Metabolite profiling identifies a key role for glycine in rapid cancer cell proliferation", Jain et al., 2012.

- The NCI60 cell lines were grown in RPMI-1640 medium.

- Your paper uses 5 ovarian cancer cell lines from CCLE, which differ from the NCI60 lines we use to define the exofluxomics. We only need an approximation of the extracellular fluxes. Keep in mind that not all cell lines in your paper were grown in RPMI-1640 (for this project it is fine to ignore that).

- Your paper uses HUMAN1 as the GEM, while we use Recon3D because its metabolite IDs are easier to map to the Jain metabolites.





In [5]:
df_jain = pd.read_excel('./data/1218595databases1_corrected.xls', sheet_name='CORE data', index_col=1) #exoflux of NCI60
df_jain = df_jain[df_jain['Calibrated (e)']==1]
df_jain

Unnamed: 0_level_0,Method (a),Calibrated (e),UACC-257,UACC-257.1,OVCAR-8,OVCAR-8.1,OVCAR-5,OVCAR-5.1,SF-295,SF-295.1,...,SN12C,SN12C.1,MDA-MB-231/ATCC,MDA-MB-231/ATCC.1,CCRF-CEM,CCRF-CEM.1,RPMI 8226,RPMI 8226.1,SR,SR.1
Metabolite (b),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2-phosphoglycerate,IPR,1,0.010891,0.012313,0.001671,0.002037,0.004495,0.003178,0.046393,0.041708,...,0.000799,0.003414,0.003347,0.004058,0.000048,0.000353,0.003970,0.004046,0.004946,0.006281
OH-phenylpyruvate,IPR,1,0.006489,0.006301,0.003754,0.004093,0.004945,0.002575,0.061102,0.071014,...,0.004449,0.004086,-0.000016,0.000997,0.000274,-0.000045,0.005093,0.002148,0.000366,0.000388
aminoisobutyrate,HILIC,1,-0.000996,-0.001030,-0.000922,-0.003205,-0.009112,-0.003275,-0.007030,-0.006882,...,0.000038,-0.002188,-0.001644,-0.001411,-0.004358,-0.004011,-0.000541,-0.004851,-0.002008,-0.001844
3-hydroxyanthranilate,HILIC,1,0.000004,-0.000009,-0.000003,-0.000006,-0.000006,-0.000028,0.000012,0.000995,...,0.000049,0.000049,0.000025,0.000068,0.000083,0.000024,0.000104,0.000138,0.000038,-0.000011
3-phosphoglycerate,IPR,1,0.011295,0.009672,0.002216,0.001839,0.005351,0.004794,0.029259,0.027037,...,0.003098,0.002399,0.003040,0.003676,0.001776,-0.000262,0.006900,-0.001072,0.000831,0.003157
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
uridine,IPR,1,-0.001730,-0.002408,-0.004089,-0.004404,-0.006982,-0.006946,0.075043,0.056388,...,-0.026612,-0.026042,-0.001166,-0.000968,-0.013973,-0.013942,-0.029317,-0.026703,-0.012063,-0.011686
valine,HILIC,1,-2.289535,-2.289212,-1.779011,-1.790573,-2.103658,-2.020315,-15.016188,-15.760112,...,-4.109991,-3.608674,-2.811156,-3.146968,-3.286169,-3.439844,-9.662331,-8.993947,-3.089847,-3.137293
xanthine,IPR,1,0.010001,0.007740,0.007553,0.007257,0.048450,0.052410,0.156952,0.117376,...,0.052171,0.051318,0.019527,0.018116,0.022432,0.023292,0.088359,0.086740,0.035188,0.039231
XMP,IPR,1,0.000982,-0.000302,-0.000055,-0.000207,0.000268,0.000118,0.000766,0.001146,...,0.000395,0.000303,-0.000093,0.000649,0.003888,0.003521,0.003966,0.006606,0.001118,0.001443


In [6]:
df_jain.rename(columns={'IGROV1':'IGR-OV1', 'IGROV1.1':'IGR-OV1.1'}, inplace=True)

In [7]:
cell_ids = ['IGR-OV1','IGR-OV1.1', 'OVCAR-3', 'OVCAR-3.1', 'OVCAR-4',  'OVCAR-4.1', 
            'OVCAR-5','OVCAR-5.1', 'OVCAR-8', 'OVCAR-8.1', 'SK-OV-3', 'SK-OV-3.1', 'NCI/ADR-RES', 'NCI/ADR-RES.1'] #NCI60 ovarian cancer cell lines

In [8]:
cell_ids

['IGR-OV1',
 'IGR-OV1.1',
 'OVCAR-3',
 'OVCAR-3.1',
 'OVCAR-4',
 'OVCAR-4.1',
 'OVCAR-5',
 'OVCAR-5.1',
 'OVCAR-8',
 'OVCAR-8.1',
 'SK-OV-3',
 'SK-OV-3.1',
 'NCI/ADR-RES',
 'NCI/ADR-RES.1']

In [9]:
df_jain = df_jain[cell_ids]
df_jain

Unnamed: 0_level_0,IGR-OV1,IGR-OV1.1,OVCAR-3,OVCAR-3.1,OVCAR-4,OVCAR-4.1,OVCAR-5,OVCAR-5.1,OVCAR-8,OVCAR-8.1,SK-OV-3,SK-OV-3.1,NCI/ADR-RES,NCI/ADR-RES.1
Metabolite (b),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
2-phosphoglycerate,0.004024,0.004238,0.003641,0.002481,0.008432,0.012999,0.004495,0.003178,0.001671,0.002037,-0.000364,0.000310,0.000970,0.000666
OH-phenylpyruvate,0.011052,0.016182,0.006976,0.005363,0.010825,0.007354,0.004945,0.002575,0.003754,0.004093,0.011270,0.007094,0.001803,0.004855
aminoisobutyrate,-0.000353,0.001802,0.001429,0.000837,0.001697,0.001087,-0.009112,-0.003275,-0.000922,-0.003205,-0.001814,0.001540,-0.001321,-0.003875
3-hydroxyanthranilate,-0.000048,-0.000027,-0.000020,-0.000023,-0.000015,-0.000028,-0.000006,-0.000028,-0.000003,-0.000006,-0.000037,0.000169,-0.000008,-0.000007
3-phosphoglycerate,-0.000419,0.005083,0.000907,-0.000375,0.011112,0.007249,0.005351,0.004794,0.002216,0.001839,0.001217,0.000808,0.000989,0.003787
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
uridine,0.006014,0.010471,-0.002425,-0.000642,0.001338,0.001505,-0.006982,-0.006946,-0.004089,-0.004404,0.281732,0.296405,-0.015892,-0.016018
valine,-7.349679,-7.461049,-3.392893,-3.415313,-3.804516,-3.714310,-2.103658,-2.020315,-1.779011,-1.790573,-3.698666,-3.069079,-2.852274,-2.792150
xanthine,0.161968,0.161236,0.037498,0.038947,0.024761,0.024642,0.048450,0.052410,0.007553,0.007257,0.027664,0.030031,0.023520,0.024672
XMP,0.011492,0.013016,0.003105,0.001852,0.000055,0.000199,0.000268,0.000118,-0.000055,-0.000207,-0.000260,-0.001320,-0.001069,-0.000187


In [10]:
# CCLE (https://sites.broadinstitute.org/ccle/) contains more than 1,000 human cancer cell lines with associated RNA-seq data
# Meeson et al. (your paper) used 5 of these cell lines for ovarian cancer, grouped into low and high grade based on the literature
CCLE_annotation = pd.read_csv('./data/Cell_lines_annotations_20181226.txt', index_col=0, sep='\t')
CCLE_annotation

Unnamed: 0_level_0,depMapID,Name,Pathology,Site_Primary,Site_Subtype1,Site_Subtype2,Site_Subtype3,Histology,Hist_Subtype1,Hist_Subtype2,...,Growth.Medium,Supplements,FreezingMedium,DoublingTimefromVendor,DoublingTimeCalculatedhrs,type,type_refined,PATHOLOGIST_ANNOTATION,mutRate,tcga_code
CCLE_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
DMS53_LUNG,ACH-000698,DMS 53,primary,lung,NS,NS,NS,carcinoma,small_cell_carcinoma,NS,...,Waymouth's+10%FBS,,5%DMSO,,450.0,lung_small_cell,lung_small_cell,Lung:SCLC,1.572416e+09,SCLC
SW1116_LARGE_INTESTINE,ACH-000489,SW1116,primary,large_intestine,colon,NS,NS,carcinoma,adenocarcinoma,NS,...,L15+10%FBS,,5%DMSO,,1232.0,colorectal,colorectal,Colorectal:Carcinoma,9.437319e+09,COAD/READ
NCIH1694_LUNG,ACH-000431,NCI-H1694,metastasis,lung,NS,NS,NS,carcinoma,small_cell_carcinoma,NS,...,DMEM:F12+5%FBS,"005 mg/ml insulin, 01 mg/ml transferrin, 30nM ...",5% DMSO,,812.0,lung_small_cell,lung_small_cell,Lung:SCLC,1.095350e+08,SCLC
P3HR1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE,ACH-000707,P3HR-1,metastasis,haematopoietic_and_lymphoid_tissue,NS,NS,NS,lymphoid_neoplasm,Burkitt_lymphoma,NS,...,,,,,,lymphoma_Burkitt,lymphoma_Burkitt,Lymphoma:NH_B_cell,1.664564e+09,DLBC
HUT78_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE,ACH-000509,HuT 78,primary,haematopoietic_and_lymphoid_tissue,skin,NS,NS,lymphoid_neoplasm,mycosis_fungoides-Sezary_syndrome,NS,...,,,,,,lymphoma_other,T-cell_lymphoma_other,Lymphoma:NH_T_cell,1.182176e+09,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
GEO_LARGE_INTESTINE,ACH-002394,,,,,,,,,,...,,,,,,,colorectal,,,COAD/READ
NCIH2330_LUNG,,,,,,,,,,,...,,,,,,,,,,
LP6_SOFT_TISSUE,ACH-001116,,,,,,,,,,...,,,,,,,,,,
HS578BST_BREAST,,,,,,,,,,,...,,,,,,,,,,


In [11]:
dict_meeson={
    'COV318':'high_grade',
    'CAOV3':'high_grade',
    'OAW28':'high_grade',
    '59M':'low_grade',
    'HEYA8':'low_grade',
}
for cell_id, grade in dict_meeson.items():  # `grade` avoids shadowing the built-in `type`
    print(CCLE_annotation.loc[f'{cell_id}_OVARY'].Name)

# all cell lines from your paper are present in the CCLE annotation table under the ID '<name>_OVARY'

COV318
Caov-3
OAW28
59M
Hey-A8


In [12]:
df_jain

Unnamed: 0_level_0,IGR-OV1,IGR-OV1.1,OVCAR-3,OVCAR-3.1,OVCAR-4,OVCAR-4.1,OVCAR-5,OVCAR-5.1,OVCAR-8,OVCAR-8.1,SK-OV-3,SK-OV-3.1,NCI/ADR-RES,NCI/ADR-RES.1
Metabolite (b),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
2-phosphoglycerate,0.004024,0.004238,0.003641,0.002481,0.008432,0.012999,0.004495,0.003178,0.001671,0.002037,-0.000364,0.000310,0.000970,0.000666
OH-phenylpyruvate,0.011052,0.016182,0.006976,0.005363,0.010825,0.007354,0.004945,0.002575,0.003754,0.004093,0.011270,0.007094,0.001803,0.004855
aminoisobutyrate,-0.000353,0.001802,0.001429,0.000837,0.001697,0.001087,-0.009112,-0.003275,-0.000922,-0.003205,-0.001814,0.001540,-0.001321,-0.003875
3-hydroxyanthranilate,-0.000048,-0.000027,-0.000020,-0.000023,-0.000015,-0.000028,-0.000006,-0.000028,-0.000003,-0.000006,-0.000037,0.000169,-0.000008,-0.000007
3-phosphoglycerate,-0.000419,0.005083,0.000907,-0.000375,0.011112,0.007249,0.005351,0.004794,0.002216,0.001839,0.001217,0.000808,0.000989,0.003787
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
uridine,0.006014,0.010471,-0.002425,-0.000642,0.001338,0.001505,-0.006982,-0.006946,-0.004089,-0.004404,0.281732,0.296405,-0.015892,-0.016018
valine,-7.349679,-7.461049,-3.392893,-3.415313,-3.804516,-3.714310,-2.103658,-2.020315,-1.779011,-1.790573,-3.698666,-3.069079,-2.852274,-2.792150
xanthine,0.161968,0.161236,0.037498,0.038947,0.024761,0.024642,0.048450,0.052410,0.007553,0.007257,0.027664,0.030031,0.023520,0.024672
XMP,0.011492,0.013016,0.003105,0.001852,0.000055,0.000199,0.000268,0.000118,-0.000055,-0.000207,-0.000260,-0.001320,-0.001069,-0.000187


In [13]:
# For each metabolite in the Jain data, take the min and max flux across the ovarian cancer cell lines only.
# This gives a single extracellular flux profile for ovarian cancer.
# We expect the cell lines to differentiate once we integrate RNA-seq data.

# The measured range is then relaxed with the factors 0.5 and 1.5 (see the branches below).

# Convert fmol/cell/h to mmol/gDW/h (flux): 1 fmol = 1e-12 mmol, divided by the assumed per-cell value 2e-10
# (labelled cell_volume in the original note; the factor 5e-3 implies it acts as grams dry weight per cell)
df_jain = df_jain * 5e-3
df_exo_flux = pd.DataFrame(columns=['Min', 'Max', 'relaxed_Min', 'relaxed_Max'], index=df_jain.index)
for met_name_jain, row in df_jain.iterrows():
    min_ = row.min()
    max_ = row.max()
    df_exo_flux.at[met_name_jain, 'Min'] = min_
    df_exo_flux.at[met_name_jain, 'Max'] = max_
    if (min_ < 0 and max_ > 0):
        df_exo_flux.at[met_name_jain, 'relaxed_Min'] = min_ * 1.5
        df_exo_flux.at[met_name_jain, 'relaxed_Max'] = max_ * 1.5
    elif(min_ < 0 and max_ < 0):
        df_exo_flux.at[met_name_jain, 'relaxed_Min'] = min_ * 1.5
        df_exo_flux.at[met_name_jain, 'relaxed_Max'] = max_ * 0.5
    else:
        df_exo_flux.at[met_name_jain, 'relaxed_Min'] = min_ * 0.5
        df_exo_flux.at[met_name_jain, 'relaxed_Max'] = max_ * 1.5
df_exo_flux

Unnamed: 0_level_0,Min,Max,relaxed_Min,relaxed_Max
Metabolite (b),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2-phosphoglycerate,-0.000002,0.000065,-0.000003,0.000097
OH-phenylpyruvate,0.000009,0.000081,0.000005,0.000121
aminoisobutyrate,-0.000046,0.000009,-0.000068,0.000014
3-hydroxyanthranilate,-0.0,0.000001,-0.0,0.000001
3-phosphoglycerate,-0.000002,0.000056,-0.000003,0.000083
...,...,...,...,...
uridine,-0.00008,0.001482,-0.00012,0.002223
valine,-0.037305,-0.008895,-0.055958,-0.004448
xanthine,0.000036,0.00081,0.000018,0.001215
XMP,-0.000007,0.000065,-0.00001,0.000098
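The conversion and relaxation steps in the cell above can be sketched as two small functions. This is a minimal sketch in plain Python, not the notebook's own code: the names `to_mmol_per_gdw_h` and `relax_bounds` are hypothetical, and the per-cell value 2e-10 is the assumption stated in the cell comment.

```python
def to_mmol_per_gdw_h(flux_fmol_per_cell_h, per_cell=2e-10):
    """Convert fmol/cell/h to mmol/gDW/h (1 fmol = 1e-12 mmol).

    `per_cell` is the assumed per-cell value from the notebook comment.
    """
    return flux_fmol_per_cell_h * 1e-12 / per_cell


def relax_bounds(min_, max_, shrink=0.5, grow=1.5):
    """Apply the same three-branch relaxation as the loop above."""
    if min_ < 0 and max_ > 0:
        # range spans zero: push both ends away from zero
        return min_ * grow, max_ * grow
    if min_ < 0 and max_ < 0:
        # uptake only: more negative minimum, maximum closer to zero
        return min_ * grow, max_ * shrink
    # remaining cases (min_ >= 0 or max_ == 0)
    return min_ * shrink, max_ * grow


# valine: lowest and highest value across the ovarian columns shown above
valine_min = to_mmol_per_gdw_h(-7.461049)
valine_max = to_mmol_per_gdw_h(-1.779011)
print(relax_bounds(valine_min, valine_max))
```

The printed pair should agree with the `relaxed_Min` and `relaxed_Max` entries of the valine row up to rounding. Note that a range with a negative minimum and a maximum of exactly zero falls into the last branch, where the minimum is multiplied by 0.5 rather than 1.5.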


In [15]:
df_dict_met_rxn = pd.read_excel('./data/Liaskos_et_al.xlsx', sheet_name='Mapping_mets_IDs', index_col=0)
df_dict_met_rxn

Unnamed: 0_level_0,Metabolite ID Recon3D
Metabolite Jain et al.,Unnamed: 1_level_1
2-phosphoglycerate,2pg_e
OH-phenylpyruvate,34hpp_e
aminoisobutyrate,3aib_e
3-hydroxyanthranilate,3hanthrn_e
3-phosphoglycerate,3pg_e
...,...
uridine,uri_e
valine,val_L_e
xanthine,xan_e
XMP,xmp_e


In [16]:
df_exo_flux = df_exo_flux.join(df_dict_met_rxn, how='left')
df_exo_flux

Unnamed: 0_level_0,Min,Max,relaxed_Min,relaxed_Max,Metabolite ID Recon3D
Metabolite (b),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2-phosphoglycerate,-0.000002,0.000065,-0.000003,0.000097,2pg_e
OH-phenylpyruvate,0.000009,0.000081,0.000005,0.000121,34hpp_e
aminoisobutyrate,-0.000046,0.000009,-0.000068,0.000014,3aib_e
3-hydroxyanthranilate,-0.0,0.000001,-0.0,0.000001,3hanthrn_e
3-phosphoglycerate,-0.000002,0.000056,-0.000003,0.000083,3pg_e
...,...,...,...,...,...
uridine,-0.00008,0.001482,-0.00012,0.002223,uri_e
valine,-0.037305,-0.008895,-0.055958,-0.004448,val_L_e
xanthine,0.000036,0.00081,0.000018,0.001215,xan_e
XMP,-0.000007,0.000065,-0.00001,0.000098,xmp_e


In [17]:
df_exo_flux.loc[df_exo_flux['Metabolite ID Recon3D']=='tdechola_e/tdchola_e']
df_exo_flux.at['glycodeoxycholate/glycochenodeoxycholate', 'Metabolite ID Recon3D'] = 'dgchol_e'
df_exo_flux.at['taurodeoxycholate/taurochenodeoxycholate', 'Metabolite ID Recon3D'] = 'tdchola_e'

In [18]:
def double_nth_underscore(text, n=2):
    underscores = 0
    result = []
    for char in text:
        if char == '_':
            underscores += 1
            if underscores == n:
                result.append('__')
            else:
                result.append('_')
        else:
            result.append(char)
    return ''.join(result)

# Some IDs in the mapping sheet use a single underscore before the stereo descriptor (e.g. val_L_e),
# while this Recon3D file uses a double underscore (val__L_e). Fix the IDs that are not found in the model.
model_met_ids = {met.id for met in model.metabolites}  # set for fast membership tests
for index, row in df_exo_flux.iterrows():
    if row['Metabolite ID Recon3D'] not in model_met_ids:
        print(row['Metabolite ID Recon3D'])
        df_exo_flux.at[index, 'Metabolite ID Recon3D'] = double_nth_underscore(row['Metabolite ID Recon3D'], n=1)

print("##################################################")

# second pass: print any ID that is still missing from the model (nothing is printed if all were fixed)
for index, row in df_exo_flux.iterrows():
    if row['Metabolite ID Recon3D'] not in model_met_ids:
        print(row['Metabolite ID Recon3D'])

ala_L_e
arg_L_e
asn_L_e
asp_L_e
citr_L_e
cyst_L_e
glc_D_e
gln_L_e
glu_L_e
hcys_L_e
hom_L_e
ile_L_e
lac_L_e
leu_L_e
lys_L_e
mal_L_e
met_L_e
phe_L_e
pnto_R_e
pro_L_e
sbt_D_e
ser_L_e
thr_L_e
thyox_L_e
trp_L_e
tyr_L_e
val_L_e
##################################################
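The ID fix above doubles one specific underscore. As an illustration, here is a shorter variant based on `str.split`; the name `double_nth_underscore_split` is hypothetical and the function is only a sketch meant to behave like `double_nth_underscore` on IDs of this shape.

```python
def double_nth_underscore_split(text, n=2):
    """Return text with its n-th underscore doubled.

    The text is returned unchanged if it has fewer than n underscores.
    """
    parts = text.split('_')
    if n < 1 or len(parts) <= n:
        return text
    return '_'.join(parts[:n]) + '__' + '_'.join(parts[n:])


print(double_nth_underscore_split('val_L_e', n=1))     # val__L_e
print(double_nth_underscore_split('EX_val_L_e', n=2))  # EX_val__L_e
print(double_nth_underscore_split('gly_e', n=2))       # gly_e
```

Metabolite IDs need `n=1`, while exchange reaction IDs need `n=2` because the `EX_` prefix contributes the first underscore.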


In [19]:
for index, row in df_exo_flux.iterrows():
    found_id = None
    for rxn in model.reactions:
        if(rxn.id.startswith('EX_') and len(rxn.metabolites) == 1):
            for met in rxn.metabolites:
                if(met.id == row['Metabolite ID Recon3D']):
                    found_id = rxn.id
    df_exo_flux.at[index, 'Reaction'] = found_id
df_exo_flux.loc[df_exo_flux['Reaction'].isnull()]

Unnamed: 0_level_0,Min,Max,relaxed_Min,relaxed_Max,Metabolite ID Recon3D,Reaction
Metabolite (b),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
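The nested loop above scans every reaction once per metabolite. A possible alternative is to build the metabolite-to-exchange mapping once and then look IDs up in it. The sketch below uses a plain dictionary with made-up entries as a stand-in for the model (`toy_reactions` and `met_to_exchange` are hypothetical names); with cobra, the same comprehension would iterate over `model.reactions` and read `rxn.metabolites`.

```python
# stand-in for the model: reaction ID -> IDs of the metabolites it involves (toy data)
toy_reactions = {
    'EX_val__L_e': ['val__L_e'],
    'EX_uri_e': ['uri_e'],
    'HEX1': ['glc__D_c', 'atp_c', 'g6p_c', 'adp_c', 'h_c'],
}

# keep only exchange reactions (EX_ prefix, exactly one metabolite)
met_to_exchange = {
    mets[0]: rxn_id
    for rxn_id, mets in toy_reactions.items()
    if rxn_id.startswith('EX_') and len(mets) == 1
}

for met_id in ['val__L_e', 'uri_e', 'xan_e']:
    # .get returns None when no exchange reaction exists, like found_id above
    print(met_id, '->', met_to_exchange.get(met_id))
```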


In [20]:
df_exo_flux

Unnamed: 0_level_0,Min,Max,relaxed_Min,relaxed_Max,Metabolite ID Recon3D,Reaction
Metabolite (b),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2-phosphoglycerate,-0.000002,0.000065,-0.000003,0.000097,2pg_e,EX_2pg_e
OH-phenylpyruvate,0.000009,0.000081,0.000005,0.000121,34hpp_e,EX_34hpp_e
aminoisobutyrate,-0.000046,0.000009,-0.000068,0.000014,3aib_e,EX_3aib_e
3-hydroxyanthranilate,-0.0,0.000001,-0.0,0.000001,3hanthrn_e,EX_3hanthrn_e
3-phosphoglycerate,-0.000002,0.000056,-0.000003,0.000083,3pg_e,EX_3pg_e
...,...,...,...,...,...,...
uridine,-0.00008,0.001482,-0.00012,0.002223,uri_e,EX_uri_e
valine,-0.037305,-0.008895,-0.055958,-0.004448,val__L_e,EX_val__L_e
xanthine,0.000036,0.00081,0.000018,0.001215,xan_e,EX_xan_e
XMP,-0.000007,0.000065,-0.00001,0.000098,xmp_e,EX_xmp_e


In [21]:
medium_df = pd.read_excel('./data/Liaskos_et_al.xlsx', sheet_name='RPMI1640_medium_EX_rxns', index_col=0)
medium_df.loc['L-Hydroxyproline'] = ['EX_4hpro_LT_e'] 
#drop row Chloride
medium_df = medium_df.drop('Chloride', axis=0)
medium_df


Unnamed: 0_level_0,Reaction
RPMI 1640 medium NCI60,Unnamed: 1_level_1
Glycine,EX_gly_e
L-Arginine,EX_arg_L_e
L-Asparagine,EX_asn_L_e
L-Aspartic acid,EX_asp_L_e
L-Cystine,EX_cys_L_e
L-Glutamic Acid,EX_glu_L_e
L-Glutamine,EX_gln_L_e
L-Histidine,EX_his_L_e
L-Hydroxyproline,EX_4hpro_LT_e
L-Isoleucine,EX_ile_L_e


In [22]:
for index, row in medium_df.iterrows():
    rxn_id = row['Reaction']
    if rxn_id in model.reactions:
        continue
    else:
        medium_df.at[index, 'Reaction'] = double_nth_underscore(medium_df.at[index, 'Reaction'], 2)

In [23]:
medium_df

Unnamed: 0_level_0,Reaction
RPMI 1640 medium NCI60,Unnamed: 1_level_1
Glycine,EX_gly_e
L-Arginine,EX_arg__L_e
L-Asparagine,EX_asn__L_e
L-Aspartic acid,EX_asp__L_e
L-Cystine,EX_cys__L_e
L-Glutamic Acid,EX_glu__L_e
L-Glutamine,EX_gln__L_e
L-Histidine,EX_his__L_e
L-Hydroxyproline,EX_4hpro_LT_e
L-Isoleucine,EX_ile__L_e


In [24]:
for index, row in medium_df.iterrows():
    rxn_id = row['Reaction']
    if rxn_id in model.reactions:
        continue
    else:
        print(rxn_id)

In [25]:
# find all exchange reactions in the model
ex_rxns = []
for rxn in model.reactions:
    if rxn.id.startswith('EX_'):
        ex_rxns.append(rxn.id)
print(len(ex_rxns))

1560


In [26]:
is_medium = []
for ex in ex_rxns:
    if ex in medium_df['Reaction'].values:
        is_medium.append(1)
    else:
        is_medium.append(0)
sum(is_medium)

40

In [27]:
is_jain = []
for ex in ex_rxns:
    if ex in df_exo_flux['Reaction'].values:
        is_jain.append(1)
    else:
        is_jain.append(0)
sum(is_jain)

115

In [28]:
df_exoflux = pd.DataFrame(index=ex_rxns, columns=['in_medium', 'in_jain'])
df_exoflux['in_medium'] = is_medium
df_exoflux['in_jain'] = is_jain
df_exoflux

Unnamed: 0,in_medium,in_jain
EX_5adtststerone_e,0,0
EX_5adtststerones_e,0,0
EX_5fthf_e,0,0
EX_5htrp_e,0,0
EX_5mthf_e,0,0
...,...,...
EX_3h3mglt_e,0,0
EX_ppiogly_e,0,0
EX_sbt__D_e,0,1
EX_lvstacid_e,0,0


In [29]:
df_exo_flux

Unnamed: 0_level_0,Min,Max,relaxed_Min,relaxed_Max,Metabolite ID Recon3D,Reaction
Metabolite (b),Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2-phosphoglycerate,-0.000002,0.000065,-0.000003,0.000097,2pg_e,EX_2pg_e
OH-phenylpyruvate,0.000009,0.000081,0.000005,0.000121,34hpp_e,EX_34hpp_e
aminoisobutyrate,-0.000046,0.000009,-0.000068,0.000014,3aib_e,EX_3aib_e
3-hydroxyanthranilate,-0.0,0.000001,-0.0,0.000001,3hanthrn_e,EX_3hanthrn_e
3-phosphoglycerate,-0.000002,0.000056,-0.000003,0.000083,3pg_e,EX_3pg_e
...,...,...,...,...,...,...
uridine,-0.00008,0.001482,-0.00012,0.002223,uri_e,EX_uri_e
valine,-0.037305,-0.008895,-0.055958,-0.004448,val__L_e,EX_val__L_e
xanthine,0.000036,0.00081,0.000018,0.001215,xan_e,EX_xan_e
XMP,-0.000007,0.000065,-0.00001,0.000098,xmp_e,EX_xmp_e


In [30]:
col_met_ids = []
for index, row in df_exoflux.iterrows():
    rxn = model.reactions.get_by_id(index)
    for met in rxn.metabolites:
        col_met_ids.append(met.id)
df_exoflux['Metabolite ID Recon3D'] = col_met_ids
df_exoflux

Unnamed: 0,in_medium,in_jain,Metabolite ID Recon3D
EX_5adtststerone_e,0,0,5adtststerone_e
EX_5adtststerones_e,0,0,5adtststerones_e
EX_5fthf_e,0,0,5fthf_e
EX_5htrp_e,0,0,5htrp_e
EX_5mthf_e,0,0,5mthf_e
...,...,...,...
EX_3h3mglt_e,0,0,3h3mglt_e
EX_ppiogly_e,0,0,ppiogly_e
EX_sbt__D_e,0,1,sbt__D_e
EX_lvstacid_e,0,0,lvstacid_e


In [31]:
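# note: `&` binds tighter than `==`, so this is evaluated as row['in_jain'] == (1 & row['in_medium']) == 0
row['in_jain']==1 & row['in_medium']==0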

True
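The `True` above is worth a second look. Assuming `row` still holds the last row of the previous loop (`EX_lvstacid_e`, with `in_jain` and `in_medium` both 0), the intended test "in Jain but not in the medium" should be false. The sketch below reproduces the effect with plain integers.

```python
in_jain, in_medium = 0, 0  # assumed values of the last row in the table above

# `&` has higher precedence than `==`, so this is the chained comparison
# in_jain == (1 & in_medium) == 0
unparenthesized = in_jain == 1 & in_medium == 0

# the intended test, as written with `and` in the bounds loop of the next cell
intended = (in_jain == 1) and (in_medium == 0)

print(unparenthesized)  # True
print(intended)         # False
```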

In [32]:
LB = []
UB = []
is_set_list = []
for index, row in df_exoflux.iterrows():
    if(row['in_jain']==1 and row['in_medium']==1):
        # measured by Jain and present in the medium: use the relaxed Jain range as is
        LB.append(df_exo_flux.loc[df_exo_flux['Reaction']==index, 'relaxed_Min'].values[0])
        UB.append(df_exo_flux.loc[df_exo_flux['Reaction']==index, 'relaxed_Max'].values[0])
        is_set = 1
    elif(row['in_jain']==1 and row['in_medium']==0):
        # measured by Jain but absent from the medium
        jain_LB = df_exo_flux.loc[df_exo_flux['Reaction']==index, 'relaxed_Min'].values[0]
        jain_UB = df_exo_flux.loc[df_exo_flux['Reaction']==index, 'relaxed_Max'].values[0]
        if(jain_LB<0 and jain_UB>0):
            # range spans zero: lower bound set to 1e-7, Jain upper bound kept
            LB.append(1e-7)
            UB.append(jain_UB)
        elif(jain_LB <0 and jain_UB<0):
            # uptake only: lower bound set to -1e-7, upper bound left open
            LB.append(-1e-7)
            UB.append(100)
        else:
            # secretion only: use the relaxed Jain range as is
            LB.append(jain_LB)
            UB.append(jain_UB)
        is_set = 1
    else:
        # not measured by Jain: leave the exchange reaction open
        is_set = 0
        LB.append(-100)
        UB.append(+100)
    is_set_list.append(is_set)
df_exoflux['LB'] = LB
df_exoflux['UB'] = UB
df_exoflux['is_set'] = is_set_list
df_exoflux

Unnamed: 0,in_medium,in_jain,Metabolite ID Recon3D,LB,UB,is_set
EX_5adtststerone_e,0,0,5adtststerone_e,-100.000000,100.000000,0
EX_5adtststerones_e,0,0,5adtststerones_e,-100.000000,100.000000,0
EX_5fthf_e,0,0,5fthf_e,-100.000000,100.000000,0
EX_5htrp_e,0,0,5htrp_e,-100.000000,100.000000,0
EX_5mthf_e,0,0,5mthf_e,-100.000000,100.000000,0
...,...,...,...,...,...,...
EX_3h3mglt_e,0,0,3h3mglt_e,-100.000000,100.000000,0
EX_ppiogly_e,0,0,ppiogly_e,-100.000000,100.000000,0
EX_sbt__D_e,0,1,sbt__D_e,0.000046,0.000514,1
EX_lvstacid_e,0,0,lvstacid_e,-100.000000,100.000000,0
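The branching in the cell above can be summarised as one pure function, which makes the rule easy to check on single cases. This is a sketch only: `exchange_bounds` and its parameter names are hypothetical, and the values 100 and 1e-7 are copied from the loop.

```python
def exchange_bounds(in_jain, in_medium, jain_lb=None, jain_ub=None,
                    default=100.0, eps=1e-7):
    """Return (LB, UB, is_set) following the same branches as the loop above."""
    if in_jain and in_medium:
        return jain_lb, jain_ub, 1
    if in_jain:
        if jain_lb < 0 and jain_ub > 0:
            return eps, jain_ub, 1
        if jain_lb < 0 and jain_ub < 0:
            return -eps, default, 1
        return jain_lb, jain_ub, 1
    # not measured by Jain: bounds stay open and is_set is 0
    return -default, default, 0


print(exchange_bounds(0, 0))                      # (-100.0, 100.0, 0)
print(exchange_bounds(1, 0, 4.6e-05, 5.1e-04))    # secretion only, kept as is
print(exchange_bounds(1, 0, -0.05, -0.004))       # (-1e-07, 100.0, 1)
```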


In [33]:
df_exoflux.to_csv('./data/exoflux_bounds.csv')

In [34]:
#fix oxygen uptake based on literature LB = -2, UB = 0
#fix ATPM flux based on literature LB 2.576 UB 3.864
model.reactions.get_by_id('EX_o2_e').bounds = (-2, 0)
model.reactions.get_by_id('ATPM').bounds = (2.576, 3.864)

In [35]:
for index, row in tqdm(df_exoflux.iterrows(), total=df_exoflux.shape[0]):
    original_bounds = model.reactions.get_by_id(index).bounds
    model.reactions.get_by_id(index).bounds = (row['LB'], row['UB'])
    if math.isnan(model.slim_optimize()):
        print(f'Infeasible reaction: {index} with bounds {row["LB"]}, {row["UB"]}')
        model.reactions.get_by_id(index).bounds = original_bounds

Infeasible reaction: EX_sucr_e with bounds 1.9942750059574653e-06, 2.22776617839489e-05
Infeasible reaction: EX_sbt__D_e with bounds 4.58713531797975e-05, 0.000513652932905718

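All remaining bounds were integrated without the need for MILP. The bounds of the two reactions reported above (EX_sucr_e and EX_sbt__D_e) made the model infeasible and were reverted to their original values.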

In [36]:
model.objective = 'BIOMASS_reaction'

In [37]:
model.slim_optimize()

313.26041897378315

Now we run FVA to get the feasible flux range of every reaction.

In [38]:
df_fva = flux_variability_analysis(model, fraction_of_optimum=0.0, processes=20)
df_fva

Set parameter Username
Set parameter LicenseID to value 2732828
Academic license - for non-commercial use only - expires 2026-11-04
Read LP format model from file /var/folders/0f/7pcrwybx3_zfsnzs2qjgxp8h0000gn/T/tmpxsdrwrum.lp
Reading time = 0.03 seconds
: 5836 rows, 21201 columns, 80853 nonzeros
Unnamed: 0,minimum,maximum
24_25DHVITD3tm,0.0,100.0
25HVITD3t,0.0,100.0
COAtl,0.0,1000.0
EX_5adtststerone_e,-100.0,100.0
EX_5adtststerones_e,-100.0,100.0
...,...,...
ACMPGLUTTRsc,0.0,100.0
FVSCOAhc,0.0,100.0
MDZGLChr,0.0,100.0
TMACMPhr,0.0,100.0


In [39]:
df_fva.to_csv('./data/fva_exoflux.csv')

In [40]:
df_fva = pd.read_csv('./data/fva_exoflux.csv', index_col=0)

In [41]:
#inject bounds into the model
for index, row in df_fva.iterrows():
    model.reactions.get_by_id(index).bounds = (row['minimum'], row['maximum'])

In [42]:
#check feasibility again
print(model.slim_optimize())

313.2604189723885


# Low and high grade models

We have 5 cell lines belonging to 2 families: low-grade and high-grade cancer.

We work on the two families in parallel.

For each family, we take its cell lines and the doubling times reported in your paper, and select the lowest and highest doubling time across the family. These define the LB and UB of the biomass reaction in the generic model of that family: the longest doubling time gives the LB and the shortest gives the UB. We also need to transform the doubling time (in hours) into a biomass flux, i.e. a growth rate in 1/h.

flux_biomass = ln(2)/DT
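As a worked example of this formula, the sketch below derives the biomass bounds of each family from the doubling times listed in the next cell. The helper name `growth_rate_bounds` is hypothetical, and the ±20% relaxation is the one the notebook applies to these bounds.

```python
import math

# doubling times per family, as listed in the notebook
doubling_times = {
    'high_grade': [50, 63.95, 37],
    'low_grade': [48, 16],
}


def growth_rate_bounds(times, relax=0.2):
    """Return (LB, UB) of the growth rate for a list of doubling times.

    The slowest line (largest doubling time) sets the LB and the fastest
    sets the UB; both are widened by `relax`.
    """
    lb = math.log(2) / max(times) * (1 - relax)
    ub = math.log(2) / min(times) * (1 + relax)
    return lb, ub


for grade, times in doubling_times.items():
    print(grade, growth_rate_bounds(times))
```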

In [43]:
df_cells = pd.DataFrame(columns = ['ID', 'Type', 'Doubling time'])
df_cells.loc[0] = ['COV318', 'high_grade', 50]
df_cells.loc[1] = ['CAOV3', 'high_grade', 63.95]
df_cells.loc[2] = ['OAW28', 'high_grade', 37]
df_cells.loc[3] = ['59M', 'low_grade', 48]
df_cells.loc[4] = ['HEYA8', 'low_grade', 16]
df_cells

Unnamed: 0,ID,Type,Doubling time
0,COV318,high_grade,50.0
1,CAOV3,high_grade,63.95
2,OAW28,high_grade,37.0
3,59M,low_grade,48.0
4,HEYA8,low_grade,16.0


In [44]:
# high grade: min doubling time = 37.0, max = 63.95
# low grade: min doubling time = 16, max = 48
# the resulting growth-rate range is relaxed by ±20%
LB_biomass_HG = (np.log(2)/63.95) *0.8
UB_biomass_HG = (np.log(2)/37.0)*1.2

LB_biomass_LG = (np.log(2)/48)*0.8
UB_biomass_LG = (np.log(2)/16)*1.2

print(LB_biomass_HG)
print(UB_biomass_HG)

print(LB_biomass_LG)
print(UB_biomass_LG)

0.00867111406486249
0.022480449099241465
0.011552453009332421
0.0519860385419959


In [45]:
# starting from the FVA bounds obtained with the exofluxomics data, set the biomass bounds of each family
# and run FVA again to obtain a new set of bounds for that family
model.reactions.get_by_id('BIOMASS_reaction').bounds = (LB_biomass_HG, UB_biomass_HG)
print(model.slim_optimize())
df_fva_HG = flux_variability_analysis(model, fraction_of_optimum=0.0, processes=20)
df_fva_HG.to_csv('./data/fva_exoflux_HG.csv')
df_fva_HG.loc['BIOMASS_reaction']

0.022480449099241465

minimum    0.008671
maximum    0.022480
Name: BIOMASS_reaction, dtype: float64

In [46]:
model.reactions.get_by_id('BIOMASS_reaction').bounds = (LB_biomass_LG, UB_biomass_LG)
print(model.slim_optimize())
df_fva_LG = flux_variability_analysis(model, fraction_of_optimum=0.0, processes=20)
df_fva_LG.to_csv('./data/fva_exoflux_LG.csv')
df_fva_LG.loc['BIOMASS_reaction']

0.0519860385419959

minimum    0.011552
maximum    0.051986
Name: BIOMASS_reaction, dtype: float64

Each of these two FVA results defines a model (HG and LG).

# Minimize Sum of uptakes

In the exoflux dataset, the column is_set marks the exchange reactions whose bounds were set from the Jain data.

The experimental bounds were integrated. However, many exchange reactions were left at (-100, +100), which is not realistic.

For this reason, we apply the parsimony principle: we minimize the sum of uptakes, but in such a way that the UB of the growth rate can still be reached.

We will run this pipeline for our two models: low and high grade.
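One way to write this step down, as a sketch inferred from the code that follows rather than a formulation taken from a reference: let $U$ be the exchange reactions with is_set = 0 that can carry uptake, let each flux be split as $v_i = v_i^{+} - v_i^{-}$ with $v_i^{+}, v_i^{-} \ge 0$, and let $\mu^{\max}$ be the biomass UB of the family. The symbols $U$, $v_i^{\pm}$ and $\mu^{\max}$ are notation introduced here.

$$
\begin{aligned}
\min_{v}\quad & \sum_{i \in U} v_i^{-} \\
\text{s.t.}\quad & S\,v = 0, \\
& lb_j \le v_j \le ub_j \quad \forall j, \\
& 0.9\,\mu^{\max} \le v_{\text{biomass}} \le \mu^{\max}.
\end{aligned}
$$

If $X$ is the optimal value, the code then keeps the sum of uptakes close to it by adding the constraint $0 \le \sum_{i \in U} v_i^{-} \le 1.2\,X$.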


In [47]:
model = load_json_model('./data/Recon3D.json')
model.solver = 'gurobi'
model

0,1
Name,Recon3D
Memory address,1076c4ee0
Number of metabolites,5835
Number of reactions,10600
Number of genes,2248
Number of groups,0
Objective expression,1.0*BIOMASS_maintenance - 1.0*BIOMASS_maintenance_reverse_5b3f9
Compartments,"cytosol, lysosome, mitochondria, endoplasmic reticulum, extracellular space, peroxisome/glyoxysome, nucleus, golgi apparatus, inner mitochondrial compartment"


In [48]:
df_exoflux = pd.read_csv('./data/exoflux_bounds.csv', index_col=0)
df_exoflux

Unnamed: 0,in_medium,in_jain,Metabolite ID Recon3D,LB,UB,is_set
EX_5adtststerone_e,0,0,5adtststerone_e,-100.000000,100.000000,0
EX_5adtststerones_e,0,0,5adtststerones_e,-100.000000,100.000000,0
EX_5fthf_e,0,0,5fthf_e,-100.000000,100.000000,0
EX_5htrp_e,0,0,5htrp_e,-100.000000,100.000000,0
EX_5mthf_e,0,0,5mthf_e,-100.000000,100.000000,0
...,...,...,...,...,...,...
EX_3h3mglt_e,0,0,3h3mglt_e,-100.000000,100.000000,0
EX_ppiogly_e,0,0,ppiogly_e,-100.000000,100.000000,0
EX_sbt__D_e,0,1,sbt__D_e,0.000046,0.000514,1
EX_lvstacid_e,0,0,lvstacid_e,-100.000000,100.000000,0


In [49]:
df_exoflux["is_set"].sum()

115

In [50]:
df_fva_LG = pd.read_csv('./data/fva_exoflux_LG.csv', index_col=0)
df_fva_HG = pd.read_csv('./data/fva_exoflux_HG.csv', index_col=0)

In [51]:
# Constrain every reaction to the flux range found by the LG FVA.
for index, row in df_fva_LG.iterrows():
    model.reactions.get_by_id(index).bounds = (row['minimum'], row['maximum'])

# Build the objective: the sum of the reverse variables of all exchange reactions
# that were not set from the Jain data (is_set == 0) and are not forced to secrete.
# Each flux is split into a forward and a reverse variable, and cobra always returns
# the net flux as forward - reverse. The reverse variable lies in [0, |LB|], so it is
# always a non-negative number: for an exchange reaction it is the uptake flux.
obj_var = None
for index, row in df_exoflux.iterrows():
    rxn = model.reactions.get_by_id(index)
    if rxn.lower_bound < 1e-6 and row.is_set == 0:
        if obj_var is None:
            obj_var = rxn.reverse_variable
        else:
            obj_var = obj_var + rxn.reverse_variable

# Require biomass to reach at least 90% of its upper bound while minimizing uptake.
original_biomass_bounds = model.reactions.get_by_id('BIOMASS_reaction').bounds
model.reactions.get_by_id('BIOMASS_reaction').bounds = (original_biomass_bounds[1]*0.90, original_biomass_bounds[1])

model.objective = obj_var
model.objective_direction = 'min'
X = model.slim_optimize()
print("MIN sum of uptake fluxes for LG: ", X)

# Cap the sum of uptakes at 120% of its minimum.
min_sum_up = model.problem.Constraint(obj_var, lb=0, ub=X*1.2)
model.add_cons_vars(min_sum_up)

# Restore the original biomass bounds.
model.reactions.get_by_id('BIOMASS_reaction').bounds = original_biomass_bounds

# Restore biomass maximization as the objective.
model.objective = "BIOMASS_reaction"
model.objective_direction = "max"

# Recompute FVA under the uptake constraint.
df_fva_LG_min_sum = flux_variability_analysis(model, fraction_of_optimum=0.0, processes=20)
df_fva_LG_min_sum.to_csv('./data/fva_exoflux_LG_min_sum.csv')

MIN sum of uptake fluxes for LG:  5e-07

In [52]:
model = load_json_model('./data/Recon3D.json')
model.solver = 'gurobi'
model

0,1
Name,Recon3D
Memory address,327062260
Number of metabolites,5835
Number of reactions,10600
Number of genes,2248
Number of groups,0
Objective expression,1.0*BIOMASS_maintenance - 1.0*BIOMASS_maintenance_reverse_5b3f9
Compartments,"cytosol, lysosome, mitochondria, endoplasmic reticulum, extracellular space, peroxisome/glyoxysome, nucleus, golgi apparatus, inner mitochondrial compartment"


In [53]:
# Constrain every reaction to the flux range found by the HG FVA.
for index, row in df_fva_HG.iterrows():
    model.reactions.get_by_id(index).bounds = (row['minimum'], row['maximum'])

# Build the objective: the sum of the reverse variables of all exchange reactions
# that were not set from the Jain data (is_set == 0) and are not forced to secrete.
# Each flux is split into a forward and a reverse variable, and cobra always returns
# the net flux as forward - reverse. The reverse variable lies in [0, |LB|], so it is
# always a non-negative number: for an exchange reaction it is the uptake flux.
obj_var = None
for index, row in df_exoflux.iterrows():
    rxn = model.reactions.get_by_id(index)
    if rxn.lower_bound < 1e-6 and row.is_set == 0:
        if obj_var is None:
            obj_var = rxn.reverse_variable
        else:
            obj_var = obj_var + rxn.reverse_variable

# Require biomass to reach at least 90% of its upper bound while minimizing uptake.
original_biomass_bounds = model.reactions.get_by_id('BIOMASS_reaction').bounds
model.reactions.get_by_id('BIOMASS_reaction').bounds = (original_biomass_bounds[1]*0.90, original_biomass_bounds[1])

model.objective = obj_var
model.objective_direction = 'min'
X = model.slim_optimize()
print("MIN sum of uptake fluxes for HG: ", X)

# Cap the sum of uptakes at 120% of its minimum.
min_sum_up = model.problem.Constraint(obj_var, lb=0, ub=X*1.2)
model.add_cons_vars(min_sum_up)

# Restore the original biomass bounds.
model.reactions.get_by_id('BIOMASS_reaction').bounds = original_biomass_bounds

# Restore biomass maximization as the objective.
model.objective = "BIOMASS_reaction"
model.objective_direction = "max"

# Recompute FVA under the uptake constraint.
df_fva_HG_min_sum = flux_variability_analysis(model, fraction_of_optimum=0.0, processes=20)
df_fva_HG_min_sum.to_csv('./data/fva_exoflux_HG_min_sum.csv')

MIN sum of uptake fluxes for HG:  5e-07

In [54]:
df_fva_HG_min_sum.loc["BIOMASS_reaction"]

minimum    0.008671
maximum    0.022480
Name: BIOMASS_reaction, dtype: float64

In [55]:
df_fva_LG_min_sum.loc["BIOMASS_reaction"]

minimum    0.011552
maximum    0.051986
Name: BIOMASS_reaction, dtype: float64

As you can see, the experimental bounds for biomass are still feasible.

# RNA-seq integration

Now it's time to integrate RNA-seq data into the two models we've just created to generate 7 context-specific metabolic models.

You shall follow the algorithm proposed by the authors.

CCLE RNA-seq data are given by the authors and are available online: **PAY ATTENTION TO ANY PREPROCESSING OR TRANSFORMATION PERFORMED BY THE AUTHORS ON THE RNA-SEQ DATA**

You now have two models, defined by the last FVA files we saved: one for HG and one for LG. The RNA-seq data of an HG cell line must be integrated using the base HG model, and likewise for LG.
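
A minimal sketch of that pairing rule, so that a cell line can never be matched with the wrong base model. The two file paths are the ones saved above; the cell line names and the `cell_grades` mapping are hypothetical placeholders to be replaced with the cell lines of the paper.

```python
# FVA files saved by the "Minimize Sum of uptakes" section, one per grade.
BASE_FVA_FILES = {
    "HG": "./data/fva_exoflux_HG_min_sum.csv",
    "LG": "./data/fva_exoflux_LG_min_sum.csv",
}


def base_fva_file(cell_line, grade_of):
    """Return the FVA file of the base model that matches the cell line's grade."""
    grade = grade_of[cell_line]
    if grade not in BASE_FVA_FILES:
        raise ValueError(f"Unknown grade {grade!r} for cell line {cell_line!r}")
    return BASE_FVA_FILES[grade]


# Hypothetical cell line names, for illustration only.
cell_grades = {"CELL_LINE_1": "HG", "CELL_LINE_2": "LG"}
print(base_fva_file("CELL_LINE_1", cell_grades))
```

The returned file can then be read with `pd.read_csv(..., index_col=0)` and applied to a freshly loaded model, exactly as in the cells above.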

# Flux comparison

In Fig. 4, the authors compared subsystems by selecting the most deregulated reactions across cell types (LG and HG). I think they maximized biomass in each of the 7 models (generating 7 FBA solutions) and then averaged each flux across the cell lines of the same family.

However, this picks just one of the many solutions compatible with maximum biomass growth. I suggest the following:

1. Fix biomass near its upper bound for each model (LB = UB*0.90, to avoid solver numerical issues).
2. Run flux sampling with OptGP (cobrapy) with thinning = 100 and 1k samples per cell line.
3. Do not 'summarize' the flux probability distributions with a simple average; use more advanced methods. The objective here is to identify the reactions that differ most between the two cancer families. You could run non-parametric statistical tests such as Mann-Whitney to check whether two probability distributions are significantly different. You have 3 cells vs 2 cells (all pair combinations), so you could perform this test only on reactions belonging to core subsystems such as glycolysis, the TCA cycle, the pentose phosphate pathway, etc., in order to reduce the number of distributions compared per cell pair.
4. Once you have identified the top-n most different probability distributions (reaction fluxes) across cells of different types, you could plot them with boxplots, as the authors did.
5. It might be interesting to check whether the distributions differ less between cells belonging to the same family.
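
Steps 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: `sample_fluxes` relies on cobrapy's `sample` function and is not exercised here, the Mann-Whitney test is a plain-Python version using the normal approximation without tie correction (in practice `scipy.stats.mannwhitneyu` would be the usual choice), and the reaction ids and flux values at the bottom are synthetic.

```python
import math
import random
from statistics import NormalDist


def sample_fluxes(model, biomass_id="BIOMASS_reaction", n=1000, thinning=100):
    """Steps 1-2: pin biomass near its upper bound, then sample with OptGP."""
    from cobra.sampling import sample  # lazy import: the rest runs without cobra

    with model:  # bound changes are reverted when the block exits
        bio = model.reactions.get_by_id(biomass_id)
        bio.lower_bound = 0.90 * bio.upper_bound
        return sample(model, n=n, method="optgp", thinning=thinning)


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test; returns (U statistic of x, p-value)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # tied values share the mean of ranks i+1..j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - n1 * n2 / 2) / sigma
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


def rank_reactions(samples_a, samples_b):
    """Step 3: rank shared reactions by |U/(n1*n2) - 0.5| (0 means no shift)."""
    rows = []
    for rxn in samples_a.keys() & samples_b.keys():
        x, y = samples_a[rxn], samples_b[rxn]
        u, p = mann_whitney(x, y)
        rows.append((rxn, abs(u / (len(x) * len(y)) - 0.5), p))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Synthetic stand-in for one LG and one HG cell line (made-up ids and values).
rng = random.Random(0)
cell_lg = {"RXN_A": [rng.gauss(1.0, 0.2) for _ in range(200)],
           "RXN_B": [rng.gauss(5.0, 1.0) for _ in range(200)]}
cell_hg = {"RXN_A": [rng.gauss(2.0, 0.2) for _ in range(200)],
           "RXN_B": [rng.gauss(5.0, 1.0) for _ in range(200)]}
ranking = rank_reactions(cell_lg, cell_hg)
print(ranking[0][0])  # id of the reaction with the largest shift
```

With real data, the DataFrame returned by `sample_fluxes` can be converted with `df.to_dict("list")` before calling `rank_reactions`. The ranking uses the effect size rather than the p-value because, with 1k samples per cell line, even tiny shifts tend to give very small p-values.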

# Gene essentiality

The authors performed a gene essentiality analysis to find which genes are required for the cells to keep growing. Check how they identified them.

Check the gene knockout section of the COBRApy docs to identify these important genes for each cell line.

Compare your results with theirs. The authors validated their results against in-vitro data (this would be very nice to do as well).
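
A possible starting point, sketched under assumptions: `knockout_growth` wraps cobrapy's `single_gene_deletion` and assumes the result layout of recent cobrapy versions (an `ids` column holding a set with one gene id, and a `growth` column); it is not exercised here. The 1% growth cutoff in `essential_genes` is a placeholder and not the authors' criterion, and the gene ids and growth values in the example are made up.

```python
import math


def knockout_growth(model):
    """Growth rate after deleting each gene, one gene at a time."""
    from cobra.flux_analysis import single_gene_deletion  # lazy import

    result = single_gene_deletion(model)
    growth = {}
    for ids, value in zip(result["ids"], result["growth"]):
        gene_id = next(iter(ids))
        # An infeasible knockout may be reported as NaN; treat it as no growth.
        growth[gene_id] = 0.0 if math.isnan(value) else float(value)
    return growth


def essential_genes(ko_growth, wild_type_growth, fraction=0.01):
    """Genes whose knockout leaves less than `fraction` of wild-type growth."""
    cutoff = fraction * wild_type_growth
    return sorted(gene for gene, mu in ko_growth.items() if mu < cutoff)


# Hypothetical knockout results for one cell line.
toy_growth = {"gene_1": 0.0, "gene_2": 0.0224, "gene_3": 0.0001}
print(essential_genes(toy_growth, wild_type_growth=0.0224))
```

Running this per cell line gives one list of essential genes per context-specific model, which can then be compared across cell lines and against the authors' lists.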