## Unification of the datasets provided by Hefzi's paper

The dataset built from the iCHOv1 XML file and the Excel file provided as Supplementary Data are unified here so that the result combines information from both sources (i.e., lower and upper bounds from the XML file and curation notes from the Excel file).

In [1]:
import pandas as pd
import cobra
from cobra.io import read_sbml_model
from cobra.util import create_stoichiometric_matrix
from tqdm.notebook import tqdm



In [2]:
# Load Hefzi's dataset
df1 = pd.read_excel('../../Data/iCHOv1_Reconstruction.xlsx', header = 1)
df1

Unnamed: 0,Reaction,Reaction Name,Subsystem,Reaction Formula,GPR,Curation Notes,References
0,COKECBESr,Carboxylesterase (cocaine) (er),ALKALOID SYNTHESIS,coke[r] + h2o[r] -> bz[r] + egme[r] + h[r],(100756666) or (100767789),From the various human carboxylesterases only ...,
1,EGMESTr,ecgonine methyl esterase (ER),ALKALOID SYNTHESIS,h2o[r] + egme[r] -> h[r] + ecgon[r] + meoh[r],(100771815),Encoded enzyme (100771815) is an arylacetamide...,
2,NMPTRCOX,N-Methylputrescine:oxygen oxidoreductase (deam...,ALKALOID SYNTHESIS,nmptrc[c] + o2[c] -> nh4[c] + 1mpyr[c] + h2o2...,(100771382) or (100762635) or (100762926) or (...,All encode amine oxidases: AOC1 (100771382); A...,
3,PRO1x,L-ProlineNAD+ 5-oxidoreductase,ARGININE AND PROLINE METABOLISM,pro_L[c] + nad[c] -> 2 h[c] + 1pyr5c[c] + nad...,(100773901),"100773901 maps to 58510, ""similar to proline d...",
4,DHDDH,Dihydrodiol dehydrogenase,CYP METABOLISM,nadp[c] + dhnpthld[c] -> npthld[c] + nadph[c],(100753544),Dimeric dihydrodiol dehydrogenase was found to...,
...,...,...,...,...,...,...,...
3224,C8CRNt,C8:0 acylcarnitine transport to mitochondria,CARNITINE SHUTTLE,c8crn[c] -> c8crn[m],(100765000),,
3225,MALCRNt,Malonyl carnitine transport into mitochondria,CARNITINE SHUTTLE,HC10859[c] -> HC10859[m],(100765000),,
3226,NP1,nucleotide phosphatase,NAD METABOLISM,h[c] + nac[c] + r1p[c] -> pi[c] + nicrns[c],(100757729),Homolog to the human gene in Recon1,
3227,UGT1A10r,"UDP-glucuronosyltransferase 1-10 precursor, mi...",STEROID METABOLISM,2 h[r] + udpglcur[r] + bilirub[r] <=> udp[r] ...,(100755423),PMID:7945246 shows mono and di gluconidation o...,


In [3]:
# Read the model using the cobrapy library
model = read_sbml_model('../../Data/reconciliation/models/iCHOv1_final.xml')

# Create dataframe from the model with the attributes that we are interested in
attributes = []
for reaction in tqdm(model.reactions):
    attributes.append([reaction.id, reaction.name, reaction.reaction, reaction.gpr, 
                       reaction.subsystem, reaction.lower_bound, reaction.upper_bound])

df2 = pd.DataFrame(data=attributes, columns=['Reaction', 'Reaction Name', 'Reaction Formula', 'GPR', 'Subsystem', 'Lower bound', 'Upper bound'])
df2

Set parameter Username
Academic license - for non-commercial use only - expires 2024-03-24



Unnamed: 0,Reaction,Reaction Name,Reaction Formula,GPR,Subsystem,Lower bound,Upper bound
0,COKECBESr,Carboxylesterase (cocaine) (er),coke_r + h2o_r --> bz_r + egme_r + h_r,100756666 or 100767789,ALKALOID SYNTHESIS,0.0,1000.0
1,EGMESTr,ecgonine methyl esterase (ER),egme_r + h2o_r --> ecgon_r + h_r + meoh_r,100771815,ALKALOID SYNTHESIS,0.0,1000.0
2,NMPTRCOX,N-Methylputrescine:oxygen oxidoreductase (deam...,nmptrc_c + o2_c --> 1mpyr_c + h2o2_c + nh4_c,100771382 or 100762635 or 100762926 or 100763954,ALKALOID SYNTHESIS,0.0,1000.0
3,PRO1x,L-ProlineNAD+ 5-oxidoreductase,nad_c + pro_L_c --> 1pyr5c_c + 2.0 h_c + nadh_c,100773901,ARGININE AND PROLINE METABOLISM,0.0,1000.0
4,DHDDH,Dihydrodiol dehydrogenase,dhnpthld_c + nadp_c --> nadph_c + npthld_c,100753544,CYP METABOLISM,0.0,1000.0
...,...,...,...,...,...,...,...
6658,DM_sprm_c_,DM_sprm[c],sprm_c -->,,DEMAND,0.0,1000.0
6659,DM_yvite_c_,DM_yvite[c],yvite_c -->,,DEMAND,0.0,1000.0
6660,DM_56iqcrbxlt_c_,DM_56iqcrbxlt[c],56iqcrbxlt_c -->,,DEMAND,0.0,1000.0
6661,DM_atp_c_,ATP maintenance,atp_c + h2o_c --> adp_c + h_c + pi_c,,DEMAND,0.0,1000.0


In [4]:
model

0,1
Name,iCHOv1
Memory address,16afeda20
Number of metabolites,4456
Number of reactions,6663
Number of genes,1766
Number of groups,121
Objective expression,1.0*biomass_cho_producing - 1.0*biomass_cho_producing_reverse_3e80b
Compartments,"endoplasmic reticulum, cytosol, mitochondria, peroxisome, golgi apparatus, extracellular space, nucleus, lysosome, intermembrane space of the mitochondria"


### df1 has 3229 reactions while df2 contains 6663 reactions
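Before merging, it can help to see how the two sets of reaction IDs overlap. The following is a minimal sketch on toy data, not the notebook's actual result: `toy_df1`, `toy_df2` and the IDs `R1` to `R5` are hypothetical stand-ins for `df1` and `df2`.

```python
import pandas as pd

# Toy stand-ins for df1 (spreadsheet) and df2 (SBML model).
# The reaction IDs are illustrative only, not taken from iCHOv1.
toy_df1 = pd.DataFrame({'Reaction': ['R1', 'R2', 'R3']})
toy_df2 = pd.DataFrame({'Reaction': ['R1', 'R2', 'R3', 'R4', 'R5']})

ids1 = set(toy_df1['Reaction'])
ids2 = set(toy_df2['Reaction'])

shared = ids1 & ids2        # reactions present in both datasets
only_in_df1 = ids1 - ids2   # reactions found only in the spreadsheet
only_in_df2 = ids2 - ids1   # reactions found only in the model

print(len(shared), len(only_in_df1), len(only_in_df2))
```

Applied to the real `df1` and `df2`, the same set operations would show whether every spreadsheet reaction also appears in the model, which determines how many rows the merged table will have.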

In [5]:
# Unify number of columns
df1 = df1.reindex(columns = df1.columns.tolist() + ['Lower bound','Upper bound'])
df2 = df2.reindex(columns = df2.columns.tolist() + ['Curation Notes','References'])

# Unify order of columns
df1 = df1[['Reaction', 'Reaction Name', 'Reaction Formula', 'GPR', 'Subsystem', 'Lower bound', 'Upper bound', 'Curation Notes', 'References']]
df2 = df2[['Reaction', 'Reaction Name', 'Reaction Formula', 'GPR', 'Subsystem', 'Lower bound', 'Upper bound', 'Curation Notes', 'References']]

In [6]:
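# Merge both datasets: model rows (df2) come first so their values take priority
hefzi_df = pd.concat([df2, df1])
# first() keeps the first non-null value per column for each reaction ID,
# so bounds come from the model and notes/references from the spreadsheet
hefzi_final = hefzi_df.groupby('Reaction').first()
hefzi_final = hefzi_final.reset_index()
hefzi_final.rename(columns = {'GPR':'GPR_hef'}, inplace = True)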

hefzi_final.to_excel('../../Data/hefzi_final.xlsx')
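The merge above relies on `groupby(...).first()` returning the first non-null value in each column, which is what lets a single output row combine bounds from one source with notes from the other. Here is a small sketch of that behaviour on toy data; the frames `model_rows` and `sheet_rows` and their values are hypothetical, not taken from the iCHOv1 files.

```python
import pandas as pd

# Toy model-derived rows: bounds are known, curation notes are missing.
model_rows = pd.DataFrame({
    'Reaction': ['R1', 'R2'],
    'Lower bound': [0.0, -1000.0],
    'Curation Notes': [None, None],
})

# Toy spreadsheet-derived row: notes are known, bounds are missing.
sheet_rows = pd.DataFrame({
    'Reaction': ['R1'],
    'Lower bound': [float('nan')],
    'Curation Notes': ['toy note'],
})

# Same pattern as the notebook: stack both frames, then take the first
# non-null value per column within each reaction ID.
merged = (
    pd.concat([model_rows, sheet_rows])
    .groupby('Reaction')
    .first()
    .reset_index()
)

r1 = merged[merged['Reaction'] == 'R1'].iloc[0]
r2 = merged[merged['Reaction'] == 'R2'].iloc[0]
print(merged)
```

In this toy case `R1` ends up with the bound from the model rows and the note from the spreadsheet row, while `R2`, which exists only in the model rows, keeps a missing note. When both sources hold a non-null value for the same column, the one concatenated first wins.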