## Unification of the iCHO2291 datasets

A dataset created from the iCHO2291 XML file and the Excel file provided as Supplementary Data have been unified in order to combine information from both datasets (i.e. forward and backward kcat values from the Excel file, and GPR rules and reaction formulas from the XML file).

In [1]:
import pandas as pd
import cobra
from cobra.io import read_sbml_model
from tqdm.notebook import tqdm



In [2]:
# Load the iCHO2291 Excel file provided as supplementary data
df1 = pd.read_excel('../../Data/Supplementary Data.xlsx', 'Data S2')
df1

Unnamed: 0,Rxn,GPR,Proteins,EC Number,Mol wt,kcat_forward,kcat_backward,Subsystem (iCHO2291),Subsystem (iCHO1766)
0,10FTHF5GLUtl,,,,,,,Transport,"TRANSPORT, LYSOSOMAL"
1,10FTHF5GLUtm,,,,,,,Transport,"TRANSPORT, MITOCHONDRIAL"
2,10FTHF6GLUtl,,,,,,,Transport,"TRANSPORT, LYSOSOMAL"
3,10FTHF6GLUtm,,,,,,,Transport,"TRANSPORT, MITOCHONDRIAL"
4,10FTHF7GLUtl,,,,,,,Transport,"TRANSPORT, LYSOSOMAL"
...,...,...,...,...,...,...,...,...,...
6231,RTOTALFATPc,,,,,,,Exchange/demand/sink reaction,R GROUP SYNTHESIS
6232,RTOTALt,,,,,,,Transport,"TRANSPORT, EXTRACELLULAR"
6233,Rtotaltl,,,,,,,Transport,"TRANSPORT, LYSOSOMAL"
6234,Rtotaltp,,,,,,,Transport,"TRANSPORT, PEROXISOMAL"


In [3]:
# Read the iCHO2291 model (obtained from https://www.ebi.ac.uk/biomodels/) using the cobrapy library
model = read_sbml_model('../../Data/Reconciliation/models/iCHO2291.xml')

# Create dataframe from the model with the attributes that we are interested in
attributes = []
for reaction in tqdm(model.reactions):
    attributes.append([reaction.id, reaction.name, reaction.reaction, reaction.gpr, 
                       reaction.subsystem, reaction.lower_bound, reaction.upper_bound])

df2 = pd.DataFrame(data=attributes, columns=['Reaction', 'Reaction Name', 'Reaction Formula', 'GPR', 'Subsystem', 'Lower bound', 'Upper bound'])
df2

Set parameter Username
Academic license - for non-commercial use only - expires 2024-03-24


Unnamed: 0,Reaction,Reaction Name,Reaction Formula,GPR,Subsystem,Lower bound,Upper bound
0,10FTHF5GLUtl,"5-glutamyl-10FTHF transport, lysosomal",10fthf5glu[c] --> 10fthf5glu[l],,Transport,0.0,1000.0
1,10FTHF5GLUtm,"5-glutamyl-10FTHF transport, mitochondrial",10fthf5glu[m] --> 10fthf5glu[c],,Transport,0.0,1000.0
2,10FTHF6GLUtl,"6-glutamyl-10FTHF transport, lysosomal",10fthf6glu[c] --> 10fthf6glu[l],,Transport,0.0,1000.0
3,10FTHF6GLUtm,"6-glutamyl-10FTHF transport, mitochondrial",10fthf6glu[m] --> 10fthf6glu[c],,Transport,0.0,1000.0
4,10FTHF7GLUtl,"7-glutamyl-10FTHF transport, lysosomal",10fthf7glu[c] --> 10fthf7glu[l],,Transport,0.0,1000.0
...,...,...,...,...,...,...,...
6231,RTOTALFATPc,uptake of Rtotal by enterocytes,Rtotal[e] + atp[c] + coa[c] --> Rtotalcoa[c] +...,,Exchange/demand/sink reaction,0.0,1000.0
6232,RTOTALt,RTOTAL transport,Rtotal[e] <=> Rtotal[c],,Transport,-1000.0,1000.0
6233,Rtotaltl,fatty acid intracellular transport,Rtotal[c] <=> Rtotal[l],,Transport,-1000.0,1000.0
6234,Rtotaltp,fatty acid intracellular transport,Rtotal[c] <=> Rtotal[x],,Transport,-1000.0,1000.0


In [4]:
model

0,1
Name,iCHO2291
Memory address,17768d1e0
Number of metabolites,3972
Number of reactions,6236
Number of genes,2291
Number of groups,15
Objective expression,1.0*biomass_cho - 1.0*biomass_cho_reverse_073b2
Compartments,"Cytoplasm, Lysosome, Mitochondrion, Endoplasmic_reticulum, Nucleus, Extracellular, Peroxisome, Golgi"


### Both DataFrames have the same number of reactions but different information in their columns
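
Before merging, it can be worth confirming that the two tables really cover the same reaction IDs and not just the same number of rows. The following is a minimal sketch on toy data; `excel_df` and `model_df` are hypothetical stand-ins for `df1` and `df2`, and the reaction IDs are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the Excel table (df1) and the model table (df2).
# The IDs and values are made up; the real tables hold thousands of reactions.
excel_df = pd.DataFrame({"Rxn": ["R1", "R2", "R3"],
                         "kcat_forward": [1.5, float("nan"), 3.0]})
model_df = pd.DataFrame({"Reaction": ["R1", "R2", "R4"],
                         "Lower bound": [0.0, -1000.0, 0.0]})

excel_ids = set(excel_df["Rxn"])
model_ids = set(model_df["Reaction"])

# IDs present in only one of the two sources
only_in_excel = sorted(excel_ids - model_ids)
only_in_model = sorted(model_ids - excel_ids)
same_ids = excel_ids == model_ids

print("Identical ID sets:", same_ids)
print("Only in Excel:", only_in_excel)
print("Only in model:", only_in_model)
```

If either list is non-empty on the real data, the merged table will contain rows that are filled from one source only.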

In [5]:
# Unify number of columns
df1.rename(columns = {'Rxn':'Reaction', 'Subsystem (iCHO2291)':'Subsystem'}, inplace = True)
df1 = df1.reindex(columns = df1.columns.tolist() + ['Reaction Name','Reaction Formula','Lower bound','Upper bound'])
df1 = df1.drop(columns=['Subsystem (iCHO1766)'])
df2 = df2.reindex(columns = df2.columns.tolist() + ['Proteins','EC Number','Mol wt','kcat_forward','kcat_backward'])


# Unify order of columns
df1 = df1[['Reaction', 'Reaction Name', 'Reaction Formula', 'GPR', 'Subsystem', 'Lower bound', 'Upper bound', 'Proteins','EC Number','Mol wt','kcat_forward','kcat_backward']]
df2 = df2[['Reaction', 'Reaction Name', 'Reaction Formula', 'GPR', 'Subsystem', 'Lower bound', 'Upper bound', 'Proteins','EC Number','Mol wt','kcat_forward','kcat_backward']]
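
The column alignment above can also be sketched in a single step, since `DataFrame.reindex(columns=...)` adds missing columns as NaN, drops columns that are not listed, and applies the requested order. This is only an illustrative alternative on toy data; `target_columns` and `excel_like` are hypothetical names, and the column list is shortened.

```python
import pandas as pd

# Shortened, hypothetical target layout (the notebook uses twelve columns)
target_columns = ["Reaction", "GPR", "Subsystem", "Lower bound", "kcat_forward"]

# Toy row shaped like the Excel sheet
excel_like = pd.DataFrame({
    "Rxn": ["R1"],
    "GPR": ["g1"],
    "Subsystem (iCHO2291)": ["Transport"],
    "Subsystem (iCHO1766)": ["TRANSPORT, LYSOSOMAL"],
    "kcat_forward": [2.0],
})

aligned = (
    excel_like
    .rename(columns={"Rxn": "Reaction", "Subsystem (iCHO2291)": "Subsystem"})
    # Adds 'Lower bound' as NaN, drops 'Subsystem (iCHO1766)', fixes the order
    .reindex(columns=target_columns)
)
print(aligned)
```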

In [6]:
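# Merge both datasets: stack them, then keep the first non-null value per column for each reaction
# (df2 comes first, so model values take precedence wherever both sources provide one)
iCHO2291 = pd.concat([df2, df1])
iCHO2291 = iCHO2291.groupby('Reaction').first()
iCHO2291 = iCHO2291.reset_index()
iCHO2291.rename(columns = {'GPR':'GPR_yeo'}, inplace = True)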

iCHO2291.to_excel('../../Data/iCHO2291_final.xlsx')
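
The merge relies on `groupby(...).first()` returning the first non-null value of each column within a group, so stacking the two tables and grouping by reaction coalesces them row by row. A minimal sketch of that behaviour on toy data follows; `model_part` and `excel_part` are hypothetical stand-ins for `df2` and `df1`, with invented values.

```python
import pandas as pd

nan = float("nan")

# The model-derived table knows the formulas but has no kcat values
model_part = pd.DataFrame({
    "Reaction": ["R1", "R2"],
    "Reaction Formula": ["a --> b", "b <=> c"],
    "kcat_forward": [nan, nan],
})

# The Excel-derived table knows (some) kcat values but has no formulas
excel_part = pd.DataFrame({
    "Reaction": ["R2", "R1"],
    "Reaction Formula": [None, None],
    "kcat_forward": [12.0, nan],
})

stacked = pd.concat([model_part, excel_part])
# first() skips nulls, so each column is filled from whichever source has a value
merged = stacked.groupby("Reaction").first().reset_index()
print(merged)
```

A value stays missing only when neither source provides it, as for the `kcat_forward` of `R1` here. Note that `groupby` also sorts the result by reaction ID.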