## Unification of the iCHO2291 datasets

A dataset created from the iCHO2291 XML file and the Excel file provided as supplementary data have been unified so that the result carries information from both datasets (i.e. forward and reverse kcat from the Excel file, and GPR and reaction formula from the XML file).

In [None]:
import pandas as pd
import cobra
from cobra.io import read_sbml_model
from tqdm.notebook import tqdm

In [None]:
# Load the iCHO2291 Excel file provided as supplementary data (sheet 'Data S2')
df1 = pd.read_excel('../../Data/Reconciliation/datasets/iCHO2291.xlsx', sheet_name='Data S2')
df1

In [None]:
# Read the iCHO2291 model from https://www.ebi.ac.uk/biomodels/ using the cobrapy library
model = read_sbml_model('../../Data/Reconciliation/models/iCHO2291.xml')

# Create dataframe from the model with the attributes that we are interested in
attributes = []
for reaction in tqdm(model.reactions):
    attributes.append([reaction.id, reaction.name, reaction.reaction, reaction.gpr, 
                       reaction.subsystem, reaction.lower_bound, reaction.upper_bound])

df2 = pd.DataFrame(data=attributes, columns=['Reaction', 'Reaction Name', 'Reaction Formula', 'GPR', 'Subsystem', 'Lower bound', 'Upper bound'])
df2

In [None]:
model

### Both DataFrames have the same number of reactions but different information in the columns
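The claim above can be checked before merging. The following is a minimal sketch on toy data, not part of the original workflow: `compare_reaction_ids` is a hypothetical helper, the reaction IDs are made up, and the column names `Rxn` and `Reaction` are assumed to match the Excel sheet and the model-derived DataFrame respectively.

```python
import pandas as pd


def compare_reaction_ids(left, right, left_col="Rxn", right_col="Reaction"):
    """Return the reaction IDs that appear in only one of the two frames."""
    left_ids = set(left[left_col])
    right_ids = set(right[right_col])
    return sorted(left_ids - right_ids), sorted(right_ids - left_ids)


# Toy stand-ins for df1 (Excel) and df2 (model); the IDs are invented.
toy_excel = pd.DataFrame({"Rxn": ["R1", "R2", "R3"],
                          "kcat_forward": [1.0, 2.0, 3.0]})
toy_model = pd.DataFrame({"Reaction": ["R3", "R1", "R2"],
                          "Lower bound": [0.0, -1000.0, 0.0]})

only_excel, only_model = compare_reaction_ids(toy_excel, toy_model)
same_length = len(toy_excel) == len(toy_model)
print(same_length, only_excel, only_model)
```

With the real data, the same call on `df1` and `df2` would list any reaction present in only one source; two empty lists mean the reaction sets are identical, which is a stronger check than comparing row counts alone.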

In [None]:
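# Unify the set of columns
df1.rename(columns = {'Rxn':'Reaction', 'Subsystem (iCHO2291)':'Subsystem'}, inplace = True)
# Add the columns that only exist in the model-derived DataFrame (filled with NaN)
df1 = df1.reindex(columns = df1.columns.tolist() + ['Reaction Name','Reaction Formula','Lower bound','Upper bound'])
# drop() returns a new DataFrame, so the result must be assigned back
df1 = df1.drop(columns=['Subsystem (iCHO1766)'])
# Add the columns that only exist in the Excel-derived DataFrame (filled with NaN)
df2 = df2.reindex(columns = df2.columns.tolist() + ['Proteins','EC Number','Mol wt','kcat_forward','kcat_backward'])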


# Unify order of columns
column_order = ['Reaction', 'Reaction Name', 'Reaction Formula', 'GPR', 'Subsystem', 'Lower bound', 'Upper bound',
                'Proteins', 'EC Number', 'Mol wt', 'kcat_forward', 'kcat_backward']
df1 = df1[column_order]
df2 = df2[column_order]

In [None]:
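# Merge both datasets: stack the rows, then collapse them to one row per reaction.
# groupby().first() keeps the first non-null value in each column, so values from
# the model (df2) take priority and the gaps are filled from the Excel data (df1).
iCHO2291 = pd.concat([df2, df1])
iCHO2291 = iCHO2291.groupby('Reaction').first()
iCHO2291 = iCHO2291.reset_index()
iCHO2291.rename(columns = {'GPR':'GPR_yeo'}, inplace = True)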

iCHO2291.to_excel('../../Data/Reconciliation/datasets/iCHO2291_final.xlsx')
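The merge above relies on `groupby('Reaction').first()` returning the first non-null value per column within each group. Here is a minimal sketch of that behaviour on toy data; the reaction IDs, formulas, subsystems and kcat values are invented for illustration and are not taken from iCHO2291.

```python
import pandas as pd

columns = ["Reaction", "Reaction Formula", "Subsystem", "kcat_forward"]

# Toy stand-in for df2 (model): has formulas, lacks kcat values.
toy_model = pd.DataFrame({
    "Reaction": ["R1", "R2"],
    "Reaction Formula": ["a --> b", "b <=> c"],
    "Subsystem": ["Glycolysis", "TCA"],
}).reindex(columns=columns)

# Toy stand-in for df1 (Excel): has kcat values, lacks formulas.
toy_excel = pd.DataFrame({
    "Reaction": ["R2", "R1"],
    "Subsystem": ["tca (xlsx)", "glycolysis (xlsx)"],
    "kcat_forward": [7.0, 5.0],
}).reindex(columns=columns)

# Stack the rows (model first), then keep the first non-null value per column.
merged = (
    pd.concat([toy_model, toy_excel])
    .groupby("Reaction")
    .first()
    .reset_index()
)
print(merged)
```

Each reaction ends up with the formula from the model frame and the kcat from the Excel frame. Where both frames hold a value (here `Subsystem`), the frame listed first in `pd.concat` wins, so the order of the list decides which source has priority.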