## Unification of the datasets provided with Hefzi's paper

A dataset built from the iCHOv1 XML file is unified here with the Excel file provided as Supplementary Data, so that the result carries information from both sources (i.e. lower and upper bounds from the XML file, and curation notes and references from the Excel file).

In [None]:
import pandas as pd
import cobra
from cobra.io import read_sbml_model
from cobra.util import create_stoichiometric_matrix
from tqdm.notebook import tqdm

In [None]:
# Load Hefzi's dataset
df1 = pd.read_excel('../../Data/Reconciliation/datasets/iCHOv1_Reconstruction.xlsx', header = 1)
df1

In [None]:
# Read the model using the cobrapy library
model = read_sbml_model('../../Data/reconciliation/models/iCHOv1_final.xml')

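# Create a dataframe from the model with the attributes that we are interested in.
# gene_reaction_rule is the GPR as a plain string, which is safe to write to Excel later on
# (reaction.gpr is a GPR object in recent cobrapy versions).
attributes = []
for reaction in tqdm(model.reactions):
    attributes.append([reaction.id, reaction.name, reaction.reaction, reaction.gene_reaction_rule,
                       reaction.subsystem, reaction.lower_bound, reaction.upper_bound])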

df2 = pd.DataFrame(data=attributes, columns=['Reaction', 'Reaction Name', 'Reaction Formula', 'GPR', 'Subsystem', 'Lower bound', 'Upper bound'])
df2

In [None]:
model

### df1 has 3229 reactions while df2 contains 6663 reactions
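
Because the two sources differ in size, it can help to check how their reaction IDs overlap before merging. The block below is a minimal sketch on toy data, not part of the original analysis: `toy_sheet`, `toy_model` and the reaction IDs are hypothetical stand-ins for `df1` and `df2`.

```python
import pandas as pd

# Hypothetical stand-ins for df1 (spreadsheet) and df2 (model); the IDs are made up
toy_sheet = pd.DataFrame({'Reaction': ['R1', 'R2']})
toy_model = pd.DataFrame({'Reaction': ['R1', 'R2', 'R3', 'R4']})

sheet_ids = set(toy_sheet['Reaction'])
model_ids = set(toy_model['Reaction'])

# Reactions present in both sources, and those found in only one of them
shared = sorted(sheet_ids & model_ids)
only_in_model = sorted(model_ids - sheet_ids)
only_in_sheet = sorted(sheet_ids - model_ids)

print(len(shared), len(only_in_model), len(only_in_sheet))
```

The same set operations can be applied to `df1['Reaction']` and `df2['Reaction']`, assuming both columns use the same identifier convention.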

In [None]:
# Add to each dataset the columns it is missing (they are filled with NaN)
df1 = df1.reindex(columns=df1.columns.tolist() + ['Lower bound', 'Upper bound'])
df2 = df2.reindex(columns=df2.columns.tolist() + ['Curation Notes', 'References'])

# Put the columns of both datasets in the same order
columns = ['Reaction', 'Reaction Name', 'Reaction Formula', 'GPR', 'Subsystem',
           'Lower bound', 'Upper bound', 'Curation Notes', 'References']
df1 = df1[columns]
df2 = df2[columns]

In [None]:
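# Merge both datasets. df2 (the model) goes first, so its values take precedence.
hefzi_df = pd.concat([df2, df1])

# groupby(...).first() keeps, for each reaction and column, the first non-null value,
# so gaps in the model rows are filled from the Excel rows (e.g. notes and references)
hefzi_final = hefzi_df.groupby('Reaction').first()
hefzi_final = hefzi_final.reset_index()
hefzi_final.rename(columns={'GPR': 'GPR_hef'}, inplace=True)

hefzi_final.to_excel('../../Data/Reconciliation/datasets/hefzi_final.xlsx')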
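
The merge relies on `groupby('Reaction').first()` returning the first non-null value per column within each group. The block below is a minimal sketch of that behaviour on toy data; `toy_model`, `toy_sheet` and their values are hypothetical and only illustrate how rows from the two sources are combined.

```python
import pandas as pd

# Hypothetical model rows: bounds are known, notes are missing
toy_model = pd.DataFrame({
    'Reaction': ['R1', 'R2'],
    'Lower bound': [-1000.0, 0.0],
    'Curation Notes': [None, None],
})

# Hypothetical spreadsheet row: notes are known, bounds are missing
toy_sheet = pd.DataFrame({
    'Reaction': ['R1'],
    'Lower bound': [float('nan')],
    'Curation Notes': ['checked manually'],
})

# Same pattern as above: model rows first, then one row per reaction
toy_merged = (
    pd.concat([toy_model, toy_sheet])
    .groupby('Reaction')
    .first()
    .reset_index()
)
print(toy_merged)
```

For `R1`, the bound comes from the model row and the note from the spreadsheet row. `R2` appears only in the model, so its note stays empty.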