# Reactions
This notebook contains all the steps followed to reconcile the existing metabolic reconstructions of CHO cells. It is divided into four parts: **1. Network Reconstruction**, **2. Identification of Duplicated Reactions**, **3. Add Database links and Pre-checks to the "Rxns" and "Attributes" Sheets** and **4. Divide into compartments**.

[1. Network Reconstruction](#reconstruction) <br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.1 Datasets generation and merge**<br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.2 Normalization of the data** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.3 Group all the data into a unified dataset** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.4 Addition of the Recon3D CHO ortholog GPRs into the reconstruction** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.5 Divide the dataset in two to facilitate reading and curation in Google Sheets** <br>

[2. Identification of Duplicated Reactions](#duplicated) <br>
&nbsp;&nbsp;&nbsp;&nbsp;**2.1 Generate the model and identify duplicated reactions**<br>
&nbsp;&nbsp;&nbsp;&nbsp;**2.2 Fix duplicated reactions in the dataset from the list duplicated_reactions obtained above** <br>

[3. Add Database links and Pre-checks to "Rxns" and "Attributes" Sheets](#bigg) <br>
&nbsp;&nbsp;&nbsp;&nbsp;**3.1 Add BiGG and EBI links to the "Rxns" and "Attributes" Sheets**<br>
&nbsp;&nbsp;&nbsp;&nbsp;**3.2 Add a "Pre-check" tag to the Rxns Sheet** <br>

[4. Divide into compartments](#compartments) <br>

## 1. Network Reconstruction <a id='reconstruction'></a>
The reconciliation workflow includes the generation of the datasets (1), normalization of the data (2), merging of all four reconstructions into a unified dataset (3), compilation of the merged reconstruction into a COBRA model to identify duplicated reactions (4), correction of duplicated reactions using BiGG ID annotations (5), addition of the Recon3D GPR information (6), and, finally, division of the dataset into two datasets that are further curated in Google Sheets (7).

In [None]:
# Libraries import
import re
import pandas as pd
import numpy as np

### 1.1 Datasets generation and merge
Dataset generation from previous reconstructions (CHO_DG44, iCHO1766, iCHO2101, iCHO2291)

In [None]:
# Read the Excel files and create the dfs

# Camels CHO_DG44 metabolic reconstruction
camel_df = pd.read_excel('../Data/Reconciliation/datasets/CHO_DG44.xlsx', header = 1)

# Hefzi's iCHO1766 metabolic reconstruction
hefzi_df = pd.read_excel('../Data/Reconciliation/datasets/hefzi_final.xlsx')

# Fouladiha's iCHO2101 metabolic reconstruction
fouladiha_df = pd.read_excel('../Data/Reconciliation/datasets/iCHO2101.xlsx', 'Supplementary Table 10', header = 1)

# Yeo's iCHO2291 metabolic reconstruction
iCHO2291 = pd.read_excel('../Data/Reconciliation/datasets/iCHO2291_final.xlsx')

### 1.2 Normalization of the data
All datasets are normalized to the same shape and format and then combined into one large dataset.
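Each source reconstruction receives four tag columns (`cam`, `hef`, `fou`, `yeo`), and only its own column is marked with `X`. The sixteen `insert` calls in the tagging cell can be expressed with a small helper. The sketch below is a hypothetical refactoring: the `add_source_tags` name and the toy DataFrame are illustrative and not part of the original workflow.

```python
import numpy as np
import pandas as pd

SOURCE_TAGS = ['cam', 'hef', 'fou', 'yeo']

def add_source_tags(df, source):
    """Return a copy of df with one tag column per reconstruction at the front.

    Only the column named `source` is filled with 'X'; the others are NaN.
    """
    df = df.copy()
    for position, tag in enumerate(SOURCE_TAGS):
        df.insert(loc=position, column=tag, value='X' if tag == source else np.nan)
    return df

# Toy input with hypothetical content
toy = pd.DataFrame({'Reaction': ['HEX1', 'PGI']})
tagged = add_source_tags(toy, 'hef')
print(tagged)
```

With such a helper, the tagging step would reduce to calls such as `camel_df = add_source_tags(camel_df, 'cam')`, one per dataset.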

In [None]:
# Standardization of the column names
camel_df.rename(columns = {'Reaction ID':'Reaction', 'Initial reaction in model':'Reaction Formula', 'Reaction name':'Reaction Name', 'Justification':'Curation Notes'}, inplace = True)
fouladiha_df.rename(columns = {'Abbreviation':'Reaction', 'Description':'Reaction Name', 'Reaction':'Reaction Formula', 'GPR':'GPR_fou'}, inplace = True)

# Addition of tag columns for organizational purposes
camel_df.insert(loc=0, column='cam', value='X')
camel_df.insert(loc=1, column='hef', value=np.nan)
camel_df.insert(loc=2, column='fou', value=np.nan)
camel_df.insert(loc=3, column='yeo', value=np.nan)

hefzi_df.insert(loc=0, column='cam', value=np.nan)
hefzi_df.insert(loc=1, column='hef', value='X')
hefzi_df.insert(loc=2, column='fou', value=np.nan)
hefzi_df.insert(loc=3, column='yeo', value=np.nan)

fouladiha_df.insert(loc=0, column='cam', value=np.nan)
fouladiha_df.insert(loc=1, column='hef', value=np.nan)
fouladiha_df.insert(loc=2, column='fou', value='X')
fouladiha_df.insert(loc=3, column='yeo', value=np.nan)

iCHO2291.insert(loc=0, column='cam', value=np.nan)
iCHO2291.insert(loc=1, column='hef', value=np.nan)
iCHO2291.insert(loc=2, column='fou', value=np.nan)
iCHO2291.insert(loc=3, column='yeo', value='X')

In [None]:
# Generate the cols list with the column names from all datasets
cols = hefzi_df.columns.to_list() + fouladiha_df.columns.to_list() + iCHO2291.columns.to_list() + camel_df.columns.to_list()

# Remove repeated names from 'cols', keeping the first-seen order
cols = list(dict.fromkeys(cols))

In [None]:
def add_col(df):
    '''
    Append, as empty (NaN) columns, every column from the
    global cols list that is not already present in df
    '''
    missing_cols = [col for col in cols if col not in df.columns]
    df = df.reindex(columns = df.columns.tolist() + missing_cols)
    return df

In [None]:
# Unify columns for all datasets
hefzi_df = add_col(hefzi_df)
fouladiha_df = add_col(fouladiha_df)
iCHO2291 = add_col(iCHO2291)
camel_df = add_col(camel_df)

In [None]:
# Reorder columns in all datasets the same way
final_cols = ['cam', 'hef', 'fou', 'yeo', 'Reaction', 'Reaction Name', 'Reaction Formula', 'GPR_hef', 'GPR_fou', 'GPR_yeo', 'Subsystem', 'Genes', 'Protein', 'EC Number', 'Mol wt', 'kcat_forward', 'kcat_backward', 'Reversible', 'Lower bound', 'Upper bound', 'Objective', 'Curation Notes', 'References', 'Reaction ID Camels Models']
fouladiha_df = fouladiha_df[final_cols]
iCHO2291 = iCHO2291[final_cols]
hefzi_df = hefzi_df[final_cols]
camel_df = camel_df[final_cols]

camel_df['Reaction'] = camel_df['Reaction'].str.strip()

In [None]:
# Merge all the dfs into a unified df
all_dfs = pd.concat([camel_df, hefzi_df, fouladiha_df, iCHO2291])
all_dfs = all_dfs.reset_index(drop = True)

# Unify reaction names (literal replacements, hence regex=False)
all_dfs['Reaction'] = all_dfs['Reaction'].str.replace('_cho', '', regex = False)
all_dfs['Reaction'] = all_dfs['Reaction'].str.replace('(e)', '_e_', regex = False)
all_dfs['Reaction'] = all_dfs['Reaction'].str.replace('[', '_', regex = False)
all_dfs['Reaction'] = all_dfs['Reaction'].str.replace(']', '_', regex = False)

all_dfs # 20940 rows/reactions (many of them repeated)

In [None]:
# Remove the underscore at the end of some reaction IDs
all_dfs['Reaction'] = all_dfs['Reaction'].str.replace(r'_$', '', regex = True)
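The reaction-ID clean-up (removing `_cho`, replacing `(e)` and square brackets with underscores, then stripping one trailing underscore) can be checked on individual strings. The function below is a plain-Python sketch of the same sequence of steps. The function name and the example IDs are illustrative only.

```python
import re

def normalize_reaction_id(rxn_id):
    """Apply the notebook's reaction-ID clean-up steps to one string."""
    rxn_id = rxn_id.replace('_cho', '')
    rxn_id = rxn_id.replace('(e)', '_e_')
    rxn_id = rxn_id.replace('[', '_').replace(']', '_')
    # Drop a single trailing underscore left over from the replacements above
    return re.sub(r'_$', '', rxn_id)

for raw in ['EX_glc_D(e)', 'GLCt1r_cho', 'PGI[c]']:
    print(raw, '->', normalize_reaction_id(raw))
```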

### 1.3 Group all the data into a unified dataset
The combined dataset "all_dfs" generated above is grouped by reaction BiGG ID to obtain a dataset with unique reaction identifiers.
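`groupby(...).first()` returns the first non-null value of each column within each group. As a result, the source tags and per-source GPR columns of rows that share a reaction ID are merged into a single row. Here is a toy illustration with hypothetical values:

```python
import numpy as np
import pandas as pd

# Two source rows for the same (hypothetical) reaction plus one unrelated reaction
rows = pd.DataFrame({
    'cam': ['X', np.nan, np.nan],
    'hef': [np.nan, 'X', 'X'],
    'Reaction': ['HEX1', 'HEX1', 'PGI'],
    'GPR_hef': [np.nan, 'gene_A', 'gene_B'],
})

merged = rows.groupby('Reaction').first()
print(merged)
```

When sources disagree on a shared column such as `Reaction Formula`, only the first non-null value is kept. Because of the concatenation order used above, that order is CHO_DG44, iCHO1766, iCHO2101, then iCHO2291.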

In [None]:
# Group the data into a unified dataset

all_dfs2 = all_dfs.groupby('Reaction').first()
all_dfs2['Reaction Formula'] = all_dfs2['Reaction Formula'].str.replace('[', '_', regex = False)
all_dfs2['Reaction Formula'] = all_dfs2['Reaction Formula'].str.replace(']', '', regex = False)
# The arrow patterns use regex alternation, so regex=True is required
all_dfs2['Reaction Formula'] = all_dfs2['Reaction Formula'].str.replace(' => | =>', ' --> ', regex = True)
all_dfs2['Reaction Formula'] = all_dfs2['Reaction Formula'].str.replace(' <-- | <--', ' <=> ', regex = True)
all_dfs2
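The formula clean-up converts compartment brackets into underscore suffixes (for example `atp[c]` becomes `atp_c`) and harmonizes the reaction arrows. The sketch below applies the same substitutions to a single, illustrative formula. It uses `str.replace` for the literal patterns and `re.sub` for the alternations.

```python
import re

def normalize_formula(formula):
    """Mirror the notebook's formula clean-up on a single string."""
    formula = formula.replace('[', '_').replace(']', '')
    formula = re.sub(' => | =>', ' --> ', formula)
    formula = re.sub(' <-- | <--', ' <=> ', formula)
    return formula

print(normalize_formula('glc_D[c] + atp[c] => g6p[c] + adp[c] + h[c]'))
```

Note that `<--` is rewritten as `<=>`, so reactions written right-to-left end up as reversible in the merged dataset.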

In [None]:
# Separate demand reactions into a different df and remove them from the reconstruction.
# Demand reactions from the extracellular space are kept in the reconstruction.

all_dfs2 = all_dfs2.reset_index()

def is_demand(formula):
    # A demand reaction has nothing on the right-hand side of the arrow;
    # reactions whose left-hand side ends with an extracellular ('_e') metabolite are kept
    lhs, rhs = re.split('<=>|-->', formula)
    return rhs.strip() == '' and not lhs.endswith('_e ')

# Build a boolean mask instead of dropping rows while iterating,
# so that each demand reaction is copied exactly once
demand_mask = all_dfs2['Reaction Formula'].apply(is_demand)
demand_reactions = all_dfs2[demand_mask]
all_dfs2 = all_dfs2[~demand_mask]

demand_reactions = demand_reactions.reset_index(drop=True)
demand_reactions.to_excel('../Data/Reconciliation/datasets/demand_reactions.xlsx')
demand_reactions

In [None]:
all_dfs2 = all_dfs2.reset_index(drop=True)
all_dfs2

### 1.4 Addition of the Recon3D CHO ortholog GPRs into the reconstruction
The dataset generated in the notebook "GPR Annotation" is mapped onto our reconstruction. This dataset contains the human Recon3D GPRs and their corresponding CHO orthologs.
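`Series.map` with a dict fills in only those reactions whose ID exactly matches a Recon3D reaction ID. All other reactions get NaN. Counting the unmatched IDs is a quick way to check how much of the reconstruction received a Recon3D GPR. The sketch below uses hypothetical reaction IDs and gene names.

```python
import pandas as pd

# Hypothetical stand-ins for recon3d_dict and the reaction table
recon3d_dict = {'HEX1': 'gene_A or gene_B', 'PGI': 'gene_C'}
rxns = pd.DataFrame({'Reaction': ['HEX1', 'PGI', 'HMR_0001']})

rxns['GPR_Recon3D'] = rxns['Reaction'].map(recon3d_dict)

# Reactions whose ID is not a key of the dict get NaN
unmatched = rxns.loc[rxns['GPR_Recon3D'].isna(), 'Reaction'].tolist()
print(rxns)
print('Reactions without a Recon3D GPR:', unmatched)
```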

In [None]:
# Generate a dict with recon3d reactions as keys and the CHO GPRs as values.
recon3d = pd.read_excel('../Data/GPR_Curation/recon3D_chogprs.xlsx')
recon3d_dict = recon3d.set_index('m_reaction')['CHO GPR'].to_dict()
recon3d_dict

In [None]:
# Map 'recon3d_dict' into the all_dfs3 dataframe
# the reaction IDs should be the same as those in our reconstruction

all_dfs3 = all_dfs2.reset_index()
all_dfs3['GPR_Recon3D'] = all_dfs3['Reaction'].map(recon3d_dict)

### 1.5 Divide the dataset in two to facilitate reading and curation in Google Sheets
The dataset all_dfs3 is divided into two datasets. all_dfs4 contains mainly the GPR information assigned in the previous reconstructions. all_dfs5 contains the remaining attributes of the reconstruction, such as EC number and bounds.
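Both tables keep the `Reaction` column, so the curated sheets can later be re-joined on that key. The sketch below uses a hypothetical two-reaction table. Passing `validate='one_to_one'` makes pandas raise an error if a reaction ID appears more than once in either table.

```python
import pandas as pd

# Hypothetical miniature of all_dfs3
all_dfs3 = pd.DataFrame({
    'Reaction': ['HEX1', 'PGI'],
    'GPR_hef': ['gene_A', 'gene_B'],
    'EC Number': ['2.7.1.1', '5.3.1.9'],
    'Lower bound': [0, -1000],
})

gprs = all_dfs3[['Reaction', 'GPR_hef']].copy()
attributes = all_dfs3[['Reaction', 'EC Number', 'Lower bound']].copy()

# Both tables keep 'Reaction', so they can be joined back one-to-one
rejoined = gprs.merge(attributes, on='Reaction', how='outer', validate='one_to_one')
print(rejoined)
```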

In [None]:
# all_dfs4 contains mainly information on the reaction GPRs
# (.copy() avoids pandas' SettingWithCopyWarning when inserting the new column)
all_dfs4 = all_dfs3[['Reaction', 'Reaction Name', 'Reaction Formula', 'Subsystem', 'GPR_hef', 'GPR_fou', 'GPR_yeo', 'GPR_Recon3D', 'Curation Notes', 'References']].copy()
all_dfs4.insert(8,'GPR_final', '')
all_dfs4.to_excel('../Data/Reconciliation/datasets/all_dfs4.xlsx')
all_dfs4

In [None]:
# all_dfs5 contains the remaining attributes of our reconstruction
all_dfs5 = all_dfs3[['Reaction', 'Genes', 'Protein', 'EC Number', 'Mol wt', 'kcat_forward', 'kcat_backward', 'Reversible', 'Lower bound', 'Upper bound', 'Objective']]
all_dfs5.to_excel('../Data/Reconciliation/datasets/all_dfs5.xlsx')
all_dfs5

## 2. Identification of Duplicated Reactions <a id='duplicated'></a>
In this part of the notebook, we use the **duplicated_reactions** function from the **utils** module to spot duplicated reactions in our dataset. First, we generate a model from our dataset using COBRApy. Then, we apply the **duplicated_reactions** function to the model. Finally, we standardize the names of the duplicated reactions according to the BiGG nomenclature.
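The implementation of `utils.duplicated_reactions` is not shown in this notebook. As a hypothetical illustration of the idea, two reactions can be flagged as duplicates when they have the same stoichiometry, including when one is written in the opposite direction. The sketch below works on plain dicts that map metabolite IDs to coefficients. With COBRApy, such a dict can be obtained from a reaction `r` as `{m.id: c for m, c in r.metabolites.items()}`.

```python
from collections import defaultdict

def find_duplicate_reactions(stoichiometries):
    """Group reaction IDs that share the same stoichiometry.

    `stoichiometries` maps reaction ID -> {metabolite ID: coefficient}
    (negative coefficients are substrates). A reaction and its exact
    reverse are treated as duplicates.
    """
    groups = defaultdict(list)
    for rxn_id, stoich in stoichiometries.items():
        forward = tuple(sorted(stoich.items()))
        backward = tuple(sorted((met, -coef) for met, coef in stoich.items()))
        # Use the smaller of the two orientations as a direction-independent key
        groups[min(forward, backward)].append(rxn_id)
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical example data
stoichiometries = {
    'HEX1': {'glc_D_c': -1, 'atp_c': -1, 'g6p_c': 1, 'adp_c': 1, 'h_c': 1},
    'HEX1_copy': {'atp_c': -1, 'glc_D_c': -1, 'adp_c': 1, 'g6p_c': 1, 'h_c': 1},
    'PGI': {'g6p_c': -1, 'f6p_c': 1},
    'PGI_reverse': {'f6p_c': -1, 'g6p_c': 1},
    'PGK': {'3pg_c': -1, 'atp_c': -1, '13dpg_c': 1, 'adp_c': 1},
}
print(find_duplicate_reactions(stoichiometries))
```

This key requires exactly equal coefficients and ignores bounds and reversibility. This notebook does not show whether `utils.duplicated_reactions` applies the same criteria.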

In [1]:
# Libraries import
import re
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from requests_html import HTMLSession
from tqdm.notebook import tqdm
from cobra import Model, Reaction, Metabolite, util

from utils import duplicated_reactions
from google_sheet import GoogleSheet

### 2.1 Generate the model and identify duplicated reactions
Generation of a cobra model from the **Google Sheet dataset**. This model will be used to identify duplicated reactions from the stoichiometric matrix of the model.
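Each row's `Reaction Formula` is turned into a reaction with COBRApy's `Reaction.build_reaction_from_string`. This method also creates any metabolite that is not yet in the model, which produces the `unknown metabolite ... created` messages in the cell output. To make the expected formula syntax explicit, here is a simplified, hypothetical parser for the same notation: `+`-separated terms, optional numeric coefficients, and `-->`, `<--` or `<=>` arrows. It is not COBRApy's implementation and skips many of its features.

```python
def parse_reaction_string(formula):
    """Simplified parser: return ({metabolite: coefficient}, reversible)."""
    for arrow in ('<=>', '-->', '<--'):
        if arrow in formula:
            break
    else:
        raise ValueError(f'no reaction arrow found in {formula!r}')
    lhs, rhs = formula.split(arrow)
    stoich = {}
    for side, sign in ((lhs, -1.0), (rhs, 1.0)):
        for term in side.split(' + '):
            tokens = term.split()
            if not tokens:
                continue  # empty side, e.g. the product side of a demand reaction
            coef = float(tokens[0]) if len(tokens) == 2 else 1.0
            met = tokens[-1]
            stoich[met] = stoich.get(met, 0.0) + sign * coef
    if arrow == '<--':
        # Written right-to-left: flip so that products carry positive coefficients
        stoich = {met: -coef for met, coef in stoich.items()}
    return stoich, arrow == '<=>'

print(parse_reaction_string('2 h_c + o2_c <=> h2o2_c'))
print(parse_reaction_string('glc_D_c --> '))
```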

In [2]:
# Generate the necessary datasets for the identification of the duplicated reactions
KEY_FILE_PATH = 'credentials.json'
SPREADSHEET_ID_v3 = '1MlBXeHIKw8k8fZyXm-sN__AHTRSunJxar_-bqvukZws'

# Initialize the GoogleSheet object
sheet = GoogleSheet(SPREADSHEET_ID_v3, KEY_FILE_PATH)

# Read data from the Google Sheet
sheet_rxns = 'Rxns'
sheet_attributes = 'Attributes'

reactions = sheet.read_google_sheet(sheet_rxns)
rxns_attributes = sheet.read_google_sheet(sheet_attributes)

In [3]:
##### ----- Create a model from the reactions sheet ----- #####
model = Model("iCHO")
lr = []
for _, row in reactions.iterrows():
    r = Reaction(row['Reaction'])
    lr.append(r)    
model.add_reactions(lr)
model

Set parameter Username
Academic license - for non-commercial use only - expires 2024-03-24


| Name | iCHO |
|---|---|
| Memory address | 15c93f8b0 |
| Number of metabolites | 0 |
| Number of reactions | 10277 |
| Number of genes | 0 |
| Number of groups | 0 |
| Objective expression | 0 |
| Compartments | |


In [4]:
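##### ----- Add information to each one of the reactions ----- #####
# Rows are matched by position, so the "Rxns" and "Attributes" sheets
# must list the reactions in the same order as model.reactions
for i, r in enumerate(tqdm(model.reactions)):
    print(r.id)
    r.build_reaction_from_string(reactions['Reaction Formula'].iloc[i])
    r.name = reactions['Reaction Name'].iloc[i]
    r.subsystem = reactions['Subsystem'].iloc[i]
    # Set both bounds at once so an intermediate lower > upper state cannot occur
    r.bounds = (float(rxns_attributes['Lower bound'].iloc[i]),
                float(rxns_attributes['Upper bound'].iloc[i]))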

  0%|          | 0/10277 [00:00<?, ?it/s]

13DAMPPOX
unknown metabolite '13dampp_c' created
unknown metabolite 'h2o_c' created
unknown metabolite 'o2_c' created
unknown metabolite 'bamppald_c' created
unknown metabolite 'h2o2_c' created
unknown metabolite 'nh4_c' created
2AMACHYD
unknown metabolite '2amac_c' created
unknown metabolite 'pyr_c' created
...

(Output truncated. The loop prints each of the 10277 reaction IDs. Each ID is followed by one `unknown metabolite '<id>' created` message from COBRApy for every metabolite that does not yet exist in the model.)
r0754
unknown metabolite '3mox4hpac_m' created
unknown metabolite 'homoval_m' created
r0755
r0756
unknown metabolite '3m4hpga_m' created
unknown metabolite '3mox4hoxm_m' created
r0757
r0774
unknown metabolite 'HC01609_e' created
unknown metabolite 'cpppg1_e' created
r0775
r0779
unknown metabolite 'HC01223_m' created
unknown metabolite '3hmp_m' created
r0800
unknown metabolite 'HC01797_c' created
r1135
unknown metabolite 'HC02110_r' created
r1164
unknown metabolite 'HC02020_r' created
r1165
unknown metabolite 'HC02021_r' created
r1166
unknown metabolite 'HC02022_r' created
r1167
r1168
unknown metabolite 'HC02024_r' created
r1169
unknown metabolite 'HC02025_r' created
r1170
unknown metabolite 'HC02026_r' created
r1171
unknown metabolite 'HC02027_r' created
r1172
r1173
unknown metabolite 'HC02020_l' created
r1174
r1175
r1176
unknown metabolite 'HC02022_l' created
unknown metabolite 'ocdca_l' created
r1177
r1178
unknown metabolite 'HC02023_l'

HMR_1685
unknown metabolite 'xol7ah2al_m' created
HMR_1689
unknown metabolite 'CE0233_m' created
unknown metabolite 'CE5133_m' created
HMR_1691
HMR_1703
unknown metabolite 'M00617_x' created
HMR_1704
HMR_1706
HMR_1708
unknown metabolite 'M00754_x' created
HMR_1710
HMR_1730
HMR_1735
unknown metabolite 'xol24oh_c' created
HMR_1737
HMR_1738
unknown metabolite 'M00978_c' created
HMR_1739
unknown metabolite 'M00976_c' created
HMR_1740
unknown metabolite 'M01081_c' created
HMR_1741
HMR_1742
unknown metabolite 'M01077_c' created
HMR_1743
unknown metabolite 'M01080_c' created
HMR_1746
unknown metabolite 'M01077_m' created
unknown metabolite 'M01076_m' created
HMR_1747
unknown metabolite 'M01080_m' created
unknown metabolite 'M01079_m' created
HMR_1748
unknown metabolite 'M00746_m' created
HMR_1749
unknown metabolite 'M00753_m' created
HMR_1750
unknown metabolite 'M02977_m' created
HMR_1751
unknown metabolite 'M00742_m' created
HMR_1754
unknown metabolite 'M02977_c' created
unknown metabolite '

GLUTRSm
GLYTRS
GapFill-R01100
GapFill-R03511
unknown metabolite 'vke_c' created
GapFill-R03595
unknown metabolite 'seln_c' created
unknown metabolite 'selnp_c' created
GapFill-R03599
unknown metabolite 'selcys_c' created
GapFill-R04941
unknown metabolite 'selcyst_c' created
GapFill-R05830
unknown metabolite '2hdvk_c' created
HACD9m
HIBDm
HISTRS
HISTRSm
HMR_4241
unknown metabolite 'Nmelys_n' created
unknown metabolite 'M02127_n' created
HMR_5130
HMR_5144
unknown metabolite 'fmettrna_c' created
HMR_5146
HMR_5149
HMR_6975
unknown metabolite 'M00213_c' created
HMR_6976
unknown metabolite 'Ndmelys_c' created
HMR_6977
unknown metabolite 'Ntmelys_c' created
HMR_6978
HMR_6983
HMR_6984
unknown metabolite 'M01036_c' created
HMR_6985
unknown metabolite 'M00232_c' created
HMR_7131
unknown metabolite 'selni_c' created
HMR_7137
unknown metabolite 'M02469_c' created
HMR_7141
unknown metabolite 'M02895_c' created
HMR_7620
unknown metabolite 'M02893_c' created
HMR_8761
HMR_8762
HOLYSK
ILEOX
ILETA
ILETA

SEBACACT
SEBACIDTD
unknown metabolite 'sebacid_x' created
SEBCOACROT
SEBCOAPET
SERATB0tc
STCOATxc
SUBEACTD
unknown metabolite 'subeac_c' created
SUBERCACT
unknown metabolite 'c8dc_x' created
SUBERCROT
SUBERICACT
SUCCCROT
SUCCOAPET
TAUPAT1c
TDCRNe
unknown metabolite 'ttdcrn_e' created
THMATPe
THR3D
THRATB0tc
TIGCRNe
unknown metabolite 'c51crn_e' created
TRPATB0tc
TTDCAFATPc
TYRATB0tc
VALATB0tc
VITEtl
VITKtl
peplys_synthesis
r1092
unknown metabolite 'HC00001_c' created
r1093
unknown metabolite 'HC00002_c' created
r1094
unknown metabolite 'HC00003_c' created
r1095
unknown metabolite 'HC00005_c' created
r1096
r1097
unknown metabolite 'HC02222_m' created
r1098
unknown metabolite 'HC00006_c' created
r1099
unknown metabolite 'HC00007_c' created
r1100
unknown metabolite 'HC00008_c' created
r1101
unknown metabolite 'HC01852_c' created
r1103
unknown metabolite 'HC01942_c' created
r1104
unknown metabolite 'HC01943_c' created
r1105
unknown metabolite 'HC01944_c' created
r1112
unknown metabolite 'H

In [5]:
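from utils import duplicated_reactions as find_duplicated_reactions

# The result reuses the name `duplicated_reactions` (the next cells rely on it), so the
# function is imported under an alias to keep this cell safe to re-run
duplicated_reactions = find_duplicated_reactions(model)
duplicated_reactions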



array([], shape=(0, 2), dtype=int64)
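`utils.duplicated_reactions` is not shown in this notebook. As a rough sketch of the idea (not its actual implementation), two reactions can be treated as duplicates when they have the same stoichiometry, either as written or with every coefficient negated (the same reaction written in the opposite direction). The hypothetical helpers `canonical_key` and `find_duplicate_pairs` below work on plain `{metabolite: coefficient}` dictionaries and return index pairs, mirroring the `(n, 2)` shape of the array above; the toy reactions are made up.

```python
from collections import defaultdict
from itertools import combinations


def canonical_key(stoich):
    """Direction-independent key for a {metabolite: coefficient} mapping.

    A reaction and its reverse (all coefficients negated) get the same key:
    the sign is normalized so the alphabetically first metabolite is negative.
    """
    items = sorted((met, coef) for met, coef in stoich.items() if coef != 0)
    if items and items[0][1] > 0:
        items = [(met, -coef) for met, coef in items]
    return tuple(items)


def find_duplicate_pairs(stoichiometries):
    """Return sorted (i, j) index pairs, i < j, of reactions sharing a key."""
    groups = defaultdict(list)
    for idx, stoich in enumerate(stoichiometries):
        groups[canonical_key(stoich)].append(idx)
    pairs = []
    for members in groups.values():
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


# Hypothetical toy data: rows 0 and 2 are the same reaction in opposite directions
toy_stoich = [
    {'glc__D_c': -1, 'atp_c': -1, 'g6p_c': 1, 'adp_c': 1, 'h_c': 1},
    {'g6p_c': -1, 'f6p_c': 1},
    {'g6p_c': -1, 'adp_c': -1, 'h_c': -1, 'glc__D_c': 1, 'atp_c': 1},
]
print(find_duplicate_pairs(toy_stoich))  # [(0, 2)]
```

Exact matching is used here for simplicity; a comparison of stoichiometric matrix columns by correlation or rank would also catch reactions that differ only by a scale factor.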

### 2.2 Fix duplicated reactions in the dataset from the list duplicated_reactions obtained above
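Duplicated reactions are iterated over in a `for` loop and mapped back to the original dataset. For each duplicated pair, the BiGG database (http://bigg.ucsd.edu) is queried, first in the iCHOv1 model and then in the universal reaction list. If one of the two reaction IDs is found in BiGG, the other reaction is renamed to that ID; if neither is found, the second reaction takes the ID of the first. The renaming is applied to both the Rxns and Attributes data, and only when both agree on the IDs of the two rows. This way, our reaction IDs are unified with those in BiGG.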

In [None]:
import requests
from tqdm import tqdm

BIGG_CHO_URL = 'http://bigg.ucsd.edu/models/iCHOv1/reactions/'
BIGG_UNIVERSAL_URL = 'http://bigg.ucsd.edu/universal/reactions/'


def in_bigg(session, base_url, rxn_id):
    """Return True if BiGG serves a page for rxn_id under base_url."""
    return session.get(base_url + rxn_id, timeout=30).status_code == 200


session = requests.Session()  # one session reused for every request
i = 0  # number of reaction IDs changed
for a, b in tqdm(duplicated_reactions):
    id_a = reactions.loc[a, 'Reaction']
    id_b = reactions.loc[b, 'Reaction']

    # Only rename when both sheets agree on the IDs of the two rows
    if id_a != rxns_attributes.loc[a, 'Reaction'] or id_b != rxns_attributes.loc[b, 'Reaction']:
        continue

    # Decide which ID to keep: iCHOv1 first, then universal BiGG, otherwise the first ID
    if in_bigg(session, BIGG_CHO_URL, id_a):
        case, keep, drop, where = 1, a, b, 'present in CHOv1 model'
    elif in_bigg(session, BIGG_CHO_URL, id_b):
        case, keep, drop, where = 2, b, a, 'present in CHOv1 model'
    elif in_bigg(session, BIGG_UNIVERSAL_URL, id_a):
        case, keep, drop, where = 3, a, b, 'present in BiGG database'
    elif in_bigg(session, BIGG_UNIVERSAL_URL, id_b):
        case, keep, drop, where = 4, b, a, 'present in BiGG database'
    else:
        case, keep, drop, where = 5, a, b, 'not present in BiGG database'

    print(f'{case} Reaction {reactions.loc[drop, "Reaction"]} changed for {reactions.loc[keep, "Reaction"]} {where}')
    # Overwrite the ID of the dropped row in both sheets
    reactions.loc[drop, 'Reaction'] = reactions.loc[keep, 'Reaction']
    rxns_attributes.loc[drop, 'Reaction'] = rxns_attributes.loc[keep, 'Reaction']
    i += 1

print(f'Duplicated pairs found: {len(duplicated_reactions)}')
print(f'Reaction IDs changed: {i}')

In [None]:
# Store the original column order
column_order_rxns = reactions.columns.tolist()
column_order_att = rxns_attributes.columns.tolist()

# Collapse rows sharing a 'Reaction' ID, keeping the first non-null value of each column (the result is sorted by 'Reaction')
reactions = reactions.groupby('Reaction').first().reset_index()
rxns_attributes = rxns_attributes.groupby('Reaction').first().reset_index()

# Rearrange the columns to the original order
reactions = reactions[column_order_rxns]
rxns_attributes = rxns_attributes[column_order_att]
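A short illustration, on made-up data, of what `groupby('Reaction').first()` does in the cell above: rows come out sorted by 'Reaction', and within each group every column takes its first *non-null* value. Judging by the `== ''` checks used later in this notebook, empty cells read from Google Sheets are empty strings rather than NaN, so an empty cell in the first row of a group is kept instead of being filled from the duplicate row. The `toy_sheet` frame and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy sheet: 'HEX1' appears twice after the renaming step
toy_sheet = pd.DataFrame({
    'Reaction': ['PGI', 'HEX1', 'HEX1'],
    'GPR': ['gene_a', '', 'gene_b'],
    'EC Number': ['5.3.1.9', np.nan, '2.7.1.1'],
})

# Same pattern as above: '' counts as a value, so only the NaN EC Number is filled
merged = toy_sheet.groupby('Reaction').first().reset_index()[toy_sheet.columns]
print(merged)

# Converting empty strings to NaN first lets first() fill them from later rows too
filled = toy_sheet.replace('', np.nan).groupby('Reaction').first().reset_index()[toy_sheet.columns]
print(filled)
```

Because both `reactions` and `rxns_attributes` are grouped and sorted by the same key, their rows stay aligned as long as they hold the same IDs, which is what the consistency check further down verifies.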

In [None]:
reactions

In [None]:
##############################################################
#### ---------------------------------------------------- ####
#### ---- Update Rxns and  Attributes Google Sheets ----- ####
#### ---------------------------------------------------- ####
##############################################################
sheet.update_google_sheet(sheet_rxns, reactions)
sheet.update_google_sheet(sheet_attributes, rxns_attributes)
print("Google Sheet updated.")

In [6]:
# Check that reaction IDs and formulas are identical (same values, same order) in the "Rxns" and "Attributes" sheets
reactions = sheet.read_google_sheet(sheet_rxns)
rxns_attributes = sheet.read_google_sheet(sheet_attributes)

rxnsIDseq = list(reactions['Reaction']) == list(rxns_attributes['Reaction'])
if rxnsIDseq:
    print('Reaction IDs in the Rxns and Attributes Sheets are equal\n')
else:
    rxns_sheet_ids = set(list(reactions['Reaction']))
    attr_sheet_ids = set(list(rxns_attributes['Reaction']))
    print(f'Reaction IDs that are in Rxns Sheet and not in Attributes Sheet {rxns_sheet_ids - attr_sheet_ids}\n')
    print(f'Reaction IDs that are in Attributes Sheet and not in Rxns Sheet {attr_sheet_ids - rxns_sheet_ids}\n')

rxnsforseq = list(reactions['Reaction Formula']) == list(rxns_attributes['Reaction Formula'])
if rxnsforseq:
    print('Reaction Formulas in the Rxns and Attributes Sheets are equal')
else:
    rxns_sheet_forms = set(list(reactions['Reaction Formula']))
    attr_sheet_forms = set(list(rxns_attributes['Reaction Formula']))
    print(f'Reaction formulas that are in Rxns Sheet and not in Attributes Sheet {rxns_sheet_forms - attr_sheet_forms}\n')
    print(f'Reaction formulas that are in Attributes Sheet and not in Rxns Sheet {attr_sheet_forms - rxns_sheet_forms}\n')

Reaction IDs that are in Rxns Sheet and not in Attributes Sheet set()

Reaction IDs that are in Attributes Sheet and not in Rxns Sheet set()

Reaction formulas that are in Rxns Sheet and not in Attributes Sheet set()

Reaction formulas that are in Attributes Sheet and not in Rxns Sheet set()
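In the output above, both set differences are empty, yet the `else` branches ran: the two sheets contain the same IDs and formulas, but the lists differ in order or in how often an entry repeats. A hedged sketch of a check that tells those cases apart; `compare_columns` is a hypothetical helper, shown here on plain lists.

```python
from collections import Counter


def compare_columns(left, right):
    """Classify how two sequences of IDs (or formulas) differ."""
    if list(left) == list(right):
        return 'identical'
    if set(left) != set(right):
        return 'different members'
    if Counter(left) != Counter(right):
        return 'same members, different repeat counts'
    return 'same members, different order'


print(compare_columns(['R1', 'R2', 'R3'], ['R1', 'R3', 'R2']))  # same members, different order
print(compare_columns(['R1', 'R1', 'R2'], ['R1', 'R2', 'R2']))  # same members, different repeat counts
```

The same call works on the sheet columns, e.g. `compare_columns(reactions['Reaction'], rxns_attributes['Reaction'])`, since `list`, `set` and `Counter` all iterate over a pandas Series' values.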



## 3. Add Database links and Pre-checks to "Rxns" and "Attributes" Sheet <a id='bigg'></a>
This section of the notebook is designed to add extra features to the dataset without compromising previous data.

In [None]:
# Libraries import
import time
import requests
import pandas as pd
# from requests_html import HTMLSession
from bs4 import BeautifulSoup

from utils import duplicated_reactions
from google_sheet import GoogleSheet

In [None]:
# Generate the necessary datasets
KEY_FILE_PATH = 'credentials.json'
SPREADSHEET_ID_v3 = '1MlBXeHIKw8k8fZyXm-sN__AHTRSunJxar_-bqvukZws'

# Initialize the GoogleSheet object
sheet = GoogleSheet(SPREADSHEET_ID_v3, KEY_FILE_PATH)

# Read data from the Google Sheet
sheet_rxns = 'Rxns'
sheet_attributes = 'Attributes'

reactions = sheet.read_google_sheet(sheet_rxns)
rxns_attributes = sheet.read_google_sheet(sheet_attributes)

### 3.1 Add BiGG and EBI links to the "Rxns" and "Attributes" Sheets
The helper functions **fetch_url** and **extract_ec_numbers** query the BiGG database and retrieve information from it. Each reaction is looked up first in the iCHOv1 model and then among the universal BiGG reactions. If the queried reaction is in BiGG, the link to its page is added to the **BiGG database** column of the **"Rxns" Sheet**. If the BiGG page also lists EC numbers and the reaction has no EC number yet, the **extract_ec_numbers** function retrieves them so they can be added to the **EC Number** and **EC Number Link** columns of the **"Attributes" Sheet**.

In [None]:
# Define functions

def fetch_url(url, max_retries=5):
    """
    Fetches a given URL and attempts to extract EC (Enzyme Commission)
    numbers associated with reactions from the response.
    
    Parameters:
    - url (str): The URL to fetch.
    - max_retries (int, optional): The maximum number of attempts if the request raises
      a network error. Default is 5.

    Returns:
    - response (requests.models.Response or None): The server's response to the request.
    Returns None if the request fails.
    - ec_numbers (tuple or None): A tuple (EC numbers string, EC links string) as returned
    by extract_ec_numbers. Returns None if no EC numbers are found or if the request fails.

    Note:
    If the server responds with a status code other than 200, both the response and ec_numbers will be set to None.
    """
    # A plain requests session is enough: only the status code and the raw HTML are used
    with requests.Session() as session:
        for _ in range(max_retries):
            try:
                response = session.get(url, timeout=30)
            except requests.exceptions.RequestException as e:
                print(f"Error occurred: {e}, retrying...")
                time.sleep(2)  # wait for 2 seconds before next retry
                continue
            if response.status_code == 200:
                return response, extract_ec_numbers(response)
            # Any other status code (e.g. 404 when the reaction is not in BiGG) means "not found"
            return None, None
    # Every attempt raised a network error
    return None, None

def extract_ec_numbers(response):
    """
    Extracts EC (Enzyme Commission) numbers and their corresponding links from the HTML content
    of a given server response.
    
    Parameters:
    - response (requests.models.Response): The server's response containing the HTML content.

    Returns:
    - (ec_numbers_string, ec_links_string) (tuple of str): The extracted EC numbers and
    their corresponding links, each joined with commas.
    Returns None if no EC numbers are found.

    Note:
    If the provided response does not contain any EC numbers, this function prints
    'No EC Number associated with this reaction in BiGG' and returns None.
    
    """
    
    # Initialize BeautifulSoup with the response content
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find all elements containing EC Numbers
    ec_number_elements = soup.find_all('a', href=lambda x: x and 'ec-code' in x)
    
    if len(ec_number_elements) == 0:
        print('No EC Number associated with this reaction in BiGG')
        return None
    
    else:
        # Create a list to hold the EC Numbers and their links
        ec_numbers = []
        ec_links = []

        for element in ec_number_elements:
            # Extract the EC Number from the text of the element
            ec_number = element.text.strip()
            ec_numbers.append(ec_number)
            # Extract the link from the href attribute of the element
            link = element['href']
            ec_links.append(link)
        
        ec_numbers_string = ', '.join(ec_numbers)
        ec_links_string = ', '.join(ec_links)

        return ec_numbers_string, ec_links_string
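To exercise the EC-number parsing without querying BiGG, the same filter used by `extract_ec_numbers` (every `<a>` whose `href` contains `ec-code`) can be reproduced with the standard library's `html.parser`. This is a dependency-free sketch for offline testing, not a replacement for the BeautifulSoup version; `ECLinkParser`, `parse_ec_numbers` and the HTML snippet are hypothetical, and the snippet does not reproduce BiGG's actual page markup.

```python
from html.parser import HTMLParser


class ECLinkParser(HTMLParser):
    """Collect the text and href of <a> tags whose href contains 'ec-code'."""

    def __init__(self):
        super().__init__()
        self.numbers, self.links = [], []
        self._in_ec_link = False

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get('href')
        if tag == 'a' and href and 'ec-code' in href:
            self._in_ec_link = True
            self.links.append(href)
            self.numbers.append('')

    def handle_data(self, data):
        if self._in_ec_link:
            self.numbers[-1] += data

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_ec_link = False


def parse_ec_numbers(html):
    """Return (numbers, links) as comma-separated strings, or None if none are found."""
    parser = ECLinkParser()
    parser.feed(html)
    parser.close()
    if not parser.links:
        return None
    numbers = [n.strip() for n in parser.numbers]
    return ', '.join(numbers), ', '.join(parser.links)


# Made-up snippet; real BiGG pages are structured differently
html = '''
<p>EC Number:
  <a href="https://example.org/ec-code/2.7.1.1"> 2.7.1.1 </a>
  <a href="https://example.org/ec-code/2.7.1.2">2.7.1.2</a>
  <a href="https://example.org/other">not an EC link</a>
</p>
'''
print(parse_ec_numbers(html))
```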

In [None]:
# Create df copies to be compared before the update of the dataset
og_reactions = reactions.copy()
og_rxns_attributes = rxns_attributes.copy()

In [None]:
in_chov1=0
in_bigg=0
not_bigg=0

for i,row in reactions.iterrows():
    rxn=row['Reaction']
    response_esp, ec_data_esp = fetch_url('http://bigg.ucsd.edu/models/iCHOv1/reactions/'+rxn)
    
    if response_esp is not None:
        print(rxn,'in CHOv1')
        reactions.loc[i, 'BiGG database'] = 'http://bigg.ucsd.edu/models/iCHOv1/reactions/'+rxn
        if rxns_attributes.loc[i,'EC Number']=='':
            if ec_data_esp:
                rxns_attributes.loc[i,'EC Number'] = ec_data_esp[0]
                rxns_attributes.loc[i,'EC Number Link'] = ec_data_esp[1]
                print(f'{rxn} EC Number found: {ec_data_esp[0]}, {ec_data_esp[1]}')
        in_chov1+=1
        
    else:
        response_gen, ec_data_gen = fetch_url('http://bigg.ucsd.edu/universal/reactions/'+rxn)
        if response_gen is not None:
            print(rxn,'in Bigg')
            reactions.loc[i, 'BiGG database'] = 'http://bigg.ucsd.edu/universal/reactions/'+rxn
            if rxns_attributes.loc[i,'EC Number']=='':
                if ec_data_gen:
                    rxns_attributes.loc[i,'EC Number'] = ec_data_gen[0]
                    rxns_attributes.loc[i,'EC Number Link'] = ec_data_gen[1]
                    print(f'{rxn} EC Number found: {ec_data_gen[0]}, {ec_data_gen[1]}')
            in_bigg+=1
        else:
            print(rxn,'not in Bigg')
            not_bigg+=1


In [None]:
print(f'Reactions from CHOv1 reconstruction {in_chov1}')
print(f'Reactions in BiGG database {in_bigg} (not including those in CHOv1)')
print(f'Reactions NOT in BiGG {not_bigg}')
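The lookup order used in the loop above (iCHOv1 page first, then the universal BiGG page, otherwise no link) can be isolated in a pure function so it can be tested without network access. In this sketch, `classify_reaction` is hypothetical and `exists` is a stand-in for the `fetch_url(...)` "response is not None" check; the reaction IDs and the set of known pages are made up.

```python
from collections import Counter

CHO_URL = 'http://bigg.ucsd.edu/models/iCHOv1/reactions/'
UNIVERSAL_URL = 'http://bigg.ucsd.edu/universal/reactions/'


def classify_reaction(rxn_id, exists):
    """Return (category, link) following the iCHOv1 -> universal -> none order.

    `exists(url)` is any callable returning True when BiGG has a page at `url`.
    """
    for category, base in (('in_chov1', CHO_URL), ('in_bigg', UNIVERSAL_URL)):
        url = base + rxn_id
        if exists(url):
            return category, url
    return 'not_bigg', None


# Offline stand-in for BiGG: a fixed set of pages that "exist"
known_pages = {CHO_URL + 'HEX1', UNIVERSAL_URL + 'HEX1', UNIVERSAL_URL + 'PGI'}
results = {
    rxn: classify_reaction(rxn, lambda url: url in known_pages)
    for rxn in ['HEX1', 'PGI', 'MY_RXN']
}
tally = Counter(category for category, _ in results.values())
print(results)
print(tally)  # one reaction in each category, mirroring in_chov1 / in_bigg / not_bigg
```

In the notebook itself the EC numbers come from the same response as the existence check, which is why the real loop keeps the `fetch_url` call inline rather than passing a predicate.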

In [None]:
############################################################
#### -------------------------------------------------- ####
#### ---- Update Rxns and Attributes Google Sheet ----- ####
#### -------------------------------------------------- ####
############################################################
if not og_reactions.equals(reactions):
    sheet.update_google_sheet(sheet_rxns, reactions)
    print("Rxns Sheet updated")
if not og_rxns_attributes.equals(rxns_attributes):
    sheet.update_google_sheet(sheet_attributes, rxns_attributes)
    print("Attributes Sheet updated.")

### 3.2 Add a "Pre-check" tag to the Rxns Sheet
Here we add a "Pre-check" tag, with a confidence score of 1, to reactions that have not been curated yet and have no GPR annotation in any of the previous reconstructions. The tag flags reactions that are relatively difficult to curate, to facilitate the manual curation effort.

In [None]:
# Create df copies to be compared before the update of the dataset
og_reactions = reactions.copy()
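The Pre-check rule applied in the next cell (an empty 'Curated' cell and an empty value in every GPR column) can also be expressed as a single boolean mask instead of a row-by-row loop. A minimal pandas sketch on a made-up `toy` frame, assuming, as the loop does, that empty cells are empty strings:

```python
import pandas as pd

check_cols = ['Curated', 'GPR_hef', 'GPR_fou', 'GPR_yeo', 'GPR_Recon3D', 'GPR_final']

# Hypothetical rows: only R2 has no curation tag and no GPR anywhere
toy = pd.DataFrame({
    'Reaction': ['R1', 'R2', 'R3'],
    'Curated': ['', '', 'Yes'],
    'GPR_hef': ['gene_a', '', ''],
    'GPR_fou': ['', '', ''],
    'GPR_yeo': ['', '', ''],
    'GPR_Recon3D': ['', '', ''],
    'GPR_final': ['', '', ''],
    'Conf. Score': ['', '', '4'],
})

# True where every listed column is an empty string
mask = (toy[check_cols] == '').all(axis=1)
toy.loc[mask, 'Curated'] = 'Pre-check'
toy.loc[mask, 'Conf. Score'] = '1'
print(f'Total number of Pre-check reactions: {mask.sum()}')  # 1
```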

In [None]:
# Update reactions df
counter = 0
for i,row in reactions.iterrows():
    curated = row['Curated']
    gpr_hef = row['GPR_hef']
    gpr_fou = row['GPR_fou']
    gpr_yeo = row['GPR_yeo']
    gpr_recon = row['GPR_Recon3D']
    gpr_final = row['GPR_final']
    if (curated == '') and (gpr_hef == '') and (gpr_fou == '') and (gpr_yeo == '') and (gpr_recon == '') and (gpr_final == ''):
        reactions.loc[i,'Curated'] = 'Pre-check'
        reactions.loc[i,'Conf. Score'] = '1'
        counter+=1
        
print(f'Total number of Pre-check reactions: {counter}')

In [None]:
#############################################
#### ----------------------------------- ####
#### ---- Update Rxns Google Sheet ----- ####
#### ----------------------------------- ####
#############################################

if not og_reactions.equals(reactions): # checks if there has been any update on the original dataset
    sheet.update_google_sheet(sheet_rxns, reactions)
    print("Rxns Sheet updated")

In [None]:
# Load df containing all the reactions and an "X" mark indicating which CHO reconstruction(s) each reaction comes from
all_dfs3 = pd.read_excel('../Data/Reconciliation/datasets/all_dfs3.xlsx')
if 'Unnamed: 0' in all_dfs3.columns:
    all_dfs3 = all_dfs3.drop('Unnamed: 0', axis=1)

In [None]:
# Load df containing all the recon3d reactions
recon3d = pd.read_excel('../Data/Reconciliation/datasets/recon3D_all_reactions.xlsx')
if 'Unnamed: 0' in recon3d.columns:
    recon3d = recon3d.drop('Unnamed: 0', axis=1)

In [None]:
# Generate lists of the reactions in each CHO reconstruction (rxns_hef_list, rxns_fou_list, rxns_yeo_list) and of those present in all three (rxns_list)
rxns_list = []
rxns_hef_list = []
rxns_fou_list = []
rxns_yeo_list = []
for i,row in all_dfs3.iterrows():
    if row['hef']=='X' and row['fou']=='X' and row['yeo']=='X':
        rxns_list.append(row['Reaction'])
    if row['hef']=='X':
        rxns_hef_list.append(row['Reaction'])
    if row['fou']=='X':
        rxns_fou_list.append(row['Reaction'])
    if row['yeo']=='X':
        rxns_yeo_list.append(row['Reaction'])

In [None]:
# Generate a list with all the Recon3D reactions
rxns_recon3d_list = list(recon3d['m_reaction'])
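The next cell tags each reaction with 'X' or '-' per source by testing `rxn in some_list`, which scans the whole list for every reaction. A hedged alternative sketch using sets, set intersection and `Series.isin`; the toy frames and IDs are hypothetical, while the 'hef'/'fou'/'yeo' columns and 'X' marks follow `all_dfs3` above (the real cell writes to one tag column per reconstruction, named after each model).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for all_dfs3 and the Rxns sheet
sources = pd.DataFrame({
    'Reaction': ['R1', 'R2', 'R3'],
    'hef': ['X', 'X', ''],
    'fou': ['X', '', 'X'],
    'yeo': ['X', 'X', 'X'],
})
toy_rxns = pd.DataFrame({'Reaction': ['R1', 'R2', 'R3', 'R4']})

# Reactions marked in each reconstruction, and in all three at once
by_source = {col: set(sources.loc[sources[col] == 'X', 'Reaction']) for col in ['hef', 'fou', 'yeo']}
in_all_three = by_source['hef'] & by_source['fou'] & by_source['yeo']

# Vectorized 'X' / '-' tags, one column per reconstruction
for col, members in by_source.items():
    toy_rxns[col] = np.where(toy_rxns['Reaction'].isin(members), 'X', '-')

print(sorted(in_all_three))  # ['R1']
print(toy_rxns)
```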

In [None]:
# Update reactions df
counter = 0
for i,row in reactions.iterrows():
    rxn=row['Reaction']
    curated = row['Curated']
    gpr_hef = row['GPR_hef']
    gpr_fou = row['GPR_fou']
    gpr_yeo = row['GPR_yeo']
    gpr_recon = row['GPR_Recon3D']
    gpr_final = row['GPR_final']
    
    if (rxn in rxns_hef_list):
        reactions.loc[i,'iCHO1766'] = 'X'
    else:
        reactions.loc[i,'iCHO1766'] = '-'
    if (rxn in rxns_fou_list):
        reactions.loc[i,'iCH02101'] = 'X'
    else:
        reactions.loc[i,'iCH02101'] = '-'
    if (rxn in rxns_yeo_list):
        reactions.loc[i,'iCHO2291'] = 'X'
    else:
        reactions.loc[i,'iCHO2291'] = '-'
    if (rxn in rxns_recon3d_list):
        reactions.loc[i,'RECON3D'] = 'X'
    else:
        reactions.loc[i,'RECON3D'] = '-'

    if (curated == '') and (rxn in rxns_list) and (gpr_hef == '') and (gpr_fou == '') and (gpr_yeo == '') and (gpr_recon == '') and (gpr_final == ''):
        reactions.loc[i,'Curated'] = 'Pre-check(CHO)'
        reactions.loc[i,'Conf. Score'] = '1'
    elif (curated == '') and (rxn in rxns_hef_list) and (rxn in rxns_recon3d_list) and (gpr_hef == '') and (gpr_fou == '') and (gpr_yeo == '') and (gpr_recon == '') and (gpr_final == ''):
        reactions.loc[i,'Curated'] = 'Pre-check(Hefzi_RECON)'
        reactions.loc[i,'Conf. Score'] = '1'