# Metabolites

## Description
In this notebook we create a dataframe containing all the available information for the metabolites accounted for in our reconstruction.<br>
[A. Generation of Metabolites dataset](#generation) <br>
&nbsp;&nbsp;&nbsp;&nbsp;[1. Retrieve information from our reconstruction](#rxns) <br>
&nbsp;&nbsp;&nbsp;&nbsp;[2. Retrieve information from Recon3D, iCHO2291 and iCHO1766](#met) <br>
&nbsp;&nbsp;&nbsp;&nbsp;[3. Add all the metabolite information to our metabolites dataset](#combine)

In [2]:
# Import libraries
import gspread
import pandas as pd
import numpy as np
import cobra
from cobra.io import read_sbml_model

from tqdm.notebook import tqdm



In [3]:
# Define functions

def df_to_dict(df, key_col):
    """
    This function takes a pandas dataframe and a key column, and returns a dictionary
    with the key column as the dictionary keys and the rest of the columns as the values.
    """
    # Create an empty dictionary to hold the key-value pairs
    my_dict = {}
    
    # Loop through each row in the dataframe
    for index, row in df.iterrows():
        # Use the value in the key column as the dictionary key
        key_value = row[key_col]
        
        # Use the rest of the columns as the dictionary values
        value_dict = row.drop(key_col).to_dict()
        
        # Add the key-value pair to the dictionary
        my_dict[key_value] = value_dict
    
    return my_dict
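
`df_to_dict` walks the dataframe row by row with `iterrows`, which becomes slow for large tables. Below is a minimal vectorised sketch built on `DataFrame.set_index(...).to_dict(orient='index')`. It assumes that when a key appears more than once, the later row should win, which is what the loop does when it overwrites the key. `to_dict(orient='index')` refuses a non-unique index, so duplicates are dropped first. The name `df_to_dict_fast` and the toy data are hypothetical.

```python
import pandas as pd


def df_to_dict_fast(df: pd.DataFrame, key_col: str) -> dict:
    """Vectorised sketch of df_to_dict: {key: {other_col: value, ...}, ...}.

    As in the row-by-row loop, a key that appears several times keeps the
    values of its last row; to_dict(orient='index') needs a unique index,
    hence the drop_duplicates call.
    """
    deduplicated = df.drop_duplicates(subset=key_col, keep='last')
    return deduplicated.set_index(key_col).to_dict(orient='index')


# Toy example with a duplicated key (values are illustrative only)
example = pd.DataFrame({
    'BiGG ID': ['atp_c', 'h2o_c', 'atp_c'],
    'Name': ['ATP', 'H2O', 'ATP (updated)'],
    'Compartment': ['c', 'c', 'c'],
})
mets = df_to_dict_fast(example, 'BiGG ID')
print(mets['atp_c'])  # {'Name': 'ATP (updated)', 'Compartment': 'c'}
```

The loop version is still useful when each row needs custom handling. For plain key/value reshaping, the pandas call avoids building one `Series` per row.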

<a id='generation'></a>
## A. Generation of Metabolites dataset
We start by creating a list of all the metabolites that take part in the reactions of our reconstruction (1). Then we create a dataset containing the metabolite information from the Recon3D, iCHO2291 and iCHO1766 models, including supplementary information from Recon3D (2). Finally, we map this information back onto the metabolites of our reconstruction and generate an Excel file for uploading to Google Sheets (3).
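
Step (3) can be sketched as a left merge on `BiGG ID`. A left merge keeps every metabolite of the reconstruction, even one that none of the three models describes. This is an assumption about how the mapping could be done, not necessarily what the later cells do. `indicator=True` adds a `_merge` column that flags unmatched metabolites. The toy frames stand in for the `metabolites` and `models_metabolites` tables, and `hypothetical_met_c` and `metabolites.xlsx` are made-up names.

```python
import pandas as pd

# Stand-ins for the tables built in steps (1) and (2)
metabolites = pd.DataFrame({'BiGG ID': ['10fthf_c', 'atp_c', 'hypothetical_met_c']})
models_metabolites = pd.DataFrame({
    'BiGG ID': ['10fthf_c', 'atp_c'],
    'Name': ['10-Formyltetrahydrofolate', 'ATP'],
    'Compartment': ['c', 'c'],
})

# Keep every metabolite of the reconstruction; unmatched rows get NaN
mapped = metabolites.merge(models_metabolites, on='BiGG ID', how='left', indicator=True)
missing = mapped.loc[mapped['_merge'] == 'left_only', 'BiGG ID'].tolist()
print(missing)  # ['hypothetical_met_c']

# Writing the result for upload to Google Sheets (needs openpyxl installed):
# mapped.drop(columns='_merge').to_excel('metabolites.xlsx', index=False)
```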

<a id='rxns'></a>
### 1. Retrieve a list of all the metabolites from our reconstruction
The list of all the reactions and the metabolites involved is stored in the Rxns sheet of the Google Sheet.

In [4]:
# Give the service account details to gspread
sa = gspread.service_account(filename='credentials.json')

# sa is a gspread client that can connect to spreadsheets
# through the open method and the spreadsheet name.
cho_recon = sa.open('CHO Network Reconstruction + Recon3D')

# We also need to select the worksheet (tab) before getting the data; here we use the Rxns sheet.
rxns_sheet = cho_recon.worksheet('Rxns')

In [5]:
# We can extract the data using the get_all_records method and create a pd DataFrame
df = pd.DataFrame(rxns_sheet.get_all_records())
df

| | Curated | Reaction | Reaction Name | Reaction Formula | Subsystem | GPR_hef | GPR_fou | GPR_yeo | GPR_Recon3D | GPR_final | Conf. Score | Curation Notes | References |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PD | 10FTHF5GLUtl | 5-glutamyl-10FTHF transport, lysosomal | 10fthf5glu_c --> 10fthf5glu_l | TRANSPORT, LYSOSOMAL | | | | | | 1 | No information available in the literature abo... | |
| 1 | PD | 10FTHF5GLUtm | 5-glutamyl-10FTHF transport, mitochondrial | 10fthf5glu_m --> 10fthf5glu_c | TRANSPORT, MITOCHONDRIAL | | | | | | 1 | No information available in the literature abo... | |
| 2 | PD | 10FTHF6GLUtl | 6-glutamyl-10FTHF transport, lysosomal | 10fthf6glu_c --> 10fthf6glu_l | TRANSPORT, LYSOSOMAL | | | | | | 1 | No information available in the literature abo... | |
| 3 | PD | 10FTHF6GLUtm | 6-glutamyl-10FTHF transport, mitochondrial | 10fthf6glu_m --> 10fthf6glu_c | TRANSPORT, MITOCHONDRIAL | | | | | | 1 | No information available in the literature abo... | |
| 4 | PD | 10FTHF7GLUtl | 7-glutamyl-10FTHF transport, lysosomal | 10fthf7glu_c --> 10fthf7glu_l | TRANSPORT, LYSOSOMAL | | | | | | 1 | No information available in the literature abo... | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11907 | | r2436 | Mitochondrial Carrier (Mc) Tcdb:2.A.29.8.3 | crn_m + odecrn_c <=> crn_c + odecrn_m | Transport, mitochondrial | | | | | 100765000 | | | |
| 11908 | | r2437 | Mitochondrial Carrier (Mc) Tcdb:2.A.29.8.3 | crn_m + pcrn_c <=> crn_c + pcrn_m | Transport, mitochondrial | | | | | 100765000 | | | |
| 11909 | | r2438 | Mitochondrial Carrier (Mc) Tcdb:2.A.29.8.3 | c4crn_c + crn_m <=> c4crn_m + crn_c | Transport, mitochondrial | | | | | 100765000 | | | |
| 11910 | | r2439 | Zinc (Zn2+)-Iron (Fe2+) Permease (Zip), Tcdb:2... | hco3_c + so4_e --> hco3_e + so4_c | Transport, extracellular | | | | | 103159861 | | | |
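
`get_all_records` returns one dictionary per row, keyed by the header row. By default gspread returns blank cells as empty strings rather than missing values, and it converts numeric-looking cells to numbers. The sketch below uses two hand-written records modelled on rows of the table above, keeping only a few of its columns. It shows that `isna()` does not see the empty GPR cells until `''` is replaced with `NaN`. Whether the later cells need this conversion is an assumption.

```python
import numpy as np
import pandas as pd

# Two records shaped like get_all_records() output (subset of the columns above)
records = [
    {'Reaction': '10FTHF5GLUtl', 'GPR_final': '', 'Conf. Score': 1},
    {'Reaction': 'r2436', 'GPR_final': 100765000, 'Conf. Score': ''},
]
rxns = pd.DataFrame(records)

print(rxns['GPR_final'].isna().sum())  # 0: empty strings are not treated as missing

# Treat blank cells as missing values before counting or filtering
rxns_clean = rxns.replace('', np.nan)
print(rxns_clean['GPR_final'].isna().sum())  # 1
```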


In [6]:
# Create a cobra model to identify the metabolites involved in our reconstruction
model = cobra.Model("iCHOxxxx")
lr = [cobra.Reaction(rxn_id) for rxn_id in df['Reaction']]
model.add_reactions(lr)
model

Set parameter Username
Academic license - for non-commercial use only - expires 2024-03-24


| Property | Value |
|---|---|
| Name | iCHOxxxx |
| Memory address | 12a933070 |
| Number of metabolites | 0 |
| Number of reactions | 11912 |
| Number of genes | 0 |
| Number of groups | 0 |
| Objective expression | 0 |
| Compartments | |


In [7]:
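# With the built-in Reaction.build_reaction_from_string method we parse each reaction formula
# and create the metabolites it contains. It prints a line for every new metabolite
# (pass verbose=False to silence it).
for i, r in enumerate(tqdm(model.reactions)):
    r.build_reaction_from_string(df['Reaction Formula'].iloc[i])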

unknown metabolite '10fthf5glu_c' created
unknown metabolite '10fthf5glu_l' created
unknown metabolite '10fthf5glu_m' created
unknown metabolite '10fthf6glu_c' created
unknown metabolite '10fthf6glu_l' created
unknown metabolite '10fthf6glu_m' created
unknown metabolite '10fthf7glu_c' created
unknown metabolite '10fthf7glu_l' created
unknown metabolite '10fthf7glu_m' created
unknown metabolite '10fthf_c' created
unknown metabolite '10fthf_l' created
unknown metabolite '10fthf_m' created
unknown metabolite '11docrtsl_c' created
unknown metabolite '11docrtsl_m' created
unknown metabolite '11docrtsl_r' created
unknown metabolite '11docrtstrn_c' created
unknown metabolite '11docrtstrn_m' created
unknown metabolite '11docrtstrn_r' created
unknown metabolite '12HPET_c' created
unknown metabolite 'atp_c' created
unknown metabolite 'h2o_c' created
unknown metabolite '12HPET_e' created
unknown metabolite 'adp_c' created
unknown metabolite 'h_c' created
unknown metabolite 'pi_c' created
...
unknown metabolite '3odcoa_x' created
unknown metabolite 'occoa_x' created
unknown metabolite 'dcacoa_x' created
unknown metabolite 'btcoa_m' created
unknown metabolite 'btcoa_x' created
unknown metabolite 'ddcacoa_x' created
unknown metabolite 'pentcoa_m' created
unknown metabolite 'tdcoa_x' created
unknown metabolite 'hxcoa_m' created
unknown metabolite 'hxcoa_x' created
unknown metabolite 'c61_3Zcoa_m' created
unknown metabolite 'c61_3Zcoa_x' created
unknown metabolite '3ohodcoa_x' created
unknown metabolite 'hepcoa_m' created
unknown metabolite '3ohxccoa_x' created
unknown metabolite 'dmhptcoa_m' created
unknown metabolite 'ibcoa_m' created
unknown metabolite 'tmhndccoa_x' created
unknown metabolite 'acac_e' created
unknown metabolite 'acac_m' created
unknown metabolite 'acac_x' created
unknown metabolite 'acald_e' created
unknown metabolite 'acald_c' created
unknown metabolite 'acald_m' created
unknown metabolite 'acald_r' created
unknown metabolite 'acald_x' created
...
unknown metabolite 'asn_L_m' created
unknown metabolite 'phe_L_c' created
unknown metabolite 'phe_L_e' created
unknown metabolite 'asntrna_c' created
unknown metabolite 'trnaasn_c' created
unknown metabolite 'asntrna_m' created
unknown metabolite 'trnaasn_m' created
unknown metabolite 'asn_L_l' created
unknown metabolite 'ala_B_c' created
unknown metabolite 'cbp_c' created
unknown metabolite 'cbasp_c' created
unknown metabolite 'asp_D_c' created
unknown metabolite 'asp_D_e' created
unknown metabolite 'k_c' created
unknown metabolite 'k_e' created
unknown metabolite 'asp_D_x' created
unknown metabolite 'Nacasp_m' created
unknown metabolite 'asp_L_e' created
unknown metabolite 'pro_L_c' created
unknown metabolite 'pro_L_e' created
unknown metabolite 'oaa_m' created
unknown metabolite 'asptrna_c' created
unknown metabolite 'trnaasp_c' created
unknown metabolite 'asptrna_m' created
unknown metabolite 'trnaasp_m' created
unknown metabolite 'asp_L_l' created
unknown metabolite 'h2o_n' created
...
unknown metabolite 'duri_c' created
unknown metabolite 'duri_n' created
unknown metabolite 'dcyt_e' created
unknown metabolite 'c101_3Zcoa_m' created
unknown metabolite 'dc2coa_x' created
unknown metabolite 'c101_3Zcoa_x' created
unknown metabolite 'c102_3Z_7Zcoa_m' created
unknown metabolite 'c102_3Z_7Zcoa_x' created
unknown metabolite 'c163_3Z_7Z_10Zcoa_m' created
unknown metabolite 'c163_3Z_7Z_10Zcoa_x' created
unknown metabolite 'c164_3Z_7Z_10Z_13Zcoa_m' created
unknown metabolite 'c164_3Z_7Z_10Z_13Zcoa_x' created
unknown metabolite 'c225_3Z_7Z_10Z_13Z_16Zcoa_m' created
unknown metabolite 'c225_3Z_7Z_10Z_13Z_16Zcoa_x' created
unknown metabolite '3ddcrn_e' created
unknown metabolite '4h2oglt_m' created
unknown metabolite 'pro_D_x' created
unknown metabolite 'pro_D_c' created
unknown metabolite 'pro_L_r' created
unknown metabolite 'dchac_c' created
unknown metabolite 'dchac_e' created
unknown metabolite 'dchac_r' created
unknown metabolite 'tdechola_c' created
...
unknown metabolite 'phyt_x' created
unknown metabolite 'phytcoa_x' created
unknown metabolite 'vacccoa_m' created
unknown metabolite 'c201coa_x' created
unknown metabolite 'odecoa_x' created
unknown metabolite 'c220coa_x' created
unknown metabolite 'c221coa_x' created
unknown metabolite 'c50coa_m' created
unknown metabolite 'c6coa_m' created
unknown metabolite 'c70coa_m' created
unknown metabolite 'c13_trimethylcoa_x' created
unknown metabolite 'c11_trimethylcoa_x' created
unknown metabolite 'omhdecacid_r' created
unknown metabolite 'omhdecacid_c' created
unknown metabolite 'sebacid_c' created
unknown metabolite 'fdp_c' created
unknown metabolite 'f1p_c' created
unknown metabolite 'glyald_c' created
unknown metabolite 's17bp_c' created
unknown metabolite 'e4p_c' created
unknown metabolite 'xu1p_D_c' created
unknown metabolite 'gcald_c' created
unknown metabolite 'f6p_c' created
unknown metabolite 'f26bp_c' created
unknown metabolite 'fe2_m' created
unknown metabolite 'ppp9_m' created
...
unknown metabolite 'hgthf_c' created
unknown metabolite 'glu_L_l' created
unknown metabolite 'thf_l' created
unknown metabolite 'Tyr_ggn_c' created
unknown metabolite 'ggn_c' created
unknown metabolite 'leuktrC4_r' created
unknown metabolite 'glu_L_r' created
unknown metabolite 'leuktrD4_r' created
unknown metabolite 'leuktrE4_c' created
unknown metabolite 'leuktrF4_c' created
unknown metabolite 'ser_L_m' created
unknown metabolite '3htmelys_c' created
unknown metabolite '4tmeabut_c' created
unknown metabolite '3htmelys_m' created
unknown metabolite '4tmeabut_m' created
unknown metabolite 'gmp_m' created
unknown metabolite 'thcrm_cho_l' created
unknown metabolite 'glac_c' created
unknown metabolite 'glcr_c' created
unknown metabolite 'glac_m' created
unknown metabolite 'glcr_m' created
unknown metabolite 'glac_r' created
unknown metabolite 'glygn1_c' created
unknown metabolite 'glygn2_c' created
unknown metabolite 'hs_deg9_l' created
unknown metabolite 'glcur_l' created
...
unknown metabolite 'm4masn_g' created
unknown metabolite 'n2m2nmasn_g' created
unknown metabolite 'n5m2masn_g' created
unknown metabolite 'l5fn5m2masn_g' created
unknown metabolite 'l6fn6m2masn_g' created
unknown metabolite 'm_em_3gacpail_cho_r' created
unknown metabolite 'm_em_3gacpail_prot_cho_r' created
unknown metabolite 'mem2emgacpail_prot_cho_r' created
unknown metabolite 'm4masn_e' created
unknown metabolite 'm7masnB_r' created
unknown metabolite 'm7masnB_g' created
unknown metabolite 'm8masn_r' created
unknown metabolite 'm8masn_g' created
unknown metabolite '3mldz_c' created
unknown metabolite 'malon_m' created
unknown metabolite 'malcrn_m' created
unknown metabolite 'malcoa_x' created
unknown metabolite 'HC10859_c' created
unknown metabolite 'HC10859_m' created
unknown metabolite 'mal_L_x' created
unknown metabolite 'man_r' created
unknown metabolite 'normete_L_c' created
unknown metabolite '3mgcoa_m' created
unknown metabolite 'mercplac_c' created
...
unknown metabolite 'adhlam_m' created
unknown metabolite 'pdx5p_c' created
unknown metabolite 'pydx5p_c' created
unknown metabolite 'pydxn_c' created
unknown metabolite 'peamn_c' created
unknown metabolite 'pecgoncoa_r' created
unknown metabolite 'pe_cho_m' created
unknown metabolite 'pdmeeta_c' created
unknown metabolite 'pe_cho_g' created
unknown metabolite 's7p_c' created
unknown metabolite '3php_c' created
unknown metabolite 'prostgh2_c' created
unknown metabolite 'prostgd2_c' created
unknown metabolite 'prostgh2_r' created
unknown metabolite 'prostgd2_r' created
unknown metabolite 'prostge2_r' created
unknown metabolite 'prostgi2_r' created
unknown metabolite '2pglyc_c' created
unknown metabolite 'pgp_cho_c' created
unknown metabolite 'pheacgln_c' created
unknown metabolite 'phaccoa_m' created
unknown metabolite 'pheacgly_m' created
unknown metabolite 'pheacgly_c' created
unknown metabolite 'phpyr_c' created
unknown metabolite 'phe_L_m' created
unknown metabolite 'phpyr_m' created
...
unknown metabolite 'xol7aone_m' created
unknown metabolite 'xoltri27_c' created
unknown metabolite 'CE0233_c' created
unknown metabolite 'xol7ah2al_c' created
unknown metabolite 'CE1272_c' created
unknown metabolite 'xoltriol_r' created
unknown metabolite 'CE1272_r' created
unknown metabolite 'CE1273_r' created
unknown metabolite 'CE1277_c' created
unknown metabolite 'CE1277_r' created
unknown metabolite 'CE1279_c' created
unknown metabolite 'CE1279_r' created
unknown metabolite 'CE1278_c' created
unknown metabolite 'CE1278_r' created
unknown metabolite 'CE5242_c' created
unknown metabolite 'CE5252_m' created
unknown metabolite 'CE5242_m' created
unknown metabolite 'CE5252_r' created
unknown metabolite 'CE5242_r' created
unknown metabolite 'CE5252_x' created
unknown metabolite 'CE5242_x' created
unknown metabolite 'CE5243_c' created
unknown metabolite 'CE5249_m' created
unknown metabolite 'CE5243_m' created
unknown metabolite 'CE5249_r' created
unknown metabolite 'CE5243_r' created
...
unknown metabolite 'CE2414_c' created
unknown metabolite 'CE5122_c' created
unknown metabolite 'CE2416_c' created
unknown metabolite 'CE5126_c' created
unknown metabolite 'dopa_l' created
unknown metabolite 'CE5276_l' created
unknown metabolite 'dopa_x' created
unknown metabolite 'CE4793_c' created
unknown metabolite 'CE5114_c' created
unknown metabolite 'CE5114_r' created
unknown metabolite 'CE4812_c' created
unknown metabolite 'CE2209_c' created
unknown metabolite 'CE4811_c' created
unknown metabolite 'C04295_c' created
unknown metabolite 'C04295_m' created
unknown metabolite '5adtststerone_m' created
unknown metabolite 'C04295_r' created
unknown metabolite 'CE4810_c' created
unknown metabolite 'CE4821_c' created
unknown metabolite 'CE4819_c' created
unknown metabolite 'CE4817_c' created
unknown metabolite 'CE4816_c' created
unknown metabolite 'CE4824_c' created
unknown metabolite 'CE4820_c' created
unknown metabolite 'CE4818_c' created
unknown metabolite 'CE2313_r' created
...
unknown metabolite 'acryl_c' created
unknown metabolite 'ch4s_c' created
unknown metabolite 'CE6248_c' created
unknown metabolite 'CE6248_r' created
unknown metabolite 'C04843_c' created
unknown metabolite 'C04843_r' created
unknown metabolite 'C04843_x' created
unknown metabolite '12HPET_m' created
unknown metabolite '12HPET_n' created
unknown metabolite 'CE6250_c' created
unknown metabolite 'CE6250_m' created
unknown metabolite 'CE6250_r' created
unknown metabolite 'CE6250_x' created
unknown metabolite 'CE6251_c' created
unknown metabolite 'CE6251_r' created
unknown metabolite 'CE6240_c' created
unknown metabolite 'CE6241_c' created
unknown metabolite 'CE6240_m' created
unknown metabolite 'CE6241_m' created
unknown metabolite 'CE6240_r' created
unknown metabolite 'CE6241_r' created
unknown metabolite 'CE6242_c' created
unknown metabolite 'CE6242_m' created
unknown metabolite 'CE6242_r' created
unknown metabolite 'CE6243_c' created
unknown metabolite 'CE6243_m' created
...
unknown metabolite 'sdhpt_c' created
unknown metabolite 'seahcys_c' created
unknown metabolite 'seasmet_c' created
unknown metabolite 'sebacid_x' created
unknown metabolite 'sebcoa_x' created
unknown metabolite 'sel_c' created
unknown metabolite 'ser_D_c' created
unknown metabolite 'sertrna_m' created
unknown metabolite 'trnaser_m' created
unknown metabolite 'ser_L_l' created
unknown metabolite 'ser_L_x' created
unknown metabolite 'slfcys_c' created
unknown metabolite 'sgalside_cho_c' created
unknown metabolite 'ethamp_r' created
unknown metabolite 'hxdcal_r' created
unknown metabolite 'aanam_c' created
unknown metabolite 'l2n2m2mn_c' created
unknown metabolite 'acngalgbside_cho_g' created
unknown metabolite 'sph1p_c' created
unknown metabolite 'sl_L_c' created
unknown metabolite 'sl_L_m' created
unknown metabolite 'selmetox_c' created
unknown metabolite 'selmetox_r' created
unknown metabolite 'sphmyln_cho_g' created
unknown metabolite 'sphmyln_cho_l' created
...
unknown metabolite 'HC02228_c' created
unknown metabolite 'HC01842_c' created
unknown metabolite 'HC01797_c' created
unknown metabolite 'HC00004_c' created
unknown metabolite 'HC00319_c' created
unknown metabolite 'HC00319_m' created
unknown metabolite 'arachd_l' created
unknown metabolite 'hdca_l' created
unknown metabolite 'na1_r' created
unknown metabolite 'strdnc_l' created
unknown metabolite 'strdnc_r' created
unknown metabolite 'lnlc_l' created
unknown metabolite 'lnlc_r' created
unknown metabolite 'lanost_x' created
unknown metabolite 'HC01787_c' created
unknown metabolite 'HC00004_r' created
unknown metabolite '2obut_m' created
unknown metabolite 'cdpchol_r' created
unknown metabolite 'HC02020_r' created
unknown metabolite 'hdd2coa_r' created
unknown metabolite 'HC02021_r' created
unknown metabolite 'HC02024_r' created
unknown metabolite 'HC02025_r' created
unknown metabolite 'HC02026_r' created
unknown metabolite 'HC02027_r' created
unknown metabolite 'HC02020_l' created
...
unknown metabolite 'atvethgluc_r' created
unknown metabolite 'atvlacgluc_r' created
unknown metabolite 'atvlac_e' created
unknown metabolite 'gluside_hs_g' created
unknown metabolite 'galgluside_hs_g' created
unknown metabolite 'dgcholcoa_x' created
unknown metabolite 'chito2pdol_L_c' created
unknown metabolite 'mpdol_L_c' created
unknown metabolite 'chito2pdol_U_c' created
unknown metabolite 'mpdol_U_c' created
unknown metabolite 'galgluside_hs_e' created
unknown metabolite 'gluside_hs_e' created
unknown metabolite 'gd1b_hs_l' created
unknown metabolite 'gd2_hs_l' created
unknown metabolite 'gm1a_hs_l' created
unknown metabolite 'gm2a_hs_l' created
unknown metabolite 'dolmanp_L_r' created
unknown metabolite 'memgacpail_hs_r' created
unknown metabolite 'dolp_L_r' created
unknown metabolite 'm2emgacpail_hs_r' created
unknown metabolite 'dolmanp_U_r' created
unknown metabolite 'dolp_U_r' created
unknown metabolite 'C02712_e' created
unknown metabolite 'C03681_e' created
...
unknown metabolite '3ddecdicoa_m' created
unknown metabolite '3ddecdicoa_x' created
unknown metabolite '2ddecdicoa_x' created
unknown metabolite '2dodtricoa_m' created
unknown metabolite '2dodtricoa_x' created
unknown metabolite '3dodtricoa_m' created
unknown metabolite '3dodtricoa_x' created
unknown metabolite 'c12dc_x' created
unknown metabolite 'c12dc_c' created
unknown metabolite 'c12dccoa_c' created
unknown metabolite 'tmtrdcoa_x' created
unknown metabolite 'tridcoa_m' created
unknown metabolite 'tetd7ecoa_m' created
unknown metabolite 'tetdecdicoa_m' created
unknown metabolite 'tetdecdicoa_x' created
unknown metabolite '5tedtricoa_m' created
unknown metabolite '5tedtricoa_x' created
unknown metabolite 'c14dccoa_x' created
unknown metabolite 'hexde7coa_x' created
unknown metabolite '3hdeccoa_m' created
unknown metabolite 'hexddcoa_m' created
unknown metabolite '2hexdtricoa_x' created
unknown metabolite 'hexdtrcoa_m' created
unknown metabolite '4hexdtricoa_m' created
...
unknown metabolite 'M00468_c' created
unknown metabolite 'M00465_c' created
unknown metabolite 'M00453_c' created
unknown metabolite 'M00480_c' created
unknown metabolite 'M00477_c' created
unknown metabolite 'M00446_c' created
unknown metabolite 'M00441_c' created
unknown metabolite 'M00501_c' created
unknown metabolite 'ttccoa_c' created
unknown metabolite 'M00500_c' created
unknown metabolite 'M00449_c' created
unknown metabolite 'M00482_c' created
unknown metabolite 'M00483_c' created
unknown metabolite 'M00492_c' created
unknown metabolite 'M00457_c' created
unknown metabolite 'M00464_c' created
unknown metabolite 'M00452_c' created
unknown metabolite 'M00459_c' created
unknown metabolite 'M00466_c' created
unknown metabolite 'M00455_c' created
unknown metabolite 'M00450_c' created
unknown metabolite 'M00439_c' created
unknown metabolite 'M00444_c' created
unknown metabolite 'M00436_c' created
unknown metabolite 'M00443_c' created
unknown metabolite 'M00479_c' created
...
unknown metabolite 'M00954_c' created
unknown metabolite 'M00966_c' created
unknown metabolite 'M00967_c' created
unknown metabolite 'M00964_c' created
unknown metabolite 'M00960_c' created
unknown metabolite 'M00956_c' created
unknown metabolite 'M01068_c' created
unknown metabolite 'cholcoar_c' created
unknown metabolite 'cholcoar_m' created
unknown metabolite 'cholcoas_m' created
unknown metabolite 'cholcoads_m' created
unknown metabolite 'HC01459_m' created
unknown metabolite 'cholcoaone_m' created
unknown metabolite 'M01094_m' created
unknown metabolite 'CE5133_m' created
unknown metabolite 'M00617_x' created
unknown metabolite 'M00754_x' created
unknown metabolite 'HC00958_c' created
unknown metabolite 'xol24oh_c' created
unknown metabolite 'M00978_c' created
unknown metabolite 'M00976_c' created
unknown metabolite 'M01081_c' created
unknown metabolite 'M01083_c' created
unknown metabolite 'M01077_c' created
unknown metabolite 'M01080_c' created
unknown metabolite 'M01077_m' created
...
unknown metabolite 'hdd2crn_r' created
unknown metabolite 'M01191_r' created
unknown metabolite 'hpdcacrn_r' created
unknown metabolite 'hpdcacoa_r' created
unknown metabolite 'M02102_r' created
unknown metabolite 'M00004_r' created
unknown metabolite 'M02103_r' created
unknown metabolite 'M01237_r' created
unknown metabolite 'stcrn_r' created
unknown metabolite 'odecrn_r' created
unknown metabolite 'M00020_r' created
unknown metabolite 'vacccrn_r' created
unknown metabolite 'vacccoa_r' created
unknown metabolite 'elaidcrn_r' created
unknown metabolite 'M00127_r' created
unknown metabolite 'M02638_r' created
unknown metabolite 'M00116_r' created
unknown metabolite 'lneldccrn_r' created
unknown metabolite 'lneldccoa_r' created
unknown metabolite 'M02611_r' created
unknown metabolite 'M02612_r' created
unknown metabolite 'arachcrn_r' created
unknown metabolite 'M01777_r' created
unknown metabolite 'M01775_r' created
unknown metabolite 'M01236_r' created
unknown metabolite 'M01776_r' created
...
unknown metabolite 'ttdcea_l' created
unknown metabolite 'M01479_l' created
unknown metabolite 'M00117_l' created
unknown metabolite 'M01470_l' created
unknown metabolite 'M02745_l' created
unknown metabolite 'M01506_l' created
unknown metabolite 'ptdca_l' created
unknown metabolite 'M01477_l' created
unknown metabolite 'M01197_l' created
unknown metabolite 'HC02021_l' created
unknown metabolite 'M01495_l' created
unknown metabolite 'hpdca_l' created
unknown metabolite 'M01454_l' created
unknown metabolite 'M00003_l' created
unknown metabolite 'M01485_l' created
unknown metabolite 'M01238_l' created
unknown metabolite 'M01464_l' created
unknown metabolite 'M00019_l' created
unknown metabolite 'M01486_l' created
unknown metabolite 'elaid_l' created
unknown metabolite 'M01478_l' created
unknown metabolite 'M00115_l' created
unknown metabolite 'M01474_l' created
unknown metabolite 'lneldc_l' created
unknown metabolite 'M01502_l' created
unknown metabolite 'M02613_l' created
...
unknown metabolite 'M01358_l' created
unknown metabolite 'M02757_l' created
unknown metabolite 'M01352_l' created
unknown metabolite 'M00245_c' created
unknown metabolite 'M00684_c' created
unknown metabolite 'M01995_c' created
unknown metabolite 'dhlam_c' created
unknown metabolite 'M00209_c' created
unknown metabolite 'HC01595_c' created
unknown metabolite 'M02397_c' created
unknown metabolite 'dad_5_c' created
unknown metabolite 'M00210_c' created
unknown metabolite 'M00208_c' created
unknown metabolite 'CE7145_m' created
unknown metabolite 'CE5850_m' created
unknown metabolite 'CE5851_m' created
unknown metabolite 'oh1_m' created
unknown metabolite 'CE5849_m' created
unknown metabolite 'CE5845_m' created
unknown metabolite 'CE1925_m' created
unknown metabolite 'M01938_c' created
unknown metabolite 'M01938_r' created
unknown metabolite 'CE5847_m' created
unknown metabolite 'CE5848_m' created
unknown metabolite 'CE5723_m' created
unknown metabolite 'CE1926_m' created
...
unknown metabolite 'M02446_c' created
unknown metabolite 'M02446_e' created
unknown metabolite 'M02447_c' created
unknown metabolite 'M02447_e' created
unknown metabolite 'M02449_c' created
unknown metabolite 'M02449_e' created
unknown metabolite 'M02451_c' created
unknown metabolite 'M02451_e' created
unknown metabolite 'M01966_c' created
unknown metabolite '1p2cbxl_c' created
unknown metabolite 'egme_c' created
unknown metabolite 'M00561_c' created
unknown metabolite 'M02723_c' created
unknown metabolite 'zn2_g' created
unknown metabolite 'zn2_r' created
unknown metabolite 'M01678_c' created
unknown metabolite 'M01677_c' created
unknown metabolite 'M02671_c' created
unknown metabolite 'M01437_c' created
unknown metabolite 'M01436_c' created
unknown metabolite 'M00136_c' created
unknown metabolite 'M00137_c' created
unknown metabolite 'M00193_c' created
unknown metabolite 'M00192_c' created
unknown metabolite 'M03104_c' created
unknown metabolite 'M02988_c' created
...
unknown metabolite 'rsv_r' created
unknown metabolite 'ndersv_r' created
unknown metabolite 'ndersv_c' created
unknown metabolite 'ndersv_e' created
unknown metabolite 'galgluside_hs_l' created
unknown metabolite 'acnam_e' created
unknown metabolite 'gd1a_hs_e' created
unknown metabolite 'gm1_hs_e' created
unknown metabolite 'gd1b_hs_e' created
unknown metabolite 'gt1b_hs_e' created
unknown metabolite 'gd3_hs_e' created
unknown metabolite 'gq1b_hs_e' created
unknown metabolite 'gd2_hs_e' created
unknown metabolite 'gd1a_hs_n' created
unknown metabolite 'gm1_hs_n' created
unknown metabolite 'gm1b_hs_e' created
unknown metabolite 'ga1_hs_e' created
unknown metabolite 'nfdac_r' created
unknown metabolite 'nfdoh_r' created
unknown metabolite 'octdececrn_c' created
unknown metabolite 'octdececrn_m' created
unknown metabolite 'octdececoa_c' created
unknown metabolite 'odecrn_e' created
unknown metabolite 'oleth_e' created
unknown metabolite 'oxyp_e' created
unknown metabolite 'paf_hs_e' created
...
unknown metabolite 'CE2883_c' created
unknown metabolite 'CE2887_c' created
unknown metabolite 'CE2884_c' created
unknown metabolite 'CE2888_c' created
unknown metabolite 'CE1950_c' created
unknown metabolite 'cynt_c' created
unknown metabolite 'CE1950_e' created
unknown metabolite 'cynt_e' created
unknown metabolite 'so3_e' created
unknown metabolite 'CE1950_l' created
unknown metabolite 'cynt_l' created
unknown metabolite 'so3_l' created
unknown metabolite 'CE1950_n' created
unknown metabolite 'cynt_n' created
unknown metabolite 'so3_n' created
unknown metabolite 'CE1352_g' created
unknown metabolite '17ahprgnlone_g' created
unknown metabolite 'CE1352_l' created
unknown metabolite '17ahprgnlone_l' created
unknown metabolite 'CE1352_r' created
unknown metabolite 'chsterols_g' created
unknown metabolite 'chsterols_l' created
unknown metabolite 'prgnlones_g' created
unknown metabolite 'prgnlone_g' created
unknown metabolite 'prgnlones_l' created
unknown metabolite 'prgnlone_l' created
...
unknown metabolite 'dolglcp_L_c' created
unknown metabolite 'dolglcp_U_c' created
unknown metabolite 'valarggly_e' created
unknown metabolite 'valarggly_c' created
unknown metabolite 'valhisasn_e' created
unknown metabolite 'valhisasn_c' created
unknown metabolite 'valleuphe_e' created
unknown metabolite 'valleuphe_c' created
unknown metabolite 'vallystyr_e' created
unknown metabolite 'vallystyr_c' created
unknown metabolite 'valphearg_e' created
unknown metabolite 'valphearg_c' created
unknown metabolite 'valprotrp_e' created
unknown metabolite 'valprotrp_c' created
unknown metabolite 'valserarg_e' created
unknown metabolite 'valserarg_c' created
unknown metabolite 'valtrpphe_e' created
unknown metabolite 'valtrpphe_c' created
unknown metabolite 'valtrpval_e' created
unknown metabolite 'valtrpval_c' created
unknown metabolite 'valval_e' created
unknown metabolite 'valval_c' created
unknown metabolite 'vanilpyr_c' created
unknown metabolite 'xol7ah3_e' created
...
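
For each formula, `build_reaction_from_string` splits the text at the arrow and at the `+` separators and strips stoichiometric coefficients. It then creates any metabolite the model has not seen yet, which is what the log above reports. The sketch below reproduces only the metabolite-extraction part in plain Python, for example to cross-check the metabolite count without cobra. It assumes the arrows used in the Rxns sheet are `-->`, `<--` and `<=>`, that terms are joined by ` + `, and that coefficients are written as a leading number. It is not cobra's actual parser, and `metabolites_in_formula` is a hypothetical helper.

```python
import re

# Arrows assumed to appear in the 'Reaction Formula' column
ARROW = re.compile(r'\s*(?:<=>|-->|<--)\s*')


def metabolites_in_formula(formula: str) -> list[str]:
    """Return metabolite IDs in order of appearance, e.g. for '2 h_c + atp_c --> adp_c'."""
    ids = []
    for side in ARROW.split(formula):
        for term in side.split(' + '):
            term = term.strip()
            if not term:
                continue  # e.g. an exchange reaction with an empty side
            parts = term.split()
            # Drop a leading stoichiometric coefficient such as '2' or '0.5'
            if len(parts) == 2 and re.fullmatch(r'\d+(\.\d+)?', parts[0]):
                parts = parts[1:]
            ids.append(parts[-1])
    return ids


formulas = [
    '10fthf5glu_c --> 10fthf5glu_l',
    'crn_m + odecrn_c <=> crn_c + odecrn_m',
    'hco3_c + so4_e --> hco3_e + so4_c',
]
# dict.fromkeys keeps first-seen order while removing duplicates
unique_ids = list(dict.fromkeys(m for f in formulas for m in metabolites_in_formula(f)))
print(len(unique_ids))  # 10
```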

In [8]:
# We first create a list of the metabolite IDs and then a pandas DataFrame with it
metabolites_list = [met.id for met in model.metabolites]

metabolites = pd.DataFrame(metabolites_list, columns=['BiGG ID'])
metabolites

| | BiGG ID |
|---|---|
| 0 | 10fthf5glu_c |
| 1 | 10fthf5glu_l |
| 2 | 10fthf5glu_m |
| 3 | 10fthf6glu_c |
| 4 | 10fthf6glu_l |
| ... | ... |
| 8661 | HC02098_c |
| 8662 | HC02099_c |
| 8663 | HC01988_c |
| 8664 | HC10856_m |
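
The IDs follow the BiGG convention of a compartment suffix after the last underscore (`_c`, `_m`, `_l`, ...). That makes it possible to split the `metabolites` table into a base ID and a compartment, which is handy when comparing against the `Compartment` column of the model tables. The sketch below assumes every ID ends in such a suffix. The column names `Base ID` and `Compartment` are illustrative, and the toy IDs are taken from the output above.

```python
import pandas as pd

metabolites = pd.DataFrame({'BiGG ID': ['10fthf5glu_c', '10fthf5glu_l', 'asp_D_x', 'HC10856_m']})

# Split on the last underscore only, so IDs such as 'asp_D_x' keep their inner underscore
metabolites[['Base ID', 'Compartment']] = metabolites['BiGG ID'].str.rsplit('_', n=1, expand=True)
print(metabolites.loc[2, 'Base ID'], metabolites.loc[2, 'Compartment'])  # asp_D x
```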


<a id='met'></a>
### 2. Retrieve information on all the metabolites in Recon3D, iCHO2291 and iCHO1766
We use two sources for this. First, we take the metabolite ID, name, formula and compartment from the SBML files of Recon3D (Recon3D.xml), iCHO2291 (iCHO2291.xml) and iCHO1766 (iCHOv1_final.xml). We then add the metadata available for these metabolites from the Recon3D supplementary files.

In [72]:
# read the Recon3D model
recon3d_model = read_sbml_model('../Data/GPR_Curation/Recon3D.xml')

In [73]:
# Generate a dataset with the ID, name, chemical formula and compartment of every metabolite in Recon3D
num_rows = len(recon3d_model.metabolites)
recon3d_model_metabolites = pd.DataFrame(index=range(num_rows), columns=['BiGG ID', 'Name', 'Formula', 'Compartment'])
for i,met in enumerate(recon3d_model.metabolites):
    id_ = met.id
    name = met.name
    formula = met.formula
    comp = met.compartment
    recon3d_model_metabolites.iloc[i] = [id_, name, formula, comp]

In [74]:
recon3d_model_metabolites

| | BiGG ID | Name | Formula | Compartment |
|---|---|---|---|---|
| 0 | 10fthf_c | 10-Formyltetrahydrofolate | C20H21N7O7 | c |
| 1 | 10fthf_l | 10-Formyltetrahydrofolate | C20H21N7O7 | l |
| 2 | 10fthf_m | 10-Formyltetrahydrofolate | C20H21N7O7 | m |
| 3 | 11docrtsl_c | 11docrtsl c | C21H30O4 | c |
| 4 | 11docrtsl_m | 11docrtsl c | C21H30O4 | m |
| ... | ... | ... | ... | ... |
| 5830 | caproic_e | Caproic Acid | C6H11O2 | e |
| 5831 | 1a25dhvitd2_c | 1-alpha,25-Dihydroxyvitamin D2 | C28H44O3 | c |
| 5832 | 1a25dhvitd2_e | 1-alpha,25-Dihydroxyvitamin D2 | C28H44O3 | e |
| 5833 | protein_c | Torasemide-M3 | | c |


In [75]:
# Read Yeo's model (iCHO2291)
iCHO2291_model = read_sbml_model('../Data/Reconciliation/models/iCHO2291.xml')

In [76]:
# Generate a dataset with the ID, name, chemical formula and compartment of every metabolite in Yeo's model
num_rows = len(iCHO2291_model.metabolites)
iCHO2291_model_metabolites = pd.DataFrame(index=range(num_rows), columns=['BiGG ID', 'Name', 'Formula', 'Compartment'])
for i,met in enumerate(iCHO2291_model.metabolites):
    id_ = met.id
    name = met.name
    formula = met.formula
    comp = met.compartment
    iCHO2291_model_metabolites.iloc[i] = [id_, name, formula, comp]
    
# Convert compartment brackets in the IDs to the BiGG suffix style, e.g. 10fthf5glu[c] -> 10fthf5glu_c
iCHO2291_model_metabolites['BiGG ID'] = iCHO2291_model_metabolites['BiGG ID'].str.replace("[", "_", regex=False)
iCHO2291_model_metabolites['BiGG ID'] = iCHO2291_model_metabolites['BiGG ID'].str.replace("]", "", regex=False)
iCHO2291_model_metabolites

Unnamed: 0,BiGG ID,Name,Formula,Compartment
0,10fthf5glu_c,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,c
1,10fthf5glu_l,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,l
2,10fthf5glu_m,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,m
3,10fthf6glu_c,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,c
4,10fthf6glu_l,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,l
...,...,...,...,...
3967,Rtotal3crn_c,Rtotal3crn[c],CO2R3C7H14NO2,c
3968,Rtotal3crn_m,Rtotal3crn[m],CO2R3C7H14NO2,m
3969,Rtotalcrn_c,Rtotalcrn[c],CO2RC7H14NO2,c
3970,Rtotalcrn_m,Rtotalcrn[m],CO2RC7H14NO2,m
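
The cell above turns every `[` into `_` and drops every `]`. The iCHO2291 IDs may only carry a trailing compartment tag (e.g. `10fthf5glu[c]`; this format is inferred from the replacements, not checked against the SBML file). If so, a slightly stricter sketch rewrites just that trailing tag and leaves any other bracket in the ID untouched. `normalize_bigg_ids` is a hypothetical helper:

```python
import pandas as pd


def normalize_bigg_ids(ids: pd.Series) -> pd.Series:
    """Rewrite a trailing compartment tag 'met[c]' as 'met_c'.

    Brackets elsewhere in the ID are left alone, and IDs already in the
    'met_c' style pass through unchanged.
    """
    return ids.str.replace(r'\[([A-Za-z0-9]+)\]$', r'_\1', regex=True)


ids = pd.Series(['10fthf5glu[c]', 'Rtotalcrn[m]', 'h2o_c', 'x[im]'])
normalized = normalize_bigg_ids(ids)
print(normalized.tolist())  # ['10fthf5glu_c', 'Rtotalcrn_m', 'h2o_c', 'x_im']
```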


In [77]:
# Read Hefzi's model (iCHO1766)
iCHO1766_model = read_sbml_model('../Data/Reconciliation/models/iCHOv1_final.xml')

In [78]:
# Build the same dataframe (BiGG ID, name, formula, compartment) for Hefzi's model
num_rows = len(iCHO1766_model.metabolites)
iCHO1766_model_metabolites = pd.DataFrame(index=range(num_rows), columns=['BiGG ID', 'Name', 'Formula', 'Compartment'])
for i,met in enumerate(iCHO1766_model.metabolites):
    id_ = met.id
    name = met.name
    formula = met.formula
    comp = met.compartment
    iCHO1766_model_metabolites.iloc[i] = [id_, name, formula, comp]

iCHO1766_model_metabolites

Unnamed: 0,BiGG ID,Name,Formula,Compartment
0,coke_r,cocaine,C17H21NO4,r
1,h2o_r,H2O,H2O,r
2,bz_r,benzoate,C7H5O2,r
3,egme_r,ecgonine methyl ester,C10H17NO3,r
4,h_r,proton,H,r
...,...,...,...,...
4451,igg_g,igg[g],,g
4452,nicrns_c,Nicotinate D-ribonucleoside,C11H13NO6,c
4453,bilglcur_r,Bilirubin monoglucuronide,C39H44N4O12,r
4454,pcollglys_c,Procollagen L-lysine,C7H14N3O2R2,c


In [79]:
# Stack the three tables (Recon3D first) and keep one row per BiGG ID:
# for each column, the first non-missing value wins
models_metabolites = pd.concat([recon3d_model_metabolites, iCHO2291_model_metabolites, iCHO1766_model_metabolites])
models_metabolites = models_metabolites.groupby('BiGG ID').first()
models_metabolites = models_metabolites.reset_index(drop=False)
models_metabolites

Unnamed: 0,BiGG ID,Name,Formula,Compartment
0,10fthf5glu_c,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,c
1,10fthf5glu_e,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,e
2,10fthf5glu_l,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,l
3,10fthf5glu_m,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,m
4,10fthf6glu_c,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,c
...,...,...,...,...
8527,zym_int2_r,Zymosterol intermediate 2 C27H42O,C27H42O,r
8528,zymst_c,Zymosterol C27H44O,C27H44O,c
8529,zymst_r,Zymosterol C27H44O,C27H44O,r
8530,zymstnl_c,5alpha-cholest-8-en-3beta-ol,C27H46O,c
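
Because the tables are stacked with Recon3D first, `groupby('BiGG ID').first()` takes values preferentially from Recon3D. However, it does so column by column, keeping the first non-missing value for each ID. A single row can therefore mix a name from one model with a formula from another, and an empty string counts as a value rather than as missing. A small sketch with made-up rows shows this column-wise behavior:

```python
import numpy as np
import pandas as pd

# Made-up rows: the first frame plays the role of Recon3D, the second of another model
first_source = pd.DataFrame({
    'BiGG ID': ['zymst_c'],
    'Name': ['Zymosterol C27H44O'],
    'Formula': [np.nan],          # missing in the first source
})
second_source = pd.DataFrame({
    'BiGG ID': ['zymst_c', 'h2o_r'],
    'Name': ['zymosterol', 'H2O'],
    'Formula': ['C27H44O', 'H2O'],
})

merged = pd.concat([first_source, second_source]).groupby('BiGG ID').first().reset_index()
print(merged)
# zymst_c keeps the first source's Name but takes its Formula from the second source
```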


In [54]:
# Load the Recon3D metabolite annotations (Supplementary Data) and convert BiGG IDs from 'met[c]' to 'met_c'
recon3d_metabolites_meta = pd.read_excel('../Data/Metabolites/metabolites.recon3d.xlsx', header = 0)
recon3d_metabolites_meta['BiGG ID'] = recon3d_metabolites_meta['BiGG ID'].str.replace("[", "_", regex=False)
recon3d_metabolites_meta['BiGG ID'] = recon3d_metabolites_meta['BiGG ID'].str.replace("]", "", regex=False)
recon3d_metabolites_meta

Unnamed: 0,BiGG ID,KEGG,CHEBI,PubChem,Inchi,Hepatonet,EHMNID,SMILES,INCHI2,CC_ID,Stereoisomer Information of Metabolite Identified,Charge of the Metabolite Identified,CID_ID,PDB (ligand-expo) Experimental Coordinates File Url,Pub Chem Url,ChEBI Url
0,10fthf5glu_c,,,,,,,,,,,,,,,
1,10fthf5glu_l,,,,,,,,,,,,,,,
2,10fthf5glu_m,,,,,,,,,,,,,,,
3,10fthf6glu_c,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
4,10fthf6glu_l,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5282,,,CID100015232,,,,,,,,,,,,,
5283,,,CID100123634,,,,,,,,,,,,,
5284,,,CID100206527,,,,,,,,,,,,,
5285,,,CID105479141,,,,,,,,,,,,,


In [55]:
# Turn "recon3d_metabolites_meta" into a dict keyed by BiGG ID so it can be mapped onto "models_metabolites"
recon3dmet_dict = df_to_dict(recon3d_metabolites_meta, 'BiGG ID')

In [56]:
# Map the Recon3D annotations onto the "models_metabolites" dataset
models_metabolites[['KEGG','CHEBI', 'PubChem','Inchi', 'Hepatonet', 'EHMNID', 'SMILES', 'INCHI2',
                          'CC_ID','Stereoisomer Information of Metabolite Identified', 'Charge of the Metabolite Identified',
    'CID_ID','PDB (ligand-expo) Experimental Coordinates  File Url', 'Pub Chem Url',
    'ChEBI Url']] = models_metabolites['BiGG ID'].apply(lambda x: pd.Series(recon3dmet_dict.get(x, None), dtype=object))
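
The dictionary round trip (`df_to_dict` followed by `apply(pd.Series)`) amounts to a left join on `BiGG ID`. Here is a sketch of the same mapping with `DataFrame.merge`, assuming the annotation table may contain repeated or missing IDs (as the empty `BiGG ID` rows at the end of `recon3d_metabolites_meta` suggest). The toy frames are made up. Keeping the last duplicate mirrors `df_to_dict`, where later rows overwrite earlier ones:

```python
import pandas as pd

# Made-up stand-ins for models_metabolites and recon3d_metabolites_meta
models = pd.DataFrame({'BiGG ID': ['zymst_c', 'zymst_r'],
                       'Name': ['Zymosterol', 'Zymosterol']})
meta = pd.DataFrame({'BiGG ID': ['zymst_r', 'zymst_r', None],
                     'KEGG': ['stale', 'C05437', None]})

# df_to_dict keeps the last row for a repeated key, so do the same here,
# and drop annotation rows that have no BiGG ID at all
meta_unique = (meta.dropna(subset=['BiGG ID'])
                   .drop_duplicates(subset='BiGG ID', keep='last'))

# Left join: every metabolite is kept; unmatched ones get NaN annotations
annotated = models.merge(meta_unique, on='BiGG ID', how='left', validate='many_to_one')
print(annotated)
```

With `merge`, the annotation columns are taken by name from the annotation table, so the long list of target column names does not need to be typed out.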

In [57]:
models_metabolites

Unnamed: 0,BiGG ID,Name,Formula,Compartment,KEGG,CHEBI,PubChem,Inchi,Hepatonet,EHMNID,SMILES,INCHI2,CC_ID,Stereoisomer Information of Metabolite Identified,Charge of the Metabolite Identified,CID_ID,PDB (ligand-expo) Experimental Coordinates File Url,Pub Chem Url,ChEBI Url
0,10fthf5glu_c,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,c,,,,,,,,,,,,,,,
1,10fthf5glu_e,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,e,,,,,,,,,,,,,,,
2,10fthf5glu_l,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,l,,,,,,,,,,,,,,,
3,10fthf5glu_m,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,m,,,,,,,,,,,,,,,
4,10fthf6glu_c,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,c,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8527,zym_int2_r,Zymosterol intermediate 2 C27H42O,C27H42O,r,C05437,18252.0,22298942.0,InChI=1S/C27H42O/c1-18(2)7-6-8-19(3)23-11-12-2...,,,[H][C@@]12CCC3=C(CC[C@]4(C)[C@]([H])(CC[C@@]34...,InChI=1S/C27H42O/c1-18(2)7-6-8-19(3)23-11-12-2...,,,Neutral,22298942.0,,https://pubchem.ncbi.nlm.nih.gov/compound/2229...,https://www.ebi.ac.uk/chebi/searchId.do?chebiI...
8528,zymst_c,Zymosterol C27H44O,C27H44O,c,,,,,,,,,,,,,,,
8529,zymst_r,Zymosterol C27H44O,C27H44O,r,C05437,18252.0,92746.0,InChI=1S/C27H44O/c1-18(2)7-6-8-19(3)23-11-12-2...,HC01451,C05437,[H][C@@]12CCC3=C(CC[C@]4(C)[C@]([H])(CC[C@@]34...,InChI=1S/C27H44O/c1-18(2)7-6-8-19(3)23-11-12-2...,,,Neutral,92746.0,,https://pubchem.ncbi.nlm.nih.gov/compound/92746,https://www.ebi.ac.uk/chebi/searchId.do?chebiI...
8530,zymstnl_c,5alpha-cholest-8-en-3beta-ol,C27H46O,c,,,,,,,,,,,,,,,


In [58]:
# Turn the combined metabolites dataset (model data plus Recon3D annotations) into a dictionary to map it onto our dataset
final_met_dict = df_to_dict(models_metabolites, 'BiGG ID')

<a id='combine'></a>
### 3. Add all the metabolites information into our metabolites dataset
We use the dictionary created in **Step 2** to map this information onto the metabolites dataset created in **Step 1**, which contains all the metabolites of our reconstruction.

In [59]:
metabolites[['Name', 'Formula', 'Compartment', 'KEGG','CHEBI', 'PubChem','Inchi', 'Hepatonet', 'EHMNID', 'SMILES',
             'INCHI2','CC_ID','Stereoisomer Information of Metabolite Identified', 'Charge of the Metabolite Identified',
    'CID_ID','PDB (ligand-expo) Experimental Coordinates  File Url', 'Pub Chem Url',
    'ChEBI Url']] = metabolites['BiGG ID'].apply(lambda x: pd.Series(final_met_dict.get(x, None), dtype=object))

In [60]:
# Replace the compartment codes in the final dataset with descriptive labels
compartment_labels = {
    'c': 'c - cytosol',
    'l': 'l - lysosome',
    'm': 'm - mitochondria',
    'r': 'r - endoplasmic reticulum',
    'e': 'e - extracellular space',
    'x': 'x - peroxisome/glyoxysome',
    'n': 'n - nucleus',
    'g': 'g - golgi apparatus',
    'im': 'im - intermembrane space of mitochondria',
}
# Values not listed in the dictionary are left unchanged
metabolites['Compartment'] = metabolites['Compartment'].replace(compartment_labels)

In [None]:
metabolites.to_excel('../Data/Metabolites/metabolites.xlsx')

### 4. Unique metabolite identification

In [4]:
# give service account details to gspread
sa = gspread.service_account(filename='credentials.json')

# sa is a gspread client, which can be used for connecting to the sheets
# by using the open method and the sheet name.
cho_recon = sa.open('CHO Network Reconstruction + Recon3D')

# We also need to select the worksheet before getting the data. In this case we use the Metabolites sheet.
metabolites_sheet = cho_recon.worksheet('Metabolites')

# We can extract the data using the get_all_records method and create a pd DataFrame
metabolites = pd.DataFrame(metabolites_sheet.get_all_records())
metabolites

Unnamed: 0,Curated,BiGG ID,Name,Formula,Compartment,KEGG,CHEBI,PubChem,Inchi,Hepatonet,EHMNID,SMILES,INCHI2,CC_ID,Stereoisomer Information of Metabolite Identified,Charge of the Metabolite Identified,CID_ID,PDB (ligand-expo) Experimental Coordinates File Url,Pub Chem Url,ChEBI Url
0,,10fthf5glu_c,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,c - cytosol,,,,,,,,,,,,,,,
1,,10fthf5glu_l,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,l - lysosome,,,,,,,,,,,,,,,
2,,10fthf5glu_m,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,m - mitochondria,,,,,,,,,,,,,,,
3,,10fthf6glu_c,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,c - cytosol,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
4,,10fthf6glu_l,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,l - lysosome,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8661,,HC02098_c,3-Hydroxystearoyl-ACP,C18H35O2SR,c - cytosol,C16220,,,,,,CCCCCCCCCCCCCCCC(O)CC(=O)S[*],,,,Neutral,,,,https://www.ebi.ac.uk/chebi/searchId.do;jsessi...
8662,,HC02099_c,(2E)-Octadecenoyl-ACP,C18H33OSR,c - cytosol,C16221,,,,,,CCCCCCCCCCCCCCC\C=C\C(=O)S[*],,,,Neutral,,,,http://www.ebi.ac.uk/chebi/searchId.do?chebiId...
8663,,HC01988_c,Stearoyl-ACP,X,c - cytosol,C04088,,,,,,,,,,,,,,
8664,,HC10856_m,"Trans,Cis-Octadeca-2,9-Dienoyl Coenzyme A",C39H62N7O17P3S,m - mitochondria,,,,,,,CCCCCCCC\C=C/CCCCC\C=C\C(=O)SCCNC(=O)CCNC(=O)[...,InChI=1S/C39H66N7O17P3S/c1-4-5-6-7-8-9-10-11-1...,,,Neutral,86289273,,https://pubchem.ncbi.nlm.nih.gov/compound/8628...,https://www.ebi.ac.uk/chebi/searchId.do;jsessi...


In [5]:
# Rows that repeat a value already present in each column
# (empty cells count as one shared value, which inflates the count for sparsely filled columns such as KEGG)
print("Duplicated metabolites by BiGG ID = ", len(metabolites['BiGG ID']) - len(metabolites['BiGG ID'].unique()))
print("Duplicated metabolites by Name = ", len(metabolites['Name']) - len(metabolites['Name'].unique()))
print("Duplicated metabolites by Formula = ", len(metabolites['Formula']) - len(metabolites['Formula'].unique()))
print("Duplicated metabolites by KEGG = ", len(metabolites['KEGG']) - len(metabolites['KEGG'].unique()))

Duplicated metabolites by BiGG ID =  0
Duplicated metabolites by Name =  3752
Duplicated metabolites by Formula =  5837
Duplicated metabolites by KEGG =  7536


In [6]:
# Group the metabolites by formula, so that rows sharing a formula sit next to each other, and export them to Excel for manual curation.
grouped_by_formula = metabolites.groupby('Formula', group_keys=True).apply(lambda x:x)
grouped_by_formula.to_excel('../Data/Metabolites/grouped_by_formula_recon3d.xlsx')
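
The counts above treat empty cells as one shared value, and the grouped export still contains formulas that occur only once. The sketch below keeps only rows whose non-empty value actually repeats. It assumes blank cells arrive as empty strings, which is what `get_all_records` returns here. `duplicated_rows` and the toy rows are hypothetical:

```python
import pandas as pd

# Made-up rows in the shape of the Metabolites sheet
mets = pd.DataFrame({
    'BiGG ID': ['zymst_c', 'zymst_r', 'h2o_c', 'h2o_r', 'x_c'],
    'Formula': ['C27H44O', 'C27H44O', 'H2O', 'H2O', ''],
    'KEGG': ['C05437', 'C05437', '', '', ''],
})


def duplicated_rows(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Rows whose non-empty value in `col` appears more than once, sorted by that value."""
    # Empty strings and real missing values are both treated as "no value"
    filled = df[df[col].notna() & (df[col] != '')]
    # keep=False flags every member of a duplicated group, not just the repeats
    mask = filled[col].duplicated(keep=False)
    return filled[mask].sort_values(col, kind='stable')


dup_kegg = duplicated_rows(mets, 'KEGG')
print(dup_kegg)
```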

## B. Metabolites Curation
In this second part of the notebook we curate missing information in the metabolites dataset generated above. Since many metabolites have been curated manually in the "Metabolites" Google Sheet, we use the gspread library to load a new dataframe that includes all of those changes.

### 5. Generate a new dataframe from the "Metabolites" sheet

In [6]:
# give service account details to gspread
sa = gspread.service_account(filename='credentials.json')

# sa is a gspread client, which can be used for connecting to the sheets
# by using the open method and the sheet name.
cho_recon = sa.open('CHO Network Reconstruction')

# We also need to select the worksheet before getting the data. In this case we use the Metabolites sheet.
metabolites_sheet = cho_recon.worksheet('Metabolites')

# We can extract the data using the get_all_records method and create a pd DataFrame
metabolites = pd.DataFrame(metabolites_sheet.get_all_records())
metabolites

Unnamed: 0,Curated,BiGG ID,Name,Formula,Compartment,KEGG,CHEBI,PubChem,Inchi,Hepatonet,EHMNID,SMILES,INCHI2,CC_ID,Stereoisomer Information of Metabolite Identified,Charge of the Metabolite Identified,CID_ID,PDB (ligand-expo) Experimental Coordinates File Url,Pub Chem Url,ChEBI Url
0,,10fthf5glu_c,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,c - cytosol,,,,,,,,,,,,,,,
1,,10fthf5glu_l,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,l - lysosome,,,,,,,,,,,,,,,
2,,10fthf5glu_m,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,m - mitochondria,,,,,,,,,,,,,,,
3,,10fthf6glu_c,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,c - cytosol,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
4,,10fthf6glu_l,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,l - lysosome,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5426,,HC02216_c,Prostaglandin-f2beta,C20H33O5,c - cytosol,,28922,439702,InChI=1S/C20H34O5/c1-2-3-6-9-15(21)12-13-17-16...,,,CCCCC[C@H](O)\C=C\[C@H]1[C@H](O)C[C@@H](O)[C@@...,InChI=1S/C20H34O5/c1-2-3-6-9-15(21)12-13-17-16...,,,Neutral,5280506,,https://pubchem.ncbi.nlm.nih.gov/compound/5280506,https://www.ebi.ac.uk/chebi/searchId.do;jsessi...
5427,,HC01361_e,Dihydroneopterin,C9H13N5O4,e - extracellular space,C04874,17001,,InChI=1S/C9H13N5O4/c10-9-13-7-5(8(18)14-9)12-3...,,,Nc1nc2c(c(=O)[nH]1)N=C(C(O)C(O)CO)CN2,,,,,,,,
5428,,HC02217_c,Prostaglandin-g2,C20H31O6,c - cytosol,,27647,,InChI=1S/C20H32O6/c1-2-3-6-9-15(24-23)12-13-17...,,C05956,CCCCC[C@H](OO)\C=C\[C@H]1[C@H]2C[C@H](OO2)[C@@...,InChI=1S/C20H32O6/c1-2-3-6-9-15(24-23)12-13-17...,PGX,,Neutral,5280883,http://ligand-expo.rcsb.org/pyapps/ldHandler.p...,https://pubchem.ncbi.nlm.nih.gov/compound/5280883,https://www.ebi.ac.uk/chebi/searchId.do?chebiI...
5429,PD,fdxo_2_2_c,Oxidized ferredoxin,Fe2S2X,c - cytosol,C00139,17908,,,,,[*][*][Fe+3]1([*][*])[S-2][Fe+3]([*][*])([*][*...,,,,,,,,


In [66]:
# Get descriptive names (and formulae) for unnamed metabolites from the BiGG website
import time

import requests
from bs4 import BeautifulSoup

# Unknown mets: metabolites without a name (160 in this run)
unkown_mets = metabolites[metabolites['Name'] == ''].copy()

Descriptive_Names = [''] * len(unkown_mets)
Formulae = [''] * len(unkown_mets)
Changed = [True] * len(unkown_mets)

for Met_Counter, metID in enumerate(tqdm(unkown_mets['BiGG ID'])):
    print(Met_Counter)
    # Strip the compartment suffix ('_c', '_im', ...) to get the universal BiGG ID
    input_str = metID.rsplit('_', 1)[0]
    response = requests.get(f"http://bigg.ucsd.edu/universal/metabolites/{input_str}")
    time.sleep(1)  # pause between requests to avoid overloading the BiGG server
    # Check if the request was successful
    if response.status_code != 200:
        D_Name = "BiGG ID not found in BiGG"
        Formulae_B = "BiGG ID not found in BiGG"
        Changed[Met_Counter] = False
    else:
        soup = BeautifulSoup(response.content, 'html.parser')
        # Each field is an <h4> header followed by a <p> holding the value;
        # fall back to a placeholder when a header is missing from the page
        N_Header = soup.find('h4', string='Descriptive name:')
        N_Formulae = soup.find('h4', string='Formulae in BiGG models: ')
        D_Name = N_Header.find_next_sibling('p').text if N_Header else "Name not found in BiGG"
        Formulae_B = N_Formulae.find_next_sibling('p').text if N_Formulae else "Formula not found in BiGG"
    Descriptive_Names[Met_Counter] = D_Name
    Formulae[Met_Counter] = Formulae_B
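
The scraper above depends on the layout of BiGG's HTML pages. BiGG also offers a JSON web API. The sketch below assumes the endpoint `http://bigg.ucsd.edu/api/v2/universal/metabolites/<id>` and a response with `name` and `formulae` fields; both should be checked against the BiGG API documentation before use. The helper names are hypothetical, and only the standard library is used (`urllib` instead of `requests`):

```python
import json
import urllib.request

# Assumed BiGG JSON endpoint for universal (compartment-free) metabolites
BIGG_API = 'http://bigg.ucsd.edu/api/v2/universal/metabolites/'


def universal_id(bigg_id: str) -> str:
    """Drop the compartment suffix: '3hoc101_7Zcoa_m' -> '3hoc101_7Zcoa', 'x_im' -> 'x'."""
    return bigg_id.rsplit('_', 1)[0]


def parse_bigg_metabolite(payload: dict) -> tuple:
    """Pull (name, formula) out of a universal-metabolite JSON payload.

    Assumes a 'name' string and a 'formulae' list; missing fields fall back
    to the placeholder strings used in the notebook.
    """
    name = payload.get('name') or 'Name not found in BiGG'
    formulae = payload.get('formulae') or []
    formula = formulae[0] if formulae else 'Formula not found in BiGG'
    return name, formula


def fetch_bigg_metabolite(bigg_id: str, timeout: float = 10.0) -> dict:
    """Download the JSON record for one metabolite (requires network access)."""
    url = BIGG_API + universal_id(bigg_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

`urlopen` raises `urllib.error.HTTPError` for error responses, so a loop over `unkown_mets['BiGG ID']` would catch that exception to record the notebook's "BiGG ID not found in BiGG" placeholder, and would keep the one-second pause between requests.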



### Update empty metabolites

In [67]:
# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
unkown_mets = unkown_mets.copy()
for Met_Counter, idx in enumerate(unkown_mets.index):
    print('before', unkown_mets.at[idx, 'BiGG ID'])
    print('before', unkown_mets.at[idx, 'Formula'])
    print('before', unkown_mets.at[idx, 'Name'])
    # Only fill the formula when it is missing; always take the name retrieved from BiGG
    if unkown_mets.at[idx, 'Formula'] == '':
        unkown_mets.at[idx, 'Formula'] = Formulae[Met_Counter]
    unkown_mets.at[idx, 'Name'] = Descriptive_Names[Met_Counter]
    print('..............................................')
    print('after', unkown_mets.at[idx, 'BiGG ID'])
    print('after', unkown_mets.at[idx, 'Formula'])
    print('after', unkown_mets.at[idx, 'Name'])
    print('..............................................')
    print('..............................................')
    print('..............................................')

before 3hoc101_7Zcoa_m
before C31H48N7O18P3S
before 
..............................................
after 3hoc101_7Zcoa_m
after C31H48N7O18P3S
after BiGG ID not found in BiGG
..............................................
..............................................
..............................................
before 3hoc101_7Zcoa_x
before C31H48N7O18P3S
before 
..............................................
after 3hoc101_7Zcoa_x
after C31H48N7O18P3S
after BiGG ID not found in BiGG
..............................................
..............................................
..............................................
before 3hoc121_6Ecoa_m
before C33H52N7O18P3S
before 
..............................................
after 3hoc121_6Ecoa_m
after C33H52N7O18P3S
after BiGG ID not found in BiGG
..............................................
..............................................
..............................................
before 3hoc122_6Z_9Zcoa_m
before C33H50N7O18P3S
befo


In [68]:
metabolites.update(unkown_mets)

# Manual Curation
for bigg_id in metabolites['BiGG ID']:
    # xtra = Xanthurenic acid; C10H6NO4
    # http://bigg.ucsd.edu/models/iCHOv1/reactions/r0647
    if 'xtra' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'Xanthurenic acid'
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Formula'] = 'C10H6NO4'
    # chedxch = Bilirubin-monoglucuronoside; C39H42N4O12, charge 2-
    # Reaction name = 'ATP-binding Cassette (ABC) TCDB:3.A.1.208.2' --> https://metabolicatlas.org/identifier/TCDB/3.A.1.208.2
    elif 'chedxch' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'Bilirubin-monoglucuronoside'
        # The charge (2-) is not part of the chemical formula
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Formula'] = 'C39H42N4O12'
    # Name suggested by ChatGPT
    elif '3hoc246_6Z_9Z_12Z_15Z_18Z_21Zcoa' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a 24-carbon fatty acid with six double bonds, with the location of the double bonds specified by the numbers and Zs'
    # Name suggested by ChatGPT
    elif 'c247_2Z_6Z_9Z_12Z_15Z_18Z_21Zcoa' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a modified version of the same 24-carbon fatty acid, with a hydroxyl group added at the third carbon position'
    # Name suggested by ChatGPT
    elif '3hoc143_5Z_8Z_11Zcoa' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a 14-carbon fatty acid with three double bonds, with the location of the double bonds specified by the numbers and Zs.'
    # Name suggested by ChatGPT
    elif '3oc143_5Z_8Z_11Zcoa' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a modified version of the same 14-carbon fatty acid, with the hydroxyl group removed and one of the double bonds converted to a keto group'
    # Name suggested by ChatGPT
    elif 'acgalgalacglcgalgluside' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'Complex glycosphingolipid that contains multiple sugar residues'

    # Still to be curated:
    # 12e8hdx: identity unclear
    # hdxur: dead-end metabolite

metabolites.to_excel('../Data/Metabolites/metabolites_final.xlsx')
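
The curation loop above matches IDs by substring (`'xtra' in bigg_id`), so any ID that merely contains `xtra` would be overwritten as well. Below is a sketch of a table-driven alternative. It matches the universal part of the ID exactly (everything before the last `_`) and updates every compartment copy at once. `MANUAL_CURATION`, `apply_overrides` and the toy rows, including the hypothetical `extraX_c`, are illustrative:

```python
import pandas as pd

# Hypothetical override table: universal BiGG ID -> fields to set
MANUAL_CURATION = {
    'xtra': {'Name': 'Xanthurenic acid', 'Formula': 'C10H6NO4'},
    'chedxch': {'Name': 'Bilirubin-monoglucuronoside'},
}


def apply_overrides(df: pd.DataFrame, overrides: dict) -> pd.DataFrame:
    """Return a copy of `df` with the override fields set for every compartment copy of each ID."""
    out = df.copy()
    # Universal ID = everything before the last '_' (the compartment suffix)
    base_ids = out['BiGG ID'].str.rsplit('_', n=1).str[0]
    for base_id, fields in overrides.items():
        mask = base_ids == base_id          # exact match, not substring
        for column, value in fields.items():
            out.loc[mask, column] = value
    return out


mets = pd.DataFrame({
    'BiGG ID': ['xtra_c', 'xtra_m', 'extraX_c', 'chedxch_c'],
    'Name': ['', '', 'keep me', ''],
    'Formula': ['', '', 'C1', ''],
})
curated = apply_overrides(mets, MANUAL_CURATION)
print(curated)
```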



## C. Identification of duplicated metabolites
The goal here is to add the metabolites of the reactions imported from Recon3D without overwriting the data already in our own Metabolites dataset.

In [29]:
# give service account details to gspread
sa = gspread.service_account(filename='credentials.json')

# sa is a gspread client, which can be used for connecting to the sheets
# by using the open method and the sheet name.
cho_recon = sa.open('CHO Network Reconstruction + Recon3D')

# We also need to select the worksheet before getting the data. In this case we use the Metabolites sheet.
met_sheet = cho_recon.worksheet('Metabolites')

In [30]:
# We can extract the data using the get_all_records method and create a pd DataFrame
met = pd.DataFrame(met_sheet.get_all_records())
met

Unnamed: 0,Curated,BiGG ID,Name,Formula,Compartment,KEGG,CHEBI,PubChem,Inchi,Hepatonet,EHMNID,SMILES,INCHI2,CC_ID,Stereoisomer Information of Metabolite Identified,Charge of the Metabolite Identified,CID_ID,PDB (ligand-expo) Experimental Coordinates File Url,Pub Chem Url,ChEBI Url
0,,10fthf5glu_c,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,c - cytosol,,,,,,,,,,,,,,,
1,,10fthf5glu_l,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,l - lysosome,,,,,,,,,,,,,,,
2,,10fthf5glu_m,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,m - mitochondria,,,,,,,,,,,,,,,
3,,10fthf6glu_c,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,c - cytosol,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
4,,10fthf6glu_l,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,l - lysosome,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8661,,HC02098_c,3-Hydroxystearoyl-ACP,C18H35O2SR,c - cytosol,C16220,,,,,,CCCCCCCCCCCCCCCC(O)CC(=O)S[*],,,,Neutral,,,,https://www.ebi.ac.uk/chebi/searchId.do;jsessi...
8662,,HC02099_c,(2E)-Octadecenoyl-ACP,C18H33OSR,c - cytosol,C16221,,,,,,CCCCCCCCCCCCCCC\C=C\C(=O)S[*],,,,Neutral,,,,http://www.ebi.ac.uk/chebi/searchId.do?chebiId...
8663,,HC01988_c,Stearoyl-ACP,X,c - cytosol,C04088,,,,,,,,,,,,,,
8664,,HC10856_m,"Trans,Cis-Octadeca-2,9-Dienoyl Coenzyme A",C39H62N7O17P3S,m - mitochondria,,,,,,,CCCCCCCC\C=C/CCCCC\C=C\C(=O)SCCNC(=O)CCNC(=O)[...,InChI=1S/C39H66N7O17P3S/c1-4-5-6-7-8-9-10-11-1...,,,Neutral,86289273,,https://pubchem.ncbi.nlm.nih.gov/compound/8628...,https://www.ebi.ac.uk/chebi/searchId.do;jsessi...
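
One way to add Recon3D-derived rows without overwriting what is already in the sheet is `DataFrame.combine_first` on frames indexed by BiGG ID. Values from the calling frame are kept, and its missing cells and missing rows are filled from the other frame. Because `get_all_records` returns empty strings rather than missing values for blank cells, these have to be converted to `NaN` first. A sketch with made-up rows, not the notebook's own method:

```python
import numpy as np
import pandas as pd

# Made-up rows: `ours` stands for the curated sheet, `recon` for Recon3D-derived data
ours = pd.DataFrame({
    'BiGG ID': ['zymst_c', 'h2o_c'],
    'Name': ['Zymosterol', 'H2O'],
    'KEGG': ['', 'C00001'],
}).set_index('BiGG ID')
recon = pd.DataFrame({
    'BiGG ID': ['zymst_c', 'caproic_e'],
    'Name': ['Zymosterol C27H44O', 'Caproic Acid'],
    'KEGG': ['C05437', ''],
}).set_index('BiGG ID')

# Blank cells come back as '' from the sheet; combine_first only fills real missing
# values, so convert '' to NaN before combining
combined = ours.replace('', np.nan).combine_first(recon.replace('', np.nan))
print(combined)
```

Rows that exist only in the Recon3D frame are added, and existing values in our sheet are never replaced; only our empty cells are filled. Whether blank curated cells should be filled at all remains a curation decision.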


In [31]:
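# Lower-case the names and drop the compartment suffix ('_c', '_im', ...) from the BiGG IDs,
# so that the same metabolite in different compartments shares one ID
met['Name'] = met['Name'].str.lower()
met['BiGG ID'] = met['BiGG ID'].str.rsplit('_', n=1).str[0]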
met

Unnamed: 0,Curated,BiGG ID,Name,Formula,Compartment,KEGG,CHEBI,PubChem,Inchi,Hepatonet,EHMNID,SMILES,INCHI2,CC_ID,Stereoisomer Information of Metabolite Identified,Charge of the Metabolite Identified,CID_ID,PDB (ligand-expo) Experimental Coordinates File Url,Pub Chem Url,ChEBI Url
0,,10fthf5glu,10-formyltetrahydrofolate-[glu](5),C40H45N11O19,c - cytosol,,,,,,,,,,,,,,,
1,,10fthf5glu,10-formyltetrahydrofolate-[glu](5),C40H45N11O19,l - lysosome,,,,,,,,,,,,,,,
2,,10fthf5glu,10-formyltetrahydrofolate-[glu](5),C40H45N11O19,m - mitochondria,,,,,,,,,,,,,,,
3,,10fthf6glu,10-formyltetrahydrofolate-[glu](6),C45H51N12O22,c - cytosol,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
4,,10fthf6glu,10-formyltetrahydrofolate-[glu](6),C45H51N12O22,l - lysosome,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8661,,HC02098,3-hydroxystearoyl-acp,C18H35O2SR,c - cytosol,C16220,,,,,,CCCCCCCCCCCCCCCC(O)CC(=O)S[*],,,,Neutral,,,,https://www.ebi.ac.uk/chebi/searchId.do;jsessi...
8662,,HC02099,(2e)-octadecenoyl-acp,C18H33OSR,c - cytosol,C16221,,,,,,CCCCCCCCCCCCCCC\C=C\C(=O)S[*],,,,Neutral,,,,http://www.ebi.ac.uk/chebi/searchId.do?chebiId...
8663,,HC01988,stearoyl-acp,X,c - cytosol,C04088,,,,,,,,,,,,,,
8664,,HC10856,"trans,cis-octadeca-2,9-dienoyl coenzyme a",C39H62N7O17P3S,m - mitochondria,,,,,,,CCCCCCCC\C=C/CCCCC\C=C\C(=O)SCCNC(=O)CCNC(=O)[...,InChI=1S/C39H66N7O17P3S/c1-4-5-6-7-8-9-10-11-1...,,,Neutral,86289273,,https://pubchem.ncbi.nlm.nih.gov/compound/8628...,https://www.ebi.ac.uk/chebi/searchId.do;jsessi...


In [94]:
# Generate a list of duplicated metabolites: (Name, Formula) pairs shared by more than one BiGG ID
grouped = met.groupby(['Name', 'Formula'])

# Initialize an empty list to store the results
duplicated_metabolites = []

# Iterate over the grouped DataFrame
for (Name, Formula), group in grouped:
    # Keep groups with more than one distinct BiGG ID, skipping metabolites whose names are unknown
    if group['BiGG ID'].nunique() > 1 and Name != 'bigg id not found in bigg':
        unique_ids = group['BiGG ID'].unique()
        duplicated_metabolites.append((Name, Formula, unique_ids))

In [99]:
len(duplicated_metabolites)

135
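
The loop above collects, for each (Name, Formula) pair, the distinct BiGG IDs that share it. Below is a vectorized sketch of the same check using `groupby(...).transform('nunique')`. It returns one row per duplicated ID (a long format that exports cleanly to Excel). `find_duplicates` is hypothetical, and the toy rows only borrow IDs from the output above:

```python
import pandas as pd

# Made-up rows in the shape of `met` after lower-casing names and stripping compartments
met_toy = pd.DataFrame({
    'BiGG ID': ['Spristcoa', 'CE5125', 'Spristcoa', 'h2o', 'unk1', 'unk2'],
    'Name': ['(2s)-pristanoyl coenzyme a'] * 3
            + ['h2o', 'bigg id not found in bigg', 'bigg id not found in bigg'],
    'Formula': ['C40H68N7O17P3S'] * 3 + ['H2O', 'X', 'X'],
})


def find_duplicates(df: pd.DataFrame, placeholder: str = 'bigg id not found in bigg') -> pd.DataFrame:
    """Rows whose (Name, Formula) pair is shared by more than one distinct BiGG ID."""
    known = df[df['Name'] != placeholder]
    # Number of distinct BiGG IDs in each (Name, Formula) group, broadcast back to every row
    n_ids = known.groupby(['Name', 'Formula'])['BiGG ID'].transform('nunique')
    dup_rows = known[n_ids > 1]
    # One row per distinct ID, sorted so that duplicates of the same metabolite sit together
    return (dup_rows.drop_duplicates(subset=['Name', 'Formula', 'BiGG ID'])
                    .sort_values(['Name', 'Formula', 'BiGG ID'])
                    .reset_index(drop=True))


dups = find_duplicates(met_toy)
print(dups)
```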

In [102]:
duplicated_metabolites

[('(2s)-pristanoyl coenzyme a',
  'C40H68N7O17P3S',
  array(['Spristcoa', 'CE5125'], dtype=object)),
 ('(3s)-3-hydroxy-cis,cis-palmito-7,10-dienoyl-coa',
  'C37H58N7O18P3S',
  array(['3hoc162_7Z_10Zcoa', 'CE2418'], dtype=object)),
 ('(3s)-3-hydroxydodec-cis-6-enoyl-coa',
  'C33H52N7O18P3S',
  array(['3hoc121_6Zcoa', 'CE2420'], dtype=object)),
 ('(3s)-3-hydroxylinoleoyl-coa',
  'C39H62N7O18P3S',
  array(['3hoc182_9Z_12Zcoa', 'CE2421'], dtype=object)),
 ('(alpha-d-mannosyl)5-beta-d-mannosyl-diacetylchitobiosyl-l-asparagine (protein)',
  'C52H87N2O40X',
  array(['m5masnB1', 'm5masnC', 'm5masnB2'], dtype=object)),
 ('(alpha-d-mannosyl)6-beta-d-mannosyl-diacetylchitobiosyl-l-asparagine (protein)',
  'C58H97N2O45X',
  array(['m6masnC', 'm6masnB2', 'm6masnB1', 'm6masnA'], dtype=object)),
 ('(gal)4 (glc)1 (glcnac)3 (lfuc)2 (cer)1',
  'C84H145N4O50RCO',
  array(['fucfucgalacglcgalacglcgal14acglcgalgluside_cho',
         'fucfucgalacglc13galacglcgal14acglcgalgluside_cho'], dtype=object)),
 ('(ga

In [6]:
met.to_dict(orient='records')

[{'Curated': '',
  'BiGG ID': '10fthf5glu_c',
  'Name': '10-formyltetrahydrofolate-[glu](5)',
  'Formula': 'C40H45N11O19',
  'Compartment': 'c - cytosol',
  'KEGG': '',
  'CHEBI': '',
  'PubChem': '',
  'Inchi': '',
  'Hepatonet': '',
  'EHMNID': '',
  'SMILES': '',
  'INCHI2': '',
  'CC_ID': '',
  'Stereoisomer Information of Metabolite Identified': '',
  'Charge of the Metabolite Identified': '',
  'CID_ID': '',
  'PDB (ligand-expo) Experimental Coordinates  File Url': '',
  'Pub Chem Url': '',
  'ChEBI Url': ''},
 {'Curated': '',
  'BiGG ID': '10fthf5glu_l',
  'Name': '10-formyltetrahydrofolate-[glu](5)',
  'Formula': 'C40H45N11O19',
  'Compartment': 'l - lysosome',
  'KEGG': '',
  'CHEBI': '',
  'PubChem': '',
  'Inchi': '',
  'Hepatonet': '',
  'EHMNID': '',
  'SMILES': '',
  'INCHI2': '',
  'CC_ID': '',
  'Stereoisomer Information of Metabolite Identified': '',
  'Charge of the Metabolite Identified': '',
  'CID_ID': '',
  'PDB (ligand-expo) Experimental Coordinates  File Url': '