# Metabolites

## Description
This notebook describes every step involved in creating the "Metabolites" sheet in the "CHO Network Reconstruction" Google Sheets file. <br>
[**(A)** Generation of Metabolites dataset](#A_generation_dataset) from the information available in previous models. <br>
[**(B)** Metabolites Curation](#B_curation) for the metabolites that lack a formula.


In [1]:
# Import libraries
import gspread
import pandas as pd
import numpy as np
import cobra
from cobra.io import read_sbml_model
from cobra.io.mat import load_matlab_model

from tqdm.notebook import tqdm



In [2]:
# Define functions

def df_to_dict(df, key_col):
    """
    This function takes a pandas dataframe and a key column, and returns a dictionary
    with the key column as the dictionary keys and the rest of the columns as the values.
    """
    # Create an empty dictionary to hold the key-value pairs
    my_dict = {}
    
    # Loop through each row in the dataframe
    for index, row in df.iterrows():
        # Use the value in the key column as the dictionary key
        key_value = row[key_col]
        
        # Use the rest of the columns as the dictionary values
        value_dict = row.drop(key_col).to_dict()
        
        # Add the key-value pair to the dictionary
        my_dict[key_value] = value_dict
    
    return my_dict
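
To make the shape of the returned dictionary concrete, here is a minimal usage sketch. The helper is restated in compact form so the snippet runs on its own, and the toy DataFrame with its column names (`BiGG ID`, `Name`, `Formula`) is a hypothetical stand-in, not the actual Metabolites sheet.

```python
import pandas as pd


def df_to_dict(df, key_col):
    """Compact restatement of the helper above: key column -> dict of the other columns."""
    return {row[key_col]: row.drop(key_col).to_dict() for _, row in df.iterrows()}


# Hypothetical toy data, used only to illustrate the output structure
toy = pd.DataFrame({
    "BiGG ID": ["h2o_c", "co2_c"],
    "Name": ["Water", "Carbon dioxide"],
    "Formula": ["H2O", "CO2"],
})

lookup = df_to_dict(toy, "BiGG ID")
print(lookup["h2o_c"])
```

Each key maps to a plain dictionary of the remaining columns. If the key column contains repeated values, the row that comes last overwrites the earlier ones.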

## A. Generation of Metabolites dataset
We start by creating a list of all the metabolites included in the reactions of our reconstruction (1). We then build a dataset with the metabolite information from the Recon3D, iCHO2291 and iCHO1766 models, including supplementary information from Recon3D (2). Finally, we map this information back onto the metabolites of our reconstruction and generate an Excel file to upload to Google Sheets (3).

### 1. Retrieve a list of all the metabolites from our reconstruction
All the reactions, together with the metabolites they involve, are listed in the Rxns sheet of the Google Sheets file.

In [3]:
# Authenticate gspread with the service account key file
sa = gspread.service_account(filename='credentials.json')

# sa is a gspread client; open the spreadsheet by its title
cho_recon = sa.open('CHO Network Reconstruction + Recon3D')

# Select the worksheet to read, in this case the Rxns sheet
rxns_sheet = cho_recon.worksheet('Rxns')

In [4]:
# We can extract the data using the get_all_records method and create a pd DataFrame
df = pd.DataFrame(rxns_sheet.get_all_records())
df

Unnamed: 0,Curated,Reaction,Reaction Name,Reaction Formula,Subsystem,GPR_hef,GPR_fou,GPR_yeo,GPR_Recon3D,GPR_final,Conf. Score,Curation Notes,References
0,PD,10FTHF5GLUtl,"5-glutamyl-10FTHF transport, lysosomal",10fthf5glu_c --> 10fthf5glu_l,"TRANSPORT, LYSOSOMAL",,,,,,1,No information available in the literature abo...,
1,PD,10FTHF5GLUtm,"5-glutamyl-10FTHF transport, mitochondrial",10fthf5glu_m --> 10fthf5glu_c,"TRANSPORT, MITOCHONDRIAL",,,,,,1,No information available in the literature abo...,
2,PD,10FTHF6GLUtl,"6-glutamyl-10FTHF transport, lysosomal",10fthf6glu_c --> 10fthf6glu_l,"TRANSPORT, LYSOSOMAL",,,,,,1,No information available in the literature abo...,
3,PD,10FTHF6GLUtm,"6-glutamyl-10FTHF transport, mitochondrial",10fthf6glu_m --> 10fthf6glu_c,"TRANSPORT, MITOCHONDRIAL",,,,,,1,No information available in the literature abo...,
4,PD,10FTHF7GLUtl,"7-glutamyl-10FTHF transport, lysosomal",10fthf7glu_c --> 10fthf7glu_l,"TRANSPORT, LYSOSOMAL",,,,,,1,No information available in the literature abo...,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
11907,,r2436,Mitochondrial Carrier (Mc) Tcdb:2.A.29.8.3,crn_m + odecrn_c <=> crn_c + odecrn_m,"Transport, mitochondrial",,,,,100765000,,,
11908,,r2437,Mitochondrial Carrier (Mc) Tcdb:2.A.29.8.3,crn_m + pcrn_c <=> crn_c + pcrn_m,"Transport, mitochondrial",,,,,100765000,,,
11909,,r2438,Mitochondrial Carrier (Mc) Tcdb:2.A.29.8.3,c4crn_c + crn_m <=> c4crn_m + crn_c,"Transport, mitochondrial",,,,,100765000,,,
11910,,r2439,"Zinc (Zn2+)-Iron (Fe2+) Permease (Zip), Tcdb:2...",hco3_c + so4_e --> hco3_e + so4_c,"Transport, extracellular",,,,,103159861,,,
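
The next cells pair each row of this table with a reaction in the model by position, so it can be worth checking beforehand that reaction IDs are unique and that no formula is blank. The following is a minimal sketch of such a check, not part of the original workflow. It runs on a toy frame whose fourth row is deliberately broken; the real frame would be the `df` built above.

```python
import pandas as pd

# Toy stand-in for the Rxns sheet; the last row is deliberately invalid
toy_rxns = pd.DataFrame({
    "Reaction": ["10FTHF5GLUtl", "10FTHF5GLUtm", "r2436", "r2436"],
    "Reaction Formula": [
        "10fthf5glu_c --> 10fthf5glu_l",
        "10fthf5glu_m --> 10fthf5glu_c",
        "crn_m + odecrn_c <=> crn_c + odecrn_m",
        "",
    ],
})

# Reaction IDs that occur more than once
is_duplicated = toy_rxns["Reaction"].duplicated(keep=False)
duplicated_ids = toy_rxns.loc[is_duplicated, "Reaction"].unique().tolist()

# Row positions whose formula is missing or blank
blank_formula = toy_rxns["Reaction Formula"].fillna("").str.strip().eq("")
rows_without_formula = toy_rxns.index[blank_formula].tolist()

print("Duplicated reaction IDs:", duplicated_ids)
print("Rows without a formula:", rows_without_formula)
```

Two empty lists mean the table passes both checks.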


In [5]:
# Create an empty cobra model and add one reaction per row of the Rxns sheet.
# The reactions carry no metabolites yet; those are added from the formulas in the next cell.
model = cobra.Model("iCHOxxxx")
lr = []

for _, row in df.iterrows():
    r = cobra.Reaction(row['Reaction'])
    lr.append(r)
    
model.add_reactions(lr)
model

Name,iCHOxxxx
Memory address,1479bf1c0
Number of metabolites,0
Number of reactions,11912
Number of genes,0
Number of groups,0
Objective expression,0
Compartments,


In [6]:
# The built-in method "build_reaction_from_string" parses each formula and creates
# any metabolite that is not yet in the model, printing a message for each one
for i,r in enumerate(tqdm(model.reactions)):
    r.build_reaction_from_string(df['Reaction Formula'][i])

  0%|          | 0/11912 [00:00<?, ?it/s]

unknown metabolite '10fthf5glu_c' created
unknown metabolite '10fthf5glu_l' created
unknown metabolite '10fthf5glu_m' created
unknown metabolite '10fthf6glu_c' created
unknown metabolite '10fthf6glu_l' created
unknown metabolite '10fthf6glu_m' created
...
unknown metabolite 'M00222_c' created
unknown metabolite 'C03820_c' created
unknown metabolite 'paf_hs_c' created
unknown metabolite 'M00206_c' created
unknown metabolite 'M00218_c' created
unknown metabolite 'M00217_c' created
unknown metabolite 'M02893_c' created
unknown metabolite 'M00197_c' created
unknown metabolite 'M00199_c' created
unknown metabolite 'M00215_c' created
unknown metabolite 'M02496_c' created
unknown metabolite 'M00673_c' created
unknown metabolite 'M02506_c' created
unknown metabolite 'M02505_c' created
unknown metabolite 'M01750_c' created
unknown metabolite 'M02828_m' created
unknown metabolite 'M02665_m' created
unknown metabolite 'M02807_c' created
unknown metabolite 'M01388_c' created
unknown metabolite 'M01389_c' created
unknown metabolite 'citr_L_g' created
unknown metabolite 'dopa_m' created
unknown metabolite 'dopa_g' created
unknown metabolite 'HC02065_l' created
unknown metabolite 'fucgalacglcgalgluside_hs_g' created
unknown metabolite 'fuc14galacglcga

unknown metabolite 'ileprolys_c' created
unknown metabolite 'ileserarg_e' created
unknown metabolite 'ileserarg_c' created
unknown metabolite 'iletrptyr_e' created
unknown metabolite 'iletrptyr_c' created
unknown metabolite 'indole_c' created
unknown metabolite 'indoxyl_c' created
unknown metabolite 'inds_c' created
unknown metabolite 'inds_e' created
unknown metabolite 'isochol_c' created
unknown metabolite 'isochol_e' created
unknown metabolite 'isolvstacid_c' created
unknown metabolite 'isolvstacid_e' created
unknown metabolite 'dpcoa_l' created
unknown metabolite 'lca24g_c' created
unknown metabolite 'lca24g_r' created
unknown metabolite 'lca24g_e' created
unknown metabolite 'lca3g_c' created
unknown metabolite 'lca3g_r' created
unknown metabolite 'lca3g_e' created
unknown metabolite 'lca3s_c' created
unknown metabolite 'lca3s_e' created
unknown metabolite 'pchol_hs_e' created
unknown metabolite 'pcholmyr_hs_e' created
unknown metabolite 'pcholole_hs_e' created
unknown metabolite '

unknown metabolite 'pcholeic_hs_c' created
unknown metabolite 'pcholet_hs_c' created
unknown metabolite 'pcholdoc_hs_c' created
unknown metabolite 'pcholhep_hs_c' created
unknown metabolite 'pchollinl_hs_c' created
unknown metabolite 'pcholmyr_hs_c' created
unknown metabolite 'pcholn15_hs_c' created
unknown metabolite 'pcholn1836_hs_c' created
unknown metabolite 'pcholn183_hs_c' created
unknown metabolite 'pcholn19_hs_c' created
unknown metabolite 'pcholn201_hs_c' created
unknown metabolite 'pcholn203_hs_c' created
unknown metabolite 'pcholn204_hs_c' created
unknown metabolite 'pcholn205_hs_c' created
unknown metabolite 'pcholn224_hs_c' created
unknown metabolite 'pcholn2254_hs_c' created
unknown metabolite 'pcholn225_hs_c' created
unknown metabolite 'pcholn24_hs_c' created
unknown metabolite 'pcholn261_hs_c' created
unknown metabolite 'pcholn281_hs_c' created
unknown metabolite 'pcholn28_hs_c' created
unknown metabolite 'pcholole_hs_c' created
unknown metabolite 'pcholpalme_hs_c' crea

unknown metabolite 'CE2953_c' created
unknown metabolite 'CE2958_c' created
unknown metabolite 'CE2961_c' created
unknown metabolite 'CE2962_c' created
unknown metabolite 'CE5756_c' created
unknown metabolite 'CE5757_c' created
unknown metabolite 'CE5787_e' created
unknown metabolite 'CE5791_e' created
unknown metabolite 'C03958_c' created
unknown metabolite 'CE5820_c' created
unknown metabolite 'CE1925_c' created
unknown metabolite 'CE5853_c' created
unknown metabolite 'CE1926_c' created
unknown metabolite 'CE5854_c' created
unknown metabolite 'ddsmsterol_n' created
unknown metabolite 'dsmsterol_n' created
unknown metabolite 'C06948_r' created
unknown metabolite 'C07486_r' created
unknown metabolite 'fald_r' created
unknown metabolite 'CE2615_c' created
unknown metabolite 'CE2616_c' created
unknown metabolite 'CE5166_m' created
unknown metabolite 'CE5166_x' created
unknown metabolite 'HC02191_x' created
unknown metabolite 'HC02192_x' created
unknown metabolite 'HC02193_x' created
unkn

unknown metabolite 'smvacid_e' created
unknown metabolite 'simvgluc_r' created
unknown metabolite 'smvacid_r' created
unknown metabolite 'smv_c' created
unknown metabolite 'smv_e' created
unknown metabolite 'phsphings_c' created
unknown metabolite 'phsph1p_c' created
unknown metabolite 'sphings_n' created
unknown metabolite 'sphs1p_n' created
unknown metabolite 'sphgn_n' created
unknown metabolite 'sph1p_n' created
unknown metabolite 'sphmyln180241_hs_e' created
unknown metabolite 'sphmyln18114_hs_e' created
unknown metabolite 'sphmyln18115_hs_e' created
unknown metabolite 'sphmyln181161_hs_e' created
unknown metabolite 'sphmyln18116_hs_e' created
unknown metabolite 'sphmyln18117_hs_e' created
unknown metabolite 'sphmyln181181_hs_e' created
unknown metabolite 'sphmyln18118_hs_e' created
unknown metabolite 'sphmyln181201_hs_e' created
unknown metabolite 'sphmyln18120_hs_e' created
unknown metabolite 'sphmyln18121_hs_e' created
unknown metabolite 'sphmyln181221_hs_e' created
unknown meta

unknown metabolite 'HC01651_n' created
unknown metabolite 'HC01118_r' created
unknown metabolite 'HC01496_m' created
unknown metabolite 'HC01408_m' created
unknown metabolite 'HC01223_m' created
unknown metabolite 'HC01587_c' created
unknown metabolite 'HC01321_c' created
unknown metabolite 'HC01255_c' created
unknown metabolite 'HC01588_c' created
unknown metabolite 'HC01322_c' created
unknown metabolite 'HC01596_c' created
unknown metabolite 'HC01597_c' created
unknown metabolite 'HC01323_c' created
unknown metabolite 'HC01593_c' created
unknown metabolite 'HC01594_c' created
unknown metabolite 'HC01326_c' created
unknown metabolite 'HC01605_c' created
unknown metabolite 'HC01606_c' created
unknown metabolite 'HC01335_c' created
unknown metabolite 'HC01602_c' created
unknown metabolite 'HC01603_c' created
unknown metabolite 'HC01710_n' created
unknown metabolite 'HC01601_c' created
unknown metabolite 'HC01397_x' created
unknown metabolite 'HC01412_x' created
unknown metabolite 'HC014

In [7]:
# We first create a list of the metabolites and then a pandas df with it
metabolites_list = []
for met in model.metabolites:
    metabolites_list.append(met.id)
    
metabolites = pd.DataFrame(metabolites_list, columns =['BiGG ID'])
metabolites

Unnamed: 0,BiGG ID
0,10fthf5glu_c
1,10fthf5glu_l
2,10fthf5glu_m
3,10fthf6glu_c
4,10fthf6glu_l
...,...
8661,HC02098_c
8662,HC02099_c
8663,HC01988_c
8664,HC10856_m


### 2. Retrieve information from all the metabolites on Recon3D, iCHO2291 and iCHO1766
We use two data sources for this. First, we take the metabolite ID, name, formula and compartment from the Recon3D, iCHO2291 and iCHO1766 model files. We then add the metadata that the Recon3D supplementary files provide for the available metabolites.
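
The same four fields are extracted from each of the three models, so the extraction can be sketched as a single helper. This is a minimal illustration, not the code used in this notebook: `metabolites_to_df` and the toy model are hypothetical names, and the toy object only mimics the `metabolites` attribute of a COBRApy model.

```python
from types import SimpleNamespace

import pandas as pd


def metabolites_to_df(model):
    """Collect ID, name, formula and compartment of every metabolite in a model."""
    rows = [
        {'BiGG ID': met.id, 'Name': met.name,
         'Formula': met.formula, 'Compartment': met.compartment}
        for met in model.metabolites
    ]
    df = pd.DataFrame(rows, columns=['BiGG ID', 'Name', 'Formula', 'Compartment'])
    # Normalise "atp[c]"-style IDs to the "atp_c" style
    df['BiGG ID'] = (df['BiGG ID']
                     .str.replace('[', '_', regex=False)
                     .str.replace(']', '', regex=False))
    return df


# Toy stand-in for a COBRApy model (hypothetical data, for illustration only)
toy_model = SimpleNamespace(metabolites=[
    SimpleNamespace(id='atp[c]', name='ATP', formula='C10H12N5O13P3', compartment='c'),
    SimpleNamespace(id='h2o_m', name='H2O', formula='H2O', compartment='m'),
])
toy_df = metabolites_to_df(toy_model)
print(toy_df)
```

With a real model, the same helper could be called once per model instead of repeating the loop in each cell.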

In [8]:
# read the Recon3D model
recon3d_model = load_matlab_model('../Data/GPR_Curation/Recon3D_301.mat')

No defined compartments in model Recon3D. Compartments will be deduced heuristically using regular expressions.
Using regular expression found the following compartments:c, e, g, i, l, m, n, r, x


In [9]:
# Generate a dataset containing all the metabolites, chemical formula of each metabolite and compartment
num_rows = len(recon3d_model.metabolites)
recon3d_model_metabolites = pd.DataFrame(index=range(num_rows), columns=['BiGG ID', 'Name', 'Formula', 'Compartment'])
for i,met in enumerate(recon3d_model.metabolites):
    id_ = met.id
    name = met.name
    formula = met.formula
    comp = met.compartment
    recon3d_model_metabolites.iloc[i] = [id_, name, formula, comp]
    
recon3d_model_metabolites['BiGG ID'] = recon3d_model_metabolites['BiGG ID'].str.replace("[", "_", regex=False)
recon3d_model_metabolites['BiGG ID'] = recon3d_model_metabolites['BiGG ID'].str.replace("]", "", regex=False)

In [10]:
recon3d_model_metabolites

Unnamed: 0,BiGG ID,Name,Formula,Compartment
0,10fthf5glu_c,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,c
1,10fthf5glu_l,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,l
2,10fthf5glu_m,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,m
3,10fthf6glu_c,10-Formyltetrahydrofolate-[Glu](6),C45H51N12O22,c
4,10fthf6glu_l,10-Formyltetrahydrofolate-[Glu](6),C45H51N12O22,l
...,...,...,...,...
8394,1a25dhvitd2_c,"1-Alpha,25-Dihydroxyvitamin D2",C28H44O3,c
8395,1a25dhvitd2_e,"1-Alpha,25-Dihydroxyvitamin D2",C28H44O3,e
8396,protein_c,Torasemide-M3,,c
8397,h_i,Proton,H,i


In [11]:
# Read Yeo's model (iCHO2291)
iCHO2291_model = read_sbml_model('../Data/Reconciliation/models/iCHO2291.xml')

In [12]:
# Generate a dataset containing all the metabolites, chemical formula of each metabolite and compartment from Yeo's model
num_rows = len(iCHO2291_model.metabolites)
iCHO2291_model_metabolites = pd.DataFrame(index=range(num_rows), columns=['BiGG ID', 'Name', 'Formula', 'Compartment'])
for i,met in enumerate(iCHO2291_model.metabolites):
    id_ = met.id
    name = met.name
    formula = met.formula
    comp = met.compartment
    iCHO2291_model_metabolites.iloc[i] = [id_, name, formula, comp]
    
iCHO2291_model_metabolites['BiGG ID'] = iCHO2291_model_metabolites['BiGG ID'].str.replace("[", "_", regex=False)
iCHO2291_model_metabolites['BiGG ID'] = iCHO2291_model_metabolites['BiGG ID'].str.replace("]", "", regex=False)
iCHO2291_model_metabolites

Unnamed: 0,BiGG ID,Name,Formula,Compartment
0,10fthf5glu_c,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,c
1,10fthf5glu_l,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,l
2,10fthf5glu_m,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,m
3,10fthf6glu_c,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,c
4,10fthf6glu_l,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,l
...,...,...,...,...
3967,Rtotal3crn_c,Rtotal3crn[c],CO2R3C7H14NO2,c
3968,Rtotal3crn_m,Rtotal3crn[m],CO2R3C7H14NO2,m
3969,Rtotalcrn_c,Rtotalcrn[c],CO2RC7H14NO2,c
3970,Rtotalcrn_m,Rtotalcrn[m],CO2RC7H14NO2,m


In [13]:
# Read Hefzi's model (iCHO1766)
iCHO1766_model = read_sbml_model('../Data/Reconciliation/models/iCHOv1_final.xml')

In [14]:
# Generate a dataset containing all the metabolites, chemical formula of each metabolite and compartment from Hefzi's model
num_rows = len(iCHO1766_model.metabolites)
iCHO1766_model_metabolites = pd.DataFrame(index=range(num_rows), columns=['BiGG ID', 'Name', 'Formula', 'Compartment'])
for i,met in enumerate(iCHO1766_model.metabolites):
    id_ = met.id
    name = met.name
    formula = met.formula
    comp = met.compartment
    iCHO1766_model_metabolites.iloc[i] = [id_, name, formula, comp]

iCHO1766_model_metabolites

Unnamed: 0,BiGG ID,Name,Formula,Compartment
0,coke_r,cocaine,C17H21NO4,r
1,h2o_r,H2O,H2O,r
2,bz_r,benzoate,C7H5O2,r
3,egme_r,ecgonine methyl ester,C10H17NO3,r
4,h_r,proton,H,r
...,...,...,...,...
4451,igg_g,igg[g],,g
4452,nicrns_c,Nicotinate D-ribonucleoside,C11H13NO6,c
4453,bilglcur_r,Bilirubin monoglucuronide,C39H44N4O12,r
4454,pcollglys_c,Procollagen L-lysine,C7H14N3O2R2,c


In [15]:
models_metabolites = pd.concat([recon3d_model_metabolites, iCHO2291_model_metabolites, iCHO1766_model_metabolites])
models_metabolites = models_metabolites.groupby('BiGG ID').first()
models_metabolites = models_metabolites.reset_index(drop = False)
models_metabolites

Unnamed: 0,BiGG ID,Name,Formula,Compartment
0,10fthf5glu_c,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,c
1,10fthf5glu_e,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,e
2,10fthf5glu_l,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,l
3,10fthf5glu_m,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,m
4,10fthf6glu_c,10-Formyltetrahydrofolate-[Glu](6),C45H51N12O22,c
...,...,...,...,...
9727,zym_int2_r,Zymosterol Intermediate 2,C27H42O,r
9728,zymst_c,Zymosterol,C27H44O,c
9729,zymst_r,Zymosterol,C27H44O,r
9730,zymstnl_c,Cholesta-8(9)-En-3Beta-Ol,C27H46O,c
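
When the same BiGG ID appears in more than one model, `groupby('BiGG ID').first()` keeps, for each column, the first non-null value in concatenation order. Recon3D values therefore take precedence, and gaps are filled from iCHO2291 and then iCHO1766. The sketch below illustrates this behaviour on hypothetical toy tables (the IDs and names are made up).

```python
import pandas as pd

# Hypothetical toy tables standing in for two of the model tables
first_model = pd.DataFrame({'BiGG ID': ['x_c', 'y_c'],
                            'Name': ['X (first)', 'Y (first)'],
                            'Formula': [None, 'H2O']})
second_model = pd.DataFrame({'BiGG ID': ['x_c', 'z_c'],
                             'Name': ['X (second)', 'Z (second)'],
                             'Formula': ['CO2', 'O2']})

# Same pattern as in the cell above: concatenate, then keep the first
# non-null value of every column within each BiGG ID
merged = (pd.concat([first_model, second_model])
          .groupby('BiGG ID').first()
          .reset_index())
x_row = merged.set_index('BiGG ID').loc['x_c']
print(merged)
```

Note that a single merged row can mix values from different models (here the name comes from the first table and the formula from the second), and that an empty string is not treated as missing.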


In [16]:
#Generation of a dataset containing all the information from Recon3D metabolites Supplementary Data.
recon3d_metabolites_meta = pd.read_excel('../Data/Metabolites/metabolites.recon3d.xlsx', header = 0)
recon3d_metabolites_meta['BiGG ID'] = recon3d_metabolites_meta['BiGG ID'].str.replace("[", "_", regex=False)
recon3d_metabolites_meta['BiGG ID'] = recon3d_metabolites_meta['BiGG ID'].str.replace("]", "", regex=False)
recon3d_metabolites_meta

Unnamed: 0,BiGG ID,KEGG,CHEBI,PubChem,Inchi,Hepatonet,EHMNID,SMILES,INCHI2,CC_ID,Stereoisomer Information of Metabolite Identified,Charge of the Metabolite Identified,CID_ID,PDB (ligand-expo) Experimental Coordinates File Url,Pub Chem Url,ChEBI Url
0,10fthf5glu_c,,,,,,,,,,,,,,,
1,10fthf5glu_l,,,,,,,,,,,,,,,
2,10fthf5glu_m,,,,,,,,,,,,,,,
3,10fthf6glu_c,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
4,10fthf6glu_l,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5282,,,CID100015232,,,,,,,,,,,,,
5283,,,CID100123634,,,,,,,,,,,,,
5284,,,CID100206527,,,,,,,,,,,,,
5285,,,CID105479141,,,,,,,,,,,,,


In [17]:
# Transform "recon3d_metabolites_meta" into a dict so it can be mapped onto "models_metabolites"
recon3dmet_dict = df_to_dict(recon3d_metabolites_meta, 'BiGG ID')

In [18]:
# Map the Recon3D metadata onto the "models_metabolites" dataset
models_metabolites[['KEGG','CHEBI', 'PubChem','Inchi', 'Hepatonet', 'EHMNID', 'SMILES', 'INCHI2',
                          'CC_ID','Stereoisomer Information of Metabolite Identified', 'Charge of the Metabolite Identified',
    'CID_ID','PDB (ligand-expo) Experimental Coordinates  File Url', 'Pub Chem Url',
    'ChEBI Url']] = models_metabolites['BiGG ID'].apply(lambda x: pd.Series(recon3dmet_dict.get(x, None), dtype=object))

In [19]:
models_metabolites

Unnamed: 0,BiGG ID,Name,Formula,Compartment,KEGG,CHEBI,PubChem,Inchi,Hepatonet,EHMNID,SMILES,INCHI2,CC_ID,Stereoisomer Information of Metabolite Identified,Charge of the Metabolite Identified,CID_ID,PDB (ligand-expo) Experimental Coordinates File Url,Pub Chem Url,ChEBI Url
0,10fthf5glu_c,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,c,,,,,,,,,,,,,,,
1,10fthf5glu_e,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,e,,,,,,,,,,,,,,,
2,10fthf5glu_l,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,l,,,,,,,,,,,,,,,
3,10fthf5glu_m,10-Formyltetrahydrofolate-[Glu](5),C40H45N11O19,m,,,,,,,,,,,,,,,
4,10fthf6glu_c,10-Formyltetrahydrofolate-[Glu](6),C45H51N12O22,c,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9727,zym_int2_r,Zymosterol Intermediate 2,C27H42O,r,C05437,18252.0,22298942.0,InChI=1S/C27H42O/c1-18(2)7-6-8-19(3)23-11-12-2...,,,[H][C@@]12CCC3=C(CC[C@]4(C)[C@]([H])(CC[C@@]34...,InChI=1S/C27H42O/c1-18(2)7-6-8-19(3)23-11-12-2...,,,Neutral,22298942.0,,https://pubchem.ncbi.nlm.nih.gov/compound/2229...,https://www.ebi.ac.uk/chebi/searchId.do?chebiI...
9728,zymst_c,Zymosterol,C27H44O,c,,,,,,,,,,,,,,,
9729,zymst_r,Zymosterol,C27H44O,r,C05437,18252.0,92746.0,InChI=1S/C27H44O/c1-18(2)7-6-8-19(3)23-11-12-2...,HC01451,C05437,[H][C@@]12CCC3=C(CC[C@]4(C)[C@]([H])(CC[C@@]34...,InChI=1S/C27H44O/c1-18(2)7-6-8-19(3)23-11-12-2...,,,Neutral,92746.0,,https://pubchem.ncbi.nlm.nih.gov/compound/92746,https://www.ebi.ac.uk/chebi/searchId.do?chebiI...
9730,zymstnl_c,Cholesta-8(9)-En-3Beta-Ol,C27H46O,c,,,,,,,,,,,,,,,


In [20]:
# Transform the combined models dataset (with the Recon3D metadata) into a dictionary to map it onto our dataset
final_met_dict = df_to_dict(models_metabolites, 'BiGG ID')

### 3. Add all the metabolites information into our metabolites dataset
The dictionary created in **Step 2** lets us map this information onto the metabolites dataset created in **Step 1**, which contains all the metabolites of our reconstruction.
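
As an alternative sketch of the same mapping, a left merge on `BiGG ID` attaches the model information without building an intermediate dictionary. This is not the approach used in the cells of this notebook, and the toy tables below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two reconstruction metabolites, one of them unknown to the models
toy_metabolites = pd.DataFrame({'BiGG ID': ['atp_c', 'newmet_c']})
toy_models = pd.DataFrame({'BiGG ID': ['atp_c', 'h2o_c'],
                           'Name': ['ATP', 'H2O'],
                           'Formula': ['C10H12N5O13P3', 'H2O']})

# how='left' keeps every reconstruction metabolite, in its original order;
# metabolites that are absent from the models get NaN in the new columns
toy_mapped = toy_metabolites.merge(toy_models, on='BiGG ID', how='left')
print(toy_mapped)
```

A merge of this kind relies on `BiGG ID` being unique in the right-hand table; otherwise rows would be duplicated.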

In [21]:
metabolites[['Name', 'Formula', 'Compartment', 'KEGG','CHEBI', 'PubChem','Inchi', 'Hepatonet', 'EHMNID', 'SMILES',
             'INCHI2','CC_ID','Stereoisomer Information of Metabolite Identified', 'Charge of the Metabolite Identified',
    'CID_ID','PDB (ligand-expo) Experimental Coordinates  File Url', 'Pub Chem Url',
    'ChEBI Url']] = metabolites['BiGG ID'].apply(lambda x: pd.Series(final_met_dict.get(x, None), dtype=object))

In [22]:
# Update the Compartment column in the final dataset
compartment_labels = {
    'c': 'c - cytosol',
    'l': 'l - lysosome',
    'm': 'm - mitochondria',
    'r': 'r - endoplasmic reticulum',
    'e': 'e - extracellular space',
    'x': 'x - peroxisome/glyoxysome',
    'n': 'n - nucleus',
    'g': 'g - golgi apparatus',
    'im': 'im - intermembrane space of mitochondria',
}
# Values without an entry in the dictionary (including missing ones) are left unchanged
metabolites['Compartment'] = metabolites['Compartment'].replace(compartment_labels)

In [23]:
metabolites.to_excel('../Data/Metabolites/metabolites.xlsx')

### 4. Unique metabolite identification

In [24]:
print("Duplicated metabolites by BiGG ID = ", len(metabolites['BiGG ID']) - len(metabolites['BiGG ID'].unique()))
print("Duplicated metabolites by Name = ", len(metabolites['Name']) - len(metabolites['Name'].unique()))
print("Duplicated metabolites by Formula = ", len(metabolites['Formula']) - len(metabolites['Formula'].unique()))
print("Duplicated metabolites by KEGG = ", len(metabolites['KEGG']) - len(metabolites['KEGG'].unique()))

Duplicated metabolites by BiGG ID =  0
Duplicated metabolites by Name =  4203
Duplicated metabolites by Formula =  5890
Duplicated metabolites by KEGG =  7578
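
The counts above are computed as `len(column) - len(column.unique())`. Because `unique()` keeps a missing value only once, every additional missing entry is counted as a duplicate. The following sketch, on a hypothetical KEGG column, shows how the count changes when missing values are dropped first.

```python
import pandas as pd

# Hypothetical KEGG column: one real duplicate (C00002) and three missing values
kegg = pd.Series(['C00002', 'C00002', 'C00001', None, None, None])

# Same formula as in the cell above: missing values inflate the count
naive_count = len(kegg) - len(kegg.unique())

# Count only repeated, non-missing identifiers
annotated_count = int(kegg.dropna().duplicated().sum())

print(naive_count, annotated_count)
```

For sparsely annotated columns such as `KEGG`, the second number is likely the more informative one.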


## B. Metabolites Curation
In this second part of the notebook we curate the information that is still missing from the metabolites dataset generated above. Since many metabolites have been manually curated directly in the "Metabolites" sheet of the Google Sheets file, we use the gspread library to load that sheet into a new dataframe, so that all manual changes are included.
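
Once the sheet is loaded, the rows that still need curation can be selected by looking for blank cells. The sketch below assumes that blank sheet cells arrive as empty strings, which is what the `Name == ''` filter used later in this notebook relies on; the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows shaped like the sheet export
toy_sheet = pd.DataFrame({'BiGG ID': ['atp_c', 'xtra_c', 'hdxur_c'],
                          'Name': ['ATP', '', '  '],
                          'Formula': ['C10H12N5O13P3', '', 'X']})

# Treat empty and whitespace-only cells as missing
missing_name = toy_sheet['Name'].astype(str).str.strip().eq('')
to_curate = toy_sheet.loc[missing_name, 'BiGG ID'].tolist()
print(to_curate)
```

Stripping whitespace first also catches cells that look empty in the sheet but contain spaces.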

### 5. Generate new dataframe from the "Metabolites" sheet 

In [25]:
# give service account details to gspread
sa = gspread.service_account(filename='credentials.json')

# sa is a gspread client, which can be used for connecting to the sheets
# by using the open method and the sheet name.
cho_recon = sa.open('CHO Network Reconstruction')

# we also need to specify the page name before getting the data. In this case we use the Metabolites sheet.
cur_metabolites_sheet = cho_recon.worksheet('Metabolites')

# We can extract the data using the get_all_records method and create a pd DataFrame
cur_metabolites = pd.DataFrame(cur_metabolites_sheet.get_all_records())
cur_metabolites

Unnamed: 0,Curated,BiGG ID,Name,Formula,Compartment,KEGG,CHEBI,PubChem,Inchi,Hepatonet,EHMNID,SMILES,INCHI2,CC_ID,Stereoisomer Information of Metabolite Identified,Charge of the Metabolite Identified,CID_ID,PDB (ligand-expo) Experimental Coordinates File Url,Pub Chem Url,ChEBI Url
0,,10fthf5glu_c,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,c - cytosol,,,,,,,,,,,,,,,
1,,10fthf5glu_l,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,l - lysosome,,,,,,,,,,,,,,,
2,,10fthf5glu_m,10-formyltetrahydrofolate-[Glu](5),C40H45N11O19,m - mitochondria,,,,,,,,,,,,,,,
3,,10fthf6glu_c,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,c - cytosol,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
4,,10fthf6glu_l,10-formyltetrahydrofolate-[Glu](6),C45H51N12O22,l - lysosome,,,,InChI=1/C45H58N12O22/c46-45-55-36-35(38(67)56-...,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5426,,HC02216_c,Prostaglandin-f2beta,C20H33O5,c - cytosol,,28922,439702,InChI=1S/C20H34O5/c1-2-3-6-9-15(21)12-13-17-16...,,,CCCCC[C@H](O)\C=C\[C@H]1[C@H](O)C[C@@H](O)[C@@...,InChI=1S/C20H34O5/c1-2-3-6-9-15(21)12-13-17-16...,,,Neutral,5280506,,https://pubchem.ncbi.nlm.nih.gov/compound/5280506,https://www.ebi.ac.uk/chebi/searchId.do;jsessi...
5427,,HC01361_e,Dihydroneopterin,C9H13N5O4,e - extracellular space,C04874,17001,,InChI=1S/C9H13N5O4/c10-9-13-7-5(8(18)14-9)12-3...,,,Nc1nc2c(c(=O)[nH]1)N=C(C(O)C(O)CO)CN2,,,,,,,,
5428,,HC02217_c,Prostaglandin-g2,C20H31O6,c - cytosol,,27647,,InChI=1S/C20H32O6/c1-2-3-6-9-15(24-23)12-13-17...,,C05956,CCCCC[C@H](OO)\C=C\[C@H]1[C@H]2C[C@H](OO2)[C@@...,InChI=1S/C20H32O6/c1-2-3-6-9-15(24-23)12-13-17...,PGX,,Neutral,5280883,http://ligand-expo.rcsb.org/pyapps/ldHandler.p...,https://pubchem.ncbi.nlm.nih.gov/compound/5280883,https://www.ebi.ac.uk/chebi/searchId.do?chebiI...
5429,PD,fdxo_2_2_c,Oxidized ferredoxin,Fe2S2X,c - cytosol,C00139,17908,,,,,[*][*][Fe+3]1([*][*])[S-2][Fe+3]([*][*])([*][*...,,,,,,,,


In [26]:
# Get BiGG descriptive names
import requests
from bs4 import BeautifulSoup
import time

# Unknown mets: metabolites without names.
# .copy() makes this an independent dataframe, so that later edits to it
# do not trigger pandas chained-assignment warnings.
unkown_mets = cur_metabolites[cur_metabolites['Name'] == ''].copy()

Descriptive_Names = [''] * len(unkown_mets)
Formulae = [''] * len(Descriptive_Names)
Changed = [True] * len(Descriptive_Names)

for Met_Counter, metID in enumerate(tqdm(unkown_mets['BiGG ID'])):
    # Drop the compartment suffix (assumes a one-letter suffix such as "_c")
    input_str = metID[:-2]
    response = requests.get(f"http://bigg.ucsd.edu/universal/metabolites/{input_str}")
    time.sleep(1)  # pause between requests
    # Check if the request was successful
    if response.status_code != 200:
        D_Name = "BiGG ID not found in BiGG"
        Formulae_B = "BiGG ID not found in BiGG"
        Changed[Met_Counter] = False
    else:
        soup = BeautifulSoup(response.content, 'html.parser')
        N_Header = soup.find('h4', string='Descriptive name:')
        N_Formulae = soup.find('h4', string='Formulae in BiGG models: ')
        # Either header (or the <p> that follows it) may be absent, so guard each lookup
        Name_Tag = N_Header.find_next_sibling('p') if N_Header is not None else None
        Formula_Tag = N_Formulae.find_next_sibling('p') if N_Formulae is not None else None
        D_Name = Name_Tag.text if Name_Tag is not None else "Name not found in BiGG"
        Formulae_B = Formula_Tag.text if Formula_Tag is not None else "Formula not found in BiGG"
    Descriptive_Names[Met_Counter] = D_Name
    Formulae[Met_Counter] = Formulae_B

0it [00:00, ?it/s]

### Update empty metabolites

In [27]:
for Met_Counter, idx in enumerate(unkown_mets.index):
    print('before', unkown_mets.loc[idx, 'BiGG ID'])
    print('before', unkown_mets.loc[idx, 'Formula'])
    print('before', unkown_mets.loc[idx, 'Name'])
    # Assign through .loc on the dataframe itself; chained indexing such as
    # unkown_mets['Formula'].iloc[i] = ... may write to a temporary copy
    if unkown_mets.loc[idx, 'Formula'] == '':
        unkown_mets.loc[idx, 'Formula'] = Formulae[Met_Counter]
    unkown_mets.loc[idx, 'Name'] = Descriptive_Names[Met_Counter]
    print('..............................................')
    print('after', unkown_mets.loc[idx, 'BiGG ID'])
    print('after', unkown_mets.loc[idx, 'Formula'])
    print('after', unkown_mets.loc[idx, 'Name'])
    print('..............................................')
    print('..............................................')
    print('..............................................')

In [28]:
cur_metabolites.update(unkown_mets)

# Manual Curation
# Note: the row masks are built from cur_metabolites itself, so that they
# align with the dataframe being modified.
for bigg_id in cur_metabolites['BiGG ID']:
    # xtra = Xanthurenic acid; C10H6NO4
    # http://bigg.ucsd.edu/models/iCHOv1/reactions/r0647
    if 'xtra' in bigg_id:
        cur_metabolites.loc[cur_metabolites['BiGG ID'] == bigg_id, 'Name'] = 'Xanthurenic acid'
        cur_metabolites.loc[cur_metabolites['BiGG ID'] == bigg_id, 'Formula'] = 'C10H6NO4'
    # chedxch = Bilirubin-monoglucuronoside; C39H42N4O122-
    # Reactions name = 'ATP-binding Cassette (ABC) TCDB:3.A.1.208.2' --> https://metabolicatlas.org/identifier/TCDB/3.A.1.208.2
    elif 'chedxch' in bigg_id:
        cur_metabolites.loc[cur_metabolites['BiGG ID'] == bigg_id, 'Name'] = 'Bilirubin-monoglucuronoside'
        cur_metabolites.loc[cur_metabolites['BiGG ID'] == bigg_id, 'Formula'] = 'C39H42N4O122-'
    # Name suggested by ChatGPT
    elif '3hoc246_6Z_9Z_12Z_15Z_18Z_21Zcoa' in bigg_id:
        cur_metabolites.loc[cur_metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a 24-carbon fatty acid with six double bonds, with the location of the double bonds specified by the numbers and Zs'
    # Name suggested by ChatGPT
    elif 'c247_2Z_6Z_9Z_12Z_15Z_18Z_21Zcoa' in bigg_id:
        cur_metabolites.loc[cur_metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a modified version of the same 24-carbon fatty acid, with a hydroxyl group added at the third carbon position'
    # Name suggested by ChatGPT
    elif '3hoc143_5Z_8Z_11Zcoa' in bigg_id:
        cur_metabolites.loc[cur_metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a 14-carbon fatty acid with three double bonds, with the location of the double bonds specified by the numbers and Zs.'
    # Name suggested by ChatGPT
    elif '3oc143_5Z_8Z_11Zcoa' in bigg_id:
        cur_metabolites.loc[cur_metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a modified version of the same 14-carbon fatty acid, with the hydroxyl group removed and one of the double bonds converted to a keto group'
    # Name suggested by ChatGPT
    elif 'acgalgalacglcgalgluside' in bigg_id:
        cur_metabolites.loc[cur_metabolites['BiGG ID'] == bigg_id, 'Name'] = 'Complex glycosphingolipid that contains multiple sugar residues'

    # 12e8hdx: identity still unclear
    # hdxur: dead end

cur_metabolites.to_excel('../Data/Metabolites/metabolites_final.xlsx')

## C. Addition of metabolites from Recon3D
Here we add the metabolites that take part in the reactions added from Recon3D, without overwriting the data already present in our own Metabolites dataset.
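
The selection logic can be sketched on toy data as follows: only the rows whose `BiGG ID` is not yet in the curated table are kept, so appending them cannot overwrite curated values. The two small tables are hypothetical stand-ins for the curated sheet and the generated dataset.

```python
import pandas as pd

# Hypothetical stand-ins for the curated sheet and the generated dataset
curated = pd.DataFrame({'BiGG ID': ['atp_c', 'h2o_c'],
                        'Name': ['ATP (curated)', 'H2O']})
generated = pd.DataFrame({'BiGG ID': ['atp_c', 'tacr_r'],
                          'Name': ['ATP', 'Tacrolimus']})

# Keep only the metabolites that are not in the curated table yet
to_add = generated[~generated['BiGG ID'].isin(curated['BiGG ID'])]

# Appending them leaves every curated row untouched
combined = pd.concat([curated, to_add], ignore_index=True)
print(combined)
```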

In [51]:
subset = metabolites[~metabolites['BiGG ID'].isin(cur_metabolites['BiGG ID'])]
subset = subset.reset_index(drop=True)

In [52]:
subset

Unnamed: 0,BiGG ID,Name,Formula,Compartment,KEGG,CHEBI,PubChem,Inchi,Hepatonet,EHMNID,SMILES,INCHI2,CC_ID,Stereoisomer Information of Metabolite Identified,Charge of the Metabolite Identified,CID_ID,PDB (ligand-expo) Experimental Coordinates File Url,Pub Chem Url,ChEBI Url
0,11docrtsl_e,11-Deoxycortisol,C21H30O4,e - extracellular space,,,,,,,,,,,,,,,
1,11docrtstrn_e,11-Deoxycorticosterone,C21H30O3,e - extracellular space,,,,,,,,,,,,,,,
2,12dhchol_c,12-Dehydrocholic acid; 12-Oxodeoxycholic acid;...,C24H37O5,c - cytosol,,,,,,,,,,,,,,,
3,12dhchol_e,12-Dehydrocholic acid; 12-Oxodeoxycholic acid;...,C24H37O5,e - extracellular space,,,,,,,,,,,,,,,
4,tacr_r,Tacrolimus,C44H69NO12,r - endoplasmic reticulum,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3230,HC02098_c,3-Hydroxystearoyl-ACP,C18H35O2SR,c - cytosol,C16220,,,,,,CCCCCCCCCCCCCCCC(O)CC(=O)S[*],,,,Neutral,,,,https://www.ebi.ac.uk/chebi/searchId.do;jsessi...
3231,HC02099_c,(2E)-Octadecenoyl-ACP,C18H33OSR,c - cytosol,C16221,,,,,,CCCCCCCCCCCCCCC\C=C\C(=O)S[*],,,,Neutral,,,,http://www.ebi.ac.uk/chebi/searchId.do?chebiId...
3232,HC01988_c,Stearoyl-ACP,X,c - cytosol,C04088,,,,,,,,,,,,,,
3233,HC10856_m,"Trans,Cis-Octadeca-2,9-Dienoyl Coenzyme A",C39H62N7O17P3S,m - mitochondria,,,,,,,CCCCCCCC\C=C/CCCCC\C=C\C(=O)SCCNC(=O)CCNC(=O)[...,InChI=1S/C39H66N7O17P3S/c1-4-5-6-7-8-9-10-11-1...,,,Neutral,86289273.0,,https://pubchem.ncbi.nlm.nih.gov/compound/8628...,https://www.ebi.ac.uk/chebi/searchId.do;jsessi...


In [53]:
len(metabolites['BiGG ID'][~metabolites['BiGG ID'].isin(cur_metabolites['BiGG ID'])])

3235

In [54]:
# Transform the combined models dataset (with the Recon3D metadata) into a dictionary to map it onto our dataset
final_met_dict = df_to_dict(models_metabolites, 'BiGG ID')

In [55]:
final_met_dict

{'10fthf5glu_c': {'Name': '10-Formyltetrahydrofolate-[Glu](5)',
  'Formula': 'C40H45N11O19',
  'Compartment': 'c',
  'KEGG': nan,
  'CHEBI': nan,
  'PubChem': nan,
  'Inchi': nan,
  'Hepatonet': nan,
  'EHMNID': nan,
  'SMILES': nan,
  'INCHI2': nan,
  'CC_ID': nan,
  'Stereoisomer Information of Metabolite Identified': nan,
  'Charge of the Metabolite Identified': nan,
  'CID_ID': nan,
  'PDB (ligand-expo) Experimental Coordinates  File Url': nan,
  'Pub Chem Url': nan,
  'ChEBI Url': nan},
 '10fthf5glu_e': {'Name': '10-Formyltetrahydrofolate-[Glu](5)',
  'Formula': 'C40H45N11O19',
  'Compartment': 'e',
  'KEGG': nan,
  'CHEBI': nan,
  'PubChem': nan,
  'Inchi': nan,
  'Hepatonet': nan,
  'EHMNID': nan,
  'SMILES': nan,
  'INCHI2': nan,
  'CC_ID': nan,
  'Stereoisomer Information of Metabolite Identified': nan,
  'Charge of the Metabolite Identified': nan,
  'CID_ID': nan,
  'PDB (ligand-expo) Experimental Coordinates  File Url': nan,
  'Pub Chem Url': nan,
  'ChEBI Url': nan},
 '10ft

In [56]:
subset[['Name', 'Formula', 'Compartment', 'KEGG','CHEBI', 'PubChem','Inchi', 'Hepatonet', 'EHMNID', 'SMILES',
             'INCHI2','CC_ID','Stereoisomer Information of Metabolite Identified', 'Charge of the Metabolite Identified',
    'CID_ID','PDB (ligand-expo) Experimental Coordinates  File Url', 'Pub Chem Url',
    'ChEBI Url']] = subset['BiGG ID'].apply(lambda x: pd.Series(final_met_dict.get(x, None), dtype=object))

In [59]:
# Update the Compartment column in the final dataset
compartment_labels = {
    'c': 'c - cytosol',
    'l': 'l - lysosome',
    'm': 'm - mitochondria',
    'r': 'r - endoplasmic reticulum',
    'e': 'e - extracellular space',
    'x': 'x - peroxisome/glyoxysome',
    'n': 'n - nucleus',
    'g': 'g - golgi apparatus',
    'im': 'im - intermembrane space of mitochondria',
}
# Values without an entry in the dictionary (including missing ones) are left unchanged
subset['Compartment'] = subset['Compartment'].replace(compartment_labels)

In [60]:
subset.to_excel('../Data/Metabolites/metabolites_recon3d_toadd.xlsx')