# Metabolites
In the first part of this notebook we create a dataframe containing all the available information for the metabolites accounted for in our reconstruction. The resulting dataframe constitutes the **"Metabolites Sheet"** of our reconstruction. In the second part we curate the dataset and identify duplicated metabolites. <br><br>
[1. Generation of Metabolites dataset](#generation) <br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.1 Retrieve a list of all the metabolites from our reconstruction** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.2 Retrieve information from all the metabolites on Recon3D, iCHO2291 and iCHO1766**<br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.3 Add all the metabolites information into our metabolites dataset** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**1.4 Unique metabolite identification** <br><br>
[2. Metabolites Curation](#curation) <br>
&nbsp;&nbsp;&nbsp;&nbsp;**2.1 Update missing information in metabolites dataset from BiGG** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**2.2 Update missing information in metabolites dataset from other databases** <br>
&nbsp;&nbsp;&nbsp;&nbsp;**2.3 Identification of duplicated metabolites** <br>

<a id='generation'></a>
## 1. Generation of Metabolites dataset
We start by creating a list of all the metabolites included in the reactions of our reconstruction (1). Then we build a dataset containing all the metabolite information from the Recon3D, iCHO2291 and iCHO1766 models, including supplementary information from Recon3D (2). Next, we map this information back onto the metabolites of our reconstruction and generate an Excel file for uploading to Google Sheets (3). Finally, we estimate how many duplicated metabolites we have in our dataset by counting repeated values in different identifiers (4).

In [None]:
# Import libraries
import gspread
import pandas as pd
import numpy as np
import requests
import time

import cobra
from cobra import Model
from cobra.io import read_sbml_model

from tqdm.notebook import tqdm

from google_sheet import GoogleSheet
from utils import df_to_dict

### 1.1 Retrieve a list of all the metabolites from our reconstruction
The list of all the reactions and the metabolites involved are in the Rxns Sheet in the Google Sheet.

In [None]:
KEY_FILE_PATH = 'credentials.json'
SPREADSHEET_ID = '1MlBXeHIKw8k8fZyXm-sN__AHTRSunJxar_-bqvukZws'

# Initialize the GoogleSheet object
sheet = GoogleSheet(SPREADSHEET_ID, KEY_FILE_PATH)

# Read data from the Google Sheet and create the "rxns" df
sheet_rxns = 'Rxns'
rxns = sheet.read_google_sheet(sheet_rxns)

In [None]:
# Create a cobra model to identify the metabolites involved in our reconstruction
model = cobra.Model("iCHOxxxx")
lr = []

for _, row in rxns.iterrows():
    r = cobra.Reaction(row['Reaction'])
    lr.append(r)
    
model.add_reactions(lr)
model

In [None]:
# With the built in function "build_reaction_from_string" we can identify the metabolites
for i,r in enumerate(tqdm(model.reactions)):
    # "rxns" is the dataframe read from the Rxns sheet; reactions were added in row order
    r.build_reaction_from_string(rxns['Reaction Formula'].iloc[i])
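
To make the idea behind this step concrete, here is a minimal plain-Python sketch of pulling metabolite IDs out of a reaction string. It is not how cobra implements `build_reaction_from_string`. It assumes that tokens are separated by whitespace, that `+` separates species, and that the arrows are among the tokens listed in `ARROW_TOKENS`. The function name is hypothetical.

```python
import re

# Tokens treated as reaction arrows (an assumption; adjust to your formula style).
ARROW_TOKENS = {"-->", "<--", "<=>", "->", "<-", "<->"}
# A bare number such as "2" or "0.5" is read as a stoichiometric coefficient.
COEFFICIENT = re.compile(r"\d+(\.\d+)?")


def metabolites_in_formula(formula):
    """Return the metabolite IDs in a reaction string, in order of first appearance."""
    ids = []
    for token in formula.split():
        if token == "+" or token in ARROW_TOKENS:
            continue  # separators carry no metabolite information
        if COEFFICIENT.fullmatch(token):
            continue  # skip stoichiometric coefficients
        if token not in ids:
            ids.append(token)
    return ids


print(metabolites_in_formula("2 h2o_c + atp_c --> adp_c + pi_c + h_c"))
```

Letting cobra parse the string, as the notebook does, has the advantage that the resulting metabolites are registered in the model and can be listed through `model.metabolites`.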

In [None]:
# We first create a list of the metabolites and then a pandas df with it
metabolites_list = []
for met in model.metabolites:
    metabolites_list.append(met.id)
    
metabolites = pd.DataFrame(metabolites_list, columns =['BiGG ID'])
metabolites

### 1.2 Retrieve information from all the metabolites on Recon3D, iCHO2291 and iCHO1766
We use two data sources for this. First, we read the Recon3D, iCHO2291 and iCHO1766 SBML files, from which we get each metabolite's ID, Name, Formula and Compartment. We then add the metadata available for these metabolites in the Recon3D supplementary files.

In [None]:
# read the Recon3D model
recon3d_model = read_sbml_model('../Data/GPR_Curation/Recon3D.xml')

In [None]:
# Generate a dataset containing all the metabolites, chemical formula of each metabolite and compartment
num_rows = len(recon3d_model.metabolites)
recon3d_model_metabolites = pd.DataFrame(index=range(num_rows), columns=['BiGG ID', 'Name', 'Formula', 'Compartment'])
for i,met in enumerate(recon3d_model.metabolites):
    id_ = met.id
    name = met.name
    formula = met.formula
    comp = met.compartment
    recon3d_model_metabolites.iloc[i] = [id_, name, formula, comp]

In [None]:
recon3d_model_metabolites

In [None]:
# Read Yeo's model (iCHO2291)
iCHO2291_model = read_sbml_model('../Data/Reconciliation/models/iCHO2291.xml')

In [None]:
# Generate a dataset containing all the metabolites, chemical formula of each metabolite and compartment from Yeo's model
num_rows = len(iCHO2291_model.metabolites)
iCHO2291_model_metabolites = pd.DataFrame(index=range(num_rows), columns=['BiGG ID', 'Name', 'Formula', 'Compartment'])
for i,met in enumerate(iCHO2291_model.metabolites):
    id_ = met.id
    name = met.name
    formula = met.formula
    comp = met.compartment
    iCHO2291_model_metabolites.iloc[i] = [id_, name, formula, comp]
    
iCHO2291_model_metabolites['BiGG ID'] = iCHO2291_model_metabolites['BiGG ID'].str.replace("[", "_", regex=False)
iCHO2291_model_metabolites['BiGG ID'] = iCHO2291_model_metabolites['BiGG ID'].str.replace("]", "", regex=False)
iCHO2291_model_metabolites
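
The two `str.replace` calls above turn bracketed compartment IDs such as `h2o[c]` into the underscore form `h2o_c` used elsewhere in this notebook. As a minimal sketch of the same normalization on single strings, the helper below (hypothetical name) rewrites only a trailing bracketed compartment. That is slightly stricter than the calls above, which replace every bracket in the ID.

```python
import re

# Matches a trailing bracketed compartment such as "[c]" or "[im]".
_BRACKET_SUFFIX = re.compile(r"\[([A-Za-z]+)\]$")


def normalize_met_id(met_id):
    """Convert 'h2o[c]' to 'h2o_c'; IDs without a bracket suffix are returned unchanged."""
    return _BRACKET_SUFFIX.sub(r"_\1", met_id)


for raw in ["h2o[c]", "atp[im]", "h2o_c"]:
    print(raw, "->", normalize_met_id(raw))
```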

In [None]:
# read Hefzi's model
iCHO1766_model = read_sbml_model('../Data/Reconciliation/models/iCHOv1_final.xml')

In [None]:
# Generate a dataset containing all the metabolites, chemical formula of each metabolite and compartment from Hefzi's model
num_rows = len(iCHO1766_model.metabolites)
iCHO1766_model_metabolites = pd.DataFrame(index=range(num_rows), columns=['BiGG ID', 'Name', 'Formula', 'Compartment'])
for i,met in enumerate(iCHO1766_model.metabolites):
    id_ = met.id
    name = met.name
    formula = met.formula
    comp = met.compartment
    iCHO1766_model_metabolites.iloc[i] = [id_, name, formula, comp]

iCHO1766_model_metabolites

In [None]:
models_metabolites = pd.concat([recon3d_model_metabolites, iCHO2291_model_metabolites, iCHO1766_model_metabolites])
models_metabolites = models_metabolites.groupby('BiGG ID').first()
models_metabolites = models_metabolites.reset_index(drop = False)
models_metabolites
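
Note that `groupby('BiGG ID').first()` keeps, for each column, the first non-null value within each group. Because Recon3D is concatenated first, its values take precedence and the CHO models only fill gaps. The plain-Python sketch below illustrates this precedence rule on small made-up records. The function name and the example data are hypothetical, and unlike `groupby` it does not sort the IDs.

```python
def merge_first_non_null(*sources):
    """Merge record lists by 'BiGG ID', keeping the first non-None value per field."""
    merged = {}
    for records in sources:
        for rec in records:
            target = merged.setdefault(rec["BiGG ID"], {})
            for key, value in rec.items():
                # Only fill fields that are still missing or None
                if target.get(key) is None:
                    target[key] = value
    return merged


# Made-up example records: the first source lacks a formula for h2o_c
first_source = [{"BiGG ID": "h2o_c", "Name": "Water", "Formula": None}]
second_source = [
    {"BiGG ID": "h2o_c", "Name": "H2O", "Formula": "H2O"},
    {"BiGG ID": "atp_c", "Name": "ATP", "Formula": "C10H12N5O13P3"},
]
merged = merge_first_non_null(first_source, second_source)
print(merged["h2o_c"])
```

In this example the name comes from the first source and the formula from the second. An empty string is not null, so a model that stores `''` instead of a missing value would still win over a later, populated entry.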

In [None]:
#Generation of a dataset containing all the information from Recon3D metabolites Supplementary Data.
recon3d_metabolites_meta = pd.read_excel('../Data/Metabolites/metabolites.recon3d.xlsx', header = 0)
recon3d_metabolites_meta['BiGG ID'] = recon3d_metabolites_meta['BiGG ID'].str.replace("[", "_", regex=False)
recon3d_metabolites_meta['BiGG ID'] = recon3d_metabolites_meta['BiGG ID'].str.replace("]", "", regex=False)
recon3d_metabolites_meta

In [None]:
# Transformation of the "recon3d_metabolites_meta" into a dict to map it into "models_metabolites"
recon3dmet_dict = df_to_dict(recon3d_metabolites_meta, 'BiGG ID')

In [None]:
# Mapping the Recon3D metadata into the "models_metabolites" dataset
models_metabolites[['KEGG','CHEBI', 'PubChem','Inchi', 'Hepatonet', 'EHMNID', 'SMILES', 'INCHI2',
                          'CC_ID','Stereoisomer Information of Metabolite Identified', 'Charge of the Metabolite Identified',
    'CID_ID','PDB (ligand-expo) Experimental Coordinates  File Url', 'Pub Chem Url',
    'ChEBI Url']] = models_metabolites['BiGG ID'].apply(lambda x: pd.Series(recon3dmet_dict.get(x, None), dtype=object))

In [None]:
models_metabolites

In [None]:
# Transform the final Recon3D Metabolites dataset into a dictionary to map it into our dataset
final_met_dict = df_to_dict(models_metabolites, 'BiGG ID')

### 1.3 Add all the metabolites information into our metabolites dataset
With the dictionary created in **Section 1.2** we can now map the information onto the metabolites dataset created in **Section 1.1**, which contains all the metabolites of our reconstruction.
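
As an illustration of this lookup-based mapping, here is a minimal plain-Python sketch. The output format of the project helper `df_to_dict` is not shown in this notebook, so the sketch assumes a dictionary of dictionaries keyed by BiGG ID. The function name and the sample data are hypothetical.

```python
def annotate(ids, lookup, fields):
    """Build one row per metabolite ID, filling `fields` from `lookup` when available."""
    rows = []
    for met_id in ids:
        info = lookup.get(met_id, {})  # unknown IDs get an empty record
        row = {"BiGG ID": met_id}
        for field in fields:
            row[field] = info.get(field)  # None marks missing information
        rows.append(row)
    return rows


lookup = {"h2o_c": {"Name": "Water", "Formula": "H2O"}}
rows = annotate(["h2o_c", "xyz_c"], lookup, ["Name", "Formula"])
for row in rows:
    print(row)
```

Metabolites that are absent from all three models keep `None` in every field, which is what makes them easy to find during the curation in the second part of the notebook.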

In [None]:
metabolites[['Name', 'Formula', 'Compartment', 'KEGG','CHEBI', 'PubChem','Inchi', 'Hepatonet', 'EHMNID', 'SMILES',
             'INCHI2','CC_ID','Stereoisomer Information of Metabolite Identified', 'Charge of the Metabolite Identified',
    'CID_ID','PDB (ligand-expo) Experimental Coordinates  File Url', 'Pub Chem Url',
    'ChEBI Url']] = metabolites['BiGG ID'].apply(lambda x: pd.Series(final_met_dict.get(x, None), dtype=object))

In [None]:
# Update the Compartment column in the final dataset
compartment_names = {
    'c': 'c - cytosol',
    'l': 'l - lysosome',
    'm': 'm - mitochondria',
    'r': 'r - endoplasmic reticulum',
    'e': 'e - extracellular space',
    'x': 'x - peroxisome/glyoxysome',
    'n': 'n - nucleus',
    'g': 'g - golgi apparatus',
    'im': 'im - intermembrane space of mitochondria',
}
# Only exact matches are replaced; any other value is left unchanged
metabolites['Compartment'] = metabolites['Compartment'].replace(compartment_names)

In [None]:
# The dataset generated is stored as an Excel file in the "Data" folder
metabolites.to_excel('../Data/Metabolites/metabolites.xlsx')

### 1.4 Unique metabolite identification
This next block of code gives us an idea of how many duplicated metabolites we have in our generated dataset based on the IDs, Name, Formula and KEGG IDs.
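
Keep in mind that `len(column) - len(column.unique())` counts every repeated value, including repeated empty cells, so sparsely filled columns such as KEGG can report inflated numbers. The sketch below shows the same count in plain Python with an option to ignore blanks. The function name and the sample values are hypothetical.

```python
from collections import Counter


def count_duplicates(values, ignore_blank=True):
    """Count entries that repeat an earlier value (total entries minus distinct values)."""
    if ignore_blank:
        values = [v for v in values if v not in ("", None)]
    counts = Counter(values)
    return sum(n - 1 for n in counts.values())


kegg_ids = ["C00001", "C00001", "", "", ""]
print(count_duplicates(kegg_ids))                      # blanks ignored
print(count_duplicates(kegg_ids, ignore_blank=False))  # blanks counted as duplicates
```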

In [None]:
##### ----- Generate datasets from Google Sheet ----- #####

#Credential file
KEY_FILE_PATH = 'credentials.json'

# #CHO Network Reconstruction + Recon3D_v2 Google Sheet ID
# SPREADSHEET_ID = '1MlBXeHIKw8k8fZyXm-sN__AHTRSunJxar_-bqvukZws'

#CHO Network Reconstruction + Recon3D_v3 Google Sheet ID
SPREADSHEET_ID = '1MlBXeHIKw8k8fZyXm-sN__AHTRSunJxar_-bqvukZws'

# Initialize the GoogleSheet object
sheet = GoogleSheet(SPREADSHEET_ID, KEY_FILE_PATH)

# Read data from the Google Sheet
sheet_met = 'Metabolites'
metabolites = sheet.read_google_sheet(sheet_met)

In [None]:
print("Duplicated metabolites by BiGG ID = ", len(metabolites['BiGG ID']) - len(metabolites['BiGG ID'].unique()))
print("Duplicated metabolites by Name = ", len(metabolites['Name']) - len(metabolites['Name'].unique()))
print("Duplicated metabolites by Formula = ", len(metabolites['Formula']) - len(metabolites['Formula'].unique()))
print("Duplicated metabolites by KEGG = ", len(metabolites['KEGG']) - len(metabolites['KEGG'].unique()))

<a id='curation'></a>
## 2. Metabolites Curation
In this second part of the notebook we fill in missing information in the metabolites dataset generated above. Since many metabolites have been manually curated in the "Metabolites" Google Sheet, we generate a new dataframe using the gspread library so that we work on the metabolites dataset with all the changes.

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import time
import requests
from bs4 import BeautifulSoup

from tqdm.notebook import tqdm

from google_sheet import GoogleSheet
from metabolite_identifiers import getPubchemCID, getChEMBLID, getCIDSmilesInChI, getCIDFormula

### 2.1 Update missing information in metabolites dataset from BiGG

In [None]:
#Generate the "metabolites" dataset from our Google Sheet file

#Credential file
KEY_FILE_PATH = 'credentials.json'

#CHO Network Reconstruction + Recon3D_v3 Google Sheet ID
SPREADSHEET_ID = '1MlBXeHIKw8k8fZyXm-sN__AHTRSunJxar_-bqvukZws'

# Initialize the GoogleSheet object
sheet = GoogleSheet(SPREADSHEET_ID, KEY_FILE_PATH)

# Read data from the Google Sheet
sheet_met = 'Metabolites'
metabolites = sheet.read_google_sheet(sheet_met)

In [None]:
# Get BiGG descriptive names from the BiGG database

# Unknown mets: metabolites without names (copied so later edits do not act on a view of "metabolites")
unkown_mets = metabolites[metabolites['Name'] == ''].copy()

Descriptive_Names = [''] * len(unkown_mets)
Formulae = [''] * len(Descriptive_Names)
Changed = [True] * len(Descriptive_Names)

for Met_Counter, metID in enumerate(tqdm(unkown_mets['BiGG ID'])):
    # Strip the compartment suffix (e.g. "_c" or "_im") to get the universal BiGG ID
    input_str = metID.rsplit('_', 1)[0]
    response = requests.get(f"http://bigg.ucsd.edu/universal/metabolites/{input_str}")
    time.sleep(1)  # pause between requests to avoid overloading the BiGG server
    # Check if the request was successful
    if response.status_code != 200:
        D_Name = "BiGG ID not found in BiGG"
        Formulae_B = "BiGG ID not found in BiGG"
        Changed[Met_Counter] = False
    else:
        soup = BeautifulSoup(response.content, 'html.parser')
        N_Header = soup.find('h4', string='Descriptive name:')
        N_Formulae = soup.find('h4', string='Formulae in BiGG models: ')
        # find() returns None when a header is missing, so check before reading the sibling <p>
        name_tag = N_Header.find_next_sibling('p') if N_Header else None
        formula_tag = N_Formulae.find_next_sibling('p') if N_Formulae else None
        D_Name = name_tag.text if name_tag else "Name not found in BiGG"
        Formulae_B = formula_tag.text if formula_tag else "Formula not found in BiGG"
    Descriptive_Names[Met_Counter] = D_Name
    Formulae[Met_Counter] = Formulae_B
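
The BiGG lookup above needs the universal metabolite ID, that is, the ID without its compartment suffix. As a minimal sketch, the helper below (hypothetical name) removes the suffix only when it is one of the compartment codes listed in Section 1.3, so two-letter codes such as `im` are handled and IDs without a compartment are left untouched.

```python
# Compartment codes taken from the list used in Section 1.3
KNOWN_COMPARTMENTS = {"c", "l", "m", "r", "e", "x", "n", "g", "im"}


def universal_id(met_id):
    """Return the BiGG ID without its trailing '_<compartment>' suffix, if it has one."""
    base, sep, suffix = met_id.rpartition("_")
    if sep and suffix in KNOWN_COMPARTMENTS:
        return base
    return met_id


for met_id in ["13_cis_retn_c", "h_im", "glc__D"]:
    print(met_id, "->", universal_id(met_id))
```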

In [None]:
for Met_Counter, metID in enumerate(unkown_mets['BiGG ID']):
    print('before',unkown_mets['BiGG ID'].iloc[Met_Counter])
    print('before',unkown_mets['Formula'].iloc[Met_Counter])
    print('before',unkown_mets['Name'].iloc[Met_Counter])
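    # Assign through a single .iloc call to avoid chained assignment, which may silently fail
    if unkown_mets['Formula'].iloc[Met_Counter] == '':
        unkown_mets.iloc[Met_Counter, unkown_mets.columns.get_loc('Formula')] = Formulae[Met_Counter]
    unkown_mets.iloc[Met_Counter, unkown_mets.columns.get_loc('Name')] = Descriptive_Names[Met_Counter]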
    print('..............................................')
    print('after',unkown_mets['BiGG ID'].iloc[Met_Counter])
    print('after',unkown_mets['Formula'].iloc[Met_Counter])
    print('after',unkown_mets['Name'].iloc[Met_Counter])
    print('..............................................')
    print('..............................................')
    print('..............................................')

In [None]:
metabolites.update(unkown_mets)

# Manual Curation
for bigg_id in metabolites['BiGG ID']:
    # xtra = Xanthurenic acid; C10H6NO4
    # http://bigg.ucsd.edu/models/iCHOv1/reactions/r0647
    if 'xtra' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'Xanthurenic acid'
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Formula'] = 'C10H6NO4'
    # chedxch = Bilirubin-monoglucuronoside; C39H42N4O122-
    # Reactions name = 'ATP-binding Cassette (ABC) TCDB:3.A.1.208.2' --> https://metabolicatlas.org/identifier/TCDB/3.A.1.208.2
    elif 'chedxch' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'Bilirubin-monoglucuronoside'
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Formula'] = 'C39H42N4O122-'
    # Name suggested by ChatGPT
    elif '3hoc246_6Z_9Z_12Z_15Z_18Z_21Zcoa' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a 24-carbon fatty acid with six double bonds, with the location of the double bonds specified by the numbers and Zs'
    # Name suggested by ChatGPT
    elif 'c247_2Z_6Z_9Z_12Z_15Z_18Z_21Zcoa' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a modified version of the same 24-carbon fatty acid, with a hydroxyl group added at the third carbon position'
    # Name suggested by ChatGPT
    elif '3hoc143_5Z_8Z_11Zcoa' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a 14-carbon fatty acid with three double bonds, with the location of the double bonds specified by the numbers and Zs.'
    # Name suggested by ChatGPT
    elif '3oc143_5Z_8Z_11Zcoa' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'CoA molecule that has a modified version of the same 14-carbon fatty acid, with the hydroxyl group removed and one of the double bonds converted to a keto group'
    # Name suggested by ChatGPT
    elif 'acgalgalacglcgalgluside' in bigg_id:
        metabolites.loc[metabolites['BiGG ID'] == bigg_id, 'Name'] = 'Complex glycosphingolipid that contains multiple sugar residues'

    # 12e8hdx: could not be identified
    # hdxur Dead End

metabolites.to_excel('../Data/Metabolites/metabolites_final.xlsx')

### 2.2 Update missing information in metabolites dataset from other databases
Here we use functions from the `metabolite_identifiers` module to try to fetch InChI, SMILES and database identifiers for all the metabolites in our reconstruction.

In [2]:
#Credential file
KEY_FILE_PATH = 'credentials.json'

#CHO Network Reconstruction + Recon3D_v3 Google Sheet ID
SPREADSHEET_ID = '1MlBXeHIKw8k8fZyXm-sN__AHTRSunJxar_-bqvukZws'

# Initialize the GoogleSheet object
sheet = GoogleSheet(SPREADSHEET_ID, KEY_FILE_PATH)

# Read data from the Google Sheet
sheet_met = 'Metabolites'
metabolites = sheet.read_google_sheet(sheet_met)

In [None]:
# Get PubChem IDs using the getPubchemCID() function

counter = 0
no_match = [] #create an empty list with PubChem IDs that don't match with the formulas in the dataset
for i,met in tqdm(metabolites.iterrows(), total=len(metabolites)):
    cmp = met['Name']
    if met['PubChem']=='NaN':
        pubchem_id = getPubchemCID(cmp,'')
         
        if pubchem_id:
            if (len(pubchem_id)>1): #If there is more than 1 PubChem ID, check which one corresponds to our metabolite
                match_found = False
                for _id in pubchem_id:
                    form = getCIDFormula(_id)
                    
                    # Compare the formula obtained from the PubChem ID to the one in our dataset
                    if (form == met['Formula']):
                        match_found = True
                        metabolites.loc[i, 'PubChem'] = _id
                        print('Match found:'+met['BiGG ID'], _id)
                        break # break the loop as we found the match
                        
                if not match_found:  # if no match was found
                    _id = pubchem_id[0]  # take the first ID in the pubchem_id list
                    metabolites.loc[i, 'PubChem'] = _id  
                    print('No match found:'+met['BiGG ID'], pubchem_id)
                    no_match.append([met['BiGG ID'], pubchem_id])
                    
            # If there is only one ID associated to that metabolite        
            else:
                metabolites.loc[i, 'PubChem'] = pubchem_id[0]
                print(met['BiGG ID'], pubchem_id[0])
            counter +=1
            print(counter)
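
The selection rule in the loop above can be isolated into a small function, which makes it easier to test without calling PubChem. This is a minimal sketch: `pick_cid` is a hypothetical name, and `formula_of` stands in for a lookup such as `getCIDFormula`, whose exact return format is assumed here to be a formula string.

```python
def pick_cid(candidates, expected_formula, formula_of):
    """Return (cid, matched): the first candidate whose formula matches,
    otherwise the first candidate with matched=False."""
    if not candidates:
        return None, False
    for cid in candidates:
        if formula_of(cid) == expected_formula:
            return cid, True
    return candidates[0], False  # fall back to the first hit, flagged as unverified


# Made-up formulas keyed by made-up CIDs, used instead of a real PubChem request
fake_formulas = {101: "C6H12O6", 202: "H2O"}
print(pick_cid([101, 202], "H2O", fake_formulas.get))
print(pick_cid([101, 202], "CO2", fake_formulas.get))
```

Returning the `matched` flag alongside the ID keeps the unverified assignments easy to collect, in the same spirit as the `no_match` list above. Unlike the loop above, this sketch also checks the formula when there is only one candidate. The ID it returns is the same, but the flag shows whether the formula agreed.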


In [3]:
# Get the Inchi and SMILES for the metabolites with PubChem IDs retrieved previously

counter = 0
for i,met in metabolites.iterrows():
    if (met['PubChem'] != 'NaN' and (met['Inchi']=='' or met['SMILES']=='')):
        try:
            Inchi_SMILES = getCIDSmilesInChI(met['PubChem'])
            SMILES = Inchi_SMILES[0]
            Inchi = Inchi_SMILES[1]
            
            if met['Inchi']=='':
                metabolites.loc[i, 'Inchi'] = Inchi
            if met['SMILES']=='':
                metabolites.loc[i, 'SMILES'] = SMILES
                
            print(met['BiGG ID'])
            print(SMILES)
            print(Inchi)
            print('............')
        except KeyError:
            print(met['BiGG ID']+' Inchi and SMILES cannot be retrieved')
        
        counter +=1
        print(counter)

13_cis_oretn_c
CC1=C(C(CCC1=O)(C)C)C=CC(=CC=CC(=CC(=O)O)C)C
InChI=1S/C20H26O3/c1-14(7-6-8-15(2)13-19(22)23)9-10-17-16(3)18(21)11-12-20(17,4)5/h6-10,13H,11-12H2,1-5H3,(H,22,23)/b8-6+,10-9+,14-7+,15-13-
............
1
13_cis_oretn_n
CC1=C(C(CCC1=O)(C)C)C=CC(=CC=CC(=CC(=O)O)C)C
InChI=1S/C20H26O3/c1-14(7-6-8-15(2)13-19(22)23)9-10-17-16(3)18(21)11-12-20(17,4)5/h6-10,13H,11-12H2,1-5H3,(H,22,23)/b8-6+,10-9+,14-7+,15-13-
............
2
13_cis_retn_c
CC1=C(C(CCC1)(C)C)C=CC(=CC=CC(=CC(=O)O)C)C
InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)/b9-6+,12-11+,15-8+,16-14-
............
3
13_cis_retn_n
CC1=C(C(CCC1)(C)C)C=CC(=CC=CC(=CC(=O)O)C)C
InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)/b9-6+,12-11+,15-8+,16-14-
............
4
13_cis_retn_r
CC1=C(C(CCC1)(C)C)C=CC(=CC=CC(=CC(=O)O)C)C
InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-20

3ivcoa_m
CC(C)(CC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)(O)OP(=O)(O)OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)O)O
InChI=1S/C26H44N7O18P3S/c1-25(2,20(37)23(38)29-6-5-15(34)28-7-8-55-16(35)9-26(3,4)39)11-48-54(45,46)51-53(43,44)47-10-14-19(50-52(40,41)42)18(36)24(49-14)33-13-32-17-21(27)30-12-31-22(17)33/h12-14,18-20,24,36-37,39H,5-11H2,1-4H3,(H,28,34)(H,29,38)(H,43,44)(H,45,46)(H2,27,30,31)(H2,40,41,42)/t14-,18-,19-,20?,24-/m1/s1
............
24
3m4hpga_c
COC1=C(C=CC(=C1)C(C=O)O)O
InChI=1S/C9H10O4/c1-13-9-4-6(8(12)5-10)2-3-7(9)11/h2-5,8,11-12H,1H3
............
25
3m4hpga_m
COC1=C(C=CC(=C1)C(C=O)O)O
InChI=1S/C9H10O4/c1-13-9-4-6(8(12)5-10)2-3-7(9)11/h2-5,8,11-12H,1H3
............
26
3mox4hoxm_c
COC1=C(C=CC(=C1)C(C(=O)O)O)O
InChI=1S/C9H10O5/c1-14-7-4-5(2-3-6(7)10)8(11)9(12)13/h2-4,8,10-11H,1H3,(H,12,13)
............
27
3ohdcoa_x
CCCCCCCCCCCCCC(=O)CC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)(O)OP(=O)(O)OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)O
InChI=1S/C37H64N7O18P3S/c1-4-5-6-7-8-9-10-

6dhf_l
C1=CC(=CC=C1C(=O)N(C(=O)CCC(C(=O)O)N)C(C(=O)CCC(C(=O)O)N)(C(=O)OC(=O)CCC(C(=O)O)N)C(CC(=O)O)(C(=O)CCC(C(=O)O)N)C(=O)CCC(C(=O)O)N)NCC2=CN=C3C(=N2)C(=O)N=C(N3)N
InChI=1S/C44H54N12O21/c45-21(36(66)67)5-10-26(57)43(15-30(61)62,27(58)11-6-22(46)37(68)69)44(28(59)12-7-23(47)38(70)71,41(76)77-31(63)14-9-25(49)40(74)75)56(29(60)13-8-24(48)39(72)73)35(65)18-1-3-19(4-2-18)51-16-20-17-52-33-32(53-20)34(64)55-42(50)54-33/h1-4,17,21-25,51H,5-16,45-49H2,(H,61,62)(H,66,67)(H,68,69)(H,70,71)(H,72,73)(H,74,75)(H3,50,52,54,55,64)/t21-,22-,23-,24-,25-,44+/m0/s1
............
52
6dhf_m
C1=CC(=CC=C1C(=O)N(C(=O)CCC(C(=O)O)N)C(C(=O)CCC(C(=O)O)N)(C(=O)OC(=O)CCC(C(=O)O)N)C(CC(=O)O)(C(=O)CCC(C(=O)O)N)C(=O)CCC(C(=O)O)N)NCC2=CN=C3C(=N2)C(=O)N=C(N3)N
InChI=1S/C44H54N12O21/c45-21(36(66)67)5-10-26(57)43(15-30(61)62,27(58)11-6-22(46)37(68)69)44(28(59)12-7-23(47)38(70)71,41(76)77-31(63)14-9-25(49)40(74)75)56(29(60)13-8-24(48)39(72)73)35(65)18-1-3-19(4-2-18)51-16-20-17-52-33-32(53-20)34(64)55-42(50)54-33/h1-4,17,

CE0713_x
CCCCCC=CCC=CCCCCCC(=O)CC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)(O)OP(=O)(O)OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)O
InChI=1S/C39H64N7O18P3S/c1-4-5-6-7-8-9-10-11-12-13-14-15-16-17-27(47)22-30(49)68-21-20-41-29(48)18-19-42-37(52)34(51)39(2,3)24-61-67(58,59)64-66(56,57)60-23-28-33(63-65(53,54)55)32(50)38(62-28)46-26-45-31-35(40)43-25-44-36(31)46/h8-9,11-12,25-26,28,32-34,38,50-51H,4-7,10,13-24H2,1-3H3,(H,41,48)(H,42,52)(H,56,57)(H,58,59)(H2,40,43,44)(H2,53,54,55)/b9-8-,12-11-/t28-,32-,33-,34+,38-/m1/s1
............
88
CE0785_m
CCCCCC=CCC=CCCCC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)(O)OP(=O)(O)OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)O
InChI=1S/C35H58N7O17P3S/c1-4-5-6-7-8-9-10-11-12-13-14-15-26(44)63-19-18-37-25(43)16-17-38-33(47)30(46)35(2,3)21-56-62(53,54)59-61(51,52)55-20-24-29(58-60(48,49)50)28(45)34(57-24)42-23-41-27-31(36)39-22-40-32(27)42/h8-9,11-12,22-24,28-30,34,45-46H,4-7,10,13-21H2,1-3H3,(H,37,43)(H,38,47)(H,51,52)(H,53,54)(H2,36,39,40)(H2,48,49,50)/b9-8-,12-1

CE1761_c
CC1=C(C(CCC1O)(C)C)C=CC(=CC=CC(=CCO)C)C
InChI=1S/C20H30O2/c1-15(7-6-8-16(2)12-14-21)9-10-18-17(3)19(22)11-13-20(18,4)5/h6-10,12,19,21-22H,11,13-14H2,1-5H3
............
116
CE1761_r
CC1=C(C(CCC1O)(C)C)C=CC(=CC=CC(=CCO)C)C
InChI=1S/C20H30O2/c1-15(7-6-8-16(2)12-14-21)9-10-18-17(3)19(22)11-13-20(18,4)5/h6-10,12,19,21-22H,11,13-14H2,1-5H3
............
117
CE1918_c
C1=CC2=C(C=C1O)C(=CN2)CCO
InChI=1S/C10H11NO2/c12-4-3-7-6-11-10-2-1-8(13)5-9(7)10/h1-2,5-6,11-13H,3-4H2
............
118
CE1918_m
C1=CC2=C(C=C1O)C(=CN2)CCO
InChI=1S/C10H11NO2/c12-4-3-7-6-11-10-2-1-8(13)5-9(7)10/h1-2,5-6,11-13H,3-4H2
............
119
CE1925_c
CC1=C(C2=C(CCC(O2)(C)CCC(=O)O)C(=C1O)C)C
InChI=1S/C16H22O4/c1-9-10(2)15-12(11(3)14(9)19)5-7-16(4,20-15)8-6-13(17)18/h19H,5-8H2,1-4H3,(H,17,18)
............
120
CE1925_e
CC1=C(C2=C(CCC(O2)(C)CCC(=O)O)C(=C1O)C)C
InChI=1S/C16H22O4/c1-9-10(2)15-12(11(3)14(9)19)5-7-16(4,20-15)8-6-13(17)18/h19H,5-8H2,1-4H3,(H,17,18)
............
121
CE1925_l
CC1=C(C2=C(CCC(O2)(C)CCC(=O)O)C(=

CE2421_c
CCCCCC=CCC=CCCCCCC(CC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)([O-])OP(=O)([O-])OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)([O-])[O-])O)O
InChI=1S/C39H66N7O18P3S/c1-4-5-6-7-8-9-10-11-12-13-14-15-16-17-27(47)22-30(49)68-21-20-41-29(48)18-19-42-37(52)34(51)39(2,3)24-61-67(58,59)64-66(56,57)60-23-28-33(63-65(53,54)55)32(50)38(62-28)46-26-45-31-35(40)43-25-44-36(31)46/h8-9,11-12,25-28,32-34,38,47,50-51H,4-7,10,13-24H2,1-3H3,(H,41,48)(H,42,52)(H,56,57)(H,58,59)(H2,40,43,44)(H2,53,54,55)/p-4/b9-8-,12-11-/t27-,28+,32-,33-,34?,38+/m0/s1
............
169
CE2445_c
CCCCCC=CCC(C=CC=CC=CC(CCCC(=O)O)O)O
InChI=1S/C20H32O4/c1-2-3-4-5-6-9-13-18(21)14-10-7-8-11-15-19(22)16-12-17-20(23)24/h6-11,14-15,18-19,21-22H,2-5,12-13,16-17H2,1H3,(H,23,24)/b8-7+,9-6-,14-10+,15-11+/t18-,19-/m1/s1
............
170
CE2446_m
CCCCCC=CCC(C=CC=CC=CC(CCCC(=O)O)O)O
InChI=1S/C20H32O4/c1-2-3-4-5-6-9-13-18(21)14-10-7-8-11-15-19(22)16-12-17-20(23)24/h6-11,14-15,18-19,21-22H,2-5,12-13,16-17H2,1H3,(H,23,24)/b8-7+,9-6-,14-10+,1

CE2838_e
C(C1C(C(C(C(O1)OCC(C(C(C(C=O)OC2C(C(C(C(O2)CO)OC3C(C(C(C(O3)CO)O)O)O)OC4C(C(C(C(O4)CO)O)O)O)OC5C(C(C(C(O5)CO)O)O)O)OC6C(C(C(C(O6)CO)O)O)O)OC7C(C(C(C(O7)CO)O)O)O)OC8C(C(C(C(O8)CO)O)O)O)O)O)O)O
InChI=1S/C54H92O46/c55-1-11-21(64)28(71)35(78)47(86-11)85-10-20(95-48-36(79)29(72)22(65)12(2-56)87-48)44(98-51-39(82)32(75)25(68)15(5-59)90-51)42(96-49-37(80)30(73)23(66)13(3-57)88-49)18(8-62)93-54-46(100-53-41(84)34(77)27(70)17(7-61)92-53)45(99-52-40(83)33(76)26(69)16(6-60)91-52)43(19(9-63)94-54)97-50-38(81)31(74)24(67)14(4-58)89-50/h8,11-61,63-84H,1-7,9-10H2/t11-,12-,13-,14-,15-,16-,17-,18+,19-,20-,21-,22-,23-,24-,25-,26-,27-,28+,29+,30+,31+,32+,33+,34+,35-,36-,37-,38-,39-,40-,41-,42-,43-,44-,45+,46-,47+,48-,49-,50-,51-,52-,53-,54+/m1/s1
............
195
CE2839_c
C(C1C(C(C(C(O1)OC2C(OC(C(C2O)O)OC3C(OC(C(C3O)O)OC4C(OC(C(C4O)O)OC5C(OC(C(C5O)O)OC6C(OC(C(C6O)O)OC7C(OC(C(C7O)O)OC8C(OC(C(C8O)O)OC9C(OC(C(C9O)O)OC1C(OC(C(C1O)O)O)CO)CO)CO)CO)CO)CO)CO)CO)CO)O)O)O)O
InChI=1S/C60H102O51/c61-1-11-21

CE2917_c
CC(C)CC(C(=O)NC(CO)C(=O)NC(CC1=CN=CN1)C(=O)NC(CCCCN)C(=O)NCC(=O)N2CCCC2C(=O)NC(CCSC)C(=O)N3CCCC3C(=O)NC(CC4=CC=CC=C4)C(=O)O)NC(=O)C(CCCN=C(N)N)NC(=O)C5CCCN5C(=O)C(CCCN=C(N)N)NC(=O)C(CCC(=O)N)N
InChI=1S/C69H111N23O16S/c1-39(2)32-47(86-58(98)44(17-9-26-78-68(73)74)83-63(103)52-20-12-29-91(52)65(105)45(18-10-27-79-69(75)76)84-56(96)42(71)22-23-54(72)94)59(99)89-50(37-93)61(101)87-48(34-41-35-77-38-81-41)60(100)82-43(16-7-8-25-70)57(97)80-36-55(95)90-28-11-19-51(90)62(102)85-46(24-31-109-3)66(106)92-30-13-21-53(92)64(104)88-49(67(107)108)33-40-14-5-4-6-15-40/h4-6,14-15,35,38-39,42-53,93H,7-13,16-34,36-37,70-71H2,1-3H3,(H2,72,94)(H,77,81)(H,80,97)(H,82,100)(H,83,103)(H,84,96)(H,85,102)(H,86,98)(H,87,101)(H,88,104)(H,89,99)(H,107,108)(H4,73,74,78)(H4,75,76,79)/t42-,43+,44-,45-,46+,47-,48-,49+,50-,51+,52+,53-/m0/s1
............
217
CE2917_e
CC(C)CC(C(=O)NC(CO)C(=O)NC(CC1=CN=CN1)C(=O)NC(CCCCN)C(=O)NCC(=O)N2CCCC2C(=O)NC(CCSC)C(=O)N3CCCC3C(=O)NC(CC4=CC=CC=C4)C(=O)O)NC(=O)C(CCCN=C(N)N)NC

CE4898_c
CC1=C(C=C2CCC(OC2=C1C)(C)CCCC(C)CCCC(C)CCCC(C)C(=O)O)O
InChI=1S/C28H46O4/c1-19(12-8-14-21(3)27(30)31)10-7-11-20(2)13-9-16-28(6)17-15-24-18-25(29)22(4)23(5)26(24)32-28/h18-21,29H,7-17H2,1-6H3,(H,30,31)/t19-,20+,21?,28-/m1/s1
............
246
CE4898_r
CC1=C(C=C2CCC(OC2=C1C)(C)CCCC(C)CCCC(C)CCCC(C)C(=O)O)O
InChI=1S/C28H46O4/c1-19(12-8-14-21(3)27(30)31)10-7-11-20(2)13-9-16-28(6)17-15-24-18-25(29)22(4)23(5)26(24)32-28/h18-21,29H,7-17H2,1-6H3,(H,30,31)/t19-,20+,21?,28-/m1/s1
............
247
CE4980_c
CCCCCC(C=CC1C(CC(=O)C1CC=CCCCC(=O)OC(CO)CO)O)O
InChI=1S/C23H38O7/c1-2-3-6-9-17(26)12-13-20-19(21(27)14-22(20)28)10-7-4-5-8-11-23(29)30-18(15-24)16-25/h4,7,12-13,17-20,22,24-26,28H,2-3,5-6,8-11,14-16H2,1H3/b7-4-,13-12+/t17-,19-,20-,22-/m0/s1
............
248
CE4987_c
CCCCCC=CCC(CCC=CC=CC(CCCC(=O)O)O)O
InChI=1S/C20H34O4/c1-2-3-4-5-6-9-13-18(21)14-10-7-8-11-15-19(22)16-12-17-20(23)24/h6-9,11,15,18-19,21-22H,2-5,10,12-14,16-17H2,1H3,(H,23,24)/b8-7+,9-6-,15-11-/t18-,19-/m0/s1
............
24

CE5158_c
CCCCCCCCC=CCCCCCCCCCCCC=CC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)([O-])OP(=O)([O-])OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)([O-])[O-])O
InChI=1S/C45H78N7O17P3S/c1-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-36(54)73-29-28-47-35(53)26-27-48-43(57)40(56)45(2,3)31-66-72(63,64)69-71(61,62)65-30-34-39(68-70(58,59)60)38(55)44(67-34)52-33-51-37-41(46)49-32-50-42(37)52/h11-12,24-25,32-34,38-40,44,55-56H,4-10,13-23,26-31H2,1-3H3,(H,47,53)(H,48,57)(H,61,62)(H,63,64)(H2,46,49,50)(H2,58,59,60)/p-4/b12-11-,25-24+/t34-,38-,39-,40+,44-/m1/s1
............
271
CE5162_c
CCCCCCCCC=CCCCCCC=CC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)([O-])OP(=O)([O-])OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)([O-])[O-])O
InChI=1S/C39H66N7O17P3S/c1-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-30(48)67-23-22-41-29(47)20-21-42-37(51)34(50)39(2,3)25-60-66(57,58)63-65(55,56)59-24-28-33(62-64(52,53)54)32(49)38(61-28)46-27-45-31-35(40)43-26-44-36(31)46/h11-12,18-19,26-28,32-34,38,49-50H,4-10,13-17,20-25H2,1-3H3,

CE5244_r
CC12CCC3C(C1CCC2=O)CCC4=C(C(=C(C=C34)SCC(C(=O)NCC(=O)O)NC(=O)CCC(C(=O)O)N)O)O
InChI=1S/C28H37N3O9S/c1-28-9-8-13-14(17(28)4-6-21(28)32)2-3-15-16(13)10-20(25(37)24(15)36)41-12-19(26(38)30-11-23(34)35)31-22(33)7-5-18(29)27(39)40/h10,13-14,17-19,36-37H,2-9,11-12,29H2,1H3,(H,30,38)(H,31,33)(H,34,35)(H,39,40)/t13?,14?,17?,18-,19-,28+/m1/s1
............
291
CE5244_x
CC12CCC3C(C1CCC2=O)CCC4=C(C(=C(C=C34)SCC(C(=O)NCC(=O)O)NC(=O)CCC(C(=O)O)N)O)O
InChI=1S/C28H37N3O9S/c1-28-9-8-13-14(17(28)4-6-21(28)32)2-3-15-16(13)10-20(25(37)24(15)36)41-12-19(26(38)30-11-23(34)35)31-22(33)7-5-18(29)27(39)40/h10,13-14,17-19,36-37H,2-9,11-12,29H2,1H3,(H,30,38)(H,31,33)(H,34,35)(H,39,40)/t13?,14?,17?,18-,19-,28+/m1/s1
............
292
CE5251_c
CC12CCC3C(C1CCC2=O)CCC4=C3C=CC(=O)C4=O
InChI=1S/C18H20O3/c1-18-9-8-11-10-4-6-15(19)17(21)13(10)3-2-12(11)14(18)5-7-16(18)20/h4,6,11-12,14H,2-3,5,7-9H2,1H3/t11-,12-,14+,18+/m1/s1
............
293
CE5251_m
CC12CCC3C(C1CCC2=O)CCC4=C3C=CC(=O)C4=O
InChI=1S/C18H20O3/c1-18-

CE5786_e
CCC(C)C(C(=O)NC(C)C(=O)NC(CCCN=C(N)N)C(=O)NC(CCCN=C(N)N)C(=O)NC(CC1=CN=CN1)C(=O)N2CCCC2C(=O)NC(CC3=CC=C(C=C3)O)C(=O)NC(CC4=CC=CC=C4)C(=O)NC(CC(C)C)C(=O)O)N
InChI=1S/C56H85N17O11/c1-6-32(4)45(57)52(81)66-33(5)46(75)67-38(15-10-22-63-55(58)59)47(76)68-39(16-11-23-64-56(60)61)48(77)71-42(28-36-29-62-30-65-36)53(82)73-24-12-17-44(73)51(80)70-41(27-35-18-20-37(74)21-19-35)49(78)69-40(26-34-13-8-7-9-14-34)50(79)72-43(54(83)84)25-31(2)3/h7-9,13-14,18-21,29-33,38-45,74H,6,10-12,15-17,22-28,57H2,1-5H3,(H,62,65)(H,66,81)(H,67,75)(H,68,76)(H,69,78)(H,70,80)(H,71,77)(H,72,79)(H,83,84)(H4,58,59,63)(H4,60,61,64)/t32-,33-,38-,39-,40-,41-,42-,43-,44-,45-/m0/s1
............
330
CE5787_e
CCC(C)C(C(=O)NC(C)C(=O)NC(CCCN=C(N)N)C(=O)O)N
InChI=1S/C15H30N6O4/c1-4-8(2)11(16)13(23)20-9(3)12(22)21-10(14(24)25)6-5-7-19-15(17)18/h8-11H,4-7,16H2,1-3H3,(H,20,23)(H,21,22)(H,24,25)(H4,17,18,19)/t8-,9-,10+,11-/m0/s1
............
331
CE5788_c
CCC(C)C(C(=O)NC(C)C(=O)NC(CCCN=C(N)N)C(=O)NC(CCCN=C(N)N)C(=O)NC(CC1=C

CE5843_c
CC1=C(C2=C(CCC(O2)(C)CCCC(C)CCCC(C)CCCC(C)C(=O)O)C(=C1O)C)C
InChI=1S/C29H48O4/c1-19(13-9-15-21(3)28(31)32)11-8-12-20(2)14-10-17-29(7)18-16-25-24(6)26(30)22(4)23(5)27(25)33-29/h19-21,30H,8-18H2,1-7H3,(H,31,32)/t19-,20+,21?,29+/m1/s1
............
349
CE5843_r
CC1=C(C2=C(CCC(O2)(C)CCCC(C)CCCC(C)CCCC(C)C(=O)O)C(=C1O)C)C
InChI=1S/C29H48O4/c1-19(13-9-15-21(3)28(31)32)11-8-12-20(2)14-10-17-29(7)18-16-25-24(6)26(30)22(4)23(5)27(25)33-29/h19-21,30H,8-18H2,1-7H3,(H,31,32)/t19-,20+,21?,29+/m1/s1
............
350
CE5860_c
CC(=O)NCCC(=O)C1=C(C=CC(=C1)OC)N
InChI=1S/C12H16N2O3/c1-8(15)14-6-5-12(16)10-7-9(17-2)3-4-11(10)13/h3-4,7H,5-6,13H2,1-2H3,(H,14,15)
............
351
CE5865_c
CC(C)CC(C(=O)NC(CO)C(=O)O)N
InChI=1S/C9H18N2O4/c1-5(2)3-6(10)8(13)11-7(4-12)9(14)15/h5-7,12H,3-4,10H2,1-2H3,(H,11,13)(H,14,15)/t6-,7+/m1/s1
............
352
CE5866_c
CC(C)CC(C(=O)O)NC(=O)C(C)N
InChI=1S/C9H18N2O3/c1-5(2)4-7(9(13)14)11-8(12)6(3)10/h5-7H,4,10H2,1-3H3,(H,11,12)(H,13,14)/t6-,7+/m1/s1
............
353
CE5

CE6182_x
CC(C)(COP(=O)([O-])OP(=O)([O-])OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)([O-])[O-])C(C(=O)NCCC(=O)NCCSC(=O)CCCCC=CCC(C=CC=CC=CC(CCCC(=O)O)O)O)O
InChI=1S/C41H64N7O21P3S/c1-41(2,24-66-72(63,64)69-71(61,62)65-23-29-35(68-70(58,59)60)34(55)40(67-29)48-26-47-33-37(42)45-25-46-38(33)48)36(56)39(57)44-20-19-30(51)43-21-22-73-32(54)18-11-5-3-4-8-13-27(49)14-9-6-7-10-15-28(50)16-12-17-31(52)53/h4,6-10,14-15,25-29,34-36,40,49-50,55-56H,3,5,11-13,16-24H2,1-2H3,(H,43,51)(H,44,57)(H,52,53)(H,61,62)(H,63,64)(H2,42,45,46)(H2,58,59,60)/p-4/b7-6+,8-4-,14-9+,15-10-/t27-,28-,29+,34-,35-,36?,40+/m0/s1
............
377
CE6183_c
CC(C)(COP(=O)([O-])OP(=O)([O-])OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)([O-])[O-])C(C(=O)NCCC(=O)NCCSC(=O)CC(=O)CCC=CCC(C=CC=CC=CC(CCCC(=O)O)O)O)O
InChI=1S/C41H62N7O22P3S/c1-41(2,23-67-73(64,65)70-72(62,63)66-22-29-35(69-71(59,60)61)34(56)40(68-29)48-25-47-33-37(42)45-24-46-38(33)48)36(57)39(58)44-18-17-30(52)43-19-20-74-32(55)21-28(51)14-9-5-8-13-26(49)11-6-3-4-7-12-27(

CE6228_x
CC(C)(COP(=O)([O-])OP(=O)([O-])OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)([O-])[O-])C(C(=O)NCCC(=O)NCCSC(=O)CC=CC=CC=CC=CC(C(CCCC(=O)O)O)SCC(C(=O)O)N)O
InChI=1S/C40H61N8O22P3S2/c1-40(2,21-67-73(64,65)70-72(62,63)66-19-26-33(69-71(59,60)61)32(54)38(68-26)48-23-47-31-35(42)45-22-46-36(31)48)34(55)37(56)44-16-15-28(50)43-17-18-74-30(53)14-9-7-5-3-4-6-8-12-27(75-20-24(41)39(57)58)25(49)11-10-13-29(51)52/h3-9,12,22-27,32-34,38,49,54-55H,10-11,13-21,41H2,1-2H3,(H,43,50)(H,44,56)(H,51,52)(H,57,58)(H,62,63)(H,64,65)(H2,42,45,46)(H2,59,60,61)/p-4/b5-3-,6-4+,9-7+,12-8+/t24-,25+,26-,27-,32+,33+,34?,38-/m1/s1
............
394
CE6232_c
CCCCCC(C=CC1C2CC(C1CCCCCCC(=O)O)OO2)OO
InChI=1S/C20H34O6/c1-2-3-6-9-15(24-23)12-13-17-16(18-14-19(17)26-25-18)10-7-4-5-8-11-20(21)22/h12-13,15-19,23H,2-11,14H2,1H3,(H,21,22)/b13-12+/t15-,16-,17-,18+,19-/m0/s1
............
395
CE6232_r
CCCCCC(C=CC1C2CC(C1CCCCCCC(=O)O)OO2)OO
InChI=1S/C20H34O6/c1-2-3-6-9-15(24-23)12-13-17-16(18-14-19(17)26-25-18)10-7-4-5-8-11-2

CE6508_c
CCC=CCC=CCC=CCC=CC=CC(CCCC(=O)O)OO
InChI=1S/C20H30O4/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-16-19(24-23)17-15-18-20(21)22/h3-4,6-7,9-10,12-14,16,19,23H,2,5,8,11,15,17-18H2,1H3,(H,21,22)
............
419
CE6508_n
CCC=CCC=CCC=CCC=CC=CC(CCCC(=O)O)OO
InChI=1S/C20H30O4/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-16-19(24-23)17-15-18-20(21)22/h3-4,6-7,9-10,12-14,16,19,23H,2,5,8,11,15,17-18H2,1H3,(H,21,22)
............
420
CE7072_c
CC1=C(C=C2CCC(OC2=C1C)(C)CCC=C(C)CCC=C(C)CCC=C(C)C(=O)O)O
InChI=1S/C28H40O4/c1-19(12-8-14-21(3)27(30)31)10-7-11-20(2)13-9-16-28(6)17-15-24-18-25(29)22(4)23(5)26(24)32-28/h10,13-14,18,29H,7-9,11-12,15-17H2,1-6H3,(H,30,31)/b19-10+,20-13+,21-14+/t28-/m1/s1
............
421
CE7072_r
CC1=C(C=C2CCC(OC2=C1C)(C)CCC=C(C)CCC=C(C)CCC=C(C)C(=O)O)O
InChI=1S/C28H40O4/c1-19(12-8-14-21(3)27(30)31)10-7-11-20(2)13-9-16-28(6)17-15-24-18-25(29)22(4)23(5)26(24)32-28/h10,13-14,18,29H,7-9,11-12,15-17H2,1-6H3,(H,30,31)/b19-10+,20-13+,21-14+/t28-/m1/s1
............
422
CE7074_c
CC1=C(C=C2CCC(OC2

CN0012_x
CC(C)NP(=O)(NC(C)C)F
InChI=1S/C6H16FN2OP/c1-5(2)8-11(7,10)9-6(3)4/h5-6H,1-4H3,(H2,8,9,10)
............
453
G00005_c
CC(CCC=C(C)CCC=C(C)CCC=C(C)CCC=C(C)C)CCOP(=O)(O)OP(=O)(O)OC1C(C(C(C(O1)CO)OC2C(C(C(C(O2)CO)OC3C(C(C(C(O3)CO)O)O)O)O)NC(=O)C)O)NC(=O)C
InChI=1S/C47H82N2O22P2/c1-26(2)13-9-14-27(3)15-10-16-28(4)17-11-18-29(5)19-12-20-30(6)21-22-64-72(60,61)71-73(62,63)70-46-37(49-32(8)54)40(57)43(35(25-52)67-46)68-45-36(48-31(7)53)39(56)44(34(24-51)66-45)69-47-42(59)41(58)38(55)33(23-50)65-47/h13,15,17,19,30,33-47,50-52,55-59H,9-12,14,16,18,20-25H2,1-8H3,(H,48,53)(H,49,54)(H,60,61)(H,62,63)/b27-15+,28-17+,29-19-/t30?,33-,34-,35-,36-,37-,38-,39-,40-,41+,42+,43-,44-,45+,46-,47+/m1/s1
............
454
HC02027_l
CCCCCC=CCC=CCC=CCC=CCCCC(=O)OC1CCC2(C3CCC4(C(C3CC=C2C1)CCC4C(C)CCCC(C)C)C)C
InChI=1S/C47H76O2/c1-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-27-45(48)49-40-32-34-46(5)39(36-40)28-29-41-43-31-30-42(38(4)26-24-25-37(2)3)47(43,6)35-33-44(41)46/h11-12,14-15,17-18,20-21,28,37-38

c16dc_c
C[N+](C)(C)CC(CC(=O)[O-])OC(=O)CCCCCCCCCCCCCCC(=O)O
InChI=1S/C23H43NO6/c1-24(2,3)19-20(18-22(27)28)30-23(29)17-15-13-11-9-7-5-4-6-8-10-12-14-16-21(25)26/h20H,4-19H2,1-3H3,(H-,25,26,27,28)
............
481
c3dc_c
C[N+](C)(C)CC(CC(=O)[O-])OC(=O)CC(=O)O
InChI=1S/C10H17NO6/c1-11(2,3)6-7(4-8(12)13)17-10(16)5-9(14)15/h7H,4-6H2,1-3H3,(H-,12,13,14,15)/t7-/m0/s1
............
482
c4crn_c
CCCC(=O)OC(CC(=O)[O-])C[N+](C)(C)C
InChI=1S/C11H21NO4/c1-5-6-11(15)16-9(7-10(13)14)8-12(2,3)4/h9H,5-8H2,1-4H3
............
483
c4crn_e
CCCC(=O)OC(CC(=O)[O-])C[N+](C)(C)C
InChI=1S/C11H21NO4/c1-5-6-11(15)16-9(7-10(13)14)8-12(2,3)4/h9H,5-8H2,1-4H3
............
484
c4crn_m
CCCC(=O)OC(CC(=O)[O-])C[N+](C)(C)C
InChI=1S/C11H21NO4/c1-5-6-11(15)16-9(7-10(13)14)8-12(2,3)4/h9H,5-8H2,1-4H3
............
485
c4crn_x
CCCC(=O)OC(CC(=O)[O-])C[N+](C)(C)C
InChI=1S/C11H21NO4/c1-5-6-11(15)16-9(7-10(13)14)8-12(2,3)4/h9H,5-8H2,1-4H3
............
486
c51crn_c
CC=C(C)C(=O)OC(CC(=O)[O-])C[N+](C)(C)C
InChI=1S/C12H21NO4/c1-6-9(2)12(

damp_m
C1C(C(OC1N2C=NC3=C(N=CN=C32)N)COP(=O)(O)O)O
InChI=1S/C10H14N5O6P/c11-9-8-10(13-3-12-9)15(4-14-8)7-1-5(16)6(21-7)2-20-22(17,18)19/h3-7,16H,1-2H2,(H2,11,12,13)(H2,17,18,19)/t5-,6+,7+/m0/s1
............
519
damp_n
C1C(C(OC1N2C=NC3=C(N=CN=C32)N)COP(=O)(O)O)O
InChI=1S/C10H14N5O6P/c11-9-8-10(13-3-12-9)15(4-14-8)7-1-5(16)6(21-7)2-20-22(17,18)19/h3-7,16H,1-2H2,(H2,11,12,13)(H2,17,18,19)/t5-,6+,7+/m0/s1
............
520
dchac_e
CC(CCC(=O)O)C1CCC2C1(C(CC3C2CCC4C3(CCC(C4)O)C)O)C
InChI=1S/C24H40O4/c1-14(4-9-22(27)28)18-7-8-19-17-6-5-15-12-16(25)10-11-23(15,2)20(17)13-21(26)24(18,19)3/h14-21,25-26H,4-13H2,1-3H3,(H,27,28)/t14-,15-,16-,17+,18-,19+,20+,21+,23+,24-/m1/s1
............
521
dcholcoa_c
CC(CCC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)(O)OP(=O)(O)OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)O)C4CCC5C4(CCC6C5C(CC7C6(CCC(C7)O)C)O)C
InChI=1S/C45H74N7O19P3S/c1-24(27-7-8-28-34-29(11-14-45(27,28)5)44(4)13-10-26(53)18-25(44)19-30(34)54)6-9-33(56)75-17-16-47-32(55)12-15-48-41(59)38(58)43(2,3)21-

digalsgalside_cho_c
C1=CC(=C(C=C1N(O)O)[N+](=O)[O-])N=N
InChI=1S/C6H6N4O4/c7-8-5-2-1-4(9(11)12)3-6(5)10(13)14/h1-3,7,11-12H
............
540
digalsgalside_cho_e
C1=CC(=C(C=C1N(O)O)[N+](=O)[O-])N=N
InChI=1S/C6H6N4O4/c7-8-5-2-1-4(9(11)12)3-6(5)10(13)14/h1-3,7,11-12H
............
541
dlnlcgcrn_c
CCCCCC=CCC=CCC=CCCCCC(=O)O
InChI=1S/C18H30O2/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18(19)20/h6-7,9-10,12-13H,2-5,8,11,14-17H2,1H3,(H,19,20)/b7-6-,10-9-,13-12-
............
542
dlnlcgcrn_m
CCCCCC=CCC=CCC=CCCCCC(=O)O
InChI=1S/C18H30O2/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18(19)20/h6-7,9-10,12-13H,2-5,8,11,14-17H2,1H3,(H,19,20)/b7-6-,10-9-,13-12-
............
543
doco13ac_e
CCCCCCCCC=CCCCCCCCCCCCC(=O)O
InChI=1S/C22H42O2/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22(23)24/h9-10H,2-8,11-21H2,1H3,(H,23,24)/b10-9-
............
544
doco13ecoa_c
CCCCCCCCC=CCCCCCCCCCCCC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)(O)OP(=O)(O)OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)O
InChI=1S/C43H76N7O1

fucgalacglcgalgluside_cho_g
CCCCCCCCCCCCCC=CC(C(COC1C(C(C(C(O1)CO)OC2C(C(C(C(O2)CO)OC3C(C(C(C(O3)CO)O)OC4C(C(C(C(O4)CO)O)O)OC5C(C(C(C(O5)C)O)O)O)NC(=O)C)O)O)O)O)NC=O)O
InChI=1S/C51H90N2O27/c1-4-5-6-7-8-9-10-11-12-13-14-15-16-17-27(60)26(52-23-58)22-71-48-41(69)38(66)44(30(20-56)75-48)78-50-42(70)39(67)43(31(21-57)76-50)77-47-32(53-25(3)59)45(35(63)29(19-55)73-47)79-51-46(37(65)34(62)28(18-54)74-51)80-49-40(68)36(64)33(61)24(2)72-49/h16-17,23-24,26-51,54-57,60-70H,4-15,18-22H2,1-3H3,(H,52,58)(H,53,59)/b17-16+/t24-,26-,27+,28+,29+,30+,31+,32+,33+,34-,35+,36+,37-,38+,39+,40-,41+,42+,43-,44+,45+,46+,47-,48+,49-,50-,51-/m0/s1
............
573
gal_c
C(C1C(C(C(C(O1)O)O)O)O)O
InChI=1S/C6H12O6/c7-1-2-3(8)4(9)5(10)6(11)12-2/h2-11H,1H2/t2-,3+,4+,5-,6+/m1/s1
............
574
gal_e
C(C1C(C(C(C(O1)O)O)O)O)O
InChI=1S/C6H12O6/c7-1-2-3(8)4(9)5(10)6(11)12-2/h2-11H,1H2/t2-,3+,4+,5-,6+/m1/s1
............
575
gal_l
C(C1C(C(C(C(O1)O)O)O)O)O
InChI=1S/C6H12O6/c7-1-2-3(8)4(9)5(10)6(11)12-2/h2-11H,1H2/t2-,3+,4+

hx2coa_x
CCCC=CC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)(O)OP(=O)(O)OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)O
InChI=1S/C27H44N7O17P3S/c1-4-5-6-7-18(36)55-11-10-29-17(35)8-9-30-25(39)22(38)27(2,3)13-48-54(45,46)51-53(43,44)47-12-16-21(50-52(40,41)42)20(37)26(49-16)34-15-33-19-23(28)31-14-32-24(19)34/h6-7,14-16,20-22,26,37-38H,4-5,8-13H2,1-3H3,(H,29,35)(H,30,39)(H,43,44)(H,45,46)(H2,28,31,32)(H2,40,41,42)/b7-6+/t16-,20-,21-,22?,26-/m1/s1
............
613
hyptaur_c
C(CS(=O)O)N
InChI=1S/C2H7NO2S/c3-1-2-6(4)5/h1-3H2,(H,4,5)
............
614
iodine_c
II
InChI=1S/I2/c1-2
............
615
ivcrn_c
CC(C)CC(=O)OC(CC(=O)[O-])C[N+](C)(C)C
InChI=1S/C12H23NO4/c1-9(2)6-12(16)17-10(7-11(14)15)8-13(3,4)5/h9-10H,6-8H2,1-5H3
............
616
ivcrn_e
CC(C)CC(=O)OC(CC(=O)[O-])C[N+](C)(C)C
InChI=1S/C12H23NO4/c1-9(2)6-12(16)17-10(7-11(14)15)8-13(3,4)5/h9-10H,6-8H2,1-5H3
............
617
lac_L_c
CC(C(=O)O)O
InChI=1S/C3H6O3/c1-2(4)3(5)6/h2,4H,1H3,(H,5,6)/t2-/m0/s1
............
618
lac_L_e
CC(C(=O)O)O
InChI=

mn_c
CC(=O)NC1C(C(C(OC1O)CO)OC2C(C(C(C(O2)CO)O)O)O)O
InChI=1S/C14H25NO11/c1-4(18)15-7-9(20)12(6(3-17)24-13(7)23)26-14-11(22)10(21)8(19)5(2-16)25-14/h5-14,16-17,19-23H,2-3H2,1H3,(H,15,18)/t5-,6-,7-,8-,9-,10+,11+,12-,13-,14-/m1/s1
............
658
mn_l
CC(=O)NC1C(C(C(OC1O)CO)OC2C(C(C(C(O2)CO)O)O)O)O
InChI=1S/C14H25NO11/c1-4(18)15-7-9(20)12(6(3-17)24-13(7)23)26-14-11(22)10(21)8(19)5(2-16)25-14/h5-14,16-17,19-23H,2-3H2,1H3,(H,15,18)/t5-,6-,7-,8-,9-,10+,11+,12-,13-,14-/m1/s1
............
659
nac_c
C1=CC(=CN=C1)C(=O)O
InChI=1S/C6H5NO2/c8-6(9)5-2-1-3-7-4-5/h1-4H,(H,8,9)
............
660
nac_e
C1=CC(=CN=C1)C(=O)O
InChI=1S/C6H5NO2/c8-6(9)5-2-1-3-7-4-5/h1-4H,(H,8,9)
............
661
nad_m
C1=CC(=C[N+](=C1)C2C(C(C(O2)COP(=O)(O)OP(=O)(O)OCC3C(C(C(O3)N4C=NC5=C(N=CN=C54)N)O)O)O)O)C(=O)N
InChI=1S/C21H27N7O14P2/c22-17-12-19(25-7-24-17)28(8-26-12)21-16(32)14(30)11(41-21)6-39-44(36,37)42-43(34,35)38-5-10-13(29)15(31)20(40-10)27-3-1-2-9(4-27)18(23)33/h1-4,7-8,10-11,13-16,20-21,29-32H,5-6H2,(H5-,22,23,24,

onpthl_c
C1=CC=C2C3C(O3)C=CC2=C1
InChI=1S/C10H8O/c1-2-4-8-7(3-1)5-6-9-10(8)11-9/h1-6,9-10H
............
694
onpthl_e
C1=CC=C2C3C(O3)C=CC2=C1
InChI=1S/C10H8O/c1-2-4-8-7(3-1)5-6-9-10(8)11-9/h1-6,9-10H
............
695
orn_c
C(CC(C(=O)O)N)CN
InChI=1S/C5H12N2O2/c6-3-1-2-4(7)5(8)9/h4H,1-3,6-7H2,(H,8,9)/t4-/m0/s1
............
696
orot5p_c
C1=C(N(C(=O)NC1=O)C2C(C(C(O2)COP(=O)(O)O)O)O)C(=O)O
InChI=1S/C10H13N2O11P/c13-5-1-3(9(16)17)12(10(18)11-5)8-7(15)6(14)4(23-8)2-22-24(19,20)21/h1,4,6-8,14-15H,2H2,(H,16,17)(H,11,13,18)(H2,19,20,21)/t4-,6-,7-,8-/m1/s1
............
697
orot_c
C1=C(NC(=O)NC1=O)C(=O)O
InChI=1S/C5H4N2O4/c8-3-1-2(4(9)10)6-5(11)7-3/h1H,(H,9,10)(H2,6,7,8,11)
............
698
oxa_c
C(=O)(C(=O)O)O
InChI=1S/C2H2O4/c3-1(4)2(5)6/h(H,3,4)(H,5,6)
............
699
pa_cho_c
C(C(COP(=O)([O-])[O-])OC=O)OC=O
InChI=1S/C5H9O8P/c6-3-11-1-5(12-4-7)2-13-14(8,9)10/h3-5H,1-2H2,(H2,8,9,10)/p-2
............
700
pa_cho_n
C(C(COP(=O)([O-])[O-])OC=O)OC=O
InChI=1S/C5H9O8P/c6-3-11-1-5(12-4-7)2-13-14(8,9)10/h

prist_x
CC(C)CCCC(C)CCCC(C)CCCC(C)C(=O)O
InChI=1S/C19H38O2/c1-15(2)9-6-10-16(3)11-7-12-17(4)13-8-14-18(5)19(20)21/h15-18H,6-14H2,1-5H3,(H,20,21)
............
734
pristanal_c
CC(C)CCCC(C)CCCC(C)CCCC(C)C=O
InChI=1S/C19H38O/c1-16(2)9-6-10-17(3)11-7-12-18(4)13-8-14-19(5)15-20/h15-19H,6-14H2,1-5H3
............
735
progly_c
C1CC(NC1)C(=O)NCC(=O)O
InChI=1S/C7H12N2O3/c10-6(11)4-9-7(12)5-2-1-3-8-5/h5,8H,1-4H2,(H,9,12)(H,10,11)/t5-/m0/s1
............
736
progly_e
C1CC(NC1)C(=O)NCC(=O)O
InChI=1S/C7H12N2O3/c10-6(11)4-9-7(12)5-2-1-3-8-5/h5,8H,1-4H2,(H,9,12)(H,10,11)/t5-/m0/s1
............
737
retncoa_c
CC1=C(C(CCC1)(C)C)C=CC(=CC=CC(=CC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)(O)OP(=O)(O)OCC2C(C(C(O2)N3C=NC4=C(N=CN=C43)N)O)OP(=O)(O)O)O)C)C
InChI=1S/C41H62N7O17P3S/c1-25(13-14-28-27(3)12-9-16-40(28,4)5)10-8-11-26(2)20-31(50)69-19-18-43-30(49)15-17-44-38(53)35(52)41(6,7)22-62-68(59,60)65-67(57,58)61-21-29-34(64-66(54,55)56)33(51)39(63-29)48-24-47-32-36(42)45-23-46-37(32)48/h8,10-11,13-14,20,23-24,29,33-35,

tridcoa_m
CCCCCCCCCCCCC(=O)SCCNC(=O)CCNC(=O)C(C(C)(C)COP(=O)(O)OP(=O)(O)OCC1C(C(C(O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)O
InChI=1S/C34H60N7O17P3S/c1-4-5-6-7-8-9-10-11-12-13-14-25(43)62-18-17-36-24(42)15-16-37-32(46)29(45)34(2,3)20-55-61(52,53)58-60(50,51)54-19-23-28(57-59(47,48)49)27(44)33(56-23)41-22-40-26-30(35)38-21-39-31(26)41/h21-23,27-29,33,44-45H,4-20H2,1-3H3,(H,36,42)(H,37,46)(H,50,51)(H,52,53)(H2,35,38,39)(H2,47,48,49)/t23-,27-,28-,29?,33-/m1/s1
............
774
trp_L_e
C1=CC=C2C(=C1)C(=CN2)CC(C(=O)O)N
InChI=1S/C11H12N2O2/c12-9(11(14)15)5-7-6-13-10-4-2-1-3-8(7)10/h1-4,6,9,13H,5,12H2,(H,14,15)/t9-/m0/s1
............
775
trp_L_l
C1=CC=C2C(=C1)C(=CN2)CC(C(=O)O)N
InChI=1S/C11H12N2O2/c12-9(11(14)15)5-7-6-13-10-4-2-1-3-8(7)10/h1-4,6,9,13H,5,12H2,(H,14,15)/t9-/m0/s1
............
776
txa2_c
CCCCCC(C=CC1C(C2CC(O2)O1)CC=CCCCC(=O)O)O
InChI=1S/C20H32O5/c1-2-3-6-9-15(21)12-13-17-16(18-14-20(24-17)25-18)10-7-4-5-8-11-19(22)23/h4,7,12-13,15-18,20-21H,2-3,5-6,8-11,14H2,1H3,(H,22,23)/b7-4-,13-1

In [4]:
sheet.update_google_sheet(sheet_met, metabolites)
print("Google Sheet updated.")

Google Sheet updated.


### 2.3 Identification of duplicated metabolites
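The goal here is to add the metabolites from the reactions imported from Recon 3D without overwriting the data already in our own Metabolites dataset. To do this, we group the metabolites by name and formula, flag every group that has more than one BiGG ID, and check each of those IDs against BiGG so that the unrecognized IDs can be replaced with the recognized one.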

In [None]:
from google_sheet import GoogleSheet

KEY_FILE_PATH = 'credentials.json'
SPREADSHEET_ID = '1MlBXeHIKw8k8fZyXm-sN__AHTRSunJxar_-bqvukZws'

# Initialize the GoogleSheet object
sheet = GoogleSheet(SPREADSHEET_ID, KEY_FILE_PATH)

# Read data from the Google Sheet
sheet_met = 'Metabolites'
sheet_rxns = 'Rxns'
shee_attributes = 'Attributes'

met = sheet.read_google_sheet(sheet_met)
rxns = sheet.read_google_sheet(sheet_rxns)
attributes = sheet.read_google_sheet(shee_attributes)

In [None]:
# Convert metabolites names to lower case and remove the compartment
met['Name'] = met['Name'].str.lower()
met_copy = met.copy()
met_copy['BiGG ID'] = met_copy['BiGG ID'].str[:-2]
met_copy
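The `str[:-2]` slice above assumes that every BiGG ID ends in an underscore followed by a single-letter compartment code. The following is a minimal sketch of a more defensive variant under that same assumption; the toy IDs and the helper name `strip_compartment` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy IDs for illustration; the last one has no compartment suffix
toy = pd.DataFrame({'BiGG ID': ['gal_c', 'lac_L_e', 'CE5843_r', 'h2o']})

def strip_compartment(ids):
    """Remove a trailing '_<single lowercase letter>' suffix, if present."""
    return ids.str.replace(r'_[a-z]$', '', regex=True)

toy['Base ID'] = strip_compartment(toy['BiGG ID'])
print(toy)
```

Unlike a fixed slice, the pattern leaves IDs without a suffix untouched (`h2o` stays `h2o` instead of becoming `h`).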

In [None]:
# Generate a list with duplicated metabolites

grouped = met_copy.groupby(['Name', 'Formula'])

# Initialize an empty list to store the results
duplicated_metabolites = []

# Iterate over the grouped DataFrame
for (Name, Formula), group in grouped:
    # Check if the group has more than one element (i.e., duplicate) and filter out those metabolites whose names are unknown
    if group['BiGG ID'].nunique() > 1 and Name != 'bigg id not found in bigg':
        unique_ids = group['BiGG ID'].unique()
        duplicated_metabolites.append((Name, Formula, unique_ids))

        


In [None]:
len(duplicated_metabolites)
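The explicit loop over groups can also be written with pandas aggregation. This is a sketch on a toy DataFrame with made-up IDs (`X1`, `X2`, ...), assuming the same `Name`, `Formula` and `BiGG ID` columns as the Metabolites sheet; it is an alternative formulation, not the code used to produce the results above.

```python
import pandas as pd

# Toy data: the first two rows share name and formula but have different IDs
toy = pd.DataFrame({
    'BiGG ID': ['X1', 'X2', 'X3', 'X3', 'X4'],
    'Name': ['met a', 'met a', 'met b', 'met b', 'met c'],
    'Formula': ['C2H4', 'C2H4', 'C6H12O6', 'C6H12O6', 'C3H6O3'],
})

# Distinct IDs of every (Name, Formula) group
ids_per_group = toy.groupby(['Name', 'Formula'])['BiGG ID'].unique()

# Keep only the groups that map to more than one ID
duplicated = ids_per_group[ids_per_group.apply(len) > 1]
print(duplicated)
```

The result is a Series indexed by `(Name, Formula)` whose values are arrays of the conflicting IDs, which carries the same information as the list of tuples built in the loop.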

In [None]:
# Generate empty dict to store the existence of each duplicated metabolite in BiGG
duplicated_dict = {}


for metabolite in tqdm(duplicated_metabolites):
    duplicated_dict[metabolite[0]] = {}
    for big_id in metabolite[2]:
        time.sleep(1)
        # Check if the metabolite is in BiGG "OK" or not "NO"
        response = requests.get(f"http://bigg.ucsd.edu/universal/metabolites/{big_id}")
        if response.status_code == 200:
            duplicated_dict[metabolite[0]][big_id] = 'OK'
        else:
            duplicated_dict[metabolite[0]][big_id] ='NO'
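A request without a timeout can hang the whole loop, and a network failure is currently indistinguishable from a missing ID. The sketch below wraps the lookup in a helper that handles both cases. The function name `check_bigg_ids`, the `get` parameter and the `'ERROR'` status are assumptions introduced here for illustration; the URL is the one used in the cell above.

```python
import time

def check_bigg_ids(ids, get=None, delay=1.0, timeout=10):
    """Return {bigg_id: 'OK' | 'NO' | 'ERROR'} for each ID.

    `get` defaults to requests.get; passing another callable makes the
    helper testable without network access.
    """
    if get is None:
        import requests  # imported lazily so the helper can be tested offline
        get = requests.get
    status = {}
    for bigg_id in ids:
        url = f"http://bigg.ucsd.edu/universal/metabolites/{bigg_id}"
        try:
            response = get(url, timeout=timeout)
            status[bigg_id] = 'OK' if response.status_code == 200 else 'NO'
        except Exception:
            # Network problems are recorded separately from "not found"
            status[bigg_id] = 'ERROR'
        time.sleep(delay)  # stay polite to the BiGG server
    return status
```

With this helper, the loop body could become `duplicated_dict[metabolite[0]] = check_bigg_ids(metabolite[2])`, and IDs marked `'ERROR'` could be retried instead of being treated as absent from BiGG.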
        


In [None]:
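# Exclude protons from the replacement step; the default avoids a KeyError if the cell is re-run
duplicated_dict.pop('proton', None)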
duplicated_dict

In [None]:
duplicated_dict

In [None]:
# For each duplicated metabolite, replace the IDs that are missing from BiGG ('NO')
# with the ID that BiGG recognizes ('OK')
for key, id_status in duplicated_dict.items():
    # ID found in BiGG (if several IDs are 'OK', the last one is used)
    ok_subkey = None
    # IDs not found in BiGG
    no_list = []
    for subkey, value in id_status.items():
        if value == 'OK':
            ok_subkey = subkey
        elif value == 'NO':
            no_list.append(subkey)
    # Replace all 'NO' IDs with the 'OK' ID; skip the group if no ID was found in BiGG
    if ok_subkey is not None:
        for no_subkey in no_list:
            # regex=False treats the ID as literal text, not as a regular expression
            met['BiGG ID'] = met['BiGG ID'].str.replace(no_subkey, ok_subkey, regex=False)
            rxns['Reaction Formula'] = rxns['Reaction Formula'].str.replace(no_subkey, ok_subkey, regex=False)
            attributes['Reaction Formula'] = attributes['Reaction Formula'].str.replace(no_subkey, ok_subkey, regex=False)
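Plain substring replacement can also rewrite longer IDs that merely contain the old ID. The following sketch restricts the replacement to whole IDs followed by a compartment suffix. It assumes single-letter compartment codes such as `_c`; the helper name `replace_metabolite_id` and the replacement ID `galactose` are hypothetical.

```python
import re

def replace_metabolite_id(formula, old_id, new_id):
    """Replace old_id only where it is a complete ID followed by '_<letter>'."""
    pattern = re.compile(
        rf'(?<!\w){re.escape(old_id)}(?=_[a-z](?!\w))'
    )
    # A function is used as replacement so new_id is always inserted literally
    return pattern.sub(lambda match: new_id, formula)

formula = 'gal_c + gal1p_c --> udpgal_c'
print(replace_metabolite_id(formula, 'gal', 'galactose'))
```

Here only `gal_c` is rewritten, while `gal1p_c` and `udpgal_c` are left alone; a literal substring replacement would have changed all three.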

In [None]:
# Store the original column order
column_order = met.columns.tolist()

# Group by 'BiGG ID' and keep the first non-null value in each group, then reset the index
met = met.groupby('BiGG ID').first().reset_index()

# Rearrange the columns to the original order
met = met[column_order]

met
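To make the merging behavior concrete, here is a small sketch of what `groupby(...).first()` does with rows that share an ID. The toy values are invented for illustration. Note that `first()` only skips true missing values (`None`/`NaN`); if the sheet stores blanks as empty strings, they would presumably need to be converted to missing values beforehand.

```python
import pandas as pd

# Two rows for 'a_c', each holding a different piece of information
toy = pd.DataFrame({
    'BiGG ID': ['a_c', 'a_c', 'b_c'],
    'Name': [None, 'alpha', 'beta'],
    'KEGG': ['C1', None, None],
})

column_order = toy.columns.tolist()

# For every column, keep the first non-null value found in each group
merged = toy.groupby('BiGG ID').first().reset_index()[column_order]
print(merged)
```

The two `a_c` rows collapse into a single row that combines `Name='alpha'` from the second row with `KEGG='C1'` from the first.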

In [None]:
# Update the Google Sheet with the modified DataFrame
sheet.update_google_sheet(sheet_rxns, rxns)
sheet.update_google_sheet(shee_attributes, attributes)
sheet.update_google_sheet(sheet_met, met)
print("Google Sheet updated.")

In [None]:
# Check for differences between the metabolites in the "Rxns" and "Metabolites" Sheets
from cobra import Reaction

model = Model("iCHO")
lr = []
for _, row in rxns.iterrows():
    r = Reaction(row['Reaction'])
    lr.append(r)
model.add_reactions(lr)

for i, r in enumerate(tqdm(model.reactions)):
    print(r.id)
    r.build_reaction_from_string(rxns['Reaction Formula'][i])

# Collect the metabolite IDs found in the model and in the Metabolites Sheet.
# Separate names are used so that the `model` and `sheet` objects are not overwritten.
model_mets = {m.id for m in model.metabolites}
sheet_mets = set(met['BiGG ID'])

In [None]:
diff1 = model_mets - sheet_mets
print(f'Metabolites in the Rxns Sheet not present in the Metabolites Sheet: {list(diff1)}\n')

diff2 = sheet_mets - model_mets
print(f'Metabolites in the Metabolites Sheet not present in the Rxns Sheet: {list(diff2)}\n')

if sheet_mets == model_mets:
    print('Both sheets contain exactly the same metabolites')
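For a quick consistency check that does not require building a COBRA model, the metabolite IDs can be extracted directly from the formula strings. This is a rough sketch that assumes formulas written in the usual COBRA style (`2 a_c + b_c --> c_c`); the helper name `metabolite_ids` and the arrow variants it recognizes are assumptions, and the COBRA parser remains the reference.

```python
import re

# Arrow variants assumed to separate the two sides of a reaction
ARROW = re.compile(r'\s*(?:<=>|<--|-->|<->|->|<-)\s*')

def _is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False

def metabolite_ids(formula):
    """Return the set of metabolite IDs in a reaction formula string."""
    ids = set()
    for side in ARROW.split(formula):
        for term in side.split(' + '):
            tokens = term.split()
            if not tokens:
                continue  # empty side, e.g. an exchange reaction
            # Drop a leading stoichiometric coefficient such as '2' or '0.5'
            if len(tokens) > 1 and _is_number(tokens[0]):
                tokens = tokens[1:]
            ids.add(tokens[0])
    return ids

print(metabolite_ids('2 h_c + atp_c --> adp_c + pi_c'))
```

Taking the union of `metabolite_ids(f)` over all formulas gives a set that can be compared with the `BiGG ID` column in the same way as above.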

In [None]:
# Pandas AI

In [None]:
import os
import pandas as pd
from pandasai import PandasAI

# Instantiate a LLM
from pandasai.llm.openai import OpenAI
# Read the API key from the environment (here assumed to be OPENAI_API_KEY)
# instead of hard-coding it in the notebook
llm = OpenAI(api_token=os.environ['OPENAI_API_KEY'])

pandas_ai = PandasAI(llm, conversational=True)
pandas_ai.run(met, prompt='Plot a pie chart of all the compartments and the amount of metabolites in each compartment, using different colors for each slice')

In [None]:
pandas_ai = PandasAI(llm, conversational=True)
pandas_ai.run(met, prompt='How many metabolites are in the nucleus compartment?')
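The two questions asked above can also be answered deterministically with plain pandas, which is useful for cross-checking the LLM output. The sketch below uses a toy DataFrame; the `Compartment` column name follows the Metabolites sheet, while the label `'n - nucleus'` is an assumption modeled on labels such as `'c - cytosol'`.

```python
import pandas as pd

toy = pd.DataFrame({
    'BiGG ID': ['gal_c', 'nac_c', 'damp_n', 'damp_m'],
    'Compartment': ['c - cytosol', 'c - cytosol', 'n - nucleus', 'm - mitochondria'],
})

# Number of metabolites in each compartment
counts = toy['Compartment'].value_counts()
print(counts)

# Number of metabolites in the nucleus (0 if the label does not occur)
n_nucleus = int(counts.get('n - nucleus', 0))
print(n_nucleus)
```

If matplotlib is installed, `counts.plot.pie()` should produce the corresponding pie chart from the same Series.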

In [None]:
# Convert metabolites names to lower case and remove the compartment
met['Name'] = met['Name'].str.lower()
met_copy = met.copy()
met_copy['BiGG ID'] = met_copy['BiGG ID'].str[:-2]
met = met_copy.groupby('BiGG ID').first().reset_index()
met

In [None]:
pandas_ai = PandasAI(llm, conversational=False)
pandas_ai.run(met, prompt='Which metabolites better correlate?')

In [None]:
met

In [None]:
import pandas as pd

data = '''
Curated         BiGG ID   \n176                 M00056_m  \\\n193                 M00071_m   \n1014                CE2038_x   \n1352                CE4799_m   \n1360                CE4806_m   \n1361                CE4807_m   \n1876                CE5938_x   \n1982              leuktrB4_c   \n2531                M00056_m   \n2540                M00071_m   \n2916                M01191_m   \n2918                M01191_x   \n3019          xolest226_hs_l   \n3023          xolest205_hs_l   \n5636                M01191_x   \n5794                M01191_m   \n5795                M01191_x   \n6078              leuktrB4_c   \n7439                CE4799_m   \n7440                CE4807_m   \n7441                CE2038_x   \n7442                CE4806_m   \n7443                CE5938_x   \n8036    Than  xolest205_hs_l   \n8039    Than  xolest226_hs_l   \n\n                                                   Name         Formula   \n176                                   (2e)-nonenoyl-coa  C30H46N7O17P3S  \\\n193                                 (2e)-undecenoyl-coa  C32H50N7O17P3S   \n1014             trans-2,3-dehydropristanoyl coenzyme a  C40H66N7O17P3S   \n1352          2,6-dimethyl-trans-2-heptenoyl coenzyme a  C30H46N7O17P3S   \n1360        4(r),8-dimethyl-trans-2-nonenoyl coenzyme a  C32H50N7O17P3S   \n1361              4-methyl-trans-2-pentenoyl coenzyme a  C27H40N7O17P3S   \n1876    (4r,8r,12r)-trimethyl-2e-tridecenoyl coenzyme a  C37H60N7O17P3S   \n1982     5,12-dihydroxy-6,8,10,14-eicosatetraenoic acid        C20H31O4   \n2531                           (2e)-nonenoyl coenzyme a  C30H46N7O17P3S   \n2540                         (2e)-undecenoyl coenzyme a  C32H50N7O17P3S   \n2916                         7z-hexadecenoyl coenzyme a  C37H60N7O17P3S   \n2918                         7z-hexadecenoyl coenzyme a  C37H60N7O17P3S   \n3019  cholesteryl docosahexanoate, cholesterol-ester...        C49H76O2   \n3023  1-timnodnoyl-cholesterol, cholesterol-ester (2...        
C47H74O2   \n5636                         7z-hexadecenoyl coenzyme a  C37H60N7O17P3S   \n5794                         7z-hexadecenoyl coenzyme a  C37H60N7O17P3S   \n5795                         7z-hexadecenoyl coenzyme a  C37H60N7O17P3S   \n6078                                 leukotriene b4(1-)        C20H31O4   \n7439                 2,6-dimethyl-trans-2-heptenoyl-coa  C30H46N7O17P3S   \n7440                     4-methyl-trans-2-pentenoyl-coa  C27H40N7O17P3S   \n7441                    trans-2,3-dehydropristanoyl-coa  C40H66N7O17P3S   \n7442               4(r),8-dimethyl-trans-2-nonenoyl-coa  C32H50N7O17P3S   \n7443         (4r,8r,12r)-trimethyl-(2e)-tridecenoyl-coa  C37H60N7O17P3S   \n8036  1-timnodnoyl-cholesterol, cholesterol-ester (2...        C47H74O2   \n8039  cholesteryl docosahexanoate, cholesterol-ester...        C49H76O2   \n\n                    Compartment  KEGG  CHEBI   PubChem   \n176            m - mitochondria  None   None      None  \\\n193            m - mitochondria                          \n1014  x - peroxisome/glyoxysome        63803  56927963   \n1352           m - mitochondria                          \n1360           m - mitochondria                          \n1361           m - mitochondria                          \n1876  x - peroxisome/glyoxysome               53481434   \n1982                c - cytosol  None   None      None   \n2531           m - mitochondria  None   None      None   \n2540           m - mitochondria                          \n2916           m - mitochondria  None   None      None   \n2918  x - peroxisome/glyoxysome  None   None      None   \n3019               l - lysosome  None   None      None   \n3023               l - lysosome  None   None      None   \n5636  x - peroxisome/glyoxysome  None   None      None   \n5794           m - mitochondria  None   None      None   \n5795  x - peroxisome/glyoxysome  None   None      None   \n6078                c - cytosol        15647   5280492   \n7439           m - 
mitochondria                          \n7440           m - mitochondria                          \n7441  x - peroxisome/glyoxysome  None   None      None   \n7442           m - mitochondria                          \n7443  x - peroxisome/glyoxysome  None   None      None   \n8036               l - lysosome               53477889   \n8039               l - lysosome               14274978   \n\n                                                  
...'''

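# Split the data into lines, skipping blank ones
lines = [line for line in data.split('\n') if line.strip()]

# Split each line on whitespace. The pasted text is a wrapped DataFrame printout
# whose names contain spaces, so the rows have different numbers of fields.
rows = [line.split() for line in lines]

# Create a DataFrame; no column names are forced because the rows are ragged
# (passing a fixed list of 8 column names would raise a ValueError)
df = pd.DataFrame(rows)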


In [None]:
df