# Metabolomics Data Inspection

By Garrett Roell and Christina Schenk

Tested on biodesign_3.7 kernel on jprime

This notebook extracts metabolite data from the genome-scale model and uses it to try to match unknown metabolites in the LCMS data.
### Method: 
<ol>
<li>Set up imports</li>
<li>Load model and relevant data</li>
<li>Extract metabolite data from the genome-scale model</li>
<li>Check for matches between model metabolites and LCMS data</li>
</ol>


### 1. Set up imports

In [1]:
import cobra
import pandas as pd

### 2. Load model and relevant data

In [2]:
model = cobra.io.read_sbml_model("../models/r_opacus_annotated_curated.xml")

# load LCMS extracellular data (use other file names to load other LCMS data sets)
lcms_df = pd.read_csv('../data/metabolomics/LCMS_extracellular_metabolites_positive.csv')
lcms_df.head(2)

Unnamed: 0,m/z,RT [min],Name,Tags,Foston_Ex_1-2 (F6),Foston_Ex_1-9 (F7),Foston_Ex_1-10 (F8),Foston_Ex_1-11 (F9),Foston_Ex_1-12 (F10),Foston_Ex_1-13 (F11),...,Foston_Ex_4-52 (F73),Foston_Ex_4-53 (F74),Foston_Ex_4-54 (F75),Foston_Ex_4-55 (F76),Foston_Ex_4-56 (F77),Foston_Ex_4-57 (F78),Foston_Ex_4-58 (F79),Foston_Ex_4-59 (F80),Foston_Ex_4-60 (F81),Foston_Ex_4-64 (F82)
0,113.03449,3.109,Uracil,Confirmed ID (HIgh Confidence),24301.7878,27544.39,27719.69,28568.39,25825.51,27127.06,...,39332.3,37647.01,35963.52,26299.89,23757.63,79872.54,23541.28,22358.29,22917.143,25707.82
1,148.06024,5.053,O-Acetyl-DL-serine,Confirmed ID (HIgh Confidence),82578.31694,34037510.0,27341870.0,48123230.0,9047609.0,19434220.0,...,12145360.0,9120849.0,19831000.0,11413260.0,5268308.0,45010330.0,4042393.0,1120618.0,2237182.321,4857366.0


### 3. Extract metabolite data from the genome scale model

In [3]:
# create a list to hold metabolite data
row_data = []

# loop over the metabolites in the model
for m in model.metabolites:
    
    # get MetaNetX id if present
    if 'metanetx.chemical' in m.annotation.keys():
        metanetx_id = m.annotation['metanetx.chemical']
        url = f'https://www.metanetx.org/chem_info/{metanetx_id}'
        
        # metanetx_df = pd.read_html(url, flavor='bs4')
        # print(metanetx_df)
    else:
        metanetx_id = ''
    
    # get KEGG id if present
    if 'kegg.compound' in m.annotation.keys():
        kegg_id = m.annotation['kegg.compound']
    else:
        kegg_id = ''
        
    # create a dictionary for each metabolite's information
    row_data.append({
        "formula_molecular_weight": m.formula_weight,
        "name": m.name,
        "formula": m.formula,
        "metabolite_id": m.id,
        "metanetx_id": metanetx_id,
        "kegg_id": kegg_id,
    })
    
# convert the row data into a data frame
metabolite_df = pd.DataFrame(row_data)

# sort by molecular weight
metabolite_df.sort_values(by=['formula_molecular_weight'], inplace=True)

metabolite_df.head(5)

Unnamed: 0,formula_molecular_weight,name,formula,metabolite_id,metanetx_id,kegg_id
1955,0.0,Plastoquinol,,pqh2_p,,
1953,0.0,Ferrocytochrome c6,,focytc6_p,,
1952,0.0,Ferricytochrome c6,,ficytc6_p,,
1954,0.0,Plastoquinone,,pq_p,,
1131,1.00794,H+,H,h_c,MNXM1,C00080
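
The first rows above have an empty formula and a weight of 0.0, so they cannot be matched by mass. A minimal sketch of filtering them out before matching is shown here; `drop_massless_metabolites` is a hypothetical helper, and `example_df` is toy data standing in for `metabolite_df`.

```python
import pandas as pd

# toy stand-in for metabolite_df (values are illustrative only)
example_df = pd.DataFrame({
    "formula_molecular_weight": [0.0, 0.0, 1.00794, 18.01528],
    "name": ["Plastoquinol", "Plastoquinone", "H+", "H2O"],
    "formula": ["", "", "H", "H2O"],
})

def drop_massless_metabolites(df, weight_column="formula_molecular_weight"):
    """Hypothetical helper: keep only rows with a strictly positive weight."""
    return df[df[weight_column] > 0].reset_index(drop=True)

filtered_df = drop_massless_metabolites(example_df)
```

Applied to the real table, the same call would be `drop_massless_metabolites(metabolite_df)`.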


Save the metabolite data from the model as a CSV file

In [4]:
metabolite_df.to_csv('../data/metabolomics/model_metabolites.csv', index=False, header=True)
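
One thing worth checking before relying on the saved file: depending on the model file, an annotation entry may hold either a single identifier string or a list of identifiers, and a list would be written to the CSV as its string representation. The sketch below shows one way to normalize such values to a single string. `first_annotation_id` is a hypothetical helper that works on any plain dictionary, and taking the first identifier is an arbitrary choice made for illustration.

```python
def first_annotation_id(annotation, key):
    """Hypothetical helper: return one identifier string for `key`, or ''."""
    value = annotation.get(key, "")
    # a list or tuple of identifiers: keep the first one (arbitrary choice)
    if isinstance(value, (list, tuple)):
        return value[0] if value else ""
    return value

# toy annotation dictionaries (illustrative values)
single = first_annotation_id({"kegg.compound": "C00080"}, "kegg.compound")
multiple = first_annotation_id({"kegg.compound": ["C00080", "C00001"]}, "kegg.compound")
missing = first_annotation_id({}, "kegg.compound")
```

In the extraction loop, this would replace the `if ... else` blocks with calls such as `first_annotation_id(m.annotation, 'kegg.compound')`.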

### 4. Check for matches between model metabolites and LCMS data

In [5]:
# define a helper function to get the model metabolite data closest to a given molecular weight
def molecular_weight_to_metabolite_data(molecular_weight):

    # keep track of the smallest mass difference between the given 
    # molecular weight and a model metabolite's molecular weight
    minimum_mass_difference = float('inf')
    
    # start from the first row as an arbitrary closest metabolite
    # (.iloc selects by position; metabolite_df[0] would look up a column named 0)
    closest_molecular_weight_data = metabolite_df.iloc[0]
    
    # loop over metabolite data
    for _, row in metabolite_df.iterrows():
        
        # check if this metabolite is the closest in mass to the given molecular weight
        if abs(row.formula_molecular_weight - molecular_weight) < minimum_mass_difference:
            # if so, update the data for the closest metabolite and the minimum mass difference
            closest_molecular_weight_data = row
            minimum_mass_difference = abs(row.formula_molecular_weight - molecular_weight)

    # return the data from the metabolite with the closest molecular weight
    return closest_molecular_weight_data

# a test call
# molecular_weight_to_metabolite_data(148.06024)
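
The same nearest-mass lookup can also be written without an explicit loop. This is a sketch of a vectorized alternative, not a replacement for the helper above; `closest_metabolite` is a hypothetical name, and `toy_df` holds illustrative values standing in for `metabolite_df`.

```python
import pandas as pd

def closest_metabolite(df, molecular_weight, weight_column="formula_molecular_weight"):
    """Hypothetical helper: return the row whose weight is closest to molecular_weight."""
    # absolute mass difference for every row at once
    mass_difference = (df[weight_column] - molecular_weight).abs()
    # idxmin gives the index label of the smallest difference
    return df.loc[mass_difference.idxmin()]

# toy stand-in for metabolite_df (illustrative values)
toy_df = pd.DataFrame({
    "formula_molecular_weight": [1.00794, 112.0868, 147.1293],
    "name": ["H+", "Uracil", "O-Acetyl-L-serine"],
})

closest = closest_metabolite(toy_df, 148.06024)
```

On ties, `idxmin` returns the first minimal row, which matches the strict `<` comparison in the loop version.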

In [6]:
lcms_df[['m/z', 'RT [min]', 'Name', 'Tags', 'Foston_Ex_1-2 (F6)']].head(5)

Unnamed: 0,m/z,RT [min],Name,Tags,Foston_Ex_1-2 (F6)
0,113.03449,3.109,Uracil,Confirmed ID (HIgh Confidence),24301.79
1,148.06024,5.053,O-Acetyl-DL-serine,Confirmed ID (HIgh Confidence),82578.32
2,162.07599,4.449,N-Methyl-L-Glutamic acid,Confirmed ID (HIgh Confidence),13866.73
3,124.03935,3.352,Nicotinic acid/Niacin,Confirmed ID (HIgh Confidence),1735242.0
4,190.07092,4.881,N-Acetyl-DL-glutamic acid,Confirmed ID (HIgh Confidence),9956.026


In [7]:
# loop over metabolites that have LCMS measurements
#for _, row in lcms_df.iterrows():
#     molecular_weight = row['m/z']
#     print(row.Name, molecular_weight_to_metabolite_data(molecular_weight))
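
A hedged sketch of how the matching loop might be completed follows. It rests on two assumptions that are not stated in this notebook: that the positive-mode m/z values are singly protonated ions ([M+H]+), so the neutral mass is the m/z minus one proton mass, and that a match should only be reported within a mass tolerance. The H+ weight of 1.00794 in the model table suggests that the model weights are average masses, whereas LCMS measures monoisotopic masses, so the tolerance here is deliberately loose. `match_peaks_to_model`, `PROTON_MASS`, the tolerance value, and the toy tables are all hypothetical.

```python
import pandas as pd

PROTON_MASS = 1.007276  # Da; assumes singly charged [M+H]+ ions

def match_peaks_to_model(lcms, metabolites, tolerance=0.5):
    """Hypothetical helper: pair each LCMS peak with the closest model metabolite."""
    matches = []
    weights = metabolites["formula_molecular_weight"]
    for _, peak in lcms.iterrows():
        # convert the measured m/z to a neutral mass under the [M+H]+ assumption
        neutral_mass = peak["m/z"] - PROTON_MASS
        mass_difference = (weights - neutral_mass).abs()
        best_label = mass_difference.idxmin()
        # report the closest metabolite only if it lies within the tolerance
        if mass_difference.loc[best_label] <= tolerance:
            matches.append({
                "lcms_name": peak["Name"],
                "m/z": peak["m/z"],
                "metabolite_id": metabolites.loc[best_label, "metabolite_id"],
                "mass_difference": mass_difference.loc[best_label],
            })
    return pd.DataFrame(matches)

# toy stand-ins for lcms_df and metabolite_df (illustrative values)
toy_lcms = pd.DataFrame({
    "m/z": [113.03449, 148.06024, 500.0],
    "Name": ["Uracil", "O-Acetyl-DL-serine", "Unknown peak"],
})
toy_metabolites = pd.DataFrame({
    "formula_molecular_weight": [112.0868, 147.1293],
    "metabolite_id": ["ura_c", "acser_c"],
})

matched = match_peaks_to_model(toy_lcms, toy_metabolites)
```

A mass match on its own is only a candidate identification, since isomers and other adducts can share the same mass.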