# Automated Multiple Reaction Monitoring (MRM)-profiling and Ozone Electrospray Ionization (OzESI)-MRM Informatics Platform for High-throughput Lipidomics


In this Jupyter notebook you will automate lipidome data analysis. Doing this by hand is difficult because lipids are structurally diverse and have many potential isomers. You will analyze mzML files containing lipid MRM data acquired with ozone off and with ozone on. The goal is to identify possible double-bond locations in a lipid, in this case a triacylglycerol (TAG).
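
To make the ozone-on idea concrete, here is a minimal sketch of the mass arithmetic usually associated with ozonolysis of a C=C bond. It is not taken from this notebook's pipeline: the function name `ozonolysis_neutral_losses` is hypothetical, and the sketch assumes the fragment lost from the methyl end of the chain is saturated (C_nH_2n), so it does not cover cleavage sites with further double bonds toward the methyl end.

```python
# Monoisotopic masses in unified atomic mass units
MASS_CH2 = 14.01565  # one CH2 unit
MASS_O = 15.99491    # one oxygen atom


def ozonolysis_neutral_losses(n_position):
    """Approximate neutral losses for a double bond at position n-x.

    Assumption: the methyl-terminal fragment that is lost is a saturated
    C_nH_2n unit. The aldehyde product keeps one oxygen and the Criegee
    product keeps two, so they sit below the intact precursor by
    different amounts.
    """
    lost_fragment = n_position * MASS_CH2
    aldehyde_loss = lost_fragment - MASS_O
    criegee_loss = lost_fragment - 2 * MASS_O
    return aldehyde_loss, criegee_loss


for position in (7, 9):
    aldehyde, criegee = ozonolysis_neutral_losses(position)
    print(f"n-{position}: aldehyde loss {aldehyde:.2f}, Criegee loss {criegee:.2f}")
```

Comparing ozone-off and ozone-on intensities at the shifted m/z values is what lets you propose a double-bond position.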

In [1]:
from IPython.display import Image

![title](Figures/agilent_lcms.png)

The examples shown here were run on an Agilent 6495C Triple Quadrupole LC/MS (pictured above) connected to an ozone line (not shown) for ozonolysis of lipids.

![title](Figures/TAG_example.png)
Here is an example of a TAG. Notice how many possible locations there are for even a single double bond, and how convoluted the analysis can become! This image was obtained from LipidMaps.org.

Import all necessary libraries

In [2]:
#Import all the necessary libraries
import pymzml
import csv
import os
import pandas as pd
import numpy as np
import math
from matplotlib import pyplot as plt
import re
import plotly.express as px

No module named 'ms_deisotope._c.averagine' averagine
No module named 'ms_deisotope._c.scoring'
No module named 'ms_deisotope._c.deconvoluter_base'
No module named 'ms_deisotope._c.deconvoluter_base'
No module named 'ms_deisotope._c.deconvoluter_base'


MAKE CLASSES FOR EACH LIPID

In [68]:
# lipid_types = ["CE","TAG","CER","FFA","PC","PE","PG","PI","SM","AC"]
# database_path = "lipid_database/Lipid_Database.xlsx"

# #loop through all sheets in SUPPLE_2.XLS and make a df of Compound Name, Parent Ion, and Product Ion
# mrm_list_new = pd.read_excel('lipid_database/Lipid_Database.xlsx', sheet_name = None)
# mrm_list_new = pd.concat(mrm_list_new, ignore_index=True)
# mrm_list_offical = mrm_list_new[['Compound Name', 'Parent Ion', 'Product Ion', 'Class']]
# #Add underscore to middle of columns names
# mrm_list_offical.columns = mrm_list_offical.columns.str.replace(' ', '_')
# #round Parent Ion and Product Ion to 1 decimal place
# mrm_list_offical['Parent_Ion'] = np.floor(mrm_list_offical['Parent_Ion'].round(1))
# mrm_list_offical['Product_Ion'] = np.floor(mrm_list_offical['Product_Ion'].round(1))
# #create transition column by combining Parent Ion and Product Ion with arrow between numbers
# mrm_list_offical['Transition'] = mrm_list_offical['Parent_Ion'].astype(str) + ' -> ' + mrm_list_offical['Product_Ion'].astype(str)
# #change column compound name to lipid
# mrm_list_offical = mrm_list_offical.rename(columns={'Compound_Name': 'Lipid'})
# #make a column called Class match lipid column to lipid types



# pd.set_option('display.max_rows', None)
# print(mrm_list_offical.head(None))


In [3]:
##COnnor Edit
# def read_mrm_list(filename):
#     mrm_list_new = pd.read_excel(filename, sheet_name=None)
#     mrm_list_new = pd.concat(mrm_list_new, ignore_index=True)
#     mrm_list_offical = mrm_list_new[['Compound Name', 'Parent Ion', 'Product Ion', 'Class']]
#     # Add underscore to middle of columns names
#     mrm_list_offical.columns = mrm_list_offical.columns.str.replace(' ', '_')
#     # Round Parent Ion and Product Ion to 1 decimal place
#     mrm_list_offical['Parent_Ion'] = np.floor(mrm_list_offical['Parent_Ion'].round(1))
#     mrm_list_offical['Product_Ion'] = np.floor(mrm_list_offical['Product_Ion'].round(1))
#     # Create transition column by combining Parent Ion and Product Ion with arrow between numbers
#     mrm_list_offical['Transition'] = mrm_list_offical['Parent_Ion'].astype(str) + ' -> ' + mrm_list_offical['Product_Ion'].astype(str)
#     # Change column compound name to lipid
#     mrm_list_offical = mrm_list_offical.rename(columns={'Compound_Name': 'Lipid'})
#     # Make a column called Class match lipid column to lipid types
#     return mrm_list_offical


def read_mrm_list(filename):
    # Read every sheet of the workbook and stack them into a single DataFrame
    mrm_list_new = pd.read_excel(filename, sheet_name=None)
    mrm_list_new = pd.concat(mrm_list_new, ignore_index=True)
    # .copy() gives an independent DataFrame, which avoids SettingWithCopyWarning on the assignments below
    mrm_list_offical = mrm_list_new[['Compound Name', 'Parent Ion', 'Product Ion', 'Class']].copy()
    # Replace spaces in column names with underscores
    mrm_list_offical.columns = mrm_list_offical.columns.str.replace(' ', '_')
    # Round Parent Ion and Product Ion to 1 decimal place
    mrm_list_offical['Parent_Ion'] = mrm_list_offical['Parent_Ion'].round(1)
    mrm_list_offical['Product_Ion'] = mrm_list_offical['Product_Ion'].round(1)
    # Create transition column by combining Parent Ion and Product Ion with arrow between numbers
    mrm_list_offical['Transition'] = mrm_list_offical['Parent_Ion'].astype(str) + ' -> ' + mrm_list_offical['Product_Ion'].astype(str)
    # Rename the compound name column to Lipid
    mrm_list_offical = mrm_list_offical.rename(columns={'Compound_Name': 'Lipid'})
    return mrm_list_offical

mrm_database = read_mrm_list('lipid_database/Lipid_Database.xlsx')
mrm_database.tail()

Unnamed: 0,Lipid,Parent_Ion,Product_Ion,Class,Transition
3264,STD_15:0-18:1(d7) PI (NH4 Salt),847.6,570.4,STD_15:0-18:1(d7) PI (NH4 Salt),847.6 -> 570.4
3265,STD_15:0-18:1(d7) PS (Na Salt),755.5,570.4,STD_15:0-18:1(d7) PS (Na Salt),755.5 -> 570.4
3266,STD_15:0-18:1(d7)-15:0 TAG,829.8,570.4,STD_15:0-18:1(d7)-15:0 TAG,829.8 -> 570.4
3267,STD_18:1(d7) Chol Ester,675.6,369.1,STD_18:1(d7) Chol Ester,675.6 -> 369.1
3268,STD_d18:1-18:1(d9) SM,738.6,184.1,STD_d18:1-18:1(d9) SM,738.6 -> 184.1
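
Selecting columns from a DataFrame and then assigning to the result can trigger pandas' `SettingWithCopyWarning`. One common way to avoid it is to take an explicit `.copy()` first. The sketch below shows the same rounding and `Transition` construction on a toy table. The compound names and unrounded m/z values are made up for illustration and do not come from `Lipid_Database.xlsx`.

```python
import pandas as pd

# Made-up rows for illustration only (not from the real database)
toy = pd.DataFrame({
    "Compound Name": ["lipid_A", "lipid_B"],
    "Parent Ion": [847.62, 675.58],
    "Product Ion": [570.41, 369.12],
    "Class": ["PI", "CE"],
})

# .copy() makes an independent frame, so the assignments below are unambiguous
tidy = toy[["Compound Name", "Parent Ion", "Product Ion", "Class"]].copy()
tidy.columns = tidy.columns.str.replace(" ", "_")

# Round both m/z columns to 1 decimal place, then build the transition label
tidy["Parent_Ion"] = tidy["Parent_Ion"].round(1)
tidy["Product_Ion"] = tidy["Product_Ion"].round(1)
tidy["Transition"] = (
    tidy["Parent_Ion"].astype(str) + " -> " + tidy["Product_Ion"].astype(str)
)
tidy = tidy.rename(columns={"Compound_Name": "Lipid"})

print(tidy)
```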


In [4]:
list_of_lipid_classes = mrm_database['Class'].unique()
print(list_of_lipid_classes)

['PC' 'PE' 'SM' 'Cer' 'CAR' 'TAG' 'DAG' 'PS' 'PI' 'PG' 'CE' 'FA'
 'STD 15:0-18:1-d7 DG' 'STD 18:1 (d7) Lyso PC' 'STD 18:1 (d7) Lyso PE'
 'STD 18:1(d7) MAG' 'STD C15 ceramide-D7' 'STD_15:0-18:1(d7) PC'
 'STD_15:0-18:1(d7) PE' 'STD_15:0-18:1(d7) PG (Na Salt)'
 'STD_15:0-18:1(d7) PI (NH4 Salt)' 'STD_15:0-18:1(d7) PS (Na Salt)'
 'STD_15:0-18:1(d7)-15:0 TAG' 'STD_18:1(d7) Chol Ester'
 'STD_d18:1-18:1(d9) SM']
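
Notice that the internal standards show up as their own "classes", each prefixed with `STD`. If you want to treat them separately from the endogenous lipid classes, a prefix test is one simple option. This is a sketch on a few class names copied from the output above; the variable names are hypothetical.

```python
import pandas as pd

# A few class labels copied from the printed list above
classes = pd.Series(["PC", "TAG", "FA", "STD 18:1(d7) MAG", "STD_18:1(d7) Chol Ester"])

# Both "STD " and "STD_" entries start with the same three letters
is_standard = classes.str.startswith("STD")

standard_classes = classes[is_standard].tolist()
lipid_classes = classes[~is_standard].tolist()

print("Standards:", standard_classes)
print("Lipid classes:", lipid_classes)
```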


Load each mzML file and convert it to a pandas DataFrame and a CSV file.
Columns: Parent_Ion (Q1), Product_Ion (Q3), Intensity, Transition, Lipid, Class, Sample_ID.
The parsed data is also stored as a CSV file in data_csv.
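
The Q1 and Q3 values are read from each chromatogram's ID string, which carries `Q1=...` and `Q3=...` tokens. Here is a small sketch of that parsing step using a regular expression. The example ID string is a made-up illustration of the layout and may not match your vendor's export exactly, and `parse_transition` is a hypothetical helper.

```python
import re

# Matches "Q1=<number>" followed by whitespace and "Q3=<number>"
TRANSITION_PATTERN = re.compile(
    r"Q1=(?P<q1>\d+(?:\.\d+)?)\s+Q3=(?P<q3>\d+(?:\.\d+)?)"
)


def parse_transition(chromatogram_id):
    """Return (Q1, Q3, transition label), or None if the ID has no Q1/Q3 pair."""
    match = TRANSITION_PATTERN.search(chromatogram_id)
    if match is None:
        return None  # e.g. a total ion chromatogram
    q1 = round(float(match.group("q1")), 1)
    q3 = round(float(match.group("q3")), 1)
    return q1, q3, f"{q1} -> {q3}"


# Hypothetical ID string, for illustration only
example_id = "- SRM SIC Q1=896.8 Q3=597.5 start=0.0 end=2.0"
print(parse_transition(example_id))
print(parse_transition("TIC"))
```

Returning `None` for IDs without a transition makes it easy to skip non-MRM chromatograms in the loop.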

In [4]:
###COnnor Edit


#Loop over all mzML files in the data folder, load each one with the pymzml Reader, and store the parsed data in a pandas DataFrame
#Empty DataFrame reserved for OzESI data (includes a Retention_Time column)
df_OzESI = pd.DataFrame(columns=['Lipid','Parent_Ion','Product_Ion','Intensity','Retention_Time','Transition','Class','Sample_ID'])
###
# OzESI_time = {}
###

data_folder = os.listdir('./data_mzml/liver_LD/') #Path to the mzml files
path_to_mzml_files = './data_mzml/liver_LD/'
#data_dict = {} #Empty dictionary to store all the data
df = pd.DataFrame(columns=['Lipid','Parent_Ion','Product_Ion','Intensity','Transition','Class','Sample_ID'])
#Create a similar for loop, except store all data in a single pandas dataframe
df_all = pd.DataFrame(columns=['Lipid','Parent_Ion','Product_Ion','Intensity','Transition','Class','Sample_ID']) #Create empty pandas dataframe to store the data
#df_all = pd.DataFrame(columns=['Q1','Q3','Intensity','Transition','Lipid','Class']) #Create empty pandas dataframe to store the data




##My edit
for file in data_folder:
        if file.endswith('.mzML'):
                print(file)
                run = pymzml.run.Reader(path_to_mzml_files+file, skip_chromatogram=False) #Load the mzml file into the run object
                print('Spectrum # = ',run.get_spectrum_count())
                print('Chromatogram # =',run.get_chromatogram_count())


                
                #create pandas dataframe to store the data with the columns Parent Ion, Product Ion, Intensity, Transition Lipid and Class
                #df_sample = pd.DataFrame(columns=['Parent_Ion','Product_Ion','Intensity','Transition','Lipid','Class']) #Create empty pandas dataframe to store the data
                #df_sample = pd.DataFrame(columns=['Q1','Q3','Intensity','Transition','Lipid','Class']) #Create empty pandas dataframe to store the data
                q1_mz = 0 #Create empty variables to store the Q1 and Q3 m/z values
                q3_mz = 0
                count = 0 #Create a counter to keep track of the number of transitions
                for spectrum in run:
                        
                        ###
                        # if isinstance(spectrum,pymzml.spec.Chromatogram):
                        #         for time, intensity in spectrum.peaks():
                        #                 print(time, intensity)
                        #                 OzESI_time[time] = intensity
                        #         # OzESI_time.append(time_list)
                        ###

                        for element in spectrum.ID.split(' '):
                                # print('element',element)
                                intensity_store = np.array([])
                                if 'Q1' in element:
                                        #print('Q1',element)
                                        q1 = element.split('=')
                                        #print('q1',q1[1])
                                        q1_mz= np.round((float(q1[1])),1)
                                        # print('q1',q1)
                                
                                if 'Q3' in element:
                                        #print('Q3',element)
                                        q3 = element.split('=')
                                        #print('q3',q3[1])
                                        q3_mz=np.round(float(q3[1]),1)
                                        # print('q3',q3)
                                        # df_sample.loc[count,'Q1'] = q1_mz
                                        # df_sample.loc[count,'Q3'] = q3_mz
                                        
                                        for mz,intensity in spectrum.peaks(): #Get the m/z and intensity values from the spectrum
                                                intensity_store = np.append(intensity_store,intensity) #Store the intensity values in an array
                                                
                        
                                
                                if 'Q3' in element:
                                        # print(intensity_sum)
                                        intensity_sum = np.sum(intensity_store) #Sum the intensity values
                                        df_all.loc[count,'Parent_Ion'] = q1_mz #Store the Q1 and Q3 m/z values in the pandas dataframe
                                        df_all.loc[count,'Product_Ion'] = q3_mz
                                        #round the Q1 and Q3 m/z values to 1 decimal places
                                        df_all.loc[count,'Parent_Ion'] = np.round(df_all.loc[count,'Parent_Ion'],1)
                                        df_all.loc[count,'Product_Ion'] = np.round(df_all.loc[count,'Product_Ion'],1)
                                        df_all.loc[count,'Intensity'] = intensity_sum #Store the intensity values in the pandas dataframe
                                        df_all.loc[count,'Transition'] = str(q1_mz)+ ' -> '+ str(q3_mz) #Store the transition values in the pandas dataframe
                                        #add file name to Sample_ID column without the mzmL extension
                                        df_all.loc[count,'Sample_ID'] = file[:-5]
                                        count+=1
        #append the rows collected so far (df_all) to df
        df = pd.concat([df, df_all], ignore_index=True)
df.tail(5) 


TAG161_equisplash_033123_r001.mzML
Spectrum # =  None
Chromatogram # = 15
DG-161_blank_033123_r004.mzML
Spectrum # =  None
Chromatogram # = 98
TAG204_FAD131-5xFAD-M2liver_033123_r006.mzML
Spectrum # =  None
Chromatogram # = 143
TAG140_FAD173-5xFAD-M1liver_033123_r005.mzML
Spectrum # =  None
Chromatogram # = 185
DG-161_DOD73-5xFAD-M2liver_033123_r008.mzML
Spectrum # =  None
Chromatogram # = 98
TAG160_FAD131-5xFAD-M2liver_033123_r006.mzML
Spectrum # =  None
Chromatogram # = 185
FFA_FAD131-5xFAD-M2liver_033123_r006.mzML
Spectrum # =  None
Chromatogram # = 182
CER_blank_033123_r004.mzML
Spectrum # =  None
Chromatogram # = 167
DG-180_FAD131-5xFAD-M2liver_033123_r006.mzML
Spectrum # =  None
Chromatogram # = 109
PE_FAD131-5xFAD-M1liver_033123_r009.mzML
Spectrum # =  None
Chromatogram # = 150
TAG181_equisplash_033123_r001.mzML
Spectrum # =  None
Chromatogram # = 15
TAG225_FAD173-5xFAD-M1liver_033123_r005.mzML
Spectrum # =  None
Chromatogram # = 131
TAG204_DOD73-5xFAD-M2liver_033123_r008.mzML
Spectrum # =  None
Chromatogram # = 143
CER_FAD131-5xFAD-M4liver_033123_r007.mzML
Spectrum # =  None
Chromatogram # = 167
PG_FAD131-5xFAD-M4liver_033123_r007.mzML
Spectrum # =  None
Chromatogram # = 145
TAG183_equisplash_033123_r001.mzML
Spectrum # =  None
Chromatogram # = 15
TAG161_FAD131-5xFAD-M2liver_033123_r0
DG-161_FAD173-5xFAD-M1liver_033123_r005.mzML
Spectrum # =  None
Chromatogram # = 98
DG-161_equisplash_033123_r001.mzML
Spectrum # =  None
Chromatogram # = 15
DG-182_FAD131-5xFAD-M4liver_033123_r007.mzML
Spectrum # =  None
Chromatogram # = 88
TAG161_FAD131-5xFAD-M1liver_033123_r009.mzML
Spectrum # =  None
Chromatogram # = 173
TAG181_FAD173-5xFAD-M1liver_033123_r005.mzML
Spectrum # =  None
Chromatogram # = 173
DG-181_FAD131-5xFAD-M1liver_033123_r009.mzML
Spectrum # =  None
Chromatogram # = 98
PI_blank_033123_r004.mzML
Spectrum # =  None
Chromatogram # = 287
AC_FAD173-5xFAD-M1liver_033123_r005.mzML
Spectrum # =  None
Chromatogram # = 87
TAG161_DOD73-5xFAD-M2liver_033123_r008.mzML
Spectrum # =  None
Chromatogram # = 173
TAG204_FAD131-5xFAD-M1liver_033123_r009.mzML
Spectrum # =  None
Chromatogram # = 143
DG-160_FAD173-5xFAD-M1liver_033123_r005.mzML
Spectrum # =  None
Chromatogram # = 109
DG-181_equisplash_033123_r001.mzML
Spectrum # =  None
Chromatogram # = 15
TAG183_blank_033123_r004.mzML


TAG180_FAD131-5xFAD-M1liver_033123_r009.mzML
Spectrum # =  None
Chromatogram # = 185
DG-180_FAD173-5xFAD-M1liver_033123_r005.mzML
Spectrum # =  None
Chromatogram # = 109
TAG180_equisplash_033123_r001.mzML
Spectrum # =  None
Chromatogram # = 15
TAG204_blank_033123_r004.mzML
Spectrum # =  None
Chromatogram # = 143
CE_equisplash_033123_r001.mzML
Spectrum # =  None
Chromatogram # = 15
DG-180_blank_033123_r004.mzML
Spectrum # =  None
Chromatogram # = 109
PI_equisplash_033123_r001.mzML
Spectrum # =  None
Chromatogram # = 15
TAG182_equisplash_033123_r001.mzML
Spectrum # =  None
Chromatogram # = 15
CE_blank_033123_r004.mzML
Spectrum # =  None
Chromatogram # = 38
DG-181_blank_033123_r004.mzML
Spectrum # =  None
Chromatogram # = 98
TAG180_FAD131-5xFAD-M4liver_033123_r007.mzML
Spectrum # =  None
Chromatogram # = 185
TAG183_DOD73-5xFAD-M2liver_033123_r008.mzML
Spectrum # =  None
Chromatogram # = 153
CE_FAD173-5xFAD-M1liver_033123_r005.mzML
Spectrum # =  None
Chromatogram # = 38
PS_equisplash_03312

Unnamed: 0,Lipid,Parent_Ion,Product_Ion,Intensity,Transition,Class,Sample_ID
44042,,992.7,715.7,4189.660347,992.7 -> 715.7,,PI_FAD131-5xFAD-M4liver_033123_r007
44043,,994.7,717.7,5518.28038,994.7 -> 717.7,,PI_FAD131-5xFAD-M4liver_033123_r007
44044,,994.7,717.7,4096.040291,994.7 -> 717.7,,PI_FAD131-5xFAD-M4liver_033123_r007
44045,,996.7,719.7,4642.240368,996.7 -> 719.7,,PI_FAD131-5xFAD-M4liver_033123_r007
44046,,996.7,719.7,4708.040348,996.7 -> 719.7,,PI_FAD131-5xFAD-M4liver_033123_r007
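
Growing a DataFrame row by row inside a loop gets slow as the table grows, because each step copies data. A common alternative is to build one small frame per file, collect the frames in a list, and concatenate once at the end. The sketch below uses made-up intensities and hypothetical sample names in place of the mzML loop.

```python
import pandas as pd

# Hypothetical per-file results; in the notebook these would come from the mzML loop
per_file_rows = {
    "sample_A": [(896.0, 597.0, 100.0), (898.0, 599.0, 250.0)],
    "sample_B": [(896.0, 597.0, 80.0)],
}

frames = []
for sample_id, rows in per_file_rows.items():
    # A fresh frame per file means no rows can leak over from the previous file
    frame = pd.DataFrame(rows, columns=["Parent_Ion", "Product_Ion", "Intensity"])
    frame["Transition"] = (
        frame["Parent_Ion"].astype(str) + " -> " + frame["Product_Ion"].astype(str)
    )
    frame["Sample_ID"] = sample_id
    frames.append(frame)

# One concatenation at the end instead of one per file
combined = pd.concat(frames, ignore_index=True)
print(combined)
```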


In [5]:
df.tail(5) #Print the pandas dataframe
df['Lipid'] = np.nan
df['Class'] = np.nan
len(df)# 44047 before drop nan ##AFTER nan 44047
# len(df["Transition"])
# len(df["Sample_ID"])
# #len(set(df["Transition"]))
# len(set(df["Sample_ID"]))

44047

Match each measured transition against the MRM transition list loaded above (`mrm_database`). This list is used to identify the possible lipids in our sample.

In [None]:
# #Match Ions in df to mrm_database and append Lipid and Class columns to df
# for index in range(len(df)):
#     for row in range(len(mrm_database)):
#         if mrm_database.loc[row,'Parent_Ion'] == df.loc[index,'Parent_Ion'] and mrm_database.loc[row,'Product_Ion'] == df.loc[index,'Product_Ion']:
#             df.loc[index,'Lipid'] = mrm_database.loc[row,'Lipid']
#             df.loc[index,'Class'] = mrm_database.loc[row,'Class']

# df_matching = df.dropna() #drop rows with NaN values
            

In [8]:
# mrm_database['Parent_Ion'] = mrm_database['Parent_Ion'].str.strip()
# mrm_database['Product_Ion'] = mrm_database['Product_Ion'].str.strip()
# df['Parent_Ion'] = df['Parent_Ion'].str.strip()
# df['Product_Ion'] = df['Product_Ion'].str.strip()

mrm_database['Parent_Ion'] = mrm_database['Parent_Ion'].astype(float)
mrm_database['Product_Ion'] = mrm_database['Product_Ion'].astype(float)
df['Parent_Ion'] = df['Parent_Ion'].astype(float)
df['Product_Ion'] = df['Product_Ion'].astype(float)

ion_dict = {}
for index, row in mrm_database.iterrows():
    ion_dict[(row['Parent_Ion'], row['Product_Ion'])] = (row['Lipid'], row['Class'])

def match_ions(row):
    ions = (row['Parent_Ion'], row['Product_Ion'])
    if ions in ion_dict:
        row['Lipid'], row['Class'] = ion_dict[ions]
    return row

df_matched = df.apply(match_ions, axis=1)
df_matching = df_matched.dropna()#
len(df_matching)

42820
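
An exact match on the (Parent_Ion, Product_Ion) pair can also be written as a left merge, which avoids applying a Python function to every row. This is a sketch on toy data, not a drop-in replacement: if the database lists several lipids for one transition, a merge returns one row per lipid, whereas the dictionary lookup keeps only one. The intensities are made up, and the lipid names are borrowed from later output in this notebook.

```python
import pandas as pd

# Toy measured transitions (intensities are made up)
measured = pd.DataFrame({
    "Parent_Ion": [896.0, 898.0, 500.0],
    "Product_Ion": [597.0, 599.0, 100.0],
    "Intensity": [10.0, 20.0, 30.0],
})

# Toy database with one lipid per transition
database = pd.DataFrame({
    "Parent_Ion": [896.0, 898.0],
    "Product_Ion": [597.0, 599.0],
    "Lipid": ["[TG(54:6)]_FA18:1", "[TG(54:5)]_FA18:1"],
    "Class": ["TAG", "TAG"],
})

# how="left" keeps every measured row; unmatched rows get NaN for Lipid and Class
matched = measured.merge(database, on=["Parent_Ion", "Product_Ion"], how="left")

# Keep only the transitions that were found in the database
matched_only = matched.dropna(subset=["Lipid"])
print(matched_only)
```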

In [18]:
from collections import defaultdict

ion_dict = defaultdict(list)
for index, row in mrm_database.iterrows():
    ion_dict[(row['Parent_Ion'], row['Product_Ion'])].append((row['Lipid'], row['Class']))



def match_ions(row):
    ions = (row['Parent_Ion'], row['Product_Ion'])
    if ions in ion_dict:
        row['Lipid'], row['Class'] = zip(*ion_dict[ions])
    return row

df = df.apply(match_ions, axis=1)

df_matching = df_matched.dropna()#
len(df_matching)

42820

In [24]:
from collections import defaultdict

tolerance = 0.01  # Adjust the tolerance value according to your needs

ion_dict = defaultdict(list)
for index, row in mrm_database.iterrows():
    ion_dict[(row['Parent_Ion'], row['Product_Ion'])].append((row['Lipid'], row['Class']))

def within_tolerance(a, b):
    return abs(a - b) <= tolerance

def match_ions(row):
    ions = (row['Parent_Ion'], row['Product_Ion'])
    matched_lipids = []
    matched_classes = []
    
    for key, value in ion_dict.items():
        if within_tolerance(ions[0], key[0]) and within_tolerance(ions[1], key[1]):
            matched_lipids.extend([match[0] for match in value])
            matched_classes.extend([match[1] for match in value])
    
    if matched_lipids and matched_classes:
        row['Lipid'] = ':'.join(matched_lipids)
        row['Class'] = ':'.join(matched_classes)
    
    return row

df = df.apply(match_ions, axis=1)

df_matching = df.dropna()
len(df_matching)

44041
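
Tolerance matching compares every measured pair against every database pair, so in plain Python it loops over the whole dictionary for each row. Here is a sketch of the same comparison done with NumPy on whole columns at once. The toy database and the helper name `match_with_tolerance` are hypothetical, and `rtol=0` is set so that only the absolute tolerance applies.

```python
import numpy as np
import pandas as pd

tolerance = 0.01  # same absolute tolerance as in the cell above

# Toy database for illustration
database = pd.DataFrame({
    "Parent_Ion": [896.0, 898.0],
    "Product_Ion": [597.0, 599.0],
    "Lipid": ["[TG(54:6)]_FA18:1", "[TG(54:5)]_FA18:1"],
})


def match_with_tolerance(parent, product):
    """Return every database lipid whose Q1 and Q3 are both within tolerance."""
    parent_ok = np.isclose(
        database["Parent_Ion"].to_numpy(), parent, rtol=0, atol=tolerance
    )
    product_ok = np.isclose(
        database["Product_Ion"].to_numpy(), product, rtol=0, atol=tolerance
    )
    return database.loc[parent_ok & product_ok, "Lipid"].tolist()


print(match_with_tolerance(896.004, 597.003))
print(match_with_tolerance(896.5, 597.0))
```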

In [25]:
len(df)

44047

In [26]:
df_matching.to_csv("DF_Nandropped_Liver_LD_newest_with matching multiple lipids.csv")

In [31]:
# create a dictionary to store the matching parent/product ion pairs and their corresponding lipids and classes
# ion_dict = {}
# for index, row in mrm_database.iterrows():
#     ion_dict[(row['Parent_Ion'], row['Product_Ion'])] = (row['Lipid'], row['Class'])

# # create empty columns for Lipid and Class
# #df['Lipid'] = np.nan
# #df['Class'] = np.nan

# # loop through the rows in df and check if the corresponding parent/product ion pair exists in ion_dict
# for index, row in df.iterrows():
#     if (row['Parent_Ion'], row['Product_Ion']) in ion_dict:
#         df.at[index, 'Lipid'] = ion_dict[(row['Parent_Ion'], row['Product_Ion'])][0]
#         df.at[index, 'Class'] = ion_dict[(row['Parent_Ion'], row['Product_Ion'])][1]

# len(df)        

# # drop rows with NaN values
# df_matching = df.dropna()#
# df_matching.tail(100)
# len(df_matching)  # rows after matching: 28164 when rounding to 2 decimals, 42820 when rounding to 1 decimal, 43556 when rounding to whole numbers, 44041 with floor

44041

In [43]:
df_matching.head(None)
len(df)
df.to_csv("DF_no_Nandropped_Liver_LD_new.csv")
df_matched.to_csv("DF_Nandropped_Liver_LD_new.csv")
len(df_matched)
###with 1 round it was 42820 with Nan and 44047 without nan

44047

In [8]:
#Save df_matching under the first unused numbered file name so that earlier results are not overwritten
name_of_folder = 'canola'
name_of_file = 'canola'
output_dir = os.path.join('data_results/data/data_matching', name_of_folder)

for i in range(5):
    output_path = os.path.join(output_dir, '{}_{}.xlsx'.format(name_of_file, i))
    #Write only if this numbered file does not exist yet, then stop looking
    if not os.path.isfile(output_path):
        df_matching.to_excel(output_path, index=False)
        break
df_matching.tail(5)

Unnamed: 0,Lipid,Parent_Ion,Product_Ion,Intensity,Transition,Class,Sample_ID
30,[TG(54:6)]_FA18:1,896.0,597.0,3145901.463318,896.0 -> 597.0,TAG,TailoredTAG18-1_O3off_RBDCanola0.0005mgmL_0207...
31,[TG(54:5)]_FA18:1,898.0,599.0,12535827.163513,898.0 -> 599.0,TAG,TailoredTAG18-1_O3off_RBDCanola0.0005mgmL_0207...
32,"[TG(55:11),TG(54:4)]_FA18:1",900.0,601.0,14229465.170547,900.0 -> 601.0,TAG,TailoredTAG18-1_O3off_RBDCanola0.0005mgmL_0207...
33,"[TG(55:10),TG(54:3)]_FA18:1",902.0,603.0,34867382.116547,902.0 -> 603.0,TAG,TailoredTAG18-1_O3off_RBDCanola0.0005mgmL_0207...
34,"[TG(55:9),TG(54:2)]_FA18:1",904.0,605.0,12858486.610764,904.0 -> 605.0,TAG,TailoredTAG18-1_O3off_RBDCanola0.0005mgmL_0207...
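
The "first unused file name" logic can be pulled out into a small helper, which makes it easy to test without writing real results. This is a sketch: `first_unused_path` is a hypothetical name, and the demo runs in a temporary directory rather than in `data_results`.

```python
import os
import tempfile


def first_unused_path(folder, stem, extension=".xlsx", max_tries=5):
    """Return the first '<stem>_<i><extension>' path in folder that does not exist yet.

    Returns None if all max_tries candidates are already taken.
    """
    for i in range(max_tries):
        candidate = os.path.join(folder, f"{stem}_{i}{extension}")
        if not os.path.exists(candidate):
            return candidate
    return None


# Demo in a throwaway directory so nothing real is overwritten
with tempfile.TemporaryDirectory() as folder:
    first = first_unused_path(folder, "canola")
    open(first, "w").close()  # pretend a result was saved here
    second = first_unused_path(folder, "canola")
    first_name = os.path.basename(first)
    second_name = os.path.basename(second)

print(first_name, second_name)
```

Returning `None` when every candidate is taken lets the caller decide whether to raise an error or raise the limit, instead of silently skipping the save.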


In [79]:
#import visualization libraries
import umap
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go


IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html



In [81]:
#plot transition versus intensity of df_matching
fig = px.bar(df_matching, x="Transition", y="Intensity", color="Lipid", hover_data=['Lipid','Class'])
fig.show()


#plot lipid class versus intensity of df_matching in a bar chart
fig = px.bar(df_matching, x="Class", y="Intensity", color="Class", hover_data=['Lipid','Class'])
fig.show()
#plot lipid class versus intensity of df_matching in a pie chart
fig = px.pie(df_matching, values='Intensity', names='Class', title='Lipid Class')
fig.show()
#make a plotly heatmap of the intensity of each transition in each sample
fig = go.Figure(data=go.Heatmap(
                     z=df_matching['Intensity'],
                        x=df_matching['Lipid'],
                        y=df_matching['Class'],
                        colorscale='Viridis'))
fig.show()

# #plot sample ID versus intensity of df_matching
# fig = px.bar(df_matching, x="Sample_ID", y="Intensity", color="Sample_ID", hover_data=['Lipid','Class'])
# fig.show()
# #plot sample ID versus intensity of df_matching in a pie chart
# fig = px.pie(df_matching, values='Intensity', names='Sample_ID', title='Sample ID')
# fig.show()
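
For a heatmap of the intensity of each transition in each sample, it can help to reshape the long table into a sample-by-transition matrix first. Below is a sketch using `pivot_table` on made-up values. Summing repeated transitions within a sample is an assumption here, not necessarily the aggregation you want.

```python
import pandas as pd

# Made-up long-format data in the same layout as df_matching
toy = pd.DataFrame({
    "Sample_ID": ["s1", "s1", "s2", "s2", "s2"],
    "Transition": [
        "896.0 -> 597.0", "898.0 -> 599.0",
        "896.0 -> 597.0", "896.0 -> 597.0", "898.0 -> 599.0",
    ],
    "Class": ["TAG"] * 5,
    "Intensity": [10.0, 20.0, 5.0, 7.0, 1.0],
})

# Total intensity per lipid class (useful for the bar and pie charts)
class_totals = toy.groupby("Class", as_index=False)["Intensity"].sum()

# Rows = samples, columns = transitions, cells = summed intensity
heatmap_matrix = toy.pivot_table(
    index="Sample_ID",
    columns="Transition",
    values="Intensity",
    aggfunc="sum",
    fill_value=0,
)

print(class_totals)
print(heatmap_matrix)
```

The resulting matrix can then be handed to `go.Heatmap` with `z=heatmap_matrix.values`, `x=heatmap_matrix.columns`, and `y=heatmap_matrix.index`.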