# Downstream Analysis -- feature annotation
### Introduction

In this case study, we process a MALDI-TIMS-MS1 lipid dataset from mouse skin tissue, extract features defined by m/z and ion mobility, convert ion mobility into collision cross section (CCS), and annotate the features by looking up their m/z and CCS values in the CCS Compendium.

```python
import timsimaging

# enable visualization in the Jupyter notebook
from bokeh.io import show, output_notebook
output_notebook()

# disable FutureWarning
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
```

```python
bruker_d_folder_name = r"D:\dataset\Melanie\MF402upper.d"
dataset = timsimaging.spectrum.MSIDataset(bruker_d_folder_name)
dataset
```

```text
100%|██████████████████████████████████████████████████████████████████████████| 65052/65052 [00:13<00:00, 4783.46it/s]

MSIDataset with 65052 pixels
        mz range: 299.999-1350.004
        mobility range: 0.700-1.960
```

```python
dataset.image()
```

### Peak processing
First, we run the processing workflow to obtain the feature list. Each feature has three values: m/z, ion mobility, and total intensity. Ion mobility is reported as $1/K_0$ (inverse reduced mobility) in units of $\mathrm{V\cdot s\cdot cm^{-2}}$.

```python
results = dataset.process(
    sampling_ratio=0.1,
    frequency_threshold=0.02,
    tolerance=3,
    adaptive_window=True,
    visualize=False,
)
peak_list = results["peak_list"]
peak_list
```

```text
Computing mean spectrum...
Traversing graph...
Finding local maxima...
Summarizing...

100%|████████████████████████████████████████████████████████████████████████████████| 930/930 [02:41<00:00,  5.77it/s]
```


| | mz_values | mobility_values | total_intensity |
|---:|---:|---:|---:|
| 1 | 302.058416 | 0.906070 | 28.807071 |
| 2 | 303.065771 | 0.907390 | 273.366334 |
| 3 | 303.065934 | 1.076893 | 73.819985 |
| 4 | 303.065516 | 0.948773 | 8.603689 |
| 5 | 304.069069 | 0.906924 | 3.887010 |
| ... | ... | ... | ... |
| 926 | 1253.899905 | 1.844601 | 14.027978 |
| 927 | 1255.913279 | 1.853024 | 26.725288 |
| 928 | 1281.929607 | 1.870141 | 25.601230 |
| 929 | 1283.943538 | 1.877011 | 2.597694 |


### Converting ion mobility into CCS
However, $1/K_0$ depends on the ion mobility technique and the instrument. These data were acquired with trapped ion mobility spectrometry (TIMS), whereas most ion mobility databases are built from drift tube ion mobility spectrometry (DTIMS) measurements. To allow cross-platform comparison, we convert ion mobility into CCS, a property that depends only on the ion itself (for a given drift gas).
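For reference, the standard physical relation between the reduced mobility $K_0$ and the CCS $\Omega$ is the Mason–Schamp equation:

$$
\Omega = \frac{3 z e}{16 N_0}\sqrt{\frac{2\pi}{\mu k_B T}}\,\frac{1}{K_0},
\qquad
\mu = \frac{m_\mathrm{ion}\, m_\mathrm{gas}}{m_\mathrm{ion} + m_\mathrm{gas}}
$$

where $z$ is the charge state, $e$ the elementary charge, $N_0$ the Loschmidt number (gas number density at 273.15 K and 101.325 kPa), $k_B$ the Boltzmann constant, $T$ the drift gas temperature, and $\mu$ the reduced mass of the ion and one drift gas molecule. The minimal Python sketch below evaluates this equation directly. The drift gas (N$_2$, 28.006148 Da) and temperature (305 K) are assumptions, and `mason_schamp_ccs` is a hypothetical helper, not TIMSImaging's or Bruker's implementation.

```python
import math

# Physical constants in SI units
ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23             # J/K
DALTON = 1.66053906660e-27           # kg
LOSCHMIDT = 2.6867811e25             # m^-3, gas number density at 273.15 K and 101.325 kPa


def mason_schamp_ccs(mz, inv_k0, charge=1, gas_mass=28.006148, temperature=305.0):
    """Return CCS in Å^2 from m/z and 1/K0 (V·s·cm^-2) via the Mason–Schamp equation.

    Assumed, not taken from TIMSImaging: N2 drift gas (gas_mass in Da) at 305 K.
    """
    ion_mass = mz * charge * DALTON               # ion mass in kg (m/z times z)
    gas = gas_mass * DALTON                       # drift gas molecule mass in kg
    mu = ion_mass * gas / (ion_mass + gas)        # reduced mass in kg
    k0 = 1e-4 / inv_k0                            # K0 converted from cm^2/(V·s) to m^2/(V·s)
    omega = (3 * charge * ELEMENTARY_CHARGE / (16 * LOSCHMIDT)) \
        * math.sqrt(2 * math.pi / (mu * BOLTZMANN * temperature)) / k0
    return omega * 1e20                           # m^2 -> Å^2


# feature 1 of the peak list: m/z 302.058416, 1/K0 = 0.906070 V·s·cm^-2
print(round(mason_schamp_ccs(302.058416, 0.906070), 1))
```

For feature 1 this sketch gives about 189.7 Å², in line with the CCS values reported for that feature later in this notebook. Because $\Omega$ is proportional to $1/K_0$ for a fixed ion, gas, and temperature, the instrument-specific part of the problem lies in obtaining an accurate $1/K_0$.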

TIMSImaging converts $1/K_0$ into CCS with a linear model fitted to the reference ion mobilities of the calibrant ions. The calibration metadata are stored in the raw data:

```python
dataset.cali_info
```

| KeyName | KeyPolarity | Value |
|---|---|---|
| CalibrationDateTime | + | 2024-07-28T12:37:35+02:00 |
| CalibrationUser | + | unknown |
| CalibrationSoftware | + | timsTOF |
| CalibrationSoftwareVersion | + | 5.1.8 |
| MzCalibrationMode | + | 4 |
| MzStandardDeviationPPM | + | 0.209581 |
| ReferenceMassList | + | RedP_0_to_2000 07-03-2024 New List |
| MzCalibrationSpectrumDescription | + | `<unknown>` |
| ReferenceMassPeakNames | + | `b'p3\x00p5\x00p9\x00p17\x00p21\x00p25\x00p29\x...` |
| ReferencePeakMasses | + | `b'\x17\xd9\xce\xf7S"\x8a@\xc5 \xb0r\xe8\x11\x8...` |


Next, we compute CCS. TIMSImaging can also call the internal CCS conversion function from Bruker's TDFSDK API, so here we compute CCS with both methods and compare them.

```python
# CCS from TIMSImaging's linear calibration model
ccs_linear_model = dataset.ccs_calibrator(method="linear")
peak_list["CCS_linear"] = ccs_linear_model.transform(peak_list["mz_values"], peak_list["mobility_values"], charge=1)

# CCS from Bruker's internal conversion function (TDFSDK)
ccs_bruker_model = dataset.ccs_calibrator(method="internal")
peak_list["CCS_bruker"] = ccs_bruker_model.transform(peak_list["mz_values"], peak_list["mobility_values"], charge=1)

# relative difference between the two conversions
peak_list["Delta"] = (peak_list["CCS_bruker"] - peak_list["CCS_linear"]) / peak_list["CCS_bruker"]
```

```python
peak_list
```

| | mz_values | mobility_values | total_intensity | CCS_linear | CCS_bruker | Delta |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 302.058416 | 0.906070 | 28.807071 | 189.238376 | 189.666498 | 0.002257 |
| 2 | 303.065771 | 0.907390 | 273.366334 | 189.489590 | 189.916152 | 0.002246 |
| 3 | 303.065934 | 1.076893 | 73.819985 | 225.188273 | 225.393028 | 0.000908 |
| 4 | 303.065516 | 0.948773 | 8.603689 | 198.205047 | 198.577456 | 0.001875 |
| 5 | 304.069069 | 0.906924 | 3.887010 | 189.364654 | 189.791992 | 0.002252 |
| ... | ... | ... | ... | ... | ... | ... |
| 926 | 1253.899905 | 1.844601 | 14.027978 | 374.204010 | 373.482918 | -0.001931 |
| 927 | 1255.913279 | 1.853024 | 26.725288 | 375.913545 | 375.181832 | -0.001950 |
| 928 | 1281.929607 | 1.870141 | 25.601230 | 379.316539 | 378.563683 | -0.001989 |
| 929 | 1283.943538 | 1.877011 | 2.597694 | 380.709363 | 379.947853 | -0.002004 |


The CCS values from TIMSImaging are nearly identical to those from Bruker's method: for the rows shown, the relative difference (`Delta`) stays below 0.3%.
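To put a number on that agreement, the relative difference can be summarized with pandas. The sketch below uses only the nine rows displayed in the table above (values copied from it), so it says nothing about the hidden rows; on the full table the same summary would be `peak_list["Delta"].abs().describe()`.

```python
import pandas as pd

# values copied from the nine rows displayed in the peak-list table above
shown = pd.DataFrame({
    "mz_values": [302.058416, 303.065771, 303.065934, 303.065516, 304.069069,
                  1253.899905, 1255.913279, 1281.929607, 1283.943538],
    "CCS_linear": [189.238376, 189.489590, 225.188273, 198.205047, 189.364654,
                   374.204010, 375.913545, 379.316539, 380.709363],
    "CCS_bruker": [189.666498, 189.916152, 225.393028, 198.577456, 189.791992,
                   373.482918, 375.181832, 378.563683, 379.947853],
})
# relative difference, defined the same way as the Delta column
shown["Delta"] = (shown["CCS_bruker"] - shown["CCS_linear"]) / shown["CCS_bruker"]

# magnitude of the disagreement
print(shown["Delta"].abs().describe()[["mean", "max"]])
# mean signed difference for low (False) vs. high (True) m/z rows
print(shown.groupby(shown["mz_values"] > 1000)["Delta"].mean())
```

In these rows the linear model gives slightly smaller CCS than Bruker's conversion at low m/z and slightly larger CCS at high m/z. Whether that trend holds across all 929 features would need to be checked on the full table.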

### Query features against CCS database
Now we can search the m/z and CCS values in a database to obtain putative feature annotations. Here we use the CCS Compendium, which can be downloaded from https://mcleanresearchgroup.shinyapps.io/CCS-Compendium

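```python
import pandas as pd


def query_feature(mz, ccs, i, library, ppm_tol=20, ccs_tol=5):
    """Return library entries whose m/z and CCS fall within tolerance of one feature.

    mz, ccs : measured m/z and CCS (Å^2) of the feature
    i       : feature identifier, copied into the output
    library : CCS Compendium table (filtered on its 'mz' and 'CCS' columns)
    ppm_tol : m/z tolerance in ppm
    ccs_tol : absolute CCS tolerance in Å^2
    """
    columns = ['Compound', 'Neutral.Formula', 'CAS',
               'Theoretical.mz', 'Ion.Species', 'Charge',
               'CCS', 'Super.Class',
               'Class', 'Subclass']
    # m/z window: +/- ppm_tol ppm around the measured m/z
    mz_tol = mz * ppm_tol * 1e-6
    mz_min, mz_max = mz - mz_tol, mz + mz_tol
    # CCS window: +/- ccs_tol in absolute units
    ccs_min, ccs_max = ccs - ccs_tol, ccs + ccs_tol
    hit_index = library['mz'].between(mz_min, mz_max) & library['CCS'].between(ccs_min, ccs_max)
    # candidate entries matching both m/z and CCS
    hits = library.loc[hit_index, columns].copy()
    if hits.empty:
        return pd.DataFrame()

    hits["feature_id"] = i
    hits["measured_mz"] = mz
    hits["measured_CCS"] = ccs
    # signed m/z error in ppm, relative to the measured m/z
    hits["ppm_error"] = (hits["Theoretical.mz"] - mz) / mz * 1e6
    # relative CCS deviation in percent
    hits["ccs_dev_%"] = ((hits["CCS"] - ccs) / ccs).abs() * 100
    return hits
```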
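The m/z window in `query_feature` is relative (ppm), so its absolute width grows with m/z, while the CCS window is a fixed number of Å². The worked example below restates the two ppm expressions from `query_feature` as hypothetical helpers `ppm_window` and `ppm_error`, applied to feature 179 and the LysoPC (13:0) [M+H] candidate from the match table further below. It uses `Theoretical.mz`, which may differ slightly from the library's `mz` column that `query_feature` actually filters on.

```python
def ppm_window(mz, ppm_tol):
    """Return the (low, high) m/z bounds of a +/- ppm_tol window around mz."""
    half_width = mz * ppm_tol * 1e-6
    return mz - half_width, mz + half_width


def ppm_error(theoretical_mz, measured_mz):
    """Signed error in ppm with the same convention as query_feature:
    (theoretical - measured) / measured * 1e6."""
    return (theoretical_mz - measured_mz) / measured_mz * 1e6


# feature 179 vs. LysoPC (13:0) [M+H], values taken from the match table in this notebook
low, high = ppm_window(454.302122, 20)
print(f"20 ppm window: {low:.4f} to {high:.4f}")
print(f"error: {ppm_error(454.2933, 454.302122):.2f} ppm")
```

At m/z 454.3, 20 ppm corresponds to roughly +/-0.009, and this candidate lies about 19.4 ppm away, close to the edge of the window. The matches listed below all have ppm errors between about -15 and -19.4. Note that `query_feature` reports (theoretical - measured)/measured, which has the opposite sign to the (measured - theoretical)/theoretical convention used by many tools.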


To save time, we search only a subset of the database, the lipids with a +1 charge state:

```python
library = pd.read_csv(r"D:\dataset\UnifiedCCSCompendium_FullDataSet_2025-07-28.csv")
lipids_lib = library.loc[
    (library["Super.Class"] == "Lipids and lipid-like molecules") &
    (library["Charge"] == 1)
]
lipids_lib
```

| | Compound | Neutral.Formula | CAS | InChi | InChiKey | Theoretical.mz | mz | Ion.Species | Ion.Species.Agilent | Charge | ... | Rep5 | Rep6 | Rep7 | Rep8 | Rep9 | Rep10 | Rep11 | Rep12 | Rep13 | Rep14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1045 | 5-iPF2alpha-VI | C20H34O5 | 180469-63-0 | InChI=1S/C20H34O5/c1-2-3-4-5-6-7-10-16-17(19(2... | RZCPXIZGLPAGEV-UHFFFAOYSA-N | 377.2304 | 377.2304 | [M+Na] | (M+Na)+ | 1 | ... | | | | | | | | | | |
| 1047 | 8-iso-15(R)-Prostaglandin F2alpha | C20H34O5 | 214748-65-9 | InChI=1S/C20H34O5/c1-2-3-6-9-15(21)12-13-17-16... | PXGPLTODNUVGFL-PGWUFSIFSA-N | 377.2304 | 377.2304 | [M+Na] | (M+Na)+ | 1 | ... | | | | | | | | | | |
| 1049 | 8-iso-Prostaglandin F2alpha | C20H34O5 | 27415-26-5 | InChI=1S/C20H34O5/c1-2-3-6-9-15(21)12-13-17-16... | PXGPLTODNUVGFL-NAPLMKITSA-N | 377.2304 | 377.2304 | [M+Na] | (M+Na)+ | 1 | ... | | | | | | | | | | |
| 1051 | 15(R)-Prostaglandin F2alpha | C20H34O5 | 37658-84-7 | InChI=1S/C20H34O5/c1-2-3-6-9-15(21)12-13-17-16... | PXGPLTODNUVGFL-CKXCCYAOSA-N | 377.2304 | 377.2304 | [M+Na] | (M+Na)+ | 1 | ... | | | | | | | | | | |
| 1053 | 11-beta-Prostaglandin F2alpha | C20H34O5 | 38432-87-0 | InChI=1S/C20H34O5/c1-2-3-6-9-15(21)12-13-17-16... | PXGPLTODNUVGFL-ZWAKLXPCSA-N | 377.2304 | 377.2304 | [M+Na] | (M+Na)+ | 1 | ... | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3666 | Progesterone | C21H30O2 | 57-83-0 | InChI=1S/C21H30O2/c1-13(22)17-6-7-18-16-5-4-14... | RJKFOVLPORLFTN-LEKSSAKUSA-N | 337.2144 | 337.2143 | [M+Na] | (M+Na)+ | 1 | ... | | | | | | | | | | |
| 3668 | 17-alpha-Hydroxyprogesterone | C21H30O3 | 68-96-2 | InChI=1S/C21H30O3/c1-13(22)21(24)11-8-18-16-5-... | DBPWSSGDRRHUNT-CEGNMAFCSA-N | 331.2273 | 331.2273 | [M+H] | (M+H)+ | 1 | ... | | | | | | | | | | |
| 3669 | 17-alpha-Hydroxyprogesterone | C21H30O3 | 68-96-2 | InChI=1S/C21H30O3/c1-13(22)21(24)11-8-18-16-5-... | DBPWSSGDRRHUNT-CEGNMAFCSA-N | 353.2093 | 353.2093 | [M+Na] | (M+Na)+ | 1 | ... | | | | | | | | | | |
| 3670 | d7-Cholesterol Ester (18:1) | C45H71D7O2 | 1416275-35-8 | InChI=1S/C45H78O2/c1-7-8-9-10-11-12-13-14-15-1... | RJECHNNFRHZQKU-IHPCOYDHSA-N | 680.6339 | 680.6305 | [M+Na] | (M+Na)+ | 1 | ... | | | | | | | | | | |


Query all features with an m/z tolerance of 20 ppm and a CCS tolerance of 5 $\mathring{A}^2$:

```python
all_results = []
for i, feat in peak_list.iterrows():
    all_results.append(query_feature(feat['mz_values'], feat['CCS_linear'], i, lipids_lib, ppm_tol=20, ccs_tol=5))
```

```python
# keep only features with at least one library hit
non_empty = [r for r in all_results if not r.empty]
final_results = pd.concat(non_empty, ignore_index=True) if non_empty else pd.DataFrame()
final_results
```

| | Compound | Neutral.Formula | CAS | Theoretical.mz | Ion.Species | Charge | CCS | Super.Class | Class | Subclass | feature_id | measured_mz | measured_CCS | ppm_error | ccs_dev_% |
|---:|---|---|---|---:|---|---:|---:|---|---|---|---:|---:|---:|---:|---:|
| 0 | LysoPC (13:0) | C21H44NO7P | 20559-17-5 | 454.2933 | [M+H] | 1 | 217.7 | Lipids and lipid-like molecules | Glycerophospholipids | Glycerophosphocholines | 179 | 454.302122 | 212.734417 | -19.418093 | 2.33417 |
| 1 | Lithocholyltaurine | C26H45NO5S | 6042-32-6 | 506.2916 | [M+Na] | 1 | 213.3 | Lipids and lipid-like molecules | Steroids and steroid derivatives | Bile acids, alcohols and derivatives | 232 | 506.299203 | 217.302892 | -15.017336 | 1.84208 |
| 2 | LysoPC (16:0) | C24H50NO7P | 17364-16-8 | 518.3223 | [M+Na] | 1 | 234.3 | Lipids and lipid-like molecules | Glycerophospholipids | Glycerophosphocholines | 239 | 518.332071 | 235.84059 | -18.850706 | 0.653234 |
| 3 | LysoPC (18:1) | C26H52NO7P | 19420-56-5 | 522.3559 | [M+H] | 1 | 233.2 | Lipids and lipid-like molecules | Glycerophospholipids | Glycerophosphocholines | 245 | 522.365273 | 235.736654 | -17.942795 | 1.076054 |
| 4 | 1,2-Didecanoyl-sn-glycero-3-phosphoethanolamine | C25H50NO8P | 253685-27-7 | 524.3352 | [M+H] | 1 | 235.4 | Lipids and lipid-like molecules | Glycerophospholipids | Glycerophosphoethanolamines | 250 | 524.344495 | 236.731846 | -17.727345 | 0.562597 |
| 5 | LysoPC (18:0) | C26H54NO7P | 19420-57-6 | 524.3716 | [M+H] | 1 | 238.8 | Lipids and lipid-like molecules | Glycerophospholipids | Glycerophosphocholines | 251 | 524.380728 | 240.115902 | -17.406327 | 0.548028 |
| 6 | LysoPC (2-18:0) | C26H54NO7P | 27098-24-4 | 524.3716 | [M+H] | 1 | 240.7 | Lipids and lipid-like molecules | Glycerophospholipids | Glycerophosphocholines | 251 | 524.380728 | 240.115902 | -17.406327 | 0.243257 |
| 7 | Platelet-activating Factor | C26H54NO7P | 74389-68-7 | 524.3716 | [M+H] | 1 | 239.3 | Lipids and lipid-like molecules | Glycerophospholipids | Glycerophosphocholines | 251 | 524.380728 | 240.115902 | -17.406327 | 0.339795 |
| 8 | LysoPC (19:0) | C27H56NO7P | 108273-88-7 | 538.3872 | [M+H] | 1 | 240.5 | Lipids and lipid-like molecules | Glycerophospholipids | Glycerophosphocholines | 262 | 538.395407 | 243.788239 | -15.242698 | 1.34881 |
| 9 | PC (17:0/02:0) | C27H56NO7P | 93037-84-4 | 538.3872 | [M+H] | 1 | 244.2 | Lipids and lipid-like molecules | Glycerophospholipids | Glycerophosphocholines | 262 | 538.395407 | 243.788239 | -15.242698 | 0.168901 |
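The loop above calls `query_feature` once per feature, and each call rescans the whole library. As an alternative, the same two windows can be evaluated for all feature/library pairs at once with NumPy broadcasting. The sketch below uses a hypothetical helper `match_all` and toy data loosely based on feature 179 in the match table above; it reproduces the `query_feature` windows up to floating-point rounding but keeps all library columns.

```python
import numpy as np
import pandas as pd


def match_all(features, library, ppm_tol=20.0, ccs_tol=5.0,
              mz_col="mz_values", ccs_col="CCS_linear"):
    """Hypothetical vectorized matcher with the windows used by query_feature:
    |library mz - feature mz| <= feature mz * ppm_tol * 1e-6 and
    |library CCS - feature CCS| <= ccs_tol (absolute, Å^2)."""
    f_mz = features[mz_col].to_numpy()
    f_ccs = features[ccs_col].to_numpy()
    l_mz = library["mz"].to_numpy()
    l_ccs = library["CCS"].to_numpy()
    # broadcast to an (n_features, n_library) grid of pairwise comparisons
    mz_ok = np.abs(l_mz[None, :] - f_mz[:, None]) <= (f_mz * ppm_tol * 1e-6)[:, None]
    ccs_ok = np.abs(l_ccs[None, :] - f_ccs[:, None]) <= ccs_tol
    feat_idx, lib_idx = np.nonzero(mz_ok & ccs_ok)  # positions of matching pairs
    hits = library.iloc[lib_idx].reset_index(drop=True)
    hits["feature_id"] = features.index.to_numpy()[feat_idx]
    hits["measured_mz"] = f_mz[feat_idx]
    hits["measured_CCS"] = f_ccs[feat_idx]
    return hits


# toy data (hypothetical)
toy_features = pd.DataFrame({"mz_values": [454.302122, 600.0],
                             "CCS_linear": [212.734417, 250.0]}, index=[179, 300])
toy_library = pd.DataFrame({"Compound": ["LysoPC (13:0)", "unrelated"],
                            "mz": [454.2933, 700.0],
                            "CCS": [217.7, 250.0]})
print(match_all(toy_features, toy_library))
```

On the real data this would be called as `match_all(peak_list, lipids_lib)`. Memory grows with the number of features times the number of library rows, which stays modest for a peak list of this size.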


Several features have candidate matches. Their ion images can be viewed with `timsimaging.plotting.image`, shown here for feature 539:

```python
image, _ = timsimaging.plotting.image(dataset, i=539, results=results)
show(image)
```

### Compare with annotation results from METASPACE

```python
# header=2: use the third line of the exported file as the column header
metaspace_results = pd.read_csv(r"D:\dataset\Melanie\metaspace_annotations.csv", header=2)
```

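```python
# METASPACE ions: wrap the adduct in brackets to follow the compendium's Ion.Species notation
metaspace_ions = metaspace_results[["formula", "adduct"]].copy()
metaspace_ions["adduct"] = metaspace_ions["adduct"].map(lambda x: f"[{x}]")

# ions annotated via the CCS Compendium, renamed to the same column names
timsimaging_ions = final_results[["Neutral.Formula", "Ion.Species"]].copy()
timsimaging_ions.columns = ["formula", "adduct"]

# ions annotated by both approaches
intersection = timsimaging_ions.merge(
    metaspace_ions,
    on=["formula", "adduct"],
    how="inner"
)
intersection
```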

| | formula | adduct |
|---:|---|---|
| 0 | C40H76NO8P | [M+H] |
| 1 | C40H80NO8P | [M+H] |
| 2 | C39H78NO8P | [M+Na] |
| 3 | C40H80NO8P | [M+Na] |
| 4 | C41H76NO8P | [M+Na] |
| 5 | C41H80NO8P | [M+Na] |
| 6 | C41H82NO8P | [M+Na] |
| 7 | C42H78NO8P | [M+Na] |
| 8 | C42H82NO8P | [M+Na] |
| 9 | C48H96NO8P | [M+Na] |
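The inner merge can list the same ion more than once, because several compendium entries share a formula and adduct (for example, the three C26H54NO7P [M+H] candidates of feature 251 in the match table). A set-style comparison on unique ions avoids that and also counts the ions found by only one approach. The sketch below uses two hypothetical helpers: `compare_ion_sets`, built on the `indicator` option of `pandas.merge`, and `normalize_adduct`. The adduct notation in the METASPACE export is not shown in this notebook, so `normalize_adduct` is an assumption that accepts a few common positive-mode spellings.

```python
import pandas as pd


def normalize_adduct(adduct):
    """Hypothetical helper: map '+H', 'M+H', '[M+H]' or '[M+H]+' to the
    compendium-style '[M+H]' (positive-mode adducts only)."""
    s = adduct.strip().rstrip("+")  # drop a trailing charge sign: '[M+H]+' -> '[M+H]'
    s = s.strip("[]")               # drop brackets: '[M+H]' -> 'M+H'
    if not s.startswith("M"):
        s = "M" + s                 # '+H' -> 'M+H'
    return f"[{s}]"


def compare_ion_sets(a, b, keys=("formula", "adduct")):
    """Count unique ions found only in a, only in b, or in both."""
    keys = list(keys)
    merged = a[keys].drop_duplicates().merge(
        b[keys].drop_duplicates(), on=keys, how="outer", indicator=True)
    return merged["_merge"].value_counts()


# toy data (hypothetical)
tims = pd.DataFrame({"formula": ["C40H80NO8P", "C40H80NO8P", "C24H50NO7P"],
                     "adduct": ["[M+H]", "[M+H]", "[M+Na]"]})
meta = pd.DataFrame({"formula": ["C40H80NO8P", "C42H82NO8P"],
                     "adduct": ["+H", "+Na"]})
meta["adduct"] = meta["adduct"].map(normalize_adduct)
counts = compare_ion_sets(tims, meta)
print(counts)
```

With the notebook's data, the equivalent call would be `compare_ion_sets(timsimaging_ions, metaspace_ions)`, where `both` counts the shared ions and `left_only`/`right_only` count the ions annotated by only one approach.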


![Ion image of C40H80NO8P (+H adduct) from METASPACE](C40H80NO8P+H.png){width="60%"}