# Downstream Analysis: Precursor Selection
### Introduction

In this case study, we process a MALDI-TIMS-MS1 natural product dataset from a bacterial-fungal co-culture, then use a statistical method to filter the feature list for informative precursors, which serve as prioritized targets in subsequent iprm-PASEF experiments.
The dataset is from the Laura Sanchez Lab and is available at MassIVE.  
Shepherd RA, Luu GT, Sanchez LM. MALDI-TIMS-MS2 Imaging and Annotation of Natural Products in Fungal-Bacterial Co-Culture. bioRxiv [Preprint]. 2025 May 13:2025.05.11.653367. doi: 10.1101/2025.05.11.653367. PMID: 40463019; PMCID: PMC12132468. https://doi.org/10.1101/2025.05.11.653367

In [1]:
import timsimaging

# enable visualization in the Jupyter notebook
from bokeh.io import show, output_notebook
output_notebook()
# disable FutureWarning
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

In [2]:
bruker_d_folder_name = r"D:\dataset\Laura_Gordon\250321_JB182_Pen12.d"
dataset = timsimaging.spectrum.MSIDataset(bruker_d_folder_name)
dataset

100%|██████████████████████████████████████████████████████████████████████████| 12173/12173 [00:06<00:00, 1786.35it/s]


MSIDataset with 12173 pixels
        mz range: 99.999-1100.005
        mobility range: 0.400-1.800
        

### Peak processing
As the TIC image shows, there are four regions: *G. arilaitensis* + *P. solitum* co-culture (top), *P. solitum* (bottom left), *G. arilaitensis* (bottom middle), and the matrix (bottom right).

In [3]:
dataset.image()

The first step, as usual, is peak processing. Because the regions are heterogeneous, `sampling_ratio` is set to 1.

In [4]:
results = dataset.process(sampling_ratio=1, frequency_threshold=0.05, intensity_threshold=0.003, tolerance=3, window_size=[30, 7], visualize=True)

Computing mean spectrum...
Traversing graph...
Finding local maxima...
Summarizing...


100%|████████████████████████████████████████████████████████████████████████████████| 293/293 [00:55<00:00,  5.28it/s]


In [5]:
show(results["viz"])



### Feature selection
For targets in the subsequent MS2 acquisition, we want to exclude matrix ions and select precursors that are spatially associated with the microbial culture region. Specifically, a desired precursor should show high intensity in the microbial culture region and minimal intensity in the matrix region.

Here we define the 3 microbial regions as one group and the matrix region as the other group, which is consistent with the literature, then find features whose intensity levels differ significantly between the two groups.

In [9]:
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

In [10]:
intensity_array = results["intensity_array"]
dataset.set_ROI("matrix", xmin=200, ymin=100)
matrix = intensity_array.loc[dataset.rois["matrix"]]
cell_culture = intensity_array.loc[np.setdiff1d(intensity_array.index, dataset.rois["matrix"])]
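
The grouping in the cell above can be illustrated without `timsimaging`. The sketch below is a toy stand-in, assuming pixel coordinates are available as `x` and `y` columns; the names `coords`, `intensities`, `in_matrix` and all values are hypothetical, and this is not a description of how `set_ROI` is implemented. A rectangular region becomes a boolean mask, and the complement is taken with `np.setdiff1d`, as in the cell above.

```python
import numpy as np
import pandas as pd

# Toy pixel table: one row per pixel, indexed by a pixel id (hypothetical layout).
coords = pd.DataFrame(
    {"x": [10, 150, 210, 250, 220], "y": [20, 120, 110, 90, 130]},
    index=[0, 1, 2, 3, 4],
)
# Toy intensity table sharing the same pixel index, with two features.
intensities = pd.DataFrame(
    {"feat_a": [5.0, 4.0, 0.1, 3.0, 0.2], "feat_b": [1.0, 1.0, 1.0, 1.0, 1.0]},
    index=coords.index,
)

# Rectangle that is open towards larger x and y, mirroring xmin=200, ymin=100.
in_matrix = (coords["x"] >= 200) & (coords["y"] >= 100)
matrix_ids = coords.index[in_matrix].to_numpy()

# Every pixel outside the matrix rectangle belongs to the culture group.
culture_ids = np.setdiff1d(intensities.index, matrix_ids)

matrix_toy = intensities.loc[matrix_ids]
culture_toy = intensities.loc[culture_ids]
print(matrix_ids.tolist(), culture_ids.tolist())
```

The two groups are disjoint and together cover every pixel, which is what the two-group test later relies on.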

Normalize the data for a more accurate result:

In [11]:
# RMS normalization
rms = np.sqrt(np.mean(np.square(intensity_array), axis=1))
intensity_array_norm = intensity_array.div(rms, axis=0)

matrix = intensity_array_norm.loc[dataset.rois["matrix"]]
cell_culture = intensity_array_norm.loc[np.setdiff1d(intensity_array.index, dataset.rois["matrix"])]
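
To make the effect of the RMS normalization concrete, here is a small worked example on toy data (the frame `toy` and its values are made up for illustration). Each pixel (row) is divided by the root mean square of its own feature intensities, so two pixels that differ only by an overall scale factor become identical. The zero guard is an extra assumption and is not part of the cell above.

```python
import numpy as np
import pandas as pd

# Two pixels with the same relative pattern; pixel_1 is 10x brighter overall.
toy = pd.DataFrame(
    {"feat_a": [3.0, 30.0], "feat_b": [4.0, 40.0]},
    index=["pixel_0", "pixel_1"],
)

# Per-pixel RMS across features (axis=1 reduces along each row).
rms = np.sqrt((toy ** 2).mean(axis=1))

# Hypothetical safeguard: an all-zero pixel would otherwise divide by zero.
rms = rms.replace(0, np.nan)

# Divide every row by its own RMS (axis=0 aligns the Series on the row index).
toy_norm = toy.div(rms, axis=0)
print(toy_norm)
```

After normalization every row has an RMS of 1, so the subsequent test compares relative rather than absolute intensities.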

Here we treat each pixel as a sample within its group, then compare the two groups with the Mann-Whitney U test:

In [12]:
stat, p = mannwhitneyu(cell_culture, matrix)
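
When `mannwhitneyu` receives two 2-D inputs, it compares them along `axis=0` by default, which yields one statistic and one p-value per feature (column). The sketch below shows this on synthetic data; the array names, group sizes, and the shift applied to feature 0 are all made up for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# 200 "culture" pixels and 100 "matrix" pixels with 3 features each (synthetic).
culture = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
matrix_px = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
culture[:, 0] += 2.0  # only feature 0 is truly enriched in the culture group

# axis=0 compares the two groups column by column, so the result
# holds one U statistic and one p-value per feature.
stat, p = mannwhitneyu(culture, matrix_px, axis=0, alternative="two-sided")
print(stat.shape, p.shape)
print(p)
```

The group sizes do not need to match; only the number of columns (features) must be the same in both inputs.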

In [14]:
stats = results["peak_list"].copy()
stats["cell_mean"] = np.mean(cell_culture, axis=0).to_numpy()
stats["matrix_mean"] = np.mean(matrix, axis=0).to_numpy()
stats["log2foldchange"] = np.log2(np.mean(cell_culture, axis=0)/np.mean(matrix, axis=0)).to_numpy()
stats["neg_log10_pvalue"] = -np.log10(p)
stats

| index | mz_values | mobility_values | total_intensity | cell_mean | matrix_mean | log2foldchange | neg_log10_pvalue |
|---|---|---|---|---|---|---|---|
| 45 | 189.070660 | 0.618844 | 3353.983734 | 0.062326 | 0.045597 | 0.450878 | 13.678943 |
| 104 | 214.065628 | 0.669440 | 6115.666475 | 0.117932 | 0.074158 | 0.669283 | 13.847933 |
| 110 | 216.080995 | 0.668803 | 2759.881952 | 0.051508 | 0.036267 | 0.506152 | 14.715856 |
| 125 | 220.076540 | 0.674778 | 3491.578000 | 0.066014 | 0.042509 | 0.635015 | 19.762198 |
| 144 | 227.073487 | 0.685773 | 3010.146554 | 0.060822 | 0.034544 | 0.816155 | 15.449936 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 6510 | 1050.111608 | 1.479270 | 2526.931077 | 0.050320 | 0.041126 | 0.291108 | 8.604290 |
| 6575 | 1072.093492 | 1.512087 | 8448.020784 | 0.161245 | 0.104411 | 0.626977 | 24.396298 |
| 6599 | 1078.109291 | 1.510727 | 7812.170952 | 0.149786 | 0.135591 | 0.143644 | 1.722745 |
| 6602 | 1079.112242 | 1.511486 | 4353.308634 | 0.084934 | 0.077595 | 0.130385 | 1.377639 |
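
One practical caveat when computing these two columns: a feature that is completely absent from one group has a mean of zero, and a p-value that underflows to zero has no finite logarithm, so both columns can become infinite. The sketch below is one possible guard, not part of the workflow above; the helper name `volcano_columns`, the pseudocount `eps`, and the toy inputs are all hypothetical choices.

```python
import numpy as np
import pandas as pd

def volcano_columns(group_a, group_b, p_values, eps=1e-9):
    """Compute log2 fold change and -log10(p) with guards against zeros."""
    mean_a = group_a.mean(axis=0)
    mean_b = group_b.mean(axis=0)
    # A small pseudocount keeps the ratio finite when a group mean is zero.
    log2fc = np.log2((mean_a + eps) / (mean_b + eps))
    # Clip p-values at the smallest positive normal float before taking the log.
    p_floor = np.finfo(float).tiny
    neg_log10_p = -np.log10(np.clip(p_values, p_floor, 1.0))
    return pd.DataFrame({"log2foldchange": log2fc, "neg_log10_pvalue": neg_log10_p})

# Toy groups: f1 is absent from group b, and its p-value underflowed to 0.
a = pd.DataFrame({"f0": [4.0, 4.0], "f1": [1.0, 1.0]})
b = pd.DataFrame({"f0": [1.0, 1.0], "f1": [0.0, 0.0]})
out = volcano_columns(a, b, p_values=np.array([0.01, 0.0]))
print(out)
```

The pseudocount should be small relative to typical normalized intensities; otherwise it compresses the fold change of low-abundance features.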


Visualize the result with a volcano plot:

In [15]:
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool

In [16]:
f = figure(
    title="Volcano plot",
    match_aspect=True,
    toolbar_location="right",
    x_axis_label="log2foldchange",
    y_axis_label="neg_log10_pvalue",
)

In [18]:
source = ColumnDataSource(stats)
volcano = f.scatter(x="log2foldchange",
          y="neg_log10_pvalue",
          source = source)
hover = HoverTool(renderers=[volcano], tooltips=[
            ("m/z", "@mz_values{0.0000}"),
            ("1/K0", "@mobility_values{0.0000}"),
            ("intensity", "@total_intensity{0.0000}"),
            ("index", "$index"),
        ],)
f.add_tools(hover)
show(f)

The points in the top right are ions with significantly higher intensity in the microbial culture region than in the matrix region, that is, a large fold change combined with a small p-value.  
For example, we filter features with the following conditions:

In [20]:
stats.loc[lambda df: (df.log2foldchange>4) & (df.neg_log10_pvalue>50) & (df.matrix_mean<1)]

| index | mz_values | mobility_values | total_intensity | cell_mean | matrix_mean | log2foldchange | neg_log10_pvalue |
|---|---|---|---|---|---|---|---|
| 1860 | 425.261619 | 0.975195 | 3731.217038 | 0.071716 | 0.001477 | 5.601325 | 129.742614 |
| 1900 | 428.261729 | 0.960216 | 3326.308223 | 0.068339 | 0.000638 | 6.743117 | 172.423069 |
| 2509 | 475.254336 | 1.006788 | 2615.506695 | 0.052339 | 0.001255 | 5.382163 | 154.749031 |
| 2647 | 485.282522 | 1.014855 | 2500.674279 | 0.047446 | 0.000525 | 6.498203 | 146.554071 |
| 3575 | 554.317614 | 1.097049 | 4588.912101 | 0.0858 | 0.001535 | 5.804431 | 144.702985 |
| 3858 | 576.293969 | 1.10603 | 3227.838002 | 0.062171 | 0.0018 | 5.109834 | 152.131748 |
| 4222 | 604.296857 | 1.151883 | 5278.328021 | 0.102012 | 0.002847 | 5.163032 | 144.28526 |
| 4527 | 626.277599 | 1.114574 | 2738.916454 | 0.051026 | 9.5e-05 | 9.0751 | 134.583614 |
| 4545 | 627.348184 | 1.152731 | 4095.695556 | 0.078735 | 0.00047 | 7.386868 | 156.845189 |
| 4747 | 649.33079 | 1.169898 | 5580.144664 | 0.10824 | 0.001225 | 6.465226 | 162.435291 |


With this method, 12 features were obtained, 5 of which are among the 12 precursors listed in the literature.
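
A comparison against a published precursor list can be done by matching m/z values within a ppm tolerance. The sketch below is a minimal illustration of that idea, not the procedure used for the numbers above: the helper `match_by_ppm` and the 10 ppm default are assumptions, and the `reference` values are placeholders rather than the precursor list from the cited preprint.

```python
import numpy as np

def match_by_ppm(observed_mz, reference_mz, tol_ppm=10.0):
    """Return (observed_idx, reference_idx) pairs whose m/z agree within tol_ppm."""
    observed_mz = np.asarray(observed_mz, dtype=float)
    reference_mz = np.asarray(reference_mz, dtype=float)
    # Pairwise ppm error relative to the reference, shape (n_observed, n_reference).
    ppm = np.abs(observed_mz[:, None] - reference_mz[None, :]) / reference_mz[None, :] * 1e6
    obs_idx, ref_idx = np.nonzero(ppm <= tol_ppm)
    return list(zip(obs_idx.tolist(), ref_idx.tolist()))

# Three m/z values taken from the filtered table above.
observed = [425.261619, 428.261729, 626.277599]
# Placeholder reference values for illustration only, NOT the published list.
reference = [425.2600, 500.0000]
print(match_by_ppm(observed, reference))
```

For TIMS data, the same pattern could be extended with a second tolerance on `mobility_values` if the reference list reports 1/K0.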