# Downstream Analysis: Precursor Selection
## Introduction

In this case study, we process a MALDI-TIMS-MS1 natural product dataset from a bacterial-fungal co-culture, then use statistical tests to filter the feature list for informative precursors, which become the prioritized targets in subsequent iprm-PASEF experiments.
The dataset is from the Laura Sanchez Lab and is available on MassIVE.

In [None]:
import timsimaging

# enable visualization in the Jupyter notebook
from bokeh.io import show, output_notebook
output_notebook()
# disable FutureWarning
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

In [2]:
bruker_d_folder_name = r"D:\dataset\250321_JB182_Pen12.d\250321_JB182_Pen12.d"
dataset = timsimaging.spectrum.MSIDataset(bruker_d_folder_name)
dataset

100%|██████████████████████████████████████████████████████████████████████████| 12173/12173 [00:06<00:00, 1886.68it/s]


MSIDataset with 12173 pixels
        mz range: 99.999-1100.005
        mobility range: 0.400-1.800
        

As the TIC image below shows, there are four regions: the *G. arilaitensis* + *P. solitum* co-culture (top), *P. solitum* (bottom left), *G. arilaitensis* (bottom middle), and the matrix (bottom right).

In [116]:
dataset.image()

For precursor targets in the subsequent MS2 experiments, we want to select ions relevant to the cell culture and exclude matrix ions. Specifically, a desired precursor should be associated with the microbial culture region and have minimal intensity in the matrix region. So we first pick peaks, then run statistical tests on the peak-picked data.

The first step is peak processing as usual. Due to the heterogeneity of regions, `sampling_ratio` is set to 1.

In [3]:
mean_spec = dataset.mean_spectrum(sampling_ratio=1, frequency_threshold=0.05)

In [4]:
mean_spec

mz_values    mobility_values
116.049164   0.523894           0.567568
             0.524540           0.571921
             0.525187           0.734001
             0.525833           0.682248
             0.526480           0.831019
                                  ...   
1096.104187  1.526523           0.516142
             1.527749           0.509653
             1.528363           0.584080
             1.530816           0.500616
             1.538174           0.501520
Name: intensity_values, Length: 1477903, dtype: float64

In [4]:
feature_list = mean_spec.peakPick(tolerance=3, window_size=[30, 7], subdivide=True, return_extents=True)

Traversing graph...
Finding local maxima...
Summarizing...


In [6]:
table, _ = timsimaging.plotting.feature_list(feature_list["peak_list"])

In [7]:
show(table)

In [5]:
peak_list = feature_list["peak_list"].copy()
ccs_curve = dataset.ccs_calibrator()
ccs_values = ccs_curve.transform(peak_list["mz_values"], peak_list["mobility_values"], charge=1)
peak_list["CCS"] = ccs_values
peak_list.to_csv("original_peaklist.csv")

In [121]:
peak_list, peak_extents = feature_list.values()

In [23]:
import numpy as np
import pandas as pd
from tqdm import tqdm

In [122]:
intensity_threshold = np.max(peak_list["total_intensity"]) * 0.003
indices = peak_list["total_intensity"] > intensity_threshold
peak_list = peak_list[indices]
peak_extents = peak_extents[indices]
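
The relative-intensity cutoff in the cell above can be illustrated on a toy peak list. This is a minimal sketch with made-up values: only the column names follow the notebook, and `toy_peaks` is a hypothetical stand-in for `peak_list`.

```python
import pandas as pd

# Hypothetical toy peak list; only the column names follow the notebook.
toy_peaks = pd.DataFrame({
    "mz_values": [189.07, 425.26, 627.35, 1079.11],
    "total_intensity": [3354.0, 10.0, 4096.0, 8.0],
})

# Keep peaks above 0.3% of the most intense peak, as in the cell above.
relative_cutoff = 0.003
threshold = toy_peaks["total_intensity"].max() * relative_cutoff
mask = toy_peaks["total_intensity"] > threshold

# A boolean mask filters rows by index label, so the same mask can be
# applied to any other table that shares the peak list's index.
kept = toy_peaks[mask]
print(kept)
```

Because the threshold is relative to the strongest peak, a single very intense ion raises the cutoff for every other peak, which is worth keeping in mind when choosing the factor.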

In [123]:
n_peak = peak_list.shape[0]
frame_indices = np.arange(1, dataset.data.frame_max_index)
# Use a DataFrame so that missing values can be represented
intensity_array = pd.DataFrame(
    None,
    index=frame_indices,
    columns=np.arange(1, n_peak + 1),
)  # shape: (n_pixel, n_peak)
for i in tqdm(range(n_peak)):
    # Bounding box of the i-th peak in TOF and scan indices
    tof_min, tof_max, scan_min, scan_max = peak_extents.iloc[i][["tof_indices", "scan_indices"]].astype(int)
    # Raw indices of all data points inside the peak's bounding box
    indices = dataset.data[:, scan_min:(scan_max+1), 0, tof_min:(tof_max+1), "raw"]
    # Sum the intensities per frame, i.e. per pixel (JIT function)
    intensity_array[i + 1] = dataset.data.bin_intensities(indices, axis=["rt_values"])[frame_indices]

100%|████████████████████████████████████████████████████████████████████████████████| 293/293 [00:47<00:00,  6.23it/s]


In [28]:
intensity_array

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,284,285,286,287,288,289,290,291,292,293
1,4258.0,7131.0,2455.0,4483.0,4152.0,6239.0,5591.0,3661.0,4994.0,9566.0,...,4286.0,13360.0,6469.0,2812.0,2213.0,3635.0,11644.0,7675.0,3769.0,5756.0
2,3771.0,4246.0,2871.0,3830.0,2382.0,4375.0,4600.0,3000.0,3376.0,6968.0,...,2742.0,9635.0,4127.0,2508.0,2574.0,2778.0,9102.0,5801.0,3408.0,5332.0
3,3623.0,3736.0,2570.0,3491.0,1005.0,3598.0,2470.0,2109.0,2413.0,4708.0,...,3139.0,8909.0,4940.0,2571.0,3327.0,2915.0,10067.0,5221.0,3044.0,4616.0
4,2613.0,3179.0,1880.0,2082.0,1826.0,3808.0,2565.0,2261.0,1800.0,5157.0,...,2382.0,6640.0,3245.0,2299.0,2647.0,2668.0,5706.0,5012.0,2912.0,3314.0
5,4499.0,3556.0,1919.0,3173.0,1157.0,3662.0,3096.0,3045.0,3132.0,6298.0,...,3634.0,9833.0,5828.0,1789.0,3381.0,2762.0,10262.0,7166.0,3494.0,5803.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12169,5212.0,8552.0,3567.0,3537.0,3607.0,6366.0,5071.0,2752.0,6037.0,11174.0,...,1998.0,6804.0,3598.0,9679.0,2664.0,5117.0,11784.0,15764.0,8228.0,4819.0
12170,4988.0,9845.0,3985.0,4738.0,5033.0,7205.0,4825.0,3793.0,6508.0,12209.0,...,2182.0,4998.0,2210.0,9780.0,1826.0,3383.0,10034.0,13464.0,8317.0,2266.0
12171,4868.0,8605.0,3835.0,4182.0,4608.0,7703.0,5177.0,3193.0,5631.0,9694.0,...,1032.0,2548.0,1671.0,6315.0,2678.0,2851.0,5636.0,8950.0,4527.0,2370.0
12172,5090.0,15204.0,4009.0,4334.0,7542.0,6792.0,7232.0,2364.0,9925.0,15788.0,...,2379.0,7163.0,3304.0,9916.0,2716.0,5361.0,14691.0,14856.0,9301.0,3808.0


In [4]:
results = dataset.process(sampling_ratio=1, frequency_threshold=0.05, intensity_threshold=0.001, window_size=[30,7], visualize=True)

Computing mean spectrum...
Traversing graph...
Finding local maxima...
Summarizing...


100%|████████████████████████████████████████████████████████████████████████████████| 781/781 [01:44<00:00,  7.46it/s]


In [None]:
show(results["viz"])

In [4]:
timsimaging.spectrum.export_imzML(dataset, path=r"D:\dataset\laura_gordon", peaks=results)

100%|██████████████████████████████████████████████████████████████████████████| 12173/12173 [00:06<00:00, 1819.62it/s]


In [9]:
results["intensity_array"]

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,772,773,774,775,776,777,778,779,780,781
1,1369.0,1693.0,4258.0,1141.0,795.0,1506.0,1736.0,2012.0,1532.0,7131.0,...,11644.0,2530.0,1348.0,2216.0,7675.0,3769.0,2677.0,2353.0,2219.0,5756.0
2,3835.0,1908.0,3771.0,1047.0,835.0,1369.0,197.0,1981.0,1330.0,4246.0,...,9005.0,1212.0,1253.0,1978.0,5801.0,3408.0,2393.0,1518.0,1270.0,5266.0
3,3670.0,782.0,3623.0,989.0,0.0,1036.0,915.0,1607.0,1341.0,3736.0,...,10067.0,2282.0,633.0,2109.0,5221.0,3044.0,1806.0,1508.0,1408.0,4616.0
4,3705.0,992.0,2613.0,1071.0,593.0,665.0,770.0,1243.0,765.0,3179.0,...,5706.0,1728.0,487.0,1131.0,5012.0,2912.0,2037.0,1542.0,631.0,3314.0
5,3609.0,1459.0,4499.0,990.0,389.0,527.0,1096.0,1342.0,1297.0,3556.0,...,10165.0,2022.0,1127.0,2245.0,7166.0,3494.0,2044.0,1485.0,961.0,5735.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12169,1159.0,1898.0,5212.0,945.0,1344.0,1867.0,1140.0,2230.0,1622.0,8552.0,...,11784.0,2363.0,1891.0,3239.0,15710.0,8228.0,855.0,777.0,387.0,4819.0
12170,684.0,2465.0,4988.0,632.0,1704.0,1445.0,1579.0,3100.0,1610.0,9845.0,...,9971.0,1329.0,1416.0,2677.0,13464.0,8317.0,497.0,293.0,793.0,2266.0
12171,2112.0,1586.0,4868.0,1413.0,1062.0,1848.0,1418.0,2649.0,2183.0,8605.0,...,5636.0,1296.0,1259.0,1358.0,8950.0,4527.0,217.0,306.0,115.0,2370.0
12172,0.0,1475.0,5090.0,76.0,2406.0,2152.0,2404.0,4002.0,1595.0,15204.0,...,14691.0,2320.0,1690.0,3285.0,14736.0,9301.0,1108.0,726.0,1159.0,3756.0


In [124]:
dataset.set_ROI("matrix", xmin=200, ymin=100)

In [68]:
matrix = intensity_array.loc[dataset.rois["matrix"]]
cell_culture = intensity_array.loc[np.setdiff1d(intensity_array.index, dataset.rois["matrix"])]
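
The split into matrix and culture pixels can be sketched on a toy table. This assumes, as the cell above suggests, that the ROI is stored as an array of pixel (frame) indices; `toy` and `matrix_pixels` are hypothetical names and values.

```python
import numpy as np
import pandas as pd

# Hypothetical intensity table: 5 pixels (rows) x 1 peak (column).
toy = pd.DataFrame({1: [10.0, 12.0, 1.0, 2.0, 11.0]}, index=[1, 2, 3, 4, 5])

# Hypothetical ROI: pixel indices that belong to the matrix region.
matrix_pixels = np.array([3, 4])

# Rows inside the ROI.
matrix_part = toy.loc[matrix_pixels]

# Every remaining pixel is treated as cell culture.
culture_pixels = np.setdiff1d(toy.index, matrix_pixels)
culture_part = toy.loc[culture_pixels]
print(culture_part)
```

`np.setdiff1d` returns the sorted, unique indices that are in the first array but not in the second, so the two groups never overlap.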

In [125]:
# TIC normalization
# intensity_array_norm = intensity_array.div(intensity_array.sum(axis=1), axis=0)

# RMS normalization
rms = np.sqrt(np.mean(np.square(intensity_array), axis=1))
intensity_array_norm = intensity_array.div(rms, axis=0)

matrix = intensity_array_norm.loc[dataset.rois["matrix"]]
cell_culture = intensity_array_norm.loc[np.setdiff1d(intensity_array.index, dataset.rois["matrix"])]
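
To see what the RMS normalization does, here is a minimal sketch on a made-up 3-pixel table (`toy` is hypothetical). Each row is divided by the root mean square of its own intensities, so pixels that differ only by an overall scale factor end up identical.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-pixel x 2-peak intensity table.
# Pixel 2 is pixel 1 scaled by a factor of 2.
toy = pd.DataFrame(
    [[3.0, 4.0], [6.0, 8.0], [0.0, 5.0]],
    index=[1, 2, 3],
    columns=[1, 2],
)

# Root mean square of each row (pixel) across all peaks.
rms = np.sqrt((toy ** 2).mean(axis=1))
toy_norm = toy.div(rms, axis=0)

# After normalization every pixel has an RMS of 1.
rms_after = np.sqrt((toy_norm ** 2).mean(axis=1))
print(toy_norm.round(3))
```

A pixel whose intensities are all zero has an RMS of zero and would produce missing values after the division, so such pixels may need separate handling.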

In [32]:
from scipy.stats import mannwhitneyu

In [126]:
stat, p = mannwhitneyu(cell_culture, matrix)
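
When `mannwhitneyu` receives two 2-D inputs, it tests each column separately along `axis=0`, so the call above yields one p-value per peak. The sketch below shows this on simulated data; the array names, sizes, and distributions are all hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical data: 200 culture pixels and 100 matrix pixels, 3 peaks.
# Peak 0 is enriched in the culture; peaks 1 and 2 are drawn from the
# same distribution in both regions.
culture = rng.normal(loc=[5.0, 1.0, 1.0], scale=0.5, size=(200, 3))
matrix_px = rng.normal(loc=[1.0, 1.0, 1.0], scale=0.5, size=(100, 3))

# axis=0 (the default) compares the two groups column by column,
# so stat and p each hold one entry per peak.
stat, p = mannwhitneyu(culture, matrix_px, axis=0)
neg_log10_p = -np.log10(p)
print(neg_log10_p)
```

The two groups do not need the same number of pixels; only the number of columns (peaks) has to match.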

In [70]:
target = [
    417.2348,
    425.2625,
    428.2594,
    475.2532,
    532.3086,
    588.3737,
    615.3198,
    627.3479,
    655.2726,
]

In [71]:
from scipy.spatial import KDTree

tree = KDTree(peak_list[["mz_values"]])
dist, index = tree.query(np.array(target).reshape(-1,1))
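
A nearest-neighbour query always returns some peak, even when no detected peak is close to the target. The sketch below, on made-up m/z values, converts the returned distance into a ppm error and flags matches outside a tolerance. The 10 ppm tolerance and the names `peak_mz` and `target_mz` are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical detected m/z values and targets.
peak_mz = np.array([417.2275, 425.2616, 532.3091, 700.0000])
target_mz = np.array([425.2625, 532.3086, 655.2726])

# KDTree expects 2-D input, hence the reshape to a single column.
tree = KDTree(peak_mz.reshape(-1, 1))
dist, index = tree.query(target_mz.reshape(-1, 1))

# Convert the absolute m/z distance into a relative error in ppm and
# accept a match only within a tolerance (10 ppm is an arbitrary choice).
ppm_error = dist / target_mz * 1e6
matched = ppm_error <= 10

for t, i, e, ok in zip(target_mz, index, ppm_error, matched):
    print(f"target {t:.4f} -> peak {peak_mz[i]:.4f} ({e:.1f} ppm, matched={ok})")
```

Here the third target has no nearby peak, so its nearest neighbour is far outside the tolerance and should not be labeled as a hit.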

In [129]:
stats = peak_list.copy()
stats["cell_mean"] = np.mean(cell_culture, axis=0).to_numpy()
stats["matrix_mean"] = np.mean(matrix, axis=0).to_numpy()
stats["log2foldchange"] = np.log2(np.mean(cell_culture, axis=0)/np.mean(matrix, axis=0)).to_numpy()
stats["neg_log10_pvalue"] = -np.log10(p)
stats["target"]="No"
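target_col = stats.columns.get_loc("target")
for i in index:
    # Skip matches that fall outside the filtered peak list
    if i < stats.shape[0]:
        # Assign on the frame itself to avoid chained indexing
        stats.iloc[i, target_col] = "Yes"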
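
The fold change above is a plain ratio of region means, so a peak with zero mean intensity in the matrix region gives an infinite value. The sketch below shows this on made-up means and adds an optional pseudocount guard. The pseudocount is an assumption for illustration and not part of the notebook's workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical per-peak region means; peak 3 is absent from the matrix region.
cell_mean = pd.Series([0.160, 0.050, 0.020], index=[1, 2, 3])
matrix_mean = pd.Series([0.040, 0.050, 0.000], index=[1, 2, 3])

# Plain ratio, as in the cell above: a zero matrix mean gives +inf.
with np.errstate(divide="ignore"):
    log2fc = np.log2(cell_mean / matrix_mean)

# Optional guard (an assumption, not part of the notebook): a small
# pseudocount on both means keeps the value finite.
pseudocount = 1e-3
log2fc_guarded = np.log2((cell_mean + pseudocount) / (matrix_mean + pseudocount))
print(log2fc.tolist())
print(log2fc_guarded.round(2).tolist())
```

The size of the pseudocount sets the largest fold change a matrix-free peak can reach, so it should be chosen relative to the scale of the normalized intensities.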

In [136]:
tree = KDTree(feature_list["peak_list"][["mz_values"]])
dist, index = tree.query(np.array(target).reshape(-1,1))
feature_list["peak_list"].iloc[index]

| | mz_values | mobility_values | total_intensity |
|---|---|---|---|
| 1756 | 417.227516 | 0.996345 | 10.092828 |
| 1860 | 425.261619 | 0.975195 | 3731.217038 |
| 1900 | 428.261729 | 0.960216 | 3326.308223 |
| 2509 | 475.254336 | 1.006788 | 2615.506695 |
| 3280 | 532.30908 | 1.081939 | 1673.549659 |
| 4028 | 588.373038 | 1.151386 | 1445.407788 |
| 4378 | 615.320932 | 1.144504 | 1024.922287 |
| 4545 | 627.348184 | 1.152731 | 4095.695556 |
| 4843 | 655.273642 | 1.272451 | 2910.554342 |


In [130]:
stats

| | mz_values | mobility_values | total_intensity | cell_mean | matrix_mean | log2foldchange | neg_log10_pvalue | target |
|---|---|---|---|---|---|---|---|---|
| 45 | 189.070660 | 0.618844 | 3353.983734 | 0.062326 | 0.045597 | 0.450878 | 13.678943 | No |
| 104 | 214.065628 | 0.669440 | 6115.666475 | 0.117932 | 0.074158 | 0.669283 | 13.847933 | No |
| 110 | 216.080995 | 0.668803 | 2759.881952 | 0.051508 | 0.036267 | 0.506152 | 14.715856 | No |
| 125 | 220.076540 | 0.674778 | 3491.578000 | 0.066014 | 0.042509 | 0.635015 | 19.762198 | No |
| 144 | 227.073487 | 0.685773 | 3010.146554 | 0.060822 | 0.034544 | 0.816155 | 15.449936 | No |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6510 | 1050.111608 | 1.479270 | 2526.931077 | 0.050320 | 0.041126 | 0.291108 | 8.604290 | No |
| 6575 | 1072.093492 | 1.512087 | 8448.020784 | 0.161245 | 0.104411 | 0.626977 | 24.396298 | No |
| 6599 | 1078.109291 | 1.510727 | 7812.170952 | 0.149786 | 0.135591 | 0.143644 | 1.722745 | No |
| 6602 | 1079.112242 | 1.511486 | 4353.308634 | 0.084934 | 0.077595 | 0.130385 | 1.377639 | No |


In [13]:
stats

| | mz_values | mobility_values | total_intensity | cell_mean | matrix_mean | log2foldchange | neg_log10_pvalue | target |
|---|---|---|---|---|---|---|---|---|
| 0 | 172.040229 | 0.627523 | 2306.307730 | 2324.374075 | 2743.743860 | -0.239304 | 4.677712 | No |
| 1 | 187.055019 | 0.614947 | 1350.618829 | 1382.486289 | 1008.017544 | 0.455744 | 11.281295 | No |
| 2 | 189.070660 | 0.618844 | 3353.983734 | 3416.689603 | 2753.175439 | 0.311503 | 8.966857 | No |
| 3 | 190.050757 | 0.964792 | 884.310605 | 917.926228 | 1054.000000 | -0.199425 | 6.393559 | No |
| 4 | 201.059263 | 0.651327 | 1019.046168 | 1097.044162 | 754.347368 | 0.540321 | 13.614635 | No |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 776 | 1079.112242 | 1.511486 | 4353.308634 | 4623.626851 | 4739.094737 | -0.035587 | 0.405783 | No |
| 777 | 1088.066520 | 1.557763 | 1397.371889 | 1549.137954 | 566.659649 | 1.450911 | 57.278563 | No |
| 778 | 1088.067282 | 1.464957 | 1019.910129 | 1148.224680 | 414.424561 | 1.470224 | 49.253756 | No |
| 779 | 1089.070123 | 1.484550 | 835.687998 | 922.599092 | 347.533333 | 1.408553 | 46.334123 | No |


In [40]:
from bokeh.plotting import figure,show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.transform import factor_cmap

In [131]:
f = figure(
    title="Volcano",
    match_aspect=True,
    toolbar_location="right",
    x_axis_label="log2foldchange",
    y_axis_label="neg_log10_pvalue",
)

In [132]:
source = ColumnDataSource(stats)
volcano = f.scatter(x="log2foldchange",
          y="neg_log10_pvalue",
          color = factor_cmap("target", ["Steelblue", "Orange"], ["No", "Yes"]),
          #color = factor_cmap(np.where(func(stats), "A", "B"), ["Steelblue", "Orange"], ["A", "B"]),
          source = source)
hover = HoverTool(renderers=[volcano], tooltips=[
            ("m/z", "@mz_values{0.0000}"),
            ("1/K0", "@mobility_values{0.0000}"),
            ("index", "$index"),
        ],)
f.add_tools(hover)
show(f)

In [50]:
func = lambda df: (df.log2foldchange>4)&(df.neg_log10_pvalue>50)&(df.matrix_mean<1000)
source.add(np.where(func(stats), "Orange", "Steelblue"), name="color")
a = figure(
    title="Volcano",
    match_aspect=True,
    toolbar_location="right",
    x_axis_label="log2foldchange",
    y_axis_label="neg_log10_pvalue",
)
volc = a.scatter(x="log2foldchange",
          y="neg_log10_pvalue",
          #color = factor_cmap("target", ["Steelblue", "Orange"], ["No", "Yes"]),
          color = 'color',
          source = source)
hover = HoverTool(renderers=[volc], tooltips=[
            ("m/z", "@mz_values{0.0000}"),
            ("1/K0", "@mobility_values{0.0000}"),
            ("index", "$index"),
        ],)
a.add_tools(hover)
show(a)
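
The same three criteria can also be used to pull out a ranked shortlist of candidate precursors instead of only coloring the plot. This is a minimal sketch on a made-up statistics table: the column names follow the notebook, while the values and the name `toy_stats` are hypothetical.

```python
import pandas as pd

# Hypothetical statistics table; column names follow the notebook.
toy_stats = pd.DataFrame({
    "mz_values": [425.2616, 627.3482, 1079.1122, 189.0707],
    "log2foldchange": [5.2, 4.6, -0.04, 0.31],
    "neg_log10_pvalue": [80.0, 65.0, 0.4, 9.0],
    "matrix_mean": [40.0, 120.0, 4739.0, 2753.0],
})

# Same criteria as the lambda above: strong enrichment in the culture,
# high significance, and low mean intensity in the matrix region.
selected = toy_stats[
    (toy_stats.log2foldchange > 4)
    & (toy_stats.neg_log10_pvalue > 50)
    & (toy_stats.matrix_mean < 1000)
]

# Rank candidates by fold change so the most culture-specific ions come first.
shortlist = selected.sort_values("log2foldchange", ascending=False)
print(shortlist[["mz_values", "log2foldchange"]])
```

The `matrix_mean < 1000` cutoff only makes sense on unnormalized intensities. On the RMS-normalized table, where the means are well below 1, it would pass every peak.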

In [None]:
#425.2625 
#532.3086 
627.3479 
655.2726 
#417.2438 
#475.2532 
529.3415
615.3198 
608.3601
#428.2594 
#588.3737 
594.3424