# Overview of TIMSImaging package
### Introduction
TIMSImaging is a Python package for processing and visualizing mass spectrometry imaging (MSI) data with ion mobility. In this notebook, we go through the functions in TIMSImaging and show their basic use on a small kidney biopsy dataset.

### Installation
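The following lines create a new Conda environment and install TIMSImaging from the GitHub repository. Run them in a terminal (without the leading `!`), because each `!` command in a notebook runs in its own subshell, so `conda activate` does not carry over to the next line.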

In [None]:
!conda create -n timsimaging python=3.11
!conda activate timsimaging
!pip install git+https://github.com/YinyueZhu/TIMSImaging.git

Package setup:

In [1]:
import timsimaging

# enable visualization in the Jupyter notebook
from bokeh.io import show, output_notebook
output_notebook()
# disable FutureWarning
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

### Load raw data (.d)
A `.d` directory is the raw data produced by Bruker's timsTOF instruments. For MALDI-TIMS-TOF imaging experiments, it contains a `.tdf` file with the metadata and a `.tdf_bin` file with the spectra.

In [2]:
bruker_d_folder_name = "/home/zhu.yiny/MSI/datasets/chs_tims_on_lipids_only.d/"
dataset = timsimaging.spectrum.MSIDataset(bruker_d_folder_name)
dataset

100%|██████████████████████████████████████| 1696/1696 [00:03<00:00, 472.78it/s]


MSIDataset with 1696 pixels
        mz range: 49.999-1000.003
        mobility range: 0.400-1.100
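
Before loading, it can help to confirm that the folder really contains the two files described above. The helper below is a minimal sketch and not part of TIMSImaging; the function name `find_tdf_files` is hypothetical, and it matches files by extension only rather than assuming particular file names.

```python
from pathlib import Path


def find_tdf_files(d_folder):
    """Return (metadata_file, spectra_file) found inside a Bruker .d directory."""
    folder = Path(d_folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    # Match by extension only; the exact file names are not assumed here.
    tdf = sorted(folder.glob("*.tdf"))
    tdf_bin = sorted(folder.glob("*.tdf_bin"))
    if not tdf or not tdf_bin:
        raise FileNotFoundError(f"{folder} lacks a .tdf or .tdf_bin file")
    return tdf[0], tdf_bin[0]
```

Calling `find_tdf_files(bruker_d_folder_name)` before constructing `MSIDataset` would surface a wrong path early, with a clearer error message.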
        

### View unprocessed ion images
`MSIDataset.image()` shows ion images with given m/z and 1/$K_0$ ranges.

In [3]:
# visualize TIC image by default
dataset.image()

In [4]:
# view slice images
dataset.image(mz=slice(325.0, 326.0), mobility=slice(0.9, 1.0))
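
Conceptually, an ion image for a given window sums, for every pixel, the intensities that fall inside the m/z and mobility ranges. The sketch below illustrates that idea on toy data with plain Python containers. It is not TIMSImaging's implementation; the function `ion_image`, the data layout, and the half-open treatment of the range bounds are all assumptions made for illustration.

```python
def ion_image(frames, coords, mz_range, mobility_range):
    """Sum the intensities inside an (m/z, 1/K0) window for every pixel.

    frames: dict mapping frame id -> list of (mz, mobility, intensity) tuples
    coords: dict mapping frame id -> (x, y) pixel position
    """
    mz_lo, mz_hi = mz_range
    k0_lo, k0_hi = mobility_range
    image = {}
    for frame_id, points in frames.items():
        # Assumption: bounds are half-open, [low, high)
        image[coords[frame_id]] = sum(
            intensity
            for mz, mobility, intensity in points
            if mz_lo <= mz < mz_hi and k0_lo <= mobility < k0_hi
        )
    return image


# Toy data: two pixels with a handful of points each
toy_frames = {
    1: [(325.2, 0.95, 10), (325.8, 0.92, 5), (400.0, 0.95, 7)],
    2: [(325.5, 0.50, 3), (325.5, 0.99, 4)],
}
toy_coords = {1: (0, 0), 2: (1, 0)}
toy_image = ion_image(toy_frames, toy_coords, (325.0, 326.0), (0.9, 1.0))
print(toy_image)  # {(0, 0): 15, (1, 0): 4}
```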

### Access the spectrum of a frame (pixel)
`Frame` is the class for a single spectrum in TIMSImaging; essentially, it is a point cloud in 2D (m/z, ion mobility) space. Indexing `MSIDataset[i]` returns a `Frame` instance.

In [5]:
dataset[99]

mz_values   mobility_values
50.849879   0.914327           21
50.865117   1.048156           27
50.960060   0.818535           42
51.690662   0.617485           71
51.875873   0.819136           33
                               ..
984.663702  0.928651           24
985.783646  0.484620           20
987.837002  0.582233           21
992.949214  0.980477           22
993.040622  0.741525           23
Name: intensity_values, Length: 15836, dtype: uint16

### Compute mean spectrum
`dataset.mean_spectrum()` returns the mean spectrum of the dataset as a `Frame` instance.

In [6]:
?dataset.mean_spectrum

Signature:
dataset.mean_spectrum(
    frame_indices: numpy.ndarray = None,
    sampling_ratio: float = 1.0,
    intensity_threshold: float = 0.05,
    as_frame=False,
    seed=42,
)
Docstring:
compute mean spectra over the whole dataset

:param as_frame: if True, return a pd.DataFrame, otherwise an Frame, defaults to False
:type as_frame: bool, optional
:param intensity_threshold: Filter out intensities that appear in little fraction of pixels, defaults to 0.05
:type intensity_threshol

In [7]:
mean_spec = dataset.mean_spectrum()
mean_spec

mz_values   mobility_values
92.850896   0.656304           1.020637
            0.656909           1.314269
            0.657515           1.123821
            0.658121           1.176297
            0.658727           1.183373
                                 ...   
562.418698  0.886840           1.015330
562.420994  0.880261           1.196344
            0.880859           1.047759
            0.882055           1.366745
            0.883252           1.068396
Name: intensity_values, Length: 41185, dtype: float64
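
To make the role of `intensity_threshold` more concrete, here is a minimal sketch of averaging sparse frames and dropping coordinates that occur in too small a fraction of pixels. It is an illustration under stated assumptions, not the package's code: the function `mean_spectrum_sketch` is hypothetical, frames are plain dicts, and coordinates are assumed to lie on a shared discrete grid so that they can be matched exactly across frames.

```python
from collections import defaultdict


def mean_spectrum_sketch(frames, intensity_threshold=0.05):
    """Average sparse frames and drop coordinates seen in too few of them.

    frames: list of dicts mapping (mz, mobility) -> intensity
    """
    n_frames = len(frames)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for frame in frames:
        for coord, intensity in frame.items():
            totals[coord] += intensity
            counts[coord] += 1
    # Missing coordinates count as zero intensity, so divide by all frames
    return {
        coord: totals[coord] / n_frames
        for coord in totals
        if counts[coord] / n_frames >= intensity_threshold
    }


# Toy data: coordinate A occurs in every frame, coordinate B in only one
A, B = (100.0, 0.5), (200.0, 0.8)
toy = [{A: 2.0}, {A: 4.0, B: 4.0}, {A: 6.0}, {A: 8.0}]
print(mean_spectrum_sketch(toy, intensity_threshold=0.5))  # {(100.0, 0.5): 5.0}
```

With a threshold of 0.5, coordinate B is removed because it appears in only a quarter of the frames; lowering the threshold to 0.25 keeps it with a mean of 1.0.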

Visualize the mean spectrum as a 2D heatmap, a traditional mass spectrum, and a mobilogram:

In [8]:
mean_spec.heatmap()

In [9]:
mean_spec.spectrum()

In [10]:
mean_spec.mobilogram()
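
A 1D mass spectrum and a mobilogram can both be thought of as projections of the 2D point cloud onto one axis. The sketch below shows that projection on a toy frame. Whether `Frame.spectrum()` and `Frame.mobilogram()` sum intensities in exactly this way is an assumption, and `project` is a hypothetical helper.

```python
from collections import defaultdict


def project(frame, axis):
    """Sum intensities of a sparse 2D frame along one axis.

    frame: dict mapping (mz, mobility) -> intensity
    axis: 0 keeps m/z (mass spectrum), 1 keeps mobility (mobilogram)
    """
    profile = defaultdict(float)
    for coord, intensity in frame.items():
        profile[coord[axis]] += intensity
    return dict(sorted(profile.items()))


toy_frame = {(100.0, 0.5): 1.0, (100.0, 0.6): 2.0, (200.0, 0.5): 4.0}
print(project(toy_frame, axis=0))  # {100.0: 3.0, 200.0: 4.0}
print(project(toy_frame, axis=1))  # {0.5: 5.0, 0.6: 2.0}
```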

### Peak picking
To extract features from the raw data, we can use `Frame.peakPick()`, whose first return value is a peak list. Here we pick peaks on the mean spectrum obtained above, but the same method works on any `Frame` instance.

In [11]:
peak_list = mean_spec.peakPick()[0]
peak_list

Traversing graph...
Finding local maxima...
Summarizing...


      mz_values  mobility_values  total_intensity
1     92.851317         0.657137        10.581958
2    128.846816         0.496184       431.852594
3    130.845599         0.496923       724.589033
4    130.845507         0.628834        15.189269
5    131.846980         0.496935        27.541863
..          ...              ...              ...
370  500.542565         0.859062        91.647406
371  502.537583         0.859331       161.206958
372  504.532068         0.859422        78.372642
373  560.423196         0.881971        56.185731
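
The log above mentions a "Finding local maxima" step. As a rough illustration of what a local maximum in 2D (m/z, mobility) space means, the sketch below marks grid cells that are strictly larger than every neighbour inside a rectangular window. This is a simplified stand-in, not the graph-based procedure TIMSImaging runs; `local_maxima`, the integer grid indices, and the `half_window` parameter are assumptions made for the example.

```python
def local_maxima(grid, half_window=(1, 1)):
    """Return grid cells whose intensity is the largest within a rectangular window.

    grid: dict mapping (mz_index, mobility_index) -> intensity
    half_window: window half-widths along each axis, in index units
    """
    wi, wj = half_window
    peaks = []
    for (i, j), intensity in grid.items():
        # Empty cells count as zero intensity
        neighbours = (
            grid.get((i + di, j + dj), 0.0)
            for di in range(-wi, wi + 1)
            for dj in range(-wj, wj + 1)
            if (di, dj) != (0, 0)
        )
        # Strict comparison: flat plateaus are not reported as peaks
        if all(intensity > value for value in neighbours):
            peaks.append((i, j))
    return sorted(peaks)


toy_grid = {(0, 0): 1.0, (0, 1): 5.0, (0, 2): 2.0, (5, 5): 3.0, (5, 6): 3.0, (9, 9): 1.0}
print(local_maxima(toy_grid))  # [(0, 1), (9, 9)]
```

A larger window suppresses nearby secondary maxima at the cost of merging closely spaced peaks, which is the usual trade-off when choosing a window size.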


For more advanced use of peak picking, refer to the documentation:

In [12]:
?mean_spec.peakPick

Signature:
mean_spec.peakPick(
    tolerance: Union[Iterable[int | float], int, float, NoneType] = 2,
    metric: Literal['euclidean', 'chebyshev'] = 'euclidean',
    window_size: Iterable[int] = [17, 7],
    adaptive_window=False,
    subdivide=True,
    count_thrshold=

### CCS Calibration
CCS (collision cross section) calibration computes CCS values empirically from the reference values of calibrants (standard molecules). The calibrant reference is recorded in the .tdf file. `MSIDataset.ccs_calibrator()` returns a model fitted to that reference.

In [13]:
ccs_curve = dataset.ccs_calibrator()
ccs_values = ccs_curve.transform(peak_list["mz_values"], peak_list["mobility_values"], charge=1)
peak_list["CCS"] = ccs_values
peak_list

      mz_values  mobility_values  total_intensity         CCS
1     92.851317         0.657137        10.581958  147.057427
2    128.846816         0.496184       431.852594  106.614013
3    130.845599         0.496923       724.589033  106.627427
4    130.845507         0.628834        15.189269  135.689395
5    131.846980         0.496935        27.541863  106.556712
..          ...              ...              ...         ...
370  500.542565         0.859062        91.647406  173.657229
371  502.537583         0.859331       161.206958  173.693795
372  504.532068         0.859422        78.372642  173.694225
373  560.423196         0.881971        56.185731  177.850419
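
For orientation, the textbook relation between reduced mobility and CCS is the Mason-Schamp equation. The sketch below evaluates it from physical constants. It is not the fitted model returned by `ccs_calibrator()`, so its values will generally differ from the calibrated column above. The drift gas (nitrogen), the temperature of 305 K, and the approximation of the ion mass as m/z times charge are assumptions, and `mason_schamp_ccs` is a hypothetical helper.

```python
import math

# CODATA constants (SI units)
ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23  # J/K
DALTON = 1.66053906660e-27  # kg
LOSCHMIDT = 2.686780111e25  # m^-3, gas number density at 273.15 K and 101.325 kPa

N2_MASS = 28.006148  # Da, assumed drift gas (nitrogen)
TEMPERATURE = 305.0  # K, assumed


def mason_schamp_ccs(mz, inv_k0, charge=1, gas_mass=N2_MASS, temperature=TEMPERATURE):
    """CCS in square angstroms from m/z and reduced inverse mobility 1/K0 in V*s/cm^2."""
    ion_mass = mz * charge  # approximation: ignores the mass of the charge carrier
    reduced_mass = ion_mass * gas_mass / (ion_mass + gas_mass) * DALTON  # kg
    inv_k0_si = inv_k0 * 1e4  # V*s/cm^2 -> V*s/m^2
    ccs_m2 = (
        3.0 / 16.0
        * math.sqrt(2.0 * math.pi / (reduced_mass * BOLTZMANN * temperature))
        * charge * ELEMENTARY_CHARGE / LOSCHMIDT
        * inv_k0_si
    )
    return ccs_m2 * 1e20  # m^2 -> square angstroms


# Uncalibrated estimate for the second peak in the list above
estimate = mason_schamp_ccs(128.846816, 0.496184, charge=1)
```

At fixed m/z and charge, the relation is linear in 1/$K_0$, so an empirical calibration mainly adjusts the scale of that dependence using the calibrant references.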


### Pipelined processing
We can call `MSIDataset.process()` to do all the processing in one step. It accepts the same arguments as the mean spectrum and peak picking steps, with options for visualization and CCS calibration.

In [14]:
results = dataset.process(sampling_ratio=1, visualize=True, ccs_calibration=True)

Computing mean spectrum...
Traversing graph...
Finding local maxima...
Summarizing...


100%|████████████████████████████████████████| 374/374 [00:02<00:00, 141.06it/s]


The results include the pixel coordinates, the peak list, and the intensity array, which are different aspects of the data cube.  
The intensity array is in ($n_{pixel}$, $n_{peak}$) shape.

In [15]:
results

{'coords':        XIndexPos  YIndexPos
 Frame                      
 1            368        224
 2            369        224
 3            370        224
 4            371        224
 5            372        224
 ...          ...        ...
 1692         307        256
 1693         308        256
 1694         309        256
 1695         310        256
 1696         311        256
 
 [1696 rows x 2 columns],
 'peak_list':       mz_values  mobility_values  total_intensity  ccs_values
 1     92.851317         0.657137        10.581958  147.057427
 2    128.846816         0.496184       431.852594  106.614013
 3    130.845599         0.496923       724.589033  106.627427
 4    130.845507         0.628834        15.189269  135.689395
 5    131.846980         0.496935        27.541863  106.556712
 ..          ...              ...              ...         ...
 370  500.542565         0.859062        91.647406  173.657229
 371  502.537583         0.859331       161.206958  173.693795
 372 
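
Given pixel coordinates and an intensity array of shape ($n_{pixel}$, $n_{peak}$), an ion image for one peak is simply one column of that array arranged on the pixel grid. The sketch below shows this with plain lists and toy values. The key under which the intensity array is stored in `results` is not visible in the truncated output above, so the sketch does not assume it, and `peak_image` is a hypothetical helper.

```python
def peak_image(coords, intensities, peak_index, fill=0.0):
    """Arrange one column of an (n_pixel, n_peak) array on the pixel grid.

    coords: list of (x, y) pixel positions, one per row of `intensities`
    intensities: list of rows, each holding one intensity per peak
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x0, y0 = min(xs), min(ys)
    width, height = max(xs) - x0 + 1, max(ys) - y0 + 1
    # Pixels that were not measured keep the fill value
    image = [[fill] * width for _ in range(height)]
    for (x, y), row in zip(coords, intensities):
        image[y - y0][x - x0] = row[peak_index]
    return image


# Toy data: three pixels, two peaks
toy_pixel_coords = [(368, 224), (369, 224), (368, 225)]
toy_intensities = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(peak_image(toy_pixel_coords, toy_intensities, peak_index=1))
# [[10.0, 20.0], [30.0, 0.0]]
```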

### Export imzML
We can export the processed data in imzML format using `export_imzML()`.

In [16]:
timsimaging.spectrum.export_imzML(dataset, "example_results", peaks=results)

100%|█████████████████████████████████████| 1696/1696 [00:01<00:00, 1326.92it/s]


### Interactive Visualization
`MSIDataset.process()` can optionally generate an interactive visualization GUI, which can be displayed in the notebook. Note that some interactive features rely on the active session, so `process` needs to be rerun each time to show the GUI.

In [17]:
show(results["viz"], notebook_url="10.99.250.67:8888")