# Performance

This notebook demonstrates the performance of AlphaTIMS.
It contains the following sections:
1. [**Samples**](#Samples)
2. [**Reading raw Bruker .d folders**](#Reading-raw-Bruker-.d-folders)
3. [**Saving HDF files**](#Saving-HDF-files)
4. [**Reading HDF files**](#Reading-HDF-files)
5. [**Slicing data**](#Slicing-data)

The following system was used:

In [1]:
import alphatims.utils

alphatims.utils.set_threads(8)
log_file_name = alphatims.utils.set_logger(
    log_file_name="performance_log.txt",
    overwrite=True
)
alphatims.utils.show_platform_info()
alphatims.utils.show_python_info()

2021-01-22 15:01:01> Platform information:
2021-01-22 15:01:01> system     - Darwin
2021-01-22 15:01:01> release    - 19.6.0
2021-01-22 15:01:01> version    - 10.15.7
2021-01-22 15:01:01> machine    - x86_64
2021-01-22 15:01:01> processor  - i386
2021-01-22 15:01:01> cpu count  - 8
2021-01-22 15:01:01> ram memory - 24.4/32.0 Gb (available/total)
2021-01-22 15:01:01> 
2021-01-22 15:01:01> Python information:
2021-01-22 15:01:01> alphatims  - 0.0.210121
2021-01-22 15:01:01> bokeh      - 2.2.3
2021-01-22 15:01:01> click      - 7.1.2
2021-01-22 15:01:01> datashader - 0.11.1
2021-01-22 15:01:01> h5py       - 3.1.0
2021-01-22 15:01:01> holoviews  - 1.14.0
2021-01-22 15:01:01> holoviz    - 0.11.6
2021-01-22 15:01:01> ipykernel  - 5.3.4
2021-01-22 15:01:01> jupyter    - 1.0.0
2021-01-22 15:01:01> matplotlib - 3.3.3
2021-01-22 15:01:01> numba      - 0.52.0
2021-01-22 15:01:01> numpy      - 1.19.4
2021-01-22 15:01:01> pandas     - 1.1.4
2021-01-22 15:01:01> plotly     - 4.13.0
2021-01-22 15:01:0
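To record comparable platform details on a machine without AlphaTIMS installed, the following minimal sketch uses only Python's standard library. It is not necessarily how `alphatims.utils.show_platform_info` is implemented, and it omits available and total RAM because the standard library does not expose memory information portably. The function name `platform_summary` is hypothetical.

```python
import os
import platform
import sys


def platform_summary():
    """Collect a subset of the platform details logged above (hypothetical helper, stdlib only)."""
    return {
        "system": platform.system(),        # e.g. "Darwin"
        "release": platform.release(),      # e.g. "19.6.0"
        "machine": platform.machine(),      # e.g. "x86_64"
        "processor": platform.processor(),  # e.g. "i386"
        "cpu count": os.cpu_count(),        # may be None if undetermined
        "python": sys.version.split()[0],   # interpreter version, e.g. "3.8.5"
    }


for key, value in platform_summary().items():
    print(f"{key:<10} - {value}")
```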

## Samples

Five samples are used and compared:

In [2]:
file_names = {
    "DDA_6": "/Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d",
    "DIA_6": "/Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d",
    "DDA_21": "/Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d",
    "DIA_21": "/Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d",
    "DDA_120": "/Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d"
}

Before timing anything, we load each file once to show its basic statistics:

In [3]:
import logging
import alphatims.bruker

timstof_objects = {}
for sample_id, file_name in file_names.items():
    logging.info(f"Initial loading of {sample_id}")
    timstof_objects[sample_id] = alphatims.bruker.TimsTOF(file_name)
    logging.info("")

2021-01-22 15:01:01> No Bruker libraries are available for this operating system. Intensities are uncalibrated, resulting in (very) small differences. However, mobility and m/z values need to be estimated. While this estimation often returns acceptable results with errors < 0.02 Th, huge errors (e.g. offsets of 6 Th) have already been observed for some samples!
2021-01-22 15:01:01> 
2021-01-22 15:01:02> Initial loading of DDA_6
2021-01-22 15:01:02> Importing data from /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d
2021-01-22 15:01:02> Reading frame metadata for /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d
2021-01-22 15:01:02> Reading 2,978 frames with 214,172,697 TOF arrivals for /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d


100%|██████████| 2978/2978 [00:02<00:00, 1318.36it/s]


2021-01-22 15:01:05> Succesfully imported data from /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d
2021-01-22 15:01:05> 
2021-01-22 15:01:05> Initial loading of DIA_6
2021-01-22 15:01:05> Importing data from /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d
2021-01-22 15:01:05> Reading frame metadata for /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d
2021-01-22 15:01:05> Reading 3,182 frames with 158,552,099 TOF arrivals for /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d


100%|██████████| 3182/3182 [00:01<00:00, 3003.54it/s]


2021-01-22 15:01:06> Succesfully imported data from /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d
2021-01-22 15:01:06> 
2021-01-22 15:01:06> Initial loading of DDA_21
2021-01-22 15:01:06> Importing data from /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d
2021-01-22 15:01:06> Reading frame metadata for /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d
2021-01-22 15:01:07> Reading 11,886 frames with 295,251,252 TOF arrivals for /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d


100%|██████████| 11886/11886 [00:02<00:00, 4194.59it/s]


2021-01-22 15:01:10> Succesfully imported data from /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d
2021-01-22 15:01:10> 
2021-01-22 15:01:10> Initial loading of DIA_21
2021-01-22 15:01:10> Importing data from /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d
2021-01-22 15:01:10> Reading frame metadata for /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d
2021-01-22 15:01:11> Reading 11,868 frames with 730,564,765 TOF arrivals for /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d


100%|██████████| 11868/11868 [00:04<00:00, 2921.13it/s]


2021-01-22 15:01:15> Succesfully imported data from /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d
2021-01-22 15:01:15> 
2021-01-22 15:01:15> Initial loading of DDA_120
2021-01-22 15:01:15> Importing data from /Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d
2021-01-22 15:01:15> Reading frame metadata for /Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d
2021-01-22 15:01:19> Reading 68,114 frames with 2,074,019,899 TOF arrivals for /Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d


100%|██████████| 68114/68114 [00:22<00:00, 3095.80it/s]


2021-01-22 15:01:48> Succesfully imported data from /Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d
2021-01-22 15:01:48> 


In summary, we consider the following samples:

| type | gradient | data points (TOF arrivals) | sample_id |
|------|----------|---------------|-----------|
| DDA  | 6 min    | 214,172,697   | DDA_6     |
| DIA  | 6 min    | 158,552,099   | DIA_6     |
| DDA  | 21 min   | 295,251,252   | DDA_21    |
| DIA  | 21 min   | 730,564,765   | DIA_21    |
| DDA  | 120 min  | 2,074,019,899 | DDA_120   |
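The table also lets us compare how densely each acquisition samples its gradient. The sketch below simply divides the data point counts above by the nominal gradient length; it ignores any acquisition time outside the gradient, so the rates are approximate.

```python
# (nominal gradient in minutes, number of data points) copied from the table above
samples = {
    "DDA_6": (6, 214_172_697),
    "DIA_6": (6, 158_552_099),
    "DDA_21": (21, 295_251_252),
    "DIA_21": (21, 730_564_765),
    "DDA_120": (120, 2_074_019_899),
}

# Data points acquired per minute of nominal gradient
per_minute = {sid: points / minutes for sid, (minutes, points) in samples.items()}
total = sum(points for _, points in samples.values())

for sid, rate in per_minute.items():
    print(f"{sid:<8} {rate / 1e6:6.1f} million data points per gradient minute")
print(f"total    {total:,} data points")
```

With these numbers, the 6 min DDA and 21 min DIA runs are the densest (about 35 million data points per gradient minute), while the 120 min DDA run is the largest in absolute terms.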

## Reading raw Bruker .d folders

To reduce interference from other processes running on the system, we use IPython's `%timeit` magic, which repeats each measurement several times to get a robust estimate of the loading time of each raw Bruker .d folder:

In [4]:
alphatims.utils.set_logger(stream=False)
alphatims.utils.PROGRESS_CALLBACK_STYLE = alphatims.utils.PROGRESS_CALLBACK_STYLE_NONE
for sample_id, file_name in file_names.items():
    print(f"Time to load {sample_id} raw Bruker .d folder:")
    %timeit tmp = alphatims.bruker.TimsTOF(file_name)
    print("")

Time to load DDA_6 raw Bruker .d folder:
1.55 s ± 26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DIA_6 raw Bruker .d folder:
1.09 s ± 9.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DDA_21 raw Bruker .d folder:
3.07 s ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DIA_21 raw Bruker .d folder:
4.54 s ± 93.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DDA_120 raw Bruker .d folder:
24.1 s ± 2.04 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
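These timings can be turned into a rough read throughput by dividing the number of data points of each sample by its mean loading time. The sketch below uses only the means reported above and ignores their standard deviations (e.g. ± 2.04 s for DDA_120), so the rates are indicative only.

```python
# (mean raw .d loading time in s, number of data points), copied from the outputs and table above
raw_load = {
    "DDA_6": (1.55, 214_172_697),
    "DIA_6": (1.09, 158_552_099),
    "DDA_21": (3.07, 295_251_252),
    "DIA_21": (4.54, 730_564_765),
    "DDA_120": (24.1, 2_074_019_899),
}

# Million data points read per second of mean loading time
throughput = {sid: points / seconds / 1e6 for sid, (seconds, points) in raw_load.items()}

for sid, rate in throughput.items():
    print(f"{sid:<8} {rate:6.1f} million data points per second")
```

On this system, this corresponds to roughly 86 to 161 million data points per second.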



## Saving HDF files

Each dataset can also be exported to an HDF file:

In [5]:
for sample_id, data in timstof_objects.items():
    print(f"Time to export {sample_id} to HDF file:")
    path = data.directory
    file_name = f"{data.sample_name}.hdf"
    %timeit tmp = data.save_as_hdf(path, file_name, overwrite=True)
    print("")

Time to export DDA_6 to HDF file:
571 ms ± 54.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to export DIA_6 to HDF file:
403 ms ± 42.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to export DDA_21 to HDF file:
757 ms ± 33.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to export DIA_21 to HDF file:
1.85 s ± 73.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to export DDA_120 to HDF file:
5.7 s ± 419 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
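The `%timeit` magic used in these cells only exists inside IPython. Outside a notebook, a comparable summary can be computed with the standard `timeit` module, as in the minimal sketch below. Two assumptions apply: IPython chooses the number of loops per run automatically (hence "1 loop each" for the slow calls here), whereas this sketch fixes it; and we assume IPython reports the population standard deviation across runs, which is what `statistics.pstdev` computes. The helper name `time_like_ipython` is hypothetical.

```python
import statistics
import timeit


def time_like_ipython(func, repeat=7, number=1):
    """Hypothetical helper mimicking the summary of IPython's %timeit.

    Runs `func` `number` times per run, for `repeat` runs, and returns the mean
    and (population) standard deviation of the per-loop time in seconds.
    """
    # timeit.repeat returns the total duration of each run of `number` loops
    run_totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [total / number for total in run_totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Placeholder workload; in this notebook it could be e.g. lambda: alphatims.bruker.TimsTOF(file_name)
mean_s, std_s = time_like_ipython(lambda: sum(range(100_000)))
print(f"{mean_s * 1e3:.3g} ms ± {std_s * 1e3:.3g} ms per loop "
      "(mean ± std. dev. of 7 runs, 1 loop each)")
```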



## Reading HDF files

Once these HDF files have been created, they load roughly 2 to 3.4 times faster than the corresponding raw Bruker .d folders:

In [6]:
import os
for sample_id, data in timstof_objects.items():
    print(f"Time to load {sample_id} HDF file:")
    file_name = os.path.join(data.directory, f"{data.sample_name}.hdf")
    %timeit tmp = alphatims.bruker.TimsTOF(file_name)
    print("")

Time to load DDA_6 HDF file:
536 ms ± 34.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DIA_6 HDF file:
381 ms ± 15.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DDA_21 HDF file:
913 ms ± 52.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DIA_21 HDF file:
2.2 s ± 515 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DDA_120 HDF file:
10.6 s ± 254 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
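The speedup of HDF loading over raw .d loading follows directly from the mean timings above. The sketch below computes it per sample; note that the DIA_21 HDF timing has a comparatively large spread (± 515 ms), so its ratio is the least certain.

```python
# (mean raw .d loading time, mean HDF loading time) in seconds, as reported by %timeit above
mean_seconds = {
    "DDA_6": (1.55, 0.536),
    "DIA_6": (1.09, 0.381),
    "DDA_21": (3.07, 0.913),
    "DIA_21": (4.54, 2.2),
    "DDA_120": (24.1, 10.6),
}

# How many times faster HDF loading is than raw .d loading
speedup = {sid: raw / hdf for sid, (raw, hdf) in mean_seconds.items()}

for sid, factor in speedup.items():
    print(f"{sid:<8} HDF loading is {factor:.1f}x faster than raw .d loading")
```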



## Slicing data

Lastly, we can slice this data. Since slicing relies on Numba JIT-compiled functions, we first trigger their compilation with an initial slice call, so that compilation time is excluded from the timings below:

In [7]:
tmp = timstof_objects["DDA_6"][0]

Now we can time how long it takes to slice in different dimensions:
   * **LC:** $100.0 \leq \textrm{retention_time} \lt 100.5$
   * **TIMS:** $\textrm{scan_index} = 450$
   * **Quadrupole:** $700.0 \leq \textrm{quad_mz_values} \lt 710.0$
   * **TOF:** $621.9 \leq \textrm{tof_mz_values} \lt 622.1$
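Each range above is half-open: the lower bound is included and the upper bound is excluded. On any sorted array, such a range maps to a contiguous block of indices that two binary searches can find. The minimal sketch below uses the standard `bisect` module with made-up retention time values; it only illustrates the half-open semantics and makes no claim about how AlphaTIMS implements slicing internally.

```python
from bisect import bisect_left


def half_open_range(sorted_values, low, high):
    """Return (start, stop) so that sorted_values[start:stop] holds exactly
    the values v with low <= v < high."""
    start = bisect_left(sorted_values, low)   # first index with value >= low
    stop = bisect_left(sorted_values, high)   # first index with value >= high
    return start, stop


# Hypothetical, sorted retention time values
retention_times = [99.6, 99.9, 100.0, 100.2, 100.4, 100.5, 100.8]
start, stop = half_open_range(retention_times, 100.0, 100.5)
print(retention_times[start:stop])  # [100.0, 100.2, 100.4]
```

The value 100.5 is excluded because the upper bound is strict, matching $\lt 100.5$ in the LC bullet.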

In [8]:
for sample_id, data in timstof_objects.items():
    print(f"Time to slice {sample_id}:")
    
    print("Testing slice data[100.0: 100.5]")
    %timeit tmp = data[100.:100.5]
    
    print("Testing slice data[:, 450]")
    %timeit tmp = data[:, 450]
    
    print("Testing slice data[:, :, 700.0: 710]")
    %timeit tmp = data[:, :, 700.0: 710]
    
    print("Testing slice data[:, :, :, 621.9: 622.1]")
    %timeit tmp = data[:, :, :, 621.9: 622.1]
    
    print("")

Time to slice DDA_6:
Testing slice data[100.0: 100.5]
1.64 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Testing slice data[:, 450]
45.7 ms ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Testing slice data[:, :, 700.0: 710]
27 ms ± 2.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Testing slice data[:, :, :, 621.9: 622.1]
78.8 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to slice DIA_6:
Testing slice data[100.0: 100.5]
6.4 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Testing slice data[:, 450]
26.7 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Testing slice data[:, :, 700.0: 710]
626 ms ± 19.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Testing slice data[:, :, :, 621.9: 622.1]
109 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to slice DDA_21:
Testing slice data[100.0: 100.5]
1.74 ms ± 96.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops eac