# Performance

This notebook demonstrates the performance of AlphaTIMS.
It covers the following topics:
1. [**Samples**](#Samples)
2. [**Reading raw Bruker .d folders**](#Reading-raw-Bruker-.d-folders)
3. [**Saving HDF files**](#Saving-HDF-files)
4. [**Reading HDF files**](#Reading-HDF-files)
5. [**Slicing data**](#Slicing-data)

The following system was used:

In [1]:
import alphatims.utils

alphatims.utils.set_threads(8)
log_file_name = alphatims.utils.set_logger(
    log_file_name="performance_log.txt",
    overwrite=True
)
alphatims.utils.show_platform_info()
alphatims.utils.show_python_info()

2021-03-23 13:28:40> Platform information:
2021-03-23 13:28:40> system     - Darwin
2021-03-23 13:28:40> release    - 19.6.0
2021-03-23 13:28:40> version    - 10.15.7
2021-03-23 13:28:40> machine    - x86_64
2021-03-23 13:28:40> processor  - i386
2021-03-23 13:28:40> cpu count  - 8
2021-03-23 13:28:40> ram        - 26.2/32.0 Gb (available/total)
2021-03-23 13:28:40> 
2021-03-23 13:28:40> Python information:
2021-03-23 13:28:40> Sphinx           - 3.5.1
2021-03-23 13:28:40> alphatims        - 0.1.210323
2021-03-23 13:28:40> autodocsumm      - 0.2.2
2021-03-23 13:28:40> bokeh            - 2.2.3
2021-03-23 13:28:40> click            - 7.1.2
2021-03-23 13:28:40> datashader       - 0.11.1
2021-03-23 13:28:40> h5py             - 3.2.1
2021-03-23 13:28:40> holoviews        - 1.13.5
2021-03-23 13:28:40> holoviz          - 0.11.6
2021-03-23 13:28:40> ipykernel        - 5.5.0
2021-03-23 13:28:40> jupyter          - 1.0.0
2021-03-23 13:28:40> matplotlib       - 3.3.4
2021-03-23 13:28:40> numba   

## Samples

Five samples are used and compared:

In [2]:
file_names = {
    "DDA_6": "/Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d",
    "DIA_6": "/Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d",
    "DDA_21": "/Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d",
    "DIA_21": "/Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d",
    "DDA_120": "/Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d"
}

We first load all these files to show their basic statistics before we do the actual timing:

In [3]:
import logging
import alphatims.bruker

timstof_objects = {}
for sample_id, file_name in file_names.items():
    logging.info(f"Initial loading of {sample_id}")
    timstof_objects[sample_id] = alphatims.bruker.TimsTOF(file_name)
    logging.info("")

2021-03-23 13:28:40> 
2021-03-23 13:28:41> Initial loading of DDA_6
2021-03-23 13:28:41> Importing data from /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d
2021-03-23 13:28:41> Reading frame metadata for /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d
2021-03-23 13:28:41> Reading 2,978 frames with 214,172,697 detector strikes for /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d


100%|██████████| 2978/2978 [00:01<00:00, 2501.77it/s]

2021-03-23 13:28:43> Indexing /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d...





2021-03-23 13:28:43> Succesfully imported data from /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_MA_HeLa_200ng_DDA_06-15_5_6min_4cm_S1-A1_1_21717.d
2021-03-23 13:28:43> 
2021-03-23 13:28:43> Initial loading of DIA_6
2021-03-23 13:28:43> Importing data from /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d
2021-03-23 13:28:43> Reading frame metadata for /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d
2021-03-23 13:28:43> Reading 3,182 frames with 158,552,099 detector strikes for /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d


100%|██████████| 3182/3182 [00:00<00:00, 3397.25it/s]


2021-03-23 13:28:44> Indexing /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d...
2021-03-23 13:28:44> Succesfully imported data from /Users/swillems/Data/alphatims_testing/20201016_tims03_Evo03_PS_SA_HeLa_200ng_300-1200_2steps_16scans_06-15_5_6min_4cm_S1-A4_1_21720.d
2021-03-23 13:28:44> 
2021-03-23 13:28:44> Initial loading of DDA_21
2021-03-23 13:28:44> Importing data from /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d
2021-03-23 13:28:44> Reading frame metadata for /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d
2021-03-23 13:28:44> Reading 11,886 frames with 295,251,252 detector strikes for /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d


100%|██████████| 11886/11886 [00:02<00:00, 5499.04it/s]


2021-03-23 13:28:47> Indexing /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d...
2021-03-23 13:28:47> Succesfully imported data from /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_DDA_21min_8cm_S1-C10_1_22476.d
2021-03-23 13:28:47> 
2021-03-23 13:28:47> Initial loading of DIA_21
2021-03-23 13:28:47> Importing data from /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d
2021-03-23 13:28:47> Reading frame metadata for /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d
2021-03-23 13:28:47> Reading 11,868 frames with 730,564,765 detector strikes for /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d


100%|██████████| 11868/11868 [00:04<00:00, 2901.68it/s]

2021-03-23 13:28:51> Indexing /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d...





2021-03-23 13:28:52> Succesfully imported data from /Users/swillems/Data/alphatims_testing/20201207_tims03_Evo03_PS_SA_HeLa_200ng_EvoSep_prot_high_speed_21min_8cm_S1-C8_1_22474.d
2021-03-23 13:28:52> 
2021-03-23 13:28:52> Initial loading of DDA_120
2021-03-23 13:28:52> Importing data from /Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d
2021-03-23 13:28:52> Reading frame metadata for /Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d
2021-03-23 13:28:58> Reading 68,114 frames with 2,074,019,899 detector strikes for /Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d


100%|██████████| 68114/68114 [00:15<00:00, 4453.24it/s]


2021-03-23 13:29:14> Indexing /Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d...
2021-03-23 13:29:17> Succesfully imported data from /Users/swillems/Data/alphatims_testing/HeLa_200ng_1428.d
2021-03-23 13:29:17> 


In summary, we thus consider the following samples:

| type | gradient | datapoints    | sample_id |
|------|----------|---------------|-----------|
| DDA  | 6 min    | 214,172,697   | DDA_6     |
| DIA  | 6 min    | 158,552,099   | DIA_6     |
| DDA  | 21 min   | 295,251,252   | DDA_21    |
| DIA  | 21 min   | 730,564,765   | DIA_21    |
| DDA  | 120 min  | 2,074,019,899 | DDA_120   |

## Reading raw Bruker .d folders

To reduce the influence of unrelated system activity, we use the `%timeit` magic, which repeats each call several times, to get a robust estimate of loading times for raw Bruker .d folders:

In [4]:
alphatims.utils.set_logger(stream=False)
for sample_id, file_name in file_names.items():
    print(f"Time to load {sample_id} raw Bruker .d folder:")
    %timeit tmp = alphatims.bruker.TimsTOF(file_name)
    print("")

Time to load DDA_6 raw Bruker .d folder:
1.48 s ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DIA_6 raw Bruker .d folder:
1.06 s ± 7.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DDA_21 raw Bruker .d folder:
3.07 s ± 67.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DIA_21 raw Bruker .d folder:
5.24 s ± 281 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DDA_120 raw Bruker .d folder:
23.2 s ± 3.07 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
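
The `%timeit` output above reports a mean and standard deviation over 7 runs of 1 loop each. Outside of IPython, a comparable summary can be approximated with the standard-library `timeit.repeat`. The following is a minimal sketch under that assumption: `summarize_timing` and `workload` are hypothetical names for illustration, and `workload` merely stands in for an expensive call such as `alphatims.bruker.TimsTOF(file_name)`.

```python
import statistics
import timeit


def summarize_timing(func, repeat=7, number=1):
    """Return (mean, std) in seconds per loop, similar in spirit to %timeit."""
    # Each entry is the total time for `number` consecutive calls of `func`.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    # Population standard deviation over the repeated runs.
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


def workload():
    # Hypothetical placeholder for an expensive call such as loading a file.
    return sum(i * i for i in range(100_000))


mean, std = summarize_timing(workload)
print(
    f"{mean * 1e3:.2f} ms ± {std * 1e3:.2f} ms per loop "
    "(mean ± std. dev. of 7 runs, 1 loop each)"
)
```

Repeating the measurement and reporting the spread makes it easier to judge whether a difference between two samples exceeds the run-to-run noise.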



## Saving HDF files

Each of the data files can also be exported to HDF files:

In [5]:
for sample_id, data in timstof_objects.items():
    print(f"Time to export {sample_id} to HDF file:")
    path = data.directory
    file_name = f"{data.sample_name}.hdf"
    %timeit tmp = data.save_as_hdf(path, file_name, overwrite=True)
    print("")

Time to export DDA_6 to HDF file:


569 ms ± 41.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to export DIA_6 to HDF file:
409 ms ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to export DDA_21 to HDF file:
779 ms ± 50.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to export DIA_21 to HDF file:
1.79 s ± 73.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to export DDA_120 to HDF file:
4.94 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)



## Reading HDF files

Once these HDF files are created, they load faster than the raw Bruker .d folders, roughly 2 to 4 times faster in the timings reported in this notebook:

In [6]:
import os
for sample_id, data in timstof_objects.items():
    print(f"Time to load {sample_id} HDF file:")
    file_name = os.path.join(data.directory, f"{data.sample_name}.hdf")
    %timeit tmp = alphatims.bruker.TimsTOF(file_name)
    print("")

Time to load DDA_6 HDF file:
445 ms ± 5.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DIA_6 HDF file:
295 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DDA_21 HDF file:
755 ms ± 40 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DIA_21 HDF file:
1.9 s ± 164 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to load DDA_120 HDF file:
10.1 s ± 133 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
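
To put the two sets of loading times side by side, the speedup per sample can be computed directly from the reported means. The sketch below only uses the mean values printed by `%timeit` in this notebook; the dictionary names `raw_load_s`, `hdf_load_s` and `speedups` are introduced here for illustration.

```python
# Mean loading times in seconds, copied from the %timeit outputs in this notebook.
raw_load_s = {
    "DDA_6": 1.48,
    "DIA_6": 1.06,
    "DDA_21": 3.07,
    "DIA_21": 5.24,
    "DDA_120": 23.2,
}
hdf_load_s = {
    "DDA_6": 0.445,
    "DIA_6": 0.295,
    "DDA_21": 0.755,
    "DIA_21": 1.9,
    "DDA_120": 10.1,
}

# Ratio of raw loading time to HDF loading time for each sample.
speedups = {
    sample_id: raw_load_s[sample_id] / hdf_load_s[sample_id]
    for sample_id in raw_load_s
}

for sample_id, speedup in speedups.items():
    print(f"{sample_id}: HDF loading is {speedup:.1f}x faster than raw loading")
```

For these samples the ratio ranges from about 2.3 (DDA_120) to about 4.1 (DDA_21).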



## Slicing data

Lastly, we can slice this data. Since this uses Numba JIT compilation, we first compile the relevant functions with an initial slice call:

In [7]:
tmp = timstof_objects["DDA_6"][0]
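
The warm-up call above keeps the one-time compilation cost out of the later timings. The effect can be illustrated with a plain-Python stand-in. This is only a sketch: it does not use Numba, the compilation step is simulated with `time.sleep`, and `make_lazy_kernel` and `timed_call` are hypothetical helpers.

```python
import time


def make_lazy_kernel(setup_seconds=0.05):
    """Build a function whose first call pays a simulated one-time setup cost."""
    state = {"ready": False}

    def kernel(x):
        if not state["ready"]:
            # Simulated compilation; a real JIT compiler does actual work here.
            time.sleep(setup_seconds)
            state["ready"] = True
        return x * 2

    return kernel


def timed_call(func, arg):
    """Return the wall-clock duration of a single call in seconds."""
    start = time.perf_counter()
    func(arg)
    return time.perf_counter() - start


kernel = make_lazy_kernel()
first_call_s = timed_call(kernel, 1)   # includes the simulated setup cost
second_call_s = timed_call(kernel, 1)  # setup has already been done
print(f"first call: {first_call_s:.4f} s, second call: {second_call_s:.6f} s")
```

Timing only calls made after the first one measures the steady-state cost, which is what the slicing benchmarks are meant to capture.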

Now we can time how long it takes to slice in different dimensions:
   * **LC:** $100.0 \leq \textrm{retention_time} \lt 100.5$
   * **TIMS:** $\textrm{scan_index} = 450$
   * **Quadrupole:** $700.0 \leq \textrm{quad_mz_values} \lt 710.0$
   * **TOF:** $621.9 \leq \textrm{tof_mz_values} \lt 622.1$
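
Each of the ranges above is a half-open interval: the lower bound is included and the upper bound is excluded. When the underlying values are sorted, such an interval corresponds to a contiguous index range that can be found by binary search. The sketch below illustrates this idea with the standard-library `bisect` module. It is not AlphaTIMS's implementation; `slice_sorted` is a hypothetical helper and the retention times are made up.

```python
import bisect


def slice_sorted(values, start, stop):
    """Return (low, high) so that start <= values[i] < stop for low <= i < high."""
    # First index whose value is >= start.
    low = bisect.bisect_left(values, start)
    # First index whose value is >= stop, which excludes stop itself.
    high = bisect.bisect_left(values, stop)
    return low, high


# Made-up, sorted retention times (illustration only).
rt_values = [99.8, 99.9, 100.0, 100.1, 100.3, 100.5, 100.6]
low, high = slice_sorted(rt_values, 100.0, 100.5)
selected = rt_values[low:high]
print(low, high, selected)
```

Here the value 100.0 is selected and 100.5 is not, matching the $\leq$ and $\lt$ bounds in the list above.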

In [8]:
for sample_id, data in timstof_objects.items():
    print(f"Time to slice {sample_id}:")
    
    print("Testing slice data[100.0: 100.5]")
    %timeit tmp = data[100.:100.5]
    
    print("Testing slice data[:, 450]")
    %timeit tmp = data[:, 450]
    
    print("Testing slice data[:, :, 700.0: 710]")
    %timeit tmp = data[:, :, 700.0: 710]
    
    print("Testing slice data[:, :, :, 621.9: 622.1]")
    %timeit tmp = data[:, :, :, 621.9: 622.1]
    
    print("")

Time to slice DDA_6:
Testing slice data[100.0: 100.5]
1.85 ms ± 67.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Testing slice data[:, 450]
48 ms ± 12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Testing slice data[:, :, 700.0: 710]
27.1 ms ± 3.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Testing slice data[:, :, :, 621.9: 622.1]
89.7 ms ± 1.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to slice DIA_6:
Testing slice data[100.0: 100.5]
7.35 ms ± 803 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Testing slice data[:, 450]
24.1 ms ± 733 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Testing slice data[:, :, 700.0: 710]
649 ms ± 17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Testing slice data[:, :, :, 621.9: 622.1]
110 ms ± 3.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Time to slice DDA_21:
Testing slice data[100.0: 100.5]
2.17 ms ± 267 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
T