# Built-in Raw data readers

AlphaRaw can directly access Thermo Raw data and Sciex Wiff data through PythonNet. On macOS and Linux, PythonNet requires Mono to be installed. See the installation section of alpharaw (https://github.com/mannlabs/alpharaw).

## Thermo Raw

`alpharaw.thermo.ThermoRawData` contains all the functionality needed to load Thermo Raw data. To speed up data loading, alpharaw uses multiprocessing when `process_count` > 1. This reader can load different kinds of spectrum information into columns of `spectrum_df`. By default, the columns are:

- `spec_idx`: the zero-based index of a spectrum in the raw file. Its value is `scan number - 1`.
- `peak_start_idx`: the start row index of peaks in `peak_df` (see `mzml_reader.peak_df` below) for the spectrum.
- `peak_stop_idx`: the stop row index of peaks in `peak_df` (see `mzml_reader.peak_df` below) for the spectrum.
- `rt`: retention time in minutes. The alphaX ecosystem uses `rt_sec` for retention time in seconds.
- `precursor_mz`: the precursor m/z of an MS2 scan. For an MS1 scan, the value is always -1. For DIA MS2, the default value is the isolation center of the MS2 scan. For DDA MS2, `precursor_mz` may refer to the monoisotopic m/z of the precursor when `precursor_charge` is not 0; otherwise it is the isolation center.
- `precursor_charge`: for DIA, this value is always 0. For DDA, it can be nonzero when the monoisotopic m/z is determined.
- `isolation_lower_mz`: the lower (or left) m/z boundary of the isolation window.
- `isolation_upper_mz`: the upper (or right) m/z boundary of the isolation window.
- `ms_level`: the MS level (1 for MS1, 2 for MS2, ...). It starts from one.
- `nce`: normalized collision energy as defined by Thermo.
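
The relationship between `peak_start_idx`, `peak_stop_idx`, and `peak_df` can be sketched with toy data. This is a minimal illustration, not alpharaw's own code: the `get_peaks` helper is hypothetical, the `mz` and `intensity` column names of `peak_df` are assumed, and the half-open `[start, stop)` convention is inferred from the example output in this section, where each spectrum's `peak_start_idx` equals the previous spectrum's `peak_stop_idx`.

```python
import pandas as pd

# Toy stand-ins for a reader's `spectrum_df` and `peak_df` (values are made up).
spectrum_df = pd.DataFrame({
    "spec_idx": [0, 1, 2],
    "peak_start_idx": [0, 3, 5],
    "peak_stop_idx": [3, 5, 9],
})
# Assumption: `peak_df` has one row per peak with `mz` and `intensity` columns.
peak_df = pd.DataFrame({
    "mz": [100.1, 200.2, 300.3, 150.0, 250.0, 110.0, 120.0, 130.0, 140.0],
    "intensity": [10.0, 20.0, 30.0, 5.0, 15.0, 1.0, 2.0, 3.0, 4.0],
})


def get_peaks(spectrum_df, peak_df, spec_idx):
    """Hypothetical helper: return the rows of `peak_df` for one spectrum."""
    row = spectrum_df.loc[spectrum_df["spec_idx"] == spec_idx].iloc[0]
    start, stop = int(row["peak_start_idx"]), int(row["peak_stop_idx"])
    # Assumed half-open range: `stop` is the first row of the next spectrum.
    return peak_df.iloc[start:stop]


peaks = get_peaks(spectrum_df, peak_df, 1)
print(peaks)
```

Because the peaks of all spectra sit in one contiguous table, a spectrum lookup is a plain positional slice and needs no per-spectrum objects.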

There are also optional spectrum columns, requested through `auxiliary_items`, that can be loaded into `spectrum_df`:

- `injection_time`: `Ion Injection Time (ms)` in the scan header.
- `cv`: the source fragmentation CV (the exact meaning is unconfirmed).
- `max_ion_time`: `Max. Ion Time (ms)` in the scan header.
- `agc_target`: `AGC target` in the scan header.
- `energy_ev`: `HCD Energy V` in the scan header. This is the actual collision energy in eV.
- `injection_optics_settling_time`: `Injection Optics Settling Time (ms)` in the scan header.
- `funnel_rf_level`: `Funnel RF Level` in the scan header.
- `faims_cv`: `FAIMS CV` in the scan header.
- `activation`: activation type, for example, HCD, CID, ETD, ...
- `analyzer`: analyzer type, for example FTMS, Astral, ITMS, ...
- `activation_id`: Thermo's built-in IDs of `activation` types.
- `analyzer_id`: Thermo's built-in IDs of `analyzer` types.

```python
from alpharaw.thermo import ThermoRawData

raw_data = ThermoRawData(
    process_count=1,  # values > 1 enable multiprocessing
    # Optional scan-header columns to add to `spectrum_df`.
    auxiliary_items=[
        "injection_time", "cv",
        "max_ion_time", "agc_target", "energy_ev",
        "injection_optics_settling_time",
        "funnel_rf_level", "faims_cv",
        "activation", "analyzer",
        "activation_id", "analyzer_id",
        # "multinotch",
    ]
)
raw_data.import_raw("../../nbs_tests/test_data/iRT.raw")
raw_data.spectrum_df
```

| | spec_idx | peak_start_idx | peak_stop_idx | rt | precursor_mz | precursor_charge | isolation_lower_mz | isolation_upper_mz | ms_level | nce | ... | max_ion_time | agc_target | energy_ev | injection_optics_settling_time | funnel_rf_level | faims_cv | activation | analyzer | activation_id | analyzer_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 254 | 0.002983 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 1 | 1 | 254 | 665 | 0.006392 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 2 | 2 | 665 | 1131 | 0.009808 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 3 | 3 | 1131 | 1663 | 0.013224 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 4 | 4 | 1663 | 2169 | 0.016641 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3932 | 3932 | 1100271 | 1101512 | 5.994985 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 3933 | 3933 | 1101512 | 1101528 | 5.997334 | 362.537140 | 0 | 361.837140 | 363.237140 | 2 | 30.0 | ... | 28.0 | 100000 | 16.500000 | 0.0 | 40.0 | 0.0 | HCD | FTMS | 5 | 4 |
| 3934 | 3934 | 1101528 | 1102758 | 5.998843 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 3935 | 3935 | 1102758 | 1102771 | 5.997334 | 425.326569 | 0 | 424.626569 | 426.026569 | 2 | 30.0 | ... | 28.0 | 100000 | 18.690001 | 0.0 | 40.0 | 0.0 | HCD | FTMS | 5 | 4 |
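
Since `spectrum_df` is an ordinary pandas DataFrame, common follow-up steps are plain column operations. The sketch below rebuilds four rows from the Thermo output above (a subset of columns), selects the MS2 scans, converts `rt` from minutes to seconds, and computes the isolation window width. The `isolation_width` column name is hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Four rows copied from the Thermo output above (subset of columns).
spectrum_df = pd.DataFrame({
    "spec_idx": [3932, 3933, 3934, 3935],
    "rt": [5.994985, 5.997334, 5.998843, 6.001193],
    "precursor_mz": [-1.0, 362.537140, -1.0, 425.326569],
    "isolation_lower_mz": [-1.0, 361.837140, -1.0, 424.626569],
    "isolation_upper_mz": [-1.0, 363.237140, -1.0, 426.026569],
    "ms_level": [1, 2, 1, 2],
})

# Keep MS2 scans only; MS1 rows carry -1 in the precursor/isolation columns.
ms2_df = spectrum_df[spectrum_df["ms_level"] == 2].copy()

# `rt` is in minutes; `rt_sec` is the name used for seconds.
ms2_df["rt_sec"] = ms2_df["rt"] * 60.0

# Hypothetical helper column: width of the isolation window in m/z.
ms2_df["isolation_width"] = (
    ms2_df["isolation_upper_mz"] - ms2_df["isolation_lower_mz"]
)
print(ms2_df[["spec_idx", "rt_sec", "isolation_width"]])
```

Filtering on `ms_level` first avoids computing a meaningless width from the -1 placeholders of MS1 scans.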


## Sciex Wiff

AlphaRaw can access basic scan (spectrum) information from Sciex Wiff data. The peaks are usually not centroided.

```python
from alpharaw.sciex import SciexWiffData

wiff_data = SciexWiffData()
# Load scan information and peaks from the Wiff file.
wiff_data.import_raw(
    "../../nbs_tests/test_data/02112022_Zeno1_TiHe_DIAMA_HeLa_200ng_EVO5_01.wiff"
)
wiff_data.spectrum_df
```

| | spec_idx | peak_start_idx | peak_stop_idx | rt | ms_level | precursor_mz | precursor_charge | isolation_lower_mz | isolation_upper_mz | nce |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 100 | 0.000417 | 1 | -1.00 | 0 | -1.0 | -1.0 | 0.0 |
| 1 | 1 | 100 | 447 | 0.001133 | 2 | 403.55 | 0 | 399.5 | 407.6 | 19.0 |
| 2 | 2 | 447 | 924 | 0.001383 | 2 | 411.25 | 0 | 406.6 | 415.9 | 20.0 |
| 3 | 3 | 924 | 1286 | 0.001650 | 2 | 419.25 | 0 | 414.9 | 423.6 | 20.0 |
| 4 | 4 | 1286 | 1943 | 0.001900 | 2 | 426.95 | 0 | 422.6 | 431.3 | 20.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 42232 | 42232 | 73839218 | 73841218 | 11.627550 | 2 | 715.15 | 0 | 711.1 | 719.2 | 34.0 |
| 42233 | 42233 | 73841218 | 73843218 | 11.627817 | 2 | 722.30 | 0 | 718.2 | 726.4 | 35.0 |
| 42234 | 42234 | 73843218 | 73845218 | 11.628067 | 2 | 729.70 | 0 | 725.4 | 734.0 | 35.0 |
| 42235 | 42235 | 73845218 | 73847218 | 11.628317 | 2 | 737.35 | 0 | 733.0 | 741.7 | 35.0 |
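
The MS2 rows printed above appear consistent with the convention described for DIA data, where `precursor_mz` is the center of the isolation window. The following sketch checks this on the first four MS2 rows and also computes the overlap between adjacent windows. It only illustrates the arithmetic on the printed values and makes no claim about how the reader derives them.

```python
import numpy as np
import pandas as pd

# First four MS2 rows copied from the Sciex output above.
ms2_df = pd.DataFrame({
    "spec_idx": [1, 2, 3, 4],
    "precursor_mz": [403.55, 411.25, 419.25, 426.95],
    "isolation_lower_mz": [399.5, 406.6, 414.9, 422.6],
    "isolation_upper_mz": [407.6, 415.9, 423.6, 431.3],
})

# Center of each isolation window.
center = (ms2_df["isolation_lower_mz"] + ms2_df["isolation_upper_mz"]) / 2

# Compare with a tolerance, because the values are floating point.
is_center = np.isclose(ms2_df["precursor_mz"], center, atol=1e-6)
print(is_center)

# Overlap between each window and the next one (upper_i - lower_{i+1}).
upper = ms2_df["isolation_upper_mz"].to_numpy()
lower = ms2_df["isolation_lower_mz"].to_numpy()
overlap = upper[:-1] - lower[1:]
print(overlap)
```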


## mzML

mzML is partially supported; the basic spectrum information is extracted.

```python
from alpharaw.mzml import MzMLReader

mzml_reader = MzMLReader()
# Load basic spectrum information from the mzML file.
mzml_reader.load_raw("../../nbs_tests/test_data/small.pwiz.1.1.mzML")
mzml_reader.spectrum_df
```

| | spec_idx | peak_start_idx | peak_stop_idx | rt | precursor_mz | precursor_charge | isolation_lower_mz | isolation_upper_mz | ms_level |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 10739 | 0.004935 | -1.0 | 0 | -1.0 | -1.0 | 1 |
| 1 | 1 | 10739 | 25554 | 0.007897 | -1.0 | 0 | -1.0 | -1.0 | 1 |
| 2 | 2 | 25554 | 26039 | 0.011218 | 810.79 | 0 | 810.29 | 811.29 | 2 |
| 3 | 3 | 26039 | 27045 | 0.022838 | 837.34 | 0 | 836.84 | 837.84 | 2 |
| 4 | 4 | 27045 | 27882 | 0.034925 | 725.36 | 0 | 724.86 | 725.86 | 2 |
| 5 | 5 | 27882 | 28532 | 0.04862 | 558.87 | 0 | 558.37 | 559.37 | 2 |
| 6 | 6 | 28532 | 29294 | 0.061923 | 812.33 | 0 | 811.83 | 812.83 | 2 |
| 7 | 7 | 29294 | 37374 | 0.075015 | -1.0 | 0 | -1.0 | -1.0 | 1 |
| 8 | 8 | 37374 | 54285 | 0.077788 | -1.0 | 0 | -1.0 | -1.0 | 1 |
| 9 | 9 | 54285 | 54837 | 0.081203 | 810.75 | 0 | 810.25 | 811.25 | 2 |
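
All three readers expose the same core columns, so a summary written against `spectrum_df` does not depend on the file format. As a small sketch, the code below rebuilds the ten rows printed above, derives the number of peaks per spectrum from the index columns, and summarizes it per `ms_level`. The `n_peaks` column name is hypothetical.

```python
import pandas as pd

# The ten rows printed above (index columns and ms_level only).
spectrum_df = pd.DataFrame({
    "spec_idx": list(range(10)),
    "peak_start_idx": [0, 10739, 25554, 26039, 27045,
                       27882, 28532, 29294, 37374, 54285],
    "peak_stop_idx": [10739, 25554, 26039, 27045, 27882,
                      28532, 29294, 37374, 54285, 54837],
    "ms_level": [1, 1, 2, 2, 2, 2, 2, 1, 1, 2],
})

# Hypothetical helper column: number of peaks in each spectrum.
spectrum_df["n_peaks"] = (
    spectrum_df["peak_stop_idx"] - spectrum_df["peak_start_idx"]
)

# Number of spectra and total number of peaks per MS level.
summary = spectrum_df.groupby("ms_level")["n_peaks"].agg(["count", "sum"])
print(summary)
```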
