# Loading Spectrum Data

In [1]:
%load_ext autoreload

In [2]:
%autoreload 2

Spectrum data loaders implement the same methods as chromatogram data loaders, plus some additional methods, since more information can be gathered from spectrum data. Fetching raw data with spectrum loaders takes more time because the data is extracted on the fly. Additionally, [TargetedDIAExtractionParameters](<link>) must be specified to define how the peptide should be extracted.

In [3]:
# Please run this cell before executing any other cell
import os
os.chdir("../../tests/test_data/")  # Path to the tutorial data; replace with the path to your own data

## Initiating a Spectrum Data Loader

Most spectrum loaders require the following inputs:
1. dataFiles - a list of raw data files
2. rsltsFile - a `.osw` or DIA-NN `.tsv` file containing the features
3. libraryFile - a `.tsv`/`.osw`/`.pqp` file containing the library (m/z and annotations of all transitions)
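
The file-type expectations in the list above can be checked before a loader is constructed. The helper below is a hypothetical sketch: `check_loader_inputs` is not part of MassSeer, and it only inspects file extensions, not file contents.

```python
from pathlib import Path

# Extensions taken from the input list above
RESULTS_EXTENSIONS = {".osw", ".tsv"}
LIBRARY_EXTENSIONS = {".tsv", ".osw", ".pqp"}


def check_loader_inputs(dataFiles, rsltsFile, libraryFile=None):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    # Accept a single path as well as a list of paths
    if isinstance(dataFiles, (str, Path)):
        dataFiles = [dataFiles]
    if not dataFiles:
        problems.append("dataFiles is empty")
    if Path(rsltsFile).suffix.lower() not in RESULTS_EXTENSIONS:
        problems.append(f"unexpected results file type: {rsltsFile}")
    if libraryFile is not None and Path(libraryFile).suffix.lower() not in LIBRARY_EXTENSIONS:
        problems.append(f"unexpected library file type: {libraryFile}")
    return problems


print(check_loader_inputs(["ionMobilityTest2/ionMobilityTest2.mzML"],
                          "ionMobilityTest2/ionMobilityTest2.osw"))
```

An empty list means that no extension-level problems were found; it does not guarantee that the loader can read the files.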

We can initiate a `MzMLDataLoader` object as follows.

In [4]:
from massseer.loaders import MzMLDataLoader
loader = MzMLDataLoader(dataFiles="ionMobilityTest2/ionMobilityTest2.mzML",
                        rsltsFile="ionMobilityTest2/ionMobilityTest2.osw",
                        rsltsFileType="OpenSWATH",
                        verbose=1)

[2024-01-02 11:50:17,072] MzMLDataAccess - DEBUG - Creating OnDiscExperiment...: Elapsed 5.0067901611328125e-06 ms
[2024-01-02 11:50:17,162] MzMLDataAccess - INFO - Opening ionMobilityTest2/ionMobilityTest2.mzML file...: Elapsed 0.08959078788757324 ms
[2024-01-02 11:50:17,163] MzMLDataAccess - DEBUG - Extracting meta data...: Elapsed 5.4836273193359375e-06 ms
[2024-01-02 11:50:17,163] MzMLDataAccess - INFO - There are 125 spectra and 0 chromatograms.
[2024-01-02 11:50:17,164] MzMLDataAccess - INFO - There are 25 MS1 spectra and 100 MS2 spectra.


The same mzML file can also be loaded with DIA-NN output, as shown below. Note that since DIA-NN results files do not always contain transition-level information, a library file must also be provided.

In [5]:
from massseer.loaders import MzMLDataLoader
loader_diann = MzMLDataLoader(dataFiles="ionMobilityTest2/ionMobilityTest2.mzML",
                              rsltsFile="ionMobilityTest2/ionMobilityTest2-diannReport.tsv",
                              libraryFile="ionMobilityTest2/ionMobilityTest2Library.tsv",
                              rsltsFileType='DIA-NN')

[2024-01-02 11:50:17,241] MzMLDataAccess - INFO - Opening ionMobilityTest2/ionMobilityTest2.mzML file...: Elapsed 0.04728055000305176 ms
[2024-01-02 11:50:17,242] MzMLDataAccess - INFO - There are 125 spectra and 0 chromatograms.
[2024-01-02 11:50:17,242] MzMLDataAccess - INFO - There are 25 MS1 spectra and 100 MS2 spectra.


For the purposes of this tutorial we will use the OpenSWATH results, but this approach works with any properly initiated `MzMLDataLoader`.

<div class="alert alert-info">

Note

If a `.osw` file is provided as the results file and no library file is provided, MassSeer will assume that the `.osw` file should also be used as the library.

</div>
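
The fallback described in the note, together with the DIA-NN requirement mentioned earlier, can be summarised in a few lines. This is a minimal sketch of the documented behaviour, not MassSeer's actual implementation; `resolve_library_file` is a hypothetical name, and raising a `ValueError` when no library can be resolved is an assumption.

```python
def resolve_library_file(rsltsFile, libraryFile=None):
    """Hypothetical helper: decide which file serves as the library."""
    # An explicitly provided library always takes precedence
    if libraryFile is not None:
        return libraryFile
    # A .osw results file can double as the library
    if rsltsFile.lower().endswith(".osw"):
        return rsltsFile
    # Otherwise (e.g. a DIA-NN .tsv report) a library must be supplied
    raise ValueError(f"a library file is required for results file {rsltsFile!r}")


print(resolve_library_file("ionMobilityTest2/ionMobilityTest2.osw"))
```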

## Loading a Transition Group

To fetch the chromatograms for a particular transition group, we can call the `loadTransitionGroups()` method. In addition to the modified peptide sequence and charge state, this method requires a [:class:`~massseer.structs.DIATargetedExtraction`](#DIATargetedExtraction) object, which specifies the extraction parameters. The method loads the transition groups across all runs, so it can take a while because the data for every experiment is fetched from disk.

In this example we will load the peptide *AFVDFLSDEIK* with a charge state of *2*.

First, we create a [:class:`~massseer.structs.DIATargetedExtraction`](#DIATargetedExtraction) object (the `TargetedDIAConfig` class in the code below).

In [6]:
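from massseer.structs.TargetedDIAConfig import TargetedDIAConfig

# Extraction parameters used when pulling the peptide signal out of the raw spectra
extraciton_config = TargetedDIAConfig()
extraciton_config.im_window = 0.2  # ion mobility extraction window
extraciton_config.rt_window = 50   # retention time extraction window
extraciton_config.mz_tol = 20      # m/z tolerance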
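
To give a feel for what such parameters mean numerically, the sketch below turns them into extraction bounds. It rests on assumptions that this page does not confirm: that `mz_tol` is a total width in ppm, and that `rt_window` and `im_window` are total widths centred on the expected value. The helper names and the expected retention time are hypothetical.

```python
def ppm_bounds(target_mz, mz_tol_ppm):
    """Lower/upper m/z bounds, assuming mz_tol_ppm is a TOTAL width in ppm."""
    half_width = target_mz * mz_tol_ppm * 1e-6 / 2.0
    return target_mz - half_width, target_mz + half_width


def centered_window(center, width):
    """Lower/upper bounds, assuming `width` is a TOTAL width around `center`."""
    return center - width / 2.0, center + width / 2.0


precursor_mz = 642.3295   # precursor m/z reported for AFVDFLSDEIK (charge 2) on this page
expected_rt = 6246.0      # hypothetical expected retention time

print("m/z bounds:", ppm_bounds(precursor_mz, 20))
print("RT bounds: ", centered_window(expected_rt, 50))
```

If the library instead treats these values as half-widths, the bounds would be twice as wide, so the API documentation should be checked before relying on this interpretation.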

In [7]:
transitionGroup = loader.loadTransitionGroups("AFVDFLSDEIK", 2, extraciton_config)
transitionGroup

{'ionMobilityTest2/ionMobilityTest2.mzML': <massseer.structs.TransitionGroup.TransitionGroup at 0x7f6ffc75abe0>}

Like the chromatogram loaders, this method returns a dictionary. Here the keys are the raw data file names (for example `ionMobilityTest2/ionMobilityTest2.mzML`) and the values are `TransitionGroup` objects. A `TransitionGroup` holds a series of chromatograms belonging to the same precursor and can be used for plotting as shown [here].

### Loading Chromatogram Data as a Pandas DataFrame (Not Recommended)

As with the chromatogram loaders, data can be loaded into a pandas dataframe using the `loadTransitionGroupsDf()` method.

In [None]:
transitionGroupDf = loader.loadTransitionGroupsDf("AFVDFLSDEIK", 2, extraciton_config)

This dataframe contains the intensities and retention times for all transitions across all files. Transitions are differentiated by the *annotation* column, and the *filename* column identifies the file/run from which each chromatogram originates.

<div class="alert alert-info">

Note

If a pandas dataframe is required, it is recommended to use the `FeatureMap` object directly, as described below.

</div>

Alternatively, for analysis directly in Python, the loaders can return a pandas dataframe containing all of the points for this peptide via the `loadTransitionGroupsDf()` method.

## Loading a Feature Map

The primary data type fetched by the spectrum loaders is the [:class:`~massseer.structs.FeatureMap`](#FeatureMap), which contains a pandas dataframe of the extracted data across all precursors and transitions. Under the hood, the `loadTransitionGroups()` method fetches a `FeatureMap` and converts it into a `TransitionGroup`. Because of this conversion step, it is generally faster to work with the `FeatureMap` directly if a pandas dataframe is required.

The `FeatureMap` object can be loaded using the `loadFeatureMaps()` method, as demonstrated below.

In [15]:
featureMap = loader.loadFeatureMaps("AFVDFLSDEIK", 2, extraciton_config)
featureMap

{'ionMobilityTest2/ionMobilityTest2.mzML': <massseer.structs.FeatureMap.FeatureMap at 0x7f6ff8df7a30>}

Like the `loadTransitionGroups()` method, this method returns a dictionary where the keys are the run names. Here, however, the values are `FeatureMap` objects.

The `FeatureMap` object has two important properties:
1. `.feature_df` property which returns the dataframe
2. `.config` property which returns the `TargetedDIAConfig()` that was used to generate this `FeatureMap`

In [16]:
featureMap['ionMobilityTest2/ionMobilityTest2.mzML'].feature_df

| Unnamed: 0 | native_id | ms_level | precursor_mz | mz | rt | im | int | Annotation | product_mz |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | 1 | 642.3295 | 642.334187 | 6225.005106 | 0.900254 | 76.000458 | prec | 642.3295 |
| 1 | | 1 | 642.3295 | 642.334187 | 6225.005106 | 0.969271 | 153.011276 | prec | 642.3295 |
| 2 | | 2 | 642.3295 | 504.262011 | 6225.110817 | 0.935281 | 68.001518 | y4^1 | 504.2664 |
| 3 | | 2 | 642.3295 | 504.262011 | 6225.110817 | 1.025902 | 41.000328 | y4^1 | 504.2664 |
| 4 | | 2 | 642.3295 | 591.298142 | 6225.110817 | 0.887884 | 92.002846 | y5^1 | 591.2984 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6812 | | 2 | 642.3295 | 966.486248 | 6266.827558 | 0.982659 | 141.999191 | y8^1 | 966.4779 |
| 6813 | | 2 | 642.3295 | 966.486248 | 6266.827558 | 0.988840 | 68.997574 | y8^1 | 966.4779 |
| 6814 | | 2 | 642.3295 | 1065.535905 | 6266.827558 | 0.918785 | 72.001709 | y9^1 | 1065.5463 |
| 6815 | | 2 | 642.3295 | 1065.535905 | 6266.827558 | 0.979575 | 51.001892 | y9^1 | 1065.5463 |
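
The output above is in long format: one row per data point, with several ion mobility (`im`) values sharing the same retention time (`rt`) and `Annotation`. One way to picture how such a table relates to chromatograms is to sum the intensity over ion mobility for each annotation and retention time. The sketch below does this with plain Python on the rows shown above; it is an illustration under that assumption, not necessarily how MassSeer performs the conversion, and `collapse_ion_mobility` is a hypothetical name.

```python
from collections import defaultdict

# (Annotation, rt, im, int) copied from the first and last rows of the output above
rows = [
    ("prec", 6225.005106, 0.900254, 76.000458),
    ("prec", 6225.005106, 0.969271, 153.011276),
    ("y4^1", 6225.110817, 0.935281, 68.001518),
    ("y4^1", 6225.110817, 1.025902, 41.000328),
    ("y5^1", 6225.110817, 0.887884, 92.002846),
    ("y8^1", 6266.827558, 0.982659, 141.999191),
    ("y8^1", 6266.827558, 0.988840, 68.997574),
    ("y9^1", 6266.827558, 0.918785, 72.001709),
    ("y9^1", 6266.827558, 0.979575, 51.001892),
]


def collapse_ion_mobility(rows):
    """Sum intensity over ion mobility for every (annotation, rt) pair."""
    totals = defaultdict(float)
    for annotation, rt, _im, intensity in rows:
        totals[(annotation, rt)] += intensity
    # Regroup into {annotation: [(rt, summed_intensity), ...]}, sorted by rt
    chromatograms = defaultdict(list)
    for (annotation, rt), intensity in sorted(totals.items()):
        chromatograms[annotation].append((rt, intensity))
    return dict(chromatograms)


chromatograms = collapse_ion_mobility(rows)
for annotation, points in chromatograms.items():
    print(annotation, points)
```

On the real dataframe, the same aggregation can be written with pandas as `feature_df.groupby(["Annotation", "rt"])["int"].sum()`.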


In [17]:
featureMap['ionMobilityTest2/ionMobilityTest2.mzML'].config

<massseer.structs.TargetedDIAConfig.TargetedDIAConfig at 0x7f6ffc70be20>