# Loading Spectrum Data

In [2]:
%load_ext autoreload
%autoreload 2

In [2]:
# Please run this before executing any cell
import os
os.chdir("../../tests/test_data/")  # Path to the tutorial data; replace with the path to your own data.

## Initializing a Spectrum Data Loader

Most [Spectrum Loaders](../generated/massseer.loaders.GenericSpectrumLoader.rst) require the following inputs. 


1. **dataFiles** - a list of raw data files 
2. **rsltsFile** - a `.osw` or DIA-NN `.tsv` file containing the features
3. **libraryFile** - a `.tsv`/`.osw`/`.pqp` file containing the library (m/z and annotations of all transitions)

In [3]:
from massseer.loaders import MzMLDataLoader
loader = MzMLDataLoader(dataFiles="ionMobilityTest2/ionMobilityTest2.mzML",
                        rsltsFile="ionMobilityTest2/ionMobilityTest2.osw",
                        rsltsFileType="OpenSWATH")

[2024-01-03 11:54:13,522] MzMLDataAccess - INFO - Opening ionMobilityTest2/ionMobilityTest2.mzML file...: Elapsed 0.09443140029907227 ms
[2024-01-03 11:54:13,523] MzMLDataAccess - INFO - There are 125 spectra and 0 chromatograms.
[2024-01-03 11:54:13,524] MzMLDataAccess - INFO - There are 25 MS1 spectra and 100 MS2 spectra.


The same `.mzML` file can also be loaded with DIA-NN output, as shown below. Note that since DIA-NN results files do not always contain transition-level information, a library file must also be provided.

In [4]:
from massseer.loaders import MzMLDataLoader
loader_diann = MzMLDataLoader(dataFiles="ionMobilityTest2/ionMobilityTest2.mzML",
                        rsltsFile="ionMobilityTest2/ionMobilityTest2-diannReport.tsv",
                        libraryFile="ionMobilityTest2/ionMobilityTest2Library.tsv",
                        rsltsFileType='DIA-NN')

[2024-01-03 11:54:13,611] MzMLDataAccess - INFO - Opening ionMobilityTest2/ionMobilityTest2.mzML file...: Elapsed 0.0450291633605957 ms
[2024-01-03 11:54:13,611] MzMLDataAccess - INFO - There are 125 spectra and 0 chromatograms.
[2024-01-03 11:54:13,612] MzMLDataAccess - INFO - There are 25 MS1 spectra and 100 MS2 spectra.


<div class="alert alert-info">

Note

If a `.osw` file is provided as the results file and no library file is provided, MassSeer will assume the `.osw` file should also be used as the library.

</div>

## Loading a Transition Group

In this example we will load the peptide *AFVDFLSDEIK* with a charge state of *2*.

In [5]:
from massseer.structs.TargetedDIAConfig import TargetedDIAConfig
extraction_config = TargetedDIAConfig()
extraction_config.im_window = 0.2
extraction_config.rt_window = 50
extraction_config.mz_tol = 20

In [6]:
transitionGroup = loader.loadTransitionGroups("AFVDFLSDEIK", 2, extraction_config)
transitionGroup

{'ionMobilityTest2/ionMobilityTest2.mzML': <massseer.structs.TransitionGroup.TransitionGroup at 0x7fdd4403a670>}

### Loading Chromatogram Data as a Pandas DataFrame

In [7]:
transitionGroupDf = loader.loadTransitionGroupsDf("AFVDFLSDEIK", 2, extraction_config )
transitionGroupDf

Unnamed: 0,filename,Annotation,rt,int
0,ionMobilityTest2/ionMobilityTest2.mzML,prec,6225.005106,229.011734
1,ionMobilityTest2/ionMobilityTest2.mzML,prec,6226.792950,26.001631
2,ionMobilityTest2/ionMobilityTest2.mzML,prec,6228.580932,57.999416
3,ionMobilityTest2/ionMobilityTest2.mzML,prec,6230.367189,826.008179
4,ionMobilityTest2/ionMobilityTest2.mzML,prec,6232.156436,1589.015259
...,...,...,...,...
535,ionMobilityTest2/ionMobilityTest2.mzML,y9^1,6265.023624,70.998932
536,ionMobilityTest2/ionMobilityTest2.mzML,y9^1,6266.515136,228.005463
537,ionMobilityTest2/ionMobilityTest2.mzML,y9^1,6266.616533,436.012451
538,ionMobilityTest2/ionMobilityTest2.mzML,y9^1,6266.721905,361.008942


This dataframe contains the retention times and intensities of every transition in every file. Transitions are differentiated by the *Annotation* column, and the *filename* column identifies the file/run from which each chromatogram originates. If ion mobility was present in the original file, intensities are summed across all ion mobility values.
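
Because the table is in long format, individual traces can be pulled out with a pandas `groupby` on the two label columns. The following is a minimal sketch on a hand-built stand-in table (four rows copied from the output above) rather than a call into MassSeer; the names `toy_df`, `traces` and `apex` are illustrative only.

```python
import pandas as pd

fname = "ionMobilityTest2/ionMobilityTest2.mzML"

# Stand-in for the dataframe shown above (values copied from its output).
toy_df = pd.DataFrame({
    "filename": [fname] * 4,
    "Annotation": ["prec", "prec", "y9^1", "y9^1"],
    "rt": [6225.005106, 6226.792950, 6265.023624, 6266.515136],
    "int": [229.011734, 26.001631, 70.998932, 228.005463],
})

# One (rt, int) trace per (file, transition) pair.
traces = {
    key: group[["rt", "int"]].reset_index(drop=True)
    for key, group in toy_df.groupby(["filename", "Annotation"])
}

# Row with the highest intensity in each trace.
apex = toy_df.loc[toy_df.groupby(["filename", "Annotation"])["int"].idxmax()]
print(apex)
```

Each key of `traces` is a `(filename, Annotation)` tuple, so a single chromatogram can be looked up with, for example, `traces[(fname, "prec")]`.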

<div class="alert alert-info">

Note

If a pandas dataframe is required, it is recommended to use the `FeatureMap` object directly, as described below.

</div>

## Loading a Feature Map

In [9]:
featureMap = loader.loadFeatureMaps("AFVDFLSDEIK", 2, extraction_config)
featureMap

{'ionMobilityTest2/ionMobilityTest2.mzML': <massseer.structs.FeatureMap.FeatureMap at 0x7fdc849b8ee0>}

In [10]:
featureMap['ionMobilityTest2/ionMobilityTest2.mzML'].feature_df

Unnamed: 0,native_id,ms_level,precursor_mz,mz,rt,im,int,Annotation,product_mz
0,,1,642.3295,642.334187,6225.005106,0.900254,76.000458,prec,642.3295
1,,1,642.3295,642.334187,6225.005106,0.969271,153.011276,prec,642.3295
2,,2,642.3295,504.262011,6225.110817,0.935281,68.001518,y4^1,504.2664
3,,2,642.3295,504.262011,6225.110817,1.025902,41.000328,y4^1,504.2664
4,,2,642.3295,591.298142,6225.110817,0.887884,92.002846,y5^1,591.2984
...,...,...,...,...,...,...,...,...,...
6812,,2,642.3295,966.486248,6266.827558,0.982659,141.999191,y8^1,966.4779
6813,,2,642.3295,966.486248,6266.827558,0.988840,68.997574,y8^1,966.4779
6814,,2,642.3295,1065.535905,6266.827558,0.918785,72.001709,y9^1,1065.5463
6815,,2,642.3295,1065.535905,6266.827558,0.979575,51.001892,y9^1,1065.5463
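
When several runs are loaded, `loadFeatureMaps` returns one entry per file, and it can be convenient to stack the per-file tables into a single dataframe. A rough sketch follows. `SimpleNamespace` objects stand in for real `FeatureMap` instances (only the `feature_df` attribute shown above is assumed), and the second file name `run2.mzML` and its values are made up for illustration.

```python
from types import SimpleNamespace

import pandas as pd

# Stand-ins for FeatureMap objects; only `.feature_df` is assumed to exist.
feature_maps = {
    "ionMobilityTest2/ionMobilityTest2.mzML": SimpleNamespace(
        feature_df=pd.DataFrame({
            "Annotation": ["prec", "y4^1"],
            "rt": [6225.005106, 6225.110817],
            "int": [76.000458, 68.001518],
        })
    ),
    # Hypothetical second run with invented values.
    "run2.mzML": SimpleNamespace(
        feature_df=pd.DataFrame({
            "Annotation": ["prec"],
            "rt": [6230.0],
            "int": [10.0],
        })
    ),
}

# Stack the per-file tables and turn the dictionary key into a column.
combined = (
    pd.concat(
        {name: fm.feature_df for name, fm in feature_maps.items()},
        names=["filename"],
    )
    .reset_index(level="filename")
    .reset_index(drop=True)
)
print(combined)
```

The resulting `filename` column plays the same role as the one in the chromatogram dataframe returned by `loadTransitionGroupsDf`.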


In [11]:
featureMap['ionMobilityTest2/ionMobilityTest2.mzML'].config

<massseer.structs.TargetedDIAConfig.TargetedDIAConfig at 0x7fdc87f4e2b0>
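
The `mz` and `product_mz` columns of `feature_df` allow a quick mass-accuracy check against the extraction tolerance. The sketch below computes the deviation in parts per million for three rows copied from the table above. Note that this tutorial does not state the units of `mz_tol`; treating it as ppm is an assumption, and `ppm_error` is an illustrative helper, not part of MassSeer.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed deviation of an observed m/z from its theoretical value, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# (Annotation, mz, product_mz) copied from the feature_df output above.
rows = [
    ("y4^1", 504.262011, 504.2664),
    ("y8^1", 966.486248, 966.4779),
    ("y9^1", 1065.535905, 1065.5463),
]

errors = {annotation: ppm_error(mz, product_mz) for annotation, mz, product_mz in rows}
for annotation, error in errors.items():
    print(f"{annotation}: {error:+.2f} ppm")
```

For these rows the deviations are all under 10 ppm in absolute value, which would be consistent with `mz_tol = 20` describing a total window width in ppm, though that interpretation is an assumption here.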

## Converting a FeatureMap to 1D data

A `FeatureMap` by itself can be difficult to visualize due to its high dimensionality. MassSeer therefore has built-in methods to convert a `FeatureMap` into a `chromatogram` (retention time vs. intensity), a `spectrum` (m/z vs. intensity) or, if ion mobility is present, a `mobilogram` (ion mobility vs. intensity).

To accomplish this, use the `to_chromatograms`, `to_spectra` or `to_mobilograms` methods, respectively.

In [17]:
chromatograms = featureMap['ionMobilityTest2/ionMobilityTest2.mzML'].to_chromatograms()
chromatograms

<massseer.structs.TransitionGroup.TransitionGroup at 0x7fdc849304c0>

In [18]:
mobilograms = featureMap['ionMobilityTest2/ionMobilityTest2.mzML'].to_mobilograms()
mobilograms

<massseer.structs.TransitionGroup.TransitionGroup at 0x7fdc84e21490>

In [19]:
spectra = featureMap['ionMobilityTest2/ionMobilityTest2.mzML'].to_spectra()
spectra

<massseer.structs.TransitionGroup.TransitionGroup at 0x7fdc84a45a60>

<div class="alert alert-info">

Note

Converting a `FeatureMap` always returns a `TransitionGroup`; however, the underlying data type depends on the conversion method used.

</div>

## Converting a FeatureMap DataFrame to 1D data

If the `FeatureMap` was fetched as a dataframe, the data can be converted to 1D data using the built-in pandas `groupby` functions.

In [29]:
df = featureMap['ionMobilityTest2/ionMobilityTest2.mzML'].feature_df
chromatograms = (df
                 .drop(columns=["im", "mz"]) # drop the columns to sum across
                 .groupby(['Annotation', 'ms_level', 'native_id', 'product_mz', 'precursor_mz', 'rt']) # group by the labels and retention time
                 .sum()
                 .reset_index())
chromatograms

Unnamed: 0,Annotation,ms_level,native_id,rt,product_mz,precursor_mz,int
0,prec,1,,6225.005106,642.3295,642.3295,229.011734
1,prec,1,,6226.792950,642.3295,642.3295,26.001631
2,prec,1,,6228.580932,642.3295,642.3295,57.999416
3,prec,1,,6230.367189,642.3295,642.3295,826.008179
4,prec,1,,6232.156436,642.3295,642.3295,1589.015259
...,...,...,...,...,...,...,...
535,y9^1,2,,6265.023624,1065.5463,642.3295,70.998932
536,y9^1,2,,6266.515136,1065.5463,642.3295,228.005463
537,y9^1,2,,6266.616533,1065.5463,642.3295,436.012451
538,y9^1,2,,6266.721905,1065.5463,642.3295,361.008942


In [31]:
df = featureMap['ionMobilityTest2/ionMobilityTest2.mzML'].feature_df
spectra = (df
           .drop(columns=["rt", "im"]) # drop the columns to sum across
           .groupby(['Annotation', 'ms_level', 'native_id', 'product_mz', 'precursor_mz', 'mz']) # group by the labels and m/z
           .sum()
           .reset_index())
spectra

Unnamed: 0,Annotation,ms_level,native_id,product_mz,precursor_mz,mz,int
0,prec,1,,642.3295,642.3295,642.326260,476.006775
1,prec,1,,642.3295,642.3295,642.326260,806.987183
2,prec,1,,642.3295,642.3295,642.326260,399.992432
3,prec,1,,642.3295,642.3295,642.326260,26.001631
4,prec,1,,642.3295,642.3295,642.326261,73.003876
...,...,...,...,...,...,...,...
1518,y9^1,2,,1065.5463,642.3295,1065.556333,115.001068
1519,y9^1,2,,1065.5463,642.3295,1065.556333,186.001022
1520,y9^1,2,,1065.5463,642.3295,1065.556334,136.002472
1521,y9^1,2,,1065.5463,642.3295,1065.556334,269.992981


In [34]:
df = featureMap['ionMobilityTest2/ionMobilityTest2.mzML'].feature_df
mobilograms = (df
               .drop(columns=["rt", "mz"]) # drop the columns to sum across
               .groupby(['Annotation', 'ms_level', 'native_id', 'product_mz', 'precursor_mz', 'im']) # group by the labels and ion mobility
               .sum()
               .reset_index())
mobilograms

Unnamed: 0,Annotation,ms_level,native_id,product_mz,precursor_mz,im,int
0,prec,1,,642.3295,642.3295,0.878602,148.001190
1,prec,1,,642.3295,642.3295,0.883769,48.997837
2,prec,1,,642.3295,642.3295,0.884804,172.007217
3,prec,1,,642.3295,642.3295,0.885811,319.003937
4,prec,1,,642.3295,642.3295,0.887884,394.968414
...,...,...,...,...,...,...,...
793,y9^1,2,,1065.5463,642.3295,1.019729,90.003929
794,y9^1,2,,1065.5463,642.3295,1.020746,63.002026
795,y9^1,2,,1065.5463,642.3295,1.021795,218.995132
796,y9^1,2,,1065.5463,642.3295,1.023832,63.001839
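
The three reductions above differ only in which axis is kept, so they can be folded into one small helper. This is a sketch under stated assumptions, not a MassSeer function: `collapse_to_1d`, `LABEL_COLUMNS` and `AXES` are hypothetical names, and the toy table reuses the first four rows of the `feature_df` output. `native_id` is left out of the grouping keys because pandas `groupby` skips rows whose keys are missing by default (`dropna=True`); pass `dropna=False` if that column must be kept.

```python
import pandas as pd

LABEL_COLUMNS = ["Annotation", "ms_level", "product_mz", "precursor_mz"]
AXES = ("rt", "mz", "im")


def collapse_to_1d(feature_df: pd.DataFrame, keep: str) -> pd.DataFrame:
    """Sum intensities over every axis except `keep` ('rt', 'mz' or 'im')."""
    if keep not in AXES:
        raise ValueError(f"keep must be one of {AXES}, got {keep!r}")
    drop = [axis for axis in AXES if axis != keep and axis in feature_df.columns]
    labels = [col for col in LABEL_COLUMNS if col in feature_df.columns]
    return (
        feature_df.drop(columns=drop)
        .groupby(labels + [keep], as_index=False)["int"]
        .sum()
    )


# First four rows of the feature_df output shown earlier.
toy = pd.DataFrame({
    "ms_level": [1, 1, 2, 2],
    "precursor_mz": [642.3295] * 4,
    "mz": [642.334187, 642.334187, 504.262011, 504.262011],
    "rt": [6225.005106, 6225.005106, 6225.110817, 6225.110817],
    "im": [0.900254, 0.969271, 0.935281, 1.025902],
    "int": [76.000458, 153.011276, 68.001518, 41.000328],
    "Annotation": ["prec", "prec", "y4^1", "y4^1"],
    "product_mz": [642.3295, 642.3295, 504.2664, 504.2664],
})

chrom = collapse_to_1d(toy, keep="rt")   # one row per transition and retention time
mobil = collapse_to_1d(toy, keep="im")   # one row per transition and ion mobility value
print(chrom)
```

Summing the two `prec` rows at retention time 6225.005106 reproduces the intensity of the first row of the chromatogram table (229.011734), which is a convenient sanity check on the reduction.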
