# Loading Spectrum Data

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Please run this before executing any cell
import os
os.chdir("../../test/test_data/")  # path to the tutorial data; replace with the path to your own data

## Initiating a Spectrum Data Loader

Most [Spectrum Loaders](../generated/massdash.loaders.GenericSpectrumLoader.rst) require the following inputs:


1. **dataFiles** - a list of raw data files
2. **rsltsFile** - a `.osw` or DIA-NN `.tsv` file containing the features
3. **libraryFile** - a `.tsv`/`.osw`/`.pqp` file containing the library (m/z and annotations of all transitions)
4. **rsltsFileType** - the format of the results file, e.g. `"OpenSWATH"` or `"DIA-NN"`

In [3]:
from massdash.loaders import MzMLDataLoader
loader = MzMLDataLoader(dataFiles="mzml/ionMobilityTest.mzML",
                        rsltsFile="osw/ionMobilityTest.osw",
                        rsltsFileType="OpenSWATH")

[2024-01-12 10:47:32,489] MzMLDataAccess - INFO - Opening mzml/ionMobilityTest.mzML file...: Elapsed 0.08150815963745117 ms
[2024-01-12 10:47:32,490] MzMLDataAccess - INFO - There are 50 spectra and 0 chromatograms.
[2024-01-12 10:47:32,490] MzMLDataAccess - INFO - There are 25 MS1 spectra and 25 MS2 spectra.


The same `.mzML` file can also be loaded with DIA-NN results, as shown below. Note that since DIA-NN results files do not always contain transition-level information, a library file must also be provided.

In [4]:
from massdash.loaders import MzMLDataLoader
loader_diann = MzMLDataLoader(dataFiles="mzml/ionMobilityTest.mzML",
                              rsltsFile="diann/ionMobilityTest-diannReport.tsv",
                              libraryFile="library/ionMobilityTestLibrary.tsv",
                              rsltsFileType="DIA-NN")

[2024-01-12 10:47:32,556] MzMLDataAccess - INFO - Opening mzml/ionMobilityTest.mzML file...: Elapsed 0.038275718688964844 ms
[2024-01-12 10:47:32,557] MzMLDataAccess - INFO - There are 50 spectra and 0 chromatograms.
[2024-01-12 10:47:32,557] MzMLDataAccess - INFO - There are 25 MS1 spectra and 25 MS2 spectra.


<div class="alert alert-info">

Note

If an `.osw` file is provided as the results file (`rsltsFile`) and no library file is provided, MassDash will assume that the `.osw` file should also be used as the library.

</div>
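
The fallback rule in the note, together with the DIA-NN requirement above, can be summarised as a small decision function. This is a hedged sketch: `resolve_library_file` is a hypothetical helper written only for illustration, not a MassDash function, and it assumes the `rsltsFileType` strings used in the examples above.

```python
from pathlib import Path
from typing import Optional


def resolve_library_file(rslts_file: str,
                         rslts_file_type: str,
                         library_file: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented library fallback rule.

    Not part of the MassDash API; it only illustrates the behaviour
    described in the note.
    """
    if library_file is not None:
        # An explicitly provided library always takes precedence
        return library_file
    if rslts_file_type == "OpenSWATH" and Path(rslts_file).suffix == ".osw":
        # An .osw results file can double as the library
        return rslts_file
    # DIA-NN reports may lack transition-level information, so a library is needed
    raise ValueError(f"A library file is required for {rslts_file_type} results")


print(resolve_library_file("osw/ionMobilityTest.osw", "OpenSWATH"))
# osw/ionMobilityTest.osw
```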

## Loading a Transition Group

In this example, we will load the peptide *AFVDFLSDEIK* with a charge state of *2*.

In [5]:
from massdash.structs.TargetedDIAConfig import TargetedDIAConfig
extraction_config = TargetedDIAConfig()
extraction_config.im_window = 0.2
extraction_config.rt_window = 50
extraction_config.mz_tol = 20
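
To make the window parameters more concrete, the sketch below computes extraction bounds around a target. It rests on assumptions that this tutorial does not confirm: that `rt_window` and `im_window` are full window widths centred on the target value, and that `mz_tol` is a full width expressed in ppm. `extraction_bounds` is a hypothetical helper, not a method of `TargetedDIAConfig`, and the target retention time and ion mobility are illustrative values.

```python
def extraction_bounds(target_mz, target_rt, target_im,
                      mz_tol=20, rt_window=50, im_window=0.2):
    """Hypothetical helper: return (lower, upper) extraction bounds per dimension.

    Assumes each window is a full width centred on the target value and
    that mz_tol is a full width expressed in ppm.
    """
    mz_half = target_mz * mz_tol * 1e-6 / 2  # half of the ppm tolerance, in m/z units
    return {
        "mz": (target_mz - mz_half, target_mz + mz_half),
        "rt": (target_rt - rt_window / 2, target_rt + rt_window / 2),
        "im": (target_im - im_window / 2, target_im + im_window / 2),
    }


# The target RT and IM are illustrative values, not read from the results file
bounds = extraction_bounds(target_mz=642.3295, target_rt=6245.0, target_im=0.95)
print(bounds["rt"])  # (6220.0, 6270.0)
```

Under these assumptions, a 20 ppm tolerance around the precursor m/z 642.3295 spans roughly ±0.0064 m/z.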

In [6]:
transitionGroup = loader.loadTransitionGroups("AFVDFLSDEIK", 2, extraction_config)
transitionGroup

{'mzml/ionMobilityTest.mzML': <massdash.structs.TransitionGroup.TransitionGroup at 0x7ff236a91e50>}

### Loading Chromatogram Data as a Pandas DataFrame

In [7]:
transitionGroupDf = loader.loadTransitionGroupsDf("AFVDFLSDEIK", 2, extraction_config)
transitionGroupDf

|     | filename | Annotation | rt | int |
|-----|----------|------------|----|-----|
| 0 | mzml/ionMobilityTest.mzML | prec | 6225.005106 | 229.011734 |
| 1 | mzml/ionMobilityTest.mzML | prec | 6226.792950 | 26.001631 |
| 2 | mzml/ionMobilityTest.mzML | prec | 6228.580932 | 57.999416 |
| 3 | mzml/ionMobilityTest.mzML | prec | 6230.367189 | 826.008179 |
| 4 | mzml/ionMobilityTest.mzML | prec | 6232.156436 | 1589.015259 |
| ... | ... | ... | ... | ... |
| 163 | mzml/ionMobilityTest.mzML | y9^1 | 6259.292755 | 4355.988281 |
| 164 | mzml/ionMobilityTest.mzML | y9^1 | 6261.101406 | 1168.029907 |
| 165 | mzml/ionMobilityTest.mzML | y9^1 | 6262.909095 | 1286.014038 |
| 166 | mzml/ionMobilityTest.mzML | y9^1 | 6264.711573 | 413.995209 |


This DataFrame contains the retention times and intensities of all transitions across all files. Transitions are distinguished by the *Annotation* column, and the *filename* column identifies the file (run) from which each chromatogram originates. If ion mobility was present in the original file, intensities are summed across all ion mobility values.
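
Because the DataFrame is in long format (one row per file, transition and retention time point), individual traces are recovered by grouping on `filename` and `Annotation`. The minimal sketch below shows that idea in plain Python on a few rows copied from the output above; the `traces` dictionary is just an illustrative structure, not something MassDash returns.

```python
from collections import defaultdict

# A few rows copied from the output above: (filename, Annotation, rt, int)
rows = [
    ("mzml/ionMobilityTest.mzML", "prec", 6225.005106, 229.011734),
    ("mzml/ionMobilityTest.mzML", "prec", 6226.792950, 26.001631),
    ("mzml/ionMobilityTest.mzML", "y9^1", 6259.292755, 4355.988281),
    ("mzml/ionMobilityTest.mzML", "y9^1", 6261.101406, 1168.029907),
]

# One trace per (file, transition), each a list of (rt, intensity) points
traces = defaultdict(list)
for filename, annotation, rt, intensity in rows:
    traces[(filename, annotation)].append((rt, intensity))

# Sort each trace by retention time so it can be plotted as a line
for points in traces.values():
    points.sort()

for (filename, annotation), points in traces.items():
    print(filename, annotation, len(points))
```

With pandas, the same split is obtained by iterating over `transitionGroupDf.groupby(["filename", "Annotation"])`.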

<div class="alert alert-info">

Note

If a pandas DataFrame is required, it is recommended to use the `FeatureMap` object directly, as described below.

</div>

## Loading a Feature Map

In [8]:
featureMap = loader.loadFeatureMaps("AFVDFLSDEIK", 2, extraction_config)
featureMap

{'mzml/ionMobilityTest.mzML': <massdash.structs.FeatureMap.FeatureMap at 0x7ff236a59760>}

In [9]:
featureMap['mzml/ionMobilityTest.mzML'].feature_df

|     | native_id | ms_level | precursor_mz | mz | rt | im | int | Annotation | product_mz |
|-----|-----------|----------|--------------|----|----|----|-----|------------|------------|
| 0 |  | 1 | 642.3295 | 642.334187 | 6225.005106 | 0.900254 | 76.000458 | prec | 642.3295 |
| 1 |  | 1 | 642.3295 | 642.334187 | 6225.005106 | 0.969271 | 153.011276 | prec | 642.3295 |
| 2 |  | 2 | 642.3295 | 504.262011 | 6225.110817 | 0.935281 | 68.001518 | y4^1 | 504.2664 |
| 3 |  | 2 | 642.3295 | 504.262011 | 6225.110817 | 1.025902 | 41.000328 | y4^1 | 504.2664 |
| 4 |  | 2 | 642.3295 | 504.262011 | 6225.110817 | 0.926001 | 43.000782 | y4^1 | 504.2664 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6812 |  | 2 | 642.3295 | 1065.546118 | 6266.515136 | 0.975441 | 8.999968 | y9^1 | 1065.5463 |
| 6813 |  | 2 | 642.3295 | 1065.551224 | 6266.515136 | 0.986777 | 33.001766 | y9^1 | 1065.5463 |
| 6814 |  | 2 | 642.3295 | 1065.551224 | 6266.515136 | 0.923945 | 84.003464 | y9^1 | 1065.5463 |
| 6815 |  | 2 | 642.3295 | 1065.556331 | 6266.515136 | 0.910546 | 63.997871 | y9^1 | 1065.5463 |


In [10]:
featureMap['mzml/ionMobilityTest.mzML'].config

<massdash.structs.TargetedDIAConfig.TargetedDIAConfig at 0x7ff236a91fd0>

## Converting a FeatureMap to 1D data

A `FeatureMap` by itself can be difficult to visualize due to its high dimensionality. Thus, MassDash has built-in methods to convert a `FeatureMap` into a `chromatogram` (retention time vs. intensity), a `spectrum` (m/z vs. intensity) or, if ion mobility is present, a `mobilogram` (ion mobility vs. intensity).

To accomplish this, use the `to_chromatograms`, `to_spectra` or `to_mobilograms` methods, respectively.
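
Conceptually, each conversion keeps one axis and sums intensity over the others. The sketch below illustrates that idea in plain Python on the first rows of `feature_df` shown above; it is not MassDash's implementation, and `project` is a hypothetical helper.

```python
from collections import defaultdict

# Points copied from the first rows of feature_df above:
# (Annotation, mz, rt, im, int)
points = [
    ("prec", 642.334187, 6225.005106, 0.900254, 76.000458),
    ("prec", 642.334187, 6225.005106, 0.969271, 153.011276),
    ("y4^1", 504.262011, 6225.110817, 0.935281, 68.001518),
    ("y4^1", 504.262011, 6225.110817, 1.025902, 41.000328),
    ("y4^1", 504.262011, 6225.110817, 0.926001, 43.000782),
]

# Position of each axis inside a point tuple
AXES = {"mz": 1, "rt": 2, "im": 3}


def project(points, axis):
    """Hypothetical helper: keep one axis and sum intensity over the others.

    Returns {annotation: [(axis_value, summed_intensity), ...]} sorted by axis value.
    """
    idx = AXES[axis]
    summed = defaultdict(float)
    for p in points:
        summed[(p[0], p[idx])] += p[4]
    result = defaultdict(list)
    for (annotation, value), intensity in sorted(summed.items()):
        result[annotation].append((value, intensity))
    return dict(result)


chromatogram = project(points, "rt")  # retention time vs. intensity
spectrum = project(points, "mz")      # m/z vs. intensity
mobilogram = project(points, "im")    # ion mobility vs. intensity
print(round(chromatogram["prec"][0][1], 6))  # 229.011734
```

The summed precursor intensity at 6225.005106 s equals the first `prec` row of `transitionGroupDf` above, which is consistent with intensities being summed over ion mobility and m/z.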

In [11]:
chromatograms = featureMap['mzml/ionMobilityTest.mzML'].to_chromatograms()
chromatograms

<massdash.structs.TransitionGroup.TransitionGroup at 0x7ff236a59160>

In [12]:
mobilograms = featureMap['mzml/ionMobilityTest.mzML'].to_mobilograms()
mobilograms

<massdash.structs.TransitionGroup.TransitionGroup at 0x7ff2368c8040>

In [13]:
spectra = featureMap['mzml/ionMobilityTest.mzML'].to_spectra()
spectra

<massdash.structs.TransitionGroup.TransitionGroup at 0x7ff287caf400>

<div class="alert alert-info">

Note

When converting a `FeatureMap`, a `TransitionGroup` is always returned; however, the underlying data type differs depending on the conversion method used.

</div>

## Converting a FeatureMap DataFrame to 1D data

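If the `FeatureMap` data was fetched as a DataFrame (`feature_df`), it can be converted to 1D data with the built-in pandas `groupby` functionality: drop the dimensions to sum across, group by the remaining labels and the axis to keep, and sum the intensities.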

In [14]:
df = featureMap['mzml/ionMobilityTest.mzML'].feature_df
chromatograms = (df
                 .drop(columns=["im", "mz"]) # drop the dimensions to sum across
                 .groupby(['Annotation', 'ms_level', 'native_id', 'product_mz', 'precursor_mz', 'rt']) # group by retention time and labels
                 .sum()
                 .reset_index())
chromatograms

|     | Annotation | ms_level | native_id | product_mz | precursor_mz | rt | int |
|-----|------------|----------|-----------|------------|--------------|----|-----|
| 0 | prec | 1 |  | 642.3295 | 642.3295 | 6225.005106 | 229.011734 |
| 1 | prec | 1 |  | 642.3295 | 642.3295 | 6226.792950 | 26.001631 |
| 2 | prec | 1 |  | 642.3295 | 642.3295 | 6228.580932 | 57.999416 |
| 3 | prec | 1 |  | 642.3295 | 642.3295 | 6230.367189 | 826.008179 |
| 4 | prec | 1 |  | 642.3295 | 642.3295 | 6232.156436 | 1589.015259 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 163 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 6259.292755 | 4355.988281 |
| 164 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 6261.101406 | 1168.029907 |
| 165 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 6262.909095 | 1286.014038 |
| 166 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 6264.711573 | 413.995209 |


In [15]:
df = featureMap['mzml/ionMobilityTest.mzML'].feature_df
spectra = (df
           .drop(columns=["rt", "im"]) # drop the dimensions to sum across
           .groupby(['Annotation', 'ms_level', 'native_id', 'product_mz', 'precursor_mz', 'mz']) # group by m/z and labels
           .sum()
           .reset_index())
spectra

|     | Annotation | ms_level | native_id | product_mz | precursor_mz | mz | int |
|-----|------------|----------|-----------|------------|--------------|----|-----|
| 0 | prec | 1 |  | 642.3295 | 642.3295 | 642.326260 | 476.006775 |
| 1 | prec | 1 |  | 642.3295 | 642.3295 | 642.326260 | 806.987183 |
| 2 | prec | 1 |  | 642.3295 | 642.3295 | 642.326260 | 399.992432 |
| 3 | prec | 1 |  | 642.3295 | 642.3295 | 642.326260 | 26.001631 |
| 4 | prec | 1 |  | 642.3295 | 642.3295 | 642.326261 | 73.003876 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 548 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 1065.556333 | 220.996506 |
| 549 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 1065.556333 | 103.002426 |
| 550 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 1065.556333 | 138.000320 |
| 551 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 1065.556333 | 264.998871 |


In [16]:
df = featureMap['mzml/ionMobilityTest.mzML'].feature_df
mobilograms = (df
               .drop(columns=["rt", "mz"]) # drop the dimensions to sum across
               .groupby(['Annotation', 'ms_level', 'native_id', 'product_mz', 'precursor_mz', 'im']) # group by ion mobility and labels
               .sum()
               .reset_index())
mobilograms

|     | Annotation | ms_level | native_id | product_mz | precursor_mz | im | int |
|-----|------------|----------|-----------|------------|--------------|----|-----|
| 0 | prec | 1 |  | 642.3295 | 642.3295 | 0.878602 | 148.001190 |
| 1 | prec | 1 |  | 642.3295 | 642.3295 | 0.883769 | 48.997837 |
| 2 | prec | 1 |  | 642.3295 | 642.3295 | 0.884804 | 172.007217 |
| 3 | prec | 1 |  | 642.3295 | 642.3295 | 0.885811 | 319.003937 |
| 4 | prec | 1 |  | 642.3295 | 642.3295 | 0.887884 | 394.968414 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 793 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 1.019729 | 90.003929 |
| 794 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 1.020746 | 63.004654 |
| 795 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 1.021795 | 219.001572 |
| 796 | y9^1 | 2 |  | 1065.5463 | 642.3295 | 1.023832 | 62.997398 |
