# Loading Chromatogram Data

In [1]:
# Run this cell before executing any other cell
import os
os.chdir("../../tests/test_data/")  # Path to the tutorial data; replace with the path to your own data

## Initiating a Chromatogram Loader

We can instantiate a `SqMassLoader` object with multiple sqMass files and a single `.osw` results file as follows.

In [2]:
from massseer.loaders import SqMassLoader
loader = SqMassLoader(dataFiles=["xics/test_chrom_1.sqMass", "xics/test_chrom_2.sqMass"],
                      rsltsFile="osw/test_data.osw")

<div class="alert alert-info">

Note

The provided `.osw` file must contain information for all runs.

</div>

Printing the loader object shows the file paths of all linked files.

In [3]:
loader

SqMassLoader(rsltsFile=osw/test_data.osw, dataFiles=['xics/test_chrom_1.sqMass', 'xics/test_chrom_2.sqMass']
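The paths shown above are relative to the working directory set in the first cell. As a minimal sketch, assuming only the standard library, the following checks that each path resolves before it is handed to the loader. The helper name `missing_files` is hypothetical and is not part of massseer.

```python
from pathlib import Path


def missing_files(paths, base_dir="."):
    """Return the entries of `paths` that are not files under `base_dir`.

    Hypothetical helper for illustration; not part of massseer.
    """
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


# Paths used in this tutorial, relative to the current working directory
data_files = ["xics/test_chrom_1.sqMass", "xics/test_chrom_2.sqMass"]
results_file = "osw/test_data.osw"

missing = missing_files(data_files + [results_file])
if missing:
    print("Not found relative to the current directory:", missing)
```

If the list is non-empty, the `os.chdir()` call in the first cell is the most likely place to look.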

## Loading a Transition Group

In this example we will load the chromatograms for the peptide *NKESPT(UniMod:21)KAIVR(UniMod:267)* with a charge state of *3*.

In [4]:
loader.loadTransitionGroups("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)

{SqMassDataAccess(filename=xics/test_chrom_1.sqMass): <massseer.structs.TransitionGroup.TransitionGroup at 0x7f98729db3d0>,
 SqMassDataAccess(filename=xics/test_chrom_2.sqMass): <massseer.structs.TransitionGroup.TransitionGroup at 0x7f98729db400>}

### Loading Chromatogram Data as a Pandas DataFrame

In [5]:
transitionGroupDf = loader.loadTransitionGroupsDf("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
transitionGroupDf

| | filename | rt | intensity | annotation |
|---|---|---|---|---|
| 0 | xics/test_chrom_1.sqMass | 512.8 | 1069.051908 | 2274_Precursor_i0 |
| 1 | xics/test_chrom_1.sqMass | 516.4 | 2230.982597 | 2274_Precursor_i0 |
| 2 | xics/test_chrom_1.sqMass | 520.0 | 2583.056921 | 2274_Precursor_i0 |
| 3 | xics/test_chrom_1.sqMass | 523.7 | 1876.955276 | 2274_Precursor_i0 |
| 4 | xics/test_chrom_1.sqMass | 527.3 | 1862.126603 | 2274_Precursor_i0 |
| ... | ... | ... | ... | ... |
| 2697 | xics/test_chrom_2.sqMass | 1251.0 | 0.000000 | b4^1 |
| 2698 | xics/test_chrom_2.sqMass | 1254.7 | 42.001872 | b4^1 |
| 2699 | xics/test_chrom_2.sqMass | 1258.3 | 20.999608 | b4^1 |
| 2700 | xics/test_chrom_2.sqMass | 1261.9 | 20.999608 | b4^1 |


This DataFrame contains the retention times and intensities for every transition in every file. The *annotation* column identifies the transition, and the *filename* column identifies the file/run from which each chromatogram originates.
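As a rough illustration of how these two columns can be used, the sketch below selects a single transition from a single run with a boolean mask. It does not call the loader: `sample` is a stand-in built from a handful of rows copied from the output above, so the same expressions should apply to `transitionGroupDf` under the assumption that its columns are named as shown.

```python
import pandas as pd

# Stand-in for transitionGroupDf: a few rows copied from the output above
sample = pd.DataFrame(
    {
        "filename": ["xics/test_chrom_1.sqMass"] * 3 + ["xics/test_chrom_2.sqMass"] * 2,
        "rt": [512.8, 516.4, 520.0, 1254.7, 1258.3],
        "intensity": [1069.051908, 2230.982597, 2583.056921, 42.001872, 20.999608],
        "annotation": ["2274_Precursor_i0"] * 3 + ["b4^1"] * 2,
    }
)

# List the transitions present in the table
transitions = list(sample["annotation"].unique())

# Keep only the precursor trace from the first run
mask = (sample["filename"] == "xics/test_chrom_1.sqMass") & (
    sample["annotation"] == "2274_Precursor_i0"
)
precursor_run1 = sample[mask]
```

The result keeps the original row index, which makes it easy to trace a filtered row back to the full table.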

To get the total intensity of each transition in each run, we can use the pandas `groupby()` method:

In [6]:
transitionGroupDf[['intensity', 'filename', 'annotation']].groupby(['filename', 'annotation']).sum()

| filename | annotation | intensity |
|---|---|---|
| xics/test_chrom_1.sqMass | 2274_Precursor_i0 | 2139805.0 |
| xics/test_chrom_1.sqMass | b4^1 | 30006.97 |
| xics/test_chrom_1.sqMass | y1^1 | 130078.0 |
| xics/test_chrom_1.sqMass | y2^1 | 28374.81 |
| xics/test_chrom_1.sqMass | y3^1 | 387906.2 |
| xics/test_chrom_1.sqMass | y4^1 | 129531.2 |
| xics/test_chrom_1.sqMass | y5^1 | 57073.77 |
| xics/test_chrom_2.sqMass | 2274_Precursor_i0 | 593173.6 |
| xics/test_chrom_2.sqMass | b4^1 | 7226.959 |
| xics/test_chrom_2.sqMass | y1^1 | 11375.97 |
