# **SqMass Chromatogram Fetching**

This notebook demonstrates how *massseer* can be used to visualize chromatograms from `.sqMass` files, which are output by `OpenSwathWorkflow`. Unlike the *pyopenms* implementation, *massseer* is designed to query a specific peptide precursor and therefore does not load all of the data into memory. *massseer* also provides direct conversion to pandas DataFrames for further manipulation.

## **The `.osw` file**

The `.osw` file is a SQLite file which stores all of the library information, features, scores and statistical confidence for all precursors, proteins and peptides across an experiment.
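Because the `.osw` file is an ordinary SQLite database, its contents can be inspected with Python's built-in `sqlite3` module, independently of *massseer*. The sketch below (the helper name `list_tables` is hypothetical, not part of *massseer*) lists the tables in such a file and opens it read-only so nothing can be altered by accident.

```python
from __future__ import annotations

import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite file such as an .osw or .sqMass file."""
    # Open read-only through a URI; a missing file raises instead of creating an empty DB.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Example (path from the test data used later in this notebook):
# print(list_tables("../tests/test_data/osw/test_data.osw"))
```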

## **The `.sqMass` file**

The `.sqMass` file is a SQLite file storing raw chromatogram data. It contains only limited metadata, so it must be linked with an `.osw` file to determine information such as the peptide sequence, q-value and peak boundaries.
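Conceptually, linking amounts to joining identifiers across the two databases. The sketch below is not *massseer*'s implementation: it uses a heavily simplified, hypothetical schema (a `TRANSITION` table standing in for the `.osw` file and a `CHROMATOGRAM` table standing in for the `.sqMass` file) and assumes that fragment chromatograms are identified by their `.osw` transition ID. It attaches both databases to one SQLite connection with `ATTACH DATABASE` and, in the same spirit as querying a single precursor, fetches only the rows for that precursor.

```python
import sqlite3

# Heavily simplified, hypothetical schemas for illustration only; the real
# .osw and .sqMass files contain many more tables and columns.
conn = sqlite3.connect(":memory:")  # stands in for the .osw file
conn.execute("CREATE TABLE TRANSITION (ID INTEGER PRIMARY KEY, PRECURSOR_ID INTEGER, ANNOTATION TEXT)")
conn.execute("ATTACH DATABASE ':memory:' AS sqmass")  # stands in for the .sqMass file
conn.execute("CREATE TABLE sqmass.CHROMATOGRAM (ID INTEGER PRIMARY KEY, NATIVE_ID TEXT)")

# Toy rows: hypothetical transition IDs for precursor 2274 and for one other precursor.
conn.executemany("INSERT INTO TRANSITION VALUES (?, ?, ?)",
                 [(10, 2274, "y3^1"), (11, 2274, "b4^1"), (12, 9999, "y1^1")])
# The precursor trace uses a different naming scheme and is not matched by the join below.
conn.executemany("INSERT INTO sqmass.CHROMATOGRAM VALUES (?, ?)",
                 [(1, "2274_Precursor_i0"), (2, "10"), (3, "11"), (4, "12")])


def chromatograms_for_precursor(conn, precursor_id):
    """Return (native_id, annotation) pairs for one precursor's fragment chromatograms.

    Assumes chromatogram NATIVE_IDs equal the .osw transition IDs (as text).
    Only the matching rows are read, not the whole file.
    """
    return conn.execute(
        """
        SELECT c.NATIVE_ID, t.ANNOTATION
        FROM TRANSITION AS t
        JOIN sqmass.CHROMATOGRAM AS c ON c.NATIVE_ID = CAST(t.ID AS TEXT)
        WHERE t.PRECURSOR_ID = ?
        ORDER BY t.ID
        """,
        (precursor_id,),
    ).fetchall()


print(chromatograms_for_precursor(conn, 2274))  # [('10', 'y3^1'), ('11', 'b4^1')]
```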

```python
import os

from massseer.loaders.SqMassLoader import SqMassLoader
```

The `SqMassLoader` object requires a list of paths to transition files (`.sqMass` files, the `transitionFiles` argument) and a single merged `.osw` file (the `rsltsFile` argument) which contains the metadata across all runs.

**Note:** The `SqMassLoader` fetches data from every linked `.sqMass` file. If you are only interested in a single run, only that run's `.sqMass` file needs to be linked.
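When many runs are available, the list passed as `transitionFiles` can be built programmatically. A minimal sketch using `pathlib` (the helper `find_sqmass_files` and its `runs` filter are hypothetical, not part of *massseer*) that collects the `.sqMass` files in a directory and optionally keeps only selected runs:

```python
from __future__ import annotations

from pathlib import Path


def find_sqmass_files(directory: str, runs: set[str] | None = None) -> list[str]:
    """Return sorted .sqMass paths in `directory`.

    `runs` optionally restricts the result to files whose stem is in the set
    (e.g. {"test_chrom_1"}); None keeps every file.
    """
    paths = sorted(Path(directory).glob("*.sqMass"))
    if runs is not None:
        paths = [p for p in paths if p.stem in runs]
    return [str(p) for p in paths]


# Example with the test data used below, linking only a single run:
# loader = SqMassLoader(find_sqmass_files("../tests/test_data/xics", {"test_chrom_1"}),
#                       "../tests/test_data/osw/test_data.osw")
```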

```python
help(SqMassLoader)
```

```
Help on class SqMassLoader in module massseer.loaders.SqMassLoader:

class SqMassLoader(massseer.loaders.GenericLoader.GenericLoader)
 |  SqMassLoader(transitionFiles: List[str], rsltsFile: str)
 |  
 |  Class for loading Chromatograms and peak features from SqMass files and OSW files
 |  Inherits from GenericLoader
 |  
 |  Method resolution order:
 |      SqMassLoader
 |      massseer.loaders.GenericLoader.GenericLoader
 |      abc.ABC
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, transitionFiles: List[str], rsltsFile: str)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  __str__(self)
 |      Return str(self).
 |  
 |  loadTransitionGroupFeature(self, pep_id: str, charge: int) -> List[massseer.structs.TransitionGroupFeature.TransitionGroupFeature]
 |      Loads a PeakFeature object from the results file
 |      Args:
 |          pep_id (str): Peptide ID
 |          char
```

```python
pth = "../tests/test_data/"
loader = SqMassLoader(
    # transitionFiles: one .sqMass file per run
    [os.path.join(pth, "xics/test_chrom_1.sqMass"), os.path.join(pth, "xics/test_chrom_2.sqMass")],
    # rsltsFile: the merged .osw results file
    os.path.join(pth, "osw/test_data.osw"),
)
```

Printing the loader shows the results file and the transition files that are linked to it:

```python
loader
```

```
SqMassLoader(rsltsFile=../tests/test_data/osw/test_data.osw, transitionFiles=['../tests/test_data/xics/test_chrom_1.sqMass', '../tests/test_data/xics/test_chrom_2.sqMass']
```

---

## **Loading a Peptide**

In this example, we will visualize the peptide *NKESPT(UniMod:21)KAIVR(UniMod:267)* (a phosphorylated threonine, UniMod:21, and a heavy-labelled C-terminal arginine, UniMod:267) with a charge state of *3*.

### **Get Metadata**

```python
loader.loadTransitionGroupFeature("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
```

```
{'../tests/test_data/xics/test_chrom_1.sqMass': [TransitionGroupFeature Apex: None LeftWidth: 818.476013183594 RightWidth: 847.557983398438 Area: 15781.0 Qvalue: 0.00027720520546038556,
  TransitionGroupFeature Apex: None LeftWidth: 843.921997070312 RightWidth: 916.620971679688 Area: 40030.0 Qvalue: 0.00027720520546038556,
  TransitionGroupFeature Apex: None LeftWidth: 1018.39001464844 RightWidth: 1058.38000488281 Area: 11879.0 Qvalue: 0.6191579512574589,
  TransitionGroupFeature Apex: None LeftWidth: 1091.08996582031 RightWidth: 1131.06994628906 Area: 6869.0 Qvalue: 0.21848283341834926,
  TransitionGroupFeature Apex: None LeftWidth: 625.831970214844 RightWidth: 651.276977539062 Area: 3311.0 Qvalue: 0.6196804823619941],
 '../tests/test_data/xics/test_chrom_2.sqMass': [TransitionGroupFeature Apex: None LeftWidth: 1174.68994140625 RightWidth: 1189.22998046875 Area: 1187.0 Qvalue: 0.6252601620692566,
  TransitionGroupFeature Apex: None LeftWidth: 865.742004394531 RightWidth: 898.453002929
```

This returns a dictionary whose keys are the transition file (`.sqMass`) paths and whose values are lists of `TransitionGroupFeature` objects, a *massseer* data type that stores the peak boundaries and other information about each feature.
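Because the result is a plain dictionary, standard Python can post-process it, for example to pick the best-scoring feature in each run. The sketch below uses a hypothetical `Feature` dataclass as a stand-in for `TransitionGroupFeature`, since the real attribute names are not shown here; with *massseer* objects the field access would need adapting. Note that two features in `test_chrom_1` share the lowest q-value, so a real analysis may need a tie-breaker.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical stand-in for TransitionGroupFeature; field names are assumptions."""
    left_width: float
    right_width: float
    qvalue: float


def best_feature_per_run(features_by_run: dict[str, list[Feature]]) -> dict[str, Feature]:
    """Pick the lowest-q-value feature in each run (ties keep the first one listed)."""
    return {
        run: min(features, key=lambda f: f.qvalue)
        for run, features in features_by_run.items()
        if features  # runs without features are skipped
    }


# Values copied from the test_chrom_1 output above.
features_by_run = {
    "../tests/test_data/xics/test_chrom_1.sqMass": [
        Feature(818.476013183594, 847.557983398438, 0.00027720520546038556),
        Feature(843.921997070312, 916.620971679688, 0.00027720520546038556),
        Feature(1018.39001464844, 1058.38000488281, 0.6191579512574589),
    ],
}
best = best_feature_per_run(features_by_run)
```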

### **Fetch Chromatogram Raw Data**

The raw chromatogram data can be fetched using the `loadTransitionGroups()` method. Like `loadTransitionGroupFeature()`, it takes a peptide sequence and its charge state, but it returns the raw chromatogram data instead.

```python
loader.loadTransitionGroups("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
```

```
{SqMassDataAccess(filename=../tests/test_data/xics/test_chrom_1.sqMass): <massseer.structs.TransitionGroup.TransitionGroup at 0x7f6e881a4fd0>,
 SqMassDataAccess(filename=../tests/test_data/xics/test_chrom_2.sqMass): <massseer.structs.TransitionGroup.TransitionGroup at 0x7f6df0bacac0>}
```

The result is a dictionary whose keys are `SqMassDataAccess` objects (one connector per `.sqMass` file) and whose values are `TransitionGroup` objects. A `TransitionGroup` holds the set of chromatograms belonging to the same precursor.
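To picture what a transition group contains, here is a small hypothetical model (`Chromatogram` and `ChromatogramGroup` are not *massseer*'s classes, and their fields are assumptions): each chromatogram pairs retention times with intensities under a label, and a group collects the chromatograms of one precursor, the role `TransitionGroup` plays here. Flattening it into one row per data point gives the same long layout as the DataFrame returned by `loadTransitionGroupsDf()`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chromatogram:
    """Hypothetical stand-in for one extracted-ion chromatogram."""
    label: str              # e.g. "y3^1" or "2274_Precursor_i0"
    rt: list[float]         # retention times
    intensity: list[float]  # one intensity per retention time


@dataclass
class ChromatogramGroup:
    """Hypothetical stand-in for the chromatograms of a single precursor."""
    chromatograms: list[Chromatogram] = field(default_factory=list)

    def to_long_rows(self, filename: str) -> list[tuple[str, float, float, str]]:
        """Flatten to (filename, rt, intensity, annotation) rows, one per data point."""
        rows = []
        for chrom in self.chromatograms:
            if len(chrom.rt) != len(chrom.intensity):
                raise ValueError(f"{chrom.label}: rt and intensity lengths differ")
            rows.extend(
                (filename, t, i, chrom.label) for t, i in zip(chrom.rt, chrom.intensity)
            )
        return rows


# The first three precursor points of test_chrom_1, copied from the DataFrame output.
group = ChromatogramGroup([
    Chromatogram("2274_Precursor_i0", [512.8, 516.4, 520.0], [1069.051908, 2230.982597, 2583.056921]),
])
rows = group.to_long_rows("../tests/test_data/xics/test_chrom_1.sqMass")
```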

Alternatively, for analysis directly in Python, the loader can return a pandas DataFrame containing all of the data points for this peptide.

```python
transitionGroupDf = loader.loadTransitionGroupsDf("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
transitionGroupDf
```

| | filename | rt | intensity | annotation |
|---|---|---|---|---|
| 0 | ../tests/test_data/xics/test_chrom_1.sqMass | 512.8 | 1069.051908 | 2274_Precursor_i0 |
| 1 | ../tests/test_data/xics/test_chrom_1.sqMass | 516.4 | 2230.982597 | 2274_Precursor_i0 |
| 2 | ../tests/test_data/xics/test_chrom_1.sqMass | 520.0 | 2583.056921 | 2274_Precursor_i0 |
| 3 | ../tests/test_data/xics/test_chrom_1.sqMass | 523.7 | 1876.955276 | 2274_Precursor_i0 |
| 4 | ../tests/test_data/xics/test_chrom_1.sqMass | 527.3 | 1862.126603 | 2274_Precursor_i0 |
| ... | ... | ... | ... | ... |
| 2697 | ../tests/test_data/xics/test_chrom_2.sqMass | 1251.0 | 0.000000 | b4^1 |
| 2698 | ../tests/test_data/xics/test_chrom_2.sqMass | 1254.7 | 42.001872 | b4^1 |
| 2699 | ../tests/test_data/xics/test_chrom_2.sqMass | 1258.3 | 20.999608 | b4^1 |
| 2700 | ../tests/test_data/xics/test_chrom_2.sqMass | 1261.9 | 20.999608 | b4^1 |


This DataFrame contains the retention times and intensities for every transition in every file. Transitions are distinguished by the *annotation* column, and the *filename* column identifies the file (run) from which each chromatogram originates.
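The annotations in this example follow two patterns: fragment ions such as `y3^1` (ion type, ordinal, then `^` and the charge) and precursor traces such as `2274_Precursor_i0`. A sketch of a parser for just these two patterns, inferred from the output above rather than from a specification (the names are hypothetical, and other annotation styles such as neutral losses are not handled):

```python
import re
from typing import NamedTuple, Optional, Union

# Patterns inferred from the annotations shown above (e.g. "y3^1", "2274_Precursor_i0").
FRAGMENT_RE = re.compile(r"^([abcxyz])(\d+)\^(\d+)$")
PRECURSOR_RE = re.compile(r"^(\d+)_Precursor_i(\d+)$")


class Fragment(NamedTuple):
    ion_type: str  # "y" in "y3^1"
    ordinal: int   # 3 in "y3^1"
    charge: int    # 1 in "y3^1"


class PrecursorTrace(NamedTuple):
    precursor_id: int  # 2274 in "2274_Precursor_i0" (assumed to be the precursor ID)
    isotope: int       # 0 in "2274_Precursor_i0" (presumably the isotope index)


def parse_annotation(annotation: str) -> Optional[Union[Fragment, PrecursorTrace]]:
    """Parse a chromatogram annotation; return None if it matches neither pattern."""
    match = FRAGMENT_RE.match(annotation)
    if match:
        ion_type, ordinal, charge = match.groups()
        return Fragment(ion_type, int(ordinal), int(charge))
    match = PRECURSOR_RE.match(annotation)
    if match:
        precursor_id, isotope = match.groups()
        return PrecursorTrace(int(precursor_id), int(isotope))
    return None


print(parse_annotation("y3^1"))               # Fragment(ion_type='y', ordinal=3, charge=1)
print(parse_annotation("2274_Precursor_i0"))  # PrecursorTrace(precursor_id=2274, isotope=0)
```

With pandas, the precursor traces alone could be selected with `transitionGroupDf[transitionGroupDf['annotation'].str.contains('_Precursor_')]`.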

For example, to compute the total intensity of each transition in each run, we can use the pandas `groupby()` function:

```python
transitionGroupDf[['intensity', 'filename', 'annotation']].groupby(['filename', 'annotation']).sum()
```

| filename | annotation | intensity |
|---|---|---|
| ../tests/test_data/xics/test_chrom_1.sqMass | 2274_Precursor_i0 | 2139805.0 |
| ../tests/test_data/xics/test_chrom_1.sqMass | b4^1 | 30006.97 |
| ../tests/test_data/xics/test_chrom_1.sqMass | y1^1 | 130078.0 |
| ../tests/test_data/xics/test_chrom_1.sqMass | y2^1 | 28374.81 |
| ../tests/test_data/xics/test_chrom_1.sqMass | y3^1 | 387906.2 |
| ../tests/test_data/xics/test_chrom_1.sqMass | y4^1 | 129531.2 |
| ../tests/test_data/xics/test_chrom_1.sqMass | y5^1 | 57073.77 |
| ../tests/test_data/xics/test_chrom_2.sqMass | 2274_Precursor_i0 | 593173.6 |
| ../tests/test_data/xics/test_chrom_2.sqMass | b4^1 | 7226.959 |
| ../tests/test_data/xics/test_chrom_2.sqMass | y1^1 | 11375.97 |
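The `groupby()` call sums the `intensity` column over every data point sharing the same `(filename, annotation)` pair. The same aggregation written in plain Python, as a sketch that makes the computation explicit (the helper `total_intensity` is hypothetical), applied to a few rows copied from the DataFrame above:

```python
from collections import defaultdict


def total_intensity(rows):
    """Sum intensities per (filename, annotation), mirroring groupby([...]).sum().

    `rows` is an iterable of (filename, rt, intensity, annotation) tuples.
    """
    totals = defaultdict(float)
    for filename, _rt, intensity, annotation in rows:
        totals[(filename, annotation)] += intensity
    return dict(totals)


# A few rows copied from the transitionGroupDf output.
rows = [
    ("../tests/test_data/xics/test_chrom_1.sqMass", 512.8, 1069.051908, "2274_Precursor_i0"),
    ("../tests/test_data/xics/test_chrom_1.sqMass", 516.4, 2230.982597, "2274_Precursor_i0"),
    ("../tests/test_data/xics/test_chrom_2.sqMass", 1254.7, 42.001872, "b4^1"),
    ("../tests/test_data/xics/test_chrom_2.sqMass", 1258.3, 20.999608, "b4^1"),
]
for (filename, annotation), total in total_intensity(rows).items():
    print(filename, annotation, round(total, 6))
```

To compare runs side by side in pandas, the summed Series can be pivoted with `transitionGroupDf.groupby(['filename', 'annotation'])['intensity'].sum().unstack('annotation')`, giving one row per run and one column per transition.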


### **Fetching Precomputed Feature Information**

The `SqMassLoader` object can also be used to load features from the `.osw` file.

```python
# Dictionary: .sqMass filename -> list of TransitionGroupFeature objects
features = loader.loadTransitionGroupFeature("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
```

As before, the keys of the dictionary are the `.sqMass` filenames and each value is a list of `TransitionGroupFeature` objects containing information such as the peak boundaries, q-value and intensity of each feature.
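These features come from scored peak groups stored in the `.osw` file. As an illustration only, the sketch below queries a simplified, assumed subset of that layout (a `FEATURE` table with peak boundaries and a `SCORE_MS2` table with q-values) for one precursor in one run, best first. It is not *massseer*'s query: the feature and run IDs are hypothetical, 2274 is assumed to be the precursor ID from the `2274_Precursor_i0` annotation, and the boundaries and q-values are copied from the output above. Check the actual tables and columns of a real `.osw` file before reusing these names.

```python
import sqlite3

# Assumed, simplified subset of the .osw layout; real files contain more tables and columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY, RUN_ID INTEGER, PRECURSOR_ID INTEGER,
                      LEFT_WIDTH REAL, RIGHT_WIDTH REAL);
CREATE TABLE SCORE_MS2 (FEATURE_ID INTEGER, QVALUE REAL);
""")
# Hypothetical feature/run IDs; boundaries and q-values copied from the output above.
conn.executemany("INSERT INTO FEATURE VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2274, 1018.39001464844, 1058.38000488281),
    (2, 1, 2274, 818.476013183594, 847.557983398438),
])
conn.executemany("INSERT INTO SCORE_MS2 VALUES (?, ?)", [
    (1, 0.6191579512574589),
    (2, 0.00027720520546038556),
])


def load_features(conn, precursor_id, run_id):
    """Return (left, right, qvalue) for one precursor in one run, lowest q-value first."""
    return conn.execute(
        """
        SELECT f.LEFT_WIDTH, f.RIGHT_WIDTH, s.QVALUE
        FROM FEATURE AS f
        JOIN SCORE_MS2 AS s ON s.FEATURE_ID = f.ID
        WHERE f.PRECURSOR_ID = ? AND f.RUN_ID = ?
        ORDER BY s.QVALUE
        """,
        (precursor_id, run_id),
    ).fetchall()


for left, right, qvalue in load_features(conn, 2274, 1):
    print(f"{left:.3f}-{right:.3f}  q={qvalue:.4g}")
```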

As with the chromatogram data, the features can also be exported as a pandas DataFrame; however, this limits their use with downstream *massseer* tools.

```python
featuresDf = loader.loadTransitionGroupFeaturesDf("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
featuresDf
```

| | filename | leftBoundary | rightBoundary | areaIntensity | qvalue | consensusApex | consensusApexIntensity |
|---|---|---|---|---|---|---|---|
| 0 | ../tests/test_data/xics/test_chrom_1.sqMass | 818.476013 | 847.557983 | 64494.0 | 0.000277 | 838.622 | 15781.0 |
| 1 | ../tests/test_data/xics/test_chrom_1.sqMass | 843.921997 | 916.620972 | 230773.0 | 0.000277 | 864.324 | 40030.0 |
| 2 | ../tests/test_data/xics/test_chrom_1.sqMass | 1018.390015 | 1058.380005 | 76219.0 | 0.619158 | 1035.83 | 11879.0 |
| 3 | ../tests/test_data/xics/test_chrom_1.sqMass | 1091.089966 | 1131.069946 | 48427.0 | 0.218483 | 1111.35 | 6869.0 |
| 4 | ../tests/test_data/xics/test_chrom_1.sqMass | 625.83197 | 651.276978 | 14772.0 | 0.61968 | 636.468 | 3311.0 |
| 5 | ../tests/test_data/xics/test_chrom_2.sqMass | 1174.689941 | 1189.22998 | 3697.0 | 0.62526 | 1178.48 | 1187.0 |
| 6 | ../tests/test_data/xics/test_chrom_2.sqMass | 865.742004 | 898.453003 | 16398.0 | 0.155031 | 890.432 | 3968.0 |
| 7 | ../tests/test_data/xics/test_chrom_2.sqMass | 898.453003 | 949.336975 | 29383.0 | 0.000277 | 915.05 | 5058.0 |
| 8 | ../tests/test_data/xics/test_chrom_2.sqMass | 1131.079956 | 1171.060059 | 22497.0 | 0.611901 | 1151.74 | 4338.0 |
| 9 | ../tests/test_data/xics/test_chrom_2.sqMass | 1200.140015 | 1225.589966 | 7743.0 | 0.62093 | 1204.06 | 2150.0 |


The DataFrame can then be used to compute derived values, such as the peak width (right boundary minus left boundary):

```python
# Peak width in the same units as the boundaries (retention time)
featuresDf['peakWidth'] = featuresDf['rightBoundary'] - featuresDf['leftBoundary']
featuresDf
```

| | filename | leftBoundary | rightBoundary | areaIntensity | qvalue | consensusApex | consensusApexIntensity | peakWidth |
|---|---|---|---|---|---|---|---|---|
| 0 | ../tests/test_data/xics/test_chrom_1.sqMass | 818.476013 | 847.557983 | 64494.0 | 0.000277 | 838.622 | 15781.0 | 29.08197 |
| 1 | ../tests/test_data/xics/test_chrom_1.sqMass | 843.921997 | 916.620972 | 230773.0 | 0.000277 | 864.324 | 40030.0 | 72.698975 |
| 2 | ../tests/test_data/xics/test_chrom_1.sqMass | 1018.390015 | 1058.380005 | 76219.0 | 0.619158 | 1035.83 | 11879.0 | 39.98999 |
| 3 | ../tests/test_data/xics/test_chrom_1.sqMass | 1091.089966 | 1131.069946 | 48427.0 | 0.218483 | 1111.35 | 6869.0 | 39.97998 |
| 4 | ../tests/test_data/xics/test_chrom_1.sqMass | 625.83197 | 651.276978 | 14772.0 | 0.61968 | 636.468 | 3311.0 | 25.445007 |
| 5 | ../tests/test_data/xics/test_chrom_2.sqMass | 1174.689941 | 1189.22998 | 3697.0 | 0.62526 | 1178.48 | 1187.0 | 14.540039 |
| 6 | ../tests/test_data/xics/test_chrom_2.sqMass | 865.742004 | 898.453003 | 16398.0 | 0.155031 | 890.432 | 3968.0 | 32.710999 |
| 7 | ../tests/test_data/xics/test_chrom_2.sqMass | 898.453003 | 949.336975 | 29383.0 | 0.000277 | 915.05 | 5058.0 | 50.883972 |
| 8 | ../tests/test_data/xics/test_chrom_2.sqMass | 1131.079956 | 1171.060059 | 22497.0 | 0.611901 | 1151.74 | 4338.0 | 39.980103 |
| 9 | ../tests/test_data/xics/test_chrom_2.sqMass | 1200.140015 | 1225.589966 | 7743.0 | 0.62093 | 1204.06 | 2150.0 | 25.449951 |
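The features can also be filtered by q-value before looking at peak widths. A plain-Python sketch (the names `FeatureRow` and `confident_peak_widths` are hypothetical) using a few rows copied from the table above and a 1% q-value cutoff, a commonly used threshold chosen here for illustration:

```python
from typing import NamedTuple


class FeatureRow(NamedTuple):
    """A subset of the columns of featuresDf."""
    filename: str
    left: float    # leftBoundary
    right: float   # rightBoundary
    qvalue: float


def confident_peak_widths(rows, max_qvalue=0.01):
    """Return (filename, peak width) for features with qvalue <= max_qvalue."""
    return [(r.filename, r.right - r.left) for r in rows if r.qvalue <= max_qvalue]


# Rows 0, 1, 2, 7 and 6 of the table above.
rows = [
    FeatureRow("../tests/test_data/xics/test_chrom_1.sqMass", 818.476013, 847.557983, 0.000277),
    FeatureRow("../tests/test_data/xics/test_chrom_1.sqMass", 843.921997, 916.620972, 0.000277),
    FeatureRow("../tests/test_data/xics/test_chrom_1.sqMass", 1018.390015, 1058.380005, 0.619158),
    FeatureRow("../tests/test_data/xics/test_chrom_2.sqMass", 898.453003, 949.336975, 0.000277),
    FeatureRow("../tests/test_data/xics/test_chrom_2.sqMass", 865.742004, 898.453003, 0.155031),
]
for filename, width in confident_peak_widths(rows):
    print(filename, round(width, 6))
```

In pandas, the equivalent filter is `featuresDf[featuresDf['qvalue'] <= 0.01]`.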
