# Loading Feature Information

As a demonstration, we will compare the features that `OpenSWATH` and `DIA-NN` report for the peptide `AFVDFLSDEIK` at charge state 2.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Run this cell before executing any other cell
import os
os.chdir("../../test/test_data/")  # Path to the tutorial data; replace with the path to your own data

## Loading Transition Group Features

In [3]:
from massdash.loaders import MzMLDataLoader
loader_osw = MzMLDataLoader(dataFiles="mzml/ionMobilityTest.mzML",
                        rsltsFile="osw/ionMobilityTest.osw",
                        rsltsFileType="OpenSWATH")

loader_diann = MzMLDataLoader(dataFiles="mzml/ionMobilityTest.mzML",
                        rsltsFile="diann/ionMobilityTest-diannReport.tsv",
                        libraryFile="library/ionMobilityTestLibrary.tsv",
                        rsltsFileType='DIA-NN')

[2024-01-12 12:33:17,699] MzMLDataAccess - INFO - Opening mzml/ionMobilityTest.mzML file...: Elapsed 0.08150839805603027 ms
[2024-01-12 12:33:17,700] MzMLDataAccess - INFO - There are 50 spectra and 0 chromatograms.
[2024-01-12 12:33:17,700] MzMLDataAccess - INFO - There are 25 MS1 spectra and 25 MS2 spectra.
[2024-01-12 12:33:17,744] MzMLDataAccess - INFO - Opening mzml/ionMobilityTest.mzML file...: Elapsed 0.03662395477294922 ms
[2024-01-12 12:33:17,744] MzMLDataAccess - INFO - There are 50 spectra and 0 chromatograms.
[2024-01-12 12:33:17,745] MzMLDataAccess - INFO - There are 25 MS1 spectra and 25 MS2 spectra.


In [4]:
features_osw = loader_osw.loadTopTransitionGroupFeature("AFVDFLSDEIK", 2)
features_osw

{'mzml/ionMobilityTest.mzML': -------- TransitionGroupFeature --------
 leftBoundary: 6235.8486328125
 rightBoundary: 6248.42822265625
 areaIntensity: 332556.620606927
 consensusApex: 6242.15
 consensusApexIntensity: 332556.620606927
 qvalue: 3.5084067486223456e-05
 consensusApexIM: 0.978579389257473
 precursor_mz: None
 precursor_charge: 2
 product_annotations: None
 product_mz: None
 sequence: AFVDFLSDEIK}

In [5]:
features_diann = loader_diann.loadTopTransitionGroupFeature("AFVDFLSDEIK", 2)
features_diann

{'mzml/ionMobilityTest.mzML': -------- TransitionGroupFeature --------
 leftBoundary: 6236.052702
 rightBoundary: 6248.63205
 areaIntensity: 1137201.5
 consensusApex: 6241.44882
 consensusApexIntensity: None
 qvalue: 7.964108227e-05
 consensusApexIM: 0.9800000191
 precursor_mz: None
 precursor_charge: 2
 product_annotations: None
 product_mz: None
 sequence: AFVDFLSDEIK}

This method returns a dictionary where the keys are the run names and each value is the top `TransitionGroupFeature` for that run.

Here, we can see that both `OpenSWATH` and `DIA-NN` detect the same feature, since the left and right boundaries and the `consensusApex` are approximately equal. The intensities differ because the two tools quantify differently: `OpenSWATH` sums the intensity across all fragment ions, while `DIA-NN` sums the intensity across the top 3 fragment ions.

In [12]:
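# Check the intensities: sum all raw fragment intensities DIA-NN reports for one precursor (second row of the report)
import pandas as pd
df = pd.read_csv("ionMobilityTest2/ionMobilityTest2-diannReport.tsv", sep='\t')
# 'Fragment.Quant.Raw' is a ';'-separated string; [:-1] drops the last split element (empty when the string ends with ';')
sum(float(i) for i in df['Fragment.Quant.Raw'].iloc[1].split(';')[:-1])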

3079256.37492
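
The boundary agreement described above can also be checked numerically. The sketch below is one possible way to do it, not part of the MassDash API: it copies the boundaries and apexes from the printed outputs and computes an intersection-over-union of the two retention-time intervals. The helper `boundary_overlap` and the dictionary layout are hypothetical names introduced only for this illustration.

```python
# Boundaries and apexes copied from the OpenSWATH and DIA-NN outputs shown above
osw = {"left": 6235.8486328125, "right": 6248.42822265625, "apex": 6242.15}
diann = {"left": 6236.052702, "right": 6248.63205, "apex": 6241.44882}


def boundary_overlap(a, b):
    """Return the intersection-over-union of two retention-time intervals.

    Hypothetical helper for illustration; 1.0 means identical boundaries,
    0.0 means the intervals do not overlap at all.
    """
    intersection = max(0.0, min(a["right"], b["right"]) - max(a["left"], b["left"]))
    union = max(a["right"], b["right"]) - min(a["left"], b["left"])
    return intersection / union if union > 0 else 0.0


overlap = boundary_overlap(osw, diann)
apex_shift = abs(osw["apex"] - diann["apex"])
print(f"boundary overlap: {overlap:.3f}, apex shift: {apex_shift:.2f}")
```

An overlap close to 1 together with a small apex shift supports the reading that both tools picked the same peak group. What counts as "close enough" is a judgment call that depends on the chromatography.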

### Loading The Top Transition Group Features as a Pandas DataFrame

In [6]:
loader_osw.loadTopTransitionGroupFeatureDf("AFVDFLSDEIK", 2)

| | filename | feature_id | precursor_id | precursor_mz | precursor_charge | areaIntensity | consensusApexIntensity | sequence | consensusApex | leftBoundary | rightBoundary | consensusApexIM | ms2_dscore | peakgroup_rank | qvalue | run_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mzml/ionMobilityTest.mzML | -4054571632749145316 | 33018 | 642.3295 | 2 | 2848190.0 | 332556.620607 | AFVDFLSDEIK | 6242.15 | 6235.848633 | 6248.428223 | 0.978579 | 5.932165 | 1.0 | 3.5e-05 | 2870707016753918864 |


In [7]:
loader_diann.loadTopTransitionGroupFeatureDf("AFVDFLSDEIK", 2)

| | filename | filename.1 | leftBoundary | rightBoundary | areaIntensity | qvalue | consensusApex | consensusApexIntensity | precursor_charge | sequence | consensusApexIM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mzml/ionMobilityTest.mzML | ionMobilityTest2 | 6236.052702 | 6248.63205 | 1137201.5 | 8e-05 | 6241.44882 | | 2 | AFVDFLSDEIK | 0.98 |


## Loading All TransitionGroupFeatures

In [8]:
loader_osw.loadTransitionGroupFeatures("AFVDFLSDEIK", 2)

{'mzml/ionMobilityTest.mzML': [-------- TransitionGroupFeature --------
  leftBoundary: 6235.8486328125
  rightBoundary: 6248.42822265625
  areaIntensity: 2848190.0
  consensusApex: 6242.15
  consensusApexIntensity: 332556.620606927
  qvalue: 3.5084067486223456e-05
  consensusApexIM: 0.978579389257473
  precursor_mz: None
  precursor_charge: 2
  product_annotations: None
  product_mz: None
  sequence: AFVDFLSDEIK,
  -------- TransitionGroupFeature --------
  leftBoundary: 6255.64599609375
  rightBoundary: 6266.51513671875
  areaIntensity: 35433.9
  consensusApex: 6256.67
  consensusApexIntensity: 5550.48670984957
  qvalue: 0.0001776217262565
  consensusApexIM: 0.981810896830182
  precursor_mz: None
  precursor_charge: 2
  product_annotations: None
  product_mz: None
  sequence: AFVDFLSDEIK]}

This method returns a dictionary where the keys are the run names and each value is a list of all the `TransitionGroupFeature` objects found. Here, we can see that `OpenSWATH` finds an additional feature. In general, this second feature should be ignored because the top `TransitionGroupFeature` has a lower q-value.
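
Picking the lowest q-value feature per run from such a dictionary can be sketched as follows. `FeatureStandIn` is a hypothetical stand-in that only mirrors a few of the fields printed above, so the example runs without the test data. It assumes the real `TransitionGroupFeature` objects expose those fields as attributes of the same name.

```python
from dataclasses import dataclass


@dataclass
class FeatureStandIn:
    """Hypothetical stand-in mirroring a few printed TransitionGroupFeature fields."""
    leftBoundary: float
    rightBoundary: float
    consensusApex: float
    qvalue: float


# Same shape as the output above: run name -> list of features (values copied from it)
features_by_run = {
    "mzml/ionMobilityTest.mzML": [
        FeatureStandIn(6235.8486328125, 6248.42822265625, 6242.15, 3.5084067486223456e-05),
        FeatureStandIn(6255.64599609375, 6266.51513671875, 6256.67, 0.0001776217262565),
    ]
}

# For each run, keep the feature with the smallest q-value
top_by_run = {
    run: min(features, key=lambda feature: feature.qvalue)
    for run, features in features_by_run.items()
}

for run, feature in top_by_run.items():
    print(run, feature.consensusApex, feature.qvalue)
```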

<div class="alert alert-info">

**Note:**

DIA-NN only outputs one feature per precursor, so calling the `loadTransitionGroupFeatures()` method returns essentially the same result as `loadTopTransitionGroupFeature()`. The only difference is that `loadTransitionGroupFeatures()` returns a list of `TransitionGroupFeature` objects.

</div>

In [9]:
loader_diann.loadTransitionGroupFeatures("AFVDFLSDEIK", 2)

{'mzml/ionMobilityTest.mzML': [-------- TransitionGroupFeature --------
  leftBoundary: 6236.052702
  rightBoundary: 6248.63205
  areaIntensity: 1137201.5
  consensusApex: 6241.44882
  consensusApexIntensity: None
  qvalue: 7.964108227e-05
  consensusApexIM: 0.9800000191
  precursor_mz: None
  precursor_charge: 2
  product_annotations: None
  product_mz: None
  sequence: AFVDFLSDEIK]}

## Loading All TransitionGroupFeatures In a Pandas DataFrame

In [10]:
features_df_osw = loader_osw.loadTransitionGroupFeaturesDf("AFVDFLSDEIK", 2)
features_df_osw

| | filename | feature_id | precursor_id | precursor_mz | precursor_charge | areaIntensity | consensusApexIntensity | sequence | consensusApex | leftBoundary | rightBoundary | consensusApexIM | ms2_dscore | peakgroup_rank | qvalue | run_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mzml/ionMobilityTest.mzML | -4054571632749145316 | 33018 | 642.3295 | 2 | 2848190.0 | 332556.620607 | AFVDFLSDEIK | 6242.15 | 6235.848633 | 6248.428223 | 0.978579 | 5.932165 | 1.0 | 3.5e-05 | 2870707016753918864 |
| 1 | mzml/ionMobilityTest.mzML | 8869432762985074175 | 33018 | 642.3295 | 2 | 35433.9 | 5550.48671 | AFVDFLSDEIK | 6256.67 | 6255.645996 | 6266.515137 | 0.981811 | 5.131462 | 2.0 | 0.000178 | 2870707016753918864 |


Although the Pandas DataFrame output is incompatible with further *MassDash* analysis, it allows greater flexibility for alternative analyses. For example, we can calculate the peak width of all features as shown below.

In [11]:
features_df_osw['peakWidth'] = features_df_osw['rightBoundary'] - features_df_osw['leftBoundary']
features_df_osw

| | filename | feature_id | precursor_id | precursor_mz | precursor_charge | areaIntensity | consensusApexIntensity | sequence | consensusApex | leftBoundary | rightBoundary | consensusApexIM | ms2_dscore | peakgroup_rank | qvalue | run_id | peakWidth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mzml/ionMobilityTest.mzML | -4054571632749145316 | 33018 | 642.3295 | 2 | 2848190.0 | 332556.620607 | AFVDFLSDEIK | 6242.15 | 6235.848633 | 6248.428223 | 0.978579 | 5.932165 | 1.0 | 3.5e-05 | 2870707016753918864 | 12.57959 |
| 1 | mzml/ionMobilityTest.mzML | 8869432762985074175 | 33018 | 642.3295 | 2 | 35433.9 | 5550.48671 | AFVDFLSDEIK | 6256.67 | 6255.645996 | 6266.515137 | 0.981811 | 5.131462 | 2.0 | 0.000178 | 2870707016753918864 | 10.869141 |
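
The same DataFrame can be reduced to the best feature per run with ordinary pandas operations. The sketch below rebuilds a small subset of the columns shown above by hand, so it runs without the test data, and keeps the row with the smallest `qvalue` for each `filename`. The variable names `features` and `top` are illustrative only.

```python
import pandas as pd

# Subset of the columns and values from the DataFrame output above
features = pd.DataFrame(
    {
        "filename": ["mzml/ionMobilityTest.mzML", "mzml/ionMobilityTest.mzML"],
        "leftBoundary": [6235.848633, 6255.645996],
        "rightBoundary": [6248.428223, 6266.515137],
        "peakgroup_rank": [1.0, 2.0],
        "qvalue": [3.5e-05, 0.000178],
    }
)

# Peak width, computed the same way as in the cell above
features["peakWidth"] = features["rightBoundary"] - features["leftBoundary"]

# Keep, for each run, the row with the smallest q-value
top = features.loc[features.groupby("filename")["qvalue"].idxmin()]
print(top[["filename", "peakWidth", "peakgroup_rank", "qvalue"]])
```

Grouping by `filename` keeps the selection correct when the DataFrame holds features from more than one run.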


We can also just load the top features in a dataframe if desired.