# Loading Feature Information

As a demonstration, we will compare the features found for the peptide `AFVDFLSDEIK` with a charge state of 2.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Run this cell before executing any other cell
import os
os.chdir("../../test/test_data/")  # Path to the tutorial data; insert the path to your own data here if needed

## Loading Transition Group Features

In [3]:
from massdash.loaders import MzMLDataLoader
loader = MzMLDataLoader(dataFiles="mzml/ionMobilityTest.mzML",
                        rsltsFile=["osw/ionMobilityTest.osw", "diann/ionMobilityTest-diannReport.tsv"])

Initializing valid scores for selection
[2024-09-30 17:26:54,983] MzMLDataAccess - INFO - Opening mzml/ionMobilityTest.mzML file...: Elapsed 0.0842585563659668 ms
[2024-09-30 17:26:54,984] MzMLDataAccess - INFO - There are 50 spectra and 0 chromatograms.
[2024-09-30 17:26:54,984] MzMLDataAccess - INFO - There are 25 MS1 spectra and 25 MS2 spectra.


In [4]:
features = loader.loadTopTransitionGroupFeature("AFVDFLSDEIK", 2)
features

TransitionGroupFeatureCollection
ionMobilityTest: [-------- TransitionGroupFeature --------
leftBoundary: 6235.8486328125
rightBoundary: 6248.42822265625
areaIntensity: 352642.16135025
consensusApex: 6242.15
consensusApexIntensity: 352642.16135025
qvalue: 3.5084067486223456e-05
consensusApexIM: 0.978579389257473
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: OpenSWATH, -------- TransitionGroupFeature --------
leftBoundary: 6236.052702
rightBoundary: 6248.63205
areaIntensity: 1137201.5
consensusApex: 6241.44882
consensusApexIntensity: None
qvalue: 7.964108227e-05
consensusApexIM: 0.9800000191
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: DIA-NN]

This method returns a dictionary where the keys are the run names and each value is a list of `TransitionGroupFeature` objects. The `software` field shows which tool reported each feature.

Here, we can see that both `OpenSWATH` and `DIA-NN` detect the same feature, since the left and right boundaries and the consensus apex are approximately equal. The intensities differ because the two tools compute intensity differently: `OpenSWATH` sums the intensity across all fragment ions, while `DIA-NN` sums the intensity across the top 3 fragment ions.
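To make "approximately equal" concrete, the sketch below compares the boundaries and apex of two features within a tolerance. The `FeatureBounds` class, the `same_feature` helper, and the tolerance of 1.0 are assumptions made for illustration and are not part of MassDash; the numbers are copied from the features reported for this precursor.

```python
from dataclasses import dataclass


@dataclass
class FeatureBounds:
    """Hypothetical stand-in holding only the fields compared here."""
    left: float
    right: float
    apex: float


def same_feature(a, b, tol=1.0):
    """Return True when both boundaries and the apex agree within `tol`.

    `tol` is in the same units as the boundaries; 1.0 is an arbitrary
    choice for this illustration.
    """
    return (abs(a.left - b.left) <= tol
            and abs(a.right - b.right) <= tol
            and abs(a.apex - b.apex) <= tol)


# Values copied from the OpenSWATH and DIA-NN features for AFVDFLSDEIK (2+)
openswath = FeatureBounds(6235.8486328125, 6248.42822265625, 6242.15)
diann = FeatureBounds(6236.052702, 6248.63205, 6241.44882)
# A second, lower-scoring OpenSWATH feature reported for the same precursor
second_openswath = FeatureBounds(6255.64599609375, 6266.51513671875, 6256.67)

print(same_feature(openswath, diann))             # True
print(same_feature(openswath, second_openswath))  # False
```

A fixed absolute tolerance is the simplest choice; a tolerance scaled to the peak width would be a reasonable alternative for runs with very different peak shapes.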

In [5]:
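# Check that the intensities are actually similar:
# sum the raw fragment intensities reported by DIA-NN
import pandas as pd
df = pd.read_csv("ionMobilityTest2/ionMobilityTest2-diannReport.tsv", sep='\t')
# 'Fragment.Quant.Raw' is a ';'-delimited string; [:-1] drops the last element of the split
sum(float(i) for i in df['Fragment.Quant.Raw'].iloc[1].split(';')[:-1])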

3079256.37492
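The two summation strategies described above can be illustrated on a small example. This is only a sketch: the helper names are hypothetical, the intensity values are made up, and the real tools apply further processing that is not modeled here.

```python
def parse_intensities(raw):
    """Parse a ';'-delimited intensity string, ignoring empty entries."""
    return [float(x) for x in raw.split(";") if x.strip()]


def sum_all(intensities):
    """Sum over every fragment (the all-fragment strategy)."""
    return sum(intensities)


def sum_top_n(intensities, n=3):
    """Sum over the n most intense fragments (the top-3 strategy for n=3)."""
    return sum(sorted(intensities, reverse=True)[:n])


# Made-up fragment intensities, formatted like a ';'-delimited report field
raw = "100.0;400.0;250.0;50.0;200.0;"
intensities = parse_intensities(raw)

print(sum_all(intensities))    # 1000.0
print(sum_top_n(intensities))  # 850.0
```

Because the top-3 sum discards the weaker fragments, it can never exceed the all-fragment sum for the same peak.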

### Loading The Top Transition Group Features as a Pandas DataFrame

In [6]:
loader.loadTopTransitionGroupFeatureDf("AFVDFLSDEIK", 2)

| | runName | leftBoundary | rightBoundary | areaIntensity | qvalue | consensusApex | consensusApexIntensity | sequence | precursor_charge | software |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ionMobilityTest | 6235.848633 | 6248.428223 | 2848190.0 | 3.5e-05 | 6242.15 | 352642.16135 | AFVDFLSDEIK | 2 | OpenSWATH |
| 1 | ionMobilityTest | 6236.052702 | 6248.63205 | 1137201.5 | 8e-05 | 6241.44882 | | AFVDFLSDEIK | 2 | DIA-NN |


## Loading All TransitionGroupFeatures

In [7]:
loader.loadTransitionGroupFeatures("AFVDFLSDEIK", 2)

TransitionGroupFeatureCollection
ionMobilityTest: [-------- TransitionGroupFeature --------
leftBoundary: 6235.8486328125
rightBoundary: 6248.42822265625
areaIntensity: 2848190.0
consensusApex: 6242.15
consensusApexIntensity: 352642.16135025
qvalue: 3.5084067486223456e-05
consensusApexIM: 0.978579389257473
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: OpenSWATH, -------- TransitionGroupFeature --------
leftBoundary: 6255.64599609375
rightBoundary: 6266.51513671875
areaIntensity: 35433.9
consensusApex: 6256.67
consensusApexIntensity: 2888.98703575134
qvalue: 0.0001776217262565
consensusApexIM: 0.981810896830182
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: OpenSWATH, -------- TransitionGroupFeature --------
leftBoundary: 6236.052702
rightBoundary: 6248.63205
areaIntensity: 1137201.5
consensusApex: 6241.44882
consensusApexIntensity: None
qvalue: 7.964108

<div class="alert alert-info">

**Note:**

DIA-NN outputs only one feature per precursor, so for DIA-NN results `loadTransitionGroupFeatures()` returns essentially the same as `loadTopTransitionGroupFeature()`. The only difference is that `loadTransitionGroupFeatures()` returns a list of `TransitionGroupFeature` objects.

</div>
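One way to see this difference is to count the features each tool reports for the precursor. The sketch below rebuilds a small pandas DataFrame by hand from the features listed for `AFVDFLSDEIK` (only a few columns are included); it does not call MassDash.

```python
import pandas as pd

# Hand-copied from the features reported for AFVDFLSDEIK (2+)
features = pd.DataFrame({
    "leftBoundary": [6235.848633, 6255.645996, 6236.052702],
    "rightBoundary": [6248.428223, 6266.515137, 6248.632050],
    "software": ["OpenSWATH", "OpenSWATH", "DIA-NN"],
})

# Number of features reported by each software tool
counts = features.groupby("software").size()
print(counts)
```

Here `counts` holds 2 for OpenSWATH and 1 for DIA-NN, matching the note above.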

## Loading All TransitionGroupFeatures In a Pandas DataFrame

In [8]:
features_df = loader.loadTransitionGroupFeaturesDf("AFVDFLSDEIK", 2)
features_df

| | runname | leftBoundary | rightBoundary | areaIntensity | qvalue | consensusApex | consensusApexIntensity | precursor_charge | sequence | software |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ionMobilityTest | 6235.848633 | 6248.428223 | 2848190.0 | 3.5e-05 | 6242.15 | 352642.16135 | 2 | AFVDFLSDEIK | OpenSWATH |
| 1 | ionMobilityTest | 6255.645996 | 6266.515137 | 35433.9 | 0.000178 | 6256.67 | 2888.987036 | 2 | AFVDFLSDEIK | OpenSWATH |
| 2 | ionMobilityTest | 6236.052702 | 6248.63205 | 1137201.5 | 8e-05 | 6241.44882 | | 2 | AFVDFLSDEIK | DIA-NN |


Although the pandas DataFrame output is incompatible with further *MassDash* analysis, it allows greater flexibility for alternative analyses. For example, we can calculate the peak width of every feature as shown below.

In [9]:
features_df['peakWidth'] = features_df['rightBoundary'] - features_df['leftBoundary']
features_df

| | runname | leftBoundary | rightBoundary | areaIntensity | qvalue | consensusApex | consensusApexIntensity | precursor_charge | sequence | software | peakWidth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ionMobilityTest | 6235.848633 | 6248.428223 | 2848190.0 | 3.5e-05 | 6242.15 | 352642.16135 | 2 | AFVDFLSDEIK | OpenSWATH | 12.57959 |
| 1 | ionMobilityTest | 6255.645996 | 6266.515137 | 35433.9 | 0.000178 | 6256.67 | 2888.987036 | 2 | AFVDFLSDEIK | OpenSWATH | 10.869141 |
| 2 | ionMobilityTest | 6236.052702 | 6248.63205 | 1137201.5 | 8e-05 | 6241.44882 | | 2 | AFVDFLSDEIK | DIA-NN | 12.579348 |
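As a further illustration of this flexibility, the sketch below keeps the best-scoring feature per software tool and compares their peak widths. It rebuilds the table by hand so that it runs on its own, and it assumes that "best" means the lowest q-value, which may not be the exact rule MassDash uses to pick the top feature.

```python
import pandas as pd

# Hand-copied subset of the columns shown in the table above
features_df = pd.DataFrame({
    "leftBoundary": [6235.848633, 6255.645996, 6236.052702],
    "rightBoundary": [6248.428223, 6266.515137, 6248.632050],
    "qvalue": [3.5e-05, 0.000178, 8e-05],
    "software": ["OpenSWATH", "OpenSWATH", "DIA-NN"],
})
features_df["peakWidth"] = features_df["rightBoundary"] - features_df["leftBoundary"]

# Row label of the lowest q-value within each software group
# (assumption: lowest q-value = best feature)
best_idx = features_df.groupby("software")["qvalue"].idxmin()
best = features_df.loc[best_idx].set_index("software")

print(best[["qvalue", "peakWidth"]])
```

With these values, the selected OpenSWATH and DIA-NN features have nearly identical peak widths (about 12.58), consistent with both tools picking the same peak.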
