#| default_exp proteomics_analysis
from nbdev import *

# AlphaPept: Proteomic Analysis

This tutorial will cover the basic analysis of a mixed species dataset.
We will analyze files from Van Puyvelde et al.: ["A comprehensive LFQ benchmark dataset on modern day acquisition strategies in proteomics"](https://doi.org/10.1038/s41597-022-01216-6).

The data is available at ProteomeXchange as [PXD028735](https://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD028735).

## Prerequisites

Download and install AlphaPept according to the Readme.

## Files

We will download the files from the DDA dataset for Thermo. These are the following six files:
    
- LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.raw
- LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.raw
- LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.raw
- LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.raw
- LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.raw
- LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.raw

## Organize Files

To process the data, we will place all files in one folder like so:

![](https://i.imgur.com/Nj7BCOQ.png)
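Before pointing AlphaPept at the folder, it can help to confirm that all six raw files arrived. The following is a minimal sketch of such a check; the function name `missing_raw_files` is hypothetical and not part of AlphaPept, and the folder you pass in is your own download location.

```python
from pathlib import Path

# The six Thermo DDA files listed in the "Files" section.
EXPECTED_FILES = [
    f"LFQ_Orbitrap_DDA_Condition_{condition}_Sample_Alpha_{replicate:02d}.raw"
    for condition in ("A", "B")
    for replicate in (1, 2, 3)
]


def missing_raw_files(folder):
    """Return the expected raw files that are not present in `folder`.

    Hypothetical helper for illustration; not part of AlphaPept.
    """
    present = {path.name for path in Path(folder).iterdir() if path.is_file()}
    return [name for name in EXPECTED_FILES if name not in present]
```

An empty list means the folder is complete; otherwise the list names the files that still need to be downloaded.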

## FASTA files
You can either place the `.fasta` files directly in the folder or use the FASTA management in the GUI. AlphaPept can download FASTA files directly from UniProt and stores them locally in the user folder (e.g., `C:\Users\admin\.alphapept\fasta`).

We download all four FASTA files by clicking on the respective button, waiting for the download, and refreshing the page afterward. 
![](https://i.imgur.com/m1nq6VI.png)

![](https://i.imgur.com/j2vRaAI.png)
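To see outside the GUI which FASTA files are stored locally, you could list the folder contents. This is a small sketch, assuming the FASTA folder sits at `.alphapept/fasta` inside your home directory as in the example path; `list_fasta_files` is a hypothetical helper, not an AlphaPept function.

```python
from pathlib import Path


def list_fasta_files(folder=None):
    """Return the sorted names of the `.fasta` files in `folder`.

    Assumption: if no folder is given, `<home>/.alphapept/fasta` is used,
    mirroring the example path in the text.
    """
    if folder is None:
        folder = Path.home() / ".alphapept" / "fasta"
    folder = Path(folder)
    if not folder.is_dir():
        # Nothing downloaded yet, or a different install location.
        return []
    return sorted(path.name for path in folder.glob("*.fasta"))
```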

## Creating a new experiment

### Files
Next, we switch to the `New experiment`-Tab and enter the file path. The page should refresh, and all files should be listed.

![](https://i.imgur.com/U0Ww4Ke.png)

ℹ️ Note that you can deselect files by moving the slider to the right and deselecting `Use`.

![](https://i.imgur.com/HuXv8qk.png)

Next, select the FASTA files you want to use for this run. Click on the `Select FASTA files` dropdown button. If a FASTA file is in your data folder, it will be selectable here; otherwise, the dropdown lists everything in the AlphaPept user folder.
For the example, we will select all FASTA files.

![](https://i.imgur.com/OAHEWvA.png)


ℹ️ Note that you can hover over your selection to display the full filename.

### Settings
By default, settings in AlphaPept are collapsed; you will usually not need to change them. You can adjust the processing steps (e.g., if you only want to do feature finding and no search) in `Processing steps` and change settings in `Modify settings`.
Suppose you want to add a fixed modification: you would select `fasta`. This enables the FASTA settings, and you can use the menus to add and change the settings:

![](https://i.imgur.com/qlzTOYN.png)

For the tutorial, we will not change any settings. Give your experiment a meaningful name, and if no warnings appear, click on `Submit`. Once the job has been submitted, you will be prompted to go to the Status page.

![](https://i.imgur.com/vPsz5xX.png)

ℹ️ AlphaPept has an internal processing queue, so you can submit multiple experiments, and they will be processed one after another.

### Status

The Status page will periodically update and show you the progress. In the green bar, you can see how much time has been spent on the respective job. Once a job has started, it can take 1-2 minutes until you see the progress.

![](https://i.imgur.com/HmRnIEC.png)

ℹ️ You can expand `Full log` and `Queue` to see a detailed log of your processing job or to inspect the queue of future jobs that will be processed.

ℹ️ Once the experiment is done, it will show `No files to process`.


### Results
Once an experiment is processed, we can do some basic data exploration within AlphaPept. For this, click on the `Results`-Tab.

![](https://i.imgur.com/AwCTCDf.png)

All previous experiments that were run in the GUI will be selectable here. You can also select a `*.hdf` file. Once loaded, you will see a `Run File Summary`, showing the number of MS1-Features, Peptides, and Protein Groups per run. You can hover over each column to see the filename. Additionally, you can read the `Run summary` with information about the settings and timings and the `Run log`.

#### Exploring tables from Experiment
AlphaPept provides basic plotting functions to explore results from an experiment. 

**Volcano Plot:**
In `Volcano plot`, you can create a volcano plot to explore differentially expressed proteins. For this, select the groups. The filenames are available for selection, and if LFQ-based processing is present, you can choose between LFQ intensity and normal intensity. For the example, we will split the LFQ intensities into Condition A and Condition B according to the filenames. If a filename is too long to read, hover over the file and wait to see the full name.
The plot will update automatically while you select the groups. You can hover over each point in the volcano plot to see the protein ID. Additionally, you can use `Highlight proteins` to annotate specific proteins in the plot. For the sake of demonstration, we choose `P46683` and `P36578` for highlighting. The field is interactive, so you can type to search for specific proteins.

![](https://i.imgur.com/DLFMzz6.png)
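The two axes of a volcano plot can be sketched as follows. This is an illustration rather than AlphaPept's implementation: it assumes log2-transformed intensities, uses Welch's t-test from SciPy as the (assumed) significance test, and the function name `volcano_coordinates` is hypothetical.

```python
import numpy as np
from scipy import stats


def volcano_coordinates(group_a, group_b):
    """Return (log2 fold change, -log10 p-value) per protein.

    group_a, group_b: arrays of shape (n_proteins, n_replicates) with
    strictly positive intensities for the two conditions.
    """
    log_a = np.log2(np.asarray(group_a, dtype=float))
    log_b = np.log2(np.asarray(group_b, dtype=float))
    # x-axis: difference of the mean log2 intensities (B relative to A).
    log2_fold_change = log_b.mean(axis=1) - log_a.mean(axis=1)
    # y-axis: Welch's t-test per protein (an assumed choice of test).
    p_values = stats.ttest_ind(log_b, log_a, axis=1, equal_var=False).pvalue
    return log2_fold_change, -np.log10(p_values)
```

Proteins far to the left or right and high up in the plot are those with a large fold change and a small p-value.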

**Correlation heatmap:**

In `Correlation heatmap`, you can create a correlation heatmap based on the protein intensities. In the example case, you can nicely see how the conditions correlate.

![](https://i.imgur.com/vcs9gEj.png)
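The matrix behind such a heatmap can be sketched with NumPy. The text does not state whether AlphaPept log-transforms the intensities or which correlation measure it uses, so the log10 transform, the Pearson correlation, and the function name `sample_correlation` are assumptions made for illustration.

```python
import numpy as np


def sample_correlation(intensities):
    """Pearson correlation between samples on log10 intensities.

    intensities: array of shape (n_proteins, n_samples). Proteins with a
    missing or non-positive value in any sample are dropped first.
    """
    data = np.asarray(intensities, dtype=float)
    complete = np.all(np.isfinite(data) & (data > 0), axis=1)
    log_data = np.log10(data[complete])
    # rowvar=False: the columns (samples) are the variables to correlate.
    return np.corrcoef(log_data, rowvar=False)
```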

**Scatterplot:**

`Scatterplot` shows you a scatterplot of protein intensities in one sample against another. You can choose the two filenames via the Group1 and Group2 selectors. You can only select one file per group but can select the same file for each group. For the tutorial case, we will create a scatterplot between two samples of condition A (01 and 02). We can see that they correlate well. The output also provides the results of an OLS regression. In our case, we have an `R-squared` of `0.979`, highlighting the high correlation (as expected) of samples of the same condition.

ℹ️ Hovering over a data point will show you the respective protein. This can be useful to identify outliers.

![](https://i.imgur.com/zdjhUYb.png)
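For reference, the `R-squared` of a simple OLS fit can be reproduced with a few lines of NumPy. This is a generic sketch of the statistic, not the code AlphaPept runs; `ols_r_squared` is a hypothetical name, and whether the GUI fits raw or log-transformed intensities is not stated in the text.

```python
import numpy as np


def ols_r_squared(x, y):
    """Fit y = slope * x + intercept by least squares.

    Returns (slope, intercept, R-squared).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    return slope, intercept, 1.0 - ss_res / ss_tot
```

An R-squared close to 1 means that nearly all variation in one sample is explained by the other, which is what you would expect for replicates of the same condition.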

**PCA:**

`PCA` will perform a principal component analysis of the data and show the first two components as a scatterplot. This can be very useful for finding files that cluster together and, e.g., for identifying batch effects. For our example, we see that the samples of condition A and the samples of condition B each lie close to one another and form clusters, as expected.

![](https://i.imgur.com/7VzsJQe.png)
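A projection onto the first two principal components can be sketched with a singular value decomposition. The preprocessing that AlphaPept applies before its PCA is not described here, so this sketch only mean-centers each protein and assumes a complete matrix without missing values; `first_two_components` is a hypothetical name.

```python
import numpy as np


def first_two_components(intensities):
    """Project samples onto their first two principal components.

    intensities: array of shape (n_samples, n_proteins), no missing values.
    Returns the (n_samples, 2) scores and the explained variance ratio
    of the two components.
    """
    data = np.asarray(intensities, dtype=float)
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes.
    u, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :2] * singular_values[:2]
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return scores, explained[:2]
```

Plotting the two score columns against each other gives the kind of scatterplot shown in the GUI; the sign of each component is arbitrary, so only the relative positions of the samples matter.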


**Sequence coverage map:**

`Sequence coverage map` can show you which peptides were identified for a specific protein. Click on a protein of interest, select the filename, and see which part of the protein sequence is covered by identified peptides. 

![](https://i.imgur.com/iASIXoo.png)
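The idea behind a coverage map can be sketched in plain Python: mark every residue of the protein that falls inside at least one identified peptide. This is a simplified illustration with a hypothetical function name, `sequence_coverage`; it assumes unmodified peptide strings that match the protein sequence exactly and says nothing about how AlphaPept computes its map.

```python
def sequence_coverage(protein_sequence, peptides):
    """Return (covered flag per residue, fraction of residues covered)."""
    covered = [False] * len(protein_sequence)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein_sequence.find(peptide)
        # Mark every occurrence of the peptide, including overlapping ones.
        while start != -1:
            for position in range(start, start + len(peptide)):
                covered[position] = True
            start = protein_sequence.find(peptide, start + 1)
    fraction = sum(covered) / len(covered) if covered else 0.0
    return covered, fraction
```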

**Select group:**

This allows you to display the tabular data stored in the `*.hdf` files. As tables can be very large, you can select the data range via the slider. Clicking `Create download link` allows you to download the table.

![](https://i.imgur.com/1gEAl31.png)

ℹ️ By default, you explore the `results.hdf` of an experiment. By using `Select file from experiment`, you can select individual experimental files and explore tables only from the respective run.