# AlphaPept workflow and files

## Core
The core function of AlphaPept is `interface.run_complete_workflow()`. This function requires a settings dictionary that contains the settings and file paths. On disk, settings are stored as `*.yaml` files. When called, the core function runs a complete workflow based on the given settings.

<img src="images/workflow/core.png" align="center" style="width:600px"/>
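
The call to the core function can be sketched in Python as follows. Only `run_complete_workflow` is named in this document. The YAML loader `alphapept.settings.load_settings`, the wrapper name `run_from_yaml`, and its `load`/`run` parameters are assumptions made for illustration, so verify them against your installed AlphaPept version.

```python
from pathlib import Path


def run_from_yaml(settings_path, load=None, run=None):
    """Load a settings file and pass the resulting dictionary to the workflow.

    `load` and `run` are hypothetical hooks so the wrapper can be tried
    without AlphaPept installed; by default the AlphaPept functions are used.
    """
    path = Path(settings_path)
    if not path.is_file():
        raise FileNotFoundError(f"Settings file not found: {path}")

    if load is None:
        # Assumption: AlphaPept ships a YAML loader under this name.
        from alphapept.settings import load_settings as load
    if run is None:
        # The core function described above.
        from alphapept.interface import run_complete_workflow as run

    settings = load(str(path))  # settings dictionary
    return run(settings)
```

With AlphaPept installed, `run_from_yaml("path/to/settings.yaml")` would load the file and start the workflow.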


## GUI

When starting the AlphaPept GUI via the shortcut that the one-click installer created or via python (`python -m alphapept gui`), the AlphaPept server will be started. It can be accessed via a browser and provides a graphical user interface (GUI) to the AlphaPept functionality. The server extends the core function to a processing framework.
The server is centered around three folders, `Queue`, `Failed`, and `Finished`, which are created in the `.alphapept`-folder in the user's home directory. Whenever a new `*.yaml`-file is found in the `Queue`-folder, the server hands it over to the core function and starts processing. There are three ways to add files to the `Queue`-folder:
1. Via the `New experiment`-tab in the GUI
2. Manually copying a `*.yaml`-file into the `Queue`-folder
3. Automatically via the `File watcher`.

The `File watcher` can be set up to monitor a folder; whenever a new file matching pre-defined settings is copied to the folder, it will create a `*.yaml`-file and add it to the `Queue`-folder.
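
The manual route (copying a `*.yaml`-file into the `Queue`-folder) can also be scripted. The sketch below uses only the Python standard library. The function name `enqueue` and the `base_dir` parameter are hypothetical, and the default location follows the `.alphapept`-folder described above.

```python
import shutil
from pathlib import Path


def enqueue(settings_file, base_dir=None):
    """Copy a *.yaml settings file into the Queue folder and return the new path."""
    # Default: the .alphapept folder in the user's home directory.
    base = Path(base_dir) if base_dir is not None else Path.home() / ".alphapept"
    queue = base / "Queue"
    queue.mkdir(parents=True, exist_ok=True)

    source = Path(settings_file)
    if source.suffix.lower() != ".yaml":
        raise ValueError(f"Expected a *.yaml file, got: {source.name}")

    # Copy rather than move, so the original settings file stays in place.
    return Path(shutil.copy2(source, queue / source.name))
```

Passing an explicit `base_dir` makes it possible to try the function on a scratch folder before pointing it at a running server.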

Whenever an experiment succeeds, summary information about the experiment is appended to the `*.yaml`-file, and the file is moved to the `Finished`-folder. Because the `*.yaml`-file is very small (on the order of kB), it is intended to serve as a history of processed files.

Whenever an experiment fails, the `*.yaml`-file will be moved to the `Failed`-folder. It can be moved from there to the `Queue`-folder for reprocessing.
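
Moving failed experiments back for reprocessing can be sketched in the same way. The helper name `requeue_failed` and its `base_dir` parameter are hypothetical; the folder names are the ones described above.

```python
import shutil
from pathlib import Path


def requeue_failed(base_dir=None):
    """Move every *.yaml file from the Failed folder back to the Queue folder."""
    base = Path(base_dir) if base_dir is not None else Path.home() / ".alphapept"
    failed = base / "Failed"
    queue = base / "Queue"
    queue.mkdir(parents=True, exist_ok=True)

    moved = []
    if failed.is_dir():
        for settings_file in sorted(failed.glob("*.yaml")):
            target = queue / settings_file.name
            shutil.move(str(settings_file), str(target))
            moved.append(target)
    return moved  # paths of the files now waiting in the Queue folder
```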


### History and Results

AlphaPept screens all `*.yaml`-files in the `Finished`-folder and plots a run history based on the summary information. This is especially useful for QC or comparison purposes. Additionally, the `*.yaml`-files can be used to investigate the results of a run.

<img src="images/workflow/gui.png" align="center" style="width:600px"/>

## Output Files

For each run, AlphaPept creates several output files:
- For each raw file, there will be a `.ms_data.hdf`-file with raw-specific data, such as `feature_table`, `first_search`, `second_search`, and `peptide_fdr`.
- For the entire experiment, there will be a `results.hdf` (name can be defined in the settings), which contains experiment-specific data, such as `protein_fdr` and the `protein_table` (containing quantified proteins over all files).
- In addition to the `results.hdf`, there will be a `*.yaml`-file that contains the run settings and summary information of the run. This `*.yaml`-file can serve as a template to process other files with the same settings.
- If a database is created from `FASTA`-files, there will be a `database.hdf` (name can be defined in the settings). It contains theoretical spectra and can be reused for other experiments, which speeds up the total analysis time.

The `ms_data.hdf`, `results.hdf`, and database containers can be accessed via the `alphapept.io` library. The GUI also allows you to explore these files. Additionally, the `results.hdf` can be loaded directly via the pandas package (e.g., `pd.read_hdf('results.hdf', 'protein_table')`).

For easier access, AlphaPept directly exports the most relevant tables as `*.csv`:
- `results.csv`: The search results after `protein_fdr`.
- `results_proteins.csv`: The quantified proteins per file.
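
For scripted downstream work, the exported table can be read with the Python standard library. The sketch below assumes a layout that is not spelled out here: the first column holds the protein identifier and every other column holds a numeric intensity for one file. The function name `load_log2_intensities` is hypothetical. Missing or non-positive values are returned as `None` because a `log2` transformation is undefined for them.

```python
import csv
import math


def load_log2_intensities(csv_path):
    """Return {protein: {column: log2(intensity) or None}} for a protein table."""
    table = {}
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # first cell is assumed to label the protein column
        for row in reader:
            if not row:
                continue
            protein, values = row[0], row[1:]
            entry = {}
            for name, raw in zip(header[1:], values):
                try:
                    number = float(raw)
                except ValueError:
                    number = float("nan")  # empty or non-numeric cell
                # NaN and non-positive values fail the comparison and become None.
                entry[name] = math.log2(number) if number > 0 else None
            table[protein] = entry
    return table
```

The same table can of course be loaded with pandas; the standard-library version is used here only to keep the sketch free of dependencies.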

## Column headers

### protein_fdr

## Downstream analysis

AlphaPept offers some basic plots in the results section (e.g., volcano, heatmap, and PCA). The `*.csv`-format should be generic enough to use with many other tools. Feel free to reach out if you have ideas for plots or find that the output format is not supported or is missing required columns. To reach out, report an issue [here](https://github.com/MannLabs/alphapept/issues/new/choose) or send an email to opensource@alphapept.com.

### Using with Perseus

Perseus offers a generic table import, so you can directly use the `results_proteins.csv`.

#### Example: Volcano-Plot
An excellent tutorial for creating volcano-plots with Perseus can be found [here](http://www.coxdocs.org/doku.php?id=perseus:user:use_cases:interactions).

Below is a quickstart for using AlphaPept with Perseus (tested with `1.6.15.0`). The file used here is `PXD006109` from the test runner (multi-species quantification test) with six files (three per group).

1. Open Perseus.
<img src="images/workflow/perseus_0.PNG" align="center"/>

2. Drag and drop the `results_proteins.csv` in the central pane of Perseus. The `Generic matrix upload`-window will open.
<img src="images/workflow/perseus_1.PNG" align="center"/>

3. Select the appropriate columns (e.g., LFQ for LFQ-intensities) and assign them to Main with the `>`-button. The first entry in the list is empty; assign it to Text. Click `OK`, and the table should be loaded.
<img src="images/workflow/perseus_2.PNG" align="center"/>

4. Click on the `f(x)`-button and press `OK` on the window that opens to apply a `log2(x)`-transformation.
<img src="images/workflow/perseus_3.PNG" align="center"/>

5. Click on `Annot. rows` > `Categorical annotation rows` to assign a group for each file. Select multiple entries and click on the checkmark to assign multiple groups at the same time. Click `OK` to close the window.
<img src="images/workflow/perseus_4.PNG" align="center"/>

6. Click on the `Volcano plot`-symbol in the upper right `Analysis`-column. For the tutorial, we keep the standard settings and press `OK`.

7. You can double-click on the small volcano plot to show the plot.
<img src="images/workflow/perseus_5.PNG" align="center"/>

Enjoy your volcano-plot.