# Execute the STAVER workflow

The `STAVER` algorithm is implemented in the `staver_pipeline` module, a proteomics data analysis tool that covers the workflow from raw data preprocessing to final result output. This tutorial shows how to run the `STAVER` workflow from the command-line interface (CLI). For more details about the `STAVER` algorithm, see the [STAVER documentation](https://staver.readthedocs.io/en/latest/).

In [2]:
import pandas as pd
import numpy as np
import staver as st

import warnings
warnings.filterwarnings('ignore')

## List all the available command-line arguments

In [3]:
%run ~/STAVER/staver/staver_pipeline.py  -h

usage: staver_pipeline.py [-h] -n NUMBER_THRESHODS -i DIA_PATH
                          [-ref REFERENCE_STANDARD_DATASET] -o
                          DIA_PEP_DATA_OUTPATH -op DIA_PROTEIN_DATA_OUTPATH
                          [-fdr FDR_THRESHOLD] [-c COUNT_CUTOFF_SAME_LIBS]
                          [-d COUNT_CUTOFF_DIFF_LIBS]
                          [-pep_cv PEPTIDES_CV_THRESH]
                          [-pro_cv PROTEINS_CV_THRESH]
                          [-na_thresh NA_THRESHOLD] [-top TOP_PRECURSOR_IONS]
                          [-norm NORMALIZATION_METHOD] [-suffix FILE_SUFFIX]
                          [-sample SAMPLE_TYPE] [-ver VERBOSE] [-v]

STAVER: A Standardized Dataset-Based Algorithm for Efficient Variation
Reduction in Large-Scale DIA MS Data

optional arguments:
  -h, --help            show this help message and exit
  -n NUMBER_THRESHODS, --thread_numbers NUMBER_THRESHODS
                        The number of thresholds for computer operations
  -i DIA_PATH, --inp

## Run the staver_pipeline

> (Estimated time: ~5 min for 20 samples)

Before running the pipeline, prepare the environment and the DIA dataset, then set the parameters:

1. **Preparing the Environment:**
   - Ensure that Python is installed on your system.
   - Download or clone the `STAVER` repository to your local machine or HPC. 
   - Install the required packages by running `pip install -r requirements.txt` in the `STAVER` directory.

2. **Setting Up the Parameters:**
   - Use the `-n` flag to set the number of threads for computation.
   - The `-i` flag should point to your input DIA data path.
   - If you have a reference dataset, use the `-ref` flag to provide its path; otherwise, the default dataset will be used.
   - Define the output paths for peptide data with `-o` and protein data with `-op`.
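The parameter setup above can also be sketched in plain Python, which may be convenient when the pipeline is launched outside a notebook. This is a minimal sketch, not part of `STAVER` itself: the helper `build_staver_command` is hypothetical, the `/path/to/...` values are placeholders, and the long option names are the ones used in the run cell of this tutorial.

```python
import os
import shlex

# Script location as used in this tutorial; adjust to your checkout.
SCRIPT = os.path.expanduser("~/STAVER/staver/staver_pipeline.py")


def build_staver_command(script, options):
    """Hypothetical helper: return the argument list for one pipeline run."""
    cmd = ["python", script]
    for flag, value in options.items():
        if value is None:
            # Leave unset options out so the pipeline falls back to its default.
            continue
        cmd += [f"--{flag}", str(value)]
    return cmd


# Placeholder paths (assumptions); replace them with your own directories.
options = {
    "thread_numbers": 16,
    "input": "/path/to/dia_data/",
    "reference_dataset_path": None,  # None: rely on the default reference dataset
    "output_peptide": "/path/to/results/peptides/",
    "output_protein": "/path/to/results/proteins/",
}

cmd = build_staver_command(SCRIPT, options)
print(shlex.join(cmd))  # inspect the command before running it
```

Once the printed command looks right, the list can be passed to `subprocess.run(cmd, check=True)` to start the run.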

In [5]:
## run staver_pipeline
%run ~/STAVER/staver/staver_pipeline.py \
        --thread_numbers 16 \
        --input /Volumes/T7_Shield/staver/data/likai-diann-raw-20/ \
        --reference_dataset_path /Volumes/T7_Shield/staver/data/likai-diann-raw \
        --output_peptide /Volumes/T7_Shield/staver/results/DIA_repeat20_2023010/peptides/ \
        --output_protein /Volumes/T7_Shield/staver/results/DIA_repeat20_2023010/proteins/ \
        --count_cutoff_same_libs 1 \
        --count_cutoff_diff_libs 2 \
        --fdr_threshold 0.01 \
        --peptides_cv_thresh 0.3 \
        --proteins_cv_thresh 0.3 \
        --na_threshold 0.3 \
        --top_precursor_ions 6 \
        --file_suffix _F1_R1

All parsed arguments:
number_threshods: 16
dia_path: /Volumes/T7_Shield/staver/data/likai-diann-raw-20/
reference_standard_dataset: /Volumes/T7_Shield/staver/data/likai-diann-raw
dia_pep_data_outpath: /Volumes/T7_Shield/staver/results/DIA_repeat20_2023010/peptides/
dia_protein_data_outpath: /Volumes/T7_Shield/staver/results/DIA_repeat20_2023010/proteins/
fdr_threshold: 0.01
count_cutoff_same_libs: 1
count_cutoff_diff_libs: 2
peptides_cv_thresh: 0.3
proteins_cv_thresh: 0.3
na_threshold: 0.3
top_precursor_ions: 6
normalization_method: median
file_suffix: _F1_R1
sample_type: None
verbose: False




/Volumes/T7_Shield/staver/results/DIA_repeat20_2023010/peptides/
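After the run finishes, a quick way to confirm that results were written is to list the files in the two output directories. The sketch below makes no assumption about the file names or formats that `STAVER` produces; the helper `list_outputs` is hypothetical and the `/path/to/...` directories are placeholders for the values passed to `--output_peptide` and `--output_protein`.

```python
from pathlib import Path


def list_outputs(directory):
    """Hypothetical helper: sorted relative paths of all files under `directory`."""
    root = Path(directory)
    if not root.is_dir():
        return []  # missing directory: nothing was written there
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# Placeholder directories (assumptions); substitute your own output paths.
for out_dir in ["/path/to/results/peptides/", "/path/to/results/proteins/"]:
    files = list_outputs(out_dir)
    print(f"{out_dir}: {len(files)} file(s)")
    for name in files:
        print("   ", name)
```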
