# Template notebook for running site-centric PTM signature enrichment analysis (PTM-SEA)

Before running any other cell, enter a unique project name. This name determines the folder in which your inputs and outputs are stored. If no folder with this name exists in the workspace, a new one is created; if one already exists, all files in it will be overwritten.

In [None]:
### EDIT THIS CELL (1/3)
PROJECT_NAME <- "test_project_ssc"

In [None]:
source("~/src/terra-functions.R")
init_project_dir()
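
The helper `init_project_dir()` comes from `~/src/terra-functions.R`, whose contents are not shown in this notebook. As a rough sketch of the behavior described above (one folder per project, created if missing), something like the following could be used. The function name `make_project_dirs`, the `input`/`output` subfolder names, and the use of `tempdir()` as the base directory are assumptions made for illustration only.

```r
# Hypothetical sketch: create <base_dir>/<project_name>/{input,output}.
# This is NOT the actual init_project_dir() implementation.
make_project_dirs <- function(project_name, base_dir = tempdir()) {
  stopifnot(is.character(project_name),
            length(project_name) == 1,
            nzchar(project_name))
  project_dir <- file.path(base_dir, project_name)
  dirs <- list(
    root   = project_dir,
    input  = file.path(project_dir, "input"),
    output = file.path(project_dir, "output")
  )
  for (d in dirs) {
    # showWarnings = FALSE keeps reruns quiet when the folder already exists
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  dirs
}

dirs <- make_project_dirs("test_project_ssc")
```

Note that `dir.create()` leaves existing files in place, so any overwriting would happen only when later steps write files with the same names.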

## Configure environment and prepare files

### Configure cloud environment

Click "Cloud Environment" in the top-right corner. For **Application configuration**, select "Custom Environment"; for **Container image**, enter `munchic/ptm-sea:latest`. This is a Terra-based Docker image that contains all the libraries and scripts needed for PTM-SEA.

### Upload files

1. Upload the input file to the bucket

Open your workspace in a new tab or window. Upload files by navigating to the DATA tab -> Files tab and using the + button at the bottom right of the page. A single file is required: a single-site PTM proteome in GCT format (v1.3+).

2. Locate the uploaded file

In [None]:
list_files_in_bucket(only_gct = TRUE)

3. Set the name of the file to copy into the environment

In [None]:
### EDIT THIS CELL (2/3)
input_file <- "test_ccle_pY.gct" 

In [None]:
copy_from_bucket_to_project_dir(input_file) 
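
Since a GCT v1.3+ file is required, it can help to check the header of the copied file before going further. A GCT v1.3 file starts with a `#1.3` version line, followed by a line with four tab-separated integers: the number of data rows, data columns, row metadata fields, and column metadata fields. The sketch below checks just those two lines; the function name `check_gct_header` is hypothetical and is not part of `terra-functions.R`, and the demo uses a tiny temporary file rather than your real input.

```r
# Read the first two lines of a GCT file and report its dimensions.
# Hypothetical helper for illustration; not part of terra-functions.R.
check_gct_header <- function(path) {
  header <- readLines(path, n = 2)
  if (length(header) < 2) stop("File has fewer than two lines: ", path)

  # Line 1: version tag such as "#1.3"
  version <- suppressWarnings(as.numeric(sub("^#", "", trimws(header[1]))))
  if (is.na(version) || version < 1.3) {
    stop("Expected GCT v1.3 or later, found header: ", header[1])
  }

  # Line 2: rows, columns, row metadata fields, column metadata fields
  dims <- suppressWarnings(as.integer(strsplit(trimws(header[2]), "\t")[[1]]))
  if (length(dims) != 4 || anyNA(dims)) {
    stop("Second line should hold four tab-separated integers")
  }
  names(dims) <- c("n_rows", "n_cols", "n_row_meta", "n_col_meta")
  dims
}

# Demo on a minimal header written to a temporary file
demo_file <- tempfile(fileext = ".gct")
writeLines(c("#1.3", "2\t3\t1\t1"), demo_file)
gct_dims <- check_gct_header(demo_file)
```

In the notebook, the same function could be pointed at the copied input file once its local path is known.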

## Single-site centric PTM-SEA

### Set parameters

1. Basic parameters for pre-processing PTM GCT:
- `id_type_out` - type of site annotation in the provided GCT file
- `seqwin_col` - name of column containing the site annotation
- `organism` - organism from which the dataset is derived
- `mode` - determines how multiple sites per gene will be combined
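
To make the `mode` parameter more concrete, the toy example below combines rows that share an identifier in two ways: taking the per-sample median, and keeping the single row with the largest standard deviation across samples. This is only an illustration of the idea based on the option descriptions in this notebook; the identifiers (`siteA`, `siteB`), the data, and the function names are made up, and the actual logic inside `preprocess_gct()` may differ.

```r
# Toy data: three rows, two of which map to the same (made-up) identifier.
site_id <- c("siteA", "siteA", "siteB")
values <- matrix(
  c(1.0, 2.0, 3.0,
    1.2, 0.0, 6.0,
    0.5, 0.4, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("s1", "s2", "s3"))
)

# "median"-style: per sample, take the median across rows sharing an id.
combine_median <- function(values, site_id) {
  groups <- split(seq_len(nrow(values)), site_id)
  per_group <- vapply(
    groups,
    function(idx) apply(values[idx, , drop = FALSE], 2, median),
    numeric(ncol(values))
  )
  t(per_group)  # rows = identifiers, columns = samples
}

# "sd"-style: per id, keep the row that varies most across samples.
combine_sd <- function(values, site_id) {
  row_sd <- apply(values, 1, sd)
  groups <- split(seq_len(nrow(values)), site_id)
  keep <- vapply(groups, function(idx) idx[which.max(row_sd[idx])], integer(1))
  out <- values[keep, , drop = FALSE]
  rownames(out) <- names(groups)
  out
}

by_median <- combine_median(values, site_id)
by_sd <- combine_sd(values, site_id)
```

Here `by_median` averages out the two `siteA` measurements, while `by_sd` keeps only the more variable of the two rows.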

In [None]:
### EDIT THIS CELL (3/3)
id_type_out <- "uniprot"       # options: "uniprot", "refseq", "seqwin", "psp"
seqwin_col  <- "VMsiteFlanks"  # only relevant if the annotation is "seqwin"
organism    <- "human"         # options: "human", "mouse", "rat"
mode        <- "median"        # "median" is used here; other options:
                               #   "sd"      - most variable (standard deviation) across sample columns
                               #   "SGT"     - subgroup top: first subgroup in protein group (Spectrum Mill)
                               #   "abs.max" - for log-transformed, signed p-values
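
A typo in one of these values may only surface later, during pre-processing. A small check like the one below can catch it early. The helper `check_choice` is hypothetical (it is not part of `terra-functions.R`), and the allowed values are copied from the comments in the cell above.

```r
# Hypothetical helper: stop with a readable message if a value is not allowed.
check_choice <- function(value, choices, name) {
  if (!is.character(value) || length(value) != 1 || !(value %in% choices)) {
    stop(sprintf("`%s` must be one of: %s",
                 name, paste(choices, collapse = ", ")))
  }
  invisible(value)
}

# Values as set in the notebook cell; allowed options taken from its comments.
id_type_out <- "uniprot"
organism    <- "human"

check_choice(id_type_out, c("uniprot", "refseq", "seqwin", "psp"), "id_type_out")
check_choice(organism, c("human", "mouse", "rat"), "organism")
```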

2. Advanced parameters for pre-processing PTM GCT

In [None]:
id_type        <- "sm"       # Notation of site-ids: 'sm' - Spectrum Mill; 'wg' - Web Gestalt; 'ph' - Philosopher
acc_type_in    <- "uniprot"  # Type of accession number in 'rid' object in GCT file (uniprot, refseq, symbol).
residue        <- '"S|T|Y"'  # Modified residues, e.g. "S|T|Y" or "K".
ptm            <- "p"        # Type of modification, e.g. "p" or "ac".
localized      <- TRUE       # CAUTION: it is NOT RECOMMENDED to set this flag to FALSE. If TRUE only fully localized sites will be considered.

3. Advanced parameters for running PTM-SEA

In [None]:
output_prefix     <- "ptm-sea-results"  # Label for output files from PTM-SEA
sample_norm_type  <- "rank"             # options: "rank", "log", "log.rank", "none"
weight            <- 0.75               # Weighting exponent: 0 gives every site the same weight; values > 0 let the data values influence the score
correl_type       <- "z.score"          # "rank", "z.score", "symm.rank"
statistic         <- "area.under.RES"   # "area.under.RES", "Kolmogorov-Smirnov"
output_score_type <- "NES"              # Score type: "ES" - enrichment score, "NES" - normalized ES
nperm             <- 1000               # Number of permutations
min_overlap       <- 5                  # Minimal overlap between signature and data set
extended_output   <- TRUE               # If TRUE, additional stats on signature coverage etc. are included as row annotations in the GCT results files
export_signal_gct <- TRUE               # For each signature export expression GCT files.
global_fdr        <- FALSE              # If TRUE global FDR across all data columns is calculated.
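
To give a feel for what `weight` and `statistic` control, here is a minimal sketch of a textbook GSEA-style running enrichment sum. It is not the PTM-SEA implementation: normalization, directional signatures, and permutation-based NES are left out, and the function name `running_sum` and the toy data are assumptions for illustration. Entries are ranked by value; each signature member raises the running sum in proportion to `abs(value)^weight`, and each non-member lowers it by a constant step.

```r
# Textbook-style running enrichment sum (illustrative sketch only).
running_sum <- function(values, in_set, weight = 0.75) {
  stopifnot(length(values) == length(in_set), any(in_set), any(!in_set))
  ord <- order(values, decreasing = TRUE)   # rank entries from high to low
  w   <- abs(values[ord])^weight            # weight = 0 makes all hits equal
  hit <- in_set[ord]
  step_up   <- ifelse(hit, w / sum(w[hit]), 0)   # increments sum to 1
  step_down <- ifelse(hit, 0, 1 / sum(!hit))     # decrements sum to 1
  cumsum(step_up - step_down)
}

# Toy data: six entries, the two highest-ranked ones belong to the signature.
values <- c(2.0, 1.5, 1.0, 0.5, -0.5, -1.0)
in_set <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

res <- running_sum(values, in_set, weight = 0.75)

# Two ways to summarize the curve, mirroring the two `statistic` options:
ks_stat   <- res[which.max(abs(res))]  # maximum deviation from zero
area_stat <- sum(res)                  # area under the running sum
```

The maximum-deviation summary depends only on the peak of the curve, whereas the area summary reflects its whole shape.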

### Run PTM-SEA

In [None]:
input_ds <- file.path(project_input, input_file)
ptm_sig_db <- get_ptm_sig_db(id_type_out, organism)

In [None]:
preprocess_gct()

In [None]:
run_ptm_sea()

In [None]:
# Placeholder: list the top 10 significant sites
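
The cell above is left as a placeholder. One possible way to list top entries, assuming the results have already been loaded into a data frame, is sketched below. The column names `id` and `score`, the toy values, and the function name `top_n_rows` are hypothetical; the actual PTM-SEA output files and their columns are not described in this notebook.

```r
# Return the n rows with the largest absolute value in `score_col`.
# Hypothetical helper operating on an already-loaded data frame.
top_n_rows <- function(results, n = 10, score_col = "score") {
  stopifnot(is.data.frame(results), score_col %in% names(results))
  ord <- order(abs(results[[score_col]]), decreasing = TRUE)
  head(results[ord, , drop = FALSE], n)
}

# Toy table standing in for real results
toy <- data.frame(
  id    = paste0("entry_", 1:12),
  score = c(0.1, -2.5, 1.8, 0.3, -0.2, 3.1, -1.1, 0.9, 2.2, -0.4, 0.05, -3.0),
  stringsAsFactors = FALSE
)

top10 <- top_n_rows(toy, n = 10)
```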

### Save results to bucket and access results

1. Zip and save results to bucket (permanent storage)

In [None]:
save_results_to_bucket()
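
How `save_results_to_bucket()` works internally is not shown here. For readers who want to copy a file to the bucket by hand, the sketch below builds a `gsutil cp` command string without running it. The function name `bucket_copy_command`, the bucket URL, and the destination folder are placeholders; on Terra the workspace bucket URL is typically available through the `WORKSPACE_BUCKET` environment variable, which is assumed rather than verified here.

```r
# Build (but do not run) a gsutil command that copies a local file to a bucket.
# Hypothetical helper; bucket and folder names below are placeholders.
bucket_copy_command <- function(local_path, bucket, dest_dir = "") {
  dest <- paste(
    c(bucket, if (nzchar(dest_dir)) dest_dir, basename(local_path)),
    collapse = "/"
  )
  sprintf("gsutil cp %s %s",
          shQuote(local_path, type = "sh"),
          shQuote(dest, type = "sh"))
}

cmd <- bucket_copy_command("test_project_ssc.zip", "gs://example-bucket", "results")

# To actually run it inside the Terra environment, one could use:
# system(cmd)
```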

2. Download results to local computer

Open your workspace in a new tab or window. Navigate to the DATA tab -> Files tab and click the <PROJECT_NAME>.zip file to download a zip archive with all PTM-SEA outputs.