# Protocol for analysis of labeled proteomics data
We recommend the following structure for describing the protocol and its example workflow. Below you can find an overview of required features of the protocol and its description and rules for best practices.

This is the link to the github repository of the protocol 
TODO add version!
https://github.com/ProtProtocols/biocontainer-jupyter

Link to docker image:



## Abstract
Provide a short description of the software protocol including broader context, functionality, use case and purpose. 


## Maintainer
Provide details about the protocol maintainer (e.g. email address and/or github username)

## Software
Specify links for documentation and tutorials of used software, source code, publications and use cases. Detail versions of each used software. Alternatively, provide links to the software descriptions in https://bio.tools where this information is available.

## Diagram
Provide a simple diagram of functionality of the workflow/software. We recommend using controlled vocabularies for input/output data types and file formats as well as provided operation of the tool(s). You can use http://edamontology.org terms for the description.

__TODO: example__

## System requirements
Fill in the following items:
Required hard disk space for docker image, input and output files: 

Required memory: 

Recommmended number of threads: 

## Example 
Presentation of well-documented instructions and commands to run the example use case. Depending on the use case and the software, provide link(s) to open the web service incorportated in the Docker image (e.g. 0.0.0.0:8080), bash commands to run programs from the command line and additional code for e.g. checking and visualizing the (intermediate) results. 

Instead of providing the instructions in this notebook, one can also provide a link to a notebook containing the example use case.

## More general use case (optional)
Provide link to notebook with a generalized use case that easily can be adapted to e.g. process different input data and concurrent parametrization.




# Example


Specify parameters for database search and evaluation of identified peptide-spectrum matches:

In [1]:
%load_ext rpy2.ipython



ModuleNotFoundError: No module named 'rpy2'

In [1]:
import ipywidgets as widgets
from ipywidgets import VBox, Label

w = widgets.IntSlider(min=-10,max=30,step=1,value=20)
w2 = widgets.BoundedFloatText(min=0,max=200,value=0.05)
w3 = widgets.Text("IN/sp_human.fasta")
# TODO  needs table to describe labeling formats
w4 = widgets.Dropdown(options={'TMT10': 'TMT 10-plex of K,TMT 10-plex of peptide N-term',
                               'TMT6': 'TMT 6-plex of K,TMT 6-plex of peptide N-term',
                               'iTRAQ4': 'iTRAQ 4-plex of K,iTRAQ 4-plex of Y,iTRAQ 4-plex of peptide N-term',
                               'iTRAQ8 (fixed)': 'iTRAQ 8-plex of K, iTRAQ 8-plex of peptide N-term',
                               'iTRAQ8 (variable)': 'iTRAQ 8-plex of Y'})
w5= widgets.IntSlider(min=0,max=10,step=1,value=1)
w6 = widgets.Dropdown(options=["Carbamidomethylation of C","None"])
w7 = widgets.Dropdown(options=["None","Oxidation of M","Phosphorylation of STY"])
w8 = widgets.Text("IN/")

ww = widgets.Checkbox(description="Decoy")

display(VBox([Label('Precursor tolerance (ppm):'), w, 
              Label('Fragment ion tolerance (da):'),w2,
             Label('Fasta file (database):'),w3,
             Label('Quantification method:'),w4,
             Label('Number of miscleavages;'),w5,
             Label('Further fixed modifications'),w6,
             Label('Further variable modifications'),w7,
             Label('Folder for spectra files (files need to be mgf)'),w8]))

#TODO set names of samples and replicates (peptideshaker)


In [40]:
%%bash -s "$w.value" "$w2.value" "$w3.value" "$w4.value" "$w5.value" "$w6.value" "$w7.value" "$w7.value"
function check_error {
    RETURN_CODE="$1"
    MSG="$2"

    if [ "${RETURN_CODE}" != "0" ]; then
        echo "Error: $MSG"
        exit 1
    fi
}
cd OUT
java -cp /home/biodocker/bin/SearchGUI-*/SearchGUI-*.jar eu.isas.searchgui.cmd.FastaCLI -in "../$3" -decoy

check_error $? "Failed to create decoy database"

DECOY_FASTA="../${3%.*}_concatenated_target_decoy.fasta"
echo $DECOY_FASTA

if [ ! -e ${DECOY_FASTA} ]; then
    echo "Failed to create decoy database"
    exit 1
fi

VAR_MODS=""
if [ $7 != "None"]; then
    VAR_MODS="-variable_mods $7"
fi

# ---- create parameter file for SearchGUI ----
java -cp /home/biodocker/bin/SearchGUI-*/SearchGUI-*.jar \
eu.isas.searchgui.cmd.IdentificationParametersCLI -prec_tol $1 -frag_tol $2 \
-fixed_mods "$4,$6"  $VAR_MODS  -db "${DECOY_FASTA}" -out search.par -mc $5

check_error $? "Failed to create parameter file"

if [ ! -e "search.par" ]; then
    echo "Failed to create search parameters"
    exit 1
fi



Input: /home/biodocker/OUT/../IN/sp_human.fasta

Name: sp_human
Version: 21.12.2017
Decoy Tag: null
Type: UniProt
Last modified: Thu Dec 21 15:38:04 UTC 2017
Size: 20243 sequences

Tue Jan 16 17:39:48 UTC 2018 Appending Decoy Sequences. Please Wait...
10% 20% 30% 40% 50% 60% 70% 80% 90% 100%Reindexing: sp_human_concatenated_target_decoy.fasta.

10% 20% 30% 40% 50% 60% 70% 80% 90%Decoy file successfully created: 

Output: /home/biodocker/OUT/../IN/sp_human_concatenated_target_decoy.fasta

Name: sp_human_concatenated_target_decoy
Version: 16.1.2018
Decoy Tag: REVERSED
Type: UniProt
Last modified: Tue Jan 16 17:39:50 UTC 2018
Size: 40486 sequences (20243 target)

../IN/sp_human_concatenated_target_decoy.fasta

Identification parameters file created: /home/biodocker/OUT/search.par



Reindexing: sp_human_concatenated_target_decoy.fasta. (changes in the file detected)
bash: line 24: [: missing `]'


In [37]:
%%bash -s "$w8.value"
function check_error {
    RETURN_CODE="$1"
    MSG="$2"

    if [ "${RETURN_CODE}" != "0" ]; then
        echo "Error: $MSG"
        exit 1
    fi
}
FILE_LIST=$(ls $1)
echo $FILE_LIST
cd OUT

# Run Search
java -cp /home/biodocker/bin/SearchGUI-*/SearchGUI-*.jar eu.isas.searchgui.cmd.SearchCLI \
-spectrum_files ../$1  -output_folder ./  -id_params search.par -xtandem 0 -msgf 1 \
-comet 0 -ms_amanda 0 -myrimatch 0 -andromeda 0 -omssa 0 -tide 0
check_error $? "Search failed."


IN/test.mgf
Tue Jan 16 17:32:03 UTC 2018 Validating MGF file: /home/biodocker/OUT/../IN/test.mgf

Tue Jan 16 17:32:04 UTC 2018 Indexing spectrum files.
Tue Jan 16 17:32:04 UTC 2018 Extracting search settings.



Processing: test.mgf (1/1)


ms-gf+ command: 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar /home/biodocker/bin/SearchGUI-2.8.6/resources/MS-GF+/MSGFPlus.jar -s /home/biodocker/OUT/../IN/test.mgf -d /home/biodocker/OUT/../IN/sp_human_concatenated_target_decoy.fasta -o /home/biodocker/OUT/./.SearchGUI_temp/test.msgf.mzid -t 20.0ppm -tda 0 -mod /home/biodocker/bin/SearchGUI-2.8.6/resources/MS-GF+/params/Mods.txt -minCharge 2 -maxCharge 4 -inst 3 -thread 4 -m 3 -e 1 -ntt 2 -protocol 0 -minLength 8 -maxLength 30 -n 10 -addFeatures 0 -ti 0,1 

Tue Jan 16 17:32:04 UTC 2018 Processing test.mgf with MS-GF+.

MS-GF+ Beta (v10282) (12/19/2014)
Loading database files...
Loading database finished (elapsed time: 4.76 sec)
Reading spectra...
Ignoring 0 profile spectra.
Ignoring 0 spect

In [None]:
#TODO, some visualizations + numbers (e.g. number of spectra, ...)

In [48]:
%%bash -s "$w8.value"
function check_error {
    RETURN_CODE="$1"
    MSG="$2"

    if [ "${RETURN_CODE}" != "0" ]; then
        echo "Error: $MSG"
        exit 1
    fi
}

cd OUT
ls

# ---- PeptideShaker ----
java -Xmx4G  -cp /home/biodocker/bin/PeptideShaker-*/PeptideShaker-*.jar \
eu.isas.peptideshaker.cmd.PeptideShakerCLI -experiment experiment1 \
-sample test -replicate 1 -identification_files './'  -out ./experiment.cpsx \
-id_params search.par -spectrum_files  "../$1"

check_error $? "Failed to run PeptideShaker"

java -cp /home/biodocker/bin/PeptideShaker-*/PeptideShaker-*.jar \
eu.isas.peptideshaker.cmd.ReportCLI -in "experiment.cpsx" -out_reports "./" -reports "8"


PeptideShaker Report experiment.cpsx 2018-01-16 17.52.32.html
SearchGUI Report 2018-01-16 15.29.49.html
SearchGUI Report 2018-01-16 17.32.57.html
derby.log
experiment.cpsx
resources
search.par
searchgui_out.zip
Tue Jan 16 18:03:27 UTC 2018 Unzipping searchgui_out.zip.
10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Tue Jan 16 18:03:27 UTC 2018 Import process for experiment1 (Sample: test, Replicate: 1)

Tue Jan 16 18:03:27 UTC 2018 Importing sequences from sp_human_concatenated_target_decoy.fasta.
Tue Jan 16 18:03:29 UTC 2018 FASTA file import completed.
Tue Jan 16 18:03:29 UTC 2018 Importing gene mappings.
Tue Jan 16 18:03:31 UTC 2018 Establishing local database connection.
Tue Jan 16 18:03:36 UTC 2018 Reading identification files.
Tue Jan 16 18:03:36 UTC 2018 Parsing test.msgf.mzid.
10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Tue Jan 16 18:03:36 UTC 2018 Loading spectra for test.msgf.mzid.
Tue Jan 16 18:03:36 UTC 2018 Importing test.mgf
Tue Jan 16 18:03:36 UTC 2018 test.mgf imported.
10% 20% 

In [6]:
%%R require(isobar)

# TODO load parameters "%%R -i parname"
# process the input files
max.fdr <- 0.01
quant.method <- "TMT10plexSpectra"
class.labels <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
args <- commandArgs(trailingOnly = TRUE)

ident.file <- "OUT/experiment1_test_1_Extended_PSM_Report.txt"
mgf.files <- system("ls IN/*.mgf", intern=T)

if (!file.exists(ident.file)) {
    stop("Error: Cannot find identification file ", ident.file)
}
for (mgf.file in mgf.files) {
    if (!file.exists(mgf.file)) {
        stop("Error: Cannot find MGF file ", mgf.file)
    }
}

# Convert SearchGUI output to isobar format
psms <- read.csv(ident.file, sep = "\t")

if (! "Decoy" %in% names(psms)) {
    stop("Error: No decoy information available in output file")
}

print(paste("Loaded",nrow(psms), "PSMs"))
message("Loaded ", nrow(psms), " PSMs")

# ---- Confidence filter ----
psms <- psms[order(psms[, "Confidence...."], decreasing = T), ]
decoy.psms <- which(psms[, "Decoy"] == "1")

decoy.count <- 0

for (decoy.index in decoy.psms) {
    decoy.count <- decoy.count + 1
    target.count <- decoy.index - decoy.count

    cur.fdr <- (decoy.count * 2) / (decoy.count + target.count)

    if (cur.fdr > max.fdr) {
        # filter
        psms <- psms[1:decoy.index - 1,]
        break
    }
}

print(head(psms))

message("Filtered ", nrow(psms), " PSMs @ ", max.fdr, " FDR")

# ---- convert to isobar output ----
cols.to.save <- c("Protein.s.", "Sequence", "Spectrum.Title", "Variable.Modifications", "Confidence....", "D.score", "Validation", "Precursor.m.z.Error..ppm.", "Spectrum.File")

if (!all(cols.to.save %in% colnames(psms))) {
    stop("Error: Unexpected result format")
}

psms <- psms[, cols.to.save]
colnames(psms) <- c("accession", "peptide", "spectrum", "var_mod", "pepscore", "dscore", "validation", 
"precursor.mz.error.ppm", "file")

# TODO: add modif...
psms$modif <- ""

write.table(psms, file = "t.corr.csv", sep = "\t", row.names = F, quote = F)

# ---- isobar workflow ----
ib <- readIBSpectra(quant.method, "t.corr.csv", mgf.files, decode.titles = T)

ib <- correctIsotopeImpurities(ib)
ib <- normalize(ib)



[1] "Loaded 446 PSMs"
      X Protein.s.  Sequence Variable.Modifications
25   25     Q9BQE5  HDKDQQHR                     NA
77   77     Q9BQE5  HDKDQQHR                     NA
131 131     Q8N3Z6  SSHYHTSR                     NA
280 280     P25942  ETHCHQHK                     NA
366 366     O60885 RQEQQQQQR                     NA
413 413     P46776 GHVSHGHGR                     NA
                                                                    Fixed.Modifications
25                                TMT 10-plex of K(3), TMT 10-plex of peptide N-term(1)
77                                TMT 10-plex of K(3), TMT 10-plex of peptide N-term(1)
131                                                    TMT 10-plex of peptide N-term(1)
280 Carbamidomethylation of C(4), TMT 10-plex of K(8), TMT 10-plex of peptide N-term(1)
366                                                    TMT 10-plex of peptide N-term(1)
413                                                    TMT 10-plex of peptide N-term(1




  could not find function "readIBSpectra"


 

