MSstats label-free preprocessing

This repo contains R scripts and R Markdown (Rmd) files for pre-processing and normalizing MaxQuant or DIA-NN output files with the MSstats R package. The output is a tabular file in wide format (one row per protein, one column per sample/condition) that can be used as input for statistical analysis with limma or similar tools.
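To illustrate what "wide format" means here, below is a minimal base-R sketch of the long-to-wide step. It assumes a protein-level table with columns `Protein`, `originalRUN` and `LogIntensities` (the layout of MSstats `dataProcess()` protein-level output). The repo's scripts may select or name columns differently, and the helper `to_wide()` is hypothetical.

```r
# Hypothetical helper: reshape long protein-level data to wide format
# (one row per protein, one column per run). Assumes columns named
# Protein, originalRUN and LogIntensities.
to_wide <- function(long_df) {
  # tapply() builds a Protein x run matrix; pairs with no value become NA.
  # mean() only matters if a Protein/run pair occurs more than once.
  wide <- tapply(long_df$LogIntensities,
                 list(long_df$Protein, long_df$originalRUN),
                 FUN = mean)
  # Move protein IDs from row names into their own column
  data.frame(Protein = rownames(wide), wide,
             row.names = NULL, check.names = FALSE)
}

# Toy example: two proteins, three runs, P2 not quantified in S3
long_df <- data.frame(
  Protein = c("P1", "P1", "P1", "P2", "P2"),
  originalRUN = c("S1", "S2", "S3", "S1", "S2"),
  LogIntensities = c(20.1, 20.5, 19.8, 25.0, 24.7)
)
wide_df <- to_wide(long_df)
print(wide_df)
```

Missing protein/run combinations stay as NA rather than being imputed, which leaves that decision to the downstream limma analysis.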

Instructions for using the script for your normalization step starting from MaxQuant outputs

Download/clone the contents of this repo to your local computer. This should create an R project folder containing the scripts to run the preprocessing.
Delete the MSstats_Output_data/ folder and its contents from your local computer.
Add the following three files to the project folder:

evidence.txt
proteinGroups.txt
annotation.csv (not included in the MaxQuant txt folder; see "Creating the annotation file" below for how to create it before running the script).

NOTE: These files should be in the same folder as the R script, and this folder should be an initialized RStudio project (there should be a .Rproj file in the same folder).
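Before sourcing the script, it can help to confirm that the inputs are where the script expects them. A minimal sketch; the helper `missing_inputs()` is hypothetical and only checks that the files exist, not that their contents are valid.

```r
# Hypothetical helper: report which required input files are missing
# from the project folder before sourcing the main script.
missing_inputs <- function(dir = ".",
                           files = c("evidence.txt", "proteinGroups.txt",
                                     "annotation.csv")) {
  files[!file.exists(file.path(dir, files))]
}

# Usage from the RStudio console (opening the .Rproj file sets the
# working directory to the project folder):
miss <- missing_inputs()
if (length(miss) > 0) {
  message("Missing input files: ", paste(miss, collapse = ", "))
} else {
  message("All MaxQuant inputs found.")
}
```

For the DIA-NN workflow, the same check can be run with `files = c("MainOutput.tsv", "annotation_diann.csv")`.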

Open your RStudio project by double-clicking the .Rproj file in your newly created R project folder.
Open the script mq_to_msstats_formating_normalization_n_prep_for_limma.R
Modify lines 16 to 31 to set the parameters for both the conversion from MaxQuant format to MSstats format and the actual summarization and normalization.
Execute the script (click ‘Source’ in the top-right corner of the script editor).
The script should generate three .csv files: msstats_tabular_data_for_limma_input.csv, in wide format and suitable for downstream analysis with limma, and two long-format files within MSstats_Output_data containing the un-normalized and normalized feature intensities (before and after MSstats pre-processing).
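For orientation, the core of a MaxQuant-to-MSstats run looks roughly like the sketch below, built on the MSstats functions `MaxQtoMSstatsFormat()` and `dataProcess()`. The argument values shown (median normalization, TMP summarization) are common MSstats choices used for illustration, not necessarily the settings in lines 16 to 31 of the repo's script, and the wrapper `run_mq_pipeline()` is hypothetical. Calling it requires MSstats to be installed; check argument names against your installed version.

```r
# Hypothetical wrapper around the MSstats MaxQuant workflow.
# Calling it requires the Bioconductor package MSstats.
run_mq_pipeline <- function(dir = ".") {
  # quote = "" and comment.char = "" avoid mis-parsing protein names
  # that contain apostrophes or '#'
  evidence <- read.table(file.path(dir, "evidence.txt"), sep = "\t",
                         header = TRUE, quote = "", comment.char = "")
  protein_groups <- read.table(file.path(dir, "proteinGroups.txt"), sep = "\t",
                               header = TRUE, quote = "", comment.char = "")
  annotation <- read.csv(file.path(dir, "annotation.csv"))

  # Convert MaxQuant output to the MSstats long format
  msstats_input <- MSstats::MaxQtoMSstatsFormat(
    evidence = evidence,
    annotation = annotation,
    proteinGroups = protein_groups
  )

  # Log-transform, normalize and summarize features to protein level
  MSstats::dataProcess(
    msstats_input,
    normalization = "equalizeMedians",
    summaryMethod = "TMP",
    censoredInt = "NA",
    MBimpute = TRUE
  )
}
```

In MSstats 4.x, the list returned by `dataProcess()` includes `ProteinLevelData` (protein-level summaries) and `FeatureLevelData` (feature-level intensities).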

Instructions for using the script for your normalization step starting from DIA-NN outputs

BE AWARE: There is a known issue with the dataProcess function from MSstats that makes it use a lot of RAM with large input files (> 1 million rows). If you have a large DIA-NN output and your R session crashes due to RAM overload, you can execute the DIA-NN script up to line 105, take the MSstats-formatted data from ~/MSstats_Output_data/MSstats_formated_tables/msstas_formated_diann_data_bf_normalization.csv and continue on Galaxy, where RAM shouldn’t be an issue.
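The workaround above amounts to saving the MSstats-formatted table before the memory-hungry `dataProcess()` call. A minimal sketch, assuming the formatted data is already in a data frame; the helper `save_before_processing()` and its `max_rows` threshold are hypothetical, and the example path simply mirrors the one named above.

```r
# Hypothetical helper: write the MSstats-formatted (pre-normalization)
# table to CSV and report whether it exceeds the row count at which
# dataProcess() tends to exhaust RAM locally.
save_before_processing <- function(msstats_input, path, max_rows = 1e6) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  write.csv(msstats_input, path, row.names = FALSE)
  too_big <- nrow(msstats_input) > max_rows
  if (too_big) {
    message(nrow(msstats_input), " rows: consider running dataProcess() ",
            "on Galaxy with ", path)
  }
  too_big
}

# Usage (path relative to the project folder):
# save_before_processing(
#   msstats_input,
#   "MSstats_Output_data/MSstats_formated_tables/msstas_formated_diann_data_bf_normalization.csv"
# )
```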

Download/clone the contents of this repo to your local computer. This should create an R project folder containing the scripts to run the preprocessing.
Delete the MSstats_Output_data/ folder and its contents from your local computer.
Add the MainOutput.tsv output file from DIA-NN to the project folder.
Add your annotation_diann.csv file to the project folder.

NOTE: These files should be in the same folder as the R script, and this folder should be an initialized RStudio project (there should be a .Rproj file in the same folder).

NOTE 2: See the samples folder for an example of the annotation_diann.csv file and its expected layout.

Open your RStudio project by double-clicking the .Rproj file in your newly created R project folder.
Open the script diann_to_msstats_formating_normalization_n_prep_for_limma.R
Modify lines 16 to 21 to set the parameters for both the conversion from DIA-NN format to MSstats format and the actual summarization and normalization.
Execute the script (click ‘Source’ in the top-right corner of the script editor).
The script should generate three .csv files: msstats_tabular_data_for_limma_input.csv, in wide format and suitable for downstream analysis with limma, and two long-format files within MSstats_Output_data containing the un-normalized and normalized feature intensities (before and after MSstats pre-processing).
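The DIA-NN route differs mainly in the conversion function. A rough sketch, assuming an MSstats version that provides `DIANNtoMSstatsFormat()` with `input` and `annotation` as its first arguments; the wrapper `run_diann_pipeline()` and the `dataProcess()` settings are illustrative, not the repo's configuration.

```r
# Hypothetical wrapper around the MSstats DIA-NN workflow.
# Calling it requires an MSstats version that provides DIANNtoMSstatsFormat().
run_diann_pipeline <- function(dir = ".") {
  diann_report <- read.delim(file.path(dir, "MainOutput.tsv"))
  annotation <- read.csv(file.path(dir, "annotation_diann.csv"))

  # Convert the DIA-NN report to the MSstats long format
  msstats_input <- MSstats::DIANNtoMSstatsFormat(diann_report,
                                                 annotation = annotation)

  # This is the step that can exhaust RAM for very large reports
  MSstats::dataProcess(
    msstats_input,
    normalization = "equalizeMedians",
    summaryMethod = "TMP"
  )
}
```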

Creating the annotation file

You have two options to create your annotation file:

Use the create_annotation_file.R script written for this purpose (RECOMMENDED). NOTE: Currently, the script only works for label-free samples in which every sample corresponds to a different biological replicate. Otherwise, create the file manually.
Manually create your annotation.csv file in a spreadsheet editor (such as MS Excel).

Using the `create_annotation_file.R` ‘interactive’ script

Check that create_annotation_file.R is in your R project folder.
Go to the Console of your open RStudio project session.
Type source("create_annotation_file.R") and press Enter.
Answer the questions as prompted in the Console.
Important: open the newly created annotation.csv file (it should be in your R project folder) and check that each sample name/code is assigned to the intended experimental condition.
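Alongside opening the file, a few basic checks can be run from the console. A minimal sketch; `check_annotation()` is a hypothetical helper that tests only the column names described in this README, unique Raw.file values, and label-free `L` labels, then returns the number of samples per condition.

```r
# Hypothetical helper: basic sanity checks on a label-free annotation table.
check_annotation <- function(annot) {
  required <- c("Raw.file", "Condition", "BioReplicate", "IsotopeLabelType")
  missing_cols <- setdiff(required, colnames(annot))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(annot$Raw.file)) {
    stop("Each Raw.file should appear only once")
  }
  if (!all(annot$IsotopeLabelType == "L")) {
    stop("IsotopeLabelType should be 'L' for label-free data")
  }
  # Samples per condition, for a quick visual check
  table(annot$Condition)
}

# Usage from the project folder:
# check_annotation(read.csv("annotation.csv"))
```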

Manually create your `annotation.csv` file in a spreadsheet editor

Open a new spreadsheet (e.g. in MS Excel).
The first row should be your column names as follows: “Raw.file”, “Condition”, “BioReplicate”, “IsotopeLabelType”
Add one row per sample with the following information:

For Raw.file: give the name of your Thermo RAW file as it was named when processed by MaxQuant.
For Condition: give the Experimental or Biological condition of the sample.
For BioReplicate: give the number of the biological replicate associated with this sample. If every sample came from a different biological source, give each sample a different (arbitrary) number.
For IsotopeLabelType: Type of labelling. Since in this case we are working with label-free quantification, set all rows in this column to ‘L’.
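As an alternative to a spreadsheet, the same four-column table can be written directly from R. A minimal sketch; the raw file names and conditions below are placeholders, not values from this repo, and must be replaced with the Raw.file values from your own MaxQuant run.

```r
# Sketch: build a label-free annotation table for MaxQuant input.
# Raw.file values and conditions are placeholders for illustration.
annotation <- data.frame(
  Raw.file = c("sample_01", "sample_02", "sample_03", "sample_04"),
  Condition = c("Control", "Control", "Treated", "Treated"),
  BioReplicate = 1:4,      # every sample from a different biological source
  IsotopeLabelType = "L"   # label-free: 'L' in every row
)

# Write to the project folder without R's row-name column
write.csv(annotation, "annotation.csv", row.names = FALSE)
```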
