Image adapted from https://doi.org/10.1038/nature01511
This notebook outlines the essential steps in the process of analyzing proteomics data and recommends commonly used tools and techniques for this purpose. It assumes a simple experimental design for differential abundance including two experimental conditions such as cancer vs normal. The training data provided utilized TMT10plex multiplex design with MS3 data acquistion. This notebook describes mass spectrometry and statistical terminology for data preprocessing, normalization, and differential abundance analysis. Note: This notebook uses simple base R plots. These can be modified to learn how to build better publication quality plots using R.
This tutorial was designed to be used on cloud computing platforms, with the aim of requiring nothing but the files within this github repository. The Jupyter Notebook file can run on Google Cloud Platform, Amazon Web Service, and Microsoft Azure provided the R packages are installed. The Notebook can be launched using NIH STRIDES training module and therfore requirements should only require access NIH STRIDES resources.
Text and materials are licensed under a Creative Commons CC-BY-NC-SA license. The license allows you to copy, remix and redistribute any of our publicly available materials, under the condition that you attribute the work (details in the license) and do not make profits from it. More information is available here.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
This tutorial will cost you just less than $1.00 assuming a n1-standard-4 machine, and assuming you delete the virtual machine after you finish the tutorial.
Follow the steps highlighted here to create a new user-managed notebook in Vertex AI. Follow steps 1-8 and be especially careful to enable idle shutdown as highlighted in step 7. For this module you should select Debian 10 and R 4.2 in the Environment tab in step 5. In step 6 in the Machine type tab, select n1-standard-4 from the dropdown box.
To clone this repository, open a Terminal window in your new instance and type git clone https://github.com/NIGMS/Proteome-Quantification.git
This will create a directory called Proteome-Quantification. Navigate into that directory and open the tutorial notebooks to get started.
- Database search using Mascot, MaxQuant, or Prosit/EncylopeDIA. The example TMT data was searched using MS3 in MaxQuant.
- Assess the sample variance, biological replicate correlation, and data distributions using ProtieNorm (Graw et al 2021).
- Perfom data normalization using the method with the lowest variance and highest intra-group correlation. For the majority of cases, VSN and Cyclic Loess have performed well.
- Plot quality control figures such as PCA and clustered dendrograms to check for outlier samples. These plots will give an indication of the effect size in the data. How many proteins do we expect to be differenitally expressed?
- Set up the limma model and run analysis. The model should consider factors such as batch, sex, age, if the samples are paired, etc.
- Plot the results using Volcano and/or MD plots.
Funded by National Resource for Quantitative Proteomics NIH/NIGHMS R24GM137786.