Google Cloud Proteomics Training Tutorial

Image adapted from https://doi.org/10.1038/nature01511

Proteomics Data Analysis Overview

This notebook outlines the essential steps in analyzing proteomics data and recommends commonly used tools and techniques. It assumes a simple experimental design for differential abundance with two experimental conditions, such as cancer vs. normal. The provided training data used a TMT10plex multiplexed design with MS3 data acquisition. The notebook explains the mass spectrometry and statistical terminology used in data preprocessing, normalization, and differential abundance analysis. Note: This notebook uses simple base R plots. These can be modified to learn how to build publication-quality plots in R.
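
As a rough illustration of what differential abundance means in a two-condition design, the sketch below log2-transforms the intensities of a single protein and computes a log2 fold change and a Welch t statistic. It is written in plain Python rather than the R used in the notebook, the intensities are invented, and the function names `log2_fold_change` and `welch_t` are hypothetical. The notebook relies on limma, whose moderated statistics are not reproduced here.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """Mean log2 intensity of group_a minus mean log2 intensity of group_b."""
    log_a = [math.log2(x) for x in group_a]
    log_b = [math.log2(x) for x in group_b]
    return mean(log_a) - mean(log_b)


def welch_t(group_a, group_b):
    """Welch's t statistic on log2 intensities (unequal variances allowed)."""
    log_a = [math.log2(x) for x in group_a]
    log_b = [math.log2(x) for x in group_b]
    # Standard error of the difference between the two group means.
    se = math.sqrt(variance(log_a) / len(log_a) + variance(log_b) / len(log_b))
    return (mean(log_a) - mean(log_b)) / se


# Hypothetical reporter-ion intensities for one protein (not from the training data).
cancer = [4096.0, 8192.0, 4096.0]
normal = [1024.0, 2048.0, 1024.0]

lfc = log2_fold_change(cancer, normal)
t_stat = welch_t(cancer, normal)
print(f"log2 fold change: {lfc:.2f}")
print(f"Welch t statistic: {t_stat:.2f}")
```

A log2 fold change of 2 corresponds to a fourfold higher average abundance in the first group. Working on the log2 scale keeps up- and down-regulation symmetric around zero.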

Requirements

This tutorial was designed to be used on cloud computing platforms, with the aim of requiring nothing but the files within this GitHub repository. The Jupyter Notebook file can run on Google Cloud Platform, Amazon Web Services, and Microsoft Azure, provided the R packages are installed. The notebook can be launched using the NIH STRIDES training module, so the only requirement should be access to NIH STRIDES resources.

License for Data

Text and materials are licensed under a Creative Commons CC-BY-NC-SA license. The license allows you to copy, remix and redistribute any of our publicly available materials, under the condition that you attribute the work (details in the license) and do not make profits from it. More information is available here.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

This tutorial should cost just under $1.00, assuming you use an n1-standard-4 machine and delete the virtual machine after you finish the tutorial.

Getting Started

Creating a user managed notebook

Follow the steps highlighted here to create a new user-managed notebook in Vertex AI. Follow steps 1-8 and be especially careful to enable idle shutdown as highlighted in step 7. For this module you should select Debian 10 and R 4.2 in the Environment tab in step 5. In step 6 in the Machine type tab, select n1-standard-4 from the dropdown box.

To clone this repository, open a Terminal window in your new instance and run:

git clone https://github.com/NIGMS/Proteome-Quantification.git

This creates a directory called Proteome-Quantification. Navigate into that directory and open the tutorial notebooks to get started.

Basic Steps

Run a database search using Mascot, MaxQuant, or Prosit/EncyclopeDIA. The example TMT data was searched using MS3 in MaxQuant.
Assess the sample variance, biological replicate correlation, and data distributions using ProteiNorm (Graw et al. 2021).
Perform data normalization using the method with the lowest variance and highest intra-group correlation. In the majority of cases, VSN and Cyclic Loess have performed well.
Plot quality control figures such as PCA and clustered dendrograms to check for outlier samples. These plots also give an indication of the effect size in the data: how many proteins do we expect to be differentially expressed?
Set up the limma model and run the analysis. The model should consider factors such as batch, sex, age, and whether the samples are paired.
Plot the results using Volcano and/or MD plots.
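
The normalization step above selects a method by comparing within-group variability. The sketch below illustrates only the "lowest variance" part of that criterion, and uses simple median centering as a stand-in for VSN or Cyclic Loess. It is plain Python rather than the notebook's R, and it does not use ProteiNorm. The sample names, intensities, and function names (`log2_matrix`, `median_normalize`, `pooled_intragroup_variance`) are all hypothetical.

```python
import math
from statistics import mean, median, variance


def log2_matrix(samples):
    """Log2-transform intensities; samples maps sample name -> list of protein intensities."""
    return {name: [math.log2(v) for v in values] for name, values in samples.items()}


def median_normalize(log_samples):
    """Shift each sample so its median equals the average of all sample medians."""
    medians = {name: median(values) for name, values in log_samples.items()}
    target = mean(list(medians.values()))
    return {
        name: [v - medians[name] + target for v in values]
        for name, values in log_samples.items()
    }


def pooled_intragroup_variance(log_samples, groups):
    """Average per-protein variance among the samples of each group."""
    per_protein = []
    for members in groups.values():
        n_proteins = len(log_samples[members[0]])
        for i in range(n_proteins):
            per_protein.append(variance([log_samples[m][i] for m in members]))
    return mean(per_protein)


# Invented intensities for five proteins; c2 and n2 carry a twofold loading offset.
samples = {
    "c1": [100.0, 200.0, 400.0, 800.0, 1600.0],
    "c2": [200.0, 400.0, 800.0, 1600.0, 3200.0],
    "n1": [400.0, 200.0, 400.0, 800.0, 400.0],
    "n2": [800.0, 400.0, 800.0, 1600.0, 800.0],
}
groups = {"cancer": ["c1", "c2"], "normal": ["n1", "n2"]}

raw = log2_matrix(samples)
normalized = median_normalize(raw)

var_raw = pooled_intragroup_variance(raw, groups)
var_norm = pooled_intragroup_variance(normalized, groups)
print(f"pooled intra-group variance, raw:        {var_raw:.3f}")
print(f"pooled intra-group variance, normalized: {var_norm:.3f}")
```

In this toy data the only difference between replicates is the loading offset, so median centering removes nearly all of the within-group variance. Real data would also need the intra-group correlation compared across methods, which this sketch omits.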

Funding

Funded by the National Resource for Quantitative Proteomics, NIH/NIGMS R24GM137786.
