Tools for the analysis of protein correlation profiling data
The goal is to create tools for handling data generated by protein correlation profiling (PCP).
At the moment, this entails two main tasks, described further down, plus several things that are difficult to categorize.
The DepLab package, developed by the Applied Bioinformatics Core at Weill Cornell Medicine, will serve as a starting point.
General data workflow for PCP data
The DepLab package contains a Shiny app that allows for:
- upload of PCP data into a database
- smoothing of the data
- visual exploration of individual protein profiles
More details can be found in the manual.
The Hackathon Shiny App can be found here.
It includes functions and examples for the following cool tasks:
- interactive graphics
- additional plots, e.g. histograms of QC values to allow for user-defined filtering [QC should definitely be part of the development]
- log files once a user saves a plot to reload the exact same settings in the future
- connection to STRING, the database of protein interactions
Identify proteins whose profiles change between two (or more) conditions (taking the variability based on replicates into account)
* some sort of ranking
* statistical significance?
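One possible starting point for such a ranking, sketched in plain Python for brevity (the logic ports directly to R): score each protein by the distance between its mean profiles in the two conditions, divided by the average spread of the replicates around their own condition mean. The function names, the toy data, and the score itself are assumptions made for illustration. This is not what DepLab does, and it is not a significance test.

```python
import math


def mean_profile(replicates):
    """Average several replicate profiles fraction by fraction."""
    n = len(replicates)
    return [sum(values) / n for values in zip(*replicates)]


def euclidean(p, q):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def change_score(cond_a, cond_b, eps=1e-9):
    """Between-condition distance relative to within-condition spread.

    cond_a and cond_b are lists of replicate profiles (hypothetical layout:
    one list of intensities per replicate, one value per fraction).
    """
    mean_a = mean_profile(cond_a)
    mean_b = mean_profile(cond_b)
    between = euclidean(mean_a, mean_b)
    spreads = [euclidean(r, mean_a) for r in cond_a]
    spreads += [euclidean(r, mean_b) for r in cond_b]
    within = sum(spreads) / len(spreads)
    # eps avoids division by zero when replicates are identical
    return between / (within + eps)


def rank_proteins(data):
    """Return (protein, score) pairs, largest change first."""
    scores = {name: change_score(a, b) for name, (a, b) in data.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: two replicates per condition, five fractions each.
toy = {
    "shifted": ([[0, 1, 5, 1, 0], [0, 2, 4, 1, 0]],
                [[0, 0, 1, 5, 1], [0, 0, 2, 4, 1]]),
    "stable": ([[1, 4, 1, 0, 0], [1, 5, 1, 0, 0]],
               [[1, 4, 2, 0, 0], [1, 5, 1, 0, 0]]),
}
ranking = rank_proteins(toy)
print(ranking)
```

In the toy data, the protein whose peak moves by one fraction ranks above the one whose profile stays put. A permutation test over replicate labels would be one way to attach a significance estimate to the same score.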
Identify proteins that co-elute or change in the same or different way(s), i.e.,
* that may be in the same complex
* that may change the complex membership depending on the condition
* ...
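A common first pass at co-elution is to correlate elution profiles pairwise and keep the pairs above a cutoff. The sketch below does this with a hand-written Pearson correlation in plain Python. The 0.9 threshold, the function names, and the toy profiles are assumptions chosen for illustration, not values taken from DepLab.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # A flat profile has no defined correlation; treat it as no evidence.
        return 0.0
    return cov / (sd_x * sd_y)


def coeluting_pairs(profiles, threshold=0.9):
    """Return (protein_a, protein_b, r) for all pairs with r >= threshold."""
    pairs = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        r = pearson(prof_a, prof_b)
        if r >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Toy profiles over six fractions (hypothetical protein names).
toy_profiles = {
    "P1": [0, 1, 6, 2, 0, 0],
    "P2": [0, 2, 12, 4, 0, 0],  # same shape as P1, twice the intensity
    "P3": [0, 0, 0, 1, 5, 1],   # peaks two fractions later
}
pairs = coeluting_pairs(toy_profiles)
print(pairs)
```

Running the same function once per condition and comparing the resulting pair lists gives a crude view of proteins that may have changed complex membership: a pair present in one condition and absent in the other is a candidate worth inspecting.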
An R package containing the example data that we are going to work with
Quality control, both visually and perhaps even cooking up some sort of score?
- per protein
- reproducibility between replicates
- how well are certain "gold-standard" complexes recovered?
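For the gold-standard question, one simple score is the fraction of within-complex protein pairs that also show up among the predicted co-eluting pairs. The sketch below, in plain Python, assumes a hypothetical input layout (a dict of complex name to member list, and a list of predicted pairs). The names and data are made up for illustration.

```python
from itertools import combinations


def complex_pairs(complexes):
    """All unordered within-complex protein pairs, as sorted tuples."""
    pairs = set()
    for members in complexes.values():
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs


def recovery_rate(predicted_pairs, complexes):
    """Fraction of gold-standard pairs found among the predicted pairs."""
    gold = complex_pairs(complexes)
    if not gold:
        return 0.0
    # Sort each predicted pair so that (A, B) and (B, A) count as the same pair.
    predicted = {tuple(sorted(pair)) for pair in predicted_pairs}
    return len(gold & predicted) / len(gold)


# Hypothetical gold-standard complexes and predicted co-eluting pairs.
gold_standard = {
    "complexA": ["P1", "P2", "P3"],
    "complexB": ["P4", "P5"],
}
predicted = [("P2", "P1"), ("P1", "P3"), ("P4", "P6")]
rate = recovery_rate(predicted, gold_standard)
print(rate)
```

This counts only recall of known pairs. It says nothing about how many predicted pairs are false positives, so it should be read alongside the total number of predictions.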
Updating the manual, making a proper vignette/tutorial (there should be one for every package at least)
Implementing proper tests for the functions, e.g. using Hadley's
Resources and references
- MaxQuant - the software we rely on to produce our primary data
- Brief description @ MPI website
- GUI user guide with some info about the output
- Andromeda paper
- How to interpret MQ output
- Nat. Methods MQ Practical Guide --> mostly Box 2
- Quantitative Proteome Profiling, Methods in Mol Biol --> Section 3.4 Data Analysis Using MaxQuant contains many details about the MQ output (ignore the SILAC details)
- Presentation with lots of MQ details
- Protein correlation profiling
- 15 minute interactive Git tutorial
- Longish DataCamp course for using Git via RStudio
- Brief summary of git lingo and commands with a focus on team work
Markdown cheat sheets
Making diagrams, flow charts etc.
Creating R packages
- Brief intro
- The very long and detailed R page (I used it mostly as a reference for the formatting of the documentation)
R packages that we will rely on
This is based on the packages currently used in DepLab. This is, of course, subject to change!
- R package creation and maintenance
- data wrangling