Retreats Spring2014 ProteinInference

White Paper on OpenMS Protein Inference

Problems

Currently there is no good protein inference tool available in OpenMS. A TPP_ProteinProphet adapter was written earlier and added to the TOPPAS/External section, but it has been reported as not working. Given the urgent demand for protein inference tools in OpenMS, one feasible solution would be to integrate existing tools.

Existing Tools

Fido: a C++ library with an MIT license. It might go into OpenMS/contrib. Fido models the probability that a sample peptide is generated from a protein containing it as a constant value.
ProteinProphet: the most widely used tool, which has served as the comparison baseline for most new algorithms. Using an EM-like algorithm, it learns a 'degenerate peptide weight'. This weight corresponds to the probability of one protein being present conditional on the presence of a given peptide.
PIA: Protein Inference Algorithms (PIA) combines PSMs from different experiments and/or search engines and reports consistent and thus comparable results on the PSM, peptide and protein level. The algorithm suite is written in Java and includes a fully parametrisable web interface. Supported formats: idXML, mzIdentML, mzTab, pepXML. KNIME nodes are also available.
MaxQuant: see Martens' book
ProteinLP 2012: A linear programming model with an optimization based only on joint probabilities (avoiding conditional probabilities). No code is available.
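
To make the idea of a degenerate peptide weight concrete, here is a minimal sketch in Python. It is not the ProteinProphet implementation. It assumes a simplified iteration in which each shared peptide is split among its parent proteins in proportion to the current protein probabilities, and it omits everything else the real tool does. The function name, the data layout, the starting value and the toy peptides are all hypothetical.

```python
def protein_probabilities(peptide_prob, parents, n_iter=50):
    """Toy ProteinProphet-style iteration; all names here are hypothetical.

    peptide_prob: peptide -> probability that the identification is correct
    parents:      peptide -> list of proteins containing that peptide
    """
    proteins = sorted({prot for prots in parents.values() for prot in prots})
    prob = {prot: 0.5 for prot in proteins}  # arbitrary starting value (assumption)
    weight = {}
    for _ in range(n_iter):
        # Weight step: share each peptide among its parent proteins in
        # proportion to the current protein probabilities.
        for pep, prots in parents.items():
            total = sum(prob[p] for p in prots)
            for p in prots:
                weight[(pep, p)] = prob[p] / total if total > 0 else 1.0 / len(prots)
        # Probability step: a protein is present unless every one of its
        # weighted peptide identifications is wrong.
        for prot in proteins:
            all_wrong = 1.0
            for pep, prots in parents.items():
                if prot in prots:
                    all_wrong *= 1.0 - weight[(pep, prot)] * peptide_prob[pep]
            prob[prot] = 1.0 - all_wrong
    return prob, weight


# Toy input: PEPTIDEA is unique to P1, PEPTIDEB is shared by P1 and P2.
peptide_prob = {"PEPTIDEA": 0.9, "PEPTIDEB": 0.8}
parents = {"PEPTIDEA": ["P1"], "PEPTIDEB": ["P1", "P2"]}
prob, weight = protein_probabilities(peptide_prob, parents)
```

In this toy case the shared peptide is assigned increasingly to P1, because P1 is also supported by a unique peptide, while P2 loses support with every iteration.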

Protein inference discussion (Hendrik, Lars, George, Canan, Xiao, Rabia, Jens)

Knut explains Xiao's project: we want to implement a Bayesian approach that takes into account not only peptide ID lists but also additional information from the MS1 map. He mentions PIA (Bochum).
Hendrik: we need a good protein inference tool in OpenMS now. He mentions Fido, which is a C++ library with an MIT license and could go into contrib.
Lars: mentions the Huang/He comparison paper (Knut adds Claassen's paper). He thinks ProteinProphet would be the standard to beat and nice to have. (T. Huang and Z. He, Bioinformatics vol. 28 no. 22 (2012) 2956)
George: Mayu from Claassen is more for FDR than for protein inference.
Jens: Proteotypicity has to be considered on the whole genome. Also, in the implementation it would be nice to have a container that encapsulates all evidence for a ProteinID so that other approaches could use it.
All: We discussed obtaining a possible "good" real data set. People should ask their experimental partners whether they could provide protein sets (antibodies? commercial proteins (from synthesized peptides)?).
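
The evidence container Jens suggests could look roughly like the following Python sketch. It is only an illustration of the idea, not an existing OpenMS class. All class names, fields and example values are hypothetical, and an actual implementation would presumably live in the C++ library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PeptideEvidence:
    """One peptide-level observation supporting a protein (hypothetical)."""
    sequence: str
    score: float                           # e.g. a PSM probability in [0, 1]
    search_engine: str = "unknown"
    ms1_intensity: Optional[float] = None  # optional MS1-level information


@dataclass
class ProteinEvidence:
    """Collects all evidence for one protein accession (hypothetical)."""
    accession: str
    peptides: List[PeptideEvidence] = field(default_factory=list)

    def add(self, evidence: PeptideEvidence) -> None:
        self.peptides.append(evidence)

    def distinct_sequences(self) -> List[str]:
        # Sorted list of the distinct peptide sequences seen so far.
        return sorted({e.sequence for e in self.peptides})

    def best_score(self, sequence: str) -> float:
        # Highest score among all observations of the given sequence.
        return max(e.score for e in self.peptides if e.sequence == sequence)


protein = ProteinEvidence("P1")
protein.add(PeptideEvidence("PEPTIDEA", 0.90, "engine_a"))
protein.add(PeptideEvidence("PEPTIDEA", 0.95, "engine_b", ms1_intensity=1.2e6))
protein.add(PeptideEvidence("PEPTIDEB", 0.80, "engine_a"))
```

Keeping the raw observations rather than a single aggregated score leaves the choice of inference method open, which is the point of the suggestion.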

Action items

Hendrik will look at possible integration of Fido into OpenMS contrib.
The evaluation pipeline will be coordinated between Freiburg (Lars) and Berlin (Xiao) using one dataset (ground truth dataset Sigma49) from Huang/He and a dataset from Freiburg. We will test PIA in KNIME and ProteinProphet (with pepXML generated in OpenMS) in Galaxy.
Look at what MaxQuant does; further discussion is needed.
Compare the tools mentioned above: PIA, Fido and ProteinProphet. The key is to find an appropriate way to compare the differences between the protein lists and the true positives.
Xiao is going to fix the compatibility problem of OpenMS pepXML with ProteinProphet.
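
One possible starting point for the comparison is sketched below in Python. It assumes each tool's output has already been reduced to a plain list of protein accessions and that a ground truth set is available, as with Sigma49. The function names and the accessions are made up for illustration, and this is not an agreed evaluation protocol.

```python
def evaluate(reported, truth):
    """Count true/false positives of one protein list against a ground truth."""
    reported, truth = set(reported), set(truth)
    tp = len(reported & truth)   # reported and really present
    fp = len(reported - truth)   # reported but not present
    fn = len(truth - reported)   # present but not reported
    precision = tp / (tp + fp) if reported else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


def jaccard(a, b):
    """Overlap of two protein lists, independent of any ground truth."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical accessions, for illustration only.
truth = {"P1", "P2", "P3", "P4"}
tool_a = ["P1", "P2", "P5"]
tool_b = ["P1", "P2", "P3", "P6"]

result_a = evaluate(tool_a, truth)
result_b = evaluate(tool_b, truth)
overlap = jaccard(tool_a, tool_b)
```

The set comparison ignores protein groups and score thresholds. Both would need to be settled before the numbers from different tools are truly comparable.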

Questions

How do we go about pepXML? (Fido, ProteinProphet, Jens (dnmso) still need it)
pepXML (generated by PeptideProphet) doesn't go through External/TPP_ProteinProphet in TOPPAS; there is a problem with absolute paths in pep.xml files.
pepXML (converted by IDFileConverter) cannot be processed by PeptideProphet (OpenMS-ms-general mailing list, Feb 12)

Misc

Oliver might get data of whole protein measurements that could be combined with different fragmentation technologies (ETD, HCD, CID).
Think about applying the consensus ID approach (possible MSc thesis).
Think about integrating quantitation information, when having a labelled experiment. (Peptides originating from same protein expected to show similar fold changes.)
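
The fold change idea can be illustrated with a small Python sketch. It assumes log2 fold changes per peptide are already available and flags peptides that deviate from the median of their assigned protein by more than a tolerance. The function name, the tolerance value and the example data are assumptions chosen for illustration.

```python
from statistics import median


def inconsistent_peptides(log2_fc, tolerance=1.0):
    """Flag peptides whose fold change disagrees with their protein.

    log2_fc: protein -> {peptide: log2 fold change}
    Returns protein -> sorted list of flagged peptides.
    """
    flagged = {}
    for protein, peptides in log2_fc.items():
        if not peptides:
            continue  # nothing to compare for this protein
        centre = median(peptides.values())
        outliers = sorted(
            pep for pep, fc in peptides.items() if abs(fc - centre) > tolerance
        )
        if outliers:
            flagged[protein] = outliers
    return flagged


# Hypothetical data: PEPTIDEC moves in the opposite direction to its siblings.
log2_fc = {
    "P1": {"PEPTIDEA": 1.0, "PEPTIDEB": 1.2, "PEPTIDEC": -1.5},
    "P2": {"PEPTIDED": 0.1, "PEPTIDEE": -0.1},
}
flagged = inconsistent_peptides(log2_fc)
```

A flagged peptide may indicate that it was assigned to the wrong protein, for example a shared peptide that actually originates from a different parent.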
