Retreats Spring2014 ProteinInference
liangoaix edited this page Apr 15, 2014
Currently there is no good protein inference tool available in OpenMS. A TPP_ProteinProphet adapter was written earlier and added to the TOPPAS External section, but it has been reported not to work. Given the urgent demand for protein inference tools in OpenMS, one feasible solution would be to integrate existing tools.
- Fido: a C++ library with MIT license. It might go into OpenMS/contrib. Fido models the probability that a peptide is generated by a present protein containing it as a constant value.
- ProteinProphet: the most widely used tool, and the one most new algorithms have been compared against. Using an EM-like algorithm, it learns a 'degenerate peptide weight', which corresponds to the probability of one protein being present conditional on the presence of a given peptide.
- PIA: Protein Inference Algorithms (PIA) combines PSMs from different experiments and/or search engines and reports consistent, and thus comparable, results on the PSM, peptide and protein level. The algorithm suite is written in Java and includes a fully parametrisable web interface; supported formats are idXML, mzIdentML, mzTab and pepXML. KNIME nodes are also available.
- MaxQuant: see Martens' book
- ProteinLP (2012): a linear programming model whose optimization is based only on joint probabilities (avoiding conditional probabilities). No code is available.
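The Fido and ProteinProphet models described above can be contrasted on a toy example. What follows is a minimal Python sketch on a hypothetical three-protein graph, where all protein and peptide names and all probabilities are made up. `fido_posteriors` enumerates every protein configuration under a Fido-style noisy-OR model. It uses a constant emission probability `alpha`, a noise probability `beta` and a protein prior `gamma`, and it treats each peptide as simply observed or not. `proteinprophet_like` iterates ProteinProphet-style degenerate peptide weights, with P(prot) = 1 - prod(1 - w * p). Both functions and their parameter values are assumptions made for illustration. The real Fido library works with peptide probabilities and uses graph transformations to stay tractable. ProteinProphet additionally applies NSP and charge-state adjustments. Neither is reproduced here.

```python
from itertools import product

# Hypothetical toy graph: protein -> peptides it contains (names are made up).
PROTEINS = {
    "P1": {"pepA", "pepB"},  # pepA is unique to P1
    "P2": {"pepB", "pepC"},  # pepB is shared (degenerate) by P1 and P2
    "P3": {"pepD"},          # pepD is unique to P3
}
# Hypothetical peptide probabilities, e.g. as produced by PeptideProphet.
PEP_PROBS = {"pepA": 0.95, "pepB": 0.9, "pepC": 0.2}


def fido_posteriors(proteins, observed, alpha=0.9, beta=0.01, gamma=0.5):
    """Exact protein posteriors under a Fido-style noisy-OR model.

    Each present protein emits each of its peptides with constant probability
    alpha, any peptide appears as noise with probability beta, and every
    protein has prior gamma. Peptides are treated as simply observed or not.
    Brute-force enumeration over 2^n configurations: toy graphs only.
    """
    names = sorted(proteins)
    peptides = set().union(*proteins.values())
    mass = dict.fromkeys(names, 0.0)  # unnormalized posterior mass per protein
    total = 0.0
    for states in product([False, True], repeat=len(names)):
        present = {n for n, on in zip(names, states) if on}
        joint = 1.0
        for n in names:  # protein priors
            joint *= gamma if n in present else 1.0 - gamma
        for pep in peptides:  # noisy-OR: k present parents can emit pep
            k = sum(1 for n in present if pep in proteins[n])
            p_emit = 1.0 - (1.0 - beta) * (1.0 - alpha) ** k
            joint *= p_emit if pep in observed else 1.0 - p_emit
        total += joint
        for n in present:
            mass[n] += joint
    return {n: mass[n] / total for n in names}


def proteinprophet_like(proteins, pep_probs, iterations=50):
    """Iterate P(prot) = 1 - prod(1 - w * p) over the protein's peptides, where
    w = P(prot) / sum of P(q) over all proteins q containing the peptide.
    Starting from equal probabilities gives initial weights 1/N."""
    probs = dict.fromkeys(proteins, 1.0)
    for _ in range(iterations):
        updated = {}
        for prot, peps in proteins.items():
            miss = 1.0  # product of (1 - w * p) over this protein's peptides
            for pep in peps:
                owners = [q for q in proteins if pep in proteins[q]]
                denom = sum(probs[q] for q in owners)
                w = probs[prot] / denom if denom > 0 else 1.0 / len(owners)
                miss *= 1.0 - w * pep_probs.get(pep, 0.0)
            updated[prot] = 1.0 - miss
        probs = updated
    return probs


# Fido-style run: call peptides "observed" above a hypothetical 0.5 cutoff.
observed = {pep for pep, p in PEP_PROBS.items() if p >= 0.5}
fido = fido_posteriors(PROTEINS, observed)
pp = proteinprophet_like(PROTEINS, PEP_PROBS)
for prot in sorted(PROTEINS):
    print(prot, round(fido[prot], 3), round(pp[prot], 3))
```

On this toy input, both functions rank P1 above P2 and P2 above P3. The absolute numbers are not comparable, because one model sees binary peptide calls and the other sees weighted peptide probabilities.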
- Knut explains Xiao's project: we want to implement a Bayesian approach that takes into account not only peptide ID lists but also additional information from the MS1 map. He mentions PIA (Bochum).
- Hendrik: we need a good protein inference tool in OpenMS now. He mentions Fido, which is a C++ library with MIT license and could go into contrib.
- Lars: mentions the Huang/He comparison paper (Knut adds Claassen's paper). He thinks ProteinProphet would be the standard to beat and nice to have. (T. Huang and Z. He, Bioinformatics vol. 28 no. 22 (2012) 2956)
- George: Mayu from Claassen is aimed more at FDR estimation than at protein inference.
- Jens: proteotypicity has to be assessed on the whole genome. Also, the implementation should ideally provide a container that encapsulates all evidence for a ProteinID, so that other approaches can use it.
- All: we discussed obtaining a possible "good" real data set. People should ask their experimental partners whether they could provide protein sets (antibodies? commercial proteins (from synthesized peptides)?).
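Jens's suggestion of a single container holding all evidence for a protein could look roughly like the sketch below. It uses Python dataclasses, and every class, field and method name is hypothetical; this is not the OpenMS ProteinIdentification API. Each protein keeps every supporting PSM together with its score, spectrum and search engine. It also keeps optional MS1 information, as in Knut's description of the planned approach, so that different inference methods can read the same object.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PeptideEvidence:
    """One PSM supporting a protein (hypothetical fields)."""
    sequence: str
    score: float          # e.g. a PSM posterior probability
    spectrum_ref: str     # identifier of the supporting MS2 spectrum
    search_engine: str


@dataclass
class ProteinEvidence:
    """All evidence collected for one protein accession."""
    accession: str
    psms: List[PeptideEvidence] = field(default_factory=list)
    # Optional MS1-level information: peptide sequence -> feature intensity.
    ms1_intensity: Dict[str, float] = field(default_factory=dict)

    def add_psm(self, psm: PeptideEvidence) -> None:
        self.psms.append(psm)

    def sequences(self) -> List[str]:
        """Distinct peptide sequences, sorted."""
        return sorted({p.sequence for p in self.psms})

    def best_score(self, sequence: str) -> float:
        """Highest PSM score for a sequence across spectra and search engines."""
        return max(p.score for p in self.psms if p.sequence == sequence)


# Made-up usage: the same spectrum identified by two search engines.
ev = ProteinEvidence("P1")
ev.add_psm(PeptideEvidence("PEPTIDEK", 0.8, "scan=101", "EngineA"))
ev.add_psm(PeptideEvidence("PEPTIDEK", 0.95, "scan=101", "EngineB"))
ev.add_psm(PeptideEvidence("ANOTHERR", 0.6, "scan=230", "EngineA"))
ev.ms1_intensity["PEPTIDEK"] = 1.2e6
```

The design keeps raw PSMs rather than one pre-aggregated score per peptide. An inference method can then decide for itself how to combine search engines or spectra.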
- Hendrik will look at possible integration of Fido into OpenMS contrib.
- The evaluation pipeline will be coordinated between Freiburg (Lars) and Berlin (Xiao) using one dataset from Huang/He (the ground-truth dataset Sigma49) and one dataset from Freiburg. We will test PIA in KNIME and ProteinProphet (with pepXML generated in OpenMS) in Galaxy.
- Look at what MaxQuant does; further discussion is needed.
- Compare the tools mentioned above: PIA, Fido and ProteinProphet. The key is to find an appropriate way to compare the resulting protein lists with each other and with the true positives.
- Xiao is going to fix the compatibility problem between OpenMS pepXML and ProteinProphet.
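One possible way to compare tool outputs against a ground-truth set such as Sigma49 is sketched below. The function names are hypothetical and the accessions and scores are made up. The first function ranks one tool's proteins by score and counts true and false positives at every distinct cutoff. The second lists which proteins two tools agree or disagree on. Tied scores are accepted together, so a cutoff never splits proteins that share a score. This can matter when a tool assigns identical probabilities to many proteins.

```python
from itertools import groupby


def cutoff_table(protein_scores, truth):
    """Return (cutoff, TP, FP, precision, recall) at each distinct score.

    protein_scores: accession -> score reported by one inference tool
    truth: non-empty set of accessions known to be in the sample
    Proteins with tied scores are accepted together at the same cutoff.
    """
    ranked = sorted(protein_scores.items(), key=lambda kv: kv[1], reverse=True)
    rows, tp, fp = [], 0, 0
    for score, group in groupby(ranked, key=lambda kv: kv[1]):
        for accession, _ in group:
            if accession in truth:
                tp += 1
            else:
                fp += 1
        rows.append((score, tp, fp, tp / (tp + fp), tp / len(truth)))
    return rows


def compare_lists(list_a, list_b):
    """Split two reported protein lists into shared and tool-specific parts."""
    a, b = set(list_a), set(list_b)
    return {"only_a": a - b, "both": a & b, "only_b": b - a}


# Made-up example: truth contains P1, P3, P4; X1 and X2 are false hits.
truth = {"P1", "P3", "P4"}
tool_scores = {"P1": 0.99, "X1": 0.99, "P3": 0.6, "X2": 0.3}
for row in cutoff_table(tool_scores, truth):
    print(row)
print(compare_lists(["P1", "X1"], ["P1", "P3"]))
```

Running `cutoff_table` once per tool on the same ground truth gives directly comparable rows. `compare_lists` shows where two tools' final lists diverge at a chosen cutoff.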
- How do we deal with pepXML? (Fido, ProteinProphet and Jens (dnmso) still need it.)
- pepXML generated by PeptideProphet does not go through External/TPP_ProteinProphet in TOPPAS; there is a problem with absolute paths in the pep.xml files.
- pepXML converted by IDFileConverter cannot be processed by PeptideProphet (open-ms-general mailing list, Feb 12).
- Oliver might get data from whole-protein measurements that could be combined with different fragmentation techniques (ETD, HCD, CID).
- Think about applying the consensus ID approach (possible MSc thesis).
- Think about integrating quantitation information in labelled experiments (peptides originating from the same protein are expected to show similar fold changes).
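The fold-change idea for labelled experiments could be prototyped as in the sketch below. It assumes that peptide-level ratios (e.g. heavy/light) are already available, and the function name, the median as the protein-level centre and the tolerance of 1.0 in log2 space are all arbitrary, hypothetical choices. For each protein, it flags peptides whose log2 fold change is far from the protein's median. Such peptides are candidates for misassigned shared peptides.

```python
import math
from statistics import median


def inconsistent_peptides(protein_peptides, ratios, tol=1.0):
    """Flag peptides whose log2 fold change differs from the median log2 fold
    change of their protein by more than `tol`.

    protein_peptides: protein accession -> list of peptide sequences
    ratios: peptide sequence -> linear fold change (e.g. heavy/light)
    Proteins with fewer than two quantified peptides are skipped.
    """
    flagged = {}
    for prot, peps in protein_peptides.items():
        log_fc = {p: math.log2(ratios[p]) for p in peps if p in ratios}
        if len(log_fc) < 2:
            continue
        center = median(log_fc.values())
        outliers = sorted(p for p, v in log_fc.items() if abs(v - center) > tol)
        if outliers:
            flagged[prot] = outliers
    return flagged


# Made-up example: peptide "c" is shared by P1 and P2.
protein_peptides = {"P1": ["a", "b", "c"], "P2": ["c", "d"]}
ratios = {"a": 2.0, "b": 2.2, "c": 0.5, "d": 0.45}
print(inconsistent_peptides(protein_peptides, ratios))  # {'P1': ['c']}
```

Here the shared peptide c follows P2's fold change rather than P1's. An inference method could use this kind of signal when apportioning shared peptides.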