
Need help on learning to run MS2PIP #89

Closed
nattzy94 opened this issue Jul 12, 2020 · 1 comment

@nattzy94

Hi,

I am new to mass spec analysis and would like to use MS2PIP to improve protein predictions. My main goal is to identify small proteins in a mass spec dataset.

So far, I have searched MGF files against a database of UniProt-annotated proteins (H. sapiens). I then searched the resulting unmatched spectra against a database of small proteins, which yields a number of candidate small-protein predictions. All searches were performed in PeptideShaker. Because the MS experiment was not optimised for small proteins, these small-peptide predictions are naturally of low or doubtful confidence. Hence, I would like to see whether using MS2PIP could improve the prediction quality.

I am a little confused about where to start, however. I understand MS2PIP requires a PEPREC file as input. I generated a PEPREC file for the small proteins I am interested in (~40,000 small proteins) using the fasta2PEPREC.py script in the conversion_tools folder. I am not sure I did this correctly, as the resulting PEPREC file does not contain any amino acid modifications (e.g. oxidation of M, carbamidomethylation of C). How do I generate a PEPREC file that properly includes AA modifications?
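To illustrate what a PEPREC file with modifications can look like, here is a minimal sketch. It assumes the PEPREC layout described in the MS²PIP README as we understand it: space-separated columns spec_id, modifications, peptide, charge, where each modification is written as location|name (1-based residue position, 0 for the N-terminus, -1 for the C-terminus), multiple modifications are joined with |, and unmodified peptides get a single hyphen. The helper names (tryptic_digest, modification_strings, peprec_lines), the simplified trypsin rule, the length limits, the pep1/pep2 spec_ids and the modification names "Carbamidomethyl" and "Oxidation" are all hypothetical choices for illustration; the names must match those defined in your own MS²PIP configuration. This is not how fasta2PEPREC.py works internally.

```python
# Sketch: build MS2PIP-style PEPREC lines (space-separated) from protein sequences.
# Assumptions: simplified trypsin rule, fixed Carbamidomethyl on C, variable
# Oxidation on M; modification names must match those in your MS2PIP config.
import re
from itertools import combinations


def tryptic_digest(protein, missed_cleavages=1, min_len=7, max_len=30):
    """Cleave after K/R unless the next residue is P (simplified trypsin)."""
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides = set()
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavages in the peptide
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            pep = protein[bounds[i]:bounds[j]]
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return sorted(peptides)


def modification_strings(peptide, fixed=None, variable=None, max_variable=2):
    """Return one PEPREC modifications string per variable-mod combination."""
    if fixed is None:
        fixed = {"C": "Carbamidomethyl"}
    if variable is None:
        variable = {"M": "Oxidation"}
    # PEPREC locations are 1-based (0 = N-terminus, -1 = C-terminus)
    fixed_mods = [(i + 1, fixed[aa]) for i, aa in enumerate(peptide) if aa in fixed]
    var_sites = [(i + 1, variable[aa]) for i, aa in enumerate(peptide) if aa in variable]
    results = []
    for n in range(min(max_variable, len(var_sites)) + 1):
        for combo in combinations(var_sites, n):
            mods = sorted(fixed_mods + list(combo))
            # Unmodified peptides are written as a single hyphen
            results.append("|".join(f"{loc}|{name}" for loc, name in mods) or "-")
    return results


def peprec_lines(proteins, charges=(2, 3)):
    """Yield a PEPREC header plus one row per peptide/modification/charge."""
    yield "spec_id modifications peptide charge"
    n = 0
    for protein in proteins:
        for pep in tryptic_digest(protein):
            for mods in modification_strings(pep):
                for z in charges:
                    n += 1
                    yield f"pep{n} {mods} {pep} {z}"
```

For example, a peptide ACMK yields one row with only the fixed modification (2|Carbamidomethyl) and one with the oxidised methionine added (2|Carbamidomethyl|3|Oxidation). Writing "\n".join(peprec_lines(sequences)) to a file gives a PEPREC-style input; note that every variable-modification combination multiplies the number of rows, which matters for ~40,000 proteins.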

Having generated the PEPREC file, I then ran MS2PIP, which output an HCD_predictions.csv file. I am stuck here, as I don't know how to proceed to get improved protein predictions. Am I using the right workflow, i.e., should I start from the protein database in the first place, or from the output predictions of PeptideShaker?

@RalfG
Member

RalfG commented Jul 13, 2020

Hi! You've come to the right place! MS2 spectrum predictions can give a boost in sensitivity to challenging identification workflows (see https://doi.org/10.1093/bioinformatics/btz383, and https://doi.org/10.1002/pmic.201900351). The easiest and most versatile way to make use of MS²PIP to improve your identification workflow is with MS²ReScore. I noticed your issue over there (compomics/ms2rescore#11), so I'll help you out in that issue thread.

To clarify the use cases for our MS²PIP-related tools:

  • MS²PIP only predicts spectra: for a given list of (modified) peptide sequences and charge states, it outputs predicted spectra. These spectra can be used directly in a number of ways, for instance to manually compare with and inspect important peptide-spectrum identifications (e.g. https://doi.org/10.1038/s41586-019-1555-y). This is, of course, not a high-throughput approach.
  • Fasta2SpecLib is a sort of wrapper around MS²PIP that takes a protein FASTA file as input instead of a peptide list. It digests those proteins in silico and generates an MS²PIP-predicted spectral library for spectral library searching, or as a reference library for DIA identifications (see https://doi.org/10.1002/pmic.201900306).
  • MS²ReScore enables you to use MS²PIP, and recently also DeepLC, to rescore peptide identifications by adding extra features to the Percolator input. This gives a big boost in sensitivity, usually resulting in more identifications even at a more conservative false discovery rate threshold (https://doi.org/10.1093/bioinformatics/btz383).
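Both the manual comparison in the first point and the rescoring in the last point rest on quantifying how well a predicted spectrum agrees with an observed one. The sketch below computes one such agreement score, a Pearson correlation of log2-transformed, sum-normalised fragment intensities. It is an illustrative assumption, not MS²ReScore's actual feature set or implementation: the dict-of-(ion_type, ion_number) spectrum representation, the pseudocount of 0.001 and the function names are all hypothetical, and matching observed peaks to theoretical fragment m/z values is assumed to have been done already.

```python
# Sketch: score agreement between a predicted and an observed spectrum.
# Assumption: both spectra are dicts mapping (ion_type, ion_number) to
# intensity, already matched to the same peptide and charge.
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for too-short or constant vectors."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def normalize(spectrum):
    """Scale intensities to sum to 1 (guards against an all-zero spectrum)."""
    total = sum(spectrum.values()) or 1.0
    return {k: v / total for k, v in spectrum.items()}


def spectrum_correlation(predicted, observed, pseudo=0.001):
    """Correlate log2 intensities over the ions present in the prediction."""
    pred, obs = normalize(predicted), normalize(observed)
    keys = sorted(pred)
    # Ions the prediction expects but the observed spectrum lacks count as 0
    xs = [math.log2(pred[k] + pseudo) for k in keys]
    ys = [math.log2(obs.get(k, 0.0) + pseudo) for k in keys]
    return pearson(xs, ys)
```

A score near 1 means the observed fragment pattern closely matches the prediction for the candidate peptide. In a rescoring setup, a value like this would be one extra feature column per peptide-spectrum match, letting the rescoring model separate correct from incorrect low-scoring identifications.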
