Peptides Mapped to Wrong proteins #718
Comments
Thanks for your report. Are you using FragPipe 18.0 with Percolator enabled? Thanks, Fengchao |
I'm using FragPipe v17.1, and for PSM validation I'm using PeptideProphet with a closed search. |
Then, the peptide-protein mapping is from PeptideProphet. I am not sure if it is something that we can change/fix. Best, Fengchao |
Can you send me your .pepXML and pep.xml files? |
Thanks |
The RefreshParser tool of PeptideProphet maps peptides to proteins based on the sequence alone. I do not think it considers cleavage rules.
In MSFragger 3.5 we remap peptides ourselves, so if it is used with Percolator you should get more accurate mapping. Is that right, Fengchao?
Quoted reply from Savvas Kourtis (Tuesday, June 14, 2022):
This also happens with post-translational modifications. A sequence such as n[42.0106]NGTSM[15.9949]ISLIIPPK could only arise from a protein that starts with this sequence, since the n[42.0106] modification was only allowed at the protein N-terminus (`[^`).
Again, you can see that instead of mapping only to P62495_nterm_5204, where the peptide is the N-terminus of the protein, it maps to many entries, even though the other sequences don't have an R|K before the peptide.
[image]<https://user-images.githubusercontent.com/51754041/173559134-44fb8478-0981-4a24-8c4b-51bfdd7f1922.png>
[image]<https://user-images.githubusercontent.com/51754041/173558863-7d12dc38-dd19-4651-8580-19831d6e0ab7.png>
[image]<https://user-images.githubusercontent.com/51754041/173559016-fb352dcd-3361-4823-a047-7f293b52246f.png>
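The restriction described in this comment can be sketched in a few lines of Python. This is only an illustration, not FragPipe or PeptideProphet code: the function name `allowed_proteins`, the `clip_met` option (treating a peptide that follows a removed initiator Met as N-terminal), and the toy protein sequences are all assumptions made up for the example.

```python
def allowed_proteins(mod_seq, proteins, clip_met=True):
    """Return IDs of proteins a (possibly modified) peptide may map to.

    A peptide carrying an n[...] modification is assumed to be restricted
    to the protein N-terminus, as in the search settings described above.
    """
    # Keep only residue letters: drops the 'n' marker and the [mass] tags.
    bare = "".join(ch for ch in mod_seq if ch.isupper())
    if not mod_seq.startswith("n["):
        # No N-terminal modification: plain sequence matching.
        return [pid for pid, seq in proteins.items() if bare in seq]
    hits = []
    for pid, seq in proteins.items():
        at_start = seq.startswith(bare)
        # Assumption: initiator Met removal is allowed.
        after_met = clip_met and seq.startswith("M" + bare)
        if at_start or after_met:
            hits.append(pid)
    return hits


# Hypothetical toy sequences, not real database entries.
toy = {
    "toy_nterm": "NGTSMISLIIPPKAAA",
    "toy_internal": "MAAWNGTSMISLIIPPKAAA",
}
print(allowed_proteins("n[42.0106]NGTSM[15.9949]ISLIIPPK", toy))
```

With the modified peptide only `toy_nterm` is returned, while the unmodified sequence would still match both entries.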
|
From your first example, we can see that MSFragger mapped the peptide AELLAGR to the decoy protein Q15772:
PeptideProphet scored the mapping and added the real protein O15360 as an alternative to the assignment:
The switch happened during the filtering step, where we perform something that we call 'protein promotion'. If a PSM maps to a decoy, and has a target entry as an alternative, we flip them, and the assignment becomes a "target" one. That is why you see O15360 in the report. |
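As a rough illustration of the 'protein promotion' step described above, here is a minimal Python sketch. It is not the actual filtering code: the function name `promote`, the list-based representation of alternatives, and the `rev_` decoy prefix are assumptions for the example.

```python
def promote(protein, alternatives, decoy_prefix="rev_"):
    """Swap a decoy assignment with its first target alternative, if any.

    Returns the (possibly new) primary protein and the updated alternatives.
    """
    if not protein.startswith(decoy_prefix):
        # Already a target assignment: nothing to do.
        return protein, alternatives
    for i, alt in enumerate(alternatives):
        if not alt.startswith(decoy_prefix):
            # Flip: the target becomes primary, the decoy takes its slot.
            rest = alternatives[:i] + [protein] + alternatives[i + 1:]
            return alt, rest
    # Only decoys available: the PSM stays a decoy.
    return protein, alternatives


print(promote("rev_Q15772", ["O15360"]))
```

In this sketch the AELLAGR assignment from the example would flip from the decoy entry to O15360, which matches what shows up in the report.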
We also don't consider the cleavage rules in MSFragger 3.5. I think we should add it in the next version. Best, Fengchao
|
Hi everyone! Thank you for your quick and clear responses! I've implemented a peptide-to-protein remapping in my scripts while I wait for the next version. Some of the peptides will not map to any real protein (which means that they were actually decoys), but from what I have seen this is extremely rare (10 peptide evidences out of 20000), so it shouldn't have an impact on FDR. Thanks! |
Fixed. Will be available in the next release. Best, Fengchao |
The peptides reported in ion_label_quant.tsv (in the modified sequence column) are mapped to the fasta file purely by sequence matching, but often the peptide could not have arisen from the assigned protein, because that specific protein has no cleavage site (e.g. R|K for trypsin) before the peptide.
E.g. the peptide AELLAGR maps to all these isoforms in ion_label_quant (and probably downstream), but as you can see from their fasta sequences, they all have a W, which is non-tryptic, before the peptide. In this case I suspect it arises from a rev_ sequence in the fasta file and then accidentally matches real proteins.
This run was done with fully enzymatic trypsin cleavage (not, for example, with semi_n_term cleavage), so the R or K before the peptide should be respected.
This leads to peptides being assigned to more proteins than they should be, which changes the razor protein assignment.
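A cleavage-aware mapping along these lines could look like the following Python sketch. It is a simplified illustration, not what FragPipe does: the function names, the `allow_met_clip` option, and the toy sequences are hypothetical, and the rule used is plain cleavage after K or R with no special handling of proline.

```python
def tryptic_sites(peptide, protein, allow_met_clip=True):
    """Return start offsets where the peptide occurs with tryptic boundaries."""
    hits = []
    start = protein.find(peptide)
    while start != -1:
        end = start + len(peptide)
        # N-terminal side: protein start, a preceding K/R, or (assumption)
        # the position right after a removed initiator Met.
        n_ok = (
            start == 0
            or protein[start - 1] in "KR"
            or (allow_met_clip and start == 1 and protein[0] == "M")
        )
        # C-terminal side: peptide ends in K/R or at the protein C-terminus.
        c_ok = end == len(protein) or peptide[-1] in "KR"
        if n_ok and c_ok:
            hits.append(start)
        start = protein.find(peptide, start + 1)
    return hits


def remap(peptide, proteins):
    """Keep only proteins that can yield the peptide by tryptic cleavage."""
    return [pid for pid, seq in proteins.items() if tryptic_sites(peptide, seq)]


# Hypothetical toy sequences: one with K before the peptide, one with W.
toy = {"toy_K": "MKAELLAGRTT", "toy_W": "MSWAELLAGRTT"}
print(remap("AELLAGR", toy))
```

Plain substring matching would return both toy entries here; with the boundary check only the entry with a K before AELLAGR remains.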