Peptides Mapped to Wrong proteins #718

Skourtis · 2022-06-14T10:29:29Z

The peptides reported in ion_label_quant.tsv (modified sequence) are mapped directly to the FASTA file by sequence matching, but often a peptide could not have arisen from the specific protein it is mapped to, because that protein has no cleavage site (e.g. R|K for trypsin) before the peptide.

E.g., the peptide AELLAGR maps to all of these isoforms in ion_label_quant.tsv (and probably downstream), but as their FASTA sequences show, they all have a non-tryptic W immediately before the peptide. In this case I suspect the match comes from a rev_ (decoy) sequence in the FASTA file and then accidentally matches real proteins.

This run was done with enzymatic trypsin cleavage (not, for example, semi_n_term cleavage), so the R and K before the peptide should be respected.

This leads to peptides being assigned to more proteins than they should be, which changes the outcome of the razor rule.
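The cleavage check described above can be sketched as follows. This is a minimal illustration, not FragPipe's code: it assumes strict trypsin (cleavage after K or R, ignoring the proline rule) and counts enzymatic termini the way pepXML's num_tol_term attribute does. The function names are hypothetical and the toy sequences are made up, not the real isoform sequences.

```python
# Assumption: strict trypsin, cleaving after K or R with no proline rule.
CLEAVAGE_SITES = {"K", "R"}


def tryptic_termini(protein: str, peptide: str, start: int) -> int:
    """Count enzymatic termini (0, 1 or 2) for `peptide` found at `start` in `protein`."""
    end = start + len(peptide)  # index just past the peptide's last residue
    # N-terminus is enzymatic if the peptide starts the protein
    # or the residue before it is a cleavage site.
    n_ok = start == 0 or protein[start - 1] in CLEAVAGE_SITES
    # C-terminus is enzymatic if the peptide ends the protein
    # or its own last residue is a cleavage site.
    c_ok = end == len(protein) or peptide[-1] in CLEAVAGE_SITES
    return int(n_ok) + int(c_ok)


def fully_tryptic_positions(protein: str, peptide: str) -> list[int]:
    """Return every start index where `peptide` occurs with both termini enzymatic."""
    hits = []
    start = protein.find(peptide)
    while start != -1:
        if tryptic_termini(protein, peptide, start) == 2:
            hits.append(start)
        start = protein.find(peptide, start + 1)
    return hits


print(tryptic_termini("MKAELLAGRCT", "AELLAGR", 2))  # 2: preceded by K
print(tryptic_termini("MWAELLAGRV", "AELLAGR", 2))   # 1: preceded by W, as in the report
```

A plain substring match would accept both toy proteins; requiring two enzymatic termini keeps only the first.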

Skourtis · 2022-06-14T10:42:11Z

This also happens with post-translational modifications: a sequence such as n[42.0106]NGTSM[15.9949]ISLIIPPK could only arise from a protein that starts with this sequence, since only the protein N-terminus ([^) was allowed the n[42.0106] modification.
Again, instead of mapping only to P62495_nterm_5204 (because the peptide is at that protein's N-terminus, and the other sequences have no R|K before it), it maps to many of them.
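For peptides carrying a protein-N-terminal modification, the constraint is stricter than the cleavage rule: the peptide must sit at the start of the protein. A minimal sketch, assuming the FragPipe-style modified sequence shown above (leading lowercase `n` for the N-terminus, bracketed mass shifts). The function names are hypothetical, `allow_met_clip` is a hypothetical option for proteins whose initiator methionine is removed, and the toy sequences are made up.

```python
import re

# Bracketed mass shifts such as [42.0106] or [15.9949].
MASS_TAG = re.compile(r"\[[^\]]*\]")


def strip_mods(modified: str) -> str:
    """Turn 'n[42.0106]NGTSM[15.9949]ISLIIPPK' into 'NGTSMISLIIPPK'."""
    no_masses = MASS_TAG.sub("", modified)
    # Residues are uppercase; lowercase letters are terminus markers (n, c).
    return re.sub(r"[a-z]", "", no_masses)


def protein_nterm_matches(modified: str, proteins: dict[str, str],
                          allow_met_clip: bool = True) -> list[str]:
    """Return accessions whose protein N-terminus can produce this peptide.

    Intended for peptides carrying a protein-N-terminal modification ([^).
    allow_met_clip (hypothetical) also accepts the peptide directly after
    an initiator methionine.
    """
    peptide = strip_mods(modified)
    matches = []
    for acc, seq in proteins.items():
        if seq.startswith(peptide):
            matches.append(acc)
        elif allow_met_clip and seq.startswith("M") and seq[1:].startswith(peptide):
            matches.append(acc)
    return matches


toy = {
    "toy_nterm": "NGTSMISLIIPPKLE",        # peptide is the protein N-terminus
    "toy_internal": "MKRNGTSMISLIIPPKAA",  # peptide is internal
    "toy_met": "MNGTSMISLIIPPK",           # peptide follows the initiator Met
}
print(protein_nterm_matches("n[42.0106]NGTSM[15.9949]ISLIIPPK", toy))
# ['toy_nterm', 'toy_met']
```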

fcyu · 2022-06-14T13:44:39Z

Thanks for your report. Are you using FragPipe 18.0 with Percolator enabled?

Thanks,

Fengchao

Skourtis · 2022-06-14T14:09:47Z

I'm using FragPipe v17.1, with PeptideProphet for PSM validation in a closed search.

fcyu · 2022-06-14T15:00:02Z

Then, the peptide-protein mapping is from PeptideProphet. I am not sure if it is something that we can change/fix.

Best,

Fengchao

prvst · 2022-06-14T15:52:31Z

Can you send me your .pepXML and pep.xml files?

prvst · 2022-06-14T16:40:02Z

Thanks

anesvi · 2022-06-14T16:42:00Z

The RefreshParser tool of PeptideProphet maps peptides based on sequence only; I do not think it considers cleavage rules. In MSFragger 3.5 we remap peptides ourselves, so if it is used with Percolator you should get more accurate mapping. Fengchao?

prvst · 2022-06-14T16:52:00Z

From your first example, we can see that MSFragger mapped the peptide AELLAGR to the decoy protein rev_sp|Q15772|SPEG_HUMAN:

<search_result>
  <search_hit peptide="AELLAGR" massdiff="0.00103759765625" calc_neutral_pep_mass="738.4263" peptide_next_aa="C" num_missed_cleavages="0" num_tol_term="2" protein_descr="Striated muscle preferentially expressed protein kinase OS=Homo sapiens OX=9606 GN=SPEG PE=1 SV=4" num_tot_proteins="1" tot_num_ions="12" hit_rank="1" num_matched_ions="6" protein="rev_sp|Q15772|SPEG_HUMAN" peptide_prev_aa="R" is_rejected="0">
    <modification_info modified_peptide="AELLAGR[166]">
      <mod_aminoacid_mass mass="166.1094" position="7"/>
    </modification_info>
    <search_score name="hyperscore" value="13.407"/>
    <search_score name="nextscore" value="11.721"/>
    <search_score name="expect" value="1.057631e+00"/>
  </search_hit>
</search_result>

PeptideProphet scored the mapping and added the real protein O15360 as an alternative protein for the assignment (note its num_tol_term="1", since it is preceded by W):

<spectrum_query start_scan="44773" uncalibrated_precursor_neutral_mass="738.42645" assumed_charge="2" spectrum="2020LD002_MAGU_002_03_50pto.44773.44773.2" end_scan="44773" index="7009" precursor_neutral_mass="738.4273" retention_time_sec="1375.9601211547852">
  <search_result>
    <search_hit peptide="AELLAGR" massdiff="0.00103759765625" calc_neutral_pep_mass="738.4263" peptide_next_aa="C" num_missed_cleavages="0" num_tol_term="2" protein_descr="Striated muscle preferentially expressed protein kinase OS=Homo sapiens OX=9606 GN=SPEG PE=1 SV=4" num_tot_proteins="2" tot_num_ions="12" hit_rank="1" num_matched_ions="6" protein="rev_sp|Q15772|SPEG_HUMAN" peptide_prev_aa="R" is_rejected="0">
      <alternative_protein protein="sp|O15360|FANCA_HUMAN" protein_descr="Fanconi anemia group A protein OS=Homo sapiens OX=9606 GN=FANCA PE=1 SV=2" num_tol_term="1" peptide_prev_aa="W" peptide_next_aa="V"/>
      <modification_info modified_peptide="AELLAGR[166]">
        <mod_aminoacid_mass mass="166.1094" position="7"/>
      </modification_info>
      <search_score name="hyperscore" value="13.407"/>
      <search_score name="nextscore" value="11.721"/>
      <search_score name="expect" value="1.057631e+00"/>
      <analysis_result analysis="peptideprophet">
        <peptideprophet_result probability="0.4301" all_ntt_prob="(0.0000,0.0000,0.4301)">
          <search_score_summary>
            <parameter name="fval" value="0.9580"/>
            <parameter name="ntt" value="2"/>
            <parameter name="nmc" value="0"/>
            <parameter name="massd" value="1.405"/>
            <parameter name="isomassd" value="0"/>
          </search_score_summary>
        </peptideprophet_result>
      </analysis_result>
    </search_hit>
  </search_result>
</spectrum_query>
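Because pepXML already records num_tol_term for both the search hit and each alternative_protein, alternatives that break the enzyme rule can be flagged after the fact. A minimal sketch using Python's standard xml.etree.ElementTree; the function name is hypothetical, and it strips namespaces because full pepXML files are usually namespaced while the excerpt above is not.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any XML namespace, e.g. '{ns}search_hit' -> 'search_hit'."""
    return tag.rsplit("}", 1)[-1]


def weaker_alternatives(pepxml_text: str) -> list[tuple[str, str, int]]:
    """Find alternative proteins with fewer enzymatic termini than their search hit.

    Returns (peptide, alternative protein, alternative num_tol_term) tuples.
    """
    root = ET.fromstring(pepxml_text)
    flagged = []
    for hit in root.iter():
        if local_name(hit.tag) != "search_hit":
            continue
        hit_ntt = hit.get("num_tol_term")
        if hit_ntt is None:
            continue  # nothing to compare against
        for alt in hit:
            if local_name(alt.tag) != "alternative_protein":
                continue
            alt_ntt = alt.get("num_tol_term")
            if alt_ntt is not None and int(alt_ntt) < int(hit_ntt):
                flagged.append((hit.get("peptide"), alt.get("protein"), int(alt_ntt)))
    return flagged


# Trimmed version of the spectrum_query above.
snippet = """<spectrum_query>
  <search_result>
    <search_hit peptide="AELLAGR" num_tol_term="2" protein="rev_sp|Q15772|SPEG_HUMAN">
      <alternative_protein protein="sp|O15360|FANCA_HUMAN" num_tol_term="1"/>
    </search_hit>
  </search_result>
</spectrum_query>"""
print(weaker_alternatives(snippet))
# [('AELLAGR', 'sp|O15360|FANCA_HUMAN', 1)]
```

For real interact-*.pep.xml files, ET.parse or ET.iterparse on the file path would avoid holding the whole text in a string.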

The switch happened during the filtering step, in what we call 'protein promotion': if a PSM maps to a decoy and has a target entry as an alternative, we swap them, and the assignment becomes a target one. That is why you see O15360 in the report.
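The promotion rule described above can be sketched as follows. This is an illustration, not Philosopher's code: the rev_ decoy prefix matches this thread's FASTA, the class and function names are hypothetical, and the `respect_enzyme` switch is a hypothetical refinement showing how num_tol_term could keep a non-enzymatic target alternative such as O15360 from being promoted.

```python
from dataclasses import dataclass, field

DECOY_PREFIX = "rev_"  # decoy tag used in this thread's FASTA


@dataclass
class Mapping:
    protein: str
    num_tol_term: int


@dataclass
class Psm:
    primary: Mapping
    alternatives: list[Mapping] = field(default_factory=list)


def is_decoy(m: Mapping) -> bool:
    return m.protein.startswith(DECOY_PREFIX)


def promote(psm: Psm, respect_enzyme: bool = False) -> Psm:
    """Swap a decoy primary protein for the first eligible target alternative.

    respect_enzyme (hypothetical) skips alternatives with fewer enzymatic
    termini than the primary mapping.
    """
    if not is_decoy(psm.primary):
        return psm
    for i, alt in enumerate(psm.alternatives):
        if is_decoy(alt):
            continue
        if respect_enzyme and alt.num_tol_term < psm.primary.num_tol_term:
            continue
        # The old decoy primary becomes an alternative.
        rest = psm.alternatives[:i] + psm.alternatives[i + 1:] + [psm.primary]
        return Psm(primary=alt, alternatives=rest)
    return psm  # no eligible target: the PSM stays a decoy


psm = Psm(Mapping("rev_sp|Q15772|SPEG_HUMAN", 2), [Mapping("sp|O15360|FANCA_HUMAN", 1)])
print(promote(psm).primary.protein)                       # sp|O15360|FANCA_HUMAN
print(promote(psm, respect_enzyme=True).primary.protein)  # rev_sp|Q15772|SPEG_HUMAN
```

With the check enabled, the PSM from this report would stay a decoy instead of being reported under O15360.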

fcyu · 2022-06-14T17:05:09Z

We also don't consider the cleavage rules in MSFragger 3.5. I think we should add it in the next version.

Best,

Fengchao

Skourtis · 2022-06-16T12:29:20Z

Hi everyone!

Thank you for your quick and clear responses! I've implemented peptide-to-protein remapping in my scripts while I wait for the next version. Some peptides do not map to any real protein (meaning they were actually decoys), but from what I have seen this is extremely rare (10 pieces of peptide evidence out of 20,000), so it shouldn't have an impact on FDR.

Thanks!
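A peptide-to-protein remapping like the one described above might look like the sketch below. It is not the reporter's actual script: it assumes strict trypsin (K/R, no proline rule), decoys tagged rev_, and FASTA IDs taken as the first word of each header; all names are hypothetical. The brute-force scan is fine for a one-off check but slow for a whole proteome, where an index over the sequences would be preferable.

```python
DECOY_PREFIX = "rev_"        # assumption: decoys carry the rev_ prefix
CLEAVAGE_SITES = {"K", "R"}  # assumption: strict trypsin, no proline rule


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {id: sequence}, id = first word after '>'."""
    proteins, acc, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if acc is not None:
                proteins[acc] = "".join(chunks)
            acc, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if acc is not None:
        proteins[acc] = "".join(chunks)
    return proteins


def is_fully_tryptic(seq: str, peptide: str, start: int) -> bool:
    """True if both peptide termini at `start` obey the cleavage rule."""
    end = start + len(peptide)
    n_ok = start == 0 or seq[start - 1] in CLEAVAGE_SITES
    c_ok = end == len(seq) or peptide[-1] in CLEAVAGE_SITES
    return n_ok and c_ok


def remap(peptides: list[str], proteins: dict[str, str]) -> dict[str, list[str]]:
    """Map each peptide to target proteins where it is a fully tryptic product."""
    mapping = {}
    for pep in peptides:
        hits = []
        for acc, seq in proteins.items():
            if acc.startswith(DECOY_PREFIX):
                continue  # only real proteins are kept
            start = seq.find(pep)
            while start != -1:
                if is_fully_tryptic(seq, pep, start):
                    hits.append(acc)
                    break
                start = seq.find(pep, start + 1)
        mapping[pep] = hits
    return mapping


fasta = ">sp|A|TOY1\nMKAELLAGRCT\n>sp|B|TOY2\nMWAELLAGRV\n>rev_sp|C|TOY3\nKAELLAGR\n"
mapping = remap(["AELLAGR", "GGGK"], read_fasta(fasta))
print(mapping)  # {'AELLAGR': ['sp|A|TOY1'], 'GGGK': []}
print([p for p, accs in mapping.items() if not accs])  # ['GGGK']
```

Peptides left with an empty list are the decoy-only (or unmatched) cases the reporter counted.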

fcyu · 2022-09-12T19:29:51Z

Fixed. Will be available in the next release.

Best,

Fengchao

Skourtis changed the title ~~Peptides Mapped to Wrong proteins in ion_quant.tsv~~ Peptides Mapped to Wrong proteins Jun 14, 2022

fcyu self-assigned this Jun 14, 2022

fcyu assigned prvst Jun 14, 2022

fcyu added the Philosopher label Jun 14, 2022

Nesvilab deleted a comment from Skourtis Jun 14, 2022

fcyu added the MSFragger label Jun 14, 2022

fcyu mentioned this issue Jul 1, 2022

Some N-terminal acetylations are not N-terminal #748

Closed

hollenstein mentioned this issue Jul 22, 2022

Wrong peptide start and end positions #771

Closed

fcyu mentioned this issue Aug 7, 2022

Few peptides are included in report though didn't match digestion rule #788

Closed

fcyu closed this as completed Sep 12, 2022