Peptides Mapped to Wrong proteins #718

Closed
Skourtis opened this issue Jun 14, 2022 · 11 comments


@Skourtis

Skourtis commented Jun 14, 2022

The peptides reported in ion_label_quant.tsv (in the modified sequence column) are mapped to proteins in the FASTA file purely by sequence matching, but often a peptide could not have arisen from the protein it is mapped to, because that protein has no cleavage site immediately before the peptide (e.g. R|K for trypsin).

E.g. the peptide AELLAGR maps to all of these isoforms in ion_label_quant.tsv (and probably downstream), but as you can see from their FASTA sequences, they all have a W before the peptide, which is not a tryptic cleavage site. In this case I suspect the hit arises from a rev_ (decoy) sequence in the FASTA file and then accidentally matches real proteins.

This run used fully enzymatic trypsin cleavage (not, for example, semi_n_term cleavage), so the requirement for a preceding R or K should be respected.

This leads to peptides being assigned to more proteins than they should be, which changes the outcome of the razor rule.
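The rule described above can be checked per occurrence. A minimal sketch, assuming plain trypsin that cleaves after K or R (the usual "not before P" exception is ignored for brevity); the function name is hypothetical and not part of FragPipe:

```python
def tryptic_occurrences(peptide: str, protein: str, sites: str = "KR") -> list[int]:
    """Return 0-based start positions where `peptide` occurs in `protein` with
    tryptic boundaries on both sides (assumes cleavage after K/R, proline rule ignored)."""
    positions = []
    start = protein.find(peptide)
    while start != -1:
        end = start + len(peptide)
        # N-terminal boundary: protein start, or the preceding residue is K/R
        n_ok = start == 0 or protein[start - 1] in sites
        # C-terminal boundary: protein end, or the peptide itself ends in K/R
        c_ok = end == len(protein) or peptide[-1] in sites
        if n_ok and c_ok:
            positions.append(start)
        start = protein.find(peptide, start + 1)
    return positions


# Toy sequences, not real proteins: a preceding W blocks the match, a preceding R allows it.
print(tryptic_occurrences("AELLAGR", "MKWAELLAGRV"))  # []
print(tryptic_occurrences("AELLAGR", "MKRAELLAGRV"))  # [3]
```

A peptide whose only hits are like the first call would not be expected from a fully tryptic search of that protein.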

@Skourtis Skourtis changed the title "Peptides Mapped to Wrong proteins in ion_quant.tsv" to "Peptides Mapped to Wrong proteins" Jun 14, 2022
@Skourtis
Author

This also happens with post-translational modifications: a sequence such as n[42.0106]NGTSM[15.9949]ISLIIPPK could only arise from a protein that starts with this sequence, since the n[42.0106] mod was only allowed at the protein N-terminus ([^).
Again, you can see that it maps to many of these entries instead of only to P62495_nterm_5204, which is the only one where the peptide sits at the protein N-terminus; the other sequences don't even have an R|K before the peptide.
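The same idea applies to a modification restricted to the protein N-terminus: such a peptide can only come from entries where it starts at position 0. A minimal sketch, assuming bracketed mass shifts and a lowercase n/c terminus marker as in the sequence above; the helper names are hypothetical, and the allow_met_clip option (accepting the position after an initiator Met) is an illustrative assumption not discussed in this thread:

```python
import re


def strip_mods(modified: str) -> str:
    """Remove bracketed mass shifts and lowercase terminus markers (n/c),
    e.g. 'n[42.0106]NGTSM[15.9949]ISLIIPPK' becomes 'NGTSMISLIIPPK'."""
    return re.sub(r"\[[^\]]*\]|[a-z]", "", modified)


def protein_nterm_matches(peptide: str, proteins: dict[str, str],
                          allow_met_clip: bool = False) -> list[str]:
    """Return IDs of proteins in which `peptide` sits at the protein N-terminus."""
    matches = []
    for prot_id, seq in proteins.items():
        if seq.startswith(peptide):
            matches.append(prot_id)
        # Hypothetical option: also accept the position right after an initiator Met
        elif allow_met_clip and seq.startswith("M") and seq[1:].startswith(peptide):
            matches.append(prot_id)
    return matches


# Toy sequences for illustration only
proteins = {
    "P62495_nterm_5204": "NGTSMISLIIPPKDQ",
    "toy_internal": "AAGNGTSMISLIIPPK",
}
pep = strip_mods("n[42.0106]NGTSM[15.9949]ISLIIPPK")
print(pep)                                   # NGTSMISLIIPPK
print(protein_nterm_matches(pep, proteins))  # ['P62495_nterm_5204']
```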

@fcyu
Member

fcyu commented Jun 14, 2022

Thanks for your report. Are you using FragPipe 18.0 with Percolator enabled?

Thanks,

Fengchao

@fcyu fcyu self-assigned this Jun 14, 2022
@Skourtis
Author

Skourtis commented Jun 14, 2022

I'm using FragPipe v17.1, with PeptideProphet for PSM validation in a closed search.

@fcyu
Member

fcyu commented Jun 14, 2022

Then, the peptide-protein mapping is from PeptideProphet. I am not sure if it is something that we can change/fix.

Best,

Fengchao

@prvst

prvst commented Jun 14, 2022

Can you send me your .pepXML and pep.xml files?

@Nesvilab Nesvilab deleted a comment from Skourtis Jun 14, 2022
@prvst

prvst commented Jun 14, 2022

Thanks

@anesvi
Collaborator

anesvi commented Jun 14, 2022 via email

@prvst

prvst commented Jun 14, 2022

From your first example, we can see that MSFragger mapped the peptide AELLAGR to the decoy protein Q15772 (rev_sp|Q15772|SPEG_HUMAN):

```xml
<search_result>
  <search_hit peptide="AELLAGR" massdiff="0.00103759765625" calc_neutral_pep_mass="738.4263" peptide_next_aa="C" num_missed_cleavages="0" num_tol_term="2" protein_descr="Striated muscle preferentially expressed protein kinase OS=Homo sapiens OX=9606 GN=SPEG PE=1 SV=4" num_tot_proteins="1" tot_num_ions="12" hit_rank="1" num_matched_ions="6" protein="rev_sp|Q15772|SPEG_HUMAN" peptide_prev_aa="R" is_rejected="0">
    <modification_info modified_peptide="AELLAGR[166]">
      <mod_aminoacid_mass mass="166.1094" position="7"/>
    </modification_info>
    <search_score name="hyperscore" value="13.407"/>
    <search_score name="nextscore" value="11.721"/>
    <search_score name="expect" value="1.057631e+00"/>
  </search_hit>
</search_result>
```

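PeptideProphet scored the mapping and added the real protein O15360 as an alternative to the assignment (note num_tol_term="1" and peptide_prev_aa="W" on the alternative_protein entry, i.e. only one tryptic terminus):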

```xml
<spectrum_query start_scan="44773" uncalibrated_precursor_neutral_mass="738.42645" assumed_charge="2" spectrum="2020LD002_MAGU_002_03_50pto.44773.44773.2" end_scan="44773" index="7009" precursor_neutral_mass="738.4273" retention_time_sec="1375.9601211547852">
  <search_result>
    <search_hit peptide="AELLAGR" massdiff="0.00103759765625" calc_neutral_pep_mass="738.4263" peptide_next_aa="C" num_missed_cleavages="0" num_tol_term="2" protein_descr="Striated muscle preferentially expressed protein kinase OS=Homo sapiens OX=9606 GN=SPEG PE=1 SV=4" num_tot_proteins="2" tot_num_ions="12" hit_rank="1" num_matched_ions="6" protein="rev_sp|Q15772|SPEG_HUMAN" peptide_prev_aa="R" is_rejected="0">
      <alternative_protein protein="sp|O15360|FANCA_HUMAN" protein_descr="Fanconi anemia group A protein OS=Homo sapiens OX=9606 GN=FANCA PE=1 SV=2" num_tol_term="1" peptide_prev_aa="W" peptide_next_aa="V"/>
      <modification_info modified_peptide="AELLAGR[166]">
        <mod_aminoacid_mass mass="166.1094" position="7"/>
      </modification_info>
      <search_score name="hyperscore" value="13.407"/>
      <search_score name="nextscore" value="11.721"/>
      <search_score name="expect" value="1.057631e+00"/>
      <analysis_result analysis="peptideprophet">
        <peptideprophet_result probability="0.4301" all_ntt_prob="(0.0000,0.0000,0.4301)">
          <search_score_summary>
            <parameter name="fval" value="0.9580"/>
            <parameter name="ntt" value="2"/>
            <parameter name="nmc" value="0"/>
            <parameter name="massd" value="1.405"/>
            <parameter name="isomassd" value="0"/>
          </search_score_summary>
        </peptideprophet_result>
      </analysis_result>
    </search_hit>
  </search_result>
</spectrum_query>
```
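To spot such cases in a full pepXML file, one could list each hit's primary and alternative proteins together with their flanking residues and num_tol_term. A sketch using Python's standard xml.etree.ElementTree; the helper names are hypothetical, and only attributes that appear in the snippets above are read:

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix, so this works whether or not the pepXML namespace is declared."""
    return tag.rsplit("}", 1)[-1]


def protein_mappings(pepxml_text: str) -> list[dict]:
    """One row per (search_hit, protein) pair: the primary protein plus each alternative_protein."""
    root = ET.fromstring(pepxml_text)
    rows = []
    for hit in root.iter():
        if local(hit.tag) != "search_hit":
            continue
        entries = [hit] + [child for child in hit if local(child.tag) == "alternative_protein"]
        for entry in entries:
            rows.append({
                "peptide": hit.get("peptide"),
                "protein": entry.get("protein"),
                "prev_aa": entry.get("peptide_prev_aa"),
                "next_aa": entry.get("peptide_next_aa"),
                "ntt": int(entry.get("num_tol_term", "-1")),
            })
    return rows


# Trimmed from the spectrum_query above (only the attributes used here are kept)
snippet = """<search_result>
<search_hit peptide="AELLAGR" peptide_prev_aa="R" peptide_next_aa="C" num_tol_term="2"
            protein="rev_sp|Q15772|SPEG_HUMAN">
  <alternative_protein protein="sp|O15360|FANCA_HUMAN" num_tol_term="1"
                       peptide_prev_aa="W" peptide_next_aa="V"/>
</search_hit>
</search_result>"""

for row in protein_mappings(snippet):
    print(row["protein"], row["prev_aa"], row["ntt"])
```

For the trimmed snippet this prints the decoy (preceded by R, ntt 2) and then FANCA (preceded by W, ntt 1), which makes the non-tryptic alternative easy to see. For full-size files, ET.iterparse would keep memory use bounded.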

The switch happened during the filtering step, where we perform something that we call 'protein promotion'. If a PSM maps to a decoy, and has a target entry as an alternative, we flip them, and the assignment becomes a "target" one. That is why you see O15360 in the report.
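The promotion step as described can be sketched roughly as follows. This is not the FragPipe source; the class names and the rev_ decoy prefix default are assumptions, and min_ntt is a hypothetical cleavage-aware guard added only to show where such a check could go. With min_ntt=2, the AELLAGR hit from this thread would stay on the decoy, since FANCA carries num_tol_term="1" for this peptide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Mapping:
    protein: str
    ntt: int  # num_tol_term for this peptide/protein pair


@dataclass
class PSM:
    peptide: str
    primary: Mapping
    alternatives: list[Mapping] = field(default_factory=list)


def promote(psm: PSM, decoy_prefix: str = "rev_", min_ntt: Optional[int] = None) -> PSM:
    """Swap a decoy primary protein for the first target alternative.

    `min_ntt` is a hypothetical extra check (not part of the described behavior):
    when set, alternatives with fewer enzymatic termini are not promoted.
    """
    if not psm.primary.protein.startswith(decoy_prefix):
        return psm  # already a target assignment
    for i, alt in enumerate(psm.alternatives):
        if alt.protein.startswith(decoy_prefix):
            continue
        if min_ntt is not None and alt.ntt < min_ntt:
            continue
        # Flip: the target becomes primary and the decoy moves into the alternatives
        others = psm.alternatives[:i] + psm.alternatives[i + 1:]
        return PSM(psm.peptide, alt, [psm.primary] + others)
    return psm  # no eligible target alternative: stays a decoy


psm = PSM("AELLAGR",
          Mapping("rev_sp|Q15772|SPEG_HUMAN", 2),
          [Mapping("sp|O15360|FANCA_HUMAN", 1)])
print(promote(psm).primary.protein)             # sp|O15360|FANCA_HUMAN
print(promote(psm, min_ntt=2).primary.protein)  # rev_sp|Q15772|SPEG_HUMAN
```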

@fcyu
Member

fcyu commented Jun 14, 2022

We also don't consider the cleavage rules in MSFragger 3.5. I think we should add it in the next version.

Best,

Fengchao

The refresh parser tool (RefreshParser) used with PeptideProphet maps peptides based on sequence alone; I do not think it considers cleavage rules. In MSFragger 3.5 we remap peptides ourselves, so if it is used with Percolator you should get more accurate mapping, Fengchao?

@fcyu fcyu added the MSFragger label Jun 14, 2022
@Skourtis
Author

Skourtis commented Jun 16, 2022

Hi everyone!

Thank you for your quick and clear responses! I've implemented peptide-to-protein remapping in my scripts while I wait for the next version. Some of the peptides will not map to any real protein (which means they were actually decoy hits), but from what I have seen this is extremely rare (10 peptide evidences out of 20,000), so it shouldn't have an impact on FDR.
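A cleavage-aware remapping along these lines might look like the sketch below. It is not the poster's script; it assumes decoys carry a rev_ prefix and that trypsin cleaves after K/R (proline rule ignored), and the function names are hypothetical. Peptides whose only tryptic occurrences are in decoys are returned separately, which corresponds to the rare cases mentioned above.

```python
from collections import defaultdict
from typing import Iterable


def read_fasta(lines: Iterable[str]) -> dict[str, str]:
    """Parse FASTA lines into {id: sequence}, using the first header token as the ID."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            chunks[current] = []
        elif current is not None and line:
            chunks[current].append(line)
    return {pid: "".join(parts) for pid, parts in chunks.items()}


def is_tryptic(seq: str, start: int, peptide: str, sites: str = "KR") -> bool:
    """True if the occurrence at `start` has tryptic boundaries (cleavage after K/R, no proline rule)."""
    end = start + len(peptide)
    n_ok = start == 0 or seq[start - 1] in sites
    c_ok = end == len(seq) or peptide[-1] in sites
    return n_ok and c_ok


def remap(peptides: Iterable[str], proteins: dict[str, str],
          decoy_prefix: str = "rev_") -> tuple[dict[str, set[str]], set[str]]:
    """Return (peptide -> target proteins with a tryptic occurrence, peptides found only in decoys)."""
    targets: dict[str, set[str]] = defaultdict(set)
    decoy_only: set[str] = set()
    for pep in set(peptides):
        in_decoy = False
        for pid, seq in proteins.items():
            start = seq.find(pep)
            while start != -1:
                if is_tryptic(seq, start, pep):
                    if pid.startswith(decoy_prefix):
                        in_decoy = True
                    else:
                        targets[pep].add(pid)
                    break  # one tryptic occurrence per protein is enough
                start = seq.find(pep, start + 1)
        if pep not in targets and in_decoy:
            decoy_only.add(pep)
    return dict(targets), decoy_only


# Toy FASTA for illustration only
fasta = """>sp|TOY1|T1 toy target
MKWAELLAGRV
>sp|TOY2|T2 toy target
MKRAELLAGRV
>rev_sp|TOY3|D1 toy decoy
MRPEPTIDEKX""".splitlines()
targets, decoy_only = remap(["AELLAGR", "PEPTIDEK"], read_fasta(fasta))
print(targets)     # {'AELLAGR': {'sp|TOY2|T2'}}
print(decoy_only)  # {'PEPTIDEK'}
```

The brute-force scan is proportional to peptides × proteins; for a full proteome, digesting the FASTA once into a lookup from tryptic peptide to protein IDs would be much faster.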

Thanks!

@fcyu
Member

fcyu commented Sep 12, 2022

Fixed. Will be available in the next release.

Best,

Fengchao

@fcyu fcyu closed this as completed Sep 12, 2022