Bug in mod reporting if multiple PSMs are tied for rank 1 for a spectrum #251

Closed
mriffle opened this issue Jul 23, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@mriffle

mriffle commented Jul 23, 2021

I got this odd result in psm.tsv when using the latest Philosopher and Comet:

QEP2_2017_1003_AZ_135_az583_AZ.58073.58073.2 interact-QEP2_2017_1003_AZ_135_az583_AZ.pep.xml MTAREVMDIDEAREKYAR M[147]TAREVMDIDEAREK[283]YAR 18 2 7061.3000 2753.1887 2753.1887 1377.6016 1377.6016 2354.1358 1178.0752 399.0511 0.4490 0.0010 0.0000 0.0000 1.0000 0.73400000000000 0.0000 0.0000 0.0511 2 3 0.0000 0.0000 0.0000 15K(155.0946), 19M(15.9949), 1M(15.9949), 7M(15.9949), N-term(155.0946) true gi|16128177|ref|NP_414726.1| gi|16128177|ref|NP_414726.1| DNA polymerase III alpha subunit [Escherichia coli str. K-12 substr. MG1655]

What's odd about it are the reported mods:

15K(155.0946), 19M(15.9949), 1M(15.9949), 7M(15.9949), N-term(155.0946)

This is the peptide in question: MTAREVMDIDEAREKYAR

This is odd for a couple of reasons:

  1. The reported peptide only has two variable mods in it (M[147]TAREVMDIDEAREK[283]YAR), yet psm.tsv is reporting 5 variable mods on this peptide (none of them are static mods)
  2. The peptide has only 18 residues, so the reported mod 19M(15.9949) makes no sense.
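Both inconsistencies can be checked mechanically: collect the sites that carry a bracketed mass in the modified peptide and flag any reported mod without a matching site. This is a minimal Python sketch, assuming Comet-style notation where lowercase `n`/`c` mark the termini and uppercase letters are residues numbered from 1; the function names and the `C-term` label are hypothetical, not Philosopher's.

```python
import re


def annotated_sites(modified_peptide):
    """Sites carrying a bracketed mass, e.g. {'1M', '15K'} or {'N-term', '19M'}.

    Assumes lowercase 'n'/'c' are terminal markers (as in 'n[156]GGG...') and
    uppercase letters are residues, numbered from 1.
    """
    sites = set()
    position = 0
    for symbol, mass in re.findall(r"([A-Za-z])(\[[\d.]+\])?", modified_peptide):
        if symbol == "n":  # N-terminal marker, not a residue
            if mass:
                sites.add("N-term")
            continue
        if symbol == "c":  # C-terminal marker, not a residue (label is an assumption)
            if mass:
                sites.add("C-term")
            continue
        position += 1
        if mass:
            sites.add(f"{position}{symbol}")
    return sites


def unsupported_mods(reported, modified_peptide):
    """psm.tsv-style mod entries whose site is not annotated in the modified peptide."""
    sites = annotated_sites(modified_peptide)
    return [entry for entry in reported.split(", ") if entry.split("(")[0] not in sites]


reported = "15K(155.0946), 19M(15.9949), 1M(15.9949), 7M(15.9949), N-term(155.0946)"
print(unsupported_mods(reported, "M[147]TAREVMDIDEAREK[283]YAR"))
# ['19M(15.9949)', '7M(15.9949)', 'N-term(155.0946)']
```

Only 1M and 15K are backed by the reported modified peptide; 19M cannot be, since it points past the last residue.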

Digging into the pepXML generated by Comet, I see the following for that spectrum:

 <spectrum_query spectrum="QEP2_2017_1003_AZ_135_az583_AZ.58073.58073.2" spectrumNativeID="controllerType=0 controllerNumber=1 scan=58073" start_scan="58073" end_scan="58073" precursor_neutral_mass="2753.188721" assumed_charge="2" index="51618" retention_time_sec="7061.3">
  <search_result>
   <search_hit hit_rank="1" peptide="GGGNTIEYFVNTTFNYPTMAEAYR" peptide_prev_aa="K" peptide_next_aa="V" protein="gi|90111670|ref|NP_418397.2|" num_tot_proteins="1" num_matched_ions="2" tot_num_ions="46" calc_neutral_pep_mass="2886.317017" massdiff="-133.128296" num_tol_term="2" num_missed_cleavages="0" num_matched_peptides="245742">
    <modification_info modified_peptide="n[156]GGGNTIEYFVNTTFNYPTM[147]AEAYR" mod_nterm_mass="156.102425">
     <mod_aminoacid_mass position="19" mass="147.035385" variable="15.994900" source="param"/>
    </modification_info>
    <search_score name="xcorr" value="0.449"/>
    <search_score name="deltacn" value="0.000"/>
    <search_score name="deltacnstar" value="0.000"/>
    <search_score name="spscore" value="5.0"/>
    <search_score name="sprank" value="2"/>
    <search_score name="expect" value="7.34E-01"/>
   </search_hit>
   <search_hit hit_rank="1" peptide="MTAREVMDIDEAREKYAR" peptide_prev_aa="K" peptide_next_aa="G" protein="gi|16128177|ref|NP_414726.1|" num_tot_proteins="1" num_matched_ions="2" tot_num_ions="34" calc_neutral_pep_mass="2354.135822" massdiff="399.052899" num_tol_term="2" num_missed_cleavages="3" num_matched_peptides="245742">
    <modification_info modified_peptide="MTAREVM[147]DIDEAREK[283]YAR">
     <mod_aminoacid_mass position="7" mass="147.035385" variable="15.994900" source="param"/>
     <mod_aminoacid_mass position="15" mass="283.189563" variable="155.094600" source="param"/>
    </modification_info>
    <search_score name="xcorr" value="0.449"/>
    <search_score name="deltacn" value="0.001"/>
    <search_score name="deltacnstar" value="0.001"/>
    <search_score name="spscore" value="6.7"/>
    <search_score name="sprank" value="1"/>
    <search_score name="expect" value="7.34E-01"/>
   </search_hit>
   <search_hit hit_rank="1" peptide="MTAREVMDIDEAREKYAR" peptide_prev_aa="K" peptide_next_aa="G" protein="gi|16128177|ref|NP_414726.1|" num_tot_proteins="1" num_matched_ions="2" tot_num_ions="34" calc_neutral_pep_mass="2354.135822" massdiff="399.052899" num_tol_term="2" num_missed_cleavages="3" num_matched_peptides="245742">
    <modification_info modified_peptide="M[147]TAREVMDIDEAREK[283]YAR">
     <mod_aminoacid_mass position="1" mass="147.035385" variable="15.994900" source="param"/>
     <mod_aminoacid_mass position="15" mass="283.189563" variable="155.094600" source="param"/>
    </modification_info>
    <search_score name="xcorr" value="0.449"/>
    <search_score name="deltacn" value="0.001"/>
    <search_score name="deltacnstar" value="0.000"/>
    <search_score name="spscore" value="6.7"/>
    <search_score name="sprank" value="1"/>
    <search_score name="expect" value="7.34E-01"/>
   </search_hit>
   <search_hit hit_rank="2" peptide="GGGNTIEYFVNTTFNYPTMAEAYR" peptide_prev_aa="K" peptide_next_aa="V" protein="gi|90111670|ref|NP_418397.2|" num_tot_proteins="1" num_matched_ions="2" tot_num_ions="46" calc_neutral_pep_mass="2731.222417" massdiff="21.966304" num_tol_term="2" num_missed_cleavages="0" num_matched_peptides="245742">
    <modification_info modified_peptide="GGGNTIEYFVNTTFNYPTM[147]AEAYR">
     <mod_aminoacid_mass position="19" mass="147.035385" variable="15.994900" source="param"/>
    </modification_info>
    <search_score name="xcorr" value="0.449"/>
    <search_score name="deltacn" value="0.100"/>
    <search_score name="deltacnstar" value="0.000"/>
    <search_score name="spscore" value="5.0"/>
    <search_score name="sprank" value="2"/>
    <search_score name="expect" value="7.46E-01"/>
   </search_hit>
   <search_hit hit_rank="3" peptide="HPPTDYVEEGHDSFYVLFGNPNAAKFDK" peptide_prev_aa="K" peptide_next_aa="T" protein="rev_gi|16131793|ref|NP_418390.1|" num_tot_proteins="1" num_matched_ions="2" tot_num_ions="54" calc_neutral_pep_mass="3193.478129" massdiff="-440.289408" num_tol_term="2" num_missed_cleavages="1" num_matched_peptides="245742">
    <search_score name="xcorr" value="0.404"/>
    <search_score name="deltacn" value="1.000"/>
    <search_score name="deltacnstar" value="0.000"/>
    <search_score name="spscore" value="3.6"/>
    <search_score name="sprank" value="4"/>
    <search_score name="expect" value="2.52E+00"/>
   </search_hit>
  </search_result>
 </spectrum_query>

So, what it looks like to me is that Philosopher is grabbing all the reported modifications for all the rank 1 PSMs for this spectrum and listing them all on the same line as a single PSM in psm.tsv. It does this even when the peptides are not the same, which is extra bad.
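The hypothesis can be tested against the pepXML excerpt. The Python sketch below parses a trimmed copy of the spectrum_query (no namespace, scores dropped), builds psm.tsv-style labels for each hit, and takes the union over every `hit_rank="1"` entry. It assumes the N-terminal delta is `mod_nterm_mass` minus a hydrogen atom (1.007825 Da) and that labels are sorted as plain strings; the helper names are hypothetical and this is not Philosopher's code.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the spectrum_query above: scores and most attributes removed.
PEPXML = """
<spectrum_query spectrum="QEP2_2017_1003_AZ_135_az583_AZ.58073.58073.2">
 <search_result>
  <search_hit hit_rank="1" peptide="GGGNTIEYFVNTTFNYPTMAEAYR">
   <modification_info modified_peptide="n[156]GGGNTIEYFVNTTFNYPTM[147]AEAYR" mod_nterm_mass="156.102425">
    <mod_aminoacid_mass position="19" mass="147.035385" variable="15.994900"/>
   </modification_info>
  </search_hit>
  <search_hit hit_rank="1" peptide="MTAREVMDIDEAREKYAR">
   <modification_info modified_peptide="MTAREVM[147]DIDEAREK[283]YAR">
    <mod_aminoacid_mass position="7" mass="147.035385" variable="15.994900"/>
    <mod_aminoacid_mass position="15" mass="283.189563" variable="155.094600"/>
   </modification_info>
  </search_hit>
  <search_hit hit_rank="1" peptide="MTAREVMDIDEAREKYAR">
   <modification_info modified_peptide="M[147]TAREVMDIDEAREK[283]YAR">
    <mod_aminoacid_mass position="1" mass="147.035385" variable="15.994900"/>
    <mod_aminoacid_mass position="15" mass="283.189563" variable="155.094600"/>
   </modification_info>
  </search_hit>
  <search_hit hit_rank="2" peptide="GGGNTIEYFVNTTFNYPTMAEAYR">
   <modification_info modified_peptide="GGGNTIEYFVNTTFNYPTM[147]AEAYR">
    <mod_aminoacid_mass position="19" mass="147.035385" variable="15.994900"/>
   </modification_info>
  </search_hit>
 </search_result>
</spectrum_query>
"""

H_ATOM = 1.007825  # assumption: mod_nterm_mass includes the terminal hydrogen


def hit_mods(hit):
    """psm.tsv-style labels such as '7M(15.9949)' for a single search_hit."""
    info = hit.find("modification_info")
    if info is None:
        return []
    peptide = hit.get("peptide")
    labels = []
    nterm = info.get("mod_nterm_mass")
    if nterm is not None:
        labels.append(f"N-term({float(nterm) - H_ATOM:.4f})")
    for aa in info.findall("mod_aminoacid_mass"):
        pos = int(aa.get("position"))
        labels.append(f"{pos}{peptide[pos - 1]}({float(aa.get('variable')):.4f})")
    return labels


def collapsed_mods(query):
    """Union of mods across *all* rank-1 hits: the suspected buggy behavior."""
    labels = set()
    for hit in query.iter("search_hit"):
        if hit.get("hit_rank") == "1":
            labels.update(hit_mods(hit))
    return ", ".join(sorted(labels))


query = ET.fromstring(PEPXML.strip())
print(collapsed_mods(query))
```

Under those assumptions the union reproduces the mods column of the psm.tsv row exactly, including the 19M and N-term entries that belong to the GGGNTIEYFVNTTFNYPTMAEAYR hit.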

IMO, the preferred behavior would be to list each of the rank 1 hits as a separate line in psm.tsv.
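A rough sketch of that preferred behavior: emit one row per rank-1 hit, with the mods taken from that hit only. It reuses the same assumptions as before (trimmed pepXML without a namespace, N-terminal delta computed as `mod_nterm_mass` minus a hydrogen atom, string-sorted labels); the helper names and the three-column row are hypothetical, not psm.tsv's real layout.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the three hit_rank="1" entries from the spectrum_query above.
PEPXML = """
<spectrum_query spectrum="QEP2_2017_1003_AZ_135_az583_AZ.58073.58073.2">
 <search_result>
  <search_hit hit_rank="1" peptide="GGGNTIEYFVNTTFNYPTMAEAYR">
   <modification_info modified_peptide="n[156]GGGNTIEYFVNTTFNYPTM[147]AEAYR" mod_nterm_mass="156.102425">
    <mod_aminoacid_mass position="19" mass="147.035385" variable="15.994900"/>
   </modification_info>
  </search_hit>
  <search_hit hit_rank="1" peptide="MTAREVMDIDEAREKYAR">
   <modification_info modified_peptide="MTAREVM[147]DIDEAREK[283]YAR">
    <mod_aminoacid_mass position="7" mass="147.035385" variable="15.994900"/>
    <mod_aminoacid_mass position="15" mass="283.189563" variable="155.094600"/>
   </modification_info>
  </search_hit>
  <search_hit hit_rank="1" peptide="MTAREVMDIDEAREKYAR">
   <modification_info modified_peptide="M[147]TAREVMDIDEAREK[283]YAR">
    <mod_aminoacid_mass position="1" mass="147.035385" variable="15.994900"/>
    <mod_aminoacid_mass position="15" mass="283.189563" variable="155.094600"/>
   </modification_info>
  </search_hit>
 </search_result>
</spectrum_query>
"""

H_ATOM = 1.007825  # assumption: mod_nterm_mass includes the terminal hydrogen


def hit_mods(hit):
    """psm.tsv-style labels such as '7M(15.9949)' for a single search_hit."""
    info = hit.find("modification_info")
    if info is None:
        return []
    peptide = hit.get("peptide")
    labels = []
    nterm = info.get("mod_nterm_mass")
    if nterm is not None:
        labels.append(f"N-term({float(nterm) - H_ATOM:.4f})")
    for aa in info.findall("mod_aminoacid_mass"):
        pos = int(aa.get("position"))
        labels.append(f"{pos}{peptide[pos - 1]}({float(aa.get('variable')):.4f})")
    return labels


def rank1_rows(query):
    """One (peptide, modified_peptide, mods) row per rank-1 hit; mods never mixed."""
    rows = []
    for hit in query.iter("search_hit"):
        if hit.get("hit_rank") != "1":
            continue
        info = hit.find("modification_info")
        modified = hit.get("peptide") if info is None else info.get("modified_peptide")
        rows.append((hit.get("peptide"), modified, ", ".join(sorted(hit_mods(hit)))))
    return rows


rows = rank1_rows(ET.fromstring(PEPXML.strip()))
for row in rows:
    print("\t".join(row))
```

Each tied hit then carries only its own modifications, so neither MTAREVMDIDEAREKYAR row picks up the 19M or N-term mods of the other peptide.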

@mriffle
Author

mriffle commented Jul 23, 2021

Actually, I just remembered that PeptideProphet only puts a probability on one of the rank 1 PSMs, so I don't think reporting all of them would work...
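If only one tied hit ends up in psm.tsv, a smaller fix is to keep that single row but build its mod list from that hit alone. The psm.tsv row above reports M[147]TAREVMDIDEAREK[283]YAR, which matches the third rank-1 hit in the pepXML. This sketch assumes the reported hit can be identified by its `modified_peptide` string; in an interact file, keying on the hit that carries PeptideProphet's `analysis_result` might be a more robust alternative. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the hit_rank="1" entries from the spectrum_query above.
PEPXML = """
<spectrum_query spectrum="QEP2_2017_1003_AZ_135_az583_AZ.58073.58073.2">
 <search_result>
  <search_hit hit_rank="1" peptide="GGGNTIEYFVNTTFNYPTMAEAYR">
   <modification_info modified_peptide="n[156]GGGNTIEYFVNTTFNYPTM[147]AEAYR" mod_nterm_mass="156.102425">
    <mod_aminoacid_mass position="19" mass="147.035385" variable="15.994900"/>
   </modification_info>
  </search_hit>
  <search_hit hit_rank="1" peptide="MTAREVMDIDEAREKYAR">
   <modification_info modified_peptide="MTAREVM[147]DIDEAREK[283]YAR">
    <mod_aminoacid_mass position="7" mass="147.035385" variable="15.994900"/>
    <mod_aminoacid_mass position="15" mass="283.189563" variable="155.094600"/>
   </modification_info>
  </search_hit>
  <search_hit hit_rank="1" peptide="MTAREVMDIDEAREKYAR">
   <modification_info modified_peptide="M[147]TAREVMDIDEAREK[283]YAR">
    <mod_aminoacid_mass position="1" mass="147.035385" variable="15.994900"/>
    <mod_aminoacid_mass position="15" mass="283.189563" variable="155.094600"/>
   </modification_info>
  </search_hit>
 </search_result>
</spectrum_query>
"""

H_ATOM = 1.007825  # assumption: mod_nterm_mass includes the terminal hydrogen


def hit_mods(hit):
    """psm.tsv-style labels such as '7M(15.9949)' for a single search_hit."""
    info = hit.find("modification_info")
    if info is None:
        return []
    peptide = hit.get("peptide")
    labels = []
    nterm = info.get("mod_nterm_mass")
    if nterm is not None:
        labels.append(f"N-term({float(nterm) - H_ATOM:.4f})")
    for aa in info.findall("mod_aminoacid_mass"):
        pos = int(aa.get("position"))
        labels.append(f"{pos}{peptide[pos - 1]}({float(aa.get('variable')):.4f})")
    return labels


def mods_for_reported_hit(query, reported_modified_peptide):
    """Mods from the one rank-1 hit matching the reported peptide, or None if none match."""
    for hit in query.iter("search_hit"):
        if hit.get("hit_rank") != "1":
            continue
        info = hit.find("modification_info")
        modified = hit.get("peptide") if info is None else info.get("modified_peptide")
        if modified == reported_modified_peptide:
            return ", ".join(sorted(hit_mods(hit)))
    return None


query = ET.fromstring(PEPXML.strip())
print(mods_for_reported_hit(query, "M[147]TAREVMDIDEAREK[283]YAR"))
```

With this approach the row would list only 15K(155.0946) and 1M(15.9949), which agrees with its own modified peptide.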

@prvst prvst self-assigned this Jul 23, 2021
@prvst prvst added the bug Something isn't working label Jul 23, 2021
@prvst
Collaborator

prvst commented Jul 23, 2021

Hi @mriffle. Thanks for reporting this. The intent was never to collapse tied top hits from Comet; I developed our logic around MSFragger, which does not produce results with ties. I'll work on a fix for this.

@prvst prvst closed this as not planned Jul 26, 2022