Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"sprank" value not consistent #46

Closed
keesh0 opened this issue Nov 14, 2023 · 7 comments
Closed

"sprank" value not consistent #46

keesh0 opened this issue Nov 14, 2023 · 7 comments

Comments

@keesh0
Copy link

keesh0 commented Nov 14, 2023

keesh - 1 second ago
Jimmy,
We employ Comet in a clinical Proteomics pipeline at Mayo.
We would like deterministic results to adhere to great patient quality.
We would be happy to submit a pull request in order to facilitate this, if you are too busy.
Can you provide some hints on the affected variables / threads?

Jimmy Eng - 2021-03-09
This is due to threading and how results are stored. Peptides are only stored if they score greater than current lowest score. Depending on which threads finish first, the peptide list can change between runs, thus affective the sp scores and sp rank. This is an unfortunate artifact that is not going to be addressed.

@jke000
Copy link
Collaborator

jke000 commented Nov 14, 2023

Thanks for being a Comet user! It's still nice to know that there are researchers out there that still find value in this peptide identification tool.

I'll first throw out that you might consider simply not including the Sp rank in your analysis. Both Sp score and Sp rank used to be more relevant when they were the primary score applied to every candidate peptide and the xcorr calculated only on the best scoring candidates. Now that the xcorr is the primary score calculated on every candidate peptide, both Sp score and Sp rank are calculated only on a subset of the best xcorr scoring peptides for backwards compatibility. For whatever post processing tools you use that considers Sp rank, I would check if there's any meaningful/useful contribution of the Sp rank value and if not, ignore it. (I have considered deprecating calculating/reporting these two scores.)

If you're still motivated to address this, the Sp rank is calculated in the source file CometPostAnalysis.cpp. And the relevant variable is iRankSp which is calculated in AnalyzeSP(). However, I believe the issue is the peptide list stored during the search of a spectrum. This means looking into calls to StorePeptide() in CometSearch.cpp and making the stored peptide list deterministic. I will spend a few days on this in the coming weeks to see if I can address this. But I would definitely welcome a pull request if you are able to address this.

@dpsmca
Copy link

dpsmca commented Nov 14, 2023

Unfortunately I don't think we can turn off using the Sp rank values -- in our clinical pipeline, the output from Comet is used (along with output from several other tools like X!Tandem, Mascot, and some ProteoWizard utilities) as input for Scaffold to analyze into a single SF3 file that contains their analysis. I don't think there's an option to ignore Sp rank values in the SCAFML file that configures the options for each search.

The reason this is a concern for us is that sometimes the tiny differences in the Sp Rank values that Comet produces depending on the threading will occasionally cause small differences in the final protein and peptide counts that Scaffold produces. (Well, and in one case, a large difference, but so far that's been the only time we've seen that happen -- usually the differences are quite small.) And when we run searches on our standard set of validation files, the small differences mean we can't just do a an automated "yes the output files are identical to what we expect" comparison, so the lab users have to manually compare them to make sure the differences are within acceptable limits.

We can "solve" this issue by running Comet in single-threaded mode ... but of course that greatly extends the time it takes the Comet task to complete.

Thanks for taking a look at this issue, we really appreciate it!

@jke000
Copy link
Collaborator

jke000 commented Nov 14, 2023

It's worrisome that a small change in Sp rank could affect a noticeable difference in downstream results. I won't belabor the point but it would still be nice if you could set all of the Sp rank values to say "10" (or any other random value) in your evaluation datasets and see how that affects downstream results.

After thinking about it more, I do think there's hope in addressing this issue in the code by being more stringent/deterministic in how peptides are stored as the search progresses.

@keesh0
Copy link
Author

keesh0 commented Feb 5, 2024

Jimmy,
Any bandwidth on the following?
After thinking about it more, I do think there's hope in addressing this issue in the code by being more stringent/deterministic in how peptides are stored as the search progresses.

@jke000
Copy link
Collaborator

jke000 commented Feb 5, 2024

Yes, I've devoted time recently to see if I could address this. Sad to say I'm progressing the wrong way as I've made the issue worse (more inconsistent Sp rank values) in my current development code. Logically, it should be do-able but some of the inconsistencies I'm seeing pointing to a larger issue of peptides not showing up consistently between runs. If that is due to a bug in my current code then it should be easier to remedy vs. some historical core problem in Comet which would take more time to track down. Stay tuned as I am working on it.

jke000 added a commit that referenced this issue Mar 30, 2024
… replace based on modification state; after a handful of tests, I believe this finally addresses issue #46
@jke000
Copy link
Collaborator

jke000 commented Mar 31, 2024

Hi. I'm following up to mention that I believe this issue is finally addressed with the latest commit acca728. If you're interested and willing to run tests on your datasets, I've compiled Linux and Windows binaries which are available here; any feedback good or bad is helpful. Barring any unforeseen issues, my plan is to make a long overdue release of Comet in a couple of weeks.

@jke000
Copy link
Collaborator

jke000 commented May 6, 2024

Release 2024.01 rev. 0 should address this issue.

@jke000 jke000 closed this as completed May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants