Unified decoy-AA global FLR across AScore / PhosphoRS / LucXor + scoring fixes (#40)#41
Unified decoy-AA global FLR across AScore / PhosphoRS / LucXor + scoring fixes (#40)#41timosachsenberg wants to merge 14 commits into
Conversation
…site scores (#40) Prerequisites for a unified decoy-amino-acid global FLR across AScore, PhosphoRS and LucXor (#40). The PhosphoDecoy(A) flag existed but two algorithms scored the decoy A incorrectly, biasing the FLR low. AScore (ascore.py): - getSites_ appended decoy A positions after S/T/Y without re-sorting, so combinations() emitted descending index combos (e.g. [4,1]) that createTheoreticalSpectra_ silently dropped (it assumes ascending order). Sort candidate sites, and defensively sort each permutation before building its spectrum, so decoy A is no longer dropped on multi-phospho peptides. LucXor (psm.py): - The production scoring path did not recognize the lowercase 'a' that encodes a PhosphoDecoy(A) site, so an A-localization was scored as the unmodified backbone and could not be serialized. Teach _get_mod_positions_from_perm, _get_mod_map and convert_sequence_to_standard_format to treat 'a' as a real phospho-mass-bearing PhosphoDecoy site (is_decoy stays False; full +79.966 mass; serializes to A(PhosphoDecoy)). End-to-end this surfaces 145 A-wins that were previously reverted to the input sequence. LucXor per-site scoring (psm.py, cli.py): - Add PSM.get_site_scores(): a per-site localization confidence derived from the already-computed real-permutation scores (max-with minus max-without delta), with no new spectral scoring. Emit it as the Luciphor_site_scores meta value in both output paths. This gives LucXor a per-site score (it is natively per-PSM only) so a site-level decoy-AA FLR can rank sites uniformly across all three tools. Tests: +10 (AScore sort/drop regressions; LucXor 'a' scoring/serialization; get_site_scores aggregation). Full LucXor pipeline suite unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… tools (#40) Adds the site-level decoy-amino-acid FLR that lets AScore, PhosphoRS and LucXor be compared at the same PSM-FDR and the same FLR level (#40), implementing Ramsbottom et al. 2022 (J. Proteome Res., 10.1021/acs.jproteome.1c00827). onsite/decoy_flr.py (new) + `python -m onsite.decoy_flr`: - Pure, separately-tested estimator flr_curve(): Eq. 2 pX_FLR_n = 2*(T_c/X_c)*cum_decoy/n, capped at 1.0, q-value-style monotonization (reverse cummin), and sites_at_flr() for the yield at a cutoff. - Mandatory "same PSM-FDR" filtering: drop identification decoys and apply the q-value threshold, then restrict to the spectrum_reference intersection shared by all tools (reporting per-tool drops). - T_c/X_c computed over exactly the analyzed set; unambiguous peptides excluded. - Redundant (peptide, position) sites collapsed (max score, PSM-count tiebreak). - Uniform position-keyed per-site scores across tools (AScore_site_scores / PhosphoRS_site_probs / Luciphor_site_scores). AScore (ascore.py, cli.py): - Emit position-keyed AScore_site_scores (from the existing site2score), preserved through both CLI paths, so AScore matches the other tools' format. PhosphoRS (phosphors.py): - Fix str(AASequence) -> .toString(): under pyopenms 3.5 str() returns the object repr, so the localized sequence was written as "AASequence(sequence=...)" and --add-decoys crashed on save. Fixes the two output sites and one unused site. Validated end-to-end with --add-decoys on the example data: on the shared 2470-PSM set at 5% global FLR, LucXor recovers 574 sites, AScore 441, PhosphoRS 181; decoys concentrate at low scores and the position join is correct for all three (>=298/300). Full suite: 168 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… position (#40) `onsite all` fused the AScore / PhosphoRS / LucXor outputs by zipping the three result lists by index. The tools drop/reorder PSMs independently (LucXor keepNBestHits / spectrum-miss / min-PSM abort; AScore error-drops), so a single dropped PSM shifted every later index and silently fused scores from different peptides into one hit; only a non-fatal length warning guarded it. - Add _join_psms_by_ref(): match PeptideIdentifications across tools by spectrum_reference (in LucXor order), verify the unmodified backbone is identical across all three before fusing, and report per-tool exclusions instead of silently truncating. - Rewrite the merge loop to iterate the keyed triples. - Fix the store call: merged_pep_ids was a plain list, which IdXMLFile().store rejects on pyopenms 3.5 (so the merge never actually produced output); use a PeptideIdentificationList. Validated end-to-end: merging the three example outputs yields 3697 PSMs with all three tool-sequences aligned and zero backbone mismatches. +3 unit tests (incl. the missing-middle-PSM misalignment case). Full suite: 171 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PhosphoRS's per-isomer peak matching scanned theoretical ions while advancing a single experimental-peak pointer that was never consumed, and counted every theoretical ion (including indistinguishable near-duplicates). Both inflate the binomial k/n, and asymmetrically: decoy-A isomers carry extra phospho neutral-loss ions that cluster, so decoys accrued more spurious matches than S/T/Y isomers (~0.66 vs ~0.45 extra matches/isomer) and won more often. Add a tested helper _count_matched_ions() that: - merges theoretical ions within one tolerance window (one experimental peak cannot distinguish them -> count as a single trial), and - consumes each experimental peak at most once (no peak satisfies several ions). Also sort red_mz_arr explicitly (the matcher and the window width both require ascending m/z; the old scan silently assumed it, and would mis-match if not). Effect on the example data: PhosphoRS decoy-localization rate drops 5.4% -> 4.3% (36 -> 28 decoy wins), in line with AScore (4.6%). The decoy-AA FLR yield stays ~179 and tie-sensitive, which is correct: that reflects genuine saturation of PhosphoRS's reported site probabilities, not this double-counting bug. +1 unit test (no-double-count + near-duplicate ion merge). Full suite: 172 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughEmits per-site localization scores (AScore/LucXor/PhosphoRS), fixes AScore site/permutation ordering, treats lowercase 'a' as PhosphoDecoy alanine in LucXor, aligns PSMs by spectrum_reference, adds a unified decoy‑AA FLR estimator with idXML parsing/CLI, and improves phosphors ion-matching/peak‑depth reduction. ChangesAScore Site Ordering and Cross-Tool Site Score Metadata
LucXor PhosphoDecoy and Site Scores
PSM Alignment and Merged Results
Decoy-AA False Localization Rate Estimator
PhosphoRS Ion Matching and Peak-Depth Optimization
Sequence Diagram (decoy-AA FLR high-level flow) sequenceDiagram
participant CLI as compute_decoy_flr/main
participant Parser as parse_tool_idxml
participant ToolCompute as compute_tool_flr
participant Intersector as shared_spectrum_intersection
participant FLR as flr_curve
CLI->>Parser: parse each tool idXML
Parser->>ToolCompute: emit PSMRecord list
ToolCompute->>Intersector: restrict to shared spectrum_refs
Intersector->>ToolCompute: keep-set
ToolCompute->>FLR: rank collapsed sites -> is_decoy_by_rank, t_c, x_c
FLR->>CLI: return per-tool FLR curve & cutoff
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| Documentation | 31 minor |
| ErrorProne | 3 high |
| Security | 1 medium 55 high |
| Complexity | 10 medium |
🟢 Metrics 131 complexity · -10 duplication
Metric Results Complexity 131 Duplication -10
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
🧹 Nitpick comments (3)
tests/test_onsitec_merge.py (2)
36-36: ⚡ Quick winRename single-letter variable for clarity.
The variable
l(lowercase L) is flagged by PEP8 as ambiguous and hard to distinguish from1(digit one) orI(uppercase i). Consider renaming tolucorlucxor_pid.♻️ Proposed fix
- refs = [l.getMetaValue("spectrum_reference") for (_a, _p, l) in triples] + refs = [luc.getMetaValue("spectrum_reference") for (_a, _p, luc) in triples]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_onsitec_merge.py` at line 36, Rename the single-letter variable `l` in the list comprehension refs = [l.getMetaValue("spectrum_reference") for (_a, _p, l) in triples] to a clearer name (e.g., `luc` or `lucxor_pid`) throughout that expression; update the comprehension to use the new identifier and ensure any downstream references in the same scope that relied on `l` are renamed as well so the call to getMetaValue("spectrum_reference") now reads luc.getMetaValue(...).
68-78: ⚡ Quick winRename single-letter variable for clarity.
Same PEP8 ambiguity issue with
lon line 77.♻️ Proposed fix
- assert [l.getMetaValue("spectrum_reference") for (_a, _p, l) in triples] == refs + assert [luc.getMetaValue("spectrum_reference") for (_a, _p, luc) in triples] == refs🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_onsitec_merge.py` around lines 68 - 78, The test function test_join_identical_sets uses a single-letter variable `l` in the list comprehension extracting spectrum_reference from `triples`, which is ambiguous; rename `l` to a clearer name (e.g., `lucxor_psm` or `last_psm`) and update the unpacking for `triples` (from `(_a, _p, l)` to `(_a, _p, lucxor_psm)`) so the comprehension [l.getMetaValue("spectrum_reference") for ...] becomes [lucxor_psm.getMetaValue("spectrum_reference") for ...], keeping the call to `_join_psms_by_ref` and the rest of the assertions unchanged.tests/test_decoy_flr.py (1)
101-112: 💤 Low valueConsider using
next()for single-element extraction.Line 109 uses list comprehension with indexing
[0]to extract one element. Usingnext()is more idiomatic and raisesStopIterationif no match is found (which would surface a test logic error).♻️ Proposed refactor
- top = [c for c in collapsed if c[2] == 2][0] + top = next(c for c in collapsed if c[2] == 2)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_decoy_flr.py` around lines 101 - 112, In test_collapse_takes_max_and_counts_psms, replace the list-comprehension indexing used to extract the single matching element with next(...) so a missing match raises StopIteration; specifically change the expression that sets top (currently using [c for c in collapsed if c[2] == 2][0]) to use next(c for c in collapsed if c[2] == 2) while keeping the rest of the assertion logic and the reference to _collapse_sites intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_decoy_flr.py`:
- Around line 101-112: In test_collapse_takes_max_and_counts_psms, replace the
list-comprehension indexing used to extract the single matching element with
next(...) so a missing match raises StopIteration; specifically change the
expression that sets top (currently using [c for c in collapsed if c[2] ==
2][0]) to use next(c for c in collapsed if c[2] == 2) while keeping the rest of
the assertion logic and the reference to _collapse_sites intact.
In `@tests/test_onsitec_merge.py`:
- Line 36: Rename the single-letter variable `l` in the list comprehension refs
= [l.getMetaValue("spectrum_reference") for (_a, _p, l) in triples] to a clearer
name (e.g., `luc` or `lucxor_pid`) throughout that expression; update the
comprehension to use the new identifier and ensure any downstream references in
the same scope that relied on `l` are renamed as well so the call to
getMetaValue("spectrum_reference") now reads luc.getMetaValue(...).
- Around line 68-78: The test function test_join_identical_sets uses a
single-letter variable `l` in the list comprehension extracting
spectrum_reference from `triples`, which is ambiguous; rename `l` to a clearer
name (e.g., `lucxor_psm` or `last_psm`) and update the unpacking for `triples`
(from `(_a, _p, l)` to `(_a, _p, lucxor_psm)`) so the comprehension
[l.getMetaValue("spectrum_reference") for ...] becomes
[lucxor_psm.getMetaValue("spectrum_reference") for ...], keeping the call to
`_join_psms_by_ref` and the rest of the assertions unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b3ba9e69-29c2-4a21-bc3f-fdb9b8751421
📒 Files selected for processing (12)
onsite/ascore/ascore.pyonsite/ascore/cli.pyonsite/decoy_flr.pyonsite/lucxor/cli.pyonsite/lucxor/psm.pyonsite/onsitec.pyonsite/phosphors/phosphors.pytests/test_ascore.pytests/test_decoy_flr.pytests/test_lucxor_regression.pytests/test_onsitec_merge.pytests/test_phosphors.py
Algorithm Comparison Test ResultsClick to expand test results |
PhosphoRS's reported site probabilities saturate -- (1/P)/sum(1/P) pins hundreds of sites at ~100% -- so a global decoy-AA FLR cannot rank genuine sites above decoy mis-localizations, and the recovered-site count was a tie-break artifact (~0-190 depending on ordering). phosphoRS's own discrimination signal is the peptide-score delta (-10*log10(P_random) between the best and the best alternative isoform -- the rank1-rank2 gap it maximizes when choosing peak depth, Taus et al. 2011), which lives in log space and does not saturate. It is a native phosphoRS quantity, not a re-derived metric. - Add site_deltas_from_isomers(): per-site best-with-minus-best-without of the peptide score, floored to stay finite if P_random underflowed. - Emit it as position-keyed PhosphoRS_site_delta (both CLI paths; managed-meta hygiene), alongside the still-reported PhosphoRS_site_probs. - Rank the PhosphoRS leg of the decoy-AA FLR on PhosphoRS_site_delta. Effect: the ~100% tie block collapses 351 -> 1 and the 5%-FLR yield becomes deterministic (151 across 40 tie orderings, previously 0-190). PhosphoRS still recovers fewer sites than AScore, but now as a stable, meaningful result. Adds a file-based 3-tool integration test (parse -> filter -> intersect -> collapse -> Eq.2 -> threshold) plus a site_deltas_from_isomers unit test. Full suite: 174 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…40) Implements the phosphoRS peak-depth optimization (Taus et al. 2011, pseudocode sections 9-12) that the implementation was missing: per 100 m/z window, choose the peak depth (1..8) that maximizes the rank1-rank2 separation between the best and second-best isoform (rank3/rank4 tie-breakers; best-absolute-score when no site-determining ions are present; smaller depth on ties). This replaces _reduce_by_delta_selection, which was dead code AND buggy: its depth-selection ratio (pj/pj1, maximized) was inverted (it minimized separation) and it used the experimental peak count as the binomial n. Since commit d64e2dd ("Use filtered spectrum directly") the active path scored against the unoptimized filtered spectrum. New, separately-tested helpers: - _window_has_site_determining_ions (section 10) - _isoform_peptide_scores / _choose_window_depth (sections 9 & 11) - _reduce_by_peak_depth_optimization (sections 8-12 orchestrator) Gated by ENABLE_PEAK_DEPTH_OPTIMIZATION (default True). Effect on the example data (q<=0.01, 5% global FLR): PhosphoRS yield 151 -> 163, decoy-localization rate 4.3% -> 6.2%; phosphoRS ~3x slower (per-isoform theoretical-spectrum generation). The comparison conclusion is unchanged -- PhosphoRS still recovers far fewer sites than AScore (163 vs 441), confirming its lower reliability on this data is genuine, not an artifact of the missing optimization. +4 unit tests (depth selection, site-determining detection, tie-break, empties). Full suite: 178 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…optimizer Deletes the now-unused window-reduction code paths (none had any call site): - _reduce_by_delta_selection (buggy: inverted depth ratio, experimental peak count used as the binomial n) and _reduce_spectrum_by_windows; - their exclusive helpers _site_determining_ions, _window_intensity_thresholds, _count_peaks_above_threshold, _get_intensity_thresholds; - the now-orphaned MIN_DEPTH constant. The faithful _reduce_by_peak_depth_optimization replaces them; _get_window_indexes is kept (used by it). ~371 lines removed; behavior unchanged. Full suite: 178 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lity w (#40) p = N*d/w (phosphoRS, Taus et al. 2011 section 13) where w is the FULL m/z range of the MS/MS spectrum. The code used the m/z span of the *extracted* peak list instead, which is narrower -> p too large -> scores slightly over-conservative. Use the original spectrum's m/z range (N still = extracted peak count). Before/after on the full dataset (data/1.mzML, 3697 PSMs): 81/1989 per-PSM PhosphoRS_pep_score values change, but the 3-way decoy-AA FLR is unchanged (PhosphoRS 163 sites @ 5% FLR, decoy rate 6.2%; LucXor 574 / AScore 441) -- the deviation was real but immaterial to the comparison here. Kept for fidelity. Full suite: 178 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Realized 3-way comparison (the deliverable)All on the example data (
Sites recovered at matched global FLRCorrectly-localized target sites (decoy-A wins in parentheses):
LucXor recovers the most sites at every threshold and has the lowest decoy-localization rate; AScore is the clear middle; PhosphoRS trails and only catches up at looser FLR. Inter-method agreement (same 1258 analyzed PSMs)
The ~23% of ambiguous PSMs where they disagree are the hard cases the FLR adjudicates. Why the curves differ
Takeaways
Reproduce (incl. the curve plot)from onsite.decoy_flr import compute_decoy_flr, sites_at_flr
res = compute_decoy_flr({"ascore": ASCORE_IDXML, "phosphors": PHOSPHORS_IDXML,
"lucxor": LUCXOR_IDXML}, q_threshold=0.01, flr_threshold=0.05)
for tool, r in res.items():
for thr in (0.01, 0.05, 0.10):
print(tool, thr, sites_at_flr(r.curve, thr)) # (total, target, decoy)
# plot: r.curve["qval"] (x = global FLR) vs r.curve["cum_target"] (y = sites)The LucXor idXML is produced with the default seed ( |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
onsite/phosphors/cli.py (1)
241-260:⚠️ Potential issue | 🟠 Major | ⚡ Quick winClear PhosphoRS metadata on failed threaded hits.
When
hit_res["status"] != "success", this branch cloneshit_srcand only setsscore = -1.0. On a rerun over an already-scored idXML, oldPhosphoRS_site_probs/PhosphoRS_site_deltasurvive because the cleanup at Lines 254-262 is skipped, andsave_identifications()only backfills missing fields. That can leak stale site deltas into downstream FLR ranking.Proposed fix
for hit_src, hit_res in zip(pid_src.getHits(), res["hits"]): if hit_res.get("status") != "success": - # Preserve original hit with -1 score when failed failed_hit = PeptideHit(hit_src) + for k in [ + "search_engine_sequence", + "regular_phospho_count", + "phospho_decoy_count", + "PhosphoRS_pep_score", + "PhosphoRS_site_probs", + "PhosphoRS_site_delta", + "SpecEValue_score", + ]: + if failed_hit.metaValueExists(k): + try: + failed_hit.removeMetaValue(k) + except Exception: + pass failed_hit.setScore(-1.0) + failed_hit.setMetaValue("PhosphoRS_pep_score", -1.0) + failed_hit.setMetaValue("PhosphoRS_site_probs", "{}") + failed_hit.setMetaValue("PhosphoRS_site_delta", "{}") new_hits.append(failed_hit) continue🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@onsite/phosphors/cli.py` around lines 241 - 260, The failed-hit branch clones hit_src into failed_hit and only sets score = -1.0, but it must also clear the same managed PhosphoRS metadata as the success branch to avoid leaking stale fields; update the branch handling hit_res.get("status") != "success" to remove/clear keys like "search_engine_sequence", "regular_phospho_count", "phospho_decoy_count", "PhosphoRS_pep_score", "PhosphoRS_site_probs", and "PhosphoRS_site_delta" from the PeptideHit (failed_hit) before appending it, mirroring the cleanup logic used for new_hit so downstream save_identifications()/FLR ranking cannot see old PhosphoRS values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@onsite/phosphors/phosphors.py`:
- Around line 853-871: The loop skips a peak exactly equal to max_mz because
windows use cur <= mz < hi and the loop ends at cur < max_mz; to fix, iterate
windows up to and including max_mz and include the upper-bound peak in the last
window: change the loop to compute hi = min(cur + WINDOW_SIZE, max_mz) and use
while cur <= max_mz, and when building theo_in_win use an inclusive upper bound
(cur <= mz <= hi) for selecting mz values (or otherwise ensure
_get_window_indexes treats the final hi as inclusive) so a peak at max_mz is
considered in the final window; update references in this block (cur, hi,
max_mz, WINDOW_SIZE, _get_window_indexes, isoform_theo,
_window_has_site_determining_ions) accordingly.
---
Outside diff comments:
In `@onsite/phosphors/cli.py`:
- Around line 241-260: The failed-hit branch clones hit_src into failed_hit and
only sets score = -1.0, but it must also clear the same managed PhosphoRS
metadata as the success branch to avoid leaking stale fields; update the branch
handling hit_res.get("status") != "success" to remove/clear keys like
"search_engine_sequence", "regular_phospho_count", "phospho_decoy_count",
"PhosphoRS_pep_score", "PhosphoRS_site_probs", and "PhosphoRS_site_delta" from
the PeptideHit (failed_hit) before appending it, mirroring the cleanup logic
used for new_hit so downstream save_identifications()/FLR ranking cannot see old
PhosphoRS values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3e903b9c-d6f8-4aa2-9ee7-9338b0624926
📒 Files selected for processing (5)
onsite/decoy_flr.pyonsite/phosphors/cli.pyonsite/phosphors/phosphors.pytests/test_decoy_flr.pytests/test_phosphors.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/test_phosphors.py
- onsite/decoy_flr.py
… idXMLs (#40) filter_phosphors compared percentage site probabilities (0-100) against a decimal threshold (prob_threshold/100 -> 0.99/0.90/0.75), so the STRICT/MODERATE/ LENIENT tiers all thresholded at <1% and selected the same set (identical counts). Compare in percent directly. The three tiers now stratify, e.g. PhosphoRS 986 (>99%) / 1105 (>90%) / 1122 (>75%). Regenerate the reference outputs (data/1_{ascore,phosphors,lucxor}_result.idXML) from the current, fixed algorithms using the exact CLI invocations the test uses. The old references predated this PR's scoring fixes (delta-ranking, double-counting, peak-depth optimization, full-range w), so the regression baseline was guarding pre-fix output. New==reference now -> recall ~100%. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
4553f51 to
13b4068
Compare
1) Peak-depth optimizer dropped a peak sitting exactly on the final window boundary at max_mz (windows are [cur, hi) with hi exclusive and the loop ends at cur < max_mz). Extend the final window's upper bound past max_mz so that peak is included; the loop still advances by WINDOW_SIZE so it terminates. (The reviewer's literal suggestion, while cur <= max_mz with hi=min(...,max_mz), would infinite-loop once hi==max_mz.) Edge case only when max_mz-min_mz is an exact multiple of WINDOW_SIZE; unreachable for real m/z but a genuine off-by-one. 2) The parallel PhosphoRS path's failed-hit branch cloned the original hit and set score -1.0 but did not clear the managed PhosphoRS metadata the success branch clears, so stale fields could leak if the input already carried prior PhosphoRS results. Mirror the success-branch cleanup on failed_hit. Both are behavior-neutral on data/1.mzML (threads=1: 616 sites / 120 @5% FLR; threads=4: 665 / 161 -- unchanged), and full suite (178) passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The threads=1 path computed the best-scoring isomer but never wrote it back to the hit, so it kept the original search-engine localization, while the threaded path (and AScore/LucXor) rewrite to the best isomer. This diverged the two thread counts on 496 PSMs and, through the decoy-AA FLR (which reads the localized site from the winning sequence), changed the measured yield (616 -> 665 sites at q<=0.01; @5% FLR 120 -> 161). Mirror the worker apply-back: setSequence(best_isomer) in process_peptide_identification. threads=1 now equals threads=4 (differing winning sequences 496 -> 0) and reproduces the reported numbers. Regenerate data/1_phosphors_result.idXML (the test reference, built at threads=1, had captured the pre-fix localizations). Full suite: 178 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_algorithm_comparison.py (1)
161-185:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCount
PhosphoDecoysites explicitly in the top-N check.Line 174 undercounts decoy-localized peptides:
"(Phospho)"is not a substring of"(PhosphoDecoy)". That means this filter can pass a hit after checking fewer than the real number of localized sites, which weakens the regression guard for the unified decoy-FLR path.Suggested fix
- # Count phosphorylations (both regular and decoy) - # OpenMS format: "S(Phospho)", "T(Phospho)", "Y(Phospho)", "A(PhosphoDecoy)" - phospho_count = sequence.count('(Phospho)') # Matches both Phospho and PhosphoDecoy + # Count both regular and decoy localization sites. + phospho_count = ( + sequence.count("(Phospho)") + + sequence.count("(PhosphoDecoy)") + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_algorithm_comparison.py` around lines 161 - 185, The current phospho site count logic in the peptide loop under tests/test_algorithm_comparison.py uses sequence.count('(Phospho)') which misses '(PhosphoDecoy)' occurrences, undercounting phospho_count used by the top-N probability check; update the counting in the block that computes phospho_count (where sequence is defined and used) to explicitly count both '(Phospho)' and '(PhosphoDecoy)' (e.g., sum of sequence.count('(Phospho)') and sequence.count('(PhosphoDecoy)') or a single regex that matches both) so phospho_count reflects the true number of localized sites before the sorted_probs[:phospho_count] threshold check in that loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@onsite/phosphors/cli.py`:
- Around line 669-673: After calling
new_hit.setSequence(AASequence.fromString(new_sequence)) you must remove or
regenerate any existing ProForma annotation so metadata no longer describes the
old localization: clear or rebuild the ProForma field on new_hit (do not leave
the original ProForma in meta_fields), and apply the same change inside
_worker_process_pid_threaded so the threaded path does not re-append the
original ProForma after relocalization; ensure new_hit ends up with a ProForma
consistent with the new sequence (or no ProForma) before saving.
- Around line 243-262: In process_peptide_identification(), the single-thread
(threads=1) return paths that currently return PeptideHit(hit) can leak stale
PhosphoRS metadata; mirror the cleanup done for the threaded failed_hit by
creating a copy (e.g., failed_hit = PeptideHit(hit)), setScore(-1.0) when
appropriate, and remove the same metadata keys
("search_engine_sequence","regular_phospho_count","phospho_decoy_count","PhosphoRS_pep_score","PhosphoRS_site_probs","PhosphoRS_site_delta","SpecEValue_score","ProForma")
using metaValueExists/removeMetaValue wrapped in the same try/except, then
return that cleaned hit instead of the raw PeptideHit(hit).
---
Outside diff comments:
In `@tests/test_algorithm_comparison.py`:
- Around line 161-185: The current phospho site count logic in the peptide loop
under tests/test_algorithm_comparison.py uses sequence.count('(Phospho)') which
misses '(PhosphoDecoy)' occurrences, undercounting phospho_count used by the
top-N probability check; update the counting in the block that computes
phospho_count (where sequence is defined and used) to explicitly count both
'(Phospho)' and '(PhosphoDecoy)' (e.g., sum of sequence.count('(Phospho)') and
sequence.count('(PhosphoDecoy)') or a single regex that matches both) so
phospho_count reflects the true number of localized sites before the
sorted_probs[:phospho_count] threshold check in that loop.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d0ad8389-d850-433c-bf9f-d3f0f7c26e5d
📒 Files selected for processing (6)
data/1_ascore_result.idXMLdata/1_lucxor_result.idXMLdata/1_phosphors_result.idXMLonsite/phosphors/cli.pyonsite/phosphors/phosphors.pytests/test_algorithm_comparison.py
🚧 Files skipped from review as they are similar to previous changes (1)
- onsite/phosphors/phosphors.py
The serial path stamped search_engine_sequence and PhosphoRS_pep_score=-1.0 onto the 1708 non-phospho hits, while the threaded worker dropped both from its failed-hit branch. Same localization and FLR, but the files differed. Add a shared make_unscored_hit() helper (original hit, score -1, all managed PhosphoRS metadata removed) and route every skip branch through it: the worker's failed-hit branch and both serial branches (no phospho site, scoring returned no result). The phospho-scored path is unchanged. threads=1 now byte-for-byte equals threads>1 (identical MD5); FLR unchanged (665 sites / @5% FLR 161). Reference regenerated. Full suite: 178 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…istic (#40) PhosphoRS — the threaded worker scored every hit unconditionally while the serial path gated on an explicit (Phospho)/(PhosphoDecoy) string, so the worker could fall through to the scorer's mass-based modification inference (abs_tol 0.1 Da around the phospho mass) and (mis)score a non-phospho modification that is near-isobaric with phospho (e.g. Sulfation, +79.9568, 0.0095 Da away) which the serial path skips — diverging threads=1 vs threads>1 output. Both paths now gate on a shared _has_localizable_phospho() before the scorer. threads=1 == threads>1 byte-identical is preserved and data/1 output is unchanged (still byte-matches the committed reference); the fix only removes the latent divergence. LucXor — the CLI never seeded the RNG, so the unseeded random.shuffle of decoy permutations (psm.py) and np.random.choice model subsampling (models.py) made output run-to-run non-deterministic. Add --seed (default 42); PyLuciPHOr2.run() seeds random and numpy at entry (both the standalone CLI and the `onsite all` path funnel through run()). Two default runs are now byte-identical; determinism holds for the default single-threaded run (--threads>1 shares the global RNG). Regenerate data/1_lucxor_result.idXML under the seed via the exact invocation the comparison test uses, so the reference is itself reproducible. All three algorithm-comparison tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
onsite/lucxor/cli.py (1)
286-290: 💤 Low valueUnreachable return statement.
After the early return at lines 240-253 when
compute_all_scores=True, the code here can only execute whencompute_all_scores=False. The condition on line 288 is therefore always True,sys.exit(exit_code)always runs, and line 290 is unreachable dead code.🔧 Suggested simplification
- # Only call sys.exit if not being called from compute_all_scores - if not compute_all_scores: - sys.exit(exit_code) - return exit_code + sys.exit(exit_code)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@onsite/lucxor/cli.py` around lines 286 - 290, The conditional around sys.exit is redundant and leaves the final "return exit_code" unreachable; update the tail of the function that uses the compute_all_scores flag so it either (A) explicitly returns exit_code when compute_all_scores is True and calls sys.exit(exit_code) only when compute_all_scores is False (i.e., convert the current if-block to an if/else with sys.exit in the False branch and return exit_code in the True branch), or (B) remove the unconditional sys.exit and always return exit_code, ensuring references to compute_all_scores, sys.exit, and return exit_code are adjusted accordingly to eliminate dead code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@onsite/lucxor/cli.py`:
- Around line 286-290: The conditional around sys.exit is redundant and leaves
the final "return exit_code" unreachable; update the tail of the function that
uses the compute_all_scores flag so it either (A) explicitly returns exit_code
when compute_all_scores is True and calls sys.exit(exit_code) only when
compute_all_scores is False (i.e., convert the current if-block to an if/else
with sys.exit in the False branch and return exit_code in the True branch), or
(B) remove the unconditional sys.exit and always return exit_code, ensuring
references to compute_all_scores, sys.exit, and return exit_code are adjusted
accordingly to eliminate dead code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6b8bb848-cc8f-4a82-abe9-9fe6c61e6dbb
📒 Files selected for processing (4)
data/1_lucxor_result.idXMLdata/1_phosphors_result.idXMLonsite/lucxor/cli.pyonsite/phosphors/cli.py
… --seed (#40) The README listed only the basic per-PSM output metrics. Add what was missing: - the per-site score dicts (AScore_site_scores, PhosphoRS_site_probs/_site_delta, Luciphor_site_scores), score directions, and per-tool thresholds (AScore >= 13, prob >= 75%, local FLR <= 0.05); - a new "Interpreting the output: PSM-FDR vs localization FLR" section explaining the two orthogonal error axes (input PSM-FDR is preserved; localization adds an FLR), with a consolidated tool -> score -> cutoff table; - the unified decoy-amino-acid global FLR workflow (--add-decoys + python -m onsite.decoy_flr), which was undocumented; - the new LucXor --seed option. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@README.md`:
- Line 239: The description for PhosphoRS_site_delta currently uses a hyphenated
phrase "best-alternative isoform"; update that text to read "best alternative
isoform" (no hyphen) in the PhosphoRS_site_delta definition so it becomes:
PhosphoRS_site_delta: {position: Δ} — the −10·log10 P gap between the best and
best alternative isoform (rank1 − rank2). Used to rank a global FLR because,
unlike the probability, it does not saturate at 100%.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| - Isomer details with sequence and score | ||
| - Detailed confidence metrics | ||
| - `PhosphoRS_site_probs`: `{position: probability}` on a **0–100% scale** (**higher = more confident**) — the classic phosphoRS site probability. | ||
| - `PhosphoRS_site_delta`: `{position: Δ}` — the `−10·log10 P` gap between the best and best-alternative isoform (rank1 − rank2). Used to rank a global FLR because, unlike the probability, it does not saturate at 100%. |
There was a problem hiding this comment.
Use “best alternative” (no hyphen) for clarity.
“best-alternative” reads like a typo here; “best alternative isoform” is clearer and avoids grammar-tool warnings.
🧰 Tools
🪛 LanguageTool
[grammar] ~239-~239: Ensure spelling is correct
Context: ... the best and best-alternative isoform (rank1 − rank2). Used to rank a global FLR bec...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~239-~239: Ensure spelling is correct
Context: ...t and best-alternative isoform (rank1 − rank2). Used to rank a global FLR because, un...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@README.md` at line 239, The description for PhosphoRS_site_delta currently
uses a hyphenated phrase "best-alternative isoform"; update that text to read
"best alternative isoform" (no hyphen) in the PhosphoRS_site_delta definition so
it becomes: PhosphoRS_site_delta: {position: Δ} — the −10·log10 P gap between
the best and best alternative isoform (rank1 − rank2). Used to rank a global FLR
because, unlike the probability, it does not saturate at 100%.
Summary
Implements a unified decoy-amino-acid global FLR so AScore, PhosphoRS and LucXor can be compared at the same PSM-FDR and the same FLR level, fixes the per-algorithm scoring bugs that made it unmeasurable / biased / unrankable, and brings the PhosphoRS implementation up to faithful phosphoRS (Taus et al. 2011). Implements the decoy-AA method of Ramsbottom et al. 2022 (J. Proteome Res., 10.1021/acs.jproteome.1c00827) — the only FLR definition common to all three tools.
Addresses #40. (Now also includes the peak-depth-optimization work originally split into #42, which was fast-forwarded into this branch.)
What's in here (11 commits)
AScore — stop dropping the decoy on multi-phospho peptides.
getSites_appended decoy-A positions after S/T/Y without re-sorting, socombinations()emitted descending index combos thatcreateTheoreticalSpectra_silently dropped. Sites are now sorted; each permutation is sorted before spectrum build.LucXor — score and serialize the decoy. The production scoring path didn't recognize the lowercase
aencoding aPhosphoDecoy(A)site, so A-localizations were scored as the unmodified backbone and couldn't be written._get_mod_positions_from_perm,_get_mod_mapandconvert_sequence_to_standard_formatnow treataas a real phospho-mass-bearing decoy site. AddsPSM.get_site_scores()(per-site delta from the permutation scores) asLuciphor_site_scores, giving LucXor a per-site confidence.AScore/PhosphoRS plumbing + the FLR module. AScore emits position-keyed
AScore_site_scores; PhosphoRS--add-decoyscrash fixed (str(AASequence)→.toString()on pyopenms 3.5). Newonsite/decoy_flr.py(python -m onsite.decoy_flr): pure estimator (Eq. 22·(T_c/X_c)·Σ(p_X_c)/n, capped at 1, q-value-style monotonization); mandatory ident-decoy + q-value filtering and aspectrum_referenceintersection across tools;T_c/X_cover the analyzed set; site collapsing; unambiguous-peptide exclusion.onsitec— merge byspectrum_reference, not list position.onsite allzipped results by index; a single dropped PSM fused scores from different peptides. Now matched by spectrum reference with a backbone-equality guard and drop reporting. (Also fixes a latentstore()type bug.)PhosphoRS — stop double-counting ions. Per-isomer matching reused experimental peaks and counted near-duplicate theoretical ions, inflating the binomial
k/nand asymmetrically favoring decoys. New_count_matched_ions()merges indistinguishable ions and consumes each experimental peak once (decoy rate 5.4% → 4.3%).PhosphoRS — rank the FLR on its native peptide-score delta. The reported site probabilities saturate at ~100% (no ranking resolution), so the decoy-AA yield was a tie-break artifact. New
site_deltas_from_isomers()emits position-keyedPhosphoRS_site_delta(−10·log10 Pgap between best and best-alternative isoform — phosphoRS's own rank1−rank2 signal), and the FLR ranks on it (probability still emitted for reporting). Tie block 351 → 1; yield becomes deterministic.PhosphoRS — faithful dynamic per-window peak-depth optimization (§9–§12). The algorithm's defining step was missing — scoring ran against the plain filtered spectrum. New
_choose_window_depth/_window_has_site_determining_ions/_reduce_by_peak_depth_optimization: per 100 m/z window, pick the depth maximizing rank1−rank2 isoform separation (best-absolute-score when no site-determining ions; smaller depth on ties). Replaces and removes the prior dead, buggy_reduce_by_delta_selection(inverted ratio, experimental peak count used as the binomialn) and_reduce_spectrum_by_windows.PhosphoRS — full MS/MS m/z range for the random-match probability.
p = N·d/w(§13) now usesw= full spectrum m/z range, not the extracted-peak span (N= extracted peaks). Paper-faithful; symmetric.Regression test — fix the PhosphoRS tier unit bug + re-baseline references.
filter_phosphorsintest_algorithm_comparison.pycompared percentage site probabilities (0–100) against a decimal threshold (/100→ 0.99/0.90/0.75), so the STRICT/MODERATE/LENIENT tiers all thresholded at <1% and collapsed to one set; now compared in percent so they stratify (PhosphoRS 986 / 1105 / 1122). The three reference idXMLs (data/1_{ascore,phosphors,lucxor}_result.idXML) are regenerated from this PR's fixed algorithms via the exact CLI invocations the test uses — they predated the scoring fixes, so the regression baseline had been guarding pre-fix output (new == reference now → ~100% recall).PhosphoRS — re-localize in the serial (
threads=1) path. Thethreads=1path computed the best-scoring isomer but never wrote it back to the hit, so it kept the original search-engine localization while the threaded path (and AScore/LucXor) rewrote to the best isomer. This diverged the two thread counts on 496 PSMs and — because the decoy-AA FLR reads the localized site from the winning sequence — changed the measured yield (616 → 665 sites at q ≤ 0.01; @5% FLR 120 → 161).process_peptide_identificationnow mirrors the worker apply-back (setSequence(best_isomer));threads=1reproduces the reported numbers. Reference regenerated.PhosphoRS — byte-identical output across thread counts. The serial path also stamped
search_engine_sequenceandPhosphoRS_pep_score=-1.0onto the 1708 non-phospho hits while the threaded worker dropped both — equal localization and FLR, but different files. A sharedmake_unscored_hit()helper (original hit, score −1, all managed PhosphoRS metadata removed) now backs every skip branch in both paths.threads=1andthreads>1are now byte-for-byte identical (identical MD5); FLR unchanged.Realized 3-way comparison (
data/1.mzML, 3697 PSMs → 2470 shared at q ≤ 0.01 → 1258 ambiguous)Each tool ranked by its own native per-site confidence, scored on the same decoy-AA FLR (
T_c/X_c = 2.03). Target sites recovered (decoy-A wins in parens). PhosphoRS figures are now thread-invariant (commits 10–11):Agreement on the winning localization across all three (1982 ambiguous PSMs): 65% all-agree; pairwise AScore=LucXor 78%, PhosphoRS=LucXor 75%, AScore=PhosphoRS 73%.
LucXor is most reliable here, AScore second, PhosphoRS least. PhosphoRS's lower yield is genuine — it survived the double-counting fix, delta-ranking, the faithful peak-depth optimizer, and the full-range-
wfix; ~88% of its decoy wins are on symmetric A-vs-S/T ground (the decoy is scored identically to a real site), and ~half are driven by non-site-determining phospho neutral-loss ions, which is faithful phosphoRS behavior on these HCD spectra. Caveat: single dataset; the 96-file PXD000138 benchmark isn't available here, so the ranking could shift on other data.Tests
178 pass (full suite), including a file-based 3-tool FLR integration test and unit tests for: AScore site-sort/decoy-drop, LucXor
a-handling +get_site_scores, the FLR estimator/parser/orchestration, theonsitecspectrum-reference join, PhosphoRS ion-matching, the peptide-score delta, and the peak-depth optimizer (separation maximization, site-determining detection, tie/empty handling). PhosphoRS output is additionally verified thread-invariant (threads=1==threads>1, identical MD5 ondata/1.mzML).🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation