Unified decoy-AA global FLR across AScore / PhosphoRS / LucXor + scoring fixes (#40)#41

Open
timosachsenberg wants to merge 14 commits into main from fix/decoy-flr-scoring-bugs
@timosachsenberg timosachsenberg commented May 31, 2026

Summary

Implements a unified decoy-amino-acid global FLR so AScore, PhosphoRS and LucXor can be compared at the same PSM-FDR and the same FLR level, fixes the per-algorithm scoring bugs that made it unmeasurable / biased / unrankable, and brings the PhosphoRS implementation up to faithful phosphoRS (Taus et al. 2011). Implements the decoy-AA method of Ramsbottom et al. 2022 (J. Proteome Res., 10.1021/acs.jproteome.1c00827) — the only FLR definition common to all three tools.

Addresses #40. (Now also includes the peak-depth-optimization work originally split into #42, which was fast-forwarded into this branch.)

What's in here (11 commits)

  1. AScore — stop dropping the decoy on multi-phospho peptides. getSites_ appended decoy-A positions after S/T/Y without re-sorting, so combinations() emitted descending index combos that createTheoreticalSpectra_ silently dropped. Sites are now sorted; each permutation is sorted before spectrum build.

  2. LucXor — score and serialize the decoy. The production scoring path didn't recognize the lowercase a encoding a PhosphoDecoy(A) site, so A-localizations were scored as the unmodified backbone and couldn't be written. _get_mod_positions_from_perm, _get_mod_map and convert_sequence_to_standard_format now treat a as a real phospho-mass-bearing decoy site. Adds PSM.get_site_scores() (per-site delta from the permutation scores) as Luciphor_site_scores, giving LucXor a per-site confidence.

  3. AScore/PhosphoRS plumbing + the FLR module. AScore emits position-keyed AScore_site_scores; PhosphoRS --add-decoys crash fixed (str(AASequence) → .toString() on pyopenms 3.5). New onsite/decoy_flr.py (python -m onsite.decoy_flr): pure estimator (Eq. 2: pX_FLR_n = 2·(T_c/X_c)·cum_decoy/n, capped at 1, with q-value-style monotonization); mandatory ident-decoy + q-value filtering and a spectrum_reference intersection across tools; T_c/X_c over the analyzed set; site collapsing; unambiguous-peptide exclusion.

  4. onsitec — merge by spectrum_reference, not list position. onsite all zipped results by index; a single dropped PSM fused scores from different peptides. Now matched by spectrum reference with a backbone-equality guard and drop reporting. (Also fixes a latent store() type bug.)

  5. PhosphoRS — stop double-counting ions. Per-isomer matching reused experimental peaks and counted near-duplicate theoretical ions, inflating the binomial k/n and asymmetrically favoring decoys. New _count_matched_ions() merges indistinguishable ions and consumes each experimental peak once (decoy rate 5.4% → 4.3%).

  6. PhosphoRS — rank the FLR on its native peptide-score delta. The reported site probabilities saturate at ~100% (no ranking resolution), so the decoy-AA yield was a tie-break artifact. New site_deltas_from_isomers() emits position-keyed PhosphoRS_site_delta (−10·log10 P gap between best and best-alternative isoform — phosphoRS's own rank1−rank2 signal), and the FLR ranks on it (probability still emitted for reporting). Tie block 351 → 1; yield becomes deterministic.

  7. PhosphoRS — faithful dynamic per-window peak-depth optimization (§9–§12). The algorithm's defining step was missing — scoring ran against the plain filtered spectrum. New _choose_window_depth / _window_has_site_determining_ions / _reduce_by_peak_depth_optimization: per 100 m/z window, pick the depth maximizing rank1−rank2 isoform separation (best-absolute-score when no site-determining ions; smaller depth on ties). Replaces and removes the prior dead, buggy _reduce_by_delta_selection (inverted ratio, experimental peak count used as the binomial n) and _reduce_spectrum_by_windows.

  8. PhosphoRS — full MS/MS m/z range for the random-match probability. p = N·d/w (§13) now uses w = full spectrum m/z range, not the extracted-peak span (N = extracted peaks). Paper-faithful; symmetric.

  9. Regression test — fix the PhosphoRS tier unit bug + re-baseline references. filter_phosphors in test_algorithm_comparison.py compared percentage site probabilities (0–100) against a decimal threshold (/100 → 0.99/0.90/0.75), so the STRICT/MODERATE/LENIENT tiers all thresholded at <1% and collapsed to one set; now compared in percent so they stratify (PhosphoRS 986 / 1105 / 1122). The three reference idXMLs (data/1_{ascore,phosphors,lucxor}_result.idXML) are regenerated from this PR's fixed algorithms via the exact CLI invocations the test uses — they predated the scoring fixes, so the regression baseline had been guarding pre-fix output (new == reference now → ~100% recall).

  10. PhosphoRS — re-localize in the serial (threads=1) path. The threads=1 path computed the best-scoring isomer but never wrote it back to the hit, so it kept the original search-engine localization while the threaded path (and AScore/LucXor) rewrote to the best isomer. This diverged the two thread counts on 496 PSMs and — because the decoy-AA FLR reads the localized site from the winning sequence — changed the measured yield (616 → 665 sites at q ≤ 0.01; @5% FLR 120 → 161). process_peptide_identification now mirrors the worker apply-back (setSequence(best_isomer)); threads=1 reproduces the reported numbers. Reference regenerated.

  11. PhosphoRS — byte-identical output across thread counts. The serial path also stamped search_engine_sequence and PhosphoRS_pep_score=-1.0 onto the 1708 non-phospho hits while the threaded worker dropped both — equal localization and FLR, but different files. A shared make_unscored_hit() helper (original hit, score −1, all managed PhosphoRS metadata removed) now backs every skip branch in both paths. threads=1 and threads>1 are now byte-for-byte identical (identical MD5); FLR unchanged.
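The random-match probability from item 8 and the binomial peptide score it feeds can be sketched roughly as follows. This is a minimal illustration, not the code in onsite/phosphors/phosphors.py: the function names are hypothetical, and `d` is assumed to be the width of the fragment matching window in m/z.

```python
import math

def random_match_probability(n_peaks, d, w):
    """p = N*d/w: chance that a random m/z falls on one of N extracted peaks.

    n_peaks: number of extracted peaks (N)
    d: matching window width in m/z (assumption for this sketch)
    w: full MS/MS m/z range of the spectrum
    """
    return min(1.0, n_peaks * d / w)

def binomial_tail(k, n, p):
    """Probability of k or more matches out of n theoretical ions by chance."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))

def peptide_score(k, n, p):
    """-10*log10(P), with a floor so the logarithm stays finite."""
    return -10.0 * math.log10(max(binomial_tail(k, n, p), 1e-300))

# Same N and d, but w taken as the extracted-peak span versus the full range.
p_span = random_match_probability(50, 0.5, 1000.0)
p_full = random_match_probability(50, 0.5, 2000.0)
```

Under these assumptions a larger `w` lowers `p`, so the same `k` matches out of `n` ions yield a higher peptide score. The change applies equally to every isoform of a peptide, decoy or not.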

Realized 3-way comparison (data/1.mzML, 3697 PSMs → 2470 shared at q ≤ 0.01 → 1258 ambiguous)

Each tool ranked by its own native per-site confidence, scored on the same decoy-AA FLR (T_c/X_c = 2.03). Target sites recovered (decoy-A wins in parens). PhosphoRS figures are now thread-invariant (commits 10–11):

| Method | @1% FLR | @5% FLR | @10% FLR | Total sites / decoy rate |
|---|---|---|---|---|
| LucXor | 141 | 567 (7) | 597 (10) | 607 / 1.6% |
| AScore | 23 | 436 (5) | 521 (13) | 611 / 4.6% |
| PhosphoRS | 90 | 161 (2) | 321 (8) | 665 / 6.2% |

Agreement on the winning localization across all three (1982 ambiguous PSMs): 65% all-agree; pairwise AScore=LucXor 78%, PhosphoRS=LucXor 75%, AScore=PhosphoRS 73%.

LucXor is most reliable here, AScore second, PhosphoRS least. PhosphoRS's lower yield is genuine — it survived the double-counting fix, delta-ranking, the faithful peak-depth optimizer, and the full-range-w fix; ~88% of its decoy wins are on symmetric A-vs-S/T ground (the decoy is scored identically to a real site), and ~half are driven by non-site-determining phospho neutral-loss ions, which is faithful phosphoRS behavior on these HCD spectra. Caveat: single dataset; the 96-file PXD000138 benchmark isn't available here, so the ranking could shift on other data.

Tests

178 pass (full suite), including a file-based 3-tool FLR integration test and unit tests for: AScore site-sort/decoy-drop, LucXor a-handling + get_site_scores, the FLR estimator/parser/orchestration, the onsitec spectrum-reference join, PhosphoRS ion-matching, the peptide-score delta, and the peak-depth optimizer (separation maximization, site-determining detection, tie/empty handling). PhosphoRS output is additionally verified thread-invariant (threads=1 == threads>1, identical MD5 on data/1.mzML).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Unified decoy-amino-acid FLR curve computation across AScore, PhosphoRS, and LucXor.
    • Tools now emit position-keyed per‑site localization confidence metadata.
    • LucXor CLI adds --seed for reproducible runs.
    • PhosphoRS exposes per-isomer site deltas for downstream reporting.
  • Bug Fixes

    • Preserve decoy modifications and enforce sorted site/permutation ordering during spectrum generation.
    • Merge PSMs by spectrum reference to avoid misalignment.
    • Prevent double-counting in ion matching and improve peak‑depth selection.
  • Tests

    • Extensive unit and integration tests covering FLR, site scoring, merging, phosphors, and regressions.
  • Documentation

    • README updated with seed option, clearer metric descriptions, and unified FLR workflow.

timosachsenberg and others added 4 commits May 31, 2026 12:54
…site scores (#40)

Prerequisites for a unified decoy-amino-acid global FLR across AScore,
PhosphoRS and LucXor (#40). The PhosphoDecoy(A) flag existed
but two algorithms scored the decoy A incorrectly, biasing the FLR low.

AScore (ascore.py):
- getSites_ appended decoy A positions after S/T/Y without re-sorting, so
  combinations() emitted descending index combos (e.g. [4,1]) that
  createTheoreticalSpectra_ silently dropped (it assumes ascending order).
  Sort candidate sites, and defensively sort each permutation before
  building its spectrum, so decoy A is no longer dropped on multi-phospho
  peptides.
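
The ordering problem can be reproduced with plain Python. The sketch below is illustrative only: `candidate_sites` is a hypothetical stand-in, not the actual getSites_ implementation.

```python
from itertools import combinations

def candidate_sites(sty_positions, decoy_positions):
    """Return S/T/Y and decoy-A positions as one ascending list."""
    return sorted(set(sty_positions) | set(decoy_positions))

# Appending the decoy position without re-sorting reproduces the bug pattern:
unsorted_sites = [4, 7] + [1]  # S/T/Y at 4 and 7, decoy A at 1
buggy = list(combinations(unsorted_sites, 2))
# combinations() preserves input order, so (4, 1) and (7, 1) come out descending.

# Fix: sort the candidate sites, and defensively sort each combination too.
fixed = [tuple(sorted(c)) for c in combinations(candidate_sites([4, 7], [1]), 2)]
```

Any consumer that assumes ascending indices would reject the descending pairs in `buggy`, and those are exactly the combinations that contain the decoy.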

LucXor (psm.py):
- The production scoring path did not recognize the lowercase 'a' that
  encodes a PhosphoDecoy(A) site, so an A-localization was scored as the
  unmodified backbone and could not be serialized. Teach
  _get_mod_positions_from_perm, _get_mod_map and
  convert_sequence_to_standard_format to treat 'a' as a real
  phospho-mass-bearing PhosphoDecoy site (is_decoy stays False; full
  +79.966 mass; serializes to A(PhosphoDecoy)). End-to-end this surfaces
  145 A-wins that were previously reverted to the input sequence.

LucXor per-site scoring (psm.py, cli.py):
- Add PSM.get_site_scores(): a per-site localization confidence derived from
  the already-computed real-permutation scores
  (max-with minus max-without delta), with no new spectral scoring. Emit it
  as the Luciphor_site_scores meta value in both output paths. This gives
  LucXor a per-site score (it is natively per-PSM only) so a site-level
  decoy-AA FLR can rank sites uniformly across all three tools.
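
A minimal sketch of the max-with minus max-without delta, assuming permutation scores are available as a mapping from modified positions to score. The data layout and the function name are assumptions for illustration, not the PSM.get_site_scores() signature.

```python
def site_scores_from_permutations(perm_scores):
    """Per-site delta: best score with the site minus best score without it.

    perm_scores: dict mapping a tuple of modified positions to that
    permutation's score (hypothetical layout for this sketch).
    """
    positions = sorted({pos for perm in perm_scores for pos in perm})
    out = {}
    for pos in positions:
        with_site = [s for perm, s in perm_scores.items() if pos in perm]
        without = [s for perm, s in perm_scores.items() if pos not in perm]
        if not without:
            # The site is in every permutation, so there is no alternative
            # to compare against; this sketch simply skips it.
            continue
        out[pos] = max(with_site) - max(without)
    return out

example = site_scores_from_permutations({(3,): 12.0, (5,): 7.5, (8,): 2.0})
```

No spectrum is rescored here: the deltas are computed purely from scores that already exist, which matches the "no new spectral scoring" description above.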

Tests: +10 (AScore sort/drop regressions; LucXor 'a' scoring/serialization;
get_site_scores aggregation). Full LucXor pipeline suite unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… tools (#40)

Adds the site-level decoy-amino-acid FLR that lets AScore, PhosphoRS and LucXor
be compared at the same PSM-FDR and the same FLR level (#40),
implementing Ramsbottom et al. 2022 (J. Proteome Res., 10.1021/acs.jproteome.1c00827).

onsite/decoy_flr.py (new) + `python -m onsite.decoy_flr`:
- Pure, separately-tested estimator flr_curve(): Eq. 2
  pX_FLR_n = 2*(T_c/X_c)*cum_decoy/n, capped at 1.0, q-value-style
  monotonization (reverse cummin), and sites_at_flr() for the yield at a cutoff.
- Mandatory "same PSM-FDR" filtering: drop identification decoys and apply the
  q-value threshold, then restrict to the spectrum_reference intersection shared
  by all tools (reporting per-tool drops).
- T_c/X_c computed over exactly the analyzed set; unambiguous peptides excluded.
- Redundant (peptide, position) sites collapsed (max score, PSM-count tiebreak).
- Uniform position-keyed per-site scores across tools
  (AScore_site_scores / PhosphoRS_site_probs / Luciphor_site_scores).
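
The estimator described above can be sketched in a few lines. The names carry a `_sketch` suffix because the real flr_curve() and sites_at_flr() signatures are not shown here. Counting only target sites in the yield is an assumption.

```python
def flr_curve_sketch(is_decoy_by_rank, t_c, x_c):
    """Eq. 2 at each rank n: 2*(T_c/X_c)*cum_decoy/n, capped at 1.0,
    then made monotone with a reverse cumulative minimum."""
    ratio = t_c / x_c
    raw, cum_decoy = [], 0
    for n, is_decoy in enumerate(is_decoy_by_rank, start=1):
        cum_decoy += int(is_decoy)
        raw.append(min(1.0, 2.0 * ratio * cum_decoy / n))
    mono = raw[:]
    for i in range(len(mono) - 2, -1, -1):
        mono[i] = min(mono[i], mono[i + 1])  # q-value-style monotonization
    return mono

def sites_at_flr_sketch(curve, is_decoy_by_rank, cutoff):
    """Target sites down to the deepest rank whose FLR is within the cutoff."""
    last = -1
    for i, q in enumerate(curve):
        if q <= cutoff:
            last = i
    return sum(1 for d in is_decoy_by_rank[: last + 1] if not d)

ranked = [False, False, True, False]  # sites sorted by descending score
curve = flr_curve_sketch(ranked, t_c=1.0, x_c=1.0)
```

With these inputs the raw values are 0, 0, 2/3 and 0.5. The reverse cumulative minimum pulls the third value down to 0.5, so the curve never decreases as the list gets deeper.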

AScore (ascore.py, cli.py):
- Emit position-keyed AScore_site_scores (from the existing site2score),
  preserved through both CLI paths, so AScore matches the other tools' format.

PhosphoRS (phosphors.py):
- Fix str(AASequence) -> .toString(): under pyopenms 3.5 str() returns the
  object repr, so the localized sequence was written as
  "AASequence(sequence=...)" and --add-decoys crashed on save. Fixes the two
  output sites and one unused site.

Validated end-to-end with --add-decoys on the example data: on the shared
2470-PSM set at 5% global FLR, LucXor recovers 574 sites, AScore 441,
PhosphoRS 181; decoys concentrate at low scores and the position join is
correct for all three (>=298/300). Full suite: 168 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… position (#40)

`onsite all` fused the AScore / PhosphoRS / LucXor outputs by zipping the three
result lists by index. The tools drop/reorder PSMs independently (LucXor
keepNBestHits / spectrum-miss / min-PSM abort; AScore error-drops), so a single
dropped PSM shifted every later index and silently fused scores from different
peptides into one hit; only a non-fatal length warning guarded it.

- Add _join_psms_by_ref(): match PeptideIdentifications across tools by
  spectrum_reference (in LucXor order), verify the unmodified backbone is
  identical across all three before fusing, and report per-tool exclusions
  instead of silently truncating.
- Rewrite the merge loop to iterate the keyed triples.
- Fix the store call: merged_pep_ids was a plain list, which IdXMLFile().store
  rejects on pyopenms 3.5 (so the merge never actually produced output); use a
  PeptideIdentificationList.
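
The difference between zipping by index and joining by key can be shown with plain tuples. This is a sketch under simplifying assumptions: each PSM is reduced to a (spectrum_reference, backbone) pair, and `join_by_ref_sketch` is a hypothetical stand-in for _join_psms_by_ref, which works on PeptideIdentification objects.

```python
def join_by_ref_sketch(ascore, phosphors, lucxor):
    """Match PSMs across tools by spectrum reference, in LucXor order.

    Each input is a list of (spectrum_reference, backbone) tuples.
    Returns the matched (reference, backbone) pairs and drop statistics.
    """
    a = dict(ascore)
    p = dict(phosphors)
    matched, stats = [], {"missing": 0, "backbone_mismatch": 0}
    for ref, backbone in lucxor:
        if ref not in a or ref not in p:
            stats["missing"] += 1  # report the drop instead of truncating
            continue
        if not (a[ref] == p[ref] == backbone):
            stats["backbone_mismatch"] += 1  # never fuse different peptides
            continue
        matched.append((ref, backbone))
    return matched, stats

lucxor = [("s1", "PEPA"), ("s2", "PEPB"), ("s3", "PEPC")]
phosphors = list(lucxor)
ascore = [("s1", "PEPA"), ("s3", "PEPC")]  # the middle PSM was dropped

# Index-based zip pairs AScore's s3 with LucXor's s2:
zipped_refs = [(a_ref, l_ref) for (a_ref, _), (l_ref, _) in zip(ascore, lucxor)]
matched, stats = join_by_ref_sketch(ascore, phosphors, lucxor)
```

Here `zipped_refs` contains the mismatched pair ("s3", "s2"), while the keyed join returns only s1 and s3 and records one missing PSM.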

Validated end-to-end: merging the three example outputs yields 3697 PSMs with
all three tool-sequences aligned and zero backbone mismatches. +3 unit tests
(incl. the missing-middle-PSM misalignment case). Full suite: 171 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PhosphoRS's per-isomer peak matching scanned theoretical ions while advancing a
single experimental-peak pointer that was never consumed, and counted every
theoretical ion (including indistinguishable near-duplicates). Both inflate the
binomial k/n, and asymmetrically: decoy-A isomers carry extra phospho
neutral-loss ions that cluster, so decoys accrued more spurious matches than
S/T/Y isomers (~0.66 vs ~0.45 extra matches/isomer) and won more often.

Add a tested helper _count_matched_ions() that:
- merges theoretical ions within one tolerance window (one experimental peak
  cannot distinguish them -> count as a single trial), and
- consumes each experimental peak at most once (no peak satisfies several ions).
Also sort red_mz_arr explicitly (the matcher and the window width both require
ascending m/z; the old scan silently assumed it, and would mis-match if not).
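
The two rules can be sketched as follows. This is not the _count_matched_ions() implementation: the function name is hypothetical, and merging each ion into the last kept representative within `tol` is one plausible reading of "within one tolerance window".

```python
def count_matched_ions_sketch(theo_mz, exp_mz, tol):
    """Return (k, n): matched and total distinguishable theoretical ions."""
    theo = sorted(theo_mz)
    exp = sorted(exp_mz)  # the pointer scan below requires ascending m/z

    # Merge theoretical ions that one experimental peak could not tell apart.
    merged = []
    for mz in theo:
        if merged and mz - merged[-1] <= tol:
            continue  # indistinguishable from the previous ion: one trial
        merged.append(mz)

    k, j = 0, 0
    for mz in merged:
        while j < len(exp) and exp[j] < mz - tol:
            j += 1  # skip peaks that lie below this ion's window
        if j < len(exp) and abs(exp[j] - mz) <= tol:
            k += 1
            j += 1  # consume the peak so it cannot satisfy another ion
    return k, len(merged)

merged_case = count_matched_ions_sketch([100.0, 100.01, 200.0], [100.005, 200.0], 0.02)
shared_peak = count_matched_ions_sketch([100.0, 100.03], [100.015], 0.02)
```

In `merged_case` the two near-duplicate ions count as one trial, giving 2 matches out of 2. In `shared_peak` a single experimental peak lies within tolerance of both ions but is consumed by the first, giving 1 match out of 2.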

Effect on the example data: PhosphoRS decoy-localization rate drops 5.4% -> 4.3%
(36 -> 28 decoy wins), in line with AScore (4.6%). The decoy-AA FLR yield stays
~179 and tie-sensitive, which is correct: that reflects genuine saturation of
PhosphoRS's reported site probabilities, not this double-counting bug.

+1 unit test (no-double-count + near-duplicate ion merge). Full suite: 172 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 31, 2026

Walkthrough

Emits per-site localization scores (AScore/LucXor/PhosphoRS), fixes AScore site/permutation ordering, treats lowercase 'a' as PhosphoDecoy alanine in LucXor, aligns PSMs by spectrum_reference, adds a unified decoy‑AA FLR estimator with idXML parsing/CLI, and improves phosphors ion-matching/peak‑depth reduction.

Changes

AScore Site Ordering and Cross-Tool Site Score Metadata

| Layer / File(s) | Summary |
|---|---|
| AScore site sorting and theoretical spectrum fix (onsite/ascore/ascore.py, tests/test_ascore.py) | getSites_ now returns phosphorylation/decoy site indices sorted ascending; createTheoreticalSpectra_ sorts permutation indices before applying modifications and uses sorted lengths for completion checks. |
| AScore per-site score metadata & CLI propagation (onsite/ascore/ascore.py, onsite/ascore/cli.py, tests/test_ascore.py) | compute records AScore_site_scores (stringified position-keyed dict) into PeptideHit metadata for unambiguous and ambiguous localization cases; CLI threaded and non-thread paths append this meta when present. |

LucXor PhosphoDecoy and Site Scores

| Layer / File(s) | Summary |
|---|---|
| LucXor PhosphoDecoy-Alanine support & site scoring (onsite/lucxor/psm.py, onsite/lucxor/cli.py, tests/test_lucxor_regression.py) | Lowercase 'a' treated as a phospho-bearing Alanine (PHOSPHO_MOD_MASS) and included among phospho candidate positions; added PSM.get_site_scores() computing per-residue deltas from permutation scores; convert_sequence_to_standard_format emits A(PhosphoDecoy); CLI adds --seed, seeds RNGs, and emits Luciphor_site_scores on hits. |

PSM Alignment and Merged Results

| Layer / File(s) | Summary |
|---|---|
| PSM alignment by spectrum reference (onsite/onsitec.py, tests/test_onsitec_merge.py) | New _join_psms_by_ref matches PSMs across tools by spectrum_reference, verifies backbone sequence identity, and returns matched triples with drop/mismatch stats; merge_algorithm_results uses this helper and inserts into PeptideIdentificationList via push_back. |

Decoy-AA False Localization Rate Estimator

| Layer / File(s) | Summary |
|---|---|
| Core FLR estimators and idXML parsing (onsite/decoy_flr.py, tests/test_decoy_flr.py) | Added flr_curve (ranked cumulative target/decoy counts, raw FLR capped at 1.0, monotonized q-curve) and sites_at_flr; parse_localized_sites and parse_tool_idxml parse localized sites, decoy flags, q-values, and per-position score maps into PSMRecords. |
| Tool FLR computation & orchestration (onsite/decoy_flr.py, tests/test_decoy_flr.py) | compute_tool_flr filters ident-decoys/q-threshold, restricts to shared spectrum-ref keep-set, excludes unambiguous peptides, optionally collapses redundant sites by max score, computes t_c/x_c, ranks sites, and produces FLR curve; compute_decoy_flr orchestrates per-tool runs and CLI output. |

PhosphoRS Ion Matching and Peak-Depth Optimization

| Layer / File(s) | Summary |
|---|---|
| PhosphoRS ion matching & depth optimization (onsite/phosphors/phosphors.py, onsite/phosphors/cli.py, tests/test_phosphors.py) | Removed MIN_DEPTH; added _count_matched_ions to merge indistinguishable theoretical ions and greedily consume experimental peaks; implemented per-window peak-depth optimization for isoform separation; updated serialization to AASequence.toString() and wired PhosphoRS_site_delta metadata through CLI helpers. |

Sequence Diagram (decoy-AA FLR high-level flow)

sequenceDiagram
  participant CLI as compute_decoy_flr/main
  participant Parser as parse_tool_idxml
  participant ToolCompute as compute_tool_flr
  participant Intersector as shared_spectrum_intersection
  participant FLR as flr_curve
  CLI->>Parser: parse each tool idXML
  Parser->>ToolCompute: emit PSMRecord list
  ToolCompute->>Intersector: restrict to shared spectrum_refs
  Intersector->>ToolCompute: keep-set
  ToolCompute->>FLR: rank collapsed sites -> is_decoy_by_rank, t_c, x_c
  FLR->>CLI: return per-tool FLR curve & cutoff

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • bigbio/onsite#4: LucXor core changes around PSM permutation logic and site scoring relate to this earlier LucXor work.
  • bigbio/onsite#28: Touches filter_phosphors and algorithm-comparison semantics updated here.
  • bigbio/onsite#11: Prior CLI wiring and per-hit metadata propagation that these changes extend.

Suggested labels

Review effort 5/5

Suggested reviewers

  • weizhongchun
  • ypriverol

Poem

"I'm a rabbit in the lab, nose twitching at scores,
Sites now line up tidy, no scrambled permutation wars.
Decoys and phosphos take their rightful place,
FLR curves hum softly, metadata hops through each trace.
Hooray — the outputs sing, reproducible and neat!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 69.39% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: implementing a unified decoy-amino acid global FLR estimator across three tools and addressing multiple scoring bugs. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@codacy-production

codacy-production Bot commented May 31, 2026

Not up to standards ⛔

🔴 Issues 58 high · 11 medium · 31 minor

Alerts:
⚠ 100 issues (≤ 0 issues of at least minor severity)

Results:
100 new issues

| Category | Results |
|---|---|
| Documentation | 31 minor |
| ErrorProne | 3 high |
| Security | 1 medium, 55 high |
| Complexity | 10 medium |

🟢 Metrics 131 complexity · -10 duplication

| Metric | Results |
|---|---|
| Complexity | 131 |
| Duplication | -10 |



@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (3)
tests/test_onsitec_merge.py (2)

36-36: ⚡ Quick win

Rename single-letter variable for clarity.

The variable l (lowercase L) is flagged by PEP8 as ambiguous and hard to distinguish from 1 (digit one) or I (uppercase i). Consider renaming to luc or lucxor_pid.

♻️ Proposed fix
-    refs = [l.getMetaValue("spectrum_reference") for (_a, _p, l) in triples]
+    refs = [luc.getMetaValue("spectrum_reference") for (_a, _p, luc) in triples]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_onsitec_merge.py` at line 36, Rename the single-letter variable
`l` in the list comprehension refs = [l.getMetaValue("spectrum_reference") for
(_a, _p, l) in triples] to a clearer name (e.g., `luc` or `lucxor_pid`)
throughout that expression; update the comprehension to use the new identifier
and ensure any downstream references in the same scope that relied on `l` are
renamed as well so the call to getMetaValue("spectrum_reference") now reads
luc.getMetaValue(...).

68-78: ⚡ Quick win

Rename single-letter variable for clarity.

Same PEP8 ambiguity issue with l on line 77.

♻️ Proposed fix
-    assert [l.getMetaValue("spectrum_reference") for (_a, _p, l) in triples] == refs
+    assert [luc.getMetaValue("spectrum_reference") for (_a, _p, luc) in triples] == refs
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_onsitec_merge.py` around lines 68 - 78, The test function
test_join_identical_sets uses a single-letter variable `l` in the list
comprehension extracting spectrum_reference from `triples`, which is ambiguous;
rename `l` to a clearer name (e.g., `lucxor_psm` or `last_psm`) and update the
unpacking for `triples` (from `(_a, _p, l)` to `(_a, _p, lucxor_psm)`) so the
comprehension [l.getMetaValue("spectrum_reference") for ...] becomes
[lucxor_psm.getMetaValue("spectrum_reference") for ...], keeping the call to
`_join_psms_by_ref` and the rest of the assertions unchanged.
tests/test_decoy_flr.py (1)

101-112: 💤 Low value

Consider using next() for single-element extraction.

Line 109 uses list comprehension with indexing [0] to extract one element. Using next() is more idiomatic and raises StopIteration if no match is found (which would surface a test logic error).

♻️ Proposed refactor
-    top = [c for c in collapsed if c[2] == 2][0]
+    top = next(c for c in collapsed if c[2] == 2)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_decoy_flr.py` around lines 101 - 112, In
test_collapse_takes_max_and_counts_psms, replace the list-comprehension indexing
used to extract the single matching element with next(...) so a missing match
raises StopIteration; specifically change the expression that sets top
(currently using [c for c in collapsed if c[2] == 2][0]) to use next(c for c in
collapsed if c[2] == 2) while keeping the rest of the assertion logic and the
reference to _collapse_sites intact.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3ba9e69-29c2-4a21-bc3f-fdb9b8751421

📥 Commits

Reviewing files that changed from the base of the PR and between eb7f3be and c8de981.

📒 Files selected for processing (12)
  • onsite/ascore/ascore.py
  • onsite/ascore/cli.py
  • onsite/decoy_flr.py
  • onsite/lucxor/cli.py
  • onsite/lucxor/psm.py
  • onsite/onsitec.py
  • onsite/phosphors/phosphors.py
  • tests/test_ascore.py
  • tests/test_decoy_flr.py
  • tests/test_lucxor_regression.py
  • tests/test_onsitec_merge.py
  • tests/test_phosphors.py

@github-actions

github-actions Bot commented May 31, 2026

Algorithm Comparison Test Results

============================= test session starts ==============================
platform linux -- Python 3.11.15, pytest-9.0.3, pluggy-1.6.0 -- /opt/hostedtoolcache/Python/3.11.15/x64/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/onsite/onsite
configfile: pyproject.toml
plugins: cov-7.1.0
collecting ... collected 3 items

tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_lucxor_comparison 
================================================================================
LucXor Comparison Results (q-value < 0.01)
================================================================================

STRICT (Local FLR < 0.01):
  New results: 848
  Reference results: 848
  Overlap: 848 (100.0%)
  Recall: 100.0% (new found 848/848 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

MODERATE (Local FLR < 0.05):
  New results: 1064
  Reference results: 1064
  Overlap: 1064 (100.0%)
  Recall: 100.0% (new found 1064/1064 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

LENIENT (Local FLR < 0.1):
  New results: 1081
  Reference results: 1081
  Overlap: 1081 (100.0%)
  Recall: 100.0% (new found 1081/1081 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x
PASSED
tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_ascore_comparison 
================================================================================
AScore Comparison Results (q-value < 0.01)
================================================================================

STRICT (AScore >= 20):
  New results: 919
  Reference results: 919
  Overlap: 919 (100.0%)
  Recall: 100.0% (new found 919/919 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

MODERATE (AScore >= 15):
  New results: 1023
  Reference results: 1023
  Overlap: 1023 (100.0%)
  Recall: 100.0% (new found 1023/1023 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

LENIENT (AScore >= 3):
  New results: 1076
  Reference results: 1076
  Overlap: 1076 (100.0%)
  Recall: 100.0% (new found 1076/1076 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x
PASSED
tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_phosphors_comparison 
================================================================================
PhosphoRS Comparison Results (q-value < 0.01)
================================================================================

STRICT (Site probability > 99%):
  New results: 983
  Reference results: 983
  Overlap: 983 (100.0%)
  Recall: 100.0% (new found 983/983 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

MODERATE (Site probability > 90%):
  New results: 1084
  Reference results: 1084
  Overlap: 1084 (100.0%)
  Recall: 100.0% (new found 1084/1084 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

LENIENT (Site probability > 75%):
  New results: 1102
  Reference results: 1102
  Overlap: 1102 (100.0%)
  Recall: 100.0% (new found 1102/1102 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x
PASSED

============================== 3 passed in 45.11s ==============================

timosachsenberg and others added 2 commits May 31, 2026 14:46
PhosphoRS's reported site probabilities saturate -- (1/P)/sum(1/P) pins
hundreds of sites at ~100% -- so a global decoy-AA FLR cannot rank genuine
sites above decoy mis-localizations, and the recovered-site count was a
tie-break artifact (~0-190 depending on ordering).

phosphoRS's own discrimination signal is the peptide-score delta
(-10*log10(P_random) between the best and the best alternative isoform -- the
rank1-rank2 gap it maximizes when choosing peak depth, Taus et al. 2011), which
lives in log space and does not saturate. It is a native phosphoRS quantity,
not a re-derived metric.

- Add site_deltas_from_isomers(): per-site best-with-minus-best-without of the
  peptide score, floored to stay finite if P_random underflowed.
- Emit it as position-keyed PhosphoRS_site_delta (both CLI paths; managed-meta
  hygiene), alongside the still-reported PhosphoRS_site_probs.
- Rank the PhosphoRS leg of the decoy-AA FLR on PhosphoRS_site_delta.
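
A minimal sketch of the idea, not the code in this commit: the function below is a hypothetical stand-in for site_deltas_from_isomers(), and the P_FLOOR value and the (sites, P_random) input shape are assumptions made for illustration. It shows why a log-space delta still separates two PSMs whose probabilities both round to 100%.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# Assumed floor so -10*log10(P) stays finite if P_random underflows to 0.
P_FLOOR = 1e-300

Isomer = Tuple[FrozenSet[int], float]  # (modified positions, P_random)


def peptide_score(p_random: float) -> float:
    """Peptide score in log space: -10*log10(P_random), floored."""
    return -10.0 * math.log10(max(p_random, P_FLOOR))


def isoform_probabilities(isomers: List[Isomer]) -> List[float]:
    """Classic (1/P)/sum(1/P) normalization, which saturates near 1."""
    inv = [1.0 / max(p, P_FLOOR) for _, p in isomers]
    total = sum(inv)
    return [x / total for x in inv]


def site_deltas(isomers: List[Isomer]) -> Dict[int, float]:
    """Per site: best score among isoforms carrying the site minus the
    best score among isoforms that do not carry it."""
    scored = [(sites, peptide_score(p)) for sites, p in isomers]
    all_sites = set()
    for sites, _ in scored:
        all_sites |= sites
    deltas = {}
    for site in sorted(all_sites):
        with_site = [s for sites, s in scored if site in sites]
        without = [s for sites, s in scored if site not in sites]
        if with_site and without:
            deltas[site] = max(with_site) - max(without)
    return deltas


# Two toy PSMs: both winners have a probability that prints as 100.00%,
# but their deltas differ by a factor of four.
psm_a = [(frozenset({3}), 1e-12), (frozenset({5}), 1e-6)]
psm_b = [(frozenset({3}), 1e-30), (frozenset({5}), 1e-6)]
for psm in (psm_a, psm_b):
    prob = isoform_probabilities(psm)[0]
    print(f"prob={100 * prob:.2f}%  delta={site_deltas(psm)[3]:.1f}")
```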

Effect: the ~100% tie block collapses 351 -> 1 and the 5%-FLR yield becomes
deterministic (151 across 40 tie orderings, previously 0-190). PhosphoRS still
recovers fewer sites than AScore, but now as a stable, meaningful result.

Adds a file-based 3-tool integration test (parse -> filter -> intersect ->
collapse -> Eq.2 -> threshold) plus a site_deltas_from_isomers unit test.
Full suite: 174 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…40)

Implements the phosphoRS peak-depth optimization (Taus et al. 2011, pseudocode
sections 9-12) that the implementation was missing: per 100 m/z window, choose
the peak depth (1..8) that maximizes the rank1-rank2 separation between the best
and second-best isoform (rank3/rank4 tie-breakers; best-absolute-score when no
site-determining ions are present; smaller depth on ties).

This replaces _reduce_by_delta_selection, which was dead code AND buggy: its
depth-selection ratio (pj/pj1, maximized) was inverted (it minimized separation)
and it used the experimental peak count as the binomial n. Since commit d64e2dd
("Use filtered spectrum directly") the active path scored against the
unoptimized filtered spectrum.

New, separately-tested helpers:
- _window_has_site_determining_ions   (section 10)
- _isoform_peptide_scores / _choose_window_depth  (sections 9 & 11)
- _reduce_by_peak_depth_optimization   (sections 8-12 orchestrator)
Gated by ENABLE_PEAK_DEPTH_OPTIMIZATION (default True).
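
A rough sketch of the depth choice described above, under stated assumptions: choose_depth and score_at_depth are hypothetical names, the per-depth isoform scores are supplied by the caller, and the rank3/rank4 tie-breakers and the no-site-determining-ions fallback are deliberately left out. It is not the _choose_window_depth helper itself.

```python
from typing import Callable, Sequence

MAX_DEPTH = 8  # depths 1..8, as stated in the commit message


def choose_depth(score_at_depth: Callable[[int], Sequence[float]]) -> int:
    """Return the depth whose best-vs-second-best isoform gap is largest.

    score_at_depth(d) is assumed to return one peptide score per isoform
    when the window keeps its d most intense peaks (at least two isoforms).
    """
    best_depth, best_gap = 1, float("-inf")
    for depth in range(1, MAX_DEPTH + 1):
        ranked = sorted(score_at_depth(depth), reverse=True)
        gap = ranked[0] - ranked[1]  # rank1 - rank2 separation
        # Strict '>' keeps the smaller depth when two depths tie.
        if gap > best_gap:
            best_depth, best_gap = depth, gap
    return best_depth


# Toy scores for two isoforms at each depth (illustrative numbers only).
toy = {1: [10, 9], 2: [20, 12], 3: [25, 17], 4: [30, 29],
       5: [30, 29], 6: [31, 30], 7: [31, 30], 8: [32, 31]}
print(choose_depth(lambda d: toy[d]))
```

Depths 2 and 3 both give a gap of 8 in the toy table, so the smaller depth is kept.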

Effect on the example data (q<=0.01, 5% global FLR): PhosphoRS yield 151 -> 163,
decoy-localization rate 4.3% -> 6.2%; phosphoRS ~3x slower (per-isoform
theoretical-spectrum generation). The comparison conclusion is unchanged --
PhosphoRS still recovers far fewer sites than AScore (163 vs 441), confirming
its lower reliability on this data is genuine, not an artifact of the missing
optimization.

+4 unit tests (depth selection, site-determining detection, tie-break, empties).
Full suite: 178 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
timosachsenberg and others added 2 commits May 31, 2026 15:40
…optimizer

Deletes the now-unused window-reduction code paths (none had any call site):
- _reduce_by_delta_selection  (buggy: inverted depth ratio, experimental peak
  count used as the binomial n) and _reduce_spectrum_by_windows;
- their exclusive helpers _site_determining_ions, _window_intensity_thresholds,
  _count_peaks_above_threshold, _get_intensity_thresholds;
- the now-orphaned MIN_DEPTH constant.

The faithful _reduce_by_peak_depth_optimization replaces them; _get_window_indexes
is kept (used by it). ~371 lines removed; behavior unchanged. Full suite: 178 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lity w (#40)

p = N*d/w (phosphoRS, Taus et al. 2011 section 13) where w is the FULL m/z range
of the MS/MS spectrum. The code used the m/z span of the *extracted* peak list
instead, which is narrower -> p too large -> scores slightly over-conservative.
Use the original spectrum's m/z range (N still = extracted peak count).

Before/after on the full dataset (data/1.mzML, 3697 PSMs): 81/1989 per-PSM
PhosphoRS_pep_score values change, but the 3-way decoy-AA FLR is unchanged
(PhosphoRS 163 sites @ 5% FLR, decoy rate 6.2%; LucXor 574 / AScore 441) -- the
deviation was real but immaterial to the comparison here. Kept for fidelity.
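
A worked illustration of the direction of the effect, with made-up numbers (N, d, the two m/z ranges, n and k are all assumptions, and the binomial tail is taken over n theoretical ions with k matches): a narrower w inflates p, which raises P_random and lowers the score.

```python
from math import comb, log10


def match_probability(n_peaks: int, d: float, w: float) -> float:
    """p = N*d/w: chance that a random peak falls within tolerance."""
    return n_peaks * d / w


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N, d = 40, 1.0                 # extracted peak count, tolerance term (assumed)
w_extracted = 1450.0 - 150.0   # m/z span of the extracted peak list (assumed)
w_full = 2000.0 - 100.0        # full m/z range of the spectrum (assumed)
n_theo, k_matched = 20, 8      # theoretical ions and matches (assumed)

for label, w in (("extracted span", w_extracted), ("full range", w_full)):
    p = match_probability(N, d, w)
    score = -10.0 * log10(binomial_tail(n_theo, k_matched, p))
    print(f"{label}: p={p:.4f} score={score:.1f}")
```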

Full suite: 178 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
timosachsenberg commented May 31, 2026

Realized 3-way comparison (the deliverable)

All on the example data (data/1.mzML, 3697 PSMs → 2470 shared after q ≤ 0.01 + identification-decoy removal → 1258 ambiguous PSMs analyzed), each tool ranked by its own native per-site confidence, scored on the same decoy-amino-acid FLR (Ramsbottom et al. 2022; 2·(T_c/X_c) normalization, T_c/X_c = 2.03).

The peak-depth optimizer (originally split into #42) is now part of this PR. Without it, PhosphoRS is ~151 sites @ 5% — the ranking and the conclusion are unchanged. All figures below are now reproducible: PhosphoRS is byte-identical across thread counts, and LucXor is seeded (--seed, default 42; the reference idXML was regenerated under the seed).

Sites recovered at matched global FLR

Correctly-localized target sites (decoy-A wins in parentheses):

| method | @1% FLR | @5% FLR | @10% FLR | total sites / decoys (rate) |
|---|---|---|---|---|
| LucXor | 139 | 591 | 598 | 607 / 9 (1.5%) |
| AScore | 23 | 436 | 521 | 611 / 28 (4.6%) |
| PhosphoRS | 90 | 161 | 321 | 665 / 41 (6.2%) |

LucXor recovers the most sites at every threshold and has the lowest decoy-localization rate. AScore is second at 5% and 10% FLR; PhosphoRS is ahead of AScore only at 1% FLR (90 vs 23) and trails it at the looser thresholds.

Inter-method agreement (same 1258 analyzed PSMs)

| comparison | agreement |
|---|---|
| All three agree on the site(s) | 77% |
| AScore = LucXor | 89% |
| PhosphoRS = LucXor | 83% |
| AScore = PhosphoRS | 80% |

The ~23% of ambiguous PSMs where they disagree are the hard cases the FLR adjudicates.

Why the curves differ

| | AScore | PhosphoRS | LucXor |
|---|---|---|---|
| scoring basis | binomial over site-determining ions | binomial over all matched ions + dynamic peak-depth opt | KDE of delta-scores vs decoy-peptide permutations |
| native rank key | AScore_N | PhosphoRS_site_delta | Luciphor_site_scores |
| decoy rate | 4.6% | 6.2% | 1.5% |

Takeaways

  • This is the intended "same PSM-FDR + same FLR" comparison: identical PSM set, identical FLR definition, each ranked by its own confidence.
  • LucXor is most reliable on this data, AScore second, PhosphoRS least. PhosphoRS's lower yield is genuine — it survived the double-counting fix, delta-ranking, the faithful peak-depth optimizer, and the full-range-w fix; ~88% of its decoy wins are on symmetric A-vs-S/T ground.
  • Caveats: (1) single dataset (data/1.mzML); the 96-file PXD000138 benchmark isn't available here, so the ranking could shift on other data — but the pipeline now produces it correctly and reproducibly. (2) The test_algorithm_comparison.py recall numbers guard determinism/stability against a self-baselined reference, not external correctness; correctness rests on the hand-computed unit tests and on faithfulness to Ramsbottom 2022 Eq. 2.

Reproduce (incl. the curve plot)

```python
from onsite.decoy_flr import compute_decoy_flr, sites_at_flr

# ASCORE_IDXML, PHOSPHORS_IDXML, LUCXOR_IDXML: paths to the three tools' idXML outputs
res = compute_decoy_flr({"ascore": ASCORE_IDXML, "phosphors": PHOSPHORS_IDXML,
                         "lucxor": LUCXOR_IDXML}, q_threshold=0.01, flr_threshold=0.05)
for tool, r in res.items():
    for thr in (0.01, 0.05, 0.10):
        print(tool, thr, sites_at_flr(r.curve, thr))   # (total, target, decoy)
# plot: r.curve["qval"] (x = global FLR) vs r.curve["cum_target"] (y = sites)
```

The LucXor idXML is produced with the default seed (--seed 42), so these numbers reproduce exactly. A sites-recovered vs global-FLR curve plot for the three methods was generated from r.curve (qval vs cum_target).
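
For readers who want to see how a curve of this kind is shaped, here is a simplified sketch of a decoy-amino-acid FLR estimator. It is not the code in onsite/decoy_flr.py: the function names are hypothetical, and it assumes the decoy term in Eq. 2 is the running count of decoy-A localizations among the top-n ranked sites, followed by the cap at 1 and the q-value-style monotonization mentioned in the PR summary.

```python
from typing import List, Sequence


def decoy_flr_qvalues(ranked_is_decoy: Sequence[bool], tx_ratio: float) -> List[float]:
    """ranked_is_decoy: sites sorted best-first, True where a decoy-A won.
    tx_ratio: T_c/X_c, the target-to-decoy candidate ratio."""
    flr, decoys = [], 0
    for n, is_decoy in enumerate(ranked_is_decoy, start=1):
        decoys += int(is_decoy)
        # Assumed reading of Eq. 2: 2 * (T_c/X_c) * decoys / n, capped at 1.
        flr.append(min(1.0, 2.0 * tx_ratio * decoys / n))
    # Monotonize like a q-value: each entry is the minimum FLR reachable
    # at this rank or any deeper rank.
    qvals = flr[:]
    for i in range(len(qvals) - 2, -1, -1):
        qvals[i] = min(qvals[i], qvals[i + 1])
    return qvals


def targets_at(qvals: Sequence[float], ranked_is_decoy: Sequence[bool],
               threshold: float) -> int:
    """Number of target sites in the deepest prefix with q-value <= threshold."""
    best = targets = 0
    for q, is_decoy in zip(qvals, ranked_is_decoy):
        targets += int(not is_decoy)
        if q <= threshold:
            best = targets
    return best


# Toy ranking: 9 targets, one decoy win, then 10 more targets.
ranking = [False] * 9 + [True] + [False] * 10
q = decoy_flr_qvalues(ranking, tx_ratio=2.0)
print(targets_at(q, ranking, 0.05), targets_at(q, ranking, 0.25))
```

In the toy ranking a single decoy win at rank 10 limits the 5% yield to the 9 sites above it, while a looser threshold recovers all 19 targets.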

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
onsite/phosphors/cli.py (1)

241-260: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear PhosphoRS metadata on failed threaded hits.

When hit_res["status"] != "success", this branch clones hit_src and only sets score = -1.0. On a rerun over an already-scored idXML, old PhosphoRS_site_probs / PhosphoRS_site_delta survive because the cleanup at Lines 254-262 is skipped, and save_identifications() only backfills missing fields. That can leak stale site deltas into downstream FLR ranking.

Proposed fix
                     for hit_src, hit_res in zip(pid_src.getHits(), res["hits"]):
                         if hit_res.get("status") != "success":
-                            # Preserve original hit with -1 score when failed
                             failed_hit = PeptideHit(hit_src)
+                            for k in [
+                                "search_engine_sequence",
+                                "regular_phospho_count",
+                                "phospho_decoy_count",
+                                "PhosphoRS_pep_score",
+                                "PhosphoRS_site_probs",
+                                "PhosphoRS_site_delta",
+                                "SpecEValue_score",
+                            ]:
+                                if failed_hit.metaValueExists(k):
+                                    try:
+                                        failed_hit.removeMetaValue(k)
+                                    except Exception:
+                                        pass
                             failed_hit.setScore(-1.0)
+                            failed_hit.setMetaValue("PhosphoRS_pep_score", -1.0)
+                            failed_hit.setMetaValue("PhosphoRS_site_probs", "{}")
+                            failed_hit.setMetaValue("PhosphoRS_site_delta", "{}")
                             new_hits.append(failed_hit)
                             continue
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@onsite/phosphors/cli.py` around lines 241 - 260, The failed-hit branch clones
hit_src into failed_hit and only sets score = -1.0, but it must also clear the
same managed PhosphoRS metadata as the success branch to avoid leaking stale
fields; update the branch handling hit_res.get("status") != "success" to
remove/clear keys like "search_engine_sequence", "regular_phospho_count",
"phospho_decoy_count", "PhosphoRS_pep_score", "PhosphoRS_site_probs", and
"PhosphoRS_site_delta" from the PeptideHit (failed_hit) before appending it,
mirroring the cleanup logic used for new_hit so downstream
save_identifications()/FLR ranking cannot see old PhosphoRS values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@onsite/phosphors/phosphors.py`:
- Around line 853-871: The loop skips a peak exactly equal to max_mz because
windows use cur <= mz < hi and the loop ends at cur < max_mz; to fix, iterate
windows up to and including max_mz and include the upper-bound peak in the last
window: change the loop to compute hi = min(cur + WINDOW_SIZE, max_mz) and use
while cur <= max_mz, and when building theo_in_win use an inclusive upper bound
(cur <= mz <= hi) for selecting mz values (or otherwise ensure
_get_window_indexes treats the final hi as inclusive) so a peak at max_mz is
considered in the final window; update references in this block (cur, hi,
max_mz, WINDOW_SIZE, _get_window_indexes, isoform_theo,
_window_has_site_determining_ions) accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3e903b9c-d6f8-4aa2-9ee7-9338b0624926

📥 Commits

Reviewing files that changed from the base of the PR and between c8de981 and c01eb28.

📒 Files selected for processing (5)
  • onsite/decoy_flr.py
  • onsite/phosphors/cli.py
  • onsite/phosphors/phosphors.py
  • tests/test_decoy_flr.py
  • tests/test_phosphors.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/test_phosphors.py
  • onsite/decoy_flr.py

Comment thread onsite/phosphors/phosphors.py
… idXMLs (#40)

filter_phosphors compared percentage site probabilities (0-100) against a
decimal threshold (prob_threshold/100 -> 0.99/0.90/0.75), so the STRICT/MODERATE/
LENIENT tiers all thresholded at <1% and selected the same set (identical
counts). Compare in percent directly. The three tiers now stratify, e.g.
PhosphoRS 986 (>99%) / 1105 (>90%) / 1122 (>75%).
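
The unit mismatch is easy to reproduce in isolation. The sketch below uses hypothetical function names and toy probabilities, not the test's filter_phosphors; it only shows why dividing the threshold by 100 while the probabilities stay in percent makes all three tiers select the same set.

```python
# Site probabilities on a 0-100 scale (toy values).
probs_percent = [99.7, 95.0, 80.0, 40.0, 0.5]
tiers_percent = {"STRICT": 99.0, "MODERATE": 90.0, "LENIENT": 75.0}


def count_buggy(probs, threshold_percent):
    # Bug: threshold converted to a fraction, probabilities left in percent.
    return sum(p > threshold_percent / 100.0 for p in probs)


def count_fixed(probs, threshold_percent):
    # Fix: compare percent against percent.
    return sum(p > threshold_percent for p in probs)


for tier, thr in tiers_percent.items():
    print(tier, count_buggy(probs_percent, thr), count_fixed(probs_percent, thr))
```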

Regenerate the reference outputs (data/1_{ascore,phosphors,lucxor}_result.idXML)
from the current, fixed algorithms using the exact CLI invocations the test
uses. The old references predated this PR's scoring fixes (delta-ranking,
double-counting, peak-depth optimization, full-range w), so the regression
baseline was guarding pre-fix output. New==reference now -> recall ~100%.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timosachsenberg timosachsenberg force-pushed the fix/decoy-flr-scoring-bugs branch from 4553f51 to 13b4068 Compare May 31, 2026 14:25
timosachsenberg and others added 2 commits May 31, 2026 16:48
1) Peak-depth optimizer dropped a peak sitting exactly on the final window
   boundary at max_mz (windows are [cur, hi) with hi exclusive and the loop ends
   at cur < max_mz). Extend the final window's upper bound past max_mz so that
   peak is included; the loop still advances by WINDOW_SIZE so it terminates.
   (The reviewer's literal suggestion, while cur <= max_mz with hi=min(...,max_mz),
   would infinite-loop once hi==max_mz.) Edge case only when max_mz-min_mz is an
   exact multiple of WINDOW_SIZE; unreachable for real m/z but a genuine off-by-one.

2) The parallel PhosphoRS path's failed-hit branch cloned the original hit and
   set score -1.0 but did not clear the managed PhosphoRS metadata the success
   branch clears, so stale fields could leak if the input already carried prior
   PhosphoRS results. Mirror the success-branch cleanup on failed_hit.
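
The off-by-one in item 1 can be illustrated with a stripped-down window walk. This is a sketch with a hypothetical function name; it makes the final window's upper bound unbounded, which is one way to "extend past max_mz" and not necessarily how the commit does it.

```python
from typing import List, Sequence


def peaks_covered(peaks: Sequence[float], window_size: float,
                  extend_last: bool) -> List[float]:
    """Walk half-open windows [cur, hi) from min to max m/z and collect
    every peak that lands in some window."""
    lo, top = min(peaks), max(peaks)
    covered: List[float] = []
    cur = lo
    while cur < top:
        hi = cur + window_size
        if extend_last and hi >= top:
            hi = float("inf")  # final window also takes the peak at max m/z
        covered.extend(mz for mz in peaks if cur <= mz < hi)
        cur += window_size  # always advances, so the loop terminates
    return covered


# max - min is an exact multiple of the window size, the edge case above.
peaks = [100.0, 150.0, 250.0, 300.0]
print(peaks_covered(peaks, 100.0, extend_last=False))
print(peaks_covered(peaks, 100.0, extend_last=True))
```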

Both are behavior-neutral on data/1.mzML (threads=1: 616 sites / 120 @5% FLR;
threads=4: 665 / 161 -- unchanged), and full suite (178) passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The threads=1 path computed the best-scoring isomer but never wrote it
back to the hit, so it kept the original search-engine localization,
while the threaded path (and AScore/LucXor) rewrite to the best isomer.
This diverged the two thread counts on 496 PSMs and, through the
decoy-AA FLR (which reads the localized site from the winning sequence),
changed the measured yield (616 -> 665 sites at q<=0.01; @5% FLR 120 -> 161).

Mirror the worker apply-back: setSequence(best_isomer) in
process_peptide_identification. threads=1 now equals threads=4
(differing winning sequences 496 -> 0) and reproduces the reported
numbers. Regenerate data/1_phosphors_result.idXML (the test reference,
built at threads=1, had captured the pre-fix localizations).

Full suite: 178 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_algorithm_comparison.py (1)

161-185: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Count PhosphoDecoy sites explicitly in the top-N check.

Line 174 undercounts decoy-localized peptides: "(Phospho)" is not a substring of "(PhosphoDecoy)". That means this filter can pass a hit after checking fewer than the real number of localized sites, which weakens the regression guard for the unified decoy-FLR path.

Suggested fix
-            # Count phosphorylations (both regular and decoy)
-            # OpenMS format: "S(Phospho)", "T(Phospho)", "Y(Phospho)", "A(PhosphoDecoy)"
-            phospho_count = sequence.count('(Phospho)')  # Matches both Phospho and PhosphoDecoy
+            # Count both regular and decoy localization sites.
+            phospho_count = (
+                sequence.count("(Phospho)")
+                + sequence.count("(PhosphoDecoy)")
+            )
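
A quick standalone check of the substring claim, using a made-up sequence string. It also shows the single-regex variant mentioned in the agent prompt; the pattern is one possible choice, not taken from the repository.

```python
import re

# Illustrative OpenMS-style sequence with one decoy and two regular sites.
sequence = "A(PhosphoDecoy)VS(Phospho)PEPT(Phospho)IDE"

naive = sequence.count("(Phospho)")  # misses the decoy site
summed = sequence.count("(Phospho)") + sequence.count("(PhosphoDecoy)")
by_regex = len(re.findall(r"\(Phospho(?:Decoy)?\)", sequence))

print(naive, summed, by_regex)
```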
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_algorithm_comparison.py` around lines 161 - 185, The current
phospho site count logic in the peptide loop under
tests/test_algorithm_comparison.py uses sequence.count('(Phospho)') which misses
'(PhosphoDecoy)' occurrences, undercounting phospho_count used by the top-N
probability check; update the counting in the block that computes phospho_count
(where sequence is defined and used) to explicitly count both '(Phospho)' and
'(PhosphoDecoy)' (e.g., sum of sequence.count('(Phospho)') and
sequence.count('(PhosphoDecoy)') or a single regex that matches both) so
phospho_count reflects the true number of localized sites before the
sorted_probs[:phospho_count] threshold check in that loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@onsite/phosphors/cli.py`:
- Around line 669-673: After calling
new_hit.setSequence(AASequence.fromString(new_sequence)) you must remove or
regenerate any existing ProForma annotation so metadata no longer describes the
old localization: clear or rebuild the ProForma field on new_hit (do not leave
the original ProForma in meta_fields), and apply the same change inside
_worker_process_pid_threaded so the threaded path does not re-append the
original ProForma after relocalization; ensure new_hit ends up with a ProForma
consistent with the new sequence (or no ProForma) before saving.
- Around line 243-262: In process_peptide_identification(), the single-thread
(threads=1) return paths that currently return PeptideHit(hit) can leak stale
PhosphoRS metadata; mirror the cleanup done for the threaded failed_hit by
creating a copy (e.g., failed_hit = PeptideHit(hit)), setScore(-1.0) when
appropriate, and remove the same metadata keys
("search_engine_sequence","regular_phospho_count","phospho_decoy_count","PhosphoRS_pep_score","PhosphoRS_site_probs","PhosphoRS_site_delta","SpecEValue_score","ProForma")
using metaValueExists/removeMetaValue wrapped in the same try/except, then
return that cleaned hit instead of the raw PeptideHit(hit).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d0ad8389-d850-433c-bf9f-d3f0f7c26e5d

📥 Commits

Reviewing files that changed from the base of the PR and between c01eb28 and 5f2b36b.

📒 Files selected for processing (6)
  • data/1_ascore_result.idXML
  • data/1_lucxor_result.idXML
  • data/1_phosphors_result.idXML
  • onsite/phosphors/cli.py
  • onsite/phosphors/phosphors.py
  • tests/test_algorithm_comparison.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • onsite/phosphors/phosphors.py

Comment thread onsite/phosphors/cli.py Outdated
Comment thread onsite/phosphors/cli.py
timosachsenberg and others added 2 commits May 31, 2026 17:35
The serial path stamped search_engine_sequence and
PhosphoRS_pep_score=-1.0 onto the 1708 non-phospho hits, while the
threaded worker dropped both from its failed-hit branch. Same
localization and FLR, but the files differed.

Add a shared make_unscored_hit() helper (original hit, score -1, all
managed PhosphoRS metadata removed) and route every skip branch through
it: the worker's failed-hit branch and both serial branches (no phospho
site, scoring returned no result). The phospho-scored path is unchanged.

threads=1 now byte-for-byte equals threads>1 (identical MD5); FLR
unchanged (665 sites / @5% FLR 161). Reference regenerated.

Full suite: 178 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…istic (#40)

PhosphoRS — the threaded worker scored every hit unconditionally while the
serial path gated on an explicit (Phospho)/(PhosphoDecoy) string, so the worker
could fall through to the scorer's mass-based modification inference (abs_tol
0.1 Da around the phospho mass) and (mis)score a non-phospho modification that
is near-isobaric with phospho (e.g. Sulfation, +79.9568, 0.0095 Da away) which
the serial path skips — diverging threads=1 vs threads>1 output. Both paths now
gate on a shared _has_localizable_phospho() before the scorer. threads=1 ==
threads>1 byte-identical is preserved and data/1 output is unchanged (still
byte-matches the committed reference); the fix only removes the latent divergence.
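
To make the near-isobaric case concrete, the sketch below contrasts a mass-window check with an explicit name check. The function names and the example sequences are hypothetical; the masses and the 0.1 Da tolerance are the ones quoted above, rounded to four decimals.

```python
PHOSPHO_MASS = 79.9663    # phospho mass shift, Da
SULFATION_MASS = 79.9568  # sulfation mass shift quoted in this commit, Da
ABS_TOL = 0.1             # tolerance of the mass-based inference, Da


def looks_like_phospho_by_mass(delta_mass: float) -> bool:
    """Mass-window inference: accepts anything within ABS_TOL of phospho."""
    return abs(delta_mass - PHOSPHO_MASS) <= ABS_TOL


def has_localizable_phospho(sequence: str) -> bool:
    """Explicit gate: only a named Phospho or PhosphoDecoy site qualifies."""
    return "(Phospho)" in sequence or "(PhosphoDecoy)" in sequence


print(round(PHOSPHO_MASS - SULFATION_MASS, 4))     # gap between the two masses
print(looks_like_phospho_by_mass(SULFATION_MASS))  # mass check accepts sulfation
print(has_localizable_phospho("PEPS(Sulfo)TIDE"))  # name check rejects it
```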

LucXor — the CLI never seeded the RNG, so the unseeded random.shuffle of decoy
permutations (psm.py) and np.random.choice model subsampling (models.py) made
output run-to-run non-deterministic. Add --seed (default 42); PyLuciPHOr2.run()
seeds random and numpy at entry (both the standalone CLI and the `onsite all`
path funnel through run()). Two default runs are now byte-identical; determinism
holds for the default single-threaded run (--threads>1 shares the global RNG).

Regenerate data/1_lucxor_result.idXML under the seed via the exact invocation
the comparison test uses, so the reference is itself reproducible. All three
algorithm-comparison tests pass.
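
The seeding pattern can be sketched with the standard library alone. The run function below is a hypothetical stand-in for PyLuciPHOr2.run(), and the shuffled list stands in for the decoy permutations; the NumPy generator would be seeded in the same place (for example with np.random.seed), which is omitted here to keep the sketch dependency-free.

```python
import random
from typing import List


def run(seed: int = 42) -> List[int]:
    """Seed the global RNG once at entry, then do the random work."""
    random.seed(seed)
    permutations = list(range(10))  # stand-in for decoy permutations
    random.shuffle(permutations)    # now reproducible for a given seed
    return permutations


first, second = run(), run()
print(first == second)
```

As the commit notes, this only guarantees reproducibility while a single thread consumes the global RNG.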

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
onsite/lucxor/cli.py (1)

286-290: 💤 Low value

Unreachable return statement.

After the early return at lines 240-253 when compute_all_scores=True, the code here can only execute when compute_all_scores=False. The condition on line 288 is therefore always True, sys.exit(exit_code) always runs, and line 290 is unreachable dead code.

🔧 Suggested simplification
-        # Only call sys.exit if not being called from compute_all_scores
-        if not compute_all_scores:
-            sys.exit(exit_code)
-        return exit_code
+        sys.exit(exit_code)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@onsite/lucxor/cli.py` around lines 286 - 290, The conditional around sys.exit
is redundant and leaves the final "return exit_code" unreachable; update the
tail of the function that uses the compute_all_scores flag so it either (A)
explicitly returns exit_code when compute_all_scores is True and calls
sys.exit(exit_code) only when compute_all_scores is False (i.e., convert the
current if-block to an if/else with sys.exit in the False branch and return
exit_code in the True branch), or (B) remove the unconditional sys.exit and
always return exit_code, ensuring references to compute_all_scores, sys.exit,
and return exit_code are adjusted accordingly to eliminate dead code.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6b8bb848-cc8f-4a82-abe9-9fe6c61e6dbb

📥 Commits

Reviewing files that changed from the base of the PR and between 5f2b36b and b1f69b2.

📒 Files selected for processing (4)
  • data/1_lucxor_result.idXML
  • data/1_phosphors_result.idXML
  • onsite/lucxor/cli.py
  • onsite/phosphors/cli.py

… --seed (#40)

The README listed only the basic per-PSM output metrics. Add what was missing:
- the per-site score dicts (AScore_site_scores, PhosphoRS_site_probs/_site_delta,
  Luciphor_site_scores), score directions, and per-tool thresholds (AScore >= 13,
  prob >= 75%, local FLR <= 0.05);
- a new "Interpreting the output: PSM-FDR vs localization FLR" section explaining
  the two orthogonal error axes (input PSM-FDR is preserved; localization adds an
  FLR), with a consolidated tool -> score -> cutoff table;
- the unified decoy-amino-acid global FLR workflow (--add-decoys + python -m
  onsite.decoy_flr), which was undocumented;
- the new LucXor --seed option.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@README.md`:
- Line 239: The description for PhosphoRS_site_delta currently uses a hyphenated
phrase "best-alternative isoform"; update that text to read "best alternative
isoform" (no hyphen) in the PhosphoRS_site_delta definition so it becomes:
PhosphoRS_site_delta: {position: Δ} — the −10·log10 P gap between the best and
best alternative isoform (rank1 − rank2). Used to rank a global FLR because,
unlike the probability, it does not saturate at 100%.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0dd682be-af34-4550-9506-836b542566f7

📥 Commits

Reviewing files that changed from the base of the PR and between b1f69b2 and b47bce6.

📒 Files selected for processing (1)
  • README.md

Comment thread README.md
- Isomer details with sequence and score
- Detailed confidence metrics
- `PhosphoRS_site_probs`: `{position: probability}` on a **0–100% scale** (**higher = more confident**) — the classic phosphoRS site probability.
- `PhosphoRS_site_delta`: `{position: Δ}` — the `−10·log10 P` gap between the best and best-alternative isoform (rank1 − rank2). Used to rank a global FLR because, unlike the probability, it does not saturate at 100%.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use “best alternative” (no hyphen) for clarity.

“best-alternative” reads like a typo here; “best alternative isoform” is clearer and avoids grammar-tool warnings.

🧰 Tools
🪛 LanguageTool

[grammar] ~239-~239: Ensure spelling is correct
Context: ... the best and best-alternative isoform (rank1 − rank2). Used to rank a global FLR bec...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~239-~239: Ensure spelling is correct
Context: ...t and best-alternative isoform (rank1 − rank2). Used to rank a global FLR because, un...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
