Unified decoy-AA global FLR across AScore / PhosphoRS / LucXor + scoring fixes (#40) by timosachsenberg · Pull Request #41 · bigbio/onsite

timosachsenberg · 2026-05-31T12:25:43Z

Summary

Implements a unified decoy-amino-acid global FLR so AScore, PhosphoRS and LucXor can be compared at the same PSM-FDR and the same FLR level, fixes the per-algorithm scoring bugs that made it unmeasurable / biased / unrankable, and brings the PhosphoRS implementation up to faithful phosphoRS (Taus et al. 2011). Implements the decoy-AA method of Ramsbottom et al. 2022 (J. Proteome Res., 10.1021/acs.jproteome.1c00827) — the only FLR definition common to all three tools.

Addresses #40. (Now also includes the peak-depth-optimization work originally split into #42, which was fast-forwarded into this branch.)

What's in here (11 commits)

AScore — stop dropping the decoy on multi-phospho peptides. getSites_ appended decoy-A positions after S/T/Y without re-sorting, so combinations() emitted descending index combos that createTheoreticalSpectra_ silently dropped. Sites are now sorted; each permutation is sorted before spectrum build.
LucXor — score and serialize the decoy. The production scoring path didn't recognize the lowercase a encoding a PhosphoDecoy(A) site, so A-localizations were scored as the unmodified backbone and couldn't be written. _get_mod_positions_from_perm, _get_mod_map and convert_sequence_to_standard_format now treat a as a real phospho-mass-bearing decoy site. Adds PSM.get_site_scores() (per-site delta from the permutation scores) as Luciphor_site_scores, giving LucXor a per-site confidence.
AScore/PhosphoRS plumbing + the FLR module. AScore emits position-keyed AScore_site_scores; PhosphoRS --add-decoys crash fixed (str(AASequence) → .toString() on pyopenms 3.5). New onsite/decoy_flr.py (python -m onsite.decoy_flr): pure estimator (Eq. 2 2·(T_c/X_c)·Σ(p_X_c)/n, capped at 1, q-value-style monotonization); mandatory ident-decoy + q-value filtering and a spectrum_reference intersection across tools; T_c/X_c over the analyzed set; site collapsing; unambiguous-peptide exclusion.
onsitec — merge by spectrum_reference, not list position. onsite all zipped results by index; a single dropped PSM fused scores from different peptides. Now matched by spectrum reference with a backbone-equality guard and drop reporting. (Also fixes a latent store() type bug.)
PhosphoRS — stop double-counting ions. Per-isomer matching reused experimental peaks and counted near-duplicate theoretical ions, inflating the binomial k/n and asymmetrically favoring decoys. New _count_matched_ions() merges indistinguishable ions and consumes each experimental peak once (decoy rate 5.4% → 4.3%).
PhosphoRS — rank the FLR on its native peptide-score delta. The reported site probabilities saturate at ~100% (no ranking resolution), so the decoy-AA yield was a tie-break artifact. New site_deltas_from_isomers() emits position-keyed PhosphoRS_site_delta (−10·log10 P gap between best and best-alternative isoform — phosphoRS's own rank1−rank2 signal), and the FLR ranks on it (probability still emitted for reporting). Tie block 351 → 1; yield becomes deterministic.
PhosphoRS — faithful dynamic per-window peak-depth optimization (§9–§12). The algorithm's defining step was missing — scoring ran against the plain filtered spectrum. New _choose_window_depth / _window_has_site_determining_ions / _reduce_by_peak_depth_optimization: per 100 m/z window, pick the depth maximizing rank1−rank2 isoform separation (best-absolute-score when no site-determining ions; smaller depth on ties). Replaces and removes the prior dead, buggy _reduce_by_delta_selection (inverted ratio, experimental peak count used as the binomial n) and _reduce_spectrum_by_windows.
PhosphoRS — full MS/MS m/z range for the random-match probability. p = N·d/w (§13) now uses w = full spectrum m/z range, not the extracted-peak span (N = extracted peaks). Paper-faithful; symmetric.
Regression test — fix the PhosphoRS tier unit bug + re-baseline references. filter_phosphors in test_algorithm_comparison.py compared percentage site probabilities (0–100) against a decimal threshold (/100 → 0.99/0.90/0.75), so the STRICT/MODERATE/LENIENT tiers all thresholded at <1% and collapsed to one set; now compared in percent so they stratify (PhosphoRS 986 / 1105 / 1122). The three reference idXMLs (data/1_{ascore,phosphors,lucxor}_result.idXML) are regenerated from this PR's fixed algorithms via the exact CLI invocations the test uses — they predated the scoring fixes, so the regression baseline had been guarding pre-fix output (new == reference now → ~100% recall).
PhosphoRS — re-localize in the serial (threads=1) path. The threads=1 path computed the best-scoring isomer but never wrote it back to the hit, so it kept the original search-engine localization while the threaded path (and AScore/LucXor) rewrote to the best isomer. This diverged the two thread counts on 496 PSMs and — because the decoy-AA FLR reads the localized site from the winning sequence — changed the measured yield (616 → 665 sites at q ≤ 0.01; @5% FLR 120 → 161). process_peptide_identification now mirrors the worker apply-back (setSequence(best_isomer)); threads=1 reproduces the reported numbers. Reference regenerated.
PhosphoRS — byte-identical output across thread counts. The serial path also stamped search_engine_sequence and PhosphoRS_pep_score=-1.0 onto the 1708 non-phospho hits while the threaded worker dropped both — equal localization and FLR, but different files. A shared make_unscored_hit() helper (original hit, score −1, all managed PhosphoRS metadata removed) now backs every skip branch in both paths. threads=1 and threads>1 are now byte-for-byte identical (identical MD5); FLR unchanged.

Realized 3-way comparison (`data/1.mzML`, 3697 PSMs → 2470 shared at q ≤ 0.01 → 1258 ambiguous)

Each tool ranked by its own native per-site confidence, scored on the same decoy-AA FLR (T_c/X_c = 2.03). Target sites recovered (decoy-A wins in parens). PhosphoRS figures are now thread-invariant (commits 10–11):

method	@1% FLR	@5% FLR	@10% FLR	total sites / decoy rate
LucXor	141	567 (7)	597 (10)	607 / 1.6%
AScore	23	436 (5)	521 (13)	611 / 4.6%
PhosphoRS	90	161 (2)	321 (8)	665 / 6.2%

Agreement on the winning localization across all three (1982 ambiguous PSMs): 65% all-agree; pairwise AScore=LucXor 78%, PhosphoRS=LucXor 75%, AScore=PhosphoRS 73%.

LucXor is most reliable here, AScore second, PhosphoRS least. PhosphoRS's lower yield is genuine — it survived the double-counting fix, delta-ranking, the faithful peak-depth optimizer, and the full-range-w fix; ~88% of its decoy wins are on symmetric A-vs-S/T ground (the decoy is scored identically to a real site), and ~half are driven by non-site-determining phospho neutral-loss ions, which is faithful phosphoRS behavior on these HCD spectra. Caveat: single dataset; the 96-file PXD000138 benchmark isn't available here, so the ranking could shift on other data.

Tests

178 pass (full suite), including a file-based 3-tool FLR integration test and unit tests for: AScore site-sort/decoy-drop, LucXor a-handling + get_site_scores, the FLR estimator/parser/orchestration, the onsitec spectrum-reference join, PhosphoRS ion-matching, the peptide-score delta, and the peak-depth optimizer (separation maximization, site-determining detection, tie/empty handling). PhosphoRS output is additionally verified thread-invariant (threads=1 == threads>1, identical MD5 on data/1.mzML).

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Unified decoy-amino-acid FLR curve computation across AScore, PhosphoRS, and LucXor.
- Tools now emit position-keyed per‑site localization confidence metadata.
- LucXor CLI adds --seed for reproducible runs.
- PhosphoRS exposes per-isomer site deltas for downstream reporting.
Bug Fixes
- Preserve decoy modifications and enforce sorted site/permutation ordering during spectrum generation.
- Merge PSMs by spectrum reference to avoid misalignment.
- Prevent double-counting in ion matching and improve peak‑depth selection.
Tests
- Extensive unit and integration tests covering FLR, site scoring, merging, phosphors, and regressions.
Documentation
- README updated with seed option, clearer metric descriptions, and unified FLR workflow.

…site scores (#40) Prerequisites for a unified decoy-amino-acid global FLR across AScore, PhosphoRS and LucXor (#40). The PhosphoDecoy(A) flag existed but two algorithms scored the decoy A incorrectly, biasing the FLR low. AScore (ascore.py): - getSites_ appended decoy A positions after S/T/Y without re-sorting, so combinations() emitted descending index combos (e.g. [4,1]) that createTheoreticalSpectra_ silently dropped (it assumes ascending order). Sort candidate sites, and defensively sort each permutation before building its spectrum, so decoy A is no longer dropped on multi-phospho peptides. LucXor (psm.py): - The production scoring path did not recognize the lowercase 'a' that encodes a PhosphoDecoy(A) site, so an A-localization was scored as the unmodified backbone and could not be serialized. Teach _get_mod_positions_from_perm, _get_mod_map and convert_sequence_to_standard_format to treat 'a' as a real phospho-mass-bearing PhosphoDecoy site (is_decoy stays False; full +79.966 mass; serializes to A(PhosphoDecoy)). End-to-end this surfaces 145 A-wins that were previously reverted to the input sequence. LucXor per-site scoring (psm.py, cli.py): - Add PSM.get_site_scores(): a per-site localization confidence derived from the already-computed real-permutation scores (max-with minus max-without delta), with no new spectral scoring. Emit it as the Luciphor_site_scores meta value in both output paths. This gives LucXor a per-site score (it is natively per-PSM only) so a site-level decoy-AA FLR can rank sites uniformly across all three tools. Tests: +10 (AScore sort/drop regressions; LucXor 'a' scoring/serialization; get_site_scores aggregation). Full LucXor pipeline suite unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… tools (#40) Adds the site-level decoy-amino-acid FLR that lets AScore, PhosphoRS and LucXor be compared at the same PSM-FDR and the same FLR level (#40), implementing Ramsbottom et al. 2022 (J. Proteome Res., 10.1021/acs.jproteome.1c00827). onsite/decoy_flr.py (new) + `python -m onsite.decoy_flr`: - Pure, separately-tested estimator flr_curve(): Eq. 2 pX_FLR_n = 2*(T_c/X_c)*cum_decoy/n, capped at 1.0, q-value-style monotonization (reverse cummin), and sites_at_flr() for the yield at a cutoff. - Mandatory "same PSM-FDR" filtering: drop identification decoys and apply the q-value threshold, then restrict to the spectrum_reference intersection shared by all tools (reporting per-tool drops). - T_c/X_c computed over exactly the analyzed set; unambiguous peptides excluded. - Redundant (peptide, position) sites collapsed (max score, PSM-count tiebreak). - Uniform position-keyed per-site scores across tools (AScore_site_scores / PhosphoRS_site_probs / Luciphor_site_scores). AScore (ascore.py, cli.py): - Emit position-keyed AScore_site_scores (from the existing site2score), preserved through both CLI paths, so AScore matches the other tools' format. PhosphoRS (phosphors.py): - Fix str(AASequence) -> .toString(): under pyopenms 3.5 str() returns the object repr, so the localized sequence was written as "AASequence(sequence=...)" and --add-decoys crashed on save. Fixes the two output sites and one unused site. Validated end-to-end with --add-decoys on the example data: on the shared 2470-PSM set at 5% global FLR, LucXor recovers 574 sites, AScore 441, PhosphoRS 181; decoys concentrate at low scores and the position join is correct for all three (>=298/300). Full suite: 168 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… position (#40) `onsite all` fused the AScore / PhosphoRS / LucXor outputs by zipping the three result lists by index. The tools drop/reorder PSMs independently (LucXor keepNBestHits / spectrum-miss / min-PSM abort; AScore error-drops), so a single dropped PSM shifted every later index and silently fused scores from different peptides into one hit; only a non-fatal length warning guarded it. - Add _join_psms_by_ref(): match PeptideIdentifications across tools by spectrum_reference (in LucXor order), verify the unmodified backbone is identical across all three before fusing, and report per-tool exclusions instead of silently truncating. - Rewrite the merge loop to iterate the keyed triples. - Fix the store call: merged_pep_ids was a plain list, which IdXMLFile().store rejects on pyopenms 3.5 (so the merge never actually produced output); use a PeptideIdentificationList. Validated end-to-end: merging the three example outputs yields 3697 PSMs with all three tool-sequences aligned and zero backbone mismatches. +3 unit tests (incl. the missing-middle-PSM misalignment case). Full suite: 171 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

PhosphoRS's per-isomer peak matching scanned theoretical ions while advancing a single experimental-peak pointer that was never consumed, and counted every theoretical ion (including indistinguishable near-duplicates). Both inflate the binomial k/n, and asymmetrically: decoy-A isomers carry extra phospho neutral-loss ions that cluster, so decoys accrued more spurious matches than S/T/Y isomers (~0.66 vs ~0.45 extra matches/isomer) and won more often. Add a tested helper _count_matched_ions() that: - merges theoretical ions within one tolerance window (one experimental peak cannot distinguish them -> count as a single trial), and - consumes each experimental peak at most once (no peak satisfies several ions). Also sort red_mz_arr explicitly (the matcher and the window width both require ascending m/z; the old scan silently assumed it, and would mis-match if not). Effect on the example data: PhosphoRS decoy-localization rate drops 5.4% -> 4.3% (36 -> 28 decoy wins), in line with AScore (4.6%). The decoy-AA FLR yield stays ~179 and tie-sensitive, which is correct: that reflects genuine saturation of PhosphoRS's reported site probabilities, not this double-counting bug. +1 unit test (no-double-count + near-duplicate ion merge). Full suite: 172 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

qodo-code-review · 2026-05-31T12:25:47Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

coderabbitai · 2026-05-31T12:25:55Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Emits per-site localization scores (AScore/LucXor/PhosphoRS), fixes AScore site/permutation ordering, treats lowercase 'a' as PhosphoDecoy alanine in LucXor, aligns PSMs by spectrum_reference, adds a unified decoy‑AA FLR estimator with idXML parsing/CLI, and improves phosphors ion-matching/peak‑depth reduction.

Changes

AScore Site Ordering and Cross-Tool Site Score Metadata

Layer / File(s)	Summary
AScore site sorting and theoretical spectrum fix `onsite/ascore/ascore.py`, `tests/test_ascore.py`	`getSites_` now returns phosphorylation/decoy site indices sorted ascending; `createTheoreticalSpectra_` sorts permutation indices before applying modifications and uses sorted lengths for completion checks.
AScore per-site score metadata & CLI propagation `onsite/ascore/ascore.py`, `onsite/ascore/cli.py`, `tests/test_ascore.py`	`compute` records `AScore_site_scores` (stringified position-keyed dict) into `PeptideHit` metadata for unambiguous and ambiguous localization cases; CLI threaded and non-thread paths append this meta when present.

LucXor PhosphoDecoy and Site Scores

Layer / File(s)	Summary
LucXor PhosphoDecoy-Alanine support & site scoring `onsite/lucxor/psm.py`, `onsite/lucxor/cli.py`, `tests/test_lucxor_regression.py`	Lowercase `'a'` treated as a phospho-bearing Alanine (`PHOSPHO_MOD_MASS`) and included among phospho candidate positions; added `PSM.get_site_scores()` computing per-residue deltas from permutation scores; `convert_sequence_to_standard_format` emits `A(PhosphoDecoy)`; CLI adds `--seed`, seeds RNGs, and emits `Luciphor_site_scores` on hits.

PSM Alignment and Merged Results

Layer / File(s)	Summary
PSM alignment by spectrum reference `onsite/onsitec.py`, `tests/test_onsitec_merge.py`	New `_join_psms_by_ref` matches PSMs across tools by `spectrum_reference`, verifies backbone sequence identity, and returns matched triples with drop/mismatch stats; `merge_algorithm_results` uses this helper and inserts into `PeptideIdentificationList` via `push_back`.

Decoy-AA False Localization Rate Estimator

Layer / File(s)	Summary
Core FLR estimators and idXML parsing `onsite/decoy_flr.py`, `tests/test_decoy_flr.py`	Added `flr_curve` (ranked cumulative target/decoy counts, raw FLR capped at 1.0, monotonized q-curve) and `sites_at_flr`; `parse_localized_sites` and `parse_tool_idxml` parse localized sites, decoy flags, q-values, and per-position score maps into `PSMRecord`s.
Tool FLR computation & orchestration `onsite/decoy_flr.py`, `tests/test_decoy_flr.py`	`compute_tool_flr` filters ident-decoys/q-threshold, restricts to shared spectrum-ref keep-set, excludes unambiguous peptides, optionally collapses redundant sites by max score, computes `t_c/x_c`, ranks sites, and produces FLR curve; `compute_decoy_flr` orchestrates per-tool runs and CLI output.

PhosphoRS Ion Matching and Peak-Depth Optimization

Layer / File(s)	Summary
PhosphoRS ion matching & depth optimization `onsite/phosphors/phosphors.py`, `onsite/phosphors/cli.py`, `tests/test_phosphors.py`	Removed `MIN_DEPTH`; added `_count_matched_ions` to merge indistinguishable theoretical ions and greedily consume experimental peaks; implemented per-window peak-depth optimization for isoform separation; updated serialization to `AASequence.toString()` and wired `PhosphoRS_site_delta` metadata through CLI helpers.

Sequence Diagram (decoy-AA FLR high-level flow)

sequenceDiagram
  participant CLI as compute_decoy_flr/main
  participant Parser as parse_tool_idxml
  participant ToolCompute as compute_tool_flr
  participant Intersector as shared_spectrum_intersection
  participant FLR as flr_curve
  CLI->>Parser: parse each tool idXML
  Parser->>ToolCompute: emit PSMRecord list
  ToolCompute->>Intersector: restrict to shared spectrum_refs
  Intersector->>ToolCompute: keep-set
  ToolCompute->>FLR: rank collapsed sites -> is_decoy_by_rank, t_c, x_c
  FLR->>CLI: return per-tool FLR curve & cutoff

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Unified global FLR (decoy-AA method) across AScore / PhosphoRS / LucXor for same-FDR/same-FLR comparison #40: Implements the requested fixes/features (AScore getSites_/computePermutations_ sorting and AScore_site_scores, LucXor PSM handling of lowercase 'a' and get_site_scores, merge by spectrum_reference, plus a new decoy FLR estimator).

Possibly related PRs

bigbio/onsite#4: LucXor core changes around PSM permutation logic and site scoring relate to this earlier LucXor work.
bigbio/onsite#28: Touches filter_phosphors and algorithm-comparison semantics updated here.
bigbio/onsite#11: Prior CLI wiring and per-hit metadata propagation that these changes extend.

Suggested labels

Review effort 5/5

Suggested reviewers

weizhongchun
ypriverol

Poem

"I'm a rabbit in the lab, nose twitching at scores,
Sites now line up tidy, no scrambled permutation wars.
Decoys and phosphos take their rightful place,
FLR curves hum softly, metadata hops through each trace.
Hooray — the outputs sing, reproducible and neat!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 69.39% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: implementing a unified decoy-amino acid global FLR estimator across three tools and addressing multiple scoring bugs.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/decoy-flr-scoring-bugs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codacy-production · 2026-05-31T12:31:30Z

Not up to standards ⛔

🔴 Issues 58 high · 11 medium · 31 minor

Alerts:
⚠ 100 issues (≤ 0 issues of at least minor severity)

Results:
100 new issues

Category Results

Documentation 31 minor

ErrorProne 3 high

Security 1 medium
55 high

Complexity 10 medium

View in Codacy

🟢 Metrics 131 complexity · -10 duplication

Metric Results

Complexity 131

Duplication -10

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

coderabbitai

🧹 Nitpick comments (3)

tests/test_onsitec_merge.py (2)

36-36: ⚡ Quick win

Rename single-letter variable for clarity.

The variable l (lowercase L) is flagged by PEP8 as ambiguous and hard to distinguish from 1 (digit one) or I (uppercase i). Consider renaming to luc or lucxor_pid.

♻️ Proposed fix

-    refs = [l.getMetaValue("spectrum_reference") for (_a, _p, l) in triples]
+    refs = [luc.getMetaValue("spectrum_reference") for (_a, _p, luc) in triples]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_onsitec_merge.py` at line 36, Rename the single-letter variable
`l` in the list comprehension refs = [l.getMetaValue("spectrum_reference") for
(_a, _p, l) in triples] to a clearer name (e.g., `luc` or `lucxor_pid`)
throughout that expression; update the comprehension to use the new identifier
and ensure any downstream references in the same scope that relied on `l` are
renamed as well so the call to getMetaValue("spectrum_reference") now reads
luc.getMetaValue(...).

68-78: ⚡ Quick win

Rename single-letter variable for clarity.

Same PEP8 ambiguity issue with l on line 77.

♻️ Proposed fix

-    assert [l.getMetaValue("spectrum_reference") for (_a, _p, l) in triples] == refs
+    assert [luc.getMetaValue("spectrum_reference") for (_a, _p, luc) in triples] == refs

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_onsitec_merge.py` around lines 68 - 78, The test function
test_join_identical_sets uses a single-letter variable `l` in the list
comprehension extracting spectrum_reference from `triples`, which is ambiguous;
rename `l` to a clearer name (e.g., `lucxor_psm` or `last_psm`) and update the
unpacking for `triples` (from `(_a, _p, l)` to `(_a, _p, lucxor_psm)`) so the
comprehension [l.getMetaValue("spectrum_reference") for ...] becomes
[lucxor_psm.getMetaValue("spectrum_reference") for ...], keeping the call to
`_join_psms_by_ref` and the rest of the assertions unchanged.

tests/test_decoy_flr.py (1)

101-112: 💤 Low value

Consider using next() for single-element extraction.

Line 109 uses list comprehension with indexing [0] to extract one element. Using next() is more idiomatic and raises StopIteration if no match is found (which would surface a test logic error).

♻️ Proposed refactor

-    top = [c for c in collapsed if c[2] == 2][0]
+    top = next(c for c in collapsed if c[2] == 2)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_decoy_flr.py` around lines 101 - 112, In
test_collapse_takes_max_and_counts_psms, replace the list-comprehension indexing
used to extract the single matching element with next(...) so a missing match
raises StopIteration; specifically change the expression that sets top
(currently using [c for c in collapsed if c[2] == 2][0]) to use next(c for c in
collapsed if c[2] == 2) while keeping the rest of the assertion logic and the
reference to _collapse_sites intact.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_decoy_flr.py`:
- Around line 101-112: In test_collapse_takes_max_and_counts_psms, replace the
list-comprehension indexing used to extract the single matching element with
next(...) so a missing match raises StopIteration; specifically change the
expression that sets top (currently using [c for c in collapsed if c[2] ==
2][0]) to use next(c for c in collapsed if c[2] == 2) while keeping the rest of
the assertion logic and the reference to _collapse_sites intact.

In `@tests/test_onsitec_merge.py`:
- Line 36: Rename the single-letter variable `l` in the list comprehension refs
= [l.getMetaValue("spectrum_reference") for (_a, _p, l) in triples] to a clearer
name (e.g., `luc` or `lucxor_pid`) throughout that expression; update the
comprehension to use the new identifier and ensure any downstream references in
the same scope that relied on `l` are renamed as well so the call to
getMetaValue("spectrum_reference") now reads luc.getMetaValue(...).
- Around line 68-78: The test function test_join_identical_sets uses a
single-letter variable `l` in the list comprehension extracting
spectrum_reference from `triples`, which is ambiguous; rename `l` to a clearer
name (e.g., `lucxor_psm` or `last_psm`) and update the unpacking for `triples`
(from `(_a, _p, l)` to `(_a, _p, lucxor_psm)`) so the comprehension
[l.getMetaValue("spectrum_reference") for ...] becomes
[lucxor_psm.getMetaValue("spectrum_reference") for ...], keeping the call to
`_join_psms_by_ref` and the rest of the assertions unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3ba9e69-29c2-4a21-bc3f-fdb9b8751421

📥 Commits

Reviewing files that changed from the base of the PR and between eb7f3be and c8de981.

📒 Files selected for processing (12)

onsite/ascore/ascore.py
onsite/ascore/cli.py
onsite/decoy_flr.py
onsite/lucxor/cli.py
onsite/lucxor/psm.py
onsite/onsitec.py
onsite/phosphors/phosphors.py
tests/test_ascore.py
tests/test_decoy_flr.py
tests/test_lucxor_regression.py
tests/test_onsitec_merge.py
tests/test_phosphors.py

github-actions · 2026-05-31T12:34:38Z

Algorithm Comparison Test Results

Click to expand test results

============================= test session starts ==============================
platform linux -- Python 3.11.15, pytest-9.0.3, pluggy-1.6.0 -- /opt/hostedtoolcache/Python/3.11.15/x64/bin/python
cachedir: .pytest_cache
rootdir: /home/runner/work/onsite/onsite
configfile: pyproject.toml
plugins: cov-7.1.0
collecting ... collected 3 items

tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_lucxor_comparison 
================================================================================
LucXor Comparison Results (q-value < 0.01)
================================================================================

STRICT (Local FLR < 0.01):
  New results: 848
  Reference results: 848
  Overlap: 848 (100.0%)
  Recall: 100.0% (new found 848/848 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

MODERATE (Local FLR < 0.05):
  New results: 1064
  Reference results: 1064
  Overlap: 1064 (100.0%)
  Recall: 100.0% (new found 1064/1064 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

LENIENT (Local FLR < 0.1):
  New results: 1081
  Reference results: 1081
  Overlap: 1081 (100.0%)
  Recall: 100.0% (new found 1081/1081 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x
PASSED
tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_ascore_comparison 
================================================================================
AScore Comparison Results (q-value < 0.01)
================================================================================

STRICT (AScore >= 20):
  New results: 919
  Reference results: 919
  Overlap: 919 (100.0%)
  Recall: 100.0% (new found 919/919 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

MODERATE (AScore >= 15):
  New results: 1023
  Reference results: 1023
  Overlap: 1023 (100.0%)
  Recall: 100.0% (new found 1023/1023 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

LENIENT (AScore >= 3):
  New results: 1076
  Reference results: 1076
  Overlap: 1076 (100.0%)
  Recall: 100.0% (new found 1076/1076 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x
PASSED
tests/test_algorithm_comparison.py::TestAlgorithmComparison::test_phosphors_comparison 
================================================================================
PhosphoRS Comparison Results (q-value < 0.01)
================================================================================

STRICT (Site probability > 99%):
  New results: 983
  Reference results: 983
  Overlap: 983 (100.0%)
  Recall: 100.0% (new found 983/983 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

MODERATE (Site probability > 90%):
  New results: 1084
  Reference results: 1084
  Overlap: 1084 (100.0%)
  Recall: 100.0% (new found 1084/1084 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x

LENIENT (Site probability > 75%):
  New results: 1102
  Reference results: 1102
  Overlap: 1102 (100.0%)
  Recall: 100.0% (new found 1102/1102 reference sites)
  Gain rate: 0.0% (0 new-only sites)
  Lost sites: 0
  Count ratio: 1.00x
PASSED

============================== 3 passed in 45.11s ==============================

PhosphoRS's reported site probabilities saturate -- (1/P)/sum(1/P) pins hundreds of sites at ~100% -- so a global decoy-AA FLR cannot rank genuine sites above decoy mis-localizations, and the recovered-site count was a tie-break artifact (~0-190 depending on ordering). phosphoRS's own discrimination signal is the peptide-score delta (-10*log10(P_random) between the best and the best alternative isoform -- the rank1-rank2 gap it maximizes when choosing peak depth, Taus et al. 2011), which lives in log space and does not saturate. It is a native phosphoRS quantity, not a re-derived metric. - Add site_deltas_from_isomers(): per-site best-with-minus-best-without of the peptide score, floored to stay finite if P_random underflowed. - Emit it as position-keyed PhosphoRS_site_delta (both CLI paths; managed-meta hygiene), alongside the still-reported PhosphoRS_site_probs. - Rank the PhosphoRS leg of the decoy-AA FLR on PhosphoRS_site_delta. Effect: the ~100% tie block collapses 351 -> 1 and the 5%-FLR yield becomes deterministic (151 across 40 tie orderings, previously 0-190). PhosphoRS still recovers fewer sites than AScore, but now as a stable, meaningful result. Adds a file-based 3-tool integration test (parse -> filter -> intersect -> collapse -> Eq.2 -> threshold) plus a site_deltas_from_isomers unit test. Full suite: 174 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…40) Implements the phosphoRS peak-depth optimization (Taus et al. 2011, pseudocode sections 9-12) that the implementation was missing: per 100 m/z window, choose the peak depth (1..8) that maximizes the rank1-rank2 separation between the best and second-best isoform (rank3/rank4 tie-breakers; best-absolute-score when no site-determining ions are present; smaller depth on ties). This replaces _reduce_by_delta_selection, which was dead code AND buggy: its depth-selection ratio (pj/pj1, maximized) was inverted (it minimized separation) and it used the experimental peak count as the binomial n. Since commit d64e2dd ("Use filtered spectrum directly") the active path scored against the unoptimized filtered spectrum. New, separately-tested helpers: - _window_has_site_determining_ions (section 10) - _isoform_peptide_scores / _choose_window_depth (sections 9 & 11) - _reduce_by_peak_depth_optimization (sections 8-12 orchestrator) Gated by ENABLE_PEAK_DEPTH_OPTIMIZATION (default True). Effect on the example data (q<=0.01, 5% global FLR): PhosphoRS yield 151 -> 163, decoy-localization rate 4.3% -> 6.2%; phosphoRS ~3x slower (per-isoform theoretical-spectrum generation). The comparison conclusion is unchanged -- PhosphoRS still recovers far fewer sites than AScore (163 vs 441), confirming its lower reliability on this data is genuine, not an artifact of the missing optimization. +4 unit tests (depth selection, site-determining detection, tie-break, empties). Full suite: 178 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…optimizer Deletes the now-unused window-reduction code paths (none had any call site): - _reduce_by_delta_selection (buggy: inverted depth ratio, experimental peak count used as the binomial n) and _reduce_spectrum_by_windows; - their exclusive helpers _site_determining_ions, _window_intensity_thresholds, _count_peaks_above_threshold, _get_intensity_thresholds; - the now-orphaned MIN_DEPTH constant. The faithful _reduce_by_peak_depth_optimization replaces them; _get_window_indexes is kept (used by it). ~371 lines removed; behavior unchanged. Full suite: 178 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…lity w (#40) p = N*d/w (phosphoRS, Taus et al. 2011 section 13) where w is the FULL m/z range of the MS/MS spectrum. The code used the m/z span of the *extracted* peak list instead, which is narrower -> p too large -> scores slightly over-conservative. Use the original spectrum's m/z range (N still = extracted peak count). Before/after on the full dataset (data/1.mzML, 3697 PSMs): 81/1989 per-PSM PhosphoRS_pep_score values change, but the 3-way decoy-AA FLR is unchanged (PhosphoRS 163 sites @ 5% FLR, decoy rate 6.2%; LucXor 574 / AScore 441) -- the deviation was real but immaterial to the comparison here. Kept for fidelity. Full suite: 178 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

timosachsenberg · 2026-05-31T14:09:12Z

Realized 3-way comparison (the deliverable)

All on the example data (data/1.mzML, 3697 PSMs → 2470 shared after q ≤ 0.01 + identification-decoy removal → 1258 ambiguous PSMs analyzed), each tool ranked by its own native per-site confidence, scored on the same decoy-amino-acid FLR (Ramsbottom et al. 2022; 2·(T_c/X_c) normalization, T_c/X_c = 2.03).

The peak-depth optimizer (originally split into #42) is now part of this PR. Without it, PhosphoRS is ~151 sites @ 5% — the ranking and the conclusion are unchanged. All figures below are now reproducible: PhosphoRS is byte-identical across thread counts, and LucXor is seeded (--seed, default 42; the reference idXML was regenerated under the seed).

Sites recovered at matched global FLR

Correctly-localized target sites (decoy-A wins in parentheses):

method	@1% FLR	@5% FLR	@10% FLR	total sites / decoys (rate)
LucXor	139	591	598	607 / 9 (1.5%)
AScore	23	436	521	611 / 28 (4.6%)
PhosphoRS	90	161	321	665 / 41 (6.2%)

LucXor recovers the most sites at every threshold and has the lowest decoy-localization rate; AScore is the clear middle; PhosphoRS trails and only catches up at looser FLR.

Inter-method agreement (same 1258 analyzed PSMs)

	agreement
All three agree on the site(s)	77%
AScore = LucXor	89%
PhosphoRS = LucXor	83%
AScore = PhosphoRS	80%

The ~23% of ambiguous PSMs where they disagree are the hard cases the FLR adjudicates.

Why the curves differ

	AScore	PhosphoRS	LucXor
scoring basis	binomial over site-determining ions	binomial over all matched ions + dynamic peak-depth opt	KDE of delta-scores vs decoy-peptide permutations
native rank key	`AScore_N`	`PhosphoRS_site_delta`	`Luciphor_site_scores`
decoy rate	4.6%	6.2%	1.5%

Takeaways

This is the intended "same PSM-FDR + same FLR" comparison: identical PSM set, identical FLR definition, each ranked by its own confidence.
LucXor is most reliable on this data, AScore second, PhosphoRS least. PhosphoRS's lower yield is genuine — it survived the double-counting fix, delta-ranking, the faithful peak-depth optimizer, and the full-range-w fix; ~88% of its decoy wins are on symmetric A-vs-S/T ground.
Caveats: (1) single dataset (data/1.mzML); the 96-file PXD000138 benchmark isn't available here, so the ranking could shift on other data — but the pipeline now produces it correctly and reproducibly. (2) The test_algorithm_comparison.py recall numbers guard determinism/stability against a self-baselined reference, not external correctness; correctness rests on the hand-computed unit tests and on faithfulness to Ramsbottom 2022 Eq. 2.

Reproduce (incl. the curve plot)

from onsite.decoy_flr import compute_decoy_flr, sites_at_flr
res = compute_decoy_flr({"ascore": ASCORE_IDXML, "phosphors": PHOSPHORS_IDXML,
                         "lucxor": LUCXOR_IDXML}, q_threshold=0.01, flr_threshold=0.05)
for tool, r in res.items():
    for thr in (0.01, 0.05, 0.10):
        print(tool, thr, sites_at_flr(r.curve, thr))   # (total, target, decoy)
# plot: r.curve["qval"] (x = global FLR) vs r.curve["cum_target"] (y = sites)

The LucXor idXML is produced with the default seed (--seed 42), so these numbers reproduce exactly. A sites-recovered vs global-FLR curve plot for the three methods was generated from r.curve (qval vs cum_target).

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

onsite/phosphors/cli.py (1)

241-260: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear PhosphoRS metadata on failed threaded hits.

When hit_res["status"] != "success", this branch clones hit_src and only sets score = -1.0. On a rerun over an already-scored idXML, old PhosphoRS_site_probs / PhosphoRS_site_delta survive because the cleanup at Lines 254-262 is skipped, and save_identifications() only backfills missing fields. That can leak stale site deltas into downstream FLR ranking.

Proposed fix

                     for hit_src, hit_res in zip(pid_src.getHits(), res["hits"]):
                         if hit_res.get("status") != "success":
-                            # Preserve original hit with -1 score when failed
                             failed_hit = PeptideHit(hit_src)
+                            for k in [
+                                "search_engine_sequence",
+                                "regular_phospho_count",
+                                "phospho_decoy_count",
+                                "PhosphoRS_pep_score",
+                                "PhosphoRS_site_probs",
+                                "PhosphoRS_site_delta",
+                                "SpecEValue_score",
+                            ]:
+                                if failed_hit.metaValueExists(k):
+                                    try:
+                                        failed_hit.removeMetaValue(k)
+                                    except Exception:
+                                        pass
                             failed_hit.setScore(-1.0)
+                            failed_hit.setMetaValue("PhosphoRS_pep_score", -1.0)
+                            failed_hit.setMetaValue("PhosphoRS_site_probs", "{}")
+                            failed_hit.setMetaValue("PhosphoRS_site_delta", "{}")
                             new_hits.append(failed_hit)
                             continue

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@onsite/phosphors/cli.py` around lines 241 - 260, The failed-hit branch clones
hit_src into failed_hit and only sets score = -1.0, but it must also clear the
same managed PhosphoRS metadata as the success branch to avoid leaking stale
fields; update the branch handling hit_res.get("status") != "success" to
remove/clear keys like "search_engine_sequence", "regular_phospho_count",
"phospho_decoy_count", "PhosphoRS_pep_score", "PhosphoRS_site_probs", and
"PhosphoRS_site_delta" from the PeptideHit (failed_hit) before appending it,
mirroring the cleanup logic used for new_hit so downstream
save_identifications()/FLR ranking cannot see old PhosphoRS values.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@onsite/phosphors/phosphors.py`:
- Around line 853-871: The loop skips a peak exactly equal to max_mz because
windows use cur <= mz < hi and the loop ends at cur < max_mz; to fix, iterate
windows up to and including max_mz and include the upper-bound peak in the last
window: change the loop to compute hi = min(cur + WINDOW_SIZE, max_mz) and use
while cur <= max_mz, and when building theo_in_win use an inclusive upper bound
(cur <= mz <= hi) for selecting mz values (or otherwise ensure
_get_window_indexes treats the final hi as inclusive) so a peak at max_mz is
considered in the final window; update references in this block (cur, hi,
max_mz, WINDOW_SIZE, _get_window_indexes, isoform_theo,
_window_has_site_determining_ions) accordingly.

---

Outside diff comments:
In `@onsite/phosphors/cli.py`:
- Around line 241-260: The failed-hit branch clones hit_src into failed_hit and
only sets score = -1.0, but it must also clear the same managed PhosphoRS
metadata as the success branch to avoid leaking stale fields; update the branch
handling hit_res.get("status") != "success" to remove/clear keys like
"search_engine_sequence", "regular_phospho_count", "phospho_decoy_count",
"PhosphoRS_pep_score", "PhosphoRS_site_probs", and "PhosphoRS_site_delta" from
the PeptideHit (failed_hit) before appending it, mirroring the cleanup logic
used for new_hit so downstream save_identifications()/FLR ranking cannot see old
PhosphoRS values.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3e903b9c-d6f8-4aa2-9ee7-9338b0624926

📥 Commits

Reviewing files that changed from the base of the PR and between c8de981 and c01eb28.

📒 Files selected for processing (5)

onsite/decoy_flr.py
onsite/phosphors/cli.py
onsite/phosphors/phosphors.py
tests/test_decoy_flr.py
tests/test_phosphors.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/test_phosphors.py
onsite/decoy_flr.py

… idXMLs (#40) filter_phosphors compared percentage site probabilities (0-100) against a decimal threshold (prob_threshold/100 -> 0.99/0.90/0.75), so the STRICT/MODERATE/ LENIENT tiers all thresholded at <1% and selected the same set (identical counts). Compare in percent directly. The three tiers now stratify, e.g. PhosphoRS 986 (>99%) / 1105 (>90%) / 1122 (>75%). Regenerate the reference outputs (data/1_{ascore,phosphors,lucxor}_result.idXML) from the current, fixed algorithms using the exact CLI invocations the test uses. The old references predated this PR's scoring fixes (delta-ranking, double-counting, peak-depth optimization, full-range w), so the regression baseline was guarding pre-fix output. New==reference now -> recall ~100%. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@5

1) Peak-depth optimizer dropped a peak sitting exactly on the final window boundary at max_mz (windows are [cur, hi) with hi exclusive and the loop ends at cur < max_mz). Extend the final window's upper bound past max_mz so that peak is included; the loop still advances by WINDOW_SIZE so it terminates. (The reviewer's literal suggestion, while cur <= max_mz with hi=min(...,max_mz), would infinite-loop once hi==max_mz.) Edge case only when max_mz-min_mz is an exact multiple of WINDOW_SIZE; unreachable for real m/z but a genuine off-by-one. 2) The parallel PhosphoRS path's failed-hit branch cloned the original hit and set score -1.0 but did not clear the managed PhosphoRS metadata the success branch clears, so stale fields could leak if the input already carried prior PhosphoRS results. Mirror the success-branch cleanup on failed_hit. Both are behavior-neutral on data/1.mzML (threads=1: 616 sites / 120 @5% FLR; threads=4: 665 / 161 -- unchanged), and full suite (178) passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@5

The threads=1 path computed the best-scoring isomer but never wrote it back to the hit, so it kept the original search-engine localization, while the threaded path (and AScore/LucXor) rewrite to the best isomer. This diverged the two thread counts on 496 PSMs and, through the decoy-AA FLR (which reads the localized site from the winning sequence), changed the measured yield (616 -> 665 sites at q<=0.01; @5% FLR 120 -> 161). Mirror the worker apply-back: setSequence(best_isomer) in process_peptide_identification. threads=1 now equals threads=4 (differing winning sequences 496 -> 0) and reproduces the reported numbers. Regenerate data/1_phosphors_result.idXML (the test reference, built at threads=1, had captured the pre-fix localizations). Full suite: 178 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/test_algorithm_comparison.py (1)

161-185: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Count PhosphoDecoy sites explicitly in the top-N check.

Line 174 undercounts decoy-localized peptides: "(Phospho)" is not a substring of "(PhosphoDecoy)". That means this filter can pass a hit after checking fewer than the real number of localized sites, which weakens the regression guard for the unified decoy-FLR path.

Suggested fix

-            # Count phosphorylations (both regular and decoy)
-            # OpenMS format: "S(Phospho)", "T(Phospho)", "Y(Phospho)", "A(PhosphoDecoy)"
-            phospho_count = sequence.count('(Phospho)')  # Matches both Phospho and PhosphoDecoy
+            # Count both regular and decoy localization sites.
+            phospho_count = (
+                sequence.count("(Phospho)")
+                + sequence.count("(PhosphoDecoy)")
+            )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_algorithm_comparison.py` around lines 161 - 185, The current
phospho site count logic in the peptide loop under
tests/test_algorithm_comparison.py uses sequence.count('(Phospho)') which misses
'(PhosphoDecoy)' occurrences, undercounting phospho_count used by the top-N
probability check; update the counting in the block that computes phospho_count
(where sequence is defined and used) to explicitly count both '(Phospho)' and
'(PhosphoDecoy)' (e.g., sum of sequence.count('(Phospho)') and
sequence.count('(PhosphoDecoy)') or a single regex that matches both) so
phospho_count reflects the true number of localized sites before the
sorted_probs[:phospho_count] threshold check in that loop.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@onsite/phosphors/cli.py`:
- Around line 669-673: After calling
new_hit.setSequence(AASequence.fromString(new_sequence)) you must remove or
regenerate any existing ProForma annotation so metadata no longer describes the
old localization: clear or rebuild the ProForma field on new_hit (do not leave
the original ProForma in meta_fields), and apply the same change inside
_worker_process_pid_threaded so the threaded path does not re-append the
original ProForma after relocalization; ensure new_hit ends up with a ProForma
consistent with the new sequence (or no ProForma) before saving.
- Around line 243-262: In process_peptide_identification(), the single-thread
(threads=1) return paths that currently return PeptideHit(hit) can leak stale
PhosphoRS metadata; mirror the cleanup done for the threaded failed_hit by
creating a copy (e.g., failed_hit = PeptideHit(hit)), setScore(-1.0) when
appropriate, and remove the same metadata keys
("search_engine_sequence","regular_phospho_count","phospho_decoy_count","PhosphoRS_pep_score","PhosphoRS_site_probs","PhosphoRS_site_delta","SpecEValue_score","ProForma")
using metaValueExists/removeMetaValue wrapped in the same try/except, then
return that cleaned hit instead of the raw PeptideHit(hit).

---

Outside diff comments:
In `@tests/test_algorithm_comparison.py`:
- Around line 161-185: The current phospho site count logic in the peptide loop
under tests/test_algorithm_comparison.py uses sequence.count('(Phospho)') which
misses '(PhosphoDecoy)' occurrences, undercounting phospho_count used by the
top-N probability check; update the counting in the block that computes
phospho_count (where sequence is defined and used) to explicitly count both
'(Phospho)' and '(PhosphoDecoy)' (e.g., sum of sequence.count('(Phospho)') and
sequence.count('(PhosphoDecoy)') or a single regex that matches both) so
phospho_count reflects the true number of localized sites before the
sorted_probs[:phospho_count] threshold check in that loop.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d0ad8389-d850-433c-bf9f-d3f0f7c26e5d

📥 Commits

Reviewing files that changed from the base of the PR and between c01eb28 and 5f2b36b.

📒 Files selected for processing (6)

data/1_ascore_result.idXML
data/1_lucxor_result.idXML
data/1_phosphors_result.idXML
onsite/phosphors/cli.py
onsite/phosphors/phosphors.py
tests/test_algorithm_comparison.py

🚧 Files skipped from review as they are similar to previous changes (1)

onsite/phosphors/phosphors.py

@5

The serial path stamped search_engine_sequence and PhosphoRS_pep_score=-1.0 onto the 1708 non-phospho hits, while the threaded worker dropped both from its failed-hit branch. Same localization and FLR, but the files differed. Add a shared make_unscored_hit() helper (original hit, score -1, all managed PhosphoRS metadata removed) and route every skip branch through it: the worker's failed-hit branch and both serial branches (no phospho site, scoring returned no result). The phospho-scored path is unchanged. threads=1 now byte-for-byte equals threads>1 (identical MD5); FLR unchanged (665 sites / @5% FLR 161). Reference regenerated. Full suite: 178 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…istic (#40) PhosphoRS — the threaded worker scored every hit unconditionally while the serial path gated on an explicit (Phospho)/(PhosphoDecoy) string, so the worker could fall through to the scorer's mass-based modification inference (abs_tol 0.1 Da around the phospho mass) and (mis)score a non-phospho modification that is near-isobaric with phospho (e.g. Sulfation, +79.9568, 0.0095 Da away) which the serial path skips — diverging threads=1 vs threads>1 output. Both paths now gate on a shared _has_localizable_phospho() before the scorer. threads=1 == threads>1 byte-identical is preserved and data/1 output is unchanged (still byte-matches the committed reference); the fix only removes the latent divergence. LucXor — the CLI never seeded the RNG, so the unseeded random.shuffle of decoy permutations (psm.py) and np.random.choice model subsampling (models.py) made output run-to-run non-deterministic. Add --seed (default 42); PyLuciPHOr2.run() seeds random and numpy at entry (both the standalone CLI and the `onsite all` path funnel through run()). Two default runs are now byte-identical; determinism holds for the default single-threaded run (--threads>1 shares the global RNG). Regenerate data/1_lucxor_result.idXML under the seed via the exact invocation the comparison test uses, so the reference is itself reproducible. All three algorithm-comparison tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

onsite/lucxor/cli.py (1)

286-290: 💤 Low value

Unreachable return statement.

After the early return at lines 240-253 when compute_all_scores=True, the code here can only execute when compute_all_scores=False. The condition on line 288 is therefore always True, sys.exit(exit_code) always runs, and line 290 is unreachable dead code.

🔧 Suggested simplification

-        # Only call sys.exit if not being called from compute_all_scores
-        if not compute_all_scores:
-            sys.exit(exit_code)
-        return exit_code
+        sys.exit(exit_code)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@onsite/lucxor/cli.py` around lines 286 - 290, The conditional around sys.exit
is redundant and leaves the final "return exit_code" unreachable; update the
tail of the function that uses the compute_all_scores flag so it either (A)
explicitly returns exit_code when compute_all_scores is True and calls
sys.exit(exit_code) only when compute_all_scores is False (i.e., convert the
current if-block to an if/else with sys.exit in the False branch and return
exit_code in the True branch), or (B) remove the unconditional sys.exit and
always return exit_code, ensuring references to compute_all_scores, sys.exit,
and return exit_code are adjusted accordingly to eliminate dead code.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@onsite/lucxor/cli.py`:
- Around line 286-290: The conditional around sys.exit is redundant and leaves
the final "return exit_code" unreachable; update the tail of the function that
uses the compute_all_scores flag so it either (A) explicitly returns exit_code
when compute_all_scores is True and calls sys.exit(exit_code) only when
compute_all_scores is False (i.e., convert the current if-block to an if/else
with sys.exit in the False branch and return exit_code in the True branch), or
(B) remove the unconditional sys.exit and always return exit_code, ensuring
references to compute_all_scores, sys.exit, and return exit_code are adjusted
accordingly to eliminate dead code.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6b8bb848-cc8f-4a82-abe9-9fe6c61e6dbb

📥 Commits

Reviewing files that changed from the base of the PR and between 5f2b36b and b1f69b2.

📒 Files selected for processing (4)

data/1_lucxor_result.idXML
data/1_phosphors_result.idXML
onsite/lucxor/cli.py
onsite/phosphors/cli.py

… --seed (#40) The README listed only the basic per-PSM output metrics. Add what was missing: - the per-site score dicts (AScore_site_scores, PhosphoRS_site_probs/_site_delta, Luciphor_site_scores), score directions, and per-tool thresholds (AScore >= 13, prob >= 75%, local FLR <= 0.05); - a new "Interpreting the output: PSM-FDR vs localization FLR" section explaining the two orthogonal error axes (input PSM-FDR is preserved; localization adds an FLR), with a consolidated tool -> score -> cutoff table; - the unified decoy-amino-acid global FLR workflow (--add-decoys + python -m onsite.decoy_flr), which was undocumented; - the new LucXor --seed option. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@README.md`:
- Line 239: The description for PhosphoRS_site_delta currently uses a hyphenated
phrase "best-alternative isoform"; update that text to read "best alternative
isoform" (no hyphen) in the PhosphoRS_site_delta definition so it becomes:
PhosphoRS_site_delta: {position: Δ} — the −10·log10 P gap between the best and
best alternative isoform (rank1 − rank2). Used to rank a global FLR because,
unlike the probability, it does not saturate at 100%.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0dd682be-af34-4550-9506-836b542566f7

📥 Commits

Reviewing files that changed from the base of the PR and between b1f69b2 and b47bce6.

📒 Files selected for processing (1)

README.md

coderabbitai · 2026-06-01T05:48:02Z

- Isomer details with sequence and score
- Detailed confidence metrics
+- `PhosphoRS_site_probs`: `{position: probability}` on a **0–100% scale** (**higher = more confident**) — the classic phosphoRS site probability.
+- `PhosphoRS_site_delta`: `{position: Δ}` — the `−10·log10 P` gap between the best and best-alternative isoform (rank1 − rank2). Used to rank a global FLR because, unlike the probability, it does not saturate at 100%.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use “best alternative” (no hyphen) for clarity.

“best-alternative” reads like a typo here; “best alternative isoform” is clearer and avoids grammar-tool warnings.

🧰 Tools

🪛 LanguageTool

[grammar] ~239-~239: Ensure spelling is correct
Context: ... the best and best-alternative isoform (rank1 − rank2). Used to rank a global FLR bec...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~239-~239: Ensure spelling is correct
Context: ...t and best-alternative isoform (rank1 − rank2). Used to rank a global FLR because, un...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` at line 239, The description for PhosphoRS_site_delta currently uses a hyphenated phrase "best-alternative isoform"; update that text to read "best alternative isoform" (no hyphen) in the PhosphoRS_site_delta definition so it becomes: PhosphoRS_site_delta: {position: Δ} — the −10·log10 P gap between the best and best alternative isoform (rank1 − rank2). Used to rank a global FLR because, unlike the probability, it does not saturate at 100%.

timosachsenberg and others added 4 commits May 31, 2026 12:54

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

timosachsenberg and others added 2 commits May 31, 2026 14:46

timosachsenberg mentioned this pull request May 31, 2026

phosphoRS: faithful dynamic per-window peak-depth optimization (fidelity, #40) #42

Merged

timosachsenberg and others added 2 commits May 31, 2026 15:40

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

Comment thread onsite/phosphors/phosphors.py

timosachsenberg force-pushed the fix/decoy-flr-scoring-bugs branch from 4553f51 to 13b4068 Compare May 31, 2026 14:25

timosachsenberg and others added 2 commits May 31, 2026 16:48

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

Comment thread onsite/phosphors/cli.py Outdated

Comment thread onsite/phosphors/cli.py

timosachsenberg and others added 2 commits May 31, 2026 17:35

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

timosachsenberg requested a review from weizhongchun June 1, 2026 06:16

Conversation

timosachsenberg commented May 31, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's in here (11 commits)

Realized 3-way comparison (data/1.mzML, 3697 PSMs → 2470 shared at q ≤ 0.01 → 1258 ambiguous)

Tests

Summary by CodeRabbit

Uh oh!

qodo-code-review Bot commented May 31, 2026

Qodo reviews are paused for this user.

Uh oh!

coderabbitai Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

codacy-production Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Not up to standards ⛔

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Algorithm Comparison Test Results

Uh oh!

timosachsenberg commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Realized 3-way comparison (the deliverable)

Sites recovered at matched global FLR

Inter-method agreement (same 1258 analyzed PSMs)

Why the curves differ

Takeaways

Reproduce (incl. the curve plot)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

timosachsenberg commented May 31, 2026 •

edited by coderabbitai Bot

Loading

Realized 3-way comparison (`data/1.mzML`, 3697 PSMs → 2470 shared at q ≤ 0.01 → 1258 ambiguous)

coderabbitai Bot commented May 31, 2026 •

edited

Loading

codacy-production Bot commented May 31, 2026 •

edited

Loading

github-actions Bot commented May 31, 2026 •

edited

Loading

timosachsenberg commented May 31, 2026 •

edited

Loading