diag(i5): score_psm trace findings + diff harness (PXD001819, no production code change) #37

Merged: ypriverol merged 5 commits into dev from feat/i5-score-psm-trace on May 27, 2026

@ypriverol

Summary

Research-only PR. Localizes the Rust vs Java per-PSM scoring divergence on PXD001819 via instrumented traces on both sides, with no production code change. Sets up the next PR's fix.

Top-line finding

Across 754 matched ion comparisons on 10 traced PSMs (5 label-flip scans, both Rust top-1 + Java top-1 peptides scored on each):

| Divergence | Count | % of matched |
|---|---|---|
| LOGPROB_DIFF (per-rank log-prob value differs) | 608 | 81% |
| RANK_DIFF (rank assignment differs) | 301 | 40% |
| RUST_ONLY (Rust enumerates ion Java doesn't) | 73 | (additional) |

Disentanglement: 301 of the 608 LOGPROB_DIFF cases coincide with RANK_DIFF and are a downstream effect of the rank-assignment divergence. The remaining 307 are pure table-value differences. Rough split across the three hypotheses: 40% H2 (rank), 40% H3 (log-prob table), 10% H1 (ion enumeration). No single cause dominates.
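The disentanglement arithmetic can be rechecked with a few lines of Python. This is only a sanity check on the counts quoted above; it assumes, as the text states, that every RANK_DIFF row is also a LOGPROB_DIFF row, and the variable names are illustrative rather than taken from analyze.py.

```python
# Counts quoted in the table above.
matched = 754
logprob_diff = 608
rank_diff = 301

# Assumption from the text: all RANK_DIFF rows are also LOGPROB_DIFF rows,
# so the pure table-value differences are what is left over.
pure_table_diff = logprob_diff - rank_diff


def pct(n, total=matched):
    """Share of matched ion comparisons, rounded to a whole percent."""
    return round(100 * n / total)


print("LOGPROB_DIFF %:", pct(logprob_diff))
print("RANK_DIFF %:", pct(rank_diff))
print("pure table diffs:", pure_table_diff, f"({pct(pure_table_diff)}% of matched)")
```

The rank-driven and table-driven shares come out at about 40% each, which is where the rough H2/H3 split comes from.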

Smoking gun

| Peptide direction | Rust score | Java score | Δ |
|---|---|---|---|
| Java's favored peptides (5 scans avg) | 116.7 | 120.2 | −3.5 (Rust under-scores by ~13 max) |
| Rust's picks (5 scans avg) | 19.8 | 9.1 | +10.7 (Rust over-scores by +5 to +20) |

Rust's scoring is generous to weaker peptides while only slightly stingy on Java's picks — net asymmetry of ~14 RawScore points per scan, enough to flip top-1 across thousands of candidates.
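As a worked check, the ~14-point figure is the gap between the two Δ values in the table. The snippet below only restates that arithmetic from the five-scan averages; the dictionary names are illustrative.

```python
# Five-scan averages from the table above (RawScore units).
java_favored = {"rust": 116.7, "java": 120.2}  # peptides Java ranks top-1
rust_picks = {"rust": 19.8, "java": 9.1}       # peptides Rust ranks top-1

# Delta = Rust score minus Java score for the same peptide set.
delta_java_favored = round(java_favored["rust"] - java_favored["java"], 1)
delta_rust_picks = round(rust_picks["rust"] - rust_picks["java"], 1)

# Net asymmetry: how much more Rust favors its own picks than Java's picks.
net_asymmetry = round(delta_rust_picks - delta_java_favored, 1)

print(delta_java_favored, delta_rust_picks, net_asymmetry)
```

With the table's values this gives −3.5, +10.7 and a net swing of 14.2 RawScore points.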

Proposed fix (next PR)

Target H2 (rank assignment) first. The relevant code lives in crates/scoring/src/scoring/scored_spectrum.rs (setRanksOfPeaks + nearest_peak_rank). Candidate culprits from the analysis:

  • Tie-break on equal-intensity peaks (Java: first-peak; Rust: intensity-max within tolerance)
  • Precursor-filter mask interaction with ranks
  • Rank reassignment after filter
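To illustrate why the first bullet matters, here is a minimal Python sketch of two peak-selection policies returning different ranks for the same theoretical ion. It is not the code in scored_spectrum.rs or in the Java scorer: the function names, the toy peak list, and the reading of "first-peak" as "lowest-m/z peak inside the tolerance window" are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); peaks assumed sorted by ascending m/z


def assign_ranks(peaks: List[Peak]) -> List[int]:
    """Rank 1 is the most intense peak; equal intensities keep m/z order (sorted() is stable)."""
    order = sorted(range(len(peaks)), key=lambda i: -peaks[i][1])
    ranks = [0] * len(peaks)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_first_peak(peaks: List[Peak], ranks: List[int],
                    theo_mz: float, tol: float) -> Optional[int]:
    """Hypothetical 'first-peak' policy: first peak (in m/z order) inside the window."""
    for (mz, _), rank in zip(peaks, ranks):
        if abs(mz - theo_mz) <= tol:
            return rank
    return None


def rank_max_intensity(peaks: List[Peak], ranks: List[int],
                       theo_mz: float, tol: float) -> Optional[int]:
    """Hypothetical 'intensity-max' policy: most intense peak inside the window."""
    hits = [(inten, rank) for (mz, inten), rank in zip(peaks, ranks)
            if abs(mz - theo_mz) <= tol]
    if not hits:
        return None
    return max(hits, key=lambda h: h[0])[1]


# Toy spectrum: two peaks fall inside the window around 500.005.
peaks = [(500.000, 10.0), (500.010, 80.0), (612.300, 80.0), (700.100, 40.0)]
ranks = assign_ranks(peaks)
first = rank_first_peak(peaks, ranks, theo_mz=500.005, tol=0.02)
best = rank_max_intensity(peaks, ranks, theo_mz=500.005, tol=0.02)
print(ranks, first, best)
```

On this toy input the first-peak policy reports rank 4 and the intensity-max policy reports rank 1 for the same ion. A different rank selects a different entry in the per-rank log-probability table, which is how a RANK_DIFF can surface as a LOGPROB_DIFF downstream.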

Fixing H2 likely closes the rank-driven share of H3 (301 of the 608 LOGPROB_DIFFs would disappear). Bench gate for the fix: PXD001819 auto @1% FDR gains ≥ +200 PSMs over the current 14,755 (roughly 15,000 in total).

What this PR contains

  • crates/msgf-rust/src/bin/msgf-trace.rs — extended with --trace-json for per-PSM per-ion structured output (additive; existing stderr trace unchanged)
  • benchmark/ci/diff_score_psm_traces.py — Python diff harness (stdlib-only)
  • docs/parity-analysis/notes/2026-05-26-score-psm-trace-findings.md — analysis (~250 lines with methodology, per-PSM tables, proposed fix design)
  • docs/parity-analysis/notes/score-psm-trace-artifacts/ — reproducibility artifacts:
    • 5 Rust JSON traces
    • 5 gzipped Java per-scan TRACE logs (90,926 raw TRACE lines total)
    • aggregate-analysis.txt — per-PSM totals + divergence counts
    • analyze.py — analysis script runnable from artifacts dir
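For orientation, the core alignment step of the diff harness (matching ions by (ion_kind, theo_mz) within 1e-3 Da, as described in the harness commit message) could look roughly like the sketch below. It is not the contents of diff_score_psm_traces.py: the record fields `rank` and `logprob`, the log-prob epsilon, and the function name `align` are assumptions, and the CONTRIB_DIFF flag is omitted for brevity.

```python
from typing import Dict, List

TOL_DA = 1e-3       # m/z matching tolerance from the harness commit message
LOGPROB_EPS = 1e-6  # assumed threshold for calling two log-probs different


def align(rust_ions: List[Dict], java_ions: List[Dict], tol: float = TOL_DA) -> List[Dict]:
    """Pair Rust and Java ion records by (ion_kind, theo_mz) and flag divergences."""
    rows = []
    used = set()  # indices of Java ions already paired
    for r in rust_ions:
        match_idx = None
        for i, j in enumerate(java_ions):
            if (i not in used and j["ion_kind"] == r["ion_kind"]
                    and abs(j["theo_mz"] - r["theo_mz"]) <= tol):
                match_idx = i
                break
        if match_idx is None:
            rows.append({"ion_kind": r["ion_kind"], "theo_mz": r["theo_mz"],
                         "flags": ["RUST_ONLY"]})
            continue
        used.add(match_idx)
        j = java_ions[match_idx]
        flags = []
        if r["rank"] != j["rank"]:
            flags.append("RANK_DIFF")
        if abs(r["logprob"] - j["logprob"]) > LOGPROB_EPS:
            flags.append("LOGPROB_DIFF")
        rows.append({"ion_kind": r["ion_kind"], "theo_mz": r["theo_mz"], "flags": flags})
    for i, j in enumerate(java_ions):
        if i not in used:
            rows.append({"ion_kind": j["ion_kind"], "theo_mz": j["theo_mz"],
                         "flags": ["JAVA_ONLY"]})
    return rows


# Synthetic records, made up for this example.
rust = [
    {"ion_kind": "b", "theo_mz": 200.1000, "rank": 3, "logprob": -1.2},
    {"ion_kind": "y", "theo_mz": 300.2000, "rank": 5, "logprob": -0.7},
    {"ion_kind": "b2", "theo_mz": 150.0500, "rank": 9, "logprob": -2.0},
]
java = [
    {"ion_kind": "b", "theo_mz": 200.1004, "rank": 3, "logprob": -1.5},
    {"ion_kind": "y", "theo_mz": 300.2000, "rank": 7, "logprob": -0.9},
    {"ion_kind": "y", "theo_mz": 410.3000, "rank": 2, "logprob": -0.4},
]
rows = align(rust, java)
for row in rows:
    print(row)
```

Counting a row once per flag is what lets a single ion contribute to both the RANK_DIFF and LOGPROB_DIFF totals, which is why those two counts overlap in the top-line table.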

What this PR does NOT contain

  • The fix itself (next PR)
  • Production code changes (msgf-trace is a separate binary; msgf-rust is untouched)
  • Java repo changes (instrumentation lives on bench VM /srv/data/msgf-bench/java-legacy-trace/)
  • Datasets other than PXD001819

Verification

  • cargo clippy --workspace --all-targets clean
  • Workspace tests green under existing CI skip list
  • No production code changed (bit-identical regression gate trivially passes)
  • Analysis reproducible: cd <artifacts-dir>; python3 analyze.py matches the divergence counts in the doc
  • CodeRabbit review pass
  • CI matrix green

Reproducibility (full pipeline, ~30 min)

  1. Build msgf-trace: cargo build --release --bin msgf-trace
  2. Clone java-legacy on bench VM; patch NewScoredSpectrum.getNodeScore; mvn package
  3. Run Java with -Dmsgf.trace.scans=41522,34685,23272,23082,16629
  4. Run msgf-trace --scan N --java-top1 <pep> --trace-json out.json for each scan
  5. python3 analyze.py

Details + exact commands in the analysis doc.

ypriverol added 5 commits May 26, 2026 19:15
Research-only PR to identify the dominant root cause of the per-PSM
scoring divergence between Rust and Java (Rust 14 vs Java 38 RawScore
on the same spectrum+peptide, per 2026-05-20 finding).

Three hypotheses to triage:
- H1: per-partition ion-type list differs
- H2: peak rank assignment differs
- H3: per-rank log-probability tables differ

Deliverable: written analysis on 5 PSMs from PXD001819 + proposed
fix design. No production code changes; only `msgf-trace` diagnostic
extensions and a Python diff harness.

Java instrumentation lives on the bench VM in a separate java-legacy
worktree; not committed to this repo.

Path to the +5%/dataset PSM goal goes through this investigation.

Five-task plan to execute the spec at f943aa7.

Tasks: (1) extend msgf-trace with --trace-json; (2) Python diff
harness; (3) bench-VM Java instrumentation on java-legacy clone
(out-of-repo); (4) analysis doc + 5-PSM trace artifacts +
.gitignore allowlist; (5) push + open research PR.

No production code changes; only diagnostic-binary extensions and
research artifacts.

Adds a structured output mode to the diagnostic trace binary so its
per-split breakdown can be diffed against Java's instrumentation
output in a downstream Python harness. JSON is written by hand
(no new serde dep) since the per-PSM volume is small (~5-10 KB).
The existing human-readable stderr output is unaffected.

This is the Rust-side instrumentation for I5 (score_psm trace
investigation). No production code change: msgf-trace is a separate
binary from msgf-rust.

Aligns msgf-trace --trace-json output against java-legacy instrumented
TRACE lines by (ion_kind, theo_mz) with 1e-3 Da tolerance. Emits
per-PSM side-by-side per-ion rows with RANK_DIFF / LOGPROB_DIFF /
CONTRIB_DIFF / RUST_ONLY / JAVA_ONLY flags + per-PSM total
contribution deltas + aggregate divergence counts.

stdlib-only Python 3 (no new deps). Tested on synthetic input
covering all five divergence categories.

Part of I5 score_psm trace investigation. Together with Task 1's
--trace-json output, this provides the diff infrastructure; Task 3
(out-of-repo) instruments Java to produce the TRACE lines.

Identifies a multi-causal Rust vs Java per-PSM scoring divergence
on PXD001819 label-flip PSMs. Methodology + artifacts + proposed
fix design (no code in this PR; fix lands separately).

Top-line: 754 matched ion comparisons across 10 traced PSMs show
- LOGPROB_DIFF 608 (81%): per-rank log-probability lookup values
  diverge between Rust and Java
- RANK_DIFF 301 (40%): peak rank assignment differs
- RUST_ONLY 73: ion types Rust enumerates but Java does not

Disentanglement: 301 of 608 LOGPROB_DIFF cases coincide with
RANK_DIFF (downstream of peak-rank divergence); the remaining
307 are pure table-value differences. Roughly 40/40/10 split
across H2/H3/H1.

Smoking gun: Rust scores Java's favored peptide within +-13 of
Java's per-ion sum, but OVER-scores its own picks by +5 to +20.
The asymmetry flips top-1 ranking.

Proposed fix direction: target H2 (rank assignment) first; lives
in crates/scoring/src/scoring/scored_spectrum.rs:setRanksOfPeaks
+ nearest_peak_rank. Fixing H2 likely closes the rank-driven
share of H3. Bench gate: PXD001819 auto @1% FDR >= +200 PSMs.

Artifacts:
- 5 Rust JSON traces (one PSM array per scan, ~22KB each)
- 5 Java gzipped per-scan TRACE logs (~200KB each, 90,926 raw
  TRACE lines total)
- aggregate-analysis.txt: per-PSM totals + divergence counts
- analyze.py: the analysis script (runnable from artifacts dir)

Java instrumentation lives on bench VM /srv/data/msgf-bench/java-legacy-trace/
(NewScoredSpectrum.getNodeScore patched with -Dmsgf.trace.scans=<csv>
gated System.err.println).

Out of scope: fix implementation; next PR after this.
@coderabbitai
coderabbitai Bot commented May 26, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

ypriverol merged commit ff68e5e into dev on May 27, 2026
5 checks passed
ypriverol deleted the feat/i5-score-psm-trace branch on May 27, 2026 15:41