diag(i5): score_psm trace findings + diff harness (PXD001819, no production code change) by ypriverol · Pull Request #37 · bigbio/msgf-rust

ypriverol · 2026-05-26T19:21:48Z

Summary

Research-only PR. Localizes the Rust vs Java per-PSM scoring divergence on PXD001819 via instrumented traces on both sides, with no production code change. Sets up the next PR's fix.

Top-line finding

Across 754 matched ion comparisons on 10 traced PSMs (5 label-flip scans, both Rust top-1 + Java top-1 peptides scored on each):

Divergence	Count	% of matched
LOGPROB_DIFF (per-rank log-prob value differs)	608	81%
RANK_DIFF (rank assignment differs)	301	40%
RUST_ONLY (Rust enumerates ion Java doesn't)	73	(additional)

Disentanglement: 301 of the 608 LOGPROB_DIFF cases coincide with RANK_DIFF (downstream effect of rank-assignment divergence). The remaining 307 are pure table-value differences. Rough split across the 3 hypotheses: 40% H2 (rank), 40% H3 (log-prob table), 10% H1 (ion enumeration). Not a single dominant cause.

Smoking gun

Peptide direction	Rust score	Java score	Δ
Java's favored peptides (5 scans avg)	116.7	120.2	−3.5 (Rust under-scores by ~13 max)
Rust's picks (5 scans avg)	19.8	9.1	+10.7 (Rust over-scores by +5 to +20)

Rust's scoring is generous to weaker peptides while only slightly stingy on Java's picks — net asymmetry of ~14 RawScore points per scan, enough to flip top-1 across thousands of candidates.

Proposed fix (next PR)

Target H2 (rank assignment) first. Lives in crates/scoring/src/scoring/scored_spectrum.rs (setRanksOfPeaks + nearest_peak_rank). Common culprits per analysis:

Tie-break on equal-intensity peaks (Java: first-peak; Rust: intensity-max within tolerance)
Precursor-filter mask interaction with ranks
Rank reassignment after filter

Fixing H2 likely closes the rank-driven share of H3 (301 of 608 LOGPROB_DIFFs disappear). Bench gate for the fix: PXD001819 auto @1% FDR ≥ +200 PSMs (14,755 → 15,000).

What this PR contains

crates/msgf-rust/src/bin/msgf-trace.rs — extended with --trace-json for per-PSM per-ion structured output (additive; existing stderr trace unchanged)
benchmark/ci/diff_score_psm_traces.py — Python diff harness (stdlib-only)
docs/parity-analysis/notes/2026-05-26-score-psm-trace-findings.md — analysis (~250 lines with methodology, per-PSM tables, proposed fix design)
docs/parity-analysis/notes/score-psm-trace-artifacts/ — reproducibility artifacts:
- 5 Rust JSON traces
- 5 gzipped Java per-scan TRACE logs (90,926 raw TRACE lines total)
- aggregate-analysis.txt — per-PSM totals + divergence counts
- analyze.py — analysis script runnable from artifacts dir

What this PR does NOT contain

The fix itself (next PR)
Production code changes (msgf-trace is a separate binary; msgf-rust is untouched)
Java repo changes (instrumentation lives on bench VM /srv/data/msgf-bench/java-legacy-trace/)
Datasets other than PXD001819

Verification

cargo clippy --workspace --all-targets clean
Workspace tests green under existing CI skip list
No production code changed (bit-identical regression gate trivially passes)
Analysis reproducible: cd <artifacts-dir>; python3 analyze.py matches the divergence counts in the doc
CodeRabbit review pass
CI matrix green

Reproducibility (full pipeline, ~30 min)

Build msgf-trace: cargo build --release --bin msgf-trace
Clone java-legacy on bench VM; patch NewScoredSpectrum.getNodeScore; mvn package
Run Java with -Dmsgf.trace.scans=41522,34685,23272,23082,16629
Run msgf-trace --scan N --java-top1 <pep> --trace-json out.json for each scan
python3 analyze.py

Details + exact commands in the analysis doc.

Research-only PR to identify the dominant root cause of the per-PSM scoring divergence between Rust and Java (Rust 14 vs Java 38 RawScore on the same spectrum+peptide, per 2026-05-20 finding). Three hypotheses to triage: - H1: per-partition ion-type list differs - H2: peak rank assignment differs - H3: per-rank log-probability tables differ Deliverable: written analysis on 5 PSMs from PXD001819 + proposed fix design. No production code changes; only `msgf-trace` diagnostic extensions and a Python diff harness. Java instrumentation lives on the bench VM in a separate java-legacy worktree; not committed to this repo. Path to the +5%/dataset PSM goal goes through this investigation.

Five-task plan to execute the spec at f943aa7. Tasks: (1) extend msgf-trace with --trace-json; (2) Python diff harness; (3) bench-VM Java instrumentation on java-legacy clone (out-of-repo); (4) analysis doc + 5-PSM trace artifacts + .gitignore allowlist; (5) push + open research PR. No production code changes; only diagnostic-binary extensions and research artifacts.

Adds a structured output mode to the diagnostic trace binary so its per-split breakdown can be diffed against Java's instrumentation output in a downstream Python harness. JSON is written by hand (no new serde dep) since the per-PSM volume is small (~5-10 KB). The existing human-readable stderr output is unaffected. This is the Rust-side instrumentation for I5 (score_psm trace investigation). No production code change: msgf-trace is a separate binary from msgf-rust.

Aligns msgf-trace --trace-json output against java-legacy instrumented TRACE lines by (ion_kind, theo_mz) with 1e-3 Da tolerance. Emits per-PSM side-by-side per-ion rows with RANK_DIFF / LOGPROB_DIFF / CONTRIB_DIFF / RUST_ONLY / JAVA_ONLY flags + per-PSM total contribution deltas + aggregate divergence counts. stdlib-only Python 3 (no new deps). Tested on synthetic input covering all five divergence categories. Part of I5 score_psm trace investigation. Together with Task 1's --trace-json output, this provides the diff infrastructure; Task 3 (out-of-repo) instruments Java to produce the TRACE lines.

@1

Identifies a multi-causal Rust vs Java per-PSM scoring divergence on PXD001819 label-flip PSMs. Methodology + artifacts + proposed fix design (no code in this PR; fix lands separately). Top-line: 754 matched ion comparisons across 10 traced PSMs show - LOGPROB_DIFF 608 (81%): per-rank log-probability lookup values diverge between Rust and Java - RANK_DIFF 301 (40%): peak rank assignment differs - RUST_ONLY 73: ion types Rust enumerates but Java does not Disentanglement: 301 of 608 LOGPROB_DIFF cases coincide with RANK_DIFF (downstream of peak-rank divergence); the remaining 307 are pure table-value differences. Roughly 40/40/10 split across H2/H3/H1. Smoking gun: Rust scores Java's favored peptide within +-13 of Java's per-ion sum, but OVER-scores its own picks by +5 to +20. The asymmetry flips top-1 ranking. Proposed fix direction: target H2 (rank assignment) first; live in crates/scoring/src/scoring/scored_spectrum.rs:setRanksOfPeaks + nearest_peak_rank. Fixing H2 likely closes the rank-driven share of H3. Bench gate: PXD001819 auto @1% FDR >= +200 PSMs. Artifacts: - 5 Rust JSON traces (one PSM array per scan, ~22KB each) - 5 Java gzipped per-scan TRACE logs (~200KB each, 90,926 raw TRACE lines total) - aggregate-analysis.txt: per-PSM totals + divergence counts - analyze.py: the analysis script (runnable from artifacts dir) Java instrumentation lives on bench VM /srv/data/msgf-bench/java-legacy-trace/ (NewScoredSpectrum.getNodeScore patched with -Dmsgf.trace.scans=<csv> gated System.err.println). Out of scope: fix implementation; next PR after this.

qodo-code-review · 2026-05-26T19:21:52Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

coderabbitai · 2026-05-26T19:21:55Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ec5f3a2-078a-42ce-bc44-a15675272fd5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/i5-score-psm-trace

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

ypriverol added 5 commits May 26, 2026 19:15

ypriverol mentioned this pull request May 27, 2026

perf+chore: V1 speed milestone — Rust beats Java on 5 of 6 cells (PSM-zero-regression) #36

Merged

ypriverol merged commit ff68e5e into dev May 27, 2026
5 checks passed

ypriverol deleted the feat/i5-score-psm-trace branch May 27, 2026 15:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

diag(i5): score_psm trace findings + diff harness (PXD001819, no production code change)#37

diag(i5): score_psm trace findings + diff harness (PXD001819, no production code change)#37
ypriverol merged 5 commits into
devfrom
feat/i5-score-psm-trace

ypriverol commented May 26, 2026

Uh oh!

qodo-code-review Bot commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026

Review skipped

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ypriverol commented May 26, 2026

Summary

Top-line finding

Smoking gun

Proposed fix (next PR)

What this PR contains

What this PR does NOT contain

Verification

Reproducibility (full pipeline, ~30 min)

Uh oh!

qodo-code-review Bot commented May 26, 2026

Qodo reviews are paused for this user.

Uh oh!

coderabbitai Bot commented May 26, 2026

Review skipped

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant