diag(i5): score_psm trace findings + diff harness (PXD001819, no production code change) #37
Research-only PR to identify the dominant root cause of the per-PSM scoring divergence between Rust and Java (Rust 14 vs Java 38 RawScore on the same spectrum+peptide, per 2026-05-20 finding). Three hypotheses to triage:
- H1: per-partition ion-type list differs
- H2: peak rank assignment differs
- H3: per-rank log-probability tables differ

Deliverable: written analysis on 5 PSMs from PXD001819 + proposed fix design. No production code changes; only `msgf-trace` diagnostic extensions and a Python diff harness. Java instrumentation lives on the bench VM in a separate java-legacy worktree; not committed to this repo. Path to the +5%/dataset PSM goal goes through this investigation.
Five-task plan to execute the spec at f943aa7. Tasks: (1) extend msgf-trace with --trace-json; (2) Python diff harness; (3) bench-VM Java instrumentation on java-legacy clone (out-of-repo); (4) analysis doc + 5-PSM trace artifacts + .gitignore allowlist; (5) push + open research PR. No production code changes; only diagnostic-binary extensions and research artifacts.
Adds a structured output mode to the diagnostic trace binary so its per-split breakdown can be diffed against Java's instrumentation output in a downstream Python harness. JSON is written by hand (no new serde dep) since the per-PSM volume is small (~5-10 KB). The existing human-readable stderr output is unaffected. This is the Rust-side instrumentation for I5 (score_psm trace investigation). No production code change: msgf-trace is a separate binary from msgf-rust.
Aligns msgf-trace --trace-json output against java-legacy instrumented TRACE lines by (ion_kind, theo_mz) with 1e-3 Da tolerance. Emits per-PSM side-by-side per-ion rows with RANK_DIFF / LOGPROB_DIFF / CONTRIB_DIFF / RUST_ONLY / JAVA_ONLY flags + per-PSM total contribution deltas + aggregate divergence counts. stdlib-only Python 3 (no new deps). Tested on synthetic input covering all five divergence categories. Part of I5 score_psm trace investigation. Together with Task 1's --trace-json output, this provides the diff infrastructure; Task 3 (out-of-repo) instruments Java to produce the TRACE lines.
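A minimal sketch of the alignment step described above, for readers who want the idea without opening the harness. It is not the code in `diff_score_psm_traces.py`: the `IonRow` field names, the `align` function, and the `VALUE_EPS` threshold are assumptions made for illustration; only the `(ion_kind, theo_mz)` key, the 1e-3 Da tolerance, and the flag names come from the description.

```python
from dataclasses import dataclass

MZ_TOL_DA = 1e-3   # theoretical m/z matching tolerance, as stated above
VALUE_EPS = 1e-9   # hypothetical threshold for "values differ"


@dataclass
class IonRow:
    # Hypothetical schema; the real trace rows may carry other fields.
    ion_kind: str
    theo_mz: float
    rank: int
    logprob: float
    contrib: float


def align(rust_rows, java_rows, tol=MZ_TOL_DA):
    """Pair rows by (ion_kind, theo_mz) within tol and flag divergences.

    Returns a list of (rust_row, java_row, flags) tuples; the missing side
    of an unmatched row is None.
    """
    unused = list(java_rows)
    out = []
    for r in rust_rows:
        # First unused Java row with the same ion kind and a close enough m/z.
        idx = next(
            (i for i, j in enumerate(unused)
             if j.ion_kind == r.ion_kind and abs(j.theo_mz - r.theo_mz) <= tol),
            None,
        )
        if idx is None:
            out.append((r, None, ["RUST_ONLY"]))
            continue
        j = unused.pop(idx)
        flags = []
        if r.rank != j.rank:
            flags.append("RANK_DIFF")
        if abs(r.logprob - j.logprob) > VALUE_EPS:
            flags.append("LOGPROB_DIFF")
        if abs(r.contrib - j.contrib) > VALUE_EPS:
            flags.append("CONTRIB_DIFF")
        out.append((r, j, flags))
    # Whatever Java emitted that Rust never matched.
    out.extend((None, j, ["JAVA_ONLY"]) for j in unused)
    return out
```

A matched row can carry several flags at once, which is why the aggregate counts for RANK_DIFF and LOGPROB_DIFF overlap.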
Identifies a multi-causal Rust vs Java per-PSM scoring divergence on PXD001819 label-flip PSMs. Methodology + artifacts + proposed fix design (no code in this PR; fix lands separately).

Top-line: 754 matched ion comparisons across 10 traced PSMs show
- LOGPROB_DIFF 608 (81%): per-rank log-probability lookup values diverge between Rust and Java
- RANK_DIFF 301 (40%): peak rank assignment differs
- RUST_ONLY 73: ion types Rust enumerates but Java does not

Disentanglement: 301 of 608 LOGPROB_DIFF cases coincide with RANK_DIFF (downstream of peak-rank divergence); the remaining 307 are pure table-value differences. Roughly 40/40/10 split across H2/H3/H1.

Smoking gun: Rust scores Java's favored peptide within ±13 of Java's per-ion sum, but OVER-scores its own picks by +5 to +20. The asymmetry flips top-1 ranking.

Proposed fix direction: target H2 (rank assignment) first; it lives in crates/scoring/src/scoring/scored_spectrum.rs:setRanksOfPeaks + nearest_peak_rank. Fixing H2 likely closes the rank-driven share of H3. Bench gate: PXD001819 auto @1% FDR >= +200 PSMs.

Artifacts:
- 5 Rust JSON traces (one PSM array per scan, ~22KB each)
- 5 Java gzipped per-scan TRACE logs (~200KB each, 90,926 raw TRACE lines total)
- aggregate-analysis.txt: per-PSM totals + divergence counts
- analyze.py: the analysis script (runnable from artifacts dir)

Java instrumentation lives on bench VM /srv/data/msgf-bench/java-legacy-trace/ (NewScoredSpectrum.getNodeScore patched with -Dmsgf.trace.scans=<csv> gated System.err.println).

Out of scope: fix implementation; next PR after this.
Summary
Research-only PR. Localizes the Rust vs Java per-PSM scoring divergence on PXD001819 via instrumented traces on both sides, with no production code change. Sets up the next PR's fix.
Top-line finding
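Across 754 matched ion comparisons on 10 traced PSMs (5 label-flip scans, with both the Rust top-1 and the Java top-1 peptide scored on each):
- LOGPROB_DIFF: 608 (81%), per-rank log-probability lookup values diverge between Rust and Java
- RANK_DIFF: 301 (40%), peak rank assignment differs
- RUST_ONLY: 73, ion types Rust enumerates but Java does not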
Disentanglement: 301 of the 608 LOGPROB_DIFF cases coincide with RANK_DIFF (downstream effect of rank-assignment divergence). The remaining 307 are pure table-value differences. Rough split across the 3 hypotheses: 40% H2 (rank), 40% H3 (log-prob table), 10% H1 (ion enumeration). Not a single dominant cause.
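The rough 40/40/10 split can be reproduced from the reported counts. The sketch below assumes all three shares are taken against the 754 matched comparisons, and uses the RUST_ONLY count of 73 reported for the same trace set; that choice of denominator is an assumption, not something the analysis states.

```python
matched = 754        # matched ion comparisons across the 10 traced PSMs
logprob_diff = 608   # rows flagged LOGPROB_DIFF
rank_diff = 301      # rows flagged RANK_DIFF
rust_only = 73       # ions Rust enumerates but Java does not

# LOGPROB_DIFF rows that also carry RANK_DIFF are treated as downstream of H2.
coincident = 301
pure_table = logprob_diff - coincident  # table-value differences only

shares = {
    "H2 (rank)": rank_diff / matched,
    "H3 (log-prob table)": pure_table / matched,
    # Assumed denominator: RUST_ONLY ions are compared against the same 754.
    "H1 (ion enumeration)": rust_only / matched,
}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# H2 (rank): 39.9%
# H3 (log-prob table): 40.7%
# H1 (ion enumeration): 9.7%
```

The three shares sum to about 90%, consistent with the conclusion that no single hypothesis explains the divergence.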
Smoking gun
Rust's scoring is generous to weaker peptides while only slightly stingy on Java's picks — net asymmetry of ~14 RawScore points per scan, enough to flip top-1 across thousands of candidates.
Proposed fix (next PR)
Target H2 (rank assignment) first. It lives in `crates/scoring/src/scoring/scored_spectrum.rs` (`setRanksOfPeaks` + `nearest_peak_rank`). The analysis identifies the common culprits.

Fixing H2 likely closes the rank-driven share of H3 (301 of 608 LOGPROB_DIFFs disappear). Bench gate for the fix: PXD001819 auto @1% FDR ≥ +200 PSMs (14,755 → 15,000).
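For illustration only, here is one generic way two intensity-based peak rankers can disagree on the same spectrum. The `rank_peaks` function and its tie-breaking switch are hypothetical; this is not `setRanksOfPeaks`, and tie-breaking is not claimed here to be a culprit confirmed by the analysis.

```python
def rank_peaks(intensities, first_wins=True):
    """Return 1-based ranks by descending intensity (1 = most intense).

    first_wins controls how equal intensities are ordered: the earlier peak
    gets the better rank when True, the later peak when False.
    """
    order = sorted(
        range(len(intensities)),
        key=lambda i: (-intensities[i], i if first_wins else -i),
    )
    ranks = [0] * len(intensities)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


peaks = [50.0, 80.0, 50.0, 10.0]
print(rank_peaks(peaks, first_wins=True))   # [2, 1, 3, 4]
print(rank_peaks(peaks, first_wins=False))  # [3, 1, 2, 4]
```

A one-step rank shift like this selects a different row of the per-rank log-probability table, which is how a rank divergence would also surface as a LOGPROB_DIFF downstream.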
What this PR contains
- `crates/msgf-rust/src/bin/msgf-trace.rs`: extended with `--trace-json` for per-PSM per-ion structured output (additive; existing stderr trace unchanged)
- `benchmark/ci/diff_score_psm_traces.py`: Python diff harness (stdlib-only)
- `docs/parity-analysis/notes/2026-05-26-score-psm-trace-findings.md`: analysis (~250 lines with methodology, per-PSM tables, proposed fix design)
- `docs/parity-analysis/notes/score-psm-trace-artifacts/`: reproducibility artifacts:
  - 5 Rust JSON traces (one PSM array per scan, ~22KB each)
  - 5 Java gzipped per-scan TRACE logs (~200KB each, 90,926 raw TRACE lines total)
  - `aggregate-analysis.txt`: per-PSM totals + divergence counts
  - `analyze.py`: analysis script runnable from artifacts dir

What this PR does NOT contain
- Production code changes (`msgf-trace` is a separate binary; `msgf-rust` is untouched)
- Java instrumentation (lives on the bench VM at `/srv/data/msgf-bench/java-legacy-trace/`)

Verification
- `cargo clippy --workspace --all-targets` clean
- `cd <artifacts-dir>; python3 analyze.py` matches the divergence counts in the doc

Reproducibility (full pipeline, ~30 min)
- `cargo build --release --bin msgf-trace`
- Patch `NewScoredSpectrum.getNodeScore`; `mvn package`
- Run Java with `-Dmsgf.trace.scans=41522,34685,23272,23082,16629`
- Run `msgf-trace` with `--scan N --java-top1 <pep> --trace-json out.json` for each scan
- `python3 analyze.py`

Details + exact commands in the analysis doc.