- BUG_REVIEW.md was a transient artifact from the post-merge bug-hunt review; the actual fixes shipped in PR #32 and are documented in the PR description / commit history.
- CLI_MIGRATION.md is an audience-specific guide (Java MS-GF+ users porting CLI invocations); it belongs under docs/, not at root.
- DOCS.md stays at root as the primary single-file reference (per the iter39 docs-rewrite design).
- Updated inbound references in README.md and DOCS.md.
Adds the design spec for PR-Q1 (post-cutover code quality sweep) and finalizes the inbound-reference updates from the prior commit (docs/CLI_MIGRATION.md links) that weren't staged at that point. PR-Q1 is the first of three sequential sub-projects (quality -> speed -> ID-rate +5% per dataset). Decomposed during the 2026-05-26 brainstorm because the three concerns differ in risk, scope, and time profile; the ID-rate target is a multi-PR research project, not a single ship gate. Scope: 7 groups (6 in-PR + 1 out-of-repo memory update). Dangling .java:LINE refs (42), stale "port of MS-GF+" framing, identifier renames (MSGFRUST_RSS_PROBE etc.), 26 clippy warnings, lift CI lint to required, remove shipped design specs.
The Java source tree was removed in commit b4565b8 during the Rust-cutover; the inline citations to specific Java line numbers now point at code that does not exist in this repo. Replace each citation with intent-only "Java parity" comments. Preserves semantic meaning; removes the broken hyperlinks. Parity-test files (tests/*_java_parity.rs, tests/gf_bsa_parity.rs, tests/*_match_java.rs) untouched — their identity is Java parity and the citations are load-bearing documentation. 8 non-test files touched, 33 refs replaced, 0 functional changes.
The codebase is post-cutover; new contributors should read crate-lib top-of-file doc comments as descriptions of what each crate does, not as port-bookkeeping. CLI --help and enum doc comments that compared behavior to Java's command-line options now describe behavior directly.

KEEPS user-facing provenance:
- README.md and DOCS.md project-lineage sections
- Legacy numeric flag values (Java MS-GF+ -X) in --help (user migration)
- (Java -precursorCal) in precursor_cal doc (exact flag name we mirror)
- docs/parity-analysis/** content
- Parity test files

Touched: 5 crate-lib headers + msgf-rust CLI framing.
The "MSGFRUST_" prefix dates from an early iter-era naming and does not match the binary's identity (msgf-rust). Switch the primary name to MSGF_RSS_PROBE and accept the legacy name for this release with a one-line deprecation warning on stderr. The legacy name will be removed in the next quality cleanup. Side-effect-only env var; no functional change to search/scoring.
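A minimal sketch of the primary-then-legacy lookup described above. The enum, the function names, the warning text, and the rule that any set value enables the probe are assumptions for illustration; only the two variable names come from the commit.

```rust
/// Which variable enabled the probe, if any (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum ProbeSource {
    Primary,
    Legacy, // caller should emit the deprecation warning
    Unset,
}

/// Decide the probe source from the two raw values.
/// Taking the values as arguments, instead of reading the environment
/// here, keeps the precedence rule easy to test.
/// Assumption: the primary name wins when both are set.
fn resolve_probe(primary: Option<&str>, legacy: Option<&str>) -> ProbeSource {
    match (primary, legacy) {
        (Some(_), _) => ProbeSource::Primary,
        (None, Some(_)) => ProbeSource::Legacy,
        (None, None) => ProbeSource::Unset,
    }
}

/// Read both names from the process environment and warn on legacy use.
fn probe_enabled() -> bool {
    let primary = std::env::var("MSGF_RSS_PROBE").ok();
    let legacy = std::env::var("MSGFRUST_RSS_PROBE").ok();
    match resolve_probe(primary.as_deref(), legacy.as_deref()) {
        ProbeSource::Primary => true,
        ProbeSource::Legacy => {
            // One-line deprecation notice on stderr (wording is illustrative).
            eprintln!("warning: MSGFRUST_RSS_PROBE is deprecated; use MSGF_RSS_PROBE");
            true
        }
        ProbeSource::Unset => false,
    }
}
```

Keeping the decision in a pure function means the legacy path can be removed later by deleting one match arm and one `env::var` call.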
Brings the workspace to clippy-clean on stable 1.87.0 so the CI lint job can be lifted from advisory to required in the next commit.

Changes by class:
- map_or simplifications: mechanical rewrite via clippy --fix
- complex-type aliases: SegmentPartitionCache, SegmentPartitionSlice, DeconvResult, and RankKeptCtx struct in crates/scoring/src/scoring/scored_spectrum.rs
- too_many_arguments: RankKeptCtx context struct in scored_spectrum.rs; #[allow] with reason for directional_node_score_inner, write_tsv, write_tsv_to, write_spectrum_rows, and compute_cleavage_credit
- doc-list indentation: add blank line after list / fix continuation indent at 15 sites in msgf-rust.rs and scored_spectrum.rs
- unused_mut, ? rewrite, manual split_once, loop-counter: via clippy --fix
- needless_range_loop: suppressed with reason (seg indexes cache AND serves as fallback partition_for arg)

No functional behavior change; PIN/TSV bit-identical regression gate in tree (precursor_cal_bit_identical) is the verification.
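Two of the change classes above can be illustrated with a small before/after sketch. Everything here is a toy: the function names, the `ToyRankCtx` fields, and the scoring rule are assumptions and do not reflect the real `RankKeptCtx` layout.

```rust
/// Before: the `map_or(false, ..)` shape that clippy suggests simplifying.
#[allow(clippy::unnecessary_map_or)]
fn multiply_charged_before(charge: Option<u8>) -> bool {
    charge.map_or(false, |z| z >= 2)
}

/// After: same truth table, stated directly.
fn multiply_charged_after(charge: Option<u8>) -> bool {
    charge.is_some_and(|z| z >= 2)
}

/// Before: eight loosely related values threaded through every call.
#[allow(clippy::too_many_arguments)]
fn toy_score_before(
    rank: u32, max_rank: u32, charge: u8, min_charge: u8,
    mz: f64, theo_mz: f64, tol: f64, weight: f32,
) -> f32 {
    if charge < min_charge || rank > max_rank || (mz - theo_mz).abs() > tol {
        return 0.0;
    }
    weight / rank as f32 // ranks are assumed to start at 1
}

/// After: values that stay fixed for a whole spectrum move into a context.
struct ToyRankCtx {
    max_rank: u32,
    charge: u8,
    min_charge: u8,
    tol: f64,
    weight: f32,
}

fn toy_score_after(ctx: &ToyRankCtx, rank: u32, mz: f64, theo_mz: f64) -> f32 {
    if ctx.charge < ctx.min_charge || rank > ctx.max_rank || (mz - theo_mz).abs() > ctx.tol {
        return 0.0;
    }
    ctx.weight / rank as f32
}
```

The context struct shortens call sites without changing the arithmetic, which is consistent with the "no functional behavior change" claim for this class of rewrite.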
After PR-Q1 Task 4 left the workspace clippy-clean on --lib targets, remove continue-on-error from the lint job's clippy step and extend the lint command to --all-targets (covers tests + examples + bin in addition to lib).

Also addresses 5 residual warnings in test/example targets that the --lib-only fix in Task 4 didn't reach:
- crates/scoring/examples/dump_main_ion.rs: struct field shorthand
- crates/scoring/examples/dump_prefix_cache.rs: needless_range_loop
- crates/scoring/tests/add_prob_dist_chunked_parity.rs: unnecessary parens

Rustfmt remains advisory (~11k lines of fmt churn pending; separate cleanup). Lint job now blocks PRs on clippy regressions.
Deletes the iter39 docs-rewrite spec and plan (shipped 2026-05-23 via PR #30; the rewrite is in dev and being relied on, so the design docs no longer need to be discoverable in the repo). Their lineage is in git history. Tracks the in-flight PR-Q1 implementation plan alongside its design spec (committed in 55cff3f). Future protocol: when a docs/specs design file references a feature that has fully shipped and closed any deferred gate, remove it in the next quality cleanup.
Three non-blocking observations from the final code review:

1. DOCS.md §97 documented only the legacy MSGFRUST_RSS_PROBE name. Now mentions MSGF_RSS_PROBE as the canonical with the legacy noted.
2. crates/model/src/amino_acid.rs:13 inline comment referenced the legacy name; updated to MSGF_RSS_PROBE.
3. log_rss deprecation warning fired on every call when only the legacy env var was set. Guard with std::sync::Once so it prints exactly once per process invocation.

All non-functional; verification: deprecation warning count is now 1 under MSGFRUST_RSS_PROBE=1 + multiple log_rss checkpoints.
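The once-per-process guard from the third observation can be sketched as follows. The `log_rss` signature, the `legacy_only` flag, and the counter are assumptions added so the behavior is observable; the real function presumably reads the environment itself.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static WARN_ONCE: Once = Once::new();
// Counter exists only so this sketch can verify the guard; not part of the source.
static WARNINGS_EMITTED: AtomicUsize = AtomicUsize::new(0);

/// Emit the deprecation warning at most once per process.
fn warn_legacy_once() {
    WARN_ONCE.call_once(|| {
        eprintln!("warning: MSGFRUST_RSS_PROBE is deprecated; use MSGF_RSS_PROBE");
        WARNINGS_EMITTED.fetch_add(1, Ordering::Relaxed);
    });
}

/// Hypothetical checkpoint logger: `legacy_only` stands in for
/// "only the legacy variable is set".
fn log_rss(label: &str, legacy_only: bool) {
    if legacy_only {
        warn_legacy_once();
    }
    eprintln!("[rss] checkpoint {label}");
}
```

`Once::call_once` is also safe if checkpoints are hit from several threads, so the guard holds even in a parallel search loop.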
After PR #35 (PR-Q1) closed unmerged for not delivering measurable wins, pivot strategy: stack 3 loosely-coupled sub-features on top of the cleanup commits and ship ONE PR with bench-gated value.

Sub-features:
- S1: profile-guided Astral wall reduction (gate: -5% wall)
- S2: LFQ calibrator threshold fallback 1e-6 -> 1e-5 (gate: +50 PSMs)
- S3: additive PrecursorErrorPpmSquared PIN column (gate: +50 PSMs on any one dataset)

Each sub-feature ships only if its bench gate passes; failures get dropped before merge.
Profile (perf record on Astral cal=off, 285K samples) identified 36.82% of CPU in HashMap<Partition, ...> lookups using SipHash13. The hot scoring path (compute_inner, directional_node_score_inner, rank_scorer::error_score) repeatedly looks up Partition keys for rank_dist_table, frag_off_table, ion_existence_table, etc.

Switch the 7 Param hot tables to FxHashMap (rustc-hash). SipHash13 (SipHash-1-3: keyed state init, one compression round per 8-byte block, three finalization rounds) is built to resist hash flooding, which is unnecessary for trusted keys in a single-process search; FxHasher is a multiply-and-xor style mix per word. Same .get/.insert API; only iteration order differs (and no hot path iterates these tables).

Expected: ~25-30% reduction in match_spectra wall on Astral cal=off. Bench gate (S1) requires >= 5% Astral wall reduction to ship.
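To show why the swap is API-compatible, here is a std-only sketch of a map with a non-cryptographic hasher plugged in through `BuildHasherDefault`. `ToyFxHasher` is written in the style of an Fx hash and is not the rustc-hash implementation; the `Partition` fields and table contents are hypothetical.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Toy Fx-style hasher: rotate, xor, multiply per word. Illustrative only.
#[derive(Default, Clone, Copy)]
struct ToyFxHasher {
    hash: u64,
}

const SEED: u64 = 0x517c_c1b7_2722_0a95;

impl ToyFxHasher {
    #[inline]
    fn add_word(&mut self, word: u64) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for ToyFxHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Byte-at-a-time fallback; the typed methods below cover integer keys.
        for &b in bytes {
            self.add_word(b as u64);
        }
    }
    fn write_u8(&mut self, i: u8) { self.add_word(i as u64); }
    fn write_u32(&mut self, i: u32) { self.add_word(i as u64); }
    fn write_u64(&mut self, i: u64) { self.add_word(i); }
    fn finish(&self) -> u64 { self.hash }
}

/// Drop-in alias: same HashMap type, different hasher parameter.
type ToyFxHashMap<K, V> = HashMap<K, V, BuildHasherDefault<ToyFxHasher>>;

/// Hypothetical key; the real Partition fields are not shown in the source.
#[derive(Hash, PartialEq, Eq, Clone, Copy, Debug)]
struct Partition {
    charge: u8,
    segment: u8,
    mass_bin: u32,
}

fn build_table() -> ToyFxHashMap<Partition, f32> {
    let mut table = ToyFxHashMap::default();
    table.insert(Partition { charge: 2, segment: 0, mass_bin: 10 }, -1.5);
    table.insert(Partition { charge: 3, segment: 1, mass_bin: 12 }, -0.25);
    table
}
```

Call sites keep using `.get` and `.insert` unchanged; only construction moves from `HashMap::new()` to `default()`, because `new()` is defined only for the standard `RandomState` hasher.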
If the strict SpecEValue threshold (1e-6) does not qualify MIN_CONFIDENT_PSMS (200) in the cal pre-pass, retry once at 1e-5 before giving up. Preserves Java parity on datasets where 1e-6 already succeeds (Astral, TMT); recovers LFQ-shaped distributions where Rust's SpecE-tail drift leaves the cal pre-pass a few PSMs short (LFQ ships at 193/200 in PR-A). Median-of-residuals + MAD-based robust sigma are robust to one decade of noisier outliers; the fallback is a one-shot retry, not a baseline threshold change. Bench gate (S2): LFQ auto @1% FDR >= 14,805 (baseline 14,755 + 50).
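The one-shot retry can be sketched as a loop over the two thresholds. The constants mirror the values named above; the function name, the `<=` comparison, and the returned tuple are assumptions made for illustration.

```rust
const MIN_CONFIDENT_PSMS: usize = 200;
const STRICT_SPEC_E: f64 = 1e-6;
const FALLBACK_SPEC_E: f64 = 1e-5;

/// Return the threshold that qualified and the indices of the PSMs kept,
/// or None when even the fallback threshold leaves too few PSMs.
fn select_confident(spec_e_values: &[f64]) -> Option<(f64, Vec<usize>)> {
    // Strict first; the fallback is tried only when strict falls short.
    for threshold in [STRICT_SPEC_E, FALLBACK_SPEC_E] {
        let kept: Vec<usize> = spec_e_values
            .iter()
            .enumerate()
            .filter(|(_, e)| **e <= threshold)
            .map(|(i, _)| i)
            .collect();
        if kept.len() >= MIN_CONFIDENT_PSMS {
            return Some((threshold, kept));
        }
    }
    None
}
```

Because the strict threshold is always tried first, datasets that already qualify at 1e-6 select exactly the same PSMs as before, which is what preserves parity on them.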
This reverts commit 67002e4.
Research-only PR to identify the dominant root cause of the per-PSM scoring divergence between Rust and Java (Rust 14 vs Java 38 RawScore on the same spectrum+peptide, per 2026-05-20 finding).

Three hypotheses to triage:
- H1: per-partition ion-type list differs
- H2: peak rank assignment differs
- H3: per-rank log-probability tables differ

Deliverable: written analysis on 5 PSMs from PXD001819 + proposed fix design. No production code changes; only `msgf-trace` diagnostic extensions and a Python diff harness. Java instrumentation lives on the bench VM in a separate java-legacy worktree; not committed to this repo.

Path to the +5%/dataset PSM goal goes through this investigation.
Five-task plan to execute the spec at f943aa7. Tasks: (1) extend msgf-trace with --trace-json; (2) Python diff harness; (3) bench-VM Java instrumentation on java-legacy clone (out-of-repo); (4) analysis doc + 5-PSM trace artifacts + .gitignore allowlist; (5) push + open research PR. No production code changes; only diagnostic-binary extensions and research artifacts.
Adds a structured output mode to the diagnostic trace binary so its per-split breakdown can be diffed against Java's instrumentation output in a downstream Python harness. JSON is written by hand (no new serde dep) since the per-PSM volume is small (~5-10 KB). The existing human-readable stderr output is unaffected. This is the Rust-side instrumentation for I5 (score_psm trace investigation). No production code change: msgf-trace is a separate binary from msgf-rust.
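Hand-written JSON for a small, fixed schema mostly comes down to correct string escaping and comma placement. The sketch below assumes a hypothetical row layout (`ion_kind`, `theo_mz`, `rank`, `contribution`); the actual `--trace-json` schema is not given in the commit message.

```rust
use std::fmt::Write;

/// Hypothetical per-ion row; field names are illustrative.
struct IonRow<'a> {
    ion_kind: &'a str,
    theo_mz: f64,
    rank: u32,
    contribution: f64,
}

/// Append `s` to `out` with the escapes JSON requires inside a string.
fn escape_json(s: &str, out: &mut String) {
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => {
                // Remaining control characters use the \u00XX form.
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
}

/// Serialize one PSM. Callers must pass finite floats: NaN and infinity
/// have no JSON representation and are not handled here.
fn psm_to_json(peptide: &str, rows: &[IonRow<'_>]) -> String {
    let mut out = String::new();
    out.push_str("{\"peptide\":\"");
    escape_json(peptide, &mut out);
    out.push_str("\",\"ions\":[");
    for (i, r) in rows.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push_str("{\"ion_kind\":\"");
        escape_json(r.ion_kind, &mut out);
        let _ = write!(
            out,
            "\",\"theo_mz\":{:.4},\"rank\":{},\"contribution\":{:.3}}}",
            r.theo_mz, r.rank, r.contribution
        );
    }
    out.push_str("]}");
    out
}
```

Fixed decimal precision keeps the output stable across runs, which helps when a downstream tool diffs traces textually.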
Aligns msgf-trace --trace-json output against java-legacy instrumented TRACE lines by (ion_kind, theo_mz) with 1e-3 Da tolerance. Emits per-PSM side-by-side per-ion rows with RANK_DIFF / LOGPROB_DIFF / CONTRIB_DIFF / RUST_ONLY / JAVA_ONLY flags + per-PSM total contribution deltas + aggregate divergence counts. stdlib-only Python 3 (no new deps). Tested on synthetic input covering all five divergence categories. Part of I5 score_psm trace investigation. Together with Task 1's --trace-json output, this provides the diff infrastructure; Task 3 (out-of-repo) instruments Java to produce the TRACE lines.
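The alignment rule, same ion kind and theoretical m/z within 1e-3 Da, can be sketched as a greedy nearest-match pass. The harness in this commit is Python; this Rust version only illustrates the matching logic, and the types, the greedy tie-breaking, and the reduced flag set (rank difference only) are assumptions.

```rust
/// Hypothetical trace row from either side.
#[derive(Debug, Clone, PartialEq)]
struct Ion {
    kind: String,
    theo_mz: f64,
    rank: u32,
}

#[derive(Debug, PartialEq)]
enum Aligned {
    /// Indices into the Rust and Java lists, plus a RANK_DIFF-style flag.
    Both { rust: usize, java: usize, rank_diff: bool },
    RustOnly(usize),
    JavaOnly(usize),
}

const MZ_TOL: f64 = 1e-3; // Da

fn align(rust: &[Ion], java: &[Ion]) -> Vec<Aligned> {
    let mut used = vec![false; java.len()];
    let mut out = Vec::new();
    for (ri, r) in rust.iter().enumerate() {
        // Closest unused Java ion of the same kind within tolerance.
        let mut best: Option<(usize, f64)> = None;
        for (ji, j) in java.iter().enumerate() {
            if used[ji] || j.kind != r.kind {
                continue;
            }
            let d = (j.theo_mz - r.theo_mz).abs();
            let closer = match best {
                Some((_, bd)) => d < bd,
                None => true,
            };
            if d <= MZ_TOL && closer {
                best = Some((ji, d));
            }
        }
        match best {
            Some((ji, _)) => {
                used[ji] = true;
                out.push(Aligned::Both { rust: ri, java: ji, rank_diff: r.rank != java[ji].rank });
            }
            None => out.push(Aligned::RustOnly(ri)),
        }
    }
    // Whatever Java emitted that Rust never matched.
    for (ji, u) in used.iter().enumerate() {
        if !*u {
            out.push(Aligned::JavaOnly(ji));
        }
    }
    out
}
```

The quadratic scan is acceptable at the per-PSM sizes mentioned in this investigation; a larger input would call for sorting by m/z first.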
Identifies a multi-causal Rust vs Java per-PSM scoring divergence on PXD001819 label-flip PSMs. Methodology + artifacts + proposed fix design (no code in this PR; fix lands separately).

Top-line: 754 matched ion comparisons across 10 traced PSMs show
- LOGPROB_DIFF 608 (81%): per-rank log-probability lookup values diverge between Rust and Java
- RANK_DIFF 301 (40%): peak rank assignment differs
- RUST_ONLY 73: ion types Rust enumerates but Java does not

Disentanglement: 301 of 608 LOGPROB_DIFF cases coincide with RANK_DIFF (downstream of peak-rank divergence); the remaining 307 are pure table-value differences. Roughly 40/40/10 split across H2/H3/H1.

Smoking gun: Rust scores Java's favored peptide within ±13 of Java's per-ion sum, but OVER-scores its own picks by +5 to +20. The asymmetry flips top-1 ranking.

Proposed fix direction: target H2 (rank assignment) first; live in crates/scoring/src/scoring/scored_spectrum.rs:setRanksOfPeaks + nearest_peak_rank. Fixing H2 likely closes the rank-driven share of H3. Bench gate: PXD001819 auto @1% FDR >= +200 PSMs.

Artifacts:
- 5 Rust JSON traces (one PSM array per scan, ~22KB each)
- 5 Java gzipped per-scan TRACE logs (~200KB each, 90,926 raw TRACE lines total)
- aggregate-analysis.txt: per-PSM totals + divergence counts
- analyze.py: the analysis script (runnable from artifacts dir)

Java instrumentation lives on bench VM /srv/data/msgf-bench/java-legacy-trace/ (NewScoredSpectrum.getNodeScore patched with -Dmsgf.trace.scans=<csv> gated System.err.println).

Out of scope: fix implementation; next PR after this.
Target 1 (roundf elimination): the perf flamegraph (Astral cal=off, post-PR-V1 FxHashMap) shows 5.15% CPU in libc roundf called from f32::round() / f64::round() in scoring hot paths. Replace with the branchless `(x + 0.5.copysign(x)) as i32` idiom. Mathematically equivalent for finite non-edge values:
- x > 0: (x + 0.5) truncated == round_half_away_from_zero
- x < 0: (x - 0.5) truncated == round_half_away_from_zero
- x = 0: (0 + 0.5) truncated == 0, same as round()

Skipped: match_engine.rs:256,257,679,680 (`(tol - 0.4999).round()`) which is a window-widening adjustment with a different semantic, not the round-to-nearest idiom.

Target 2 (GF DP inner-loop): compute_inner is already tightly written; added a comment documenting why no structural change was made and noting the next opportunities (prev_idx caching alongside valid_edges, SIMD widening of the inner multiply loop).

No functional behavior change; bit-identical PIN regression gate green.
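A compilable form of the rounding idiom, as a sketch. The function name is hypothetical, and the literal needs an explicit type suffix (`0.5_f32`) because Rust cannot call a method on an untyped float literal.

```rust
/// Round half away from zero without calling `round()`.
/// Valid for finite values well inside the i32 range; see the note on
/// edge cases below.
#[inline]
fn round_half_away(x: f32) -> i32 {
    // copysign gives +0.5 for non-negative x and -0.5 for negative x,
    // and the `as i32` cast truncates toward zero.
    (x + 0.5_f32.copysign(x)) as i32
}
```

The "non-edge" qualifier matters: for the largest `f32` below 0.5 (about 0.49999997), the addition itself rounds up to 1.0, so this form yields 1 where `round()` yields 0. Values whose magnitude exceeds the `i32` range saturate in the cast for both forms.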
… tightening" This reverts commit 9a6607a.
Re-profile on the PR-V1 binary (Astral cal=off, 245K samples) showed the FxHashMap swap in scoring::Param did NOT close the dominant hashing hotspot. 39.35% of CPU was still in the FnMut::call_mut chain, which traces to:

    expand_mod_combinations
      -> AminoAcidSet::variants_for
      -> HashMap<(u8, ModLocation), Vec<AminoAcid>>::get
      -> hashbrown find_inner with std::hash::random::RandomState (SipHash13)

PR-V1 missed this map because it lives in the `model` crate, not `scoring`. The candidate enumeration calls variants_for once per peptide position per candidate; on Astral this dominates wall.

This commit:
- Adds rustc-hash as a model-crate dependency.
- Switches AminoAcidSet::table and aa_lists_cache from HashMap to FxHashMap. Same hashbrown internals; FxHasher is a multiply-and-xor style mix per word vs SipHash13's keyed multi-round mix (SipHash-1-3: one compression round per block, three finalization rounds).

No functional behavior change; PIN bit-identical regression gate green. Tests pass.

Expected wall reduction on Astral cal=off: -10..-25%, depending on how much of the 39% chain comes from the SipHash vs the surrounding Arc clones and Vec extends.
…enum A perf trace on the PR-V1 binary (Astral cal=off, 245K samples) showed 12.63%+ of CPU under `to_vec<AminoAcid>` and `Arc<Modification>::clone` within the `expand_mod_combinations` -> `variants_for` chain. The candidate enumerator was cloning the full Anywhere-variants Vec for every interior position of every candidate peptide. For a length-L peptide, L-2 of L positions had no terminal merging to do (typically ~87% of positions on real peptides) and the clone was pure waste. Refactor: only the 1-2 terminal positions (pos 0 and pos L-1) build owned merged variants via a new `build_terminal_variants` helper. Interior positions reference the AminoAcidSet's Anywhere-variants slice directly. `expand_recursive`'s signature changes from `&[Vec<AminoAcid>]` to `&[&[AminoAcid]]`. No functional behavior change; bit-identical regression gate green. Expected wall reduction on Astral cal=off: -5..-15%, depending on how much of the 12.63% chain is the `to_vec` vs the surrounding recursion overhead.
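The borrow-interior, own-terminal layout can be sketched with toy types. `Variant` stands in for `AminoAcid`, `build_terminal` for `build_terminal_variants`, and the combination count replaces the real recursive enumeration; all signatures are assumptions, and the sketch requires peptides of length 2 or more.

```rust
/// Toy stand-in for AminoAcid.
#[derive(Debug, Clone, PartialEq)]
struct Variant(String);

/// Owned merge, needed only at the two terminal positions.
fn build_terminal(anywhere: &[Variant], terminal_only: &[Variant]) -> Vec<Variant> {
    let mut merged = anywhere.to_vec();
    merged.extend_from_slice(terminal_only);
    merged
}

/// Consumer takes borrowed slices, mirroring the `&[&[AminoAcid]]` signature.
fn count_combinations(per_position: &[&[Variant]]) -> usize {
    per_position.iter().map(|s| s.len()).product()
}

/// Assumes `anywhere_by_pos.len() >= 2`.
fn enumerate_count(
    anywhere_by_pos: &[Vec<Variant>],
    n_term: &[Variant],
    c_term: &[Variant],
) -> usize {
    let last = anywhere_by_pos.len() - 1;
    // Exactly two allocations per peptide, regardless of length.
    let first_owned = build_terminal(&anywhere_by_pos[0], n_term);
    let last_owned = build_terminal(&anywhere_by_pos[last], c_term);

    let mut slices: Vec<&[Variant]> = Vec::with_capacity(anywhere_by_pos.len());
    for (i, v) in anywhere_by_pos.iter().enumerate() {
        let s: &[Variant] = if i == 0 {
            &first_owned
        } else if i == last {
            &last_owned
        } else {
            v // interior positions borrow, no clone
        };
        slices.push(s);
    }
    count_combinations(&slices)
}
```

Only the vector of slice references grows with peptide length, and each entry is a pointer-length pair rather than a cloned list of variants.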
Hot paths cross crate boundaries (search -> scoring -> model). Default release profile uses codegen-units=16 with no LTO, so LLVM cannot inline across crates and small leaf functions (AminoAcid::nominal_mass, Enzyme::is_cleavable, FxHasher::write_u32, etc.) stay as function calls across the hot search loop. LTO=fat + cgu=1 give LLVM a whole-binary view at link time so it can inline cross-crate, deduplicate similar generic instantiations, and pick better register allocations on hot paths. Build time goes up (~2-3x) and binary size grows slightly, but this is a release-time cost only. No `target-cpu=native` here -- the released binary must run on a baseline x86_64 (e.g. older bench VMs). Future bench-only builds can opt in with `RUSTFLAGS="-C target-cpu=native"`. The bit-identical regression gate (precursor_cal_off_pin_tsv_match_golden_after_sort) is green under the new profile -- LLVM's whole-program view does not change float semantics.
Project-level `.cargo/config.toml` sets `target-cpu=sandybridge` so LLVM can use AVX 256-bit f64 vector ops to auto-vectorize the chunked GF DP inner loop in `crates/scoring/src/gf/generating_function.rs`. The loop spends ~16% of leaf time on Astral, and the chunked 4-wide layout was already vectorization-friendly -- it just needed an enabled target.

Sandy Bridge (Intel, Q1 2011) is the right baseline:
- AVX is widely available (any x86 CPU from 2011+)
- AVX 256-bit double ops vectorize the GF chunked loop
- NO AVX2 (Haswell+) -- preserves portability to older hypervisors
- NO FMA -- fused multiply-add changes intermediate rounding; we need bit-identity with Java's separate-mul-add path for the parity gate

The default `x86-64` baseline is 22 years old (2003, SSE2 only). The hardware floor it targets is no longer realistic for the workloads this tool runs against. Users on pre-2011 hardware can override with their own RUSTFLAGS.

Bench (iter5 vs S1b baseline, same VM, Java drift < 1.5%):

| Dataset | Mode | S1b     | iter5   | Δ      |
|---------|------|---------|---------|--------|
| Astral  | off  | 5:32.53 | 5:26.38 | -1.8%  |
| Astral  | auto | 6:19.09 | 5:52.15 | -7.1%  |
| LFQ     | off  | 0:46.78 | 0:42.71 | -8.7%  |
| LFQ     | auto | 1:01.08 | 0:53.24 | -12.8% |
| TMT     | off  | 2:31.29 | 2:07.36 | -15.8% |
| TMT     | auto | 2:50.50 | 2:19.40 | -18.3% |

Rust now beats Java on Astral cal=off (5:26 vs 5:40, +4%) and TMT (by 20-27%). PINs sorted-row identical to S1b across all 6 dataset/mode cells; Percolator @1% PSM counts unchanged.

The `.cargo/config.toml` entry is committed -- `.gitignore` updated to keep `.cargo/` private by default but allow `config.toml` through.
The `[build] rustflags = ["-C", "target-cpu=sandybridge"]` introduced in the previous commit was unconditional, which fails on aarch64 targets (macOS Apple Silicon, ARM Linux) because sandybridge is an x86-only LLVM target-cpu. Scope via `[target.'cfg(target_arch = "x86_64")']` so the x86_64 release still gets AVX 256-bit vectorization while aarch64 builds fall back to their NEON-by-default baseline. macOS-latest CI is now aarch64-only on GitHub Actions, so this unblocks the macos test job.
The per-node loop in `compute_inner` scans incoming edges twice: once to derive `cur_min_score`/`cur_max_score`, then once to accumulate probabilities into `cur_slice`. The first pass already computed every field the second pass needed (`prev_idx`, `prev_hdr`, `combined_score`, `aa_prob`) but threw all of it away, storing only the raw edge index `e`. The second pass redid the work — `node_index_for_mass` lookup, `arena.headers[prev_idx]` load, and the `combined_score` add — per edge in the hot loop. Replace `valid_edges: Vec<usize>` with `Vec<ValidEdge>` carrying the cached fields. `NodeSlice` is already `Copy` (16 bytes) so the storage cost is small relative to the saved arithmetic + indexed loads. Bit-identical regression gate green; full workspace tests pass under the standard CI skip list. No semantic change — purely caching.
…ute_inner" This reverts commit 9c10429.
Addresses three review observations on the V1 speed milestone PR:

1. `crates/search/src/candidate_gen.rs`: `use model::modification::ModLocation;` was inlined inside `expand_mod_combinations` and again inside `build_terminal_variants`. Lifted to module scope and removed both inline `use`s.
2. `crates/scoring/Cargo.toml`: `rustc-hash = "2"` was listed in both `[dependencies]` and `[dev-dependencies]`. Removed the dev-dep entry; the regular dependency is in scope for tests already.
3. `Cargo.toml`: clarified that `[profile.release]` flags (`lto`, `codegen-units`) are release-only, while the CPU baseline in `.cargo/config.toml` is workspace-wide on purpose, so the bit-identical PIN gate runs under the same SIMD codegen as the shipped binary.

No behavior change; `cargo clippy --workspace --all-targets -- -D warnings` clean; bit-identical PIN gate green.
perf+chore: V1 speed milestone — Rust beats Java on 5 of 6 cells (PSM-zero-regression)
diag(i5): score_psm trace findings + diff harness (PXD001819, no production code change)
Caution: review failed. Pull request was closed or merged during review.

Walkthrough

This PR combines quality cleanup with parity investigation foundation. It refactors internal hash maps to FxHashMap for performance, removes stale MS-GF+ port framing throughout documentation, upgrades CI linting to required, renames the RSS probe environment variable with legacy support, and introduces comprehensive Rust-Java scoring parity investigation tooling (JSON trace output, Python analysis harness, and detailed investigation findings).
Summary by CodeRabbit
Release Notes
New Features
Added `--trace-json` flag to msgf-trace binary for structured per-PSM JSON diagnostics.
Bug Fixes
Documentation
Performance
Infrastructure
Renamed `MSGFRUST_RSS_PROBE` environment variable to `MSGF_RSS_PROBE` (legacy compatibility supported).