iter39: docs + CLI rename for the post-cutover state #30
First bigbio release of msgf+
…r state

Design document for iter39:
- Rewrite README.md as a linear narrative serving both quantms operators and mass-spec researchers (~190 lines).
- New single-file DOCS.md reference at repo root (~505 lines).
- New CLI_MIGRATION.md with Java → Rust flag mapping + numeric-legacy → named-value table + worked examples (~100 lines).
- CLI rename: numeric enum IDs → named values (--fragmentation HCD vs --fragmentation 3); --ntt → --enzyme-specificity; --mod → --mods. All legacy forms are still accepted silently for quantms script compatibility.
- Delete the entire user-facing docs/ tree.

The Rust port now beats Java MS-GF+ on all 3 benchmark datasets; this iteration treats msgf-rust as a new app and writes its docs from scratch to fit.

Acronym style: HCD/CID/ETD/UVPD/TMT/iTRAQ/TOF uppercase, QExactive in brand casing, descriptive values (auto, low-res, fully, etc.) in lowercase kebab-case. clap parses case-insensitively, so quantms scripts that lowercase values still work.

ScoringParamGen porting is acknowledged as roadmap work, not in this iteration.
Implementation plan for iter39: docs rewrite + CLI rename.

Plan structure: 5 sequential commits on iter39-docs-rewrite, decomposed into 8 tasks of bite-sized TDD steps.
- Tasks 1-3 produce Commit 1: CLI rename + enums + custom parsers + resolver signature change + 15 updated unit tests + 1 new round-trip integration test.
- Task 4: rewrite README.md (full content embedded).
- Task 5: add DOCS.md (skeleton + per-section content guides; the prose-heavy sections defer to the spec and source code for content).
- Task 6: add CLI_MIGRATION.md (full content embedded: Table A Java→Rust, Table B legacy-numeric→named, three worked examples).
- Task 7: delete the legacy docs/ tree (36+ tracked files); engineering planning subdirectories preserved.
- Task 8: push branch + open PR.

Each step is one action (2-5 min). Commits land in dependency order. The new round-trip test (cli_smoke.rs) guards the back-compat path by asserting that --fragmentation 3 and --fragmentation HCD produce byte-identical PIN output.

Constraint observed: no commit message in this plan contains the word that triggers the no-claude-attribution hook.
Replace numeric Java-historical enum flags with Rust-idiomatic named values and rename --mod → --mods, --ntt → --enzyme-specificity. All legacy forms are still accepted silently for quantms script compatibility.

Canonical (shown in --help):
- --fragmentation auto|CID|ETD|HCD|UVPD (default: auto)
- --instrument low-res|high-res|TOF|QExactive (default: low-res)
- --protocol auto|phospho|iTRAQ|iTRAQ-phospho|TMT|standard (default: auto)
- --enzyme-specificity non-specific|semi|fully (default: fully)
- --mods <FILE> (singular --mod kept as hidden alias)

Legacy (silently accepted):
- --fragmentation 0..=4
- --instrument 0..=3
- --protocol 0..=5
- --ntt 0..=2 (--ntt is also a clap alias of --enzyme-specificity)
- --mod <FILE>

clap parses values case-insensitively, so quantms scripts that lowercase named values (--fragmentation hcd) keep working.

Internal:
- Added four ValueEnum-derived enums: Fragmentation, Instrument, Protocol, EnzymeSpecificity.
- Added four custom value parsers: parse_fragmentation, parse_instrument, parse_protocol, parse_enzyme_specificity. Each tries the canonical named value first and falls back to the legacy numeric ID.
- Changed resolve_bundled_param and resolve_bundled_param_for_activation signatures from Option<u8> triples to strongly-typed enums. The "all-defaults short-circuit" (which produced HCD_QExactive_Tryp.param pre-iter39 when no flags were given) is preserved via the Fragmentation::Auto + Instrument::LowRes + Protocol::Auto check.
- Updated the 15 param_resolver_tests for the new signature; replaced the three "rejects out of range" resolver tests with equivalent tests on the parser functions (clap now rejects bad values at parse time).

Verified:
- cargo test --release -p msgf-rust → 18 passed (15 resolver tests + 3 new parser-out-of-range tests).
- cargo test --release -p msgf-rust --test cli_smoke → 8 passed (7 existing + 1 new round-trip).
- cargo test --release --workspace → no new failures vs baseline.
New regression guard: cli_accepts_both_named_and_numeric_param_values runs a small search twice (once with --fragmentation 3 --protocol 4, once with --fragmentation HCD --protocol TMT) and asserts PIN outputs are byte-identical.

Co-authored-by: Cursor <cursoragent@cursor.com>
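The "named value first, legacy numeric ID second" parsing described in this commit can be sketched for one flag as below. This is a minimal, dependency-free illustration and not the code in the PR: the enum here is a plain stand-in rather than a clap ValueEnum, the error text is invented, and the numeric mapping assumes IDs 0..=4 follow the order in which the canonical values are listed (the commit itself only confirms that 3 corresponds to HCD).

```rust
/// Illustrative stand-in for the PR's `Fragmentation` enum (no clap derive here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fragmentation {
    Auto,
    Cid,
    Etd,
    Hcd,
    Uvpd,
}

/// Try the canonical named value first (case-insensitively), then fall back
/// to the legacy numeric ID. Anything else is rejected at parse time.
fn parse_fragmentation(raw: &str) -> Result<Fragmentation, String> {
    use Fragmentation::*;

    // Canonical path: named values, matched without regard to case.
    let named = match raw.to_ascii_lowercase().as_str() {
        "auto" => Some(Auto),
        "cid" => Some(Cid),
        "etd" => Some(Etd),
        "hcd" => Some(Hcd),
        "uvpd" => Some(Uvpd),
        _ => None,
    };
    if let Some(value) = named {
        return Ok(value);
    }

    // Legacy path. ASSUMPTION: IDs 0..=4 follow the listed canonical order.
    match raw.parse::<u8>() {
        Ok(0) => Ok(Auto),
        Ok(1) => Ok(Cid),
        Ok(2) => Ok(Etd),
        Ok(3) => Ok(Hcd),
        Ok(4) => Ok(Uvpd),
        _ => Err(format!(
            "invalid fragmentation '{raw}': expected auto|CID|ETD|HCD|UVPD or 0..=4"
        )),
    }
}
```

With this shape, `parse_fragmentation("HCD")`, `parse_fragmentation("hcd")`, and `parse_fragmentation("3")` all yield the same variant, while an out-of-range input such as `"5"` fails before any search starts.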
Replace the legacy Java-tool README (193 lines, Java 17 + JAR + mvn) with
a linear-narrative README for the Rust port (~190 lines, dual audience).
Sections, top to bottom:
1. Title + tagline + badges (CI, release, license)
2. What is this? — one paragraph, names UCSD original
3. Why msgf-rust? — benchmark table vs Java on Astral / PXD001819 / TMT
4. Install — release archive, cargo install, build from source
5. Quick Start — minimal command, one paragraph on .pin row shape
6. Common workflows — tryptic DDA, TMT, TSV output, quantms integration
7. CLI summary — table of ~17 most-used flags
8. Auto-detection — activation/instrument detection from mzML
9. Parity vs Java MS-GF+ — short summary; pointer to DOCS.md §8d
10. Citation
11. License — UCSD-Noncommercial; pointer to java-legacy and
java-legacy-original branches
12. Acknowledgments
quantms operators have a labeled section in #6 + the CLI summary in #7.
Researchers see the benchmark proof up front in #3.
The full CLI reference, mods.txt grammar, PIN/TSV column docs, training
notes, and Java→Rust migration table live in DOCS.md (separate commit).
The Java→Rust flag mapping table lives in CLI_MIGRATION.md (separate
commit).
Co-authored-by: Cursor <cursoragent@cursor.com>
Add DOCS.md at repo root: the full power-user reference covering all flags, formats, build/test workflow, training notes, and Java→Rust migration. ~505 lines, navigated via a top-of-file table of contents.

Sections:
1. CLI reference: every flag with type/default/description and accepted legacy form
2. Mods.txt format: grammar + 3 worked examples
3. Output formats: PIN columns, TSV columns, when to use which
4. Auto-detection: activation method detection from mzML + param-file resolution table
5. Building from source: Rust 1.85+, cargo build/test, the 7 CI-skipped tests and reasons
6. Training new .param files: current state (reuse Java's bundled files), roadmap (port ScoringParamGen), interim workflow (train on java-legacy, then pass the result to the Rust binary via --param-file)
7. Isobaric labeling: TMT and iTRAQ workflows, required mods entries, auto-selected param file
8. Java MS-GF+ → msgf-rust migration: flag rename table, behavior differences, known parity divergences
9. License and citation

The DOCS.md design follows the linear-narrative pattern of README.md: no nested directories, no site generator, just one Cmd-F-friendly file.

Co-authored-by: Cursor <cursoragent@cursor.com>
One-page reference for porting Java MS-GF+ command lines or quantms scripts to msgf-rust.

Covers:
- Table A: Java flag → msgf-rust flag mapping (18 flags).
- Table B: numeric-legacy → canonical named value mapping (one row per legacy ID across fragmentation, instrument, protocol, enzyme-specificity).
- Three worked examples (plain tryptic DDA; TMT 10-plex; phospho STY) showing the Java MS-GF+ command line and the msgf-rust equivalent side by side.
- Notes on behaviors that simply don't exist on the Rust side (no -tda flag, no -e enzyme flag, no mzXML/PKL/MS2 input, no mzIdentML output).

msgf-rust silently accepts the legacy forms (--fragmentation 3, --mod, --ntt) for backward compatibility with quantms scripts. New canonical forms are documented for fresh users.

Co-authored-by: Cursor <cursoragent@cursor.com>
The docs/ tree predated the Rust cutover and described the Java tool (mvn build, JAR distribution, Java CLI). Content that still applies has been migrated to root-level README.md, DOCS.md, and CLI_MIGRATION.md.

Deleted (38 tracked files):
- docs/msgfplus.md (full Java CLI reference; superseded by DOCS.md §1)
- docs/msgfdb_modfile.md (mods.txt grammar; superseded by DOCS.md §2)
- docs/output.md (PIN/TSV columns; superseded by DOCS.md §3)
- docs/buildsa.md (Java standalone SA builder; Java-only utility)
- docs/training-scoring-models.md (Java trainer; superseded by DOCS.md §6)
- docs/isobariclabeling.md (TMT/iTRAQ; superseded by DOCS.md §7)
- docs/troubleshooting.md (Java JVM tuning; Java-only)
- docs/changelog.md (Java release notes; GitHub Releases tracks v0.1.0+)
- docs/readme.md (Java tool overview; superseded by root README.md)
- docs/benchmarks/ (3 PNG figures from Java perf comparison; stale)
- docs/examples/ (Mods.txt + activation/enzyme/protocol samples; replaced by inline examples in DOCS.md)
- docs/parameterfiles/ (15 Java -conf templates; no Rust equivalent)

Preserved:
- docs/superpowers/specs/: design specs (engineering planning).
- docs/superpowers/plans/: implementation plans (engineering planning).
- docs/parity-analysis/ (already gitignored since commit 5e9b63a; no action needed).

Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Design spec: docs/superpowers/specs/2026-05-23-iter39-docs-rewrite-design.md.

CLI changes (one commit, fully backward-compatible)
Canonical (shown in --help):
- --fragmentation auto|CID|ETD|HCD|UVPD (was numeric 0..=4)
- --instrument low-res|high-res|TOF|QExactive (was numeric 0..=3)
- --protocol auto|phospho|iTRAQ|iTRAQ-phospho|TMT|standard (was numeric 0..=5)
- --enzyme-specificity non-specific|semi|fully (was --ntt 0..=2)
- --mods <FILE> (was --mod, kept as hidden alias)

Legacy (silently accepted): numeric 0..=N for the four enum flags, --ntt as a clap alias for --enzyme-specificity, --mod as a hidden alias for --mods. Quantms scripts using the legacy forms keep working unchanged.
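For scripts that should move to the canonical spelling rather than rely on the silent aliases, the rename can be expressed as a small argument rewriter. The sketch below is a hypothetical helper, not something shipped in msgf-rust: `lookup` and `canonicalize_args` are invented names, it only handles the space-separated `--flag value` form, and it assumes each legacy numeric ID indexes the canonical value list in the order shown above (the PR confirms only 3 → HCD and 4 → TMT).

```rust
// Canonical values per flag. ASSUMPTION: position in each list equals the legacy numeric ID.
const FRAGMENTATION: &[&str] = &["auto", "CID", "ETD", "HCD", "UVPD"];
const INSTRUMENT: &[&str] = &["low-res", "high-res", "TOF", "QExactive"];
const PROTOCOL: &[&str] = &["auto", "phospho", "iTRAQ", "iTRAQ-phospho", "TMT", "standard"];
const SPECIFICITY: &[&str] = &["non-specific", "semi", "fully"];

/// Map a flag (legacy or canonical spelling) to its canonical name and value list.
fn lookup(flag: &str) -> Option<(&'static str, &'static [&'static str])> {
    match flag {
        "--fragmentation" => Some(("--fragmentation", FRAGMENTATION)),
        "--instrument" => Some(("--instrument", INSTRUMENT)),
        "--protocol" => Some(("--protocol", PROTOCOL)),
        "--ntt" | "--enzyme-specificity" => Some(("--enzyme-specificity", SPECIFICITY)),
        _ => None,
    }
}

/// Rewrite legacy flag names and numeric enum values to canonical forms.
/// Unknown flags, file paths, and out-of-range numbers pass through unchanged.
fn canonicalize_args(args: &[&str]) -> Vec<String> {
    let mut out = Vec::with_capacity(args.len());
    let mut i = 0;
    while i < args.len() {
        let arg = args[i];
        if arg == "--mod" {
            // Pure rename; the file path that follows is copied on the next pass.
            out.push("--mods".to_string());
        } else if let Some((flag, values)) = lookup(arg) {
            out.push(flag.to_string());
            if let Some(value) = args.get(i + 1) {
                // A numeric value within range becomes the named value at that index.
                let named = value.parse::<usize>().ok().and_then(|id| values.get(id));
                out.push(named.map_or_else(|| value.to_string(), |n| n.to_string()));
                i += 1; // the value has been consumed
            }
        } else {
            out.push(arg.to_string());
        }
        i += 1;
    }
    out
}
```

Under those assumptions, `["--fragmentation", "3", "--ntt", "2", "--mod", "mods.txt"]` becomes `["--fragmentation", "HCD", "--enzyme-specificity", "fully", "--mods", "mods.txt"]`. Since the binary accepts both spellings, a rewrite like this is optional and mainly keeps scripts aligned with what --help shows.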
A new regression test (cli_accepts_both_named_and_numeric_param_values) runs a search twice, once with legacy numeric flags and once with canonical named flags, and asserts equivalent PIN output (header + sorted data rows; row order may vary between process invocations due to parallel scheduling).

Test plan
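The "header + sorted data rows" equivalence used by the regression test can be sketched as a small comparison helper. This is an illustration under stated assumptions rather than the test's actual code: `normalize_pin` and `pin_equivalent` are hypothetical names, and the sketch assumes the first line of a PIN file is the header and every following non-empty line is one data row.

```rust
/// Split PIN text into its header line and its data rows, with the rows
/// sorted so that scheduling-dependent row order does not affect comparison.
/// ASSUMPTION: line 1 is the header; every later non-empty line is a row.
fn normalize_pin(text: &str) -> (Option<&str>, Vec<&str>) {
    let mut lines = text.lines();
    let header = lines.next();
    let mut rows: Vec<&str> = lines.filter(|line| !line.is_empty()).collect();
    rows.sort_unstable();
    (header, rows)
}

/// Two PIN outputs are treated as equivalent when their headers match exactly
/// and they contain the same multiset of data rows.
fn pin_equivalent(a: &str, b: &str) -> bool {
    normalize_pin(a) == normalize_pin(b)
}
```

Sorting whole rows keeps the check strict on content (any differing field makes the rows unequal) while tolerating reordering, which is the one source of variation the summary names.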
Made with Cursor