iter39: docs + CLI rename for the post-cutover state#30

Merged
ypriverol merged 8 commits into dev from iter39-docs-rewrite
May 23, 2026
Conversation

@ypriverol
Member

Summary

  • Rewrite README.md as a linear narrative serving quantms operators + mass-spec researchers (~190 lines).
  • Add DOCS.md at repo root: single-file reference for CLI, formats, training, migration (~505 lines).
  • Add CLI_MIGRATION.md: Java MS-GF+ → msgf-rust flag map + numeric legacy → named-value table + 3 worked examples (~100 lines).
  • Rename CLI flags from Java-historical numeric IDs to Rust-idiomatic named values; legacy forms still accepted silently for quantms script compat.
  • Delete the legacy docs/ tree (38 tracked files); preserve the engineering-planning artifacts under docs/superpowers/.

Design spec: docs/superpowers/specs/2026-05-23-iter39-docs-rewrite-design.md.

CLI changes (one commit, fully backward-compatible)

Canonical (shown in --help):

  • --fragmentation auto|CID|ETD|HCD|UVPD (was numeric 0..=4)
  • --instrument low-res|high-res|TOF|QExactive (was numeric 0..=3)
  • --protocol auto|phospho|iTRAQ|iTRAQ-phospho|TMT|standard (was numeric 0..=5)
  • --enzyme-specificity non-specific|semi|fully (was --ntt 0..=2)
  • --mods <FILE> (was --mod, kept as hidden alias)

Legacy (silently accepted): numeric 0..=N for the four enum flags, --ntt as a clap alias for --enzyme-specificity, --mod as a hidden alias for --mods. quantms scripts that use the legacy forms keep working unchanged.
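
The "named value first, legacy numeric ID second" behavior can be illustrated with a minimal, std-only sketch. It is not the code in this PR: the real enums derive clap's ValueEnum, and the numeric-to-name order below is an assumption taken from the order the values are listed in --help (consistent with --fragmentation 3 meaning HCD).

```rust
/// Illustrative stand-in for the PR's `Fragmentation` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fragmentation {
    Auto,
    Cid,
    Etd,
    Hcd,
    Uvpd,
}

impl Fragmentation {
    /// Canonical names, indexed by the assumed legacy numeric ID (0..=4).
    const ALL: [(Fragmentation, &'static str); 5] = [
        (Fragmentation::Auto, "auto"),
        (Fragmentation::Cid, "CID"),
        (Fragmentation::Etd, "ETD"),
        (Fragmentation::Hcd, "HCD"),
        (Fragmentation::Uvpd, "UVPD"),
    ];
}

/// Try the canonical named value (case-insensitively) first, then fall
/// back to the legacy numeric ID; anything else is rejected.
fn parse_fragmentation(raw: &str) -> Result<Fragmentation, String> {
    if let Some((value, _)) = Fragmentation::ALL
        .iter()
        .find(|(_, name)| name.eq_ignore_ascii_case(raw))
    {
        return Ok(*value);
    }
    match raw.parse::<usize>() {
        Ok(id) if id < Fragmentation::ALL.len() => Ok(Fragmentation::ALL[id].0),
        _ => Err(format!(
            "invalid fragmentation '{raw}': expected auto|CID|ETD|HCD|UVPD or 0..=4"
        )),
    }
}
```

With this sketch, parse_fragmentation("HCD"), parse_fragmentation("hcd"), and parse_fragmentation("3") all yield Fragmentation::Hcd, while parse_fragmentation("5") returns an error.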

A new regression test (cli_accepts_both_named_and_numeric_param_values) runs a search twice — once with legacy numeric flags, once with canonical named flags — and asserts equivalent PIN output (header + sorted data rows; row order may vary between process invocations due to parallel scheduling).
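
A rough sketch of the order-insensitive comparison described above, assuming the PIN text has one header line followed by data rows. The helper names normalize_pin and pin_equivalent are hypothetical and are not taken from cli_smoke.rs.

```rust
/// Split PIN text into its header line and lexicographically sorted data rows.
/// Blank lines are ignored.
fn normalize_pin(text: &str) -> (Option<&str>, Vec<&str>) {
    let mut lines = text.lines().filter(|line| !line.trim().is_empty());
    let header = lines.next();
    let mut rows: Vec<&str> = lines.collect();
    rows.sort_unstable();
    (header, rows)
}

/// Two PIN outputs count as equivalent when their headers match and their
/// data rows match as multisets, i.e. row order is ignored.
fn pin_equivalent(a: &str, b: &str) -> bool {
    normalize_pin(a) == normalize_pin(b)
}
```

Sorting the rows before comparing keeps the check strict about content while tolerating the scheduling-dependent row order.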

Test plan

  • cargo test --release --workspace passes (37+ test binaries, 0 new failures vs baseline)
  • New round-trip test guards the back-compat path
  • cargo build --release produces clean binary
  • Existing CI workflow (.github/workflows/ci.yml) needs no changes; the 7 known-skipped tests stay skipped

Made with Cursor

ypriverol and others added 8 commits May 23, 2026 09:20
First bigbio release of msgf+
…r state

Design document for iter39:
- Rewrite README.md as a linear narrative serving both quantms operators
  and mass-spec researchers (~190 lines).
- New single-file DOCS.md reference at repo root (~505 lines).
- New CLI_MIGRATION.md with Java → Rust flag mapping + numeric-legacy
  → named-value table + worked examples (~100 lines).
- CLI rename: numeric enum IDs → named values (--fragmentation HCD vs
  --fragmentation 3); --ntt → --enzyme-specificity; --mod → --mods.
  All legacy forms still accepted silently for quantms script compat.
- Delete the entire user-facing docs/ tree.

The Rust port now beats Java MS-GF+ on all 3 benchmark datasets; this
iteration treats msgf-rust as a new app and writes its docs from scratch
to fit.

Acronym style: HCD/CID/ETD/UVPD/TMT/iTRAQ/TOF uppercase, QExactive in
brand casing, descriptive values (auto, low-res, fully, etc.) in
lowercase kebab-case. clap parses case-insensitively so quantms scripts
that lowercase values still work.

ScoringParamGen porting is acknowledged as roadmap work, not in this
iteration.

Implementation plan for iter39 — docs rewrite + CLI rename.

Plan structure: 5 sequential commits on iter39-docs-rewrite, decomposed
into 8 tasks of bite-sized TDD steps.

- Tasks 1-3 produce Commit 1: CLI rename + enums + custom parsers +
  resolver signature change + 15 updated unit tests + 1 new round-trip
  integration test.
- Task 4: rewrite README.md (full content embedded).
- Task 5: add DOCS.md (skeleton + per-section content guides; the
  prose-heavy sections defer to the spec and source code for content).
- Task 6: add CLI_MIGRATION.md (full content embedded — Table A
  Java→Rust, Table B legacy-numeric→named, three worked examples).
- Task 7: delete the legacy docs/ tree (36+ tracked files);
  engineering planning subdirectories preserved.
- Task 8: push branch + open PR.

Each step is one action (2-5 min). Commits land in dependency order.
The new round-trip test (cli_smoke.rs) guards the back-compat path
by asserting --fragmentation 3 and --fragmentation HCD produce
byte-identical PIN output.

Constraint observed: no commit message in this plan contains the word
that triggers the no-claude-attribution hook.

Replace numeric Java-historical enum flags with Rust-idiomatic named
values and rename --mod → --mods, --ntt → --enzyme-specificity. All
legacy forms still accepted silently for quantms script compat.

Canonical (shown in --help):
- --fragmentation auto|CID|ETD|HCD|UVPD     (default: auto)
- --instrument low-res|high-res|TOF|QExactive (default: low-res)
- --protocol auto|phospho|iTRAQ|iTRAQ-phospho|TMT|standard (default: auto)
- --enzyme-specificity non-specific|semi|fully (default: fully)
- --mods <FILE>   (singular --mod kept as hidden alias)

Legacy (silently accepted):
- --fragmentation 0..=4
- --instrument 0..=3
- --protocol 0..=5
- --ntt 0..=2          (--ntt is also a clap alias of --enzyme-specificity)
- --mod <FILE>

clap parses values case-insensitively, so quantms scripts that lowercase
named values (--fragmentation hcd) keep working.

Internal:
- Added four ValueEnum-derived enums: Fragmentation, Instrument,
  Protocol, EnzymeSpecificity.
- Added four custom value parsers: parse_fragmentation,
  parse_instrument, parse_protocol, parse_enzyme_specificity. Each tries
  the canonical named value first, falls back to the legacy numeric ID.
- Changed resolve_bundled_param and resolve_bundled_param_for_activation
  signatures from Option<u8> triples to strongly-typed enums. The
  "all-defaults short-circuit" (which produced HCD_QExactive_Tryp.param
  pre-iter39 when no flags were given) is preserved via the
  Fragmentation::Auto + Instrument::LowRes + Protocol::Auto check.
- Updated the 15 param_resolver_tests for the new signature; replaced
  the three "rejects out of range" resolver tests with equivalent tests
  on the parser functions (clap rejects bad values at parse time now).

Verified:
- cargo test --release -p msgf-rust → 18 passed (15 resolver tests
  + 3 new parser-out-of-range tests).
- cargo test --release -p msgf-rust --test cli_smoke → 8 passed
  (7 existing + 1 new round-trip).
- cargo test --release --workspace → no new failures vs baseline.

New regression guard: cli_accepts_both_named_and_numeric_param_values
runs a small search twice (once with --fragmentation 3 --protocol 4,
once with --fragmentation HCD --protocol TMT) and asserts PIN outputs
are byte-identical.

Co-authored-by: Cursor <cursoragent@cursor.com>

Replace the legacy Java-tool README (193 lines, Java 17 + JAR + mvn) with
a linear-narrative README for the Rust port (~190 lines, dual audience).

Sections, top to bottom:
1. Title + tagline + badges (CI, release, license)
2. What is this? — one paragraph, names UCSD original
3. Why msgf-rust? — benchmark table vs Java on Astral / PXD001819 / TMT
4. Install — release archive, cargo install, build from source
5. Quick Start — minimal command, one paragraph on .pin row shape
6. Common workflows — tryptic DDA, TMT, TSV output, quantms integration
7. CLI summary — table of ~17 most-used flags
8. Auto-detection — activation/instrument detection from mzML
9. Parity vs Java MS-GF+ — short summary; pointer to DOCS.md §8d
10. Citation
11. License — UCSD-Noncommercial; pointer to java-legacy and
    java-legacy-original branches
12. Acknowledgments

quantms operators have a labeled section in section 6 plus the CLI summary
in section 7. Researchers see the benchmark proof up front in section 3.

The full CLI reference, mods.txt grammar, PIN/TSV column docs, training
notes, and Java→Rust migration table live in DOCS.md (separate commit).
The Java→Rust flag mapping table lives in CLI_MIGRATION.md (separate
commit).

Co-authored-by: Cursor <cursoragent@cursor.com>

Add DOCS.md at repo root: the full power-user reference covering all
flags, formats, build/test workflow, training notes, and Java→Rust
migration. ~505 lines, navigated via a top-of-file table of contents.

Sections:
1. CLI reference — every flag with type/default/description and
   accepted legacy form
2. Mods.txt format — grammar + 3 worked examples
3. Output formats — PIN columns, TSV columns, when to use which
4. Auto-detection — activation method detection from mzML +
   param-file resolution table
5. Building from source — Rust 1.85+, cargo build/test, the 7 CI-skipped
   tests and reasons
6. Training new .param files — current state (reuse Java's bundled
   files), roadmap (port ScoringParamGen), interim workflow
   (train on java-legacy, --param-file at the Rust binary)
7. Isobaric labeling — TMT and iTRAQ workflows, required mods entries,
   auto-selected param file
8. Java MS-GF+ → msgf-rust migration — flag rename table, behavior
   differences, known parity divergences
9. License and citation

The DOCS.md design follows the linear-narrative pattern of README.md:
no nested directories, no site generator, just one Cmd-F-friendly file.

Co-authored-by: Cursor <cursoragent@cursor.com>

One-page reference for porting Java MS-GF+ command lines or quantms
scripts to msgf-rust. Covers:

- Table A: Java flag → msgf-rust flag mapping (18 flags).
- Table B: numeric-legacy → canonical named value mapping (one row per
  legacy ID across fragmentation, instrument, protocol, enzyme-specificity).
- Three worked examples (plain tryptic DDA; TMT 10-plex; phospho STY)
  showing the Java MS-GF+ command line and the msgf-rust equivalent
  side-by-side.
- Notes on behaviors that simply don't exist on the Rust side (no
  -tda flag, no -e enzyme flag, no mzXML/PKL/MS2 input, no mzIdentML
  output).

msgf-rust silently accepts the legacy forms (--fragmentation 3,
--mod, --ntt) for backward compatibility with quantms scripts. New
canonical forms are documented for fresh users.

Co-authored-by: Cursor <cursoragent@cursor.com>
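
For scripts that would rather move to the canonical forms than rely on the silent legacy path, the legacy-to-named mapping can be sketched as a small argument rewriter. This is a hypothetical helper, not part of msgf-rust; the value tables assume the numeric IDs follow the order shown in --help, and only space-separated `flag value` pairs are handled.

```rust
/// Canonical names indexed by legacy numeric ID (order assumed from --help).
const FRAGMENTATION: &[&str] = &["auto", "CID", "ETD", "HCD", "UVPD"];
const INSTRUMENT: &[&str] = &["low-res", "high-res", "TOF", "QExactive"];
const PROTOCOL: &[&str] = &["auto", "phospho", "iTRAQ", "iTRAQ-phospho", "TMT", "standard"];
const SPECIFICITY: &[&str] = &["non-specific", "semi", "fully"];
/// Flags whose value is not an enum (e.g. a file path) get an empty table.
const NO_TABLE: &[&str] = &[];

/// Map a flag (legacy or canonical) to its canonical name and value table.
fn canonical_flag(flag: &str) -> Option<(&'static str, &'static [&'static str])> {
    match flag {
        "--fragmentation" => Some(("--fragmentation", FRAGMENTATION)),
        "--instrument" => Some(("--instrument", INSTRUMENT)),
        "--protocol" => Some(("--protocol", PROTOCOL)),
        "--ntt" | "--enzyme-specificity" => Some(("--enzyme-specificity", SPECIFICITY)),
        "--mod" | "--mods" => Some(("--mods", NO_TABLE)),
        _ => None,
    }
}

/// Rewrite legacy `flag value` pairs to canonical form; every other
/// argument passes through untouched.
fn modernize_args(args: &[&str]) -> Vec<String> {
    let mut out = Vec::with_capacity(args.len());
    // Value table of the flag seen immediately before the current argument.
    let mut table: &[&str] = NO_TABLE;
    for arg in args {
        if let Some((name, values)) = canonical_flag(arg) {
            out.push(name.to_string());
            table = values;
            continue;
        }
        // Translate a numeric ID only if it is in range for the preceding flag.
        let named = arg.parse::<usize>().ok().and_then(|id| table.get(id));
        out.push(named.unwrap_or(arg).to_string());
        table = NO_TABLE;
    }
    out
}
```

For example, the arguments `--fragmentation 3 --protocol 4 --ntt 2 --mod mods.txt` come back as `--fragmentation HCD --protocol TMT --enzyme-specificity fully --mods mods.txt`; out-of-range numbers are left as they are so that the binary itself reports the error.
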

The docs/ tree predated the Rust cutover and described the Java tool
(mvn build, JAR distribution, Java CLI). Content that still applies has
been migrated to root-level README.md, DOCS.md, and CLI_MIGRATION.md.

Deleted (38 tracked files):
- docs/msgfplus.md (full Java CLI reference — superseded by DOCS.md §1)
- docs/msgfdb_modfile.md (mods.txt grammar — superseded by DOCS.md §2)
- docs/output.md (PIN/TSV columns — superseded by DOCS.md §3)
- docs/buildsa.md (Java standalone SA builder — Java-only utility)
- docs/training-scoring-models.md (Java trainer — superseded by DOCS.md §6)
- docs/isobariclabeling.md (TMT/iTRAQ — superseded by DOCS.md §7)
- docs/troubleshooting.md (Java JVM tuning — Java-only)
- docs/changelog.md (Java release notes — GitHub Releases tracks v0.1.0+)
- docs/readme.md (Java tool overview — superseded by root README.md)
- docs/benchmarks/ (3 PNG figures from Java perf comparison — stale)
- docs/examples/ (Mods.txt + activation/enzyme/protocol samples —
  inline examples in DOCS.md instead)
- docs/parameterfiles/ (15 Java -conf templates — no Rust equivalent)

Preserved:
- docs/superpowers/specs/ — design specs (engineering planning).
- docs/superpowers/plans/ — implementation plans (engineering planning).
- docs/parity-analysis/ (already gitignored since commit 5e9b63a;
  no action needed).

Co-authored-by: Cursor <cursoragent@cursor.com>

@coderabbitai

coderabbitai Bot commented May 23, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@ypriverol ypriverol merged commit 0b137bc into dev May 23, 2026
5 checks passed