feat: add datawal-cli inspect binary (#18)#24

Merged: robertoberto merged 4 commits into main from feat/cli-inspect on May 21, 2026

@robertoberto
Contributor

Summary

Lands the read-only inspector half of datawal-cli (issue #18 part 1 of 2). Provides a datawal binary in a new workspace member datawal-cli with five inspect-only subcommands: scan, get, report, verify, dump. All subcommands support a --json mode emitting datawal.cli.v1 schema records, one per line.

The mutating subcommands (export, compact, check) ship in a separate follow-up PR as documented in dev/0.1.4-plan.md and on issue #18.

What ships

  • crates/datawal-cli/ — new workspace member with publish = false (re-enabled by the 0.1.4 release PR once datawal 0.1.4 is on crates.io with scan_iter).
  • datawal scan <store> — one {kind:"record",...} line per record; --limit, --from-segment, --from-offset.
  • datawal get <store> --key-base64 | --key-hex — emits {kind:"value",value_base64:...} on hit, {kind:"miss"} on miss with exit 2.
  • datawal report <store> — emits the cached RecoveryReport populated by RecordLog::open.
  • datawal verify <store> — walks via scan_iter and emits {kind:"verify",frames_checked,crc_failures,tail_truncated_at}. Tail truncation downgrades exit to 2; CRC failure on sealed segments hard-fails to exit 3.
  • datawal dump <store>: emits {kind:"frame",...} lines with key_len/payload_len but NO payload bytes (header-only view).
  • examples/populate_smoke_store.rs + examples/cli_read_smoke.sh — end-to-end shell smoke. NOT wired into CI; documented in the crate README as a developer-side check.

Schema

datawal.cli.v1. Every JSON line carries {schema:"datawal.cli.v1", kind:"record"|"frame"|"report"|"verify"|"value"|"miss", ...}. Full table lives in crates/datawal-cli/README.md.
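
As an illustration of the envelope described above, here is a minimal std-only sketch of how a line could be assembled. The `Kind` enum, the `json_line` helper, and the field order are hypothetical; the real crate may well use serde and a different layout, and the authoritative field table is the one in crates/datawal-cli/README.md.

```rust
// Hypothetical sketch: not the crate's actual types or serializer.
const SCHEMA: &str = "datawal.cli.v1";

/// The six record kinds named in the PR description.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Record,
    Frame,
    Report,
    Verify,
    Value,
    Miss,
}

impl Kind {
    fn as_str(self) -> &'static str {
        match self {
            Kind::Record => "record",
            Kind::Frame => "frame",
            Kind::Report => "report",
            Kind::Verify => "verify",
            Kind::Value => "value",
            Kind::Miss => "miss",
        }
    }
}

/// Wraps already-serialized extra fields in the common envelope.
/// `extra_fields` is assumed to be valid JSON members without braces.
fn json_line(kind: Kind, extra_fields: &str) -> String {
    if extra_fields.is_empty() {
        format!(r#"{{"schema":"{}","kind":"{}"}}"#, SCHEMA, kind.as_str())
    } else {
        format!(
            r#"{{"schema":"{}","kind":"{}",{}}}"#,
            SCHEMA,
            kind.as_str(),
            extra_fields
        )
    }
}

fn main() {
    // One envelope per line, as in --json mode.
    println!("{}", json_line(Kind::Miss, ""));
    println!("{}", json_line(Kind::Value, r#""value_base64":"aGk=""#));
}
```

Keeping `schema` and `kind` on every line means a consumer can filter by kind without tracking state across lines.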

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Invocation / I/O / decoding error |
| 2 | Logical miss: get key not found, or verify tail truncated |
| 3 | CRC mismatch on sealed segment (hard recovery error) |
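
The table can be read as a small mapping from outcome to process exit code. The sketch below is illustrative only: `Outcome` and `verify_outcome` are hypothetical names, the `Option<u64>` type for the truncation offset is a guess, and the rule that a sealed-segment CRC failure outranks tail truncation when both occur is an assumption rather than something the PR states.

```rust
/// Hypothetical outcome type mirroring the exit-code table.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Outcome {
    Success,
    Error,
    LogicalMiss,
    CrcMismatch,
}

impl Outcome {
    /// Numeric exit code as listed in the table.
    fn code(self) -> u8 {
        match self {
            Outcome::Success => 0,
            Outcome::Error => 1,
            Outcome::LogicalMiss => 2,
            Outcome::CrcMismatch => 3,
        }
    }
}

/// Classifies a `verify` run.
/// ASSUMPTION: a CRC failure on a sealed segment takes precedence over
/// tail truncation when both are present.
fn verify_outcome(sealed_crc_failures: u64, tail_truncated_at: Option<u64>) -> Outcome {
    if sealed_crc_failures > 0 {
        Outcome::CrcMismatch
    } else if tail_truncated_at.is_some() {
        Outcome::LogicalMiss
    } else {
        Outcome::Success
    }
}

fn main() {
    let outcome = verify_outcome(0, Some(4096));
    println!("verify would exit with {}", outcome.code());
}
```

A real binary could hand the value to `std::process::ExitCode::from(outcome.code())` when returning from `main`.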

Constraints honoured

  • Read-only: no code path reaches RecordLog::append, compact_to, or rotate_segment. The mutating CLI lives in a follow-up PR.
  • Single-writer single-process: integration tests cover the case where another process holds the lock and the CLI exits with a clear message.
  • RecordLog::open takes the fs2 advisory lock; same-process reopen is not attempted. scan and verify share one RecordLog handle and borrow it immutably via scan_iter(&self).
  • WIRE_VERSION=1 untouched. No corpus mutation. Public surface of datawal unchanged.
  • MSRV 1.75.0 preserved (deps pinned: clap "=4.5.20", others via workspace).

Tests

crates/datawal-cli/tests/integration.rs — 14 tests covering scan/get/report/verify/dump, human form, JSON schema shape, hex+base64 key encodings, miss exit code, bad-encoding rejection, missing-key-arg, dump frame-without-payload, and cross-process lock contention. assert_cmd + predicates + tempfile.

Smoke script (examples/cli_read_smoke.sh) round-trips every subcommand with jq assertions. Local run: green.

Out of scope

  • Mutate CLI (export/compact/check/put/delete/rotate) — separate PR.
  • Crates.io publication — deferred to the 0.1.4 release PR (datawal 0.1.4 must be on crates.io first so datawal-cli can depend on it by version, not path).

Refs #18.

assert_cmd 2.2.0+ declares edition = "2024" and rust-version = "1.85",
which Cargo 1.75.0 cannot parse. Lock-only pin to 2.1.2 keeps the dev
dependency callable on MSRV without changing Cargo.toml's caret range,
mirroring the existing getrandom 0.3.4 pin documented in AGENTS.md.

CommandCargoExt::cargo_bin has been marked #[deprecated(since = 2.1.0)]
in favour of a cargo::cargo_bin! macro that ships only in the 2.2.x
series, which itself requires edition2024. Gate the single call site
with #[allow(deprecated)] plus an explanatory comment so that stable
clippy stays clean while the MSRV job still compiles.

The first PR cut of `datawal-cli` printed every key and payload as
`key_len=N payload_len=M`, leaving operators to base64-decode by hand
even when the bytes were perfectly printable ASCII. This commit closes
that ergonomic gap while keeping the JSON schema (`datawal.cli.v1`)
bit-for-bit identical, so existing tooling that pipes `--json` into
`jq` is unaffected.

Behavioural changes (human form only, --json unchanged):

* `scan` and `dump` now render keys and payloads via a small helper
  module `bytes_render`. Auto mode prints printable ASCII as a literal
  (quoted with shell-style escapes when the value contains spaces,
  tabs, double quotes, backslashes, or is empty) and falls back to a
  prefixed `b64:` form for binary; `--bytes raw|base64|hex` forces a
  specific rendering, with `--bytes hex` emitting `hex:` prefixes so
  operators can never confuse a hex literal for a printable string.
  Payloads are truncated to 64 bytes by default with a trailing
  `...`; `--no-truncate` disables the cap.

* `get` gains a third key encoding, `--key TEXT`, complementing the
  existing `--key-base64` and `--key-hex`. The three flags now belong
  to a clap `required = true` group so exactly one must be supplied;
  the error message lists all three. For value rendering, `get` in
  auto mode prints the value as a literal when printable; for binary
  values in auto mode it prints nothing to stdout and a short hint to
  stderr ("binary value, N bytes; pass --bytes base64 or --bytes hex
  to render"). `--bytes base64|hex` emits the encoded value to stdout
  without any prefix, so the output round-trips cleanly through
  shell pipelines.

* `dump` keeps `payload_len=N` (header-only by design — `dump` never
  reads payload bytes off disk) but now also includes a human-form
  `key=...` rendered through the same helper, with the same `--bytes`
  / `--no-truncate` knobs.
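
The rendering rules above can be sketched in std-only Rust. This is not the `bytes_render` module itself: every function name is hypothetical, the exact escape sequences are guesses at "shell-style escapes", and truncating the raw bytes before rendering (rather than the rendered string) is an assumption.

```rust
// Hypothetical sketch of the auto / hex rendering rules; not the real helper.
const TRUNCATE_AT: usize = 64;

/// Printable here means ASCII 0x20..=0x7e plus tab (tab is in the quoting list).
fn is_printable(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| b == b'\t' || (0x20..=0x7e).contains(&b))
}

/// Quote when empty or when the value has spaces, tabs, quotes, or backslashes.
fn needs_quoting(bytes: &[u8]) -> bool {
    bytes.is_empty() || bytes.iter().any(|&b| matches!(b, b' ' | b'\t' | b'"' | b'\\'))
}

/// ASSUMPTION: escapes are \t, \" and \\ inside double quotes.
fn quote(bytes: &[u8]) -> String {
    let mut out = String::from("\"");
    for &b in bytes {
        match b {
            b'\t' => out.push_str("\\t"),
            b'"' => out.push_str("\\\""),
            b'\\' => out.push_str("\\\\"),
            _ => out.push(b as char),
        }
    }
    out.push('"');
    out
}

/// Standard base64 with padding, written out to keep the sketch dependency-free.
fn base64(bytes: &[u8]) -> String {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in bytes.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Auto mode: literal (quoted if needed) for printable input, `b64:` otherwise.
fn render_auto(bytes: &[u8]) -> String {
    if !is_printable(bytes) {
        return format!("b64:{}", base64(bytes));
    }
    if needs_quoting(bytes) {
        quote(bytes)
    } else {
        String::from_utf8_lossy(bytes).into_owned()
    }
}

/// Forced hex mode: always prefixed so it cannot be mistaken for a literal.
fn render_hex(bytes: &[u8]) -> String {
    let digits: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    format!("hex:{}", digits)
}

/// Payload rendering with the default 64-byte cap and a trailing `...`.
fn render_payload(bytes: &[u8], no_truncate: bool) -> String {
    if no_truncate || bytes.len() <= TRUNCATE_AT {
        render_auto(bytes)
    } else {
        format!("{}...", render_auto(&bytes[..TRUNCATE_AT]))
    }
}

fn main() {
    println!("{}", render_auto(b"user:42"));
    println!("{}", render_auto(b"hello world"));
    println!("{}", render_auto(&[0x00, 0xff]));
    println!("{}", render_hex(&[0xde, 0xad]));
}
```

The `b64:` and `hex:` prefixes are what keep the human form unambiguous: a bare token is always a literal, and anything encoded announces its encoding.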

JSON output is explicitly tested to be invariant under `--bytes`:
`key_base64` and `payload_base64` are always populated; no `key_hex`
or `payload_hex` fields are introduced. A dedicated test
(`json_output_unchanged_by_bytes_flag`) locks this in.

Tests: the integration suite grows from 14 to 27 cases. The new
cases cover `--key TEXT`, the three encodings being mutually
exclusive, printable / binary / quoted / hex-forced rendering for
both `scan` and `get`, default truncation and `--no-truncate`, and
the JSON invariant above. Fourteen unit tests inside `bytes_render`
exercise the helper in isolation.

`crates/datawal-cli/examples/cli_read_smoke.sh` grows three asserts
mirroring the new human-form behaviour (printable key literal,
`--bytes hex` forced rendering, `--key TEXT`) and the existing
asserts continue to pass against the updated stdout shape.

Tooling: a `justfile` section "CLI (`datawal-cli`)" is added with
`cli-build`, `cli-build-debug`, `cli-run`, `cli-path`,
`cli-install-local`, `cli-help`, and `cli-smoke` recipes. The smoke
recipe is the canonical way to validate the CLI locally; it is *not*
wired into CI, matching the project's approach to soak workloads.

Hard invariants unchanged: this commit touches only the
`datawal-cli` crate (which is `publish = false` and has no public
API surface), the workspace `justfile`, and the smoke example. The
`datawal` lib, wire format, corpus fixtures, formal models, and
public re-exports are untouched.

Refs #18.
…1.75 (#18)

byte_slice_trim_ascii (<[u8]>::trim_ascii_end) was only stabilized in
Rust 1.80. The MSRV 1.75 CI job runs cargo check and cargo test, both
of which fail with E0658 in crates/datawal-cli/tests/integration.rs at
5 call sites (lines 143, 162, 448, 475, 563).

serde_json::from_slice accepts trailing whitespace natively. The CLI
emits a single JSON line followed by '\n', so dropping the explicit
trim is a no-op functionally and removes the unstable surface.

Local validation on stable: cargo test -p datawal-cli (27 pass),
cargo clippy --workspace --all-targets -D warnings (clean),
cargo fmt --check (clean), cli_read_smoke.sh (14 asserts green).

@robertoberto merged commit 34a5a89 into main on May 21, 2026
7 checks passed

@robertoberto deleted the feat/cli-inspect branch on May 21, 2026 12:17

robertoberto added a commit that referenced this pull request on May 21, 2026

Adds three new datawal-cli subcommands that round out the 0.1.x CLI
surface without touching the source store on disk.

- export STORE OUTFILE        Live KV projection -> JSONL via
                              DataWal::export_jsonl. Refuses to
                              clobber an existing outfile (exit 1).
- compact STORE TARGET        Snapshot-style rebuild into TARGET
                              via DataWal::compact_to. Refuses
                              non-empty target (exit 1).
- check STORE                 DataWal-level health: open + get every
                              live key (per-record CRC revalidation)
                              + RecoveryReport. Exit 3 on hard get
                              failure; exit 2 on tail truncation
                              / mid-stream error.

Source store on disk is never modified by any subcommand. cli.rs
top-doc and commands.rs top-doc reframed accordingly: subcommands
are now in two groups (read-only inspection vs source-untouched
mutations). Public CLI binary scope is unchanged otherwise.

JSON schema 'datawal.cli.v1' extended with three new kinds:
'export', 'compact', 'check' (additive, non-breaking).

Tests: +6 integration tests in tests/integration.rs (33 total,
0 failures). Local validation: cargo fmt --check, cargo clippy -D
warnings, cargo test --workspace, cargo doc -D warnings, cargo
bench --no-run, cargo publish --dry-run -p datawal — all green.

Closes #18 (CLI-2 remainder; CLI-1 closed by PR #24).
Refs #7.