feat: add datawal-cli inspect binary (#18) #24
Merged
assert_cmd 2.2.0+ declares `edition = "2024"` and `rust-version = "1.85"`, which Cargo 1.75.0 cannot parse. A lock-only pin to 2.1.2 (in Cargo.lock) keeps the dev dependency usable on the MSRV toolchain without changing the caret range in Cargo.toml, mirroring the existing getrandom 0.3.4 pin documented in AGENTS.md. `CommandCargoExt::cargo_bin` is marked `#[deprecated(since = "2.1.0")]` in favour of a `cargo::cargo_bin!` macro that ships only in the 2.2.x series, which itself requires edition 2024. Gate the single call site with `#[allow(deprecated)]` plus an explanatory comment so that stable clippy stays clean while the MSRV job still compiles.
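A minimal sketch of the lint-gating pattern. `legacy_cargo_bin` is a hypothetical local stand-in for `CommandCargoExt::cargo_bin` (it is not the assert_cmd signature); the point is only that `#[allow(deprecated)]` is scoped to one helper rather than the whole crate, so any other use of a deprecated item still warns.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for a deprecated dev-dependency API such as
/// `CommandCargoExt::cargo_bin`; NOT the real assert_cmd function.
#[deprecated(since = "2.1.0", note = "replacement only exists in a newer release")]
fn legacy_cargo_bin(name: &str) -> PathBuf {
    PathBuf::from("target/debug").join(name)
}

// The replacement needs a toolchain newer than the MSRV, so the deprecated
// call is kept and the lint is allowed for this one helper only.
#[allow(deprecated)]
fn datawal_bin() -> PathBuf {
    legacy_cargo_bin("datawal")
}

fn main() {
    println!("{}", datawal_bin().display());
}
```

The lock-only pin itself can be produced with `cargo update -p assert_cmd --precise 2.1.2`, which rewrites Cargo.lock and leaves the caret range in Cargo.toml alone.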
The first PR cut of `datawal-cli` printed every key and payload as
`key_len=N payload_len=M`, leaving operators to base64-decode by hand
even when the bytes were perfectly printable ASCII. This commit closes
that ergonomic gap while keeping the JSON schema (`datawal.cli.v1`)
bit-for-bit identical, so existing tooling that pipes `--json` into
`jq` is unaffected.
Behavioural changes (human form only, --json unchanged):
* `scan` and `dump` now render keys and payloads via a small helper
module `bytes_render`. Auto mode prints printable ASCII as a literal
(quoted with shell-style escapes when the value contains spaces,
tabs, double quotes, backslashes, or is empty) and falls back to a
prefixed `b64:` form for binary; `--bytes raw|base64|hex` forces a
specific rendering, with `--bytes hex` emitting `hex:` prefixes so
operators can never confuse a hex literal for a printable string.
Payloads are truncated to 64 bytes by default with a trailing
`...`; `--no-truncate` disables the cap.
* `get` gains a third key encoding, `--key TEXT`, complementing the
existing `--key-base64` and `--key-hex`. The three flags now belong
to a clap `required = true` group so exactly one must be supplied;
the error message lists all three. For value rendering, `get` in
auto mode prints the value as a literal when printable; for binary
values in auto mode it prints nothing to stdout and a short hint to
stderr ("binary value, N bytes; pass --bytes base64 or --bytes hex
to render"). `--bytes base64|hex` emits the encoded value to stdout
without any prefix, so the output round-trips cleanly through
shell pipelines.
* `dump` keeps `payload_len=N` (header-only by design — `dump` never
reads payload bytes off disk) but now also includes a human-form
`key=...` rendered through the same helper, with the same `--bytes`
/ `--no-truncate` knobs.
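The rendering rules above can be sketched as a std-only helper. This is a minimal illustration, not the crate's `bytes_render` module: the names `BytesMode`, `render` and `MAX_SHOWN`, treating tab as printable, reusing the `b64:` prefix for forced `--bytes base64`, and the exact escape sequences are all assumptions.

```rust
/// Hypothetical rendering modes mirroring `--bytes auto|raw|base64|hex`.
#[derive(Clone, Copy, Debug)]
enum BytesMode {
    Auto,
    Raw,
    Base64,
    Hex,
}

/// Default cap; `--no-truncate` corresponds to `truncate = false` below.
const MAX_SHOWN: usize = 64;

const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard padded base64 (RFC 4648 alphabet).
fn base64(bytes: &[u8]) -> String {
    let mut out = String::new();
    for chunk in bytes.chunks(3) {
        let b1 = u32::from(*chunk.get(1).unwrap_or(&0));
        let b2 = u32::from(*chunk.get(2).unwrap_or(&0));
        let n = (u32::from(chunk[0]) << 16) | (b1 << 8) | b2;
        // A chunk of k bytes yields k + 1 alphabet chars, then '=' padding.
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(B64[((n >> (18 - 6 * i)) & 63) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Printable = visible ASCII, space, or tab (assumption: tab triggers quoting,
/// not base64).
fn is_printable(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| b == b'\t' || (0x20..=0x7e).contains(&b))
}

/// Quote with shell-style escapes when empty or containing space, tab,
/// double quote, or backslash.
fn quote_if_needed(s: &str) -> String {
    let needs_quotes = s.is_empty() || s.contains(|c: char| matches!(c, ' ' | '\t' | '"' | '\\'));
    if !needs_quotes {
        return s.to_string();
    }
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\t' => out.push_str("\\t"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

fn render(bytes: &[u8], mode: BytesMode, truncate: bool) -> String {
    // Truncate on source bytes, then encode what is left.
    let cut = truncate && bytes.len() > MAX_SHOWN;
    let shown = if cut { &bytes[..MAX_SHOWN] } else { bytes };
    let body = match mode {
        BytesMode::Auto if is_printable(shown) => {
            quote_if_needed(std::str::from_utf8(shown).expect("printable ASCII is valid UTF-8"))
        }
        BytesMode::Auto | BytesMode::Base64 => format!("b64:{}", base64(shown)),
        // `String` cannot hold arbitrary bytes, so raw mode is lossy here.
        BytesMode::Raw => String::from_utf8_lossy(shown).into_owned(),
        BytesMode::Hex => format!("hex:{}", hex(shown)),
    };
    if cut { format!("{body}...") } else { body }
}

fn main() {
    println!("{}", render(b"user:42", BytesMode::Auto, true));
    println!("{}", render(b"hello world", BytesMode::Auto, true));
    println!("{}", render(&[0xff, 0x00], BytesMode::Auto, true));
    println!("{}", render(b"ab", BytesMode::Hex, true));
    println!("{}", render(b"raw", BytesMode::Raw, true));
}
```

Truncating before encoding keeps the 64-byte cap measured in payload bytes rather than rendered characters, which is one plausible reading of "truncated to 64 bytes".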
JSON output is explicitly tested to be invariant under `--bytes`:
`key_base64` and `payload_base64` are always populated; no `key_hex`
or `payload_hex` fields are introduced. A dedicated test
(`json_output_unchanged_by_bytes_flag`) locks this in.
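One way to make this invariant structural rather than only tested is to let the JSON branch ignore the rendering mode entirely. In this sketch `emit_record`, the two-variant `BytesMode`, and the human-form layout are hypothetical, and real `record` lines carry more fields than the two byte fields shown.

```rust
/// Hypothetical subset of `--bytes` modes for the human form.
#[derive(Clone, Copy)]
enum BytesMode {
    Base64,
    Hex,
}

/// Standard padded base64 (RFC 4648 alphabet).
fn base64(bytes: &[u8]) -> String {
    const T: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in bytes.chunks(3) {
        let b1 = u32::from(*chunk.get(1).unwrap_or(&0));
        let b2 = u32::from(*chunk.get(2).unwrap_or(&0));
        let n = (u32::from(chunk[0]) << 16) | (b1 << 8) | b2;
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(T[((n >> (18 - 6 * i)) & 63) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

fn emit_record(key: &[u8], payload: &[u8], json: bool, mode: BytesMode) -> String {
    if json {
        // `mode` is deliberately not consulted: key_base64/payload_base64 are
        // always present and no *_hex fields exist in datawal.cli.v1.
        format!(
            r#"{{"schema":"datawal.cli.v1","kind":"record","key_base64":"{}","payload_base64":"{}"}}"#,
            base64(key),
            base64(payload)
        )
    } else {
        // Only the human form honours `--bytes` (simplified layout).
        match mode {
            BytesMode::Base64 => format!("key=b64:{} payload=b64:{}", base64(key), base64(payload)),
            BytesMode::Hex => format!("key=hex:{} payload=hex:{}", hex(key), hex(payload)),
        }
    }
}

fn main() {
    println!("{}", emit_record(b"k", b"v", true, BytesMode::Hex));
    println!("{}", emit_record(b"k", b"v", false, BytesMode::Hex));
    println!("{}", emit_record(b"k", b"v", false, BytesMode::Base64));
}
```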
Tests: the integration suite grows from 14 to 27 cases. The new
cases cover `--key TEXT`, the three encodings being mutually
exclusive, printable / binary / quoted / hex-forced rendering for
both `scan` and `get`, default truncation and `--no-truncate`, and
the JSON invariant above. Fourteen unit tests inside `bytes_render`
exercise the helper in isolation.
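The exactly-one rule these tests exercise is enforced in the crate by a clap `required = true` group. The std-only sketch below re-implements the same check by hand, together with key decoding, so the behaviour can be read without clap; `KeyArgs`, `resolve_key`, and the lenient base64 decoder are illustrative assumptions.

```rust
/// Hypothetical parsed form of the three `get` key flags.
#[derive(Default)]
struct KeyArgs {
    key: Option<String>,        // --key TEXT
    key_base64: Option<String>, // --key-base64
    key_hex: Option<String>,    // --key-hex
}

fn decode_hex(s: &str) -> Result<Vec<u8>, String> {
    // Reject odd lengths and non-hex chars up front (also keeps slicing safe).
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("invalid hex key: {s:?}"));
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

/// Lenient standard-alphabet base64 decoder: padding is stripped, not validated.
fn decode_base64(s: &str) -> Result<Vec<u8>, String> {
    fn sextet(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some(u32::from(c - b'A')),
            b'a'..=b'z' => Some(u32::from(c - b'a') + 26),
            b'0'..=b'9' => Some(u32::from(c - b'0') + 52),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let mut out = Vec::new();
    let (mut acc, mut bits) = (0u32, 0u32);
    for c in s.trim_end_matches('=').bytes() {
        // Accumulate 6 bits per char; emit a byte whenever 8 are available.
        acc = (acc << 6) | sextet(c).ok_or_else(|| format!("invalid base64 key: {s:?}"))?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
        }
    }
    Ok(out)
}

fn resolve_key(args: &KeyArgs) -> Result<Vec<u8>, String> {
    match (&args.key, &args.key_base64, &args.key_hex) {
        (Some(text), None, None) => Ok(text.as_bytes().to_vec()),
        (None, Some(b64), None) => decode_base64(b64),
        (None, None, Some(hex)) => decode_hex(hex),
        _ => Err("exactly one of --key, --key-base64, --key-hex is required".to_string()),
    }
}

fn main() {
    let args = KeyArgs { key_hex: Some("ff00".into()), ..Default::default() };
    println!("{:?}", resolve_key(&args));
}
```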
`crates/datawal-cli/examples/cli_read_smoke.sh` grows three asserts
mirroring the new human-form behaviour (printable key literal,
`--bytes hex` forced rendering, `--key TEXT`) and the existing
asserts continue to pass against the updated stdout shape.
Tooling: a `justfile` section "CLI (`datawal-cli`)" is added with
`cli-build`, `cli-build-debug`, `cli-run`, `cli-path`,
`cli-install-local`, `cli-help`, and `cli-smoke` recipes. The smoke
recipe is the canonical way to validate the CLI locally; it is *not*
wired into CI, matching the project's approach to soak workloads.
Hard invariants unchanged: this commit touches only the
`datawal-cli` crate (which is `publish = false` and has no public
API surface), the workspace `justfile`, and the smoke example. The
`datawal` lib, wire format, corpus fixtures, formal models, and
public re-exports are untouched.
Refs #18.
…1.75 (#18)
`byte_slice_trim_ascii` (`<[u8]>::trim_ascii_end`) was only stabilized in Rust 1.80. The MSRV 1.75 CI job runs `cargo check` and `cargo test`, both of which fail with E0658 in `crates/datawal-cli/tests/integration.rs` at 5 call sites (lines 143, 162, 448, 475, 563). `serde_json::from_slice` already accepts trailing whitespace, and the CLI emits a single JSON line followed by `'\n'`, so dropping the explicit trim is functionally a no-op and removes the unstable surface. Local validation on stable: `cargo test -p datawal-cli` (27 pass), `cargo clippy --workspace --all-targets -D warnings` (clean), `cargo fmt --check` (clean), `cli_read_smoke.sh` (14 asserts green).
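The commit simply drops the trim. If trailing-whitespace trimming were ever needed again in code that must build on Rust 1.75, a small helper over long-stable APIs avoids E0658; `trim_ascii_end_compat` is a hypothetical name, not something in the crate.

```rust
/// Hypothetical MSRV-friendly stand-in for `<[u8]>::trim_ascii_end`
/// (stable only since Rust 1.80). Uses `rposition` and
/// `u8::is_ascii_whitespace`, both available long before 1.75.
fn trim_ascii_end_compat(bytes: &[u8]) -> &[u8] {
    // One past the last non-whitespace byte, or 0 if everything is whitespace.
    let end = bytes
        .iter()
        .rposition(|b| !b.is_ascii_whitespace())
        .map_or(0, |i| i + 1);
    &bytes[..end]
}

fn main() {
    let line = b"{\"schema\":\"datawal.cli.v1\"}\n";
    println!("{}", String::from_utf8_lossy(trim_ascii_end_compat(line)));
}
```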
robertoberto added a commit that referenced this pull request May 21, 2026
Adds three new datawal-cli subcommands that round out the 0.1.x CLI
surface without touching the source store on disk.
- export STORE OUTFILE: live KV projection -> JSONL via
  DataWal::export_jsonl. Refuses to clobber an existing outfile
  (exit 1).
- compact STORE TARGET: snapshot-style rebuild into TARGET via
  DataWal::compact_to. Refuses a non-empty target (exit 1).
- check STORE: DataWal-level health (open, then get every live key
  for per-record CRC revalidation, plus RecoveryReport). Exit 3 on
  a hard get failure; exit 2 on tail truncation or a mid-stream
  error.
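The two refusal rules above (do not clobber OUTFILE, do not compact into a non-empty TARGET) can be expressed with std alone. The PR does not say how `export` and `compact` implement them; this sketch assumes `OpenOptions::create_new` for the outfile and treats a missing TARGET as acceptable, and both helper names are hypothetical.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::Path;

/// `export`-style guard: create OUTFILE only if it does not already exist.
/// `create_new(true)` makes the existence check and creation one atomic step.
fn open_export_outfile(path: &Path) -> io::Result<File> {
    OpenOptions::new().write(true).create_new(true).open(path)
}

/// `compact`-style guard: accept TARGET if it is missing or an empty directory.
/// (Assumption: the PR does not state how a missing TARGET is handled.)
fn ensure_empty_target(dir: &Path) -> io::Result<()> {
    match fs::read_dir(dir) {
        Ok(mut entries) => match entries.next() {
            None => Ok(()),
            Some(_) => Err(io::Error::new(
                io::ErrorKind::AlreadyExists,
                "compact target is not empty",
            )),
        },
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}

fn main() {
    let target = std::env::temp_dir().join("datawal-compact-target-sketch");
    match ensure_empty_target(&target) {
        Ok(()) => println!("target ok"),
        // The CLI described above would exit with status 1 at this point.
        Err(e) => println!("refusing: {e}"),
    }
}
```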
Source store on disk is never modified by any subcommand. cli.rs
top-doc and commands.rs top-doc reframed accordingly: subcommands
are now in two groups (read-only inspection vs source-untouched
mutations). Public CLI binary scope is unchanged otherwise.
JSON schema 'datawal.cli.v1' is extended with three new kinds,
'export', 'compact' and 'check'; the change is additive and non-breaking.
Tests: +6 integration tests in tests/integration.rs (33 total,
0 failures). Local validation: cargo fmt --check, cargo clippy -D
warnings, cargo test --workspace, cargo doc -D warnings, cargo
bench --no-run, cargo publish --dry-run -p datawal — all green.
Closes #18 (CLI-2 remainder; CLI-1 closed by PR #24).
Refs #7.
Summary
Lands the read-only inspector half of `datawal-cli` (issue #18, part 1 of 2). Provides a `datawal` binary in a new workspace member `datawal-cli` with five inspect-only subcommands: `scan`, `get`, `report`, `verify`, `dump`. All subcommands support a `--json` mode emitting `datawal.cli.v1` schema records, one per line.
The mutating subcommands (`export`, `compact`, `check`) ship in a separate follow-up PR, as documented in `dev/0.1.4-plan.md` and on issue #18.
What ships
- `crates/datawal-cli/`: new workspace member with `publish = false` (re-enabled by the 0.1.4 release PR once `datawal 0.1.4` is on crates.io with `scan_iter`).
- `datawal scan <store>`: one `{kind:"record",...}` line per record; `--limit`, `--from-segment`, `--from-offset`.
- `datawal get <store> --key-base64 | --key-hex`: emits `{kind:"value",value_base64:...}` on a hit, `{kind:"miss"}` on a miss with exit 2.
- `datawal report <store>`: emits the cached `RecoveryReport` populated by `RecordLog::open`.
- `datawal verify <store>`: walks via `scan_iter` and emits `{kind:"verify",frames_checked,crc_failures,tail_truncated_at}`. Tail truncation downgrades the exit code to 2; a CRC failure on sealed segments hard-fails with exit 3.
- `datawal dump <store>`: `{kind:"frame",...}` lines with `key_len`/`payload_len` but NO payload bytes (header-only view).
- `examples/populate_smoke_store.rs` + `examples/cli_read_smoke.sh`: end-to-end shell smoke. NOT wired into CI; documented in the crate README as a developer-side check.
Schema
`datawal.cli.v1`. Every JSON line carries `{schema:"datawal.cli.v1", kind:"record"|"frame"|"report"|"verify"|"value"|"miss", ...}`. The full table lives in `crates/datawal-cli/README.md`.
Exit codes
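The exit codes this PR mentions (`get` miss gives 2, `verify` tail truncation gives 2, a CRC failure on a sealed segment gives 3) can be pictured as a small mapping. This is a hedged sketch, not the crate's exit-code table: the `Outcome` variants are hypothetical, and reporting the most severe code when several conditions occur in one run is an assumption.

```rust
/// Hypothetical summary of how an inspect command ended.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    /// Command completed normally.
    Success,
    /// `get` found no value for the key.
    Miss,
    /// `verify` stopped at a truncated tail (soft failure).
    TailTruncated,
    /// `verify` saw a CRC failure on a sealed segment (hard failure).
    SealedCrcFailure,
}

fn exit_code(outcome: Outcome) -> u8 {
    match outcome {
        Outcome::Success => 0,
        Outcome::Miss | Outcome::TailTruncated => 2,
        Outcome::SealedCrcFailure => 3,
    }
}

/// Assumption: when several conditions occur, the most severe code wins.
fn combined_exit_code(outcomes: &[Outcome]) -> u8 {
    outcomes.iter().map(|&o| exit_code(o)).max().unwrap_or(0)
}

fn main() {
    for o in [Outcome::Success, Outcome::Miss, Outcome::TailTruncated, Outcome::SealedCrcFailure] {
        println!("{o:?} -> exit {}", exit_code(o));
    }
    let seen = [Outcome::TailTruncated, Outcome::SealedCrcFailure];
    println!("combined -> exit {}", combined_exit_code(&seen));
}
```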
Constraints honoured
- No `RecordLog::append`, no `compact_to`, no `rotate_segment` reached. Mutate CLI lives in a follow-up PR.
- `RecordLog::open` takes the fs2 advisory lock; same-process reopen is not attempted.
- `scan` and `verify` share one `RecordLog` handle and borrow it immutably via `scan_iter(&self)`.
- `datawal` unchanged.
- `clap "=4.5.20"`, others via workspace.
Tests
- `crates/datawal-cli/tests/integration.rs`: 14 tests covering scan/get/report/verify/dump, human form, JSON schema shape, hex+base64 key encodings, miss exit code, bad-encoding rejection, missing-key-arg, dump frame-without-payload, and cross-process lock contention.
- `assert_cmd` + `predicates` + `tempfile`.
Smoke script (`examples/cli_read_smoke.sh`) round-trips every subcommand with `jq` assertions. Local run: green.
Out of scope
- `export`/`compact`/`check`/`put`/`delete`/`rotate`: separate PR.
- `datawal 0.1.4` must be on crates.io first so `datawal-cli` can depend on it by version, not path.
Refs #18.