Release Release 0.5 · dunnock/pqls

v0.5.0 — Cycle-9 polish

--sample: replaced full-file read with RowSelection-based sampling — now O(sample size), not O(file size). Massive perf win on multi-GB files.
--columns composes with --schema — projection now also filters schema output, not just rows.
Timestamp format consistency: scan-stats timestamps now match the same RFC 3339 / Z format used by --csv and --ndjson dump modes.
n_distinct null-adjustment semantics documented in SKILL.md (distinct count subtracts 1 when null_count > 0).
Exit codes unified (POSIX): user-input errors → 2 (was 3), file errors → 1. SKILL.md updated.

v0.4.0 — Cycles 6–8

--diff A B schema comparison; exits 0 if identical, 1 if different. Supports --json for machine-readable diff.
n_distinct column in --scan-stats — distinct count per column when scanning.
Arrow IPC decode in --kv-meta: the ARROW:schema key is now decoded as FlatBuffer schema (no more opaque base64).
Structured type fields in --diff JSON output.
-h/--help strings rewritten with examples and full descriptions for every flag.
--scan-stats timestamp quoting fix in detail mode.

v0.3.0 — Cycles 3–4

-d --scan-stats: when a file was written without embedded statistics (common with Arrow writers), pqls now scans the whole file to compute min/max/null counts.
Arrow-written file note: detail mode prints a hint when stats are absent, pointing at --scan-stats.
Various cycle-4 bug fixes (help exit code, diff JSON shape, scan-stats timing edge cases).

v0.2.0 — Cycle 2

First major feature expansion after v0.

--schema — print schema only (column name + type), no row count or path decoration.
--json — machine-readable JSON output for --schema, --kv-meta, --check, --partition-stats, --diff.
--ndjson — stream rows as newline-delimited JSON. Combines with --head N and --sample N.
--sample N — emit N randomly-sampled rows; requires --ndjson or --csv.
--columns col1,col2 — comma-separated projection list.
--kv-meta — print Parquet key-value metadata (writer version, custom properties).
SKILL.md shipped at repo root — agent contract (when to use, input/output contracts, exit codes, common patterns).
README: comparison table vs parquet-tools / DuckDB / fastparquet, agent usage section.

v0.1.0 — Initial release

Single static Linux binary, curl | sh installer.
Default mode: schema, row count, row groups, file size.
-d/--detail: per-row-group statistics with embedded min/max/null counts.
--csv [--head N]: dump rows as CSV.
-r/--recursive: walk directories; list .parquet files.
Hive-partitioned dataset discovery (directory mode).
GitHub Actions: CI on PR + push to main, release matrix on tag push (x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu via cross).
MIT OR Apache-2.0 dual license.

Provide feedback