Releases: dunnock/pqls
Releases · dunnock/pqls
v0.7.1
Full Changelog: v0.7.0...v0.7.1
v0.7.0
Full Changelog: v0.6.0...v0.7.0
v0.6.0
Full Changelog: v0.5.0...v0.6.0
v0.5.0
Full Changelog: pqls-0.5...v0.5.0
Release 0.5
v0.5.0 — Cycle-9 polish
- --sample: replaced full-file read with RowSelection-based sampling — now O(sample size), not O(file size). Massive perf win on multi-GB files.
- --columns composes with --schema — projection now also filters schema output, not just rows.
- Timestamp format consistency: scan-stats timestamps now match the same RFC 3339 / Z format used by --csv and --ndjson dump modes.
- n_distinct null-adjustment semantics documented in SKILL.md (distinct count subtracts 1 when null_count > 0).
- Exit codes unified (POSIX): user-input errors → 2 (was 3), file errors → 1. SKILL.md updated.
v0.4.0 — Cycles 6–8
- --diff A B schema comparison; exits 0 if identical, 1 if different. Supports --json for machine-readable diff.
- n_distinct column in --scan-stats — distinct count per column when scanning.
- Arrow IPC decode in --kv-meta: the ARROW:schema key is now decoded as FlatBuffer schema (no more opaque base64).
- Structured type fields in --diff JSON output.
- -h/--help strings rewritten with examples and full descriptions for every flag.
- --scan-stats timestamp quoting fix in detail mode.
v0.3.0 — Cycles 3–4
- -d --scan-stats: when a file was written without embedded statistics (common with Arrow writers), pqls now scans the whole file to compute min/max/null counts.
- Arrow-written file note: detail mode prints a hint when stats are absent, pointing at --scan-stats.
- Various cycle-4 bug fixes (help exit code, diff JSON shape, scan-stats timing edge cases).
v0.2.0 — Cycle 2
First major feature expansion after v0.
- --schema — print schema only (column name + type), no row count or path decoration.
- --json — machine-readable JSON output for --schema, --kv-meta, --check, --partition-stats, --diff.
- --ndjson — stream rows as newline-delimited JSON. Combines with --head N and --sample N.
- --sample N — emit N randomly-sampled rows; requires --ndjson or --csv.
- --columns col1,col2 — comma-separated projection list.
- --kv-meta — print Parquet key-value metadata (writer version, custom properties).
- SKILL.md shipped at repo root — agent contract (when to use, input/output contracts, exit codes, common patterns).
- README: comparison table vs parquet-tools / DuckDB / fastparquet, agent usage section.
v0.1.0 — Initial release
- Single static Linux binary, curl | sh installer.
- Default mode: schema, row count, row groups, file size.
- -d/--detail: per-row-group statistics with embedded min/max/null counts.
- --csv [--head N]: dump rows as CSV.
- -r/--recursive: walk directories; list .parquet files.
- Hive-partitioned dataset discovery (directory mode).
- GitHub Actions: CI on PR + push to main, release matrix on tag push (x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu via cross).
- MIT OR Apache-2.0 dual license.