History / Aggregation and Statistics

Revisions

  • Aggregation-and-Statistics: add Whitespace markers section
    Restores the whitespace-marker reference (previously on the removed Supplemental page) that stats/frequency --vis-whitespace help links to. Mapping mirrors WHITESPACE_MARKERS in src/util.rs.

    jqnatividad committed Jun 3, 2026
  • stats: document zero_padded_numeric; bump stat count 47 -> 48
    Reflects PR #3934 (qsv #3938): the new --zero-padded-numeric flag/column brings the stats summary-statistic count to 48. Document the flag in the main stats page (Aggregation-and-Statistics) and bump the 47 -> 48 count across the pages that cite it. The xsv-vs-qsv-1.0.0 legacy snapshot in Comparison.md and the describegpt 47-token content-type vocabulary are intentionally left as-is.
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

    jqnatividad committed Jun 3, 2026
  • wiki: adopt GitHub Alerts for callouts across the wiki
    Convert advisory blockquotes and inline callouts to semantic GitHub Alerts (NOTE/TIP/IMPORTANT/CAUTION):
    - [!NOTE] for the standard category-page "workflow layer" ledes and other top-of-page orientation / "canonical reference" notes
    - [!IMPORTANT] for behavior-affecting gotchas (auto approx-stats on OOM, group-by unsupported aggs, MiniJinja filter-errors-as-values, synthesize cross-column correlation, profile always-warn RFC4180)
    - [!TIP] for the Binary-Variants TL;DR and Why-qsv "why it matters"
    - [!CAUTION] for foreach shell-injection risk and joinp --cross blowup
    Also update the Contributing-to-the-Wiki category template to use the [!NOTE] lede so new pages follow the convention.
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

    jqnatividad committed May 30, 2026
  • docs(wiki): update for qsv 20.1.0, the "Synthetic Data" release
    - AI & Documentation: add synthesize section; expand describegpt with Content Types (47-token vocab), --two-pass cross-field refinement, deterministic unique_id tag, --markdown-template, lower-LLM-cost notes
    - Aggregation & Statistics: document auto-fallback to approximate modes on OOM (stats: --quantile/cardinality-method approx; frequency: --sketch-method frequent_items) with little-endian gating
    - Recipe-Larger-than-RAM: reframe explicit approx flags as auto-on-OOM (override-to-lock semantics) with stats cache mode-key note
    - SQL & Polars: pivotp --agg quantile@<p> / q@<p> with p95 example
    - Command Reference: add synthesize row in AI & Documentation table
    - Cookbook: new "Generate" section; new Recipe-Synthesize-Fake-Data.md with end-to-end describegpt --two-pass --infer-content-type → synthesize walkthrough, locale switching, --consistent-fakes variant, caveats
    - Sidebar: link the new Synthesize Fake Data recipe
    - Home: bump "70+ commands" → "73 commands across four binary variants"; add 20.1.0 highlight strip with deep-links
    - FAQ: refresh MSRV anchor 20.0.0 → 20.1.0 (Rust 1.95 unchanged)
    - Troubleshooting: link both 20.0.0 and 20.1.0 changelogs in excel section
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    jqnatividad committed May 18, 2026
  • wiki: flesh out Aggregation-and-Statistics + Joins-and-Set-Ops
    Aggregation-and-Statistics covers stats, moarstats, frequency, pragmastat, dedup, extdedup, extsort. Examples emphasize: stats cache for downstream speed (NYC 311), Apache DataSketches approx mode for huge cardinalities, Pragmastat robust statistics for skewed data (Allegheny property sales), extsort/extdedup for files > RAM.
    Joins-and-Set-Ops covers join, joinp, exclude, partition, split. Examples: wcp + country_continent lookup, NYC 311 + NOAA weather asof join, salary band non-equi join, partition NYC 311 by Borough, chunk 27M-row exports for parallel processing.
    Both pages: quick decision table, per-command sections with real-world anchored examples, deep-links to /docs/help/, "See also" cross-links.
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    jqnatividad committed May 13, 2026
  • wiki: add stubs for Phase B/C/D/E pages so sidebar links resolve
    Adds 39 placeholder pages so every sidebar entry resolves to real content rather than a 404. Each stub declares its tier, the phase it will be filled in, and a one-paragraph preview of what's coming. They link back to Home / Getting-Started / Command-Reference / Cookbook for navigation.
    Pages added:
    - Phase B (Command Reference, 13): Command-Reference, Selection-and-Inspection, Transform-and-Reshape, Aggregation-and-Statistics, Joins-and-Set-Ops, SQL-and-Polars, Validation-and-Schema, Conversion-and-IO, Geospatial, HTTP-and-Web, Scripting-Luau-Python, Indexing-Compression-Diff, AI-and-Documentation
    - Phase C (Cookbook recipes, 12): Recipe-Inspect-Unknown-CSV, Recipe-Clean-and-Normalize, Recipe-Geographic-Enrichment, Recipe-Date-Enrichment, Recipe-CKAN-Integration, Recipe-JSON-Schema-Validate, Recipe-Build-a-Data-Pipeline, Recipe-Stats-to-Insights, Recipe-Fetch-and-Cache, Recipe-Larger-than-RAM, Recipe-Diff-and-Audit, Recipe-Multi-Table-Joins
    - Phase D (Tuning + ecosystem, 8): Performance-Tuning, Environment-Variables, Stats-Cache-and-Caching, Lookup-Tables, Claude-Cowork-Plugin, MCP-Server, qsv-pro-Spotlight, Integrations
    - Phase E (Polish, 6): Troubleshooting, FAQ, Comparison, Glossary, External-Resources, Contributing-to-the-Wiki
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    jqnatividad committed May 13, 2026