History / Recipe Clean and Normalize

Revisions

  • wiki: Phase C batch 1 - Cookbook index + 3 beginner recipes

    Cookbook index now categorizes 12 recipes by goal (inspect / clean / enrich / validate / aggregate / combine / big files / integration) with anchor dataset and command columns. Legacy snippets (CKAN, Date Enrichment, multi-table join, cat varying columns, geocode) preserved at the bottom of the index with pointers to the expanded Recipe pages.

    Recipe-Inspect-Unknown-CSV: sniff -> headers -> count -> stats -> frequency -> sample -> table walkthrough on wcp.csv, ~0.7s for full stats on 2.7M rows. Variations: remote sniff, describegpt natural-language summary, colorized output.

    Recipe-Clean-and-Normalize: 6-step pipeline on Boston 311 covering input --auto-skip, safenames, regex replace of sentinel nulls, group-by fill, sort+dedup with audit trail. Variations: applydp for CKAN, pseudonymization, censoring, schema validation.

    Recipe-Date-Enrichment: expands the legacy date-enrichment snippets on NYC 311. Adds Year/YearMonth/Weekday/Quarter/TAT columns via datefmt + getquarter.lua + turnaroundtime.lua; partitions output by quarter. Variations: Brooklyn-only TAT, regex date columns, SQL-style aggregation, business-hours filter.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    jqnatividad committed May 13, 2026
  • wiki: add stubs for Phase B/C/D/E pages so sidebar links resolve

    Adds 39 placeholder pages so every sidebar entry resolves to real content rather than a 404. Each stub declares its tier, the phase it will be filled in, and a one-paragraph preview of what's coming. They link back to Home / Getting-Started / Command-Reference / Cookbook for navigation.

    Pages added:
    - Phase B (Command Reference, 13): Command-Reference, Selection-and-Inspection, Transform-and-Reshape, Aggregation-and-Statistics, Joins-and-Set-Ops, SQL-and-Polars, Validation-and-Schema, Conversion-and-IO, Geospatial, HTTP-and-Web, Scripting-Luau-Python, Indexing-Compression-Diff, AI-and-Documentation
    - Phase C (Cookbook recipes, 12): Recipe-Inspect-Unknown-CSV, Recipe-Clean-and-Normalize, Recipe-Geographic-Enrichment, Recipe-Date-Enrichment, Recipe-CKAN-Integration, Recipe-JSON-Schema-Validate, Recipe-Build-a-Data-Pipeline, Recipe-Stats-to-Insights, Recipe-Fetch-and-Cache, Recipe-Larger-than-RAM, Recipe-Diff-and-Audit, Recipe-Multi-Table-Joins
    - Phase D (Tuning + ecosystem, 8): Performance-Tuning, Environment-Variables, Stats-Cache-and-Caching, Lookup-Tables, Claude-Cowork-Plugin, MCP-Server, qsv-pro-Spotlight, Integrations
    - Phase E (Polish, 6): Troubleshooting, FAQ, Comparison, Glossary, External-Resources, Contributing-to-the-Wiki

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    jqnatividad committed May 13, 2026