Recipe: Date Enrichment
Tier: Intermediate
Commands used: datefmt, luau, partition, apply
Anchor dataset: NYC 311 Service Requests (1M-row sample, nyc311samp.csv)
This recipe expands the original Cookbook entry. The original snippets are preserved at the bottom of Cookbook for backwards compatibility.
NYC 311 service-request records have a Created Date like 12/31/2014 11:59:45 PM. For analytics you typically want:
- ISO 8601 normalized timestamp
- a Quarter column (Q1, Q2, …) for fiscal-year reporting
- a Year column for partitioning
- a Weekday column for "is this complaint a weekday or weekend phenomenon?"
- Turnaround Time (TAT): how long between Created Date and Closed Date?
- partitioned output: one file per quarter for parallel downstream processing
All of this in one qsv pipeline.
curl -LO https://raw.githubusercontent.com/wiki/dathere/qsv/files/nyc311samp.csv
ls -lh nyc311samp.csv
qsv headers nyc311samp.csv

Key columns: Created Date, Closed Date, Borough, Complaint Type, Status, Resolution Action Updated Date.
For larger experiments, also try the 1M-row bundled sample: resources/test/NYC_311_SR_2010-2020-sample-1M.csv.
qsv datefmt 'Created Date' nyc311samp.csv > step1.csv

datefmt recognizes 19 input formats automatically. The default output is RFC 3339 (2014-12-31T23:59:45Z).
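To make the normalization concrete, here is a minimal sketch of the same conversion for one value, written with awk. It is an illustration only, not how datefmt is implemented: `to_iso` is a hypothetical helper, it handles only the `MM/DD/YYYY hh:mm:ss AM|PM` shape, and it assumes the value can be labeled UTC (the trailing `Z`), as in the example output above.

```shell
# Hypothetical helper (not part of qsv): convert one US-style 12-hour
# timestamp to an RFC 3339-style string, assuming the value is UTC.
to_iso() {
  printf '%s\n' "$1" | awk '{
    split($1, d, "/")            # d[1]=month, d[2]=day, d[3]=year
    split($2, t, ":")            # t[1]=hour, t[2]=minute, t[3]=second
    h = t[1] % 12                # 12 AM and 12 PM both start from 0
    if ($3 == "PM") h += 12      # afternoon hours shift by 12
    printf "%04d-%02d-%02dT%02d:%02d:%02dZ\n", d[3], d[1], d[2], h, t[2], t[3]
  }'
}

to_iso '12/31/2014 11:59:45 PM'   # prints 2014-12-31T23:59:45Z
```

Comparing a handful of rows from step1.csv against a check like this is a quick way to confirm the AM/PM handling before running the rest of the pipeline.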
qsv datefmt 'Created Date' --formatstr '%Y' --new-column Year step1.csv > step2.csv
qsv datefmt 'Created Date' --formatstr '%Y-%m' --new-column YearMonth step2.csv > step3.csv

--formatstr takes chrono strftime specifiers.
qsv datefmt 'Created Date' --formatstr '%A' --new-column Weekday step3.csv > step4.csv

%A is the full weekday name. For a numeric quarter, use a small Luau script, since strftime has no quarter specifier:
-- getquarter.lua (also at docs/cookbook/lua/getquarter.lua)
local month = tonumber(string.sub(_['Created Date'], 6, 7)) -- ISO YYYY-MM
local q = math.ceil(month / 3)
return "Q" .. q

qsv luau map Quarter -x -f getquarter.lua step4.csv > step5.csv

The -x flag enables global column variables (so _['Created Date'] works).
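The month-to-quarter mapping in getquarter.lua can be checked outside qsv. The sketch below mirrors its logic in awk: take characters 6-7 of the ISO timestamp as the month, then round `month / 3` up. `quarter_of` is a hypothetical helper name, and the input is assumed to already be in `YYYY-MM-...` form, as it is after the first datefmt step.

```shell
# Hypothetical helper mirroring getquarter.lua: ISO timestamp in, "Qn" out.
quarter_of() {
  printf '%s\n' "$1" | awk '{
    month = substr($0, 6, 2) + 0       # characters 6-7 hold the month
    print "Q" int((month + 2) / 3)     # integer form of ceil(month / 3)
  }'
}

quarter_of '2014-12-31T23:59:45Z'   # prints Q4
quarter_of '2015-04-01T00:00:00Z'   # prints Q2
```

If Created Date has not been normalized first, characters 6-7 are not the month and the result is meaningless, which is why the quarter step runs after datefmt.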
-- turnaroundtime.lua: returns TAT in days, or "" if Closed Date is blank
local fmt = "!yyyy-MM-dd'T'HH:mm:ssX" -- qsv's normalized ISO
local opened = _['Created Date']
local closed = _['Closed Date']
if closed == "" then return "" end
return tostring(
(qsv_parse_date(closed, fmt) - qsv_parse_date(opened, fmt)) / 86400
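)

qsv luau map TAT -x -f turnaroundtime.lua step5.csv > step6.csv

The script parses both columns with the same format string, so Closed Date must be normalized the same way as Created Date before this step. Both scripts are bundled at docs/cookbook/lua/; use them as starting points.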
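The TAT arithmetic is simply the difference of two epoch timestamps divided by 86400 seconds per day. The following sketch reproduces that calculation in awk so individual rows can be spot-checked. It does not use qsv_parse_date: `tat_days` is a hypothetical helper, and it assumes both arguments are UTC timestamps shaped like `YYYY-MM-DDTHH:MM:SSZ`.

```shell
# Hypothetical helper: turnaround time in days between two ISO UTC timestamps.
# Prints an empty line when the closing timestamp is blank, like the Luau script.
tat_days() {
  awk -v opened="$1" -v closed="$2" '
    # Days since 1970-01-01 for a Gregorian calendar date (years >= 1).
    function days(y, m, d,    era, yoe, mp, doy) {
      if (m <= 2) y--                        # count January/February with the prior year
      era = int(y / 400)                     # 400-year cycles of 146097 days
      yoe = y - era * 400
      mp  = (m + 9) % 12                     # March = 0 ... February = 11
      doy = int((153 * mp + 2) / 5) + d - 1
      return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy - 719468
    }
    # Seconds since the epoch for a YYYY-MM-DDTHH:MM:SSZ string.
    function epoch(ts,    dd) {
      dd = days(substr(ts, 1, 4) + 0, substr(ts, 6, 2) + 0, substr(ts, 9, 2) + 0)
      return dd * 86400 + substr(ts, 12, 2) * 3600 + substr(ts, 15, 2) * 60 + substr(ts, 18, 2)
    }
    BEGIN {
      if (closed == "") { print ""; exit }
      tat = (epoch(closed) - epoch(opened)) / 86400
      print tat
    }'
}

tat_days '2014-12-31T23:59:45Z' '2015-01-02T23:59:45Z'   # prints 2
tat_days '2015-01-01T00:00:00Z' '2015-01-01T12:00:00Z'   # prints 0.5
```

A negative result from a check like this usually points at a Closed Date that precedes Created Date in the source data, which is worth filtering before computing averages.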
qsv partition Quarter nyc311byqtr/ step6.csv
ls nyc311byqtr/
# Q1.csv Q2.csv Q3.csv Q4.csv

Now you have four files for parallel downstream processing.
qsv datefmt 'Created Date' nyc311samp.csv \
| qsv datefmt 'Created Date' --formatstr '%Y' --new-column Year \
| qsv datefmt 'Created Date' --formatstr '%Y-%m' --new-column YearMonth \
| qsv datefmt 'Created Date' --formatstr '%A' --new-column Weekday \
| qsv luau map Quarter -x -f getquarter.lua \
| qsv luau map TAT -x -f turnaroundtime.lua \
> nyc311_enriched.csv
# Then partition
qsv partition Quarter nyc311byqtr/ nyc311_enriched.csv

qsv search --select Borough -i brooklyn nyc311samp.csv \
| qsv luau map TAT -x -f turnaroundtime.lua \
| tee brooklyn-311-details.csv \
| qsv stats --everything > brooklyn-tat-stats.csv

tee writes the intermediate to a file and pipes the same data to stats for aggregation.
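The tee pattern is easy to try in isolation. This minimal sketch uses a three-line inline CSV and awk in place of the qsv stages; the file name `details.csv` and the variables `tee_dir` and `row_count` are made up for the demonstration.

```shell
# Illustration only: tee saves the stream to a file and forwards it unchanged.
tee_dir=$(mktemp -d)

row_count=$(printf 'Borough,TAT\nBROOKLYN,1.5\nBROOKLYN,3\n' \
  | tee "$tee_dir/details.csv" \
  | awk 'NR > 1 { n++ } END { print n + 0 }')   # count data rows, skipping the header

echo "$row_count"              # prints 2
cat "$tee_dir/details.csv"     # the full CSV, header included, was also saved
```

Because tee forwards the stream as it arrives, the saved file and the downstream aggregate come from a single pass over the data.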
qsv datefmt '/(?i)_date$/' nyc311samp.csv > all_dates_iso.csv

The regex column selector matches any column whose name ends in _date (case-insensitive).
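Before relying on a regex selector, it can help to test the pattern against the header names. The sketch below does that with `grep -iE`, where `-i` stands in for the inline `(?i)` flag that grep's extended syntax does not support. `matching_headers` is a hypothetical helper, and the header list is made up for the demonstration. Note that a name written with a space, such as Created Date, does not end in `_date`; a pattern like `[ _]date$` would cover both spellings.

```shell
# Hypothetical helper: print the header names (one per line on stdin)
# that match a case-insensitive extended regex.
matching_headers() {
  grep -iE "$1"
}

headers='Created Date
closed_date
Due_Date
Borough'

printf '%s\n' "$headers" | matching_headers '_date$'
# prints closed_date and Due_Date, but not "Created Date"

printf '%s\n' "$headers" | matching_headers '[ _]date$'
# prints Created Date, closed_date and Due_Date
```

Running `qsv headers` on the real file and piping the names through a check like this shows which columns a selector would pick up.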
qsv sqlp nyc311samp.csv \
"SELECT
strftime('%Y-Q', \"Created Date\") || ((strftime('%m', \"Created Date\") - 1) / 3 + 1) AS quarter,
Borough,
COUNT(*) AS complaints,
AVG(julianday(\"Closed Date\") - julianday(\"Created Date\")) AS avg_tat_days
FROM nyc311samp
WHERE \"Closed Date\" != ''
GROUP BY quarter, Borough
ORDER BY quarter, complaints DESC"

(Adjust strftime syntax for Polars SQL or DuckDB depending on the engine; see SQL & Polars.)
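One detail to watch when porting the quarter expression between engines: `(month - 1) / 3 + 1` yields 1 to 4 only when the division truncates to an integer. The sketch below shows the difference using awk, whose `/` is floating-point, so the truncation has to be written explicitly. `sql_quarter` is a hypothetical helper, and whether a given SQL engine truncates here is something to verify for that engine.

```shell
# Hypothetical helper: quarter number from a month, with explicit truncation.
sql_quarter() {
  awk -v m="$1" 'BEGIN { print int((m - 1) / 3) + 1 }'
}

sql_quarter 05    # prints 2 (zero-padded month text is coerced to a number)
sql_quarter 12    # prints 4

# Without truncation the same expression is not a whole quarter number:
awk 'BEGIN { print (5 - 1) / 3 + 1 }'    # prints 2.33333
```

If an engine returns fractional quarters, wrapping the division in that engine's floor or integer-cast function restores the intended grouping.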
qsv luau filter \
"local h = tonumber(string.sub(_['Created Date'], 12, 13)); return h >= 9 and h <= 17" \
nyc311_enriched.csv > business_hours.csv

- datefmt is multithreaded (🚀 in the README legend); no index required.
- luau is single-process but the interpreter is fast (millions of rows/sec on simple expressions).
- On the full 27M-row NYC 311 export, the whole pipeline above takes ~2 minutes on an M2 Pro with an index.
- For a no-index NYC 311 1M-sample run: ~6 seconds.
- Transform & Reshape → datefmt
- Scripting (Luau / Python) — Luau details
- docs/cookbook/lua/getquarter.lua
- docs/cookbook/lua/turnaroundtime.lua
- qsv-recipes: more community Luau scripts
- Joins & Set Ops → partition
- Recipe: Stats → Insights — after enrichment, derive aggregates
- Recipe: Multi-Table Joins — join the enriched output to weather (asof join)
- Cookbook (legacy) — original short snippets
qsv — GitHub · Releases · Discussions · qsv pro · Try it online · Benchmarks · datHere · DeepWiki · Dual-licensed MIT / Unlicense