Recipe Date Enrichment

Recipe: Date Enrichment

Tier: Intermediate Commands used: datefmt, luau, partition, apply Anchor dataset: NYC 311 Service Requests (1M-row sample, nyc311samp.csv)

This recipe expands the original Cookbook entry. The original snippets are preserved at the bottom of Cookbook for backwards compatibility.

Problem

NYC 311 service-request records have a Created Date like 12/31/2014 11:59:45 PM. For analytics you typically want:

ISO 8601 normalized timestamp
a Quarter column (Q1, Q2, …) for fiscal-year reporting
a Year column for partitioning
a Weekday column for "is this complaint a weekday or weekend phenomenon?"
Turnaround Time (TAT) — how long between Created Date and Closed Date?
partitioned output: one file per quarter for parallel downstream processing

All of this in one qsv pipeline.

Data

curl -LO https://raw.githubusercontent.com/wiki/dathere/qsv/files/nyc311samp.csv
ls -lh nyc311samp.csv
qsv headers nyc311samp.csv

Key columns: Created Date, Closed Date, Borough, Complaint Type, Status, Resolution Action Updated Date.

For larger experiments, also try the 1M-row bundled sample: resources/test/NYC_311_SR_2010-2020-sample-1M.csv.

Solution

1. Normalize `Created Date` to ISO 8601

qsv datefmt 'Created Date' nyc311samp.csv > step1.csv

datefmt recognizes 19 input formats automatically. The default output is RFC 3339 (2014-12-31T23:59:45Z).

2. Add `Year` and `Year-Month` columns

qsv datefmt 'Created Date' --formatstr '%Y' --new-column Year step1.csv > step2.csv
qsv datefmt 'Created Date' --formatstr '%Y-%m' --new-column YearMonth step2.csv > step3.csv

--formatstr takes chrono strftime specifiers.

3. Add `Weekday` and `Quarter` columns

qsv datefmt 'Created Date' --formatstr '%A' --new-column Weekday step3.csv > step4.csv

%A is the full weekday name. For numeric quarter, use a small Luau script (since strftime doesn't have a quarter specifier):

-- getquarter.lua (also at docs/cookbook/lua/getquarter.lua)
local month = tonumber(string.sub(_['Created Date'], 6, 7))  -- ISO YYYY-MM
local q = math.ceil(month / 3)
return "Q" .. q

qsv luau map Quarter -x -f getquarter.lua step4.csv > step5.csv

The -x flag enables global column variables (so _['Created Date'] works).

4. Compute `Turnaround Time` (TAT)

# turnaroundtime.lua: returns TAT in days, or "" if Closed Date is blank
local fmt = "!yyyy-MM-dd'T'HH:mm:ssX"   -- qsv's normalized ISO
local opened = _['Created Date']
local closed = _['Closed Date']
if closed == "" then return "" end
return tostring(
  (qsv_parse_date(closed, fmt) - qsv_parse_date(opened, fmt)) / 86400
)

qsv luau map TAT -x -f turnaroundtime.lua step5.csv > step6.csv

Both scripts are bundled at docs/cookbook/lua/ — use them as starting points.

5. Partition into one file per quarter

qsv partition Quarter nyc311byqtr/ step6.csv
ls nyc311byqtr/
# Q1.csv  Q2.csv  Q3.csv  Q4.csv

Now you have four files for parallel downstream processing.

Full pipeline (one shot)

qsv datefmt 'Created Date' nyc311samp.csv \
  | qsv datefmt 'Created Date' --formatstr '%Y' --new-column Year \
  | qsv datefmt 'Created Date' --formatstr '%Y-%m' --new-column YearMonth \
  | qsv datefmt 'Created Date' --formatstr '%A' --new-column Weekday \
  | qsv luau map Quarter -x -f getquarter.lua \
  | qsv luau map TAT -x -f turnaroundtime.lua \
  > nyc311_enriched.csv

# Then partition
qsv partition Quarter nyc311byqtr/ nyc311_enriched.csv

Variations

Average TAT per Borough, in Brooklyn only

qsv search --select Borough -i brooklyn nyc311samp.csv \
  | qsv luau map TAT -x -f turnaroundtime.lua \
  | tee brooklyn-311-details.csv \
  | qsv stats --everything > brooklyn-tat-stats.csv

tee writes the intermediate to a file and pipes the same data to stats for aggregation.

Cast all `*_date` columns to ISO 8601 in one pass

qsv datefmt '/(?i)_date$/' nyc311samp.csv > all_dates_iso.csv

The regex column selector matches any column whose name ends in _date (case-insensitive).

Use a SQL-style aggregation instead

qsv sqlp nyc311samp.csv \
  "SELECT
     strftime('%Y-Q', \"Created Date\") || ((strftime('%m', \"Created Date\") - 1) / 3 + 1) AS quarter,
     Borough,
     COUNT(*) AS complaints,
     AVG(julianday(\"Closed Date\") - julianday(\"Created Date\")) AS avg_tat_days
   FROM nyc311samp
   WHERE \"Closed Date\" != ''
   GROUP BY quarter, Borough
   ORDER BY quarter, complaints DESC"

(Adjust strftime syntax for Polars SQL or DuckDB depending on the engine; see SQL & Polars.)

Filter business-hours-only events

qsv luau filter \
  "local h = tonumber(string.sub(_['Created Date'], 12, 13)); return h >= 9 and h <= 17" \
  nyc311_enriched.csv > business_hours.csv

Performance notes

datefmt is multithreaded (🚀 in the README legend) — no index required.
luau is single-process but the interpreter is fast (millions of rows/sec on simple expressions).
On the full 27M-row NYC 311 export, the whole pipeline above takes ~2 minutes on an M2 Pro with an index.
For a no-index NYC 311 1M-sample run: ~6 seconds.

Recipe Date Enrichment

Recipe: Date Enrichment

Problem

Data

Solution

1. Normalize Created Date to ISO 8601

2. Add Year and Year-Month columns

3. Add Weekday and Quarter columns

4. Compute Turnaround Time (TAT)

5. Partition into one file per quarter

Full pipeline (one shot)

Variations

Average TAT per Borough, in Brooklyn only

Cast all *_date columns to ISO 8601 in one pass

Use a SQL-style aggregation instead

Filter business-hours-only events

Performance notes

See also

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Get Started

Command Reference

Cookbook

Tuning & Internals

Ecosystem

Reference

Legacy

Clone this wiki locally

1. Normalize `Created Date` to ISO 8601

2. Add `Year` and `Year-Month` columns

3. Add `Weekday` and `Quarter` columns

4. Compute `Turnaround Time` (TAT)

Cast all `*_date` columns to ISO 8601 in one pass