-
Notifications
You must be signed in to change notification settings - Fork 104
Scripting Luau Python
Tier: Advanced
Commands covered: luau, py, foreach, template
Note
Per-command flag reference lives in /docs/help/. This page is the workflow layer — when to reach for each command and how they compose.
When SQL, joins, and apply aren't enough, drop into a scripting language. qsv ships two embedded DSLs plus a shell-out command and a templating engine:
-
luau— qsv's flagship DSL, a Luau 0.722 interpreter with BEGIN/MAIN/END blocks, random-access mode, lookup-table helpers, and ~30 qsv-specific helper functions. Available in feature-capable andqsvdpbuilds. -
py— Python expressions per row, including f-strings. Requires Python 3.10+ and theqsvpy311/312/313binary variant. -
foreach— shell out a command per row. The escape hatch when neither DSL fits. -
template— MiniJinja render CSV rows into any text shape (HTML, Markdown, SQL, prompts, …).
For deep-dives: docs/INTERPRETERS.md plus the legacy Luau Development and Luau Helper Functions Examples wiki pages.
| If you want to… | Use | Notes |
|---|---|---|
| Per-row computation with lookup tables, BEGIN/MAIN/END, random access | luau |
qsv's DSL; fast; works in feature_capable and qsvdp builds |
| Per-row computation in Python with f-strings and stdlib | py |
Slower than Luau (no JIT); requires qsvpy* variant |
| Run any shell command once per row | foreach |
The footgun: validate inputs! |
| Render rows into per-row files or a combined report | template |
MiniJinja; supports register_lookup
|
qsv's flagship DSL — a Luau 0.722 interpreter that runs a script for every (or any) row. It's fast (millions of rows/sec on simple expressions) and ships in feature_capable and qsvdp builds. Two subcommands:
-
map— add one or more computed columns -
filter— keep rows where the script returns a truthy value
The result of a map script becomes the new column's value; a filter script's return decides whether the row survives. The END block's return goes to stderr, not the output CSV.
For the full flag list see
/docs/help/luau.md; the subsections below are the workflow view. The examples use a smalldata.csvwithletter,Amountcolumns (mirroringtests/test_luau.rs).
A row's columns are reachable three ways:
-
directly by name —
Amount(disable with-g/--no-globalsfor speed or when a name collides with a Luau global) -
colby name —col.Amountorcol["Total Balance"](for names with spaces/punctuation) -
colby 1-based index —col[1],col[2](needs--colindex, auto-enabled by--no-headers)
# three equivalent ways to add columns a + b
qsv luau map c "a + b" data.csv
qsv luau map c "col.a + col['b']" data.csv
qsv luau map c --colindex "col[1] + col[2]" data.csvValues arrive as strings; Luau coerces them in arithmetic, but use tonumber() to be explicit (and to parse for comparisons). Emit multiple columns by returning a Luau table:
qsv luau map newcol1,newcol2,newcol3 "{number + 1, number + 2, number + 3}" data.csv-g/--no-globals keeps only col (faster, avoids shadowing Luau globals); -r/--remap writes only the listed new columns, dropping the originals (see the qsv_coalesce example below).
A script has three stages: BEGIN (once, before any row — init aggregators, load lookups), MAIN (per row — the map/filter body), and END (once, after all rows — emit aggregates to stderr). Supply BEGIN/END as separate scripts via -B/--begin and -E/--end, or embed them with the BEGIN { … }! / END { … }! syntax (each block must start at the beginning of a line):
# running total per row (MAIN) + grand total & min/max (END → stderr)
qsv luau map Total \
'BEGIN { tot = 0; gtot = 0; amts = {} }!
amts[_IDX] = Amount; tot = tot + Amount; gtot = gtot + tot; return tot
END { return "Min/Max: " .. math.min(unpack(amts)) .. "/" .. math.max(unpack(amts)) ..
" Grand total of " .. _ROWCOUNT .. " rows: " .. gtot }!' \
data.csv
# stdout: the CSV with a Total column (13, 37, 109, 116)
# stderr: Min/Max: 7/72 Grand total of 4 rows: 275The equivalent with separate flags:
qsv luau map Total \
--begin "tot = 0; gtot = 0; amts = {}" \
"amts[_IDX] = Amount; tot = tot + Amount; gtot = gtot + tot; return tot" \
--end 'return "Grand total: " .. gtot' \
data.csv| Variable | Access | Meaning |
|---|---|---|
_IDX |
read | current row number in sequential mode (0 during BEGIN) |
_ROWCOUNT |
read | total rows — set in END (sequential), or from the start (random-access) |
_INDEX |
read/write | assigning to it jumps to that row → enables random-access mode |
_LASTROW |
read | last row number; referencing it also triggers random-access mode |
In sequential mode the MAIN script runs once per row in order. Touching _INDEX or _LASTROW switches to random-access mode: MAIN re-runs for whatever row _INDEX points at, looping until _INDEX goes out of range (<= 0 or > _LASTROW). This needs a CSV index — create one with qsv index data.csv beforehand, or call qsv_autoindex() in BEGIN.
# walk the file backwards, accumulating a reverse running total
qsv index data.csv
qsv luau map rev_total \
'BEGIN { _INDEX = _LASTROW; tot = 0 }!
tot = tot + Amount;
_INDEX = _INDEX - 1;
return tot' \
data.csvfilter keeps a row when its script is truthy. A nil, false, or numeric 0/0.0 result drops the row; any other value (including negative numbers and non-empty strings) keeps it. So a bare tonumber(...) doubles as a "is non-zero number" filter:
qsv luau filter "tonumber(Amount) > 25" data.csv
qsv luau filter "tonumber(a) >= tonumber(b)" data.csv
# keeps rows whose `number` parses to a non-zero value (drops 0, 0.0, blanks)
qsv luau filter "tonumber(number)" data.csvTyping long scripts inline gets old fast. Any script argument (MAIN, --begin, --end) is loaded from disk when it starts with file: or ends in .lua/.luau:
qsv luau map Type -B init.lua file:debitcredit.script -E end.luau data.csvThe qsv repo ships ready-to-use scripts in docs/cookbook/lua/ (getquarter.lua, turnaroundtime.lua, …):
curl -LO https://raw.githubusercontent.com/dathere/qsv/master/docs/cookbook/lua/getquarter.lua
qsv luau map Quarter getquarter.lua NYC_311_SR_2010-2020-sample-1M.csv > with_quarter.csvBeyond Luau's minimal stdlib, qsv preloads ~30 qsv_* helpers. The cumulative/window helpers each take an optional trailing name so several can run in parallel with independent state.
| Helper | Category | What it does |
|---|---|---|
qsv_cumsum(value[, name]) |
Window | Running cumulative sum |
qsv_cumprod(value[, name]) |
Window | Running cumulative product |
qsv_cummax(value[, name]) |
Window | Running cumulative maximum |
qsv_cummin(value[, name]) |
Window | Running cumulative minimum |
qsv_cumany(value[, name]) |
Window | Cumulative logical OR — true once any value seen is truthy |
qsv_cumall(value[, name]) |
Window | Cumulative logical AND — true while every value seen is truthy |
qsv_lag(value[, lag][, default][, name]) |
Window | Value from lag rows back (default 1); default until enough rows seen |
qsv_diff(value[, periods][, name]) |
Window | Difference between the current value and the one periods rows back |
qsv_accumulate(value, func, init[, name]) |
Window | Fold rows with a custom func(acc, x) reducer |
qsv_register_lookup(name, uri[, ttl]) |
Lookup | Register a lookup table from a file, URL, dathere://, or ckan://
|
qsv_loadcsv(name, file, keycol) |
Lookup | Load a local CSV into a table keyed by keycol
|
qsv_loadjson(name, file) |
Lookup | Load a JSON object/array file into a Luau table |
qsv_coalesce(arg1, …, argN) |
Value | Return the first non-empty argument |
qsv_break(msg) |
Control | Stop processing immediately; msg goes to stderr |
qsv_skip() |
Control | Drop the current row from map output without stopping the run |
qsv_insertrecord(col1, …, colN) |
Control | Insert an extra row into the output |
qsv_autoindex() |
Control | Build a CSV index on the fly (enables random-access mode) |
qsv_writefile(file, data) |
I/O | Append data to a file ("_NEWFILE!" truncates/creates it first) |
qsv_writejson(value, file, pretty) |
I/O | Serialize a Luau value/table to a JSON file |
qsv_cmd(args) |
I/O | Run a qsv command; returns a table with .stdout / .stderr
|
qsv_shellcmd(cmd, args) |
I/O | Run an allow-listed shell command |
qsv_log(level, …) |
I/O | Log at INFO/WARN/ERROR/DEBUG/TRACE (gated by QSV_LOG_LEVEL) |
qsv_fileexists(path) |
I/O | Return whether a file exists |
qsv_sleep(ms) |
I/O | Pause processing for ms milliseconds |
qsv_getenv(var) / qsv_setenv(var, val)
|
Env | Read / set an environment variable |
The cumulative/window helpers track state across rows for you — no manual accumulator needed:
# single running total
qsv luau map running_total "qsv_cumsum(amount)" data.csv
# several at once — each keyed by a name so their state stays separate
qsv luau map revenue_total,expenses_total,profit_total \
'{ qsv_cumsum(revenue, "revenue"),
qsv_cumsum(expenses, "expenses"),
qsv_cumsum(tonumber(revenue) - tonumber(expenses), "profit") }' \
finances.csv
# lag by 1 and 2 rows (default value "0" for the first rows)
qsv luau map lag1,lag2 '{ qsv_lag(value, 1, "0"), qsv_lag(value, 2, "0") }' data.csvLookup tables turn a key column into enriched fields. Register once in BEGIN, then index by key in MAIN:
qsv luau map state_capital \
'BEGIN { qsv_register_lookup("us_states", "dathere://us-states-example.csv") }!
return us_states[State] and us_states[State].Capital or "n/a"' \
data.csvqsv_loadcsv(table, file, keycol) is a lighter local-file variant, and qsv_loadjson(table, file) loads a JSON object/array into a Luau table:
qsv luau map description \
'BEGIN { qsv_loadcsv("lookup", "descriptions.csv", "letter") }!
return lookup[letter].description' \
data.csvControl flow: qsv_break("msg") stops processing early (the message goes to stderr); qsv_skip() drops the current row from map output without halting:
# stop the run as soon as an Amount exceeds 25
qsv luau map Total \
'if tonumber(Amount) > 25 then qsv_break("over budget") end
tot = (tot or 0) + Amount; return tot' \
data.csvqsv_coalesce returns its first non-empty argument — handy with --remap to merge fallback columns into one:
# overwrite name with name_right when present, keeping only id,name
qsv luau map id,name --remap '{ id, qsv_coalesce(name_right, name) }' data.csvLuau's string.find / string.match / string.gsub use Lua patterns, where ( ) . % + - * ? [ ] ^ $ are special. To match them literally, either escape with % or pass true as the 4th arg to string.find for plain-text mode:
# escape the parentheses...
qsv luau map newpoints \
"if string.find(Name, 'Jane %(Smith%)') then return 10 else return Points end" data.csv
# ...or use plain-text mode (4th param = true)
qsv luau map newpoints \
"if string.find(Name, 'Jane (Smith)', 1, true) then return 10 else return Points end" data.csvqsv bundles LuaDate v2.2.1 (load with require "./date"); other modules load via Luau's require:
qsv luau map days_added \
'local date = require "./date"
return tostring(date(date_col):adddays(tonumber(number)):fmt("${iso}"))' \
data.csvUse qsv_log("debug", ...) and set QSV_LOG_LEVEL=debug to trace a script (see docs/Logging.md). At debug level qsv skips bytecode precompilation so errors carry line numbers — but note qsv strips comments before executing, so match line numbers against the comment-stripped script in the log. A bad row yields <ERROR> (map) or is kept (filter); --max-errors (default 10, 0 to disable) bounds how many before aborting.
See also: /docs/help/luau.md, docs/INTERPRETERS.md, tests/test_luau.rs — runnable examples, Luau Development, Luau Helper Functions Examples, qsv-lookup-tables, qsv-recipes, Lookup Tables.
Python expression per row. Requires Python 3.10+ (matching the qsvpy311/312/313 binary you installed). The Python interpreter is dynamically linked — there's a 3-50× overhead vs Luau due to the GIL and call-per-row Python frame setup, but it's worth it when you need pandas-style expressions or a library only available in Python.
Column values are accessible four ways: as a local variable (spaces replaced with _), as col.colname, as col["colname"], or as col[N].
qsv py map balance_summary 'f"{int(amount):,.0f} ({col.tier})"' txns.csvqsv py map normalized_phone \
'import re; re.sub(r"[^\d]", "", col["Phone"])[:10]' \
contacts.csv(Note: py does support import statements — see docs/INTERPRETERS.md for caveats and --helper for shared imports.)
qsv py filter \
'datetime.fromisoformat(col["created_at"]).weekday() < 5' \
events.csv > weekday_events.csv# helper.py
import re, datetime
def clean_phone(s):
return re.sub(r"[^\d]", "", s)[:10]qsv py map clean_phone 'clean_phone(col.Phone)' \
--helper helper.py contacts.csv > clean.csvSee also: /docs/help/py.md, docs/INTERPRETERS.md, luau — typically 3-50× faster, Binary Variants — pick the qsvpy* matching your Python.
Execute a shell command once per row.
Caution
foreach runs an arbitrary shell command for every row, with the row's values interpolated into the command — so untrusted input can lead to command injection. Validate inputs aggressively, and prefer a built-in (apply, luau, py, template, replace) whenever one fits.
{} in the command is replaced by the value of the named column. {var} references another column by name.
qsv foreach pdf_url 'curl -s {} | pdftotext - -' documents.csv > combined.txtqsv foreach query --unify 'qsv search --year 2020 {}' queries.csv > all_results.csv--unify keeps headers from the first invocation and skips them on subsequent invocations.
qsv foreach query -u -c from_query 'qsv search {}' queries.csv > tagged.csv# 1. Split the giant CSV by Borough into one file per borough
qsv partition Borough nyc311_by_borough --filename '{}.csv' NYC_311.csv
# 2. For each borough file, run a heavy command in parallel
ls nyc311_by_borough/*.csv > infile_list.csv
qsv foreach filename 'heavy-stats-script {}' infile_list.csvDon't use foreach for things qsv already does in one shot — apply, luau, py, template, replace. Reach for it only when you genuinely need an external command.
See also: /docs/help/foreach.md, partition, split, template — for in-process per-row text rendering.
Render rows through a MiniJinja template. Either combine all rows into one stream or write one file per row.
Column names become variables (non-alphanumeric characters → _). qsv adds custom filters (format_float, human_count, pluralize, …) on top of MiniJinja's defaults plus minijinja-contrib.
{# report.j2 #}
# {{ Borough }} — {{ year_month }}
- Total complaints: **{{ complaints | human_count }}**
- Most common type: {{ top_complaint }}
{% if complaints > 1000 -%}
⚠️ High volume — review prioritization.
{% endif %}qsv template --template-file report.j2 \
reports_outdir/ \
--outfilename '{{ Borough }}_{{ year_month }}.md' \
borough_summary.csv--outfilename itself is a Jinja template — each row's filename is computed from its column values.
{% set ok = register_lookup("us_states", "dathere://us-states-example.csv") -%}
Dear {{ first_name | title }} {{ last_name | title }},
You qualify for {{ us_states[us_state].program }} in {{ us_states[us_state].name }}.qsv template --template-file rows.j2 - --output combined.html data.csvSELECT * FROM events
WHERE {{ field }} {{ operator }} '{{ value }}';qsv template --template-file rule.j2 - rules.csv | psqlSee also: /docs/help/template.md, MiniJinja Templating — qsv's templating across commands, custom filters & tips, MiniJinja docs, fetchpost --payload-tpl — MiniJinja for HTTP POST bodies, Lookup Tables.
- Command Reference (index)
- Aggregation & Statistics → frequency --stats-filter — Luau expressions on the stats cache
-
docs/INTERPRETERS.md— Luau and Python build details - Luau Development — legacy wiki page (preserved)
- Luau Helper Functions Examples — legacy wiki page (preserved)
- qsv-lookup-tables — companion repo for lookup CSVs
- qsv-recipes — community Luau scripts
-
Lookup Tables —
dynamicEnum/register_lookupdeep-dive
qsv — GitHub · Releases · Discussions · qsv pro · Try it online · Benchmarks · datHere · DeepWiki · Dual-licensed MIT / Unlicense
Edit this page: Contributing to the Wiki
Home · Why qsv? · Tier legend
- All Commands (index)
- Selection & Inspection
- Transform & Reshape
- Aggregation & Statistics
- Joins & Set Ops
- SQL & Polars
- Validation & Schema
- Metadata Profiling (profile)
- Conversion & I/O
- Geospatial
- HTTP & Web
- Get & Disk Cache
- Scripting (Luau / Python)
- Indexing, Compression & Diff
- AI & Documentation