Skip to content

Scripting Luau Python

Joel Natividad edited this page May 30, 2026 · 6 revisions

Scripting (Luau / Python)

Tier: Advanced Commands covered: luau, py, foreach, template

Note

Per-command flag reference lives in /docs/help/. This page is the workflow layer — when to reach for each command and how they compose.

When SQL, joins, and apply aren't enough, drop into a scripting language. qsv ships two embedded DSLs plus a shell-out command and a templating engine:

  • luau — qsv's flagship DSL, a Luau 0.722 interpreter with BEGIN/MAIN/END blocks, random-access mode, lookup-table helpers, and ~30 qsv-specific helper functions. Available in feature-capable and qsvdp builds.
  • py — Python expressions per row, including f-strings. Requires Python 3.10+ and the qsvpy311/312/313 binary variant.
  • foreach — shell out a command per row. The escape hatch when neither DSL fits.
  • template — MiniJinja render CSV rows into any text shape (HTML, Markdown, SQL, prompts, …).

For deep-dives: docs/INTERPRETERS.md plus the legacy Luau Development and Luau Helper Functions Examples wiki pages.

Quick decision table

If you want to… Use Notes
Per-row computation with lookup tables, BEGIN/MAIN/END, random access luau qsv's DSL; fast; works in feature_capable and qsvdp builds
Per-row computation in Python with f-strings and stdlib py Slower than Luau (no JIT); requires qsvpy* variant
Run any shell command once per row foreach The footgun: validate inputs!
Render rows into per-row files or a combined report template MiniJinja; supports register_lookup

luau

qsv's flagship DSL — a Luau 0.722 interpreter that runs a script for every (or any) row. It's fast (millions of rows/sec on simple expressions) and ships in feature_capable and qsvdp builds. Two subcommands:

  • map — add one or more computed columns
  • filter — keep rows where the script returns a truthy value

The result of a map script becomes the new column's value; a filter script's return decides whether the row survives. The END block's return goes to stderr, not the output CSV.

For the full flag list see /docs/help/luau.md; the subsections below are the workflow view. The examples use a small data.csv with letter,Amount columns (mirroring tests/test_luau.rs).

Referencing columns & emitting columns

A row's columns are reachable three ways:

  1. directly by nameAmount (disable with -g/--no-globals for speed or when a name collides with a Luau global)
  2. col by namecol.Amount or col["Total Balance"] (for names with spaces/punctuation)
  3. col by 1-based indexcol[1], col[2] (needs --colindex, auto-enabled by --no-headers)
# three equivalent ways to add columns a + b
qsv luau map c "a + b" data.csv
qsv luau map c "col.a + col['b']" data.csv
qsv luau map c --colindex "col[1] + col[2]" data.csv

Values arrive as strings; Luau coerces them in arithmetic, but use tonumber() to be explicit (and to parse for comparisons). Emit multiple columns by returning a Luau table:

qsv luau map newcol1,newcol2,newcol3 "{number + 1, number + 2, number + 3}" data.csv

-g/--no-globals keeps only col (faster, avoids shadowing Luau globals); -r/--remap writes only the listed new columns, dropping the originals (see the qsv_coalesce example below).

BEGIN / MAIN / END blocks

A script has three stages: BEGIN (once, before any row — init aggregators, load lookups), MAIN (per row — the map/filter body), and END (once, after all rows — emit aggregates to stderr). Supply BEGIN/END as separate scripts via -B/--begin and -E/--end, or embed them with the BEGIN { … }! / END { … }! syntax (each block must start at the beginning of a line):

# running total per row (MAIN) + grand total & min/max (END → stderr)
qsv luau map Total \
  'BEGIN { tot = 0; gtot = 0; amts = {} }!
   amts[_IDX] = Amount; tot = tot + Amount; gtot = gtot + tot; return tot
   END { return "Min/Max: " .. math.min(unpack(amts)) .. "/" .. math.max(unpack(amts)) ..
                " Grand total of " .. _ROWCOUNT .. " rows: " .. gtot }!' \
  data.csv
# stdout: the CSV with a Total column (13, 37, 109, 116)
# stderr: Min/Max: 7/72 Grand total of 4 rows: 275

The equivalent with separate flags:

qsv luau map Total \
  --begin "tot = 0; gtot = 0; amts = {}" \
  "amts[_IDX] = Amount; tot = tot + Amount; gtot = gtot + tot; return tot" \
  --end 'return "Grand total: " .. gtot' \
  data.csv

Special variables & random-access mode

Variable Access Meaning
_IDX read current row number in sequential mode (0 during BEGIN)
_ROWCOUNT read total rows — set in END (sequential), or from the start (random-access)
_INDEX read/write assigning to it jumps to that row → enables random-access mode
_LASTROW read last row number; referencing it also triggers random-access mode

In sequential mode the MAIN script runs once per row in order. Touching _INDEX or _LASTROW switches to random-access mode: MAIN re-runs for whatever row _INDEX points at, looping until _INDEX goes out of range (<= 0 or > _LASTROW). This needs a CSV index — create one with qsv index data.csv beforehand, or call qsv_autoindex() in BEGIN.

# walk the file backwards, accumulating a reverse running total
qsv index data.csv
qsv luau map rev_total \
  'BEGIN { _INDEX = _LASTROW; tot = 0 }!
   tot = tot + Amount;
   _INDEX = _INDEX - 1;
   return tot' \
  data.csv

filter

filter keeps a row when its script is truthy. A nil, false, or numeric 0/0.0 result drops the row; any other value (including negative numbers and non-empty strings) keeps it. So a bare tonumber(...) doubles as a "is non-zero number" filter:

qsv luau filter "tonumber(Amount) > 25" data.csv
qsv luau filter "tonumber(a) >= tonumber(b)" data.csv
# keeps rows whose `number` parses to a non-zero value (drops 0, 0.0, blanks)
qsv luau filter "tonumber(number)" data.csv

Loading scripts from files

Typing long scripts inline gets old fast. Any script argument (MAIN, --begin, --end) is loaded from disk when it starts with file: or ends in .lua/.luau:

qsv luau map Type -B init.lua file:debitcredit.script -E end.luau data.csv

The qsv repo ships ready-to-use scripts in docs/cookbook/lua/ (getquarter.lua, turnaroundtime.lua, …):

curl -LO https://raw.githubusercontent.com/dathere/qsv/master/docs/cookbook/lua/getquarter.lua
qsv luau map Quarter getquarter.lua NYC_311_SR_2010-2020-sample-1M.csv > with_quarter.csv

Helper functions

Beyond Luau's minimal stdlib, qsv preloads ~30 qsv_* helpers. The cumulative/window helpers each take an optional trailing name so several can run in parallel with independent state.

Helper Category What it does
qsv_cumsum(value[, name]) Window Running cumulative sum
qsv_cumprod(value[, name]) Window Running cumulative product
qsv_cummax(value[, name]) Window Running cumulative maximum
qsv_cummin(value[, name]) Window Running cumulative minimum
qsv_cumany(value[, name]) Window Cumulative logical OR — true once any value seen is truthy
qsv_cumall(value[, name]) Window Cumulative logical AND — true while every value seen is truthy
qsv_lag(value[, lag][, default][, name]) Window Value from lag rows back (default 1); default until enough rows seen
qsv_diff(value[, periods][, name]) Window Difference between the current value and the one periods rows back
qsv_accumulate(value, func, init[, name]) Window Fold rows with a custom func(acc, x) reducer
qsv_register_lookup(name, uri[, ttl]) Lookup Register a lookup table from a file, URL, dathere://, or ckan://
qsv_loadcsv(name, file, keycol) Lookup Load a local CSV into a table keyed by keycol
qsv_loadjson(name, file) Lookup Load a JSON object/array file into a Luau table
qsv_coalesce(arg1, …, argN) Value Return the first non-empty argument
qsv_break(msg) Control Stop processing immediately; msg goes to stderr
qsv_skip() Control Drop the current row from map output without stopping the run
qsv_insertrecord(col1, …, colN) Control Insert an extra row into the output
qsv_autoindex() Control Build a CSV index on the fly (enables random-access mode)
qsv_writefile(file, data) I/O Append data to a file ("_NEWFILE!" truncates/creates it first)
qsv_writejson(value, file, pretty) I/O Serialize a Luau value/table to a JSON file
qsv_cmd(args) I/O Run a qsv command; returns a table with .stdout / .stderr
qsv_shellcmd(cmd, args) I/O Run an allow-listed shell command
qsv_log(level, …) I/O Log at INFO/WARN/ERROR/DEBUG/TRACE (gated by QSV_LOG_LEVEL)
qsv_fileexists(path) I/O Return whether a file exists
qsv_sleep(ms) I/O Pause processing for ms milliseconds
qsv_getenv(var) / qsv_setenv(var, val) Env Read / set an environment variable

The cumulative/window helpers track state across rows for you — no manual accumulator needed:

# single running total
qsv luau map running_total "qsv_cumsum(amount)" data.csv

# several at once — each keyed by a name so their state stays separate
qsv luau map revenue_total,expenses_total,profit_total \
  '{ qsv_cumsum(revenue, "revenue"),
     qsv_cumsum(expenses, "expenses"),
     qsv_cumsum(tonumber(revenue) - tonumber(expenses), "profit") }' \
  finances.csv

# lag by 1 and 2 rows (default value "0" for the first rows)
qsv luau map lag1,lag2 '{ qsv_lag(value, 1, "0"), qsv_lag(value, 2, "0") }' data.csv

Lookup tables turn a key column into enriched fields. Register once in BEGIN, then index by key in MAIN:

qsv luau map state_capital \
  'BEGIN { qsv_register_lookup("us_states", "dathere://us-states-example.csv") }!
   return us_states[State] and us_states[State].Capital or "n/a"' \
  data.csv

qsv_loadcsv(table, file, keycol) is a lighter local-file variant, and qsv_loadjson(table, file) loads a JSON object/array into a Luau table:

qsv luau map description \
  'BEGIN { qsv_loadcsv("lookup", "descriptions.csv", "letter") }!
   return lookup[letter].description' \
  data.csv

Control flow: qsv_break("msg") stops processing early (the message goes to stderr); qsv_skip() drops the current row from map output without halting:

# stop the run as soon as an Amount exceeds 25
qsv luau map Total \
  'if tonumber(Amount) > 25 then qsv_break("over budget") end
   tot = (tot or 0) + Amount; return tot' \
  data.csv

qsv_coalesce returns its first non-empty argument — handy with --remap to merge fallback columns into one:

# overwrite name with name_right when present, keeping only id,name
qsv luau map id,name --remap '{ id, qsv_coalesce(name_right, name) }' data.csv

Pattern-matching gotcha

Luau's string.find / string.match / string.gsub use Lua patterns, where ( ) . % + - * ? [ ] ^ $ are special. To match them literally, either escape with % or pass true as the 4th arg to string.find for plain-text mode:

# escape the parentheses...
qsv luau map newpoints \
  "if string.find(Name, 'Jane %(Smith%)') then return 10 else return Points end" data.csv

# ...or use plain-text mode (4th param = true)
qsv luau map newpoints \
  "if string.find(Name, 'Jane (Smith)', 1, true) then return 10 else return Points end" data.csv

Dates & external libraries

qsv bundles LuaDate v2.2.1 (load with require "./date"); other modules load via Luau's require:

qsv luau map days_added \
  'local date = require "./date"
   return tostring(date(date_col):adddays(tonumber(number)):fmt("${iso}"))' \
  data.csv

Debugging tips

Use qsv_log("debug", ...) and set QSV_LOG_LEVEL=debug to trace a script (see docs/Logging.md). At debug level qsv skips bytecode precompilation so errors carry line numbers — but note qsv strips comments before executing, so match line numbers against the comment-stripped script in the log. A bad row yields <ERROR> (map) or is kept (filter); --max-errors (default 10, 0 to disable) bounds how many before aborting.

See also: /docs/help/luau.md, docs/INTERPRETERS.md, tests/test_luau.rs — runnable examples, Luau Development, Luau Helper Functions Examples, qsv-lookup-tables, qsv-recipes, Lookup Tables.

py

Python expression per row. Requires Python 3.10+ (matching the qsvpy311/312/313 binary you installed). The Python interpreter is dynamically linked — there's a 3-50× overhead vs Luau due to the GIL and call-per-row Python frame setup, but it's worth it when you need pandas-style expressions or a library only available in Python.

Column values are accessible four ways: as a local variable (spaces replaced with _), as col.colname, as col["colname"], or as col[N].

Example: integer arithmetic with f-strings

qsv py map balance_summary 'f"{int(amount):,.0f} ({col.tier})"' txns.csv

Example: regex normalization in Python (clearer than the equivalent replace chain)

qsv py map normalized_phone \
  'import re; re.sub(r"[^\d]", "", col["Phone"])[:10]' \
  contacts.csv

(Note: py does support import statements — see docs/INTERPRETERS.md for caveats and --helper for shared imports.)

Example: filter rows where Polars'-style logic doesn't fit

qsv py filter \
  'datetime.fromisoformat(col["created_at"]).weekday() < 5' \
  events.csv > weekday_events.csv

Example: shared helper file with imports

# helper.py
import re, datetime

def clean_phone(s):
    return re.sub(r"[^\d]", "", s)[:10]
qsv py map clean_phone 'clean_phone(col.Phone)' \
  --helper helper.py contacts.csv > clean.csv

See also: /docs/help/py.md, docs/INTERPRETERS.md, luau — typically 3-50× faster, Binary Variants — pick the qsvpy* matching your Python.

foreach

Execute a shell command once per row.

Caution

foreach runs an arbitrary shell command for every row, with the row's values interpolated into the command — so untrusted input can lead to command injection. Validate inputs aggressively, and prefer a built-in (apply, luau, py, template, replace) whenever one fits.

{} in the command is replaced by the value of the named column. {var} references another column by name.

Example: per-row shell pipeline (download then OCR)

qsv foreach pdf_url 'curl -s {} | pdftotext - -' documents.csv > combined.txt

Example: unify subcommand outputs into one CSV

qsv foreach query --unify 'qsv search --year 2020 {}' queries.csv > all_results.csv

--unify keeps headers from the first invocation and skips them on subsequent invocations.

Example: add a "source query" column to the unified output

qsv foreach query -u -c from_query 'qsv search {}' queries.csv > tagged.csv

Example: split + parallel processing pattern

# 1. Split the giant CSV by Borough into one file per borough
qsv partition Borough nyc311_by_borough --filename '{}.csv' NYC_311.csv

# 2. For each borough file, run a heavy command in parallel
ls nyc311_by_borough/*.csv > infile_list.csv
qsv foreach filename 'heavy-stats-script {}' infile_list.csv

Don't use foreach for things qsv already does in one shot — apply, luau, py, template, replace. Reach for it only when you genuinely need an external command.

See also: /docs/help/foreach.md, partition, split, template — for in-process per-row text rendering.

template

Render rows through a MiniJinja template. Either combine all rows into one stream or write one file per row.

Column names become variables (non-alphanumeric characters → _). qsv adds custom filters (format_float, human_count, pluralize, …) on top of MiniJinja's defaults plus minijinja-contrib.

Example: render a per-borough markdown summary

{# report.j2 #}
# {{ Borough }} — {{ year_month }}

- Total complaints: **{{ complaints | human_count }}**
- Most common type: {{ top_complaint }}

{% if complaints > 1000 -%}
  ⚠️ High volume — review prioritization.
{% endif %}
qsv template --template-file report.j2 \
  reports_outdir/ \
  --outfilename '{{ Borough }}_{{ year_month }}.md' \
  borough_summary.csv

--outfilename itself is a Jinja template — each row's filename is computed from its column values.

Example: render lookup-table-enriched form letters

{% set ok = register_lookup("us_states", "dathere://us-states-example.csv") -%}
Dear {{ first_name | title }} {{ last_name | title }},
You qualify for {{ us_states[us_state].program }} in {{ us_states[us_state].name }}.

Example: render to stdout (combined output)

qsv template --template-file rows.j2 - --output combined.html data.csv

Example: dynamically generate SQL from CSV-driven rules

SELECT * FROM events
WHERE {{ field }} {{ operator }} '{{ value }}';
qsv template --template-file rule.j2 - rules.csv | psql

See also: /docs/help/template.md, MiniJinja Templating — qsv's templating across commands, custom filters & tips, MiniJinja docs, fetchpost --payload-tpl — MiniJinja for HTTP POST bodies, Lookup Tables.

See also

Clone this wiki locally