-
Notifications
You must be signed in to change notification settings - Fork 104
Lookup Tables
Tier: Advanced
Lookup tables let qsv resolve values against reference CSVs at runtime — local files, HTTP URLs, or CKAN-hosted resources. Three commands consume them:
-
luau—qsv_register_lookup("name", "url_or_path")makes a CSV available as a Lua-indexable table -
validate—dynamicEnumkeyword validates a column against the values in a CSV -
template— sameregister_lookupAPI as Luau, available inside MiniJinja templates
All three share the same URL-scheme support: file://, http(s)://, dathere://, ckan://.
The companion repo is qsv-lookup-tables — curated lookup CSVs for common needs (US states, NACE codes, ICD-10, ISO 4217, …) — referenced via the dathere:// URL scheme.
| Scheme | Resolves to | Example |
|---|---|---|
file:// or bare path |
A local file | /data/us_states.csv |
http:// / https://
|
An HTTP(S) URL | https://example.com/us_states.csv |
dathere:// |
A curated CSV in the qsv-lookup-tables repo |
dathere://us-states-example.csv |
ckan:// |
A CKAN dataset resource (by resource ID) | ckan://abc123-resource-id |
For ckan://, set QSV_CKAN_API to the CKAN action API endpoint and QSV_CKAN_TOKEN for private resources. See Environment Variables.
All remote URLs are cached at $QSV_CACHE_DIR (default ~/.qsv-cache/). The cache key includes the URL. Re-runs reuse the local copy — no re-download.
To invalidate:
rm -rf ~/.qsv-cache/lookup-tables/See Stats Cache & Caching → Luau lookup-table cache.
-- BEGIN block runs once before any row is processed
BEGIN {
qsv_register_lookup("us_states", "dathere://us-states-example.csv")
qsv_register_lookup("agencies", "https://example.com/nyc-agencies.csv")
qsv_register_lookup("rates", "/local/path/to/tax_rates.csv")
}!
-- MAIN block runs once per row
local state_info = us_states[State] or {} -- table indexed by the FIRST column
local agency_info = agencies[Agency] or {}
local rate = rates[State] and rates[State].rate or 0
-- Return the value that becomes the new column
return state_info.region or "Unknown"The lookup table is indexed by the first column of the registered CSV. Each value is a table (Lua dict) keyed by remaining column names.
-- nta_enrich.lua
BEGIN {
qsv_register_lookup("nta_by_zip", "https://raw.githubusercontent.com/dathere/qsv-lookup-tables/main/nyc_zip_to_nta.csv")
}!
local zip = string.match(_['Incident Address'] or '', "%d%d%d%d%d")
if zip and nta_by_zip[zip] then
return nta_by_zip[zip].nta_name
end
return "Unknown"qsv luau map nta -x -f nta_enrich.lua NYC_311.csv > with_nta.csvSee Scripting (Luau / Python) → luau and the legacy Luau Helper Functions Examples page for more patterns.
Use dynamicEnum in a JSON Schema instead of a hardcoded enum when the valid-value list is dynamic or too large.
{
"type": "object",
"properties": {
"Agency": {
"type": "string",
"dynamicEnum": "NYC_agencies.csv"
}
}
}The first column of NYC_agencies.csv is the value list. Additional columns are ignored.
{
"properties": {
"currency": { "dynamicEnum": "https://example.com/iso_4217_codes.csv" }
}
}{
"properties": {
"borough": { "dynamicEnum": "ckan://abc123-nyc-boroughs-resource-id" }
}
}Requires QSV_CKAN_API set to the CKAN action API endpoint.
{
"properties": {
"us_state": { "dynamicEnum": "dathere://us-states-example.csv" }
}
}See Validation & Schema → validate and Recipe: JSON Schema Validation.
{% set ok = register_lookup("us_states", "dathere://us-states-example.csv") -%}
Dear {{ first_name | title }} {{ last_name | title }},
You qualify for the {{ us_states[us_state].program }} program in {{ us_states[us_state].name }}.
{% if us_state in ['CA', 'NY', 'TX'] -%}
Note: large-state-specific terms apply.
{% endif %}qsv template --template-file form_letter.j2 - data.csv > letters.txtSee Scripting (Luau / Python) → template and the official Luau doc example of register_lookup in templates.
A lookup CSV needs:
- A header row
- The first column is the key
- All other columns are values, accessible as
table[key].column_namein Luau / templates
Example us_states.csv:
state,name,region,population,capital
CA,California,West,39512223,Sacramento
NY,New York,Northeast,19453561,Albany
TX,Texas,South,28995881,Austin
...Then in Luau:
us_states["CA"].name -- "California"
us_states["CA"].region -- "West"
us_states["CA"].population -- "39512223" (string; tonumber() to coerce)Browse qsv-lookup-tables for the full curated catalog. Frequently-useful ones:
-
dathere://us-states-example.csv— US state codes + names + regions -
dathere://currency_codes.csv— ISO 4217 currency codes - (And many domain-specific lookups — check the repo)
To contribute a lookup CSV, see the contributing guidelines of the lookup-tables repo.
- The lookup CSV is loaded once in the BEGIN block (Luau) or on first
register_lookupcall (template). Per-row lookups are O(1) hash-table reads. - Remote URLs are cached on first download — second run is instant.
- The full lookup table is held in memory; size is typically small (KB) for code lookups. For huge lookups (10M+ rows), consider streaming the lookup from SQLite via a
foreachloop instead.
-
Scripting (Luau / Python) —
luauandtemplate -
Validation & Schema —
validatewithdynamicEnum -
Environment Variables —
QSV_CACHE_DIR,QSV_CKAN_API,QSV_CKAN_TOKEN - Stats Cache & Caching — Luau cache invalidation
- qsv-lookup-tables repo — curated lookup CSVs
-
Luau Helper Functions Examples — legacy wiki page with
qsv_register_lookuppatterns - Recipe: Geographic Enrichment — lookup-fallback pattern
- Recipe: JSON Schema Validation
-
Recipe: CKAN Integration —
ckan://URL scheme in production
qsv — GitHub · Releases · Discussions · qsv pro · Try it online · Benchmarks · datHere · DeepWiki · Dual-licensed MIT / Unlicense
Edit this page: Contributing to the Wiki
Home · Why qsv? · Tier legend
- All Commands (index)
- Selection & Inspection
- Transform & Reshape
- Aggregation & Statistics
- Joins & Set Ops
- SQL & Polars
- Validation & Schema
- Metadata Profiling (profile)
- Conversion & I/O
- Geospatial
- HTTP & Web
- Get & Disk Cache
- Scripting (Luau / Python)
- Indexing, Compression & Diff
- AI & Documentation