Skip to content
Joel Natividad edited this page May 13, 2026 · 2 revisions

Lookup Tables

Tier: Advanced

Lookup tables let qsv resolve values against reference CSVs at runtime — local files, HTTP URLs, or CKAN-hosted resources. Three commands consume them:

  • luauqsv_register_lookup("name", "url_or_path") makes a CSV available as a Lua-indexable table
  • validatedynamicEnum keyword validates a column against the values in a CSV
  • template — same register_lookup API as Luau, available inside MiniJinja templates

All three share the same URL-scheme support: file://, http(s)://, dathere://, ckan://.

The companion repo is qsv-lookup-tables — curated lookup CSVs for common needs (US states, NACE codes, ICD-10, ISO 4217, …) — referenced via the dathere:// URL scheme.

URL schemes

Scheme Resolves to Example
file:// or bare path A local file /data/us_states.csv
http:// / https:// An HTTP(S) URL https://example.com/us_states.csv
dathere:// A curated CSV in the qsv-lookup-tables repo dathere://us-states-example.csv
ckan:// A CKAN dataset resource (by resource ID) ckan://abc123-resource-id

For ckan://, set QSV_CKAN_API to the CKAN action API endpoint and QSV_CKAN_TOKEN for private resources. See Environment Variables.

Caching

All remote URLs are cached at $QSV_CACHE_DIR (default ~/.qsv-cache/). The cache key includes the URL. Re-runs reuse the local copy — no re-download.

To invalidate:

rm -rf ~/.qsv-cache/lookup-tables/

See Stats Cache & Caching → Luau lookup-table cache.

In luauqsv_register_lookup

-- BEGIN block runs once before any row is processed
BEGIN {
  qsv_register_lookup("us_states", "dathere://us-states-example.csv")
  qsv_register_lookup("agencies", "https://example.com/nyc-agencies.csv")
  qsv_register_lookup("rates", "/local/path/to/tax_rates.csv")
}!

-- MAIN block runs once per row
local state_info = us_states[State] or {}     -- table indexed by the FIRST column
local agency_info = agencies[Agency] or {}
local rate = rates[State] and rates[State].rate or 0

-- Return the value that becomes the new column
return state_info.region or "Unknown"

The lookup table is indexed by the first column of the registered CSV. Each value is a table (Lua dict) keyed by remaining column names.

Worked example: enrich NYC 311 with NTA name from a ZIP-code lookup

-- nta_enrich.lua
BEGIN {
  qsv_register_lookup("nta_by_zip", "https://raw.githubusercontent.com/dathere/qsv-lookup-tables/main/nyc_zip_to_nta.csv")
}!

local zip = string.match(_['Incident Address'] or '', "%d%d%d%d%d")
if zip and nta_by_zip[zip] then
  return nta_by_zip[zip].nta_name
end
return "Unknown"
qsv luau map nta -x -f nta_enrich.lua NYC_311.csv > with_nta.csv

See Scripting (Luau / Python) → luau and the legacy Luau Helper Functions Examples page for more patterns.

In validatedynamicEnum

Use dynamicEnum in a JSON Schema instead of a hardcoded enum when the valid-value list is dynamic or too large.

Example: validate Agency against a CSV of approved agencies

{
  "type": "object",
  "properties": {
    "Agency": {
      "type": "string",
      "dynamicEnum": "NYC_agencies.csv"
    }
  }
}

The first column of NYC_agencies.csv is the value list. Additional columns are ignored.

Example: validate against a remote URL

{
  "properties": {
    "currency": { "dynamicEnum": "https://example.com/iso_4217_codes.csv" }
  }
}

Example: validate against a CKAN resource

{
  "properties": {
    "borough": { "dynamicEnum": "ckan://abc123-nyc-boroughs-resource-id" }
  }
}

Requires QSV_CKAN_API set to the CKAN action API endpoint.

Example: validate against the dathere curated repo

{
  "properties": {
    "us_state": { "dynamicEnum": "dathere://us-states-example.csv" }
  }
}

See Validation & Schema → validate and Recipe: JSON Schema Validation.

In template — MiniJinja register_lookup

{% set ok = register_lookup("us_states", "dathere://us-states-example.csv") -%}
Dear {{ first_name | title }} {{ last_name | title }},

You qualify for the {{ us_states[us_state].program }} program in {{ us_states[us_state].name }}.

{% if us_state in ['CA', 'NY', 'TX'] -%}
Note: large-state-specific terms apply.
{% endif %}
qsv template --template-file form_letter.j2 - data.csv > letters.txt

See Scripting (Luau / Python) → template and the official Luau doc example of register_lookup in templates.

Lookup table format

A lookup CSV needs:

  • A header row
  • The first column is the key
  • All other columns are values, accessible as table[key].column_name in Luau / templates

Example us_states.csv:

state,name,region,population,capital
CA,California,West,39512223,Sacramento
NY,New York,Northeast,19453561,Albany
TX,Texas,South,28995881,Austin
...

Then in Luau:

us_states["CA"].name        -- "California"
us_states["CA"].region      -- "West"
us_states["CA"].population  -- "39512223" (string; tonumber() to coerce)

Common lookup tables in dathere://

Browse qsv-lookup-tables for the full curated catalog. Frequently-useful ones:

  • dathere://us-states-example.csv — US state codes + names + regions
  • dathere://currency_codes.csv — ISO 4217 currency codes
  • (And many domain-specific lookups — check the repo)

To contribute a lookup CSV, see the contributing guidelines of the lookup-tables repo.

Performance notes

  • The lookup CSV is loaded once in the BEGIN block (Luau) or on first register_lookup call (template). Per-row lookups are O(1) hash-table reads.
  • Remote URLs are cached on first download — second run is instant.
  • The full lookup table is held in memory; size is typically small (KB) for code lookups. For huge lookups (10M+ rows), consider streaming the lookup from SQLite via a foreach loop instead.

See also

Clone this wiki locally