
MiniJinja Templating

Joel Natividad edited this page May 30, 2026 · 4 revisions


Tier: Intermediate
Commands that use it: template, fetchpost, describegpt, profile

Note

This page is the cross-cutting templating layer. Per-command flags live in /docs/help/. For the full template language, see the MiniJinja docs and the Jinja2 template syntax reference.

Several qsv commands embed MiniJinja (currently v2.20), a Rust implementation of Jinja2. Wherever you see the ⛩️ symbol in the Command Reference, that command renders text from your data with the same template language — so a filter you learn for template works the same in a fetchpost payload, a describegpt prompt, or a profile formula.

Where MiniJinja shows up

| Command | What gets templated | How to supply the template |
| --- | --- | --- |
| template | An arbitrary text/Markdown/HTML/CSV-per-row document | --template <str> or -t, --template-file <file> |
| fetchpost | The HTTP POST request body (JSON or any content type) | -t, --payload-tpl <file> |
| describegpt | The LLM prompt(s) sent for inference | --prompt-file <toml> (templated prompt fields) |
| profile | CKAN scheming formula / suggestion_formula fields → derived metadata | --spec <yaml> |

In every case, CSV column values become template variables and the rendered output is what the command emits or sends.

How CSV data maps to template variables

For per-row commands (template, fetchpost), each row is rendered independently:

  • Column headers become variable names. Non-alphanumeric characters are converted to underscores: first name → {{ first_name }}, us-state → {{ us_state }}.
  • With --no-headers, columns are addressed by 1-based index with a _c prefix: {{ _c1 }}, {{ _c2 }}, … (template).
  • QSV_ROWNO holds the current 1-based row number — handy for output filenames (--outfilename in template).
  • All field values are strings. Cast with |int or |float before math or before any filter that needs a number (see Tips below).
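The mapping rules above can be sketched in Python. This is only an illustration of the behavior described on this page, not qsv's implementation: the function names header_to_var and row_context are hypothetical, and treating "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re

def header_to_var(header: str) -> str:
    """Replace every non-alphanumeric character with an underscore."""
    # Assumption: "alphanumeric" means ASCII letters and digits only.
    return re.sub(r"[^A-Za-z0-9]", "_", header)

def row_context(headers, row, rowno):
    """Build the variables for one row render; field values stay strings."""
    if headers is None:
        # --no-headers case: columns are addressed as _c1, _c2, ...
        names = [f"_c{i}" for i in range(1, len(row) + 1)]
    else:
        names = [header_to_var(h) for h in headers]
    ctx = dict(zip(names, row))
    # Assumption: the row number is exposed as an integer.
    ctx["QSV_ROWNO"] = rowno
    return ctx

print(row_context(["first name", "us-state"], ["ada", "NY"], 1))
```

Because every field value is a string in this context, arithmetic on a field needs an explicit cast first, which is what the last bullet is pointing at.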

Shared globals (qsv_g)

template (-J, --globals-json <file>) and fetchpost accept a JSON file of values shared across every row render, accessed under the qsv_g namespace:

Report for {{ qsv_g.school_name }} — {{ qsv_g.year }}
Student: {{ last_name|title }}, {{ first_name|title }}

qsv's custom filters & functions (template)

Beyond the stock Jinja2 filters and the minijinja-contrib set, template registers these qsv-specific helpers. They do not require casting — they accept the raw string field for convenience:

| Filter | Purpose | Example |
| --- | --- | --- |
| substr(start[, end]) | Substring by byte range | {{ code … |
| format_float(precision) | Parse → fixed-precision float (max 16) | {{ balance … |
| human_count | Integer with thousands separators | {{ rows … |
| human_float_count | Float with thousands separators | {{ amount … |
| round_banker(places) | Banker's rounding (round-half-to-even) | {{ rate … |
| to_bool | Truthiness of true/1/yes/t/y or non-zero number | {% if active … |
| lookup("table", "Column") | Value from a registered lookup table | {{ us_state … |
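For readers unfamiliar with round-half-to-even, the rounding mode named in the round_banker row can be illustrated with Python's decimal module. This is a sketch of the rounding rule only; the helper name round_half_even is hypothetical, and qsv's output formatting may differ.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value: str, places: int) -> str:
    """Round a numeric string to `places` decimals; ties go to the even digit."""
    quantum = Decimal(1).scaleb(-places)  # places=2 gives Decimal("0.01")
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN))

print(round_half_even("2.5", 0))    # 2 (tie goes down to the even digit)
print(round_half_even("3.5", 0))    # 4 (tie goes up to the even digit)
print(round_half_even("0.125", 2))  # 0.12
```

Ties alternate direction depending on the neighboring digit, so rounding a long column of values does not drift upward the way round-half-up does.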

Plus one custom function:

| Function | Purpose |
| --- | --- |
| register_lookup("name", "resource") | Load a lookup table (local path, HTTP/HTTPS, dathere://, or ckan://) and bind it to name for the lookup filter. Returns true on success. |

Important

Filter errors are values, not crashes. When a custom filter can't parse its input (e.g. format_float on non-numeric text), it returns the --customfilter-error string (default <FILTER_ERROR>) instead of aborting the run. Set --customfilter-error "<empty string>" to emit nothing on error.
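The "errors are values" behavior can be sketched as follows. This is a hypothetical Python stand-in for format_float, not qsv's code; in particular, clamping the precision at 16 is an assumption about how the documented maximum is enforced.

```python
FILTER_ERROR = "<FILTER_ERROR>"  # documented default of --customfilter-error

def format_float(value: str, precision: int, on_error: str = FILTER_ERROR) -> str:
    """Parse a string field and format it with fixed precision.

    Unparseable input yields the error string instead of raising.
    """
    try:
        number = float(value)
    except ValueError:
        return on_error
    # Assumption: precision above 16 is clamped rather than rejected.
    return f"{number:.{min(precision, 16)}f}"

print(format_float("1234.5", 2))             # 1234.50
print(format_float("n/a", 2))                # <FILTER_ERROR>
print(repr(format_float("n/a", 2, "")))      # ''
```

Since the sentinel lands in the output like any other value, it is worth searching rendered output for it after a run.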

Lookup tables in templates

register_lookup() is pre-scanned from the template body before rendering, so the table is ready on the first row. Because of this pre-scan, a register_lookup(...) call buried inside a conditional still runs at startup.

{% set ok = register_lookup("us_states", "dathere://us-states-example.csv") -%}
{% if ok and us_state not in ["DE", "CA"] -%}
  {% set tax = us_state|lookup("us_states", "Sales Tax (2023)")|float -%}
  {{ us_state|lookup("us_states", "Name") }}: {{ tax }}%
{% endif %}
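One way to picture the pre-scan is a pass over the raw template source that collects every register_lookup(...) call, regardless of the control flow around it. The sketch below is an assumption about the general idea, not qsv's actual scanner: the regex, the function name prescan_lookups, and the restriction to double-quoted literal arguments are all hypothetical.

```python
import re

# Assumption: both arguments are double-quoted string literals.
REGISTER_LOOKUP = re.compile(r'register_lookup\(\s*"([^"]+)"\s*,\s*"([^"]+)"')

def prescan_lookups(template_source: str):
    """Return (name, resource) pairs for every register_lookup(...) call found."""
    return REGISTER_LOOKUP.findall(template_source)

template_source = '''
{% if false %}
  {% set ok = register_lookup("us_states", "dathere://us-states-example.csv") %}
{% endif %}
'''

# The call is found even though the surrounding condition is never true.
print(prescan_lookups(template_source))
# [('us_states', 'dathere://us-states-example.csv')]
```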

See Lookup Tables for resource schemes, caching (--cache-dir, QSV_CACHE_DIR), and CKAN options (--ckan-api, --ckan-token).

Shared data-wrangling filters & functions (all commands)

These fill gaps that neither MiniJinja core nor minijinja-contrib cover (regex, integer-exact rounding, messy-date parsing, padding, slugs, hashing). They are registered on every MiniJinja-powered command — template, fetchpost, describegpt, and profile — and are present in all binary variants (qsv, qsvlite, qsvdp, qsvmcp), with no cargo feature gate. Values are coerced from strings, so you usually don't need |float first.

| Filter | Purpose | Example |
| --- | --- | --- |
| regex_replace("pat", "rep") | Replace all regex matches; $1 / ${name} capture refs in the replacement | {{ phone … |
| regex_match("pat") | true if the regex matches anywhere | {% if id … |
| regex_find("pat") | First whole match, or "" if none | {{ text … |
| floor / ceil | Round down / up. Integer inputs stay exact (incl. values beyond f64's 2⁵³ range, up to u64); fractional inputs return a float | {{ "42.7" … |
| datefmt("fmt"[, prefer_dmy]) | Parse a messy date string (19+ formats, via qsv-dateparser) and reformat with a chrono format string. Unlike contrib's dateformat, this parses arbitrary strings | {{ d … |
| zfill(width) | Left-pad with zeros, keeping a leading sign | {{ "42" … |
| lpad(width[, fill]) / rpad(width[, fill]) | Left / right pad to width with fill (default space) | {{ name … |
| slugify | URL/DB/CKAN-safe slug (lowercase, non-alphanumeric runs → -, trimmed) | {{ title … |
| blake3 | BLAKE3 hex digest of the value: stable surrogate / content keys for dedup, joins, change-detection | {{ row … |
| fromjson / parse_json | Parse a JSON-in-a-cell string into an indexable value | {{ (meta … |

Plus one function:

| Function | Purpose |
| --- | --- |
| coalesce(a, b, …) | First argument that is not undefined / none / empty string (broader than the single-fallback default/d) |
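The semantics of three of these helpers, as described in the tables above, can be approximated in a few lines of Python. These are hedged sketches, not qsv's implementations: slugify here is ASCII-only, None stands in for both undefined and none, and returning None when every argument is empty is an assumption.

```python
import re

def zfill(value: str, width: int) -> str:
    # Python's str.zfill also pads after a leading sign.
    return value.zfill(width)

def slugify(value: str) -> str:
    # Assumption: ASCII-only; each run of other characters becomes one "-".
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")

def coalesce(*args):
    # Return the first argument that is neither None nor an empty string.
    for arg in args:
        if arg is not None and arg != "":
            return arg
    return None  # assumption: nothing usable was supplied

print(zfill("-42", 5))               # -0042
print(slugify("  Hello, World! "))   # hello-world
print(coalesce(None, "", "n/a"))     # n/a
```

Note that coalesce skips empty strings but keeps other falsy values such as 0, which matters for CSV data where a blank cell and a zero mean different things.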

Notes:

  • floor/ceil precision. Integer inputs pass through exactly (signed i64 and large unsigned u64 IDs alike); an integer literal too large for either is rejected with an error rather than silently approximated. Fractional inputs go through f64 and return a float — pipe |int for a clean integer.
  • Regex caching. Compiled patterns are cached (bounded) for reuse across rows, so a literal pattern compiles once.
  • Errors. Invalid regex, unparseable dates, malformed JSON, and non-numeric floor/ceil inputs raise a template error. In template, that surfaces as a per-row RENDERING ERROR (counted), not a crash.
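The precision note is easiest to see with an ID just past 2⁵³. The Python sketch below mimics the described behavior (integers pass through exactly, fractions go through a float, out-of-range integers are rejected); the function name floor_value is hypothetical and this is not qsv's code.

```python
import math

I64_MIN, U64_MAX = -(2**63), 2**64 - 1

def floor_value(value: str):
    """Floor a string field: integers stay exact, fractions go through float."""
    try:
        n = int(value)
    except ValueError:
        # Fractional input; float() raises ValueError on non-numeric text.
        return float(math.floor(float(value)))
    if not I64_MIN <= n <= U64_MAX:
        raise ValueError(f"integer out of range: {value}")
    return n

big = "9007199254740993"   # 2**53 + 1, not representable as an f64
print(floor_value(big))    # 9007199254740993 (exact)
print(int(float(big)))     # 9007199254740992 (what an f64 round trip gives)
print(floor_value("42.7")) # 42.0
```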

datefmt in action: describegpt dictionary Min/Max

describegpt --infer-content-type uses datefmt in its default dictionary template. For Date/DateTime fields, the LLM infers the column's actual strftime format (validated against the data) and stamps it onto the Content Type, e.g. date:%m/%d/%Y or datetime:%m/%d/%Y %I:%M:%S %p. The dictionary's Min and Max come from qsv stats normalized to RFC 3339 (2013-01-24), which looks different from how the dates actually appear in the data. The template extracts the inferred format and reformats Min/Max so they match the column's real presentation (and the verbatim Examples/Enumeration values):

{% set df = entry.content_type | regex_replace("^(date|datetime):", "")
            if entry.content_type | regex_match("^(date|datetime):") else "" %}
{% if df and entry.min %}{{ entry.min | datefmt(df) }}{% else %}{{ entry.min }}{% endif %}

So a column whose dates read 01/24/2013 shows Min/Max as 01/24/2013 too, not 2013-01-24. Fields without an inferred date format (bare date/datetime, or non-date content types) are left unchanged. Custom --prompt-file dictionary templates can adopt the same {% set df %} pattern. The ^(date|datetime): anchor matches only the bare-token prefix up to the first :, so formats containing colons (%I:%M:%S) are preserved intact.
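The same extract-then-reformat logic can be traced in Python. This is a sketch under stated assumptions: the function name display_min is hypothetical, the normalized value is taken to be a plain YYYY-MM-DD date, and Python's strftime stands in for chrono's format strings (the specifiers used here behave the same in both).

```python
import re
from datetime import datetime

PREFIX = re.compile(r"^(date|datetime):")

def display_min(content_type: str, normalized: str) -> str:
    """Reformat a normalized date with the format stamped on the content type."""
    if not PREFIX.match(content_type):
        # Bare date/datetime or a non-date content type: leave unchanged.
        return normalized
    fmt = PREFIX.sub("", content_type)  # strips only the leading token
    # Assumption: the normalized value is a plain YYYY-MM-DD date.
    return datetime.strptime(normalized, "%Y-%m-%d").strftime(fmt)

print(display_min("date:%m/%d/%Y", "2013-01-24"))  # 01/24/2013
print(display_min("date", "2013-01-24"))           # 2013-01-24
# Colons inside the format survive because the pattern is anchored at the start.
print(PREFIX.sub("", "datetime:%m/%d/%Y %I:%M:%S %p"))
```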

profile formula helpers

profile's --spec formulas run in a richer environment — a native Rust port of DataPusher+'s jinja2_helpers.py — with metadata-oriented helpers such as format_bytes, format_date, format_coordinates, calculate_percentage, sanitize_iso_8601_interval, spatial_extent_wkt, temporal_resolution, guess_accrual_periodicity, build_csvw_schema, and build_croissant_fields. These are specific to the profiling pipeline; see Metadata Profiling and DataPusher+'s dataset-druf.yaml for usage.

What's enabled (and why it matters)

qsv compiles MiniJinja with these capabilities, so they're available in every templated command:

  • pycompat: call Python-style string methods directly: {{ name.upper() }}, {{ s.startswith("A") }}, {{ s.strip() }}.
  • datetime + timezone (minijinja-contrib): datetimeformat, dateformat, timeformat, now() for date/time rendering.
  • urlencode: percent-encode values, important for fetchpost payloads: {{ q|urlencode }}.
  • loop_controls: {% break %} / {% continue %} inside {% for %} loops.
  • rand (minijinja-contrib): random/randrange-style helpers.
  • Text shaping (minijinja-contrib): wordwrap, wordcount, Unicode-aware word wrapping.
  • json: {{ obj|tojson }} for emitting valid JSON (the backbone of fetchpost JSON payloads).
  • speedups / stacker: performance and deep-recursion safety; no syntax impact.

Tips & tricks

Cast before you compute. Fields arrive as strings. Stock Jinja math and many filters need numbers:

{{ (balance|float - discount|float)|format_float(2) }}
{{ "%.1f"|format(score|float) }}

Trim whitespace with -. Add a minus to a block's delimiters to strip surrounding whitespace/newlines — essential when generating compact JSON or clean CSV:

{%- for r in rows -%}
  {{ r.id }}{% if not loop.last %},{% endif %}
{%- endfor -%}

Comments don't render. Use {# … #} for notes that won't appear in output.

fetchpost: keep JSON valid. Build the body with |tojson and |urlencode rather than hand-quoting. fetchpost validates the rendered JSON and aborts on malformed output, so let the filters do the escaping:

{"name": {{ full_name|tojson }}, "q": "{{ query|urlencode }}"}
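Why hand-quoting fails is easy to demonstrate outside qsv. In the Python sketch below, json.dumps and urllib.parse.quote play the roles of |tojson and |urlencode; they are analogues chosen for illustration, and their exact output may differ from MiniJinja's filters. The sample values are made up.

```python
import json
from urllib.parse import quote

full_name = 'Ann "AJ" O\'Neil'   # hypothetical field value with a double quote
query = "tax rate & fees"        # hypothetical field value needing encoding

# Hand-quoting breaks as soon as the value contains a double quote.
hand_quoted = '{"name": "' + full_name + '"}'
try:
    json.loads(hand_quoted)
    hand_quoted_ok = True
except json.JSONDecodeError:
    hand_quoted_ok = False

# Letting the encoders do the escaping keeps the body valid JSON.
body = '{"name": ' + json.dumps(full_name) + ', "q": "' + quote(query, safe="") + '"}'
parsed = json.loads(body)

print(hand_quoted_ok)  # False
print(parsed["name"])  # Ann "AJ" O'Neil
print(parsed["q"])     # tax%20rate%20%26%20fees
```

Note that the JSON encoder emits its own surrounding quotes, which is why the template above writes {{ full_name|tojson }} without quotes but wraps {{ query|urlencode }} in them.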

describegpt: prompts are templates too. The --prompt-file TOML's prompt fields are MiniJinja — you can interpolate dataset stats/frequency/dictionary context into the LLM prompt. See resources/describegpt_defaults.toml for the built-in templates.

Use loop.* variables. loop.index, loop.first, loop.last, loop.length make separators and headers easy in template documents.

Python string methods work (via pycompat): {{ code.replace("-", "") }}, {{ name.title() }}, {{ s.split(",") }}.
