
feat(template): add data-wrangling MiniJinja filters shared across commands#3921

Merged
jqnatividad merged 10 commits into master from more-minijinja-custom-functions
May 29, 2026

@jqnatividad
Collaborator

Summary

Adds a shared src/minijinja_filters.rs module of pure, always-on MiniJinja filters/functions that close real gaps for data-wrangling templates, wired into all four MiniJinja-powered commands (template, fetchpost, describegpt, profile) via a single register(env) call.

Each filter was verified to be genuinely missing: none is provided by minijinja 2.20 core, minijinja-contrib, or qsv's existing filters (e.g. none of them offers regex, pycompat has no zfill/rjust/ljust, and core round has no rounding-mode argument).

Filters / functions added

| Name | Purpose |
|---|---|
| `regex_replace(pattern, repl)` | Replace ALL matches (`$1`/`${name}` capture refs) |
| `regex_match(pattern)` | Boolean, for use in `{% if %}` |
| `regex_find(pattern)` | First whole match, or `""` |
| `floor` / `ceil` | Round down / up to a whole integer |
| `datefmt(fmt[, prefer_dmy])` | Parse a messy date string (19+ formats via qsv-dateparser) and reformat it |
| `zfill(width)` / `lpad(width[, fill])` / `rpad(...)` | Padding, including leading-zero preservation |
| `slugify` | URL/DB/CKAN-safe slug |
| `blake3` | BLAKE3 hex digest (stable surrogate/content keys) |
| `fromjson` / `parse_json` | Parse a JSON-in-a-cell string into an indexable value |
| `coalesce(a, b, ...)` | First argument that isn't undefined/none/empty |
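To make the padding, slug, and coalesce rows more concrete, here is a std-only Rust sketch of plausible semantics. It is not the code in `src/minijinja_filters.rs`: the function names mirror the table, but operating on plain `&str`/`Option<&str>` instead of MiniJinja `Value`s, the Python-style sign handling in `zfill`, the required `fill` argument, the ASCII-only slug rules, and treating only `None` and `""` as "empty" in `coalesce` are all assumptions.

```rust
/// Hypothetical Python-style `zfill`: left-pad with '0' to `width` chars,
/// keeping a leading '+' or '-' in front of the zeros.
fn zfill(s: &str, width: usize) -> String {
    let len = s.chars().count();
    if len >= width {
        return s.to_string();
    }
    let zeros = "0".repeat(width - len);
    match s.chars().next() {
        // The sign is one byte, so slicing at 1 lands on a char boundary.
        Some(sign @ ('+' | '-')) => format!("{sign}{zeros}{}", &s[1..]),
        _ => format!("{zeros}{s}"),
    }
}

/// Pad `s` on the left with `fill` until it is `width` chars long.
fn lpad(s: &str, width: usize, fill: char) -> String {
    let pad: String = std::iter::repeat(fill)
        .take(width.saturating_sub(s.chars().count()))
        .collect();
    format!("{pad}{s}")
}

/// Pad `s` on the right with `fill` until it is `width` chars long.
fn rpad(s: &str, width: usize, fill: char) -> String {
    let pad: String = std::iter::repeat(fill)
        .take(width.saturating_sub(s.chars().count()))
        .collect();
    format!("{s}{pad}")
}

/// Assumed slug rules: keep ASCII letters/digits (lowercased) and collapse
/// every run of other characters into a single '-', with no leading or
/// trailing '-'.
fn slugify(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut pending_dash = false;
    for c in s.chars() {
        if c.is_ascii_alphanumeric() {
            if pending_dash && !out.is_empty() {
                out.push('-');
            }
            pending_dash = false;
            out.push(c.to_ascii_lowercase());
        } else {
            pending_dash = true;
        }
    }
    out
}

/// First argument that is neither missing (`None`) nor an empty string.
fn coalesce<'a>(args: &[Option<&'a str>]) -> Option<&'a str> {
    args.iter().copied().flatten().find(|s| !s.is_empty())
}
```

In a template these would presumably read as `{{ id|zfill(5) }}` or `{{ coalesce(a, b, "n/a") }}`; the sketch only models the string logic behind such calls.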

Design notes

  • No cargo feature gate: regex, blake3, qsv-dateparser, and serde_json are always compiled in, so the filters exist in qsv, qsvlite, qsvdp, and qsvmcp.
  • All functions are pure and Send + Sync, so the single Environment that template shares across rayon worker threads can call them concurrently. The regex cache uses read-lock-then-clone (Regex is Arc-backed), so matching never holds the lock.
  • Errors map to minijinja::Error; in template, a bad pattern/value degrades to a per-row RENDERING ERROR (caught + counted), not a crash.
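The read-lock-then-clone cache, together with the 256-entry cap added in a later commit, can be sketched with only the standard library. This is a hypothetical illustration: `BoundedCache` and `get_or_compile` are made-up names, a generic `V` stands in for `regex::Regex`, and the policy when the cache is full (compile and return the value but don't store it) is an assumption, not necessarily what the PR does.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical bounded cache. `V` stands in for a compiled `regex::Regex`;
/// wrapping it in `Arc` keeps the clone taken under the read lock cheap.
struct BoundedCache<V> {
    map: RwLock<HashMap<String, Arc<V>>>,
    cap: usize,
}

impl<V> BoundedCache<V> {
    fn new(cap: usize) -> Self {
        Self { map: RwLock::new(HashMap::new()), cap }
    }

    fn len(&self) -> usize {
        self.map.read().unwrap().len()
    }

    /// Return the cached value for `key`, compiling it on a miss.
    fn get_or_compile<E>(
        &self,
        key: &str,
        compile: impl FnOnce(&str) -> Result<V, E>,
    ) -> Result<Arc<V>, E> {
        // Fast path: the read guard is a temporary dropped at the end of this
        // statement, so the caller uses the cloned Arc without holding a lock.
        let hit = self.map.read().unwrap().get(key).cloned();
        if let Some(v) = hit {
            return Ok(v);
        }
        // Miss: compile outside any lock; a bad pattern returns Err here.
        let compiled = Arc::new(compile(key)?);
        let mut map = self.map.write().unwrap();
        // Assumed policy: once `cap` entries exist, stop caching new keys so
        // data-derived patterns cannot grow memory without bound.
        if map.len() < self.cap {
            map.entry(key.to_string()).or_insert_with(|| Arc::clone(&compiled));
        }
        Ok(compiled)
    }
}
```

Because every method takes `&self` and the state sits behind an `RwLock`, one such cache can be shared by all rayon workers, which is the property the design note relies on.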

Context

This is the outcome of evaluating a "Luau-in-templates" idea, which on review mostly overlapped with existing capabilities (pycompat, printf format, qsv's format_float/round_banker/lookup) and carried real costs (per-thread Lua VM, per-row context serialization, two languages in one template). These targeted filters deliver the practical value at a fraction of the complexity; heavy logic remains better served by qsv luau in a pipeline.

Testing

  • 13 new tests in tests/test_template.rs; all passing.
  • Full suites pass with no regressions: template (52), profile (64), describegpt (74).
  • Builds clean: -F all_features, -F lite, -F datapusher_plus.
  • cargo clippy -F all_features clean for the new code; cargo +nightly fmt applied.
  • template USAGE updated and docs/help/template.md regenerated via qsv --generate-help-md.

🤖 Generated with Claude Code

…mmands

Add a shared `src/minijinja_filters.rs` module registering pure, always-on
MiniJinja filters/functions that fill real gaps (verified absent from
minijinja 2.20 core, minijinja-contrib, and qsv's existing filters):

  - regex_replace / regex_match / regex_find  (runtime-cached patterns)
  - floor / ceil                              (core round has no rounding mode)
  - datefmt(fmt[, prefer_dmy])                (parse 19+ messy date formats via
                                               qsv-dateparser, then reformat)
  - zfill / lpad / rpad                       (pycompat lacks zfill/rjust/ljust)
  - slugify                                   (URL/DB/CKAN-safe slugs)
  - blake3                                    (stable surrogate/content keys)
  - fromjson / parse_json                     (JSON-in-a-cell -> indexable value)
  - coalesce(a, b, ...)                       (first non-empty arg)

Wired into all four MiniJinja-powered commands via a single register() call:
template, fetchpost, describegpt, and profile. No cargo feature gate -- every
dependency used (regex, blake3, qsv-dateparser, serde_json) is always compiled
in, so the filters are available in qsv, qsvlite, qsvdp, and qsvmcp.

All functions are pure and Send + Sync, so the single Environment that
`template` shares across rayon worker threads calls them concurrently; the
regex cache uses read-lock-then-clone so matching never holds the lock.

Adds 13 tests to tests/test_template.rs; updates `template` USAGE and
regenerates docs/help/template.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codacy-production

codacy-production Bot commented May 29, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


🟢 Metrics 38 complexity

| Metric | Results |
|---|---|
| Complexity | 38 |


jqnatividad and others added 6 commits May 29, 2026 11:47
)

- regex cache: cap at 256 entries so data-derived patterns (e.g.
  regex_match(pattern_column)) can't grow the process-global cache
  unbounded and exhaust memory.
- floor/ceil: reject NaN/infinity and out-of-i64-range values via a
  to_i64() guard instead of an `as` cast that silently saturates to
  0/i64::MIN/i64::MAX.
- add regression tests for the out-of-range floor guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…borev #2589)

i64::MAX as f64 rounds up to 2^63 (9223372036854775808.0), so the prior
inclusive range check admitted 2^63 and then saturated it to i64::MAX on
cast. Switch to an exclusive upper bound at 2^63 (i64::MIN is exactly
representable, so the lower bound stays inclusive). Extend the regression
test to cover the finite out-of-range value 2^63, not only infinity.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
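The boundary arithmetic in the commit above is easy to check directly. A minimal sketch of the exclusive-upper-bound guard follows; `f64_to_i64_checked` and `TWO_POW_63` are hypothetical names, and later commits replaced this guard with an integer fast path.

```rust
/// 2^63 is exactly representable as an f64 (unlike i64::MAX = 2^63 - 1).
const TWO_POW_63: f64 = (1u64 << 63) as f64;

/// Hypothetical checked conversion mirroring the guard: the lower bound is
/// inclusive because i64::MIN = -2^63 is exact, and the upper bound is
/// exclusive because 2^63 itself does not fit in an i64.
fn f64_to_i64_checked(x: f64) -> Option<i64> {
    // NaN fails both comparisons, and +/-infinity fails one of them.
    if x >= -TWO_POW_63 && x < TWO_POW_63 {
        Some(x as i64)
    } else {
        None
    }
}
```

An inclusive `x <= i64::MAX as f64` check would compare against 2^63 and admit it, after which `as` silently saturates to `i64::MAX`; the strict `<` avoids that.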
)

floor/ceil of an integer is the integer itself, so integer-string inputs
now short-circuit via an i64 parse before any f64 conversion. This makes
valid boundary values like i64::MAX (9223372036854775807, which rounds UP
to 2^63 as f64) round-trip exactly instead of being rejected by the 2^63
range guard. Only genuinely fractional inputs take the f64 path. Add an
i64::MAX/i64::MIN boundary regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Returning a float removes the i64-cast range/precision guardrails entirely
(no saturation, no 2^63 boundary edge cases, no precision loss for huge
integers). NaN/infinity now render transparently rather than silently
becoming a wrong integer. Users pipe `|int` when an integer is wanted
(`{{ v|floor|int }}`).

Updates the USAGE note + regenerates docs/help/template.md, and replaces the
i64-cast regression tests with float-output, `|int`, and non-numeric-error
cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Returning pure f64 lost precision for large integer inputs (e.g. i64::MAX
or any ID above 2^53 rounded during the f64 parse). Restore an integer-
string fast path: integer inputs pass through exactly as an integer Value,
and only genuinely fractional inputs go through f64 (returning a float).
No i64 cast, so still no saturation/range pitfalls. Re-add boundary
regression coverage for 2^53+1, i64::MAX, and i64::MIN.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2593)

Extend the exact integer fast path to u64 so large unsigned IDs (up to
u64::MAX) pass through unchanged. Integer-syntax strings that fit neither
i64 nor u64 now error instead of silently approximating through f64,
making the "integers stay exact" contract honest. Add regression coverage
for i64::MAX+1 and u64::MAX (exact) plus u64::MAX+1 and i64::MIN-1 (error).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
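The final floor/ceil contract (exact i64, then exact u64, an error for any other integer-syntax string, and f64 only for fractional input) can be sketched in std-only Rust. The `Num` enum is a hypothetical stand-in for a MiniJinja `Value`, and `round_toward`, `is_integer_syntax`, and the error messages are illustrative rather than the PR's code.

```rust
/// Hypothetical stand-in for the MiniJinja `Value` that floor/ceil return.
#[derive(Debug, PartialEq)]
enum Num {
    I64(i64),
    U64(u64),
    F64(f64),
}

/// Optional '+'/'-' sign followed by one or more ASCII digits.
fn is_integer_syntax(s: &str) -> bool {
    let digits = s.strip_prefix(|c: char| c == '+' || c == '-').unwrap_or(s);
    !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())
}

fn round_toward(s: &str, up: bool) -> Result<Num, String> {
    if is_integer_syntax(s) {
        // floor/ceil of an integer is the integer itself: never touch f64.
        if let Ok(i) = s.parse::<i64>() {
            return Ok(Num::I64(i));
        }
        if let Ok(u) = s.parse::<u64>() {
            return Ok(Num::U64(u));
        }
        // Integer syntax that fits neither type errors instead of approximating.
        return Err(format!("integer out of i64/u64 range: {s}"));
    }
    // Only non-integer input takes the f64 path; NaN/inf pass through unchanged.
    let x: f64 = s.parse().map_err(|_| format!("not a number: {s}"))?;
    Ok(Num::F64(if up { x.ceil() } else { x.floor() }))
}

fn floor(s: &str) -> Result<Num, String> {
    round_toward(s, false)
}

fn ceil(s: &str) -> Result<Num, String> {
    round_toward(s, true)
}
```

With this shape, `2^53 + 1` and `u64::MAX` round-trip exactly, while a fractional value such as `2.5` comes back as a float, which is why templates that need an integer still pipe through `|int`.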
Contributor

Copilot AI left a comment


Pull request overview

Adds a shared minijinja_filters module for data-wrangling-oriented MiniJinja filters/functions and wires it into the MiniJinja-powered commands so templates can use the same helpers consistently.

Changes:

  • Adds shared filters/functions for regex, rounding, date formatting, padding, slugging, hashing, JSON parsing, and coalescing.
  • Registers the shared filter set in template, fetchpost, describegpt, and profile, plus all binary entry points.
  • Adds template integration tests and updates template help documentation.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| src/minijinja_filters.rs | Implements and registers the shared MiniJinja filters/functions. |
| src/main.rs | Adds the shared module to the main binary. |
| src/mainlite.rs | Adds the shared module to the lite binary. |
| src/maindp.rs | Adds the shared module to the datapusher-plus binary. |
| src/cmd/template.rs | Documents and registers the shared filters for template. |
| src/cmd/fetchpost.rs | Registers the shared filters for payload templates. |
| src/cmd/describegpt.rs | Registers the shared filters for markdown and prompt rendering. |
| src/cmd/profile/formula_helpers.rs | Registers the shared filters in profile formula environments. |
| tests/test_template.rs | Adds coverage for the new filters through qsv template. |
| docs/help/template.md | Updates generated help for the new template filters. |

Comment thread on docs/help/template.md (outdated)
jqnatividad and others added 3 commits May 29, 2026 14:41
The new USAGE block started with `qsv ` (triggering help_markdown_gen's
console auto-fence) and relied on indentation the generator strips, so
docs/help/template.md rendered the intro as a console block and collapsed
the aligned filter list. Reword the intro to not start with `qsv ` and wrap
the list in an explicit ``` fence (preserved verbatim by the generator), so
the columns and multi-line entries render correctly. Regenerate the help md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#2595)

format_examples passed literal `> ...` blockquote lines (e.g. GitHub
`[!NOTE]` admonitions) straight to the catch-all without a trailing blank
line, so the note wasn't closed and the following paragraph could be
absorbed as a CommonMark lazy continuation. Add a blockquote branch
mirroring the existing `#`-comment handling: emit the line, then a blank
line once the next non-empty line is not a blockquote line. Regenerate
docs/help/template.md (only file affected).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
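The blockquote branch can be sketched as a single line-by-line pass. This hypothetical `close_blockquotes` helper is not `format_examples` itself; as a simplification it inspects only the immediately following line, on the reasoning that a blank line there already closes the quote.

```rust
/// True for a Markdown blockquote line such as `> [!NOTE]`.
fn is_quote(line: &str) -> bool {
    line.trim_start().starts_with('>')
}

/// Hypothetical sketch: copy lines through, inserting a blank line after a
/// blockquote line whenever the very next line is non-blank prose. Without
/// that blank line, CommonMark may absorb the prose into the quote as a lazy
/// continuation line.
fn close_blockquotes(lines: &[&str]) -> Vec<String> {
    let mut out = Vec::with_capacity(lines.len());
    for (i, &line) in lines.iter().enumerate() {
        out.push(line.to_string());
        let next_is_prose = matches!(
            lines.get(i + 1),
            Some(&next) if !next.trim().is_empty() && !is_quote(next)
        );
        if is_quote(line) && next_is_prose {
            out.push(String::new());
        }
    }
    out
}
```

Adjacent `>` lines are left untouched, so a multi-line `> [!NOTE]` admonition stays a single blockquote, which is the second property the follow-up regression tests check.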
…roborev #2596)

Add regression tests for the blockquote fix: one asserts a blank line
separates a `> [!NOTE]` block from following prose (guarding against the
lazy-continuation bug), another asserts adjacent `>` lines stay in a single
blockquote.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit 428090e into master May 29, 2026
17 checks passed
@jqnatividad jqnatividad deleted the more-minijinja-custom-functions branch May 29, 2026 18:56