Fix: model fails to load when chat template uses HuggingFace generation tags by tobocop2 · Pull Request #2226 · abetlen/llama-cpp-python

tobocop2 · 2026-05-22T03:40:54Z

Problem

GGUFs whose embedded tokenizer.chat_template uses {% generation %} / {% endgeneration %} (SmolLM3 and other HF-shipped models) fail to load with TemplateSyntaxError in Llama.__init__, even when the caller passes a chat_format override. Jinja2ChatFormatter eagerly compiles every embedded template.
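The failure can be reproduced with jinja2 alone, without loading a model. The sketch below is an illustration rather than code from this PR: `TEMPLATE` is a hypothetical minimal template (not SmolLM3's real one), `try_compile` is a helper invented for this example, and a plain `jinja2.Environment` stands in for whatever environment `Jinja2ChatFormatter` constructs.

```python
import jinja2

# Hypothetical minimal chat template that uses the HF generation tags.
TEMPLATE = (
    "{% for m in messages %}"
    "{% if m['role'] == 'assistant' %}"
    "{% generation %}{{ m['content'] }}{% endgeneration %}"
    "{% else %}{{ m['content'] }}{% endif %}"
    "{% endfor %}"
)


def try_compile(source: str) -> str:
    """Return "ok", or the name of the exception raised while compiling."""
    env = jinja2.Environment()  # default environment: no extensions registered
    try:
        env.from_string(source)  # compilation happens here, before any render
    except jinja2.TemplateSyntaxError as exc:
        return type(exc).__name__
    return "ok"


result = try_compile(TEMPLATE)
print(result)  # TemplateSyntaxError
```

Because the error is raised at compile time, eagerly compiling an embedded template fails even if that template is never rendered, which is consistent with the `chat_format` override not helping.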

Solution

Register a Jinja extension that treats both tags as inert wrappers: the body renders as-is, the markers emit nothing. No behavioral change for templates that don't use the tags.

Relationship to #2082

PR #2082 attempted the same approach but is incomplete:

The parse() method references nodes.Const("") without importing nodes from jinja2, so it would NameError on first use.
parser.stream.skip(1); return nodes.Const("") consumes only the tag name. It never advances past the body or the closing {% endgeneration %}, so the parser is left in a broken state and the next template construct fails to parse.

This PR addresses both: imports nodes and Extension explicitly, and consumes the body via parser.parse_statements(("name:endgeneration",), drop_needle=True) so the wrapped content renders and the parser advances past the closing tag. Includes a unit test that fails today and passes with the fix.
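A minimal sketch of such an extension follows, assuming only the public jinja2 extension API. The class name `GenerationTagExtension` and the sample template are hypothetical, and the code in the PR diff may differ (for example, this sketch returns the parsed body directly, so it does not need `nodes`).

```python
from jinja2 import Environment
from jinja2.ext import Extension


class GenerationTagExtension(Extension):
    """Treat {% generation %}...{% endgeneration %} as an inert wrapper."""

    # Tag names that route parsing to this extension.
    tags = {"generation"}

    def parse(self, parser):
        # The current token is the tag name "generation"; step past it.
        next(parser.stream)
        # Parse the body up to {% endgeneration %}. drop_needle=True also
        # consumes the "endgeneration" name token, so the parser ends up
        # positioned correctly for whatever follows the closing tag.
        body = parser.parse_statements(("name:endgeneration",), drop_needle=True)
        # Returning the body nodes unchanged renders the wrapped content as-is;
        # the markers themselves emit nothing.
        return body


env = Environment(extensions=[GenerationTagExtension])
template = env.from_string(
    "{% generation %}Hello {{ name }}{% endgeneration %}{% if done %}!{% endif %}"
)
rendered = template.render(name="world", done=True)
print(rendered)  # Hello world!

# A template that does not use the tags renders exactly as before.
plain = env.from_string("{{ x }}").render(x=1)
```

The `{% if %}` after the closing tag is there to show that parsing continues normally once the body and `{% endgeneration %}` have been consumed.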

Closes #2225.

HuggingFace's transformers chat-template extension adds {% generation %} and {% endgeneration %} tags so trainers can mark generation spans for loss masking. The tags ship in GGUF tokenizer.chat_template metadata (SmolLM3 et al.), but jinja2's default environment doesn't recognize them, so Llama() raises TemplateSyntaxError at init for any affected GGUF, even when the caller passes an explicit chat_format override.

Register a minimal Jinja extension that treats both tags as inert wrappers: the body between them renders as-is, the markers themselves emit nothing. No behavioral change for templates that don't use the tags.

Prior art: PR abetlen#2082 attempted the same approach but referenced an unimported 'nodes' module and didn't consume the body or closing tag.

tobocop2 mentioned this pull request May 22, 2026: Model fails to load when chat template uses HuggingFace generation tags #2225 (Open, 4 tasks)

tobocop2 changed the title ~~Add Jinja extension for {% generation %} tag to fix GGUF load failures (SmolLM3 et al)~~ Fix: model fails to load when chat template uses HuggingFace generation tags May 22, 2026

tobocop2 mentioned this pull request May 22, 2026: fix: opencode integration findings from the QA matrix tobocop2/lilbee#279 (Merged)

tobocop2 wants to merge 1 commit into abetlen:main from tobocop2:fix/jinja-generation-tag
