
Fix: model fails to load when chat template uses HuggingFace generation tags #2226

Open
tobocop2 wants to merge 1 commit into abetlen:main from tobocop2:fix/jinja-generation-tag

Conversation


@tobocop2 tobocop2 commented May 22, 2026

Problem

GGUFs whose embedded tokenizer.chat_template uses {% generation %} / {% endgeneration %} (SmolLM3 and other HF-shipped models) fail to load with TemplateSyntaxError in Llama.__init__, even when the caller passes a chat_format override. The cause is that Jinja2ChatFormatter eagerly compiles every embedded template, including ones the caller never intends to use.
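The failure can be reproduced with jinja2 alone, without loading a model. The sketch below is illustrative only: the template string is a shortened, hypothetical stand-in for an embedded tokenizer.chat_template, and `compile_error` is a helper made up for this example, not part of llama-cpp-python.

```python
from jinja2 import Environment, TemplateSyntaxError

# Hypothetical, shortened stand-in for an embedded tokenizer.chat_template.
TEMPLATE = (
    "{% for message in messages %}"
    "{% generation %}{{ message['content'] }}{% endgeneration %}"
    "{% endfor %}"
)


def compile_error(source):
    """Return the TemplateSyntaxError message, or None if the template compiles."""
    env = Environment()  # default environment, no extensions registered
    try:
        env.from_string(source)
    except TemplateSyntaxError as exc:
        return str(exc)
    return None


error = compile_error(TEMPLATE)
print(error)  # the message names the unknown 'generation' tag
```

Because the error is raised at compile time, it occurs before any message is rendered, which is consistent with the failure surfacing in Llama.__init__.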

Solution

Register a Jinja extension that treats both tags as inert wrappers: the body renders as-is, the markers emit nothing. No behavioral change for templates that don't use the tags.

Relationship to #2082

PR #2082 attempted the same approach but is incomplete:

  • The parse() method references nodes.Const("") without importing nodes from jinja2, so it would raise NameError on first use.
  • parser.stream.skip(1); return nodes.Const("") consumes only the tag name. It never advances past the body or the closing {% endgeneration %}, so the parser is left in a broken state and the next template construct fails to parse.

This PR addresses both: imports nodes and Extension explicitly, and consumes the body via parser.parse_statements(("name:endgeneration",), drop_needle=True) so the wrapped content renders and the parser advances past the closing tag. Includes a unit test that fails today and passes with the fix.
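A minimal sketch of the approach described above, assuming only the public jinja2 extension API. The class name `GenerationTagExtension` is hypothetical and the PR's actual code may differ, for example in how it wraps the parsed body; this version returns the body nodes directly, so it does not need the nodes import.

```python
from jinja2 import Environment
from jinja2.ext import Extension


class GenerationTagExtension(Extension):  # hypothetical name, not taken from the PR
    """Treat {% generation %}...{% endgeneration %} as an inert wrapper."""

    tags = {"generation"}

    def parse(self, parser):
        # The stream is positioned on the "generation" name token; consume it.
        next(parser.stream)
        # Parse everything up to {% endgeneration %} and drop the closing tag
        # name, so the parser advances past the whole wrapped block.
        body = parser.parse_statements(("name:endgeneration",), drop_needle=True)
        # Returning the body nodes unchanged renders the wrapped content as-is.
        return body


env = Environment(extensions=[GenerationTagExtension])

wrapped = env.from_string("<{% generation %}{{ text }}{% endgeneration %}>")
rendered = wrapped.render(text="hi")

# A template without the tags is unaffected by the extension.
plain = env.from_string("<{{ text }}>").render(text="hi")

print(rendered)  # → <hi>
print(plain)     # → <hi>
```

Only "generation" needs to be registered in `tags`: the closing tag is consumed by `parse_statements`, so it never reaches the parser as a standalone statement.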

Closes #2225.

HuggingFace's transformers chat-template extension adds {% generation %}
and {% endgeneration %} tags so trainers can mark generation spans for
loss masking. The tags ship in GGUF tokenizer.chat_template metadata
(SmolLM3 et al), but jinja2's default environment doesn't recognize
them, so Llama() raises TemplateSyntaxError at init for any affected
GGUF, even when the caller passes an explicit chat_format override.

Register a minimal Jinja extension that treats both tags as inert
wrappers: the body between them renders as-is, the markers themselves
emit nothing. No behavioral change for templates that don't use the
tags.

Prior art: PR abetlen#2082 attempted the same approach but referenced an
unimported 'nodes' module and didn't consume the body or closing tag.
@tobocop2 tobocop2 changed the title Add Jinja extension for {% generation %} tag to fix GGUF load failures (SmolLM3 et al) Fix: model fails to load when chat template uses HuggingFace generation tags May 22, 2026

Development

Successfully merging this pull request may close these issues.

Model fails to load when chat template uses HuggingFace generation tags
