Model fails to load when chat template uses HuggingFace generation tags #2225

# Prerequisites

- [x] I am running the latest code.
- [x] I carefully followed the README.md.
- [x] I searched issues and discussions; no existing issue covers this (PR #2082 attempts a fix but is stale and has a missing import).
- [x] I reviewed the Discussions.

# Expected Behavior

`Llama(model_path=...)` should load a GGUF whose embedded `tokenizer.chat_template` contains HuggingFace's `{% generation %}` / `{% endgeneration %}` Jinja tags (e.g. SmolLM3, or any future HF-shipped model that adopts the same template extension). This should hold even when the caller passes an explicit `chat_format` override.

# Current Behavior

`Llama.__init__` raises `jinja2.exceptions.TemplateSyntaxError: Encountered unknown tag 'generation'` before the model is usable. The error fires during `Jinja2ChatFormatter.__init__`, which eagerly compiles every chat template found in GGUF metadata regardless of whether the caller will use it.

The `{% generation %}` tag is a HuggingFace transformers chat-template extension that marks training-time generation spans for loss masking. It has no inference-time meaning, but jinja2's default environment doesn't recognize it.
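
Because the tags carry no inference-time meaning, one possible caller-side workaround is to strip them from the template source before it reaches jinja2. The following is a minimal sketch of that idea using only the standard library; `strip_generation_tags` is a hypothetical helper, not part of llama-cpp-python, and the sample template is made up for illustration.

```python
import re

# Matches {% generation %} and {% endgeneration %}, including the
# whitespace-control variants such as {%- generation -%}.
_GENERATION_TAG = re.compile(r"\{%-?\s*(?:end)?generation\s*-?%\}")


def strip_generation_tags(template: str) -> str:
    """Return the template source with the generation markers removed.

    Hypothetical helper: only the two marker tags are deleted, the text
    between them is left untouched.
    """
    return _GENERATION_TAG.sub("", template)


# Made-up template fragment in the style of an HF chat template.
template = (
    "{% for m in messages %}"
    "{% if m.role == 'assistant' %}"
    "{% generation %}{{ m.content }}{% endgeneration %}"
    "{% else %}{{ m.content }}{% endif %}"
    "{% endfor %}"
)
cleaned = strip_generation_tags(template)
```

One caveat with this approach: deleting a `{%- generation -%}` marker also deletes its whitespace trimming, so the rendered output may differ in whitespace from what the template author intended.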

# Environment and Context

- Hardware: Apple M1 Pro, 16 GB
- OS: macOS 14.6 (Darwin 23.6.0)
- Python: 3.12.9
- llama-cpp-python: main (commit current as of 2025-05-21)
- jinja2: 3.x

# Failure Information

```
jinja2.exceptions.TemplateSyntaxError: Encountered unknown tag 'generation'.
Jinja was looking for the following tags: 'elif' or 'else' or 'endif'.
The innermost block that needs to be closed is 'if'.
```

# Steps to Reproduce

1. `pip install llama-cpp-python`
2. `huggingface-cli download bartowski/HuggingFaceTB_SmolLM3-3B-GGUF HuggingFaceTB_SmolLM3-3B-Q4_K_M.gguf --local-dir .`
3. `python -c "from llama_cpp import Llama; Llama(model_path='./HuggingFaceTB_SmolLM3-3B-Q4_K_M.gguf', chat_format='chatml')"`

The `chat_format='chatml'` override is intentionally provided to show the failure occurs even when the embedded template would be bypassed: the template is compiled at init regardless.
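
To illustrate the difference between eager and deferred compilation, here is a small sketch that postpones compiling a template until it is first rendered. It is only an illustration of the idea, not how `Jinja2ChatFormatter` is structured: `LazyChatFormatter`, `compile_fn`, and `failing_compile` are all hypothetical names, and the compiler is a stand-in that always fails so the example runs without jinja2.

```python
from functools import cached_property
from typing import Callable


class LazyChatFormatter:
    """Hypothetical wrapper: stores the template source, compiles on first use."""

    def __init__(self, source: str, compile_fn: Callable[[str], Callable[..., str]]):
        self._source = source
        # Stand-in for whatever turns template source into a render callable.
        self._compile_fn = compile_fn

    @cached_property
    def _compiled(self) -> Callable[..., str]:
        # Evaluated only when the embedded template is actually needed;
        # a successful result is cached for later calls.
        return self._compile_fn(self._source)

    def render(self, **kwargs) -> str:
        return self._compiled(**kwargs)


def failing_compile(source: str) -> Callable[..., str]:
    # Stand-in for a compiler that rejects the unknown tag.
    raise ValueError("Encountered unknown tag 'generation'")


# Construction succeeds even though the template cannot be compiled.
formatter = LazyChatFormatter("{% generation %}hi{% endgeneration %}", failing_compile)

# The error surfaces only if the embedded template is rendered.
try:
    formatter.render(messages=[])
    error = None
except ValueError as exc:
    error = str(exc)
```

With this shape, a caller who overrides `chat_format` would never trigger compilation of the embedded template, at the cost of reporting template errors later than at load time.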

# Related

- PR #2082 attempts a fix via a Jinja extension, but it is missing the `nodes` import, its `parse()` body is incomplete, and it has had no reviewer activity for seven months.
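
For readers unfamiliar with Jinja extensions, the general technique can be sketched as below. This is neither the code from PR #2082 nor the pending fix; `GenerationTagExtension` is a hypothetical name, the template string is made up, and the sketch uses a plain `jinja2.Environment` rather than whatever environment llama-cpp-python constructs. It returns the parsed body directly, so it does not need `jinja2.nodes` at all.

```python
from jinja2 import Environment
from jinja2.ext import Extension


class GenerationTagExtension(Extension):
    """Hypothetical no-op handler for {% generation %} ... {% endgeneration %}."""

    # Tag names that route parsing to this extension.
    tags = {"generation"}

    def parse(self, parser):
        # Consume the 'generation' name token that triggered this extension.
        next(parser.stream)
        # Parse the enclosed statements up to {% endgeneration %} and
        # discard the closing tag's name token.
        body = parser.parse_statements(["name:endgeneration"], drop_needle=True)
        # Returning the body unchanged renders the span as if the tags
        # were not there.
        return body


env = Environment(extensions=[GenerationTagExtension])
template = env.from_string(
    "{% if add_generation_prompt %}"
    "{% generation %}assistant: {{ reply }}{% endgeneration %}"
    "{% endif %}"
)
rendered = template.render(add_generation_prompt=True, reply="hi")
```

Unlike stripping the tags with a regex, this keeps the template source intact and leaves whitespace-control markers on the tags to be handled by jinja2 itself.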

A complete fix is available; will open a PR shortly.
