
refactor(qwen3.5): hard-code enable_thinking default per model#71

Merged: hallerite merged 1 commit into main from qwen35-hardcode-thinking on May 27, 2026

@hallerite (Member) commented May 27, 2026

Why

Qwen35Renderer resolved its enable_thinking default by probing the tokenizer's chat template at construction:

```python
if cfg.enable_thinking is None:
    cfg = cfg.model_copy(update={"enable_thinking": _detect_enable_thinking_default(tokenizer)})
# _detect_enable_thinking_default → tokenizer.apply_chat_template(...)
```

Since Qwen35RendererConfig.enable_thinking defaults to None, this fired on every plain Qwen35Renderer(tok), putting apply_chat_template on the hot path. That pulls transformers into construction and breaks bring-your-own-tokenizer use (a raw tokenizers.Tokenizer has no apply_chat_template), which is exactly the dependency we're trying to shed (issue #31).

What changed

Replaced the probe with a hard-coded table keyed by model name, enumerating every checkpoint routed to the qwen3.5 / qwen3.6 renderer:

```python
_ENABLE_THINKING_DEFAULTS = {
    "Qwen/Qwen3.5-0.8B":      False,  # small sizes flip polarity → thinking off
    "Qwen/Qwen3.5-2B":        False,
    "Qwen/Qwen3.5-4B":        True,   # big sizes default thinking on
    "Qwen/Qwen3.5-9B":        True,
    "Qwen/Qwen3.5-35B-A3B":   True,
    "Qwen/Qwen3.5-122B-A10B": True,
    "Qwen/Qwen3.5-397B-A17B": True,
    "Qwen/Qwen3.6-35B-A3B":   True,
}
```

Unknown / fine-tuned checkpoints fall back to True (the big-model default, matching the old probe's failure fallback); pass an explicit enable_thinking= for a small-size fine-tune that needs False.

Validation

  • The values are exactly what the probe returned; they are already pinned by tests/test_qwen35_size_coverage.py::test_qwen35_enable_thinking_polarity_default and by the byte-parity barrage against each size's own apply_chat_template. Full size-coverage suite: 37 passed (all 7 sizes, with and without the generation prompt).
  • New guard test test_construction_does_not_call_apply_chat_template: builds a Qwen35Renderer with a stub tokenizer whose apply_chat_template raises, and asserts that construction succeeds and resolves the right default.
  • ruff + ty clean.

🤖 Generated with Claude Code


Note

Low Risk
Behavior is unchanged for mapped models per existing parity tests; risk is limited to unknown fine-tunes, which previously got whatever the probe detected from their own chat template and now always fall back to True.

Overview
Qwen35Renderer no longer calls apply_chat_template at construction to infer enable_thinking. When config leaves it None, defaults now come from _ENABLE_THINKING_DEFAULTS keyed by tokenizer.name_or_path (0.8B/2B → False; larger Qwen3.5 sizes and Qwen/Qwen3.6-35B-A3B → True; unknown checkpoints → True).

Docs and tests/test_qwen35_size_coverage.py were updated to describe hard-coded polarity instead of auto-detection, and test_construction_does_not_call_apply_chat_template asserts a stub tokenizer without chat-template support can still be constructed.

Reviewed by Cursor Bugbot for commit 45f02d6.

Note

Replace dynamic tokenizer probing with a static lookup for Qwen35Renderer enable_thinking defaults

  • Removes _detect_enable_thinking_default, which called tokenizer.apply_chat_template to infer the enable_thinking default at construction time.
  • Adds _ENABLE_THINKING_DEFAULTS, a module-level dict in qwen35.py mapping known Qwen3.5/3.6 model names to their correct enable_thinking polarity.
  • The new _default_enable_thinking helper looks up tokenizer.name_or_path in this table, falling back to True for unknown or fine-tuned checkpoints.
  • A new test verifies that Qwen35Renderer construction no longer calls apply_chat_template at all.

Macroscope summarized 45f02d6.

`Qwen35Renderer` previously probed the tokenizer's chat template at
construction (`apply_chat_template`) to learn each checkpoint's
`enable_thinking` polarity. Because the config default is `None`, that
probe ran on a plain `Qwen35Renderer(tok)` — pulling `transformers` onto
the hot path and breaking bring-your-own-tokenizer use (a raw
`tokenizers.Tokenizer` has no `apply_chat_template`).

Replace it with a hard-coded `_ENABLE_THINKING_DEFAULTS` table keyed by
model name, covering every checkpoint routed to the `qwen3.5` / `qwen3.6`
renderer (small 0.8B/2B → False, the rest → True). Unknown / fine-tuned
checkpoints fall back to `True` (the big-model default); pass an explicit
`enable_thinking=` to override. Values are the same ones the probe
returned — pinned by the existing polarity + byte-parity tests in
`tests/test_qwen35_size_coverage.py`.

Adds a guard test asserting construction never calls `apply_chat_template`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@macroscopeapp (Bot) commented May 27, 2026

Approvability

Verdict: Approved

This refactor replaces dynamic auto-detection of enable_thinking defaults with a hard-coded lookup table, preserving identical behavior for all known Qwen3.5 models while avoiding a transformers dependency at construction time. The change is well-tested and maintains byte-parity with the previous implementation.

@hallerite hallerite merged commit e729baa into main May 27, 2026
11 checks passed
@hallerite hallerite deleted the qwen35-hardcode-thinking branch May 27, 2026 17:55
hallerite added a commit that referenced this pull request May 27, 2026
…e from Tokenizer

Brings in #68 (examples), #69 (harmony floor), #71 (qwen3.5 hard-coded
enable_thinking). The only qwen35.py conflict is resolved by keeping #71's
hard-coded `_ENABLE_THINKING_DEFAULTS` table (no `apply_chat_template`
probe) on top of #31's `Tokenizer`/`Processor` type hints.

Now that #71 removed the last hand-coded-renderer call to
`apply_chat_template`, drop it from the `Tokenizer` protocol so a plain
`tokenizers.Tokenizer` wrapper satisfies it. `apply_chat_template` moves to
a new `ChatTemplateTokenizer(Tokenizer, Protocol)` subtype, required only by
`DefaultRenderer` (the generic chat-template fallback).
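A minimal sketch of the protocol split using `typing.Protocol`. The PR does not list the members of `Tokenizer`, so `encode`/`decode` here are assumptions, as is the `apply_chat_template` signature; the two tokenizer classes are toy, hypothetical stand-ins (character-code encoding, not a real vocabulary).

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Tokenizer(Protocol):
    # Assumed minimal surface; no apply_chat_template here.
    def encode(self, text: str) -> list[int]: ...
    def decode(self, ids: list[int]) -> str: ...


@runtime_checkable
class ChatTemplateTokenizer(Tokenizer, Protocol):
    # Extra capability that only the generic chat-template fallback needs.
    def apply_chat_template(self, conversation: list[dict[str, Any]], **kwargs: Any) -> Any: ...


class BareTokenizerWrapper:
    """Hypothetical wrapper in the spirit of a raw tokenizers.Tokenizer: encode/decode only."""

    def encode(self, text: str) -> list[int]:
        return [ord(c) for c in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(chr(i) for i in ids)


class TemplatedTokenizer(BareTokenizerWrapper):
    """Hypothetical tokenizer that also ships a (toy) chat template."""

    def apply_chat_template(self, conversation: list[dict[str, Any]], **kwargs: Any) -> str:
        return "".join(m["content"] for m in conversation)


# Usage
print(isinstance(BareTokenizerWrapper(), Tokenizer))              # True
print(isinstance(BareTokenizerWrapper(), ChatTemplateTokenizer))  # False
print(isinstance(TemplatedTokenizer(), ChatTemplateTokenizer))    # True
```

Note that `runtime_checkable` only checks that the methods exist, not their signatures; static checkers such as ty enforce the full structural match.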