refactor(qwen3.5): hard-code enable_thinking default per model#71
Merged
Conversation
`Qwen35Renderer` previously probed the tokenizer's chat template at construction (`apply_chat_template`) to learn each checkpoint's `enable_thinking` polarity. Because the config default is `None`, that probe ran on a plain `Qwen35Renderer(tok)` — pulling `transformers` onto the hot path and breaking bring-your-own-tokenizer use (a raw `tokenizers.Tokenizer` has no `apply_chat_template`). Replace it with a hard-coded `_ENABLE_THINKING_DEFAULTS` table keyed by model name, covering every checkpoint routed to the `qwen3.5` / `qwen3.6` renderer (small 0.8B/2B → False, the rest → True). Unknown / fine-tuned checkpoints fall back to `True` (the big-model default); pass an explicit `enable_thinking=` to override. Values are the same ones the probe returned — pinned by the existing polarity + byte-parity tests in `tests/test_qwen35_size_coverage.py`. Adds a guard test asserting construction never calls `apply_chat_template`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ApprovabilityVerdict: Approved This refactor replaces dynamic auto-detection of You can customize Macroscope's approvability policy. Learn more. |
hallerite
added a commit
that referenced
this pull request
May 27, 2026
…e from Tokenizer Brings in #68 (examples), #69 (harmony floor), #71 (qwen3.5 hard-coded enable_thinking). The only qwen35.py conflict is resolved by keeping #71's hard-coded `_ENABLE_THINKING_DEFAULTS` table (no `apply_chat_template` probe) on top of #31's `Tokenizer`/`Processor` type hints. Now that #71 removed the last hand-coded-renderer call to `apply_chat_template`, drop it from the `Tokenizer` protocol so a plain `tokenizers.Tokenizer` wrapper satisfies it. `apply_chat_template` moves to a new `ChatTemplateTokenizer(Tokenizer, Protocol)` subtype, required only by `DefaultRenderer` (the generic chat-template fallback).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Qwen35Rendererresolved itsenable_thinkingdefault by probing the tokenizer's chat template at construction:Since
Qwen35RendererConfig.enable_thinkingdefaults toNone, this fired on a normalQwen35Renderer(tok)— callingapply_chat_templateon the hot path. That pullstransformersinto construction and breaks bring-your-own-tokenizer use (a rawtokenizers.Tokenizerhas noapply_chat_template), which is exactly the dependency we're trying to shed (issue #31).What changed
Replaced the probe with a hard-coded table keyed by model name, enumerating every checkpoint routed to the
qwen3.5/qwen3.6renderer:Unknown / fine-tuned checkpoints fall back to
True(the big-model default, matching the old probe's failure fallback); pass an explicitenable_thinking=for a small-size fine-tune that needsFalse.Validation
tests/test_qwen35_size_coverage.py::test_qwen35_enable_thinking_polarity_defaultand the byte-parity barrage against each size's ownapply_chat_template. Full size-coverage suite: 37 passed (all 7 sizes, with/without gen prompt).test_construction_does_not_call_apply_chat_template: builds aQwen35Rendererwith a stub tokenizer whoseapply_chat_templateraises, and asserts construction succeeds + resolves the right default.ruff+tyclean.🤖 Generated with Claude Code
Note
Low Risk
Behavior is unchanged for mapped models per existing parity tests; risk is limited to unknown fine-tunes that relied on probe vs table fallback (still defaults to True).
Overview
Qwen35Rendererno longer callsapply_chat_templateat construction to inferenable_thinking. When config leaves itNone, defaults now come from_ENABLE_THINKING_DEFAULTSkeyed bytokenizer.name_or_path(0.8B/2B →False, larger Qwen3.5 sizes andQwen/Qwen3.6-35B-A3B→True, unknown checkpoints →True).Docs and
tests/test_qwen35_size_coverage.pywere updated to describe hard-coded polarity instead of auto-detection, andtest_construction_does_not_call_apply_chat_templateasserts a stub tokenizer without chat-template support can still be constructed.Reviewed by Cursor Bugbot for commit 45f02d6. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Replace dynamic tokenizer probing with a static lookup for
Qwen35Rendererenable_thinkingdefaults_detect_enable_thinking_default, which calledtokenizer.apply_chat_templateto infer theenable_thinkingdefault at construction time._ENABLE_THINKING_DEFAULTS, a module-level dict in qwen35.py mapping known Qwen3.5/3.6 model names to their correctenable_thinkingpolarity._default_enable_thinkinghelper looks uptokenizer.name_or_pathin this table, falling back toTruefor unknown or fine-tuned checkpoints.Qwen35Rendererconstruction no longer callsapply_chat_templateat all.Macroscope summarized 45f02d6.