Skip to content

feat: add all 7 Qwen3.5 sizes to MODEL_RENDERER_MAP, autodetect think polarity#8

Merged
hallerite merged 1 commit intomainfrom
feat/qwen35-add-sizes
May 7, 2026
Merged

feat: add all 7 Qwen3.5 sizes to MODEL_RENDERER_MAP, autodetect think polarity#8
hallerite merged 1 commit intomainfrom
feat/qwen35-add-sizes

Conversation

@hallerite
Copy link
Copy Markdown
Member

Summary

  • Adds all seven Qwen3.5 sizes to MODEL_RENDERER_MAP: 0.8B, 2B, 4B, 9B, 35B-A3B, 122B-A10B, 397B-A17B (previously only the middle three were listed).
  • Teaches Qwen35Renderer to autodetect the enable_thinking default from the tokenizer's chat template at construction.
  • Bumps version 0.1.70.1.8.

Why

Qwen3.5 ships two chat-template variants that differ only in the polarity of the gen-prompt branch:

Variant Sizes Default at gen prompt
Big-default-on 4B / 9B / 35B-A3B / 122B-A10B / 397B-A17B <think>\n (thinking on)
Small-default-off 0.8B / 2B <think>\n\n</think>\n\n (empty block)

The previous Qwen35Renderer hardcoded enable_thinking=True, which matched the big-default variant but not the small one — so 0.8B / 2B couldn't be added to the map without silently breaking byte parity at add_generation_prompt=True.

Implementation

Qwen35Renderer.__init__(... enable_thinking: bool | None = None):

  • When None (the new default), runs a one-shot tokenizer.apply_chat_template(...) probe and reads the polarity off the resulting tail (</think> ⇒ default off).
  • Explicit enable_thinking=True/False still wins — autodetect only fires when the caller omits the flag.
  • Probe failures fall back to True (matches the big-default majority).

Tests

New tests/test_qwen35_size_coverage.py (37 cases):

  1. Map shape — exactly the seven listed sizes route to qwen3.5; catches silent additions.
  2. Polarity autodetect — each of the 7 sizes resolves to its expected default (big → True, small → False) when constructed without an explicit flag.
  3. Byte parity barrage — for each of the 7 sizes × 4 representative conversations (system+user, single-turn, gen-prompt, with reasoning), Qwen35Renderer.render_ids matches apply_chat_template token-for-token.

Full suite: 937 passed, 48 skipped, 1 xfailed.

Test plan

  • pytest tests/test_qwen35_size_coverage.py — 37 cases pass
  • Full suite (pytest tests/ --ignore=tests/test_client.py) — 937 pass, no regressions
  • Pre-commit hooks (ruff check + format) clean

🤖 Generated with Claude Code

… polarity

Five Qwen3.5 sizes (4B / 9B / 35B-A3B / 122B-A10B / 397B-A17B) ship one
chat template (default ``enable_thinking=true``); the smaller 0.8B and
2B sizes ship the polarity-flipped variant (default
``enable_thinking=false`` → empty ``<think>\n\n</think>\n\n`` at the
generation-prompt boundary).

``Qwen35Renderer`` now probes the tokenizer's chat_template at
construction with a one-shot ``apply_chat_template`` call and resolves
its ``enable_thinking`` default from the polarity it detects. All seven
sizes now render byte-identical to their own ``apply_chat_template`` —
including with ``add_generation_prompt=True`` where the polarity diverges.

Explicit ``enable_thinking=True/False`` still overrides; the autodetect
only fires when the caller omits the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hallerite hallerite force-pushed the feat/qwen35-add-sizes branch from f2b8a3c to 76a7310 Compare May 7, 2026 16:49
@hallerite hallerite merged commit 509127c into main May 7, 2026
6 checks passed
@hallerite hallerite deleted the feat/qwen35-add-sizes branch May 7, 2026 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant