Conversation
Adds packages/renderers/ — a standalone package for deterministic
message-to-token conversion that replaces Jinja chat templates.
Renderers (6 total):
- Qwen3Renderer, Qwen35Renderer (Qwen family)
- GLM5Renderer, GLM45Renderer (GLM family)
- MiniMaxM2Renderer (MiniMax M2/M2.5)
- DefaultRenderer (fallback: uses tokenizer.apply_chat_template)
Each renderer implements:
- render_ids(messages) -> token IDs (messages -> tokens)
- parse_response(token_ids) -> ParsedResponse (tokens -> structured message)
- get_stop_token_ids() -> stop tokens
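A minimal sketch of what this interface might look like, assuming a Python Protocol and a simple ParsedResponse dataclass (both illustrative, not the package's actual definitions):

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ParsedResponse:
    # Illustrative fields only; the real ParsedResponse may differ.
    content: str = ""
    reasoning: str = ""
    tool_calls: list[dict] = field(default_factory=list)


class Renderer(Protocol):
    def render_ids(self, messages: list[dict]) -> list[int]:
        """Messages -> token IDs, matching the model's chat template exactly."""
        ...

    def parse_response(self, token_ids: list[int]) -> ParsedResponse:
        """Completion token IDs -> structured message."""
        ...

    def get_stop_token_ids(self) -> list[int]:
        """Token IDs that terminate an assistant turn."""
        ...
```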
RendererClient: new verifiers client type ("renderer") that uses
renderers for all tokenization. Sends token IDs to vLLM /v1/completions
directly. No MITO/TITO prefix matching, no /tokenize calls.
Auto-detection: create_renderer(tokenizer) picks the right renderer
from tokenizer special tokens. Falls back to DefaultRenderer for
unsupported models.
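Hypothetical usage of that factory; `create_renderer(tokenizer)` is taken from the text above, the model name and messages are just for illustration:

```python
from transformers import AutoTokenizer

from renderers import create_renderer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
renderer = create_renderer(tokenizer)  # picks Qwen3Renderer; unknown models fall back to DefaultRenderer

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt_ids = renderer.render_ids(messages)  # messages -> token IDs
stop_ids = renderer.get_stop_token_ids()    # passed to vLLM as stop tokens
```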
… attribution
175 parametrized tests across 7 models × 25 cases:
- test_render_ids: token-for-token correctness against apply_chat_template
- test_parse_response: content/reasoning/tool extraction
- test_build_helpers: supervised sample + trajectory step
Fixes:
- GLM-5/GLM-4.5/MiniMax: None content rendered as "None" (matches Jinja)
- GLM-4.5: BPE boundary fix for content + \n before <tool_call>
- DefaultRenderer: incremental rendering for per-token message attribution
Adding a new model family = add one entry to conftest.RENDERER_MODELS.
INTELLECT-3.1: auto-detects to DefaultRenderer (apply_chat_template fallback) because its tokenizer has aggressive BPE merges that break piece-by-piece encoding. The IntellectRenderer is available as "intellect" for future optimization but is not the auto-detect default.
Kimi K2.5: identified as needing a custom KimiRenderer (TODO). The template uses unique tokens (<|im_user|>, <|im_assistant|>, <|im_middle|>) and always appends a generation prompt, making the DefaultRenderer's incremental approach incompatible. Skipped in barrage tests for now.
Also:
- Fixed DefaultRenderer to always pass tokenize=True (Kimi returns str by default)
- Fixed _expected() in tests to handle tokenizers returning str
- 200 barrage tests passing across 8 models
Model coverage:
- Qwen3 (custom) ✓
- Qwen3.5 (custom) ✓
- GLM-5 / GLM-4.7-Flash (custom) ✓
- GLM-4.5-Air (custom) ✓
- MiniMax-M2.5 (custom) ✓
- INTELLECT-3.1 (default) ✓
- Qwen2.5 (default) ✓
- Kimi K2.5 — TODO: needs KimiRenderer
KimiRenderer for moonshotai/Kimi-K2.5:
- Unique format: <|im_user|>/<|im_assistant|>/<|im_middle|> role tokens
- TypeScript namespace tool definitions
- Tool calls via <|tool_calls_section_begin|>/<|tool_call_begin|> tokens
- All 25 barrage tests passing
Auto-detection: replaced fragile token-sniffing heuristics with a simple MODEL_RENDERER_MAP that maps model name prefixes to renderer names. Falls back to DefaultRenderer for unknown models.
225 barrage tests across 9 models, all passing.
New renderers/parsing.py with extraction functions ported from vLLM:
- extract_reasoning_qwen (qwen3_reasoning_parser)
- extract_reasoning_glm (basic think/content split)
- extract_reasoning_minimax (minimax_m2_reasoning_parser)
- extract_reasoning_kimi (kimi_k2_reasoning_parser)
- extract_tool_calls_hermes (hermes_tool_parser — Qwen3 JSON)
- extract_tool_calls_qwen35xml (qwen3xml_tool_parser — Qwen3.5 XML)
- extract_tool_calls_glm (glm4_moe/glm47_moe_tool_parser)
- extract_tool_calls_minimax (minimax_m2_tool_parser)
- extract_tool_calls_kimi (kimi_k2_tool_parser)
Same regex patterns, same edge-case handling as vLLM. All renderers now delegate parse_response() to these shared functions.
Truncation: <think> present without </think> → truncated reasoning. No <think> marker → plain content (not assumed truncated).
312 barrage tests passing.
…ded text
Replaced all decode-then-regex parsing with token ID scanning:
- Find special token boundaries (</think>, <tool_call>, etc.) by their token IDs directly in the sequence
- Decode only the text segments between boundaries
- No false positives from content that happens to look like special tokens
Each model family has a dedicated parse function:
- parse_qwen3: Hermes JSON tool calls by <tool_call> token ID
- parse_qwen35: XML tool calls + <think>/</think> by token ID
- parse_glm: <arg_key>/<arg_value> pairs by token ID
- parse_minimax: <minimax:tool_call> by token ID, invoke/parameter by text
- parse_kimi: full token-level (section/begin/end/arg_begin all by ID)
Truncation: <think> token present without </think> → truncated reasoning. No <think> token → plain content.
312 barrage tests passing.
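To illustrate the idea, here is a hypothetical helper (not one of the parse_* functions above): split on a special token's ID first, and only then decode the surrounding segments.

```python
def split_on_token(token_ids: list[int], boundary_id: int) -> tuple[list[int], list[int]]:
    """Return the token IDs (before, after) the first occurrence of boundary_id."""
    if boundary_id in token_ids:
        i = token_ids.index(boundary_id)
        return token_ids[:i], token_ids[i + 1:]
    return token_ids, []


# e.g. a reasoning/content split keyed on the </think> token ID:
# reasoning_ids, content_ids = split_on_token(completion_ids, think_close_id)
# reasoning = tokenizer.decode(reasoning_ids)
# content = tokenizer.decode(content_ids)
# Text inside the content that merely *looks* like "</think>" cannot trigger a
# split, because the scan is on token IDs, not decoded text.
```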
- Strip at first stop token (truncate) instead of only trailing
- Remove <think> from reasoning_ids regardless of position (fixes GLM-4.5 where \n precedes <think> in completion)
Verified end-to-end for all 6 model families:
✓ Qwen3: thinking + content + tool calls + names + args
✓ Qwen3.5: thinking + content + tool calls + names + args
✓ GLM-5: thinking + content + tool calls + names + args
✓ GLM-4.5: thinking + content + tool calls + names + args
✓ MiniMax: thinking + content + tool calls + names + args
✓ Kimi: content + tool calls + args (no thinking/names by design)
312 barrage tests passing.
- Removed KimiRenderer (too complex for now, needs more iteration)
- Removed unused messages.py (normalize_messages, deserialize_tool_calls, strip_message_content — not imported anywhere in the package)
- Cleaned up parse_kimi from parsing.py
- 273 barrage tests passing across 7 models
The proxy now forwards to /v1/generate (our custom endpoint) instead of /v1/completions. For VLMs, it extracts raw images from messages and sends them alongside the renderer's token IDs. vLLM processes images server-side while text tokenization is fully client-side via the Renderer. Also updated client.py to use /v1/generate.
…S and strong Message typing
Add five new model-family renderers with full render/parse support:
- DeepSeekV3Renderer: fullwidth Unicode tokens, <think> text tags, tool call section markers
- KimiK2Renderer: im_user/im_assistant/im_system format, tool_calls_section markers, default system prompt
- KimiK25Renderer: extends K2 with <think> prefill, vision/media support, TypeScript tool declarations
- Nemotron3Renderer: Qwen-style im_start/im_end with XML tool declarations, universal thinking blocks
- GptOssRenderer: Harmony channel-based format (analysis/commentary/final), TypeScript tools
Also introduces strong typing across the package:
- Message, Content, ContentPart, ToolCall, ToolSpec TypedDicts in base.py
- All renderer signatures updated from dict[str, Any] to proper types
- Renderer protocol updated to use Message and ToolSpec
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cast OpenAI SDK message/tool types to Message/ToolSpec at the renderer_client boundary. Add override annotations for methods that legitimately change the response type from OpenAIChatResponse to dict. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tCompletionsClient
RendererClient does client-side tokenization via /v1/generate, not the chat completions API. Inheriting from OpenAIChatCompletionsClient was wrong — it forced type mismatches (OpenAIChatResponse vs dict) that required override annotations.
Now inherits Client[AsyncOpenAI, list[RendererMessage], dict, ToolSpec] with its own to_native_prompt that converts verifiers Pydantic messages to renderer TypedDicts cleanly. No casts, no type: ignore overrides.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RendererClient now uses a shared RendererPool (32 slots by default) that offloads render_ids() and parse_response() to threads via asyncio.to_thread(). HuggingFace fast tokenizers release the GIL during Rust encoding, so concurrent rollouts tokenize in parallel instead of serializing on the event loop.
Benchmarks on a 30-core EPYC with 22K-token conversations:
- N=8: 164ms → 46ms (3.6x)
- N=16: 330ms → 103ms (3.2x)
- N=32: 659ms → 196ms (3.4x)
When a single Renderer is passed (tests, simple usage), the original non-threaded path is preserved.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
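A simplified sketch of the offloading pattern (not the actual RendererPool implementation): check a renderer out of a fixed set of slots and run its sync methods in a worker thread.

```python
import asyncio


class SimpleRendererPool:
    """Illustrative pool; slot count and method names are assumptions."""

    def __init__(self, renderers):
        self._slots: asyncio.Queue = asyncio.Queue()
        for renderer in renderers:  # e.g. 32 independently constructed renderers
            self._slots.put_nowait(renderer)

    async def render_ids(self, messages):
        renderer = await self._slots.get()
        try:
            # HF fast tokenizers release the GIL in their Rust core, so
            # to_thread gives real parallelism rather than just yielding.
            return await asyncio.to_thread(renderer.render_ids, messages)
        finally:
            self._slots.put_nowait(renderer)
```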
`_encode_tools_typescript` filtered with `tool.get("type") != "function"`
which silently dropped every flat `ToolSpec` (the TypedDict in
`renderers.base`: `{name, description, parameters}` with no `type` key).
Production callers pass `ToolSpec`; tests happen to use the OpenAI
envelope format `{"type":"function","function":{...}}`, which is why
the regression slipped through.
Now accept both shapes: unwrap `tool["function"]` when the envelope is
present, otherwise treat the dict as a flat ToolSpec. Non-function
envelope types (e.g. `"_plugin"`) are still skipped.
Caught by Cursor Bugbot in PR #1068 thread r3151243707.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
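A sketch of the shape handling described in the message above; `_unwrap_tool` is a hypothetical helper name used here for illustration, not necessarily what the package calls it.

```python
def _unwrap_tool(tool: dict) -> dict | None:
    """Return the flat {name, description, parameters} dict, or None to skip."""
    if "function" in tool:
        # OpenAI envelope: {"type": "function", "function": {...}}
        if tool.get("type") != "function":
            return None  # still skip non-function envelope types like "_plugin"
        return tool["function"]
    # Flat ToolSpec from renderers.base: no "type" key, already the right shape.
    return tool
```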
vLLM serves gpt-oss via the `openai-harmony` reference encoder, which is also what the model was trained on. The previous hand-rolled ~550-line implementation only covered a subset of the harmony spec (no system preamble, no auto channel-routing line, no canonical ``<|return|>`` for terminal turns, partial TS schema rendering for tools), and HF's chat_template.jinja diverges from harmony in small ways anyway.
Approach: thin adapter over `openai-harmony`. Per-message `enc.render(m)` produces token streams that concatenate byte-identical to `enc.render_conversation` (verified empirically), so we get per-token attribution for free — emit `enc.render(m)` for each caller message and tag tokens with the caller index. The system+developer prefix needs `enc.render_conversation` (not per-message) because harmony injects a channel-routing line into SystemContent based on conversation-level info; per-message rendering doesn't see that.
Caller messages map to harmony as:
- first `system` → `DeveloperContent.with_instructions(content)`
- `user` → `Role.USER`
- `assistant` → final-channel for text + commentary-channel recipient=`functions.<name>` per tool_call
- `tool` → `Role.TOOL` with `name=functions.<name>`, recipient=`assistant`, channel=`commentary`
- historical `reasoning_content` is dropped (matches harmony's `render_conversation` behaviour — analysis-channel messages are stripped from rendered history; reasoning is per-turn only)
The last assistant final-channel close gets patched from `<|end|>` to `<|return|>` to match `render_conversation_for_training`.
Tests:
- `packages/renderers/tests/test_gpt_oss_harmony_parity.py`: 7 new parity tests against `enc.render_conversation_for_training` — no-system, system+user, terminal-assistant `<|return|>`, tools layout, full tool-call+result cycle, reasoning-content stripping, generation prompt scaffolding.
- conftest matrix: `openai/gpt-oss-20b` added; an autouse fixture skips it for HF-parity test files (`test_render_ids`, `test_build_helpers`, `test_parse_response*`) since the renderer intentionally matches harmony, not HF Jinja. Filed under conftest so a single skip rule covers all four files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
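The per-message vs whole-conversation parity claim can be checked with a small script. The sketch below assumes the `openai-harmony` names referenced in the message above (`load_harmony_encoding`, `Message.from_role_and_content`, `Conversation.from_messages`, `enc.render`, `enc.render_conversation`); treat it as a rough illustration rather than a verified snippet.

```python
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

messages = [
    Message.from_role_and_content(Role.USER, "Hello"),
    Message.from_role_and_content(Role.ASSISTANT, "Hi there!"),
]

# Per-message rendering: one token stream per caller message, so every token
# can be attributed back to the message index that produced it.
per_message = [enc.render(m) for m in messages]
flat = [tok for stream in per_message for tok in stream]

# Whole-conversation rendering: the reference layout.
whole = enc.render_conversation(Conversation.from_messages(messages))

# The adapter relies on these concatenating identically.
assert flat == list(whole)
```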
`OpenAIChatCompletionsTokenClient` and its tests were dropped on the renderers branch in favour of the new `RendererClient`. Restore them so both paths coexist on this branch — callers can opt into either via `ClientConfig.client_type`:
- `"renderer"` → renderers-v2 path (client-side tokenization + `/v1/generate`)
- `"openai_chat_completions_token"` → server-side token-aware chat (`/v1/chat/completions/tokens`)
Files restored verbatim from origin/main:
- verifiers/clients/openai_chat_completions_token_client.py
- tests/test_openai_chat_completions_token_client.py
Plumbing:
- verifiers/clients/__init__.py: re-add the import, the `resolve_client` dispatch case, and the `__all__` entry.
- verifiers/types.py: re-add `"openai_chat_completions_token"` to the `ClientType` Literal.
The server-side endpoint (`serving_chat_with_tokens.py`) lives in prime-rl and will need a matching restoration on that side before this client can be exercised end-to-end on the renderers branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`auto_system_injected` (line 160) was computed but never read; the actual auto-injection bookkeeping uses `auto_system_idx`. The unused loop variable `name` in `_render_assistant`'s tool_calls block was dead since the K2 template emits arguments only — function names live in the `tool_call['id']` field. CI fix only — no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The logger init and the bridge-metrics module-level helpers were declared between two import groups, which made ruff flag every import below them as E402 (module-level import not at top of file). Move them after the imports — same code, just relocated — so ruff is happy. CI fix only — no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three classes of pre-existing test failures on the renderers branch
(visible in CI for weeks); none affect production behavior, all
unblock the test job from going green.
1. ``test_renderer_client_honors_configured_renderer_name`` and
``test_renderer_client_uses_renderer_model_name_override``:
``_get_renderer_or_pool`` now passes ``tool_parser=None,
reasoning_parser=None`` to ``create_renderer``. Update the
``assert_called_once_with`` mocks to match.
2. ``test_get_incremental_prompt_ids_*`` (3 tests with the
``_BridgeRenderer`` fake): the fake had only ``render_ids`` and
``parse_response`` from the old diff-based bridge protocol —
``_get_incremental_prompt_ids`` now calls
``renderer.bridge_to_next_turn``. Add a ``bridge_to_next_turn``
method that returns ``prev_prompt + prev_completion + trailing +
extension``, mimicking what a real bridge stitches together. Track
bridge calls separately so the "without re-rendering completion"
test can assert that ``render_ids`` is NOT called and the bridge
path is taken (see the sketch after this commit message).
3. Two parametrize cases that test features that were prototyped but
not merged into the renderers package — strict xfail so they
auto-flip to xpass once the feature lands:
- ``test_get_incremental_prompt_ids_bridges_over_truncated_step
[Qwen/Qwen2.5-0.5B-Instruct]``: DefaultRenderer's
``bridge_to_next_turn`` always returns None.
``synthesize_close_on_truncation`` for unknown templates was
prototyped in site-packages but never merged.
- ``test_extension_break_emits_diagnostic_log``: the bridge no
longer surfaces a break for Qwen3.5's strip-thinking-from-history
pattern, so the diagnostic log never fires for this scenario.
Needs a different repro (e.g. trajectory with empty
``completion_ids`` or an invalid tail) to exercise the log path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
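The ``_BridgeRenderer`` change described in item 2 might look roughly like this (an illustrative sketch of the test fake, not the actual fixture):

```python
class _BridgeRenderer:
    """Fake renderer: counts calls so tests can assert which path was taken."""

    def __init__(self):
        self.render_calls = 0
        self.bridge_calls = 0

    def render_ids(self, messages):
        self.render_calls += 1
        return [1, 2, 3]

    def bridge_to_next_turn(self, prev_prompt, prev_completion, new_messages):
        # Mimic what a real bridge stitches together: keep the prior step's
        # tokens verbatim, append a (fake) turn close, then the new turn.
        self.bridge_calls += 1
        trailing = [99]        # stands in for a synthesized close token
        extension = [7, 8, 9]  # stands in for the new messages' tokens
        return prev_prompt + prev_completion + trailing + extension
```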
… tests
CI runs ``ruff format --check`` in addition to ``ruff check``. The format job was failing on 17 files — purely formatting-only changes that were never normalised after recent edits to:
- packages/renderers/renderers/ (client, glm45, glm5, gpt_oss, kimi_k25, minimax_m2, parsers, parsing)
- packages/renderers/tests/ (test_bridge, test_client, test_gpt_oss_harmony_parity, test_incremental, test_parsers, test_roundtrip)
- tests/test_renderer_client.py, tests/test_renderer_e2e.py
- verifiers/clients/renderer_client.py
Ran ``uv run ruff format``; both ``ruff format --check`` and ``ruff check`` are now clean. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erClient
When TITO was originally dropped on the renderers branch, both docs
mentioning ``openai_chat_completions_token`` were rewritten to point
at the new ``renderer`` client. Now that TITO is restored (commit
``748d03e0``) and lives next to ``RendererClient``, the docs need to
list both:
- ``docs/evaluation.md``: extend the ``--api-client-type`` flag's
enumerated list to include both ``openai_chat_completions_token``
and ``renderer``.
- ``docs/reference.md``:
- re-add ``"openai_chat_completions_token"`` to the ``ClientType``
Literal block (matches ``verifiers/types.py``).
- re-add the ``OpenAIChatCompletionsTokenClient`` row to the
Built-in Client Implementations table, with a one-line note
distinguishing it from the renderer client (server-side
templating + ``/v1/chat/completions/tokens`` route vs client-side
tokenization through the ``renderers`` package).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sample
The function returns ``(token_ids, loss_mask)`` for any caller-defined masking policy — its only specifically-supervised aspect was the name. ``build_training_sample`` better reflects the canonical ``(ids, mask)`` builder used across both SFT and RL training paths, matches the wording in trainer code/configs, and reads cleaner alongside ``build_trajectory_step``.
The previous name is dropped without a deprecation alias because ``renderers`` isn't released yet — no external users to break.
Touches:
- packages/renderers/renderers/base.py (def)
- packages/renderers/renderers/__init__.py (import + __all__)
- packages/renderers/tests/test_build_helpers.py (module docstring, import, 2 test names, 2 call sites)
- 4 docstring/comment mentions in default.py, kimi_k2.py, nemotron3.py, test_message_indices.py
prime-rl has no callers (verified via grep).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
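For orientation, a hypothetical call site; the exact signature is an assumption based on the ``(token_ids, loss_mask)`` return described above.

```python
# Assumed usage; the real signature may take different arguments.
from renderers import build_training_sample

token_ids, loss_mask = build_training_sample(renderer, messages)
assert len(token_ids) == len(loss_mask)
# loss_mask[i] marks whether token i is supervised under the caller's policy
# (e.g. True only on assistant-produced tokens for SFT, or per-step masks for RL).
```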
- openai_chat_completions_token_client: tokenize() default
extra_kwargs={} is a mutable-default footgun; use None and lazily
initialize (see the sketch after this commit message).
- environment / types: restore start_timer (perf_counter) for
monotonic elapsed-ms; the consolidation onto start_time (time.time)
could produce negative deltas on NTP step.
- docs/reference: ClientType list was missing nemorl_chat_completions;
add it and the corresponding row in the Built-in Clients table.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
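The mutable-default fix from the first bullet, sketched; the parameter name comes from the text above, while the surrounding class and body are illustrative.

```python
class _Example:
    """Illustrative only, not the actual client class."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    # Before (a single dict shared across calls, the footgun):
    # def tokenize(self, text: str, extra_kwargs: dict = {}): ...

    # After: default to None and build the dict lazily per call.
    def tokenize(self, text: str, extra_kwargs: dict | None = None):
        if extra_kwargs is None:
            extra_kwargs = {}
        return self._tokenizer.encode(text, **extra_kwargs)
```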
The DEBUG-only diagnostic in _log_extension_break (and the last_reason / last_detail tracking around it in _get_incremental_prompt_ids) was reaching into renderer._tokenizer to decode token windows on bridge failure, which was an encapsulation violation through a private attr. The diagnostic was xfailing in tests and not exercised on main; better diagnostics can be reintroduced once the design is clearer. Removes ~270 lines: the helper, the dedicated logger, the per-step category tracking, and the xfailing observability test in test_renderer_e2e. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cleanup passes folded into one commit since they share files:
* Drop synthesize_close_on_truncation: every hand-coded renderer hardcoded True at the class level and DefaultRenderer's bridge_to_next_turn returns None unconditionally regardless of the flag, so the runtime knob never did anything user-visible. Removed from the Renderer Protocol, both factories (create_renderer, create_renderer_pool), DefaultRenderer.__init__, the misleading "ignoring for renderer=X" log, and from every hand-coded renderer's bridge (the `synthesize_close=(self._x if self.synthesize_close_on_truncation else None)` ternary becomes a direct `synthesize_close=self._x`). Tests updated: test_bridge drops the opt-out branch, test_incremental drops the obsolete opt-out test, test_renderer_client loses its `synth_close` parameter and the now-impossible Qwen2.5/default xfail entry.
* Strip multimodal: the renderer pipeline can't carry image bytes through /v1/generate, and the config validator already routes VLMs to MITO. Removed ImagePart from the type union (and the package's public exports), all multimodal handling in Qwen3VLRenderer (~250 lines: image loaders, processor wiring, multimodal-content branches in render/render_ids/bridge), KimiK25Renderer's _emit_image + media tokens, and the image/video branches in Nemotron3 / Qwen35 content rendering. Passing image content to any renderer now raises ValueError. README §6 updated. Multimodal tests in test_render_ids dropped (kept the auto-routing smoke test for Qwen3-VL).
Net: +45 / -644 lines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
client_idx: int = 0
client_type: ClientType = "openai_chat_completions"
renderer: str = "auto"
hm, I'm wondering if we should bundle those somehow? like a discriminated union of client types with some shared args but mostly disjoint
### What we gain
- **RL correctness.** A prompt/completion split we control, which is exactly what `bridge_to_next_turn` relies on to keep rollouts from fragmenting under truncation or re-tokenization.
- **Testable parity.** Per-model renderers are plain Python. We can render the same conversation through the renderer and through HF's `apply_chat_template` and assert token-level parity. Every edge case (empty thinking, multiple tool calls, truncated turns) becomes a unit test instead of undefined behavior buried inside Jinja.
- **Escape hatch.** Anything without a hand-coded renderer falls back to `DefaultRenderer` (a generic `apply_chat_template` wrapper), which mirrors the previous TITO path.
what sort of guarantees, if any, can we give for this fallback? what are the pros and cons compared to falling back to MITO?
yeah, let me be more clear about this
### Per-renderer bridges
Each hand-coded renderer implements `bridge_to_next_turn` directly for its model's chat template — no shared generic helper, just Python that knows what tokens the template would insert between turns. Qwen3's bridge knows about `<|im_start|>role\n … <|im_end|>\n`; GLM's bridge knows that turns end when the next role marker appears; DeepSeek V3, Kimi K2/K2.5, Nemotron-3, GPT-OSS, MiniMax each have their own. On a clean stop, vLLM's `completion_ids` already includes the template's close token; on truncation, the renderer synthesizes the canonical close (`<|im_end|>`, `<|endoftext|>`, or the equivalent for that model) so the extension invariant still holds, and the synthetic close is masked out of the loss because the model didn't produce it.
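A toy ChatML-style bridge illustrating that contract; the token literals and helper shape are assumptions for the sketch, not the package's implementation.

```python
IM_END = "<|im_end|>"


def bridge_to_next_turn(tokenizer, prev_prompt, prev_completion, new_messages):
    """Extend prev_prompt + prev_completion with the new turn's tokens, or return None."""
    im_end_id = tokenizer.convert_tokens_to_ids(IM_END)

    completion = list(prev_completion)
    if im_end_id not in completion:
        # Truncated turn: synthesize the canonical close so the next prompt still
        # extends the prior step verbatim; the caller masks this token out of the loss.
        completion.append(im_end_id)

    extension: list[int] = []
    for message in new_messages:
        if message["role"] == "assistant":
            return None  # refuse to re-tokenize model-sampled assistant content
        extension += tokenizer.encode(
            f"<|im_start|>{message['role']}\n{message['content']}{IM_END}\n",
            add_special_tokens=False,
        )
    # Generation prompt for the next assistant turn.
    extension += tokenizer.encode("<|im_start|>assistant\n", add_special_tokens=False)
    return prev_prompt + completion + extension
```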
* docs/evaluation.md: --api-client-type list was missing nemorl_chat_completions and ordered inconsistently with verifiers/types.py; align with the source of truth.
* base.py / renderer_client.py: simplify factory closures from default-arg-as-pseudo-closure (factory(_name=…, _model=…, …)) to plain factory() -> Renderer; the captured locals are stable for the function's lifetime, no late-binding footgun.
* clients: drop redundant maybe_normalize_messages calls from OpenAIChatCompletionsClient.to_native_prompt and RendererClient.to_native_prompt. PR #1027 explicitly centralized message normalization in the env loop and removed downstream copies; we re-introduced the redundancy by accident.
* renderer_client.py: collapse _get_renderer_or_pool's inline factory + RendererPool() construction into a direct create_renderer_pool() call (single source of truth for pool construction lives in packages/renderers/renderers/base.py).
* README: switch the create_renderer / create_renderer_pool import examples to the top-level package path.
Tests in test_renderer_client.py updated to patch create_renderer_pool instead of the now-uncalled create_renderer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The renderer ClientType is RL-specific (token preservation across turns, multi-turn extension invariant via bridge_to_next_turn, truncation-safe close synthesis), but neither training.md nor faqs.md mentioned it.
* training.md: add an "Inference Client Types" subsection under "RL Rules of Thumb" that contrasts MITO / TITO / renderer and recommends renderer (or TITO as fallback) for RL workloads. Also updates "Non-Increasing Chat Templates" to note that the renderer client sidesteps the Qwen3/DeepSeek <think>-stripping issue by tokenizing client-side.
* faqs.md: add a Training FAQ "Which client_type should I use for RL training?" with the same three-way breakdown and a link to training.md for details.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The renderer client type ships hand-coded support for only a subset of models, and corner cases are still being shaken out. Production RL workloads should use openai_chat_completions_token (TITO) — it's the tried-and-tested path with broad coverage. Try renderer when you want the stronger token-preservation guarantees and your model has a hand-coded renderer.
* training.md / faqs.md: tag renderer with *(experimental)* and flip the recommendation to TITO-first.
* renderers/README.md: add an explicit "Status: experimental" callout in the intro; drop the misleading "Replaces the old TITO client" line — TITO and renderer ship side-by-side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ients
Mika #7 — the renderer client had 15 module-level helpers as a trailer block after the class, while every other client (openai_chat_completions, openai_chat_completions_token, etc.) co-locates helpers above the class. Pure positional move; helpers stay module-level (they're pure functions tested directly via import). Nothing extracted to clients/utils since _normalize_for_comparison has renderer-specific semantics (tool_call.arguments JSON-decode, None-filtering) that don't match the TITO version.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two trailing blank lines left over from removing the diagnostic test in 9afd448. ruff format --check caught it; ruff check passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugbot findings on PR #1068:
1. ``build_training_sample`` accepted a ``collapse_consecutive_tool_messages`` parameter that was never referenced in the body. No actual caller (verified across verifiers + prime-rl) — prime-rl SFT uses its own ``build_incremental_token_mask`` which has its own collapse implementation. Drop the dead parameter.
2. ``KimiK25Renderer`` silently dropped ``role="tool_declare"`` messages in the input list with ``if role == "tool_declare": continue``, regardless of whether ``tools=`` was passed. The K2.5 chat template actually iterates every message — tool_declare included — through ``set_roles`` + ``render_content``, emitting ``<|im_system|>tool_declare<|im_middle|>{content}<|im_end|>``. The ``tools=`` parameter is a separate path that fires once before the loop, not a deduplicating gate. Removing the early-skip lets tool_declare messages flow through the existing generic content handler, matching the template exactly.
Adds a regression test ``test_kimi_k25_tool_declare_message_without_tools_param`` verifying parity against ``apply_chat_template`` for this case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit e2ede3f.
…ink>
KimiK2Renderer was calling _extract_thinking on every assistant turn, which split inline <think>...</think> out of content and then discarded the extracted reasoning. Result: any caller passing content like "<think>secret</think>visible" got "visible" emitted instead of the verbatim string, disagreeing with apply_chat_template.
Kimi K2's chat template emits ``message.content`` verbatim — there is no reasoning_content support, no inline-tag stripping. The separate reasoning_content field is just dropped (the template never reads it).
Drop _extract_thinking entirely (single caller) and emit content directly. Adds test_kimi_k2_inline_think_tags_render_verbatim which asserts parity against apply_chat_template for the bugbot-flagged case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
completions_request was building its return dict without threading through id, model, or created from vLLM's /generate response — so RendererClient.from_native_response always fell back to its defaults (id="", created=0, model=""), and downstream Response objects had empty metadata even when vLLM populated those fields.
Pass the three fields through with safe defaults (empty string / 0) so callers using them for logging, request correlation, or model attribution see real values.
Adds test_from_native_response_propagates_id_model_created as a regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
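A sketch of the pass-through described above; the field names on the vLLM response and the helper shape are assumptions for illustration.

```python
def _build_completion_payload(vllm_response: dict, choices: list[dict]) -> dict:
    """Thread id/model/created from the /generate response into the returned dict."""
    return {
        "id": vllm_response.get("id", ""),
        "model": vllm_response.get("model", ""),
        "created": vllm_response.get("created", 0),
        "choices": choices,
    }
```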

Summary
Adds `packages/renderers/` — a standalone package that owns message ↔ token conversion as an alternative to vLLM's Jinja chat templates and the existing TITO/MITO server-side machinery. Every sampled assistant turn keeps its exact tokens across the rollout boundary; the trainer sees bit-for-bit what vLLM produced.
Full design, motivation, and examples of the failure modes this fixes: `packages/renderers/README.md`.
Renderer matrix
- `Qwen3Renderer` / `Qwen35Renderer` / `Qwen36Renderer` / `Qwen3VLRenderer`
- `GLM5Renderer` / `GLM45Renderer`
- `MiniMaxM2Renderer`
- `KimiK2Renderer` / `KimiK25Renderer`
- `DeepSeekV3Renderer`
- `Nemotron3Renderer`
- `GptOssRenderer`
- `DefaultRenderer` (fallback wrapping `apply_chat_template`, with pluggable `tool_parser` / `reasoning_parser`)
Architecture
The Renderer Protocol:
- `render()` / `render_ids()` — messages → tokens with per-token message attribution for loss masking.
- `parse_response()` — completion tokens → structured message via token-ID boundary scanning (no regex on decoded text).
- `get_stop_token_ids()` — turn-close tokens.
- `bridge_to_next_turn()` — extends `prev_prompt + prev_completion` with the new turn's tokens; returns `None` if the renderer can't prove prefix-stability (caller falls back to a fresh render).
Key design decisions
Per-renderer bridges, hand-coded. No shared `chatml_bridge` / `glm_bridge` helper — that approach rendered `[dummy_assistant, *new_messages]` and diffed against `[dummy_assistant]` to extract extension tokens, which broke on templates that treated the dummy as an invalid prefix (GLM-5.1 wraps the last assistant with empty `<think></think>`, harmony's assistant uses different channels historical vs latest, Kimi auto-injects a default system). Each renderer's bridge now hand-emits the new-turn tokens by calling the same per-role inline helpers that `render()` uses, so the two paths can't silently diverge. Two small shared primitives remain: `trim_to_turn_close` (scan `prev_completion` for a template-specific close token; on truncation, append the canonical close so the bridge still extends) and `reject_assistant_in_extension` (bridges refuse to re-tokenize model-sampled assistant content).
Truncation handling. Hand-coded renderers always synthesize the canonical turn-close (`<|im_end|>`, `<|endoftext|>`, harmony's `<|end|>`, …) when vLLM hits `max_tokens` and the prior completion has no close token, so the next prompt still extends the prior step's tokens verbatim; the synthetic token lands in the merged sample's `prompt_ids` (mask=False) and never enters loss or KL. `DefaultRenderer.bridge_to_next_turn` returns `None` unconditionally — it wraps an unknown Jinja template and can't prove the extension contract holds — so the caller falls back to a fresh re-render.
Pluggable parsers for `DefaultRenderer`. Hand-coded renderers bake parsing in. `DefaultRenderer` takes optional `tool_parser=` / `reasoning_parser=` kwargs wired to registries in `renderers.parsers`. Built-ins today: `qwen3`, `qwen3.5`, `glm`, `deepseek_v3` for tools; `think` for reasoning.
RendererClient + /v1/generate
Adds `RendererClient` (verifiers client type `"renderer"`) — renders messages client-side, POSTs raw token IDs to vLLM's `/v1/generate`, parses completions back into structured responses. Multi-turn rollouts reuse the prior step's exact tokens through `bridge_to_next_turn`; no re-rendering of sampled content.
A `RendererPool` offloads sync tokenization to threads so concurrent rollouts tokenize in parallel instead of blocking the event loop.
Test plan
- `packages/renderers/tests/test_render_ids.py` — multi-model parity matrix vs `apply_chat_template` (1 documented xfail for an upstream Jinja bug on Qwen3-VL `content=None`).
- `packages/renderers/tests/test_roundtrip.py` — render → parse round-trip per renderer: content, reasoning, single and multiple tool-calls.
- `packages/renderers/tests/test_bridge.py` — bridge contract invariants per hand-coded renderer: extends prev verbatim, rejects assistant-role extension, synthesizes close on truncation, extension contains the new-message content.
- `packages/renderers/tests/test_incremental.py` — unit coverage of `trim_to_turn_close` + `reject_assistant_in_extension` edge cases.
- `packages/renderers/tests/test_parsers.py` / `test_parse_response.py` / `test_parse_response_robustness.py` — parsing on truncated / malformed output; includes regression test for `parse_qwen3` JSON-decode-error fallback.
- `tests/test_renderer_e2e.py` — end-to-end TITO rollout with scripted vLLM; asserts token preservation and multi-turn bridge extension.
`samples_per_rollout` = 1.00, reward + KL track main. Qwen3.5-35B-A3B + mini-swe-agent-plus: 0 breaks vs main's 32 in the same step (see README §4 for the concrete break modes).
🤖 Generated with Claude Code
Note
Medium Risk
Large, mostly additive change that introduces new tokenization/parsing/bridging logic and a new dependency; errors here can subtly affect rollout correctness and RL training data integrity, though existing client paths remain available.
Overview
Introduces a new standalone `packages/renderers` package that implements a `Renderer` protocol, per-model message→token renderers (Qwen/GLM/Kimi/DeepSeek/MiniMax/Nemotron/GPT-OSS) with `bridge_to_next_turn` support, and a fallback `DefaultRenderer` that wraps `tokenizer.apply_chat_template` with optional tool/reasoning parsers.
renderers.client.completions_request) that sendsprompt_token_idsto vLLM’s/generateendpoint, parses completion token IDs back into structured outputs, and supports parallel tokenization viaRendererPool.Updates docs (
evaluation.md,training.md,faqs.md,reference.md) to document the newrendererclient_type, expand client type listings/descriptions, and clarify RL training tradeoffs between MITO/TITO/renderer approaches.Reviewed by Cursor Bugbot for commit ee6cdb5. Bugbot is set up for automated code reviews on this repo. Configure here.