
Fix TITO bridge extraction and truncation handling#1005

Merged

eligotts merged 9 commits into main from eli/fix-tito
Mar 30, 2026
Conversation

@eligotts
Contributor

@eligotts eligotts commented Mar 11, 2026

Summary

  • Fix bridge extraction: Rewrote TITO bridge token extraction to use a dummy-assistant dual-tokenization approach that correctly handles all chat templates (Qwen3, Qwen3.5, GLM-4.5/4.7) including GLM's stop-token-as-role-marker pattern and Qwen3's context-dependent think block injection.
  • Fix truncation gate: The TITO truncation gate now checks both tokens["is_truncated"] (seq_len overflow) and response.message.is_truncated (finish_reason="length" from vLLM). Previously only seq_len overflow was checked, so max_tokens truncation was missed — TITO would attempt bridge stitching on a completion without a stop token, producing malformed token sequences.
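The two-source truncation gate in the second bullet can be sketched as follows. This is a minimal illustration, not the actual client code: the function name is hypothetical, and it assumes a dict-like `tokens` record plus a response object exposing `message.is_truncated`, as described above.

```python
from types import SimpleNamespace  # only used for the example at the bottom

def should_fallback_to_mito(tokens: dict, response) -> bool:
    """Return True when bridge stitching is unsafe and MITO should be used."""
    # Source 1: seq_len overflow detected during token parsing
    seq_len_truncated = bool(tokens.get("is_truncated", False))
    # Source 2: finish_reason="length" from vLLM (max_tokens hit, so no
    # stop token was emitted at the end of the completion)
    max_tokens_truncated = bool(getattr(response.message, "is_truncated", False))
    return seq_len_truncated or max_tokens_truncated

# Example: max_tokens truncation alone now triggers the fallback,
# which was the case the old single-flag gate missed.
resp = SimpleNamespace(message=SimpleNamespace(is_truncated=True))
assert should_fallback_to_mito({"is_truncated": False}, resp)
```

Previously only the first source was consulted, so the second case slipped through to bridge extraction.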

Problem

When a completion hit max_tokens (e.g., thinking model burns through token budget), vLLM returns finish_reason="length" and does not include a stop token in completion token_ids. The TITO client's truncation gate only checked TrajectoryStepTokens["is_truncated"], which reflects seq_len overflow but not max_tokens truncation. So it proceeded with bridge extraction, grabbed a random content token as the "stop token" for gap calculation, and produced broken stitched sequences.

Testing

Empirically validated stop token behavior across 6 models on prime-rl's custom vLLM server (/v1/chat/completions/tokens endpoint):

| Model | Stop Token | stop ✓ | tool_calls ✓ | length (no stop) ✓ |
| --- | --- | --- | --- | --- |
| Qwen3-4B-Instruct-2507 | 151645 `<\|im_end\|>` | ✓ | ✓ | ✓ |
| Qwen3-0.6B | 151645 `<\|im_end\|>` | ✓ | n/a | ✓ |
| Qwen3-8B | 151645 `<\|im_end\|>` | ✓ | ✓ | ✓ |
| Qwen3-30B-A3B | 151645 `<\|im_end\|>` | ✓ | ✓ | ✓ |
| Qwen3.5-4B | 248046 | ✓ | ✓ | ✓ |
| GLM-4.7-Flash | 154827 `<\|user\|>` / 154829 `<\|observation\|>` | ✓ | ✓ | ✓ |

Confirmed across all models: finish_reason=length never includes a stop token in token_ids. Bridge extraction and GLM dedup logic validated for both tool-call multi-turn (wiki-search style) and user-message multi-turn (alphabet-sort style).
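The invariant confirmed above — a `finish_reason="length"` completion never ends in a stop token — can be checked with a small helper. The ids below are taken from the table; the dict and function name are illustrative, not part of the codebase.

```python
# Known stop-token ids per model family, from the validation table above.
# GLM-4.7-Flash reuses role-marker tokens as stop tokens.
STOP_TOKEN_IDS = {
    "Qwen3": {151645},                   # <|im_end|>
    "Qwen3.5": {248046},
    "GLM-4.7-Flash": {154827, 154829},   # <|user|> / <|observation|>
}

def ends_with_stop_token(token_ids: list[int], family: str) -> bool:
    """True if the completion's final token is one of the family's stop ids."""
    return bool(token_ids) and token_ids[-1] in STOP_TOKEN_IDS[family]
```

A gate like the one in this PR can use such a check as a sanity assertion: if the completion was not truncated, the last token should be a known stop id before bridge stitching proceeds.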

Ran RL training experiments with use_token_client=true across Qwen3-4B, Qwen3-0.6B, Qwen3-30B-A3B, and Qwen3.5-4B on wiki-search and alphabet-sort environments, confirming no TITO errors.

Test plan

  • Run alphabet-sort with low max_tokens (e.g., 32) to trigger frequent finish_reason=length on multi-turn — verify MITO fallback fires and rollout completes correctly
  • Run wiki-search with use_token_client=true — verify TITO path works for tool-call multi-turn
  • Verify no regression on single-turn environments

🤖 Generated with Claude Code


Note

Medium Risk
Touches core prompt/token stitching logic used for KV-cache reuse across turns; failures can corrupt prompts or force MITO fallback, though changes add additional validation and safe fallbacks.

Overview
Fixes TITO prompt stitching in OpenAIChatCompletionsTokenClient.get_prompt_ids by replacing full-prompt slicing/suffix caching with a dummy-assistant dual-tokenization approach that extracts only the minimal “bridge” tokens needed after a cached prefix, including handling templates where stop tokens can act as role markers.

Adds stricter gating to avoid stitching when the matched previous completion was truncated (checking both token parsing overflow and finish_reason="length" via response.message.is_truncated), plus validates that the post-prefix “env tail” is composed of tool messages with an optional trailing user message; otherwise it falls back to message-based inference (MITO).

Written by Cursor Bugbot for commit ed76a51.

mikasenghaas and others added 5 commits March 6, 2026 23:14
Replaces the previous bridge extraction that tokenized the real assistant
message (breaking on Qwen3's context-dependent think block injection) with
a robust dual-tokenization approach:

- Tokenize [dummy_assistant + env_messages] with gen=True
- Tokenize [dummy_assistant] with gen=False
- Extract bridge via subtraction, accounting for the gap between the
  engine's stop token and the template's inter-turn separator
- Dedup stop tokens that double as role markers (GLM's <|observation|>)
- Handle truncated completions by falling back to MITO
- Support multiple trailing env messages (multi-tool responses)

Verified across Qwen3, Qwen2.5, Hermes, and GLM model families with
exact bridge matches on all edge cases (empty content, unicode, injection
attempts, multi-tool, multi-turn chains).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
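The dual-tokenization subtraction in the commit message above can be sketched as follows. Here `apply_template` stands in for a real `tokenizer.apply_chat_template`, and the sketch deliberately omits the stop-token gap accounting and GLM role-marker dedup that the real implementation adds.

```python
def extract_bridge(apply_template, dummy_assistant: dict, env_messages: list[dict]) -> list[int]:
    """Extract the 'bridge' tokens a chat template emits between an assistant
    turn and the next generation prompt, via dual tokenization."""
    # 1. Tokenize [dummy_assistant + env_messages] with a generation prompt
    with_env = apply_template([dummy_assistant, *env_messages], add_generation_prompt=True)
    # 2. Tokenize [dummy_assistant] alone, without a generation prompt
    alone = apply_template([dummy_assistant], add_generation_prompt=False)
    # 3. The bridge is whatever the first tokenization appends after the
    #    shared prefix (real code also bridges the engine-stop-token gap)
    assert with_env[: len(alone)] == alone, "template diverged before the bridge"
    return with_env[len(alone):]
```

Because only a dummy assistant message is tokenized, Qwen3's context-dependent think block injection on the *real* assistant content can no longer perturb the extracted bridge.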
The TITO client's truncation gate only checked
TrajectoryStepTokens["is_truncated"], which reflects seq_len overflow
but not max_tokens truncation (finish_reason="length" from vLLM).

When a completion hit max_tokens, is_truncated was False (total
sequence fit within seq_len), so TITO proceeded with bridge extraction.
Without a stop token at the end of the truncated completion, the gap
calculation used a random content token, producing malformed stitched
sequences.

Now checks both sources:
- tokens["is_truncated"] for seq_len overflow
- response.message.is_truncated for finish_reason="length"

Either triggers MITO fallback, which re-tokenizes from messages
correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Trailing env message count misses mixed tool+user sequences
    • Removed the premature break after counting a trailing user message so preceding trailing tool/observation messages are now included in the env-message bridge count.

Preview (8c90cb20f7):
diff --git a/verifiers/clients/openai_chat_completions_token_client.py b/verifiers/clients/openai_chat_completions_token_client.py
--- a/verifiers/clients/openai_chat_completions_token_client.py
+++ b/verifiers/clients/openai_chat_completions_token_client.py
@@ -42,7 +42,6 @@
         elif role == "user" and count == 0:
             # A user follow-up (not a tool response) is also an env message
             count = 1
-            break
         else:
             break
     return count


cursoragent and others added 4 commits March 11, 2026 03:07
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace _count_trailing_env_messages with direct derivation from the
prefix match result. The env messages are simply
prompt_messages[prefix_len:] — no need to independently re-derive
from the tail.

Validate the pattern: all tool messages, with optionally a user
message last. Falls back to MITO on unexpected message shapes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
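The env-tail validation described in that last commit ("all tool messages, with optionally a user message last") can be sketched as below. Names are hypothetical; this is not the merged implementation, just the shape of the check.

```python
def is_valid_env_tail(env_messages: list[dict]) -> bool:
    """Validate the messages after the cached prefix: all tool messages,
    with at most one user message, allowed only in the final position.
    Any other shape should trigger the MITO fallback."""
    roles = [m["role"] for m in env_messages]
    if not roles:
        return False  # nothing after the prefix: nothing to stitch
    if roles[-1] == "user":
        roles = roles[:-1]  # optional single trailing user follow-up
    return all(r == "tool" for r in roles)
```

Deriving the tail directly as `prompt_messages[prefix_len:]` and then validating it avoids the counting bug Bugbot flagged, where a trailing user message caused preceding tool messages to be dropped from the bridge count.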
@eligotts eligotts merged commit 8ac737c into main Mar 30, 2026
6 checks passed
