Fix TITO bridge extraction and truncation handling #1005
Merged
Conversation
Replaces the previous bridge extraction that tokenized the real assistant message (breaking on Qwen3's context-dependent think block injection) with a robust dual-tokenization approach:
- Tokenize [dummy_assistant + env_messages] with gen=True
- Tokenize [dummy_assistant] with gen=False
- Extract bridge via subtraction, accounting for the gap between the engine's stop token and the template's inter-turn separator
- Dedup stop tokens that double as role markers (GLM's <|observation|>)
- Handle truncated completions by falling back to MITO
- Support multiple trailing env messages (multi-tool responses)

Verified across Qwen3, Qwen2.5, Hermes, and GLM model families with exact bridge matches on all edge cases (empty content, unicode, injection attempts, multi-tool, multi-turn chains).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
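The dual-tokenization trick described in this commit can be sketched as follows. This is a minimal illustration, not the PR's actual code: `tokenize` stands in for a chat-template renderer (e.g. a wrapper around a tokenizer's `apply_chat_template`), and the dummy message content, helper names, and toy ChatML renderer are all assumptions.

```python
# Sketch of dual-tokenization bridge extraction (illustrative names only).
DUMMY_ASSISTANT = {"role": "assistant", "content": "x"}

def extract_bridge(tokenize, env_messages):
    """Tokens between the end of an assistant turn and the next generation
    prompt, computed WITHOUT tokenizing the real assistant message (whose
    rendering can change under context-dependent think-block injection)."""
    # 1) dummy assistant + env messages, ending in a generation prompt
    with_env = tokenize([DUMMY_ASSISTANT] + env_messages, add_generation_prompt=True)
    # 2) dummy assistant alone, no generation prompt
    alone = tokenize([DUMMY_ASSISTANT], add_generation_prompt=False)
    # Subtraction: everything after the dummy turn's own tokens is the bridge
    # (inter-turn separator + env messages + generation prompt).
    assert with_env[: len(alone)] == alone, "template is not prefix-stable"
    return with_env[len(alone):]

# Toy ChatML-like renderer to make the mechanics concrete:
def toy_tokenize(messages, add_generation_prompt=False):
    toks = []
    for m in messages:
        toks += ["<|im_start|>", m["role"], m["content"], "<|im_end|>"]
    if add_generation_prompt:
        toks += ["<|im_start|>", "assistant"]
    return toks

bridge = extract_bridge(toy_tokenize, [{"role": "tool", "content": "result"}])
# -> ['<|im_start|>', 'tool', 'result', '<|im_end|>', '<|im_start|>', 'assistant']
```

The toy renderer omits the two subtleties the commit calls out: the gap between the engine's stop token and the template's inter-turn separator, and stop tokens that double as role markers (GLM's <|observation|>), which need dedup before stitching.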
The TITO client's truncation gate only checked TrajectoryStepTokens["is_truncated"], which reflects seq_len overflow but not max_tokens truncation (finish_reason="length" from vLLM). When a completion hit max_tokens, is_truncated was False (the total sequence fit within seq_len), so TITO proceeded with bridge extraction. Without a stop token at the end of the truncated completion, the gap calculation used a random content token, producing malformed stitched sequences.

Now checks both sources:
- tokens["is_truncated"] for seq_len overflow
- response.message.is_truncated for finish_reason="length"

Either triggers MITO fallback, which re-tokenizes from messages correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
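The widened gate amounts to an OR over the two truncation signals. A sketch under the attribute names used in the commit message (the surrounding client code and exact types are assumptions):

```python
# Sketch of the two-source truncation gate; `tokens` is the parsed
# TrajectoryStepTokens-style dict, `response` the chat-completion response.
def completion_is_truncated(tokens: dict, response) -> bool:
    """True if the previous completion cannot be safely bridge-stitched."""
    # seq_len overflow detected during token parsing
    if tokens.get("is_truncated", False):
        return True
    # max_tokens truncation: vLLM reports finish_reason="length" and the
    # completion token_ids end without a stop token
    if getattr(response.message, "is_truncated", False):
        return True
    return False
```

Either condition routes the step to the MITO fallback, which re-tokenizes from messages instead of stitching token ids.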
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Trailing env message count misses mixed tool+user sequences
- Removed the premature break after counting a trailing user message so preceding trailing tool/observation messages are now included in the env-message bridge count.
Or push these changes by commenting:
@cursor push 8c90cb20f7
Preview (8c90cb20f7)
```diff
diff --git a/verifiers/clients/openai_chat_completions_token_client.py b/verifiers/clients/openai_chat_completions_token_client.py
--- a/verifiers/clients/openai_chat_completions_token_client.py
+++ b/verifiers/clients/openai_chat_completions_token_client.py
@@ -42,7 +42,6 @@
             elif role == "user" and count == 0:
                 # A user follow-up (not a tool response) is also an env message
                 count = 1
-                break
             else:
                 break
         return count
```
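Read in context, the removed `break` is what made a trailing user message stop the scan before preceding tool messages were counted. A self-contained reconstruction of the corrected helper (an assumption based only on the visible context lines, not the file's actual code) behaves like this:

```python
# Reconstructed sketch of the trailing env-message count after the fix:
# scan backwards, counting tool responses; a single trailing user message
# also counts, but no longer terminates the scan.
def count_trailing_env_messages(messages):
    count = 0
    for msg in reversed(messages):
        role = msg["role"]
        if role == "tool":
            count += 1
        elif role == "user" and count == 0:
            # A user follow-up (not a tool response) is also an env message;
            # keep scanning so preceding trailing tool messages are included
            count = 1
        else:
            break
    return count
```

With the `break` removed, a mixed `[..., assistant, tool, user]` tail now counts both trailing env messages instead of just the user message.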
Applied via @cursor push command
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace _count_trailing_env_messages with direct derivation from the prefix match result. The env messages are simply prompt_messages[prefix_len:] — no need to independently re-derive from the tail. Validate the pattern: all tool messages, with optionally a user message last. Falls back to MITO on unexpected message shapes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
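A minimal sketch of that direct derivation, with illustrative names (`env_tail`, dict-shaped messages) that are not the PR's actual code:

```python
# Sketch: derive the env tail directly from the prefix match instead of
# re-counting from the end. Valid shape: all tool messages, with at most
# one user message, which must come last. None signals MITO fallback.
def env_tail(prompt_messages, prefix_len):
    tail = prompt_messages[prefix_len:]
    for i, msg in enumerate(tail):
        if msg["role"] == "tool":
            continue
        if msg["role"] == "user" and i == len(tail) - 1:
            continue
        return None  # unexpected message shape -> fall back to MITO
    return tail
```

Because the tail is taken straight from `prompt_messages[prefix_len:]`, it cannot disagree with the prefix match, and the shape check is the only remaining validation.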

Summary
The truncation gate now checks both `tokens["is_truncated"]` (seq_len overflow) and `response.message.is_truncated` (`finish_reason="length"` from vLLM). Previously only seq_len overflow was checked, so max_tokens truncation was missed — TITO would attempt bridge stitching on a completion without a stop token, producing malformed token sequences.

Problem

When a completion hit `max_tokens` (e.g., a thinking model burns through its token budget), vLLM returns `finish_reason="length"` and does not include a stop token in the completion `token_ids`. The TITO client's truncation gate only checked `TrajectoryStepTokens["is_truncated"]`, which reflects seq_len overflow but not max_tokens truncation. So it proceeded with bridge extraction, grabbed a random content token as the "stop token" for gap calculation, and produced broken stitched sequences.

Testing

Empirically validated stop token behavior across 6 models on prime-rl's custom vLLM server (`/v1/chat/completions/tokens` endpoint). Observed stop tokens included `<|im_end|>`, `<|user|>` / 154829, and `<|observation|>`. Confirmed across all models: `finish_reason="length"` never includes a stop token in `token_ids`. Bridge extraction and GLM dedup logic validated for both tool-call multi-turn (wiki-search style) and user-message multi-turn (alphabet-sort style).

Ran RL training experiments with `use_token_client=true` across Qwen3-4B, Qwen3-0.6B, Qwen3-30B-A3B, and Qwen3.5-4B on wiki-search and alphabet-sort environments, confirming no TITO errors.

Test plan

- Set a low `max_tokens` (e.g., 32) to trigger frequent `finish_reason="length"` on multi-turn — verify the MITO fallback fires and the rollout completes correctly
- Run with `use_token_client=true` — verify the TITO path works for tool-call multi-turn

🤖 Generated with Claude Code
Note
Medium Risk
Touches core prompt/token stitching logic used for KV-cache reuse across turns; failures can corrupt prompts or force MITO fallback, though the changes add extra validation and safe fallbacks.
Overview
Fixes TITO prompt stitching in `OpenAIChatCompletionsTokenClient.get_prompt_ids` by replacing full-prompt slicing/suffix caching with a dummy-assistant dual-tokenization approach that extracts only the minimal "bridge" tokens needed after a cached prefix, including handling templates where stop tokens can act as role markers.

Adds stricter gating to avoid stitching when the matched previous completion was truncated (checking both token parsing overflow and `finish_reason="length"` via `response.message.is_truncated`), plus validates that the post-prefix "env tail" is composed of tool messages with an optional trailing user message; otherwise it falls back to message-based inference (MITO).

Written by Cursor Bugbot for commit ed76a51.