fix(history): atomic pair-aware truncation for tool_call blocks#487

Open
TatsuKo-Tsukimi wants to merge 1 commit into dataelement:main from TatsuKo-Tsukimi:fix/history-pair-aware-truncation

Conversation

@TatsuKo-Tsukimi

Summary

Replace naive `conversation[-ctx_size:]` slicing with a walker that treats assistant.tool_calls and its matching role="tool" messages as one atomic block. Naive slicing can leave an orphan role="tool" at the head when the cut lands mid-pair — OpenAI rejects this with "No tool call found for function call output" (issue #446).

The new helper services/history_window.truncate_by_message_count detects orphan tool messages by tool_call_id matching, not adjacency — a tool message inserted between a valid pair and other messages (from malformed persistence or upstream truncation) is dropped rather than folded into a block, so the helper is robust against orphans at any position.
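A minimal sketch of what such a helper can look like, written from the PR description alone — the function name matches the PR, but the body, message-dict shapes, and grouping logic here are my assumptions, not the actual backend/app/services/history_window.py:

```python
def truncate_by_message_count(conversation, budget):
    """Keep at most `budget` trailing messages without ever splitting an
    assistant.tool_calls message from its matching role="tool" replies;
    orphan tool messages (no matching tool_call_id) are dropped anywhere.
    Sketch only — assumes OpenAI-style message dicts."""
    # 1) Collect every tool_call_id issued by an assistant message.
    issued = {
        tc["id"]
        for msg in conversation
        if msg.get("role") == "assistant"
        for tc in (msg.get("tool_calls") or [])
    }
    # 2) Drop orphan tool messages at any position (ID match, not adjacency).
    cleaned = [
        m for m in conversation
        if m.get("role") != "tool" or m.get("tool_call_id") in issued
    ]
    # 3) Group each assistant.tool_calls message with its trailing tool
    #    replies into one atomic block; every other message is its own block.
    blocks, i = [], 0
    while i < len(cleaned):
        j = i + 1
        if cleaned[i].get("role") == "assistant" and cleaned[i].get("tool_calls"):
            while j < len(cleaned) and cleaned[j].get("role") == "tool":
                j += 1
        blocks.append(cleaned[i:j])
        i = j
    # 4) Take whole blocks from the tail until the budget would be exceeded.
    kept, used = [], 0
    for block in reversed(blocks):
        if used + len(block) > budget:
            break
        kept[:0] = block
        used += len(block)
    return kept
```

With a budget of 2 over `[user, assistant+tool_calls, tool, assistant]`, the naive slice would start the window on the orphan tool message; this version instead returns only the final assistant message, keeping the pair atomic.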

Changes

  • New: backend/app/services/history_window.py — pair-aware truncation helper
  • New: backend/tests/test_history_window.py — 16 unit tests (empty / budget edges / parallel tool_calls / multi-pair / head and mid orphans / orphan-adjacent-to-pair / 60-message synthetic invariant)
  • Edited: backend/app/api/websocket.py — replace head-only pop guard at the tool-call truncation site
  • Edited: backend/app/api/feishu.py — same pattern for the IM channel path
  • Leaves app/services/llm/caller.py:626 untouched (call_agent_llm short-reply path's hardcoded [-10:] is intentional)

Stack

This is PR 1/4 of a context-budget hygiene series. The full stack:

  1. #THIS_PR — pair-aware truncation (this PR)
  2. follow-up — token-aware truncation alongside message cap
  3. follow-up — truncate large tool results with workspace spill
  4. follow-up — parallelize read-only tool calls + consolidate three tool loops

Each follow-up is its own PR, posted separately. Reviewing and merging in the listed order is recommended.

Test plan

  • pytest backend/tests/test_history_window.py — 16/16 pass
  • ruff check on touched files — clean
  • Reviewer: confirm orphan detection by ID (not adjacency) is the desired semantic
  • Reviewer: confirm that leaving the [-10:] short-reply path at caller.py:626 untouched is intentional

Addresses the failure mode reported in #446. To reproduce: run a conversation with tool calls long enough to trigger truncation; the legacy slice + head-pop leaves an orphan tool message that reaches the OpenAI Responses API. The new helper drops it.
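The failure can be shown in a few lines. This is a schematic repro, assuming OpenAI-style message dicts — not the actual conversation shape used by the backend:

```python
# A tool-call pair followed by a final answer.
conv = [
    {"role": "user", "content": "question"},
    {"role": "assistant", "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "result"},
    {"role": "assistant", "content": "answer"},
]

window = conv[-2:]  # legacy naive slice: the cut lands mid-pair

# The window now starts with a role="tool" message whose tool_calls
# parent was sliced away — exactly the payload OpenAI rejects with
# "No tool call found for function call output".
print(window[0]["role"])  # tool
```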

🤖 Generated with Claude Code

Replace naive `conversation[-ctx_size:]` slicing with a walker that treats
assistant.tool_calls and its matching role="tool" messages as one atomic
block. Naive slicing can leave an orphan role="tool" at the head when the
cut lands mid-pair — OpenAI rejects this with "No tool call found for
function call output" (issue dataelement#446).

New helper `services/history_window.truncate_by_message_count` walks tail
backward, identifies blocks, and includes blocks whole or not at all.
Orphan tools are silently dropped regardless of budget.

Replaces head-only pop guard in:
  - app/api/websocket.py (web chat)
  - app/api/feishu.py (feishu channel)

Leaves app/services/llm/caller.py:626 untouched (call_agent_llm short-reply
path's hardcoded [-10:] is intentional).

Tests: 15 covering empty input, budget edges, parallel tool_calls,
multi-pair, head/mid orphans, realistic 60-message invariant.

Addresses dataelement#446 failure mode.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 114be24657


# Naive [-ctx_size:] slicing can leave orphan tool messages at the
# head when the cut lands mid-pair, which OpenAI rejects with
# "No tool call found for function call output" (issue #446).
_truncated = truncate_by_message_count(conversation, ctx_size)

P2 Badge Bound pair-aware truncation to recent history

truncate_by_message_count(conversation, ctx_size) now runs over the full in-memory conversation on every turn, but this websocket loop keeps appending messages and does not prune that list, so truncation cost grows with session length rather than being effectively bounded by ctx_size like the previous conversation[-ctx_size:] slice. In long-lived chats this can introduce steadily increasing per-turn latency and CPU usage; pre-limiting the input (for example to a bounded tail window) before pair-aware truncation would avoid this regression while preserving the pairing fix.
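The reviewer's suggestion can be sketched as a thin wrapper. The name `bounded_truncate` and the `slack` factor are mine, not from the PR or the review; `slack` leaves headroom so the pair-aware pass still has room to drop orphans and refuse to split blocks:

```python
def bounded_truncate(conversation, ctx_size, truncator, slack=4):
    """Pre-limit to a bounded tail window before pair-aware truncation,
    so per-turn cost is O(ctx_size * slack) rather than growing with
    total session length. `truncator` is the pair-aware helper."""
    tail = conversation[-(ctx_size * slack):]
    return truncator(tail, ctx_size)
```

The pair-aware helper then only ever sees `ctx_size * slack` messages, regardless of how long the websocket session's in-memory conversation has grown.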

