fix(core): sanitize orphaned tool_use/tool_result on history restore (#1360) #1362

Merged
bug-ops merged 2 commits into main from fix-1360-history-restore
Mar 8, 2026

Conversation

bug-ops (Owner) commented Mar 8, 2026

Summary

  • Cross-session history restore could produce invalid tool_use/tool_result sequences at history boundaries, causing Claude API 400 errors on session resume
  • Add sanitize_tool_pairs() post-load sanitization in load_history() that removes orphaned tool messages at both ends of restored history
  • 6 unit tests added covering the leading and trailing boundary conditions

Root causes

RC-1: load_history_filtered() LIMIT clause can split a tool_use/tool_result pair at the boundary, leaving an orphaned tool_use as the last restored message.

RC-2: A session interruption (Ctrl+C, timeout, crash) between persisting the assistant tool_use message and the user tool_result message leaves an orphaned tool_use in SQLite. On the next session restore, this triggers a Claude API 400 error.
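Both root causes come down to the restored history being a contiguous slice of the stored conversation that starts or ends in the middle of a pair. The sketch below models that with a hypothetical `Kind` enum and a hypothetical `window` helper; it is not the crate's query code, which is not shown in this PR, and only illustrates how a cut between a tool_use and its tool_result leaves an orphan at either end.

```rust
// Hypothetical, simplified model of stored rows; not the crate's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    UserText,
    AssistantToolUse,
    UserToolResult,
    AssistantText,
}

/// Returns a contiguous window of `rows`, clamped to the available range.
/// Stands in for any bounded load (a LIMIT clause, or rows persisted
/// before an interruption).
fn window(rows: &[Kind], start: usize, len: usize) -> &[Kind] {
    let end = start.saturating_add(len).min(rows.len());
    &rows[start.min(end)..end]
}

/// True when the window ends on a tool_use that nothing answers.
fn ends_with_orphan_tool_use(history: &[Kind]) -> bool {
    history.last() == Some(&Kind::AssistantToolUse)
}

/// True when the window starts on a tool_result that nothing requested.
fn starts_with_orphan_tool_result(history: &[Kind]) -> bool {
    history.first() == Some(&Kind::UserToolResult)
}
```

With rows `[UserText, AssistantToolUse, UserToolResult, AssistantText]`, a window of the first two rows ends on an orphaned tool_use, and a window of the last two rows starts on an orphaned tool_result.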

Changes

  • crates/zeph-core/src/agent/persistence.rs: Add private sanitize_tool_pairs(messages: &mut Vec<Message>) -> usize. load_history() calls it after the loading loop, on the just-loaded messages only (split off from self.messages so the system prompt is excluded). The function loops until no further removals occur, removing:
    1. Trailing assistant messages that have ToolUse parts with no following user ToolResult
    2. Leading user messages that have ToolResult parts with no preceding assistant ToolUse
    Each removal logs a tracing::warn! with the affected tool IDs.
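The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the merged implementation: `Role`, `Part`, and `Message` are hypothetical stand-ins for the crate's types, `eprintln!` stands in for `tracing::warn!` so the block needs only the standard library, and the checks are purely positional (they do not match tool IDs across messages).

```rust
// Hypothetical, simplified stand-ins for the crate's message types.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
enum Part {
    Text(String),
    ToolUse { id: String },
    ToolResult { tool_use_id: String },
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    parts: Vec<Part>,
}

impl Message {
    /// IDs of all ToolUse parts in this message.
    fn tool_use_ids(&self) -> Vec<&str> {
        self.parts
            .iter()
            .filter_map(|p| match p {
                Part::ToolUse { id } => Some(id.as_str()),
                _ => None,
            })
            .collect()
    }

    /// IDs referenced by all ToolResult parts in this message.
    fn tool_result_ids(&self) -> Vec<&str> {
        self.parts
            .iter()
            .filter_map(|p| match p {
                Part::ToolResult { tool_use_id } => Some(tool_use_id.as_str()),
                _ => None,
            })
            .collect()
    }
}

/// Removes orphaned tool messages at both ends of `messages` and
/// returns how many messages were removed.
fn sanitize_tool_pairs(messages: &mut Vec<Message>) -> usize {
    let mut removed = 0;
    loop {
        let mut changed = false;

        // A trailing assistant message with ToolUse parts has no following
        // ToolResult, because nothing follows it at all.
        let trailing_orphan = messages
            .last()
            .map_or(false, |m| m.role == Role::Assistant && !m.tool_use_ids().is_empty());
        if trailing_orphan {
            if let Some(m) = messages.pop() {
                eprintln!("removing trailing orphaned tool_use: {:?}", m.tool_use_ids());
                removed += 1;
                changed = true;
            }
        }

        // A leading user message with ToolResult parts has no preceding
        // ToolUse, because nothing precedes it.
        let leading_orphan = messages
            .first()
            .map_or(false, |m| m.role == Role::User && !m.tool_result_ids().is_empty());
        if leading_orphan {
            let m = messages.remove(0); // O(n) shift of the remaining messages
            eprintln!("removing leading orphaned tool_result: {:?}", m.tool_result_ids());
            removed += 1;
            changed = true;
        }

        // Stable: neither end changed in this pass.
        if !changed {
            break;
        }
    }
    removed
}
```

Looping until stable matters because removing one orphan can expose another, for example two consecutive trailing assistant messages that each contain ToolUse parts.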

Tests

6 new unit tests in agent::persistence::tests:

| Test | Scenario |
| --- | --- |
| load_history_removes_trailing_orphan_tool_use | Trailing orphan removed |
| load_history_removes_leading_orphan_tool_result | Leading orphan removed |
| load_history_preserves_complete_tool_pairs | Valid pair preserved |
| load_history_handles_multiple_trailing_orphans | Multiple consecutive orphans removed |
| load_history_no_tool_messages_unchanged | Plain messages pass through |
| load_history_removes_both_leading_and_trailing_orphans | Loop handles both ends in one call |

Test plan

  • cargo +nightly fmt --check passes
  • cargo clippy --workspace --features full -- -D warnings passes
  • cargo nextest run --workspace --features full --lib --bins passes (4693 tests)
  • All 10 load_history tests pass
  • config_default_snapshot failure is pre-existing on main (confirmed), unrelated to this PR

Notes

  • No SQL queries modified
  • O(n) remove(0) for leading orphan removal is acceptable given history_limit bounds; can be optimized with VecDeque in a future pass (R-1)
  • Mid-sequence orphan detection (RC-4) deferred as separate low-severity issue (R-2)
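For the VecDeque optimization mentioned in R-1, one possible shape is sketched below. It is an assumption about how such a pass could look, not planned code: `drop_leading_while` is a hypothetical generic helper, and the predicate stands in for the "leading orphaned tool_result" check.

```rust
use std::collections::VecDeque;

/// Drops leading items while `is_orphan` returns true, using
/// `VecDeque::pop_front` (O(1) per removal) instead of `Vec::remove(0)`
/// (O(n) per removal). Returns the remaining items and the removal count.
fn drop_leading_while<T>(
    items: Vec<T>,
    mut is_orphan: impl FnMut(&T) -> bool,
) -> (Vec<T>, usize) {
    // Converting a Vec into a VecDeque reuses the existing allocation.
    let mut queue: VecDeque<T> = VecDeque::from(items);
    let mut removed = 0;
    while queue.front().map_or(false, |item| is_orphan(item)) {
        queue.pop_front();
        removed += 1;
    }
    // Converting back may shift the elements once, regardless of how
    // many items were removed.
    (Vec::from(queue), removed)
}
```

For example, `drop_leading_while(vec![1, 2, 3, 4, 1], |x| *x < 3)` drops only the two leading items and leaves the later `1` in place. With history sizes bounded by history_limit, the difference is unlikely to be measurable, which matches the note's decision to defer it.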

Closes #1360

bug-ops added 2 commits March 8, 2026 22:25
…1360)

Cross-session history restore could produce invalid tool_use/tool_result
sequences at history boundaries, causing Claude API 400 errors.

Two root causes:
- RC-1: load_history_filtered() LIMIT clause can split a tool_use/tool_result
  pair at the boundary, leaving an orphaned tool_use as the last restored message.
- RC-2: Session interruption between persisting the assistant tool_use message
  and the user tool_result message leaves an orphaned tool_use in SQLite.

Add sanitize_tool_pairs() called in load_history() after the loading loop.
The function loops until stable, removing:
1. Trailing assistant messages that have ToolUse parts with no following
   user message containing ToolResult parts.
2. Leading user messages that have ToolResult parts with no preceding
   assistant message containing ToolUse parts.

Each removal is logged via tracing::warn with the affected tool IDs.

Add 6 unit tests covering all cases: trailing orphan, leading orphan,
complete pair preserved, multiple consecutive orphans, plain messages
unchanged, and combined leading+trailing orphans in one history.
github-actions bot added the bug, size/L (Large PR, 201-500 lines), documentation, rust, and core labels and removed the size/L label Mar 8, 2026
@bug-ops bug-ops enabled auto-merge (squash) March 8, 2026 21:28
@bug-ops bug-ops merged commit e356654 into main Mar 8, 2026
40 of 42 checks passed
@bug-ops bug-ops deleted the fix-1360-history-restore branch March 8, 2026 21:46

Labels

bug (Something isn't working), core (zeph-core crate), documentation (Improvements or additions to documentation), rust (Rust code changes)

Development

Successfully merging this pull request may close these issues.

bug: cross-session history restore produces invalid tool_use/tool_result sequence