
feat: allow multiple recall tool calls per request (multi-turn recall)#404

Merged
BYK merged 1 commit into main from feat/multi-turn-recall on May 20, 2026

BYK (Owner) commented on May 19, 2026

Summary

Allows the model to call recall multiple times within a single turn, enabling drill-down from high-level search results into specific t:<id> source citations. Previously, the gateway stripped recall from the tools list after the first follow-up, preventing any further recall calls.

Problem

The gateway stripped the recall tool after the first recall follow-up (recall.ts:438-442). This prevented the LLM from:

  1. Following up on t:<id> source citations in distillation results
  2. Making multiple targeted recall queries in a single conversation turn
  3. Drilling down from a general recall result to specific details

Approach

Treat recall like any other tool — stop artificially limiting it to one call per turn. The continuation path is now recall-aware, with a generous safety-net cap (MAX_RECALL_DEPTH = 10) to prevent pathological loops.
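For the non-streaming case, the bounded loop this implies can be sketched as follows. This is a minimal sketch under stated assumptions: the response types are simplified, `runRecallLoop`, its two callbacks, and the `{ query }` recall input are hypothetical, and only `MAX_RECALL_DEPTH = 10` comes from the PR.

```typescript
// Simplified, hypothetical shapes loosely modelled on Anthropic Messages API responses.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}
interface RecallToolUse {
  type: "tool_use";
  id: string;
  name: string;
  input: { query: string }; // assumed recall input shape
}
type Block = { type: "text"; text: string } | RecallToolUse;
interface ModelResponse {
  content: Block[];
  usage: Usage;
}

// Same cap the PR introduces; everything else here is illustrative.
const MAX_RECALL_DEPTH = 10;

function findRecallToolUse(res: ModelResponse): RecallToolUse | undefined {
  return res.content.find(
    (b): b is RecallToolUse => b.type === "tool_use" && b.name === "recall",
  );
}

// Keeps answering recall calls until the model stops asking or the cap is hit.
// `executeRecall` stands in for running (and storing) the recall query;
// `continueWith` stands in for building and sending the follow-up request.
async function runRecallLoop(
  initial: ModelResponse,
  executeRecall: (query: string) => Promise<string>,
  continueWith: (toolUse: RecallToolUse, recallResult: string) => Promise<ModelResponse>,
): Promise<ModelResponse & { depth: number }> {
  let response = initial;
  const usage: Usage = { ...initial.usage };
  let depth = 0;
  let toolUse = findRecallToolUse(response);
  while (toolUse !== undefined && depth < MAX_RECALL_DEPTH) {
    const result = await executeRecall(toolUse.input.query);
    response = await continueWith(toolUse, result);
    // Accumulate usage so the caller sees the total across all round-trips.
    usage.input_tokens += response.usage.input_tokens;
    usage.output_tokens += response.usage.output_tokens;
    depth++;
    toolUse = findRecallToolUse(response);
  }
  // The PR does not describe what happens if the cap is reached with recall
  // still pending; this sketch simply returns the last response.
  return { ...response, usage, depth };
}
```

The cap is checked before each round-trip, so at most ten follow-up requests are sent no matter how often the model asks for recall.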

Changes

packages/gateway/src/recall.ts

  • Add MAX_RECALL_DEPTH = 10 safety-net constant
  • Remove tool stripping from buildRecallFollowUp() — recall stays in the tools list
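To make the buildRecallFollowUp() change concrete, here is a minimal sketch of a follow-up builder that leaves the tools list intact. The request shape loosely follows the Anthropic Messages API, and `buildFollowUpRequest` with its parameters is a hypothetical stand-in, not the gateway's actual signature.

```typescript
// Simplified, hypothetical request shapes loosely modelled on the Anthropic Messages API.
interface Tool {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
}
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };
interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}
interface MessagesRequest {
  model: string;
  messages: Message[];
  tools?: Tool[];
}

// Appends the assistant's recall call and the recall result as a new user turn.
// The old behaviour amounted to dropping recall from `tools` at this point;
// here `tools` is passed through unchanged so the model may call recall again.
function buildFollowUpRequest(
  original: MessagesRequest,
  assistantContent: ContentBlock[],
  toolUseId: string,
  recallResult: string,
): MessagesRequest {
  return {
    ...original,
    messages: [
      ...original.messages,
      { role: "assistant", content: assistantContent },
      {
        role: "user",
        content: [{ type: "tool_result", tool_use_id: toolUseId, content: recallResult }],
      },
    ],
    tools: original.tools, // recall stays available for the next turn
  };
}
```

The builder returns a new object rather than mutating the original request, so each loop iteration can extend the previous follow-up safely.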

packages/gateway/src/pipeline.ts

  • Non-streaming path: Replace the linear recall interception and the defense-in-depth handling with a while loop. Each iteration executes recall, stores the result, builds a follow-up request, and checks whether the continuation response also contains a recall call. Usage is accumulated across iterations.
  • Streaming path: Replace the inline continuation pipe and the defense-in-depth suppression with a loop that uses a fresh RecallAwareAccumulator per continuation stream. Block indices are tracked across iterations via a cumulative blockOffset.
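One way to picture the cumulative offset in the streaming path is a driver loop that hands each stream its own options: the first stream starts at offset 0 and keeps its message_start, and every continuation starts where the previous one left off, with message_start suppressed. This is a sketch under assumptions: `pipeStreamsWithRecall`, `StreamOutcome`, and the `runStream` callback (standing in for "create a fresh accumulator and pipe one upstream stream through it") are hypothetical, and whether the intercepted recall block counts toward `blocksEmitted` is left to the real accumulator.

```typescript
// Hypothetical option shape handed to each per-stream accumulator.
interface ContinuationOptions {
  blockOffset: number; // added to every content-block index the stream emits
  suppressMessageStart: boolean; // true once the client has already seen message_start
}

// Hypothetical summary of what one upstream stream did once piped to the client.
interface StreamOutcome {
  blocksEmitted: number; // content blocks actually forwarded to the client
  recallRequested: boolean; // did this stream end in another recall call?
}

const MAX_RECALL_DEPTH = 10;

// Drives the initial stream plus any recall continuations and returns the
// options each stream was given (handy for inspecting the offsets).
async function pipeStreamsWithRecall(
  runStream: (depth: number, opts: ContinuationOptions) => Promise<StreamOutcome>,
): Promise<ContinuationOptions[]> {
  const optionsUsed: ContinuationOptions[] = [];
  let blockOffset = 0;
  let depth = 0;
  for (;;) {
    const opts: ContinuationOptions = { blockOffset, suppressMessageStart: depth > 0 };
    optionsUsed.push(opts);
    const outcome = await runStream(depth, opts);
    // Cumulative: continuation indices never restart at 0 on the client side.
    blockOffset += outcome.blocksEmitted;
    if (!outcome.recallRequested || depth >= MAX_RECALL_DEPTH) break;
    depth++; // in the real pipeline, recall would be executed and a follow-up built here
  }
  return optionsUsed;
}
```

With three streams that emit 2, 1, and 3 blocks, the offsets handed out are 0, 2, and 3, so the client sees one contiguous run of block indices.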

packages/gateway/src/stream/anthropic.ts

  • Add blockOffset option to createRecallAwareAccumulator() — shifts all emitted block indices for continuation streams
  • Add suppressMessageStart option — suppresses message_start events for continuation streams where the client already received one
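As a rough illustration of the two new options, the sketch below rewrites the events of a single continuation stream: every content-block index is shifted by `blockOffset`, and `message_start` is dropped when `suppressMessageStart` is set. The event types are a simplified subset of Anthropic's streaming events, and `adaptContinuation` is a hypothetical standalone function, not the actual createRecallAwareAccumulator() interface.

```typescript
// A simplified subset of Anthropic streaming event shapes, for illustration only.
type StreamEvent =
  | { type: "message_start"; message: { id: string } }
  | { type: "content_block_start"; index: number; content_block: { type: string } }
  | { type: "content_block_delta"; index: number; delta: { type: string; text?: string } }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; delta: { stop_reason: string | null } }
  | { type: "message_stop" };

// Hypothetical options mirroring the two added to the accumulator.
interface ContinuationOptions {
  blockOffset?: number; // added to every content-block index
  suppressMessageStart?: boolean; // drop a message_start the client already received
}

// Rewrites one continuation stream so it can be appended to what the client
// has already seen. message_start carries no block index, so the two options
// are independent and compose cleanly.
function adaptContinuation(events: StreamEvent[], opts: ContinuationOptions = {}): StreamEvent[] {
  const { blockOffset = 0, suppressMessageStart = false } = opts;
  const out: StreamEvent[] = [];
  for (const ev of events) {
    if (ev.type === "message_start" && suppressMessageStart) continue;
    if ("index" in ev) {
      out.push({ ...ev, index: ev.index + blockOffset });
    } else {
      out.push(ev);
    }
  }
  return out;
}
```

With both options at their defaults the stream passes through unchanged, which is what the initial (non-continuation) stream needs.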

Tests

  • Updated buildRecallFollowUp test to verify recall is kept in tools
  • Added blockOffset tests (correct index shifting, composition with suppression)
  • Added suppressMessageStart tests
  • Added continuation stream scenario test (simulates two chained recall follow-ups with correct cumulative block indexing)

Closes #399

Stop stripping the recall tool from follow-up requests, making the
continuation path recall-aware. The model can now call recall multiple
times within a single turn — e.g. first a broad search, then drilling
down into specific t:<id> source citations.

Changes:
- recall.ts: Add MAX_RECALL_DEPTH (10) safety-net constant. Remove tool
  stripping from buildRecallFollowUp() — recall stays in the tools list.
- pipeline.ts: Non-streaming path becomes a while loop that re-checks
  hasRecallToolUse() after each continuation. Streaming path uses a
  fresh RecallAwareAccumulator per continuation stream with blockOffset
  and suppressMessageStart options for correct SSE block indexing.
- stream/anthropic.ts: Add blockOffset and suppressMessageStart options
  to createRecallAwareAccumulator() for continuation stream support.

Closes #399
BYK force-pushed the feat/multi-turn-recall branch from a46c39b to 3bbf047 on May 20, 2026 06:46
BYK merged commit b496ad3 into main on May 20, 2026
10 checks passed
BYK deleted the feat/multi-turn-recall branch on May 20, 2026 07:38

Development

Successfully merging this pull request may close these issues.

feat: allow multiple recall tool calls per request (multi-turn recall)

1 participant