feat: allow multiple recall tool calls per request (multi-turn recall) #404
Merged
Branch updated from c0a01df to a46c39b.
This was referenced May 19, 2026
Stop stripping the recall tool from follow-up requests, making the continuation path recall-aware. The model can now call recall multiple times within a single turn, e.g. first a broad search, then drilling down into specific t:<id> source citations.

Changes:
- recall.ts: Add MAX_RECALL_DEPTH (10) safety-net constant. Remove tool stripping from buildRecallFollowUp(), so recall stays in the tools list.
- pipeline.ts: The non-streaming path becomes a while loop that re-checks hasRecallToolUse() after each continuation. The streaming path uses a fresh RecallAwareAccumulator per continuation stream, with blockOffset and suppressMessageStart options for correct SSE block indexing.
- stream/anthropic.ts: Add blockOffset and suppressMessageStart options to createRecallAwareAccumulator() for continuation stream support.

Closes #399
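The non-streaming loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the gateway's code. The request and response shapes, the tool name "recall", and the helpers runWithRecall, findRecallCall, callModel and executeRecall are all hypothetical. hasRecallToolUse and buildRecallFollowUp are stand-ins whose signatures are guessed. Storing the recall result is omitted.

```typescript
// Hypothetical, simplified shapes for illustration; the gateway's real types are not shown in this PR.
interface ContentBlock {
  type: "text" | "tool_use";
  text?: string;
  id?: string;
  name?: string;
  input?: unknown;
}

interface Usage {
  input_tokens: number;
  output_tokens: number;
}

interface ModelResponse {
  content: ContentBlock[];
  usage: Usage;
}

interface ModelRequest {
  messages: unknown[];
  tools: { name: string }[];
}

// Safety-net cap from the PR: the loop below runs at most this many recall rounds.
const MAX_RECALL_DEPTH = 10;
const RECALL_TOOL = "recall"; // assumed tool name

function findRecallCall(res: ModelResponse): ContentBlock | undefined {
  return res.content.find((b) => b.type === "tool_use" && b.name === RECALL_TOOL);
}

// Stand-in for hasRecallToolUse(): true when the response asks for another recall.
function hasRecallToolUse(res: ModelResponse): boolean {
  return findRecallCall(res) !== undefined;
}

// Stand-in for buildRecallFollowUp() after this change: tools are passed through
// unchanged, so recall remains callable in the continuation.
function buildRecallFollowUp(
  req: ModelRequest,
  res: ModelResponse,
  call: ContentBlock,
  result: string,
): ModelRequest {
  return {
    tools: req.tools, // previously, recall was filtered out of this list
    messages: [
      ...req.messages,
      { role: "assistant", content: res.content },
      { role: "user", content: [{ type: "tool_result", tool_use_id: call.id, content: result }] },
    ],
  };
}

// Non-streaming path as a while loop: re-check for recall after every continuation.
async function runWithRecall(
  initial: ModelRequest,
  callModel: (req: ModelRequest) => Promise<ModelResponse>,
  executeRecall: (input: unknown) => Promise<string>,
): Promise<{ response: ModelResponse; depth: number }> {
  let req = initial;
  let response = await callModel(req);
  const usage: Usage = { ...response.usage }; // accumulated across iterations
  let depth = 0;

  while (depth < MAX_RECALL_DEPTH && hasRecallToolUse(response)) {
    const call = findRecallCall(response)!; // non-null: hasRecallToolUse just checked
    const result = await executeRecall(call.input);
    req = buildRecallFollowUp(req, response, call, result);
    response = await callModel(req);
    usage.input_tokens += response.usage.input_tokens;
    usage.output_tokens += response.usage.output_tokens;
    depth++;
  }

  // At the cap this sketch simply returns the last response; the PR does not say what the gateway does there.
  return { response: { ...response, usage }, depth };
}
```

Putting the depth check first in the loop condition means a model that keeps requesting recall costs at most MAX_RECALL_DEPTH + 1 model calls.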
Branch updated from a46c39b to 3bbf047.
Summary
Allows the model to call recall multiple times within a single turn, enabling drill-down from high-level search results into specific t:<id> source citations. Previously, the gateway stripped recall from the tools list after the first follow-up, preventing any further recall calls.

Problem
The gateway stripped the recall tool after the first recall follow-up (recall.ts:438-442). This prevented the LLM from drilling down into t:<id> source citations in distillation results.

Approach
Treat recall like any other tool and stop artificially limiting it to one call per turn. The continuation path is now recall-aware, with a generous safety-net cap (MAX_RECALL_DEPTH = 10) to prevent pathological loops.

Changes
packages/gateway/src/recall.ts
- Add MAX_RECALL_DEPTH = 10 safety-net constant
- Remove tool stripping from buildRecallFollowUp(), so recall stays in the tools list

packages/gateway/src/pipeline.ts
- Non-streaming: the path becomes a while loop. Each iteration executes recall, stores the result, builds a follow-up, and checks whether the continuation also contains a recall call. Usage is accumulated across iterations.
- Streaming: a fresh RecallAwareAccumulator is used per continuation stream. Block indices are tracked across iterations via a cumulative blockOffset.

packages/gateway/src/stream/anthropic.ts
- Add a blockOffset option to createRecallAwareAccumulator(), which shifts all emitted block indices for continuation streams
- Add a suppressMessageStart option, which suppresses message_start events for continuation streams where the client has already received one

Tests
- buildRecallFollowUp test: verifies that recall is kept in tools
- blockOffset tests: correct index shifting, composition with suppression
- suppressMessageStart tests

Closes #399
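The blockOffset and suppressMessageStart options can be illustrated with a rough sketch, including the composition the tests cover. It is only a sketch under assumptions. The event shapes are simplified from the Anthropic Messages streaming events, and rewriteEvent, rewriteStream and stitchStreams are hypothetical helpers, not the createRecallAwareAccumulator() API. The real accumulator presumably also withholds recall tool_use blocks and handles message_delta/message_stop between streams; none of that is modeled here.

```typescript
// Simplified Anthropic Messages SSE event shapes (payloads reduced to what this sketch needs).
type StreamEvent =
  | { type: "message_start"; message: unknown }
  | { type: "content_block_start"; index: number; content_block: unknown }
  | { type: "content_block_delta"; index: number; delta: unknown }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; delta: unknown }
  | { type: "message_stop" };

// Mirrors the two option names from the PR; everything else here is hypothetical.
interface ContinuationOptions {
  blockOffset?: number;
  suppressMessageStart?: boolean;
}

// Rewrite one upstream event for the client, or return null to drop it.
function rewriteEvent(ev: StreamEvent, opts: ContinuationOptions): StreamEvent | null {
  if (ev.type === "message_start" && opts.suppressMessageStart) return null;
  if ("index" in ev) {
    // Shift block indices so they continue after the blocks already sent to the client.
    return { ...ev, index: ev.index + (opts.blockOffset ?? 0) };
  }
  return ev;
}

// Forward one upstream stream, counting the content blocks actually sent.
function rewriteStream(events: StreamEvent[], opts: ContinuationOptions) {
  const out: StreamEvent[] = [];
  let blocksEmitted = 0;
  for (const ev of events) {
    const rewritten = rewriteEvent(ev, opts);
    if (rewritten === null) continue;
    if (rewritten.type === "content_block_start") blocksEmitted++;
    out.push(rewritten);
  }
  return { out, blocksEmitted };
}

// Stitch the initial stream and its continuations into one client-facing stream:
// only the first keeps message_start, and each continuation starts at the cumulative offset.
function stitchStreams(streams: StreamEvent[][]): StreamEvent[] {
  const out: StreamEvent[] = [];
  let offset = 0;
  streams.forEach((events, i) => {
    const { out: rewritten, blocksEmitted } = rewriteStream(events, {
      blockOffset: offset,
      suppressMessageStart: i > 0,
    });
    out.push(...rewritten);
    offset += blocksEmitted;
  });
  return out;
}
```

Because each continuation is a fresh upstream response, its blocks start again at index 0. Without the cumulative offset, the client would see index collisions across iterations.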