
feat: improve recall tool description + add cross-session cue eval scenarios #396

Merged
BYK merged 1 commit into main from feat/recall-tool-description-and-eval on May 19, 2026

BYK commented May 19, 2026

Problem

When a user says things like "we had this thing from earlier sessions" at the start of a conversation (layer 0, no compression), the LLM does not use the recall tool. The current tool description frames everything around "trimmed context" — which isn't true at layer 0, so the LLM dismisses the need to search.

Solution

1. Recall Tool Description Rewrite

Rewrote RECALL_TOOL_DESCRIPTION with a dual-trigger structure that separates two distinct cases:

  1. Cross-session references (always true) — explicit cue phrases like "last time", "we discussed", "earlier", "remember". Prior sessions are never in context.
  2. Missing details (true when compressed) — file paths, decisions, preferences not visible in the current window.

This works at every gradient layer because cross-session content is never in context, regardless of compression state.
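
To make the dual-trigger structure concrete, here is a minimal sketch of how such a description could be laid out. The wording below is illustrative only and is not the actual text of RECALL_TOOL_DESCRIPTION; the names `CROSS_SESSION_CUES` and `EXAMPLE_RECALL_TOOL_DESCRIPTION` are hypothetical. Only the cue phrases and the two triggers come from this PR.

```typescript
// Cue phrases listed in the PR description.
const CROSS_SESSION_CUES = ["last time", "we discussed", "earlier", "remember"];

// Hypothetical helper: render the cues as a quoted, comma-separated list.
const quotedCues = CROSS_SESSION_CUES.map((cue) => '"' + cue + '"').join(", ");

// Illustrative wording only, NOT the real RECALL_TOOL_DESCRIPTION.
// The point is the shape: two independent triggers, the first of which
// does not depend on whether the context has been compressed.
const EXAMPLE_RECALL_TOOL_DESCRIPTION = [
  "Search memory from prior sessions and from earlier, trimmed context.",
  "",
  "Use this tool when EITHER of the following is true:",
  "1. The user refers to a previous session (cues such as " + quotedCues + ").",
  "   Prior sessions are never in your context, so always search.",
  "2. You need a detail (file path, decision, preference) that is not",
  "   visible in the current window.",
].join("\n");

console.log(EXAMPLE_RECALL_TOOL_DESCRIPTION);
```

Keeping trigger 1 free of any mention of trimming or compression is what lets it apply at layer 0.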

2. Eval Extension for Cross-Session Recall Cue Detection

Extended the eval suite to test whether the LLM uses recall when given conversational cross-session references:

  • x-lore-recall-invoked response header — added to all non-streaming recall return paths in the gateway pipeline so the eval harness can detect recall usage (the gateway handles recall transparently — clients never see tool_use blocks)
  • RECALL_TRIGGER scoring criterion — new judge criterion that scores whether the LLM appropriately used recall for cross-session references (1-5 scale)
  • crossSessionCueRecall rubric — factual_accuracy (0.25), completeness (0.25), recall_trigger (0.3), temporal_attribution (0.2)
  • recallInvoked metadata flow: propagated from askQuestionViaGateway() → judge() → EvalResult.metadata, and included in the judge prompt only when the rubric has a recall_trigger criterion
  • 8 new MSR-1 questions (msr1-q13 through msr1-q20) — same factual content as existing questions but phrased with natural conversational cues:
    • "Remember that auth bug we had?"
    • "We set up something for token refresh last time"
    • "Earlier we discussed why we went with JWT"
    • "What was that password hashing library we picked"
    • "remind me why" (PKCE)
    • "We changed the rate limiter key at some point"
    • "We built up the auth module over a couple of sessions"
    • "What were those callback URLs we configured"
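
As a rough illustration of how the crossSessionCueRecall weights and the conditional recallInvoked metadata could fit together, here is a minimal sketch. The weights are the ones listed above. The aggregation (a weighted average of the 1-5 criterion scores) and the helper names `weightedScore` and `recallMetadataLine` are assumptions for illustration, not the code in judge.ts.

```typescript
type Criterion =
  | "factual_accuracy"
  | "completeness"
  | "recall_trigger"
  | "temporal_attribution";

// Weights as listed in the PR description (they sum to 1.0).
const crossSessionCueRecallWeights: Record<Criterion, number> = {
  factual_accuracy: 0.25,
  completeness: 0.25,
  recall_trigger: 0.3,
  temporal_attribution: 0.2,
};

// Assumption: the overall score is a weighted average of per-criterion
// scores, each on the 1-5 scale used by the judge.
function weightedScore(
  scores: Record<Criterion, number>,
  weights: Record<Criterion, number>,
): number {
  let total = 0;
  let weightSum = 0;
  for (const key of Object.keys(weights) as Criterion[]) {
    total += scores[key] * weights[key];
    weightSum += weights[key];
  }
  return total / weightSum;
}

// Hypothetical helper: emit the recall metadata line for the judge prompt
// only when the rubric actually scores recall_trigger.
function recallMetadataLine(
  rubricWeights: Partial<Record<Criterion, number>>,
  recallInvoked: boolean | undefined,
): string {
  if (rubricWeights.recall_trigger === undefined || recallInvoked === undefined) {
    return "";
  }
  return "Recall tool invoked: " + (recallInvoked ? "yes" : "no");
}
```

Under these assumptions, an answer that is factually perfect but never triggered recall (5, 5, 1, 5) would score about 3.8, so the recall_trigger weight of 0.3 has a visible effect on the total.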

Files Changed

| File | Change |
| --- | --- |
| packages/core/src/recall.ts | Rewrite RECALL_TOOL_DESCRIPTION |
| packages/gateway/src/pipeline.ts | Add extraHeaders to nonStreamHttpResponse(), set x-lore-recall-invoked header |
| packages/core/eval/judge.ts | Add RECALL_TRIGGER criterion, crossSessionCueRecall rubric, metadata param to judge() |
| packages/core/eval/harness.ts | Read recall header, propagate recallInvoked through scoring pipeline |
| packages/core/eval/scenarios/multi-session-recall.ts | Add 8 new cross-session cue questions |
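
The header handshake between pipeline.ts and harness.ts can be sketched roughly as follows. This is a simplified model under stated assumptions: `SimpleResponse` stands in for a real HTTP response object, the signature of `nonStreamHttpResponse` is guessed from the description above, the header value `"true"` is an assumption, and `readRecallInvoked` is a hypothetical name for the harness-side check.

```typescript
const RECALL_INVOKED_HEADER = "x-lore-recall-invoked";

// Simplified stand-in for an HTTP response (assumption, for illustration).
interface SimpleResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Gateway side: optional extraHeaders are merged into the response headers.
// The parameter is optional, so existing callers are unaffected.
function nonStreamHttpResponse(
  payload: unknown,
  extraHeaders: Record<string, string> = {},
): SimpleResponse {
  const headers: Record<string, string> = { "content-type": "application/json" };
  for (const name of Object.keys(extraHeaders)) {
    const value = extraHeaders[name];
    if (value !== undefined) {
      // Header names are case-insensitive, so normalize to lowercase.
      headers[name.toLowerCase()] = value;
    }
  }
  return { status: 200, headers, body: JSON.stringify(payload) };
}

// Harness side: recall is handled inside the gateway, so the header is the
// only signal the eval harness has that recall ran.
function readRecallInvoked(response: SimpleResponse): boolean {
  return response.headers[RECALL_INVOKED_HEADER] === "true";
}
```

A response built without the extra header reads as recall not invoked, which matches the backward-compatibility note below.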

Verification

  • bun run typecheck — all 4 packages pass
  • bun test — all 1630 tests pass
  • All changes are backward compatible (new params are optional, existing questions unaffected)

BYK force-pushed the feat/recall-tool-description-and-eval branch from 5e06851 to b174c8e on May 19, 2026 17:28
…enarios

Rewrite RECALL_TOOL_DESCRIPTION with dual-trigger structure so the LLM
uses recall at layer 0 (early session) when users reference past sessions:
  (1) Cross-session references — explicit cue phrases like 'last time',
      'we discussed', 'earlier', 'remember'. Prior sessions are never
      in context.
  (2) Missing details — file paths, decisions, preferences not visible
      in the current window.

Extend the eval suite to test cross-session recall trigger sensitivity:
- Add x-lore-recall-invoked response header to non-streaming recall paths
- Add RECALL_TRIGGER scoring criterion and crossSessionCueRecall rubric
- Pass recallInvoked metadata through judge for recall_trigger scoring
- Add 8 new MSR-1 questions using conversational cross-session cues
  (msr1-q13 through msr1-q20)
BYK force-pushed the feat/recall-tool-description-and-eval branch from b174c8e to 0ad89f4 on May 19, 2026 18:13
BYK enabled auto-merge (squash) on May 19, 2026 18:13
BYK merged commit 17180ea into main on May 19, 2026
10 checks passed
BYK deleted the feat/recall-tool-description-and-eval branch on May 19, 2026 18:14
This was referenced May 21, 2026