feat: improve recall tool description + add cross-session cue eval scenarios by BYK · Pull Request #396 · BYK/loreai

BYK · 2026-05-19T17:16:04Z

Problem

When a user says things like "we had this thing from earlier sessions" at the start of a conversation (layer 0, no compression), the LLM does not use the recall tool. The current tool description frames everything around "trimmed context" — which isn't true at layer 0, so the LLM dismisses the need to search.

Solution

1. Recall Tool Description Rewrite

Rewrote RECALL_TOOL_DESCRIPTION with a dual-trigger structure that separates two distinct cases:

- Cross-session references (always true): explicit cue phrases like "last time", "we discussed", "earlier", "remember". Prior sessions are never in context.
- Missing details (true when compressed): file paths, decisions, preferences not visible in the current window.

This works at every gradient layer because cross-session content is never in context, regardless of compression state.
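As a rough illustration of the dual-trigger structure, the sketch below assembles a description from two separately labeled triggers. The names `RecallTrigger`, `buildRecallDescription`, and `RECALL_TOOL_DESCRIPTION_SKETCH`, along with all of the wording, are hypothetical; the actual description text lives in `packages/core/src/recall.ts` and is not reproduced here. Only the two cases and the cue phrases come from this PR.

```typescript
// Hypothetical sketch: this is NOT the real RECALL_TOOL_DESCRIPTION text.
interface RecallTrigger {
  title: string;
  when: string; // condition under which the trigger applies
  cues: string[]; // example cues the LLM should react to
}

const CROSS_SESSION: RecallTrigger = {
  title: "Cross-session references",
  when: "always, because prior sessions are never in context",
  cues: ["last time", "we discussed", "earlier", "remember"],
};

const MISSING_DETAILS: RecallTrigger = {
  title: "Missing details",
  when: "when the context has been compressed",
  cues: ["file paths", "decisions", "preferences"],
};

// Render each trigger as its own numbered line so the two cases stay distinct.
function buildRecallDescription(triggers: RecallTrigger[]): string {
  const sections = triggers.map(
    (t, i) => `${i + 1}. ${t.title} (${t.when}): ${t.cues.join(", ")}`,
  );
  return ["Use recall when either trigger applies:", ...sections].join("\n");
}

const RECALL_TOOL_DESCRIPTION_SKETCH = buildRecallDescription([
  CROSS_SESSION,
  MISSING_DETAILS,
]);
console.log(RECALL_TOOL_DESCRIPTION_SKETCH);
```

Keeping the cross-session trigger unconditional is the point of the rewrite: it holds even at layer 0, where nothing has been trimmed.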

2. Eval Extension for Cross-Session Recall Cue Detection

Extended the eval suite to test whether the LLM uses recall when given conversational cross-session references:

- `x-lore-recall-invoked` response header: added to all non-streaming recall return paths in the gateway pipeline so the eval harness can detect recall usage (the gateway handles recall transparently, so clients never see `tool_use` blocks).
- `RECALL_TRIGGER` scoring criterion: new judge criterion that scores whether the LLM appropriately used recall for cross-session references (1-5 scale).
- `crossSessionCueRecall` rubric: factual_accuracy (0.25), completeness (0.25), recall_trigger (0.3), temporal_attribution (0.2).
- `recallInvoked` metadata flow: propagated from `askQuestionViaGateway()` → `judge()` → `EvalResult.metadata`, and included in the judge prompt only when the rubric has a `recall_trigger` criterion.
- 8 new MSR-1 questions (msr1-q13 through msr1-q20): same factual content as existing questions but phrased with natural conversational cues:
  - "Remember that auth bug we had?"
  - "We set up something for token refresh last time"
  - "Earlier we discussed why we went with JWT"
  - "What was that password hashing library we picked"
  - "remind me why" (PKCE)
  - "We changed the rate limiter key at some point"
  - "We built up the auth module over a couple of sessions"
  - "What were those callback URLs we configured"
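To make the rubric weights concrete, here is a minimal sketch of how the four criterion scores might combine into one number, and how the recall metadata could be added to the judge prompt only when the rubric asks for it. The weights come from this PR; the weighted-mean aggregation, the type names, and the prompt wording are assumptions, since the PR does not show how `judge()` combines scores.

```typescript
// Weights are from the crossSessionCueRecall rubric in this PR.
// Everything else (types, aggregation, prompt wording) is hypothetical.
type Criterion =
  | "factual_accuracy"
  | "completeness"
  | "recall_trigger"
  | "temporal_attribution";

const CROSS_SESSION_CUE_RECALL_WEIGHTS: Record<Criterion, number> = {
  factual_accuracy: 0.25,
  completeness: 0.25,
  recall_trigger: 0.3,
  temporal_attribution: 0.2,
};

// Each criterion is scored on a 1-5 scale; the result stays on that scale.
function weightedScore(
  scores: Record<Criterion, number>,
  weights: Record<Criterion, number>,
): number {
  let total = 0;
  let weightSum = 0;
  for (const key of Object.keys(weights) as Criterion[]) {
    total += scores[key] * weights[key];
    weightSum += weights[key];
  }
  return total / weightSum; // normalise in case weights do not sum to 1
}

// Mention recall usage in the judge prompt only when the rubric scores it.
function recallPromptLine(
  weights: Partial<Record<Criterion, number>>,
  recallInvoked?: boolean,
): string {
  if (weights.recall_trigger === undefined || recallInvoked === undefined) {
    return "";
  }
  return `Recall tool invoked: ${recallInvoked ? "yes" : "no"}`;
}

// An accurate answer that never called recall is pulled down by recall_trigger.
const example = weightedScore(
  { factual_accuracy: 5, completeness: 4, recall_trigger: 1, temporal_attribution: 4 },
  CROSS_SESSION_CUE_RECALL_WEIGHTS,
);
console.log(example.toFixed(2)); // 3.35
```

Because `recall_trigger` carries the largest weight (0.3), a correct answer produced without recall still loses a noticeable share of the score, which is the behavior the new questions are meant to surface.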

Files Changed

| File | Change |
| --- | --- |
| `packages/core/src/recall.ts` | Rewrite `RECALL_TOOL_DESCRIPTION` |
| `packages/gateway/src/pipeline.ts` | Add `extraHeaders` to `nonStreamHttpResponse()`, set `x-lore-recall-invoked` header |
| `packages/core/eval/judge.ts` | Add `RECALL_TRIGGER` criterion, `crossSessionCueRecall` rubric, `metadata` param to `judge()` |
| `packages/core/eval/harness.ts` | Read recall header, propagate `recallInvoked` through scoring pipeline |
| `packages/core/eval/scenarios/multi-session-recall.ts` | Add 8 new cross-session cue questions |
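The gateway and harness changes pair up: one side sets the header, the other reads it. The sketch below shows that round trip under simplifying assumptions. `SimpleResponse`, `nonStreamHttpResponseSketch`, and `readRecallInvoked` are hypothetical stand-ins, not the real signatures in `pipeline.ts` or `harness.ts`, and the header value `"true"` is assumed because the PR does not state what value is sent.

```typescript
// Hypothetical shapes; the real gateway and harness types are not shown here.
interface SimpleResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

const RECALL_HEADER = "x-lore-recall-invoked";

// Gateway side: an optional extraHeaders argument keeps existing callers working.
function nonStreamHttpResponseSketch(
  body: unknown,
  extraHeaders: Record<string, string> = {},
): SimpleResponse {
  return {
    status: 200,
    headers: { "content-type": "application/json", ...extraHeaders },
    body: JSON.stringify(body),
  };
}

// Harness side: a missing header is treated as "recall not invoked".
// ASSUMPTION: the header value is the string "true".
function readRecallInvoked(res: SimpleResponse): boolean {
  return res.headers[RECALL_HEADER] === "true";
}

const withRecall = nonStreamHttpResponseSketch(
  { answer: "..." },
  { [RECALL_HEADER]: "true" },
);
const withoutRecall = nonStreamHttpResponseSketch({ answer: "..." });

console.log(readRecallInvoked(withRecall), readRecallInvoked(withoutRecall)); // true false
```

A response header suits this case because the gateway resolves recall internally, so the response body gives the harness no other signal that the tool ran.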

Verification

- `bun run typecheck`: all 4 packages pass
- `bun test`: all 1630 tests pass
- All changes are backward compatible (new params are optional, existing questions unaffected)

…enarios

Rewrite RECALL_TOOL_DESCRIPTION with dual-trigger structure so the LLM uses recall at layer 0 (early session) when users reference past sessions:

(1) Cross-session references: explicit cue phrases like 'last time', 'we discussed', 'earlier', 'remember'. Prior sessions are never in context.
(2) Missing details: file paths, decisions, preferences not visible in the current window.

Extend the eval suite to test cross-session recall trigger sensitivity:

- Add x-lore-recall-invoked response header to non-streaming recall paths
- Add RECALL_TRIGGER scoring criterion and crossSessionCueRecall rubric
- Pass recallInvoked metadata through judge for recall_trigger scoring
- Add 8 new MSR-1 questions using conversational cross-session cues (msr1-q13 through msr1-q20)

BYK force-pushed the feat/recall-tool-description-and-eval branch from 5e06851 to b174c8e Compare May 19, 2026 17:28

BYK force-pushed the feat/recall-tool-description-and-eval branch from b174c8e to 0ad89f4 Compare May 19, 2026 18:13

BYK enabled auto-merge (squash) May 19, 2026 18:13

BYK merged commit 17180ea into main May 19, 2026
10 checks passed

BYK deleted the feat/recall-tool-description-and-eval branch May 19, 2026 18:14

This was referenced May 21, 2026

publish: BYK/loreai@0.23.0 #439

Closed

publish: BYK/loreai@0.23.0 #448

Closed

BYK merged 1 commit into main from feat/recall-tool-description-and-eval
