Skip to content

bug: model writes marker-like text instead of calling recall tool #409

@BYK

Description

@BYK

Problem

In the CM-1 eval (live mode), the model sometimes writes text that resembles a recall marker (e.g. 📚 Fetching details for t:... and t:... simultaneously…) as its final answer instead of actually calling the recall tool.

Evidence

Question cm-1-e1: "What test file was created for the upload abort handling?"

  • Expected: src/__tests__/upload-abort.test.ts
  • Got: 📚 Fetching details for t:e8a949d7... and t:36d96080... simultaneously…
  • Score: 1/5

The model generated free-form text that looks like a marker but never invoked the recall tool. The actual buildRecallMarker() produces 📚 Fetching detail for <id>… (singular), so this isn't a marker leak — the model composed this text itself.

Root Cause Hypothesis

The model sees recall markers from previous turns in the conversation and mimics the format instead of using the tool. This could be addressed by:

  1. Making the marker format less "tool-like" so the model doesn't confuse it with an action
  2. Adding explicit instructions in the recall tool description to always use the tool, never write markers manually
  3. Adjusting the QA prompt to discourage this behavior

Impact

This affected 1 of 15 CM-1 questions in live eval. Score impact: ~0.27 points on the overall CM-1 average.

Context

Discovered during live eval of #404 (multi-turn recall).

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions