Skip BM25 topic injection on short user messages (#383)#384

Merged
rockfordlhotka merged 3 commits into rockfordlhotka/381-enforce-mcp-skill-naming from rockfordlhotka/383-short-message-context-dampening on May 11, 2026

Conversation

@rockfordlhotka
Member

Summary

When the user message is ≤30 characters, AgentContextBuilder skips per-turn BM25 long-term memory recall, episodic memory recall, skill recall, and service-hint injection — and the embedding generation that backs them. Conversation history, active rules, identity entries, and working memory still flow.

Knowledge-graph memory-seed triples drop out as a consequence, since the seed list becomes empty; user-seed triples (entity-name match) still run unchanged.
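The gate described above can be sketched roughly as follows (a minimal sketch with illustrative names, not the actual AgentContextBuilder code):

```csharp
using System;

// Hypothetical sketch of the short-message gate: messages at or under
// 30 characters skip per-turn topic-search injection entirely.
const int ShortMessageThreshold = 30;

bool SkipTopicInjection(string userMessage) =>
    (userMessage ?? string.Empty).Length <= ShortMessageThreshold;

Console.WriteLine(SkipTopicInjection("I'll find out soon")); // True (18 chars)
```

A plain length check keeps the gate cheap: it runs before any embedding generation, so short turns also avoid that cost.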

Why

Tracked in #383. On short, low-signal turns, BM25 over the user message returns lexically noisy hits that drown out the recent conversation thread. gpt-5.4-mini (Low tier) then summarises the injected memory instead of replying to what was said, typically by issuing a SaveMemory call and writing "Noted, I saved X." as the reply.

Two production incidents in blazor-session:

| user message | length (chars) | bad reply |
| --- | --- | --- |
| "I'll find out soon" | 18 | "Noted. The Cathedral City trip is on the board for this coming winter, and the health note is now saved with it." |
| "Hopefully we can go this coming winter…" | 67 | "Noted. I've got that on the travel ledger: hoping for Cathedral City this coming winter…" (also duplicated; tracked separately) |

The 30-char threshold catches the 18-char case cleanly. Longer-message variants of the same wrong-context pattern are not addressed here — left as a follow-up under #383.

Behaviour after the change

  • Single new log line when the gate fires:
    AgentContextBuilder: Short user message (N chars ≤ 30) — skipping per-turn topic search injection
  • Long messages: unchanged.
  • First-turn fallback (recalled.Count == 0 && history.Count == 1) still runs, so a brand-new session with a short opening message still gets a small memory injection.
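The interplay between the gate and the first-turn fallback can be sketched like this (hypothetical names mirroring the condition quoted above, not the real implementation):

```csharp
using System;
using System.Collections.Generic;

// Illustrative: on a brand-new session the gate leaves `recalled` empty,
// so the first-turn fallback condition still holds and a small memory
// injection happens even for a short opening message.
bool ShouldRunFirstTurnFallback(IReadOnlyList<string> recalled, IReadOnlyList<string> history) =>
    recalled.Count == 0 && history.Count == 1;

var recalled = new List<string>();              // gate fired, nothing recalled
var history = new List<string> { "hi there" };  // first turn of the session

Console.WriteLine(ShouldRunFirstTurnFallback(recalled, history)); // True
```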

Version

Bumped Directory.Build.props to 0.10.51 to match the image already deployed to the maintainer's cluster.

Test plan

  • dotnet build clean
  • Live verification — running pod on 0.10.51 for ~25h, gate fired twice on short turns, no wrong-topic SaveMemory→summary replies observed since deploy
  • Unit tests for the gate (follow-up; existing AgentContextBuilderKnowledgeGraphTests scaffolding is the natural home)

Related

🤖 Generated with Claude Code

When the user message is ≤30 characters, AgentContextBuilder skips
per-turn BM25 long-term memory recall, episodic memory recall, skill
recall, and service-hint injection — and the embedding generation that
backs them. Conversation history, active rules, identity entries, and
working memory still flow; those are session grounding, not topic
search. Knowledge-graph memory-seed triples drop out as a consequence
since the seed list collapses to empty; user-seed triples (entity name
match) still run unchanged.

Rationale (#383): on short low-signal turns ("ok", "I'll find out
soon"), BM25 over the user message returned lexically-noisy hits that
drowned out the recent conversation thread. The model — gpt-5.4-mini
on Low tier — then summarised the injected memory instead of replying
to what was actually said, often by issuing a SaveMemory call and
writing "Noted, I saved X." as the reply.

Scope:
- Threshold is 30 chars. Both observed incidents (18 chars, 24 chars)
  fall well under; the threshold catches them cleanly without
  affecting longer messages.
- Longer-message variants of the same wrong-context pattern (e.g.
  122-char turn that still produced a wrong-topic SaveMemory→summary
  reply) are not addressed here — tracked as a follow-up in #383.
- One log line emitted when the gate fires, so the behaviour is
  visible in production logs.

Bumps version to 0.10.51 to match the image already deployed to the
maintainer's cluster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rockfordlhotka and others added 2 commits May 11, 2026 01:13
The 6 capability-claim tests and 1 knowledge-graph test exercised paths
inside LTM injection using "anything" / "what did Alice think?" as the
user message — both below the new 30-char gate, which now skips LTM
injection entirely. The tests don't care about the literal text (memory
store responses are stubbed), so just use longer placeholder strings
that exceed the threshold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rockfordlhotka rockfordlhotka merged commit 7865106 into rockfordlhotka/381-enforce-mcp-skill-naming May 11, 2026
2 checks passed
@rockfordlhotka rockfordlhotka deleted the rockfordlhotka/383-short-message-context-dampening branch May 11, 2026 06:22
rockfordlhotka added a commit that referenced this pull request May 11, 2026
* Enforce mcp/{server-name} naming for MCP server skills (#381)

Reinforces the mcp/{server-name} naming convention so per-server skills
land in a protected namespace and the MCP guide's Step 0 fast-path works.

- common-directives.md: per-server skills MUST live under mcp/{server-name}/.
  Single canonical entry or sub-skills grouped by functional area both fine.
- skill-dream.md / DreamService prompt builder: narrow the namespaced-singleton
  constraint. Cross-server merges remain forbidden (incl. across sub-skills);
  within a single mcp/{server}/* namespace, normal semantic-overlap merging
  applies so duplicates self-heal while distinct functional areas are preserved.
- McpSkillNameMigrationService: one-shot startup pass that renames legacy
  top-level skills matching a known MCP server_name to mcp/{server-name}.
  Idempotent; carries a "REMOVE AFTER MIGRATION" marker for follow-up cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Skip BM25 topic injection on short user messages (#383) (#384)

* Skip BM25 topic injection on short user messages (#383)

* Fix AgentContextBuilder tests for short-message dampening

* Fix CapabilityClaimEndToEndTests for short-message dampening

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add DiagnosticLoggingChatClient watch for response duplication (#383) (#385)

Adds a behaviour-neutral DelegatingChatClient that inspects assistant
messages returned by the inner IChatClient and warns when the response
shows the fingerprint of duplicated content reported by users — either
more than one TextContent in a single assistant message, or a
TextContent whose first 40-char prefix repeats after a newline (catches
both exact "X\nX" and truncated "X.Y\nX" patterns).

Wired in via AddRockBotTieredChatClients so each tier (Low/Balanced/High)
gets the watch between its raw OpenAI client and
RockBotFunctionInvokingChatClient. The tier label is included in every
warning, identifying which model produced the duplication.

Background: three production duplications observed against
gpt-5.4-mini on Azure Foundry (Low tier), all `\n`-separated.
Microsoft.Extensions.AI 10.5.x analysis ruled out the adapter and
middleware as sources (Contents.ConcatText joins TextContent items
with no separator; FunctionInvokingChatClient does not merge messages;
ExtractAssistantText returns a single message's .Text). The likely
remaining explanation is the upstream provider returning literal
"X\nX" in choices[0].message.content. This wrapper captures the
structure on the next occurrence so we know whether to add a
sanitiser, push the issue upstream, or revisit the adapter analysis.

No behaviour change; only emits warnings when the fingerprint matches.
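The prefix-repeat half of the fingerprint can be sketched as below (an assumed reconstruction from the description; the actual DiagnosticLoggingChatClient is not shown in this PR):

```csharp
using System;

// Illustrative check: does the response's opening prefix (capped at 40
// chars, or shorter if the first newline comes earlier) reappear after
// a newline? Catches both exact "X\nX" and truncated "X.Y\nX" shapes.
bool LooksDuplicated(string text)
{
    int newline = text.IndexOf('\n');
    if (newline < 0) return false;

    string prefix = text[..Math.Min(40, newline)];
    if (prefix.Length == 0) return false;

    return text.IndexOf(prefix, newline + 1, StringComparison.Ordinal) >= 0;
}

Console.WriteLine(LooksDuplicated("Noted, saved.\nNoted, saved.")); // True
```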

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
