Skip BM25 topic injection on short user messages (#383)#384

Merged
rockfordlhotka merged 3 commits into rockfordlhotka/381-enforce-mcp-skill-naming from rockfordlhotka/383-short-message-context-dampening on May 11, 2026

Conversation

@rockfordlhotka
Member

Summary

When the user message is ≤30 characters, AgentContextBuilder skips per-turn BM25 long-term memory recall, episodic memory recall, skill recall, and service-hint injection — and the embedding generation that backs them. Conversation history, active rules, identity entries, and working memory still flow.

Knowledge-graph memory-seed triples drop out as a consequence, since the seed list becomes empty; user-seed triples (entity-name match) still run unchanged.
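The gate described above can be sketched roughly as follows (a minimal sketch with illustrative names, not the actual AgentContextBuilder code):

```csharp
using System;

// Hypothetical sketch of the short-message gate: messages at or under
// 30 characters skip per-turn topic-search injection entirely.
const int ShortMessageThreshold = 30;

bool SkipTopicInjection(string userMessage) =>
    (userMessage ?? string.Empty).Length <= ShortMessageThreshold;

Console.WriteLine(SkipTopicInjection("I'll find out soon")); // True (18 chars)
```

A plain length check keeps the gate cheap: it runs before any embedding generation, so short turns also avoid that cost.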

Why

Tracked in #383. On short, low-signal turns, BM25 over the user message returns lexically noisy hits that drown out the recent conversation thread. gpt-5.4-mini (Low tier) then summarises the injected memory instead of replying to what was said, typically by issuing a SaveMemory call and writing "Noted, I saved X." as the reply.

Two production incidents in blazor-session:

| user message | length (chars) | bad reply |
| --- | --- | --- |
| "I'll find out soon" | 18 | "Noted. The Cathedral City trip is on the board for this coming winter, and the health note is now saved with it." |
| "Hopefully we can go this coming winter…" | 67 | "Noted. I've got that on the travel ledger: hoping for Cathedral City this coming winter…" (also duplicated; tracked separately) |

The 30-char threshold catches the 18-char case cleanly. Longer-message variants of the same wrong-context pattern are not addressed here — left as a follow-up under #383.

Behaviour after the change

  • Single new log line when the gate fires:
    AgentContextBuilder: Short user message (N chars ≤ 30) — skipping per-turn topic search injection
  • Long messages: unchanged.
  • First-turn fallback (recalled.Count == 0 && history.Count == 1) still runs, so a brand-new session with a short opening message still gets a small memory injection.
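The interplay between the gate and the first-turn fallback can be sketched like this (hypothetical names mirroring the condition quoted above, not the real implementation):

```csharp
using System;
using System.Collections.Generic;

// Illustrative: on a brand-new session the gate leaves `recalled` empty,
// so the first-turn fallback condition still holds and a small memory
// injection happens even for a short opening message.
bool ShouldRunFirstTurnFallback(IReadOnlyList<string> recalled, IReadOnlyList<string> history) =>
    recalled.Count == 0 && history.Count == 1;

var recalled = new List<string>();              // gate fired, nothing recalled
var history = new List<string> { "hi there" };  // first turn of the session

Console.WriteLine(ShouldRunFirstTurnFallback(recalled, history)); // True
```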

Version

Bumped Directory.Build.props to 0.10.51 to match the image already deployed to the maintainer's cluster.

Test plan

  • dotnet build clean
  • Live verification — running pod on 0.10.51 for ~25h, gate fired twice on short turns, no wrong-topic SaveMemory→summary replies observed since deploy
  • Unit tests for the gate (follow-up; existing AgentContextBuilderKnowledgeGraphTests scaffolding is the natural home)

Related

🤖 Generated with Claude Code

When the user message is ≤30 characters, AgentContextBuilder skips
per-turn BM25 long-term memory recall, episodic memory recall, skill
recall, and service-hint injection — and the embedding generation that
backs them. Conversation history, active rules, identity entries, and
working memory still flow; those are session grounding, not topic
search. Knowledge-graph memory-seed triples drop out as a consequence
since the seed list collapses to empty; user-seed triples (entity name
match) still run unchanged.

Rationale (#383): on short low-signal turns ("ok", "I'll find out
soon"), BM25 over the user message returned lexically-noisy hits that
drowned out the recent conversation thread. The model — gpt-5.4-mini
on Low tier — then summarised the injected memory instead of replying
to what was actually said, often by issuing a SaveMemory call and
writing "Noted, I saved X." as the reply.

Scope:
- Threshold is 30 chars. Both observed incidents (18 chars, 24 chars)
  fall well under; the threshold catches them cleanly without
  affecting longer messages.
- Longer-message variants of the same wrong-context pattern (e.g.
  122-char turn that still produced a wrong-topic SaveMemory→summary
  reply) are not addressed here — tracked as a follow-up in #383.
- One log line emitted when the gate fires, so the behaviour is
  visible in production logs.

Bumps version to 0.10.51 to match the image already deployed to the
maintainer's cluster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rockfordlhotka and others added 2 commits May 11, 2026 01:13
The 6 capability-claim tests and 1 knowledge-graph test exercised paths
inside LTM injection using "anything" / "what did Alice think?" as the
user message — both below the new 30-char gate, which now skips LTM
injection entirely. The tests don't care about the literal text (memory
store responses are stubbed), so just use longer placeholder strings
that exceed the threshold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rockfordlhotka rockfordlhotka merged commit 7865106 into rockfordlhotka/381-enforce-mcp-skill-naming May 11, 2026
2 checks passed
@rockfordlhotka rockfordlhotka deleted the rockfordlhotka/383-short-message-context-dampening branch May 11, 2026 06:22
rockfordlhotka added a commit that referenced this pull request May 11, 2026
* Enforce mcp/{server-name} naming for MCP server skills (#381)

Reinforces the mcp/{server-name} naming convention so per-server skills
land in a protected namespace and the MCP guide's Step 0 fast-path works.

- common-directives.md: per-server skills MUST live under mcp/{server-name}/.
  Single canonical entry or sub-skills grouped by functional area both fine.
- skill-dream.md / DreamService prompt builder: narrow the namespaced-singleton
  constraint. Cross-server merges remain forbidden (incl. across sub-skills);
  within a single mcp/{server}/* namespace, normal semantic-overlap merging
  applies so duplicates self-heal while distinct functional areas are preserved.
- McpSkillNameMigrationService: one-shot startup pass that renames legacy
  top-level skills matching a known MCP server_name to mcp/{server-name}.
  Idempotent; carries a "REMOVE AFTER MIGRATION" marker for follow-up cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Skip BM25 topic injection on short user messages (#383) (#384)

* Skip BM25 topic injection on short user messages (#383)

* Fix AgentContextBuilder tests for short-message dampening

* Fix CapabilityClaimEndToEndTests for short-message dampening

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add DiagnosticLoggingChatClient watch for response duplication (#383) (#385)

Adds a behaviour-neutral DelegatingChatClient that inspects assistant
messages returned by the inner IChatClient and warns when the response
shows the fingerprint of duplicated content reported by users — either
more than one TextContent in a single assistant message, or a
TextContent whose first 40-char prefix repeats after a newline (catches
both exact "X\nX" and truncated "X.Y\nX" patterns).

Wired in via AddRockBotTieredChatClients so each tier (Low/Balanced/High)
gets the watch between its raw OpenAI client and
RockBotFunctionInvokingChatClient. The tier label is included in every
warning, identifying which model produced the duplication.

Background: three production duplications observed against
gpt-5.4-mini on Azure Foundry (Low tier), all `\n`-separated.
Microsoft.Extensions.AI 10.5.x analysis ruled out the adapter and
middleware as sources (Contents.ConcatText joins TextContent items
with no separator; FunctionInvokingChatClient does not merge messages;
ExtractAssistantText returns a single message's .Text). The likely
remaining explanation is the upstream provider returning literal
"X\nX" in choices[0].message.content. This wrapper captures the
structure on the next occurrence so we know whether to add a
sanitiser, push the issue upstream, or revisit the adapter analysis.

No behaviour change; only emits warnings when the fingerprint matches.
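The prefix-repeat half of the fingerprint can be sketched as below (an assumed reconstruction from the description; the actual DiagnosticLoggingChatClient is not shown in this PR):

```csharp
using System;

// Illustrative check: does the response's opening prefix (capped at 40
// chars, or shorter if the first newline comes earlier) reappear after
// a newline? Catches both exact "X\nX" and truncated "X.Y\nX" shapes.
bool LooksDuplicated(string text)
{
    int newline = text.IndexOf('\n');
    if (newline < 0) return false;

    string prefix = text[..Math.Min(40, newline)];
    if (prefix.Length == 0) return false;

    return text.IndexOf(prefix, newline + 1, StringComparison.Ordinal) >= 0;
}

Console.WriteLine(LooksDuplicated("Noted, saved.\nNoted, saved.")); // True
```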

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
