Skip BM25 topic injection on short user messages (#383) #384
Merged
rockfordlhotka merged 3 commits into rockfordlhotka/381-enforce-mcp-skill-naming on May 11, 2026
Conversation
When the user message is ≤30 characters, AgentContextBuilder skips per-turn BM25 long-term memory recall, episodic memory recall, skill recall, and service-hint injection — and the embedding generation that backs them. Conversation history, active rules, identity entries, and working memory still flow; those are session grounding, not topic search. Knowledge-graph memory-seed triples drop out as a consequence since the seed list collapses to empty; user-seed triples (entity name match) still run unchanged.

Rationale (#383): on short low-signal turns ("ok", "I'll find out soon"), BM25 over the user message returned lexically-noisy hits that drowned out the recent conversation thread. The model — gpt-5.4-mini on Low tier — then summarised the injected memory instead of replying to what was actually said, often by issuing a SaveMemory call and writing "Noted, I saved X." as the reply.

Scope:
- Threshold is 30 chars. Both observed incidents (18 chars, 24 chars) fall well under; the threshold catches them cleanly without affecting longer messages.
- Longer-message variants of the same wrong-context pattern (e.g. a 122-char turn that still produced a wrong-topic SaveMemory→summary reply) are not addressed here — tracked as a follow-up in #383.
- One log line emitted when the gate fires, so the behaviour is visible in production logs.

Bumps version to 0.10.51 to match the image already deployed to the maintainer's cluster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 6 capability-claim tests and 1 knowledge-graph test exercised paths inside LTM injection using "anything" / "what did Alice think?" as the user message — both below the new 30-char gate, which now skips LTM injection entirely. The tests don't care about the literal text (memory store responses are stubbed), so just use longer placeholder strings that exceed the threshold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
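Illustratively, the fix is just a longer literal; BuildContextAsync and the test shape below are placeholders, not the repository's actual test helpers:

```csharp
// Before: "anything" (8 chars) now falls under the 30-char gate, so LTM injection
// is skipped and the stubbed memory store is never consulted.
// After: any placeholder longer than 30 characters keeps the injection path active.
var context = await BuildContextAsync(
    userMessage: "anything at all, padded well past the thirty-character gate");
```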
Merged commit 7865106 into rockfordlhotka/381-enforce-mcp-skill-naming (2 checks passed)
rockfordlhotka added a commit that referenced this pull request on May 11, 2026
* Enforce mcp/{server-name} naming for MCP server skills (#381)
Reinforces the mcp/{server-name} naming convention so per-server skills
land in a protected namespace and the MCP guide's Step 0 fast-path works.
- common-directives.md: per-server skills MUST live under mcp/{server-name}/.
Single canonical entry or sub-skills grouped by functional area both fine.
- skill-dream.md / DreamService prompt builder: narrow the namespaced-singleton
constraint. Cross-server merges remain forbidden (incl. across sub-skills);
within a single mcp/{server}/* namespace, normal semantic-overlap merging
applies so duplicates self-heal while distinct functional areas are preserved.
- McpSkillNameMigrationService: one-shot startup pass that renames legacy
top-level skills matching a known MCP server_name to mcp/{server-name}.
Idempotent; carries a "REMOVE AFTER MIGRATION" marker for follow-up cleanup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Skip BM25 topic injection on short user messages (#383) (#384)
* Skip BM25 topic injection on short user messages (#383)
When the user message is ≤30 characters, AgentContextBuilder skips
per-turn BM25 long-term memory recall, episodic memory recall, skill
recall, and service-hint injection — and the embedding generation that
backs them. Conversation history, active rules, identity entries, and
working memory still flow; those are session grounding, not topic
search. Knowledge-graph memory-seed triples drop out as a consequence
since the seed list collapses to empty; user-seed triples (entity name
match) still run unchanged.
Rationale (#383): on short low-signal turns ("ok", "I'll find out
soon"), BM25 over the user message returned lexically-noisy hits that
drowned out the recent conversation thread. The model — gpt-5.4-mini
on Low tier — then summarised the injected memory instead of replying
to what was actually said, often by issuing a SaveMemory call and
writing "Noted, I saved X." as the reply.
Scope:
- Threshold is 30 chars. Both observed incidents (18 chars, 24 chars)
fall well under; the threshold catches them cleanly without
affecting longer messages.
- Longer-message variants of the same wrong-context pattern (e.g.
122-char turn that still produced a wrong-topic SaveMemory→summary
reply) are not addressed here — tracked as a follow-up in #383.
- One log line emitted when the gate fires, so the behaviour is
visible in production logs.
Bumps version to 0.10.51 to match the image already deployed to the
maintainer's cluster.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix AgentContextBuilder tests for short-message dampening
The 6 capability-claim tests and 1 knowledge-graph test exercised paths
inside LTM injection using "anything" / "what did Alice think?" as the
user message — both below the new 30-char gate, which now skips LTM
injection entirely. The tests don't care about the literal text (memory
store responses are stubbed), so just use longer placeholder strings
that exceed the threshold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix CapabilityClaimEndToEndTests for short-message dampening
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add DiagnosticLoggingChatClient watch for response duplication (#383) (#385)
Adds a behaviour-neutral DelegatingChatClient that inspects assistant
messages returned by the inner IChatClient and warns when the response
shows the fingerprint of duplicated content reported by users — either
more than one TextContent in a single assistant message, or a
TextContent whose first 40-char prefix repeats after a newline (catches
both exact "X\nX" and truncated "X.Y\nX" patterns).
Wired in via AddRockBotTieredChatClients so each tier (Low/Balanced/High)
gets the watch between its raw OpenAI client and
RockBotFunctionInvokingChatClient. The tier label is included in every
warning, identifying which model produced the duplication.
Background: three production duplications observed against
gpt-5.4-mini on Azure Foundry (Low tier), all `\n`-separated.
Microsoft.Extensions.AI 10.5.x analysis ruled out the adapter and
middleware as sources (Contents.ConcatText joins TextContent items
with no separator; FunctionInvokingChatClient does not merge messages;
ExtractAssistantText returns a single message's .Text). The likely
remaining explanation is the upstream provider returning literal
"X\nX" in choices[0].message.content. This wrapper captures the
structure on the next occurrence so we know whether to add a
sanitiser, push the issue upstream, or revisit the adapter analysis.
No behaviour change; only emits warnings when the fingerprint matches.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
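For reference, here is a rough, behaviour-neutral sketch of the duplication watch described in the last commit above. It is not the repository's DiagnosticLoggingChatClient: it assumes the DelegatingChatClient / GetResponseAsync / ChatResponse surface of recent Microsoft.Extensions.AI releases, and the class name, constructor shape, and warning wording are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;

// Sketch only: inspects assistant messages for the duplication fingerprint
// and warns; it never alters the response it forwards.
public sealed class ResponseDuplicationWatchClient : DelegatingChatClient
{
    private readonly ILogger _logger;
    private readonly string _tier; // "Low" / "Balanced" / "High"

    public ResponseDuplicationWatchClient(IChatClient innerClient, string tier, ILogger logger)
        : base(innerClient)
    {
        _tier = tier;
        _logger = logger;
    }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var response = await base.GetResponseAsync(messages, options, cancellationToken);

        foreach (var message in response.Messages)
        {
            if (message.Role != ChatRole.Assistant)
                continue;

            // Fingerprint 1: more than one TextContent in a single assistant message.
            var textParts = message.Contents.OfType<TextContent>().ToList();
            if (textParts.Count > 1)
            {
                _logger.LogWarning(
                    "Tier {Tier}: assistant message contains {Count} TextContent items",
                    _tier, textParts.Count);
            }

            // Fingerprint 2: the opening prefix (up to 40 chars, capped at the first
            // newline) repeats after that newline. Catches exact "X\nX" and
            // truncated "X.Y\nX" duplications.
            foreach (var part in textParts)
            {
                var text = part.Text;
                if (string.IsNullOrEmpty(text))
                    continue;

                var newline = text.IndexOf('\n');
                if (newline <= 0)
                    continue;

                var prefix = text[..Math.Min(40, newline)];
                if (text.IndexOf(prefix, newline + 1, StringComparison.Ordinal) >= 0)
                {
                    _logger.LogWarning(
                        "Tier {Tier}: assistant text repeats its opening prefix after a newline",
                        _tier);
                }
            }
        }

        return response;
    }
}
```

Per the commit text, the real wrapper is registered per tier by AddRockBotTieredChatClients, sitting between the raw OpenAI client and RockBotFunctionInvokingChatClient, so every warning carries the tier label of the model that produced the duplication.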
Summary
When the user message is ≤30 characters, AgentContextBuilder skips per-turn BM25 long-term memory recall, episodic memory recall, skill recall, and service-hint injection — and the embedding generation that backs them. Conversation history, active rules, identity entries, and working memory still flow. Knowledge-graph memory-seed triples drop out as a consequence since the seed list collapses to empty; user-seed triples (entity name match) still run unchanged.
Why
Tracked in #383. On short low-signal turns, BM25 over the user message returns lexically-noisy hits that drown out the recent conversation thread.
gpt-5.4-mini (Low tier) then summarises the injected memory instead of replying to what was said — typically by issuing a SaveMemory call and writing "Noted, I saved X." as the reply.

Two production incidents were observed in blazor-session; the 30-char threshold catches the 18-char case cleanly. Longer-message variants of the same wrong-context pattern are not addressed here — left as a follow-up under #383.
Behaviour after the change
- When the gate fires, one log line is emitted: "AgentContextBuilder: Short user message (N chars ≤ 30) — skipping per-turn topic search injection".
- The recalled.Count == 0 && history.Count == 1 case still runs, so a brand-new session with a short opening message still gets a small memory injection.
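As a rough illustration of the gate and the log line above — the method shape, the _logger field, and InjectTopicSearchResultsAsync are placeholders, not AgentContextBuilder's actual members:

```csharp
// Sketch only; AgentContextBuilder's real member names and control flow may differ.
private const int ShortMessageThreshold = 30;

private async Task ApplyPerTurnTopicSearchAsync(string userMessage)
{
    var trimmed = (userMessage ?? string.Empty).Trim();

    if (trimmed.Length <= ShortMessageThreshold)
    {
        // One log line so the gate is visible in production logs.
        _logger.LogInformation(
            "AgentContextBuilder: Short user message ({Length} chars ≤ {Threshold}) — skipping per-turn topic search injection",
            trimmed.Length, ShortMessageThreshold);

        // Skipped: BM25 LTM recall, episodic recall, skill recall, service hints,
        // and the embedding generation that backs them.
        return;
    }

    // Existing per-turn topic-search injection path.
    await InjectTopicSearchResultsAsync(trimmed);
}

// Unaffected elsewhere in the builder: conversation history, active rules, identity
// entries, working memory, and the recalled.Count == 0 && history.Count == 1 case,
// which still gives a brand-new session with a short opening message a small
// memory injection.
```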
Version

Bumped Directory.Build.props to 0.10.51 to match the image already deployed to the maintainer's cluster.

Test plan
- dotnet build clean
- 0.10.51 running for ~25h: gate fired twice on short turns, no wrong-topic SaveMemory→summary replies observed since deploy
- A dedicated unit test for the gate is deferred (the AgentContextBuilderKnowledgeGraphTests scaffolding is the natural home)

Related
🤖 Generated with Claude Code