fix(phai): mark slash command output as system-generated in LLM history#60104
Conversation
|
🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down. Add the Most PRs don't need this. Real regressions still get caught on master and fix-forward. |
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
ee/hogai/utils/test/test_anthropic.py:112-147
The three new tests for `convert_assistant_message_to_anthropic_message` are separate methods, but the file already uses `@parameterized.expand` for the same style of table-driven input/output checks (see `test_get_thinking_from_assistant_message` above). Folding them into one parameterised test keeps the pattern consistent and removes the three-fold repetition of boilerplate setup.
```suggestion
@parameterized.expand(
[
(
"no_source",
AssistantMessage(content="Hello", id="1"),
lambda text: text == "Hello",
),
(
"slash_command_source",
AssistantMessage(
content="Current conversation: 5 credits",
id="1",
meta=AssistantMessageMetadata(source="slash_command:usage"),
),
lambda text: "/usage slash command" in text
and "deterministic PostHog code" in text
and text.endswith("Current conversation: 5 credits"),
),
(
"unknown_source",
AssistantMessage(
content="Hello",
id="1",
meta=AssistantMessageMetadata(source="unknown_source"),
),
lambda text: text == "Hello",
),
]
)
def test_convert_assistant_message_provenance_note(self, _name, message, text_check):
result = convert_assistant_message_to_anthropic_message(message, {})
self.assertEqual(len(result), 1)
content = result[0].content
assert isinstance(content, list)
assert isinstance(content[0], dict)
self.assertTrue(text_check(content[0]["text"]))
```
### Issue 2 of 2
ee/hogai/chat_agent/slash_commands/test/test_slash_command_handler.py:170-215
Same preference for parameterised tests applies here. The three `_stamp_message_source` scenarios (no meta, existing meta fields, existing source) share identical boilerplate and differ only in inputs and assertions, making them a natural fit for `@parameterized.expand`. Also note that `PartialAssistantState` is imported locally inside each method — moving it to the module-level import block would be cleaner.
```suggestion
@parameterized.expand(
[
(
"no_meta",
AssistantMessage(content="hi", id="1"),
SlashCommandName.FIELD_USAGE,
"slash_command:usage",
None,
),
(
"preserves_existing_meta_fields",
AssistantMessage(
content="hi",
id="1",
meta=AssistantMessageMetadata(thinking=[{"type": "thinking", "content": "x"}]),
),
SlashCommandName.FIELD_REMEMBER,
"slash_command:remember",
[{"type": "thinking", "content": "x"}],
),
(
"does_not_overwrite_existing_source",
AssistantMessage(
content="hi",
id="1",
meta=AssistantMessageMetadata(source="slash_command:other"),
),
SlashCommandName.FIELD_USAGE,
"slash_command:other",
None,
),
]
)
def test_stamp_message_source(self, _name, input_message, command, expected_source, expected_thinking):
from ee.hogai.utils.types import PartialAssistantState
result = PartialAssistantState(messages=[input_message])
stamped = SlashCommandHandlerNode._stamp_message_source(result, command)
msg = stamped.messages[0]
assert isinstance(msg, AssistantMessage)
assert msg.meta is not None
self.assertEqual(msg.meta.source, expected_source)
if expected_thinking is not None:
self.assertEqual(msg.meta.thinking, expected_thinking)
```
Reviews (1): Last reviewed commit: "fix(phai): mark slash command output as ..." | Re-trigger Greptile |
|
Size Change: 0 B Total Size: 80.2 MB ℹ️ View Unchanged
|
| meta = ( | ||
| msg.meta.model_copy(update={"source": source}) | ||
| if msg.meta | ||
| else AssistantMessageMetadata(source=source) |
There was a problem hiding this comment.
It exposes another bug:thinking_face: Prefilling assistant messages is not supported in Sonnet 4.6+: https://platform.claude.com/docs/en/about-claude/models/migration-guide#when-migrating-from-sonnet-4-5. A system reminder instead would solve that explaining that it was generated by a slash command.
There was a problem hiding this comment.
Good call, switched to that approach in my latest commit.
convert_assistant_message_to_anthropic_message now keeps the AIMessage content untouched and appends a separate HumanMessage with the provenance note at conversion time, so no prefilling.
Re-verified locally with /usage followed by "how accurate is that?" and the agent treats the report as real.
skoob13
left a comment
There was a problem hiding this comment.
LGTM, would be great to fix the prefilling thingy.
|
👋 Visual changes detected for this PR. Review and approve in PostHog Visual Review If these changes are unexpected, they may be caused by a flaky test or a broken snapshot on master. Don't approve — rerun the job or wait for a fix. |
Problem
When a user runs a slash command like
/usage, the chat agent produces a deterministicAssistantMessagefrom Python code — the handler queries real billing data, formats a report, and persists it as anAssistantMessage. The message ends up in conversation history.On the next turn, the runner serialises history through
convert_to_anthropic_messages. The slash-command message becomes a plain LangChainAIMessageindistinguishable from an LLM-authored turn. So when the user follows up with something like "how accurate is that?" or "are you sure?", the model has no way of knowing it didn't author the previous message. Its prior ("I'm an LLM, I don't have a usage tool") overrides the conversation history and it hallucinates an apology, claiming it fabricated the numbers.The current mitigation in #58862 is a system-prompt note telling the model not to claim
/usageoutput is fabricated. It works for direct questions about/usage, but it's too weak to survive natural follow-ups because the model doesn't connect the deictic "that" to the slash command, and the soft prompt rule loses to a stronger prior.The same class of bug affects
/remember,/feedback, and/ticket. They just produce shorter messages, so the dissonance is smaller and customers haven't reported them yet.Changes
sourcefield onAssistantMessageMetadata(TS schema → regenerated JSON + Python schema). Format isslash_command:<name>for slash-command output.SlashCommandHandlerNode.arun, post-process the command's result and stampmeta.sourceon every emittedAssistantMessage. Centralised so all four current commands and any future ones are covered automatically without touching each handler.convert_assistant_message_to_anthropic_message, whenmeta.sourceindicates a slash command, prepend a neutral provenance note to the LangChain text content:[System note: the block below was produced by the /<name> slash command. It is deterministic PostHog code, not assistant-authored content.]The note is only injected at the LangChain conversion layer, so user-visible rendering is unchanged.
The phrasing is deliberately neutral, no "do not claim it's fabricated" wording. Once the provenance is in the message itself the model's contradiction disappears, so we don't need negative phrasing that prompts the model to think about that failure mode.
How did you test this code?
Publish to changelog?
no
🤖 Agent context
Co-Authored with Claude Code Opus 4.7