Conversation
Pull request overview
This PR strengthens persona character-keeping by injecting an explicit “help-seeker, not counselor” reminder immediately before the chatbot’s latest message when prompting the persona LLM, reducing the chance of role-reversal after reading supportive provider text.
Changes:
- Added a dedicated prefix (including the persona role reminder) that is prepended to the last provider message when role == Role.PERSONA.
- Updated unit tests to assert the new prefix behavior for persona prompts.
- Minor formatting-only tweaks to Azure LLM debug logging.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| utils/conversation_utils.py | Adds persona last-provider message prefix and applies it in build_langchain_messages for Role.PERSONA. |
| tests/unit/utils/test_conversation_utils.py | Updates/extends tests to validate the new prefix augmentation behavior. |
| llm_clients/azure_llm.py | Removes outdated comment and applies minor debug print formatting cleanup. |
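The change described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the names `Role`, `PERSONA_LAST_PROVIDER_PREFIX`, and `build_langchain_messages` mirror the files in the table, but their exact signatures, the message representation, and the reminder wording are all assumptions.

```python
from enum import Enum

class Role(Enum):
    PERSONA = "persona"
    PROVIDER = "provider"

# Hypothetical wording; the real reminder text lives in utils/conversation_utils.py.
PERSONA_LAST_PROVIDER_PREFIX = (
    "[Reminder: you are the help-seeker, not the counselor. "
    "Stay in character and respond as the persona.]\n\n"
)

def build_langchain_messages(history, role):
    """Sketch: when prompting the persona LLM, prepend the character-keeping
    reminder to the most recent provider message so the persona reads it
    immediately before the supportive text that tends to trigger role-reversal."""
    # Shallow-copy each message dict so the caller's history is not mutated.
    messages = [dict(m) for m in history]
    if role == Role.PERSONA and messages:
        last = messages[-1]
        if last.get("role") == Role.PROVIDER:
            last["content"] = PERSONA_LAST_PROVIDER_PREFIX + last["content"]
    return messages
```

A usage sketch: given a history ending in a provider turn, calling `build_langchain_messages(history, Role.PERSONA)` returns the same messages with only the final provider message prefixed, leaving the original history untouched.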
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
With the above fixes, I still catch role-reversal when using Opus 4.5. By turn 13, I noticed this from the user (Opus 4.5): The provider (Grok 4) then proceeded to say: The logs show the metadata is correct, so there are probably two things happening here:
If this truly is guardrails causing the "role-reversal", we will need to look further into which models can best role-play different risk levels of SI.
Interesting. I do think we are hitting the system prompts. It's curious why Grok responded like that, though I wonder whether we should read too much into it. Is this the only instance, that we know of, in which Grok pretended to be someone else?
That's the first I've seen! I haven't been looking for it, though.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Description
We've seen that the LLM role-playing a persona can reverse roles and try to act like the assistant:
This PR aims to improve the user/persona LLM's character-keeping integrity.