-
Notifications
You must be signed in to change notification settings - Fork 9
Description
The "user" and "assistant" roles were likely created because ChatGPT was only ever made with that dynamic in mind. There's a few issues with doing this:
- Having roles be standard words causes issues when they are used in everyday language. This leads it to misinterpret what is part of the template and what is part of the chat. Alpaca is a perfect example of this because it uses markdown for its template.
- It limits the scope of what it can be. Even if you tell it that it's something else entirely, having it inherently be called an assistant causes it to keep this mindset when in other roles. A lot of datasets have role-play data in them, but it can still take a lot of prompting to fully shake off the assistant mindset.
- It assumes that there are only two different speakers. Let's say that you had several different agents working together. How would the LLM properly distinguish which message came from which agent? Likewise, for a multi-character role-play, how would it properly attribute a message to a certain character?
From the perspective of the LLM, there is no "human" or "AI"; it can generate messages for both without any issues. This is why I think having dedicated roles such as "user" and "assistant" is not something we should have. Instead, those roles should be combined into one, chat, and instead use names to distinguish who wrote what message, so that it can remain as flexible as possible.
The roles should be converted into special tokens like the rest of the format: <|system|> for the system role, <|tool|> for the tool role, and <|chat|> for the messages. For the names, I propose name=[], with the name being enclosed inside. This would allow for names with spaces in them unlike the original implementation.
Here's how it would look like:
<|im_start|><|system|>
{system message}<|im_end|>
<|im_start|><|chat|> name=[Hirose Koichi]
{message contents}<|im_end|>
<|im_start|><|chat|> name=[Dolphin]
{message contents}<|im_end|>
And if we wanted to have a multi-agent or character setup, it would look like this:
<|im_start|><|system|>
{system message}<|im_end|>
<|im_start|><|chat|> name=[Project Manager]
{message contents}<|im_end|>
<|im_start|><|chat|> name=[Python Coder]
{message contents}<|im_end|>
<|im_start|><|chat|> name=[Code Tester]
{message contents}<|im_end|>
<|im_start|><|chat|> name=[Code Debugger]
{message contents}<|im_end|>
This is a much more elegant solution that allows it to handle as many extra roles as needed without forcing a default on it. I understand wanting to keep it as 1:1 to the original implementation as possible, but we should look to improve it wherever we can if we want this to become a widely accepted standard. I think change is fine, but the sooner we make those changes, the better.