As previously agreed via email with @antirez, I am opening this issue to share the details of this vulnerability publicly.
I'm reporting a vulnerability in the ds4 parser. The problem is the lack of separation between instructions (special tokens) and data (user prompt). Specifically, the tokenize_rendered_chat_vocab function applies special_token_at directly to the user prompt, without sanitizing it. This allows a user to inject special tokens, such as <think>, <|Assistant|>, |DSML|, etc.
The attack scenario could be, for example, the attacker sending a prompt that creates a fake history for LLM. In this situation, the model can be fooled by the context (e.g., be tricked into making arbitrary tool calls, become misaligned, etc).
I isolated the parser logic in a standalone PoC code (you don't need the necessary hardware to run it).
One possible fix is to ensure that special_token_at is only evaluated for strings generated by server logic, while forcing user input to pass exclusively through the standard BPE tokenizer.
If needed, I’d be more than happy to help by sending a pull request!
As previously agreed via email with @antirez, I am opening this issue to share the details of this vulnerability publicly.
I'm reporting a vulnerability in the ds4 parser. The problem is the lack of separation between instructions (special tokens) and data (user prompt). Specifically, the
tokenize_rendered_chat_vocabfunction appliesspecial_token_atdirectly to the user prompt, without sanitizing it. This allows a user to inject special tokens, such as<think>,<|Assistant|>,|DSML|, etc.The attack scenario could be, for example, the attacker sending a prompt that creates a fake history for LLM. In this situation, the model can be fooled by the context (e.g., be tricked into making arbitrary tool calls, become misaligned, etc).
I isolated the parser logic in a standalone PoC code (you don't need the necessary hardware to run it).
One possible fix is to ensure that
special_token_atis only evaluated for strings generated by server logic, while forcing user input to pass exclusively through the standard BPE tokenizer.If needed, I’d be more than happy to help by sending a pull request!