ds4 parser vulnerability: user prompt can insert special tokens that are interpreted by the model

*As previously agreed via email with @antirez, I am opening this issue to share the details of this vulnerability publicly.*

I'm reporting a vulnerability in the ds4 parser. The problem is the lack of separation between instructions (special tokens) and data (user prompt). Specifically, the [`tokenize_rendered_chat_vocab`](https://github.com/antirez/ds4/blob/main/ds4.c#L14460) function applies [`special_token_at`](https://github.com/antirez/ds4/blob/main/ds4.c#L14426) directly to the user prompt, without sanitizing it. This allows a user to inject special tokens, such as `<think>`, `<|Assistant|>`, `|DSML|`, etc.

The attack scenario could be, for example, the attacker sending a prompt that creates a fake history for LLM. In this situation, the model can be fooled by the context (e.g., be tricked into making arbitrary tool calls, become misaligned,  etc).

I isolated the parser logic in a standalone [PoC code](https://gist.github.com/pintorem/692b47edc8e043071fd6b8d51b09f913) (you don't need the necessary hardware to run it).

One possible fix is to ensure that `special_token_at` is only evaluated for strings generated by server logic, while forcing user input to pass exclusively through the standard BPE tokenizer.

If needed, I’d be more than happy to help by sending a pull request!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ds4 parser vulnerability: user prompt can insert special tokens that are interpreted by the model #95

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

ds4 parser vulnerability: user prompt can insert special tokens that are interpreted by the model #95

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions