Skip to content

ds4 parser vulnerability: user prompt can insert special tokens that are interpreted by the model #95

@pintorem

Description

@pintorem

As previously agreed via email with @antirez, I am opening this issue to share the details of this vulnerability publicly.

I'm reporting a vulnerability in the ds4 parser. The problem is the lack of separation between instructions (special tokens) and data (user prompt). Specifically, the tokenize_rendered_chat_vocab function applies special_token_at directly to the user prompt, without sanitizing it. This allows a user to inject special tokens, such as <think>, <|Assistant|>, |DSML|, etc.

The attack scenario could be, for example, the attacker sending a prompt that creates a fake history for LLM. In this situation, the model can be fooled by the context (e.g., be tricked into making arbitrary tool calls, become misaligned, etc).

I isolated the parser logic in a standalone PoC code (you don't need the necessary hardware to run it).

One possible fix is to ensure that special_token_at is only evaluated for strings generated by server logic, while forcing user input to pass exclusively through the standard BPE tokenizer.

If needed, I’d be more than happy to help by sending a pull request!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions