
Conversation


Copilot AI commented Dec 15, 2025

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

Original prompt

Background

The repository currently supports only models up to Qwen2.5. Support needs to be extended to the Qwen3 series, including both the text and vision-language (multimodal) variants.

Required changes

1. Add a Qwen3 text chat format to llama_cpp/llama_chat_format.py

Near the existing format_qwen function (around line 1038), add a new qwen3 chat format:

@register_chat_format("qwen3")
def format_qwen3(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    # Qwen3 keeps the ChatML wire format: <|im_start|>{role}\n{content}<|im_end|>
    _roles = dict(user="<|im_start|>user", assistant="<|im_start|>assistant")
    # Fall back to Qwen's default system prompt when the caller supplies none.
    system_message = _get_system_message(messages) or "You are Qwen, a helpful assistant."
    system_template = "<|im_start|>system\n{system_message}"
    system_message = system_template.format(system_message=system_message)
    _messages = _map_roles(messages, _roles)
    _messages.append((_roles["assistant"], None))  # open the assistant turn for generation
    _sep = "<|im_end|>"
    _prompt = _format_chatml(system_message, _messages, _sep)
    return ChatFormatterResponse(prompt=_prompt, stop=["<|im_end|>", "<|endoftext|>"])
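
For reference, a minimal usage sketch of the new format. The GGUF file name below is a placeholder, not something shipped with the repo:

from llama_cpp import Llama

# Hypothetical model path; any Qwen3 text model converted to GGUF should work.
llm = Llama(model_path="./Qwen3-8B-Q4_K_M.gguf", chat_format="qwen3")
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(out["choices"][0]["message"]["content"])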

2. Add a Qwen3VLChatHandler class to llama_cpp/llama_chat_format.py

After the existing Qwen25VLChatHandler class (around line 3520), add the new Qwen3VLChatHandler class:

class Qwen3VLChatHandler(Llava15ChatHandler):
    DEFAULT_SYSTEM_MESSAGE = "You are Qwen, a helpful assistant."

    CHAT_FORMAT = (
        "{% for message in messages %}"
        "{% if loop.first and message['role'] != 'system' %}"
        "<|im_start|>system\n"
        "You are Qwen, a helpful assistant.<|im_end|>\n"
        "{% endif %}"
        "<|im_start|>{{ message['role'] }}\n"
        "{% if message['content'] is string %}"
        "{{ message['content'] }}<|im_end|>\n"
        "{% else %}"
        "{% for content in message['content'] %}"
        "{% if content['type'] == 'image_url' %}"
        "{% if content.image_url is string %}"
        "{{ content.image_url }}"
        "{% else %}"
        "{{ content.image_url.url }}"
        "{% endif %}"
        "{% elif content['type'] == 'text' %}"
        "{{ content['text'] }}"
        "{% endif %}"
        "{% endfor %}"
        "<|im_end|>\n"
        "{% endif %}"
        "{% endfor %}"
        "<|im_start|>assistant\n"
    )

    def __call__(self, **kwargs):
        llama = kwargs['llama']

        # Clear state for multiple runs
        llama.reset()
        llama._ctx.kv_cache_clear()
        llama.n_tokens = 0

        if hasattr(llama, 'input_ids'):
            llama.input_ids.fill(0)

        # Clear any handler state
        if hasattr(self, '_last_image_embed'):
            self._last_image_embed = None
            self._last_image_hash = None

        if self.verbose:
            messages = kwargs.get('messages', [])
            image_count = len(self.get_image_urls(messages))
            print(f"Qwen3VL - Cleared state, processing {image_count} images", file=sys.stderr)

        # Use parent implementation
        return super().__call__(**kwargs)
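
For context, a sketch of how the handler could be used directly. File names are placeholders; substitute the real text-model and mmproj GGUFs:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Qwen3VLChatHandler

chat_handler = Qwen3VLChatHandler(clip_model_path="./mmproj-qwen3-vl.gguf")
llm = Llama(
    model_path="./Qwen3-VL-Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,  # leave room for image embeddings in the context window
)
out = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///path/to/image.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)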

3. Modify llama_cpp/server/model.py

In the load_llama_from_model_settings function, find the block that handles qwen2.5-vl (around lines 175-184) and add support for qwen3-vl after it:

        elif settings.chat_format == "qwen3-vl":
            assert settings.clip_model_path is not None, "clip model not found"
            if settings.hf_model_repo_id is not None:
                chat_handler = (
                    llama_cpp.llama_chat_format.Qwen3VLChatHandler.from_pretrained(
                        repo_id=settings.hf_model_repo_id,
                        filename=settings.clip_model_path,
                        verbose=settings.verbose,
                    )
                )
            else:
                chat_handler = llama_cpp.llama_chat_format.Qwen3VLChatHandler(
                    clip_model_path=settings.clip_model_path, verbose=settings.verbose
                )
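
The server could then be started along these lines (file names are placeholders):

python -m llama_cpp.server \
  --model ./Qwen3-VL-Q4_K_M.gguf \
  --chat_format qwen3-vl \
  --clip_model_path ./mmproj-qwen3-vl.gguf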

4. Update the README.md documentation

In the multimodal models table in README.md (around line 507), add a row for Qwen3-VL:

| qwen3-vl | Qwen3VLChatHandler | qwen3-vl |

Acceptance criteria

  1. A Qwen3 text model can be loaded with chat_format="qwen3" and chats normally
  2. A Qwen3-VL multimodal model can be loaded with chat_format="qwen3-vl" together with a clip model
  3. Server mode correctly recognizes and loads the qwen3-vl format (see the check sketched after this list)
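
A quick end-to-end check of criterion 3, assuming the openai Python client and the server's default port:

from openai import OpenAI

# Point the client at the local llama-cpp-python server started in step 3.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-no-key-required")
resp = client.chat.completions.create(
    model="qwen3-vl",  # model name as configured on the server
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)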



Copilot AI self-assigned this Dec 15, 2025
@MrChenLearnSpace marked this pull request as ready for review December 15, 2025 18:37
Copilot AI review requested due to automatic review settings December 15, 2025 18:37

Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.



@MrChenLearnSpace merged commit 8bb2105 into main Dec 15, 2025
1 check failed
