
[Bug]: vLLM on a new NVIDIA H20-3e card occasionally produces garbled output with Qwen 2.5 VL 72B #19723

@chenteddyqq


Your current environment

2× NVIDIA H20-3e GPUs
CUDA 12.6
vLLM version 0.9.1 (current)

🐛 Describe the bug

vllm serve /data01/vllm/llm_qwenvl25/ --served-model-name qwen2vl --api-key mr-98765 --tensor-parallel-size 2 --trust-remote-code --host 0.0.0.0 --port 8001 --gpu-memory-utilization 0.95 --no-enable-prefix-caching --generation-config /data01/vllm/llm_qwenvl25/generation_config.json
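The garbled output appears when chatting through the server's OpenAI-compatible endpoint. A minimal sketch of the kind of request involved (the served model name, port, and API key come from the serve command above; the image URL and question are hypothetical placeholders, not from the report):

```python
import json

# Chat-completions payload for the vLLM OpenAI-compatible server started above.
# "qwen2vl" matches --served-model-name; the image URL and question are
# placeholder examples for illustration only.
payload = {
    "model": "qwen2vl",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    "max_tokens": 512,
}

# This would be POSTed to http://<host>:8001/v1/chat/completions with the
# header "Authorization: Bearer mr-98765" (matching --api-key above).
print(json.dumps(payload, indent=2))
```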

The VL model loads successfully, but during chat the output contains strange characters such as 📐⚗️.

Sometimes generation also stops before the full answer is shown. In short, the answers are incorrect and contain abnormal characters.

Has anyone encountered the same problem? Thanks.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata


Labels: bug (Something isn't working)
