
[Bug]: vLLM on a new NVIDIA H20-3e card occasionally produces garbled output with Qwen 2.5 VL 72B #19723

@chenteddyqq


Your current environment

2× NVIDIA H20-3e GPUs
CUDA 12.6
vLLM version 0.9.1 (current)

🐛 Describe the bug

vllm serve /data01/vllm/llm_qwenvl25/ --served-model-name qwen2vl --api-key mr-98765 --tensor-parallel-size 2 --trust-remote-code --host 0.0.0.0 --port 8001 --gpu-memory-utilization 0.95 --no-enable-prefix-caching --generation-config /data01/vllm/llm_qwenvl25/generation_config.json
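The garbled output appears when chatting through the server's OpenAI-compatible endpoint. A minimal sketch of the kind of request involved (the served model name, port, and API key come from the serve command above; the image URL and question are hypothetical placeholders, not from the report):

```python
import json

# Chat-completions payload for the vLLM OpenAI-compatible server started above.
# "qwen2vl" matches --served-model-name; the image URL and question are
# placeholder examples for illustration only.
payload = {
    "model": "qwen2vl",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    "max_tokens": 512,
}

# This would be POSTed to http://<host>:8001/v1/chat/completions with the
# header "Authorization: Bearer mr-98765" (matching --api-key above).
print(json.dumps(payload, indent=2))
```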

The VL model loads successfully, but during chat the output contains strange characters such as 📐⚗️.

Sometimes generation also stops before the full answer is shown. In short, the answers are incorrect and contain abnormal characters.

Has anyone encountered the same problem? Thanks.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata


Labels: bug (Something isn't working)
