
ERNIE-4.5-VL-28B model fails to load on NVIDIA RTX PRO 6000 Blackwell with CUDA errors 209 and 101 #3930

@ChungTak

Description


GPU hardware information

  • Product Name: NVIDIA RTX PRO 6000 Blackwell Server Edition
  • Product Brand: NVIDIA
  • Product Architecture: Blackwell
  • Memory: 96GB GDDR7

Driver Version: 580.82.07 (same result with the 575 driver)
CUDA Version: 12.9
Paddle version: paddlepaddle-gpu==3.1.1, installed with -i https://www.paddlepaddle.org.cn/packages/stable/cu129

Compute capability (SM version) = SM120

The compute capability was queried with the following code:

```python
import torch

if torch.cuda.is_available():
    capability = torch.cuda.get_device_capability(0)
    print(f"SM{capability[0]}{capability[1]}")  # prints SM120
```

fastdeploy-gpu-80_90 version 2.1.0 (the Nightly build has the same problem)
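CUDA error 209 (`cudaErrorNoKernelImageForDevice`) means a binary contains neither machine code (SASS) nor PTX that the device can use. The package name fastdeploy-gpu-80_90 suggests a build targeting SM80/SM90, so one plausible question is whether such a build can run on an SM120 GPU at all. The minimal sketch below encodes a simplified version of CUDA's compatibility rules (SASS is only compatible within one major version; PTX can be JIT-compiled by the driver for newer devices) and deliberately ignores arch-specific targets such as `sm_90a`. The helper names `parse_arch` and `can_run` and the arch lists in the demo are hypothetical; they are not read from the installed Paddle or FastDeploy wheels.

```python
import re
from typing import Iterable, Tuple

# Simplified model of CUDA compatibility (assumption: arch-specific targets
# such as sm_90a are not handled and are rejected by parse_arch):
#   * SASS for sm_XY runs on devices with the same major X and minor >= Y.
#   * PTX for compute_XY can be JIT-compiled for any device with CC >= X.Y.
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)$")


def parse_arch(arch: str) -> Tuple[str, int, int]:
    """Split an arch string such as 'sm_120' into ('sm', 12, 0)."""
    m = _ARCH_RE.match(arch)
    if m is None:
        raise ValueError(f"unsupported arch string: {arch!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def can_run(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """True if some target in arch_list could execute on this device."""
    major, minor = capability
    for arch in arch_list:
        kind, a_major, a_minor = parse_arch(arch)
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # binary-compatible SASS within the same major version
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX that the driver can JIT-compile forward
    return False


if __name__ == "__main__":
    device = (12, 0)  # SM120, as reported above
    # Hypothetical target lists, not read from the installed wheels:
    print(can_run(device, ["sm_80", "sm_90"]))                # False
    print(can_run(device, ["sm_80", "sm_90", "compute_90"]))  # True
    print(can_run(device, ["sm_120"]))                        # True
```

Under these rules, a build that ships only `sm_80`/`sm_90` SASS and no PTX has nothing an SM120 device can execute, which is consistent with (though not proof of) the error 209 seen below.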

Environment verification: OK

```python
import paddle
from paddle.jit.marker import unified

paddle.utils.run_check()
```

```
Running verify PaddlePaddle program ...
I0906 02:12:41.045678 24555 pir_interpreter.cc:1524] New Executor is Running ...
W0906 02:12:41.046741 24555 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 12.0, Driver API Version: 13.0, Runtime API Version: 12.9
I0906 02:12:41.047454 24555 pir_interpreter.cc:1547] pir interpreter is running by multi-thread mode ...
PaddlePaddle works well on 1 GPU.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
```

```python
from fastdeploy.model_executor.ops.gpu import beam_search_softmax
```

```
W0906 02:12:42.124501 24555 ir_context.cc:306] custom_op.static_op_save_output_topk_ op already registered.
W0906 02:12:42.124531 24555 custom_operator.cc:967] Operator (static_op_save_output_topk) has been registered.
W0906 02:12:42.124600 24555 ir_context.cc:306] custom_op.static_op_save_output_dynamic_ op already registered.
W0906 02:12:42.124605 24555 custom_operator.cc:967] Operator (static_op_save_output_dynamic) has been registered.
W0906 02:12:42.124691 24555 ir_context.cc:306] custom_op.static_op_save_output_ op already registered.
W0906 02:12:42.124696 24555 custom_operator.cc:967] Operator (static_op_save_output) has been registered.
W0906 02:12:42.124879 24555 ir_context.cc:306] custom_op.static_op_transfer_output op already registered.
W0906 02:12:42.124886 24555 custom_operator.cc:967] Operator (static_op_transfer_output) has been registered.
W0906 02:12:42.124979 24555 ir_context.cc:306] custom_op.static_op_get_output_topk_ op already registered.
W0906 02:12:42.124984 24555 custom_operator.cc:967] Operator (static_op_get_output_topk) has been registered.
W0906 02:12:42.125034 24555 ir_context.cc:306] custom_op.static_op_get_output_dynamic_ op already registered.
W0906 02:12:42.125039 24555 custom_operator.cc:967] Operator (static_op_get_output_dynamic) has been registered.
W0906 02:12:42.125229 24555 ir_context.cc:306] custom_op.static_op_rebuild_padding_cpu op already registered.
W0906 02:12:42.125234 24555 custom_operator.cc:967] Operator (static_op_rebuild_padding_cpu) has been registered.
W0906 02:12:42.126255 24555 ir_context.cc:306] custom_op.static_op_get_output_ op already registered.
W0906 02:12:42.126262 24555 custom_operator.cc:967] Operator (static_op_get_output) has been registered.
```

Running fastdeploy

```bash
export ENABLE_V1_KVCACHE_SCHEDULER=1
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-VL-28B-A3B-Paddle \
  --port 8180 \
  --metrics-port 8181 \
  --engine-worker-queue-port 8182 \
  --tensor-parallel-size 1 \
  --max-model-len 32768 \
  --max-num-seqs 128 \
  --limit-mm-per-prompt '{"image": 10, "video": 1}' \
  --reasoning-parser ernie-45-vl \
  --gpu-memory-utilization 0.9 \
  --enable-chunked-prefill \
  --max-num-batched-tokens 384 \
  --enable-mm
```
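A multi-line shell command like this one breaks if a trailing backslash is lost (each following line is then run as a separate command). As an alternative, the same launch can be expressed as an argument list from Python, which sidesteps shell line continuation and the quoting of the JSON value for `--limit-mm-per-prompt`. This is a minimal sketch: the flags and values are copied from the command above and are not validated against FastDeploy's argument parser, and `build_argv`/`build_env` are hypothetical helper names, not part of FastDeploy.

```python
import json
import os
import shlex
import sys
from typing import Dict, List

# Flags and values copied from the command above; this sketch does not
# validate them against FastDeploy's own argument parser.
VALUE_FLAGS: Dict[str, str] = {
    "--model": "baidu/ERNIE-4.5-VL-28B-A3B-Paddle",
    "--port": "8180",
    "--metrics-port": "8181",
    "--engine-worker-queue-port": "8182",
    "--tensor-parallel-size": "1",
    "--max-model-len": "32768",
    "--max-num-seqs": "128",
    # json.dumps yields '{"image": 10, "video": 1}', matching the original quoting.
    "--limit-mm-per-prompt": json.dumps({"image": 10, "video": 1}),
    "--reasoning-parser": "ernie-45-vl",
    "--gpu-memory-utilization": "0.9",
    "--max-num-batched-tokens": "384",
}
SWITCH_FLAGS: List[str] = ["--enable-chunked-prefill", "--enable-mm"]


def build_argv(python: str = sys.executable) -> List[str]:
    """Return the server command as an argument list (no shell quoting needed)."""
    argv = [python, "-m", "fastdeploy.entrypoints.openai.api_server"]
    for flag, value in VALUE_FLAGS.items():
        argv += [flag, value]
    return argv + SWITCH_FLAGS


def build_env() -> Dict[str, str]:
    """Copy the current environment and enable the V1 KV-cache scheduler."""
    env = dict(os.environ)
    env["ENABLE_V1_KVCACHE_SCHEDULER"] = "1"
    return env


if __name__ == "__main__":
    # Print the equivalent one-line shell command. To launch the server instead,
    # call subprocess.run(build_argv(), env=build_env(), check=True).
    print("ENABLE_V1_KVCACHE_SCHEDULER=1 " + shlex.join(build_argv()))
```

The demo only prints the command, so it is safe to run on a machine without FastDeploy installed.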

Error during model loading:

```
[2025-09-06 02:21:12,148] [ INFO] - Start load layer 27
[2025-09-06 02:21:15,817] [ INFO] - Model loading took 69.24437856674194 seconds
CUDA error 209 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 83]: no kernel image is available for execution on the device
CUDA error 101 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 102]: invalid device ordinal
CUDA error 209 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 83]: no kernel image is available for execution on the device
```
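When triaging the attached log, it can help to collapse repeated CUDA errors into counts by code and source location. The codes in this report correspond to the CUDA runtime `cudaError_t` values `cudaErrorInvalidDevice` (101, "invalid device ordinal") and `cudaErrorNoKernelImageForDevice` (209). A minimal sketch, assuming every error line follows the `CUDA error <code> [<file>, <line>]: <message>` format seen above; `summarize_cuda_errors` is a hypothetical helper, and the demo uses the three lines from this log rather than the full archive.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# cudaError_t values that appear in this issue's log.
CUDA_ERROR_NAMES: Dict[int, str] = {
    101: "cudaErrorInvalidDevice",
    209: "cudaErrorNoKernelImageForDevice",
}

# Assumed line format: "CUDA error <code> [<file>, <line>]: <message>"
_CUDA_ERR_RE = re.compile(r"CUDA error (\d+) \[([^,\]]+), (\d+)\]: (.*)")


def summarize_cuda_errors(lines: Iterable[str]) -> Counter:
    """Count CUDA errors by (code, enum name, source file, source line)."""
    counts: Counter = Counter()
    for line in lines:
        m = _CUDA_ERR_RE.search(line)
        if m is None:
            continue  # not a CUDA error line (e.g. INFO messages)
        code = int(m.group(1))
        name = CUDA_ERROR_NAMES.get(code, "unknown")
        counts[(code, name, m.group(2), int(m.group(3)))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "CUDA error 209 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 83]: no kernel image is available for execution on the device",
        "CUDA error 101 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 102]: invalid device ordinal",
        "CUDA error 209 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 83]: no kernel image is available for execution on the device",
    ]
    for (code, name, src, lineno), n in summarize_cuda_errors(sample).most_common():
        print(f"{n}x {code} {name} at {src}:{lineno}")
```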

Detailed log file:

log.tar.gz
