Description
GPU hardware information
- Product Name: NVIDIA RTX PRO 6000 Blackwell Server Edition
- Product Brand: NVIDIA
- Product Architecture: Blackwell
- VRAM: 96GB GDDR7
Driver Version: 580.82.07 (the 575 driver behaves the same)
CUDA Version: 12.9
Paddle version: paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129
Compute capability (SM version): SM120
The compute capability was queried with the following code:
python
import torch

if torch.cuda.is_available():
    # Compute capability of device 0 as a (major, minor) tuple
    capability = torch.cuda.get_device_capability(0)
    print(f"SM{capability[0]}{capability[1]}")  # prints SM120 on this machine
fastdeploy-gpu-80_90 version 2.1.0 (the Nightly build has the same problem)
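To make the suspected mismatch explicit, here is a minimal sketch that compares the reported SM120 capability with the architectures named in the package suffix. It assumes that the `80_90` suffix lists the SM architectures the wheel was compiled for; that reading is a guess and is not confirmed anywhere in this report. The helpers `sm_tag`, `archs_from_suffix` and `is_listed` are hypothetical names, not part of FastDeploy or Paddle.

```python
import re


def sm_tag(capability):
    """Format a (major, minor) compute capability tuple, e.g. (12, 0) -> 'SM120'."""
    major, minor = capability
    return f"SM{major}{minor}"


def archs_from_suffix(package_name):
    """Hypothetical helper: read a trailing '80_90'-style suffix as a list of SM tags.

    Assumption: the suffix names the target architectures of the build.
    """
    match = re.search(r"-(\d+(?:_\d+)*)$", package_name)
    if not match:
        return []
    return [f"SM{part}" for part in match.group(1).split("_")]


def is_listed(capability, package_name):
    """Name-based heuristic only: True if the device's SM tag appears in the suffix."""
    return sm_tag(capability) in archs_from_suffix(package_name)


print(sm_tag((12, 0)))
print(archs_from_suffix("fastdeploy-gpu-80_90"))
print(is_listed((12, 0), "fastdeploy-gpu-80_90"))
```

If the assumption holds, the device's SM120 is not among the listed targets, which would be consistent with a "no kernel image is available" error from the custom ops.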
Environment verification: OK
import paddle
from paddle.jit.marker import unified
paddle.utils.run_check()
Running verify PaddlePaddle program ...
I0906 02:12:41.045678 24555 pir_interpreter.cc:1524] New Executor is Running ...
W0906 02:12:41.046741 24555 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 12.0, Driver API Version: 13.0, Runtime API Version: 12.9
I0906 02:12:41.047454 24555 pir_interpreter.cc:1547] pir interpreter is running by multi-thread mode ...
PaddlePaddle works well on 1 GPU.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
from fastdeploy.model_executor.ops.gpu import beam_search_softmax
W0906 02:12:42.124501 24555 ir_context.cc:306] custom_op.static_op_save_output_topk_ op already registered.
W0906 02:12:42.124531 24555 custom_operator.cc:967] Operator (static_op_save_output_topk) has been registered.
W0906 02:12:42.124600 24555 ir_context.cc:306] custom_op.static_op_save_output_dynamic_ op already registered.
W0906 02:12:42.124605 24555 custom_operator.cc:967] Operator (static_op_save_output_dynamic) has been registered.
W0906 02:12:42.124691 24555 ir_context.cc:306] custom_op.static_op_save_output_ op already registered.
W0906 02:12:42.124696 24555 custom_operator.cc:967] Operator (static_op_save_output) has been registered.
W0906 02:12:42.124879 24555 ir_context.cc:306] custom_op.static_op_transfer_output op already registered.
W0906 02:12:42.124886 24555 custom_operator.cc:967] Operator (static_op_transfer_output) has been registered.
W0906 02:12:42.124979 24555 ir_context.cc:306] custom_op.static_op_get_output_topk_ op already registered.
W0906 02:12:42.124984 24555 custom_operator.cc:967] Operator (static_op_get_output_topk) has been registered.
W0906 02:12:42.125034 24555 ir_context.cc:306] custom_op.static_op_get_output_dynamic_ op already registered.
W0906 02:12:42.125039 24555 custom_operator.cc:967] Operator (static_op_get_output_dynamic) has been registered.
W0906 02:12:42.125229 24555 ir_context.cc:306] custom_op.static_op_rebuild_padding_cpu op already registered.
W0906 02:12:42.125234 24555 custom_operator.cc:967] Operator (static_op_rebuild_padding_cpu) has been registered.
W0906 02:12:42.126255 24555 ir_context.cc:306] custom_op.static_op_get_output_ op already registered.
W0906 02:12:42.126262 24555 custom_operator.cc:967] Operator (static_op_get_output) has been registered.
Running fastdeploy
bash
export ENABLE_V1_KVCACHE_SCHEDULER=1
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-VL-28B-A3B-Paddle \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --tensor-parallel-size 1 \
    --max-model-len 32768 \
    --max-num-seqs 128 \
    --limit-mm-per-prompt '{"image": 10, "video": 1}' \
    --reasoning-parser ernie-45-vl \
    --gpu-memory-utilization 0.9 \
    --enable-chunked-prefill \
    --max-num-batched-tokens 384 \
    --enable-mm
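The `--limit-mm-per-prompt` value is a JSON string, so it is easy to break with shell quoting when the command is scripted. As a rough sketch, the argument list can be built in Python and the JSON generated with `json.dumps`. Only flags that appear in the command above are used; `build_command` is a hypothetical helper, and the sketch only prints the command rather than launching the server.

```python
import json
import shlex


def build_command(model, port, mm_limits):
    """Hypothetical helper: build an argv list for the api_server entrypoint."""
    return [
        "python", "-m", "fastdeploy.entrypoints.openai.api_server",
        "--model", model,
        "--port", str(port),
        # Serialize the per-prompt multimodal limits as one JSON argument
        "--limit-mm-per-prompt", json.dumps(mm_limits),
        "--enable-mm",
    ]


cmd = build_command(
    "baidu/ERNIE-4.5-VL-28B-A3B-Paddle",
    8180,
    {"image": 10, "video": 1},
)

# shlex.join quotes each argument so the line can be pasted into a shell
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` would avoid shell quoting entirely; the remaining flags from the original command can be appended in the same way.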
Error during model loading
[2025-09-06 02:21:12,148] [ INFO] - Start load layer 27
[2025-09-06 02:21:15,817] [ INFO] - Model loading took 69.24437856674194 seconds
CUDA error 209 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 83]: no kernel image is available for execution on the device
CUDA error 101 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 102]: invalid device ordinal
CUDA error 209 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 83]: no kernel image is available for execution on the device
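For triage across longer logs, the CUDA error lines above can be pulled out and counted with a small script. This is only a sketch: the regular expression is fitted to the three lines shown here and may not match other Paddle log formats, and `parse_cuda_errors` is a hypothetical helper. In the CUDA runtime, code 209 corresponds to `cudaErrorNoKernelImageForDevice` and code 101 to `cudaErrorInvalidDevice`, which agrees with the messages in the log.

```python
import re
from collections import Counter

# Excerpt copied from the log above
LOG = """\
[2025-09-06 02:21:15,817] [ INFO] - Model loading took 69.24437856674194 seconds
CUDA error 209 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 83]: no kernel image is available for execution on the device
CUDA error 101 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 102]: invalid device ordinal
CUDA error 209 [/paddle/third_party/cccl/cub/cub/util_device.cuh, 83]: no kernel image is available for execution on the device
"""

# Pattern: CUDA error <code> [<source file>, <line>]: <message>
CUDA_ERROR = re.compile(r"^CUDA error (\d+) \[(.+), (\d+)\]: (.+)$")


def parse_cuda_errors(text):
    """Return (code, source_file, source_line, message) for each CUDA error line."""
    errors = []
    for line in text.splitlines():
        match = CUDA_ERROR.match(line.strip())
        if match:
            code, path, lineno, message = match.groups()
            errors.append((int(code), path, int(lineno), message))
    return errors


errors = parse_cuda_errors(LOG)
counts = Counter(code for code, _, _, _ in errors)
for code, n in sorted(counts.items()):
    print(f"CUDA error {code}: {n} occurrence(s)")
```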
Detailed log file