Description
Test environment
platform:maca
paddle:dev20251120
paddle_metax_gpu: dev20251121
fastdeploy: develop branch, based on ff26158
Symptom
Right after building the fastdeploy package, the deepseek-related op interfaces import successfully:
from fastdeploy.model_executor.ops.gpu import fused_rotary_position_encoding
After running run_model.py, importing the deepseek op fails.
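To capture the exact failure rather than a bare ImportError, a small helper can wrap the import and report the full traceback. This is a generic diagnostic sketch; only the fastdeploy module path and symbol name are taken from the report above.

```python
import importlib
import traceback

def try_import(module_name, symbol=None):
    """Attempt to import a module (and optionally one symbol from it).

    Returns (ok, error_text); error_text is the full traceback on failure.
    """
    try:
        mod = importlib.import_module(module_name)
        if symbol is not None:
            getattr(mod, symbol)
        return True, ""
    except Exception:
        return False, traceback.format_exc()

ok, err = try_import("fastdeploy.model_executor.ops.gpu",
                     "fused_rotary_position_encoding")
print("import ok:", ok)
if not ok:
    print(err)
```

Running this before and after run_model.py should show whether the traceback points at a missing shared library, a symbol-resolution error, or something else.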
run_model.py source
import os
os.environ["MACA_VISIBLE_DEVICES"] = "6,7"
os.environ["FD_MOE_BACKEND"] = "cutlass"
os.environ["PADDLE_XCCL_BACKEND"] = "metax_gpu"
os.environ["FLAGS_weight_only_linear_arch"] = "80"
os.environ["FD_METAX_KVCACHE_MEM"] = "8"
os.environ["FD_ENC_DEC_BLOCK_NUM"] = "0"
# os.environ["FD_METAX_DENSE_QUANT_TYPE"] = "wint8"
# "/root/model/ERNIE-4.5-21B-A3B-Paddle"
# "/root/model/ERNIE-4.5-0.3B-Paddle"
# "/root/model/ERNIE-4.5-21B-A3B-Thinking"
import fastdeploy
llm = fastdeploy.LLM(model="/root/model/ERNIE-4.5-VL-28B-A3B-Thinking",
                     tensor_parallel_size=1,
                     load_choices="default_v1",
                     engine_worker_queue_port=8899,
                     quantization="wint8",
                     disable_custom_all_reduce=True,
                     )
prompts = [
    # "who are you?",
    "A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?",
]
# sampling_params = fastdeploy.SamplingParams(top_p=0.0, max_tokens=2047, temperature=0.0)
sampling_params = fastdeploy.SamplingParams(top_p=0.95, max_tokens=128, temperature=0.6)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs.text
    print(f"Prompt: {prompt!r}")
    print(f"Generated: {generated_text!r}")
Terminal output
Import test right after building the fd package
Immediately after running run_model.py
Then retesting the original import
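Since the import keeps failing even after run_model.py has finished, one possibility worth checking is that running the model rewrites or recompiles files under the installed ops package. A hedged diagnostic sketch follows; the `ops_dir` path is a placeholder assumption, so substitute the actual site-packages location of your fastdeploy install.

```python
import hashlib
import os
import pathlib

def snapshot(dir_path):
    """Hash every file under dir_path so in-place changes can be detected."""
    root = pathlib.Path(dir_path)
    return {str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in root.rglob("*") if p.is_file()}

# Hypothetical install location; replace with the real site-packages path.
ops_dir = "/usr/local/lib/python3.10/site-packages/fastdeploy/model_executor/ops/gpu"
if os.path.isdir(ops_dir):
    before = snapshot(ops_dir)
    # ... run run_model.py in between, then re-snapshot ...
    after = snapshot(ops_dir)
    print("changed files:", {k for k in before if after.get(k) != before[k]})
```

If any `.so` or `.py` files under the ops directory change hash across a run, that would explain why a fresh process can no longer import the op.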
