Closed
Labels
bug (Something isn't working)
Description
Software Environment
- paddlepaddle: 63ae2a5 (commit id)
- paddlenlp: bffd3b549bf55f88d875151ae67a0dbc3540f32e (commit id)
Duplicate Check
- I have searched the existing issues
Error Description
When running the static-graph inference benchmark, the distributed launch fails with the following error:
Traceback (most recent call last):
File "/softwares/PaddleNLP/llm/devices/sdaa/llama/./../../../predict/predictor.py", line 1626, in <module>
predict()
File "/softwares/PaddleNLP/llm/devices/sdaa/llama/./../../../predict/predictor.py", line 1583, in predict
benchmark(predictor, predictor_args, model_args)
File "/softwares/PaddleNLP/llm/devices/sdaa/llama/./../../../predict/predictor.py", line 1607, in benchmark
outputs, batch_tokens = predictor.predict(batch_source_text, return_tokens=True)
TypeError: cannot unpack non-iterable NoneType object
LAUNCH INFO 2025-03-16 07:56:39,282 Exit code -15
Analysis: in static-graph mode, the predict function of StaticGraphBlockInferencePredictor returns output_tokens only on rank 0 when return_tokens is True, yet in the benchmark function every process executes the following two lines:
outputs, batch_tokens = predictor.predict(batch_source_text, return_tokens=True)
output_tokens += sum([len(tokens) for tokens in batch_tokens])
As a result, processes other than rank 0 cannot obtain batch_tokens and fail with the error above.
Stable Reproduction Steps & Code
Running any distributed static-graph inference benchmark reproduces the issue, for example:
1. cd llm
2. Export the static graph:
CUDA_VISIBLE_DEVICES=0,1 python -m paddle.distributed.launch predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b --inference_model --output_path ./output_dir/exported_model/llama2_7b_block_size32 --dtype float16 --block_attn --device gpu
3. Run static-graph inference:
CUDA_VISIBLE_DEVICES=0,1 python -m paddle.distributed.launch predict/predictor.py --model_name_or_path output_dir/exported_model/llama2_7b_block_size32 --dtype float16 --mode static --inference_model 1 --block_attn 1 --device gpu --benchmark 1 --src_length 300 --max_length 100 --batch_size 1
The error screenshot and the problematic code are attached:
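To make the failure mode concrete, here is a minimal sketch of the rank-0-only return and one possible guard. The function names and return values below are hypothetical stand-ins, not the actual PaddleNLP API; the real code lives in llm/predict/predictor.py.

```python
def predict(rank, return_tokens=True):
    """Mimics StaticGraphBlockInferencePredictor.predict: only rank 0
    returns the (outputs, batch_tokens) tuple; other ranks return None."""
    if rank == 0 and return_tokens:
        return ["some output"], [[101, 102, 103]]
    return None  # non-zero ranks implicitly return None


def benchmark_buggy(rank):
    # Every rank unpacks unconditionally -> TypeError on rank != 0,
    # matching "cannot unpack non-iterable NoneType object".
    outputs, batch_tokens = predict(rank, return_tokens=True)
    return sum(len(tokens) for tokens in batch_tokens)


def benchmark_guarded(rank):
    # One possible fix: check for None before unpacking, so only
    # rank 0 does the token accounting.
    result = predict(rank, return_tokens=True)
    if result is None:
        return 0  # non-zero ranks contribute nothing
    outputs, batch_tokens = result
    return sum(len(tokens) for tokens in batch_tokens)
```

With this sketch, rank 0 counts 3 tokens, while rank 1 raises TypeError in the buggy variant and returns 0 in the guarded one. An alternative fix would be to have predict return the tuple (with empty token lists) on every rank.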