
[Bug]: Static-graph inference predict benchmark fails #10149

@zhaohaixu

Description


Software environment

- paddlepaddle: 63ae2a5 (commit id)
- paddlenlp: bffd3b549bf55f88d875151ae67a0dbc3540f32e (commit id)

Duplicate issues

  • I have searched the existing issues

Error description

When the static-graph inference benchmark is run with distributed launch, it fails with the following error:

Traceback (most recent call last):
  File "/softwares/PaddleNLP/llm/devices/sdaa/llama/./../../../predict/predictor.py", line 1626, in <module>
    predict()
  File "/softwares/PaddleNLP/llm/devices/sdaa/llama/./../../../predict/predictor.py", line 1583, in predict
    benchmark(predictor, predictor_args, model_args)
  File "/softwares/PaddleNLP/llm/devices/sdaa/llama/./../../../predict/predictor.py", line 1607, in benchmark
    outputs, batch_tokens = predictor.predict(batch_source_text, return_tokens=True)
TypeError: cannot unpack non-iterable NoneType object
LAUNCH INFO 2025-03-16 07:56:39,282 Exit code -15

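Root cause: in static-graph mode, when return_tokens is True, the predict function of StaticGraphBlockInferencePredictor returns output_tokens only on rank 0. However, every process runs the following two lines in the benchmark function: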

outputs, batch_tokens = predictor.predict(batch_source_text, return_tokens=True)
output_tokens += sum([len(tokens) for tokens in batch_tokens])

As a result, on every non-zero rank predict returns None, so batch_tokens cannot be unpacked and the error above is raised.
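The failure mode described above can be reproduced without Paddle. The sketch below is illustrative only: `FakeBlockPredictor` is a hypothetical stand-in that mimics the reported behavior (a return value on rank 0, `None` elsewhere), and `count_tokens_guarded` shows one possible guard. It is an assumption, not the upstream PaddleNLP fix; broadcasting the tokens from rank 0 would be another option.

```python
from typing import List, Optional


class FakeBlockPredictor:
    """Hypothetical stand-in for StaticGraphBlockInferencePredictor.

    Mimics the reported behavior: predict(..., return_tokens=True) returns
    (outputs, tokens) only on rank 0 and returns None on every other rank.
    """

    def __init__(self, rank: int):
        self.rank = rank

    def predict(self, texts: List[str], return_tokens: bool = False):
        outputs = [t.upper() for t in texts]           # placeholder "generation"
        tokens = [list(range(len(t))) for t in texts]  # placeholder token ids
        if self.rank == 0:
            return (outputs, tokens) if return_tokens else outputs
        return None  # non-zero ranks produce no return value


def count_tokens_unguarded(predictor, batch: List[str]) -> int:
    # Mirrors the two lines in benchmark(): unpacking fails on non-zero ranks.
    outputs, batch_tokens = predictor.predict(batch, return_tokens=True)
    return sum(len(tokens) for tokens in batch_tokens)


def count_tokens_guarded(predictor, batch: List[str]) -> Optional[int]:
    # Hypothetical guard: every rank still calls predict() so the distributed
    # run stays in lockstep, but only a rank that gets a result counts tokens.
    result = predictor.predict(batch, return_tokens=True)
    if result is None:
        return None
    outputs, batch_tokens = result
    return sum(len(tokens) for tokens in batch_tokens)


if __name__ == "__main__":
    batch = ["hello", "hi"]
    print(count_tokens_guarded(FakeBlockPredictor(rank=0), batch))  # 7
    print(count_tokens_guarded(FakeBlockPredictor(rank=1), batch))  # None
    try:
        count_tokens_unguarded(FakeBlockPredictor(rank=1), batch)
    except TypeError as err:
        print(err)  # cannot unpack non-iterable NoneType object
```

The unguarded path raises the same `TypeError` message seen in the traceback, which is consistent with the rank-0-only return being the cause.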

Steps to reproduce & code

Any distributed static-graph inference benchmark reproduces the issue, for example:
1. cd llm
2. Export the static graph:
CUDA_VISIBLE_DEVICES=0,1 python -m paddle.distributed.launch predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b --inference_model --output_path ./output_dir/exported_model/llama2_7b_block_size32 --dtype float16 --block_attn --device gpu
3. Run static-graph inference:
CUDA_VISIBLE_DEVICES=0,1 python -m paddle.distributed.launch predict/predictor.py --model_name_or_path output_dir/exported_model/llama2_7b_block_size32 --dtype float16 --mode static --inference_model 1 --block_attn 1 --device gpu --benchmark 1 --src_length 300 --max_length 100 --batch_size 1

Attached: screenshots of the error and of the problematic code.



Labels: bug (Something isn't working)
