Status: Closed
Labels: bug (Something isn't working)
Description
Software Environment
Machine: V100 GPU
Environment setup:
export MODEL_PATH=${MODEL_PATH:-$PWD}
docker run -i --rm --gpus all --shm-size 32G --network=host --privileged --cap-add=SYS_PTRACE --name=paddlenlp_llm -v $MODEL_PATH:/models -v/home/:/host_mnt -dit ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlenlp:llm-serving-cuda118-cudnn8-v2.1 /bin/bash
export model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/weight_only_int8"
export MODEL_PATH=/models/
start_server $model_name

Duplicate Issue
- I have searched the existing issues
Error Description
--- Running PIR pass [group_norm_silu_fuse_pass]
--- Running PIR pass [matmul_scale_fuse_pass]
--- Running PIR pass [matmul_transpose_fuse_pass]
--- Running PIR pass [transpose_flatten_concat_fuse_pass]
--- Running PIR pass [remove_redundant_transpose_pass]
--- Running PIR pass [horizontal_fuse_pass]
--- Running PIR pass [transfer_layout_pass]
--- Running PIR pass [common_subexpression_elimination_pass]
--- Running PIR pass [params_sync_among_devices_pass]
I0324 08:43:09.321724 289 print_statistics.cc:50] --- detected [708] subgraphs!
--- Running PIR pass [constant_folding_pass]
I0324 08:43:09.324760 289 pir_interpreter.cc:1619] New Executor is Running ...
I0324 08:43:09.325079 289 pir_interpreter.cc:1643] pir interpreter is running by multi-thread mode ...
I0324 08:43:09.334658 289 print_statistics.cc:44] --- detected [5, 1395] subgraphs!
--- Running PIR pass [dead_code_elimination_pass]
I0324 08:43:09.336036 289 print_statistics.cc:50] --- detected [3] subgraphs!
--- Running PIR pass [replace_fetch_with_shadow_output_pass]
I0324 08:43:09.336899 289 print_statistics.cc:50] --- detected [1] subgraphs!
Traceback (most recent call last):
File "/opt/source/PaddleNLP/llm/server/server/server/engine/infer.py", line 792, in <module>
main()
File "/opt/source/PaddleNLP/llm/server/server/server/engine/infer.py", line 787, in main
model_runner = ModelRunner(args)
File "/opt/source/PaddleNLP/llm/server/server/server/engine/infer.py", line 105, in __init__
self.infer_engine = InferenceEngine(
File "/opt/source/PaddleNLP/llm/server/server/server/engine/infer.py", line 718, in __init__
self._init_predictor()
File "/opt/source/PaddleNLP/llm/server/server/server/engine/infer.py", line 741, in _init_predictor
self.predictor = paddle.inference.create_predictor(config)
RuntimeError: (PreconditionNotMet) op [] kernel output args (0) defs should equal op outputs (11)
[Hint: Expected op_item->num_results() == output_defs.size(), but received op_item->num_results():11 != output_defs.size():0.] (at ../paddle/fluid/pir/transforms/pd_op_to_kernel_pass.cc:2517)
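The hint points at pd_op_to_kernel_pass failing to match kernel output definitions for one op during predictor creation, i.e. before the serving layer does any real work. To narrow it down, it may help to call paddle.inference.create_predictor directly on the exported model, outside the serving stack. A minimal sketch, assuming the weight_only_int8 export produced model.pdmodel / model.pdiparams files (the paths below are assumptions; adjust them to your export). switch_ir_optim(False) skips the fuse passes listed above, which can show whether one of them is the trigger:

import paddle.inference as paddle_infer

# Hypothetical paths -- point these at your exported weight_only_int8 model files.
model_file = "/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/weight_only_int8/model.pdmodel"
params_file = "/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/weight_only_int8/model.pdiparams"

config = paddle_infer.Config(model_file, params_file)
config.enable_use_gpu(4096, 0)   # 4 GB initial GPU workspace on device 0
config.switch_ir_optim(False)    # skip IR fuse passes to see if one of them triggers the error

predictor = paddle_infer.create_predictor(config)
print("predictor created, inputs:", predictor.get_input_names())

Some recent Paddle builds also expose FLAGS_enable_pir_api=0 as an environment switch to fall back from the PIR pipeline; whether this serving image honors it is untested.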
The error message feels like a configuration-file problem, but it is very vague and I don't know how to track down the specific cause.

Steps to Reproduce & Code
export MODEL_PATH=${MODEL_PATH:-$PWD}
docker run -i --rm --gpus all --shm-size 32G --network=host --privileged --cap-add=SYS_PTRACE --name=paddlenlp_llm -v $MODEL_PATH:/models -v/home/:/host_mnt -dit ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlenlp:llm-serving-cuda118-cudnn8-v2.1 /bin/bash
export model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/weight_only_int8"
export MODEL_PATH=/models/
start_server $model_name
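Before calling start_server, it can also be worth confirming that the Paddle build inside the container actually runs on the V100, since a missing or mismatched GPU kernel would fail in the same kernel-selection pass. A quick sanity check using standard Paddle APIs (not specific to this image):

import paddle

print(paddle.__version__)                           # Paddle build shipped in the image
print(paddle.device.cuda.get_device_properties(0))  # confirm the V100 is visible
paddle.utils.run_check()                            # runs a small GPU program end to end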