
cannot import name 'MSRA' from 'paddle.fluid.initializer' (/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/initializer.py) #55765

Closed
Liangwei-0521 opened this issue Jul 28, 2023 · 11 comments
Labels
status/following-up 跟进中 type/debug 帮用户debug

@Liangwei-0521

Describe the Bug

Bug:
ImportError: cannot import name 'MSRA' from 'paddle.fluid.initializer' (/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/initializer.py)

Code being run:

import os
import paddle
import paddleslim.quant as quant

def _quant_embeddings(args):
    # input_prefix: path prefix of the static (inference) model, defined elsewhere
    # path for the final quantized model:
    args.output_filename_prefix = "quant_emb"
    paddle.enable_static()
    place = paddle.set_device(args.device)
    exe = paddle.static.Executor(place)
    main_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(input_prefix, exe)

    config = {"quantize_op_types": ["lookup_table_v2"], "lookup_table_v2": {"quantize_type": "moving_average_abs_max"}}

    quant_emb_program = quant.quant_embedding(main_program, place, config)

    input_dir = os.path.dirname(input_prefix)

    paddle.fluid.io.save_inference_model(
        input_dir,
        feed_target_names,
        fetch_targets,
        exe,
        quant_emb_program,
        model_filename=args.output_filename_prefix + ".pdmodel",
        params_filename=args.output_filename_prefix + ".pdiparams",
        export_for_deployment=True,
        program_only=False,
    )

Versions:
paddle-bfloat 0.1.7
paddle2onnx 1.0.0
paddlefsl 1.1.0
paddlehub 2.3.0
paddlenlp 2.5.2
paddlepaddle 2.5.1
paddleslim 2.4.0

Additional Supplementary Information

No response

@qili93 (Contributor) commented Jul 28, 2023

@1998-Chen Hi, please provide the complete Python stack trace around ImportError: cannot import name 'MSRA' from 'paddle.fluid.initializer', so we can analyze whether the error originates in the Paddle framework or in PaddleSlim. Thanks!

Also, you can try replacing the paddle.fluid.io.save_inference_model API with paddle.static.save_inference_model.

Starting from 2.5.0, Paddle no longer supports any APIs under the paddle.fluid namespace; see the "paddle.fluid API retirement" part of section 2 (incompatible upgrades) in the Paddle 2.5 release notes: https://github.com/PaddlePaddle/Paddle/releases/tag/v2.5.0
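
For reference, a minimal sketch of that replacement (assuming the variables from the snippet above; note that paddle.static.save_inference_model takes Variable objects for feed_vars rather than names, so they are looked up in the quantized program first):

feed_vars = [quant_emb_program.global_block().var(name) for name in feed_target_names]
paddle.static.save_inference_model(
    os.path.join(input_dir, "quant_emb"),  # writes quant_emb.pdmodel / quant_emb.pdiparams
    feed_vars,
    fetch_targets,
    exe,
    program=quant_emb_program,
)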

@Liangwei-0521 (Author) commented Jul 28, 2023

Task description: we want to implement QAT quantization and embedding quantization with PaddleSlim. After QAT quantization, inference with Paddle Inference runs without errors. However, when we tried embedding quantization, the newer paddle 2.5.0 raised an error, as follows:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/tmp/ipykernel_383/2533658229.py in <module>
      1 import os
----> 2 import paddleslim.quant as quant
      3 
      4 def _quant_embeddings(args):
      5     # input_prefix: path to the static model

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim/__init__.py in <module>
     14 
     15 from __future__ import absolute_import
---> 16 from paddleslim import models
     17 from paddleslim import prune
     18 from paddleslim import nas

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim/models/__init__.py in <module>
     14 
     15 from __future__ import absolute_import
---> 16 from .util import image_classification
     17 from .slimfacenet import SlimFaceNet_A_x0_60, SlimFaceNet_B_x0_75, SlimFaceNet_C_x0_75
     18 from .slim_mobilenet import SlimMobileNet_v1, SlimMobileNet_v2, SlimMobileNet_v3, SlimMobileNet_v4, SlimMobileNet_v5

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim/models/util.py in <module>
      1 from __future__ import absolute_import
      2 import paddle.fluid as fluid
----> 3 from ..models import classification_models
      4 
      5 __all__ = ["image_classification"]

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim/models/classification_models.py in <module>
      1 from __future__ import absolute_import
----> 2 from .mobilenet import MobileNet
      3 from .resnet import ResNet34, ResNet50
      4 from .mobilenet_v2 import MobileNetV2
      5 __all__ = ["model_list", "MobileNet", "ResNet34", "ResNet50", "MobileNetV2"]

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleslim/models/mobilenet.py in <module>
      3 from __future__ import print_function
      4 import paddle.fluid as fluid
----> 5 from paddle.fluid.initializer import MSRA
      6 from paddle.fluid.param_attr import ParamAttr
      7 

ImportError: cannot import name 'MSRA' from 'paddle.fluid.initializer' (/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/initializer.py)

Given what you said about paddle 2.5.0 no longer supporting any paddle.fluid APIs, we downgraded paddlepaddle to version 2.4.0, which resolved the error above; that is, we completed PaddleSlim's embedding quantization on the lower Paddle version. However, running static-model inference on the lower Paddle version raised the following error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/tmp/ipykernel_2165/16845876.py in <module>
     28 # run the test
     29 for batch_data in batches:
---> 30     results.extend(predictor.predict(batch_data, tokenizer))
     31 end = time.time()
     32 run_time = end - start

/tmp/ipykernel_2165/1978965815.py in predict(self, data, tokenizer)
     79         self.input_handles[1].copy_from_cpu(token_type_ids)
     80         self.input_handles[2].copy_from_cpu(attention_mask)
---> 81         self.predictor.run()
     82         sim_score = self.output_handle.copy_to_cpu()
     83 

NotImplementedError: (Unimplemented) There are no kernels which are registered in the lookup_table_v2 operator.
  [Hint: Expected kernels_iter != all_op_kernels.end(), but received kernels_iter == all_op_kernels.end().] (at /paddle/paddle/fluid/framework/operator.cc:1895)
  [operator < lookup_table_v2 > error]

So I switched to the newer paddlepaddle (version 2.5.0) to run inference on the quantized model, and the error above disappeared. However, inference on the newer paddlepaddle emits the following warnings, and testing on CPU shows no speedup; it is even slower than the unquantized model:

[2023-07-28 11:09:52,467] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.fast_tokenizer.ErnieFastTokenizer'> to load './checkpoint/ernie-3.0-nano-zh'.
46665it [00:04, 11158.08it/s]
I0728 11:09:57.157755  2568 analysis_predictor.cc:1471] MKLDNN is enabled
--- Running analysis [ir_graph_build_pass]
I0728 11:09:57.163435  2568 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [mkldnn_placement_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
W0728 11:09:57.251009  2568 gpu_cpu_map_matmul_to_mul_pass.cc:425] matmul op not support broadcast, please check inputs'shape. 
(the line above is repeated 24 times in the log)
I0728 11:09:57.253104  2568 fuse_pass_base.cc:59] ---  detected 10 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
--- Running IR pass [squeeze2_transpose2_onednn_fuse_pass]
--- fused 0 squeeze2 with transpose2
--- Running IR pass [depthwise_conv_mkldnn_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_affine_channel_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_elementwise_add_mkldnn_fuse_pass]
--- Running IR pass [conv_activation_mkldnn_fuse_pass]
--- Running IR pass [scale_matmul_fuse_pass]
I0728 11:09:57.395366  2568 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 scale with matmul
--- Running IR pass [reshape_transpose_matmul_mkldnn_fuse_pass]
I0728 11:09:57.409730  2568 fuse_pass_base.cc:59] ---  detected 12 subgraphs
---    fused 12 reshape + transpose + matmul with reshape's xshape with transpose's xshape
--- Running IR pass [matmul_transpose_reshape_mkldnn_fuse_pass]
I0728 11:09:57.421487  2568 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul + transpose + reshape patterns
--- Running IR pass [matmul_elementwise_add_mkldnn_fuse_pass]
I0728 11:09:57.425412  2568 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul (as x) with elementwise_add
I0728 11:09:57.428108  2568 fuse_pass_base.cc:59] ---  detected 2 subgraphs
---    fused 2 matmul (as x) with elementwise_add
W0728 11:09:57.432423  2568 op_compat_sensible_pass.cc:232]  Check the Attr(axis) of Op(elementwise_add) in pass(matmul_elementwise_add_mkldnn_fuse_pass) failed!
W0728 11:09:57.432436  2568 matmul_elementwise_add_mkldnn_fuse_pass.cc:64] op compat for matmul_elementwise_add_mkldnn_fuse_pass failed.
(this pair of lines is repeated 24 times in the log)
--- Running IR pass [matmul_activation_mkldnn_fuse_pass]
I0728 11:09:57.439098  2568 fuse_pass_base.cc:59] ---  detected 1 subgraphs
---    fused 1 fused_matmul with sigmoid activation
I0728 11:09:57.440340  2568 fuse_pass_base.cc:59] ---  detected 1 subgraphs
---    fused 1 fused_matmul with tanh activation
--- Running IR pass [fc_mkldnn_pass]
--- Running IR pass [fc_act_mkldnn_fuse_pass]
--- Running IR pass [fc_elementwise_add_mkldnn_fuse_pass]
--- Running IR pass [batch_norm_act_fuse_pass]
--- Running IR pass [softplus_activation_onednn_fuse_pass]
--- Running IR pass [shuffle_channel_mkldnn_detect_pass]
--- Running IR pass [elementwise_act_onednn_fuse_pass]
I0728 11:09:57.462807  2568 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 elementwise_add with gelu activation
--- Running IR pass [layer_norm_onednn_optimization_pass]
I0728 11:09:57.475025  2568 fuse_pass_base.cc:59] ---  detected 9 subgraphs
---    optimized 9 layer_norms by merging Scale and Bias
--- Running IR pass [operator_scale_onednn_fuse_pass]
--- Running IR pass [operator_unsqueeze2_onednn_fuse_pass]
--- Running IR pass [operator_reshape2_onednn_fuse_pass]
--- Running analysis [save_optimized_model_pass]
W0728 11:09:57.483620  2568 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I0728 11:09:57.519243  2568 analysis_predictor.cc:1660] ======= optimize end =======
I0728 11:09:57.521556  2568 naive_executor.cc:164] ---  skip [feed], feed -> attention_mask_pos
I0728 11:09:57.521575  2568 naive_executor.cc:164] ---  skip [feed], feed -> token_type_ids_pos
I0728 11:09:57.521579  2568 naive_executor.cc:164] ---  skip [feed], feed -> input_ids_pos
I0728 11:09:57.522771  2568 naive_executor.cc:164] ---  skip [sigmoid_0.tmp_0], fetch -> fetch
I0728 11:09:57.525840  2568 onednn_context.cc:81] oneDNN v2.7.3

That is the chain of errors encountered in this project. So I'd like to ask how to do PaddleSlim quantization (embedding quantization) on a recent Paddle version. @qili93

@qili93 (Contributor) commented Jul 28, 2023

Hi, the PaddleSlim version needs to match the PaddlePaddle version. PaddleSlim has not yet published a release corresponding to v2.5.0 or v2.5.1; you can build a PaddleSlim package that supports PaddlePaddle 2.5 from PaddleSlim's release/2.5 branch, as follows:

# Clone the release/2.5 branch
git clone https://github.com/PaddlePaddle/PaddleSlim.git -b release/2.5

# Build and install from source
cd PaddleSlim
python setup.py install

Keep using PaddlePaddle v2.5.0 or v2.5.1, and please confirm whether the problem persists after updating PaddleSlim. Thanks!
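
After installing, a quick sanity check of the environment (a minimal sketch; assumes both packages expose __version__):

import paddle
import paddleslim

print("paddlepaddle:", paddle.__version__)   # expected: 2.5.0 or 2.5.1
print("paddleslim:", paddleslim.__version__)  # expected: 2.5.x built from release/2.5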

@Liangwei-0521 (Author)

Reply:
Following your suggestion, I used PaddleSlim's release/2.5 branch, which resolved the version conflict described above. Versions used:

paddlenlp                      2.5.2
paddlepaddle                   2.5.0
paddlepaddle-gpu               2.5.0
paddleslim                     2.5.0

The output during inference is as follows:

[2023-07-28 16:18:20,142] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.fast_tokenizer.ErnieFastTokenizer'> to load '/home/aistudio/checkpoint/ernie-3.0-nano-zh'.
46665it [00:04, 10982.13it/s]
I0728 16:18:25.369899  1837 analysis_predictor.cc:1471] MKLDNN is enabled
--- Running analysis [ir_graph_build_pass]
I0728 16:18:25.375305  1837 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [mkldnn_placement_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
W0728 16:18:25.427968  1837 gpu_cpu_map_matmul_to_mul_pass.cc:425] matmul op not support broadcast, please check inputs'shape. 
(the line above is repeated 24 times in the log)
I0728 16:18:25.430182  1837 fuse_pass_base.cc:59] ---  detected 10 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [constant_folding_pass]
--- Running IR pass [squeeze2_transpose2_onednn_fuse_pass]
--- fused 0 squeeze2 with transpose2
--- Running IR pass [depthwise_conv_mkldnn_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_affine_channel_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_transpose_bias_mkldnn_fuse_pass]
--- Running IR pass [conv_elementwise_add_mkldnn_fuse_pass]
--- Running IR pass [conv_activation_mkldnn_fuse_pass]
--- Running IR pass [scale_matmul_fuse_pass]
I0728 16:18:25.540614  1837 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 scale with matmul
--- Running IR pass [reshape_transpose_matmul_mkldnn_fuse_pass]
I0728 16:18:25.551424  1837 fuse_pass_base.cc:59] ---  detected 12 subgraphs
---    fused 12 reshape + transpose + matmul with reshape's xshape
--- Running IR pass [matmul_transpose_reshape_mkldnn_fuse_pass]
--- Running IR pass [matmul_elementwise_add_mkldnn_fuse_pass]
I0728 16:18:25.565969  1837 fuse_pass_base.cc:59] ---  detected 4 subgraphs
---    fused 4 fused_matmul (as x) with elementwise_add
I0728 16:18:25.568967  1837 fuse_pass_base.cc:59] ---  detected 2 subgraphs
---    fused 2 matmul (as x) with elementwise_add
I0728 16:18:25.576004  1837 fuse_pass_base.cc:59] ---  detected 24 subgraphs
---    fused 24 matmul_v2 (as x) with elementwise_add
--- Running IR pass [matmul_activation_mkldnn_fuse_pass]
--- Running IR pass [fc_mkldnn_pass]
--- Running IR pass [fc_act_mkldnn_fuse_pass]
--- Running IR pass [fc_elementwise_add_mkldnn_fuse_pass]
--- Running IR pass [batch_norm_act_fuse_pass]
--- Running IR pass [softplus_activation_onednn_fuse_pass]
--- Running IR pass [shuffle_channel_mkldnn_detect_pass]
--- Running IR pass [elementwise_act_onednn_fuse_pass]
--- Running IR pass [layer_norm_onednn_optimization_pass]
I0728 16:18:25.623422  1837 fuse_pass_base.cc:59] ---  detected 9 subgraphs
---    optimized 9 layer_norms by merging Scale and Bias
--- Running IR pass [operator_scale_onednn_fuse_pass]
--- Running IR pass [operator_unsqueeze2_onednn_fuse_pass]
--- Running IR pass [operator_reshape2_onednn_fuse_pass]
--- Running analysis [save_optimized_model_pass]
W0728 16:18:25.632709  1837 save_optimized_model_pass.cc:28] save_optim_cache_model is turned off, skip save_optimized_model_pass
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I0728 16:18:25.670608  1837 analysis_predictor.cc:1660] ======= optimize end =======
I0728 16:18:25.673341  1837 naive_executor.cc:164] ---  skip [feed], feed -> attention_mask_pos
I0728 16:18:25.673357  1837 naive_executor.cc:164] ---  skip [feed], feed -> token_type_ids_pos
I0728 16:18:25.673369  1837 naive_executor.cc:164] ---  skip [feed], feed -> input_ids_pos
I0728 16:18:25.675235  1837 naive_executor.cc:164] ---  skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
I0728 16:18:25.677532  1837 onednn_context.cc:81] oneDNN v2.7.3

The "failed!" warnings are resolved, but I'd like to follow up on the gpu_cpu_map_matmul_to_mul_pass.cc:425] matmul op not support broadcast, please check inputs'shape. warnings: my initial guess is that these shape warnings come from specifying input_spec as [None, None] when saving the model during QAT quantization. Is that the cause, or is it something else (is our QAT implementation correct)?

Our QAT implementation is as follows:

import paddle
import paddleslim as slim
from paddleslim import QAT

quant_config = {
    'weight_preprocess_type': None,
    'activation_preprocess_type': None,
    'weight_quantize_type': 'channel_wise_abs_max',
    'activation_quantize_type': 'moving_average_abs_max',
    'weight_bits': 8,
    'activation_bits': 8,
    'dtype': 'int8',
    'window_size': 10000,
    'moving_rate': 0.9,
    'quantizable_layer_type': ['Linear'],
}
quanter = QAT(config=quant_config)
# Quantize the model (modified in place)
quanter.quantize(model)  # alternatively: model = quanter.quantize(model)

# Train the model
train(args, model, train_data_loader, dev_data_loader, dev_data)

# Save the quantized model
print('------quant------')
quanter.save_quantized_model(
    model,
    path=args.quant_save_path,
    input_spec=[
        paddle.static.InputSpec(shape=[None, None], dtype="int64"),  # input_ids_pos
        paddle.static.InputSpec(shape=[None, None], dtype="int64"),  # token_type_ids_pos
        paddle.static.InputSpec(shape=[None, None], dtype="int64"),  # attention_mask
    ],
)
print('------embedding quant------')

That is the follow-up question I'd like to ask. @qili93
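
One way to test that hypothesis (a hedged sketch, not something confirmed in this thread): export a second copy of the model with the sequence-length dimension pinned and check whether the broadcast warnings disappear:

# Hypothetical experiment: pin the sequence length (128 here, matching the
# max_seq_length used in the inference code later in this thread) so the
# exported graph has static shapes instead of fully dynamic [None, None].
input_spec = [
    paddle.static.InputSpec(shape=[None, 128], dtype="int64"),  # input_ids_pos
    paddle.static.InputSpec(shape=[None, 128], dtype="int64"),  # token_type_ids_pos
    paddle.static.InputSpec(shape=[None, 128], dtype="int64"),  # attention_mask
]
quanter.save_quantized_model(model, path=args.quant_save_path, input_spec=input_spec)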

@Liangwei-0521 (Author) commented Jul 28, 2023

After testing, the model shrinks to a quarter of its original size, but inference is much slower than the original model. Three cases:
1. QAT quantization only: about 10% higher latency than the original model.
2. QAT + embedding quantization (lower-version paddleslim): about twice the latency of the original model.
3. QAT + embedding quantization (higher-version paddleslim, i.e., the git branch above): several times the latency of the original model.
Could you provide up-to-date reference code/documentation for QAT + embedding quantization with PaddleSlim on the paddlenlp ernie-3.0-nano-zh model? @qili93

@qili93 (Contributor) commented Jul 29, 2023

@wanghaoshuang could you take a look at this PaddleSlim QAT quantization question? Thanks!

@wanghaoshuang (Contributor)

(Quoting the latency summary from the comment above.)

Your QAT implementation is correct.
Whether a quantized model produced by PaddleSlim actually runs faster depends on inference-library support. Currently, quantized BERT-class models are only accelerated on Paddle Inference + GPU; there is no CPU-side optimization, so no speedup on CPU is guaranteed.

@Liangwei-0521 (Author) commented Jul 31, 2023

Following your suggestion, we moved the model to GPU for inference (paddlepaddle-gpu==2.5.1), and hit the following error:

@wanghaoshuang

--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [map_op_to_another_pass]
--- Running IR pass [identity_scale_op_clean_pass]
I0731 15:22:02.315886   870 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
I0731 15:22:02.343159   870 fuse_pass_base.cc:59] ---  detected 52 subgraphs
--- Running IR pass [delete_weight_dequant_linear_op_pass]
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/tmp/ipykernel_870/2712444669.py in <module>
     23 predictor = Predictor(model_dir, device, max_seq_length,
     24                         batch_size, use_tensorrt, precision,
---> 25                         cpu_threads, enable_mkldnn)
     26 results = []
     27 rst_dict = {}

/tmp/ipykernel_870/1978965815.py in __init__(self, model_dir, device, max_seq_length, batch_size, use_tensorrt, precision, cpu_threads, enable_mkldnn)
     50 
     51         config.switch_use_feed_fetch_ops(False)
---> 52         self.predictor = paddle.inference.create_predictor(config)
     53         self.input_handles = [
     54             self.predictor.get_input_handle(name)

NotImplementedError: (Unimplemented) Delete Weight Dequant Linear Op Pass is not supported for per-channel quantization (at ../paddle/fluid/framework/ir/delete_weight_dequant_linear_op_pass.cc:130)
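
One hedged workaround worth trying (an assumption drawn from the error message, not verified in this thread): the failing pass rejects per-channel weight quantization, so re-running the QAT export with a per-tensor scheme may let the predictor build:

# Hypothetical change to the QAT config from the earlier comment:
# 'abs_max' is a per-tensor scheme, which delete_weight_dequant_linear_op_pass
# appears to require, unlike the per-channel 'channel_wise_abs_max'.
quant_config["weight_quantize_type"] = "abs_max"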

The specific inference code is as follows:

import os
import paddle
from paddle import inference
from paddlenlp.data import Pad, Tuple  # convert_example_ranking is a user-defined helper

class Predictor(object):
    def __init__(self,
                 model_dir,
                 device="gpu",
                 max_seq_length=128,
                 batch_size=32,
                 use_tensorrt=False,
                 precision="fp32",
                 cpu_threads=10,
                 enable_mkldnn=False):
        self.max_seq_length = max_seq_length
        self.batch_size = batch_size

        model_file = model_dir + "rank_quant_emb.pdmodel"
        params_file = model_dir + "rank_quant_emb.pdiparams"
        if not os.path.exists(model_file):
            raise ValueError("not find model file path {}".format(model_file))
        if not os.path.exists(params_file):
            raise ValueError("not find params file path {}".format(params_file))
        config = paddle.inference.Config(model_file, params_file)

        if device == "gpu":
            # set GPU configs accordingly,
            # such as initialize the GPU memory and enable TensorRT
            config.enable_use_gpu(100, 0)
            precision_map = {
                "fp16": inference.PrecisionType.Half,
                "fp32": inference.PrecisionType.Float32,
                "int8": inference.PrecisionType.Int8
            }
            precision_mode = precision_map[precision]

            if use_tensorrt:
                config.enable_tensorrt_engine(
                    max_batch_size=batch_size,
                    min_subgraph_size=30,
                    precision_mode=precision_mode)
        elif device == "cpu":
            # set CPU configs accordingly,
            # such as enable_mkldnn, set_cpu_math_library_num_threads
            config.disable_gpu()
            if enable_mkldnn:
                # cache 10 different shapes for mkldnn to avoid memory leak
                config.set_mkldnn_cache_capacity(10)
                config.enable_mkldnn()
            config.set_cpu_math_library_num_threads(cpu_threads)
        elif device == "xpu":
            # set XPU configs accordingly
            config.enable_xpu(100)

        config.switch_use_feed_fetch_ops(False)
        self.predictor = paddle.inference.create_predictor(config)
        self.input_handles = [
            self.predictor.get_input_handle(name)
            for name in self.predictor.get_input_names()
        ]
        self.output_handle = self.predictor.get_output_handle(
            self.predictor.get_output_names()[0])

    def predict(self, data, tokenizer):

        examples = []
        for text in data:
            input_ids, token_type_ids, attention_mask = convert_example_ranking(
                text,
                tokenizer,
                max_seq_length=self.max_seq_length,
                is_test=True)
            examples.append((input_ids, token_type_ids, attention_mask))

        batchify_fn = lambda samples, fn=Tuple(
            Pad(axis=0, pad_val=tokenizer.pad_token_id, dtype='int64'),  # input_ids
            Pad(axis=0, pad_val=tokenizer.pad_token_type_id, dtype='int64'),  # token_type_ids
            Pad(axis=0, pad_val=tokenizer.pad_token_id, dtype='int64'),
        ): fn(samples)

        input_ids, token_type_ids, attention_mask = batchify_fn(examples)
        self.input_handles[0].copy_from_cpu(input_ids)
        self.input_handles[1].copy_from_cpu(token_type_ids)
        self.input_handles[2].copy_from_cpu(attention_mask)
        self.predictor.run()
        sim_score = self.output_handle.copy_to_cpu()
        return sim_score

Running the inference code:

import time

import paddlenlp as ppnlp
from paddlenlp.datasets import load_dataset  # read_text is a user-defined reader

model_dir = '/home/aistudio/checkpoint/'
device = 'gpu'
max_seq_length = 128
batch_size = 32
# TensorRT can be installed for additional acceleration
use_tensorrt = False
# precision: fp16 is also an option, with almost no loss of accuracy
precision = 'fp16'
# number of CPU threads
cpu_threads = 1
# enables acceleration when running on CPU
enable_mkldnn = False
tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained('./checkpoint/ernie-3.0-nano-zh')
dev_data = load_dataset(read_text, data_path='/home/aistudio/rank_data/data_v4/all_test.csv', mode='dev', lazy=False) 
batches = [
    dev_data[idx:idx + batch_size]
    for idx in range(0, len(dev_data), batch_size)
]
predictor = Predictor(model_dir, device, max_seq_length,
                        batch_size, use_tensorrt, precision,
                        cpu_threads, enable_mkldnn)
results = []
rst_dict = {}
start = time.time()
# run the test
for batch_data in batches:
    results.extend(predictor.predict(batch_data, tokenizer))
end = time.time()
run_time = end - start
print('Run time:', run_time)

@lyuwenyu (Contributor) commented Aug 2, 2023

Try specifying the version number when installing, e.g. pip install paddlepaddle==2.5.1.

@paddle-bot bot added type/debug 帮用户debug, status/following-up 跟进中 and removed type/bug-report 报bug, status/new-issue 新建 labels on Aug 2, 2023
@lrp123456

#55765 (comment)
Has this problem been resolved? I'm running into the same issue and cannot solve it.

paddle-bot bot commented Sep 10, 2024

Since you haven't replied for more than a year, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.
