add mkldnn int8 related passes and config #38643

baoachun · 2021-12-31T08:00:20Z

PR types

New features

PR changes

Others

Describe

Add MKLDNN int8 configuration options to the inference config, along with the quantization passes.

paddle-bot-old · 2021-12-31T08:00:25Z

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

paddle-bot-old · 2022-01-10T02:38:33Z

Sorry to inform you that f97739b's CI results are more than 7 days old. To prevent PR conflicts, you need to re-run all CIs manually.

lidanqing-intel · 2022-02-17T10:25:57Z

@wozna Hi, please review or continue this PR.

wozna

@baoachun You did a lot of great work. I added a few comments to the code.

I also have a question about the next steps. The last step of quantization consists of three passes: cpu_quantize_placement_pass, cpu_quantize_pass, and cpu_quantize_squash_pass.
cpu_quantize_pass needs the scales collected by your passes. Just to confirm: do you plan to do it exactly the same way, i.e. save them as attributes on one of the ops and read them in cpu_quantize_pass?

wozna · 2022-02-22T09:46:37Z

paddle/fluid/framework/ir/mkldnn/quant_dequant_mkldnn_fuse_pass.cc

+        if (op_desc->HasAttr("fuse_relu")) {
+          const bool fuse_relu =
+              BOOST_GET_CONST(bool, op_desc->GetAttr("fuse_relu"));
+          if (fuse_relu) activation = "relu";


I think you can use GetAttrIfExists to simplify it

if (op_desc->GetAttrIfExists<bool>("fuse_relu")) activation = "relu";
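The one-liner is equivalent to the original because GetAttrIfExists returns the attribute's value when it is present and a value-initialized T (false for bool) otherwise. The sketch below is standalone and assumes that contract. FakeOpDesc and PickActivation are hypothetical names, and a plain std::map stands in for the real op description:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an op description; only bool attributes are modeled.
struct FakeOpDesc {
  std::map<std::string, bool> bool_attrs;

  // "Value if present, value-initialized T otherwise" (false for bool).
  template <typename T>
  T GetAttrIfExists(const std::string& name) const {
    T result{};
    auto it = bool_attrs.find(name);
    if (it != bool_attrs.end()) result = it->second;
    return result;
  }
};

// The simplified check: a missing attribute and a false one behave the same.
std::string PickActivation(const FakeOpDesc& op_desc) {
  std::string activation;
  if (op_desc.GetAttrIfExists<bool>("fuse_relu")) activation = "relu";
  return activation;
}
```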

wozna · 2022-02-22T09:47:35Z

paddle/fluid/framework/ir/mkldnn/quant_dequant_mkldnn_fuse_pass.cc

+          const bool fuse_relu =
+              BOOST_GET_CONST(bool, op_desc->GetAttr("fuse_relu"));
+          if (fuse_relu) activation = "relu";
+        } else if (op_desc->HasAttr("fuse_brelu")) {


wozna · 2022-02-22T10:01:19Z

paddle/fluid/framework/ir/mkldnn/quant_dequant_mkldnn_fuse_pass.cc

+         ++iter) {
+      op_node->Op()->SetAttr(iter->first + "_var_quant_scales", iter->second);
+    }
+    break;


Just to confirm: the first operator is found here, and all scales are stored in it as separate attributes?

wozna · 2022-02-22T10:07:31Z

paddle/fluid/framework/ir/mkldnn/quant_dequant_mkldnn_fuse_pass.cc

+
+      auto* scale_tensor = var->GetMutable<LoDTensor>();
+      auto scale_data = scale_tensor->mutable_data<float>(platform::CPUPlace());
+      float scale = 1.0 / scale_data[0];


In quant2_int8_mkldnn_pass, an obtained scale was checked for infinity, and an infinite scale was set to 0. Maybe it is better to add the same check here?
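Here is a minimal sketch of the suggested guard. It assumes the scale is the reciprocal of the stored value, as in the hunk above (scale = 1.0 / scale_data[0]). InverseScaleOrZero is a hypothetical name:

```cpp
#include <cassert>
#include <cmath>

// Turns a stored threshold into a quantization scale (1 / threshold).
// A scale that would be infinite is replaced by 0, mirroring the behavior
// described for quant2_int8_mkldnn_pass.
float InverseScaleOrZero(float threshold) {
  if (threshold == 0.0f) return 0.0f;  // 1 / 0 would be infinite
  const float scale = 1.0f / threshold;
  return std::isinf(scale) ? 0.0f : scale;  // e.g. tiny denormal thresholds
}
```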

wozna · 2022-02-22T10:12:10Z

paddle/fluid/framework/ir/mkldnn/quant_dequant_mkldnn_fuse_pass.h

@@ -0,0 +1,75 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.


I think the date should be 2022.

wozna · 2022-02-22T10:32:12Z

paddle/fluid/inference/api/paddle_pass_builder.cc

+    passes_.push_back("conv_relu_mkldnn_fuse_pass");
+    passes_.push_back("conv_relu6_mkldnn_fuse_pass");
+    // need input params?
+    /// passes_.push_back("fc_fuse_pass");


This pass should have the attributes "use_gpu" and "use_fc_padding" set to false.

wozna · 2022-02-22T10:36:01Z

paddle/fluid/inference/api/paddle_pass_builder.cc

+    passes_.push_back("repeated_fc_relu_fuse_pass");
+    // enable or disable?
+    // passes_.push_back("fc_mkldnn_pass");
+    // passes_.push_back("fc_act_mkldnn_fuse_pass");


The best option here would be to expose a parameter that the user can change. Depending on the model, these two passes can significantly accelerate inference. Unfortunately, for some models they decrease performance.
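One way to act on this suggestion is sketched below under assumptions: the pass list is assembled from a boolean that the user controls. BuildInt8PassList and the enable_fc_mkldnn flag are hypothetical names. The pass names are the ones quoted in the diff, and the list is shortened for illustration:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical builder: the two FC passes are appended only when the user
// opts in, since they speed up some models and slow down others.
std::vector<std::string> BuildInt8PassList(bool enable_fc_mkldnn) {
  std::vector<std::string> passes = {
      "conv_relu_mkldnn_fuse_pass",
      "conv_relu6_mkldnn_fuse_pass",
      "repeated_fc_relu_fuse_pass",
  };
  if (enable_fc_mkldnn) {
    passes.push_back("fc_mkldnn_pass");
    passes.push_back("fc_act_mkldnn_fuse_pass");
  }
  return passes;
}
```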

wozna · 2022-02-22T15:04:22Z

paddle/fluid/inference/api/paddle_pass_builder.cc

+    passes_.push_back("mul_gru_fuse_pass");
+    passes_.push_back("multi_gru_fuse_pass");
+    passes_.push_back("multi_gru_seq_fuse_pass");
+    passes_.push_back("seq_concat_fc_fuse_pass");


Please update the passes to match the recent quant2_int8_mkldnn_pass, because I can see that there were some changes there, e.g. in this PR #39369.

wozna · 2022-02-22T15:44:14Z

paddle/fluid/framework/ir/mkldnn/requant_mkldnn_fuse_pass.cc

+//           }
+//         }
+//       }
+//   };


Weight-scale calculation is already implemented in mkldnn_quantizer.cc, in GetMaxChGRUScalingFactor and GetMaxChLSTMScalingFactor. You can reuse it, because the functionality is exactly the same.

Paddle/paddle/fluid/inference/api/mkldnn_quantizer.cc

Line 430 in edc3ba1

AnalysisPredictor::MkldnnQuantizer::GetMaxChGRUScalingFactor(

wozna · 2022-02-22T16:04:27Z

paddle/fluid/framework/ir/mkldnn/requant_mkldnn_fuse_pass.cc

+  auto* scope = param_scope();
+  GetQuantInfo(graph, scope, weight_thresholds, var_quant_scales);
+
+  //ComputeWeightScales(graph, scope);


I understand that this pass will functionally correspond to graph = self._compute_weight_scales(graph). Do you also plan to add the functionality of graph = self._propagate_scales(graph) here?

arlesniak · 2022-02-22T19:22:18Z

paddle/fluid/framework/ir/mkldnn/quant_dequant_mkldnn_fuse_pass.cc

+  for (auto* op_node :
+       ir::TopologyVarientSort(*graph, static_cast<ir::SortKind>(0))) {
+    if (!op_node->IsOp() || op_node->Op()->Type() == "feed" ||
+        op_node->Op()->Type() == "feth")


Typo: "fetch"

arlesniak · 2022-02-22T19:31:07Z

paddle/fluid/framework/ir/mkldnn/requant_mkldnn_fuse_pass.cc

+  }
+}
+
+// void RequantMkldnnFusePass::ComputeWeightScales(ir::Graph* graph, Scope* scope,


Is the commented-out block of code still needed?

arlesniak · 2022-02-22T19:31:34Z

paddle/fluid/framework/ir/mkldnn/requant_mkldnn_fuse_pass.h

@@ -0,0 +1,45 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.


Please change to 2022

arlesniak · 2022-02-22T20:28:51Z

paddle/fluid/framework/ir/mkldnn/quant_dequant_mkldnn_fuse_pass.cc

+          if (fuse_relu) activation = "relu";
+        } else if (op_desc->HasAttr("fuse_brelu")) {
+          const bool fuse_brelu =
+              BOOST_GET_CONST(bool, op_desc->GetAttr("fuse_relu"));


Please double-check which attribute name is being read. I suppose it should be fuse_brelu.
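A table-driven alternative avoids this class of copy-paste bug, sketched here under assumptions. Each attribute name appears exactly once, next to the activation it selects, so the name that is checked is always the name that is read. ActivationFromAttrs is a hypothetical helper. Mapping fuse_brelu to "relu6" is an assumption for illustration and does not come from the PR. Unlike the original else-if chain, this version returns the first attribute that is both present and true:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (attribute name, activation) pairs, each written exactly once.
// The "relu6" entry is an illustrative assumption.
const std::vector<std::pair<std::string, std::string>> kFuseAttrToActivation = {
    {"fuse_relu", "relu"},
    {"fuse_brelu", "relu6"},
};

// Returns the activation for the first attribute that is present and true,
// or an empty string if none is set.
std::string ActivationFromAttrs(const std::map<std::string, bool>& attrs) {
  for (const auto& entry : kFuseAttrToActivation) {
    auto it = attrs.find(entry.first);
    if (it != attrs.end() && it->second) return entry.second;
  }
  return "";
}
```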

baoachun · 2022-03-09T11:16:38Z

paddle/fluid/framework/ir/mkldnn/quant_dequant_mkldnn_fuse_pass.cc

+    std::string output_act_name = fake_quant_out->Var()->Name();
+    auto outlinks = fake_quant_out->outputs;
+    for (auto* next_node : outlinks) {
+      next_node->Op()->RenameInput(output_act_name, input_act_name);


Is this node an op?
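Some context for the question: the IR graph is usually bipartite, with a var node's outputs being the op nodes that consume it, so each next_node here would be an op. The rewiring itself can be pictured with a tiny standalone model. Every consumer of the removed op's output is renamed to read the op's input directly. SimpleOp, RenameInput, and BypassOp below are hypothetical stand-ins, not Paddle's classes:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal op: a type and the names of the variables it reads.
struct SimpleOp {
  std::string type;
  std::vector<std::string> inputs;
};

// Replaces every occurrence of old_name in the op's inputs with new_name.
void RenameInput(SimpleOp* op, const std::string& old_name,
                 const std::string& new_name) {
  std::replace(op->inputs.begin(), op->inputs.end(), old_name, new_name);
}

// Reconnects all consumers of a removed op's output to that op's input.
void BypassOp(const std::vector<SimpleOp*>& consumers,
              const std::string& input_act_name,
              const std::string& output_act_name) {
  for (SimpleOp* next_op : consumers) {
    RenameInput(next_op, output_act_name, input_act_name);
  }
}
```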

wozna · 2022-03-10T13:52:12Z

paddle/fluid/framework/ir/mkldnn/requant_mkldnn_fuse_pass.cc

+
+void RequantMkldnnFusePass::ComputeWeightScales(
+    ir::Graph* graph, Scope* scope, StringPairMap& var_quant_scales) const {
+  auto get_scales = [&](Tensor* tensor, int axis) -> std::vector<float> {


Could you change all the lambdas to functions? That will simplify testing.

OK, I will do it as soon as possible.

baoachun · 2022-03-21T12:48:58Z

Hi @wozna, please continue your review~

sfraczek · 2022-03-28T16:52:07Z

paddle/fluid/framework/ir/mkldnn/mkldnn_pass_util.h

@@ -0,0 +1,75 @@
+// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.


sfraczek · 2022-03-29T08:29:52Z

paddle/fluid/framework/ir/mkldnn/mkldnn_pass_util.h

+  const std::string suffix = "_" + key_suffix + "_" + flag;
+  for (auto* op_node :
+       ir::TopologyVarientSort(*graph, static_cast<ir::SortKind>(0))) {
+    if (!op_node->IsOp()) continue;


Should this condition be the same as in SaveInfoInTheFirstOp, lines 32-33?

Yep, do you have any suggestions please?

I don't know. There you skip feed and fetch ops, so I thought you might have to skip them here too:

if (!op_node->IsOp() || op_node->Op()->Type() == "feed" || op_node->Op()->Type() == "fetch")
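If both loops need the same filter, one option is to factor it into a single predicate. The feed/fetch check then lives in one place, along with any typo in it, like the "feth" caught earlier. IsFeedOrFetch and ShouldSkipNode are hypothetical helper names. For the sketch, a node is reduced to an is-op flag plus a type string:

```cpp
#include <cassert>
#include <string>

// Single source of truth for the op types that the scale passes ignore.
bool IsFeedOrFetch(const std::string& op_type) {
  return op_type == "feed" || op_type == "fetch";
}

// Mirrors the suggested condition: skip non-op nodes as well as feed/fetch ops.
bool ShouldSkipNode(bool is_op, const std::string& op_type) {
  return !is_op || IsFeedOrFetch(op_type);
}
```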

sfraczek · 2022-03-29T08:39:58Z

paddle/fluid/framework/ir/mkldnn/mkldnn_pass_util.h

+        if (fake_name.find(suffix) != std::string::npos) {
+          size_t pos = fake_name.find(suffix);


Suggested change

- if (fake_name.find(suffix) != std::string::npos) {
-   size_t pos = fake_name.find(suffix);
+ size_t pos = fake_name.find(suffix);
+ if (pos != std::string::npos) {
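This thread discusses a save/restore round trip: scales are stored on the first op as separate attributes named variable + suffix, then recovered by searching for the suffix. It can be sketched standalone, with std::map standing in for the op's attribute storage. SaveWithSuffix, LoadWithSuffix, and the type aliases are hypothetical. The lookup calls find() only once per name, as in the suggested change:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using ScaleMap = std::map<std::string, std::vector<float>>;
// Hypothetical stand-in for an op's attribute storage.
using AttrMap = std::map<std::string, std::vector<float>>;

// Stores each entry as its own attribute named <var> + suffix.
void SaveWithSuffix(const ScaleMap& info, const std::string& suffix,
                    AttrMap* attrs) {
  for (const auto& kv : info) (*attrs)[kv.first + suffix] = kv.second;
}

// Recovers the entries by cutting each attribute name at the suffix.
// Attributes that do not contain the suffix are ignored.
ScaleMap LoadWithSuffix(const AttrMap& attrs, const std::string& suffix) {
  ScaleMap info;
  for (const auto& kv : attrs) {
    const std::string& fake_name = kv.first;
    size_t pos = fake_name.find(suffix);
    if (pos != std::string::npos) info[fake_name.substr(0, pos)] = kv.second;
  }
  return info;
}
```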

sfraczek · 2022-03-29T08:58:32Z

paddle/fluid/inference/api/paddle_analysis_config.h

  ///
  /// \brief Turn on MKLDNN bfloat16.
  ///
  ///
  void EnableMkldnnBfloat16();
-


I think you could revert this removed blank line.

sfraczek · 2022-03-29T12:49:52Z

paddle/fluid/framework/ir/mkldnn/quant_dequant_mkldnn_pass.cc

+        for (auto* node_input : op_node->inputs) {
+          for (auto* node_input_input : node_input->inputs) {
+            if (!node_input_input->IsOp()) continue;
+            if (op_node->Name().find("quantize_dequantize") ==


Shouldn't it be like this?

Suggested change

- if (op_node->Name().find("quantize_dequantize") ==
+ if (node_input_input->Name().find("quantize_dequantize") ==

Yes, you are right!

sfraczek · 2022-03-30T13:56:44Z

paddle/fluid/framework/ir/mkldnn/compute_propagate_scales_mkldnn_pass.cc

+    std::unordered_map<std::string, std::vector<float>>* info_map) const {
+  for (auto iter = var_quant_scales->begin(); iter != var_quant_scales->end();
+       iter++) {
+    auto* data = iter->second.second.mutable_data<float>(platform::CPUPlace());


Suggested change

- auto* data = iter->second.second.mutable_data<float>(platform::CPUPlace());
+ auto* data = iter->second.second.data<float>(platform::CPUPlace());

sfraczek · 2022-03-30T13:57:33Z

paddle/fluid/framework/ir/mkldnn/compute_propagate_scales_mkldnn_pass.cc

+}
+
+void ComputePropagateScalesMkldnnPass::ConvertStringPairMap(
+    StringPairMap* var_quant_scales,


Can this be const?

I will fix it in the next PR.
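Here is a sketch of what the const version could look like, under stated assumptions. std::vector<float> replaces Tensor, and the bool in each pair is assumed to be an is-unsigned flag that the conversion drops. ConvertPairMap and the type aliases are hypothetical names, not the PR's:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Sketch-only substitute for StringPairMap (vector<float> instead of Tensor).
using PairMapSketch =
    std::unordered_map<std::string, std::pair<bool, std::vector<float>>>;
using InfoMap = std::unordered_map<std::string, std::vector<float>>;

// The source map is taken by const reference and only read. This is the
// analogue of data<float>() rather than mutable_data<float>().
void ConvertPairMap(const PairMapSketch& var_quant_scales, InfoMap* info_map) {
  for (const auto& kv : var_quant_scales) {
    (*info_map)[kv.first] = kv.second.second;
  }
}
```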

sfraczek · 2022-03-30T13:58:15Z

paddle/fluid/framework/ir/mkldnn/compute_propagate_scales_mkldnn_pass.cc

+
+void ComputePropagateScalesMkldnnPass::PropagateScales(
+    ir::Graph* graph, StringPairMap* var_quant_scales,
+    const std::unordered_set<std::string> scale_immutable_ops) const {


Could this be a reference?

sfraczek · 2022-03-30T13:58:34Z

paddle/fluid/framework/ir/mkldnn/compute_propagate_scales_mkldnn_pass.cc

+
+std::unordered_set<std::string> ComputePropagateScalesMkldnnPass::UpdateScales(
+    ir::Graph* graph, StringPairMap* var_quant_scales,
+    const std::unordered_set<std::string> scale_immutable_ops) const {


Could this be a reference?
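Taking the set by const reference avoids copying it on every call. The sketch below pairs that signature with a simplified version of the propagation idea. For scale-immutable ops, a known scale is copied across the op in whichever direction is missing, and this repeats until nothing changes. ScaleOp and ScaleTable are hypothetical, and ops are reduced to one input and one output. This is not the PR's PropagateScales/UpdateScales logic:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical single-input, single-output op used only for this sketch.
struct ScaleOp {
  std::string type;
  std::string input;
  std::string output;
};

using ScaleTable = std::unordered_map<std::string, float>;

// Copies known scales across scale-immutable ops until a fixed point is
// reached. Each change adds a new key, so the loop always terminates.
void PropagateScales(const std::vector<ScaleOp>& ops,
                     const std::unordered_set<std::string>& scale_immutable_ops,
                     ScaleTable* scales) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const ScaleOp& op : ops) {
      if (scale_immutable_ops.count(op.type) == 0) continue;
      const bool has_in = scales->count(op.input) > 0;
      const bool has_out = scales->count(op.output) > 0;
      if (has_in && !has_out) {
        const float s = scales->at(op.input);
        (*scales)[op.output] = s;
        changed = true;
      } else if (!has_in && has_out) {
        const float s = scales->at(op.output);
        (*scales)[op.input] = s;
        changed = true;
      }
    }
  }
}
```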

sfraczek · 2022-03-30T15:44:53Z

paddle/fluid/framework/ir/mkldnn/quant_dequant_mkldnn_pass.cc

+      var, "The input persistable var of %s op is not found.", op_desc->Type());
+
+  auto* weight_tensor = var->GetMutable<LoDTensor>();
+  auto* weight_data = weight_tensor->mutable_data<float>(platform::CPUPlace());


Suggested change

- auto* weight_data = weight_tensor->mutable_data<float>(platform::CPUPlace());
+ auto* weight_data = weight_tensor->data<float>(platform::CPUPlace());

wozna · 2022-03-31T07:46:45Z

@baoachun In the last commit on this branch, https://github.com/wozna/Paddle/tree/mkldnn_int8_test, I added tests for your changes. There are unit tests for the scale-calculation functions in compute_propagate_scales_mkldnn_pass, plus a model test, but for now the model test is a performance test only.

lidanqing-intel · 2022-03-31T09:54:52Z

paddle/fluid/inference/api/paddle_pass_builder.cc

+  if (iter == std::end(passes_)) return -1;
+  return std::distance(std::begin(passes_), iter);
+}
+


@Silv3S You may consider adding such a function.

Thank you. This is exactly what I needed
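Here is a minimal sketch of such a helper, assuming the pass list is a std::vector<std::string>. Only the two return statements come from the diff. The name GetPassIndex and its signature are assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Returns the position of pass_type in passes, or -1 when it is absent.
int GetPassIndex(const std::vector<std::string>& passes,
                 const std::string& pass_type) {
  auto iter = std::find(std::begin(passes), std::end(passes), pass_type);
  if (iter == std::end(passes)) return -1;
  return static_cast<int>(std::distance(std::begin(passes), iter));
}
```

A caller could then insert a new pass right after an existing one with passes.insert(passes.begin() + idx + 1, name) when idx is not -1.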

baoachun · 2022-04-02T07:20:37Z

Hi @wozna, what is the scope of the new unit test you added? Running test_analyzer_quant2_mobilenetv1_mkldnn takes a long time for me, it gets stuck, and it also prints GPU information. I see 1775 test cases in this single test. Is something wrong?

lidanqing-intel · 2022-04-02T13:32:05Z

paddle/fluid/framework/ir/mkldnn/compute_propagate_scales_mkldnn_pass.h

+namespace framework {
+namespace ir {
+
+using StringPairMap = std::unordered_map<std::string, std::pair<bool, Tensor>>;


Are we keeping std::unordered_map<std::string, std::pair<bool, Tensor>> here?

What do you mean? Does it need to change?

wozna · 2022-04-05T15:07:21Z

@baoachun It looks like all tests were run. Did you use ctest -R test_analyzer_quant2_mobilenetv1_mkldnn -V ?

baoachun · 2022-04-06T02:08:52Z

Yes~

lidanqing-intel · 2022-04-07T07:05:46Z

paddle/fluid/framework/ir/mkldnn/compute_propagate_scales_mkldnn_pass.cc

+  }
+}
+
+void ComputePropagateScalesMkldnnPass::GetQuantInfo(


Hi Achun, GetQuantInfo is defined in both compute_propagate_scales_mkldnn_pass and cpu_quantize_pass. We can consider unifying them in the next PR, since this PR already passes almost all CIs.

Yes, var_quant_scales needs to be passed to cpu_quantize_pass through the graph.

Paddle/python/paddle/fluid/contrib/slim/quantization/quant2_int8_mkldnn_pass.py

Line 667 in 73533b9

graph, 'cpu_quantize_pass', ['quant_var_scales', 'data_layout'],

XieYunshen

LGTM

chenwhql · 2022-04-08T10:09:25Z

paddle/fluid/framework/ir/mkldnn/compute_propagate_scales_mkldnn_pass.cc

+
+std::vector<float> ComputePropagateScalesMkldnnPass::GetScales(Tensor* tensor,
+                                                               int axis) const {
+  PADDLE_ENFORCE_LT(axis, 2, "The input axis is required to be less than 2.");


Here you need to specify the error type. Please see https://github.com/PaddlePaddle/Paddle/wiki/Paddle-Error-Message-Writing-Specification
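For reference, here is a standalone sketch of the per-channel weight-scale computation. It is written as a free function so it can be unit-tested directly, as suggested earlier in the review. It reduces max(|w|) along the given axis of a row-major 2-D tensor and takes the reciprocal, mapping an infinite result to 0. std::invalid_argument stands in for the typed errors the specification asks for. In Paddle itself, that would be something like platform::errors::InvalidArgument passed to the enforce macro. The signature (flat data plus dims) is an assumption and differs from the PR's Tensor-based one:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>
#include <vector>

// data is row-major with shape dims = {rows, cols}.
// axis == 0 reduces over rows (one scale per column);
// axis == 1 reduces over columns (one scale per row).
std::vector<float> GetScales(const std::vector<float>& data,
                             const std::vector<int>& dims, int axis) {
  if (axis < 0 || axis >= 2) {
    throw std::invalid_argument("The input axis is required to be 0 or 1, but received " +
                                std::to_string(axis) + ".");
  }
  if (dims.size() != 2) {
    throw std::invalid_argument("The input tensor must be 2-D, but received " +
                                std::to_string(dims.size()) + " dimensions.");
  }
  const int rows = dims[0];
  const int cols = dims[1];
  if (rows < 0 || cols < 0 ||
      data.size() != static_cast<size_t>(rows) * static_cast<size_t>(cols)) {
    throw std::invalid_argument("The data size does not match the given dims.");
  }
  const int channels = (axis == 0) ? cols : rows;
  std::vector<float> max_abs(channels, 0.0f);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      const int ch = (axis == 0) ? c : r;
      max_abs[ch] = std::max(max_abs[ch], std::fabs(data[r * cols + c]));
    }
  }
  std::vector<float> scales(channels, 0.0f);
  for (int ch = 0; ch < channels; ++ch) {
    if (max_abs[ch] == 0.0f) continue;  // 1 / 0 would be infinite: keep 0
    const float s = 1.0f / max_abs[ch];
    scales[ch] = std::isinf(s) ? 0.0f : s;
  }
  return scales;
}
```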

chenwhql · 2022-04-08T10:10:26Z

paddle/fluid/framework/ir/mkldnn/compute_propagate_scales_mkldnn_pass.cc

+  PADDLE_ENFORCE_LT(axis, 2, "The input axis is required to be less than 2.");
+  auto* data = tensor->data<float>();
+  const auto dims = tensor->dims();
+  PADDLE_ENFORCE_EQ(dims.size(), 2,


chenwhql · 2022-04-08T10:10:38Z

paddle/fluid/framework/ir/mkldnn/compute_propagate_scales_mkldnn_pass.cc

+    if (ops.count(op_desc->Type())) {
+      auto var_name = op_desc->Input(weight_name)[0];
+      auto* var = scope->FindVar(var_name);
+      PADDLE_ENFORCE_NOT_NULL(


chenwhql · 2022-04-08T10:12:03Z

paddle/fluid/framework/ir/mkldnn/compute_propagate_scales_mkldnn_pass.cc

+    Scope* scope, const std::string& wx_var_name,
+    const std::string& wh_var_name, Tensor* tensor) const {
+  auto* wx_var = scope->FindVar(wx_var_name);
+  PADDLE_ENFORCE_NOT_NULL(wx_var, "The input persistable var %s is not found.",


Same as above. Please refer to the result of the approval CI check and fix each occurrence in turn.

sfraczek

LGTM

add mkldnn quant_dequant pass and int8 config

f97739b

baoachun mentioned this pull request Dec 31, 2021

[Feature] Adaptation of the new quantization method for mkldnn. #37422

Closed

lidanqing-intel assigned wozna Feb 17, 2022

lidanqing-intel added the Intel label Feb 17, 2022

wozna reviewed Feb 22, 2022

wozna assigned baoachun and lidanqing-intel and unassigned wozna Feb 22, 2022

arlesniak reviewed Feb 22, 2022

baoachun added 7 commits March 2, 2022 05:13

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into mkldnn_int8

e43c2b1

update code

26b923f

update code

746cedc

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into mkldnn_int8

caf5cd5

update pass

927cc7b

update pass

aed24ad

update pass

7cf13bb

lidanqing-intel removed their assignment Mar 9, 2022

baoachun commented Mar 9, 2022

wozna reviewed Mar 10, 2022

baoachun added 3 commits March 21, 2022 08:26

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into mkldnn_int8

ce4009f

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into mkldnn_int8

5f30601

update code

70aa20f

baoachun added 3 commits March 21, 2022 13:35

update code

aee4319

update pass

75d5c60

update pass

2be5782

sfraczek reviewed Mar 30, 2022

baoachun added 2 commits March 31, 2022 09:01

update code

e063212

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into mkldnn_int8

f321457

lidanqing-intel reviewed Mar 31, 2022

Silv3S mentioned this pull request Mar 31, 2022

FC + elementwise_add (Residual connection) #40834

Closed

baoachun added 4 commits April 2, 2022 07:37

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into mkldnn_int8

66f14f3

update pass

477f076

update code

16d8647

update code

8cfa14a

lidanqing-intel reviewed Apr 2, 2022

update pass

b89c385

wozna and others added 3 commits April 7, 2022 02:43

Add test for compute_propagate_scales_mkldnn_pass

cb56f07

update pass

6846fc3

update pass

106b260

lidanqing-intel reviewed Apr 7, 2022

update timeout settings

6a9510c

raindrops2sea approved these changes Apr 8, 2022

XieYunshen approved these changes Apr 8, 2022

chenwhql reviewed Apr 8, 2022

This was referenced Apr 9, 2022

add mkldnn int8 pass [step1] #41579

Merged

add mkldnn int8 pass [step2] #41592

Merged

add mkldnn int8 pass [step3] #41599

Merged

sfraczek approved these changes Apr 11, 2022

baoachun closed this Apr 14, 2022

