
[RFC] add INT8 placement pass #15472

Closed

wants to merge 1 commit from wojtuss/int8-placement

Conversation


@wojtuss wojtuss commented Jan 22, 2019

This is a request for comments.

There is a need for applying an INT8 quantization strategy to PaddlePaddle models. This patch adds a mechanism for applying an INT8 quantization strategy, similar to the MKL-DNN approach.

A use_int8 op attribute and an int8_placement_pass in its simplest form (analogous to the use_mkldnn attribute and mkldnn_placement_pass) are added. The pass allows a user to choose which operators should be quantized and use INT8 kernels.

We envisage more passes for INT8 optimization that would utilize the use_int8 attribute.

test=develop
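For context, a rough sketch of what the simplest form of such a pass could look like, written by analogy with the existing mkldnn_placement_pass. This is an illustration only, not the code of this PR: the class name, the Pass interface version, the attribute handling and the registration details are assumptions, and it assumes the Paddle source tree.

// Illustrative sketch (not the actual PR code), modeled on mkldnn_placement_pass.
#include <memory>
#include <string>
#include <unordered_set>

#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/pass.h"

namespace paddle {
namespace framework {
namespace ir {

class INT8PlacementPass : public Pass {
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
      std::unique_ptr<ir::Graph> graph) const override {
    // Op types chosen by the user, e.g. via AnalysisConfig::SetINT8Op().
    const auto& op_types =
        Get<std::unordered_set<std::string>>("int8_enabled_op_types");
    for (const Node* n : graph->Nodes()) {
      if (!n->IsOp()) continue;
      auto* op = n->Op();
      // Mark the op so that an INT8 kernel is used for it: either the op
      // type is on the user's list, or the list is empty (quantize all).
      if (op->HasAttr("use_int8") &&
          (op_types.empty() || op_types.count(op->Type()))) {
        op->SetAttr("use_int8", true);
      }
    }
    return graph;
  }
};

}  // namespace ir
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(int8_placement_pass, paddle::framework::ir::INT8PlacementPass)
    .RequirePassAttr("int8_enabled_op_types");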


@sfraczek sfraczek left a comment


LGTM, but push the pass code also. :D


@hshen14 hshen14 left a comment


As discussed earlier, I would suggest maintaining such an INT8 kernel op list in the calibration tool, since it is better not to expose such information in models for non-INT8-MKLDNN devices (e.g., FP32, GPU). The calibration tool should be the better place to handle and maintain such a list.


wojtuss commented Jan 22, 2019

> As discussed earlier, I would suggest maintaining such an INT8 kernel op list in the calibration tool, since it is better not to expose such information in models for non-INT8-MKLDNN devices (e.g., FP32, GPU). The calibration tool should be the better place to handle and maintain such a list.

@hshen14, the method SetINT8Op() (https://github.com/PaddlePaddle/Paddle/blob/e0fb2578a53de189fd9c675850c00cc80b657cd4/paddle/fluid/inference/api/paddle_analysis_config.h#L193) can be used in the calibration tool to choose operators for quantization (cf. the SetMKLDNNOp() usage).
Furthermore, the int8_placement_pass can be more sophisticated. Here it applies only the simplest strategy.
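For illustration, a minimal sketch of how a calibration tool might call this API, assuming the AnalysisConfig additions from this patch (EnableINT8() and SetINT8Op()) and that AnalysisConfig lives in the paddle namespace; the op types listed are arbitrary examples, not a recommendation.

#include <string>
#include <unordered_set>

#include "paddle/fluid/inference/api/paddle_analysis_config.h"

// Hypothetical helper for a calibration tool: turn on the INT8 placement
// pass and select the op types whose INT8 kernels should be used.
void ConfigureInt8(paddle::AnalysisConfig* config) {
  config->EnableINT8();
  config->SetINT8Op(std::unordered_set<std::string>{"conv2d", "pool2d"});
}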

@wojtuss wojtuss force-pushed the wojtuss/int8-placement branch 2 times, most recently from a12b17c to 1872568 on January 22, 2019 at 14:26
@jianhang-liu

@wojtuss I understand this is the 1st version, so for now it only implements the simplest "placement" strategy, i.e. setting "use_int8 = true" for an OP if it is in the passed-in "int8 op list", similar to what we did for the mkldnn placement. However, that is not enough even for a 1st version. We have to be able to specify which instances of an OP (e.g. the 1st conv) should be included in (or excluded from) quantization. Remember that the 1st conv OP cannot be quantized, because we (Xeon) only support s8u8 for now and there is no ReLU before the 1st conv OP.


hshen14 commented Jan 23, 2019

> @wojtuss I understand this is the 1st version, so for now it only implements the simplest "placement" strategy, i.e. setting "use_int8 = true" for an OP if it is in the passed-in "int8 op list", similar to what we did for the mkldnn placement. However, that is not enough even for a 1st version. We have to be able to specify which instances of an OP (e.g. the 1st conv) should be included in (or excluded from) quantization. Remember that the 1st conv OP cannot be quantized, because we (Xeon) only support s8u8 for now and there is no ReLU before the 1st conv OP.

No, MKL-DNN provides the full INT8 capability for the conv OP, including the first conv OP.


hshen14 commented Jan 23, 2019

> This is a request for comments.
>
> There is a need for applying an INT8 quantization strategy to PaddlePaddle models. This patch adds a mechanism for applying an INT8 quantization strategy, similar to the MKL-DNN approach.
>
> A use_int8 op attribute and an int8_placement_pass in its simplest form (analogous to the use_mkldnn attribute and mkldnn_placement_pass) are added. The pass allows a user to choose which operators should be quantized and use INT8 kernels.
>
> We envisage more passes for INT8 optimization that would utilize the use_int8 attribute.
>
> test=develop

Please also consider making the design more general, e.g., to support FP16, BF16, or other mixed precision. Is it scalable to add other attributes and maintain a separate pass for each? We can probably refer to the FP16 training on GPU already done in Paddle. Is there any example of how to specify such information? @luotao1 @panyx0718 Thanks.

@@ -32,6 +32,7 @@ cc_library(fuse_pass_base SRCS fuse_pass_base.cc DEPS pass)
pass_library(graph_to_program_pass base)
pass_library(graph_viz_pass base)
pass_library(lock_free_optimize_pass base)
pass_library(int8_placement_pass base)

to mkldnn_int8_placement_pass ?


I personally don't suggest naming it "mkldnn_int8", considering that we may use various mechanisms (int8 MKL, int8 MKL-DNN, etc.) to implement the int8 kernels.


@wojtuss wojtuss Jan 23, 2019


@Superjomn, INT8 optimization is independent of MKL-DNN and we would like to keep them separate. There are operators which we are going to quantize without MKL-DNN.


@wojtuss @Superjomn @jianhang-liu How about cpu_int8_placement_pass?

@@ -123,6 +123,10 @@ struct Argument {
  DECL_ARGUMENT_FIELD(mkldnn_enabled_op_types, MKLDNNEnabledOpTypes,
                      std::unordered_set<std::string>);

  // Pass a set of op types to enable its int8 kernel
  DECL_ARGUMENT_FIELD(int8_enabled_op_types, INT8EnabledOpTypes,

mkldnn_int8?

There might be GPU int8 later.

@@ -151,6 +151,13 @@ struct AnalysisConfig {
   */
  bool mkldnn_enabled() const { return use_mkldnn_; }

  /** Turn on INT8.
   */
  void EnableINT8();

EnableMkldnnInt8 ?

  /** Specify the operator type list to use INT8 kernel.
   * @param op_list the operator type list.
   */
  void SetINT8Op(std::unordered_set<std::string> op_list) {

MkldnnInt8 ?

      Get<std::unordered_set<std::string>>("int8_enabled_op_types");
  for (const Node* n : graph->Nodes()) {
    if (n->IsOp()) {
      auto* op = n->Op();

Add an enforce here.

PADDLE_ENFORCE_NOT_NULL(n->Op())


When can op be null?


wojtuss commented Jan 23, 2019

> @wojtuss I understand this is the 1st version, so for now it only implements the simplest "placement" strategy, i.e. setting "use_int8 = true" for an OP if it is in the passed-in "int8 op list", similar to what we did for the mkldnn placement. However, that is not enough even for a 1st version. We have to be able to specify which instances of an OP (e.g. the 1st conv) should be included in (or excluded from) quantization. Remember that the 1st conv OP cannot be quantized, because we (Xeon) only support s8u8 for now and there is no ReLU before the 1st conv OP.

@jianhang-liu
As @hshen14 wrote, quantization of all the convolutions (including the first one) can be supported.
With a pass like the int8_placement_pass and the GraphPatternDetector you can pick operators for quantization very precisely, according to a given pattern (e.g. look for a sequence of more than one conv op and start quantizing from the second one). This is how every other fuse pass works.
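Purely as an illustration of such instance-level placement (this is not Paddle code; the struct and the function below are hypothetical), one could skip the first convolution when the ops are visited in topological order:

#include <string>
#include <vector>

// Hypothetical, self-contained example: given ops already sorted in
// topological order, mark every conv2d except the first one for INT8.
struct OpInfo {
  std::string type;
  bool use_int8 = false;
};

void MarkInt8SkipFirstConv(std::vector<OpInfo>* ops) {
  bool first_conv_seen = false;
  for (auto& op : *ops) {
    if (op.type != "conv2d") continue;
    if (!first_conv_seen) {
      first_conv_seen = true;  // leave the first conv in FP32
      continue;
    }
    op.use_int8 = true;  // quantize the remaining convs
  }
}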


hshen14 commented Jan 23, 2019

> > @wojtuss I understand this is the 1st version, so for now it only implements the simplest "placement" strategy, i.e. setting "use_int8 = true" for an OP if it is in the passed-in "int8 op list", similar to what we did for the mkldnn placement. However, that is not enough even for a 1st version. We have to be able to specify which instances of an OP (e.g. the 1st conv) should be included in (or excluded from) quantization. Remember that the 1st conv OP cannot be quantized, because we (Xeon) only support s8u8 for now and there is no ReLU before the 1st conv OP.
>
> @jianhang-liu
> As @hshen14 wrote, quantization of all the convolutions (including the first one) can be supported.
> With a pass like the int8_placement_pass and the GraphPatternDetector you can pick operators for quantization very precisely, according to a given pattern (e.g. look for a sequence of more than one conv op and start quantizing from the second one). This is how every other fuse pass works.

Brian mentioned the op instance in the meeting, and Baidu suggested using a sorted topology in a vector, which can probably meet the requirement.


wojtuss commented Jan 23, 2019

> Brian mentioned the op instance in the meeting, and Baidu suggested using a sorted topology in a vector, which can probably meet the requirement.

@hshen14 what do you mean by a sorted topology?

@luotao1 luotao1 requested a review from wzzju January 24, 2019 03:44

@panyx0718 panyx0718 left a comment


LG


wojtuss commented Feb 28, 2019

This PR has been updated and included in the quantization core: #15987

@wojtuss wojtuss closed this Feb 28, 2019

wojtuss commented Mar 14, 2019

Closed, as it became part of the INT8v2 core: #15987
