
[RFC] add INT8 placement pass #15472

Closed

wants to merge 1 commit from wojtuss/int8-placement

Conversation


@wojtuss wojtuss commented Jan 22, 2019

This is a request for comments.

There is a need for applying an INT8 quantization strategy to PaddlePaddle models. This patch adds a mechanism for applying an INT8 quantization strategy, similar to the MKL-DNN approach.

A use_int8 op attribute and an int8_placement_pass in its simplest form (analogous to the use_mkldnn attribute and mkldnn_placement_pass) are added. The pass allows a user to choose which operators should be quantized and use INT8 kernels.

We envisage more passes for INT8 optimization that would utilize the use_int8 attribute.

test=develop
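For context, a rough sketch of what the simplest form of such a pass could look like, written by analogy with the existing mkldnn_placement_pass. This is an illustration only, not the code of this PR: the class name, the Pass interface version, the attribute handling and the registration details are assumptions, and it assumes the Paddle source tree.

// Illustrative sketch (not the actual PR code), modeled on mkldnn_placement_pass.
#include <memory>
#include <string>
#include <unordered_set>

#include "paddle/fluid/framework/ir/graph.h"
#include "paddle/fluid/framework/ir/pass.h"

namespace paddle {
namespace framework {
namespace ir {

class INT8PlacementPass : public Pass {
 protected:
  std::unique_ptr<ir::Graph> ApplyImpl(
      std::unique_ptr<ir::Graph> graph) const override {
    // Op types chosen by the user, e.g. via AnalysisConfig::SetINT8Op().
    const auto& op_types =
        Get<std::unordered_set<std::string>>("int8_enabled_op_types");
    for (const Node* n : graph->Nodes()) {
      if (!n->IsOp()) continue;
      auto* op = n->Op();
      // Mark the op so that an INT8 kernel is used for it: either the op
      // type is on the user's list, or the list is empty (quantize all).
      if (op->HasAttr("use_int8") &&
          (op_types.empty() || op_types.count(op->Type()))) {
        op->SetAttr("use_int8", true);
      }
    }
    return graph;
  }
};

}  // namespace ir
}  // namespace framework
}  // namespace paddle

REGISTER_PASS(int8_placement_pass, paddle::framework::ir::INT8PlacementPass)
    .RequirePassAttr("int8_enabled_op_types");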


@sfraczek sfraczek left a comment


LGTM, but push the pass code also. :D


@hshen14 hshen14 left a comment


As discussed earlier, I would suggest maintaining such an INT8 kernel op list in the calibration tool, since it is better not to expose such information in models for non-INT8-MKLDNN devices (e.g., FP32, GPU). The calibration tool should be the better place to handle and maintain such a list.


wojtuss commented Jan 22, 2019

> As discussed earlier, I would suggest maintaining such an INT8 kernel op list in the calibration tool, since it is better not to expose such information in models for non-INT8-MKLDNN devices (e.g., FP32, GPU). The calibration tool should be the better place to handle and maintain such a list.

@hshen14, the method SetINT8Op() (https://github.com/PaddlePaddle/Paddle/blob/e0fb2578a53de189fd9c675850c00cc80b657cd4/paddle/fluid/inference/api/paddle_analysis_config.h#L193) can be used in the calibration tool to choose operators for quantization (cf. the SetMKLDNNOp() usage).
Furthermore, the int8_placement_pass can be more sophisticated. Here it applies only the simplest strategy.
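For illustration, a minimal sketch of how a calibration tool might call this API, assuming the AnalysisConfig additions from this patch (EnableINT8() and SetINT8Op()) and that AnalysisConfig lives in the paddle namespace; the op types listed are arbitrary examples, not a recommendation.

#include <string>
#include <unordered_set>

#include "paddle/fluid/inference/api/paddle_analysis_config.h"

// Hypothetical helper for a calibration tool: turn on the INT8 placement
// pass and select the op types whose INT8 kernels should be used.
void ConfigureInt8(paddle::AnalysisConfig* config) {
  config->EnableINT8();
  config->SetINT8Op(std::unordered_set<std::string>{"conv2d", "pool2d"});
}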

@wojtuss wojtuss force-pushed the wojtuss/int8-placement branch 2 times, most recently from a12b17c to 1872568 on January 22, 2019 at 14:26
@jianhang-liu

@wojtuss I understand this is the 1st version, so for now it only implements the simplest "placement" strategy, i.e. setting "use_int8 = true" for an OP if it is in the passed-in "int8 op list", similar to what we did for the mkldnn placement. However, that is not enough even for a 1st version. We have to be able to specify which instances of an OP (e.g. the 1st conv) should be included in (or excluded from) quantization. Remember that the 1st conv OP cannot be quantized, because we (Xeon) only support s8u8 for now and there is no ReLU before the 1st conv OP.


hshen14 commented Jan 23, 2019

> @wojtuss I understand this is the 1st version, so for now it only implements the simplest "placement" strategy, i.e. setting "use_int8 = true" for an OP if it is in the passed-in "int8 op list", similar to what we did for the mkldnn placement. However, that is not enough even for a 1st version. We have to be able to specify which instances of an OP (e.g. the 1st conv) should be included in (or excluded from) quantization. Remember that the 1st conv OP cannot be quantized, because we (Xeon) only support s8u8 for now and there is no ReLU before the 1st conv OP.

No, MKL-DNN provides the full INT8 capability for the conv OP, including the first conv OP.


hshen14 commented Jan 23, 2019

> This is a request for comments.
>
> There is a need for applying an INT8 quantization strategy to PaddlePaddle models. This patch adds a mechanism for applying an INT8 quantization strategy, similar to the MKL-DNN approach.
>
> A use_int8 op attribute and an int8_placement_pass in its simplest form (analogous to the use_mkldnn attribute and mkldnn_placement_pass) are added. The pass allows a user to choose which operators should be quantized and use INT8 kernels.
>
> We envisage more passes for INT8 optimization that would utilize the use_int8 attribute.
>
> test=develop

Please also consider making the design more general, e.g., to support FP16, BF16, or other mixed precision. Is it scalable to add other attributes and maintain a separate pass for each? We can probably refer to the FP16 training on GPU already done in Paddle. Is there any example of how to specify such information? @luotao1 @panyx0718 Thanks.

@@ -32,6 +32,7 @@ cc_library(fuse_pass_base SRCS fuse_pass_base.cc DEPS pass)
pass_library(graph_to_program_pass base)
pass_library(graph_viz_pass base)
pass_library(lock_free_optimize_pass base)
pass_library(int8_placement_pass base)

to mkldnn_int8_placement_pass ?


I personally don't suggest naming it "mkldnn_int8", considering that we may use various mechanisms (int8 MKL, int8 MKL-DNN, etc.) to implement the int8 kernels.


@wojtuss wojtuss Jan 23, 2019


@Superjomn, INT8 optimization is independent of MKL-DNN and we would like to keep them separate. There are operators which we are going to quantize without MKL-DNN.


@wojtuss @Superjomn @jianhang-liu How about cpu_int8_placement_pass?

@@ -123,6 +123,10 @@ struct Argument {
  DECL_ARGUMENT_FIELD(mkldnn_enabled_op_types, MKLDNNEnabledOpTypes,
                      std::unordered_set<std::string>);

  // Pass a set of op types to enable its int8 kernel
  DECL_ARGUMENT_FIELD(int8_enabled_op_types, INT8EnabledOpTypes,

mkldnn_int8?

There might be GPU int8 later.

@@ -151,6 +151,13 @@ struct AnalysisConfig {
   */
  bool mkldnn_enabled() const { return use_mkldnn_; }

  /** Turn on INT8.
   */
  void EnableINT8();

EnableMkldnnInt8 ?

  /** Specify the operator type list to use INT8 kernel.
   * @param op_list the operator type list.
   */
  void SetINT8Op(std::unordered_set<std::string> op_list) {

MkldnnInt8 ?

      Get<std::unordered_set<std::string>>("int8_enabled_op_types");
  for (const Node* n : graph->Nodes()) {
    if (n->IsOp()) {
      auto* op = n->Op();

Add an enforce here.

PADDLE_ENFORCE_NOT_NULL(n->Op())


When can op be null?


wojtuss commented Jan 23, 2019

> @wojtuss I understand this is the 1st version, so for now it only implements the simplest "placement" strategy, i.e. setting "use_int8 = true" for an OP if it is in the passed-in "int8 op list", similar to what we did for the mkldnn placement. However, that is not enough even for a 1st version. We have to be able to specify which instances of an OP (e.g. the 1st conv) should be included in (or excluded from) quantization. Remember that the 1st conv OP cannot be quantized, because we (Xeon) only support s8u8 for now and there is no ReLU before the 1st conv OP.

@jianhang-liu
As @hshen14 wrote, quantization of all the convolutions (including the first one) can be supported.
With a pass like the int8_placement_pass and the GraphPatternDetector you can pick operators for quantization very precisely, according to a given pattern (e.g. look for a sequence of more than one conv op and start quantizing from the second one). This is how every other fuse pass works.
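Purely as an illustration of such instance-level placement (this is not Paddle code; the struct and the function below are hypothetical), one could skip the first convolution when the ops are visited in topological order:

#include <string>
#include <vector>

// Hypothetical, self-contained example: given ops already sorted in
// topological order, mark every conv2d except the first one for INT8.
struct OpInfo {
  std::string type;
  bool use_int8 = false;
};

void MarkInt8SkipFirstConv(std::vector<OpInfo>* ops) {
  bool first_conv_seen = false;
  for (auto& op : *ops) {
    if (op.type != "conv2d") continue;
    if (!first_conv_seen) {
      first_conv_seen = true;  // leave the first conv in FP32
      continue;
    }
    op.use_int8 = true;  // quantize the remaining convs
  }
}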


hshen14 commented Jan 23, 2019

> > @wojtuss I understand this is the 1st version, so for now it only implements the simplest "placement" strategy, i.e. setting "use_int8 = true" for an OP if it is in the passed-in "int8 op list", similar to what we did for the mkldnn placement. However, that is not enough even for a 1st version. We have to be able to specify which instances of an OP (e.g. the 1st conv) should be included in (or excluded from) quantization. Remember that the 1st conv OP cannot be quantized, because we (Xeon) only support s8u8 for now and there is no ReLU before the 1st conv OP.
>
> @jianhang-liu
> As @hshen14 wrote, quantization of all the convolutions (including the first one) can be supported.
> With a pass like the int8_placement_pass and the GraphPatternDetector you can pick operators for quantization very precisely, according to a given pattern (e.g. look for a sequence of more than one conv op and start quantizing from the second one). This is how every other fuse pass works.

Brian mentioned the op instance in the meeting, and Baidu suggested using a sorted topology in a vector, which can probably meet the requirement.


wojtuss commented Jan 23, 2019

> Brian mentioned the op instance in the meeting, and Baidu suggested using a sorted topology in a vector, which can probably meet the requirement.

@hshen14 what do you mean by a sorted topology?

@luotao1 luotao1 requested a review from wzzju January 24, 2019 03:44

@panyx0718 panyx0718 left a comment


LG


wojtuss commented Feb 28, 2019

This PR has been updated and included in the quantization core: #15987

@wojtuss wojtuss closed this Feb 28, 2019

wojtuss commented Mar 14, 2019

Closed, as it became part of the INT8v2 core: #15987
