[RFC] add INT8 placement pass #15472
Conversation
This patch adds a mechanism for applying an INT8 quantization strategy similar to the MKL-DNN approach. test=develop
LGTM, but push the pass code also. :D
As discussed earlier, I would suggest maintaining such an INT8 kernel op list in the calibration tool, since it is better not to expose such information in non-INT8-MKLDNN-device models (e.g., FP32, GPU). The calibration tool is the better place to handle and maintain such a list.
@hshen14, the method
Furthermore, the int8_placement_pass can be more sophisticated. Here it applies only the simplest strategy.
a12b17c to 1872568 (Compare)
1872568 to 48d7390 (Compare)
@wojtuss I understood this is the 1st version, so it only implements the simplest "placement" strategy, i.e. set "use_int8 = true" for an OP if it is in the passed-in "int8 op list", similar to what we had done for the mkldnn placement. However, it is not enough even as the 1st version. We have to be able to specify which instances of an OP (e.g. the 1st conv) should be included in (or excluded from) quantization. Remember that the 1st conv OP can't be quantized, because we (Xeon) only support s8u8 now and there is no RELU before the 1st conv OP.
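A minimal sketch of the instance-level inclusion/exclusion idea discussed above (hypothetical names such as `OpNode` and `PlaceInt8`, not Paddle's actual `ir::Node`/pass API): on top of the type-based op list, the caller can exclude specific op instances by their index in the sorted topology, e.g. index 0 to skip the first conv.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a graph op node; the real pass works on ir::Node.
struct OpNode {
  std::string type;    // e.g. "conv2d"
  bool use_int8 = false;
};

// Marks an op for INT8 if its type is in the enabled list, unless the caller
// excluded that particular instance by its topological index.
void PlaceInt8(std::vector<OpNode>* ops,
               const std::unordered_set<std::string>& enabled_types,
               const std::unordered_set<size_t>& excluded_indices) {
  for (size_t i = 0; i < ops->size(); ++i) {
    OpNode& op = (*ops)[i];
    if (enabled_types.count(op.type) > 0 && excluded_indices.count(i) == 0) {
      op.use_int8 = true;
    }
  }
}
```

With `excluded_indices = {0}` and `enabled_types = {"conv2d"}`, the first conv stays FP32 while later convs get `use_int8 = true`.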
No, MKL-DNN provides the full INT8 capability for the conv Op, including the first conv Op.
Please also consider making the design more general, e.g., to support FP16, BF16, or other mixed precision. Is it scalable to add other attributes and maintain a separate pass for each? Perhaps we can refer to FP16 training on GPU, which is already done in Paddle. Any example of how to specify such information? @luotao1 @panyx0718 Thanks.
@@ -32,6 +32,7 @@ cc_library(fuse_pass_base SRCS fuse_pass_base.cc DEPS pass)
pass_library(graph_to_program_pass base)
pass_library(graph_viz_pass base)
pass_library(lock_free_optimize_pass base)
pass_library(int8_placement_pass base)
to mkldnn_int8_placement_pass?
I personally don't suggest naming it "mkldnn_int8", considering we may use various mechanisms (int8 mkl, int8 mkldnn, etc.) to implement int8 kernels.
@Superjomn, INT8 optimization is independent of MKL-DNN, and we would like to keep them separate. There are operators which we are going to quantize without MKL-DNN.
@wojtuss @Superjomn @jianhang-liu How about cpu_int8_placement_pass?
@@ -123,6 +123,10 @@ struct Argument {
DECL_ARGUMENT_FIELD(mkldnn_enabled_op_types, MKLDNNEnabledOpTypes,
                    std::unordered_set<std::string>);

// Pass a set of op types to enable their int8 kernels
DECL_ARGUMENT_FIELD(int8_enabled_op_types, INT8EnabledOpTypes,
mkldnn_int8? There might be GPU int8 later.
@@ -151,6 +151,13 @@ struct AnalysisConfig {
 */
bool mkldnn_enabled() const { return use_mkldnn_; }

/** Turn on INT8.
 */
void EnableINT8();
EnableMkldnnInt8 ?
/** Specify the operator type list to use INT8 kernel.
 * @param op_list the operator type list.
 */
void SetINT8Op(std::unordered_set<std::string> op_list) {
MkldnnInt8 ?
Get<std::unordered_set<std::string>>("int8_enabled_op_types");
for (const Node* n : graph->Nodes()) {
  if (n->IsOp()) {
    auto* op = n->Op();
Add an enforce here.
PADDLE_ENFORCE_NOT_NULL(n->Op())
When can op be null?
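To illustrate the defensive pattern being suggested, here is a sketch using simplified stand-ins (the `Node`/`OpDesc` structs and `EnforceNotNull` below are mock-ups, not the real `ir::Node` API or `PADDLE_ENFORCE_NOT_NULL` macro): a graph node may be an op or a variable, and the point of the enforce is to fail loudly rather than dereference a null op descriptor.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-ins for Paddle's graph types.
struct OpDesc { std::string type; };
struct Node {
  bool is_op = false;
  OpDesc* op = nullptr;
  bool IsOp() const { return is_op; }
  OpDesc* Op() const { return op; }
};

// Mimics the suggested PADDLE_ENFORCE_NOT_NULL(n->Op()): throw with a clear
// message instead of silently dereferencing a null descriptor later.
OpDesc* EnforceNotNull(OpDesc* op) {
  if (op == nullptr) throw std::runtime_error("op desc must not be null");
  return op;
}

// Walks the nodes the way the pass body does, counting op nodes whose
// descriptors pass the enforcement.
int CountOps(const std::vector<Node>& nodes) {
  int count = 0;
  for (const Node& n : nodes) {
    if (n.IsOp()) {
      EnforceNotNull(n.Op());
      ++count;
    }
  }
  return count;
}
```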
@jianhang-liu
Brian mentioned the op instance in the meeting, and Baidu suggested using a sorted topology in a vector, which can probably meet the requirement.
@hshen14 what do you mean by a sorted topology?
LG
This PR is updated and included in the quantization core: #15987
Closed, as it became part of the INT8v2 core: #15987
This is a request for comments.
There is a need for applying an INT8 quantization strategy in PaddlePaddle models. This patch adds a mechanism for applying an INT8 quantization strategy similar to the MKL-DNN approach.
A `use_int8` op attribute and an `int8_placement_pass` in the simplest form (similar to the `use_mkldnn` attribute and `mkldnn_placement_pass`) are added. The pass will allow a user to choose which operators should be quantized and use INT8 kernels. We envisage more passes for INT8 optimization that would utilize the `use_int8` attribute.
test=develop
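A minimal sketch of how the proposed configuration surface could fit together (the `Config` class below is a simplified mock, not Paddle's actual `AnalysisConfig`; `ShouldQuantize` is a hypothetical helper showing what the placement pass would consult per op node):

```cpp
#include <string>
#include <unordered_set>
#include <utility>

// Simplified mock of the proposed AnalysisConfig additions.
class Config {
 public:
  // Turn on INT8 (mirrors the proposed EnableINT8()).
  void EnableINT8() { use_int8_ = true; }
  bool int8_enabled() const { return use_int8_; }

  // Specify the operator type list to use INT8 kernels
  // (mirrors the proposed SetINT8Op()).
  void SetINT8Op(std::unordered_set<std::string> op_list) {
    int8_enabled_op_types_ = std::move(op_list);
  }

  // Hypothetical helper: what the placement pass would check for each op node
  // before setting its use_int8 attribute.
  bool ShouldQuantize(const std::string& op_type) const {
    return use_int8_ && int8_enabled_op_types_.count(op_type) > 0;
  }

 private:
  bool use_int8_ = false;
  std::unordered_set<std::string> int8_enabled_op_types_;
};
```

With this shape, `EnableINT8()` gates the pass as a whole while the op-type set narrows it to the chosen kernels, which keeps the design open to later instance-level refinements discussed in the thread.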