Support Quantized Fully Connected by INT8 GEMM #12922

lihaofd · 2018-10-23T07:42:59Z

Description

This PR adds a quantized fully connected op implemented with INT8 GEMM.
@pengzhao-intel, @TaoLv, @ciyongch

Feature changes

New features

Support quantized fully connected op by using int8 gemm
Support int8 bias by using beta offset
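
The two features fit together roughly as follows. This is a minimal pure-Python sketch, not the PR's code: it assumes the usual GEMM contract C = alpha * A * B + beta * C, and that "beta offset" means the bias is first written into the int32 output buffer and the GEMM is then run with beta = 1 so the products accumulate on top of it. The function name gemm_s8_s32 and the list-of-rows layout are hypothetical.

```python
def gemm_s8_s32(a, b, c, alpha=1, beta=1):
    """Compute C = alpha * A @ B + beta * C with integer accumulation.

    a: M x K rows of 8-bit integer values, b: K x N, c: M x N accumulators.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a)
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out


# data is (batch x in_features); weight_t is the transposed weight
# (in_features x num_hidden). Values are illustrative only.
data = [[1, -2, 3], [4, 5, -6]]
weight_t = [[1, 0], [0, 1], [2, -1]]
bias_s32 = [10, -20]  # assumed to be already rescaled to the int32 output scale

# Pre-load every output row with the bias, then let beta=1 preserve it.
c_init = [list(bias_s32) for _ in data]
result = gemm_s8_s32(data, weight_t, c_init, alpha=1, beta=1)
```

With beta = 0 the pre-loaded bias would be discarded, which is why the bias path depends on the beta argument.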

Unit-test changes

Update testcase test_quantized_fc in tests/python/quantization/test_quantization.py
Check consistency with original mx.sym.FullyConnected implementation.

Checklist

Passed code style checking (make lint).
All changes have test coverage.
Code is well-documented.

pengzhao-intel · 2018-10-23T13:12:19Z

@reminisce @zheng-da could you help take a review?

roywei · 2018-10-23T20:38:15Z

@lihaofd Thanks for the contribution!
@mxnet-label-bot [pr-awaiting-review]

reminisce · 2018-10-26T16:37:49Z

src/operator/quantization/quantized_fully_connected-inl.h

+enum QuantilizedfcOpResource {kTempSpace};
+}
+
+struct QuantizedSumInitKernelWithBias {


Suggest moving all of the implementation to the .cc file, since it's only for CPU.

reminisce · 2018-10-26T16:38:35Z

src/operator/quantization/quantized_fully_connected.cc

+    *dispatch_mode = DispatchMode::kFComputeEx;
+  }
+  for (size_t i = 0; i < out_attrs->size(); i++)
+    (*out_attrs)[i] = kDefaultStorage;


Use STORAGE_TYPE_ASSIGN_CHECK.

Just delete this line.

reminisce · 2018-10-26T16:40:38Z

src/operator/quantization/quantized_fully_connected.cc

+  }
+  for (size_t i = 0; i < out_attrs->size(); i++)
+    (*out_attrs)[i] = kDefaultStorage;
+  return true;


What if in_attrs has unknown storage types? You need to:

1. Check and assign stype to in_attrs as well.
2. Return false if any stype is unknown in in_attrs and out_attrs.

Please consider using range for loops for readability.

I think @larroy meant to use:

for (auto &v : *in_attrs) { /* ... */ }
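
The requirements in this thread (assign a storage type to inputs as well as outputs, and fail if anything stays unknown) amount to the control flow below. It is sketched in Python rather than C++, with hypothetical names (UNDEFINED, DEFAULT, assign_default, infer_storage). It illustrates the logic only and is not MXNet's STORAGE_TYPE_ASSIGN_CHECK macro.

```python
UNDEFINED = -1  # hypothetical stand-in for an unknown storage type
DEFAULT = 0     # hypothetical stand-in for kDefaultStorage


def assign_default(attrs):
    """Fill unknown entries with DEFAULT; fail on a conflicting known type."""
    for i, stype in enumerate(attrs):
        if stype == UNDEFINED:
            attrs[i] = DEFAULT
        elif stype != DEFAULT:
            return False
    return True


def infer_storage(in_attrs, out_attrs):
    """Return True only if every input and output ends up with a known,
    compatible storage type."""
    ok = assign_default(in_attrs) and assign_default(out_attrs)
    return ok and all(s != UNDEFINED for s in in_attrs + out_attrs)
```

Overwriting entries unconditionally, as the reviewed loop did, would hide a conflicting storage type instead of reporting it.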

reminisce · 2018-10-29T18:45:51Z

tests/python/quantization/test_quantization.py

@@ -283,16 +283,16 @@ def check_quantized_fc(data_shape, num_hidden, no_bias, qdtype, flatten=True):
        fc_fp32_exe = fc_fp32.simple_bind(ctx=mx.current_context(), grad_req='null')
        if qdtype == 'uint8':
            data_low = 0.0
-            data_high = 127.0
+            data_high = 63.0


Any reason for changing this?

Changed the data range from (-127, 127) to (-63, 63) to avoid potential overflow when using igemm on some hardware platforms.
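
The thread does not say which hardware path overflows. One plausible mechanism, assumed here purely for illustration, is an INT8 multiply-add that sums two adjacent unsigned-by-signed byte products into a signed 16-bit intermediate before widening to 32 bits. The sketch below computes the worst-case pair sum for given operand bounds and compares it with the int16 limit. The function names and the operand bounds are hypothetical.

```python
INT16_MAX = 2**15 - 1  # 32767


def worst_pair_sum(u8_max, s8_absmax):
    """Largest magnitude of u8*s8 + u8*s8 for operands within the given bounds."""
    return 2 * u8_max * s8_absmax


def fits_int16(u8_max, s8_absmax):
    """True if the worst-case pair sum fits in a signed 16-bit intermediate."""
    return worst_pair_sum(u8_max, s8_absmax) <= INT16_MAX


# Full-range operands: 2 * 255 * 127 = 64770, which exceeds 32767.
full = fits_int16(255, 127)
# Narrower operands: 2 * 127 * 63 = 16002, which fits with headroom.
halved = fits_int16(127, 63)
```

Under this assumption, narrowing the test data keeps intermediate sums in range, at the cost of not exercising the extreme values.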

reminisce · 2018-10-29T18:46:20Z

src/operator/quantization/quantized_fully_connected.cc

+  }
+
+  for (size_t i = 0; i < in_attrs->size(); i++) {
+    (*in_attrs)[i] = kDefaultStorage;


Same here, delete this line.

No blocking issues.

ankkhedia · 2018-10-29T21:54:41Z

@zheng-da Could you please take a look?

TaoLv · 2018-11-02T16:35:06Z

src/operator/quantization/quantized_fully_connected.cc

 #include "../nn/fully_connected-inl.h"

 namespace mxnet {
 namespace op {

+namespace quantized_fc {
+enum QuantilizedfcOpResource {kTempSpace};


'Quantized'

TaoLv · 2018-11-02T16:38:29Z

src/operator/quantization/quantized_fully_connected.cc

+    using mshadow::red::limits::MinValue;
+    using mshadow::red::limits::MaxValue;
+    float float_for_one_out_quant  =
+      MaxAbs(*min_out, *max_out) / static_cast<double>(MaxValue<T1>());


4 spaces as indent

TaoLv · 2018-11-02T16:39:41Z

src/operator/quantization/quantized_fully_connected.cc

+    }
+  }
+};
+template<typename SrcType>


need a blank line before this line

TaoLv · 2018-11-02T16:43:59Z

src/operator/quantization/quantized_fully_connected.cc

+  }
+};
+template<typename SrcType>
+void MKLDNNQuantizedFullyConnectedForward(const nnvm::NodeAttrs& attrs,


Change to another function name? It doesn't call any MKL-DNN APIs.

kalyc · 2018-11-13T21:14:35Z

Thanks for your contribution @lihaofd
@pengzhao-intel, @TaoLv , @ciyongch requesting review

TaoLv · 2018-11-14T01:55:52Z

src/operator/quantization/quantized_fully_connected.cc

+
+template<typename SrcType>
+void QuantizedFullyConnectedForward(const nnvm::NodeAttrs& attrs,
+                                          const OpContext &ctx,


Fix indent.

TaoLv

LGTM. All of my comments are addressed.

stu1130 · 2018-11-21T20:31:08Z

@mxnet-label-bot update [pr-awaiting-merge]
@anirudh2290 could you take a look at this?

KellenSunderland · 2018-11-26T04:44:32Z

@reminisce @zheng-da Does this look OK to one of you?

sandeep-krishnamurthy · 2018-12-01T07:21:04Z

src/operator/quantization/quantized_fully_connected.cc

+                     n,
+                     &oc);
+#else
+  LOG(FATAL) << "s8u8s32 is only supported by MKL BLAS library";


Can this error message be made a little more verbose for users, for example by mentioning quantized INT8?

sandeep-krishnamurthy · 2018-12-01T07:25:56Z

src/operator/quantization/quantized_fully_connected.cc

+      out[i] = bias[i] * float_for_one_bias_quant /
+          float_for_one_out_quant;
+    } else {
+      LOG(INFO) << "WARNING: QuantizedBiasAddKernel float_for_one_out_quant is 0 !";


Can we make this info more verbose and add more details?
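
For readers following the kernel under review, here is a minimal Python sketch of the rescaling it performs. It assumes, from the snippets quoted in this thread, that each "float for one quant" value is the real-valued size of one quantized step, computed as max(|min|, |max|) divided by the largest representable integer. The function names, the int8 and int32 limits, and raising an exception instead of logging are assumptions for illustration.

```python
INT8_MAX = 127
INT32_MAX = 2**31 - 1


def float_for_one_quant(min_val, max_val, int_max):
    """Real value represented by one quantized step."""
    return max(abs(min_val), abs(max_val)) / int_max


def rescale_bias(bias_q, bias_range, out_range):
    """Re-express int8 bias values in units of the int32 output scale."""
    bias_step = float_for_one_quant(bias_range[0], bias_range[1], INT8_MAX)
    out_step = float_for_one_quant(out_range[0], out_range[1], INT32_MAX)
    if out_step == 0:
        # The reviewed code logs a message here; this sketch raises instead
        # so that the message names the cause and the values to check.
        raise ValueError(
            "quantized FC bias: output scale is 0; "
            "check the min/max of the output range")
    # int() truncates toward zero, like a float-to-int conversion in C++.
    return [int(b * bias_step / out_step) for b in bias_q]
```

A message that names the operator and the offending range addresses the verbosity concern raised here without changing the arithmetic.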

TaoLv · 2018-12-08T15:04:40Z

@reminisce @sandeep-krishnamurthy @KellenSunderland Please check if your comments are addressed and then we can move forward. Thank you.

pengzhao-intel

LGTM.
@lihaofd could you rebase the code to the latest?
@TaoLv it seems there are no other comments from the community.
Could you help merge this PR?

KellenSunderland · 2018-12-10T07:25:44Z

src/operator/quantization/quantized_fully_connected.cc

+      out[i] = bias[i] * float_for_one_bias_quant /
+          float_for_one_out_quant;
+    } else {
+      LOG(INFO) << "WARNING: float_for_one_out_quant is 0, need to check min/max data !";


NIT: Seems like a mix of INFO / WARNING usage here.

KellenSunderland · 2018-12-10T07:41:37Z

tests/python/quantization/test_quantization.py

@@ -270,7 +270,7 @@ def check_quantized_pooling(data_shape, kernel, pool_type, pad, stride, global_p
 def test_quantized_fc():
    def check_quantized_fc(data_shape, num_hidden, no_bias, qdtype, flatten=True):
        if mx.current_context().device_type != 'gpu':


We should be able to run this test on CPU in CI. Could we check whether 'MKL' is in the env var 'BUILD_TAG' and run the test if it is?

@KellenSunderland good suggestion! Currently, the CI doesn't include the Intel MKL library as the BLAS library. @azai91 is working on adding it so that we can have better coverage of batch_gemm, quantized FC, etc.

@pengzhao-intel Oh sorry, didn't realize that was the case. If the tests won't pass without full mkl installed and it's not there let's add this in a later PR.

@pengzhao-intel do you mean the full MKL? We already use MKLML on CI.

@lebeg yes, I mean full MKL. The MKLML doesn't have the INT8 GEMM now :)
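
The gating idea proposed at the top of this thread could look like the sketch below. The helper name mkl_blas_in_build and the optional env parameter are hypothetical; only the BUILD_TAG variable and the 'MKL' substring come from the suggestion itself.

```python
import os


def mkl_blas_in_build(env=None):
    """Return True if the BUILD_TAG environment variable mentions MKL."""
    env = os.environ if env is None else env
    return "MKL" in env.get("BUILD_TAG", "")
```

Note that a plain substring test would also match a tag containing MKLML, which, per the replies above, lacks INT8 GEMM, so a real check would need a stricter pattern.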

KellenSunderland · 2018-12-10T09:05:36Z

If you want to reset to 1f98f63 and then put cf527e0 in a new PR I think this is ready to merge.

lihaofd · 2018-12-10T13:36:01Z

@KellenSunderland, @pengzhao-intel @TaoLv
Reset to 1f98f63; I will open a PR for cf527e0 after 1f98f63 is merged.

sandeep-krishnamurthy · 2018-12-10T20:12:24Z

@reminisce @sandeep-krishnamurthy @KellenSunderland Please check if your comments are addressed and then we can move forward. Thank you.

Thanks for asking. Great changes. No blocking issues. However, reading through the error messages, I don't think it is easy for users to understand what failed and what they should do now.

KellenSunderland · 2018-12-10T22:19:10Z

@lihaofd Thanks for resetting. I'll try to monitor this PR closely and merge when you're ready. Ping me on the other PR as well and I'll try and help out there.

By the way: quite a few people on my team are looking forward to this PR. Thanks for the contribution and patience in the review.

xinyu-intel · 2018-12-12T01:21:32Z

@lihaofd please rebase code and trigger MKL ci.

TaoLv · 2018-12-12T01:32:24Z

The test case needs to be refined so that it actually runs the MKL BLAS path.

lihaofd · 2018-12-12T08:07:43Z

@KellenSunderland @TaoLv, could you help review and check whether it can be merged? Thanks!

TaoLv · 2018-12-12T08:23:23Z

src/operator/quantization/quantized_fully_connected.cc

@@ -48,8 +54,9 @@ bool QuantizedFullyConnectedShape(const nnvm::NodeAttrs& attrs,
    SHAPE_ASSIGN_CHECK(*in_shape, 2, bshape);
  }

-  for (size_t i = num_inputs; i < 3 * num_inputs; ++i) {
-    SHAPE_ASSIGN_CHECK(*in_shape, i, TShape{1});


why not use SHAPE_ASSIGN_CHECK?

TaoLv · 2018-12-12T08:23:34Z

src/operator/quantization/quantized_fully_connected.cc

@@ -66,11 +73,12 @@ bool QuantizedFullyConnectedType(const nnvm::NodeAttrs& attrs,
  CHECK_EQ(in_type->size(), num_inputs * 3);
  CHECK_EQ(out_type->size(), 3U);

-  for (size_t i = 0; i < num_inputs; ++i) {
-    TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kInt8);


why not use TYPE_ASSIGN_CHECK?

TaoLv · 2018-12-12T08:23:45Z

src/operator/quantization/quantized_fully_connected.cc

  }
-  for (size_t i = num_inputs; i < 3 * num_inputs; ++i) {
-    TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kFloat32);


why not use TYPE_ASSIGN_CHECK?
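
For context, the loops removed in the hunks above encode the operator's input layout: the first num_inputs entries are the quantized tensors and the remaining 2 * num_inputs entries are their float32 min/max values. A small Python sketch of that layout check follows. The string type tags and function names are hypothetical, and the sketch fixes the quantized inputs to int8 as the removed lines did, whatever the replacement code does.

```python
def expected_input_types(num_inputs):
    """Types for [quantized tensors..., then min/max range values...]."""
    return ["int8"] * num_inputs + ["float32"] * (2 * num_inputs)


def check_input_types(in_types, num_inputs):
    """Fill unknown (None) entries and reject mismatches."""
    expected = expected_input_types(num_inputs)
    if len(in_types) != len(expected):
        raise ValueError(
            "expected %d inputs, got %d" % (len(expected), len(in_types)))
    for i, want in enumerate(expected):
        if in_types[i] is None:
            in_types[i] = want
        elif in_types[i] != want:
            return False
    return True
```

An assign-and-check step like this both infers missing types and rejects inconsistent ones, which is the behavior the reviewer is asking to keep.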

TaoLv · 2018-12-15T05:34:57Z

@lihaofd Thank you for your contribution and patience. Now merging.

* add quantized fully connect support
* disable qfc cpu case since s8u8s32 is only supported by MKL BLAS library
* retrigger to ci testing
* move implementation to cc file and add STORAGE_TYPE_ASSIGN_CHECK
* fix typo bug
* retrigger the ci test
* fix typo bug
* retrigger ci
* retrigger the ci test
* retrigger the ci
* retrigger the ci test
* retrigger ci test
* fix indent issue
* retrigger the ci
* retrigger the ci test
* add verbose message
* update log message
* using range for loop
* using for auto range
* enable MKL BLAS ci test
* fix typo issue
* use TYPE_ASSIGN_CHECK
* retrigger the ci

add quantized fully connect support

49b189f

lihaofd requested a review from anirudh2290 as a code owner October 23, 2018 07:42

disable qfc cpu case since s8u8s32 is only supported by MKL BLAS library

a2bfef4

marcoabreu added the pr-awaiting-review label Oct 23, 2018

retrigger to ci testing

91f1a9b

lihaofd changed the title ~~Support Quantized Fully Connected~~ Support Quantized Fully Connected by INT8 GEMM Oct 24, 2018

reminisce previously requested changes Oct 26, 2018

move implementation to cc file and add STORAGE_TYPE_ASSIGN_CHECK

b8e8257

reminisce reviewed Oct 29, 2018

Li, Hao H added 2 commits October 30, 2018 09:35

fix typo bug

471a2dc

retrigger the ci test

7b64226

TaoLv reviewed Nov 2, 2018

Li, Hao H added 6 commits November 3, 2018 13:17

fix typo bug

1dbc106

retrigger ci

babc764

retrigger the ci test

d365b64

retrigger the ci

1010deb

retrigger the ci test

818021d

retrigger ci test

b3df5a6

TaoLv reviewed Nov 14, 2018

fix indent issue

b3bf9f7

TaoLv approved these changes Nov 14, 2018

Li, Hao H added 2 commits November 14, 2018 12:09

retrigger the ci

e537fc1

retrigger the ci test

72b81d9

KellenSunderland reviewed Nov 23, 2018

src/operator/quantization/quantized_fully_connected.cc

sandeep-krishnamurthy reviewed Dec 1, 2018

add verbose message

1f98f63

pengzhao-intel approved these changes Dec 10, 2018

KellenSunderland approved these changes Dec 10, 2018

KellenSunderland reviewed Dec 10, 2018

lihaofd force-pushed the quantized_fc branch from 0de4156 to 1f98f63 Compare December 10, 2018 13:34

Li, Hao H added 3 commits December 11, 2018 10:07

update log message

9171b1a

using range for loop

daf75e6

using for auto range

1ea0675

Li, Hao H added 3 commits December 12, 2018 10:09

enable MKL BLAS ci test

c87402e

Merge pull request #8 from apache/master

28bf1c3

sync to latest master code base

fix typo issue

88562b9

TaoLv reviewed Dec 12, 2018

Li, Hao H added 2 commits December 13, 2018 09:15

use TYPE_ASSIGN_CHECK

54ee001

retrigger the ci

d2dde15

TaoLv merged commit 1eb3344 into apache:master Dec 15, 2018

pengzhao-intel mentioned this pull request Dec 18, 2018

Error when set export MXNET_SUBGRAPH_BACKEND=MKLDNN #13671

Closed
