bnorm+relu fuse for mkldnn (inference) #11434

Merged
merged 8 commits into PaddlePaddle:develop from pzelazko/bnorm-relu-fuse on Jun 27, 2018

Conversation

pzelazko-intel (Contributor)

I've added a batch norm + relu fuse case to the inference_transpiler.
As a next step, I'm going to create a training transpiler that does the same operation.
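For orientation, here is a minimal editorial sketch of the rewrite this pass performs: find a batch_norm OP immediately followed by a relu OP, mark the batch_norm with the fuse_with_relu attribute, and drop the relu. The function name is illustrative and the logic is simplified relative to the actual PR code.

def fuse_relu_sketch(block):
    i = 0
    while i < len(block.ops) - 1:
        current_op = block.ops[i]
        next_op = block.ops[i + 1]
        if current_op.type == 'batch_norm' and next_op.type == 'relu':
            # let the MKLDNN batch_norm kernel apply relu itself
            current_op.set_attr("fuse_with_relu", True)
            # remove the now redundant relu OP
            block.remove_op(i + 1)
        i = i + 1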

@pzelazko-intel force-pushed the pzelazko/bnorm-relu-fuse branch 2 times, most recently from 7080d1a to bfc74df on June 13, 2018 11:11
@pzelazko-intel force-pushed the pzelazko/bnorm-relu-fuse branch 3 times, most recently from 894c34f to 3063cdc on June 18, 2018 11:33
@@ -21,13 +22,13 @@
class InferenceTranspiler:
def transpile(self, program, place, scope=None):
'''
Transpile the program. Support only fuse batch normalization now.
Transpile the program. Support only batch normalization and relu fuse now.
@tensor-tang (Contributor), Jun 19, 2018

You mean the batch norm and relu fuse in MKLDNN,

but the plain batch norm fuse is done here as well.

self.block.remove_op(i + 1)
i = i + 1

# TODO(luotao): use clone() method to flush the program.desc in force,
Contributor

@luotao1 please help review below.

**Gather Layer**

Output is obtained by gathering entries of the outer-most dimension
Output is obtained by gathering entries of the outer-most dimension
Contributor

It seems there is some diff in the annotation of nn.py; did you modify this file? If not, you could leave it unchanged.

Contributor Author

It seems I have unnecessarily deleted "Gather Layer" - I will restore it.

@@ -131,6 +131,10 @@ def train(avg_loss, infer_prog, optimizer, train_reader, test_reader, batch_acc,
exe = fluid.Executor(place)
exe.run(startup_prog)

# Use inference_transpiler to speedup
t = fluid.InferenceTranspiler()
t.transpile(infer_prog, place)
Contributor

It's not suitable to add the transpiler call here, since this is a training benchmark. @typhoonzero do you have a better suggestion?

Contributor Author

I could move it up into main. However, InferenceTranspiler requires place as a parameter, and place is extracted inside train. The drawback is that I would have to extract place again. Alternatively, I could add a place parameter to train and extract it in main.

Contributor

The inference benchmark runs on the original model now. Thus, how about removing lines 134-136, or adding an option to control it?
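One hedged way that optional gating could look (the use_inference_transpiler flag is hypothetical, not an existing benchmark option):

# hypothetical command-line flag; transpile only when explicitly requested,
# so the benchmark still measures the original model by default
if args.use_inference_transpiler:
    t = fluid.InferenceTranspiler()
    t.transpile(infer_prog, place)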

if not use_mkldnn:
self.fuse_batch_norm(program, place, scope)
else:
self.fuse_relu(program)
Contributor

  • fuse_batch_norm is also suitable for MKLDNN.
  • After fuse_batch_norm, fuse_relu is conv+relu, do you mean that?

Contributor Author

  1. If fuse_batch_norm is suitable for MKLDNN, then I'll leave it without checking the use_mkldnn flag.
  2. fuse_relu deletes the relu from the "batch norm + relu" pair.

Contributor

Since we have fuse_batch_norm already, there is no batch_norm op in the inference program. Thus, what's the use of fuse_relu?


@@ -80,6 +80,7 @@ class BatchNormMKLDNNOpKernel : public paddle::framework::OpKernel<T> {
const float epsilon = ctx.Attr<float>("epsilon");
const float momentum = ctx.Attr<float>("momentum");
const bool is_test = ctx.Attr<bool>("is_test");
const bool fuse_with_relu = ctx.Attr<bool>("fuse_with_relu");
Contributor

Why should we add the fuse_with_relu attribute here?

t = fluid.InferenceTranspiler()
t.transpile(infer_prog, place)

is enough. Do you mean there is a mkldnn::fuse_bn_relu function in MKLDNN?

Contributor Author

mkldnn::fuse_bn_relu is a flag for MKLDNN batch norm telling it to execute relu along with batch norm.

Contributor Author

If we always execute fuse_batch_norm (no not use_mkldnn condition), then fuse_relu makes sense only in the case where there is no conv before the batch norm. I don't know if such a case in fact ever exists.

If not, then I can skip this PR and create a similar one for training, which I have already completed.

Contributor

where there is no conv before batch norm

For DenseNet (https://github.com/liuzhuang13/DenseNet), there is BN+Relu+Conv; thus, fuse_relu is useful in this case.

I can skip this PR and create a similar one for training

fuse_relu for training is needed as well.

Contributor Author

So I assume this PR is OK. After it's merged, I'll create a PR for the training transpiler.

current_op.set_attr("fuse_with_relu", True)
# remove relu OP
self.block.remove_op(i + 1)
i = i + 1
@luotao1 (Contributor), Jun 22, 2018

  • could you add a unit test to validate the accuracy of fuse_with_relu? (one way such a check could look is sketched below)
  • could you call self._remove_unused_var, since remove_op will not remove variables?
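A rough sketch of how such an accuracy check could look (hypothetical helper; it assumes an inference program whose parameters are already initialized and which ends in batch_norm followed by relu):

import numpy as np
import paddle.fluid as fluid

def check_fuse_relu_accuracy(infer_prog, place, feed_dict, fetch_name):
    exe = fluid.Executor(place)

    # reference result from the untouched inference program
    ref, = exe.run(infer_prog, feed=feed_dict, fetch_list=[fetch_name])

    # transpile a clone so the original program stays intact
    fused_prog = infer_prog.clone()
    t = fluid.InferenceTranspiler()
    t.transpile(fused_prog, place)
    fused, = exe.run(fused_prog, feed=feed_dict, fetch_list=[fetch_name])

    # the fused relu should produce (numerically) the same activations
    assert np.allclose(ref, fused, atol=1e-5)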

Contributor Author

done

There are several optimizations, only fuse batch normalization is supported now.
Convert the fluid program to optimized inference program.

There are several optimizations.
Contributor

There are several optimizations:

  • fuse convolution and batch normalization
  • fuse batch normalization and relu (MKLDNN only)

Transpile the program by fused relu activation for MKLDNN program.

Relu activation following batch norm OP can be fused by adding
'fuse_with_relu' attribute to batch norm OP.
Contributor

'fuse_with_relu' -> :math:`fuse_with_relu`

- before:
- batch_norm->relu->any_other_op
- after:
- batch_norm->any_other_op
Contributor

The format of lines 69-73 is not correct. You can use https://github.com/PaddlePaddle/FluidDoc to see the generated HTML, and paste the generated picture here like in #11521.

If you have any questions about how to generate the API reference, please feel free to ask me.

if use_mkldnn:
self.fuse_relu(program)

def fuse_relu(self, program):
Contributor

  • how about the function name fuse_relu_mkldnn?
  • you may add a check for FLAGS_use_mkldnn=True in this function, so that other people will not use it on plain CPU (a sketch of such a guard follows below).
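A hedged sketch of such a guard; reading the flag from the environment is an assumption about how FLAGS_use_mkldnn would be exposed here:

import os

def fuse_relu_mkldnn(self, program):
    # skip the pass when MKLDNN is not enabled, so the MKLDNN-only
    # fusion cannot be applied on a plain CPU build by accident
    use_mkldnn = os.getenv("FLAGS_use_mkldnn", "").lower() in ("1", "true", "on")
    if not use_mkldnn:
        return
    # ... existing batch_norm + relu fusion logic goes here ...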

@pzelazko-intel force-pushed the pzelazko/bnorm-relu-fuse branch 3 times, most recently from 07398ef to 67d1640 on June 26, 2018 08:42

i = 0
while i < len(self.block.ops):
while i < len(self.block.ops) - 2:
Contributor

Why did you change line 159?

Contributor Author

Because in lines 174 and 183 we access element i + 2.

@luotao1 (Contributor) left a comment

LGTM

@luotao1 merged commit 9a15c92 into PaddlePaddle:develop on Jun 27, 2018