
Implementation of MKLDNN FC #9385

Merged
4 commits merged into PaddlePaddle:develop on Apr 4, 2018

Conversation

mozga-intel (Contributor)

No description provided.

@CLAassistant commented Mar 26, 2018

CLA assistant check
All committers have signed the CLA.

@mozga-intel mozga-intel changed the title Implementation of MKLDNN FullyConnected Implementation of MKLDNN FC Mar 26, 2018
@mozga-intel mozga-intel force-pushed the mozga/mkldnn-fc branch 2 times, most recently from dd21d88 to dcbf25c Compare March 28, 2018 08:53
@@ -222,7 +244,6 @@ op_library(recurrent_op DEPS executor)
op_library(warpctc_op DEPS dynload_warpctc sequence_padding sequence_scale)
op_library(cos_sim_op DEPS cos_sim_functor)
op_library(parallel_do_op DEPS executor)
Contributor
Since fc_mkldnn_op.cc is different from the other operator files, could it be compiled directly? That would make the CMakeLists.txt clearer:

if(WITH_MKLDNN)
    cc_library(fc_op SRCS fc_mkldnn_op.cc fc_op.cc)
    file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(fc, MKLDNN);\n")
endif()

Contributor Author

Done.

@@ -84,6 +84,7 @@ def fc(input,
param_attr=None,
bias_attr=None,
use_mkldnn=False,
with_bias=False,
Contributor

You can use bias_attr instead of with_bias, i.e. bias_attr=False.
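The suggestion above can be sketched in plain Python with NumPy. This is a hypothetical helper, not the real fluid.layers.fc: it only illustrates how bias_attr=False can double as the "no bias" switch, making a separate with_bias parameter unnecessary.

```python
import numpy as np

def fc(x, w, bias_attr=None):
    # Hypothetical sketch, not Paddle's API: bias_attr=False disables
    # the bias entirely, replacing a separate with_bias flag.
    out = x @ w
    if bias_attr is not False:
        # None means "use a default bias"; here a zero vector stands in.
        b = np.zeros(w.shape[1], dtype=x.dtype) if bias_attr is None else bias_attr
        out = out + b
    return out

x = np.ones((2, 3), dtype=np.float32)
w = np.ones((3, 4), dtype=np.float32)
assert np.allclose(fc(x, w, bias_attr=False), x @ w)  # bias skipped
assert fc(x, w).shape == (2, 4)                       # default path
```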

Contributor Author

Done.

attrs={
"x_num_col_dims": num_flatten_dims,
"y_num_col_dims": 1,
'use_mkldnn': use_mkldnn
Contributor

When use_mkldnn == False, remove line 175, since use_mkldnn is not an attribute of mul_op.

Contributor Author

Done.

"W": w},
outputs={"Out": tmp},
attrs={"use_mkldnn": use_mkldnn,
"with_bias": with_bias})
mul_results.append(tmp)
Contributor

#9197 said "the MKLDNN version of the algorithm gives us the opportunity to combine these operations into one", that is to say, mkldnn_fc_op should be a combination of mul_op and sum_op. But in this method, it just replaces the mul_op?

Contributor Author

Done.
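The fusion the reviewer asks about can be sketched with plain NumPy: an fc operator computes the mul (x·w) and the bias sum in one call, instead of dispatching two separate operators. The arrays below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 3)).astype(np.float32)  # input (N, C)
w = rng.random((3, 4)).astype(np.float32)  # weights
b = rng.random(4).astype(np.float32)       # bias

# Unfused path: mul_op, then a separate sum/elementwise-add for the bias.
mul_out = x @ w
unfused = mul_out + b

# Fused path: one fc operator produces the same result in a single call,
# which is what an MKLDNN inner-product primitive can do internally.
fused = x @ w + b

assert np.allclose(unfused, fused)
```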

"(Tensor) The input tensor of fully connected operator. "
"The format of input tensor is NCHW, where N is batch size, C is the "
"number of channels, H is the height of the feature, "
"and W is the width of the feature.");
Contributor

fc_op should not consider the data format.

Contributor Author

Done.

AddAttr<bool>("with_bias",
"(bool, default false) Only used in mkldnn kernel")
.SetDefault(false);
AddAttr<std::string>(
Contributor

data_format is not used in fc_op

Contributor Author

Done.

"Defaults to \"NHWC\". Specify the data format of the output data, "
"the input will be transformed automatically. ")
.SetDefault("AnyLayout");
AddComment(R"DOC(
Contributor

Please add comments for fc_op.

Contributor Author

Done.


#pragma once

#include "paddle/fluid/framework/op_registry.h"
Contributor

The fc_mkldnn_op.h file could be removed, since all its contents could be implemented in the fc_op.cc file.

Contributor Author

Done.


using paddle::framework::Tensor;
using paddle::platform::MKLDNNDeviceContext;

Contributor

Lines 26-118 should be moved to the fc_op.cc file, so that this mkldnn_fc_op.cc file only contains MKLDNN-related functions.

Contributor Author

Done.

"Fully Connected input should be 4-D tensor.");

PADDLE_ENFORCE(w_dims.size() == 2,
"Fully Connected input should be 2-D tensor.");
Contributor

I wonder why the input of the FC layer should be a 4-D tensor?

Contributor Author

Done.
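The 4-D input question above comes down to flattening: an FC layer that follows a convolution receives an NCHW activation, which is reshaped to 2-D before the matmul. A small NumPy sketch (shapes are illustrative):

```python
import numpy as np

# A 4-D NCHW activation, e.g. coming out of a conv layer...
x4 = np.zeros((8, 3, 2, 2), dtype=np.float32)   # (N, C, H, W)
w = np.zeros((3 * 2 * 2, 5), dtype=np.float32)  # (C*H*W, O)

# ...is flattened to 2-D (N, C*H*W) before the matmul, which is why
# an FC op can reasonably accept either a 2-D or a 4-D input tensor.
x2 = x4.reshape(x4.shape[0], -1)
out = x2 @ w
assert x2.shape == (8, 12)
assert out.shape == (8, 5)
```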

@mozga-intel mozga-intel force-pushed the mozga/mkldnn-fc branch 5 times, most recently from 01696e6 to b571334 Compare March 29, 2018 18:39
@luotao1 (Contributor) left a comment

If we don't modify the CMakeLists.txt, USE_OP(fc) would be written in pybind.h. USE_OP expands to USE_OP_DEVICE_KERNEL(op_type, CPU) and USE_OP_DEVICE_KERNEL(op_type, CUDA), but fc_op doesn't have a CPU or GPU kernel. Is that OK?

[mozga-intel]

[Previous code]
Question from: #9197

Can I use the new FC kernel when I don't have a full implementation of the FC kernels for the CPU and GPU places, but only two fake kernels for them? By a fake kernel I mean a kernel that is registered in the system, but when it is called the system reports that the kernel is not available at this time. I had assumed the fake objects were needed because the PaddlePaddle platform appeared to require all kernels on all platforms.

[Current]
The fully connected layer doesn't have CUDA or CPU kernels. This code was tested in three scenarios:

  • First, when the platform contains neither MKLDNN nor the GPU version: if the use_mkldnn flag is True but PaddlePaddle was built without MKLDNN, the platform reports that the CPU, GPU, and MKLDNN kernels don't exist, while the old code path (mul, add) still works.
  • Second, when the PaddlePaddle build doesn't have the MKLDNN version: the old code path works, and PaddlePaddle reports that the MKLDNN version is not attached to this system.
  • Third, the opposite of the previous case: when both MKLDNN and GPU are attached to the platform, the code also works.

[Summary]
This code works, and we shouldn't register the fake kernels, because the current PaddlePaddle system doesn't need to have all kernels on all platforms.
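The dispatch behaviour described above can be sketched as a toy registry (names are illustrative, not Paddle's real kernel registry): a missing device kernel produces a clear runtime error, so fake kernels never need to be registered.

```python
# Toy kernel registry: only the MKLDNN kernel for "fc" is registered,
# mirroring the situation described in the summary above.
registry = {("fc", "MKLDNN"): lambda x: x * 2}

def run(op_type, device, x):
    # Look up the (op, device) kernel; report a clear error if absent
    # instead of requiring a registered fake kernel for every device.
    kernel = registry.get((op_type, device))
    if kernel is None:
        raise RuntimeError(f"{op_type} has no {device} kernel in this build")
    return kernel(x)

assert run("fc", "MKLDNN", 3) == 6
try:
    run("fc", "CPU", 3)
except RuntimeError:
    pass  # the expected "kernel doesn't exist" message
```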

"(Tensor) The input tensor of fully connected operator. "
"The format of input tensor is NCHW, where N is batch size, C is the "
"number of channels, H is the height of the feature, "
"and W is the width of the feature.");
Contributor

FC Op should not have the data_format

Contributor Author

Done.

Contributor

In the comments for "Input" and "Output", the format of the input/output tensor is still NCHW; these should be refined.

The size of each dimension of the parameters checked in the infer-shape.
Input(Input) is NCHW or NC format. Where N is batch size, C is the number of channels,
H is the height of the feature, and W is the width of the feature.
Weights(W) is OIHW or OI format. Where H is the height of the feature, W is the width of the feature,
Contributor

OIHW and OI are the weight formats of mkldnn_fc_op, but this comment is for the general fc_op. Maybe the MKLDNN-related comments could be moved to fc_mkldnn_op.cc.

Contributor Author

Done.

is_bias=False)
bias_attr = False
if bias_attr is not None:
bias_attr = True
Contributor

For lines 168-170: if bias_attr=False is passed as a parameter, it will be set to True in line 170?

Contributor Author

Oh, I agree with you. Thank you.
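The bug the reviewer points out can be shown in isolation. The function names below are illustrative; the assumed semantics are that bias_attr=False disables the bias, while None or a parameter attribute enables it.

```python
def with_bias_buggy(bias_attr=False):
    # The logic under review: bias_attr=False is "not None",
    # so it is wrongly promoted to True.
    if bias_attr is not None:
        bias_attr = True
    return bias_attr

def with_bias_fixed(bias_attr=False):
    # Assumed semantics: False disables the bias; None or a
    # parameter attribute enables it.
    return bias_attr is not False

assert with_bias_buggy(False) is True   # the bug: False becomes True
assert with_bias_fixed(False) is False  # bias_attr=False is respected
assert with_bias_fixed(None) is True    # default bias still enabled
```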

@mozga-intel mozga-intel force-pushed the mozga/mkldnn-fc branch 3 times, most recently from 8fdcf88 to e5c4a89 Compare March 30, 2018 18:16
"(Tensor) The input tensor of fully connected operator. "
"The format of input tensor is NCHW, where N is batch size, C is the "
"number of channels, H is the height of the feature, "
"and W is the width of the feature.");
Contributor

In the comments for "Input" and "Output", the format of the input/output tensor is still NCHW; these should be refined.

using paddle::framework::Tensor;
using paddle::platform::MKLDNNDeviceContext;

template <typename T>
Contributor

@tensor-tang Could you help review the fc_mkldnn_op.cc?

using paddle::platform::MKLDNNDeviceContext;

template <typename T>
class MKLDNNMD {
Contributor

I do not see much benefit in defining this MKLDNNMD.

Contributor Author
@mozga-intel commented Apr 3, 2018

Basically, the devil is in the details; I reckon that this refactoring is fairly simple. I carried it out because the forward and backward functions had been using the same lines of code. Looking forward, this code gives us the opportunity to build a more generic function for all the operators.

Contributor

OK, looking forward to making this code more common for all MKLDNN operators.
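The refactoring rationale discussed above (forward and backward building the same memory descriptors) can be sketched abstractly. The class below is an illustrative stand-in, not the real MKLDNNMD:

```python
class MemDescHelper:
    # Illustrative stand-in for MKLDNNMD: forward and backward both need
    # the same memory descriptors, so the shape/layout decision lives in
    # one place instead of being duplicated in each kernel.
    def __init__(self, dims):
        self.dims = dims

    def is_spatial(self):
        return len(self.dims) == 4  # 4-D NCHW vs. 2-D NC

    def src_desc(self):
        # One shared place to decide the layout tag for the source memory.
        return ("nchw" if self.is_spatial() else "nc", self.dims)

md = MemDescHelper((2, 3, 4, 4))
fwd = md.src_desc()                        # used by the forward pass
bwd = md.src_desc()                        # reused by the backward pass
assert fwd == bwd == ("nchw", (2, 3, 4, 4))
assert MemDescHelper((2, 3)).src_desc() == ("nc", (2, 3))
```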

bool is_spatial_;
};

class MKLDNNMemory {
Contributor

Same as MKLDNNMD.

And why is this MKLDNNMemory defined here, rather than in some memory header?

Contributor Author

See above.

PADDLE_ENFORCE(input->dims().size() == 4 || input->dims().size() == 2,
"Input must be with 2 or 4 dimensions, i.e. NCHW");
PADDLE_ENFORCE(w->dims().size() == 2,
"Weights must be with 2 dimensions, i.e. NC");
Contributor

The 4-dim weight is supported here https://github.com/PaddlePaddle/Paddle/pull/9385/files#diff-748e86d3a092af5baef9d03f94b1cf1bR53, so why enforce this?
By the way, the weight format should be OI or OIHW.
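The OI vs. OIHW point can be illustrated with NumPy: 4-D OIHW weights collapse to 2-D OI by folding the spatial dims into the input dim, so rejecting everything but 2-D weights is arguably too strict. Shapes below are illustrative.

```python
import numpy as np

# 4-D OIHW weights, e.g. for an FC layer that follows a conv layer...
w4 = np.zeros((5, 3, 2, 2), dtype=np.float32)  # (O, I, H, W)

# ...collapse to 2-D OI by folding H and W into I, which is why
# enforcing w_dims.size() == 2 alone may be too strict.
w2 = w4.reshape(w4.shape[0], -1)               # (O, I*H*W)
assert w2.shape == (5, 12)
```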

return is_spatial()
? platform::MKLDNNMemDesc({in[0], in[1], in[2], in[3]},
mkldnn::memory::data_type::f32,
mkldnn::memory::format::nchw)
Contributor

If your input format is nChw8c, then this would be wrong.
Or at least, you can add some enforcement.

Contributor Author
@mozga-intel commented Apr 3, 2018

I agree with you, but I gather that I can't overcome some of the problems yet; I'm waiting for the layers. The input format for this layer will be changed in the future, when the support for all the layers is ready.

Contributor

OK, again, this would be a note for you in case of later optimization. Thanks.

@mozga-intel mozga-intel force-pushed the mozga/mkldnn-fc branch 2 times, most recently from 912518a to e5c4a89 Compare April 3, 2018 12:53
@mozga-intel mozga-intel force-pushed the mozga/mkldnn-fc branch 2 times, most recently from 0f63721 to 46e14bb Compare April 3, 2018 13:05
@luotao1 (Contributor) left a comment

LGTM! Thanks very much!

@luotao1 luotao1 merged commit a98a3fd into PaddlePaddle:develop Apr 4, 2018