
Implementation of MKLDNN FC #9385

Merged
4 commits merged into PaddlePaddle:develop on Apr 4, 2018

Conversation

mozga-intel (Contributor)

No description provided.

@CLAassistant commented Mar 26, 2018

CLA assistant check
All committers have signed the CLA.

@mozga-intel mozga-intel changed the title Implementation of MKLDNN FullyConnected Implementation of MKLDNN FC Mar 26, 2018
@mozga-intel mozga-intel force-pushed the mozga/mkldnn-fc branch 2 times, most recently from dd21d88 to dcbf25c Compare March 28, 2018 08:53
@@ -222,7 +244,6 @@ op_library(recurrent_op DEPS executor)
op_library(warpctc_op DEPS dynload_warpctc sequence_padding sequence_scale)
op_library(cos_sim_op DEPS cos_sim_functor)
op_library(parallel_do_op DEPS executor)
Contributor
Since fc_mkldnn_op.cc is different from the other operator files, could it be compiled directly? That would make the CMakeLists.txt clearer:

if(WITH_MKLDNN)
    cc_library(fc_op SRCS fc_mkldnn_op.cc fc_op.cc)
    file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(fc, MKLDNN);\n")
endif()

Contributor Author

Done.

@@ -84,6 +84,7 @@ def fc(input,
param_attr=None,
bias_attr=None,
use_mkldnn=False,
with_bias=False,
Contributor

You can use bias_attr instead of with_bias, i.e. bias_attr=False.
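The suggestion above can be sketched in plain Python with NumPy. This is a hypothetical helper, not the real fluid.layers.fc: it only illustrates how bias_attr=False can double as the "no bias" switch, making a separate with_bias parameter unnecessary.

```python
import numpy as np

def fc(x, w, bias_attr=None):
    # Hypothetical sketch, not Paddle's API: bias_attr=False disables
    # the bias entirely, replacing a separate with_bias flag.
    out = x @ w
    if bias_attr is not False:
        # None means "use a default bias"; here a zero vector stands in.
        b = np.zeros(w.shape[1], dtype=x.dtype) if bias_attr is None else bias_attr
        out = out + b
    return out

x = np.ones((2, 3), dtype=np.float32)
w = np.ones((3, 4), dtype=np.float32)
assert np.allclose(fc(x, w, bias_attr=False), x @ w)  # bias skipped
assert fc(x, w).shape == (2, 4)                       # default path
```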

Contributor Author

Done.

attrs={
"x_num_col_dims": num_flatten_dims,
"y_num_col_dims": 1,
'use_mkldnn': use_mkldnn
Contributor

When use_mkldnn == False, remove line 175, since use_mkldnn is not an attribute of mul_op.

Contributor Author

Done.

"W": w},
outputs={"Out": tmp},
attrs={"use_mkldnn": use_mkldnn,
"with_bias": with_bias})
mul_results.append(tmp)
Contributor

#9197 said "the MKLDNN version of the algorithm gives us the opportunity to combine these operations into one", that is to say, mkldnn_fc_op should be a combination of mul_op and sum_op. But in this method, it just replaces the mul_op?

Contributor Author

Done.
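The fusion the reviewer asks about can be sketched with plain NumPy: an fc operator computes the mul (x·w) and the bias sum in one call, instead of dispatching two separate operators. The arrays below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 3)).astype(np.float32)  # input (N, C)
w = rng.random((3, 4)).astype(np.float32)  # weights
b = rng.random(4).astype(np.float32)       # bias

# Unfused path: mul_op, then a separate sum/elementwise-add for the bias.
mul_out = x @ w
unfused = mul_out + b

# Fused path: one fc operator produces the same result in a single call,
# which is what an MKLDNN inner-product primitive can do internally.
fused = x @ w + b

assert np.allclose(unfused, fused)
```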

"(Tensor) The input tensor of fully connected operator. "
"The format of input tensor is NCHW, where N is batch size, C is the "
"number of channels, H is the height of the feature, "
"and W is the width of the feature.");
Contributor

fc_op should not consider the data format.

Contributor Author

Done.

AddAttr<bool>("with_bias",
"(bool, default false) Only used in mkldnn kernel")
.SetDefault(false);
AddAttr<std::string>(
Contributor

data_format is not used in fc_op

Contributor Author

Done.

"Defaults to \"NHWC\". Specify the data format of the output data, "
"the input will be transformed automatically. ")
.SetDefault("AnyLayout");
AddComment(R"DOC(
Contributor

Please add comments for fc_op.

Contributor Author

Done.


#pragma once

#include "paddle/fluid/framework/op_registry.h"
Contributor

The fc_mkldnn_op.h file could be removed, since all its contents could be implemented in the fc_op.cc file.

Contributor Author

Done.


using paddle::framework::Tensor;
using paddle::platform::MKLDNNDeviceContext;

Contributor

Lines 26-118 should be moved to the fc_op.cc file, so that this mkldnn_fc_op.cc file only contains MKLDNN-related functions.

Contributor Author

Done.

"Fully Connected input should be 4-D tensor.");

PADDLE_ENFORCE(w_dims.size() == 2,
"Fully Connected input should be 2-D tensor.");
Contributor

I wonder why the input of the FC layer should be a 4-D tensor?

Contributor Author

Done.
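The 4-D input question above comes down to flattening: an FC layer that follows a convolution receives an NCHW activation, which is reshaped to 2-D before the matmul. A small NumPy sketch (shapes are illustrative):

```python
import numpy as np

# A 4-D NCHW activation, e.g. coming out of a conv layer...
x4 = np.zeros((8, 3, 2, 2), dtype=np.float32)   # (N, C, H, W)
w = np.zeros((3 * 2 * 2, 5), dtype=np.float32)  # (C*H*W, O)

# ...is flattened to 2-D (N, C*H*W) before the matmul, which is why
# an FC op can reasonably accept either a 2-D or a 4-D input tensor.
x2 = x4.reshape(x4.shape[0], -1)
out = x2 @ w
assert x2.shape == (8, 12)
assert out.shape == (8, 5)
```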

@mozga-intel mozga-intel force-pushed the mozga/mkldnn-fc branch 5 times, most recently from 01696e6 to b571334 Compare March 29, 2018 18:39
@luotao1 (Contributor) left a comment

If we don't modify the CMakeLists.txt, USE_OP(fc) would be written in pybind.h. USE_OP expands to USE_OP_DEVICE_KERNEL(op_type, CPU) and USE_OP_DEVICE_KERNEL(op_type, CUDA), but fc_op doesn't have a CPU or GPU kernel. Is that OK?

[mozga-intel]

[Previous code]
Question from: #9197

Can I use the new FC kernel when I don't have a full implementation of the FC kernels for the CPU and GPU places, but only two fake kernels for them? By a fake kernel I mean a kernel that is registered in the system, but when it is called the system reports that the kernel is not available at this time. I had assumed the fake objects were needed because the PaddlePaddle platform appeared to require all kernels on all platforms.

[Current]
The fully connected layer doesn't have CUDA or CPU kernels. This code was tested in three scenarios:

  • First, when the platform contains neither MKLDNN nor the GPU version: if the use_mkldnn flag is True but PaddlePaddle was built without MKLDNN, the platform reports that the CPU, GPU, and MKLDNN kernels don't exist, while the old code path (mul, add) still works.
  • Second, when the PaddlePaddle build doesn't have the MKLDNN version: the old code path works, and PaddlePaddle reports that the MKLDNN version is not attached to this system.
  • Third, the opposite of the previous case: when both MKLDNN and GPU are attached to the platform, the code also works.

[Summary]
This code works, and we shouldn't register the fake kernels, because the current PaddlePaddle system doesn't need to have all kernels on all platforms.
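The dispatch behaviour described above can be sketched as a toy registry (names are illustrative, not Paddle's real kernel registry): a missing device kernel produces a clear runtime error, so fake kernels never need to be registered.

```python
# Toy kernel registry: only the MKLDNN kernel for "fc" is registered,
# mirroring the situation described in the summary above.
registry = {("fc", "MKLDNN"): lambda x: x * 2}

def run(op_type, device, x):
    # Look up the (op, device) kernel; report a clear error if absent
    # instead of requiring a registered fake kernel for every device.
    kernel = registry.get((op_type, device))
    if kernel is None:
        raise RuntimeError(f"{op_type} has no {device} kernel in this build")
    return kernel(x)

assert run("fc", "MKLDNN", 3) == 6
try:
    run("fc", "CPU", 3)
except RuntimeError:
    pass  # the expected "kernel doesn't exist" message
```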

"(Tensor) The input tensor of fully connected operator. "
"The format of input tensor is NCHW, where N is batch size, C is the "
"number of channels, H is the height of the feature, "
"and W is the width of the feature.");
Contributor

FC Op should not have the data_format

Contributor Author

Done.

Contributor

In the comments for "Input" and "Output", the format of the input/output tensor is still NCHW; these should be refined.

The size of each dimension of the parameters checked in the infer-shape.
Input(Input) is NCHW or NC format. Where N is batch size, C is the number of channels,
H is the height of the feature, and W is the width of the feature.
Weights(W) is OIHW or OI format. Where H is the height of the feature, W is the width of the feature,
Contributor

OIHW and OI are the weight formats of mkldnn_fc_op, but this comment is for the general fc_op. Maybe the MKLDNN-related comments could be moved to fc_mkldnn_op.cc.

Contributor Author

Done.

is_bias=False)
bias_attr = False
if bias_attr is not None:
bias_attr = True
Contributor

For lines 168-170: if bias_attr=False is passed as a parameter, it will be set to True in line 170?

Contributor Author

Oh, I agree with you. Thank you.
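The bug the reviewer points out can be shown in isolation. The function names below are illustrative; the assumed semantics are that bias_attr=False disables the bias, while None or a parameter attribute enables it.

```python
def with_bias_buggy(bias_attr=False):
    # The logic under review: bias_attr=False is "not None",
    # so it is wrongly promoted to True.
    if bias_attr is not None:
        bias_attr = True
    return bias_attr

def with_bias_fixed(bias_attr=False):
    # Assumed semantics: False disables the bias; None or a
    # parameter attribute enables it.
    return bias_attr is not False

assert with_bias_buggy(False) is True   # the bug: False becomes True
assert with_bias_fixed(False) is False  # bias_attr=False is respected
assert with_bias_fixed(None) is True    # default bias still enabled
```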

@mozga-intel mozga-intel force-pushed the mozga/mkldnn-fc branch 3 times, most recently from 8fdcf88 to e5c4a89 Compare March 30, 2018 18:16
"(Tensor) The input tensor of fully connected operator. "
"The format of input tensor is NCHW, where N is batch size, C is the "
"number of channels, H is the height of the feature, "
"and W is the width of the feature.");
Contributor

In the comments for "Input" and "Output", the format of the input/output tensor is still NCHW; these should be refined.

using paddle::framework::Tensor;
using paddle::platform::MKLDNNDeviceContext;

template <typename T>
Contributor

@tensor-tang Could you help review the fc_mkldnn_op.cc?

using paddle::platform::MKLDNNDeviceContext;

template <typename T>
class MKLDNNMD {
Contributor

I do not see much benefit in defining this MKLDNNMD.

Contributor Author
@mozga-intel commented Apr 3, 2018

Basically, the devil is in the details; I reckon that this refactoring is fairly simple. I carried it out because the forward and backward functions had been using the same lines of code. Looking forward, this code gives us the opportunity to build a more generic function for all the operators.

Contributor

OK, looking forward to making this code more common for all MKLDNN operators.
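The refactoring rationale discussed above (forward and backward building the same memory descriptors) can be sketched abstractly. The class below is an illustrative stand-in, not the real MKLDNNMD:

```python
class MemDescHelper:
    # Illustrative stand-in for MKLDNNMD: forward and backward both need
    # the same memory descriptors, so the shape/layout decision lives in
    # one place instead of being duplicated in each kernel.
    def __init__(self, dims):
        self.dims = dims

    def is_spatial(self):
        return len(self.dims) == 4  # 4-D NCHW vs. 2-D NC

    def src_desc(self):
        # One shared place to decide the layout tag for the source memory.
        return ("nchw" if self.is_spatial() else "nc", self.dims)

md = MemDescHelper((2, 3, 4, 4))
fwd = md.src_desc()                        # used by the forward pass
bwd = md.src_desc()                        # reused by the backward pass
assert fwd == bwd == ("nchw", (2, 3, 4, 4))
assert MemDescHelper((2, 3)).src_desc() == ("nc", (2, 3))
```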

bool is_spatial_;
};

class MKLDNNMemory {
Contributor

Same as MKLDNNMD.

And why is this MKLDNNMemory defined here, rather than in some memory header?

Contributor Author

See above.

PADDLE_ENFORCE(input->dims().size() == 4 || input->dims().size() == 2,
"Input must be with 2 or 4 dimensions, i.e. NCHW");
PADDLE_ENFORCE(w->dims().size() == 2,
"Weights must be with 2 dimensions, i.e. NC");
Contributor

The 4-dim weight is supported here https://github.com/PaddlePaddle/Paddle/pull/9385/files#diff-748e86d3a092af5baef9d03f94b1cf1bR53, so why enforce this?
By the way, the weight format should be OI or OIHW.
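The OI vs. OIHW point can be illustrated with NumPy: 4-D OIHW weights collapse to 2-D OI by folding the spatial dims into the input dim, so rejecting everything but 2-D weights is arguably too strict. Shapes below are illustrative.

```python
import numpy as np

# 4-D OIHW weights, e.g. for an FC layer that follows a conv layer...
w4 = np.zeros((5, 3, 2, 2), dtype=np.float32)  # (O, I, H, W)

# ...collapse to 2-D OI by folding H and W into I, which is why
# enforcing w_dims.size() == 2 alone may be too strict.
w2 = w4.reshape(w4.shape[0], -1)               # (O, I*H*W)
assert w2.shape == (5, 12)
```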

return is_spatial()
? platform::MKLDNNMemDesc({in[0], in[1], in[2], in[3]},
mkldnn::memory::data_type::f32,
mkldnn::memory::format::nchw)
Contributor

If your input format is nChw8c, then this would be wrong.
Or at least, you can add some enforcement.

Contributor Author
@mozga-intel commented Apr 3, 2018

I agree with you, but I gather that I can't overcome some of the problems yet; I'm waiting for the layers. The input format for this layer will be changed in the future, when the support for all the layers is ready.

Contributor

OK, again, this would be a note for you in case of later optimization. Thanks.

@mozga-intel mozga-intel force-pushed the mozga/mkldnn-fc branch 2 times, most recently from 912518a to e5c4a89 Compare April 3, 2018 12:53
@mozga-intel mozga-intel force-pushed the mozga/mkldnn-fc branch 2 times, most recently from 0f63721 to 46e14bb Compare April 3, 2018 13:05
@luotao1 (Contributor) left a comment

LGTM! Thanks very much!

@luotao1 luotao1 merged commit a98a3fd into PaddlePaddle:develop Apr 4, 2018