Implementation of MKLDNN LRN #9123
Conversation
@@ -87,5 +87,15 @@ def test_check_grad_normal(self):
        self.check_grad(['X'], 'Out', max_relative_error=0.01)

class TestLRNMKLDNNOp(TestLRNOp):
There are two algorithms, ACROSS_CHANNELS and WITHIN_CHANNEL, but TestLRNMKLDNNOp here only tests the default one.
Thanks for this remark, @luotao1. PaddlePaddle seems to implement LRN that works across channels. For MKLDNN, the WITHIN_CHANNEL option does not work in the backward pass.
I decided to remove the "algorithm" attribute and fix the LRN algorithm to ACROSS_CHANNELS.
paddle/fluid/operators/lrn_op.cc
Outdated
framework::DataLayout layout_ = framework::StringToDataLayout(data_format);
return framework::OpKernelType(
    framework::ToDataType(ctx.Input<Tensor>("X")->type()), ctx.GetPlace(),
    layout_, library_);
The GetExpectedKernelType of LRNOp and LRNGradOp are the same; could these functions be fused?
I moved the content of this method to a standalone function in an anonymous namespace and reuse it in both LRNOp and LRNGradOp.
LGTM. @tensor-tang Could you help review lrn_mkldnn_op.cc?
auto input_data = x->data<T>();
auto output_data = out->mutable_data<T>(ctx.GetPlace());
mid->mutable_data<T>(ctx.GetPlace());
What is this MidOut for in MKL-DNN implementations?
According to the documentation of the LRN operator, MidOut is a middle result of the LRN operator that is computed in the forward pass and used in the backward pass. In practice, the naive CPU implementation sets it to the value of attribute k, and the MidOut tensor is validated by the LRN unit tests.
It's not really needed by the MKLDNN primitives. However, it's a part of the LRN interface, so I decided to fill it in the same way as it's done in the naive CPU implementation.
If it's not needed by the MKLDNN primitives, you can use the following approach: "MaxIndex" is only used when "pooltype" == "MAX" in sequence_pool_op:
Paddle/paddle/fluid/operators/sequence_pool_op.cc
Lines 30 to 34 in 0821ee7
if (ctx->Attrs().Get<std::string>("pooltype") == "MAX") {
  PADDLE_ENFORCE(ctx->HasOutput("MaxIndex"),
                 "Output(MaxIndex) of SequencePoolOp should not be null.");
  ctx->SetOutputDim("MaxIndex", ctx->GetInputDim("X"));
}
const float k = ctx.Attr<float>("k");

auto e_mid = framework::EigenTensor<T, 4>::From(*mid);
e_mid = e_mid.constant(k);
What is e_mid? And why do we need it?
It's an Eigen TensorMap mapped onto the data of the MidOut tensor. It's used to fill MidOut with the value of attribute k.
auto workspace_md = forward_pd->workspace_primitive_desc();
auto workspace_memory = std::make_shared<mkldnn::memory>(workspace_md);

dev_ctx.SetBlob(key_workspace_memory, workspace_memory);
This workspace_memory would always be allocated in the forward pass, which is not optimal for performance.
@tensor-tang, you are right: it is allocated in the forward pass and used in the backward pass. What is your suggestion? Would you like it allocated only once, when workspace memory is not yet present in the device context?
I think you could try to get it first, then create it only if it's empty.
dev_ctx.SetBlob(key_workspace_memory, workspace_memory);

auto forward_op = mkldnn::lrn_forward{*forward_pd, *src_memory,
                                      *workspace_memory, dst_memory};
BTW, this memory and function are used only in the training phase; maybe you should add a TODO to optimize it for when you can get the phase, as a reminder.
@tensor-tang, good point! Do you know how to check whether we are in the training phase or the scoring phase?
You can follow the same approach as batch_norm_op, which uses is_test:
Paddle/paddle/fluid/operators/batch_norm_op.cc
Lines 175 to 182 in 0821ee7
if (!is_test) {
  // saved_xx is use just in this batch of data
  EigenVectorArrayMap<T> saved_mean_e(
      saved_mean->mutable_data<T>(ctx.GetPlace()), C);
  EigenVectorArrayMap<T> saved_variance_e(
      saved_variance->mutable_data<T>(ctx.GetPlace()), C);
  saved_mean_e.setZero();
  saved_variance_e.setZero();
@luotao1 @tensor-tang, I do see your point, but in the case of LRN, is_test would become part of the user interface not only for MKLDNN LRN but also for other implementations of LRN. Do you think adding such an attribute would be justified for the other implementations of the LRN operator?
You could make the same case for other operators as well. We could end up in a situation where each operator sets its own local is_test attribute to maintain information about the phase, when this information is in fact global (global as in being maintained on the network level). Would it make more sense to maintain phase information in the network executor?
I could add a TODO comment saying that the blob should not be allocated when PaddlePaddle is in the testing phase.
> Do you think adding such attribute would be justified for other implementations of LRN operator?

Yes, "is_test" is also suited for other implementations of the LRN operator.

> You could make such case for other operators as well. We could end up in the situation when each operator sets its own local is_test attribute to maintain information about phase, when this information is in fact global (global as being maintain on the network level). Would it make more sense to maintain phase information in network executor?

Now, framework/prune.cc uses this attribute in the inference phase:
Paddle/paddle/fluid/framework/prune.cc
Lines 187 to 197 in 0821ee7
void inference_optimize_impl(proto::ProgramDesc* input, int block_id) {
  auto* op_field = input->mutable_blocks(block_id)->mutable_ops();
  for (auto& op_desc : *op_field) {
    for (auto& attr : *op_desc.mutable_attrs()) {
      if (attr.name() == "is_test") {
        attr.set_b(true);
        break;
      }
    }
  }
}
> I could add TODO comment saying that the blob should not be allocated when PaddlePaddle is in testing phase.

Yes, you can either add a TODO comment or fix it in this PR.
@luotao1, thanks for your reply. I will add the attribute to the operator.
But wouldn't it be more useful to maintain testing/training information on the network level, instead of keeping it on the operator level?
Now we maintain testing/training information on the network level as well:
Paddle/python/paddle/fluid/framework.py
Lines 959 to 982 in 0821ee7
def clone(self, for_test=False):
    """Clone the Program object
    Set for_test to False when we want to clone the program for training.
    Set for_test to True when we want to clone the program for testing.
    Args:
        for_test(bool): Some operators, such as batch_norm and drop_out ops,
            behave differently in training and testing. If for_test is True,
            the is_test attributes in these operators will be set to True for
            testing purposes, otherwise, they remain unchanged.
    Returns(Program):
        The cloned Program object.
    """
    p = Program()
    if for_test:
        p.desc = core.inference_optimize(self.desc)
    else:
        p.desc = core.ProgramDesc(self.desc)
    p.blocks = [Block(p, i) for i in xrange(self.desc.num_blocks())]
    p.sync_with_cpp()
    p.copy_param_info_from(self)
    return p
But if the operator level doesn't have is_test information, how could this op behave differently in the test and train phases?
LGTM.
I just added some changes regarding the is_test attribute and a unittest for it. Could they also be merged?
This PR contains an implementation of LRN optimized with MKLDNN. It also contains unittests for the operator.