
Implementation of MKLDNN LRN #9123

Merged
4 commits merged into PaddlePaddle:develop on Mar 22, 2018
Conversation

tpatejko

This PR contains an implementation of LRN optimized with MKLDNN. It also contains unit tests for the operator.

@CLAassistant

CLAassistant commented Mar 15, 2018

CLA assistant check
All committers have signed the CLA.

@luotao1 luotao1 added the Intel label Mar 15, 2018
@@ -87,5 +87,15 @@ def test_check_grad_normal(self):
        self.check_grad(['X'], 'Out', max_relative_error=0.01)


class TestLRNMKLDNNOp(TestLRNOp):
Contributor

There are two algorithms, ACROSS_CHANNELS and WITHIN_CHANNEL, but TestLRNMKLDNNOp here only tests the default one.

Author

Thanks for this remark, @luotao1. PaddlePaddle seems to implement LRN so that it works across channels. For MKLDNN, the WITHIN_CHANNEL option does not work in the backward pass.

I decided to remove the "algorithm" attribute and fix the LRN algorithm to ACROSS_CHANNELS.
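
For context, a minimal sketch of what fixing the algorithm looks like when building the MKL-DNN forward descriptor is below; the variable names (src_md, n, alpha, beta, k, mkldnn_engine) are illustrative, not code quoted from this PR.

// Sketch only: the across-channels algorithm is hard-coded instead of being
// selected through an "algorithm" attribute.
auto forward_desc = mkldnn::lrn_forward::desc{
    mkldnn::prop_kind::forward_training,     // training, so a workspace exists
    mkldnn::algorithm::lrn_across_channels,  // fixed algorithm
    src_md,                                  // source memory descriptor
    n, alpha, beta, k};                      // local size and LRN parameters
auto forward_pd = std::make_shared<mkldnn::lrn_forward::primitive_desc>(
    forward_desc, mkldnn_engine);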

framework::DataLayout layout_ = framework::StringToDataLayout(data_format);
return framework::OpKernelType(
    framework::ToDataType(ctx.Input<Tensor>("X")->type()), ctx.GetPlace(),
    layout_, library_);
Contributor
@luotao1 luotao1 commented Mar 16, 2018

The GetExpectedKernelType of LRNOp and LRNGradOp are the same; could these functions be fused?

Author
@tpatejko tpatejko commented Mar 19, 2018

I moved the content of this method to a standalone function in an anonymous namespace and reuse it in both LRNOp and LRNGradOp.
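
As a rough illustration of that refactor, a minimal sketch is below. The helper name GetExpectedLRNKernel is hypothetical, and the library/data_format selection lines are assumed rather than quoted from the diff; only the return statement mirrors the snippet above.

namespace {  // file-local helper shared by LRNOp and LRNGradOp

framework::OpKernelType GetExpectedLRNKernel(
    const framework::ExecutionContext& ctx) {
  framework::LibraryType library_{framework::LibraryType::kPlain};
#ifdef PADDLE_WITH_MKLDNN
  // Assumed: prefer the MKLDNN kernel when it is available and usable.
  if (library_ == framework::LibraryType::kPlain &&
      platform::CanMKLDNNBeUsed(ctx)) {
    library_ = framework::LibraryType::kMKLDNN;
  }
#endif
  std::string data_format = ctx.Attr<std::string>("data_format");
  framework::DataLayout layout_ = framework::StringToDataLayout(data_format);
  return framework::OpKernelType(
      framework::ToDataType(ctx.Input<Tensor>("X")->type()), ctx.GetPlace(),
      layout_, library_);
}

}  // namespace

Both LRNOp::GetExpectedKernelType and LRNGradOp::GetExpectedKernelType could then simply return GetExpectedLRNKernel(ctx).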

@luotao1
Contributor

luotao1 commented Mar 19, 2018

LGTM. @tensor-tang Could you help review lrn_mkldnn_op.cc?


auto input_data = x->data<T>();
auto output_data = out->mutable_data<T>(ctx.GetPlace());
mid->mutable_data<T>(ctx.GetPlace());
Contributor

What is this MidOut for in MKL-DNN implementations?

Author

According to the documentation of the LRN operator, it's an intermediate result of the LRN operator that is computed in the forward pass and used in the backward pass. In practice, the naive CPU implementation sets it to the value of the attribute k, and the MidOut tensor is validated by the LRN unit tests.

It's not really needed by the MKLDNN primitives. However, it's part of the LRN interface, as an input tensor, so I decided to fill it in the same way as it's done in the naive CPU implementation.

Contributor

If it's not needed by the MKLDNN primitives, you can use the following approach: "MaxIndex" is only used when "pooltype" == "MAX" in sequence_pool_op:

if (ctx->Attrs().Get<std::string>("pooltype") == "MAX") {
  PADDLE_ENFORCE(ctx->HasOutput("MaxIndex"),
                 "Output(MaxIndex) of SequencePoolOp should not be null.");
  ctx->SetOutputDim("MaxIndex", ctx->GetInputDim("X"));
}
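
Transposed to this PR, a hypothetical version of that guard for MidOut could look like the sketch below; the use_mkldnn condition and the message text are assumptions, not code from this PR.

// Hypothetical sketch: only require and resize MidOut when the naive CPU
// path actually needs it, mirroring the MaxIndex handling above.
if (!ctx->Attrs().Get<bool>("use_mkldnn")) {
  PADDLE_ENFORCE(ctx->HasOutput("MidOut"),
                 "Output(MidOut) of LRNOp should not be null.");
  ctx->SetOutputDim("MidOut", ctx->GetInputDim("X"));
}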

const float k = ctx.Attr<float>("k");

auto e_mid = framework::EigenTensor<T, 4>::From(*mid);
e_mid = e_mid.constant(k);
Contributor

What is e_mid, and why do we need it?

Author

It's an Eigen TensorMap that is mapped onto the data of the MidOut tensor. It's used to fill MidOut with the value of the attribute k.

auto workspace_md = forward_pd->workspace_primitive_desc();
auto workspace_memory = std::make_shared<mkldnn::memory>(workspace_md);

dev_ctx.SetBlob(key_workspace_memory, workspace_memory);
Contributor

This workspace_memory is always allocated in the forward pass, which is not the best for performance.

Author
@tpatejko tpatejko commented Mar 19, 2018

@tensor-tang, you are right, it is allocated in the forward pass and used in the backward pass. What is your suggestion? Would you like to have it allocated only once, when the workspace memory is not present in the device context?

Contributor

I think you could try to get it first, then create it only if it's empty.
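
A minimal sketch of that suggestion, assuming the device context exposes a GetBlob counterpart to the SetBlob call used above:

// Look the workspace up first; allocate and cache it only when missing, so
// repeated forward passes reuse the same MKL-DNN workspace memory.
auto workspace_memory = std::static_pointer_cast<mkldnn::memory>(
    dev_ctx.GetBlob(key_workspace_memory));
if (workspace_memory == nullptr) {
  auto workspace_md = forward_pd->workspace_primitive_desc();
  workspace_memory = std::make_shared<mkldnn::memory>(workspace_md);
  dev_ctx.SetBlob(key_workspace_memory, workspace_memory);
}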

dev_ctx.SetBlob(key_workspace_memory, workspace_memory);

auto forward_op = mkldnn::lrn_forward{*forward_pd, *src_memory,
                                      *workspace_memory, dst_memory};
Contributor

By the way, this memory and function are used only in the training phase; maybe you should add a TODO to optimize this once you can get the phase, as a reminder.

Author

@tensor-tang, good point! Do you know how to check whether we are in the training phase or the scoring phase?

Contributor

You can follow the same approach as batch_norm_op, which uses is_test:

if (!is_test) {
  // saved_xx is used just for this batch of data
  EigenVectorArrayMap<T> saved_mean_e(
      saved_mean->mutable_data<T>(ctx.GetPlace()), C);
  EigenVectorArrayMap<T> saved_variance_e(
      saved_variance->mutable_data<T>(ctx.GetPlace()), C);
  saved_mean_e.setZero();
  saved_variance_e.setZero();

Author
@tpatejko tpatejko commented Mar 19, 2018

@luotao1 @tensor-tang, I do see your point, but in the case of LRN, is_test would become a part of the user interface not only for MKLDNN LRN, but also for the other implementations of LRN. Do you think adding such an attribute would be justified for the other implementations of the LRN operator?

You could make the same case for other operators as well. We could end up in a situation where each operator sets its own local is_test attribute to keep track of the phase, even though this information is in fact global (global in the sense of being maintained at the network level).

Would it make more sense to maintain the phase information in the network executor?

I could add a TODO comment saying that the blob should not be allocated when PaddlePaddle is in the testing phase.

Contributor

Do you think adding such an attribute would be justified for the other implementations of the LRN operator?

Yes, "is_test" is also suited for other implementation of LRN operators.

You could make the same case for other operators as well. We could end up in a situation where each operator sets its own local is_test attribute to keep track of the phase, even though this information is in fact global (global in the sense of being maintained at the network level). Would it make more sense to maintain the phase information in the network executor?

Currently, framework/prune.cc uses this attribute in the inference phase:

void inference_optimize_impl(proto::ProgramDesc* input, int block_id) {
  auto* op_field = input->mutable_blocks(block_id)->mutable_ops();
  for (auto& op_desc : *op_field) {
    for (auto& attr : *op_desc.mutable_attrs()) {
      if (attr.name() == "is_test") {
        attr.set_b(true);
        break;
      }
    }
  }
}

I could add a TODO comment saying that the blob should not be allocated when PaddlePaddle is in the testing phase.

Yes, you can either add a TODO comment or fix it in this PR.
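
A hedged sketch of what gating on the proposed is_test attribute could look like in the MKL-DNN LRN forward kernel; the attribute name follows batch_norm_op, and the surrounding variables come from the snippets quoted above.

// Sketch: only allocate and cache the workspace during training.
const bool is_test = ctx.Attr<bool>("is_test");
if (!is_test) {
  auto workspace_md = forward_pd->workspace_primitive_desc();
  auto workspace_memory = std::make_shared<mkldnn::memory>(workspace_md);
  dev_ctx.SetBlob(key_workspace_memory, workspace_memory);
}
// In the is_test case, the forward primitive (and its primitive descriptor)
// would be created without a workspace, using prop_kind::forward_inference.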

Author

@luotao1, thanks for your reply. I will add the attribute to the operator.

But would it be more useful to maintain the testing/training information at the network level, instead of keeping it at the operator level?

Contributor

We now maintain testing/training information at the network level as well:

def clone(self, for_test=False):
    """Clone the Program object.
    Set for_test to False when we want to clone the program for training.
    Set for_test to True when we want to clone the program for testing.
    Args:
        for_test(bool): Some operators, such as batch_norm and drop_out ops,
            behave differently in training and testing. If for_test is True,
            the is_test attributes in these operators will be set to True for
            testing purposes; otherwise, they remain unchanged.
    Returns(Program):
        The cloned Program object.
    """
    p = Program()
    if for_test:
        p.desc = core.inference_optimize(self.desc)
    else:
        p.desc = core.ProgramDesc(self.desc)
    p.blocks = [Block(p, i) for i in xrange(self.desc.num_blocks())]
    p.sync_with_cpp()
    p.copy_param_info_from(self)
    return p

But if the operator level doesn't have the is_test information, how could this op behave differently in the test and train phases?

Contributor
@luotao1 luotao1 left a comment

LGTM

@luotao1 luotao1 merged commit e027eb4 into PaddlePaddle:develop Mar 22, 2018
@tpatejko
Author

tpatejko commented Mar 22, 2018

@luotao1, @tensor-tang

I just added some changes regarding the is_test attribute, along with a unit test for it. Could they also be merged?
