Implementation of MKLDNN LRN #9123
Conversation
@@ -87,5 +87,15 @@ def test_check_grad_normal(self):
        self.check_grad(['X'], 'Out', max_relative_error=0.01)

class TestLRNMKLDNNOp(TestLRNOp):
There are two algorithms, ACROSS_CHANNELS and WITHIN_CHANNEL, but TestLRNMKLDNNOp here only tests the default one.
Thanks for this remark, @luotao1. PaddlePaddle seems to implement LRN that works across channels. For MKLDNN, the WITHIN_CHANNEL option does not work in the backward pass.
I decided to remove the "algorithm" attribute and fix the LRN algorithm to ACROSS_CHANNELS.
paddle/fluid/operators/lrn_op.cc
Outdated
framework::DataLayout layout_ = framework::StringToDataLayout(data_format);
return framework::OpKernelType(
    framework::ToDataType(ctx.Input<Tensor>("X")->type()), ctx.GetPlace(),
    layout_, library_);
The GetExpectedKernelType of LRNOp and LRNGradOp are the same; could these functions be fused?
I moved the content of this method to a standalone function in an anonymous namespace and reuse it in both LRNOp and LRNGradOp.
LGTM. @tensor-tang Could you help review lrn_mkldnn_op.cc?
auto input_data = x->data<T>();
auto output_data = out->mutable_data<T>(ctx.GetPlace());
mid->mutable_data<T>(ctx.GetPlace());
What is this MidOut for in MKL-DNN implementations?
According to the documentation of the LRN operator, MidOut is a middle result of the LRN operator that is computed in the forward pass and used in the backward pass. In practice, the naive CPU implementation sets it to the value of attribute k, and the MidOut tensor is validated by the LRN unit tests.
It's not really needed by the MKLDNN primitives. However, it's a part of the LRN interface, so I decided to fill it in the same way as it's done in the naive CPU implementation.
If it's not needed by the MKLDNN primitives, you can use the following approach: "MaxIndex" is only used when "pooltype" == "MAX" in sequence_pool_op:
Paddle/paddle/fluid/operators/sequence_pool_op.cc
Lines 30 to 34 in 0821ee7
if (ctx->Attrs().Get<std::string>("pooltype") == "MAX") {
  PADDLE_ENFORCE(ctx->HasOutput("MaxIndex"),
                 "Output(MaxIndex) of SequencePoolOp should not be null.");
  ctx->SetOutputDim("MaxIndex", ctx->GetInputDim("X"));
}
const float k = ctx.Attr<float>("k");

auto e_mid = framework::EigenTensor<T, 4>::From(*mid);
e_mid = e_mid.constant(k);
What is e_mid? And why do we need it?
It's an Eigen TensorMap mapped onto the data of the MidOut tensor. It's used to fill MidOut with the value of attribute k.
auto workspace_md = forward_pd->workspace_primitive_desc();
auto workspace_memory = std::make_shared<mkldnn::memory>(workspace_md);

dev_ctx.SetBlob(key_workspace_memory, workspace_memory);
This workspace_memory would always be allocated in the forward pass, which is not optimal for performance.
@tensor-tang, you are right: it is allocated in the forward pass and used in the backward pass. What is your suggestion? Would you like it allocated only once, when workspace memory is not yet present in the device context?
I think you could try to get it first, then create it only if it's empty.
dev_ctx.SetBlob(key_workspace_memory, workspace_memory);

auto forward_op = mkldnn::lrn_forward{*forward_pd, *src_memory,
                                      *workspace_memory, dst_memory};
BTW, this memory and function are used only in the training phase; maybe you should add a TODO to optimize it for when you can get the phase, as a reminder.
@tensor-tang, good point! Do you know how to check whether we are in the training phase or the scoring phase?
You can follow the same approach as batch_norm_op, which uses is_test:
Paddle/paddle/fluid/operators/batch_norm_op.cc
Lines 175 to 182 in 0821ee7
if (!is_test) {
  // saved_xx is use just in this batch of data
  EigenVectorArrayMap<T> saved_mean_e(
      saved_mean->mutable_data<T>(ctx.GetPlace()), C);
  EigenVectorArrayMap<T> saved_variance_e(
      saved_variance->mutable_data<T>(ctx.GetPlace()), C);
  saved_mean_e.setZero();
  saved_variance_e.setZero();
@luotao1 @tensor-tang, I do see your point, but in the case of LRN, is_test would become part of the user interface not only for MKLDNN LRN but also for other implementations of LRN. Do you think adding such an attribute would be justified for the other implementations of the LRN operator?
You could make the same case for other operators as well. We could end up in a situation where each operator sets its own local is_test attribute to maintain information about the phase, when this information is in fact global (global as in being maintained on the network level). Would it make more sense to maintain phase information in the network executor?
I could add a TODO comment saying that the blob should not be allocated when PaddlePaddle is in the testing phase.
> Do you think adding such attribute would be justified for other implementations of LRN operator?

Yes, "is_test" is also suited for other implementations of the LRN operator.

> You could make such case for other operators as well. We could end up in the situation when each operator sets its own local is_test attribute to maintain information about phase, when this information is in fact global (global as being maintain on the network level). Would it make more sense to maintain phase information in network executor?

Now, framework/prune.cc uses this attribute in the inference phase:
Paddle/paddle/fluid/framework/prune.cc
Lines 187 to 197 in 0821ee7
void inference_optimize_impl(proto::ProgramDesc* input, int block_id) {
  auto* op_field = input->mutable_blocks(block_id)->mutable_ops();
  for (auto& op_desc : *op_field) {
    for (auto& attr : *op_desc.mutable_attrs()) {
      if (attr.name() == "is_test") {
        attr.set_b(true);
        break;
      }
    }
  }
}
> I could add TODO comment saying that the blob should not be allocated when PaddlePaddle is in testing phase.

Yes, you can either add a TODO comment or fix it in this PR.
@luotao1, thanks for your reply. I will add the attribute to the operator.
But wouldn't it be more useful to maintain testing/training information on the network level, instead of keeping it on the operator level?
Now we maintain testing/training information on the network level as well:
Paddle/python/paddle/fluid/framework.py
Lines 959 to 982 in 0821ee7
def clone(self, for_test=False):
    """Clone the Program object
    Set for_test to False when we want to clone the program for training.
    Set for_test to True when we want to clone the program for testing.
    Args:
        for_test(bool): Some operators, such as batch_norm and drop_out ops,
            behave differently in training and testing. If for_test is True,
            the is_test attributes in these operators will be set to True for
            testing purposes, otherwise, they remain unchanged.
    Returns(Program):
        The cloned Program object.
    """
    p = Program()
    if for_test:
        p.desc = core.inference_optimize(self.desc)
    else:
        p.desc = core.ProgramDesc(self.desc)
    p.blocks = [Block(p, i) for i in xrange(self.desc.num_blocks())]
    p.sync_with_cpp()
    p.copy_param_info_from(self)
    return p
But if the operator level doesn't have is_test information, how could this op behave differently in the test and train phases?
LGTM.
I just added some changes regarding the is_test attribute and a unittest for it. Could they also be merged?
This PR contains an implementation of LRN optimized with MKLDNN. It also contains unittests for the operator.