
Merge mkldnn output grad #4759

Merged 5 commits, Oct 15, 2017
Conversation

@tensor-tang (Contributor) commented Oct 12, 2017

fix #4697

  • add merge output grad for branches
  • add gtest comparing with the CPU network; only forward is supported yet

@tensor-tang changed the title from "[WIP] Merge mkldnn output grad" to "Merge mkldnn output grad" on Oct 13, 2017
cpuOutVal_ = out;
}
// when output is cpu device, change the mkldnn output value and make they
Reviewer (Contributor): they -> them

@tensor-tang (Author): thx, done

* and reset the merge grad primitive if needed.
* note: when this layer have serval output,
* do not support mixing with cpu device,
* because can not get memory desc from cpu device.
Reviewer (Contributor): Suggest rewording: "when this layer has several outputs, it cannot be mixed with the cpu device, since the memory desc cannot be obtained from the cpu device."

@tensor-tang (Author): thx, done


auto sumPD = mkldnn::sum::primitive_desc(
    tmpOutGrad_->getMemoryDesc(), scales, srcPDs);
mergeGrad_.reset(new mkldnn::sum(sumPD, srcs, *tmpOutGrad_));
Reviewer (Contributor): This is calling the sum interface to merge the grads, right?

@tensor-tang (Author): Yes, it calls mkldnn::sum to do the merge.
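The exchange above can be illustrated with a small stand-alone sketch. The function below is hypothetical (plain C++, not the MKL-DNN API): it computes the same element-wise weighted sum over branch gradients that `mkldnn::sum` performs over its source memories, with one scale per source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the merge-grad step: like mkldnn::sum, it
// accumulates an element-wise weighted sum of several source buffers
// (one output grad per branch) into a single merged gradient.
std::vector<float> mergeOutputGrad(
    const std::vector<std::vector<float>>& branchGrads,
    const std::vector<double>& scales) {
  assert(!branchGrads.empty());
  assert(branchGrads.size() == scales.size());
  std::vector<float> merged(branchGrads[0].size(), 0.0f);
  for (size_t i = 0; i < branchGrads.size(); ++i) {
    assert(branchGrads[i].size() == merged.size());
    for (size_t j = 0; j < merged.size(); ++j) {
      merged[j] += static_cast<float>(scales[i]) * branchGrads[i][j];
    }
  }
  return merged;
}
```

With all scales set to 1.0 (as in this PR), merging the grads {1, 2} and {3, 4} from two branches yields {4, 6}.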

if (outputMap_.size() <= 1) {
  return;
}
std::vector<double> scales;
Reviewer (Contributor): The scales here can be initialized to 1.0 directly; there is no need to push them one by one afterwards. I see they are all 1.0.

@tensor-tang (Author): No problem.
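The suggestion can be sketched as follows; `makeScales` is a hypothetical helper name used only for illustration, not a function in the PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper illustrating the review suggestion: instead of
//   for (size_t i = 0; i < n; ++i) scales.push_back(1.0);
// construct the vector with every entry already set to 1.0 in one step.
std::vector<double> makeScales(size_t n) {
  return std::vector<double>(n, 1.0);
}
```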

${CMAKE_CURRENT_BINARY_DIR}/test_CompareMKLDNNandCPU
--config_file_a=trainer/tests/sample_trainer_config_branch_net.conf --use_mkldnn_a=True
--config_file_b=trainer/tests/sample_trainer_config_branch_net.conf --use_mkldnn_b=False
--use_gpu=False
Reviewer (Contributor): Will more tests comparing networks be added later?
Can the COMMAND here be wrapped in CMake to simplify lines 45-56? E.g.:
test_CompareMKLDNNandCPU --config_file=trainer/tests/sample_trainer_config_branch_net.conf

@tensor-tang (Author): The current plan is to split tests by kind, networks with branches and without; when a new layer is added, we only need to change the content of the conf file.

It can indeed be simplified, done.
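One way the repeated COMMAND could be wrapped is sketched below; the macro name, the test name, and the use of `add_test` are assumptions for illustration, not the actual change in this PR.

```cmake
# Hypothetical CMake helper: runs the MKLDNN-vs-CPU comparison binary
# on a given trainer config, so each new config is a one-line call.
macro(compare_mkldnn_with_cpu conf_file)
  add_test(NAME test_CompareMKLDNNandCPU
    COMMAND ${CMAKE_CURRENT_BINARY_DIR}/test_CompareMKLDNNandCPU
        --config_file_a=trainer/tests/${conf_file} --use_mkldnn_a=True
        --config_file_b=trainer/tests/${conf_file} --use_mkldnn_b=False
        --use_gpu=False)
endmacro()

compare_mkldnn_with_cpu(sample_trainer_config_branch_net.conf)
```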

@luotao1 (Contributor) left a comment: LGTM

@luotao1 merged commit 17b4cea into PaddlePaddle:develop Oct 15, 2017
@tensor-tang moved this from Doing to Done in Optimization on Intel Platform Oct 16, 2017
@tensor-tang deleted the merge_grad branch October 16, 2017 01:35
Successfully merging this pull request may close these issues: Merge topdiffs before backward