fea/infer executor #13451

Superjomn · 2018-09-18T03:06:02Z

- Add NaiveExecutor
- Add ZeroCopyTensor to AnalysisPredictor


CLAassistant · 2018-09-22T10:26:35Z

All committers have signed the CLA.


tensor-tang · 2018-09-26T06:02:02Z

paddle/fluid/memory/malloc.cc

@@ -36,8 +36,8 @@ namespace memory {
 using BuddyAllocator = detail::BuddyAllocator;

 BuddyAllocator* GetCPUBuddyAllocator() {
-  static std::once_flag init_flag;
-  static detail::BuddyAllocator* a = nullptr;
+  static thread_local std::once_flag init_flag;


This part probably needs to be reverted.
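The diff turns the shared `static std::once_flag` into `static thread_local std::once_flag`. In standard C++ a `thread_local` flag is a separate object in every thread, so `std::call_once` runs its initializer once per thread instead of once per process. The sketch below only demonstrates that language behavior with a counter; `InitShared`, `InitPerThread` and `RunInitializers` are hypothetical names, not code from `malloc.cc`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Count how many times each initializer body actually runs.
std::atomic<int> shared_inits{0};
std::atomic<int> per_thread_inits{0};

// One flag for the whole process: the body runs exactly once.
void InitShared() {
  static std::once_flag flag;
  std::call_once(flag, [] { ++shared_inits; });
}

// One flag per thread: each new thread sees a fresh flag,
// so the body runs once in every thread that calls this.
void InitPerThread() {
  static thread_local std::once_flag flag;
  std::call_once(flag, [] { ++per_thread_inits; });
}

// Calls both initializers twice from each of `num_threads` threads.
void RunInitializers(int num_threads) {
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([] {
      for (int k = 0; k < 2; ++k) {
        InitShared();
        InitPerThread();
      }
    });
  }
  for (auto& t : workers) t.join();
}
```

With four worker threads the shared initializer runs once while the per-thread one runs four times, which is why switching the flag to `thread_local` changes how many allocators get built.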

tensor-tang · 2018-09-26T06:03:00Z

paddle/fluid/framework/scope.cc

@@ -49,18 +49,13 @@ int64_t GetEagerDeletionThreshold() {
 Scope::~Scope() { DropKids(); }

 Scope& Scope::NewScope() const {
-  std::unique_lock<std::mutex> lock(mutex_);


The scope lock doesn't seem to have much to do with this change, does it?
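The hunk removes the `std::unique_lock<std::mutex>` from `Scope::NewScope() const`. In a scope tree, creating a child typically means appending it to the parent's list of kids, and that append must be serialized if several threads can create children of the same parent at once. A minimal sketch of the locked version, using a hypothetical `MiniScope` class (not Paddle's `Scope`) whose members are `mutable` so a `const` method can lock and modify them:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified scope tree; not Paddle's Scope.
class MiniScope {
 public:
  // Creates a child scope and records it in this scope's kid list.
  // The lock serializes concurrent appends to kids_.
  MiniScope& NewScope() const {
    std::unique_lock<std::mutex> lock(mutex_);
    kids_.push_back(std::unique_ptr<MiniScope>(new MiniScope()));
    return *kids_.back();
  }

  std::size_t NumKids() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return kids_.size();
  }

 private:
  // mutable so that the const NewScope() can modify them.
  mutable std::list<std::unique_ptr<MiniScope>> kids_;
  mutable std::mutex mutex_;
};

// Each of `num_threads` threads creates `per_thread` children of `root`.
std::size_t CreateKidsConcurrently(const MiniScope& root, int num_threads,
                                   int per_thread) {
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&root, per_thread] {
      for (int k = 0; k < per_thread; ++k) root.NewScope();
    });
  }
  for (auto& t : workers) t.join();
  return root.NumKids();
}
```

Without the lock, concurrent `push_back` calls on the same list are a data race; with it, every child is recorded.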

tensor-tang · 2018-09-26T06:10:15Z

paddle/fluid/inference/api/analysis_predictor.cc

+  }
+
+  for (size_t i = 0; i < inputs.size(); ++i) {
+    framework::LoDTensor input;


Is this part still done the old way? It should probably be replaced.


luotao1 · 2018-09-27T12:50:02Z

paddle/fluid/framework/CMakeLists.txt

@@ -141,12 +141,15 @@ cc_library(lod_rank_table SRCS lod_rank_table.cc DEPS lod_tensor)

 cc_library(feed_fetch_method SRCS feed_fetch_method.cc DEPS lod_tensor scope glog)

+cc_library(naive_executor SRCS naive_executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass)


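Shouldn't line 144 go inside if (NOT WITH_DISTRIBUTE), i.e. after line 151? That would leave the distributed build unaffected.

Even with WITH_DISTRIBUTE, the inference unit tests still have to run, and they need naive_executor.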

luotao1 · 2018-09-27T12:50:34Z

paddle/fluid/framework/CMakeLists.txt

 if(WITH_DISTRIBUTE)
  cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method sendrecvop_grpc cares grpc++_unsecure grpc_unsecure gpr graph_to_program_pass)
  set(DISTRIBUTE_COMPILE_FLAGS "-Wno-non-virtual-dtor -Wno-error=non-virtual-dtor -Wno-error=delete-non-virtual-dtor")
  set_source_files_properties(executor.cc PROPERTIES COMPILE_FLAGS ${DISTRIBUTE_COMPILE_FLAGS})
 else()
  cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass)
+  cc_test(test_naive_executor SRCS naive_executor_test.cc DEPS naive_executor op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass elementwise_add_op)


This can be simplified to:
cc_test(test_naive_executor SRCS naive_executor_test.cc DEPS naive_executor elementwise_add_op)

luotao1 · 2018-09-27T12:53:40Z

paddle/fluid/framework/naive_executor.h

+ * Simple, intuitive and effective. Only single thread is supported, and
+ * currently designed for inference.
+ */
+class NaiveExecutor {


Would InferenceExecutor be a more appropriate name for NaiveExecutor?

It is not necessarily for inference only.
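The header describes NaiveExecutor as simple and single-threaded. The core idea can be sketched as: prepare the op list once, then have every `Run()` execute the ops in program order against one scope, in the calling thread, with no scheduling. Everything below (`ToyScope`, `ToyOp`, `ToyElementwiseAdd`, `ToyNaiveExecutor`) is hypothetical; the real class works on `framework::Scope` and program blocks, and this is not its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scope: variable name -> float vector.
using ToyScope = std::map<std::string, std::vector<float>>;

// Hypothetical operator interface.
struct ToyOp {
  virtual ~ToyOp() = default;
  virtual void Run(ToyScope* scope) const = 0;
};

// out = x + y, element by element (equal shapes assumed).
struct ToyElementwiseAdd : ToyOp {
  ToyElementwiseAdd(std::string x, std::string y, std::string out)
      : x_(std::move(x)), y_(std::move(y)), out_(std::move(out)) {}
  void Run(ToyScope* scope) const override {
    const auto& x = scope->at(x_);
    const auto& y = scope->at(y_);
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + y[i];
    (*scope)[out_] = std::move(out);
  }
  std::string x_, y_, out_;
};

// Ops are prepared once; each Run() executes them sequentially on one scope.
class ToyNaiveExecutor {
 public:
  void Prepare(std::vector<std::unique_ptr<ToyOp>> ops, ToyScope* scope) {
    ops_ = std::move(ops);
    scope_ = scope;
  }
  void Run() const {
    for (const auto& op : ops_) op->Run(scope_);
  }

 private:
  std::vector<std::unique_ptr<ToyOp>> ops_;
  ToyScope* scope_ = nullptr;
};
```

Nothing in this loop is tied to inference, which matches the reply that the executor is not necessarily inference-only.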

luotao1 · 2018-09-27T13:12:35Z

paddle/fluid/inference/tests/api/analyzer_rnn1_tester.cc

@@ -214,7 +271,229 @@ TEST(Analyzer_rnn1, multi_thread) {

  std::vector<std::vector<PaddleTensor>> input_slots_all;
  SetInput(&input_slots_all);
-  TestPrediction(cfg, input_slots_all, &outputs, 4 /* num_threads */);
+  TestPrediction(cfg, input_slots_all, &outputs, FLAGS_num_threads);


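This should stay 4; otherwise CI uses the default FLAGS_num_threads=1 and the multi-threaded behavior is never exercised.

It should at least be possible to control it manually. Multi-threading on CI doesn't accomplish much, since there is no clone at the moment.

Manual control can be set up through TEST(Analyzer_rnn1, profile). Multi-threading on CI is still needed: the multi-threaded case used to fail to run.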
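The disagreement is about whether the multi-thread test hard-codes 4 threads or reads `FLAGS_num_threads`. Either way, the check itself follows a common pattern: run the same prediction in N threads, each with its own state, and confirm every thread reproduces the single-thread reference. `RunInThreads`, `AllMatch` and the `predict` callable are hypothetical; the real tester goes through `TestPrediction` and gflags.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Runs `predict` once in each of `num_threads` threads and collects outputs.
// Each thread writes only to its own slot, so no locking is needed.
std::vector<std::vector<float>> RunInThreads(
    int num_threads, const std::function<std::vector<float>(int)>& predict) {
  std::vector<std::vector<float>> outputs(num_threads);
  std::vector<std::thread> workers;
  for (int tid = 0; tid < num_threads; ++tid) {
    workers.emplace_back(
        [&outputs, &predict, tid] { outputs[tid] = predict(tid); });
  }
  for (auto& t : workers) t.join();
  return outputs;
}

// True if every thread produced exactly the reference output.
bool AllMatch(const std::vector<std::vector<float>>& outputs,
              const std::vector<float>& reference) {
  for (const auto& out : outputs) {
    if (out != reference) return false;
  }
  return true;
}
```

With a thread count of 1 this harness degenerates to a single-thread run, which is the CI concern raised above.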

luotao1 · 2018-09-27T13:14:49Z

paddle/fluid/inference/tests/api/analyzer_rnn1_tester.cc

+              cell_init_tensor->mutable_data<float>(PaddlePlace::kCPU));
+  std::copy_n(zeros.begin(), zeros.size(),
+              hidden_init_tensor->mutable_data<float>(PaddlePlace::kCPU));
+  ZeroCopyTensorAssignData(data_tensor, one_batch.rnn_link_data);


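Will all unit tests have to use ZeroCopyTensorAssignData to fill their inputs from now on?

This is the fill for ZeroCopy. If you are not using ZeroCopyTensor it shouldn't be needed; the original approach works fine.
The interface is not stable yet, so we'll revisit this later.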
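The test fills its inputs by copying straight into the buffer returned by `mutable_data<float>(PaddlePlace::kCPU)`, with no intermediate `PaddleTensor`. A helper in that spirit can be sketched as below. `FakeZeroCopyTensor` is a hypothetical stand-in that models only a reshape and a raw data pointer, and `AssignData` is not the `ZeroCopyTensorAssignData` from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a zero-copy tensor: it owns the buffer the
// predictor will read and hands out a raw pointer for callers to fill.
class FakeZeroCopyTensor {
 public:
  void Reshape(const std::vector<int>& shape) { shape_ = shape; }
  float* mutable_data() {
    std::size_t n = 1;
    for (int d : shape_) n *= static_cast<std::size_t>(d);
    buffer_.resize(n);  // size the buffer from the current shape
    return buffer_.data();
  }
  const std::vector<float>& buffer() const { return buffer_; }

 private:
  std::vector<int> shape_;
  std::vector<float> buffer_;
};

// Writes `data` directly into the tensor's own memory, so the predictor
// can read it without another copy. A 1-D shape is assumed for brevity.
void AssignData(FakeZeroCopyTensor* tensor, const std::vector<float>& data) {
  tensor->Reshape({static_cast<int>(data.size())});
  std::copy_n(data.begin(), data.size(), tensor->mutable_data());
}
```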

tensor-tang · 2018-09-27T13:18:04Z

paddle/fluid/inference/analysis/CMakeLists.txt

@@ -1,6 +1,6 @@
 cc_library(ir_pass_manager SRCS ir_pass_manager.cc DEPS graph pass)
 set(analysis_deps
-    framework_proto proto_desc ir_pass_manager graph pass paddle_fluid_api executor pretty_log)
+        framework_proto proto_desc ir_pass_manager graph pass paddle_fluid_api executor pretty_log)


This indentation change is unnecessary.

tensor-tang · 2018-09-27T13:19:59Z

paddle/fluid/inference/CMakeLists.txt

@@ -53,7 +53,7 @@ if(NOT APPLE)
 endif()

 if(WITH_TESTING)
-  # tests/book depends the models that generated by python/paddle/fluid/tests/book
+    # tests/book depends the models that generated by python/paddle/fluid/tests/book


Unnecessary indentation.

Let's change it in a follow-up PR.

tensor-tang · 2018-09-27T13:21:18Z

paddle/fluid/framework/ir/CMakeLists.txt

-if(WITH_MKLDNN)
-  pass_library(conv_relu_mkldnn_fuse_pass inference)
-endif()
+if (WITH_MKLDNN)


This space is unnecessary; the same applies below.

tensor-tang · 2018-09-27T13:24:02Z

paddle/fluid/inference/api/analysis_predictor.cc

-    inference_program_ = paddle::inference::Load(
-        executor_.get(), scope_.get(), config_.prog_file, config_.param_file);
+  if (!program) {
+    if (!LoadProgramDesc()) return false;


The formatting of this if needs to be fixed.

tensor-tang · 2018-09-27T13:26:38Z

paddle/fluid/inference/api/analysis_predictor.cc

+    }
+
+    // TODO(panyx0718): Init LoDTensor from existing memcpy to save a copy.
+    std::memcpy(static_cast<void *>(input_ptr), inputs[i].data.data(),


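Isn't zero-copy used here?

This is the old interface, which has to accept PaddleTensor for backward compatibility. Zero-copy is exposed to users through ZeroCopyTensor, which is a separate set of interfaces.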
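The two input paths differ in who owns the memory. On the compatibility path the user's bytes live in their own buffer and are `memcpy`'d into the executor's tensor on every run; on the zero-copy path the user is handed the executor's buffer and writes into it directly. The sketch contrasts the two with hypothetical `UserTensor` and `InternalTensor` types standing in for `PaddleTensor` and the scope's `LoDTensor`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical user-side tensor (plays the role of PaddleTensor).
struct UserTensor {
  std::vector<float> data;
};

// Hypothetical executor-side tensor (plays the role of the scope's LoDTensor).
struct InternalTensor {
  std::vector<float> buffer;
  float* mutable_data(std::size_t n) {
    buffer.resize(n);
    return buffer.data();
  }
};

// Copy path: user memory is copied into internal memory on every run.
void FeedByCopy(const UserTensor& in, InternalTensor* feed) {
  float* dst = feed->mutable_data(in.data.size());
  std::memcpy(dst, in.data.data(), in.data.size() * sizeof(float));
}

// Zero-copy path: the user receives the internal buffer itself to fill.
float* ExposeForZeroCopy(InternalTensor* feed, std::size_t n) {
  return feed->mutable_data(n);
}
```

On the copy path the values match but live at two addresses; on the zero-copy path the pointer the user writes through is the executor's own storage.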

tensor-tang · 2018-09-27T13:27:29Z

paddle/fluid/inference/api/analysis_predictor.cc

+  output->data.Resize(num_elems * sizeof(T));
+  // The fetched tensor output by fetch op, should always in CPU memory, so just
+  // copy.
+  memcpy(output->data.data(), data, num_elems * sizeof(T));
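This hunk resizes the output buffer to `num_elems * sizeof(T)` bytes and `memcpy`s the fetched values into it, relying on the fetch op's output always being in CPU memory. A self-contained sketch of that byte-level copy follows; `ByteBuf` is a hypothetical stand-in for the untyped output buffer and `CopyFetched` is not the function in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an untyped output buffer.
struct ByteBuf {
  std::vector<char> bytes;
  void Resize(std::size_t n) { bytes.resize(n); }
  void* data() { return bytes.data(); }
};

// Copies `num_elems` values of type T from host memory into `out`.
// Assumes `src` already lives in CPU memory, so a plain memcpy is valid.
template <typename T>
void CopyFetched(const T* src, std::size_t num_elems, ByteBuf* out) {
  out->Resize(num_elems * sizeof(T));
  std::memcpy(out->data(), src, num_elems * sizeof(T));
}
```

If the fetched tensor could live on a GPU, a plain `memcpy` would no longer be valid and a device-to-host copy would be needed instead.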


luotao1 · 2018-09-27T13:32:20Z

paddle/fluid/inference/api/analysis_predictor.cc

-  if (config_._use_mkldnn) {
-    executor_->EnableMKLDNN(*inference_program_);
+  // get fetch variable
+  if (!GetFetch(output_data, scope)) {


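If config_._use_mkldnn is removed here, none of the current MKLDNN cases will run anymore. Could we remove it only after EnableMKLDNNPass is complete?

Let's add it back as needed after this PR; the priority is getting these features merged into 1.0.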
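The removed branch called `executor_->EnableMKLDNN(*inference_program_)` when `config_._use_mkldnn` was set, and a later commit in this PR is titled "recover use_mkldnn flag". A minimal sketch of that kind of config-gated switch, assuming MKLDNN kernels are selected through a per-op boolean `use_mkldnn` attribute; `ToyOpDesc`, `ToyConfig` and `MaybeEnableMKLDNN` are hypothetical and this is not Paddle's `EnableMKLDNN`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical op description: an op type plus boolean attributes.
struct ToyOpDesc {
  std::string type;
  std::map<std::string, bool> attrs;
};

// Hypothetical predictor config carrying the flag under discussion.
struct ToyConfig {
  bool use_mkldnn = false;
};

// When the flag is on, switch every op that declares a "use_mkldnn"
// attribute to its MKLDNN kernel. Returns the number of ops switched.
int MaybeEnableMKLDNN(const ToyConfig& config, std::vector<ToyOpDesc>* ops) {
  if (!config.use_mkldnn) return 0;
  int switched = 0;
  for (auto& op : *ops) {
    auto it = op.attrs.find("use_mkldnn");
    if (it != op.attrs.end()) {
      it->second = true;
      ++switched;
    }
  }
  return switched;
}
```

Keeping the check at predictor initialization means MKLDNN users keep working until an equivalent pass is ready to replace it.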

luotao1

LGTM. Some features will be completed in follow-up PRs.


Superjomn added 15 commits September 11, 2018 07:46

- update (ec0cddf)
- update (1f8f3e4)
- update (7be81ed)
- update (1f10465)
- finish coding (9b24dc7)
- update (409239c)
- Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fea/infer-executor (8e83d13)
- clean code (96cb893)
- clean code (3ec4687)
- update (f976a33)
- Merge branch 'develop' into fea/infer-executor (d9b065d)
- code clean (76cf652)
- Merge branch 'fea/infer-executor' of https://github.com/Superjomn/Paddle into fea/infer-executor (9f24533)
- update (fd421d6)
- update (3f275e8)
- fixed precision error (45dfdf9)

Superjomn force-pushed the fea/infer-executor branch from bae8e7b to 45dfdf9 Compare September 25, 2018 03:35

Superjomn added 7 commits September 25, 2018 03:47

- Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fea/infer-executor (b5a376f)
- fix conflicts (76a5f7f)
- update (439afda)
- update (9f54699)
- update (5422d55)
- Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fea/infer-executor (39c42cb)
- exp (3647712)

tensor-tang reviewed Sep 26, 2018


Superjomn added 2 commits September 26, 2018 07:33

- update (6b5bd44)
- update [test=develop] (435fa1f)

Superjomn force-pushed the fea/infer-executor branch from ff17bbe to 435fa1f Compare September 26, 2018 12:10

Superjomn added 6 commits September 26, 2018 12:15

- Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fea/infer-executor (fd69b67)
- code clean [test=develop] (828018d)
- recover use_mkldnn flag [test=develop] (ee75ad3)
- Merge branch 'fea/infer-executor' of https://github.com/Superjomn/Paddle into fea/infer-executor (aee43f7)
- fix code style [test=develop] (4bdffdb)
- code clean (f1b0fea)

Superjomn force-pushed the fea/infer-executor branch from 8558dd3 to a0aa6d7 Compare September 27, 2018 02:21

- recover code [test=develop] (af5c86c)

Superjomn force-pushed the fea/infer-executor branch from a0aa6d7 to af5c86c Compare September 27, 2018 08:37

- add an if to make DeviceContextPool fine for concurrency [test=develop] (2001f9b)

luotao1 reviewed Sep 27, 2018


tensor-tang reviewed Sep 27, 2018


luotao1 reviewed Sep 27, 2018


luotao1 approved these changes Sep 27, 2018


Superjomn merged commit c8744d1 into PaddlePaddle:develop Sep 28, 2018

Superjomn deleted the fea/infer-executor branch September 28, 2018 04:51

luotao1 pushed a commit to Superjomn/Paddle that referenced this pull request Sep 28, 2018

fea/infer executor and concurrency performance issue bug fix (PaddlePaddle#13451) (952b11b)
- add naive executor
- fix concurrency performance issue
