Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fea/infer executor #13451

Merged
merged 33 commits into from Sep 28, 2018
Merged

Conversation

Superjomn
Copy link
Contributor

  • add NaiveExecutor
  • Add ZeroCopyTensor to AnalysisPredictor

@CLAassistant
Copy link

CLAassistant commented Sep 22, 2018

CLA assistant check
All committers have signed the CLA.

@@ -36,8 +36,8 @@ namespace memory {
using BuddyAllocator = detail::BuddyAllocator;

BuddyAllocator* GetCPUBuddyAllocator() {
static std::once_flag init_flag;
static detail::BuddyAllocator* a = nullptr;
static thread_local std::once_flag init_flag;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这一部分的可能需要恢复下。

@@ -49,18 +49,13 @@ int64_t GetEagerDeletionThreshold() {
Scope::~Scope() { DropKids(); }

Scope& Scope::NewScope() const {
std::unique_lock<std::mutex> lock(mutex_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scope 锁这一块,目前看起来好像跟这块关系不太大吧。

}

for (size_t i = 0; i < inputs.size(); ++i) {
framework::LoDTensor input;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这一部分还是以前的方式?需要更换掉吧。

test=develop
@@ -141,12 +141,15 @@ cc_library(lod_rank_table SRCS lod_rank_table.cc DEPS lod_tensor)

cc_library(feed_fetch_method SRCS feed_fetch_method.cc DEPS lod_tensor scope glog)

cc_library(naive_executor SRCS naive_executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

144行应该放在if (NOT WITH_DISTRIBUTE),即151行后面吧。对分布式不影响。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

即使是 WITH_DISTRIBUTE的情况,也是需要跑inference的单测的,那个需要naive_executor。

if(WITH_DISTRIBUTE)
cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method sendrecvop_grpc cares grpc++_unsecure grpc_unsecure gpr graph_to_program_pass)
set(DISTRIBUTE_COMPILE_FLAGS "-Wno-non-virtual-dtor -Wno-error=non-virtual-dtor -Wno-error=delete-non-virtual-dtor")
set_source_files_properties(executor.cc PROPERTIES COMPILE_FLAGS ${DISTRIBUTE_COMPILE_FLAGS})
else()
cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass)
cc_test(test_naive_executor SRCS naive_executor_test.cc DEPS naive_executor op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass elementwise_add_op)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可简化为
cc_test(test_naive_executor SRCS naive_executor_test.cc DEPS naive_executor elementwise_add_op

* Simple, intuitive and effective. Only single thread is supported, and
* currently designed for inference.
*/
class NaiveExecutor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NaiveExecutor是否取名为InferenceExecutor更合理

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不一定是 inference 的。

@@ -214,7 +271,229 @@ TEST(Analyzer_rnn1, multi_thread) {

std::vector<std::vector<PaddleTensor>> input_slots_all;
SetInput(&input_slots_all);
TestPrediction(cfg, input_slots_all, &outputs, 4 /* num_threads */);
TestPrediction(cfg, input_slots_all, &outputs, FLAGS_num_threads);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里要改成4,不然ci默认FLAGS_num_threads=1,测不出多线程的效果。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

至少需要能手动控制下,ci上多线程其实没有啥作用,目前没有clone

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

手动控制可以用TEST(Analyzer_rnn1, profile)设置,ci上多线程还是有必要的,之前多线程会跑不起来。

cell_init_tensor->mutable_data<float>(PaddlePlace::kCPU));
std::copy_n(zeros.begin(), zeros.size(),
hidden_init_tensor->mutable_data<float>(PaddlePlace::kCPU));
ZeroCopyTensorAssignData(data_tensor, one_batch.rnn_link_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

之后所有的单侧都要用ZeroCopyTensorAssignData来fill么

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是 ZeroCopy 的fill,如果不用 ZeroCopyTensor 的话应该不需要,用原来的就可以了。
目前接口还没有稳定,后面再看看

@@ -1,6 +1,6 @@
cc_library(ir_pass_manager SRCS ir_pass_manager.cc DEPS graph pass)
set(analysis_deps
framework_proto proto_desc ir_pass_manager graph pass paddle_fluid_api executor pretty_log)
framework_proto proto_desc ir_pass_manager graph pass paddle_fluid_api executor pretty_log)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个缩进没必要。

@@ -53,7 +53,7 @@ if(NOT APPLE)
endif()

if(WITH_TESTING)
# tests/book depends the models that generated by python/paddle/fluid/tests/book
# tests/book depends the models that generated by python/paddle/fluid/tests/book
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

没必要缩进

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

后续pr再改吧

if(WITH_MKLDNN)
pass_library(conv_relu_mkldnn_fuse_pass inference)
endif()
if (WITH_MKLDNN)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个空格没必要,下同。

inference_program_ = paddle::inference::Load(
executor_.get(), scope_.get(), config_.prog_file, config_.param_file);
if (!program) {
if (!LoadProgramDesc()) return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里if 格式需要改下。

}

// TODO(panyx0718): Init LoDTensor from existing memcpy to save a copy.
std::memcpy(static_cast<void *>(input_ptr), inputs[i].data.data(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里没有用zerocopy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这是老接口,需要传入 PaddleTensor向前兼容,zerocopy是通过 ZeroCopyTensor 暴露给用户的,是另外一套接口

output->data.Resize(num_elems * sizeof(T));
// The fetched tensor output by fetch op, should always in CPU memory, so just
// copy.
memcpy(output->data.data(), data, num_elems * sizeof(T));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上

if (config_._use_mkldnn) {
executor_->EnableMKLDNN(*inference_program_);
// get fetch variable
if (!GetFetch(output_data, scope)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里去掉config_._use_mkldnn的话,目前的mkldnn都跑不起来了,能否等EnableMKLDNNPass完善后,再把这个给去掉。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个pr之后按需加吧,优先保证这些feature合入到1.0

Copy link
Contributor

@luotao1 luotao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM。一些功能后续pr完善

@Superjomn Superjomn merged commit c8744d1 into PaddlePaddle:develop Sep 28, 2018
@Superjomn Superjomn deleted the fea/infer-executor branch September 28, 2018 04:51
luotao1 pushed a commit to Superjomn/Paddle that referenced this pull request Sep 28, 2018
…addle#13451)

- add naive executor
- fix concurrency performance issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants