fea/infer executor #13451
Conversation
… force-pushed the fea/infer-executor branch from bae8e7b to 45dfdf9
paddle/fluid/memory/malloc.cc
Outdated
@@ -36,8 +36,8 @@ namespace memory {
using BuddyAllocator = detail::BuddyAllocator;

BuddyAllocator* GetCPUBuddyAllocator() {
  static std::once_flag init_flag;
  static detail::BuddyAllocator* a = nullptr;
  static thread_local std::once_flag init_flag;
This part may need to be reverted.
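The concern above is about correctness: a `thread_local` once_flag paired with process-wide static state re-runs the initializer in every thread that calls the function. A minimal, self-contained sketch of the safe pattern (illustrative names, not Paddle's allocator):

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the allocator singleton; counts initializations.
std::atomic<int> init_count{0};

int* GetSingleton() {
  // A process-wide once_flag guarantees exactly one initialization no
  // matter how many threads race here. Making this flag thread_local
  // (as the diff does) would re-run the initializer once per thread
  // while all threads still share the same static pointer.
  static std::once_flag init_flag;
  static int* instance = nullptr;
  std::call_once(init_flag, [] {
    ++init_count;
    instance = new int(42);
  });
  return instance;
}
```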
paddle/fluid/framework/scope.cc
Outdated
@@ -49,18 +49,13 @@ int64_t GetEagerDeletionThreshold() {
Scope::~Scope() { DropKids(); }

Scope& Scope::NewScope() const {
  std::unique_lock<std::mutex> lock(mutex_);
The scope locking part doesn't seem to have much to do with this change.
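For context, the locking pattern the diff introduces can be shown in isolation: child-scope creation is guarded by a mutex so concurrent NewScope() calls stay safe. This is a hypothetical miniature, not Paddle's actual Scope:

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical miniature of the pattern in the diff: child-scope creation
// guarded by a mutex so concurrent NewScope() calls cannot corrupt the
// kids list. Not Paddle's real Scope.
class MiniScope {
 public:
  MiniScope& NewScope() {
    std::lock_guard<std::mutex> lock(mutex_);
    kids_.emplace_back(new MiniScope);
    return *kids_.back();
  }
  size_t KidCount() {
    std::lock_guard<std::mutex> lock(mutex_);
    return kids_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::unique_ptr<MiniScope>> kids_;
};
```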
}

for (size_t i = 0; i < inputs.size(); ++i) {
  framework::LoDTensor input;
Is this part still using the old approach? It should probably be replaced.
… force-pushed the fea/infer-executor branch from ff17bbe to 435fa1f
test=develop
test=develop
test=develop
… force-pushed the fea/infer-executor branch from 8558dd3 to a0aa6d7
test=develop
… force-pushed the fea/infer-executor branch from a0aa6d7 to af5c86c
@@ -141,12 +141,15 @@ cc_library(lod_rank_table SRCS lod_rank_table.cc DEPS lod_tensor)

cc_library(feed_fetch_method SRCS feed_fetch_method.cc DEPS lod_tensor scope glog)

cc_library(naive_executor SRCS naive_executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass)
Line 144 should be placed inside if (NOT WITH_DISTRIBUTE), i.e. after line 151. It has no impact on the distributed build.
Even with WITH_DISTRIBUTE, the inference unit tests still need to run, and they require naive_executor.
if(WITH_DISTRIBUTE)
  cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method sendrecvop_grpc cares grpc++_unsecure grpc_unsecure gpr graph_to_program_pass)
  set(DISTRIBUTE_COMPILE_FLAGS "-Wno-non-virtual-dtor -Wno-error=non-virtual-dtor -Wno-error=delete-non-virtual-dtor")
  set_source_files_properties(executor.cc PROPERTIES COMPILE_FLAGS ${DISTRIBUTE_COMPILE_FLAGS})
else()
  cc_library(executor SRCS executor.cc DEPS op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass)
cc_test(test_naive_executor SRCS naive_executor_test.cc DEPS naive_executor op_registry device_context scope framework_proto glog lod_rank_table feed_fetch_method graph_to_program_pass elementwise_add_op)
This can be simplified to:
cc_test(test_naive_executor SRCS naive_executor_test.cc DEPS naive_executor elementwise_add_op)
 * Simple, intuitive and effective. Only single thread is supported, and
 * currently designed for inference.
 */
class NaiveExecutor {
Would InferenceExecutor be a more appropriate name than NaiveExecutor?
It is not necessarily limited to inference.
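Whatever the name, the class comment above pins down the design: run the ops sequentially on one thread, with no scheduler or thread pool. A hypothetical miniature of that contract (illustrative API, not Paddle's):

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical miniature of the contract described above: ops are
// registered once and then run strictly in order on the calling thread,
// with no scheduling and no inter-op parallelism. Not Paddle's real API.
class MiniNaiveExecutor {
 public:
  void AddOp(std::function<void()> op) { ops_.push_back(std::move(op)); }
  void Run() {
    for (auto& op : ops_) op();  // sequential, single-threaded
  }

 private:
  std::vector<std::function<void()>> ops_;
};
```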
@@ -214,7 +271,229 @@ TEST(Analyzer_rnn1, multi_thread) {

std::vector<std::vector<PaddleTensor>> input_slots_all;
SetInput(&input_slots_all);
TestPrediction(cfg, input_slots_all, &outputs, 4 /* num_threads */);
TestPrediction(cfg, input_slots_all, &outputs, FLAGS_num_threads);
This needs to be changed back to 4; otherwise CI uses the default FLAGS_num_threads=1 and the multi-thread behavior is never tested.
It at least needs to be manually controllable. Multi-threading on CI doesn't really do much at the moment, since there is no clone yet.
Manual control can be done through TEST(Analyzer_rnn1, profile). Multi-threading on CI is still necessary; previously the multi-thread path would fail to run at all.
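The compromise discussed here, a multi-threaded default for CI plus a manual override, can be sketched with an environment variable standing in for FLAGS_num_threads (names and mechanism are illustrative, not the PR's actual flag handling):

```cpp
#include <cassert>
#include <cstdlib>

// Sketch of the compromise discussed above: default the test to 4 threads
// so CI exercises the multi-thread path, while still allowing a manual
// override. The NUM_THREADS environment variable is a hypothetical
// stand-in for FLAGS_num_threads.
int NumThreads() {
  if (const char* env = std::getenv("NUM_THREADS")) {
    int n = std::atoi(env);
    if (n > 0) return n;
  }
  return 4;  // default that actually runs multiple threads on CI
}
```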
            cell_init_tensor->mutable_data<float>(PaddlePlace::kCPU));
std::copy_n(zeros.begin(), zeros.size(),
            hidden_init_tensor->mutable_data<float>(PaddlePlace::kCPU));
ZeroCopyTensorAssignData(data_tensor, one_batch.rnn_link_data);
Will all unit tests have to use ZeroCopyTensorAssignData to fill inputs from now on?
This is the fill for ZeroCopy. If you don't use ZeroCopyTensor, it shouldn't be necessary; the original method is fine. The interface isn't stable yet; we'll revisit it later.
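The distinction drawn here, a copying fill versus a zero-copy view of caller-owned memory, can be shown with a toy tensor type (MiniTensor is illustrative only, not Paddle's ZeroCopyTensor):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy tensor contrasting the two fill styles. The copying path mirrors
// the std::copy_n pattern in the test above; the zero-copy path hands
// the tensor a view of caller-owned memory instead. MiniTensor is a
// hypothetical stand-in, not Paddle's ZeroCopyTensor.
struct MiniTensor {
  std::vector<float> owned;     // copying fill: tensor owns the data
  const float* view = nullptr;  // zero-copy fill: tensor aliases caller data

  float* mutable_data(std::size_t n) {
    owned.resize(n);
    return owned.data();
  }
  void ShareExternal(const float* p) { view = p; }  // no copy is made
};
```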
@@ -1,6 +1,6 @@
cc_library(ir_pass_manager SRCS ir_pass_manager.cc DEPS graph pass)
set(analysis_deps
  framework_proto proto_desc ir_pass_manager graph pass paddle_fluid_api executor pretty_log)
    framework_proto proto_desc ir_pass_manager graph pass paddle_fluid_api executor pretty_log)
This indentation is unnecessary.
@@ -53,7 +53,7 @@ if(NOT APPLE)
endif()

if(WITH_TESTING)
# tests/book depends the models that generated by python/paddle/fluid/tests/book
  # tests/book depends the models that generated by python/paddle/fluid/tests/book
No need to indent.
Let's change it in a follow-up PR.
if(WITH_MKLDNN)
  pass_library(conv_relu_mkldnn_fuse_pass inference)
endif()
if (WITH_MKLDNN)
This space is unnecessary; same below.
inference_program_ = paddle::inference::Load(
    executor_.get(), scope_.get(), config_.prog_file, config_.param_file);
if (!program) {
if (!LoadProgramDesc()) return false;
The if formatting here needs to be fixed.
}

// TODO(panyx0718): Init LoDTensor from existing memcpy to save a copy.
std::memcpy(static_cast<void *>(input_ptr), inputs[i].data.data(),
Why isn't zero-copy used here?
This is the old interface; it takes a PaddleTensor for backward compatibility. Zero-copy is exposed to users through ZeroCopyTensor, which is a separate set of interfaces.
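A sketch of what the legacy copying feed path described above amounts to: the caller's buffer is memcpy'd into an engine-owned buffer, so the engine never aliases user memory and the caller may free its buffer afterwards (types are simplified stand-ins, not the real PaddleTensor/LoDTensor):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Simplified stand-in for the legacy feed path: copy the user's data
// into an engine-owned buffer. The returned buffer is independent of
// the caller's memory, which is the backward-compatible contract the
// old PaddleTensor interface provides.
std::vector<float> FeedByCopy(const float* user_data, std::size_t n) {
  std::vector<float> engine_buf(n);
  std::memcpy(engine_buf.data(), user_data, n * sizeof(float));
  return engine_buf;  // independent copy; the user buffer may be freed
}
```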
output->data.Resize(num_elems * sizeof(T));
// The fetched tensor output by fetch op, should always in CPU memory, so just
// copy.
memcpy(output->data.data(), data, num_elems * sizeof(T));
Same as above.
if (config_._use_mkldnn) {
  executor_->EnableMKLDNN(*inference_program_);
// get fetch variable
if (!GetFetch(output_data, scope)) {
If config_._use_mkldnn is removed here, none of the current MKLDNN paths will run. Can we wait until EnableMKLDNNPass is completed before removing it?
Let's add things as needed after this PR; the priority is to get these features merged into 1.0.
LGTM. Some functionality can be polished in follow-up PRs.
…addle#13451) - add naive executor - fix concurrency performance issue