Optimize while_op for test #14764
Conversation
Met the following linking error in PR_CI_python35; see the building log:
The following unittests failed in CI:
It is fixed in #15222.
LGTM for the cmake part.
auto &current_scope = scope.NewScope();
step_scopes->push_back(&current_scope);
executor.RunPreparedContext(ctx_ptr, &current_scope, false, true, true);
if (is_test) {
no need?
Done.
executor.RunPreparedContext(ctx.get(), &current_scope, false, true, true);
if (is_test) {
  scope.DeleteScope(&current_scope);
executor.CreateVariables(*program, &current_scope, block->ID());
Why doesn't inference need to create variables, while training does?
Paddle/paddle/fluid/framework/executor.cc
Lines 413 to 423 in 5d9edb4
void Executor::RunPreparedContext(ExecutorPrepareContext* ctx, Scope* scope,
                                  bool create_local_scope, bool create_vars,
                                  bool keep_kids) {
  PADDLE_ENFORCE_NOT_NULL(scope);
  Scope* local_scope = scope;
  if (create_vars) {
    if (create_local_scope) {
      local_scope = &scope->NewScope();
    }
    CreateVariables(ctx->prog_, local_scope, ctx->block_id_);
  }
When create_vars is set to true, variables are created in RunPreparedContext. Because the step scopes will be used in while_grad_op, all of them have to be saved; so if there are 100 iterations, CreateVariables will be called 100 times.
Paddle/paddle/fluid/operators/controlflow/while_op.cc
Lines 84 to 86 in 5d9edb4
if (is_test) {
  scope.DeleteScope(&current_scope);
}
In test mode, the step scope is deleted at the end of each iteration, so there is no need to create 100 step scopes and call CreateVariables 100 times; doing it once is enough. This reduces the overhead a lot, from 17.8619 ms to 16.0768 ms.
namespace api {
namespace details {

static void CheckWhileOpInput(framework::ProgramDesc *program) {
This is not a "check", this is a "change".
framework::Variable *ptr = scope->Var(context_var->Name());
framework::InitializeVariable(ptr, context_var->GetType());

auto *tensor = ptr->GetMutable<framework::LoDTensor>();
Why is this LoDTensor, not ExecutorPrepareContext?
If we want to use ExecutorPrepareContext directly, we need to add a new type to the var's type list. That requires a lot of changes, starting here:
Paddle/paddle/fluid/framework/framework.proto
Lines 105 to 135 in 5d9edb4
message VarType {
  enum Type {
    // Pod Types
    BOOL = 0;
    INT16 = 1;
    INT32 = 2;
    INT64 = 3;
    FP16 = 4;
    FP32 = 5;
    FP64 = 6;
    // Tensor<size_t> is used in C++.
    SIZE_T = 19;
    UINT8 = 20;
    INT8 = 21;
    // Other types that may need additional descriptions
    LOD_TENSOR = 7;
    SELECTED_ROWS = 8;
    FEED_MINIBATCH = 9;
    FETCH_LIST = 10;
    STEP_SCOPES = 11;
    LOD_RANK_TABLE = 12;
    LOD_TENSOR_ARRAY = 13;
    PLACE_LIST = 14;
    READER = 15;
    // Any runtime decided variable type is raw
    // raw variables should manage their own allocations
    // in operators like nccl_op
    RAW = 17;
    TUPLE = 18;
  }
Do you think that is better?
Does RAW work?
I tried it and got trapped in complicated cmake dependency errors; the change leads to a circular dependency in cmake. I'd like to delete this code and try to fix it in another PR. Is that OK?
executor_->CreateVariables(*inference_program_,
                           sub_scope_ ? sub_scope_ : scope_.get(), 0);
framework::Scope *scope = sub_scope_ != nullptr ? sub_scope_ : scope_.get();
inference::api::details::PrepareExecutor(inference_program_.get(),
Should this live in a pass?
In fact, it is not suitable to implement this in a pass, because we need to create an ExecutorPrepareContext for the final program; these contexts will be used directly in Predictor.Run().
In fact, I'm considering implementing this in
Paddle/paddle/fluid/framework/executor.cc
Lines 372 to 374 in 5d9edb4
std::unique_ptr<ExecutorPrepareContext> Executor::Prepare(
    const ProgramDesc& program, int block_id,
    const std::vector<std::string>& skip_ref_cnt_vars) {
But I'm not sure it is stable enough, so I implemented it in NativePredictor first to verify that.
@@ -0,0 +1,33 @@
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. |
no need for this file?
framework::TensorCopySync(*in, ctx.GetPlace(), out);
framework::TensorCopy(
    *in, ctx.GetPlace(),
    ctx.template device_context<platform::DeviceContext>(), out);
@velconia, why was this copy synchronous?
This code was copied from the original implementation, but I guess the sync is not needed here.
…ile_op. test=develop

- … when is_test is set.
- Add in dispensable input (move to another PR).
- … ExecutorPrepareContext in while_op, so that the operators of this block can be created in advance.
- TensorCopy instead of TensorCopySync in reshape_op.
- … compare_op when numel() of input tensor is 1.