run prefetch prog on server #9593

Merged
merged 7 commits into PaddlePaddle:develop on Apr 8, 2018

Conversation

@Yancey1989 (Contributor) commented Apr 3, 2018

Fixed #9577
tasks: #9211


VLOG(3) << "RequestPrefetch Process in";
executor_->Run(*program_, scope_, blkid_, false, false);

Contributor

Process runs in a separate thread, so executor_ may be accessed from different threads at the same time; I don't know whether this is safe.

Member

The problem is that prefetch and optimize may happen at the same time; they will both access the lookup_table parameter.

I think the final solution may be to run table optimization in a separate thread as well, and have the prefetch thread and the update thread contend for the same lock.

Currently, we run the update operators within the optimize block, so we should find a way to avoid the conflict.
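
For illustration, a minimal sketch of the locking idea above, using a toy table rather than the real lookup_table parameter (ToyLookupTable and its methods are made up for this example; the actual operators are not shown):

#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

// A toy lookup table guarded by a single mutex, so a prefetch thread and an
// update (optimize) thread never touch the rows concurrently. Illustrative
// only; not the PaddlePaddle API.
class ToyLookupTable {
 public:
  // Prefetch thread: read the rows for the requested ids.
  std::vector<std::vector<float>> Prefetch(const std::vector<int64_t>& ids) {
    std::lock_guard<std::mutex> lock(mu_);  // serializes with Update()
    std::vector<std::vector<float>> out;
    for (auto id : ids) out.push_back(rows_[id]);
    return out;
  }

  // Update thread: apply an SGD-style update to one row.
  void Update(int64_t id, const std::vector<float>& grad, float lr) {
    std::lock_guard<std::mutex> lock(mu_);  // serializes with Prefetch()
    auto& row = rows_[id];
    if (row.size() < grad.size()) row.resize(grad.size(), 0.f);
    for (size_t i = 0; i < grad.size(); ++i) row[i] -= lr * grad[i];
  }

 private:
  std::mutex mu_;
  std::map<int64_t, std::vector<float>> rows_;
};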

Contributor Author

> The problem is that the prefetch and optimize may happen at the same time

Maybe not: in the current process, the prefetch request happens before sending gradients, and there is a SEND BARRIER to make sure that the optimize step happens after the prefetch request.

Contributor Author (@Yancey1989, Apr 3, 2018)

But I think the current implementation is not thread safe, because we use one scope to create the output variable; if there are two or more prefetch requests, the output variable would be replaced and the serialize function would fail.
Maybe a way to solve this is to use a different scope to create the output var.

Member

We can create a sub_scope here to store the output variable.

Contributor Author (@Yancey1989, Apr 3, 2018)

But NewScope may also not be thread safe...
Maybe another way is to create multiple output vars with different suffixes, such as out_trainer0 and out_trainer1, in the distributed transpiler.

Member

After discussing with @Yancey1989, we decided to use NewScope to run each Process.
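
For reference, a rough sketch of what running each Process in its own scope could look like, assuming the framework::Scope API of the time (NewScope, Var, DeleteScope); the handler name and arguments are illustrative:

#include <string>
#include "paddle/fluid/framework/executor.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/framework/scope.h"

// Illustrative handler: each prefetch request runs in its own child scope.
void HandlePrefetch(framework::Scope* global_scope,
                    framework::Executor* executor,
                    framework::ProgramDesc* program, int block_id,
                    const std::string& out_var_name) {
  // Concurrent requests create their output variables in different child
  // scopes, so they cannot overwrite each other; reads of the shared table
  // still resolve through the parent scope.
  framework::Scope* local_scope = &global_scope->NewScope();
  local_scope->Var(out_var_name);  // output lives only in this request's scope
  executor->Run(*program, local_scope, block_id,
                false /*create_local_scope*/, false /*create_vars*/);
  // ... serialize the output variable back to the client here ...
  global_scope->DeleteScope(local_scope);  // release per-request variables
}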

Contributor Author

Done.

@jacquesqiao mentioned this pull request Apr 3, 2018
@jacquesqiao added this to In progress in distributed lookup table Apr 3, 2018

void InitTensorsInScope(framework::Scope &scope, platform::CPUPlace &place) {
auto w_var = scope.Var("w");
auto w = w_var->GetMutable<framework::LoDTensor>();

Member

w should be a SelectedRows, not a LoDTensor.
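
For example, the initialization could look roughly like this (a sketch against the framework::SelectedRows interface; the shapes and row ids are illustrative):

auto w_var = scope.Var("w");
auto w = w_var->GetMutable<framework::SelectedRows>();
// Row ids held on this server; ids and shape are illustrative.
auto rows = w->mutable_rows();
for (int64_t i = 0; i < 10; ++i) rows->push_back(i);
auto w_value = w->mutable_value();
w_value->Resize(framework::make_ddim({10, 32}));  // 10 rows of width 32
w_value->mutable_data<float>(place);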

Contributor Author

Done.

return block;
}

void InitTensorsInScope(framework::Scope &scope, platform::CPUPlace &place) {

Member

We should split InitTensorsInScope into InitTensorsInClientScope and InitTensorsInServerScope.
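
A possible shape for the split, using the names suggested above (the bodies are illustrative, not the PR's actual code):

void InitTensorsInClientScope(framework::Scope &scope,
                              platform::CPUPlace &place) {
  // The client scope only needs the input ids that will be sent out.
  auto ids_var = scope.Var("ids");
  auto ids = ids_var->GetMutable<framework::LoDTensor>();
  ids->Resize(framework::make_ddim({5, 1}));
  ids->mutable_data<int64_t>(place);
}

void InitTensorsInServerScope(framework::Scope &scope,
                              platform::CPUPlace &place) {
  // The server scope holds the lookup table parameter the prefetch block reads.
  auto w_var = scope.Var("w");
  auto w = w_var->GetMutable<framework::SelectedRows>();
  w->mutable_value()->Resize(framework::make_ddim({10, 32}));
  w->mutable_value()->mutable_data<float>(place);
}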

Contributor Author

Done.


VLOG(3) << "RequestPrefetch Process in";
executor_->Run(*program_, scope_, blkid_, false, false);

Member

We need to deserialize the request into the current scope before running the prefetch block.
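
Sketched out, the intended order of operations might look like this; DeserializeRequestToScope and RequestBytes are hypothetical stand-ins for the actual gRPC deserialization helper and wire type:

// Sketch only; the names below are placeholders, not the real API.
void ProcessPrefetch(const RequestBytes &req, framework::Scope *local_scope,
                     framework::Executor *executor,
                     framework::ProgramDesc *program, int block_id) {
  // 1. Materialize the request's input variable (the ids to look up) in the
  //    current scope before the block runs.
  DeserializeRequestToScope(req, local_scope);
  // 2. Only then execute the prefetch block, which reads that variable.
  executor->Run(*program, local_scope, block_id,
                false /*create_local_scope*/, false /*create_vars*/);
}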

Contributor Author

Done.

auto* var = local_scope->FindVar(var_name);
InitializeVariable(var, var_desc->GetType());

executor_->Run(*program_, local_scope, blkid_, false, false);

Contributor

If executor_ is a member of RequestPrefetch, it will be created every time a request is sent to the server, which is expensive; we could make it a member of the server instance instead.

Contributor

We can also prepare it before running.

Contributor Author (@Yancey1989, Apr 8, 2018)

The type of executor_ is a pointer, so maybe we would not create it for every request.

> Also can prepare it before run

A good idea; I will do that.
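
For illustration, preparing once and reusing the prepared context per request might look roughly like this, assuming the Executor::Prepare / RunPreparedContext pair the executor exposed around this time (the wrapper class is made up):

#include <memory>

// Illustrative wrapper: parse the prefetch block once at server start-up,
// then only execute the already-prepared ops on each request.
class PrefetchHandler {
 public:
  PrefetchHandler(framework::Executor* executor,
                  framework::ProgramDesc* program, int block_id)
      : executor_(executor),
        prepared_(executor->Prepare(*program, block_id)) {}

  void Handle(framework::Scope* local_scope) {
    executor_->RunPreparedContext(prepared_.get(), local_scope,
                                  false /*create_local_scope*/,
                                  false /*create_vars*/);
  }

 private:
  framework::Executor* executor_;
  std::unique_ptr<framework::ExecutorPrepareContext> prepared_;
};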

Contributor Author

Done.

@@ -67,6 +67,8 @@ message VariableMessage {
bytes serialized = 8;
// selected_rows data
bytes rows = 9;
// prefetch var name

Contributor

Lookup table block execution output variable name.

Contributor Author

Done.

Contributor

Seems it was not updated?

Contributor Author

Sorry... updated per the comments.

detail::RPCClient client;
client.AsyncPrefetchVariable("127.0.0.1:8889", ctx, scope, in_var_name,
out_var_name);
client.Wait();

// auto out_var = scope.Var(out_var_name);

Contributor

Please delete this commented-out line.

Contributor Author

Done.

Member (@jacquesqiao) left a comment

LGTM!

@Yancey1989 merged commit be85385 into PaddlePaddle:develop Apr 8, 2018
distributed lookup table automation moved this from In progress to Done Apr 8, 2018
@Yancey1989 deleted the prefech_prog_on_server branch April 8, 2018 09:50