Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design for remote parameter updater using cclient #2330

Closed
wants to merge 16 commits into from

Conversation

jacquesqiao
Copy link
Member

@jacquesqiao jacquesqiao commented Jun 1, 2017

refs: #2274

测试方法:
1,编译。
2,编译完成之后运行 go run ./go/cmd/pserver/pserver.go -port 3000 会在本地启动一个parameter server,监听3000端口。
3,python paddle/go/pserver/cclient/test/test_train.py

@jacquesqiao jacquesqiao changed the title init design for remote parameter updater using cclient Design for remote parameter updater using cclient Jun 1, 2017

Remote Parameter Updater manage parameters through remote parameter server with the client that communicate with pserver([The Client Library of Parameter Server Design Doc](pserver_client.md))

In PaddlePaddle Python V2 API, trainer is implemented in python, and the trainer will hold a member of parameter updater and call it's functions directly. In this design, we should also expose the api of RemoteParameterUpdater to python with swig.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hold a member -> hold the instance

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


Remote Parameter Updater manage parameters through remote parameter server with the client that communicate with pserver([The Client Library of Parameter Server Design Doc](pserver_client.md))

In PaddlePaddle Python V2 API, trainer is implemented in python, and the trainer will hold a member of parameter updater and call it's functions directly. In this design, we should also expose the api of RemoteParameterUpdater to python with swig.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should -> we will
感觉“we will"的语气更加确定,更合适design doc。"we should"在英语里我理解是我们应当做(但不一定立刻会做)。

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@helinwang
Copy link
Contributor

helinwang commented Jun 1, 2017

CI failed, probably need to use pre commit hook.

Copy link
Contributor

@dzhwinter dzhwinter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 such a fast implement!

@@ -9,5 +9,14 @@ project(cxx_go C Go)
include(golang)
include(flags)

go_library(client STATIC)
go_library(paddle_cclient STATIC)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe named with pserver_cclient will be better? since it is only used by the new trainer module and communicate with parameter server. paddle_cclient would lead to an inference client at first glance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spend a lot of time working on CMake to make it work properly.@@

This is a good suggestion, I will rename it~


// create parameter server client.
parameterClient_ =
paddle_new_pserver_client((char *)pserverSpec_.c_str(), FLAGS_trainer_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a little different with current RemoteUpdater here. we don't have a trainer id in paddle cloud.

LOG(INFO) << "finishBatch in, cost: " << cost;

// send gradient to parameter server.
paddle_send_grads(parameterClient_, *newGradients_, parameterSize());
Copy link
Contributor

@dzhwinter dzhwinter Jun 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable cost is useless here. cannot find anywhere update the newGradients_

Copy link
Contributor

@helinwang helinwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall great!

@@ -32,17 +32,21 @@ function(GO_LIBRARY NAME BUILD_TYPE)
# will use the local changes in Paddle rather than checkout Paddle
# in github.
add_custom_target(copyPaddle
COMMAND ln -sf ${PADDLE_DIR} ${PADDLE_IN_GOPATH})
COMMAND rm -rf ${PADDLE_IN_GOPATH}/Paddle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange, -f in ln command force the link, so we probably don't need to remove the old link.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个地方我换了一种link方式,在最后面指定了 目录名为Paddle,因为目前的做法,如果paddle所在目录改名字了,link的名字也会变,这样 go get就找不到目录了。
然后修改之后,在mac上两次编译会报Operation not permitted,因为他试图在Paddle这个链接下再建一个链接,循环了。

${CMAKE_GO_FLAGS} ${GO_SOURCE}
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})

add_custom_target(${NAME} ALL DEPENDS ${OUTPUT_DIR}/.timestamp ${ARGN})
add_dependencies(${NAME} goGet)
add_dependencies(${NAME} copyPaddle)
# add_dependencies(${NAME} goGet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It need to depend on go get to get all dependencies, are there anything wrong with this line?


// create parameter server client.
parameterClient_ =
paddle_new_pserver_client((char *)pserverSpec_.c_str(), FLAGS_trainer_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paddle_new_pserver_client in cclient does not take any argument. I am curious how does this compile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/PaddlePaddle/Paddle/blob/develop/go/pserver/cclient/cclient.go#L103

func paddle_new_pserver_client(addrs *C.char, selected int) C.client {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's new, sorry I got confused.

paddle_get_params(
parameterClient_, names_, newParameters_, (int)parameters_.size());
}
paddle_finish_init_params(parameterClient_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only the trainer that inits params should call paddle_finish_init_params. Please put this line to line 56.
Please see the graph in the linked section for reference: https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/pserver_client.md#trainer-selection-process

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok~

num_passes, use_sparse_updater)
else:
parameter_updater = self.__create_new_remote_updater__(
pserver_spec)
Copy link
Contributor

@helinwang helinwang Jun 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cclient don't need pserver spec (the only required spec is parameter config), please see: https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/pserver_client.md#trainer-selection-process
Maybe you have an outdated version of cclient?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the current code may be old here?
https://github.com/PaddlePaddle/Paddle/blob/develop/go/pserver/cclient/cclient.go#L103

func paddle_new_pserver_client(addrs *C.char, selected int) C.client {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's new, sorry I got confused.

@jacquesqiao
Copy link
Member Author

closed and open a new pr #2413

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants