Switching RPC library in new pserver implement #2602

Closed
typhoonzero opened this issue Jun 26, 2017 · 8 comments

Comments

@typhoonzero
Contributor

typhoonzero commented Jun 26, 2017

Related issue: #1721

We are currently using net/rpc in our implementation. I found that when writing a client for trainers to call, we have to implement the C binding first, compile it to a shared object (.so), use SWIG to export that library, and then use ctypes to import the .so from Python. The procedure is shown below:

[Figure: go pserver]

As we can see, we need to implement many layers. The root cause is that Go's net/rpc has no C or Python bindings, so we have to convert Go-written clients to C and then to Python.
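To make the binding problem concrete, here is a minimal sketch of a net/rpc service and client in Go. `ParamService`, `GetParamArgs`, `ParamReply`, and `fetchParam` are hypothetical names for illustration, not the actual pserver API. By default net/rpc encodes messages with encoding/gob, a Go-specific format, which is why a non-Go caller ends up going through a Go client wrapped in C.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// GetParamArgs and ParamReply are hypothetical message types for illustration.
type GetParamArgs struct {
	Name string
}

type ParamReply struct {
	Values []float32
}

// ParamService is a hypothetical stand-in for a pserver RPC service.
type ParamService struct {
	params map[string][]float32
}

// GetParam has the shape net/rpc requires: an exported method with two
// arguments (the second a pointer) that returns an error.
func (s *ParamService) GetParam(args GetParamArgs, reply *ParamReply) error {
	v, ok := s.params[args.Name]
	if !ok {
		return fmt.Errorf("unknown parameter %q", args.Name)
	}
	reply.Values = v
	return nil
}

// fetchParam connects a server and a client over an in-memory pipe and
// performs one call. A real deployment would use a TCP listener instead.
func fetchParam(name string) ([]float32, error) {
	srv := rpc.NewServer()
	svc := &ParamService{params: map[string][]float32{"w": {0.5, 1.5}}}
	if err := srv.Register(svc); err != nil {
		return nil, err
	}

	serverConn, clientConn := net.Pipe()
	go srv.ServeConn(serverConn) // returns once the client closes its end

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var reply ParamReply
	err := client.Call("ParamService.GetParam", GetParamArgs{Name: name}, &reply)
	if err != nil {
		return nil, err
	}
	return reply.Values, nil
}

func main() {
	values, err := fetchParam("w")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(values)
}
```

Both ends of the call are Go code here. A Python or C++ trainer has no equivalent of `rpc.NewClient` to link against, which is what forces the cgo and SWIG layers described above.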

I would like to introduce a simpler and more efficient way: using grpc instead of net/rpc, as shown below:

[Figure: go pserver2]

If we all switch to grpc and use its Python binding to implement the pserver client and master client, it will be much simpler.


Some etcd3 python client bindings:

@jacquesqiao
Member

The current master client does not use SWIG; it directly imports the .so built with cgo.

@jacquesqiao
Member

jacquesqiao commented Jun 26, 2017

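There are two call paths. The first goes through SWIG, which we currently cannot avoid because it is the trainer (cpp) <---> pserver (go) path. The second is the .so built with cgo. Right now the master client really just makes a few RPC calls, so if there were a good Python binding, the client side would not need to be written in Go at all. That does sound nice.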

@typhoonzero
Contributor Author

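The trainer (cpp) <---> pserver (go) path can also be simplified. Since grpc has a C++ binding, we can call the Go pserver interface directly from C++, instead of first implementing a C client with cgo and then calling that.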

@jacquesqiao
Member

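In effect we would be switching to an RPC framework with bindings for multiple languages, so each component can be implemented in its own language, and we can change the implementation language freely. Nice.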

@dzhwinter
Contributor

dzhwinter commented Jun 26, 2017

Following the related issue link, I found a very similar idea in the rpc issue,
which can be summarized in three points.

  1. Set up the service in Go.
  2. Use gRPC as the RPC framework.
  3. Use FlatBuffers as the wire format.

@helinwang
Contributor

helinwang commented Jun 26, 2017

Thanks for the great suggestion! It's a very good point that we have a complicated code path due to mixing C, C++, Python, and Go together, which forces us to write glue code for language bindings. I would love to have our code reduced (less human-written code means fewer errors).

This is a great idea. Using grpc which automatically generates language bindings, we can remove the glue code for RPC written by humans, replacing it with grpc's native Python code (as illustrated in the second graph from your comment).

In essence, Go is removed from the client lib, and Python calls the remote server directly with grpc.

However, we need to consider this problem: the Go code in the client is not only net/rpc code. It also includes important logic, which cannot be replaced by grpc.

If we take a look at the master client code here it calls net/rpc, gets the RecordIO Index of the training data, uses Go RecordIO to read the training data according to the Index.

If we take a look at the pserver client code here and here, it has the logic to partition parameters into different pservers, and the logic to ask etcd if the current trainer will be the one trainer to init all parameters on pservers.
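As an illustration of the kind of client-side logic meant here, the following is a minimal sketch of partitioning parameters across pservers. It assumes parameters are assigned by hashing their names with FNV-1a; that scheme, and the names `serverIndex` and `partition`, are assumptions for illustration and not necessarily what the linked client code does.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// serverIndex maps a parameter name to one of numServers pservers.
// numServers must be greater than zero.
func serverIndex(name string, numServers int) int {
	h := fnv.New32a()
	h.Write([]byte(name)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numServers))
}

// partition groups parameter names by the pserver that should hold them.
func partition(names []string, numServers int) [][]string {
	shards := make([][]string, numServers)
	for _, name := range names {
		i := serverIndex(name, numServers)
		shards[i] = append(shards[i], name)
	}
	return shards
}

func main() {
	// Hypothetical parameter names.
	names := []string{"fc1.w", "fc1.b", "fc2.w", "fc2.b"}
	for i, shard := range partition(names, 2) {
		fmt.Printf("pserver %d: %v\n", i, shard)
	}
}
```

Every trainer has to compute the same assignment, so logic like this would need to be reimplemented identically in each client language if the Go client were removed.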

From the two paragraphs above, we can see that there are chunks of application logic written in Go in the client, which grpc cannot replace. So if we want to remove Go from the client, we will need to rewrite this logic in Python.
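The etcd check mentioned above (deciding which single trainer initializes the parameters) can be sketched in the same spirit. This example uses a hypothetical in-memory `kvStore` with an atomic put-if-absent in place of a real etcd client, and the key `/init_ps/trainer` is made up. It only illustrates the "first writer wins" idea, not the actual client code.

```go
package main

import (
	"fmt"
	"sync"
)

// kvStore is a hypothetical in-memory stand-in for etcd.
type kvStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newKVStore() *kvStore {
	return &kvStore{data: make(map[string]string)}
}

// putIfAbsent stores value only if key is unset and reports whether the
// write happened. With real etcd this would be a compare-and-put transaction.
func (s *kvStore) putIfAbsent(key, value string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.data[key]; exists {
		return false
	}
	s.data[key] = value
	return true
}

// electInitializer lets numTrainers trainers race for the init key and
// returns the IDs of the trainers whose write succeeded.
func electInitializer(store *kvStore, numTrainers int) []int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	var winners []int
	for id := 0; id < numTrainers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if store.putIfAbsent("/init_ps/trainer", fmt.Sprint(id)) {
				mu.Lock()
				winners = append(winners, id)
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()
	return winners
}

func main() {
	winners := electInitializer(newKVStore(), 4)
	fmt.Println("trainers that will initialize parameters:", len(winners))
}
```

Exactly one trainer wins the race, and the others skip initialization. Porting this to another language requires an etcd client with equivalent transactional semantics in that language.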

I am not fully convinced that we should rewrite this logic in Python, for the following reasons:

  • Python is not statically typed. If anything goes wrong, it's very hard to debug. This hurts development speed and maintainability.
  • If we want distributed training to support languages other than Python (e.g., C++), and this logic were written in Python, we would need to port it to C++ or Go.

I totally agree that we need the system to be as simple as possible, and it would be awesome if we had a feasible solution to reduce this complexity. Would love to know your thoughts.

@jacquesqiao
Member

jacquesqiao commented Jun 26, 2017

If we take a look at the master client code here it calls net/rpc, gets the RecordIO Index of the training data, uses Go RecordIO to read the training data according to the Index.

If we take a look at the pserver client code here and here, it has the logic to partition parameters into different pservers, and the logic to ask etcd if the current trainer will be the one trainer to init all parameters on pservers.

My earlier thinking was that if we use grpc, we can use its client in each component's native language, such as C++ in the pserver client and Python in the master client. But as @helinwang's comment points out, it is certainly a problem if we have written a library in Go, such as RecordIO, and want to use it in both the server and the client. For a library like RecordIO, it seems we will always need glue code and cannot fully replace it with RPC language bindings.

@typhoonzero
Contributor Author

@jacquesqiao if we implement the pserver and master clients in C++, there is no good etcd client to choose from, so we would have to call the etcd gRPC APIs directly (https://github.com/coreos/etcd/blob/master/Documentation/dev-guide/api_reference_v3.md).
This would make the client library more complex and require more code.

For the pserver client, if we use gRPC, it should be implemented in C++ so that NewRemoteParameterUpdater can call it directly. Because there is no etcd C++ binding, this work does not seem to bring much benefit.

For the master client, we can implement it in Python (RecordIO has a Python binding), so it would be simpler, but the client is so small that this would not bring much benefit either.

So after some thinking, I would prefer to continue using the current way.
