
Send recv op #5520

Merged
merged 32 commits on Nov 28, 2017
Conversation

typhoonzero
Contributor

@typhoonzero typhoonzero commented Nov 9, 2017

Implemented following this design.

Still a work in progress; this enables distributed training using gRPC.

  • The RPC communication implementation is kept separate (under detail), so we can switch to other libraries quickly.
  • send_op and recv_op
  • C++ unit tests

TODO in later PRs:

  • add RemoteOptimizer and unit tests
  • add benchmarks
  • add a queue op to buffer tensors for send and recv

@typhoonzero typhoonzero changed the title [WIP] Send recv op Send recv op Nov 16, 2017
@jacquesqiao
Member

Why choose gRPC rather than baidu-rpc?

for (size_t len : lod_length[i]) {
level.push_back(level.back() + len);
}
}
}

void SerializeToStream(std::ostream &os, const LoDTensor &tensor,
Contributor

@helinwang I think the serialize function can be put into tensor_util.h, which is part of PR #5455.
Did you change your viewpoint after reading the Boost serialization reference manual?

Contributor

@helinwang helinwang Nov 17, 2017

@dzhwinter Thanks for asking! It's OK if the only tensor type we support, now and in the future, is LoDTensor (so we don't need polymorphism). Otherwise I think it would be better to have serialization as an interface that every tensor type implements.

Contributor Author

@typhoonzero typhoonzero Nov 20, 2017

Putting non-member functions in separate source files seems good. Also, we may need to serialize SelectedRows in the future, so rather than putting the serialize functions in a util file, at this point I agree with @helinwang that we should have "serialization as an interface that every tensor type implements". Would you mind updating PR #5455 to add the serialization functions?

Contributor

@helinwang
We need to support multiple types (LoDTensor, SelectedRows, LoDTensorArray), so we can make it a template free function (global function).
And I think:

  1. Polymorphism is not a real demand in this application. When you do a serialization, you have an object with a proper type, so you can use a reference instead of a pointer. The class always provides sufficient information to serialize itself.
    http://www.boost.org/doc/libs/1_45_0/libs/serialization/doc/serialization.html

  2. Other popular libraries, such as protobuf and msgpack, also serialize objects through free functions.

MsgPack::Serializer serializer(socket);  
std::vector<std::unique_ptr<MsgPack::Element>> arrayWithoutElements, arrayWith3Elements;
arrayWith3Elements.push_back(MsgPack::Factory(true));
arrayWith3Elements.push_back(MsgPack__Factory(Array(std::move(arrayWithoutElements))));
arrayWith3Elements.push_back(MsgPack::Factory("Hello World!"));  
serializer << MsgPack__Factory(Array(std::move(arrayWith3Elements)));

MsgPack::Deserializer deserializer(socket);  
deserializer.deserialize([](std::unique_ptr<MsgPack::Element> parsed) {
    std::cout << "Parsed: " << *parsed << "\n";
    return false;
}, true);

Contributor

If we treat Tensor and the operators as a computing library, then we should use free functions. That is, just treat Tensor as a third-party library.
@typhoonzero @helinwang
@typhoonzero @helinwang

Contributor Author

  1. If we treat Tensor and the operators as a computing library, then we'll have to describe the data inside each type to be serialized. We are using xxxDesc for now, e.g.
    message TensorDesc {
      required DataType data_type = 1;
      repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
    }

  2. Using the desc and the data pointers is enough for a "free function" to serialize it.

ServerBuilder builder;
builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
builder.RegisterService(service.get());
// rpc_server.reset(new Server(builder.BuildAndStart());
Contributor
No commented-out code lines, please.

Contributor Author

Done

@typhoonzero
Contributor Author

@jacquesqiao I put the RPC implementation in a separate place, so it'll be easy to switch. I'll add a benchmark for training-job metrics like throughput (by model size), since brpc's benchmarks are mainly about small queries.

Contributor

@helinwang helinwang left a comment

Thank you! Did you get a chance to profile the serialization speed of the protobuf string (used in tensor serialization)?

* You can pass ofstream or ostringstream to serialize to a file
* or to an in-memory string. A GPU tensor will be copied to CPU.
*/
void SerializeToStream(std::ostream& os, const LoDTensor& tensor,
Contributor

For discussion: consider making SerializeToStream and DeserializeFromStream member functions of LoDTensor. Reason stated here: https://github.com/PaddlePaddle/Paddle/pull/5520/files#r151577859
CC: @dzhwinter

namespace operators {
namespace detail {

Status SendRecvServerImpl::InitVariables(
Contributor

Is this for initializing variables? I think send/recv should not have anything to do with initializing the variables.

Contributor Author

Yep. Will remove.

namespace operators {
namespace detail {

bool RPCClient::InitVariables(const framework::Scope& scope,
Contributor

Is this for initializing variables? I think send/recv should not have anything to do with initializing the variables.

namespace operators {
namespace detail {

bool RPCClient::InitVariables(const framework::Scope& scope,
Contributor

@helinwang helinwang Nov 17, 2017

The names RPCClient and SendRecvServerImpl do not match as a pair, and maybe the names should state that they are related to send/recv, so perhaps something like sendImpl / recvImpl?

// TODO(typhoonzero): deserialize in_tensor and run the pserver network.
std::istringstream iss(in_var->serialized());
framework::DeserializeFromStream(iss, &t);
lodtensor_queue_.Push(std::move(t));
Contributor

From my understanding, each send/recv connects one edge in the graph between nodes, so why is a queue (for multiple values) necessary?

namespace paddle {
namespace operators {

void RunServer(Server **rpc_server,
Contributor

Do we currently start one server per recv OP? There could easily be tens or even hundreds of send/recv OP pairs; maybe we should start only one server. E.g., the send/recv queue is an argument to the constructor of the send/recv OP, and there is only one single instance of the send/recv server.

Contributor Author

This will be done in the next PR. From my current thinking, we'll need a RemoteOptimizer interface to do this work: it will create the server-side program and put it in the recv op as a member block to run.

Collaborator

@wangkuiyi wangkuiyi left a comment

I am afraid that this PR is too big and could be slow to merge. It looks to me like many pieces, e.g., the introduction of gRPC, could be separate PRs.

.clang-format Outdated
@@ -24,5 +24,8 @@ Standard: Cpp11
AllowAllParametersOfDeclarationOnNextLine: true
BinPackParameters: false
BinPackArguments: false
---
Language: Proto
# Don't format .proto files.
Collaborator

Why not format .proto files?

Contributor Author

Removed these lines. I'm not sure why my previous style check wouldn't pass.

@@ -133,6 +133,8 @@ include(external/any) # download libn::any
include(external/eigen) # download eigen3
include(external/pybind11) # download pybind11
include(external/nccl)
include(external/cares)
include(external/grpc)
Collaborator

I vaguely remember you said you wanted to try brpc? @typhoonzero

Contributor Author

Yep. Will add a benchmark in the next PR to decide which to use. That won't affect the current code structure.

@@ -467,3 +467,43 @@ function(py_test TARGET_NAME)
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
endif()
endfunction()


function(grpc_library TARGET_NAME)
Collaborator

We need a comment for this function.

Contributor Author

Done.

# See the License for the specific language governing permissions and
# limitations under the License.
#

Contributor

Are cares and grpc needed in all kinds of Paddle builds? I suggest adding an if statement here to return early when they are not needed, or at least adding the following to avoid breaking the build for mobile:

IF(MOBILE_INFERENCE)
    return()
ENDIF()

Contributor Author

Sure. Will add. Thanks for reminding.

Contributor Author

Done.

Contributor

@dzhwinter dzhwinter left a comment

This prototype has a few differences from our design doc. @helinwang https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/refactor/parameter_server.md

  1. This PR splits the roles of trainer and pserver, which should be isomorphic.
  2. The SimpleBlockingQueue needs to support an asynchronous Send operation, just like the rendezvous in TensorFlow.
  3. The RPC service only has one SendVariable interface; can it satisfy a non-blocking function call such as Send?
  4. EmptyVariable in gRPC seems ugly; do we have a better way to improve it?
    @helinwang

@dzhwinter
Contributor

Since this prototype is separate from our main develop branch, I think we can merge this PR and enhance the implementation later. BTW, there are conflicts. @typhoonzero

@typhoonzero
Contributor Author

This PR splits the roles of trainer and pserver, which should be isomorphic.

No. This is just a basic implementation of the send/recv ops. We must have higher-level wrappers for how to use them, as in the unit test's code. I'm afraid the high-level API must split the roles of trainer and pserver to keep the API simpler.

@dzhwinter
Contributor

ok. I see.

Contributor

@dzhwinter dzhwinter left a comment

LGTM.
