c++20 coroutine based version is not as fast as the one using boost fiber? #3

Recently I ran grpc_bench to compare the performance of different settings. I found that the coroutine-based version is slower than both the Boost.Fiber version and the gRPC multi-threaded version. Do you have any insight about this?

Comments
Oh, I left out some information. I can get comparable results with a single thread. But the issue I mentioned occurs with multiple threads, maybe four. Could you also provide a benchmark using multiple threads?
Sure, although the maximum my machine can saturate is 2 threads :)
Anyway, thanks for your work. I have recently been trying to adapt your work to libunifex since it's a more lightweight solution. 😀
You're welcome. I might as well hack on libunifex a bit myself now :). It doesn't look much different from Boost.Asio, it's just lacking documentation.
I just finished part of the adaptation here.
wow that is so cool!
I have updated the benchmarks in the README. They actually show better performance with C++20 coroutines compared to Boost.Coroutine on my Linux machine. Also note that if you are running benchmarks for 4-CPU servers then you really need to ensure that the server is fully saturated. E.g. on my 12-core machine, using 8 cores for the client and 4 for the server, I am unable to do so:
Notice the …
For the upcoming version of asio-grpc I am planning to:
…
Thanks for the info. I'd like to reproduce this on my machine. By the way, will repeatedly_request improve the performance? What if there aren't enough request calls to match incoming requests?
Yes, good question. I couldn't find any information on how many outstanding request calls there should be at a time. Actually I just tested, and it seems that if there are multiple outstanding calls to …

```cpp
struct ProcessRPC
{
    using executor_type = agrpc::GrpcContext::executor_type;

    agrpc::GrpcContext& grpc_context;

    auto get_executor() const noexcept { return grpc_context.get_executor(); }

    template <class RPCHandler>
    void operator()(RPCHandler&& rpc_handler, bool ok)
    {
        if (!ok)
        {
            return;
        }
        auto args = rpc_handler.args();
        // Allocate the response through the GrpcContext's allocator and keep it
        // alive until `finish` completes by moving the shared_ptr into the
        // completion handler.
        auto response = std::allocate_shared<test::v1::Response>(grpc_context.get_allocator());
        response->set_integer(21);
        auto& response_ref = *response;
        agrpc::finish(std::get<2>(args), response_ref, grpc::Status::OK,
                      asio::bind_executor(this->get_executor(),
                                          [rpc_handler = std::move(rpc_handler),
                                           response = std::move(response)](bool) {}));
    }
};

agrpc::repeatedly_request(&test::v1::Test::AsyncService::RequestUnary, service, ProcessRPC{grpc_context});
```
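As a purely illustrative sketch (not something taken from the asio-grpc docs), one way to experiment with the number of outstanding request calls would be to start several of these operations side by side:

```cpp
// Hypothetical experiment: start several concurrent repeatedly_request
// operations so that more than one request call is outstanding at a time.
// The count of 4 is an arbitrary choice for illustration.
for (int i = 0; i < 4; ++i)
{
    agrpc::repeatedly_request(&test::v1::Test::AsyncService::RequestUnary, service,
                              ProcessRPC{grpc_context});
}
```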
Actually, I don't know whether it's necessary to align with these lines (from the official gRPC async server example) to handle async requests:

```cpp
} else if (status_ == PROCESS) {
    // Spawn a new CallData instance to serve new clients while we process
    // the one for this CallData. The instance will deallocate itself as
    // part of its FINISH state.
    new CallData(service_, cq_);
    // The actual processing.
    ...
```

That means the coroutine version may be something like this:

```cpp
awaitable<void> handle_rpc(agrpc::GrpcContext& grpc_context,
                           helloworld::Greeter::AsyncService& service) {
  auto executor = co_await this_coro::executor;
  auto context =
      std::allocate_shared<UnaryRPCContext>(grpc_context.get_allocator());
  bool request_ok{true};
  request_ok = co_await agrpc::request(
      &helloworld::Greeter::AsyncService::RequestSayHello, service,
      context->server_context, context->request, context->writer);
  if (!request_ok) {
    co_return;
  }
  // This line
  co_spawn(executor, handle_rpc(grpc_context, service), detached);
  helloworld::HelloReply response;
  response.set_message(context->request.name());
  auto& writer = context->writer;
  co_await agrpc::finish(writer, response, grpc::Status::OK);
}

boost::asio::co_spawn(
    grpc_context,
    [&]() -> boost::asio::awaitable<void> {
      while (true) {
        co_await handle_rpc(grpc_context, service);
      }
    },
    boost::asio::detached);
```

Even though the actual processing time is long, there are still enough calls to …
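(For comparison, here is a sketch of an alternative, reusing the names above rather than code from the thread: if the recursive co_spawn inside handle_rpc is kept, pre-spawning N copies up front fixes the number of outstanding request calls at roughly N, instead of growing it one at a time from the outer loop.)

```cpp
// Sketch: pre-spawn a fixed pool of handlers. Each completed request spawns
// its successor via the co_spawn inside handle_rpc, so the pool size stays
// roughly constant. kHandlers is an arbitrary tuning knob.
constexpr std::size_t kHandlers = 32;
for (std::size_t i = 0; i < kHandlers; ++i) {
  boost::asio::co_spawn(grpc_context, handle_rpc(grpc_context, service),
                        boost::asio::detached);
}
```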
I re-ran the benchmark on my machine and the results are consistent with yours. However, it's strange that, after changing this line to …
In my experiment, after switching to calling co_spawn immediately after receiving the request, the …
Seems legitimate to me. I mean, that is exactly what repeatedly_request does. The performance with unifex might be slightly better since they can avoid the extra dynamic memory allocation for the …
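(For reference, a minimal libunifex sketch of where senders avoid type-erased allocation; unifex::task and unifex::sync_wait are real libunifex facilities, but the example is illustrative and not from the thread or a benchmark.)

```cpp
#include <unifex/sync_wait.hpp>
#include <unifex/task.hpp>

unifex::task<int> answer()
{
    co_return 42;
}

int main()
{
    // sync_wait connects the sender right here and keeps the resulting
    // operation state on the stack; there is no type-erased, heap-allocated
    // completion handler as with a detached Asio-style completion token.
    auto result = unifex::sync_wait(answer());
    return result.value() == 42 ? 0 : 1;
}
```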
I am very glad for your input and your efforts on adapting asio-grpc to libunifex. I think we should combine our efforts. I am open to pull requests, issues, and ideas in general. I am currently thinking about how to design an API for the …
Actually, I have been following the recent proposals for how executors/networking will land in the C++ standard. I made some effort to adapt to libunifex for a more lightweight solution, since Asio is not only about executors (maybe because I'm not so familiar with Asio 😀). Now that you have already extended asio-grpc with libunifex support, maybe combining our efforts is a good way forward, and I am also willing to contribute towards an easier-to-use async API. I will take a look at the recent updates and think about it. For the rest, since you mentioned before that we need some codegen to make things easier for users, my first thought is to add a plugin like this to provide some stub code. Then I'd like to refer to the C# gRPC API for the design.
Closing as the issue is addressed now. Hope we can have further discussions in the future.