
grpc.StartGRPCServer takes 5 seconds to start #14429

Closed
Wondertan opened this issue Dec 27, 2022 · 9 comments · Fixed by #15041
@Wondertan
Contributor

Wondertan commented Dec 27, 2022

Summary of Bug

grpc.StartGRPCServer takes 5 seconds to start. It's not really clear why this approach was taken, and git blame didn't give me any definitive answer. Looking at the Serve implementation, I don't see why one would wait 5 seconds to conclude the server started correctly.
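For context, the current implementation looks roughly like this (paraphrased from memory, so treat it as a sketch rather than a verbatim copy of `server/grpc/server.go`):

```go
package server

import (
	"fmt"
	"net"
	"time"

	"google.golang.org/grpc"
)

// Sketch of the current pattern: Serve already runs in a goroutine, yet the
// caller still blocks for 5 seconds, treating "no error within 5s" as a
// successful start.
func startGRPCServer(grpcSrv *grpc.Server, listener net.Listener) (*grpc.Server, error) {
	errCh := make(chan error, 1)
	go func() {
		if err := grpcSrv.Serve(listener); err != nil {
			errCh <- fmt.Errorf("failed to serve: %w", err)
		}
	}()

	select {
	case err := <-errCh:
		return nil, err
	case <-time.After(5 * time.Second): // assume success if nothing failed yet
		return grpcSrv, nil
	}
}
```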

Why is this a problem? Besides solving a problem that does not exist, it creates additional problems for end users. E.g., we in celestia-node run dozens of integration tests locally and on CI, and each test that runs an App has to wait an additional 5 seconds for no reason. A developer from Secret also shared that they had to engineer a similar workaround.

Running this in a goroutine does not help and is racy, as the func sets GRPCClient on the context. Additionally, the func does not respect Go's native context.Context... Context police 🚓 is discontented.

I don't see any other workaround besides copying the code and waiting until the fix lands downstream in our fork.

Possible solutions

  • At least respect Go's context in the channel wait for the timeout; then there would be a workaround: set a context deadline to exit earlier.
  • But really, there should not be any timeout at all; it's totally fine to run Serve in the goroutine without handling an error for the first 5 seconds. Additionally, you could pass a logger (through the Context?) to log errors, if any. Also, Go's native context should still be respected, and the server goroutine should be stopped once it is canceled; otherwise, the server runs forever and the goroutine is leaked. (See the sketch after this list.)
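
A rough sketch of what I mean (the names and the logger parameter are illustrative, not existing SDK API):

```go
package server

import (
	"context"
	"net"

	"google.golang.org/grpc"
)

// logger is a minimal placeholder for whatever logger the caller passes in.
type logger interface {
	Error(msg string, keyvals ...interface{})
}

// startGRPCServerAsync returns immediately: Serve runs in its own goroutine,
// late errors are logged instead of returned, and the server is shut down
// once the caller's context is canceled, so the goroutine never leaks.
func startGRPCServerAsync(ctx context.Context, log logger, grpcSrv *grpc.Server, listener net.Listener) *grpc.Server {
	go func() {
		if err := grpcSrv.Serve(listener); err != nil {
			log.Error("gRPC server stopped serving", "err", err)
		}
	}()

	go func() {
		<-ctx.Done()           // respect the caller's context
		grpcSrv.GracefulStop() // unblocks Serve and closes the listener
	}()

	return grpcSrv
}
```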

Version

Latest master

Steps to Reproduce

Just run grpc.StartGRPCServer

@Wondertan
Contributor Author

Happy to submit a patch once we settle on the solution.

@alexanderbez
Contributor

IIRC, Serve is a blocking call. So how can StartGRPCServer return the server object without blocking the caller's process?

@tac0turtle tac0turtle added the C: gRPC Issues and PRs related to the gRPC service and HTTP gateway. label Dec 28, 2022
@tac0turtle
Member

does it have to be blocking @alexanderbez ?

@Wondertan
Contributor Author

> IIRC, Serve is a blocking call. So how can StartGRPCServer return the server object without blocking the caller's process?

@alexanderbez, same as it is right now: start the goroutine internally. If the user needs to handle the error (unlikely), she can call Serve herself. Handling errors for only 5 seconds is pointless.
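
A caller-side sketch of that split (illustrative only; the address and service registration are placeholders):

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
)

// runGRPC shows the alternative for a caller that actually wants the error:
// skip the helper and drive Serve directly, handling failures for the whole
// lifetime of the server rather than just its first 5 seconds.
func runGRPC(addr string) (*grpc.Server, error) {
	lis, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}

	srv := grpc.NewServer()
	// ... register services on srv here ...

	go func() {
		if err := srv.Serve(lis); err != nil {
			log.Printf("gRPC server exited: %v", err)
		}
	}()

	return srv, nil
}
```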

@alexanderbez
Contributor

alexanderbez commented Dec 28, 2022

> does it have to be blocking @alexanderbez ?

I'm referring to Serve on the gRPC server (which is defined by the open-source gRPC project), here: https://github.com/cosmos/cosmos-sdk/blob/6dfa0c98062d5d8b38d85ca1d2807937f47da4a3/server/grpc/server.go#LL70C19-L70C19

So yes, it's starting a server, of course it has to block.


@Wondertan I'm not really following your proposal, but if you can open a draft PR I would be happy to take a look at it 😄

Wondertan added a commit to celestiaorg/celestia-node that referenced this issue Jan 5, 2023 (#1551)

This PR is one of the cleanup PRs that I am doing. It removes the usage of KVStore. Besides simplifying and unifying tests, it also fixes all the existing and potential "address already in use" bind errors from the app and tendermint tandem. I tested this by running all the tests with `-parallel 12`, which runs them in parallel per pkg. With this flag, we can speed up the time it takes to run all the tests in CI and locally. Going further, we could also consider running all the tests in parallel with only one instance of the app running rather than an instance per test.

Would love this diff to be reddish, but unfortunately, I had to copy a bunch of code from Cosmos SDK. The rationale is in the comments. Also, see cosmos/cosmos-sdk#14429
tzdybal pushed a commit to celestiaorg/go-header that referenced this issue Jan 19, 2023 (#1551)
@ValarDragon
Contributor

ValarDragon commented Feb 15, 2023

cc @Wondertan did this get fixed in celestia / anything that can be upstreamed?

Would appreciate this 5-second delay being eliminated as well. It's not clear to me either why this is a blocking 5-second wait. As I read the godoc, it seems it's explicitly not intended to be blocking: https://pkg.go.dev/google.golang.org/grpc#Server.Serve

> Serve returns when lis.Accept fails with fatal errors. lis will be closed when this method returns. Serve will return a non-nil error unless Stop or GracefulStop is called.

So the currently implemented semantics seem incorrect: they assume no errors / successful continual execution after 5 seconds. What I think you want instead is for this error to be handled within that same goroutine, with {some error recovery handling} for the main application to deal with. (And no blocking behavior.)

We should define how we want gRPC errors to be dealt with, to infer the error-handling strategy. If we want gRPC to be isolated from the main application, this error recovery plan could just be a function that runs after grpcServer.Serve() in that goroutine. Other than that, the function immediately returns with the server.

If we wanted gRPC failures within 5 seconds to terminate the entire application, we could have the main app pass in a channel on which the error is communicated, and then run the error-handling logic from within the app. (But entirely non-blocking.)
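
Roughly this shape, purely as an illustration (the signature is not a concrete API proposal):

```go
package server

import (
	"net"

	"google.golang.org/grpc"
)

// startGRPCServerNonBlocking returns immediately and reports any Serve error
// on errCh, which the main application owns; the app decides whether such an
// error should terminate the process. Nothing here blocks the caller.
func startGRPCServerNonBlocking(grpcSrv *grpc.Server, listener net.Listener, errCh chan<- error) *grpc.Server {
	go func() {
		// Per the godoc, Serve returns a non-nil error unless Stop or
		// GracefulStop was called.
		if err := grpcSrv.Serve(listener); err != nil {
			errCh <- err
		}
	}()
	return grpcSrv
}
```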

@yihuang
Collaborator

yihuang commented Feb 15, 2023

I think we can share the err channel between all services, and only wait once at the end.
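
Sketching that idea (illustrative, not existing SDK code): every service writes its error into one shared buffered channel, and the node waits on it exactly once.

```go
package server

import "context"

// waitForServices is the single wait point: all services (gRPC, API,
// gRPC-web, ...) report Serve errors into the same shared channel, and the
// node blocks here once instead of sleeping per service at startup.
func waitForServices(ctx context.Context, errCh <-chan error) error {
	select {
	case err := <-errCh:
		return err // the first service failure shuts the node down
	case <-ctx.Done():
		return ctx.Err() // normal shutdown requested by the caller
	}
}
```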

@alexanderbez alexanderbez self-assigned this Feb 15, 2023
@alexanderbez
Contributor

I think I can drastically clean this stuff up. Let me take a stab at it...

@Wondertan
Contributor Author

> cc @Wondertan did this get fixed in celestia / anything that can be upstreamed?

@ValarDragon, we have a quick solution for the test runner that just avoids the timeout. We don't have any clean and "correct" solution.
