New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cartservice: unhealthy signals from grpc #74
Comments
Hi, I got this error。Has it been solved? |
No it's still there, If you get around debugging it, let us know! |
I've isolated this problem to Redis. The readiness probe for Cartservice is the
Reusing the client keeps the number of clients at 3:
Steps to reproduce: I added a stopwatch to the Check function in cartservice. Then I increased the frequency of the Liveness probe to run every 3 seconds. Then I ran cartservice. The first Check takes almost a second to run (stopwatch in ms):
After the first check, the amount of time Check takes is highly erratic. Checks from the same 1-minute interval:
Then as soon as that time is over 1000 (1 second default timeout for grpc-health-probe, the container is restarted. I think the erratic pings are actually due to redis, when enabled -- local store's Ping always just returns true, takes < 1ms to run:
But with Redis, every time the liveness probe runs, the cartservice Ping also pings Redis. I think this Ping time is erratic due to the number of threads the C# Redis client has. If all the threads are being used on SET/GET operations, Ping sits in a queue for up to (and over) 1 second. |
Update: I let
Using the existing redis client makes most health checks now take < 1ms, as expected, yet every once in a while, the Ping takes a full second, not sure why:
|
I am thinking it's something with the redis client. A few things that you can use to debug:
|
Still investigating, other notes:
What's strange is that the ping either takes < 1 ms or about 1 second:
|
I am suspecting it's the redis client. 🤷♂ Many things in .NET ecosystem doesn't know much about the microservices architecture, so something like connection pooling, internal retries, or possibly stop-the-world GC could be causing this. Another thing to debug is to run another The solution is most likely going be one of these: cache the connection and just run client.Ping() on the health check –or the exact opposite: re-connect cleanly every time. |
Good ideas, sorry for delay.
Option 1 -- Cache the cart-service redis client connection.When re-using a redis client, determined that the
Option 2 -- Re-connect cleanly on every
|
cartservice (@ 3b6d386) restarts once about every 30 minutes and has a bunch of failed probe events.
most prominently the
timeout: health rpc did not complete within 1s
error from grpc_health_probe.This requires some investigation as for why Check() can't complete within 1s.
The text was updated successfully, but these errors were encountered: