gRPC v1.7.3 transport "panic: send on closed channel" on *serverHandlerTransport #8904
Comments
Can you share the full stack trace? |
@gyuho unfortunately that is all I see in the logs. The server is started via systemd. |
@zbindenren Are you sure it's running v3.2.10? Even systemd should be able to display the panic stack trace. We can't help without a stack trace. |
There seems to be a bug with journalctl: systemd/systemd#3277. I'll provide the stack trace as soon as possible. |
@gyuho Here is the stack trace:
|
@zbindenren We are still trying to figure out whether this is a bug on the gRPC side or in etcd. How often do you see these panics? And is there any easy way to reproduce this? |
@gyuho We see these panics daily on all three nodes. Here are the events for the 27th:

[root@p1-linux-mlsu006 ~]# journalctl -l --since='2017-11-27' --until='2017-11-28' |grep panic
Nov 27 09:32:42 p1-linux-mlsu006 etcd[80596]: panic: send on closed channel
Nov 27 09:32:42 p1-linux-mlsu006 etcd[80596]: panic: send on closed channel
Nov 27 09:32:57 p1-linux-mlsu006 etcd[39807]: panic: send on closed channel
Nov 27 16:20:57 p1-linux-mlsu006 etcd[40044]: panic: send on closed channel
Nov 27 16:21:12 p1-linux-mlsu006 etcd[43467]: panic: send on closed channel
Nov 27 16:21:28 p1-linux-mlsu006 etcd[43494]: panic: send on closed channel
Nov 27 16:42:57 p1-linux-mlsu006 etcd[43537]: panic: send on closed channel

[root@p1-linux-mlsu007 ~]# journalctl -l --since='2017-11-27' --until='2017-11-28' |grep panic
Nov 27 09:32:41 p1-linux-mlsu007 etcd[81077]: panic: send on closed channel
Nov 27 11:52:14 p1-linux-mlsu007 etcd[109013]: panic: send on closed channel
Nov 27 13:18:49 p1-linux-mlsu007 etcd[6570]: panic: send on closed channel
Nov 27 15:16:38 p1-linux-mlsu007 etcd[24833]: panic: send on closed channel
Nov 27 16:20:58 p1-linux-mlsu007 etcd[50016]: panic: send on closed channel
Nov 27 16:21:12 p1-linux-mlsu007 etcd[68518]: panic: send on closed channel
Nov 27 16:21:31 p1-linux-mlsu007 etcd[68546]: panic: send on closed channel
Nov 27 16:42:50 p1-linux-mlsu007 etcd[68590]: panic: send on closed channel
Nov 27 16:43:13 p1-linux-mlsu007 etcd[74379]: panic: send on closed channel

[root@p1-linux-mlsu008 ~]# journalctl -l --since='2017-11-27' --until='2017-11-28' |grep panic
Nov 27 09:32:48 p1-linux-mlsu008 etcd[9260]: panic: send on closed channel
Nov 27 15:16:36 p1-linux-mlsu008 etcd[42423]: panic: send on closed channel
Nov 27 16:20:59 p1-linux-mlsu008 etcd[119236]: panic: send on closed channel
Nov 27 16:36:41 p1-linux-mlsu008 etcd[10353]: panic: send on closed channel
Nov 27 16:42:54 p1-linux-mlsu008 etcd[16576]: panic: send on closed channel

Unfortunately I have no easy way to reproduce this; we only see the bug in production. The production cluster is a three-node cluster with around 2000 clients and around 2000 lease and watch streams. Usually there is not a lot of watch traffic; the panic tends to occur when watch events increase. Does this help? Do you need more information? |
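For anyone attempting a reproduction of the reported conditions, below is a rough load sketch in Go. The endpoint, key layout, and counts are assumptions, and a single clientv3 client multiplexes its watches over a handful of gRPC streams, so this only approximates a fleet of ~2000 separate clients hitting a TLS cluster.

```go
// Rough reproduction sketch (assumed endpoint, key layout, and stream counts;
// not the reporter's production workload). Opens many watch and lease streams,
// then writes to every watched key at once to generate a burst of watch events.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://127.0.0.1:2379"}, // assumed endpoint
		DialTimeout: 5 * time.Second,
		// TLS config with client certificates omitted for brevity; the
		// reported cluster uses TLS with client certificate authentication.
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()
	const streams = 2000 // roughly the reported number of lease/watch streams

	for i := 0; i < streams; i++ {
		key := fmt.Sprintf("/load-test/%d", i)

		// One watch per key, drained in the background.
		wch := cli.Watch(ctx, key)
		go func() {
			for range wch {
			}
		}()

		// One lease per key, kept alive for the lifetime of the test.
		lease, err := cli.Grant(ctx, 60)
		if err != nil {
			log.Fatal(err)
		}
		if _, err := cli.KeepAlive(ctx, lease.ID); err != nil {
			log.Fatal(err)
		}
	}

	// Burst of writes so every watcher receives events at the same time.
	for i := 0; i < streams; i++ {
		if _, err := cli.Put(ctx, fmt.Sprintf("/load-test/%d", i), "x"); err != nil {
			log.Fatal(err)
		}
	}
	time.Sleep(time.Minute) // keep the streams open while events are delivered
}
```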
@zbindenren Ok, this definitely needs a fix. Do you have a full stack trace? A longer one than #8904 (comment) would be super helpful! |
@gyuho Here are some stack traces from one node:
|
Ok thanks. We will look into it. |
@gyuho Let me know if I can help more. |
@zbindenren Sure, will do. One question: is that cluster using TLS? We use a different gRPC handler when TLS is enabled. |
@gyuho You are right, the cluster uses TLS with client certificate authentication. We had problems with username/password authentication; the server could not handle the load. |
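Background on why TLS matters here: when gRPC traffic is served through Go's net/http TLS stack, requests are handled via (*grpc.Server).ServeHTTP, which uses the serverHandlerTransport named in the panic; serving the listener directly with grpc.Server.Serve uses gRPC's own HTTP/2 transport instead. The sketch below shows the general routing pattern, not etcd's exact wiring.

```go
// Sketch of the handler-based serving path (assumed names and cert paths; not
// etcd's exact wiring). Over TLS, gRPC requests enter through net/http and
// (*grpc.Server).ServeHTTP, i.e. the serverHandlerTransport from the panic.
package main

import (
	"net/http"
	"strings"

	"google.golang.org/grpc"
)

func mixedHandler(grpcServer *grpc.Server, httpMux http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.ProtoMajor == 2 && strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc") {
			grpcServer.ServeHTTP(w, r) // handler-based transport (TLS path)
			return
		}
		httpMux.ServeHTTP(w, r) // everything else (e.g. metrics, gateway)
	})
}

func main() {
	gs := grpc.NewServer()
	mux := http.NewServeMux()
	srv := &http.Server{Addr: ":2379", Handler: mixedHandler(gs, mux)}
	// ListenAndServeTLS enables HTTP/2, which gRPC requires on this path.
	// Without TLS, a server would call gs.Serve(lis) directly instead and
	// never touch serverHandlerTransport.
	_ = srv.ListenAndServeTLS("server.crt", "server.key") // assumed cert paths
}
```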
@gyuho I have not looked into this directly, but I do recall seeing a similar issue with the grpc-gateway. What I found was that, instead of caching the auth token where possible, my Perl client was generating a new token on every request, adding a lot of overhead, which could result in a panic. It could be unrelated, but I figured I would pass it along. |
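For reference, a minimal token-cache sketch (hypothetical helper names, not grpc-gateway's or etcd's API): authenticate once, reuse the token for subsequent requests, and drop it only when the server rejects it, rather than issuing a new token per request.

```go
// Minimal token-cache sketch (illustrative only). auth() stands in for
// whatever call issues a token, e.g. an Authenticate RPC.
package main

import (
	"fmt"
	"sync"
)

type tokenCache struct {
	mu    sync.Mutex
	token string
	auth  func() (string, error) // assumed token-issuing call
}

// get returns the cached token, authenticating only when none is cached.
func (c *tokenCache) get() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token == "" {
		t, err := c.auth()
		if err != nil {
			return "", err
		}
		c.token = t
	}
	return c.token, nil
}

// invalidate clears the token after the server rejects it, forcing a refresh.
func (c *tokenCache) invalidate() {
	c.mu.Lock()
	c.token = ""
	c.mu.Unlock()
}

func main() {
	calls := 0
	c := &tokenCache{auth: func() (string, error) {
		calls++
		return fmt.Sprintf("token-%d", calls), nil
	}}
	t1, _ := c.get()
	t2, _ := c.get()
	fmt.Println(t1 == t2, calls) // true 1: the token is reused, not re-issued
	c.invalidate()
	t3, _ := c.get()
	fmt.Println(t3, calls) // token-2 2: refreshed only after invalidation
}
```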
@zbindenren Upstream fix has been merged (grpc/grpc-go#1687). Hopefully, we can release the fix with v3.2.11 next week. |
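The actual patch is in grpc/grpc-go#1687. As a general illustration of how this class of "send on closed channel" race is usually resolved in Go, the sketch below never closes the channel being sent on; shutdown is signaled through a separate done channel that every send selects on. This is a generic pattern, not the upstream patch itself.

```go
// Generic pattern for avoiding "send on closed channel" panics: the data
// channel is never closed; a separate done channel signals shutdown, and
// every send selects on both.
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errClosing = errors.New("transport is closing")

type sender struct {
	mu     sync.Mutex
	done   chan struct{}
	writes chan func()
}

// do queues fn unless the sender has shut down, in which case it returns an
// error instead of panicking.
func (s *sender) do(fn func()) error {
	select {
	case <-s.done:
		return errClosing
	case s.writes <- fn:
		return nil
	}
}

// close closes only the signal channel, never the data channel, and is safe
// to call more than once.
func (s *sender) close() {
	s.mu.Lock()
	defer s.mu.Unlock()
	select {
	case <-s.done:
	default:
		close(s.done)
	}
}

func main() {
	s := &sender{done: make(chan struct{}), writes: make(chan func(), 1)}
	_ = s.do(func() { fmt.Println("queued") })
	s.close()
	if err := s.do(func() {}); err != nil {
		fmt.Println(err) // "transport is closing" -- no panic
	}
}
```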
@gyuho thanks, that would be great. I will test it as soon as it is released. |
Are you guys planning to release 3.3 any time soon? :) Is this issue blocking it? |
@sokoow We are waiting for gRPC team's release. Then we will release v3.2.11. |
v3.2.11 is released. Please try https://github.com/coreos/etcd/releases/tag/v3.2.11 and let us know if it fixes the issue. |
@gyuho thanks, I'll deploy it tomorrow on production and let you know. |
@zbindenren Just checking in. Did the new release resolve the issue? |
@gyuho No leader elections since the update. The last leader election you see is from the update itself. |
@zbindenren Thanks for confirming! |
Any sign of the 3.3 release yet? |
I have this same issue in 3.2.17+dfsg-1 (Ubuntu 18.04). My entire cluster dies regularly :( |
Hi,
Similar to #8595, we still get a panic with v3.2.10. Here is the log: