should handle channel disconnected better in NettyHttpClient or Druid #3393
Comments
Any resolution for this?
Facing this same problem.
I am wondering what are the values of
@kaijianding by tweak I mean increase the values.
@kaijianding I guess that solved this issue, right?
Yes, but that only keeps this issue from being triggered as frequently; it doesn't resolve it at the root. I think it's better to fix it at the code level.
Yes,
@kaijianding for this specific problem, I think a better solution might be to optionally (configured via
We also saw it on some clusters, though it happens very infrequently. @akashdw is working to implement #3393 (comment). So, we'll set a 1 min (configurable, of course) pooling timeout (if no one uses that connection within a minute, it is removed from the pool and discarded) so that near-expiry connections are not given to DirectDruidClient.
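For illustration, here is a minimal Java sketch of the pooling-timeout idea described in the comment above: connections that have sat unused in the pool for longer than a configurable interval are discarded rather than handed to a client. This is not Druid's actual resource-pool code; the `ChannelPool` and `PooledChannel` types and all field names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical wrapper around a pooled connection and the time it was last used.
class PooledChannel
{
  final Object channel;        // would be a Netty Channel in the real client
  final long lastUsedMillis;

  PooledChannel(Object channel, long lastUsedMillis)
  {
    this.channel = channel;
    this.lastUsedMillis = lastUsedMillis;
  }
}

// Hypothetical pool that refuses to hand out connections unused for too long.
class ChannelPool
{
  private final Deque<PooledChannel> idle = new ArrayDeque<>();
  private final long unusedTimeoutMillis; // e.g. 60_000 for the "1 min" mentioned above

  ChannelPool(long unusedTimeoutMillis)
  {
    this.unusedTimeoutMillis = unusedTimeoutMillis;
  }

  /** Returns a recently used pooled channel, or null if the caller should open a fresh one. */
  synchronized PooledChannel take(long nowMillis)
  {
    PooledChannel candidate;
    while ((candidate = idle.pollFirst()) != null) {
      if (nowMillis - candidate.lastUsedMillis > unusedTimeoutMillis) {
        // Near-expiry connection: skip it (a real implementation would also close it),
        // since the server may already be about to disconnect it.
        continue;
      }
      return candidate;
    }
    return null;
  }

  synchronized void giveBack(Object channel, long nowMillis)
  {
    idle.addFirst(new PooledChannel(channel, nowMillis));
  }
}
```

The point of the eviction check is that a connection close to the server's idle timeout is never offered to DirectDruidClient, so the disconnect race described in this issue becomes much less likely.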
This solution sounds good @himanshug.
This should be fixed with the java-util upgrade in http://github.com/druid-io/druid/pull/5239 (and was preserved in the ensuing merge of java-util into Druid in #5289). |
I'm facing the following race condition:
The broker runs a query and eventually calls httpClient.go() in DirectDruidClient; the httpClient is an instance of NettyHttpClient.
Inside the NettyHttpClient.go() method, after the channel is taken from the pool and verified to be in good shape and ready to fire the query, the historical side reaches its idle timeout and disconnects this channel.
The handler.channelDisconnected callback is then invoked, and the query ultimately fails on the broker.
This exception occurs hundreds of times in our Druid cluster (it is a busy cluster).
The channel never gets a chance to actually reach the historical and get a single byte back, so I think NettyHttpClient.go() should internally retry once with a new channel to finish the "go" instead of setting the retVal future to a failure.
The same thing can happen at the channel.write(httpRequest).addListener(...) call in NettyHttpClient: the channel being written to can be closed by the historical server after it is taken from the channel pool, so that case should be retried as well.
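A minimal sketch of the proposed retry-once behavior, assuming a helper sendOnChannel() that fails with a distinct exception when the channel is disconnected before any response byte arrives. This is not the actual NettyHttpClient code (which uses Netty channels and Guava futures); the class, the helper, and the exception type are all hypothetical names.

```java
import java.util.concurrent.CompletableFuture;

class RetryOnceHttpClient
{
  CompletableFuture<String> go(String request)
  {
    return sendOnChannel(request)
        .handle((response, error) -> {
          if (error instanceof DisconnectedBeforeResponseException) {
            // The server closed the pooled channel before any response byte arrived,
            // so retry exactly once on a fresh channel instead of failing the caller.
            return sendOnChannel(request);
          }
          // Otherwise pass the original outcome (success or failure) straight through.
          CompletableFuture<String> passthrough = new CompletableFuture<>();
          if (error != null) {
            passthrough.completeExceptionally(error);
          } else {
            passthrough.complete(response);
          }
          return passthrough;
        })
        .thenCompose(f -> f);
  }

  // Stand-in for the real Netty write-and-read plumbing.
  CompletableFuture<String> sendOnChannel(String request)
  {
    return CompletableFuture.completedFuture("ok");
  }

  static class DisconnectedBeforeResponseException extends RuntimeException
  {
  }
}
```

Retrying once is safe in this scenario because the disconnect happens before any part of the response has been received, so repeating the request on a fresh channel cannot duplicate a partially consumed reply.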
Here is the trace: