should handle channel disconnected better in NettyHttpClient or Druid #3393

kaijianding · 2016-08-24T07:41:24Z

I'm facing such a race condition:
Broker runs a query, finally calls the httpClient.go() in DirectDruidClient, the httpClient is instance of NettyHttpClient.
Inside the NettyHttpClient.go() method, after the channel is taken from the pool and verified in good shape and ready to fire the query, then the historical side reaches the idle time and disconnect this channel.

Then handler.channelDisconnected callback is called, and finally the query is failed in broker.
This exception occurs hundreds of times in our druid cluster (it is a busy cluster).

This channel never has a chance to actually connect to the historical to get a single byte back, thus ,I think, the NettyHttpClient.go() should retry internally one time to get a new channel to finish the "go" instead of set the retVal future to fail.

The same thing can happen when channel.write(httpRequest).addListener(...) in NettyHttpClient. the channel to write can be closed by historical server after it is taken from the pool channel, thus it also should be retried.

here is the trace:

2016-08-24T14:03:29,029 ERROR [qtp1369854401-202[timeseries_eternal_olap_click_da76287c-4eec-4c84-a8b8-2e12ea92aa05]] io.druid.server.QueryResource - Exception handling request: {class=io.druid.server.QueryResource, exceptionType=class com.metamx.common.RE, exceptionMessage=Failure getting results from[http://hdppic0101.et2.tbsite.net:8083/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Channel disconnected], exception=com.metamx.common.RE: Failure getting results from[http://hdppic0101.et2.tbsite.net:8083/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Channel disconnected], query=TimeseriesQuery{dataSource='eternal_olap_click', querySegmentSpec=LegacySegmentSpec{intervals=[2016-08-24T00:00:00.000+08:00/2016-08-24T13:54:00.000+08:00]}, descending=false, dimFilter=(scene_tag = sort_type^C_coefp && bu_src = taobao_topic && item_id = tm6048738 && exper_token_str = wl_topic_sort && action_type = click), granularity='PeriodGranularity{period=PT1M, timeZone=+08:00, origin=null}', aggregatorSpecs=[DoubleSumAggregatorFactory{fieldName='ipv', name='ipv'}], postAggregatorSpecs=[], context={queryId=da76287c-4eec-4c84-a8b8-2e12ea92aa05, timeout=30000}}, peer=10.197.16.234}
com.metamx.common.RE: Failure getting results from[http://hdppic0101.et2.tbsite.net:8083/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Channel disconnected]
        at io.druid.client.DirectDruidClient$JsonParserIterator.init(DirectDruidClient.java:498) ~[druid-server-0.9.0.jar:0.9.0]
        at io.druid.client.DirectDruidClient$JsonParserIterator.hasNext(DirectDruidClient.java:442) ~[druid-server-0.9.0.jar:0.9.0]
        at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:103) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.guava.MergeSequence$2.accumulate(MergeSequence.java:66) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.guava.MergeSequence$2.accumulate(MergeSequence.java:62) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.guava.YieldingAccumulators$1.accumulate(YieldingAccumulators.java:32) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.guava.BaseSequence.makeYielder(BaseSequence.java:104) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.guava.BaseSequence.toYielder(BaseSequence.java:81) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.guava.BaseSequence.accumulate(BaseSequence.java:67) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.guava.MergeSequence.toYielder(MergeSequence.java:59) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.guava.LazySequence.toYielder(LazySequence.java:43) ~[java-util-0.27.7.jar:?]
        at io.druid.query.RetryQueryRunner$1.toYielder(RetryQueryRunner.java:105) ~[druid-processing-0.9.0.jar:0.9.0]
        at io.druid.common.guava.CombiningSequence.toYielder(CombiningSequence.java:79) ~[druid-common-0.9.0.jar:0.9.0]
        at com.metamx.common.guava.MappedSequence.toYielder(MappedSequence.java:46) ~[java-util-0.27.7.jar:?]
        at io.druid.query.CPUTimeMetricQueryRunner$1.toYielder(CPUTimeMetricQueryRunner.java:93) ~[druid-processing-0.9.0.jar:0.9.0]
        at com.metamx.common.guava.Sequences$1.toYielder(Sequences.java:98) ~[java-util-0.27.7.jar:?]
        at io.druid.server.QueryResource.doPost(QueryResource.java:170) [druid-server-0.9.0.jar:0.9.0]
        at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]
        at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) [jersey-server-1.19.jar:1.19]
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) [jersey-servlet-1.19.jar:1.19]
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) [jersey-servlet-1.19.jar:1.19]
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) [jersey-servlet-1.19.jar:1.19]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
        at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:278) [guice-servlet-4.0-beta.jar:?]
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:268) [guice-servlet-4.0-beta.jar:?]
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:180) [guice-servlet-4.0-beta.jar:?]
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93) [guice-servlet-4.0-beta.jar:?]
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) [guice-servlet-4.0-beta.jar:?]
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120) [guice-servlet-4.0-beta.jar:?]
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:132) [guice-servlet-4.0-beta.jar:?]
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:129) [guice-servlet-4.0-beta.jar:?]
        at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:206) [guice-servlet-4.0-beta.jar:?]
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:129) [guice-servlet-4.0-beta.jar:?]
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668) [jetty-servlet-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:45) [jetty-servlets-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668) [jetty-servlet-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581) [jetty-servlet-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1158) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511) [jetty-servlet-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1090) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.server.Server.handle(Server.java:517) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213) [jetty-util-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147) [jetty-util-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654) [jetty-util-9.3.6.v20151106.jar:9.3.6.v20151106]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) [jetty-util-9.3.6.v20151106.jar:9.3.6.v20151106]
        at java.lang.Thread.run(Thread.java:756) [?:1.8.0_66]
Caused by: java.util.concurrent.ExecutionException: org.jboss.netty.channel.ChannelException: Channel disconnected
        at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
        at io.druid.client.DirectDruidClient$JsonParserIterator.init(DirectDruidClient.java:479) ~[druid-server-0.9.0.jar:0.9.0]
        ... 69 more
Caused by: org.jboss.netty.channel.ChannelException: Channel disconnected
        at com.metamx.http.client.NettyHttpClient$1.channelDisconnected(NettyHttpClient.java:311) ~[http-client-1.0.4.jar:?]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelDisconnected(SimpleChannelUpstreamHandler.java:208) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelDisconnected(SimpleChannelUpstreamHandler.java:208) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.handler.codec.http.HttpClientCodec.handleUpstream(HttpClientCodec.java:92) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:396) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:360) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.10.4.Final.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_66]

The text was updated successfully, but these errors were encountered:

chainani · 2016-10-06T16:21:09Z

any resolution for this?

villuminati · 2016-10-25T13:27:00Z

facing this same problem

b-slim · 2016-11-23T00:57:28Z

i am wondering what are the values of druid.broker.http.numConnections and druid.broker.http.readTimeout did you try to tweak it ?

b-slim · 2016-11-23T15:46:58Z

@kaijianding by tweak i mean increase the values.

kaijianding · 2016-11-23T15:48:41Z

druid.broker.http.numConnections=20
druid.broker.http.readTimeout=PT5M
druid.server.http.maxIdleTime=PT5m at historical side, and this value is druid's default value

I increase druid.server.http.maxIdleTime to P1D and this issue is rarely happen
@chainani @RishiMi @b-slim

b-slim · 2016-11-23T15:56:06Z

@kaijianding i guess that solved this issue right ?

kaijianding · 2016-11-23T16:03:19Z

yes, but it just avoid to trigger this issue happening frequently, it doesn't resolve it at the root, I think it's better to fix it at code level.

sharks222 · 2017-09-27T03:37:45Z

yes ,
I have set :druid.broker.http.readTimeout=PT20M and druid.server.http.maxIdleTime=PT20m,but it does not work , why ?

himanshug · 2017-09-27T20:29:38Z

@kaijianding for this specific problem, i think better solution might be to optionally (configured via ResourcePoolConfig ) add an expiry on unused connection in https://github.com/metamx/http-client/blob/master/src/main/java/com/metamx/http/client/pool/ResourcePool.java#L46

himanshug · 2017-11-16T17:58:00Z

we also saw it on some clusters, though it happens very very infrequently . @akashdw is working to implement #3393 (comment) . So, we'll set 1 min(configurable of course) pooling timeout (if no-one uses that connection in a min then its removed from the pool and discarded) so that near expiry connections are not given to DirectDruidClient.

kaijianding · 2017-11-22T11:40:03Z

This solution sounds good @himanshug

gianm · 2018-02-23T17:43:24Z

This should be fixed with the java-util upgrade in http://github.com/druid-io/druid/pull/5239 (and was preserved in the ensuing merge of java-util into Druid in #5289).

… instead of throwing exception

drcrallen added the Bug label Aug 24, 2016

leventov mentioned this issue Mar 28, 2017

Use async-http-client instead of metamx/http-client? #4124

Closed

akashdw mentioned this issue Dec 7, 2017

Timeout support in resource pool. metamx/java-util#73

Merged

gianm closed this as completed Feb 23, 2018

gianm mentioned this issue Feb 23, 2018

Exceptions in broker: Channel disconnected #5340

Closed

kamaci mentioned this issue Mar 9, 2019

Flaky test ITAppenderatorDriverRealtimeIndexTaskTest testRealtimeIndexTask() #7219

Closed

seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this issue Feb 25, 2022

apache#3393 Purge on not-existing datasource should return NO_CONTENT…

ef282f0

… instead of throwing exception

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

should handle channel disconnected better in NettyHttpClient or Druid #3393

should handle channel disconnected better in NettyHttpClient or Druid #3393

kaijianding commented Aug 24, 2016 •

edited

Loading

chainani commented Oct 6, 2016

villuminati commented Oct 25, 2016

b-slim commented Nov 23, 2016

b-slim commented Nov 23, 2016

kaijianding commented Nov 23, 2016

b-slim commented Nov 23, 2016

kaijianding commented Nov 23, 2016

sharks222 commented Sep 27, 2017 •

edited

Loading

himanshug commented Sep 27, 2017

himanshug commented Nov 16, 2017

kaijianding commented Nov 22, 2017

gianm commented Feb 23, 2018

should handle channel disconnected better in NettyHttpClient or Druid #3393

should handle channel disconnected better in NettyHttpClient or Druid #3393

Comments

kaijianding commented Aug 24, 2016 • edited Loading

chainani commented Oct 6, 2016

villuminati commented Oct 25, 2016

b-slim commented Nov 23, 2016

b-slim commented Nov 23, 2016

kaijianding commented Nov 23, 2016

b-slim commented Nov 23, 2016

kaijianding commented Nov 23, 2016

sharks222 commented Sep 27, 2017 • edited Loading

himanshug commented Sep 27, 2017

himanshug commented Nov 16, 2017

kaijianding commented Nov 22, 2017

gianm commented Feb 23, 2018

kaijianding commented Aug 24, 2016 •

edited

Loading

sharks222 commented Sep 27, 2017 •

edited

Loading