Broker backpressure. #6313
Conversation
Adds a new property "druid.broker.http.maxQueuedBytes" and a new context parameter "maxQueuedBytes". Both represent a maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Fixes apache#4933.
```java
synchronized (watermarkLock) {
  suspendWatermark = Math.max(suspendWatermark, currentChunkNum);
  if (suspendWatermark > resumeWatermark) {
    channel.setReadable(false);
```
so, this is the magic way of telling netty to stop reading data off of the socket without blocking any of the worker threads?
Yes, it is. It makes netty stop reading from the socket and stop sending new data up the channel. It means that the data servers (historicals etc) will block while trying to write data, which I think is ok, since the blocking happens there in an http server thread dedicated to one query.
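To make the watermark bookkeeping concrete, here is a minimal, self-contained sketch of the suspend/resume logic being discussed. This is an illustration, not Druid's actual code: `BackpressureGate` and its nested `Channel` interface are hypothetical stand-ins for the Netty 3 channel, whose real `setReadable()` method does the socket-level work.

```java
// Hypothetical sketch of the suspend/resume watermark bookkeeping discussed
// above. "Channel" is a stand-in for the Netty 3 channel, not the real API.
class BackpressureGate {
  interface Channel { void setReadable(boolean readable); }

  private final Object watermarkLock = new Object();
  private long suspendWatermark = -1;
  private long resumeWatermark = -1;
  private final Channel channel;

  BackpressureGate(Channel channel) { this.channel = channel; }

  // Called when the handler asks to stop reading at chunk "currentChunkNum".
  void suspend(long currentChunkNum) {
    synchronized (watermarkLock) {
      suspendWatermark = Math.max(suspendWatermark, currentChunkNum);
      if (suspendWatermark > resumeWatermark) {
        channel.setReadable(false); // ask Netty to stop reading from the socket
      }
    }
  }

  // Called by the resume path once queued data has been consumed.
  void resume(long chunkNum) {
    synchronized (watermarkLock) {
      resumeWatermark = Math.max(resumeWatermark, chunkNum);
      if (suspendWatermark <= resumeWatermark) {
        channel.setReadable(true); // safe to read again
      }
    }
  }
}
```

The watermarks guard against out-of-order suspend/resume calls: a stale `resume()` for an earlier chunk will not re-enable reads while a later suspension is still in force.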
```java
if (response.isFinished()) {
  retVal.set((Final) response.getObj());
}
```
```java
assert currentChunkNum == 0;
```
did you intentionally leave it?
I did, it's just a thing that I think should always be true, so I put in an assert to 'document' that.
```java
 * This handler can exert backpressure by returning a response with "continueReading" set to false from handleResponse()
 * or handleChunk(). In this case, the HTTP client will stop reading soon thereafter. It may not happen immediately, so
 * be prepared for more handleChunk() calls to happen. To resume reads, call resume() on the TrafficCop provided by
```
is this because netty's channel.setReadable(false) doesn't take effect immediately, and netty might deliver one or a few more chunks after the channel's readability is changed?
Yeah, it is. Any data that has already been read from the socket, but not yet delivered to our channel handler, will still be delivered.
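To illustrate that contract, here is a hypothetical handler that asks for backpressure once queued bytes exceed a limit, but still accepts chunks that were already in flight when it asked to pause. Names like `QueuedBytesHandler` are illustrative, not Druid's actual API.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: a handler that requests backpressure once queued
// bytes exceed a limit, yet still accepts (never drops) chunks that were
// already in flight when it asked the client to pause.
class QueuedBytesHandler {
  private final long maxQueuedBytes;
  private long queuedBytes = 0;
  private final Queue<byte[]> queue = new ArrayDeque<>();

  QueuedBytesHandler(long maxQueuedBytes) { this.maxQueuedBytes = maxQueuedBytes; }

  // Returning false plays the role of "continueReading: false": it asks the
  // client to pause, but more chunks may still arrive and must be queued.
  boolean handleChunk(byte[] chunk) {
    queue.add(chunk);
    queuedBytes += chunk.length;
    return queuedBytes <= maxQueuedBytes;
  }

  // Consuming a chunk frees budget; the caller resumes reads once the
  // queue is back under the limit.
  byte[] poll() {
    byte[] chunk = queue.poll();
    if (chunk != null) {
      queuedBytes -= chunk.length;
    }
    return chunk;
  }

  boolean belowLimit() { return queuedBytes <= maxQueuedBytes; }
}
```

The key point from the discussion above is that `handleChunk()` keeps queueing even after it has returned false, since suspension is not instantaneous.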
This is awesome.
It would be nice if we could add some metrics around the number of times suspension happened, or the time period for which we suspended reading data. That way users can defensively set the queued-bytes setting and tune its value based on the metrics.
@himanshug The metric sounds useful, are you ok adding it yourself in a later patch? 😄
@gianm that isn't setting the right precedent :)
@himanshug, thanks for the review. I raised an issue in #6321 describing how the metric could work.
@gianm |
@hellobabygogo yes, this backpressure means the broker will pause reading response data from historicals while it is processing data it has already read from them. So backpressure would propagate all the way to the historicals.
I tested this by running on a modest cluster (6 historicals) and doing an unlimited, unfiltered Scan query via SQL (`select * from tbl`). The query ran for about half an hour and managed to fetch almost 20GB of results without OOMing the broker. Without this patch, the same query quickly OOMed the broker.
See HttpResponseHandler for a description of the API.
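For illustration, a query could opt into a per-query limit via the `maxQueuedBytes` context parameter described above, while the broker-wide default would be set with `druid.broker.http.maxQueuedBytes` in the broker's runtime.properties. The datasource name, interval, and byte value below are arbitrary examples, not recommendations:

```json
{
  "queryType": "scan",
  "dataSource": "tbl",
  "intervals": ["2018-01-01/2019-01-01"],
  "context": {
    "maxQueuedBytes": 25000000
  }
}
```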