Broker backpressure. #6313

Merged (2 commits) on Sep 10, 2018

Conversation

@gianm (Contributor) commented Sep 7, 2018

Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.
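For illustration, the two knobs might be set like this (the 25 MB value below is only an example, not a documented default; tune it for your deployment):

# Broker runtime.properties: cluster-wide cap on bytes queued per query
druid.broker.http.maxQueuedBytes=25000000

# Or per query, via the context of a native query:
{
  "queryType": "scan",
  "dataSource": "tbl",
  "intervals": ["1000-01-01/3000-01-01"],
  "context": { "maxQueuedBytes": 25000000 }
}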

I tested this by running on a modest cluster (6 historicals) and doing an unlimited,
unfiltered Scan query via SQL (select * from tbl). The query ran for about half an
hour and managed to fetch almost 20GB of results without OOMing the broker.
Without this patch, the same query quickly OOMed the broker.

See HttpResponseHandler for a description of the API.

Fixes #4933.

@fjy fjy added this to the 0.13.0 milestone Sep 7, 2018
@fjy (Contributor) commented Sep 8, 2018

👍

synchronized (watermarkLock) {
  suspendWatermark = Math.max(suspendWatermark, currentChunkNum);
  if (suspendWatermark > resumeWatermark) {
    channel.setReadable(false);
@himanshug (Contributor) commented Sep 8, 2018:

So, is this the magic way of telling Netty to stop reading data off the socket without blocking any of the worker threads?

@gianm (Contributor, Author) replied Sep 9, 2018:

Yes, it is. It makes Netty stop reading from the socket and stop sending new data up the channel. It means that the data servers (historicals, etc.) will block while trying to write data, which I think is OK, since the blocking happens there in an HTTP server thread dedicated to one query.
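To make that concrete, here is a minimal hypothetical sketch of the Netty 3 readability toggle (the handler, field, and method names below are invented for illustration; this is not the PR's actual handler):

import org.jboss.netty.buffer.ChannelBuffer;
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.MessageEvent;
import org.jboss.netty.channel.SimpleChannelUpstreamHandler;

public class BackpressureSketchHandler extends SimpleChannelUpstreamHandler
{
  private final long maxQueuedBytes; // hypothetical threshold, in bytes
  private long queuedBytes;          // guarded by synchronized(this)

  public BackpressureSketchHandler(long maxQueuedBytes)
  {
    this.maxQueuedBytes = maxQueuedBytes;
  }

  @Override
  public synchronized void messageReceived(ChannelHandlerContext ctx, MessageEvent e)
  {
    if (e.getMessage() instanceof ChannelBuffer) {
      queuedBytes += ((ChannelBuffer) e.getMessage()).readableBytes();
      if (queuedBytes > maxQueuedBytes) {
        // Stop reading from the socket without blocking any worker thread.
        // Chunks Netty has already read may still be delivered after this.
        e.getChannel().setReadable(false);
      }
    }
    ctx.sendUpstream(e); // pass the chunk along to the consumer
  }

  // Hypothetical hook: the consumer calls this as it drains queued bytes.
  public synchronized void bytesConsumed(Channel channel, long bytes)
  {
    queuedBytes -= bytes;
    if (queuedBytes <= maxQueuedBytes) {
      channel.setReadable(true); // resume reads from the socket
    }
  }
}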

if (response.isFinished()) {
  retVal.set((Final) response.getObj());
}

assert currentChunkNum == 0;
@himanshug (Contributor) commented:

Did you intentionally leave it?

@gianm (Contributor, Author) replied:

I did. It's just a thing that I think should always be true, so I put in an assert to 'document' that.

*
* This handler can exert backpressure by returning a response with "continueReading" set to false from handleResponse()
* or handleChunk(). In this case, the HTTP client will stop reading soon thereafter. It may not happen immediately, so
* be prepared for more handleChunk() calls to happen. To resume reads, call resume() on the TrafficCop provided by
@himanshug (Contributor) commented:

Is this because Netty's channel.setReadable(false) doesn't take effect immediately, and Netty might deliver one or a few more chunks after changing channel readability?

@gianm (Contributor, Author) replied:

Yeah, it is. Any data that has already been read from the socket, but not yet delivered to our channel handler, will still be delivered.
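As a toy model of that contract (the types below are simplified stand-ins written for this explanation, not Druid's actual classes; only the names continueReading, handleChunk(), and TrafficCop's resume() come from the javadoc excerpt above):

// Simplified stand-ins modeling the backpressure contract described above.
interface TrafficCop
{
  void resume(long chunkNum); // resume reads suspended at or before this chunk
}

class ClientResponse<T>
{
  final T obj;
  final boolean continueReading; // false asks the HTTP client to stop reading soon

  ClientResponse(T obj, boolean continueReading)
  {
    this.obj = obj;
    this.continueReading = continueReading;
  }
}

// A handler returns continueReading = false once it has queued too many bytes,
// tolerates a few more handleChunk() calls for data already read off the socket,
// and later calls resume() on the TrafficCop when the consumer drains the queue.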

@himanshug (Contributor) left a review:

This is awesome.

It would be nice if we could add some metrics around the number of times suspension happened, or the time period for which we suspended reading data. That way, users can set the queued-bytes limit defensively and tune its value based on the metrics.

@gianm (Contributor, Author) commented Sep 9, 2018:

@himanshug The metric sounds useful; are you OK with adding it yourself in a later patch? 😄

@himanshug (Contributor) commented:

@gianm that isn't setting the right precedent :)
But yes, it could be left for a future PR.

@gianm gianm mentioned this pull request Sep 10, 2018
@gianm (Contributor, Author) commented Sep 10, 2018:

@himanshug, thanks for the review. I raised an issue in #6321 describing how the metric could work.

@hellobabygogo commented:

@gianm Hi, I have a question about broker backpressure. If many queries are under backpressure at once, will that cause the query threads to fill up?

@himanshug (Contributor) commented:

@hellobabygogo Yes. This backpressure means the broker pauses reading response data from historicals while it processes the data it has already read from them, so backpressure propagates all the way to the historicals.

Merging this pull request may close: Backpressure for broker-to-historical communication (#4933)