-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
query laning and load shedding #9407
Changes from 34 commits
6220985
3f41001
feae8b1
554b8b5
0597c3f
405c94e
e22ece1
e98cad7
2069437
eaf1449
688ca43
f0b3f9f
87c6cbd
5e91bcb
912b7bc
1e384bf
60861a4
9aed16e
419ab98
f0d39e1
2afaaf1
ef029c4
059a2d4
0ec8a26
5711fce
aa73a14
50847af
abe3631
86501ef
8b7b70d
91ad9d9
373fd11
2741501
8575cf8
25a8bda
32965a4
361e06e
3ba7808
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1476,9 +1476,35 @@ These Broker configurations can be defined in the `broker/runtime.properties` fi | |
|`druid.broker.select.tier`|`highestPriority`, `lowestPriority`, `custom`|If segments are cross-replicated across tiers in a cluster, you can tell the broker to prefer to select segments in a tier with a certain priority.|`highestPriority`| | ||
|`druid.broker.select.tier.custom.priorities`|`An array of integer priorities.`|Select servers in tiers with a custom priority list.|None| | ||
|
||
##### Query laning | ||
|
||
The Broker provides facilities to aid in query capacity reservation for heterogeneous query workloads in the form of 'laning' strategies, which provide a variety of mechanisms to examine and classify a query, assigning it to a 'lane'. Lanes are defined with capacity limits which the broker will enforce, causing requests in excess of the capacity to be discarded with an HTTP 429 status code, reserving resources for other lanes or for interactive queries (with no lane). | ||
|
||
|Property|Description|Default| | ||
|--------|-----------|-------| | ||
|`druid.query.scheduler.numThreads`|Maximum number of HTTP threads to dedicate to query processing. To save HTTP thread capacity, this should be lower than `druid.server.http.numThreads`.|Unbounded| | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. in what use case would I ever want to set it something other than There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I actually think we might always want to set it lower than
See my above nervousness, but I think
I would agree the current documentation doesn't quite adequately describe how this stuff might be utilized, in a future PR i want to add a section to cluster tuning docs to more properly advise on when and how to set this stuff up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thanks, I understand the reasoning now. Major behavior change with lane usage is really losing the queuing of requests to handle spikes and instead sending 429s immediately. In future, we could introduce mechanism to maintain statically/dynamically sized [per lane] waiting queue ourselves as well along with concurrency limits in lane strategy. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if it would make sense to instead move towards automatically computing
Yeah the current behavior is definitely a hard stop if you are over the line. I agree it would make sense to allow some sort of timed out queuing behavior, which is what jetty QoS filter can sort of provide, which is a large part of why I am still wondering if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is fine to let user provide There are few advantages in maintaining the queues ourselves and not letting jetty do it.
|
||
|`druid.query.scheduler.laning.strategy`|Query laning strategy to use to assign queries to a lane in order to control capacities for certain classes of queries.|`none`| | ||
|
||
##### Laning strategies | ||
|
||
###### No laning strategy | ||
|
||
In this mode, queries are never assigned lane, and concurrent query count will only limited by `druid.server.http.numThreads` or `druid.query.scheduler.numThreads`, if set. This is the default Druid query scheduler operating mode. This strategy can also be explicitly enabled by setting `druid.query.scheduler.laning.strategy` to `none`. | ||
clintropolis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider adding: This strategy can be enabled by setting |
||
###### 'High/Low' laning strategy | ||
This laning strategy splits queries with a `priority` below zero into a `low` query lane, automatically. Queries with priority of zero (the default) or above are considered 'interactive'. The limit on `low` queries can be set to some desired percentage of the total capacity (or HTTP thread pool size), reserving capacity for interactive queries. Queries in the `low` lane are _not_ guaranteed their capacity, which may be consumed by interactive queries, but may use up to this limit if total capacity is available. | ||
|
||
If the `low` lane is specified in the [query context](../querying/query-context.md) `lane` parameter, this will override the computed lane. | ||
|
||
This strategy can be enabled by setting `druid.query.scheduler.laning.strategy=hilo`. | ||
|
||
|Property|Description|Default| | ||
|--------|-----------|-------| | ||
|`druid.query.scheduler.laning.maxLowPercent`|Maximum percent of the smaller number of`druid.server.http.numThreads` or `druid.query.scheduler.numThreads`, defining the number of HTTP threads that can be used by queries with a priority lower than 0. Value must be in the range 1 to 100, and will be rounded up|No default, must be set if using this mode| | ||
|
||
##### Server Configuration | ||
|
||
Druid uses Jetty to serve HTTP requests. | ||
Druid uses Jetty to serve HTTP requests. Each query being processed consumes a single thread from `druid.server.http.numThreads`, so consider defining `druid.query.scheduler.numThreads` to a lower value in order to reserve HTTP threads for responding to health checks, lookup loading, and other non-query, and in most cases comparatively very short lived, HTTP requests. | ||
|
||
|Property|Description|Default| | ||
|--------|-----------|-------| | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,6 +35,7 @@ | |
public class QueryContexts | ||
{ | ||
public static final String PRIORITY_KEY = "priority"; | ||
public static final String LANE_KEY = "lane"; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was this added to the docs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, I hadn't documented since I hadn't decided the behavior of whether or not a lane specified in the query context by the user should override the laning strategy, or if it should be laning strategy specific, see other comment about this |
||
public static final String TIMEOUT_KEY = "timeout"; | ||
public static final String MAX_SCATTER_GATHER_BYTES_KEY = "maxScatterGatherBytes"; | ||
public static final String MAX_QUEUED_BYTES_KEY = "maxQueuedBytes"; | ||
|
@@ -200,6 +201,11 @@ public static <T> int getPriority(Query<T> query, int defaultValue) | |
return parseInt(query, PRIORITY_KEY, defaultValue); | ||
} | ||
|
||
public static <T> String getLane(Query<T> query) | ||
{ | ||
return (String) query.getContextValue(LANE_KEY); | ||
} | ||
|
||
public static <T> boolean getEnableParallelMerges(Query<T> query) | ||
{ | ||
return parseBoolean(query, BROKER_PARALLEL_MERGE_KEY, DEFAULT_ENABLE_PARALLEL_MERGE); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.apache.druid.query; | ||
|
||
import com.fasterxml.jackson.annotation.JsonCreator; | ||
import com.fasterxml.jackson.annotation.JsonProperty; | ||
|
||
import javax.annotation.Nullable; | ||
|
||
/** | ||
* Base serializable error response | ||
* | ||
* QueryResource and SqlResource are expected to emit the JSON form of this object when errors happen. | ||
*/ | ||
public class QueryException extends RuntimeException | ||
{ | ||
private final String errorCode; | ||
private final String errorClass; | ||
private final String host; | ||
|
||
public QueryException(Throwable cause, String errorCode, String errorClass, String host) | ||
{ | ||
super(cause == null ? null : cause.getMessage(), cause); | ||
this.errorCode = errorCode; | ||
this.errorClass = errorClass; | ||
this.host = host; | ||
} | ||
|
||
@JsonCreator | ||
public QueryException( | ||
@JsonProperty("error") @Nullable String errorCode, | ||
@JsonProperty("errorMessage") String errorMessage, | ||
@JsonProperty("errorClass") @Nullable String errorClass, | ||
@JsonProperty("host") @Nullable String host | ||
) | ||
{ | ||
super(errorMessage); | ||
this.errorCode = errorCode; | ||
this.errorClass = errorClass; | ||
this.host = host; | ||
} | ||
|
||
@Nullable | ||
@JsonProperty("error") | ||
public String getErrorCode() | ||
{ | ||
return errorCode; | ||
} | ||
|
||
@JsonProperty("errorMessage") | ||
@Override | ||
public String getMessage() | ||
{ | ||
return super.getMessage(); | ||
} | ||
|
||
@JsonProperty | ||
public String getErrorClass() | ||
{ | ||
return errorClass; | ||
} | ||
|
||
@JsonProperty | ||
public String getHost() | ||
{ | ||
return host; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not bad as is, but if i were to try to boil it down just a little, it might be something like:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, applied with some minor adjustment