query laning and load shedding #9407
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1476,9 +1476,31 @@ These Broker configurations can be defined in the `broker/runtime.properties` fi | |
|`druid.broker.select.tier`|`highestPriority`, `lowestPriority`, `custom`|If segments are cross-replicated across tiers in a cluster, you can tell the broker to prefer to select segments in a tier with a certain priority.|`highestPriority`| | ||
|`druid.broker.select.tier.custom.priorities`|`An array of integer priorities.`|Select servers in tiers with a custom priority list.|None| | ||
|
||
##### Query laning | ||
|
||
Druid provides facilities to aid in query capacity reservation for heterogeneous query workloads in the form of 'laning' strategies, which provide a variety of mechanisms to examine and classify a query at the broker, assigning it to a 'lane'. Lanes are defined with capacity limits which the broker will enforce, causing requests in excess of the capacity to be discarded with an HTTP 429 status code, reserving resources for other lanes or for interactive queries (with no lane). | ||
|
||
|Property|Description|Default| | ||
|--------|-----------|-------| | ||
|`druid.query.scheduler.numThreads`|Maximum number of HTTP threads to dedicate to query processing. To save HTTP thread capacity, this should be lower than `druid.server.http.numThreads`.|Unbounded| | ||
in what use case would I ever want to set it something other than

I actually think we might always want to set it lower than
See my above nervousness, but I think
I would agree the current documentation doesn't quite adequately describe how this stuff might be utilized, in a future PR i want to add a section to cluster tuning docs to more properly advise on when and how to set this stuff up.

thanks, I understand the reasoning now. Major behavior change with lane usage is really losing the queuing of requests to handle spikes and instead sending 429s immediately. In future, we could introduce mechanism to maintain statically/dynamically sized [per lane] waiting queue ourselves as well along with concurrency limits in lane strategy.

I wonder if it would make sense to instead move towards automatically computing
Yeah the current behavior is definitely a hard stop if you are over the line. I agree it would make sense to allow some sort of timed out queuing behavior, which is what jetty QoS filter can sort of provide, which is a large part of why I am still wondering if

It is fine to let user provide
There are few advantages in maintaining the queues ourselves and not letting jetty do it.
|
||
|`druid.query.scheduler.laning.type`|Query laning strategy to use to assign queries to a lane in order to control capacities for certain classes of queries.|`none`| | ||
Perhaps the setting name ("type") and the name in the docs ("strategy") should be made consistent. In terms of documentation flow, it may be helpful to add a section below (e.g. "Laning strategy") and reference it here. "No laning strategy", etc. would be children on this new section.

With regards to documentation flow, I had made the suggested change in a branch I intend to follow-up with: clintropolis/druid@query-laning-and-load-shedding...clintropolis:query-auto-prioritization I wasn't sure if it made sense to break down yet in this branch because only the laning strategy exists, where as in that branch the scheduler also now has a prioritization strategy. I can go ahead and pull that part back of the doc change back into this PR though.
||
|
||
###### No laning strategy | ||
|
||
In this mode, queries are never assigned a lane and are limited only by `druid.server.http.numThreads` or `druid.query.scheduler.numThreads`, if set. This is the default Druid query scheduler operating mode. | ||
|
||
Consider adding: This strategy can be enabled by setting
||
###### 'High/Low' laning strategy | ||
This laning strategy splits queries with a `priority` below zero into a `low` query lane, automatically. The limit on `low` queries can be set to some desired fraction of the total capacity (or HTTP thread pool size), reserving capacity for interactive queries. Queries in the `low` lane are _not_ guaranteed their capacity, which may be consumed by interactive queries, but may use up to this limit if total capacity is available. | ||
|
||
This strategy can be enabled by setting `druid.query.scheduler.laning.type` to `hilo`. | ||
|
||
|Property|Description|Default| | ||
|--------|-----------|-------| | ||
|`druid.query.scheduler.laning.maxLowThreads`|Maximum number of HTTP threads that can be used by queries with a priority lower than 0.|No default, must be set if using this mode| | ||
Perhaps rename to

I'm not sure I agree that it is particularly confusing, since the HTTP setting is
Would this setting be better as a percentage, so one property could be applicable to either usage? It doesn't seem like it would be hard to switch, would just need to adjust

reworked/renamed to
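To make the 'High/Low' behavior described above concrete, here is a minimal sketch of how a total capacity limit and a `low` lane limit could interact. It is an illustration only and is not the scheduler implementation from this PR: the class `HiLoLaningSketch`, its methods, and the use of `java.util.concurrent.Semaphore` are all assumptions made for the example. A `false` return stands in for the broker discarding the request with an HTTP 429.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of high/low laning; names here are not Druid classes. */
public class HiLoLaningSketch
{
  public static final String LOW_LANE = "low";

  // Permits for all queries (stands in for druid.query.scheduler.numThreads).
  private final Semaphore total;
  // Permits for the low lane only (stands in for the max-low-threads setting).
  private final Semaphore low;

  public HiLoLaningSketch(int numThreads, int maxLowThreads)
  {
    this.total = new Semaphore(numThreads);
    this.low = new Semaphore(maxLowThreads);
  }

  /** Queries with a priority below zero go to the low lane; all others get no lane. */
  public static Optional<String> computeLane(int priority)
  {
    return priority < 0 ? Optional.of(LOW_LANE) : Optional.empty();
  }

  /** Returns true if admitted; false means over capacity (the 429 case). */
  public boolean tryAcquire(int priority)
  {
    boolean isLow = computeLane(priority).isPresent();
    if (isLow && !low.tryAcquire()) {
      return false; // low lane is at its limit
    }
    if (!total.tryAcquire()) {
      if (isLow) {
        low.release(); // give back the lane permit taken above
      }
      return false; // total capacity exhausted
    }
    return true;
  }

  /** Must be called once for every successful tryAcquire with the same priority. */
  public void release(int priority)
  {
    total.release();
    if (computeLane(priority).isPresent()) {
      low.release();
    }
  }
}
```

In this sketch the `low` lane is a ceiling rather than a reservation: interactive queries only draw on the total, so they can fill the whole pool and leave nothing for `low` queries, which matches the "not guaranteed their capacity" wording in the doc text.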
||
|
||
##### Server Configuration | ||
|
||
Druid uses Jetty to serve HTTP requests. | ||
Druid uses Jetty to serve HTTP requests. Each query being processed consumes a single thread from `druid.server.http.numThreads`, so consider setting `druid.query.scheduler.numThreads` to a lower value to reserve HTTP threads for health checks, lookup loading, and other non-query requests, which in most cases are comparatively short-lived. | ||
|
||
|Property|Description|Default| | ||
|--------|-----------|-------| | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,6 +35,7 @@ | |
public class QueryContexts | ||
{ | ||
public static final String PRIORITY_KEY = "priority"; | ||
public static final String LANE_KEY = "lane"; | ||
Was this added to the docs?

No, I hadn't documented since I hadn't decided the behavior of whether or not a lane specified in the query context by the user should override the laning strategy, or if it should be laning strategy specific, see other comment about this
||
public static final String TIMEOUT_KEY = "timeout"; | ||
public static final String MAX_SCATTER_GATHER_BYTES_KEY = "maxScatterGatherBytes"; | ||
public static final String MAX_QUEUED_BYTES_KEY = "maxQueuedBytes"; | ||
|
@@ -200,6 +201,11 @@ public static <T> int getPriority(Query<T> query, int defaultValue) | |
return parseInt(query, PRIORITY_KEY, defaultValue); | ||
} | ||
|
||
public static <T> String getLane(Query<T> query) | ||
{ | ||
return (String) query.getContextValue(LANE_KEY); | ||
} | ||
|
||
public static <T> boolean getEnableParallelMerges(Query<T> query) | ||
{ | ||
return parseBoolean(query, BROKER_PARALLEL_MERGE_KEY, DEFAULT_ENABLE_PARALLEL_MERGE); | ||
|
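As a rough illustration of what the new `getLane` accessor in the diff above does, the following standalone sketch reads `lane` and `priority` from a plain `Map` standing in for the query context. The class `LaneContextSketch` and the `Map`-based context are assumptions for the example; the real methods take a `Query<T>` and use helpers in `QueryContexts` that are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the context accessors; not the Druid class. */
public class LaneContextSketch
{
  public static final String PRIORITY_KEY = "priority";
  public static final String LANE_KEY = "lane";

  /** Returns the lane set in the context, or null when none was set. */
  public static String getLane(Map<String, Object> context)
  {
    return (String) context.get(LANE_KEY);
  }

  /** Returns the priority from the context, accepting numbers or numeric strings. */
  public static int getPriority(Map<String, Object> context, int defaultValue)
  {
    Object value = context.get(PRIORITY_KEY);
    if (value == null) {
      return defaultValue;
    }
    if (value instanceof Number) {
      return ((Number) value).intValue();
    }
    return Integer.parseInt(value.toString());
  }

  /** Builds an example context with a lane and a negative priority. */
  public static Map<String, Object> exampleContext()
  {
    Map<String, Object> context = new HashMap<>();
    context.put(LANE_KEY, "low");
    context.put(PRIORITY_KEY, "-5");
    return context;
  }
}
```

Note that the plain cast to `String` means a non-string `lane` value in the context would fail with a `ClassCastException`, both in this sketch and in the accessor as written in the diff.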