Commit e7b21cf

Blog post: Query priority (#7021)
Signed-off-by: Justin Jung <jungjust@amazon.com>
1 parent 294740a commit e7b21cf

3 files changed

+94
-0
lines changed

@@ -0,0 +1,94 @@
---
date: 2025-09-08
title: "Query Priority in Cortex"
linkTitle: Query Priority in Cortex
tags: [ "blog", "cortex", "query", "optimization" ]
categories: [ "blog" ]
projects: [ "cortex" ]
description: >
  This article explores how the query priority can be used to improve availability and performance of critical queries.
author: Justin Jung ([@justinjung04](https://github.com/justinjung04))
---

## Introduction

In high-scale monitoring environments, not all queries are created equal. Some queries power critical dashboards that need sub-second response times, while others are exploratory analytics that can tolerate delays. However, queries from a tenant are handled in FIFO (first-in, first-out) order, which can lead to a noisy-neighbor problem within a single tenant: an expensive exploratory query can delay the critical queries queued behind it.

![Query FIFO](/images/blog/2025/query-fifo.png)

Cortex's query priority feature addresses this challenge by allowing operators to reserve querier resources for high-priority queries.

## What is Query Priority?

Query priority in Cortex enables you to classify queries based on various attributes and allocate dedicated querier resources to different priority levels. The system works by:

1. Matching queries against configurable attributes (regex patterns, time ranges, API types, user agents, dashboard UIDs)
2. Assigning priority levels to matched queries
3. Reserving querier capacity as a percentage for each priority level

![Query Priority](/images/blog/2025/query-priority.png)

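The first two steps can be illustrated with a minimal Python sketch. This is not Cortex's implementation: the `Query`, `Attribute`, and `PriorityRule` types and the `assign_priority` function are hypothetical names, and the sketch assumes that all fields set on one attribute must match, that any matching attribute is enough for a priority level, and that the first matching level in list order wins.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Query:
    """Hypothetical, simplified view of an incoming query."""
    expr: str                # PromQL expression
    api_type: str            # e.g. "query" or "query_range"
    range_seconds: int = 0   # end - start; 0 for instant queries
    user_agent: str = ""


@dataclass
class Attribute:
    """One matcher. Unset fields are ignored; all set fields must match (assumed)."""
    regex: Optional[str] = None
    api_type: Optional[str] = None
    max_range_seconds: Optional[int] = None
    user_agent_regex: Optional[str] = None

    def matches(self, q: Query) -> bool:
        if self.regex is not None and not re.search(self.regex, q.expr):
            return False
        if self.api_type is not None and self.api_type != q.api_type:
            return False
        if self.max_range_seconds is not None and q.range_seconds > self.max_range_seconds:
            return False
        if self.user_agent_regex is not None and not re.search(self.user_agent_regex, q.user_agent):
            return False
        return True


@dataclass
class PriorityRule:
    priority: int
    attributes: List[Attribute] = field(default_factory=list)


def assign_priority(q: Query, rules: List[PriorityRule], default: int = 0) -> int:
    """Return the priority of the first rule with any matching attribute."""
    for rule in rules:
        if any(attr.matches(q) for attr in rule.attributes):
            return rule.priority
    return default


# Illustrative rules: alert-like expressions first, short Grafana range queries second.
RULES = [
    PriorityRule(100, [Attribute(regex=".*alert.*")]),
    PriorityRule(50, [Attribute(api_type="query_range", max_range_seconds=3600,
                                user_agent_regex="Grafana.*")]),
]

alert = Query("sum(rate(alert_evaluations_total[5m]))", "query")
dashboard = Query("up", "query_range", range_seconds=1800, user_agent="Grafana/10.0")
export = Query("up", "query_range", range_seconds=30 * 24 * 3600, user_agent="curl/8.0")

print(assign_priority(alert, RULES), assign_priority(dashboard, RULES), assign_priority(export, RULES))
# prints: 100 50 0
```

Queries that match no rule fall back to the default priority, so only the workloads you explicitly describe are elevated.
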
### Configuration Example

```
query_priority:
  enabled: true
  default_priority: 0
  priorities:
    - priority: 100
      reserved_queriers: 0.1 # Reserve 10% of queriers
      query_attributes:
        - regex: ".*alert.*" # Alert queries
    - priority: 50
      reserved_queriers: 0.05 # Reserve 5% of queriers
      query_attributes:
        - api_type: "query_range"
          time_range_limit:
            max: "1h" # Dashboard queries (short range)
          user_agent_regex: "Grafana.*"
```

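To get a feel for what these fractions mean for a concrete fleet, here is a small Python sketch that turns `reserved_queriers` values into querier counts. The helper names `reserved_querier_count` and `capacity_plan` are hypothetical, and the rounding rule is an assumption: values strictly between 0 and 1 are treated as a fraction of the pool and rounded up, while values of 1 or more are treated as absolute counts capped at the pool size. Check the Cortex configuration reference for the exact behavior in your version.

```python
import math
from typing import Dict


def reserved_querier_count(reserved: float, total_queriers: int) -> int:
    """Convert a reserved_queriers setting into a number of queriers.

    Assumption: 0 < value < 1 is a fraction of the pool, rounded up;
    anything else is an absolute count capped at the pool size.
    """
    if 0 < reserved < 1:
        return math.ceil(reserved * total_queriers)
    return min(int(reserved), total_queriers)


def capacity_plan(reservations: Dict[int, float], total_queriers: int) -> Dict[str, int]:
    """Map each priority level to its reserved querier count, plus the shared remainder."""
    plan = {
        f"priority_{level}": reserved_querier_count(value, total_queriers)
        for level, value in reservations.items()
    }
    plan["shared"] = max(total_queriers - sum(plan.values()), 0)
    return plan


# The example configuration applied to a hypothetical pool of 20 queriers.
print(capacity_plan({100: 0.1, 50: 0.05}, total_queriers=20))
# prints: {'priority_100': 2, 'priority_50': 1, 'shared': 17}
```

With small pools the rounding matters: under the round-up assumption, even a 5% reservation takes at least one whole querier out of the shared pool.
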
## Benefits

### 1. Preventing Resource Starvation

The most compelling use case is protecting critical queries from resource-hungry analytical workloads. Without priority, a few expensive queries scanning months of data can starve dashboard queries and cause user-facing alerts to time out.

### 2. SLA Differentiation

Organizations can offer different service levels:

* Tier 1: Real-time dashboards and alerts (high priority)
* Tier 2: Business intelligence queries (medium priority)
* Tier 3: Ad-hoc exploration and data exports (low priority)

## Drawbacks

### 1. Resource Underutilization

Reserved queriers sit idle when high-priority queries aren't running. If you reserve 30% of capacity for dashboard queries that only use 10% during off-peak hours, the remaining 20% of your infrastructure is wasted during those hours.

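The arithmetic is easy to check for your own fleet. The sketch below uses a hypothetical helper, `idle_reserved_queriers`, and a made-up pool of 50 queriers; it assumes, as described above, that reserved queriers are not used by lower-priority queries.

```python
def idle_reserved_queriers(total_queriers: int, reserved_pct: int, used_pct: int) -> int:
    """Queriers that are reserved but not serving high-priority queries.

    Percentages are whole numbers (30 means 30%) to keep the arithmetic exact.
    """
    reserved = total_queriers * reserved_pct // 100
    busy = min(total_queriers * used_pct // 100, reserved)
    return reserved - busy


# 30% reserved, 10% actually used during off-peak hours, hypothetical pool of 50 queriers.
idle = idle_reserved_queriers(total_queriers=50, reserved_pct=30, used_pct=10)
print(idle, "idle queriers,", idle * 100 // 50, "% of the pool")
# prints: 10 idle queriers, 20 % of the pool
```
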
### 2. Configuration Complexity

Query attributes require careful tuning:

* Regex patterns can be brittle and hard to maintain
* Time window matching needs constant adjustment as usage patterns evolve
* Dashboard UID matching breaks when dashboards are recreated

## Best Practices

When to use query priority:

* High query volume with mixed workload types
* Clear SLA requirements that justify the complexity
* Stable query patterns that won't require frequent reconfiguration

When to avoid it:

* Homogeneous workloads where all queries have similar requirements
* Unstable environments where query patterns change frequently

## Conclusion

If your users have different SLA requirements that map to consistent query patterns, query priority is a powerful feature that helps you meet the expected availability and performance for those queries. There is also plenty of room to improve this logic in the future, for example by having it adapt to users' query patterns and dynamically assign priorities to balance SLAs with fairness across different query patterns.