-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Description
Company or project name
ClickHouse
Use case
ClickHouse commonly executes the same queries repeatedly. Currently, query analysis and planning are performed from scratch each time a query is received, even for identical queries. This overhead also affects VIEWs:
- Ordinary VIEWs: The underlying query is re-analyzed on every SELECT from the view.
- Materialized VIEWs: The underlying query is re-analyzed on every INSERT into the view.
In certain scenarios, query analysis and planning overhead becomes significant relative to total query duration, particularly for:
- Queries with sub-second execution times.
- Queries that extensively use aliases (see #86019, #72721). Aliases can significantly increase query size and, therefore, increase analysis cost.
While alias analysis performance can be improved through caching identifier resolution results (see #88043), this approach has inherent limitations. Identifier resolution caching cannot always be applied and must sometimes be invalidated—for example, when the group_by_use_nulls setting is enabled. These constraints motivate a more comprehensive query plan caching solution.
Describe the solution you'd like
Similar to Query Cache (#34011), ClickHouse could cache SELECT query plans using query AST hash as a key.
Possible implementation
Caching a universal query plan
Once the SELECT query is analyzed and planned, the resulting query plan can be cloned using IQueryPlan::clone() and stored in the query plan cache.
However, the same query may read different data on each execution. Additionally, reading steps may hold resources (e.g., preventing removal of already-merged data parts). To address these issues:
- All reading query plan steps should be replaced with a generic
ReadFromTableStepin the cache entry. - All filters from
PrewhereInfoin the reading step should be copied into theReadFromTableStep.
This ensures the cached plan remains universal and resource-independent.
Executing cached SELECT queries
When a matching cache entry exists for a query:
- Extract and copy the cached query plan.
- Replace all
ReadFromTableStepinstances with correct physical reading steps using theQueryPlan::resolveStoragesmethod. - Check if row-policy should be used.
- Apply query plan level optimizations to the extracted plan.
Query plan optimizations must be reapplied because the optimal plan may differ between executions. Possible reasons:
- Changed statistics: New data in tables can affect statistics, altering the optimal join order.
- Schema changes: New vector indexes or projections may have been added to target tables.
- Configuration changes: Query plan optimizations may have been enabled or disabled
Updating query cache entries
It may be beneficial to update cached query plans to reduce the number of optimizations applied on each query execution. This could be triggered when the number of modified query plan steps during optimization exceeds a configurable threshold.
Views
Materialized Views
SELECT queries from MATERIALIZED VIEWs can be "compiled" into query plans during CREATE or ATTACH operations and subsequently reused. The QueryPlan::resolveStorages method would need modification to properly handle view logic (replacing the storage).
Ordinary Views
Ordinary views present a more complex challenge because ClickHouse must be able to optimize the cached query plan when only a subset of view columns is read. Two potential approaches:
- (Optimal) Enhance the unused column removal optimization introduced in #76487 to fully cover distributed cases (as described in #81718).
- (Simple) Use the tuple (
query_hash, columns_to_read)as the cache key.
Describe alternatives you've considered
No response
Additional context
Query plans cannot be cached in certain cases:
- Non-correlated scalar subqueries: These subqueries are evaluated during query analysis, and their result values may differ between executions. A dedicated setting could optionally allow caching by assuming the subquery result is static and unchanging across runs.