Motivation
Skip indexes are important for good performance in many OLAP scenarios. However, choosing the right skip index is not easy, because its skipping effectiveness depends heavily on:
Data distribution
Workload
The traditional way to choose an index is to add it to the table, materialize it, and test it with queries. However, this approach has the following disadvantages:
Materializing an index for a table, or even for a single partition, is costly
Adding an index to a table in production may degrade the performance of currently running queries
This RFC proposes an extension to skip indexes called hypothetical indexes (ref: hypopg), with the following characteristics:
Hypothetical indexes have no physical files and cannot be materialized
Normal SELECT queries do not use hypothetical indexes
Users can evaluate the skipping effectiveness of hypothetical indexes with an EXPLAIN query
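The three characteristics above amount to a flag on the index declaration plus a rule for which indexes index analysis may consult. The following is a minimal Python sketch under that assumption; `SkipIndexDesc`, `indexes_for_query`, and the index `idx_s_bloom` are hypothetical names for illustration, not ByConity classes or APIs.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class SkipIndexDesc:
    """Illustrative skip index declaration (hypothetical, not a ByConity class)."""
    name: str
    expr: str
    index_type: str
    granularity: int
    hypothetical: bool = False  # True when declared with HYPOTHETICAL INDEX

    def can_materialize(self) -> bool:
        # Hypothetical indexes have no physical files, so nothing can be materialized.
        return not self.hypothetical


def indexes_for_query(indexes: Sequence[SkipIndexDesc],
                      hypothetical_names: Optional[Sequence[str]] = None
                      ) -> List[SkipIndexDesc]:
    """Return the skip indexes that index analysis would consult.

    A normal SELECT passes hypothetical_names=None and gets only real indexes;
    EXPLAIN HYPOTHETICAL INDEX {...} passes the listed names to enable them too.
    """
    enabled = set(hypothetical_names or ())
    return [idx for idx in indexes if not idx.hypothetical or idx.name in enabled]


# idx_s_bloom is a made-up real index; hypo_id_minmax mirrors the DDL in this RFC.
table_indexes = [
    SkipIndexDesc("idx_s_bloom", "s", "bloom_filter", 4),
    SkipIndexDesc("hypo_id_minmax", "id", "minmax", 2, hypothetical=True),
]

print([i.name for i in indexes_for_query(table_indexes)])
# ['idx_s_bloom']
print([i.name for i in indexes_for_query(table_indexes, ["hypo_id_minmax"])])
# ['idx_s_bloom', 'hypo_id_minmax']
```

Keeping hypothetical indexes in the same declaration list, distinguished only by a flag, means the normal query path needs just one extra filter to ignore them.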
Describe the implementation
Declare a hypothetical index
Hypothetical indexes are declared like normal skip indexes, but with the keyword HYPOTHETICAL INDEX instead of INDEX.
Create a table with hypothetical indexes:
```sql
CREATE TABLE t (
    ts DateTime,
    id UInt64,
    s String,
    HYPOTHETICAL INDEX hypo_id_minmax id TYPE minmax GRANULARITY 2
) ENGINE = CnchMergeTree() ORDER BY ts PARTITION BY toYYYYMM(ts);
```
Add a hypothetical index to an existing table:
```sql
CREATE TABLE t (
    ts DateTime,
    id UInt64,
    s String
) ENGINE = CnchMergeTree() ORDER BY ts PARTITION BY toYYYYMM(ts);

ALTER TABLE t ADD HYPOTHETICAL INDEX hypo_id_minmax id TYPE minmax GRANULARITY 2;
```
Evaluating the skipping effectiveness using a query
Add a new EXPLAIN variant:
```
EXPLAIN HYPOTHETICAL INDEX {hypo1,hypo2...} SELECT ...
```
Result schema:
table: String -- the target table
total_parts: UInt64 -- the total number of marks in the table
total_parts_after_indexes: UInt64 -- the number of marks remaining after partition pruning, primary index analysis, and existing skip index analysis
total_parts_after_hypo_index: UInt64 -- the number of marks remaining after partition pruning, primary index analysis, existing skip index analysis, and hypothetical index analysis
index_cost: UInt64 -- the approximate size of the index if it were created and materialized
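A result row like this is most useful when reduced to a couple of ratios. The Python sketch below shows one possible way to read it; `hypothetical_gain`, `remaining_fraction`, and especially `estimate_minmax_size` are hypothetical helpers, and the size formula (one uncompressed min/max pair per index granule) is an assumption, since the RFC does not specify how `index_cost` is computed. The row values are made up for illustration, not measurements.

```python
from typing import Mapping


def hypothetical_gain(row: Mapping[str, int]) -> float:
    """Share of marks that survive existing indexes but would be skipped by
    the hypothetical indexes (0.0 means no additional benefit)."""
    before = row["total_parts_after_indexes"]
    after = row["total_parts_after_hypo_index"]
    return (before - after) / before if before else 0.0


def remaining_fraction(row: Mapping[str, int]) -> float:
    """Fraction of all marks still read once hypothetical indexes are applied."""
    total = row["total_parts"]
    return row["total_parts_after_hypo_index"] / total if total else 0.0


def estimate_minmax_size(total_marks: int, granularity: int, value_bytes: int) -> int:
    """Assumed rough uncompressed size of a minmax index: one (min, max) pair
    per index granule, where each index granule spans `granularity` marks."""
    index_granules = -(-total_marks // granularity)  # ceiling division
    return index_granules * 2 * value_bytes


# Made-up numbers for illustration only.
row = {
    "table": "t",
    "total_parts": 1000,
    "total_parts_after_indexes": 400,
    "total_parts_after_hypo_index": 100,
    "index_cost": estimate_minmax_size(1000, 2, 8),  # UInt64 values are 8 bytes
}
print(hypothetical_gain(row))   # 0.75
print(remaining_fraction(row))  # 0.1
print(row["index_cost"])        # 8000
```

Comparing `hypothetical_gain` across candidate indexes, weighed against `index_cost`, is one way to rank candidates before materializing any of them.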
Using hypothetical indexes in index analysis
Hypothetical indexes are evaluated only after partition pruning, primary index analysis, and existing skip index analysis.
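This ordering is also what produces the counters in the EXPLAIN result. The following Python sketch models each analysis step as a function over the set of still-selected marks; the stage functions and mark numbers are toy, hypothetical stand-ins, not the real analyzers.

```python
from typing import Callable, Dict, Iterable, List, Set

# A stage receives the marks still selected and returns those it cannot rule out.
Stage = Callable[[Set[int]], Set[int]]


def run_index_analysis(all_marks: Iterable[int],
                       regular_stages: List[Stage],
                       hypothetical_stages: List[Stage]) -> Dict[str, int]:
    marks = set(all_marks)
    counts = {"total_parts": len(marks)}
    # Partition pruning, primary index, and existing skip indexes run first.
    for stage in regular_stages:
        marks = stage(marks)
    counts["total_parts_after_indexes"] = len(marks)
    # Hypothetical indexes only see the marks that survived the regular stages.
    for stage in hypothetical_stages:
        marks = stage(marks)
    counts["total_parts_after_hypo_index"] = len(marks)
    return counts


# Toy stages over 10 marks (illustrative only).
def partition_pruning(ms: Set[int]) -> Set[int]:
    return {m for m in ms if m < 8}


def primary_index(ms: Set[int]) -> Set[int]:
    return {m for m in ms if m % 2 == 0}


def hypo_id_minmax(ms: Set[int]) -> Set[int]:
    return {m for m in ms if m >= 4}


counts = run_index_analysis(range(10), [partition_pruning, primary_index], [hypo_id_minmax])
print(counts)
# {'total_parts': 10, 'total_parts_after_indexes': 4, 'total_parts_after_hypo_index': 2}
```

Running the hypothetical stage last means it only has to compute granules for marks that earlier stages could not skip, which keeps the on-the-fly work as small as possible.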
Extend MergeTreeIndexReader: for hypothetical indexes, the read method does not read index files. Instead, it reads the columns needed to compute the index expression and builds the index granule on the fly.
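For the minmax index from the DDL above, building a granule on the fly means taking the min and max of the `id` values covered by each index granule, then skipping marks whose granule range cannot match the predicate. The Python sketch below assumes a fixed number of rows per mark and reads the whole column for simplicity; `build_minmax_granules` and `marks_to_read` are hypothetical helpers, not MergeTreeIndexReader methods.

```python
from typing import List, Sequence, Set, Tuple


def build_minmax_granules(values: Sequence[int], rows_per_mark: int,
                          granularity: int) -> List[Tuple[int, int]]:
    """Compute (min, max) per index granule straight from the column values.

    Assumes a fixed number of rows per mark; one index granule spans
    `granularity` marks (the last granule may be shorter).
    """
    step = rows_per_mark * granularity
    return [(min(values[i:i + step]), max(values[i:i + step]))
            for i in range(0, len(values), step)]


def marks_to_read(granules: Sequence[Tuple[int, int]], granularity: int,
                  total_marks: int, lo: int, hi: int) -> Set[int]:
    """Marks that cannot be skipped for the predicate lo <= id <= hi."""
    keep: Set[int] = set()
    for g, (gmin, gmax) in enumerate(granules):
        if gmax < lo or gmin > hi:
            continue  # every value in this granule lies outside [lo, hi]
        first = g * granularity
        keep.update(range(first, min(first + granularity, total_marks)))
    return keep


ids = [5, 1, 3, 2, 10, 12, 11, 15, 20, 21]  # 10 rows, 2 rows per mark -> 5 marks
granules = build_minmax_granules(ids, rows_per_mark=2, granularity=2)
print(granules)                                     # [(1, 5), (10, 15), (20, 21)]
print(marks_to_read(granules, 2, 5, lo=11, hi=13))  # {2, 3}
```

A real reader would only read the rows of marks that survived the earlier stages, and would have to handle adaptive granularity, where rows per mark vary.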
cc @hustnn @kevinthfang @Clark0 @chenxing-xc