RFC: Hypothetical indexes #191

canhld94 · 2023-03-27T07:42:47Z

Motivation

Skip index is important for good performance in many OLAP scenarios. However, choosing the correct skip index is not easy, as the skipping effectiveness heavily depends on:

Data distribution
Workload

The traditional way of choosing an index is to add an index to the table, materialize it, and test it with queries. However, this approach has many disadvantages:

Materialize index for a table, or even a partition, is a costly task
When adding an index to a table in production, it may have bad effects on the currently running queries

This RFC proposes an extension to skip index called hypothetical index (ref: hypopg) which the following characteristics:

Hypothetical indexes don't have a physical file and cannot be materialized
Normal select query will not utilize hypothetical indexes
User can evaluate the skipping effectiveness with an EXPLAIN query

Describe the implementation

Declare a hypothetical index

Similar to normal skip indexes but use the keyword HYPOTHETICAL INDEX instead of INDEX.
Create a table with hypothetical indexes:

CREATE TABLE t (
    ts DateTime,
    id UInt64,
    s String,
    HYPOTHETICAL INDEX hypo_id_minmax id TYPE minmax GRANULARITY 2
) ENGINE = CnchMergeTree() ORDER BY ts PARTITION BY toYYYMM(ts);

Add a hypothetical index to an existing table:

CREATE TABLE t (
    ts DateTime,
    id UInt64,
    s String
) ENGINE = CnchMergeTree() ORDER BY ts PARTITION BY toYYYMM(ts);
ALTER TABLE t ADD HYPOTHETICAL INDEX hypo_id_minmax id TYPE minmax GRANULARITY 2;

Evaluating the skipping effectiveness using a query

Add a new explain query:

EXPLAIN HYPOTHETICAL INDEX {hypo1,hypo2...} SELECT ...

Return schema:

table: String -- target table
total_parts: UInt64 -- the number of marks on this table
total_parts_after_indexes: UInt64 -- the number of marks after partition pruning, primary index, and existing skip indexes analysis
total_parts_after_hypo_index: UInt64 -- the number of marks after partition pruning, primary index, existing skip indexes analysis, and hypothetical indexes
index_cost: UInt64 -- Approximate size of this index if create and materialize

Using hypothetical index in index analysis

Hypothetical index only be evaluated after partition pruning, primary index, and existing skip indexes analysis.

Extend MergeTreeIndexReader. For hypothetical indexes, the read method will not read the index files but read the columns necessary to compute the index expression and generate the index granule on-the-fly.

cc @hustnn @kevinthfang @Clark0 @chenxing-xc

The text was updated successfully, but these errors were encountered:

canhld94 mentioned this issue Mar 27, 2023

Hypothetical indexes #192

Closed

4 tasks

Adora627 added the enhancement New feature or request label Apr 21, 2023

kevinthfang added good first issue Good for newcomers and removed good first issue Good for newcomers labels Dec 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Hypothetical indexes #191

RFC: Hypothetical indexes #191

canhld94 commented Mar 27, 2023

RFC: Hypothetical indexes #191

RFC: Hypothetical indexes #191

Comments

canhld94 commented Mar 27, 2023

Motivation

Describe the implementation

Declare a hypothetical index

Evaluating the skipping effectiveness using a query

Using hypothetical index in index analysis