Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Hypothetical indexes #191

Open
canhld94 opened this issue Mar 27, 2023 · 0 comments
Open

RFC: Hypothetical indexes #191

canhld94 opened this issue Mar 27, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@canhld94
Copy link
Collaborator

Motivation

Skip index is important for good performance in many OLAP scenarios. However, choosing the correct skip index is not easy, as the skipping effectiveness heavily depends on:

  • Data distribution
  • Workload

The traditional way of choosing an index is to add an index to the table, materialize it, and test it with queries. However, this approach has many disadvantages:

  • Materialize index for a table, or even a partition, is a costly task
  • When adding an index to a table in production, it may have bad effects on the currently running queries

This RFC proposes an extension to skip index called hypothetical index (ref: hypopg) which the following characteristics:

  • Hypothetical indexes don't have a physical file and cannot be materialized
  • Normal select query will not utilize hypothetical indexes
  • User can evaluate the skipping effectiveness with an EXPLAIN query

Describe the implementation

Declare a hypothetical index

Similar to normal skip indexes but use the keyword HYPOTHETICAL INDEX instead of INDEX.
Create a table with hypothetical indexes:

CREATE TABLE t (
    ts DateTime,
    id UInt64,
    s String,
    HYPOTHETICAL INDEX hypo_id_minmax id TYPE minmax GRANULARITY 2
) ENGINE = CnchMergeTree() ORDER BY ts PARTITION BY toYYYMM(ts);

Add a hypothetical index to an existing table:

CREATE TABLE t (
    ts DateTime,
    id UInt64,
    s String
) ENGINE = CnchMergeTree() ORDER BY ts PARTITION BY toYYYMM(ts);
ALTER TABLE t ADD HYPOTHETICAL INDEX hypo_id_minmax id TYPE minmax GRANULARITY 2;

Evaluating the skipping effectiveness using a query

Add a new explain query:

EXPLAIN HYPOTHETICAL INDEX {hypo1,hypo2...} SELECT ...

Return schema:

table: String -- target table
total_parts: UInt64 -- the number of marks on this table
total_parts_after_indexes: UInt64 -- the number of marks after partition pruning, primary index, and existing skip indexes analysis
total_parts_after_hypo_index: UInt64 -- the number of marks after partition pruning, primary index, existing skip indexes analysis, and hypothetical indexes
index_cost: UInt64 -- Approximate size of this index if create and materialize

Using hypothetical index in index analysis

Hypothetical index only be evaluated after partition pruning, primary index, and existing skip indexes analysis.

Extend MergeTreeIndexReader. For hypothetical indexes, the read method will not read the index files but read the columns necessary to compute the index expression and generate the index granule on-the-fly.

cc @hustnn @kevinthfang @Clark0 @chenxing-xc

@canhld94 canhld94 mentioned this issue Mar 27, 2023
4 tasks
@Adora627 Adora627 added the enhancement New feature or request label Apr 21, 2023
@kevinthfang kevinthfang added good first issue Good for newcomers and removed good first issue Good for newcomers labels Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants