Skip to content

Prewhere optimization for indexHint, remove indexHint predicates when #74958

@shribigb

Description

@shribigb

Describe the situation

When indexHint function is used with a complementary predicate, ClickHouse doesn't push the complementary predicates to Prewhere. The query needs to be restructured so that CH can use the prewhere optimization. Ideally, CH should remove the indexHint functions from the query if they are not needed to be executed and only used for filtering.

How to reproduce

CREATE TABLE test_indexHint_prewhere
(
    `id` UInt32,
    `colA` String,
    `colB` String,
    INDEX colA_tokens_idx tokens(colA) TYPE bloom_filter GRANULARITY 1,
    INDEX colB_tokens_idx tokens(colB) TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 1;

insert into test_indexHint_prewhere SELECT number,  randomPrintableASCII(30), randomPrintableASCII(40)  FROM system.numbers LIMIT 100;

For the following query:

Query 1:

select * from test_indexHint_prewhere where id in (62,88,89,67) and ((indexHint(has(tokens(colA), 'ymo82')) and colA like '%ymo82%') OR (indexHint(has(tokens(colB), 'dKappNQY6I')) and  colB like '%dKappNQY6I%'))

The corresponding test log shows the following:

2025.01.22 22:20:19.314012 [ 15942738 ] {ce06c3ac-f0ed-4fc3-9785-add535ecd49e} <Trace> MergeTreeSelectProcessor: PREWHERE condition was split into 1 steps
2025.01.22 22:20:19.314055 [ 15942738 ] {ce06c3ac-f0ed-4fc3-9785-add535ecd49e} <Test> MergeTreeSelectProcessor: PREWHERE conditions: "in(__table1.id, __set_UInt32_10358064706842451232_4242968589107113876)", Original PREWHERE DAG:

But if the query is restructured:

Query 2:

select * from test_indexHint_prewhere where id in (62,88,89,67) and (indexHint(has(tokens(colA), 'ymo82')) OR indexHint(has(tokens(colB), 'dKappNQY6I'))) and (colA like '%ymo82%' or colB like '%dKappNQY6I%')

The corresponding test log shows the following:

2025.01.22 22:21:56.223222 [ 15942738 ] {51b12568-580c-4765-a499-3217047d9428} <Trace> MergeTreeSelectProcessor: PREWHERE condition was split into 2 steps
2025.01.22 22:21:56.223298 [ 15942738 ] {51b12568-580c-4765-a499-3217047d9428} <Test> MergeTreeSelectProcessor: PREWHERE conditions: "or(like(__table1.colA, '%ymo82%'_String), like(__table1.colB, '%dKappNQY6I%'_String))", "and(or(like(__table1.colA, '%ymo82%'_String), like(__table1.colB, '%dKappNQY6I%'_String)), in(__table1.id, __set_UInt32_10358064706842451232_4242968589107113876))", Original PREWHERE DAG:

Which pushes the predicates to prewhere. Ideally, combining the indexHint with its complementary predicate is natural, like query 1. Is it possible to exclude the indexHint predicates before QueryPlanOptimizePrewhere?

Expected performance

No response

Additional context

No response

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions