ReadFromMergeTree uses MinMax index before partition key to filter parts #48093

zhongwang97 · 2023-03-28T07:34:18Z

I have created a table whose partition key is based on a DateTime column:

CREATE TABLE hits_simple
(
    `UserID` UInt32,
    `URL` String,
    `EventTime` DateTime
)
ENGINE = MergeTree
PRIMARY KEY (UserID, URL)
ORDER BY (UserID, URL, EventTime)
PARTITION BY toYYYYMMDD(EventTime)
SETTINGS index_granularity = 8192, index_granularity_bytes = 0;

and then queried a single row as a test:

SELECT * FROM hits_simple WHERE UserID = 279588 AND EventTime = '2014-03-17 18:08:53'

The EXPLAIN output shows that ClickHouse first uses the MinMax index to filter parts, and then uses the partition key to find the corresponding parts.

I would like to know why the partition key is not used first, as that seems more efficient.
Thanks!


save-my-heart · 2023-03-28T09:04:47Z

Actually, the MinMax here is also the partition key.

zhongwang97 · 2023-03-28T09:34:08Z

> Actually, the MinMax here is also the partition key.

Yes, I understand that the MinMax index here stores, for each data part, the minimum and maximum values of the columns used in the partition key.

However, this approach requires checking all data parts in the first step. Wouldn't it be more natural and efficient to first locate the corresponding data parts based on the partition key, and then further filter those parts using MinMax?

Or is there any special concern here?

den-crane · 2023-04-29T01:58:53Z

> However, this approach requires checking all data parts in the first step. Wouldn't it be more natural and efficient to first locate the corresponding data part based on the partition key and then further filter using MinMax within those data parts?

It's the same. Locating the corresponding data parts based on the partition key also requires checking all data parts in the first step: partition pruning checks every part.
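To make this concrete, here is a toy Python model of the two pruning stages. It is not ClickHouse's code: `Part`, `prune`, the part names, and the time ranges are all hypothetical, and each part is reduced to its partition value (`toYYYYMMDD(EventTime)`) plus the min/max of `EventTime`. Whichever filter runs first has to visit every part; the order only changes how many survivors the second stage looks at, and the final set of parts is the same.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Part:
    """Hypothetical, simplified view of one MergeTree data part."""
    name: str
    partition_id: int   # toYYYYMMDD(EventTime), identical for every row in the part
    min_time: datetime  # MinMax index over EventTime for this part
    max_time: datetime


def to_yyyymmdd(ts: datetime) -> int:
    """Python stand-in for ClickHouse's toYYYYMMDD(), e.g. 2014-03-17 -> 20140317."""
    return ts.year * 10000 + ts.month * 100 + ts.day


def minmax_keeps(part: Part, target: datetime) -> bool:
    # The part may contain `target` only if it lies inside [min_time, max_time].
    return part.min_time <= target <= part.max_time


def partition_keeps(part: Part, target: datetime) -> bool:
    # The part may contain `target` only if its partition value matches.
    return part.partition_id == to_yyyymmdd(target)


def prune(parts, target, stages):
    """Apply the filter stages in order and count how many part checks were made."""
    checks = 0
    for keep in stages:
        survivors = []
        for part in parts:  # every stage is a linear scan over the current part list
            checks += 1
            if keep(part, target):
                survivors.append(part)
        parts = survivors
    return [p.name for p in parts], checks


def dt(s: str) -> datetime:
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")


# Made-up parts: one per day, except 2014-03-17, which has two parts.
parts = [
    Part("20140316_1_1_0", 20140316, dt("2014-03-16 00:00:00"), dt("2014-03-16 23:59:59")),
    Part("20140317_2_2_0", 20140317, dt("2014-03-17 00:00:00"), dt("2014-03-17 11:59:59")),
    Part("20140317_3_3_0", 20140317, dt("2014-03-17 12:00:00"), dt("2014-03-17 23:59:59")),
    Part("20140318_4_4_0", 20140318, dt("2014-03-18 00:00:00"), dt("2014-03-18 23:59:59")),
]
target = dt("2014-03-17 18:08:53")

minmax_first = prune(parts, target, [minmax_keeps, partition_keeps])
partition_first = prune(parts, target, [partition_keeps, minmax_keeps])
print(minmax_first)     # (['20140317_3_3_0'], 5)
print(partition_first)  # (['20140317_3_3_0'], 6)
```

In this made-up data the MinMax stage is the more selective one, so running it first saves one check, but neither order avoids the initial pass over all parts.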

arloor · 2023-08-15T12:51:32Z

It looks like ClickHouse always creates a MinMax index on DateTime columns if these columns are part of the partition key.
I guess the purpose is to filter out many data parts by time.
OLAP queries usually come with a time condition,
so the automatically created MinMax index on the DateTime column in the partition key will accelerate a lot of queries.

Is my guess above right?
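As far as I know, the MinMax index is not specific to DateTime: each part keeps a min/max pair for every column referenced by the partition key expression (stored in files such as `minmax_EventTime.idx`), which would also explain why `PARTITION BY (UserID)` gives a MinMax index on `UserID`. The Python sketch below uses hypothetical helpers (`build_minmax`, `may_match`) and made-up rows to show the benefit: two parts from the same daily partition cannot be told apart by the partition value alone, but their `EventTime` ranges let a range query skip one of them.

```python
from datetime import datetime


def build_minmax(rows, key_columns):
    """Hypothetical per-part MinMax index: (min, max) of each partition-key column."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in key_columns}


def may_match(minmax, column, lo, hi):
    """A part can be skipped only if [lo, hi] does not overlap its [min, max]."""
    part_min, part_max = minmax[column]
    return part_min <= hi and lo <= part_max


def ts(s):
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")


# Made-up rows of two parts that both belong to partition 20140317.
part_a = [
    {"UserID": 279588, "EventTime": ts("2014-03-17 08:15:00")},
    {"UserID": 100, "EventTime": ts("2014-03-17 11:40:00")},
]
part_b = [
    {"UserID": 279588, "EventTime": ts("2014-03-17 18:08:53")},
    {"UserID": 5000, "EventTime": ts("2014-03-17 22:30:00")},
]

# PARTITION BY toYYYYMMDD(EventTime) references EventTime, so EventTime gets min/max.
idx_a = build_minmax(part_a, ["EventTime"])
idx_b = build_minmax(part_b, ["EventTime"])

# Range query inside a single day: the partition value alone keeps both parts.
lo, hi = ts("2014-03-17 18:00:00"), ts("2014-03-17 19:00:00")
print(may_match(idx_a, "EventTime", lo, hi))  # False: part A can be skipped
print(may_match(idx_b, "EventTime", lo, hi))  # True: part B must be read
```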

tnhminh · 2023-11-16T07:55:43Z

> Actually, the MinMax here is also the partition key.

Yes, I think so.
I tried to make the MinMax index use a custom key such as UserID.
The solution is as follows:
PARTITION BY (UserID) -> done

zhongwang97 added the "question" label Mar 28, 2023

den-crane closed this as completed Apr 29, 2023
