Partition pruning not working as expected #7948

nvartolomei · 2019-11-27T16:59:14Z

How to reproduce

Which ClickHouse server version to use: v19.17.2.4-testing

DROP TABLE IF EXISTS test_partition_filtering;

CREATE TABLE test_partition_filtering (
    `timestamp` DateTime,
    zoneId UInt64
) ENGINE = MergeTree() 
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (zoneId, timestamp);

INSERT INTO test_partition_filtering
SELECT
  toUInt64(now())-1000*number/1000 as timestamp,
  number/1000 as zone
FROM numbers(1000000);

OPTIMIZE TABLE test_partition_filtering final;

SELECT count() FROM test_partition_filtering WHERE toDate(toStartOfDay(timestamp)) = today() and zoneId=42;
SELECT count() FROM test_partition_filtering WHERE toDate(timestamp) = today() AND zoneId = 42;

Expected behavior
Both SELECTs should prune partitions and read just a single part.

Actual behavior
First query reads one part. Second query reads 12 parts.
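A rough check of why 12 parts amounts to "no pruning at all": assuming the INSERT expression evaluates to now() minus `number` seconds, the million rows span about 11.6 days, so after OPTIMIZE ... FINAL there is roughly one part per calendar day. The sketch below counts the days touched; the `now` value is a hypothetical stand-in for the time the INSERT ran, and the server time zone is assumed to be UTC.

```python
from datetime import datetime, timedelta, timezone

def distinct_days(now: datetime, rows: int) -> int:
    """Count calendar days touched by timestamps now, now-1s, ..., now-(rows-1)s."""
    oldest = now - timedelta(seconds=rows - 1)
    return (now.date() - oldest.date()).days + 1

# Hypothetical "now"; the real value depends on when the INSERT was executed.
now = datetime(2019, 11, 27, 16, 59, 14, tzinfo=timezone.utc)
print(distinct_days(now, 1_000_000))  # 12
```

Depending on the time of day, the same data can touch 12 or 13 calendar days, so the exact part count may differ between runs.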

Logs

executeQuery: (from 127.0.0.1:48812) SELECT count() FROM test_partition_filtering WHERE (toDate(toStartOfDay(timestamp)) = today()) AND (zoneId = 42)
InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "toDate(toStartOfDay(timestamp)) = today()" moved to PREWHERE
default.test_partition_filtering (SelectExecutor): Key condition: (column 0 in [42, 42]), (toDate(toStartOfDay(column 1)) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): MinMax index condition: unknown, (toDate(toStartOfDay(column 0)) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1 marks to read from 1 ranges



executeQuery: (from 127.0.0.1:48812) SELECT count() FROM test_partition_filtering WHERE (toDate(timestamp) = today()) AND (zoneId = 42)
InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "toDate(timestamp) = today()" moved to PREWHERE
default.test_partition_filtering (SelectExecutor): Key condition: (column 0 in [42, 42]), (toDate(column 1) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): MinMax index condition: unknown, (toDate(column 0) in [18227, 18227]), and
default.test_partition_filtering (SelectExecutor): Selected 12 parts by date, 1 parts by key, 1 marks to read from 1 ranges


victor-perov · 2019-11-27T17:02:57Z

Is it reproducible on a stable version as well?

nvartolomei · 2019-11-27T17:36:02Z

@victor-perov 19.17.2.4-testing is a stable version even though system.build_options reports it as testing.

den-crane · 2019-11-27T18:20:21Z

Each part has a minmax_timestamp.idx file.
This file stores the min and max values of the timestamp column for that part.

Pruning works with conditions such as WHERE timestamp >, =, <.
Pruning does not work with function(timestamp) >, =, < because the function could be a one-way function.

IMHO, in these queries partition pruning does not work at all; only the primary key is used.
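For readers following along, here is a toy model of min/max pruning under a function of the column. It is only a sketch of the general idea, not ClickHouse's implementation: if the function is monotonic (non-decreasing), applying it to a part's min and max gives bounds for every row in the part, so a part can be skipped when the target falls outside those bounds. The names `may_contain` and `parts` are made up for illustration.

```python
from datetime import date, datetime

def may_contain(part_min: datetime, part_max: datetime, target: date) -> bool:
    """Can a part with timestamps in [part_min, part_max] hold rows whose date equals target?

    Taking the date of a timestamp is non-decreasing, so the dates of all rows
    in the part lie in [part_min.date(), part_max.date()].
    """
    return part_min.date() <= target <= part_max.date()

# Hypothetical per-part (min, max) values, keyed by partition id.
parts = {
    "20190101": (datetime(2019, 1, 1, 10), datetime(2019, 1, 1, 10)),
    "20190102": (datetime(2019, 1, 2, 10), datetime(2019, 1, 2, 10)),
}
selected = [name for name, (lo, hi) in parts.items()
            if may_contain(lo, hi, date(2019, 1, 2))]
print(selected)  # ['20190102']
```

For a non-monotonic function the same shortcut would be unsafe, because values between min and max could map outside the range computed from the endpoints.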

den-crane · 2019-11-27T19:34:09Z

Though maybe I am wrong.

CREATE TABLE test_partition_filtering (
    timestamp DateTime) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY tuple();

INSERT INTO test_partition_filtering SELECT '2019-01-01 10:00:00' FROM numbers(1000000);
INSERT INTO test_partition_filtering SELECT '2019-01-02 10:00:00' FROM numbers(1000000);
INSERT INTO test_partition_filtering SELECT toDateTime(today())+3600 FROM numbers(1000000);

select count() from test_partition_filtering 
where toDate(toStartOfDay(timestamp)) = today();
Processed 1.00 million rows

select count() from test_partition_filtering 
where toDate((timestamp)) = today();
Processed 1.00 million rows

victor-perov · 2019-12-09T13:53:13Z

It would be nice to get an update on the issue.

hagen1778 · 2020-01-15T11:38:24Z

I'm hitting the same issue. Are there any updates?

den-crane · 2020-09-24T23:37:28Z

related: #15255

Tested on 20.10.1.4704:
SELECT count() FROM test_partition_filtering WHERE toDate(toStartOfDay(timestamp)) = today() and zoneId=42;
Selected 1 parts by date, 1 parts by key, 1 marks by primary key, 1 marks to read from 1 ranges
Processed 8.19 thousand rows

SELECT count() FROM test_partition_filtering WHERE toDate(timestamp) = today() AND zoneId = 42;
Selected 1 parts by date, 1 parts by key, 1 marks by primary key, 1 marks to read from 1 ranges
Processed 8.19 thousand rows

Seems fixed starting with 20.8.

amosbird · 2020-09-25T03:45:59Z

It's fixed in #13497. toDate(timestamp_ms / 1000) will also work because of #14513.
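To illustrate why an expression like toDate(timestamp_ms / 1000) can still be pruned in principle: dividing by a positive constant and then taking the date are both non-decreasing, so their composition is too, and the part's min and max can be pushed through the whole chain. This is a sketch under that assumption only; it says nothing about how #14513 is actually implemented, and the helper names are hypothetical. UTC is assumed for simplicity.

```python
from datetime import date, datetime, timezone

def to_date_from_ms(ts_ms: int) -> date:
    # Mirrors toDate(timestamp_ms / 1000), assuming UTC.
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date()

def part_may_match(min_ms: int, max_ms: int, target: date) -> bool:
    # Both steps are non-decreasing, so the endpoints bound every row's date.
    return to_date_from_ms(min_ms) <= target <= to_date_from_ms(max_ms)

def ms(dt: datetime) -> int:
    return int(dt.timestamp() * 1000)

jan1 = ms(datetime(2019, 1, 1, 10, tzinfo=timezone.utc))
jan2 = ms(datetime(2019, 1, 2, 10, tzinfo=timezone.utc))

print(part_may_match(jan1, jan1, date(2019, 1, 2)))  # False
print(part_may_match(jan2, jan2, date(2019, 1, 2)))  # True
```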

vitalvi · 2021-04-29T04:29:25Z

Should partition pruning work with functions coming from a CTE? For example:

with toDate('2019-01-02') as dt select count() from test_partition_filtering where toDate((timestamp)) = dt;

1 rows in set. Elapsed: 0.025 sec. Processed 3.00 million rows, 12.00 MB (120.75 million rows/s., 483.00 MB/s.)

Is it expected that 3 million rows are processed in this case?

amosbird · 2021-04-29T04:35:48Z

> Should partition pruning work with functions coming from a CTE? For example:
> with toDate('2019-01-02') as dt select count() from test_partition_filtering where toDate((timestamp)) = dt;
>
> 1 rows in set. Elapsed: 0.025 sec. Processed 3.00 million rows, 12.00 MB (120.75 million rows/s., 483.00 MB/s.)
> Is it expected that 3 million rows are processed in this case?

This is not a CTE but a scalar alias. It should work. Please share a minimal reproducible test case.

vitalvi · 2021-04-29T05:01:12Z

I use the same table definition and data as provided above:

CREATE TABLE test_partition_filtering (
    timestamp DateTime) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY tuple();

INSERT INTO test_partition_filtering SELECT '2019-01-01 10:00:00' FROM numbers(1000000);
INSERT INTO test_partition_filtering SELECT '2019-01-02 10:00:00' FROM numbers(1000000);
INSERT INTO test_partition_filtering SELECT toDateTime(today())+3600 FROM numbers(1000000);

with toDate('2019-01-02') as dt select count() from test_partition_filtering where toDate((timestamp)) = dt;
1 rows in set. Elapsed: 0.025 sec. Processed 3.00 million rows, 12.00 MB (120.75 million rows/s., 483.00 MB/s.)

but

select count() from test_partition_filtering where toDate((timestamp)) = toDate('2019-01-02');
1 rows in set. Elapsed: 0.014 sec. Processed 1.00 million rows, 4.00 MB (71.74 million rows/s., 286.97 MB/s.)

amosbird · 2021-04-29T06:05:11Z

It's already fixed in #21766

vitalvi · 2021-04-29T06:27:24Z

Ohh, that's a pretty new one. Yes, with the latest version it works as expected. Sorry for the noise, and thank you @amosbird.

nvartolomei added the "bug" label (Confirmed user-visible misbehaviour in official release) Nov 27, 2019

filimonov mentioned this issue Jun 19, 2020

Pick the correct partition when partition key is a function of a column used in condition #11796

Closed

den-crane added the performance label Sep 24, 2020

alexey-milovidov closed this as completed Sep 25, 2020
