Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partition pruning does not work on DateTime columns when toDate function is applied #15255

Closed
siradjev opened this issue Sep 24, 2020 · 3 comments

Comments

@siradjev
Copy link

siradjev commented Sep 24, 2020

Describe the bug
Partition pruning does not work on DateTime columns when toDate function is applied and number of distinct values of partition column in partition are more than 1 (those partitions are not eliminated, others are eliminated.

How to reproduce

  • v 20.3.11.97

drop table if exists example_table_MM;
drop table if exists example_table_DD;
CREATE TABLE example_table_MM
(
    d DateTime,
    a Int
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY tuple();
CREATE TABLE example_table_DD
(
    d DateTime,
    a Int
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(d)
ORDER BY tuple();

truncate table example_table_MM;
/* 2020-09 2020-08 partitions won't be eliminated, 10 and 11 will be */

insert into example_table_MM 
select toDateTime(toDate('2020-09-23')), number from numbers(1000000) UNION ALL 
select toDateTime(toDate('2020-09-24')), number from numbers(1000000) UNION ALL
select toDateTime(toDate('2020-09-25')), number from numbers(1000000) UNION ALL
select toDateTime(toDate('2020-08-15')), number from numbers(1000000) UNION ALL
select toDateTime(toDate('2020-08-16')), number from numbers(1000000) UNION ALL
select toDateTime(toDate('2020-10-15')), number from numbers(1000000) UNION ALL
select toDateTime(toDate('2020-11-15')), number from numbers(1000000);

truncate table example_table_DD ;
/* 2020-09-23 will not be eliminated */
insert into example_table_DD
select toDateTime(toDate('2020-09-23')), number from numbers(1000000) UNION ALL
select toDateTime(toDateTime('2020-09-23 11:00:00')), number from numbers(1000000) UNION ALL
select toDateTime(toDate('2020-09-24')), number from numbers(1000000) UNION ALL
select toDateTime(toDate('2020-09-25')), number from numbers(1000000) UNION ALL
select toDateTime(toDate('2020-08-15')), number from numbers(1000000);

Expected behavior
Check processed row counts:

HOSTNAME :) select count() from example_table_MM where toDate(d)=toDate('2020-10-15'); -- shall be 1M, scans 6M
:-]
:-] ;

SELECT count()
FROM example_table_MM
WHERE toDate(d) = toDate('2020-10-15')

┌─count()─┐
│ 1000000 │
└─────────┘

1 rows in set. Elapsed: 0.008 sec. Processed 6.00 million rows, 24.00 MB (747.50 million rows/s., 2.99 GB/s.)

HOSTNAME :) select count() from example_table_MM where toDate(d)=toDate('2020-08-15'); -- shall be 2M, scans 5M
:-]
:-] ;

SELECT count()
FROM example_table_MM
WHERE toDate(d) = toDate('2020-08-15')

┌─count()─┐
│ 1000000 │
└─────────┘

1 rows in set. Elapsed: 0.007 sec. Processed 5.00 million rows, 20.00 MB (709.42 million rows/s., 2.84 GB/s.)

HOSTNAME :) select count() from example_table_DD where toDate(d)=toDate('2020-09-24'); -- shall be 1M, scans 3M
:-] ;

SELECT count()
FROM example_table_DD
WHERE toDate(d) = toDate('2020-09-24')

┌─count()─┐
│ 1000000 │
└─────────┘

1 rows in set. Elapsed: 0.012 sec. Processed 3.00 million rows, 12.00 MB (246.84 million rows/s., 987.35 MB/s.)

Additional context
Without typecasting it works correctly.

@siradjev siradjev added the bug Confirmed user-visible misbehaviour in official release label Sep 24, 2020
@siradjev
Copy link
Author

Seems this one is fixed for 20.7.3.7 https://t.me/clickhouse_ru/183597
But it would be good to have test cases, given how frequently toDate(DateTimeColumn)=toDate('YYYY-MM-DD') construct breaks...

@siradjev
Copy link
Author

or 20.9... https://t.me/clickhouse_ru/183624

@UnamedRus
Copy link
Contributor

UnamedRus commented Sep 24, 2020

Still broken in 20.7, but it seems fixed in 20.8, probably that pr

CREATE TABLE part_prunning_test
(
    `ts` DateTime,
    `dt` Date DEFAULT toDate(ts),
    `db_time` DateTime DEFAULT ts
)
ENGINE = MergeTree()
PARTITION BY toDate(ts)
ORDER BY tuple()
SETTINGS index_granularity = 8192



INSERT INTO part_prunning_test(ts) SELECT toDateTime('2020-09-10 00:00:00') + INTERVAL number SECOND FROM numbers(3000000);

clickhouse version 20.7.3

select count() from part_prunning_test where toDate(ts)='2020-09-24';
1 rows in set. Elapsed: 0.369 sec. Processed 2.97 million rows, 11.88 MB (8.04 million rows/s., 32.17 MB/s.)
select count() from part_prunning_test where toDate(ts)=toDate('2020-09-24');
1 rows in set. Elapsed: 0.370 sec. Processed 2.97 million rows, 11.88 MB (8.02 million rows/s., 32.10 MB/s.)


clickhouse version 20.8.3

select count() from part_prunning_test where toDate(ts)='2020-09-24';
Selected 1 parts by date, 1 parts by key, 11 marks by primary key, 11 marks to read from 1 ranges
1 rows in set. Elapsed: 0.255 sec. Processed 86.40 thousand rows, 345.60 KB (338.89 thousand rows/s., 1.36 MB/s.)

select count() from part_prunning_test where ts>='2020-09-24 00:00:00' AND ts < '2020-09-25 00:00:00';
 Selected 1 parts by date, 1 parts by key, 11 marks by primary key, 11 marks to read from 1 ranges
1 rows in set. Elapsed: 0.256 sec. Processed 86.40 thousand rows, 345.60 KB (337.46 thousand rows/s., 1.35 MB/s.)

@den-crane den-crane added v20.7-affected st-fixed and removed bug Confirmed user-visible misbehaviour in official release labels Sep 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants