optimize_read_in_order slows down queries which take only a few rows from the set. #17364

Open
UnamedRus opened this issue Nov 24, 2020 · 3 comments · May be fixed by #64607
Labels: comp-optimizers (Query optimizations), performance, st-hold (We've paused the work on issue for some reason)

@UnamedRus (Contributor)
Describe the situation
If you have a query that returns a small subset of rows from a big table and sorts it, optimize_read_in_order slows the query down a lot.

How to reproduce
ClickHouse server 20.10, 20.11.4.13

CREATE TABLE default.test_scan
(
    `_time` DateTime,
    `key` UInt32,
    `value` UInt32,
    `dt` Date DEFAULT toDate(_time),
    `epoch` UInt64
)
ENGINE = MergeTree()
PARTITION BY toDate(_time)
ORDER BY (_time, epoch)
SETTINGS index_granularity = 8192

INSERT INTO test_scan(_time, key, value, epoch) SELECT now() + intDiv(number,100000), 1 as key, rand() % 250000 as value, now64(6) + (number * 10) FROM numbers(1000000000);

SELECT key, value FROM test_scan WHERE value = 3123 FORMAT Null
0 rows in set. Elapsed: 0.291 sec. Processed 1.00 billion rows, 4.07 GB (3.44 billion rows/s., 13.99 GB/s.)

SELECT key, value FROM test_scan WHERE value = 3123 ORDER BY _time, epoch FORMAT Null
0 rows in set. Elapsed: 2.614 sec. Processed 1.00 billion rows, 4.26 GB (382.52 million rows/s., 1.63 GB/s.)

SELECT key, value FROM test_scan WHERE value = 3123 AND not ignore(_time, epoch) FORMAT Null
0 rows in set. Elapsed: 0.388 sec. Processed 1.00 billion rows, 4.26 GB (2.58 billion rows/s., 11.00 GB/s.)

SELECT  key, value FROM (SELECT key, value, _time, epoch FROM test_scan WHERE value = 3123) ORDER BY _time, epoch FORMAT Null;
0 rows in set. Elapsed: 0.380 sec. Processed 1.00 billion rows, 4.26 GB (2.63 billion rows/s., 11.23 GB/s.)

set optimize_read_in_order=0;
SELECT key, value FROM test_scan WHERE value = 3123 ORDER BY _time, epoch FORMAT Null
0 rows in set. Elapsed: 0.359 sec. Processed 1.00 billion rows, 4.26 GB (2.79 billion rows/s., 11.88 GB/s.)
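
A small sketch for inspecting where the time goes (assuming a server new enough for EXPLAIN PIPELINE, which the versions above are): compare the pipelines with the optimization on and off, and look for the extra in-order reading and merging steps in the first plan.

-- Sketch only: compare the two pipelines to see the in-order reading steps.
EXPLAIN PIPELINE
SELECT key, value FROM test_scan WHERE value = 3123 ORDER BY _time, epoch
SETTINGS optimize_read_in_order = 1;

EXPLAIN PIPELINE
SELECT key, value FROM test_scan WHERE value = 3123 ORDER BY _time, epoch
SETTINGS optimize_read_in_order = 0;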

Expected performance
All of these queries should have similar performance.

@filimonov added the comp-optimizers (Query optimizations) label on Dec 4, 2020
@filimonov (Contributor) commented Dec 9, 2020

@CurtizJ :
Well, you can't do much there. If WHERE selects a small range of rows, the query will be slower with optimize_read_in_order than without, because the in-order reading itself is slower. The only options are some tricky rewrite of the pipeline, or disabling the optimization.
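
A minimal sketch of the second option, disabling the optimization for just this query via the SETTINGS clause rather than session-wide (using the table from the reproduction above):

-- Per-query workaround sketch: keep optimize_read_in_order enabled for the
-- session, but turn it off for this one ORDER BY query.
SELECT key, value
FROM test_scan
WHERE value = 3123
ORDER BY _time, epoch
SETTINGS optimize_read_in_order = 0
FORMAT Null;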

@filimonov added the st-hold (We've paused the work on issue for some reason) label on Dec 9, 2020
@UnamedRus (Contributor, Author) commented Dec 9, 2020

But if we don't have LIMIT in the query, do we actually gain any benefit from optimize_read_in_order then?
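
For contrast, a sketch (same test_scan table as above, no selective filter) of the case the optimization is aimed at: with a small LIMIT, reading in primary-key order lets the query finish after the first few granules instead of scanning and sorting everything.

-- Illustration only: here reading in (_time, epoch) order can stop early,
-- so optimize_read_in_order is expected to help rather than hurt.
SELECT _time, epoch, value
FROM test_scan
ORDER BY _time, epoch
LIMIT 10
FORMAT Null;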

@UnamedRus (Contributor, Author) commented Feb 22, 2023

ClickHouse version 22.3

Selected 182/362 parts by partition key, 182 parts by primary key, 200/211458 marks by primary key, 200 marks to read from 182 ranges

SET optimize_read_in_order=1;

<Debug> MemoryTracker: Peak memory usage (for query): 2.36 GiB.
15 rows in set. Elapsed: 1.269 sec. Processed 4.64 million rows, 2.20 GB (3.66 million rows/s., 1.74 GB/s.)
SET read_in_order_two_level_merge_threshold=200;
SET optimize_read_in_order=1;

<Debug> MemoryTracker: Peak memory usage (for query): 1.45 GiB.
15 rows in set. Elapsed: 0.659 sec. Processed 4.64 million rows, 2.20 GB (7.05 million rows/s., 3.34 GB/s.)
SET optimize_read_in_order=0;

<Debug> MemoryTracker: Peak memory usage (for query): 55.13 MiB.

15 rows in set. Elapsed: 0.419 sec. Processed 4.44 million rows, 2.11 GB (10.61 million rows/s., 5.04 GB/s.)

Feature request to "fix" this problem: #17941

But that still does not explain the huge memory usage.
