optimize_read_in_order slows down queries which take only a few rows from the set. #17364

Open
UnamedRus opened this issue Nov 24, 2020 · 3 comments · May be fixed by #64607
Labels: comp-optimizers (Query optimizations), performance, st-hold (We've paused the work on issue for some reason)

@UnamedRus (Contributor)
Describe the situation
If you have a query that returns a small subset of rows from a big table and sorts it, optimize_read_in_order slows the query down a lot.

How to reproduce
ClickHouse server 20.10, 20.11.4.13

CREATE TABLE default.test_scan
(
    `_time` DateTime,
    `key` UInt32,
    `value` UInt32,
    `dt` Date DEFAULT toDate(_time),
    `epoch` UInt64
)
ENGINE = MergeTree()
PARTITION BY toDate(_time)
ORDER BY (_time, epoch)
SETTINGS index_granularity = 8192

INSERT INTO test_scan(_time, key, value, epoch) SELECT now() + intDiv(number,100000), 1 as key, rand() % 250000 as value, now64(6) + (number * 10) FROM numbers(1000000000);

SELECT key, value FROM test_scan WHERE value = 3123 FORMAT Null
0 rows in set. Elapsed: 0.291 sec. Processed 1.00 billion rows, 4.07 GB (3.44 billion rows/s., 13.99 GB/s.)

SELECT key, value FROM test_scan WHERE value = 3123 ORDER BY _time, epoch FORMAT Null
0 rows in set. Elapsed: 2.614 sec. Processed 1.00 billion rows, 4.26 GB (382.52 million rows/s., 1.63 GB/s.)

SELECT key, value FROM test_scan WHERE value = 3123 AND not ignore(_time, epoch) FORMAT Null
0 rows in set. Elapsed: 0.388 sec. Processed 1.00 billion rows, 4.26 GB (2.58 billion rows/s., 11.00 GB/s.)

SELECT  key, value FROM (SELECT key, value, _time, epoch FROM test_scan WHERE value = 3123) ORDER BY _time, epoch FORMAT Null;
0 rows in set. Elapsed: 0.380 sec. Processed 1.00 billion rows, 4.26 GB (2.63 billion rows/s., 11.23 GB/s.)

set optimize_read_in_order=0;
SELECT key, value FROM test_scan WHERE value = 3123 ORDER BY _time, epoch FORMAT Null
0 rows in set. Elapsed: 0.359 sec. Processed 1.00 billion rows, 4.26 GB (2.79 billion rows/s., 11.88 GB/s.)
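
A small sketch for inspecting where the time goes (assuming a server new enough for EXPLAIN PIPELINE, which the versions above are): compare the pipelines with the optimization on and off, and look for the extra in-order reading and merging steps in the first plan.

-- Sketch only: compare the two pipelines to see the in-order reading steps.
EXPLAIN PIPELINE
SELECT key, value FROM test_scan WHERE value = 3123 ORDER BY _time, epoch
SETTINGS optimize_read_in_order = 1;

EXPLAIN PIPELINE
SELECT key, value FROM test_scan WHERE value = 3123 ORDER BY _time, epoch
SETTINGS optimize_read_in_order = 0;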

Expected performance
All of these queries should have similar performance.

@filimonov added the comp-optimizers (Query optimizations) label on Dec 4, 2020
@filimonov (Contributor) commented Dec 9, 2020

@CurtizJ :
Well, you can't do much there. If WHERE selects a small range of rows, the query will be slower with optimize_read_in_order than without, because the in-order reading itself is slower. The only options are some tricky rewrite of the pipeline, or disabling the optimization.
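
A minimal sketch of the second option, disabling the optimization for just this query via the SETTINGS clause rather than session-wide (using the table from the reproduction above):

-- Per-query workaround sketch: keep optimize_read_in_order enabled for the
-- session, but turn it off for this one ORDER BY query.
SELECT key, value
FROM test_scan
WHERE value = 3123
ORDER BY _time, epoch
SETTINGS optimize_read_in_order = 0
FORMAT Null;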

@filimonov added the st-hold (We've paused the work on issue for some reason) label on Dec 9, 2020
@UnamedRus (Contributor, Author) commented Dec 9, 2020

But if we don't have LIMIT in the query, do we actually gain any benefit from optimize_read_in_order then?
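
For contrast, a sketch (same test_scan table as above, no selective filter) of the case the optimization is aimed at: with a small LIMIT, reading in primary-key order lets the query finish after the first few granules instead of scanning and sorting everything.

-- Illustration only: here reading in (_time, epoch) order can stop early,
-- so optimize_read_in_order is expected to help rather than hurt.
SELECT _time, epoch, value
FROM test_scan
ORDER BY _time, epoch
LIMIT 10
FORMAT Null;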

@UnamedRus (Contributor, Author) commented Feb 22, 2023

ClickHouse version 22.3

Selected 182/362 parts by partition key, 182 parts by primary key, 200/211458 marks by primary key, 200 marks to read from 182 ranges

SET optimize_read_in_order=1;

<Debug> MemoryTracker: Peak memory usage (for query): 2.36 GiB.
15 rows in set. Elapsed: 1.269 sec. Processed 4.64 million rows, 2.20 GB (3.66 million rows/s., 1.74 GB/s.)
SET read_in_order_two_level_merge_threshold=200;
SET optimize_read_in_order=1;

<Debug> MemoryTracker: Peak memory usage (for query): 1.45 GiB.
15 rows in set. Elapsed: 0.659 sec. Processed 4.64 million rows, 2.20 GB (7.05 million rows/s., 3.34 GB/s.)
SET optimize_read_in_order=0;

<Debug> MemoryTracker: Peak memory usage (for query): 55.13 MiB.

15 rows in set. Elapsed: 0.419 sec. Processed 4.44 million rows, 2.11 GB (10.61 million rows/s., 5.04 GB/s.)

Feature request to "fix" this problem: #17941

But that still does not explain the huge memory usage.
