Less reads for read-in-order optimization.

Read in order optimization reads at least one granule from every source part. This can be improved in case if data between parts is almost globally sorted, which is typical for time-series usecase.

As an example, create a table.
```
create table t (x UInt64, y String) engine = MergeTree order by x;
insert into t select number, toString(number) from numbers(8192 * 3);
insert into t select number + 8192 * 3, toString(number) from numbers(8192 * 3);
```

This table has two parts which do not intersect by primary key.
```
select name, marks, rows from system.parts where table = 't' and database = 'default'

SELECT
    name,
    marks,
    rows
FROM system.parts
WHERE (table = 't') AND (database = 'default')

Query id: f4d71ad3-e89f-4eaa-9218-21c25ba9c038

┌─name──────┬─marks─┬──rows─┐
│ all_1_1_0 │     4 │ 24576 │
│ all_2_2_0 │     4 │ 24576 │
└───────────┴───────┴───────┘


```

Run a query.
```
select y from t where toUInt64(y) > 8192 * 2 order by x limit 4 settings max_block_size = 8192

SELECT y
FROM t
WHERE toUInt64(y) > (8192 * 2)
ORDER BY x ASC
LIMIT 4
SETTINGS max_block_size = 8192

Query id: ae5c18f8-efa2-4093-814b-bc9448e7374f

┌─y─────┐
│ 16385 │
│ 16386 │
│ 16387 │
│ 16388 │
└───────┘


```
This query uses read-in-order optimization, and planner is going to read all the data from parts.
```
2022.10.25 15:32:22.021478 [ 40507 ] {ae5c18f8-efa2-4093-814b-bc9448e7374f} <Trace> MergeTreeInOrderSelectProcessor: Reading 2 ranges in order from part all_2_2_0, approx. 24576 rows starting from 0
2022.10.25 15:32:22.021520 [ 40507 ] {ae5c18f8-efa2-4093-814b-bc9448e7374f} <Trace> MergeTreeInOrderSelectProcessor: Reading 2 ranges in order from part all_1_1_0, approx. 24576 rows starting from 0
```
In this case, a query is a full scan.
```
2022.10.25 15:32:22.023843 [ 40507 ] {ae5c18f8-efa2-4093-814b-bc9448e7374f} <Information> executeQuery: Read 49152 rows, 778.30 KiB in 0.003676755 sec., 13368309 rows/sec., 206.72 MiB/sec.
```
Actually, all the data needed contains only in the first part. We did not have to read `all_2_2_0` at all. We can improve it by analyzing PK.
```
cp clickhouse/data/default/t/all_1_1_0/primary.idx clickhouse/user_files/1.primary.idx
cp clickhouse/data/default/t/all_2_2_0/primary.idx clickhouse/user_files/2.primary.idx
```

```
select x from file('1.primary.idx', 'RowBinary', 'x UInt64')

SELECT x
FROM file('1.primary.idx', 'RowBinary', 'x UInt64')

Query id: 25a84d26-c8a6-4c20-8665-c1e8b48e9703

┌─────x─┐
│     0 │
│  8192 │
│ 16384 │
│ 24575 │
└───────┘


select x from file('2.primary.idx', 'RowBinary', 'x UInt64')

SELECT x
FROM file('2.primary.idx', 'RowBinary', 'x UInt64')

Query id: 0bc6dd52-17d2-4588-b3b1-f4b7a00370fc

┌─────x─┐
│ 24576 │
│ 32768 │
│ 40960 │
│ 49151 │
└───────┘

```

From PK we can decide that we don't even need to start reading `all_2_2_0` till we read until `x = 24575` from `all_1_1_0`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Less reads for read-in-order optimization. #42662

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Less reads for read-in-order optimization. #42662

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions