Parallelize query processing right after reading FROM ... #48727

Merged · 22 commits · Apr 23, 2023

Conversation

@devcrafter (Member) commented Apr 12, 2023

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Query processing is parallelized right after reading from a data source. Affected data sources are mostly simple or external storages, such as the url and file table functions.

@devcrafter devcrafter marked this pull request as draft April 12, 2023 18:49
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-performance Pull request with some performance improvements label Apr 12, 2023
@alexey-milovidov alexey-milovidov self-assigned this Apr 12, 2023
@devcrafter (Member, Author) commented Apr 13, 2023

https://s3.amazonaws.com/clickhouse-test-reports/48727/bb60f10035f0ce7538b98bab02c219dff474fdae/integration_tests__asan__[1/6].html
count() returns an incorrect result with a Redis dictionary. I assume there is some bug around dictionaries (not yet clear whether it's Redis-specific or not). So, for now, avoid parallelization after reading from the Dictionary storage (see 7c84dc4).

@devcrafter devcrafter marked this pull request as ready for review April 14, 2023 23:57
@devcrafter (Member, Author) commented Apr 15, 2023

Affected:

  • table engines: mongodb, mysql, postgresql, sqlite, rocksdb, Hive, LogFamily engines, Buffer, Memory, KeeperMap, URL
  • table functions: remote, plus functions that use the engines listed above

Note:

  • the Dictionary table engine is not affected yet (see the comment above)
  • numbers is not affected (yet?), see the comment below

P.S. I may have missed something. We probably also need to add some tests/performance tests.

@alexey-milovidov (Member) commented Apr 15, 2023

Ok.

Many performance tests use zeros/zeros_mt to check single/multithread performance.
We can edit them by adding SETTINGS max_threads = 1 where it is zeros or numbers.
Single-threaded performance tests make sense for stability of the results.

@devcrafter (Member, Author) replied:

> Many performance tests use zeros/zeros_mt to check single/multithread performance.
> We can edit them by adding SETTINGS max_threads = 1 where it is zeros or numbers.
> Single-threaded performance tests make sense for stability of the results.

Rather than changing the tests, it's much simpler to avoid parallelization after zeros. It looks like we'd better stick to that for zeros and numbers, since they have multithreaded counterparts.
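
In code, the opt-out could look roughly like this. This is a minimal sketch, not the actual PR code; it assumes a per-storage hook like the parallelizeOutputAfterReading() discussed later in this thread, with an assumed signature:

class StorageSystemZeros : public IStorage
{
public:
    /// Sketch (assumed hook signature): zeros/zeros_mt model single-/multi-threaded
    /// reads explicitly, so keep exactly the number of streams the source produced
    /// instead of resizing the pipe after reading.
    bool parallelizeOutputAfterReading(ContextPtr /*context*/) const override
    {
        return false;
    }

    /// ... other members omitted.
};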

@devcrafter (Member, Author) commented Apr 15, 2023

It'd be nice to understand the reason for the failure with dictionaries. If output is parallelized for dictionaries, the query returns the correct result with max_threads = 1:

SELECT count(), uniqExact(date), uniqExact(id) FROM redis_dict SETTINGS max_threads = 1
1000 | 1 | 1000

Incrementing max_threads increments the count() result by 1, i.e.

SELECT count(), uniqExact(date), uniqExact(id) FROM redis_dict SETTINGS max_threads = 2
1001 | 2 | 1000

SELECT count(), uniqExact(date), uniqExact(id) FROM redis_dict SETTINGS max_threads = 3
1002 | 2 | 1000
...

The reason is that a row is generated somewhere for each stream without data, but I haven't figured out yet how or where.
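
The arithmetic above fits that hypothesis exactly. The following toy program (illustration only, not ClickHouse code; the "one spurious row per empty stream" behavior is exactly the assumption being tested) reproduces the observed counts:

#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    const size_t dictionary_rows = 1000;
    for (size_t max_threads = 1; max_threads <= 3; ++max_threads)
    {
        /// After the resize, stream 0 holds all the data and the
        /// remaining max_threads - 1 streams are empty.
        std::vector<size_t> rows_per_stream(max_threads, 0);
        rows_per_stream[0] = dictionary_rows;

        size_t count = 0;
        for (size_t rows : rows_per_stream)
            count += (rows > 0) ? rows : 1; /// assumption: each empty stream yields one spurious row
        std::printf("max_threads=%zu -> count()=%zu\n", max_threads, count);
    }
    /// Prints 1000, 1001, 1002, matching the query results above.
}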

@@ -133,6 +133,13 @@ void IStorage::read(
size_t num_streams)
{
auto pipe = read(column_names, storage_snapshot, query_info, context, processed_stage, max_block_size, num_streams);

/// parallelize processing if not yet

A reviewer (Member) commented on this hunk:

  1. Should we do it here, or is it better to do it inside InterpreterSelectQuery?
  2. Are there any potential troubles with mutations and StorageFromMergeTreeDataPart?

@devcrafter (Member, Author) replied:

> 1. Should we do it here, or is it better to do it inside InterpreterSelectQuery?

I think it's OK to do it here, with the following considerations:

num_streams is provided by InterpreterSelectQuery as a recommendation, i.e. how many threads are available for data processing. The reading step has the following choices:

  • (a) it knows how much data it can read, and it's not much, so it creates only the necessary number of data streams based on the parameters passed to IStorage::read(), i.e. max_block_size/storage_limits. In this case we don't want to adjust the number of streams, and parallelizeOutputAfterReading() can return false

  • (b) the amount of data is either unknown, or known and large enough to utilize all available threads; in both cases the output is parallelized to num_streams

> 2. Are there any potential troubles with mutations and StorageFromMergeTreeDataPart?

This change affects only storages that use the default plan step for reading from storage, ReadFromStorageStep. Sophisticated engines use specialized steps to read from their storage, like ReadFromMergeTree in the MergeTree case.

StorageFromMergeTreeDataPart is not affected, since it uses the ReadFromMergeTree step, which overrides the read() method where the resize() is added.
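
For reference, the idea behind the hunk above boils down to something like the following. This is a sketch under the assumptions stated in this thread (the parallelizeOutputAfterReading() hook and a pipe resize), not the verbatim PR code:

void IStorage::read(
    QueryPlan & query_plan,
    const Names & column_names,
    const StorageSnapshotPtr & storage_snapshot,
    SelectQueryInfo & query_info,
    ContextPtr context,
    QueryProcessingStage::Enum processed_stage,
    size_t max_block_size,
    size_t num_streams)
{
    /// Build the pipe via the Pipe-based read() overload, as in the hunk above.
    auto pipe = read(column_names, storage_snapshot, query_info, context, processed_stage, max_block_size, num_streams);

    /// Sketch: if the source produced fewer output ports than the recommended
    /// num_streams (case (b) above), insert a resize so the rest of the query
    /// pipeline runs in parallel. Storages that size their own output
    /// (case (a)) opt out by returning false from the hook.
    if (parallelizeOutputAfterReading(context) && pipe.numOutputPorts() > 0 && pipe.numOutputPorts() < num_streams)
        pipe.resize(num_streams);

    /// ... wrap the pipe into a ReadFromStorageStep and add it to query_plan.
}

The resize redistributes blocks across num_streams output ports, which is also why it is skipped for storages like Dictionary (the count() bug above) and zeros/numbers (the performance-test concern above).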

@alexey-milovidov alexey-milovidov merged commit 67de39c into master Apr 23, 2023
144 checks passed
@alexey-milovidov alexey-milovidov deleted the parallel-processing-from-storages branch April 23, 2023 20:10
@al13n321 (Member) commented:
Just curious: how does it not fail lots of tests in CI? When I tried input_format_parquet_preserve_order = true by default, ~10 tests failed because of reordering (all straightforward to fix).

@devcrafter (Member, Author) replied:

> Just curious: how does it not fail lots of tests in CI? When I tried input_format_parquet_preserve_order = true by default, ~10 tests failed because of reordering (all straightforward to fix).

Probably they were fixed in #48525?
