Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow vertical merges from compact to wide parts #45681

Merged

Conversation

CurtizJ
Copy link
Member

@CurtizJ CurtizJ commented Jan 26, 2023

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Allow using Vertical merge algorithm with parts in Compact format. This will allow ClickHouse server to use much less memory for background operations. This closes #46084

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jan 26, 2023
@novikd novikd self-assigned this Jan 26, 2023
@alexey-milovidov
Copy link
Member

Let's mark it as a "Performance improvement", although it is mostly a memory usage improvement.

@robot-ch-test-poll3 robot-ch-test-poll3 added pr-performance Pull request with some performance improvements and removed pr-not-for-changelog This PR should not be mentioned in the changelog labels Feb 6, 2023
@KochetovNicolai
Copy link
Member

Tried to test a bit
https://pastila.nl/?00c4e0d0/ab45a5be5e2e7e96e7642936638c66d2

First

Before

SELECT
    merged_from,
    part_name,
    part_type,
    merge_algorithm,
    formatReadableSize(peak_memory_usage)
FROM system.part_log
WHERE (table = 'tvm') AND (event_type = 'MergeParts')

Query id: 8d4989ba-7811-44e7-937b-6bb5c70399ea

(wide + compact -> wide)

┌─merged_from───────────────┬─part_name─┬─part_type─┬─merge_algorithm─┬─formatReadableSize(peak_memory_usage)─┐
│ ['all_1_3_2','all_4_4_0'] │ all_1_4_3 │ Wide      │ Horizontal      │ 1.05 GiB                              │
└───────────────────────────┴───────────┴───────────┴─────────────────┴───────────────────────────────────────┘
(3 compact -> wide)
┌─merged_from───────────────────────────┬─part_name─┬─part_type─┬─merge_algorithm─┬─formatReadableSize(peak_memory_usage)─┐
│ ['all_1_1_1','all_2_2_0','all_3_3_0'] │ all_1_3_2 │ Wide      │ Horizontal      │ 812.98 MiB                            │
└───────────────────────────────────────┴───────────┴───────────┴─────────────────┴───────────────────────────────────────┘

After


SELECT
    merged_from,
    part_name,
    part_type,
    merge_algorithm,
    formatReadableSize(peak_memory_usage)
FROM system.part_log
WHERE (table = 'tvm') AND (event_type = 'MergeParts')

Query id: 0f2561a3-8d5e-4dfb-b56d-93d9ebd05188

┌─merged_from───────────────────────────┬─part_name─┬─part_type─┬─merge_algorithm─┬─formatReadableSize(peak_memory_usage)─┐
│ ['all_1_1_0','all_2_2_0','all_3_3_0'] │ all_1_3_1 │ Wide      │ Vertical        │ 675.00 MiB                            │
└───────────────────────────────────────┴───────────┴───────────┴─────────────────┴───────────────────────────────────────┘
┌─merged_from───────────────┬─part_name─┬─part_type─┬─merge_algorithm─┬─formatReadableSize(peak_memory_usage)─┐
│ ['all_1_3_1','all_4_4_0'] │ all_1_4_2 │ Wide      │ Vertical        │ 667.39 MiB                            │
└───────────────────────────┴───────────┴───────────┴─────────────────┴───────────────────────────────────────┘

Second

Before

SELECT
    merged_from,
    part_name,
    part_type,
    merge_algorithm,
    formatReadableSize(peak_memory_usage)
FROM system.part_log
WHERE (table = 'tvm') AND (event_type = 'MergeParts')

(1 wide + 5 compact -> wide)
┌─merged_from─────────────────────────────────────────────────────────────────┬─part_name──┬─part_type─┬─merge_algorithm─┬─formatReadableSize(peak_memory_usage)─┐
│ ['all_1_1_0','all_2_6_1','all_7_7_0','all_8_8_0','all_9_9_0','all_10_10_0'] │ all_1_10_2 │ Wide      │ Horizontal      │ 1.05 GiB                              │
└─────────────────────────────────────────────────────────────────────────────┴────────────┴───────────┴─────────────────┴───────────────────────────────────────┘

After

SELECT
    merged_from,
    part_name,
    part_type,
    merge_algorithm,
    formatReadableSize(peak_memory_usage)
FROM system.part_log
WHERE (table = 'tvm') AND (event_type = 'MergeParts')

Query id: dece7b8f-15aa-4137-810c-3db5c69b200c

┌─merged_from─────────────────────────────────────────────────────────────────┬─part_name──┬─part_type─┬─merge_algorithm─┬─formatReadableSize(peak_memory_usage)─┐
│ ['all_1_1_0','all_2_6_1','all_7_7_0','all_8_8_0','all_9_9_0','all_10_10_0'] │ all_1_10_2 │ Wide      │ Vertical        │ 656.48 MiB                            │
└─────────────────────────────────────────────────────────────────────────────┴────────────┴───────────┴─────────────────┴───────────────────────────────────────┘

Looks like ~30% less memory usage for (wide + a few compact) parts.

@KochetovNicolai
Copy link
Member

Interestingly that for a case of 2 wide parts merge required 674.98 MiB. Looks too much.

@KochetovNicolai KochetovNicolai merged commit 3912f5a into ClickHouse:master Feb 7, 2023
@KochetovNicolai KochetovNicolai added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Feb 7, 2023
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Feb 9, 2023
@CurtizJ
Copy link
Member Author

CurtizJ commented Feb 9, 2023

Seems that it breaks the rolling upgrade. Will reupload under a settings.

@alexey-milovidov
Copy link
Member

Interesting, how exactly?

@CurtizJ
Copy link
Member Author

CurtizJ commented Feb 10, 2023

@alexey-milovidov

The way how index_granularity is computed differs in vertical and horizontal merges. That leads to CHECKSUM_DOESNT_MATCH error.

@alexey-milovidov
Copy link
Member

CHECKSUM_DOESNT_MATCH is not an incompatibility, it is a normal situation during upgrading.

@CurtizJ
Copy link
Member Author

CurtizJ commented Feb 10, 2023

Also I'm afraid that this change will trigger LOGICAL_ERROR on old version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-performance Pull request with some performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make it possible to use Vertical merge algorithm together with parts of Compact type
7 participants