
Make system.replicas parallel #43998

Merged: 6 commits into ClickHouse:master on Jan 13, 2023

Conversation

evillique
Member

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Make the system.replicas table fetch replica statuses in parallel. Closes #43918
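The change overlaps the per-table status fetches instead of running them one after another. Below is a minimal standalone sketch of that idea, not the actual ClickHouse implementation; fetchReplicaStatus is a hypothetical stand-in for StorageReplicatedMergeTree::getStatus.

```cpp
// Illustrative sketch only; fetchReplicaStatus() is a made-up stand-in
// for the real per-table status fetch.
#include <future>
#include <iostream>
#include <string>
#include <vector>

struct ReplicaStatus
{
    std::string table;
    bool is_readonly = false;
};

/// Stand-in for the per-table fetch, which in reality performs several
/// synchronous ZooKeeper requests.
ReplicaStatus fetchReplicaStatus(const std::string & table)
{
    return {table, false};
}

int main()
{
    std::vector<std::string> tables = {"db.t1", "db.t2", "db.t3"};

    /// Launch the fetches concurrently instead of one after another,
    /// so the ZooKeeper round trips of different tables overlap.
    std::vector<std::future<ReplicaStatus>> futures;
    for (const auto & table : tables)
        futures.push_back(std::async(std::launch::async, fetchReplicaStatus, table));

    for (auto & f : futures)
    {
        ReplicaStatus s = f.get();
        std::cout << s.table << " readonly=" << s.is_readonly << '\n';
    }
}
```

With many replicated tables, each fetch still costs the same number of round trips, but the latencies overlap instead of adding up.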

@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-improvement Pull request with some product improvements label Dec 7, 2022
@kssenii kssenii self-assigned this Dec 7, 2022
Review threads on src/Storages/System/StorageSystemReplicas.cpp and src/Core/Settings.h (outdated, resolved)
@tavplubix
Member

Also consider using MultiRead requests in StorageReplicatedMergeTree::getStatus. Currently it does 3 + number_of_replicas (usually 3) = 6 sequential synchronous requests = 6 RTT. We can replace them with 1 MultiGetChildren request followed by 1 MultiGet request (2 RTT, 3 times faster).
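The round-trip arithmetic can be illustrated with a hedged sketch; Client, its methods, and the node names below are made-up stand-ins, not ClickHouse's actual Coordination/ZooKeeper API:

```cpp
// Illustrative only: each Client stub stands in for one network request.
#include <string>
#include <vector>

struct Client
{
    std::string get(const std::string &) { return {}; }                      // 1 round trip
    std::vector<std::string> getChildren(const std::string &) { return {}; } // 1 round trip
    // Batched variants: one round trip covers the whole vector of paths.
    std::vector<std::string> multiGet(const std::vector<std::string> &) { return {}; }
    std::vector<std::vector<std::string>> multiGetChildren(const std::vector<std::string> &) { return {{}}; }
};

// Before: 2 fixed reads, 1 children listing, then one read per replica,
// issued one after another: 3 + N round trips (6 for the usual N = 3).
void statusSequential(Client & zk, const std::string & root)
{
    zk.get(root + "/meta_a");                                  // round trip 1
    zk.get(root + "/meta_b");                                  // round trip 2
    auto replicas = zk.getChildren(root + "/replicas");        // round trip 3
    for (const auto & r : replicas)
        zk.get(root + "/replicas/" + r + "/state");            // round trips 4 .. 3+N
}

// After: one batched children request, then one batched read covering the
// fixed nodes and every per-replica node: 2 round trips regardless of N.
void statusBatched(Client & zk, const std::string & root)
{
    auto children = zk.multiGetChildren({root + "/replicas"}); // round trip 1
    std::vector<std::string> paths = {root + "/meta_a", root + "/meta_b"};
    for (const auto & r : children.at(0))
        paths.push_back(root + "/replicas/" + r + "/state");
    zk.multiGet(paths);                                        // round trip 2
}

int main()
{
    Client zk;
    statusSequential(zk, "/clickhouse/tables/t");
    statusBatched(zk, "/clickhouse/tables/t");
}
```

The batched variant issues the same reads, just grouped so the server answers each group in a single round trip.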

@tavplubix tavplubix self-assigned this Dec 7, 2022
Member

@tavplubix tavplubix left a comment


See the comments above

Member

@tavplubix tavplubix left a comment


LGTM now, but it would be great to optimize StorageReplicatedMergeTree::getStatus as well.

@evillique
Member Author

Also consider using MultiRead requests in StorageReplicatedMergeTree::getStatus. Currently it does 3 + number_of_replicas (usually 3) = 6 sequential synchronous requests = 6 RTT. We can replace them with 1 MultiGetChildren request followed by 1 MultiGet request (2 RTT, 3 times faster).

From the code of getStatus() it looks like it will be 1 MultiGetChildren, 1 standard Get, and 1 MultiExists. I will try to optimize it.

@tavplubix
Member

1 standard Get, and 1 MultiExists

Exists can be replaced with Get and an error code check.
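That works because ZooKeeper-style reads of a missing node fail with a distinct "no node" error code, so the existence check can ride along with the read itself. A tiny sketch of the pattern (tryGet and the Error enum are illustrative stand-ins, not the real API):

```cpp
#include <iostream>
#include <optional>
#include <string>

// Illustrative stand-ins: real ZooKeeper reads of a missing node report
// an error code such as ZNONODE rather than succeeding.
enum class Error { Ok, NoNode };

Error tryGet(const std::string & path, std::string & out)
{
    // Stub: pretend only "/exists" is present.
    if (path == "/exists") { out = "data"; return Error::Ok; }
    return Error::NoNode;
}

// Instead of exists(path) followed by get(path), which costs two round
// trips, issue a single get and treat the NoNode error as "absent".
std::optional<std::string> readIfExists(const std::string & path)
{
    std::string data;
    if (tryGet(path, data) == Error::NoNode)
        return std::nullopt;
    return data;
}

int main()
{
    std::cout << readIfExists("/exists").value_or("<none>") << '\n';  // prints: data
    std::cout << readIfExists("/missing").value_or("<none>") << '\n'; // prints: <none>
}
```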

@tavplubix
Member

It was not necessary to merge master and restart CI

@evillique evillique force-pushed the make_system_replicas_parallel branch from 8fe3c73 to 673ba76 on January 5, 2023 at 02:36
@tavplubix
Member

Integration tests (asan) [2/6] - #44928 and #44908
Integration tests (asan) [4/6] - #45160
Integration tests (release) [4/4] - #44908
Integration tests (tsan) [2/6] - #44908
Stress test (asan) - #44910

Stress test (ubsan) - #45251, also there are tons of errors like

2023.01.05 07:43:05.844601 [ 1677 ] {} <Error> void DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(DB::TaskRuntimeDataPtr) [Queue = DB::MergeMutateRuntimeQueue]: Code: 241. DB::Exception: Memory limit (total) exceeded: would use 34.33 GiB (attempt to allocate chunk of 1146964 bytes), maximum: 34.29 GiB. OvercommitTracker decision: Memory overcommit isn't used. Waiting time or overcommit denominator are set to zero. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):

(the same MEMORY_LIMIT_EXCEEDED error repeats many times, with only the thread IDs and allocation sizes varying)

so I guess it's the "deadlock" in OvercommitTracker (#45236)

@tavplubix tavplubix merged commit 9d5ec47 into ClickHouse:master Jan 13, 2023
@evillique evillique deleted the make_system_replicas_parallel branch January 13, 2023 16:07
Labels: pr-improvement
Merging this pull request may close: system.replicas table can be slow (#43918)