Nodes with `always_fetch_merge_parts=1` using significant CPU (25 core+) while waiting for remote merges to complete #51921

isaacpz · 2023-07-07T02:43:55Z

Describe the situation
We have a 3-node ClickHouse cluster (running 23.5.2) replicated using ZooKeeper. One node is treated as the writer node (always_fetch_merge_parts=0), and the other two are treated as reader nodes (always_fetch_merge_parts=1). We often run intensive DDL queries and merges, and we want to keep customers from noticing their impact, so we run all of those intensive queries on the writer node.

When there are active merges on the writer node, we notice that CPU spikes significantly on the reader nodes (where always_fetch_merge_parts=1). We observe the reader nodes using a constant ~25 cores of CPU until the remote merge is completed.

Upon further investigation, it appears that merges on the reader nodes are rapidly spinning while they wait for the writer node to finish, which is causing the substantial CPU usage. We believe this is the case for the following reasons:

- `num_tries` in `system.replication_queue` is increasing at a rate of around 40k retries per minute on the reader nodes.
- Querying `SELECT * FROM system.merges` on the reader nodes shows that all merges consistently have elapsed times of <1s (with new thread IDs each time).
- During remote merges, the logs of the reader nodes are rapidly spamming errors like `<Information> [database]::all_5897_19763_6_8432 (MergeFromLogEntryTask): Code: 234. DB::Exception: No active replica has part all_5897_19763_6_8432 or covering part (cannot execute queue-0000035437: MERGE_PARTS with virtual parts [all_5897_19763_6_8432]). (NO_REPLICA_HAS_PART)`
- Running `SYSTEM STOP MERGES;` on the reader nodes immediately causes the node to drop from ~25 cores of CPU to near-zero CPU usage.
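
For anyone who wants to measure the retry rate on their own cluster, one way is to sample `num_tries` from `system.replication_queue` twice and divide the difference by the sampling interval. The sketch below is only an illustration: the function name `retries_per_minute`, the `WHERE` filter, and the sample numbers in the usage note are assumptions, and running the query is left to whatever client you already use.

```python
# Query to run on a reader node at two points in time. The column names exist
# in system.replication_queue; the filter on MERGE_PARTS is an assumption about
# which entries are relevant to this issue.
QUEUE_QUERY = """
SELECT node_name, type, new_part_name, num_tries, last_exception
FROM system.replication_queue
WHERE type = 'MERGE_PARTS'
"""


def retries_per_minute(first, second, interval_seconds):
    """Estimate retries per minute from two samples of the queue.

    first, second: dicts mapping node_name -> num_tries, taken
    interval_seconds apart. Entries that appear only in the second sample
    are skipped because they have no baseline to compare against.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = 0
    for node_name, tries in second.items():
        if node_name in first:
            # Ignore negative differences (e.g. an entry that was recreated).
            delta += max(tries - first[node_name], 0)
    return delta * 60.0 / interval_seconds
```

For example, an entry whose `num_tries` goes from 1000 to 21000 over 30 seconds corresponds to 40000 retries per minute, which is the order of magnitude described above.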

How to reproduce
Configure a ClickHouse cluster with two nodes and a ReplicatedMergeTree table shared between them. On one node, set always_fetch_merge_parts=1. On the other node, ensure that a long-running merge begins on the replicated table. Observe that the node with always_fetch_merge_parts=1 has extreme CPU usage until the merge is complete.

Expected performance
I expect the node with always_fetch_merge_parts=1 to see almost no CPU impact from remote merges, and to back off on retries if a merge is taking a long time instead of running tens of thousands of polls per minute.
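
To make "back off on retries" concrete, here is a minimal sketch of capped exponential backoff. It is not ClickHouse's implementation and says nothing about how the replication queue schedules tasks today; the function names and the default values (0.1 s base, factor 2, 30 s cap) are assumptions chosen only for illustration.

```python
def backoff_delays(base_seconds=0.1, factor=2.0, max_seconds=30.0, attempts=10):
    """Return the wait before each of `attempts` retries, capped at max_seconds."""
    if base_seconds <= 0 or factor < 1:
        raise ValueError("base_seconds must be > 0 and factor must be >= 1")
    delays = []
    delay = base_seconds
    for _ in range(attempts):
        delays.append(min(delay, max_seconds))
        delay *= factor
    return delays


def polls_in_window(window_seconds, base_seconds=0.1, factor=2.0, max_seconds=30.0):
    """Count how many retries fit into window_seconds under the same policy."""
    if base_seconds <= 0 or factor < 1:
        raise ValueError("base_seconds must be > 0 and factor must be >= 1")
    elapsed = 0.0
    delay = base_seconds
    polls = 0
    # Keep scheduling retries while the next wait still fits in the window.
    while elapsed + min(delay, max_seconds) <= window_seconds:
        elapsed += min(delay, max_seconds)
        polls += 1
        delay *= factor
    return polls
```

With these assumed defaults, `polls_in_window(60)` gives 9 polls in the first minute of waiting, compared with the roughly 40k per minute observed on the reader nodes.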

Additional context
I believe this issue is related to #21338 (and #50580, #38944). I decided to open a new ticket instead of adding to the existing ones because the behavior we're seeing is different from #50580 (no nodes are down), and it seems like the solution proposed in #21338 might not be the only way to solve this issue (the number of retries seems unnecessary, regardless of the traffic generated by each retry).


isaacpz · 2023-07-07T21:17:22Z

Update: it looks like this is still an issue, but the impact was exacerbated by a deadlocked merge that we just identified and killed. We are still seeing significant CPU spikes during merges, but they are less pronounced.

isaacpz added the performance label Jul 7, 2023
