Nodes with always_fetch_merge_parts=1 using significant CPU (25 core+) while waiting for remote merges to complete #51921

Open
isaacpz opened this issue Jul 7, 2023 · 1 comment


isaacpz commented Jul 7, 2023

Describe the situation
We have a 3-node ClickHouse cluster (running 23.5.2) replicated using ZooKeeper. One node is treated as the writer node (always_fetch_merge_parts=0), and the other two nodes are treated as reader nodes (always_fetch_merge_parts=1). The intention of this configuration is to run our frequent, intensive DDL queries & merges only on the writer node, so that customers do not notice their impact.

When there are active merges on the writer node, we notice that CPU spikes significantly on the reader nodes (where always_fetch_merge_parts=1). We observe the reader nodes using a constant ~25 cores of CPU until the remote merge is completed.

Upon further investigation, it appears that merges on the reader nodes are rapidly spinning while they wait for the writer node to finish, which is causing the substantial CPU usage. We believe this is the case for the following reasons:

  • num_tries on system.replication_queue is increasing at a rate of around 40k retries per minute on the reader nodes.
  • Querying select * from system.merges on the reader nodes shows that all merges consistently have elapsed times of <1s (with new thread IDs each time)
  • During remote merges, the logs of the reader nodes are rapidly spamming errors like <Information> [database]::all_5897_19763_6_8432 (MergeFromLogEntryTask): Code: 234. DB::Exception: No active replica has part all_5897_19763_6_8432 or covering part (cannot execute queue-0000035437: MERGE_PARTS with virtual parts [all_5897_19763_6_8432]). (NO_REPLICA_HAS_PART)
  • Running SYSTEM STOP MERGES; on the reader nodes immediately causes the node to drop from ~25 cores of CPU to near-zero CPU usage.

How to reproduce
Configure a ClickHouse cluster with two nodes and a ReplicatedMergeTree table shared between them. On one node, set always_fetch_merge_parts=1. On the other node, ensure that a long-running merge begins on the replicated table. Observe that the node with always_fetch_merge_parts=1 has extreme CPU usage until the merge is complete.

Expected performance
I expect the node with always_fetch_merge_parts=1 to see almost no CPU impact from remote merges, and to back off on retries if a merge is taking a long time instead of running tens of thousands of polls per minute.
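
To illustrate the kind of backoff I have in mind, here is a rough sketch comparing fixed-interval polling with capped exponential backoff over a one-minute window. This is not ClickHouse's actual scheduling code. The function `count_polls`, its parameters, the 1.5 ms interval (which is just 60 s divided by the ~40k retries per minute we observed), and the 10 s cap are all assumptions made for the example.

```python
def count_polls(window_us, initial_delay_us, max_delay_us, factor):
    """Count polls issued within a time window.

    After each unsuccessful poll the delay is multiplied by `factor` and
    capped at `max_delay_us`. With factor=1 this is fixed-interval polling.
    All times are integer microseconds to avoid floating-point drift.
    """
    polls = 0
    elapsed_us = 0
    delay_us = initial_delay_us
    while elapsed_us < window_us:
        polls += 1                      # one "does any replica have the part?" check
        elapsed_us += delay_us          # wait before the next attempt
        delay_us = min(delay_us * factor, max_delay_us)
    return polls


ONE_MINUTE_US = 60 * 1_000_000

# Fixed 1.5 ms interval, roughly what ~40k retries per minute implies.
fixed = count_polls(ONE_MINUTE_US, 1_500, 1_500, factor=1)

# Same starting delay, doubling after each miss, capped at 10 seconds (assumed).
backed_off = count_polls(ONE_MINUTE_US, 1_500, 10_000_000, factor=2)

print("fixed interval polls per minute:", fixed)
print("exponential backoff polls per minute:", backed_off)
```

With a cap on the delay, the reader node would still notice the merged part within a bounded time of the writer finishing, while issuing only a handful of checks per minute instead of tens of thousands.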

Additional context
I believe this is related to #21338 (and #50580, #38944). I decided to open a new ticket instead of adding to the existing ones because the behavior we're seeing is different from #50580 (no nodes are down), and because the solution proposed in #21338 might not be the only one needed to solve this issue (the number of retries seems unnecessary, regardless of the traffic generated by each retry).


isaacpz commented Jul 7, 2023

Update: it looks like this is still an issue, but the impact was exacerbated by a deadlocked merge that we just identified & killed. We are still seeing significant CPU spikes during merges, but they are less pronounced.
