
always_fetch_merged_part=1 is actively polling ZooKeeper while the other replica is merging, causing high network usage #38944

Closed
jorisgio opened this issue Jul 7, 2022 · 5 comments

Comments

@jorisgio (Contributor) commented Jul 7, 2022

We are confused about how to use always_fetch_merged_part; it is not clear whether this is a bug or whether we are not using it properly.

We have 2 replicas, and queries are load-balanced in order, so they go to the first one. Since merges are heavy, it makes a big performance difference if the replica handling most of the queries is not busy merging, so replica 0 is the leader and does the merges, while replica 1 has always_fetch_merged_part=1 and replicated_can_become_leader=0 (we tried both non-leader and leader, same issue).
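
For reference, the per-table MergeTree settings on replica 1 can be applied roughly like this (a minimal sketch; db.table is a placeholder matching the logs below, and the same settings can also be set in the merge_tree section of the server config):

-- Sketch only, run on replica 1; db.table is a placeholder.
-- Always download merged parts from another replica instead of merging locally.
ALTER TABLE db.table MODIFY SETTING always_fetch_merged_part = 1;
-- Do not let this replica try to become leader (the leader assigns merges).
ALTER TABLE db.table MODIFY SETTING replicated_can_become_leader = 0;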

This works, but when replica 0 wants to merge something, it creates a merge entry in the replication queue in ZooKeeper.
Replica 1 processes the queue in a loop, decides it cannot execute the merge itself, and has to wait for the merged part to appear. Replica 0 is processing the merge, but that can take a couple of hours.
In the meantime replica 1 busy-polls ZooKeeper in a loop, which I guess is OK, except that on each iteration it also checks the part in the part-check thread.

2022.07.07 07:34:28.903432 [ 3122970 ] {} <Warning> db.table (ReplicatedMergeTreePartCheckThread): Checking part 53_21002_21145_72
2022.07.07 07:34:28.903444 [ 3123013 ] {} <Information> MergeFromLogEntryTask: DB::Exception: No active replica has part 53_21002_21145_72 or covering part
2022.07.07 07:34:28.904085 [ 3123035 ] {} <Information> MergeFromLogEntryTask: Will fetch part 53_21002_21145_72 because setting 'always_fetch_merged_part' is true
2022.07.07 07:34:28.904485 [ 3122970 ] {} <Warning> db.table (ReplicatedMergeTreePartCheckThread): Checking if anyone has a part 53_21002_21145_72 or covering part.
2022.07.07 07:34:28.907924 [ 3122970 ] {} <Information> db.table (ReplicatedMergeTreePartCheckThread): Found parts with the same min block and with the same max block as the missing part 53_21002_21145_72 on replica 0. Hoping that it will eventually appear as a result of a merge.

But this active polling generates ~50 Gbps of egress traffic from ZooKeeper, I guess to fetch part information. It is not tractable and does not scale much further. Is this something I'm missing? Is it expected that it checks for covering parts on every loop iteration?
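
For context on what replica 1 keeps retrying, the stuck merge entry is visible in system.replication_queue; a query along these lines (a sketch using that system table's columns, with db.table again as a placeholder) shows the retry counters and the postpone reason:

-- Sketch: run on replica 1; database/table names are placeholders.
SELECT type, new_part_name, num_tries, num_postponed, postpone_reason, last_exception
FROM system.replication_queue
WHERE database = 'db' AND table = 'table' AND type = 'MERGE_PARTS';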

@den-crane (Contributor)

Duplicate of #21338.

@den-crane (Contributor)

Could be solved by #37995.

@jorisgio (Contributor, Author) commented Jul 7, 2022

Thanks! I searched for an existing issue but didn't find the earlier one. The backoff looks like a good enough workaround; trying it.

@filimonov (Contributor)

Backoff was rolled back.

@tavplubix reopened this Feb 27, 2023
@tavplubix (Member)

Well, #21338 is open and this is a duplicate
