We are confused about how to use always_fetch_merged_part; it is not clear whether this is a bug or whether we are not using it properly.
We have 2 replicas, and queries are load-balanced in_order onto the first one. Merges being heavy, it makes a big performance difference if the replica handling most queries stays free of merges, so replica 0 is the leader and performs merges, while replica 1 has always_fetch_merged_part=1 and replicated_can_become_leader=0 (we tried both non-leader and leader, same issue).
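For reference, this is roughly how we apply the settings on replica 1 (a minimal sketch; db.table stands in for our actual ReplicatedMergeTree table):

```sql
-- Sketch of the replica 1 configuration described above.
-- db.table is a placeholder for our actual table name.
ALTER TABLE db.table
    MODIFY SETTING
        always_fetch_merged_part = 1,      -- never merge locally, always fetch the merged part
        replicated_can_become_leader = 0;  -- do not take over merge scheduling
```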
This works, but when replica 0 wants to merge something, it creates a merge entry in the replication queue in ZooKeeper.
Replica 1 processes the queue in a loop, decides it cannot execute the merge itself, and has to wait for the merged part to appear. Replica 0 is processing the merge, but that can take a couple of hours.
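While the merge is in flight, the stuck entry is visible on replica 1 with something like the following (a sketch; the column selection is just what we find useful):

```sql
-- Inspect the pending merge entry that replica 1 keeps retrying.
-- 'db' and 'table' are placeholders for our actual database and table.
SELECT type, new_part_name, num_tries, postpone_reason, last_exception
FROM system.replication_queue
WHERE database = 'db' AND table = 'table' AND type = 'MERGE_PARTS';
```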
So in the meantime replica 1 busy-polls ZooKeeper in a loop, which I guess is OK, except that on each iteration it also checks the part in the part-check thread:
```
2022.07.07 07:34:28.903432 [ 3122970 ] {} <Warning> db.table (ReplicatedMergeTreePartCheckThread): Checking part 53_21002_21145_72
2022.07.07 07:34:28.903444 [ 3123013 ] {} <Information> MergeFromLogEntryTask: DB::Exception: No active replica has part 53_21002_21145_72 or covering part
2022.07.07 07:34:28.904085 [ 3123035 ] {} <Information> MergeFromLogEntryTask: Will fetch part 53_21002_21145_72 because setting 'always_fetch_merged_part' is true
2022.07.07 07:34:28.904485 [ 3122970 ] {} <Warning> db.table (ReplicatedMergeTreePartCheckThread): Checking if anyone has a part 53_21002_21145_72 or covering part.
2022.07.07 07:34:28.907924 [ 3122970 ] {} <Information> db.table (ReplicatedMergeTreePartCheckThread): Found parts with the same min block and with the same max block as the missing part 53_21002_21145_72 on replica 0. Hoping that it will eventually appear as a result of a merge.
```
But this active polling generates ~50 Gbps of egress traffic from ZooKeeper, presumably to fetch part information. That is not tractable and will not scale much further. Is this something I'm missing? Is it expected that it checks for covering parts on every loop iteration?
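For what it's worth, the request volume is also visible on the ClickHouse side through the ZooKeeper profile events (a sketch; the exact set of event names may vary by version):

```sql
-- Rough view of ZooKeeper request volume accumulated since server start.
SELECT event, value
FROM system.events
WHERE event LIKE 'ZooKeeper%'
ORDER BY value DESC;
```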