[Enhancement](compaction) Try get global lock when execute compaction #49882
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall
BE UT Coverage Report: Increment line coverage / Increment coverage report
run buildall
TPC-H: Total hot run time: 34203 ms
TPC-DS: Total hot run time: 193350 ms
ClickBench: Total hot run time: 31.3 s
BE UT Coverage Report: Increment line coverage / Increment coverage report
run cloud_p0
run buildall
TPC-H: Total hot run time: 34845 ms
TPC-DS: Total hot run time: 193259 ms
run buildall
TPC-H: Total hot run time: 33970 ms
TPC-DS: Total hot run time: 193289 ms
ClickBench: Total hot run time: 29.48 s
BE UT Coverage Report: Increment line coverage / Increment coverage report
run buildall
BE UT Coverage Report: Increment line coverage / Increment coverage report
BE Regression P0 && UT Coverage Report: Increment line coverage / Increment coverage report
run performance
TPC-H: Total hot run time: 34096 ms
TPC-DS: Total hot run time: 192187 ms
ClickBench: Total hot run time: 29.87 s
PR approved by at least one committer and no changes requested.
…apache#49882)
…bal lock when execute compaction (#49882)" (#50432) Pick #49882
… access to compaction maps (#50819) Related PR: #49882
Problem Summary:
*** Query id: 0-0 ***
*** is nereids: 0 ***
*** tablet id: 0 ***
*** Aborted at 1746727905 (unix time) try "date -d @1746727905" if you are using GNU date ***
*** Current BE git commitID: ace825a ***
*** SIGSEGV address not mapped to object (@0x8) received by PID 3151893 (TID 3152363 OR 0x7f1186c00640) from PID 8; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F12D9FEE520 in /lib/x86_64-linux-gnu/libc.so.6
4# std::_Hashtable<long, std::pair<long const, std::shared_ptr<doris::CloudBaseCompaction> >, std::allocator<std::pair<long const, std::shared_ptr<doris::CloudBaseCompaction> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node(unsigned long, long const&, unsigned long) const at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/hashtable.h:1817
5# std::pair<std::__detail::_Node_iterator<std::pair<long const, std::shared_ptr<doris::CloudBaseCompaction> >, false, false>, bool> std::_Hashtable<long, std::pair<long const, std::shared_ptr<doris::CloudBaseCompaction> >, std::allocator<std::pair<long const, std::shared_ptr<doris::CloudBaseCompaction> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_emplace<long, decltype(nullptr)>(std::integral_constant<bool, true>, long&&, decltype(nullptr)&&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/hashtable.h:1947
6# doris::CloudStorageEngine::_submit_base_compaction_task(std::shared_ptr<doris::CloudTablet> const&) in /mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be
7# doris::CloudStorageEngine::submit_compaction_task(std::shared_ptr<doris::CloudTablet> const&, doris::CompactionType) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/cloud/cloud_storage_engine.cpp:917
8# doris::CloudStorageEngine::_compaction_tasks_producer_callback() at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/cloud/cloud_storage_engine.cpp:494
9# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/util/thread.cpp:499
10# start_thread at ./nptl/pthread_create.c:442
11# 0x00007F12DA0D2850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
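The crash above comes from unsynchronized access to the tablet-to-compaction hash map while the producer thread is inserting into it. The following is only a minimal sketch of the kind of guard such a fix needs, assuming a hypothetical `CompactionRegistry` class; the names merely mirror the stack trace and are not the actual Doris code.

```cpp
// Sketch: every read/modify of the shared tablet_id -> compaction map goes
// through a single mutex. BaseCompaction and CompactionRegistry are
// placeholders, not Doris types.
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

struct BaseCompaction {};  // stand-in for doris::CloudBaseCompaction

class CompactionRegistry {
public:
    // Returns false if a compaction for this tablet is already registered.
    bool try_register(int64_t tablet_id, std::shared_ptr<BaseCompaction> c) {
        std::lock_guard<std::mutex> guard(_mutex);
        return _submitted_base_compactions.emplace(tablet_id, std::move(c)).second;
    }

    void remove(int64_t tablet_id) {
        std::lock_guard<std::mutex> guard(_mutex);
        _submitted_base_compactions.erase(tablet_id);
    }

private:
    std::mutex _mutex;  // serializes the producer thread against worker threads
    std::unordered_map<int64_t, std::shared_ptr<BaseCompaction>> _submitted_base_compactions;
};

int main() {
    CompactionRegistry registry;
    registry.try_register(10001, std::make_shared<BaseCompaction>());
    registry.remove(10001);
}
```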
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Background:
In cloud mode, compaction tasks for the same tablet may be scheduled across multiple BEs. To ensure that only one BE can execute a compaction task for a given tablet at a time, a global locking mechanism is used.
During compaction preparation, tablet and compaction information is written as key-value pairs to the metadata service. A background thread periodically renews the lease. Other BEs can only perform compaction on a tablet when the KV entry has expired or doesn't exist, ensuring that a tablet's compaction occurs on only one BE at a time.
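To make the lease mechanism concrete, here is a minimal, self-contained C++ sketch of the idea. It assumes a hypothetical in-memory `KvStore` standing in for the real meta-service RPC; the class and method names (`KvStore`, `CompactionLease`, `try_put_with_lease`) are illustrative only and are not Doris APIs.

```cpp
// Lease-style global lock sketch: a BE "holds" a tablet's compaction lock by
// owning an unexpired KV entry and renewing it from a background thread.
#include <atomic>
#include <chrono>
#include <iostream>
#include <map>
#include <mutex>
#include <string>
#include <thread>

using Clock = std::chrono::steady_clock;

// Stand-in for the metadata service: key -> lease expiration time.
class KvStore {
public:
    // Succeeds only if the key is absent or its lease has expired.
    bool try_put_with_lease(const std::string& key, Clock::duration ttl) {
        std::lock_guard<std::mutex> g(_mu);
        auto now = Clock::now();
        auto it = _leases.find(key);
        if (it != _leases.end() && it->second > now) {
            return false;  // another BE still holds a valid lease
        }
        _leases[key] = now + ttl;
        return true;
    }

    void renew(const std::string& key, Clock::duration ttl) {
        std::lock_guard<std::mutex> g(_mu);
        _leases[key] = Clock::now() + ttl;
    }

    void erase(const std::string& key) {
        std::lock_guard<std::mutex> g(_mu);
        _leases.erase(key);
    }

private:
    std::mutex _mu;
    std::map<std::string, Clock::time_point> _leases;
};

// Holds one tablet's lease and renews it in the background until released.
class CompactionLease {
public:
    CompactionLease(KvStore& kv, std::string key, Clock::duration ttl)
            : _kv(kv), _key(std::move(key)), _ttl(ttl) {}

    bool acquire() {
        if (!_kv.try_put_with_lease(_key, _ttl)) return false;
        _running = true;
        _renewer = std::thread([this] {
            while (_running) {
                std::this_thread::sleep_for(_ttl / 2);
                if (_running) _kv.renew(_key, _ttl);
            }
        });
        return true;
    }

    void release() {
        _running = false;
        if (_renewer.joinable()) _renewer.join();
        _kv.erase(_key);
    }

private:
    KvStore& _kv;
    std::string _key;
    Clock::duration _ttl;
    std::atomic<bool> _running{false};
    std::thread _renewer;
};

int main() {
    KvStore kv;
    CompactionLease be1(kv, "compaction_lock/tablet_10001", std::chrono::seconds(2));
    CompactionLease be2(kv, "compaction_lock/tablet_10001", std::chrono::seconds(2));

    std::cout << "BE1 acquires: " << be1.acquire() << "\n";  // 1
    std::cout << "BE2 acquires: " << be2.acquire() << "\n";  // 0: lease held by BE1
    be1.release();
    std::cout << "BE2 acquires after release: " << be2.acquire() << "\n";  // 1
    be2.release();
}
```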
Problem:
Compaction tasks are processed through a thread pool. Currently, we first prepare compaction and acquire the global lock before queueing the task. If a BE is under heavy compaction pressure with all threads occupied, tablets may wait in the queue for extended periods. Meanwhile, other idle BEs cannot perform compaction on these tablets because they cannot acquire the global lock, leading to resource imbalance with some BEs starved and others overloaded.
Solution:
To address this issue, we'll modify the workflow to queue tasks first, then attempt to acquire the lock only when the task is about to be executed. This ensures that even if a tablet's compaction task is queued on one BE, another idle BE can still perform compaction on that tablet, resulting in better resource utilization across the cluster.
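Below is a minimal sketch of the reordered flow, not the actual Doris code: `try_acquire_global_lock()` and `do_compaction()` are hypothetical stand-ins for the meta-service call and the real compaction work. The point it illustrates is only the ordering the solution describes: enqueue first, attempt the global lock inside the worker right before execution.

```cpp
// Old flow: lock before queueing, so a queued-but-idle task blocked other BEs.
// New flow: queue first, take the lock only when the worker starts the task.
#include <functional>
#include <future>
#include <iostream>
#include <vector>

bool try_acquire_global_lock(long tablet_id) {
    // Placeholder: in reality this writes the tablet/compaction KV with a
    // lease to the metadata service and fails if another BE holds it.
    std::cout << "acquire lock for tablet " << tablet_id << "\n";
    return true;
}

void do_compaction(long tablet_id) {
    std::cout << "compacting tablet " << tablet_id << "\n";
}

int main() {
    std::vector<std::future<void>> pool;  // stands in for the compaction thread pool
    for (long tablet_id : {10001L, 10002L}) {
        pool.push_back(std::async(std::launch::async, [tablet_id] {
            // Attempt the lock only when the task actually runs.
            if (!try_acquire_global_lock(tablet_id)) {
                return;  // another BE picked this tablet up; skip quietly
            }
            do_compaction(tablet_id);
            // Lease release / KV cleanup would follow here.
        }));
    }
    for (auto& f : pool) f.get();
}
```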
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)