-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](load_stream) Fix flush token deadlock by ensuring wait_for_flush_tasks is called before destruction #60284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
…ait_for_flush_tasks is called before destruction apache#60284 Problem: When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario), the _flush_token destructor calls shutdown() which triggers deadlock detection if called from the pool thread. Root cause: - on_idle_timeout() directly calls brpc::StreamClose() without calling LoadStream::close() - This triggers the destruction chain without calling pre_close() on TabletStreams - If flush tasks are still running, TabletStream may be destroyed in pool thread Solution: - Add IndexStream::~IndexStream() to ensure wait_for_flush_tasks() is called on all TabletStreams - Add TabletStream::wait_for_flush_tasks() to wait for all flush tasks to complete - This ensures _flush_token is properly handled before TabletStream destruction - Revert commit c18ef17 (shared_from_this) as it is no longer needed
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a deadlock issue that occurs when TabletStream is destroyed without pre_close() being called (e.g., in the on_idle_timeout scenario). The deadlock happened because the flush token's destructor calls shutdown(), which performs deadlock detection and throws if called from the pool thread itself.
Changes:
- Reverts PR #60148's use of
shared_from_this()in async lambdas, replacing with rawthispointer - Adds
TabletStream::wait_for_flush_tasks()to ensure flush tasks complete before destruction - Adds
IndexStream::~IndexStream()to callwait_for_flush_tasks()on all tablet streams - Refactors
TabletStream::pre_close()to call the newwait_for_flush_tasks()method
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| be/src/runtime/load_stream.h | Removes std::enable_shared_from_this from TabletStream, adds wait_for_flush_tasks() method and _flush_tasks_done flag, adds IndexStream destructor |
| be/src/runtime/load_stream.cpp | Implements wait_for_flush_tasks() and IndexStream destructor, reverts lambda captures from shared_from_this() to raw this, refactors pre_close() |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
86e21d6 to
c36ec0e
Compare
…ait_for_flush_tasks is called before destruction apache#60284 Problem: When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario), the _flush_token destructor calls shutdown() which triggers deadlock detection if called from the pool thread. Root cause: - on_idle_timeout() directly calls brpc::StreamClose() without calling LoadStream::close() - This triggers the destruction chain without calling pre_close() on TabletStreams - If flush tasks are still running, TabletStream may be destroyed in pool thread Solution: - Add IndexStream::~IndexStream() to ensure wait_for_flush_tasks() is called on all TabletStreams - Add TabletStream::wait_for_flush_tasks() to wait for all flush tasks to complete - This ensures _flush_token is properly handled before TabletStream destruction - Revert commit c18ef17 (shared_from_this) as it is no longer needed
|
run buildall |
TPC-H: Total hot run time: 32864 ms |
TPC-H: Total hot run time: 32400 ms |
ClickBench: Total hot run time: 28.29 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
…h_tasks is called before destruction Problem: When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario), the _flush_token destructor calls shutdown() which triggers deadlock detection if called from the pool thread. Root cause: - on_idle_timeout() directly calls brpc::StreamClose() without calling LoadStream::close() - This triggers the destruction chain without calling pre_close() on TabletStreams - If flush tasks are still running, TabletStream may be destroyed in pool thread Solution: - Add IndexStream::~IndexStream() to ensure wait_for_flush_tasks() is called on all TabletStreams - Add TabletStream::wait_for_flush_tasks() to wait for all flush tasks to complete - This ensures _flush_token is properly handled before TabletStream destruction - Revert commit c18ef17 (shared_from_this) as it is no longer needed
c36ec0e to
cc77be5
Compare
|
run buildall |
TPC-H: Total hot run time: 33039 ms |
ClickBench: Total hot run time: 28.37 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #60148
Problem Summary:
Problem:
When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario), the _flush_token destructor calls shutdown() which triggers deadlock detection if called from the pool thread.
Root cause:
Solution:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)