[BugFix] Fix the bug of SinkIOBuffer use-after-free #40674

Merged: 3 commits, Feb 4, 2024

trueeyu
Contributor

@trueeyu trueeyu commented Feb 4, 2024

Why I'm doing:

This problem has been patched about six times, but it is still not completely fixed.

Previous PRs that fixed this issue:

  1. [BugFix] fix issues in export sink operator #15915
  2. [BugFix] Fix driver pending due to unclosed sink buffer #24057
  3. [BugFix] Ensure SinkIOBuffer outlives execution queue #25245
  4. Revert "[BugFix] Ensure SinkIOBuffer outlives execution queue" #25468
  5. [BugFix] Fix SinkIOBuffer use-after-free by skipping stop task #26028
  6. [BugFix] fix sink operator dcheck fail occasionally #38094

The bug can currently be reproduced on the main branch by adding a sleep:

#40633

The bug in PR 3: the io queue is joined from inside the bthread io task, so the join gets stuck.
The bug in PR 5: _is_finished is set to true before the io thread has exited, so the Operator can be destructed while the io thread is still running.
The bug in PR 6: set_finishing puts the request into the queue first and only then increments _num_pending_chunks, so the consumer can see the request before the counter is updated and a DCHECK fails.
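
The ordering problem described for PR 6 can be illustrated with a minimal sketch. The class below is hypothetical (`PendingQueue`, `submit`, `consume_one` and `num_pending` are illustration names, not StarRocks code) and uses a mutex-protected `std::deque` instead of a bthread execution queue. It only shows why the counter should be incremented before the item becomes visible to the consumer.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a sink buffer; not the StarRocks class.
class PendingQueue {
public:
    // Count the item first, then publish it. If the order were reversed,
    // a consumer could pop the item and decrement the counter before the
    // producer increments it, and the counter would briefly go negative.
    void submit(int item) {
        _num_pending.fetch_add(1);
        std::lock_guard<std::mutex> guard(_mu);
        _items.push_back(item);
    }

    // Pops one item into *out if available, then decrements the counter.
    // Returns false when the queue is empty.
    bool consume_one(int* out) {
        {
            std::lock_guard<std::mutex> guard(_mu);
            if (_items.empty()) return false;
            *out = _items.front();
            _items.pop_front();
        }
        int before = _num_pending.fetch_sub(1);
        assert(before > 0);  // plays the role of the DCHECK mentioned above
        (void)before;        // avoids an unused warning when NDEBUG is set
        return true;
    }

    int num_pending() const { return _num_pending.load(); }

private:
    std::mutex _mu;
    std::deque<int> _items;
    std::atomic<int> _num_pending{0};
};
```

With this ordering, any item a consumer can pop has already been counted, so the check in `consume_one` holds regardless of how the two threads interleave.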

What I'm doing:

Repair principles:

  • The bthread io queue must exit before the Operator is destructed.
  • Operator close must not block for long, otherwise the pipeline scheduling thread will be stuck.

How to fix:

  1. When SinkIOBuffer is destructed, it waits for the bthread queue to exit.
  2. Currently, the buffer can only be destructed when the queue holds nothing but a single finish task, so the queue exits quickly and the SinkIOBuffer destructor does not get stuck.
  3. If Operator prepare fails, there is no other chance to stop the queue, so the queue is stopped in the destructor of SinkIOBuffer. It is safe to call queue::stop multiple times.
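
The stop-then-join pattern in the steps above can be sketched with the standard library. This is an analogy under stated assumptions, not the PR's code: the real fix calls `bthread::execution_queue_stop` and `bthread::execution_queue_join`, while the hypothetical `WorkerQueue` below uses `std::thread` and a condition variable so that it is self-contained.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for SinkIOBuffer; uses std::thread, not bthread.
class WorkerQueue {
public:
    WorkerQueue() : _worker([this] { run(); }) {}

    // Stop first (idempotent), then join. After the destructor returns,
    // no task can still be touching this object.
    ~WorkerQueue() {
        stop();
        if (_worker.joinable()) _worker.join();
    }

    // Returns false if the queue has already been stopped.
    bool submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> guard(_mu);
            if (_stopped) return false;
            _tasks.push_back(std::move(task));
        }
        _cv.notify_one();
        return true;
    }

    // Safe to call more than once.
    void stop() {
        {
            std::lock_guard<std::mutex> guard(_mu);
            _stopped = true;
        }
        _cv.notify_one();
    }

private:
    // Worker loop: runs queued tasks and exits once stopped and drained.
    void run() {
        std::unique_lock<std::mutex> lock(_mu);
        for (;;) {
            _cv.wait(lock, [this] { return _stopped || !_tasks.empty(); });
            if (_tasks.empty()) return;
            auto task = std::move(_tasks.front());
            _tasks.pop_front();
            lock.unlock();
            task();
            lock.lock();
        }
    }

    std::mutex _mu;
    std::condition_variable _cv;
    std::deque<std::function<void()>> _tasks;
    bool _stopped = false;
    std::thread _worker;  // declared last so the other members exist first
};
```

Because the join happens in the destructor, the time the destructor blocks is bounded by the work still in the queue, which is why the description above requires the queue to be nearly empty at that point.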

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5

Signed-off-by: trueeyu <lxhhust350@qq.com>
@trueeyu trueeyu requested a review from a team as a code owner February 4, 2024 06:06
@mergify mergify bot assigned trueeyu Feb 4, 2024
@github-actions github-actions bot added the 2.5 label Feb 4, 2024
                LOG(WARNING) << "SinkIOBuffer join queue failed: " << _exec_queue_id->value;
            }
        }
    }

    virtual Status prepare(RuntimeState* state, RuntimeProfile* parent_profile) = 0;

The most risky bug in this code is:
Memory leak due to not deleting _exec_queue_id after its use.

You can modify the code like this:

@@ -53,7 +53,19 @@ class SinkIOBuffer {
 public:
     SinkIOBuffer(int32_t num_sinkers) : _num_result_sinkers(num_sinkers), _exec_queue_id(nullptr) {}
 
     virtual ~SinkIOBuffer() {
         if (_exec_queue_id != nullptr) {
             // If `Operator` prepare failed, there is no chance to stop queue, so will should stop queue here.
             // It is safe to call stop multiple times.
             if (bthread::execution_queue_stop(*_exec_queue_id) != 0) {
                 LOG(WARNING) << "SinkIOBuffer stop queue failed: " << _exec_queue_id->value;
             }
             if (bthread::execution_queue_join(*_exec_queue_id) != 0) {
                 LOG(WARNING) << "SinkIOBuffer join queue failed: " << _exec_queue_id->value;
             }
+            delete _exec_queue_id;  // Free the allocated memory for _exec_queue_id.
+            _exec_queue_id = nullptr;  // Safely point _exec_queue_id to nullptr after deletion.
         }
     }
 
     virtual Status prepare(RuntimeState* state, RuntimeProfile* parent_profile) = 0;
 

This modification ensures that the memory allocated for _exec_queue_id is properly released when the SinkIOBuffer destructor is called, thus preventing a memory leak.

Contributor Author

_exec_queue_id is a std::unique_ptr, so its memory is released automatically and no explicit delete is needed here.
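
A small sketch of why no manual `delete` is needed for a `std::unique_ptr` member. The types `TrackedId` and `Owner` are hypothetical and exist only to count live objects; they are not the StarRocks types.

```cpp
#include <cassert>
#include <memory>

// Hypothetical id type that counts live instances; stands in for the
// execution queue id held by the buffer.
struct TrackedId {
    static int live;
    int value;
    explicit TrackedId(int v) : value(v) { ++live; }
    ~TrackedId() { --live; }
};
int TrackedId::live = 0;

// Hypothetical owner that holds the id through std::unique_ptr.
class Owner {
public:
    explicit Owner(int v) : _id(std::make_unique<TrackedId>(v)) {}

    ~Owner() {
        // No `delete _id;` here. The unique_ptr member destroys the
        // TrackedId after this destructor body finishes.
    }

    int id_value() const { return _id->value; }

private:
    std::unique_ptr<TrackedId> _id;
};
```

Creating an `Owner` in a scope and letting it go out of scope brings `TrackedId::live` back to zero, which is the behavior the reply relies on.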

Signed-off-by: trueeyu <lxhhust350@qq.com>
satanson
satanson previously approved these changes Feb 4, 2024
Signed-off-by: trueeyu <lxhhust350@qq.com>

github-actions bot commented Feb 4, 2024

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@trueeyu trueeyu merged commit fe74760 into StarRocks:main Feb 4, 2024
41 checks passed

github-actions bot commented Feb 4, 2024

@Mergifyio backport branch-3.2

@github-actions github-actions bot removed the 3.2 label Feb 4, 2024

github-actions bot commented Feb 4, 2024

@Mergifyio backport branch-3.1

github-actions bot commented Feb 4, 2024

@Mergifyio backport branch-3.0

github-actions bot commented Feb 4, 2024

@Mergifyio backport branch-2.5

@github-actions github-actions bot removed the 2.5 label Feb 4, 2024
Contributor

mergify bot commented Feb 4, 2024

backport branch-3.2

✅ Backports have been created

Contributor

mergify bot commented Feb 4, 2024

backport branch-3.1

✅ Backports have been created

Contributor

mergify bot commented Feb 4, 2024

backport branch-3.0

✅ Backports have been created

Contributor

mergify bot commented Feb 4, 2024

backport branch-2.5

✅ Backports have been created

github-actions bot commented Feb 4, 2024

[BE Incremental Coverage Report]

fail : 6 / 8 (75.00%)

file detail

path                                                     covered_line  new_line  coverage  not_covered_line_detail
be/src/exec/pipeline/sink/export_sink_operator.cpp       0             1         0.00%     [114]
be/src/exec/pipeline/sink/mysql_table_sink_operator.cpp  0             1         0.00%     [110]
be/src/exec/pipeline/sink/sink_io_buffer.h               5             5         100.00%   []
be/src/exec/pipeline/sink/file_sink_operator.cpp         1             1         100.00%   []

mergify bot pushed a commit that referenced this pull request Feb 4, 2024
Signed-off-by: trueeyu <lxhhust350@qq.com>
(cherry picked from commit fe74760)
mergify bot pushed a commit that referenced this pull request Feb 4, 2024
Signed-off-by: trueeyu <lxhhust350@qq.com>
(cherry picked from commit fe74760)
mergify bot pushed a commit that referenced this pull request Feb 4, 2024
Signed-off-by: trueeyu <lxhhust350@qq.com>
(cherry picked from commit fe74760)

# Conflicts:
#	be/src/exec/pipeline/sink/export_sink_operator.cpp
#	be/src/exec/pipeline/sink/file_sink_operator.cpp
#	be/src/exec/pipeline/sink/mysql_table_sink_operator.cpp
#	be/test/exec/sink/sink_io_buffer_test.cpp
mergify bot pushed a commit that referenced this pull request Feb 4, 2024
Signed-off-by: trueeyu <lxhhust350@qq.com>
(cherry picked from commit fe74760)

# Conflicts:
#	be/src/exec/pipeline/sink/export_sink_operator.cpp
#	be/src/exec/pipeline/sink/file_sink_operator.cpp
#	be/src/exec/pipeline/sink/mysql_table_sink_operator.cpp
#	be/test/exec/sink/sink_io_buffer_test.cpp
wanpengfei-git pushed a commit that referenced this pull request Feb 4, 2024
#40708)

Signed-off-by: trueeyu <lxhhust350@qq.com>
Co-authored-by: trueeyu <lxhhust350@qq.com>
wanpengfei-git pushed a commit that referenced this pull request Feb 5, 2024
wanpengfei-git pushed a commit that referenced this pull request Feb 6, 2024
#40709)

Signed-off-by: trueeyu <lxhhust350@qq.com>
Co-authored-by: trueeyu <lxhhust350@qq.com>
wanpengfei-git pushed a commit that referenced this pull request Feb 10, 2024
3 participants