[fix](be) Fix runtime filter crash with shared hash table#63257
[fix](be) Fix runtime filter crash with shared hash table#63257BiteTheDDDDt wants to merge 1 commit into
Conversation
### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#49556, apache#61768, apache#62056, apache#60563 Problem Summary: Shared hash table broadcast joins can wake non-builder hash join sink tasks after the builder was terminated early. The non-builder runtime filter path then reads a missing shared runtime filter wrapper from the map and may install a null wrapper, causing RuntimeFilterWrapper::set_state to segfault while publishing runtime filters during close. Only signal non-builder tasks when the builder actually built the shared hash table, and convert shared runtime filter wrapper map assumptions into Status errors instead of DCHECK/operator[]. ### Release note Fixed a BE crash when publishing runtime filters for shared hash table hash joins. ### Check List (For Author) - Test: Manual test - build-support/check-format.sh - git diff --check - Behavior changed: No - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Pull request overview
This PR hardens the BE shared-hash-table runtime-filter path for broadcast hash joins, preventing non-builder tasks from using uninitialized shared runtime-filter wrappers after the builder is terminated early.
Changes:
- Avoid marking the shared hash table as signaled when the builder task was terminated.
- Replace shared runtime-filter map
DCHECK/operator[]assumptions with explicitStatus::InternalErrorvalidation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
be/src/runtime_filter/runtime_filter_producer_helper.cpp |
Adds validation for shared runtime-filter wrapper lookup and duplicate insertion. |
be/src/pipeline/exec/hashjoin_build_sink.cpp |
Prevents signaling non-builder tasks when the shared hash-table builder was terminated before initialization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| auto it = runtime_filters.find(filter_id); | ||
| if (it == runtime_filters.end() || it->second == nullptr) { | ||
| return Status::InternalError( | ||
| "runtime_filters does not contain valid filter_id {} when not building " | ||
| "hash table", | ||
| filter_id); |
| // Only signal non-builder tasks when the builder actually built the hash table. | ||
| // When the builder is terminated early, process_build_block() has not initialized the | ||
| // shared hash table or runtime filter wrappers, so non-builders must return EOF. | ||
| if (!_terminated) { | ||
| p._signaled = true; | ||
| } |
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary: Fix BE SIGSEGV in branch-4.0 when hash join build sink publishes runtime filters during close. The crash point is
RuntimeFilterWrapper::set_state(), and the direct cause is that aRuntimeFilterProducercan hold a null_wrapperin the shared hash table runtime filter path.Root cause
Bug introduced by: #49556
#49556 refactored the broadcast/shared hash table controller and introduced the shared runtime filter wrapper handoff:
HashJoinBuildSinkOperatorX::_runtime_filters.use_shared_table=trueand_should_build_hash_table=false.HashJoinBuildSinkLocalState::close()unconditionally sets_signaled = truefor shared hash table, even if the builder task was terminated before it built the hash table and filled_runtime_filters.DCHECK(runtime_filters.contains(...))and thenruntime_filters[filter_id]. In release builds the DCHECK is disabled, so a missing map entry inserts a default nullshared_ptr._wrapper->set_state(), causing the SIGSEGV inRuntimeFilterWrapper::set_state().I verified the branch-4.0 blame: both the unconditional
_signaled = trueand theruntime_filters[filter_id]shared-wrapper path come from #49556.Fix
Fixed on master by: #62056
#62056 fixed this shared hash table race by only setting
_signaled = truewhen the builder task was not terminated. If the builder is terminated early, non-builder tasks return EOF and do not enter the shared hash table/runtime filter path with uninitialized shared state.This PR is a branch-4.0 pick/backport of the necessary #62056 logic.
It also picks the relevant defensive idea from #60563: replace the runtime filter
DCHECK + operator[]assumption with explicitStatus::InternalErrorchecks, so a missing/null wrapper returns an error instead of inserting a null wrapper and crashing.Related PR
_signaledrace on master.Release note
Fixed a BE crash when publishing runtime filters for shared hash table hash joins.
Check List (For Author)