[fix](be) Avoid local runtime filter merge deadlock #65101

Open
BiteTheDDDDt wants to merge 1 commit into apache:branch-4.0 from BiteTheDDDDt:codex/pick-64866-branch-4.0

@BiteTheDDDDt

Contributor

What problem does this PR solve?

Issue Number: None

Related PR: #64866

Problem Summary: Backport #64866 to branch-4.0. Local runtime filter merge can deadlock because the old local merge context lock protected both the merger and the producer list, allowing lock inversion with the producer and merger locks. This backport snapshots the local merge context and its producers under the RuntimeFilterMgr lock, then performs merge, size, and debug work after releasing it.

branch-4.0 does not contain the recursive CTE stage/reset runtime-filter machinery that exists on newer branches, so this backport keeps the core lock-order fix in branch-4.0's existing non-stage runtime-filter flow.
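
For readers unfamiliar with the pattern, the snapshot-then-release idea in the summary can be sketched roughly as follows. This is an illustration only, not the code in this PR: `Mgr`, `Producer`, `Snapshot`, `snapshot()`, and `sync_size()` are hypothetical names, and the real RuntimeFilterMgr API is not shown on this page. The point is that the manager lock is held only while copying shared pointers, so no producer lock is ever taken while the manager lock is held.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a runtime filter producer that has its own lock.
class Producer {
public:
    void set_synced_size(uint64_t size) {
        std::lock_guard<std::mutex> guard(_lock);
        _synced_size = size;
    }
    uint64_t synced_size() const {
        std::lock_guard<std::mutex> guard(_lock);
        return _synced_size;
    }

private:
    mutable std::mutex _lock;
    uint64_t _synced_size = 0;
};

// Hypothetical snapshot: shared ownership keeps producers alive after unlock.
struct Snapshot {
    std::vector<std::shared_ptr<Producer>> producers;
};

// Hypothetical manager; only loosely modeled on the idea of RuntimeFilterMgr.
class Mgr {
public:
    void register_producer(int filter_id, std::shared_ptr<Producer> producer) {
        std::lock_guard<std::mutex> guard(_lock);
        _producers[filter_id].push_back(std::move(producer));
    }

    // The manager lock is held only for the duration of the copy.
    Snapshot snapshot(int filter_id) const {
        std::lock_guard<std::mutex> guard(_lock);
        Snapshot result;
        auto it = _producers.find(filter_id);
        if (it != _producers.end()) {
            result.producers = it->second;
        }
        return result;
    }

    // Producer locks are taken only after the manager lock has been released,
    // so the two locks are never held at the same time on this path.
    std::size_t sync_size(int filter_id, uint64_t size) const {
        Snapshot snap = snapshot(filter_id);
        for (const auto& producer : snap.producers) {
            producer->set_synced_size(size);
        }
        return snap.producers.size();
    }

private:
    mutable std::mutex _lock;
    std::map<int, std::vector<std::shared_ptr<Producer>>> _producers;
};
```

The trade-off in a design like this is that the snapshot may be slightly stale by the time it is used; that is acceptable only if the producer list is effectively fixed once merging starts, which is an assumption of this sketch.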

Release note

None

Check List (For Author)

  • Test: Unit Test
    • ./run-be-ut.sh --run --filter=RuntimeFilterMgrTest.*
    • ./run-be-ut.sh --run --filter=RuntimeFilterMergerTest.*
    • git diff --check upstream/branch-4.0..HEAD
  • Behavior changed: No
  • Does this need documentation: No

Issue Number: None

Related PR: None

Problem Summary: Local runtime filter merge can deadlock when one join
build instance publishes a local-merge runtime filter while another
instance sends its runtime filter size. The old local merge context lock
protected both the merger and the producer list, so one path could hold
a producer runtime filter lock and then wait for the context lock while
another path held the context lock and then waited for a producer lock.

This change gives RuntimeFilterMerger its own internal synchronization
and makes LocalMergeContext expose a snapshot of the merger and
producers. Publish, send-size, and sync-size paths take the context lock
only while copying that snapshot, then merge filters or update producer
sizes outside the context lock. RuntimeFilterMerger returns the ready
transition from merge_from directly, removing the separate unlocked
ready check.
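
As a rough illustration of why returning the ready transition from merge_from matters, here is a minimal sketch. It is not the RuntimeFilterMerger from this commit: the `Merger` class, its integer-set payload, and the `merge_from` signature below are assumptions made for the example. The idea is that the count update and the readiness decision happen under the same internal lock, so exactly one caller observes the transition and no separate unlocked check is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical merger with its own internal synchronization.
class Merger {
public:
    explicit Merger(int expected_producers) : _expected(expected_producers) {}

    // Returns true only for the single call that makes the merger ready.
    // Calls that arrive after the merger is ready are ignored in this sketch.
    bool merge_from(const std::set<int>& values) {
        std::lock_guard<std::mutex> guard(_lock);
        if (_received >= _expected) {
            return false;
        }
        _merged.insert(values.begin(), values.end());
        ++_received;
        return _received == _expected;
    }

    std::size_t merged_size() const {
        std::lock_guard<std::mutex> guard(_lock);
        return _merged.size();
    }

private:
    mutable std::mutex _lock;
    const int _expected;
    int _received = 0;
    std::set<int> _merged;
};
```

A caller would publish the merged filter only when merge_from returns true, for example `if (merger.merge_from(values)) { /* publish once */ }`, instead of merging and then asking a separate "is ready" question without holding the lock.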

None

- Test: Unit Test
    - build-support/clang-format.sh
      be/src/exec/runtime_filter/runtime_filter_merger.h
      be/src/exec/runtime_filter/runtime_filter_mgr.cpp
      be/src/exec/runtime_filter/runtime_filter_mgr.h
      be/src/exec/runtime_filter/runtime_filter_producer.cpp
      be/test/exec/runtime_filter/runtime_filter_merger_test.cpp
      be/test/exec/runtime_filter/runtime_filter_mgr_test.cpp
    - git diff --cached --check
    - ./run-be-ut.sh --run --filter=RuntimeFilterMgrTest.*
    - ./run-be-ut.sh --run --filter=RuntimeFilterMergerTest.*
- Behavior changed: No
- Does this need documentation: No

(cherry picked from commit 9d7d3a2)
BiteTheDDDDt marked this pull request as ready for review July 1, 2026 09:42
BiteTheDDDDt requested a review from morningman as a code owner July 1, 2026 09:42
@BiteTheDDDDt

Contributor Author

run buildall
