[fix](fe) Fix incorrect runtime filter locality in V2 translator for INTERSECT/EXCEPT#62314
[fix](fe) Fix incorrect runtime filter locality in V2 translator for INTERSECT/EXCEPT#62314BiteTheDDDDt wants to merge 1 commit intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Pull request overview
Fixes incorrect runtime filter locality computation in the Nereids V2 runtime-filter translator for set operations (INTERSECT/EXCEPT), aligning behavior with the V1 translator so filters can be delivered locally when source/target are in the same fragment.
Changes:
- Compute
isLocalTargetfor a runtime-filter target group by comparing source/target fragment IDs (and treatingCTEScanNodeas non-local). - Construct
RuntimeFilterTargetwith explicitisLocalTargetinstead of the 2-arg constructor that hard-codesfalse. - Add unit tests intended to validate runtime filter locality for INTERSECT/EXCEPT (including multi-child INTERSECT).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2.java |
Correctly derives and propagates isLocalTarget when building legacy RuntimeFilterTargets for V2 translation. |
fe/fe-core/src/test/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2Test.java |
Adds tests around runtime-filter locality for set operations (but currently the assertions don’t actually detect the reported bug). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| List<RuntimeFilter> runtimeFilters = planner.getRuntimeFilters(); | ||
| // INTERSECT may or may not generate runtime filters depending on stats/cost. | ||
| // If runtime filters are generated, verify locality flags are consistent. | ||
| for (RuntimeFilter rf : runtimeFilters) { | ||
| if (rf.getBuilderNode() instanceof SetOperationNode) { | ||
| verifyLocalityConsistency(rf); | ||
| } | ||
| } |
There was a problem hiding this comment.
These tests can pass without asserting anything if no runtime filters are generated (or if none have a SetOperationNode builder). That makes the suite ineffective and potentially flaky across environments/cost models. Consider asserting that at least one SetOperationNode-built runtime filter exists for the query (or forcing RF generation via session vars) before validating locality-related expectations.
| private void verifyLocalityConsistency(RuntimeFilter rf) { | ||
| boolean hasRemote = rf.hasRemoteTargets(); | ||
| // Use toThrift() to get hasLocalTargets since there's no public getter | ||
| boolean hasLocal = rf.toThrift().has_local_targets; | ||
| Assertions.assertNotEquals(hasRemote, hasLocal, | ||
| String.format("RuntimeFilter %d: hasRemoteTargets (%s) must differ from " | ||
| + "hasLocalTargets (%s). Both true or both false indicates a bug.", | ||
| rf.getFilterId().asInt(), hasRemote, hasLocal)); |
There was a problem hiding this comment.
verifyLocalityConsistency() currently only asserts hasRemoteTargets != hasLocalTargets. The reported bug (targets always marked remote-only: hasRemote=true, hasLocal=false) already satisfies this condition, so this assertion would have passed before the fix and won’t catch regressions. To test the fix, assert the expected locality (e.g., that some SetOperation RFs are local for these queries, or compute expected local/remote from builder/target fragment IDs and compare).
utafrali
left a comment
There was a problem hiding this comment.
The production fix is logically correct and faithfully mirrors V1 (RuntimeFilterTranslator) behavior, but the test suite has two compounding weaknesses: the core assertion passes with both the buggy and fixed code (the XOR invariant held before the fix too), and the tests may pass vacuously if the planner emits no filters under the test configuration. The tests need to assert the correct direction of locality (local vs remote) and guarantee at least one filter is actually exercised.
| private void verifyLocalityConsistency(RuntimeFilter rf) { | ||
| boolean hasRemote = rf.hasRemoteTargets(); | ||
| // Use toThrift() to get hasLocalTargets since there's no public getter | ||
| boolean hasLocal = rf.toThrift().has_local_targets; |
There was a problem hiding this comment.
This assertion does not distinguish the buggy behavior from the fixed behavior. Before the fix, all targets had isLocalTarget=false, which means hasRemoteTargets=true and hasLocal=false — these are already not equal, so assertNotEquals(hasRemote, hasLocal) passes with the buggy code too.
To actually validate the fix, the test needs to assert the direction of locality. For a two-table INTERSECT where the SetOperation node and its scan targets are in the same plan fragment, the expected result is hasLocal=true, hasRemote=false. Something like:
Assertions.assertTrue(hasLocal,
"Expected same-fragment targets to be marked local for filter " + rf.getFilterId().asInt());
Assertions.assertFalse(hasRemote,
"Expected no remote targets for same-fragment filter " + rf.getFilterId().asInt());This requires knowing the expected locality per test case, but without it the test is not a regression guard for the actual bug.
| // If runtime filters are generated, verify locality flags are consistent. | ||
| for (RuntimeFilter rf : runtimeFilters) { | ||
| if (rf.getBuilderNode() instanceof SetOperationNode) { | ||
| verifyLocalityConsistency(rf); |
There was a problem hiding this comment.
The comment "INTERSECT may or may not generate runtime filters depending on stats/cost" effectively admits the test can pass vacuously with an empty filter list. If runtimeFilters is empty (or contains no SetOperationNode-built filters), the for loop body never runs and the test proves nothing.
Consider adding an assertion that at least one filter was actually verified:
long checked = runtimeFilters.stream()
.filter(rf -> rf.getBuilderNode() instanceof SetOperationNode)
.peek(this::verifyLocalityConsistency)
.count();
Assertions.assertTrue(checked > 0, "Expected at least one SetOperationNode runtime filter to be present");Alternatively, pin enable_runtime_filter_prune=false or use a session variable that guarantees filter generation.
| * and not both false (the BE requires exactly one to be true). | ||
| * Before the fix, hasLocalTargets was always false and hasRemoteTargets | ||
| * was always true, regardless of actual fragment placement. | ||
| */ |
There was a problem hiding this comment.
Using rf.toThrift() to read has_local_targets is fragile: toThrift() is a serialization method that may have side effects or preconditions (e.g., requiring finalize() to have been called first), and it couples the test to the Thrift wire format rather than the Java model. Since the bug is about RuntimeFilterTarget.isLocalTarget, a cleaner approach would be to add a @VisibleForTesting method to RuntimeFilter that exposes hasLocalTargets directly, or to inspect RuntimeFilterTarget.isLocalTarget on the individual target objects if they are accessible.
| Expr srcExpr = ExpressionTranslator.translate(head.getSourceExpression(), ctx); | ||
|
|
||
| List<RuntimeFilterTarget> targets = new ArrayList<>(); | ||
| boolean isLocalTarget = true; |
There was a problem hiding this comment.
The isLocalTarget computation iterates filters once here and then again in the target-building loop below. V1 (RuntimeFilterTranslator) does this in a single pass with stream().allMatch(...). This is a minor inefficiency but also a style inconsistency with the code being mirrored. Consider:
boolean isLocalTarget = filters.stream()
.map(RuntimeFilterV2::getLegacyTargetNode)
.allMatch(t -> !(t instanceof CTEScanNode)
&& t.getFragmentId().equals(sourceNode.getFragmentId()));This is a direct port of the V1 logic and makes the parallel intent explicit.
| Expr srcExpr = ExpressionTranslator.translate(head.getSourceExpression(), ctx); | ||
|
|
||
| List<RuntimeFilterTarget> targets = new ArrayList<>(); | ||
| boolean isLocalTarget = true; |
There was a problem hiding this comment.
A brief inline comment explaining the all-or-nothing semantics would help future readers. Without it, the single shared flag looks like it might be a mistake. V1 has the same pattern without explanation, but this is the translation of a subtle BE constraint. Something like:
// Compute locality once for the whole group: if any target is in a different fragment
// or is a CTE scan, mark all targets as remote. BE requires has_remote_targets and
// has_local_targets to never both be true for the same RuntimeFilter.7951c2a to
b60b2de
Compare
…INTERSECT/EXCEPT ### What problem does this PR solve? Issue Number: close #xxx Problem Summary: RunTimeFilterTranslatorV2.translateRuntimeFilterGroup() used the 2-arg RuntimeFilterTarget constructor which hard-codes isLocalTarget=false, causing all V2-translated runtime filters (for INTERSECT/EXCEPT) to be incorrectly marked as remote-only. This leads to: - Filters always routed through global merge path instead of local delivery - Consumer registered in global_runtime_filter_mgr instead of local - Consumer wait time shortened (may cause RF timeout and miss filtering) - Suboptimal bloom filter sizing due to incorrect needShuffle computation The fix computes isLocalTarget by comparing target and source fragment IDs, consistent with the V1 translator (RuntimeFilterTranslator) approach. Uses allMatch semantics (if any target is remote, all are marked remote) to satisfy the BE constraint that has_remote_targets != has_local_targets. ### Release note None ### Check List (For Author) - Test: Unit Test (RunTimeFilterTranslatorV2Test) - Behavior changed: No (bug fix, previously always remote, now correctly computed) - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
b60b2de to
d1c0ef7
Compare
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
Problem Summary:
RunTimeFilterTranslatorV2.translateRuntimeFilterGroup()used the 2-argRuntimeFilterTargetconstructor which hard-codesisLocalTarget=false, causing all V2-translated runtime filters (for INTERSECT/EXCEPT) to be incorrectly marked as remote-only.This leads to:
global_runtime_filter_mgrinstead oflocal_runtime_filter_mgrneedShufflecomputationThe fix computes
isLocalTargetby comparing target and source fragment IDs, consistent with the V1 translator (RuntimeFilterTranslator) approach. UsesallMatchsemantics (if any target is remote, all are marked remote) to satisfy the BE constraint thathas_remote_targets != has_local_targets.