[fix](fe) Fix incorrect runtime filter locality in V2 translator for INTERSECT/EXCEPT#62314

Open

BiteTheDDDDt wants to merge 1 commit intoapache:masterfrom

BiteTheDDDDt:fix-rf-v2-locality

Contributor

BiteTheDDDDt commented Apr 10, 2026 •

edited

Loading

Problem Summary:

RunTimeFilterTranslatorV2.translateRuntimeFilterGroup() used the 2-arg RuntimeFilterTarget constructor which hard-codes isLocalTarget=false, causing all V2-translated runtime filters (for INTERSECT/EXCEPT) to be incorrectly marked as remote-only.

This leads to:

Filters always routed through global merge path instead of local delivery
Consumer registered in global_runtime_filter_mgr instead of local_runtime_filter_mgr
Consumer wait time shortened (may cause RF timeout and miss filtering)
Suboptimal bloom filter sizing due to incorrect needShuffle computation

The fix computes isLocalTarget by comparing target and source fragment IDs, consistent with the V1 translator (RuntimeFilterTranslator) approach. Uses allMatch semantics (if any target is remote, all are marked remote) to satisfy the BE constraint that has_remote_targets != has_local_targets.

Copilot AI review requested due to automatic review settings

April 10, 2026 04:09

Contributor

Thearas commented Apr 10, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

Copilot started reviewing on behalf of BiteTheDDDDt

April 10, 2026 04:10

Copilot AI reviewed

View reviewed changes

Contributor

Copilot AI left a comment

Pull request overview

Fixes incorrect runtime filter locality computation in the Nereids V2 runtime-filter translator for set operations (INTERSECT/EXCEPT), aligning behavior with the V1 translator so filters can be delivered locally when source/target are in the same fragment.

Changes:

Compute isLocalTarget for a runtime-filter target group by comparing source/target fragment IDs (and treating CTEScanNode as non-local).
Construct RuntimeFilterTarget with explicit isLocalTarget instead of the 2-arg constructor that hard-codes false.
Add unit tests intended to validate runtime filter locality for INTERSECT/EXCEPT (including multi-child INTERSECT).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2.java`	Correctly derives and propagates `isLocalTarget` when building legacy `RuntimeFilterTarget`s for V2 translation.
`fe/fe-core/src/test/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2Test.java`	Adds tests around runtime-filter locality for set operations (but currently the assertions don’t actually detect the reported bug).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

...re/src/test/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2Test.java Outdated

Comment on lines +74 to +81

+                      List<RuntimeFilter> runtimeFilters = planner.getRuntimeFilters();
+                      // INTERSECT may or may not generate runtime filters depending on stats/cost.
+                      // If runtime filters are generated, verify locality flags are consistent.
+                      for (RuntimeFilter rf : runtimeFilters) {
+                          if (rf.getBuilderNode() instanceof SetOperationNode) {
+                              verifyLocalityConsistency(rf);
+                          }
+                      }

Copilot AI Apr 10, 2026

These tests can pass without asserting anything if no runtime filters are generated (or if none have a SetOperationNode builder). That makes the suite ineffective and potentially flaky across environments/cost models. Consider asserting that at least one SetOperationNode-built runtime filter exists for the query (or forcing RF generation via session vars) before validating locality-related expectations.

Copilot uses AI. Check for mistakes.

...re/src/test/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2Test.java Outdated

Comment on lines +127 to +134

+                  private void verifyLocalityConsistency(RuntimeFilter rf) {
+                      boolean hasRemote = rf.hasRemoteTargets();
+                      // Use toThrift() to get hasLocalTargets since there's no public getter
+                      boolean hasLocal = rf.toThrift().has_local_targets;
+                      Assertions.assertNotEquals(hasRemote, hasLocal,
+                              String.format("RuntimeFilter %d: hasRemoteTargets (%s) must differ from "
+                                      + "hasLocalTargets (%s). Both true or both false indicates a bug.",
+                                      rf.getFilterId().asInt(), hasRemote, hasLocal));

Copilot AI Apr 10, 2026

verifyLocalityConsistency() currently only asserts hasRemoteTargets != hasLocalTargets. The reported bug (targets always marked remote-only: hasRemote=true, hasLocal=false) already satisfies this condition, so this assertion would have passed before the fix and won’t catch regressions. To test the fix, assert the expected locality (e.g., that some SetOperation RFs are local for these queries, or compute expected local/remote from builder/target fragment IDs and compare).

Copilot uses AI. Check for mistakes.

utafrali reviewed

View reviewed changes

utafrali left a comment

The production fix is logically correct and faithfully mirrors V1 (RuntimeFilterTranslator) behavior, but the test suite has two compounding weaknesses: the core assertion passes with both the buggy and fixed code (the XOR invariant held before the fix too), and the tests may pass vacuously if the planner emits no filters under the test configuration. The tests need to assert the correct direction of locality (local vs remote) and guarantee at least one filter is actually exercised.

...re/src/test/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2Test.java Outdated

+                  private void verifyLocalityConsistency(RuntimeFilter rf) {
+                      boolean hasRemote = rf.hasRemoteTargets();
+                      // Use toThrift() to get hasLocalTargets since there's no public getter
+                      boolean hasLocal = rf.toThrift().has_local_targets;

utafrali Apr 10, 2026

This assertion does not distinguish the buggy behavior from the fixed behavior. Before the fix, all targets had isLocalTarget=false, which means hasRemoteTargets=true and hasLocal=false — these are already not equal, so assertNotEquals(hasRemote, hasLocal) passes with the buggy code too.

To actually validate the fix, the test needs to assert the direction of locality. For a two-table INTERSECT where the SetOperation node and its scan targets are in the same plan fragment, the expected result is hasLocal=true, hasRemote=false. Something like:

Assertions.assertTrue(hasLocal,
    "Expected same-fragment targets to be marked local for filter " + rf.getFilterId().asInt());
Assertions.assertFalse(hasRemote,
    "Expected no remote targets for same-fragment filter " + rf.getFilterId().asInt());

This requires knowing the expected locality per test case, but without it the test is not a regression guard for the actual bug.

...re/src/test/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2Test.java Outdated

+                      // If runtime filters are generated, verify locality flags are consistent.
+                      for (RuntimeFilter rf : runtimeFilters) {
+                          if (rf.getBuilderNode() instanceof SetOperationNode) {
+                              verifyLocalityConsistency(rf);

utafrali Apr 10, 2026

The comment "INTERSECT may or may not generate runtime filters depending on stats/cost" effectively admits the test can pass vacuously with an empty filter list. If runtimeFilters is empty (or contains no SetOperationNode-built filters), the for loop body never runs and the test proves nothing.

Consider adding an assertion that at least one filter was actually verified:

long checked = runtimeFilters.stream()
    .filter(rf -> rf.getBuilderNode() instanceof SetOperationNode)
    .peek(this::verifyLocalityConsistency)
    .count();
Assertions.assertTrue(checked > 0, "Expected at least one SetOperationNode runtime filter to be present");

Alternatively, pin enable_runtime_filter_prune=false or use a session variable that guarantees filter generation.

...re/src/test/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2Test.java Outdated

+                   * and not both false (the BE requires exactly one to be true).
+                   * Before the fix, hasLocalTargets was always false and hasRemoteTargets
+                   * was always true, regardless of actual fragment placement.
+                   */

utafrali Apr 10, 2026

Using rf.toThrift() to read has_local_targets is fragile: toThrift() is a serialization method that may have side effects or preconditions (e.g., requiring finalize() to have been called first), and it couples the test to the Thrift wire format rather than the Java model. Since the bug is about RuntimeFilterTarget.isLocalTarget, a cleaner approach would be to add a @VisibleForTesting method to RuntimeFilter that exposes hasLocalTargets directly, or to inspect RuntimeFilterTarget.isLocalTarget on the individual target objects if they are accessible.

...e-core/src/main/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2.java

                       Expr srcExpr = ExpressionTranslator.translate(head.getSourceExpression(), ctx);
                       List<RuntimeFilterTarget> targets = new ArrayList<>();
+                      boolean isLocalTarget = true;

utafrali Apr 10, 2026

The isLocalTarget computation iterates filters once here and then again in the target-building loop below. V1 (RuntimeFilterTranslator) does this in a single pass with stream().allMatch(...). This is a minor inefficiency but also a style inconsistency with the code being mirrored. Consider:

boolean isLocalTarget = filters.stream()
    .map(RuntimeFilterV2::getLegacyTargetNode)
    .allMatch(t -> !(t instanceof CTEScanNode)
        && t.getFragmentId().equals(sourceNode.getFragmentId()));

This is a direct port of the V1 logic and makes the parallel intent explicit.

...e-core/src/main/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2.java

                       Expr srcExpr = ExpressionTranslator.translate(head.getSourceExpression(), ctx);
                       List<RuntimeFilterTarget> targets = new ArrayList<>();
+                      boolean isLocalTarget = true;

utafrali Apr 10, 2026

A brief inline comment explaining the all-or-nothing semantics would help future readers. Without it, the single shared flag looks like it might be a mistake. V1 has the same pattern without explanation, but this is the translation of a subtle BE constraint. Something like:

// Compute locality once for the whole group: if any target is in a different fragment
// or is a CTE scan, mark all targets as remote. BE requires has_remote_targets and
// has_local_targets to never both be true for the same RuntimeFilter.

BiteTheDDDDt force-pushed the fix-rf-v2-locality branch from 7951c2a to b60b2de Compare

April 10, 2026 05:08


          [fix](fe) Fix incorrect runtime filter locality in V2 translator for …

d1c0ef7

…INTERSECT/EXCEPT

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

RunTimeFilterTranslatorV2.translateRuntimeFilterGroup() used the 2-arg
RuntimeFilterTarget constructor which hard-codes isLocalTarget=false,
causing all V2-translated runtime filters (for INTERSECT/EXCEPT) to be
incorrectly marked as remote-only.

This leads to:
- Filters always routed through global merge path instead of local delivery
- Consumer registered in global_runtime_filter_mgr instead of local
- Consumer wait time shortened (may cause RF timeout and miss filtering)
- Suboptimal bloom filter sizing due to incorrect needShuffle computation

The fix computes isLocalTarget by comparing target and source fragment IDs,
consistent with the V1 translator (RuntimeFilterTranslator) approach. Uses
allMatch semantics (if any target is remote, all are marked remote) to
satisfy the BE constraint that has_remote_targets != has_local_targets.

### Release note

None

### Check List (For Author)

- Test: Unit Test (RunTimeFilterTranslatorV2Test)
- Behavior changed: No (bug fix, previously always remote, now correctly computed)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

BiteTheDDDDt force-pushed the fix-rf-v2-locality branch from b60b2de to d1c0ef7 Compare

April 10, 2026 05:17

Contributor Author

BiteTheDDDDt commented Apr 10, 2026

run buildall

Contributor

hello-stephen commented Apr 10, 2026

FE UT Coverage Report

Increment line coverage 55.56% (5/9) 🎉
Increment coverage report
Complete coverage report

englefly approved these changes

View reviewed changes

Contributor

github-actions bot commented Apr 11, 2026

PR approved by at least one committer and no changes requested.

github-actions bot added the approved label

Contributor

github-actions bot commented Apr 11, 2026

PR approved by anyone and no changes requested.

github-actions bot added the reviewed label

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved reviewed