Skip to content

[fix](fe) Fix incorrect runtime filter locality in V2 translator for INTERSECT/EXCEPT#62314

Open
BiteTheDDDDt wants to merge 1 commit intoapache:masterfrom
BiteTheDDDDt:fix-rf-v2-locality
Open

[fix](fe) Fix incorrect runtime filter locality in V2 translator for INTERSECT/EXCEPT#62314
BiteTheDDDDt wants to merge 1 commit intoapache:masterfrom
BiteTheDDDDt:fix-rf-v2-locality

Conversation

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor

@BiteTheDDDDt BiteTheDDDDt commented Apr 10, 2026

Problem Summary:

RunTimeFilterTranslatorV2.translateRuntimeFilterGroup() used the 2-arg RuntimeFilterTarget constructor which hard-codes isLocalTarget=false, causing all V2-translated runtime filters (for INTERSECT/EXCEPT) to be incorrectly marked as remote-only.

This leads to:

  • Filters always routed through global merge path instead of local delivery
  • Consumer registered in global_runtime_filter_mgr instead of local_runtime_filter_mgr
  • Consumer wait time shortened (may cause RF timeout and miss filtering)
  • Suboptimal bloom filter sizing due to incorrect needShuffle computation

The fix computes isLocalTarget by comparing target and source fragment IDs, consistent with the V1 translator (RuntimeFilterTranslator) approach. Uses allMatch semantics (if any target is remote, all are marked remote) to satisfy the BE constraint that has_remote_targets != has_local_targets.

Copilot AI review requested due to automatic review settings April 10, 2026 04:09
@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 10, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incorrect runtime filter locality computation in the Nereids V2 runtime-filter translator for set operations (INTERSECT/EXCEPT), aligning behavior with the V1 translator so filters can be delivered locally when source/target are in the same fragment.

Changes:

  • Compute isLocalTarget for a runtime-filter target group by comparing source/target fragment IDs (and treating CTEScanNode as non-local).
  • Construct RuntimeFilterTarget with explicit isLocalTarget instead of the 2-arg constructor that hard-codes false.
  • Add unit tests intended to validate runtime filter locality for INTERSECT/EXCEPT (including multi-child INTERSECT).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2.java Correctly derives and propagates isLocalTarget when building legacy RuntimeFilterTargets for V2 translation.
fe/fe-core/src/test/java/org/apache/doris/nereids/glue/translator/RunTimeFilterTranslatorV2Test.java Adds tests around runtime-filter locality for set operations (but currently the assertions don’t actually detect the reported bug).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +74 to +81
List<RuntimeFilter> runtimeFilters = planner.getRuntimeFilters();
// INTERSECT may or may not generate runtime filters depending on stats/cost.
// If runtime filters are generated, verify locality flags are consistent.
for (RuntimeFilter rf : runtimeFilters) {
if (rf.getBuilderNode() instanceof SetOperationNode) {
verifyLocalityConsistency(rf);
}
}
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests can pass without asserting anything if no runtime filters are generated (or if none have a SetOperationNode builder). That makes the suite ineffective and potentially flaky across environments/cost models. Consider asserting that at least one SetOperationNode-built runtime filter exists for the query (or forcing RF generation via session vars) before validating locality-related expectations.

Copilot uses AI. Check for mistakes.
Comment on lines +127 to +134
private void verifyLocalityConsistency(RuntimeFilter rf) {
boolean hasRemote = rf.hasRemoteTargets();
// Use toThrift() to get hasLocalTargets since there's no public getter
boolean hasLocal = rf.toThrift().has_local_targets;
Assertions.assertNotEquals(hasRemote, hasLocal,
String.format("RuntimeFilter %d: hasRemoteTargets (%s) must differ from "
+ "hasLocalTargets (%s). Both true or both false indicates a bug.",
rf.getFilterId().asInt(), hasRemote, hasLocal));
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifyLocalityConsistency() currently only asserts hasRemoteTargets != hasLocalTargets. The reported bug (targets always marked remote-only: hasRemote=true, hasLocal=false) already satisfies this condition, so this assertion would have passed before the fix and won’t catch regressions. To test the fix, assert the expected locality (e.g., that some SetOperation RFs are local for these queries, or compute expected local/remote from builder/target fragment IDs and compare).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@utafrali utafrali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The production fix is logically correct and faithfully mirrors V1 (RuntimeFilterTranslator) behavior, but the test suite has two compounding weaknesses: the core assertion passes with both the buggy and fixed code (the XOR invariant held before the fix too), and the tests may pass vacuously if the planner emits no filters under the test configuration. The tests need to assert the correct direction of locality (local vs remote) and guarantee at least one filter is actually exercised.

private void verifyLocalityConsistency(RuntimeFilter rf) {
boolean hasRemote = rf.hasRemoteTargets();
// Use toThrift() to get hasLocalTargets since there's no public getter
boolean hasLocal = rf.toThrift().has_local_targets;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion does not distinguish the buggy behavior from the fixed behavior. Before the fix, all targets had isLocalTarget=false, which means hasRemoteTargets=true and hasLocal=false — these are already not equal, so assertNotEquals(hasRemote, hasLocal) passes with the buggy code too.

To actually validate the fix, the test needs to assert the direction of locality. For a two-table INTERSECT where the SetOperation node and its scan targets are in the same plan fragment, the expected result is hasLocal=true, hasRemote=false. Something like:

Assertions.assertTrue(hasLocal,
    "Expected same-fragment targets to be marked local for filter " + rf.getFilterId().asInt());
Assertions.assertFalse(hasRemote,
    "Expected no remote targets for same-fragment filter " + rf.getFilterId().asInt());

This requires knowing the expected locality per test case, but without it the test is not a regression guard for the actual bug.

// If runtime filters are generated, verify locality flags are consistent.
for (RuntimeFilter rf : runtimeFilters) {
if (rf.getBuilderNode() instanceof SetOperationNode) {
verifyLocalityConsistency(rf);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment "INTERSECT may or may not generate runtime filters depending on stats/cost" effectively admits the test can pass vacuously with an empty filter list. If runtimeFilters is empty (or contains no SetOperationNode-built filters), the for loop body never runs and the test proves nothing.

Consider adding an assertion that at least one filter was actually verified:

long checked = runtimeFilters.stream()
    .filter(rf -> rf.getBuilderNode() instanceof SetOperationNode)
    .peek(this::verifyLocalityConsistency)
    .count();
Assertions.assertTrue(checked > 0, "Expected at least one SetOperationNode runtime filter to be present");

Alternatively, pin enable_runtime_filter_prune=false or use a session variable that guarantees filter generation.

* and not both false (the BE requires exactly one to be true).
* Before the fix, hasLocalTargets was always false and hasRemoteTargets
* was always true, regardless of actual fragment placement.
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using rf.toThrift() to read has_local_targets is fragile: toThrift() is a serialization method that may have side effects or preconditions (e.g., requiring finalize() to have been called first), and it couples the test to the Thrift wire format rather than the Java model. Since the bug is about RuntimeFilterTarget.isLocalTarget, a cleaner approach would be to add a @VisibleForTesting method to RuntimeFilter that exposes hasLocalTargets directly, or to inspect RuntimeFilterTarget.isLocalTarget on the individual target objects if they are accessible.

Expr srcExpr = ExpressionTranslator.translate(head.getSourceExpression(), ctx);

List<RuntimeFilterTarget> targets = new ArrayList<>();
boolean isLocalTarget = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isLocalTarget computation iterates filters once here and then again in the target-building loop below. V1 (RuntimeFilterTranslator) does this in a single pass with stream().allMatch(...). This is a minor inefficiency but also a style inconsistency with the code being mirrored. Consider:

boolean isLocalTarget = filters.stream()
    .map(RuntimeFilterV2::getLegacyTargetNode)
    .allMatch(t -> !(t instanceof CTEScanNode)
        && t.getFragmentId().equals(sourceNode.getFragmentId()));

This is a direct port of the V1 logic and makes the parallel intent explicit.

Expr srcExpr = ExpressionTranslator.translate(head.getSourceExpression(), ctx);

List<RuntimeFilterTarget> targets = new ArrayList<>();
boolean isLocalTarget = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A brief inline comment explaining the all-or-nothing semantics would help future readers. Without it, the single shared flag looks like it might be a mistake. V1 has the same pattern without explanation, but this is the translation of a subtle BE constraint. Something like:

// Compute locality once for the whole group: if any target is in a different fragment
// or is a CTE scan, mark all targets as remote. BE requires has_remote_targets and
// has_local_targets to never both be true for the same RuntimeFilter.

…INTERSECT/EXCEPT

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

RunTimeFilterTranslatorV2.translateRuntimeFilterGroup() used the 2-arg
RuntimeFilterTarget constructor which hard-codes isLocalTarget=false,
causing all V2-translated runtime filters (for INTERSECT/EXCEPT) to be
incorrectly marked as remote-only.

This leads to:
- Filters always routed through global merge path instead of local delivery
- Consumer registered in global_runtime_filter_mgr instead of local
- Consumer wait time shortened (may cause RF timeout and miss filtering)
- Suboptimal bloom filter sizing due to incorrect needShuffle computation

The fix computes isLocalTarget by comparing target and source fragment IDs,
consistent with the V1 translator (RuntimeFilterTranslator) approach. Uses
allMatch semantics (if any target is remote, all are marked remote) to
satisfy the BE constraint that has_remote_targets != has_local_targets.

### Release note

None

### Check List (For Author)

- Test: Unit Test (RunTimeFilterTranslatorV2Test)
- Behavior changed: No (bug fix, previously always remote, now correctly computed)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 55.56% (5/9) 🎉
Increment coverage report
Complete coverage report

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 11, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants