[fix](be) Preserve null probe rows in mark anti join#64108

Closed
mrhhsg wants to merge 1 commit into apache:branch-3.1 from mrhhsg:pick-63767-branch-3.1

Conversation

@mrhhsg mrhhsg commented Jun 4, 2026

What problem does this PR solve?

Issue Number: None

Related PR: #63767

Problem Summary:

This is a branch-3.1 cherry-pick of #63767.

Correlated NOT IN subqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. On branch-3.1, when the probe join key is NULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row could be skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results.
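
For reference, the SQL semantics at stake can be sketched with three-valued logic. The snippet below is an illustrative model, not Doris code; `Tri`, `not_in`, `sql_or`, and `where_keeps` are hypothetical names. It shows why a probe row with a NULL key has to survive the join: `NULL NOT IN (...)` is NULL over a non-empty set, yet `NULL OR TRUE` is TRUE, so the outer disjunction can still keep the row.

```cpp
#include <optional>
#include <vector>

// SQL three-valued boolean: std::nullopt stands for NULL (unknown).
using Tri = std::optional<bool>;

// x NOT IN (values), following standard SQL semantics:
//  - TRUE over an empty set,
//  - NULL if x is NULL and the set is non-empty,
//  - FALSE if a non-NULL element equals x,
//  - NULL if nothing matched but the set contains a NULL,
//  - TRUE otherwise.
Tri not_in(const std::optional<int>& x,
           const std::vector<std::optional<int>>& values) {
    if (values.empty()) return true;
    if (!x.has_value()) return std::nullopt;
    bool saw_null = false;
    for (const auto& v : values) {
        if (!v.has_value()) {
            saw_null = true;
            continue;
        }
        if (*v == *x) return false;
    }
    if (saw_null) return std::nullopt;
    return true;
}

// SQL OR: TRUE dominates; otherwise any NULL operand makes the result NULL.
Tri sql_or(const Tri& a, const Tri& b) {
    if ((a && *a) || (b && *b)) return true;
    if (!a || !b) return std::nullopt;
    return false;
}

// A WHERE clause keeps a row only when the predicate is definitely TRUE.
bool where_keeps(const Tri& predicate) {
    return predicate.value_or(false);
}
```

With a NULL probe key and a non-empty subquery result, `where_keeps(sql_or(not_in(std::nullopt, {1, 2}), true))` holds, while the same expression with `false` on the right-hand side does not. Dropping the row inside the join removes it in both cases, which is the incomplete-result symptom described above.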

This change keeps the probe index on the NULL row so the null-aware join path can emit the correct mark value. The branch-3.1 implementation encodes the NULL probe key as build_idx_map[probe_idx] == bucket_size, so the cherry-pick was adapted to preserve that probe row instead of advancing probe_idx.
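
The control-flow difference can be sketched with a toy probe loop. This is a simplified model under stated assumptions, not the code in `join_hash_table.h`: only the sentinel convention `build_idx_map[probe_idx] == bucket_size` comes from the description above, and `scan_until_null_key`, `rows_reaching_null_path`, and the `preserve_null_row` flag are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy lookup: scans forward from probe_idx. Regular rows are simply stepped
// over here (match emission is omitted). Returns true when it stops on a
// NULL-key row that the caller must handle; returns false at end of input.
bool scan_until_null_key(const std::vector<std::uint32_t>& build_idx_map,
                         std::uint32_t bucket_size, bool preserve_null_row,
                         std::size_t& probe_idx) {
    while (probe_idx < build_idx_map.size()) {
        const bool is_null_key = build_idx_map[probe_idx] == bucket_size;
        if (is_null_key && preserve_null_row) {
            return true;  // stay on the row so the caller can see it
        }
        // With preserve_null_row == false this also steps over NULL-key rows,
        // which models the skipped-row behavior described above.
        ++probe_idx;
    }
    return false;
}

// Toy caller: counts how many NULL-key rows reach the null-probe path.
std::size_t rows_reaching_null_path(
        const std::vector<std::uint32_t>& build_idx_map,
        std::uint32_t bucket_size, bool preserve_null_row) {
    std::size_t probe_idx = 0;
    std::size_t handled = 0;
    while (probe_idx < build_idx_map.size()) {
        if (scan_until_null_key(build_idx_map, bucket_size, preserve_null_row,
                                probe_idx)) {
            ++handled;    // stand-in for emitting the mark value
            ++probe_idx;  // the caller, not the lookup, moves past the row
        }
    }
    return handled;
}
```

For the map `{0, 4, 2, 4}` with `bucket_size == 4`, the preserving variant hands both NULL-key rows to the caller, while the advancing variant hands over none.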

Release note

Fix incorrect results for correlated NOT IN subqueries combined with disjunctions.

Check List (For Author)

  • Test:

    • Regression test
    • Manual test (add detailed scripts or steps below)
      • Ran ./build-support/clang-format.sh be/src/vec/common/hash_table/join_hash_table.h (passed)
      • Ran ./build-support/check-format.sh (passed)
      • Ran git diff --check HEAD~1..HEAD (passed)
      • Attempted DORIS_HOME=$PWD ninja -C be/ut_build_ASAN src/exec/CMakeFiles/Exec.dir/operator/join/null_aware_left_anti_join_impl.cpp.o src/exec/CMakeFiles/Exec.dir/operator/hashjoin_probe_operator.cpp.o src/exec/CMakeFiles/Exec.dir/operator/hashjoin_build_sink.cpp.o, but local CMake regeneration failed because the current local thirdparty/CMake environment cannot resolve the existing target absl::random_internal_pool_urbg.
      • Attempted ./build.sh --be, but it was blocked by the same pre-existing local CMake/thirdparty target issue: Target "doris_be" links to absl::random_internal_pool_urbg but the target was not found.
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • Yes. Corrects query result semantics for affected null-aware mark anti joins.
    • No.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Issue Number: None

Related PR: None

Problem Summary: Correlated NOT IN subqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. When the probe join key is NULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row was skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results. This change keeps the probe index on the NULL row so the null-aware join path can emit the correct mark value.

Fix incorrect results for correlated NOT IN subqueries combined with disjunctions.

- Test:
    - Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction -forceGenOut`
    - Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction`
    - Manual test: verified the NOT IN + OR reproducer before and after the fix on a local FE/BE cluster
    - Build: `./build.sh --be`
- Behavior changed: Yes. Corrects query result semantics for affected null-aware mark anti joins.
- Does this need documentation: No

(cherry picked from commit 81394a2)
@mrhhsg mrhhsg requested a review from morrySnow as a code owner June 4, 2026 06:44
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

mrhhsg commented Jun 4, 2026

/review

@github-actions github-actions Bot left a comment

Requesting changes due to a blocking null-aware mark join probe issue.

Critical checkpoints:

  • Correctness: Blocking issue found. The new null-probe branch returns with build_idx reset to 0 and current_offset still 0, so the caller neither invokes _process_probe_null_key nor advances probe_index. This can drop the nullable probe row or repeatedly process the same row instead of preserving it for mark-join evaluation.
  • Tests: The added regression cases target the right area, but they do not offset this control-flow issue in the implementation; please rerun the affected correctness suite after fixing.
  • Existing review context: No existing inline review threads or replies were present, so this is not a duplicate.
  • User focus: No additional user-provided review focus was specified.
  • Repository instructions: I could not find AGENTS.md or a repository code-review skill file in this checkout; I applied the supplied review prompt requirements and standard blocking-review criteria.

if (build_idx == bucket_size) {
    probe_idx++;
    build_idx = 0;
    picking_null_keys = false;

This loses the sentinel that the caller relies on for nullable probe keys. In do_process, the special path that preserves a null probe key is only entered when build_index == hash_table_ctx.hash_table->get_bucket_size(); by resetting build_idx to 0 here, current_offset remains 0 and probe_idx is returned unchanged. That means the caller emits an empty block and does not advance past this probe row, so a nullable probe row in a null-aware join with other conjuncts can be dropped or cause repeated empty output instead of being processed by _process_probe_null_key. Please keep enough state for the caller to recognize the null-probe-key case while also avoiding the incorrect decrement at the end of this function.
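
The contract described in this comment can be modeled in a few lines. This is a hedged sketch, not the actual `do_process` logic: only the `build_index == bucket_size` check and the roles of `current_offset` and `probe_idx` are taken from the comment, while `ProbeState`, `CallerAction`, `classify`, and the two `on_null_key_*` helpers are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// State handed back from the lookup to the caller (simplified).
struct ProbeState {
    std::uint32_t build_idx;
    std::size_t probe_idx;
    std::size_t current_offset;  // number of output rows produced
};

enum class CallerAction { HandleNullProbeKey, Progress, NoProgress };

// How a caller following the described contract would interpret the state.
CallerAction classify(const ProbeState& before, const ProbeState& after,
                      std::uint32_t bucket_size) {
    if (after.build_idx == bucket_size) {
        return CallerAction::HandleNullProbeKey;  // sentinel still visible
    }
    if (after.current_offset > 0 || after.probe_idx > before.probe_idx) {
        return CallerAction::Progress;
    }
    return CallerAction::NoProgress;  // empty block, same probe row
}

// Shape flagged in the review: the sentinel is reset and nothing advances.
ProbeState on_null_key_resetting(const ProbeState& in) {
    return {0, in.probe_idx, 0};
}

// One possible alternative: leave the sentinel in place for the caller.
ProbeState on_null_key_keeping(const ProbeState& in,
                               std::uint32_t bucket_size) {
    return {bucket_size, in.probe_idx, 0};
}
```

In this model the resetting variant classifies as `NoProgress`, matching the empty-block and stalled-row outcome described above, while the keeping variant classifies as `HandleNullProbeKey`. Whether keeping the sentinel also avoids the decrement mentioned at the end of the function is outside what this sketch covers.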

@mrhhsg mrhhsg closed this Jun 4, 2026
@mrhhsg mrhhsg deleted the pick-63767-branch-3.1 branch June 4, 2026 07:18