[fix](be) Preserve null probe rows in mark anti join #64108
Issue Number: None
Related PR: None
Problem Summary: Correlated NOT IN subqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. When the probe join key is NULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row was skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results. This change keeps the probe index on the NULL row so the null-aware join path can emit the correct mark value.
Fix incorrect results for correlated NOT IN subqueries combined with disjunctions.
- Test:
  - Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction -forceGenOut`
  - Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction`
  - Manual test: verified the NOT IN + OR reproducer before and after the fix on a local FE/BE cluster
  - Build: `./build.sh --be`
- Behavior changed: Yes. Corrects query result semantics for affected null-aware mark anti joins.
- Does this need documentation: No
(cherry picked from commit 81394a2)
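The problem summary above depends on SQL three-valued logic: a NULL probe key makes `NOT IN` evaluate to NULL rather than FALSE, and the outer `OR` can still turn that into TRUE. The following is a minimal sketch of those semantics only, not of the Doris join code. The names `Tri`, `not_in`, `tri_or`, and `where_keeps` are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// SQL three-valued boolean: std::nullopt stands for NULL (unknown).
// All names in this sketch are hypothetical, not Doris identifiers.
using Tri = std::optional<bool>;

// Evaluates `x NOT IN (values)` with SQL NULL semantics.
Tri not_in(std::optional<int> x, const std::vector<std::optional<int>>& values) {
    if (values.empty()) return true;          // empty set: TRUE even when x is NULL
    if (!x.has_value()) return std::nullopt;  // NULL key against a non-empty set: unknown
    bool saw_null = false;
    for (const auto& v : values) {
        if (!v.has_value()) {
            saw_null = true;                  // a NULL element can only make the result unknown
            continue;
        }
        if (*v == *x) return false;           // a definite match makes NOT IN false
    }
    return saw_null ? Tri{} : Tri{true};
}

// SQL OR: TRUE wins over NULL, NULL wins over FALSE.
Tri tri_or(Tri a, Tri b) {
    if ((a && *a) || (b && *b)) return true;
    if (a && b) return false;                 // both operands known and both false
    return std::nullopt;
}

// A WHERE clause keeps a row only when the predicate is definitely TRUE.
bool where_keeps(Tri predicate) { return predicate.has_value() && *predicate; }
```

With a NULL key and a non-empty subquery result, `not_in` yields NULL, so `where_keeps(tri_or(mark, other))` depends entirely on the other disjunct. This is why the probe row has to reach the mark-column evaluation instead of being skipped during the hash table lookup.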
/review
Requesting changes due to a blocking null-aware mark join probe issue.
Critical checkpoints:
- Correctness: Blocking issue found. The new null-probe branch returns with build_idx reset to 0 and current_offset still 0, so the caller neither invokes _process_probe_null_key nor advances probe_index. This can drop the nullable probe row or repeatedly process the same row instead of preserving it for mark-join evaluation.
- Tests: The added regression cases target the right area, but they do not compensate for this control-flow issue in the implementation; please rerun the affected correctness suite after fixing it.
- Existing review context: No existing inline review threads or replies were present, so this is not a duplicate.
- User focus: No additional user-provided review focus was specified.
- Repository instructions: I could not find AGENTS.md or a repository code-review skill file in this checkout; I applied the supplied review prompt requirements and standard blocking-review criteria.
if (build_idx == bucket_size) {
    probe_idx++;
    build_idx = 0;
    picking_null_keys = false;
This loses the sentinel that the caller relies on for nullable probe keys. In do_process, the special path that preserves a null probe key is only entered when build_index == hash_table_ctx.hash_table->get_bucket_size(); by resetting build_idx to 0 here, current_offset remains 0 and probe_idx is returned unchanged. That means the caller emits an empty block and does not advance past this probe row, so a nullable probe row in a null-aware join with other conjuncts can be dropped or cause repeated empty output instead of being processed by _process_probe_null_key. Please keep enough state for the caller to recognize the null-probe-key case while also avoiding the incorrect decrement at the end of this function.
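The contract described in this comment can be modelled in a few lines. The sketch below is a simplified, hypothetical model rather than the actual `do_process` code: `LookupResult`, `Action`, and `classify` are invented names, and the only behaviour taken from the review is that the caller recognizes a NULL probe key by `build_idx == bucket_size`, and that it makes no progress when `current_offset` is 0 and the probe index is unchanged.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the state a lookup hands back to its caller.
// This is not the Doris implementation.
struct LookupResult {
    std::size_t probe_idx;       // probe row the caller should continue from
    std::size_t build_idx;       // build-side cursor; == bucket_size marks a NULL probe key
    std::size_t current_offset;  // number of matched rows produced by this call
};

enum class Action { HandleNullProbeKey, Progressed, Stalled };

// Decides what the caller can do with the returned state.
Action classify(const LookupResult& r, std::size_t probe_idx_before,
                std::size_t bucket_size) {
    assert(r.build_idx <= bucket_size);  // the cursor never runs past the sentinel
    if (r.build_idx == bucket_size) {
        return Action::HandleNullProbeKey;  // sentinel intact: take the null-key path
    }
    if (r.current_offset > 0 || r.probe_idx > probe_idx_before) {
        return Action::Progressed;          // rows were emitted or the probe side advanced
    }
    return Action::Stalled;                 // nothing emitted and nothing advanced
}
```

Under this model, returning `build_idx = 0` with `current_offset = 0` and an unchanged `probe_idx` is classified as `Stalled`, which corresponds to the empty block and repeated processing the reviewer describes. Returning `build_idx == bucket_size` for the same row is classified as `HandleNullProbeKey`.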
What problem does this PR solve?
Issue Number: None
Related PR: #63767
Problem Summary:
This is a branch-3.1 cherry-pick of #63767.
Correlated NOT IN subqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. On branch-3.1, when the probe join key is NULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row could be skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results.
This change keeps the probe index on the NULL row so the null-aware join path can emit the correct mark value. The branch-3.1 implementation encodes the NULL probe key as `build_idx_map[probe_idx] == bucket_size`, so the cherry-pick was adapted to preserve that probe row instead of advancing `probe_idx`.
Release note
Fix incorrect results for correlated NOT IN subqueries combined with disjunctions.
Check List (For Author)
Test:
- `correctness/test_subquery_in_disjunction` cases and expected output from [fix](be) Preserve null probe rows in mark anti join #63767.
- `./build-support/clang-format.sh be/src/vec/common/hash_table/join_hash_table.h` (passed)
- `./build-support/check-format.sh` (passed)
- `git diff --check HEAD~1..HEAD` (passed)
- `DORIS_HOME=$PWD ninja -C be/ut_build_ASAN src/exec/CMakeFiles/Exec.dir/operator/join/null_aware_left_anti_join_impl.cpp.o src/exec/CMakeFiles/Exec.dir/operator/hashjoin_probe_operator.cpp.o src/exec/CMakeFiles/Exec.dir/operator/hashjoin_build_sink.cpp.o`, but local CMake regeneration failed because the current local thirdparty/CMake environment cannot resolve the existing target `absl::random_internal_pool_urbg`.
- `./build.sh --be`, but it was blocked by the same pre-existing local CMake/thirdparty target issue: `Target "doris_be" links to absl::random_internal_pool_urbg but the target was not found.`
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)