[fix](be) Preserve null probe rows in mark anti join#63767
Conversation
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Correlated NOT IN subqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. When the probe join key is NULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row was skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results. This change keeps the probe index on the NULL row so the null-aware join path can emit the correct mark value.
### Release note
Fix incorrect results for correlated NOT IN subqueries combined with disjunctions.
### Check List (For Author)
- Test:
- Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction -forceGenOut`
- Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction`
- Manual test: verified the NOT IN + OR reproducer before and after the fix on a local FE/BE cluster
- Build: `./build.sh --be`
- Behavior changed: Yes. Corrects query result semantics for affected null-aware mark anti joins.
- Does this need documentation: No
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
I found one blocking correctness issue in the null-aware mark anti join fix. The change preserves NULL probe rows when there are no build-side NULL keys, but it regresses the same path when the build hash table keeps NULL keys, which this join mode explicitly does for null-aware joins with other conjuncts.
Critical checkpoint conclusions:
- Goal/test: The PR targets incorrect results for nullable probe keys in mark null-aware left anti joins under disjunction. The added regression covers a nullable probe key, but only with
i.a IS NOT NULL, so it does not cover the build-side NULL path that remains active in this join mode. - Scope: The code change is small and focused, but incomplete because it does not clear the resumed build index when stopping on a NULL probe row.
- Concurrency/lifecycle/config/compatibility/persistence: Not applicable; this is local hash join probe state processing with no new config, persistence, or protocol/storage format change.
- Parallel paths: The reviewed path is specifically
NULL_AWARE_LEFT_ANTI_JOIN/NULL_AWARE_LEFT_SEMI_JOINwith other conjuncts and mark processing; non-other-conjunct paths are separate. - Special conditional checks: The changed NULL-probe check needs to maintain the downstream invariant that
probe_idxstill points to the NULL row andbuild_idxdoes not trigger continuation rewind. - Test coverage/results: Existing added results are deterministic, but missing a case where the subquery/build side contains a NULL mark key, which is exactly where
keep_null_key = truemakesbuild_idx_map[probe_idx]nonzero for NULL probes. - Observability/performance: No additional observability needed; no material performance concern found.
- Data correctness: Blocking issue found; affected queries can skip or reprocess rows when build-side NULL keys exist.
User focus: No additional user-provided review focus was specified.
| if constexpr (has_null_map) { | ||
| if (null_map[probe_idx]) { | ||
| probe_idx++; | ||
| break; |
There was a problem hiding this comment.
This still does not preserve the NULL probe row correctly when the build side contains NULL keys. For null-aware joins with other conjuncts, HashJoinBuild::run() sets keep_null_key = true, so build_idx_map[probe_idx] can be first[bucket_size] for a NULL probe. After this change the loop breaks with that nonzero build_idx, then line 492 applies probe_idx -= (build_idx != 0), returning the previous probe row instead of the NULL row. The caller therefore misses _process_probe_null_key(probe_idx) for this row, or resumes probing the previous row against the NULL bucket. A repro is the new nullable test shape without filtering NULLs out of the subquery build side, e.g. inserting an inner (1, NULL) and removing i.a IS NOT NULL. Please clear build_idx before breaking on a NULL probe, or otherwise skip the continuation rewind for this case, and add coverage with build-side NULL keys.
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Correlated
NOT INsubqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. When the probe join key isNULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row was skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results. This change keeps the probe index on theNULLrow so the null-aware join path can emit the correct mark value.Release note
Fix incorrect results for correlated
NOT INsubqueries combined with disjunctions.Check List (For Author)
doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction -forceGenOutdoris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunctionNOT IN+ORreproducer before and after the fix on a local FE/BE cluster./build.sh --be