Skip to content

[fix](be) Preserve null probe rows in mark anti join#63767

Open
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:fix/subquery-not-in-or
Open

[fix](be) Preserve null probe rows in mark anti join#63767
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:fix/subquery-not-in-or

Conversation

@mrhhsg
Copy link
Copy Markdown
Member

@mrhhsg mrhhsg commented May 27, 2026

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Correlated NOT IN subqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. When the probe join key is NULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row was skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results. This change keeps the probe index on the NULL row so the null-aware join path can emit the correct mark value.

Release note

Fix incorrect results for correlated NOT IN subqueries combined with disjunctions.

Check List (For Author)

  • Test:
    • Regression test: doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction -forceGenOut
    • Regression test: doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction
    • Manual test: verified the NOT IN + OR reproducer before and after the fix on a local FE/BE cluster
    • Build: ./build.sh --be
  • Behavior changed: Yes. Corrects query result semantics for affected null-aware mark anti joins.
  • Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Correlated NOT IN subqueries under disjunction can be rewritten to a mark null-aware left anti join with additional join conjuncts. When the probe join key is NULL, the hash table lookup advanced the probe index before the caller could run the null-probe handling path. As a result, the probe row was skipped before the mark column was evaluated by the outer disjunction, producing incomplete query results. This change keeps the probe index on the NULL row so the null-aware join path can emit the correct mark value.

### Release note

Fix incorrect results for correlated NOT IN subqueries combined with disjunctions.

### Check List (For Author)

- Test:
    - Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction -forceGenOut`
    - Regression test: `doris-local-regression.sh --network 10.26.20.3/24 run -d correctness -s test_subquery_in_disjunction`
    - Manual test: verified the NOT IN + OR reproducer before and after the fix on a local FE/BE cluster
    - Build: `./build.sh --be`
- Behavior changed: Yes. Corrects query result semantics for affected null-aware mark anti joins.
- Does this need documentation: No
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 27, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking correctness issue in the null-aware mark anti join fix. The change preserves NULL probe rows when there are no build-side NULL keys, but it regresses the same path when the build hash table keeps NULL keys, which this join mode explicitly does for null-aware joins with other conjuncts.

Critical checkpoint conclusions:

  • Goal/test: The PR targets incorrect results for nullable probe keys in mark null-aware left anti joins under disjunction. The added regression covers a nullable probe key, but only with i.a IS NOT NULL, so it does not cover the build-side NULL path that remains active in this join mode.
  • Scope: The code change is small and focused, but incomplete because it does not clear the resumed build index when stopping on a NULL probe row.
  • Concurrency/lifecycle/config/compatibility/persistence: Not applicable; this is local hash join probe state processing with no new config, persistence, or protocol/storage format change.
  • Parallel paths: The reviewed path is specifically NULL_AWARE_LEFT_ANTI_JOIN/NULL_AWARE_LEFT_SEMI_JOIN with other conjuncts and mark processing; non-other-conjunct paths are separate.
  • Special conditional checks: The changed NULL-probe check needs to maintain the downstream invariant that probe_idx still points to the NULL row and build_idx does not trigger continuation rewind.
  • Test coverage/results: Existing added results are deterministic, but missing a case where the subquery/build side contains a NULL mark key, which is exactly where keep_null_key = true makes build_idx_map[probe_idx] nonzero for NULL probes.
  • Observability/performance: No additional observability needed; no material performance concern found.
  • Data correctness: Blocking issue found; affected queries can skip or reprocess rows when build-side NULL keys exist.

User focus: No additional user-provided review focus was specified.

if constexpr (has_null_map) {
if (null_map[probe_idx]) {
probe_idx++;
break;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still does not preserve the NULL probe row correctly when the build side contains NULL keys. For null-aware joins with other conjuncts, HashJoinBuild::run() sets keep_null_key = true, so build_idx_map[probe_idx] can be first[bucket_size] for a NULL probe. After this change the loop breaks with that nonzero build_idx, then line 492 applies probe_idx -= (build_idx != 0), returning the previous probe row instead of the NULL row. The caller therefore misses _process_probe_null_key(probe_idx) for this row, or resumes probing the previous row against the NULL bucket. A repro is the new nullable test shape without filtering NULLs out of the subquery build side, e.g. inserting an inner (1, NULL) and removing i.a IS NOT NULL. Please clear build_idx before breaking on a NULL probe, or otherwise skip the continuation rewind for this case, and add coverage with build-side NULL keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants