[opt](nereids) Optimize I/O operations for the IS NULL predicate #62304

Open
englefly wants to merge 3 commits into apache:master from englefly:is-null-opt-v2
@englefly
Contributor

What problem does this PR solve?

Treat each nullable column as a combination of a null flag and a data column. When a query only evaluates the `col IS NULL` predicate on a column, the NestedColumnPruning rule prunes the access to `col` down to `col.NULL`, so only the null flag is read and the I/O for the data column is saved.

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
When a query uses `col IS NULL` or `col IS NOT NULL` as the only access to a
nullable column, the BE previously read the full column data. Nullable columns
consist of a null-flag column (null map) and a data column. When the only usage
is null-checking, we should prune the data column and read only the null flag.

This commit extends the existing NestedColumnPruning framework (which already
handles `length(str_col)` via OFFSET paths) to detect IS NULL patterns and emit
`[col_name, NULL]` access paths. The BE can then skip data reading for those
columns.
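
To illustrate why a null-only access path saves I/O, here is a minimal toy model in Java. It is not Doris BE code (the BE is a separate component and is not shown in this PR description); the class `NullableColumnSketch` and its members are hypothetical names invented for this sketch. It only assumes what the summary above states: a nullable column is a null map plus a data column.

```java
import java.util.function.Supplier;

/** Toy model of a nullable column: a null map plus lazily loaded data (hypothetical, not Doris code). */
final class NullableColumnSketch {
    final boolean[] nullMap;                     // true means the row is NULL
    private final Supplier<long[]> dataLoader;   // stands in for the expensive data-column read
    int dataLoads = 0;                           // counts how often the data column was read

    NullableColumnSketch(boolean[] nullMap, Supplier<long[]> dataLoader) {
        this.nullMap = nullMap;
        this.dataLoader = dataLoader;
    }

    /** Answers an IS NULL style question from the null map alone; the data column is never touched. */
    int countNulls() {
        int count = 0;
        for (boolean isNull : nullMap) {
            if (isNull) {
                count++;
            }
        }
        return count;
    }

    /** Any value-level access has to load the data column. */
    long sumNonNull() {
        dataLoads++;
        long[] data = dataLoader.get();
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            if (!nullMap[i]) {
                sum += data[i];
            }
        }
        return sum;
    }
}
```

In this model, a query that only calls `countNulls()` leaves `dataLoads` at zero, which is the effect the `[col_name, NULL]` access path is meant to enable.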

### Changes

**AccessPathInfo.java** — Add `ACCESS_NULL = "NULL"` constant

**AccessPathExpressionCollector.java** — Implement `visitIsNull()` and
`visitNot()` to detect IS NULL/IS NOT NULL on direct SlotReferences (without
subPath), creating NULL-suffix access contexts. Add fallback nullable slot
handling with guards for NestedColumnPrunable and string-like types.
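
The detection described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual `AccessPathExpressionCollector`: the `Expr`, `Slot`, `IsNull`, and `Not` types are hypothetical stand-ins for the Nereids expression classes, and the guards for NestedColumnPrunable and string-like types are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified sketch of collecting NULL-suffixed access paths (hypothetical types, not Nereids classes). */
final class IsNullCollectorSketch {
    static final String ACCESS_NULL = "NULL";

    interface Expr {}

    static final class Slot implements Expr {
        final String name;
        Slot(String name) { this.name = name; }
    }

    static final class IsNull implements Expr {
        final Expr child;
        IsNull(Expr child) { this.child = child; }
    }

    static final class Not implements Expr {
        final Expr child;
        Not(Expr child) { this.child = child; }
    }

    static List<List<String>> collect(Expr expr) {
        List<List<String>> paths = new ArrayList<>();
        visit(expr, paths);
        return paths;
    }

    private static void visit(Expr expr, List<List<String>> paths) {
        if (expr instanceof Not) {
            // NOT (col IS NULL) needs the same input as col IS NULL, so just descend.
            visit(((Not) expr).child, paths);
        } else if (expr instanceof IsNull) {
            Expr child = ((IsNull) expr).child;
            if (child instanceof Slot) {
                // Null check on a direct slot: only the null flag is needed.
                paths.add(Arrays.asList(((Slot) child).name, ACCESS_NULL));
            } else {
                visit(child, paths);
            }
        } else if (expr instanceof Slot) {
            // Any other use of the slot needs the column data.
            paths.add(Collections.singletonList(((Slot) expr).name));
        }
    }
}
```

For example, `collect(new Not(new IsNull(new Slot("s"))))` yields the single path `[s, NULL]`, while a bare `new Slot("s")` yields `[s]`.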

**NestedColumnPruning.java** — Add `containsNullCheck()` early-exit guard,
`isNullCheckOnly` field and `hasNullCheckOnlyAccess()` method to
DataTypeAccessTree, NULL path handling in `setAccessByPath()`, null-only branch
in `pruneDataType()`, and NULL-path stripping from allAccessPaths when data
access also exists (predicateAccessPaths retains NULL paths).
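
The NULL-path stripping step could look roughly like the sketch below. It assumes that access paths are lists of strings and that "data access also exists" is decided per top-level column; both are assumptions made for illustration, and the real rule in `NestedColumnPruning` may use a finer-grained condition. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of dropping NULL paths that are made redundant by a data access (hypothetical helper). */
final class NullPathStrippingSketch {
    static final String ACCESS_NULL = "NULL";

    static boolean isNullPath(List<String> path) {
        return !path.isEmpty() && ACCESS_NULL.equals(path.get(path.size() - 1));
    }

    /**
     * Assumption: a NULL path is redundant when the same top-level column
     * also has at least one non-NULL (data) access path.
     */
    static List<List<String>> stripRedundantNullPaths(List<List<String>> allAccessPaths) {
        Set<String> columnsWithDataAccess = new HashSet<>();
        for (List<String> path : allAccessPaths) {
            if (!path.isEmpty() && !isNullPath(path)) {
                columnsWithDataAccess.add(path.get(0));
            }
        }
        List<List<String>> result = new ArrayList<>();
        for (List<String> path : allAccessPaths) {
            if (isNullPath(path) && columnsWithDataAccess.contains(path.get(0))) {
                continue; // the data column is read anyway, so the null-only path adds nothing
            }
            result.add(path);
        }
        return result;
    }
}
```

Under this assumption, the paths `[s, NULL]`, `[s, city]`, `[t, NULL]` reduce to `[s, city]`, `[t, NULL]`: column `s` is read for data anyway, while `t` stays null-flag-only.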

**PruneNestedColumnTest.java** — Add 3 unit tests: struct IS NULL pruning,
IS NOT NULL pruning, mixed IS NULL + field access. Update existing testFilter
expectations for null-optimized paths.

**null_column_pruning.groovy** — New regression test verifying EXPLAIN plans
show NULL access paths for struct/array/map IS NULL, IS NOT NULL, aggregates,
mixed access, and full-struct projection scenarios.

### Release note

Support IS NULL / IS NOT NULL null-flag-only column reading optimization via
NestedColumnPruning. When a nullable column is only used in null checks, only
the null flag is read, skipping full data column reading.

### Check List (For Author)

- Test: Unit Test (PruneNestedColumnTest 38/38 pass) + Regression test (plan-only)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Thearas
Contributor

Thearas commented Apr 10, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

…ss in column pruning

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
When using `struct_element(struct_col, 'city') IS NULL`, the nested column
pruning optimization failed to emit the NULL-suffixed access path (e.g.
`[struct_col.city.NULL]`). Instead it produced `[struct_col.city]`, meaning BE
would read the full column data instead of just the null flag.

Four interconnected issues were identified and fixed:

1. **visitIsNull only handled direct SlotReference**: The IS NULL visitor only
   recognized `col IS NULL` (direct slot), not nested expressions like
   `struct_element(s, 'city') IS NULL`. Fixed by broadening to accept any
   nullable expression and propagating the NULL context through recognized access
   visitors (struct_element, element_at, etc.).

2. **setAccessByPath set accessPartialChild before NULL check**: The NULL path
   marker is a flag, not a real child. Setting `accessPartialChild = true` before
   checking for NULL caused `isNullCheckOnly` detection to fail. Fixed by moving
   the NULL check before `accessPartialChild` assignment.

3. **pruneDataType returned Optional.empty() for null-check-only nodes**: Parent
   nodes interpreted this as "child not accessed" and dropped it from the pruned
   type. Fixed by returning `Optional.of(type)` so null-check-only children are
   preserved in the pruned struct type.

4. **Variant sub-column NULL stripping**: Variant types do not support null-flag-only
   optimization for sub-column access. Added stripping of NULL suffix in the variant
   slot reference handler to maintain existing behavior.
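
Fixes 2 and 3 can be illustrated with a small tree sketch. This is a hypothetical, heavily simplified stand-in for `DataTypeAccessTree`: types are rendered as strings, the path's first element is assumed to name the root column, and variant handling is left out. It only shows the ordering of the NULL check relative to `accessPartialChild`, and the pruned type keeping null-check-only children.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Simplified access tree; field names follow the PR text, everything else is hypothetical. */
final class AccessTreeSketch {
    static final String ACCESS_NULL = "NULL";

    final String name;
    final Map<String, AccessTreeSketch> children = new LinkedHashMap<>();
    boolean accessAll;
    boolean accessPartialChild;
    boolean isNullCheckOnly;

    AccessTreeSketch(String name, String... childNames) {
        this.name = name;
        for (String child : childNames) {
            children.put(child, new AccessTreeSketch(child));
        }
    }

    void setAccessByPath(List<String> path, int index) {
        if (index == path.size()) {
            accessAll = true;
            isNullCheckOnly = false;
            return;
        }
        String step = path.get(index);
        // Fix 2: NULL is a flag, not a child, so check it BEFORE touching accessPartialChild.
        if (ACCESS_NULL.equals(step)) {
            if (!accessAll && !accessPartialChild) {
                isNullCheckOnly = true;
            }
            return;
        }
        accessPartialChild = true;
        isNullCheckOnly = false;
        AccessTreeSketch child = children.get(step);
        if (child != null) {
            child.setAccessByPath(path, index + 1);
        }
    }

    /** Renders the full, unpruned type of this node. */
    String fullType() {
        if (children.isEmpty()) {
            return name;
        }
        List<String> all = new ArrayList<>();
        for (AccessTreeSketch child : children.values()) {
            all.add(child.fullType());
        }
        return name + "<" + String.join(",", all) + ">";
    }

    Optional<String> pruneDataType() {
        // Fix 3: a null-check-only node returns its type, so the parent keeps it.
        if (accessAll || isNullCheckOnly) {
            return Optional.of(fullType());
        }
        if (!accessPartialChild) {
            return Optional.empty(); // genuinely unaccessed, so the parent drops it
        }
        List<String> kept = new ArrayList<>();
        for (AccessTreeSketch child : children.values()) {
            child.pruneDataType().ifPresent(kept::add);
        }
        return Optional.of(name + "<" + String.join(",", kept) + ">");
    }
}
```

With a root `s` that has children `city` and `name`, applying the path `[s, city, NULL]` from index 1 marks `city` as null-check-only, and `pruneDataType()` yields `s<city>`. If the method returned `Optional.empty()` for null-check-only nodes, `city` would be dropped from the pruned struct, which is the behavior fix 3 addresses.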

### Release note

None

### Check List (For Author)

- Test: Unit Test (PruneNestedColumnTest) / Manual test (explain verbose)
- Behavior changed: No (optimization was already intended but not working for nested access)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@englefly
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 88.89% (72/81) 🎉
Increment coverage report
Complete coverage report

3 participants