Skip to content

feat: extend single ndv optimization to non-arithmetic supporting types for equality predicates #21473

Open
buraksenn wants to merge 6 commits intoapache:mainfrom
buraksenn:extend-single-ndv-optimization
Open

feat: extend single ndv optimization to non-arithmetic supporting types for equality predicates #21473
buraksenn wants to merge 6 commits intoapache:mainfrom
buraksenn:extend-single-ndv-optimization

Conversation

@buraksenn
Copy link
Copy Markdown
Contributor

@buraksenn buraksenn commented Apr 8, 2026

Which issue does this PR close?

Rationale for this change

In the case of equality we can say ndv is 1 for non-numeric types as well. Please check issue for detailed explanation

What changes are included in this PR?

Adding a type-agnostic pattern-matching layer that inspects the predicate tree for col = literal shapes and sets NDV=Exact(1) regardless of column type

Are these changes tested?

Yes existing and additional unit tests.

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Apr 8, 2026
@asolimando
Copy link
Copy Markdown
Member

Thanks @buraksenn for tackling both #21109 and #21111 together with this type-agnostic approach, also the test consolidation over the separate per-type functions is a nice improvement.

A couple of minor notes/questions:

  • Since we can't rely on interval analysis for string-based columns, I am not sure we handle correctly "contradictory" filters like name = 'alice' AND name = 'bob', which should get Exact(0) as happening for numeric types. I don't see this as a blocker because it's not worse than before and this is an edge case, but it's good to know where we stand and file a follow-up issue, if needed.

  • On a similar note, I was wondering if something like col = NULL gets simplified as false (under the unknown as false semantics for predicates in WHERE clauses). In case it doesn't, it would be worth adding a test making sure we don't set NDV to 1.

  • Did you manage to understand why the interval analysis doesn't seem to work as expected for temporal types? It might be worth filing a separate issue for that if you have bandwidth, so we don't lose track of this issue.

@buraksenn
Copy link
Copy Markdown
Contributor Author

Thanks @asolimando for the review. I've checked this again. To make it generic I've missed that I can actually make temporal types work with interval arithmetic analysis since they are i64 backed as well.

Thus answering all:

  • I'm kinda afraid into getting this kind of complex behavior since I may not know what can be a side effect. Maybe I can add it to this PR and see the reviews
  • I've missed this totally will push to this branch.
  • For this, adding support for temporal types seem to be better as I've missed whiel trying to make it generic so I'll open a PR parallel to this one.

I'm working on both changes now and will handle this in the upcoming hours

@asolimando
Copy link
Copy Markdown
Member

Thanks @asolimando for the review. I've checked this again. To make it generic I've missed that I can actually make temporal types work with interval arithmetic analysis since they are i64 backed as well.

Sounds great!

Thus answering all:

  • I'm kinda afraid into getting this kind of complex behavior since I may not know what can be a side effect. Maybe I can add it to this PR and see the reviews

I think it's enough to add a test to see what happens, before we would have been returning the same NDV value for the input column anyway, I don't think we are doing worse than that, let's add a test and possibly file an issue to not lose the context we built.

  • I've missed this totally will push to this branch.
  • For this, adding support for temporal types seem to be better as I've missed whiel trying to make it generic so I'll open a PR parallel to this one.

No worries, making it type agnostic was a good call IMO, but what can be handled with interval arithmetic should do it as we can benefit from all the handling of edge cases like the "incoherent predicate" above and possibly others I couldn't think of.

I'm working on both changes now and will handle this in the upcoming hours

No worries, feel free to ping me if you need a review, happy to take a look, thanks for working on this and helping out on so many stats-related tasks :)

@buraksenn buraksenn changed the title feat: extend single ndv optimization to non-numeric types for equality predicates feat: extend single ndv optimization to non-arithmetic supporting types for equality predicates Apr 9, 2026
@buraksenn
Copy link
Copy Markdown
Contributor Author

Thanks @asolimando for the review. I've checked this again. To make it generic I've missed that I can actually make temporal types work with interval arithmetic analysis since they are i64 backed as well.

Sounds great!

Thus answering all:

  • I'm kinda afraid into getting this kind of complex behavior since I may not know what can be a side effect. Maybe I can add it to this PR and see the reviews

I think it's enough to add a test to see what happens, before we would have been returning the same NDV value for the input column anyway, I don't think we are doing worse than that, let's add a test and possibly file an issue to not lose the context we built.

  • I've missed this totally will push to this branch.
  • For this, adding support for temporal types seem to be better as I've missed whiel trying to make it generic so I'll open a PR parallel to this one.

No worries, making it type agnostic was a good call IMO, but what can be handled with interval arithmetic should do it as we can benefit from all the handling of edge cases like the "incoherent predicate" above and possibly others I couldn't think of.

I'm working on both changes now and will handle this in the upcoming hours

No worries, feel free to ping me if you need a review, happy to take a look, thanks for working on this and helping out on so many stats-related tasks :)

Thanks @asolimando I've opened #21520 temporal types I think it is very straightforward and contained PR and only left non-arithmetic supporting changes here.

Copy link
Copy Markdown
Member

@asolimando asolimando left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for addressing all the comments!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Extend single-value NDV optimization to string types

2 participants