feat: extend single ndv optimization to non-arithmetic supporting types for equality predicates #21473
feat: extend single ndv optimization to non-arithmetic supporting types for equality predicates #21473buraksenn wants to merge 6 commits intoapache:mainfrom
Conversation
|
Thanks @buraksenn for tackling both #21109 and #21111 together with this type-agnostic approach, also the test consolidation over the separate per-type functions is a nice improvement. A couple of minor notes/questions:
|
|
Thanks @asolimando for the review. I've checked this again. To make it generic I've missed that I can actually make temporal types work with interval arithmetic analysis since they are Thus answering all:
I'm working on both changes now and will handle this in the upcoming hours |
Sounds great!
I think it's enough to add a test to see what happens, before we would have been returning the same NDV value for the input column anyway, I don't think we are doing worse than that, let's add a test and possibly file an issue to not lose the context we built.
No worries, making it type agnostic was a good call IMO, but what can be handled with interval arithmetic should do it as we can benefit from all the handling of edge cases like the "incoherent predicate" above and possibly others I couldn't think of.
No worries, feel free to ping me if you need a review, happy to take a look, thanks for working on this and helping out on so many stats-related tasks :) |
Thanks @asolimando I've opened #21520 temporal types I think it is very straightforward and contained PR and only left non-arithmetic supporting changes here. |
asolimando
left a comment
There was a problem hiding this comment.
LGTM, thanks for addressing all the comments!
Which issue does this PR close?
Rationale for this change
In the case of equality we can say ndv is 1 for non-numeric types as well. Please check issue for detailed explanation
What changes are included in this PR?
Adding a type-agnostic pattern-matching layer that inspects the predicate tree for col = literal shapes and sets NDV=Exact(1) regardless of column type
Are these changes tested?
Yes existing and additional unit tests.
Are there any user-facing changes?
No