HIVE-29479: Improve histogram-based selectivity estimation for two-sided range predicates by rubenada · Pull Request #6477 · apache/hive

rubenada · 2026-05-12T09:36:16Z

What changes were proposed in this pull request?

This PR adapts FilterSelectivityEstimator so that histogram statistics are used for all types of range predicates (until now, they were only used for single-sided range predicates and bounded ranges).
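To make the estimation idea concrete, here is a minimal sketch of computing a range predicate's selectivity from a cumulative distribution. One code path covers single-sided, two-sided, open and closed bounds. This is an illustration under assumptions: SampleHistogram is a hypothetical stand-in that approximates the histogram with a sorted sample of non-null values. It is not the statistics object FilterSelectivityEstimator actually consumes.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for a column histogram: a sorted sample of the column's non-null values.
 * Not Hive's statistics API; it only exposes the CDF-style lookups a range estimate needs.
 */
final class SampleHistogram {
  private final double[] sorted;

  SampleHistogram(double... values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("sample must not be empty");
    }
    this.sorted = values.clone();
    Arrays.sort(this.sorted);
  }

  /** Fraction of sampled values strictly less than v. */
  double fractionLessThan(double v) {
    return countBelow(v, false) / (double) sorted.length;
  }

  /** Fraction of sampled values less than or equal to v. */
  double fractionAtMost(double v) {
    return countBelow(v, true) / (double) sorted.length;
  }

  // Binary search: number of values < v (inclusive == false) or <= v (inclusive == true).
  private int countBelow(double v, boolean inclusive) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      boolean below = inclusive ? sorted[mid] <= v : sorted[mid] < v;
      if (below) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /**
   * Selectivity of "lower (< or <=) col (< or <=) upper". A null bound means "unbounded on that
   * side", so single-sided, two-sided and fully unbounded ranges share one code path.
   */
  double rangeSelectivity(Double lower, boolean lowerInclusive, Double upper, boolean upperInclusive) {
    // Values excluded below the range: "< lower" for a closed bound, "<= lower" for an open one.
    double excludedBelow = lower == null ? 0.0
        : (lowerInclusive ? fractionLessThan(lower) : fractionAtMost(lower));
    // Values up to the upper end: "<= upper" for a closed bound, "< upper" for an open one.
    double upToUpper = upper == null ? 1.0
        : (upperInclusive ? fractionAtMost(upper) : fractionLessThan(upper));
    // An empty range (lower above upper) yields 0 instead of a negative value.
    return Math.max(0.0, upToUpper - excludedBelow);
  }
}
```

For example, with a sample of the values 1 to 10, "3 <= col < 7" gives about 0.4 and "col > 8" gives about 0.2.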

Why are the changes needed?

This PR allows the CBO planner to use histogram statistics for all types of range predicates.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests added.


soumyakanti3578

Overall looks good to me, but maybe we should add tests for IS [NOT] NULL with SEARCH, and also a couple of tests for SEARCH containing both ranges and points?
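The two test scenarios suggested above can be pinned down with a small model. In this hypothetical sketch, a SEARCH argument is a list of disjoint intervals, points are degenerate closed intervals, and NULL rows count only when the search also matches NULL. SearchInterval and NullAwareSearch are illustrative names, not Hive or Calcite classes, and exact counting stands in for the histogram lookup, so the result can serve as a reference value in a test.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interval (not Calcite's or Guava's Range); a null bound means unbounded. */
final class SearchInterval {
  final Double lower;
  final boolean lowerInclusive;
  final Double upper;
  final boolean upperInclusive;

  SearchInterval(Double lower, boolean lowerInclusive, Double upper, boolean upperInclusive) {
    this.lower = lower;
    this.lowerInclusive = lowerInclusive;
    this.upper = upper;
    this.upperInclusive = upperInclusive;
  }

  /** A point "col = v" is the degenerate closed interval [v, v]. */
  static SearchInterval point(double v) {
    return new SearchInterval(v, true, v, true);
  }

  boolean contains(double x) {
    boolean aboveLower = lower == null || (lowerInclusive ? x >= lower : x > lower);
    boolean belowUpper = upper == null || (upperInclusive ? x <= upper : x < upper);
    return aboveLower && belowUpper;
  }
}

final class NullAwareSearch {
  private NullAwareSearch() {}

  /**
   * Selectivity of SEARCH(col, intervals) on a nullable column. The intervals are assumed pairwise
   * disjoint, so their selectivities over the non-null rows can be summed; NULL rows contribute
   * only when the search also matches NULL (e.g. "col IS NULL OR col IN (...)").
   */
  static double selectivity(Double[] column, List<SearchInterval> intervals, boolean nullsMatch) {
    if (column.length == 0) {
      return 0.0;
    }
    List<Double> nonNull = new ArrayList<>();
    for (Double v : column) {
      if (v != null) {
        nonNull.add(v);
      }
    }
    double nullFraction = (column.length - nonNull.size()) / (double) column.length;
    double nonNullSelectivity = 0.0;
    for (SearchInterval interval : intervals) {
      nonNullSelectivity += fraction(nonNull, interval);
    }
    return (nullsMatch ? nullFraction : 0.0) + (1.0 - nullFraction) * nonNullSelectivity;
  }

  // Exact fraction of non-null values inside one interval; an estimator would consult a histogram.
  private static double fraction(List<Double> values, SearchInterval interval) {
    if (values.isEmpty()) {
      return 0.0;
    }
    int hits = 0;
    for (double v : values) {
      if (interval.contains(v)) {
        hits++;
      }
    }
    return hits / (double) values.size();
  }
}
```

For ten rows with two NULLs, a SEARCH covering five of the eight non-null values yields 0.5 when NULLs do not match and 0.7 when they do.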

soumyakanti3578 · 2026-05-14T23:27:25Z

Please go through the Sonar issues as well. I think some of them are good suggestions. :)

thomasrebele

Thank you for the PR! The overall approach looks good. I've made a few suggestions and requests.

thomasrebele · 2026-05-15T16:05:05Z

+            rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, ref).accept(FilterSelectivityEstimator.this));
+      }
+
+      RangeSets.forEach(sarg.rangeSet, new RangeSets.Consumer<C>() {


Maybe the code could be simplified with for (Range<C> range : sarg.rangeSet.asRanges()) { ... }? It might be possible to treat each range as-is, without distinguishing between the different kinds of ranges.
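A minimal sketch of the suggested loop, assuming a continuous CDF: each range is handled as-is by checking whether it has a lower and/or an upper bound, so no per-kind branching is needed. LoopRange is a hypothetical stand-in whose accessor names mirror Guava's Range (hasLowerBound(), lowerEndpoint(), ...); real code would iterate the Range<C> objects from sarg.rangeSet.asRanges() and would also have to look at the bound types and handle points.

```java
import java.util.List;
import java.util.function.DoubleUnaryOperator;

/**
 * Hypothetical stand-in for a range; accessor names mirror Guava's Range, but this is not that
 * class. A null endpoint means the range is unbounded on that side.
 */
final class LoopRange {
  private final Double lower;
  private final Double upper;

  LoopRange(Double lower, Double upper) {
    this.lower = lower;
    this.upper = upper;
  }

  boolean hasLowerBound() { return lower != null; }
  boolean hasUpperBound() { return upper != null; }
  double lowerEndpoint() { return lower; }
  double upperEndpoint() { return upper; }
}

final class GenericRangeLoop {
  private GenericRangeLoop() {}

  /**
   * Sums the selectivity of disjoint ranges without distinguishing range kinds: a missing lower
   * bound maps to CDF 0 and a missing upper bound to CDF 1. Because a continuous CDF is assumed,
   * open vs. closed bounds are ignored, and a point [v, v] would get 0 in this sketch.
   */
  static double selectivity(List<LoopRange> ranges, DoubleUnaryOperator cdf) {
    double total = 0.0;
    for (LoopRange range : ranges) {
      double below = range.hasLowerBound() ? cdf.applyAsDouble(range.lowerEndpoint()) : 0.0;
      double upTo = range.hasUpperBound() ? cdf.applyAsDouble(range.upperEndpoint()) : 1.0;
      total += Math.max(0.0, upTo - below);
    }
    return Math.min(1.0, total);
  }
}
```

With a uniform CDF over [0, 100], the ranges "< 10", "(20, 30)" and "> 90" add up to about 0.3.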

rubenada · May 15, 2026

Good point! I have applied the suggested refactoring.

soumyakanti3578 · May 18, 2026

While I agree that the refactored code is much smaller and simpler now, I feel the previous version was more organized and readable: the new code is one large method with multiple if/else blocks.

Moreover, I see that implementing RangeSets.Consumer<C> is the preferred approach both in Hive (org.apache.hadoop.hive.ql.optimizer.calcite.RangeConverter) and in Calcite (several places). If the new code is not significantly more performant than the earlier version, maybe we should keep things familiar?

Another small benefit of implementing RangeSets.Consumer<C> is that the implementations are easy to find from the IDE by searching for all subclasses.

BTW, I am willing to approve this as-is, but I wanted to hear both of your thoughts on this.
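For comparison, a sketch of the Consumer style: one callback per range kind, so each case is explicit and every implementation can be found as a subclass. RangeKindVisitor is a hypothetical interface loosely modeled on the shape of Calcite's RangeSets.Consumer (it is not that type); SelectivityVisitor, its fixed per-value estimate, and the continuous CDF are illustrative assumptions.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical visitor with one callback per range kind; not Calcite's RangeSets.Consumer. */
interface RangeKindVisitor {
  void all();
  void atLeast(double lower);
  void atMost(double upper);
  void greaterThan(double lower);
  void lessThan(double upper);
  void singleton(double value);
  void closed(double lower, double upper);
  void closedOpen(double lower, double upper);
  void openClosed(double lower, double upper);
  void open(double lower, double upper);
}

/** Accumulates the selectivity of disjoint ranges, with a dedicated handler per range kind. */
final class SelectivityVisitor implements RangeKindVisitor {
  private final DoubleUnaryOperator cdf;   // assumed continuous CDF of the non-null values
  private final double pointSelectivity;   // hypothetical per-value estimate, e.g. 1 / NDV
  private double total;

  SelectivityVisitor(DoubleUnaryOperator cdf, double pointSelectivity) {
    this.cdf = cdf;
    this.pointSelectivity = pointSelectivity;
  }

  /** Accumulated selectivity, capped at 1. */
  double total() {
    return Math.min(1.0, total);
  }

  private void between(double lower, double upper) {
    total += Math.max(0.0, cdf.applyAsDouble(upper) - cdf.applyAsDouble(lower));
  }

  @Override public void all() { total += 1.0; }
  @Override public void atLeast(double lower) { total += 1.0 - cdf.applyAsDouble(lower); }
  @Override public void atMost(double upper) { total += cdf.applyAsDouble(upper); }
  // With a continuous CDF, strict and non-strict bounds give the same estimate.
  @Override public void greaterThan(double lower) { atLeast(lower); }
  @Override public void lessThan(double upper) { atMost(upper); }
  @Override public void singleton(double value) { total += pointSelectivity; }
  @Override public void closed(double lower, double upper) { between(lower, upper); }
  @Override public void closedOpen(double lower, double upper) { between(lower, upper); }
  @Override public void openClosed(double lower, double upper) { between(lower, upper); }
  @Override public void open(double lower, double upper) { between(lower, upper); }
}
```

In Calcite, RangeSets.forEach decides which callback to invoke for each range; a caller of this sketch would invoke the callbacks directly. Under a continuous CDF several handlers collapse to the same computation, which illustrates the trade-off discussed in this thread: explicit per-kind structure versus a shorter generic loop.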

rubenada · May 19, 2026

I have no strong opinion here.
I agree that the original version with RangeSets.Consumer<C> seemed a bit better in terms of code homogeneity and maintainability (it is easier to find if we ever need to adjust all rangeSet processing, e.g. for the separate, upcoming HIVE-28911).
So I'm willing to move back to the Consumer approach, wdyt @thomasrebele?

thomasrebele · May 20, 2026

There are a few places in Calcite that iterate over sarg.rangeSet.asRanges() without the Consumer:

- RexUtil#sargRef
- DruidDateTimeUtils#leafToRanges

Where Calcite uses a RangeSets.Consumer<C>, there is an easy mapping from each range type to a distinct action. Hive always uses sarg.rangeSet with a RangeSets.Consumer; however, I could only find one usage, and it was introduced by @soumyakanti3578, so I'm not sure the opinion is unbiased :) All usages (including those through the Consumer) can be found in the IDE by looking at the usages of org.apache.calcite.util.Sarg#rangeSet.

I tried simplifying the code a bit; see thomasrebele@29cb98b. It's a bit less efficient than Ruben's proposal, but it might be a bit more readable.

sonarqubecloud · 2026-05-18T17:28:36Z

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

rubenada · 2026-05-18T20:11:23Z

@soumyakanti3578, @thomasrebele, thanks for the reviews. I think I have addressed all the remarks; feel free to take another look.

rubenada added 2 commits May 11, 2026 19:11

HIVE-29479: Improve histogram-based selectivity estimation for two-sided range predicates

75980cd

Tests

5164caf

asf-ci-hive added the tests pending label May 12, 2026

Tests

0d9a543

asf-ci-hive added tests unstable and removed tests pending labels May 12, 2026

Tests

294bf14

rubenada marked this pull request as ready for review May 12, 2026 13:44

rubenada changed the title from "[WIP] HIVE-29479: Improve histogram-based selectivity estimation for two-sided range predicates" to "HIVE-29479: Improve histogram-based selectivity estimation for two-sided range predicates" May 12, 2026

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels May 12, 2026

Adjust test files (cbo plan changes due to new stats feature)

fad09cf

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending labels May 14, 2026

soumyakanti3578 reviewed May 14, 2026


Comment thread on ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java (outdated)

Review, refactor, more tests

47366b3

asf-ci-hive added tests pending tests passed and removed tests passed tests pending labels May 15, 2026

thomasrebele suggested changes May 15, 2026


Refactoring

470bf75

asf-ci-hive added tests pending and removed tests passed tests pending labels May 15, 2026

asf-ci-hive added the tests unstable label May 15, 2026

minor

06f48c6

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels May 18, 2026

Adjust test

d7dee73

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels May 18, 2026

asf-ci-hive added tests passed and removed tests pending labels May 18, 2026

rubenada wants to merge 9 commits into apache:master from rubenada:HIVE-29479


4 participants
