test: Add test for computing min/max values for expresions#21681
test: Add test for computing min/max values for expresions#21681alamb wants to merge 1 commit intoapache:mainfrom
Conversation
| STORED AS PARQUET | ||
| LOCATION '../core/tests/data/clickbench_hits_10.parquet'; | ||
|
|
||
| query II |
There was a problem hiding this comment.
This is a test for a a query that is optimized to used statistics in https://github.com/apache/datafusion/pull/21651/changes
On main, DataFusion computes delta from the actual rows and then takes the real min/max,
On #21651, the new logic propagates exact column min/max through the projection using interval arithmetic, then aggregate_statistics treats those derived bounds as exact aggregate answers and replaces the whole aggregate with literals.
For UserID - ClientIP, the interval formed from independent column extrema is wider than the set of values that actually occur in the data, because the min UserID is not paired with the max ClientIP in the same row, and similarly for the max side. The optimizer therefore folds to unattainable values:
- wrong folded min: -2461439047704734435
- wrong folded max: 7418527521343057109
Which issue does this PR close?
Rationale for this change
This test shows a bug in #21651
The test passes on main but fails on that PR like this
What changes are included in this PR?
Add a new test
Are these changes tested?
They are all tests
Are there any user-facing changes?
No