30 changes: 30 additions & 0 deletions datafusion/sqllogictest/test_files/aggregate.slt
@@ -8951,3 +8951,33 @@ GROUP BY id ORDER BY id;

statement ok
DROP TABLE first_last_value_str_tests;

# Regression test for incorrect MIN/MAX folding from projected expression
# statistics. The PR branch `aggregate-stats-single-mode-and-cast` rewrites this
# query to unattainable literals using parquet min/max envelopes for UserID and
# ClientIP.
statement ok
SET datafusion.execution.target_partitions = 1;

statement ok
CREATE EXTERNAL TABLE hits_raw
STORED AS PARQUET
LOCATION '../core/tests/data/clickbench_hits_10.parquet';

query II
Contributor Author


This is a test for a query that is optimized to use statistics in https://github.com/apache/datafusion/pull/21651/changes

On main, DataFusion computes delta from the actual rows and then takes the real min/max.

On #21651, the new logic propagates exact column min/max through the projection using interval arithmetic; `aggregate_statistics` then treats those derived bounds as exact aggregate answers and replaces the whole aggregate with literals.

For UserID - ClientIP, the interval formed from independent column extrema is wider than the set of values that actually occur in the data: the min UserID is not paired with the max ClientIP in the same row, and likewise the max UserID is not paired with the min ClientIP. The derived bounds are valid as an envelope but are not attained by any row, so the optimizer folds to unattainable values:

  • wrong folded min: -2461439047704734435
  • wrong folded max: 7418527521343057109
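
To illustrate the effect on a small scale, here is a rough Python sketch. The three rows are hypothetical and are not taken from `clickbench_hits_10.parquet`; the point is only that the bounds derived from per-column extrema enclose the true MIN/MAX of `a - b` without matching them.

```python
# Hypothetical (user_id, client_ip) rows, chosen for illustration only.
rows = [(10, 7), (50, 40), (30, 1)]

user_ids = [u for u, _ in rows]
client_ips = [c for _, c in rows]

# Row-by-row evaluation: the true MIN/MAX of delta.
deltas = [u - c for u, c in rows]
actual = (min(deltas), max(deltas))

# Interval arithmetic on independent column extrema for a - b:
# [min(a) - max(b), max(a) - min(b)].
interval = (
    min(user_ids) - max(client_ips),
    max(user_ids) - min(client_ips),
)

print("actual:  ", actual)    # (3, 29)
print("interval:", interval)  # (-30, 49)
```

The interval is a sound envelope (it contains every real delta), but neither endpoint occurs in the data, so it cannot be used as an exact aggregate result.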

SELECT MIN(delta), MAX(delta)
FROM (
SELECT "UserID" - CAST("ClientIP" AS BIGINT) AS delta
FROM hits_raw
);
----
-2461439044872611287 7418527518698834918

statement ok
SET datafusion.execution.target_partitions = 4;

statement ok
RESET datafusion.catalog.create_default_catalog_and_schema;

statement ok
DROP TABLE hits_raw;