Problem
Queries using the SQL standard FILTER (WHERE ...) clause on aggregate functions currently fall back to non-native Spark execution. For example:
SELECT SUM(i) FILTER (WHERE flag = true) FROM t
This prevents users from getting the performance benefits of native execution for a commonly used SQL pattern.
Expected Behavior
FILTER (WHERE ...) on aggregates should be natively executed without falling back to Spark.
Current Behavior
The native converter throws an assertion error and falls back to HashAggregateExec.
Use Cases
- Conditional aggregation in a single pass (e.g., sum of revenue where status = 'completed')
- Multiple aggregates with different predicates in the same SELECT
- Standard SQL compatibility
Example Query
SELECT
SUM(i) FILTER (WHERE flag = true) AS sum_true,
SUM(i) FILTER (WHERE flag = false) AS sum_false,
AVG(d) FILTER (WHERE j > 20) AS avg_d,
COUNT(*) FILTER (WHERE i IS NULL) AS cnt_null
FROM test_agg_filter
Problem
Queries using the SQL standard
FILTER (WHERE ...)clause on aggregate functions currently fall back to non-native Spark execution. For example:This prevents users from getting the performance benefits of native execution for a commonly used SQL pattern.
Expected Behavior
FILTER (WHERE ...)on aggregates should be natively executed without falling back to Spark.Current Behavior
The native converter throws an assertion error and falls back to
HashAggregateExec.Use Cases
Example Query