DataFrame API: allow aggregate functions in select() (#17874) #21021

martin-g · 2026-03-23T13:08:43Z

Could you please add some test case(s) with window functions too?

martin-g · 2026-03-23T13:30:37Z

`expressions` is filtered above to contain only `Expression` items.
What if the original was SELECT *, count(col("a")) ... ?
The wildcard would have been dropped above, and here has_non_aggregate_expr would be false.

Please add an aggregate function to the test at https://github.com/apache/datafusion/pull/21021/changes#diff-4a599584dfc900ec21169f4f820a1b1db46b004b77533dab83a6178d5d3a467eR6909
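
To make the concern concrete, here is a minimal sketch of how a check that only sees `Expression` items can miss a wildcard. The `Expr` and `SelectExpr` enums below are toy stand-ins written for this comment, not DataFusion's real types, and both helper functions are hypothetical; only the name `has_non_aggregate_expr` comes from the PR.

```rust
// Toy stand-ins for illustration only; these are NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Count(Box<Expr>), // stands in for any aggregate function
}

#[derive(Debug, Clone, PartialEq)]
enum SelectExpr {
    Wildcard,
    Expression(Expr),
}

fn is_aggregate(expr: &Expr) -> bool {
    matches!(expr, Expr::Count(_))
}

/// Mimics a check that runs after wildcards were filtered out:
/// a wildcard never counts as a non-aggregate expression.
fn has_non_aggregate_expr_filtered(items: &[SelectExpr]) -> bool {
    items.iter().any(|item| match item {
        SelectExpr::Expression(e) => !is_aggregate(e),
        SelectExpr::Wildcard => false, // silently ignored
    })
}

/// Alternative (an assumption, not the PR's code): treat a wildcard as
/// non-aggregate, because it expands to plain columns.
fn has_non_aggregate_expr_with_wildcard(items: &[SelectExpr]) -> bool {
    items.iter().any(|item| match item {
        SelectExpr::Expression(e) => !is_aggregate(e),
        SelectExpr::Wildcard => true,
    })
}

/// Builds the toy equivalent of `SELECT *, count(a)`.
fn wildcard_plus_count() -> Vec<SelectExpr> {
    vec![
        SelectExpr::Wildcard,
        SelectExpr::Expression(Expr::Count(Box::new(Expr::Column("a".to_string())))),
    ]
}
```

For the input built by `wildcard_plus_count()`, the first check reports no non-aggregate expression while the second one does, which is the case a test with a wildcard plus an aggregate would exercise.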

martin-g · 2026-03-23T13:17:10Z

Suggested change

- let rewritten = rewrite_expr(expr.clone(), &aggr_map)?;
- let alias = expr.name_for_alias()?;
- rewritten_exprs.push(SelectExpr::Expression(rewritten.alias(alias)));
+ let alias = expr.name_for_alias()?;
+ let rewritten = rewrite_expr(expr, &aggr_map)?;
+ let final_expr = match &rewritten {
+     Expr::Alias(_) => rewritten,
+     _ => rewritten.alias(alias),
+ };
+ rewritten_exprs.push(SelectExpr::Expression(final_expr));

Only add an alias if the rewritten expression doesn't already have one.
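
A self-contained sketch of the same idea, in case it helps to see it in isolation. The `Expr` enum and the `alias_if_needed` helper are hypothetical, simplified stand-ins (the real `Expr::Alias` wraps a struct, not a tuple); the point is only that an already-aliased expression is returned unchanged instead of being wrapped a second time.

```rust
// Toy expression type for illustration; NOT DataFusion's real `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
}

impl Expr {
    /// Wraps `self` in an alias with the given name.
    fn alias(self, name: &str) -> Expr {
        Expr::Alias(Box::new(self), name.to_string())
    }
}

/// Hypothetical helper: adds `alias` only when `rewritten` is not
/// already an alias, so a user-provided alias is never nested.
fn alias_if_needed(rewritten: Expr, alias: &str) -> Expr {
    match rewritten {
        Expr::Alias(..) => rewritten,
        other => other.alias(alias),
    }
}
```

With this shape, a plain column gets the fallback alias, while an expression the user aliased as `x` keeps `x` as its only alias.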

martin-g · 2026-03-23T13:21:43Z

Suggested change

count(col("c9")).alias("count_c9_2"),

count(col("c9")),

count(col("c9")),

Let's remove the manual aliases here and assert that the logic at https://github.com/apache/datafusion/pull/21021/changes#diff-997707d7dfcac94032b84a25bc0010c62209bf767e3abc6580a55a0a97c19de2R498 generates unique aliases.
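
One possible shape for that assertion, as a sketch: collect the output field names and check that none repeats. The `first_duplicate` helper is hypothetical, and plain string slices stand in for the names that the real test would read from the result schema; the names used with it here are made up and say nothing about what the PR actually generates.

```rust
use std::collections::HashSet;

/// Hypothetical test helper: returns the first name that appears more
/// than once in `names`, or `None` when all names are unique.
fn first_duplicate<'a>(names: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already present.
    names.iter().copied().find(|name| !seen.insert(*name))
}
```

In the test this could be used as `assert_eq!(first_duplicate(&field_names), None)`, which fails with the offending name in the message if two output columns collide.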

martin-g · 2026-03-23T13:18:59Z

Suggested change

- assert_eq!(batches[0].num_rows(), 100);
- assert_eq!(batches[0].num_columns(), 14);
+ assert!(!batches.is_empty());
+ assert_eq!(batches.iter().map(|b| b.num_rows()).sum::<usize>(), 100);
+ assert!(batches.iter().all(|b| b.num_columns() == 14));
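
The reason for this suggestion, sketched with a toy type: the result may arrive split across several batches, so asserting on `batches[0]` alone is fragile. The `Batch` struct and both helpers below are hypothetical stand-ins, not Arrow's `RecordBatch` API, and the 60/40 split is an invented example.

```rust
/// Toy stand-in for a record batch; NOT Arrow's `RecordBatch`.
struct Batch {
    rows: usize,
    columns: usize,
}

impl Batch {
    fn num_rows(&self) -> usize {
        self.rows
    }

    fn num_columns(&self) -> usize {
        self.columns
    }
}

/// Sums the row counts over every batch in the result.
fn total_rows(batches: &[Batch]) -> usize {
    batches.iter().map(|b| b.num_rows()).sum()
}

/// Checks that every batch has the expected column count.
/// Returns true for an empty slice, hence the separate emptiness check.
fn all_have_columns(batches: &[Batch], expected: usize) -> bool {
    batches.iter().all(|b| b.num_columns() == expected)
}
```

If 100 rows come back as one batch of 60 and one of 40, `batches[0].num_rows()` is 60 and the original assertion fails, while the summed version still sees 100.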

martin-g · 2026-03-23T13:19:19Z

Suggested change

- assert_eq!(batches[0].num_rows(), 100);
- assert_eq!(batches[0].num_columns(), 14);
+ assert!(!batches.is_empty());
+ assert_eq!(batches.iter().map(|b| b.num_rows()).sum::<usize>(), 100);
+ assert!(batches.iter().all(|b| b.num_columns() == 14));
