Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce repetition in try_process_group_by_unnest and try_process_unnest #11498

Closed
alamb opened this issue Jul 16, 2024 · 2 comments · Fixed by #11714
Closed

Reduce repetition in try_process_group_by_unnest and try_process_unnest #11498

alamb opened this issue Jul 16, 2024 · 2 comments · Fixed by #11714
Assignees

Comments

@alamb
Copy link
Contributor

alamb commented Jul 16, 2024

This function seems to have some code repetition with function try_process_unnest.

I wonder if there's a better way to handle this, such as planning unnest before aggregation, and then reusing the current group-by planning logic. This seems more intuitive to me. But I'm not sure about it.

Originally posted by @jonahgao in #11469 (comment)

The idea is to remove the repetition, and possibly also this comment as well: https://github.com/apache/datafusion/pull/11469/files#r1678995060

If unnest has already been processed by try_process_aggregate_unnest, does the following logic for handling unnest become redundant?

@alamb alamb changed the title This function seems to have some code repetition with function [try_process_unnest](https://github.com/apache/datafusion/blob/f204869ff55bb3e39cf23fc0a34ebd5021e6773f/datafusion/sql/src/select.rs#L295). Reduce repetition in try_process_group_by_unnest and try_process_unnest Jul 16, 2024
@JasonLi-cn
Copy link
Contributor

take

@JasonLi-cn
Copy link
Contributor

JasonLi-cn commented Jul 30, 2024

After testing, it is found that the following logic is necessary even if the unnest has been processed by try_process_aggregate_unnest. Because try_process_aggregate_unnest deals with unnest in the Aggregate's input, and the following logic deals with unnest in select_exprs, which is downstream of the Aggregate. So we still need this piece of logic:

let mut unnest_columns = vec![];
// from which column used for projection, before the unnest happen
// including non unnest column and unnest column
let mut inner_projection_exprs = vec![];
// expr returned here maybe different from the originals in inner_projection_exprs
// for example:
// - unnest(struct_col) will be transformed into unnest(struct_col).field1, unnest(struct_col).field2
// - unnest(array_col) will be transformed into unnest(array_col).element
// - unnest(array_col) + 1 will be transformed into unnest(array_col).element +1
let outer_projection_exprs: Vec<Expr> = intermediate_select_exprs
.iter()
.map(|expr| {
transform_bottom_unnest(
&intermediate_plan,
&mut unnest_columns,
&mut inner_projection_exprs,
expr,
)
})
.collect::<Result<Vec<_>>>()?
.into_iter()
.flatten()
.collect();
// No more unnest is possible
if unnest_columns.is_empty() {
// The original expr does not contain any unnest
if i == 0 {
return LogicalPlanBuilder::from(intermediate_plan)
.project(inner_projection_exprs)?
.build();
}
break;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants