Reduce repetition in `try_process_group_by_unnest` and `try_process_unnest` #11498

alamb · 2024-07-16T20:29:03Z

This function seems to repeat some of the code in `try_process_unnest`.

I wonder if there's a better way to handle this, such as planning unnest before aggregation, and then reusing the current group-by planning logic. This seems more intuitive to me. But I'm not sure about it.

Originally posted by @jonahgao in #11469 (comment)

The idea is to remove the repetition, and possibly also address this comment: https://github.com/apache/datafusion/pull/11469/files#r1678995060

If unnest has already been processed by try_process_aggregate_unnest, does the following logic for handling unnest become redundant?

JasonLi-cn · 2024-07-23T01:39:18Z

take

JasonLi-cn · 2024-07-30T06:39:24Z

Testing shows that the following logic is still necessary even when the unnest has already been processed by `try_process_aggregate_unnest`. That function handles unnest in the Aggregate's input, whereas the logic below handles unnest in `select_exprs`, which is downstream of the Aggregate. So we still need this piece of logic:

datafusion/datafusion/sql/src/select.rs

Lines 311 to 344 in 2f5e73c

    
    let mut unnest_columns = vec![];
    // from which column used for projection, before the unnest happen
    // including non unnest column and unnest column
    let mut inner_projection_exprs = vec![];
    // expr returned here maybe different from the originals in inner_projection_exprs
    // for example:
    // - unnest(struct_col) will be transformed into unnest(struct_col).field1, unnest(struct_col).field2
    // - unnest(array_col) will be transformed into unnest(array_col).element
    // - unnest(array_col) + 1 will be transformed into unnest(array_col).element +1
    let outer_projection_exprs: Vec<Expr> = intermediate_select_exprs
        .iter()
        .map(|expr| {
            transform_bottom_unnest(
                &intermediate_plan,
                &mut unnest_columns,
                &mut inner_projection_exprs,
                expr,
            )
        })
        .collect::<Result<Vec<_>>>()?
        .into_iter()
        .flatten()
        .collect();
    // No more unnest is possible
    if unnest_columns.is_empty() {
        // The original expr does not contain any unnest
        if i == 0 {
            return LogicalPlanBuilder::from(intermediate_plan)
                .project(inner_projection_exprs)?
                .build();
        }
        break;
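
To illustrate why both passes are needed, here is a rough toy model of the two places an unnest can show up relative to the Aggregate. Everything in it (`Expr`, `Query`, `Step`, `plan_steps`, the `col`/`unnest` helpers) is hypothetical and made up for this sketch; it is not DataFusion's planner or its real types, only a simplified picture of the distinction described above.

```rust
// Toy model only: these types and functions are hypothetical and are
// NOT DataFusion's real `Expr` / planner API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Unnest(Box<Expr>),
}

// Small constructors to keep the examples readable.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn unnest(inner: Expr) -> Expr {
    Expr::Unnest(Box::new(inner))
}

/// Returns true if the expression tree contains an unnest anywhere.
fn contains_unnest(e: &Expr) -> bool {
    match e {
        Expr::Column(_) => false,
        Expr::Unnest(_) => true,
    }
}

/// A simplified query: only the SELECT list and the GROUP BY keys.
struct Query {
    select: Vec<Expr>,
    group_by: Vec<Expr>,
}

#[derive(Debug, PartialEq)]
enum Step {
    /// Unnest that appears in a GROUP BY key, i.e. in the Aggregate's input.
    UnnestBeforeAggregate,
    Aggregate,
    /// Unnest in the SELECT list that is not itself a GROUP BY key,
    /// i.e. it sits downstream of the Aggregate.
    UnnestAfterAggregate,
}

/// Decide which (toy) planning steps a query needs.
fn plan_steps(q: &Query) -> Vec<Step> {
    let mut steps = Vec::new();
    if q.group_by.iter().any(contains_unnest) {
        steps.push(Step::UnnestBeforeAggregate);
    }
    if !q.group_by.is_empty() {
        steps.push(Step::Aggregate);
    }
    // A select expression that is already a GROUP BY key was handled before
    // the Aggregate; any other unnest still has to be handled afterwards.
    let has_downstream_unnest = q
        .select
        .iter()
        .any(|e| !q.group_by.contains(e) && contains_unnest(e));
    if has_downstream_unnest {
        steps.push(Step::UnnestAfterAggregate);
    }
    steps
}
```

In this toy model, a query shaped like `SELECT unnest(a) ... GROUP BY unnest(a)` only needs the step before the Aggregate, while one shaped like `SELECT a, unnest(b) ... GROUP BY a, b` only needs the step after it, and a query can need both. That is the sense in which handling unnest in the Aggregate's input does not make the `select_exprs` logic redundant.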

alamb mentioned this issue Jul 16, 2024

feat: support unnest in GROUP BY clause #11469

Merged

github-actions bot assigned JasonLi-cn Jul 23, 2024

JasonLi-cn mentioned this issue Jul 30, 2024

Reduce repetition in try_process_group_by_unnest and try_process_unnest #11714

Merged

alamb closed this as completed in #11714 Jul 31, 2024
