Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make alias_symbol more human-readable #10280

Closed
JasonLi-cn opened this issue Apr 29, 2024 · 3 comments · Fixed by #10333 or #10939
Closed

Make alias_symbol more human-readable #10280

JasonLi-cn opened this issue Apr 29, 2024 · 3 comments · Fixed by #10333 or #10939
Labels
enhancement New feature or request

Comments

@JasonLi-cn
Copy link
Contributor

JasonLi-cn commented Apr 29, 2024

Is your feature request related to a problem or challenge?

DataFusion CLI v37.1.0
> select * from number;
+----+----+----+
| c0 | c1 | c2 |
+----+----+----+
| 1  | 2  | 3  |
+----+----+----+
1 row(s) fetched.
Elapsed 0.005 seconds.

> explain select c0 + 1, count(c0 + 1) from number group by c0 + 1;
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                              |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[CAST(number.c0 AS Int64) + Int64(1)Int64(1)CAST(number.c0 AS Int64)number.c0 AS number.c0 + Int64(1)]], aggr=[[COUNT(CAST(number.c0 AS Int64) + Int64(1)Int64(1)CAST(number.c0 AS Int64)number.c0 AS number.c0 + Int64(1))]] |
|               |   Projection: CAST(number.c0 AS Int64) + Int64(1) AS CAST(number.c0 AS Int64) + Int64(1)Int64(1)CAST(number.c0 AS Int64)number.c0                                                                                                                 |
|               |     TableScan: number projection=[c0]                                                                                                                                                                                                             |
| physical_plan | AggregateExec: mode=FinalPartitioned, gby=[number.c0 + Int64(1)@0 as number.c0 + Int64(1)], aggr=[COUNT(number.c0 + Int64(1))]                                                                                                                    |
|               |   CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                     |
|               |     RepartitionExec: partitioning=Hash([number.c0 + Int64(1)@0], 8), input_partitions=1                                                                                                                                                           |
|               |       AggregateExec: mode=Partial, gby=[CAST(number.c0 AS Int64) + Int64(1)Int64(1)CAST(number.c0 AS Int64)number.c0@0 as number.c0 + Int64(1)], aggr=[COUNT(number.c0 + Int64(1))]                                                               |
|               |         ProjectionExec: expr=[CAST(c0@0 AS Int64) + 1 as CAST(number.c0 AS Int64) + Int64(1)Int64(1)CAST(number.c0 AS Int64)number.c0]                                                                                                            |
|               |           MemoryExec: partitions=1, partition_sizes=[1]                                                                                                                                                                                           |
|               |                                                                                                                                                                                                                                                   |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.015 seconds.
ProjectionExec: expr=[CAST(c0@0 AS Int64) + 1 as CAST(number.c0 AS Int64) + Int64(1)Int64(1)CAST(number.c0 AS Int64)number.c0]                                                                                                            |

The expr of ProjectionExec is confusing.

The related code:

fn f_up(&mut self, expr: &Expr) -> Result<TreeNodeRecursion> {
let (_idx, sub_expr_identifier) = self.pop_enter_mark();
// skip exprs should not be recognize.
if self.expr_mask.ignores(expr) {
let curr_expr_identifier = ExprSet::expr_identifier(expr);
self.visit_stack
.push(VisitRecord::ExprItem(curr_expr_identifier));
return Ok(TreeNodeRecursion::Continue);
}
let curr_expr_identifier = ExprSet::expr_identifier(expr);
let alias_symbol = format!("{curr_expr_identifier}{sub_expr_identifier}");

Maybe we should make it more human-readable.

Describe the solution you'd like

ProjectionExec: expr=[CAST(c0@0 AS Int64) + 1 as CAST(number.c0 AS Int64) + Int64(1)]                                                                                                            

or

ProjectionExec: expr=[CAST(c0@0 AS Int64) + 1 as CAST(number.c0 AS Int64) + Int64(1)<$SUB_IDEN...>]                                                                                              

Describe alternatives you've considered

No response

Additional context

No response

@MohamedAbdeen21
Copy link
Contributor

MohamedAbdeen21 commented May 3, 2024

Hey, this optimizer rule is used by so many test cases, so I wanted to collect opinions before proceeding with implementing the change.

My suggestion

Using let alias_symbol = format!("${{{curr_expr_id}}}"); to mark common expressions. So the query above would be:

 AggregateExec: mode=Partial, gby=[${CAST(number.c0 AS Int64) + Int64(1)}@0 as number.c0 + Int64(1)], aggr=[COUNT(number.c0 + Int64(1))] 
    ProjectionExec: expr=[CAST(c0@0 AS Int64) + 1 as ${CAST(number.c0 AS Int64) + Int64(1)}] 

Or to make plans shorter, $0, $1, etc...

Some naming alternatives I've considered

  • Using @0, @1, etc ... However, the @ symbol is already used in the physical plan (check the query above), which means that you may have columns like @0@0 which makes no sense.

  • let alias_symbol = format!("{curr_expr_id}").replace(" ", "_");. Sound like a good idea until you see it.

AggregateExec: mode=Partial, gby=[CAST(number.c0_AS_Int64)_+_Int64(1)@0 as number.c0 + Int64(1)], aggr=[COUNT(number.c0 + Int64(1))] 
    ProjectionExec: expr=[CAST(c0@0 AS Int64) + 1 as CAST(number.c0_AS_Int64)_+_Int64(1)

It's not immediately obvious + what happens if the column doesn't have a space?

Please feel free to share other alternatives.

@alamb
Copy link
Contributor

alamb commented May 6, 2024

#10333 is a really nice PR 💯

@alamb
Copy link
Contributor

alamb commented May 9, 2024

We unfortunately found issues in #10333 so we are going to revert it in #10436

See discussion here: #10436 (review)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
3 participants