[GLUTEN-10636][VL]Use backend validation to find all unsupported expression#10637
Conversation
|
Run Gluten Clickhouse CI on x86 |
|
@jinchengchenghh @WangGuangxin @zhztheplayer Do you think it is a reasonable way to extend |
4997a7d to
ac5a794
Compare
|
Run Gluten Clickhouse CI on x86 |
ac5a794 to
75518fa
Compare
|
Run Gluten Clickhouse CI on x86 |
75518fa to
e859240
Compare
|
Run Gluten Clickhouse CI on x86 |
|
I agree we should prefer PartialProject over vanilla Spark project in almost all cases where projects fall back. |
I think so. In this PR, I use validation to find the expressions that backend doesn't support, do you think it is a reasonable way? |
e859240 to
a16cb17
Compare
|
Run Gluten Clickhouse CI on x86 |
a16cb17 to
e5f4e2d
Compare
|
Run Gluten Clickhouse CI on x86 |
e5f4e2d to
9d7475e
Compare
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
9468a3c to
d5dccbc
Compare
|
Run Gluten Clickhouse CI on x86 |
d5dccbc to
4f58ad3
Compare
|
Run Gluten Clickhouse CI on x86 |
4f58ad3 to
fc3d30e
Compare
|
Run Gluten Clickhouse CI on x86 |
|
Kindly ping @jinchengchenghh @WangGuangxin |
jinchengchenghh
left a comment
There was a problem hiding this comment.
Thanks for your enhancement.
This looks like a bit complicated, could we reuse replaceWithExpressionTransformer to validate?
| case class ColumnarPartialProjectExec( | ||
| projectList: Seq[Expression], | ||
| child: SparkPlan, | ||
| replacedAlias: Seq[Alias]) |
There was a problem hiding this comment.
Why you move replacedAlias to argument list?
There was a problem hiding this comment.
Because after this PR, SubQuery may be replaced. If replacedAlias is not put into the first argument list, then prepareSubqueries can't find the subquery in replacedAlias. So I put it in the first argument list.
There was a problem hiding this comment.
The Alias is unique, with this change, the exprId is different, and the alias origins from projectList, the subquery should be found from it.
There was a problem hiding this comment.
The Alias is unique, with this change, the exprId is different
Sorry, I can't understand this. Do you mean that the Alias's exprId is different from the original exprId?
the alias origins from projectList, the subquery should be found from it.
I get it. Thanks!
ea63e28 to
fde57c7
Compare
|
Run Gluten Clickhouse CI on x86 |
2 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
ed6e63c to
78af02c
Compare
|
Run Gluten Clickhouse CI on x86 |
78af02c to
fa50849
Compare
|
Run Gluten Clickhouse CI on x86 |
3 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
a0ef2c0 to
84919be
Compare
|
Run Gluten Clickhouse CI on x86 |
|
good enhancement! |
84919be to
53ec176
Compare
|
Run Gluten Clickhouse CI on x86 |
|
@jinchengchenghh Please take a look again. It seems that the CI failure has nothing to do with this PR. Thanks! |
jinchengchenghh
left a comment
There was a problem hiding this comment.
Thanks for your enhancement!
What changes are proposed in this pull request?
Now PartialProject can't be applied to the expressions that are unsupported by native backend, so maybe we can use native validation to find all unsupported expressions and apply PartialProject to them.
This PR uses the following method to find all unsupported expressions: Traverse the expression trees in post-order. For each tree node, if this expression is not offload-able, then we replace it by an
Alias.For example, if the expression is
func4(func3(func2(func1()))), then we first determine whetherfunc1()is offload-able. Second, we determine whetherfunc2()is offload-able and so on.After this PR, expressions like
map_from_arraysthat hasn't been supported by velox can be calculated usingPartialProjectExec.This PR also adds a config named
spark.gluten.sql.columnar.partial.validationto control whether to use native validation to find all unsupported expressions.How was this patch tested?
unit test.
Related issue: #10636