
[GLUTEN-10636][VL] Use backend validation to find all unsupported expression #10637

Merged

jinchengchenghh merged 10 commits into apache:main from jiangjiangtian:find_unsupported_using_validation on Nov 11, 2025

@jiangjiangtian (Contributor) commented Sep 5, 2025

What changes are proposed in this pull request?

Currently, PartialProject can't be applied to expressions that the native backend doesn't support, so we could use native validation to find all unsupported expressions and apply PartialProject to them.

This PR finds all unsupported expressions as follows: traverse each expression tree in post-order, and replace every node that is not offloadable with an Alias.
For example, for func4(func3(func2(func1()))), we first check whether func1() is offloadable, then func2(), and so on up to func4().
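A minimal Scala sketch of the post-order search, under stated assumptions: it uses a toy expression type rather than Spark's Catalyst classes, and Func, AliasRef, PostOrderSketch, and the validator function are all hypothetical. The validator stands in for native validation of a single node, and AliasRef stands in for the Alias that replaces an unsupported expression. As a simplification, a recorded alias body may itself refer to earlier aliases; the PR's actual handling may differ.

```scala
// Toy expression tree (hypothetical, not Catalyst): a function call with
// children, or a reference to an alias computed outside the native backend.
sealed trait Expr
final case class Func(name: String, children: List[Expr]) extends Expr
final case class AliasRef(id: Int) extends Expr

object PostOrderSketch {
  // Hypothetical stand-in for native backend validation of a single node.
  type Validator = Func => Boolean

  /** Rewrites `root` in post-order: each node's children are rewritten before
    * the node itself is validated. A rejected node is replaced by an AliasRef,
    * and its (already rewritten) body is recorded in `aliases`. */
  def rewrite(root: Expr, isOffloadable: Validator): (Expr, List[(Int, Expr)]) = {
    var aliases = Vector.empty[(Int, Expr)]

    def go(expr: Expr): Expr = expr match {
      case ref: AliasRef => ref
      case Func(name, children) =>
        val node = Func(name, children.map(go)) // children first (post-order)
        if (isOffloadable(node)) node
        else {
          val id = aliases.size
          aliases = aliases :+ (id -> node)
          AliasRef(id)
        }
    }

    val rewritten = go(root)
    (rewritten, aliases.toList)
  }

  def main(args: Array[String]): Unit = {
    val expr = Func("func4", List(Func("func3", List(Func("func2", List(Func("func1", Nil)))))))
    val visited = scala.collection.mutable.ListBuffer.empty[String]
    // Pretend func2 (e.g. map_from_arrays) is the only unsupported function.
    val validator: Validator = f => { visited += f.name; f.name != "func2" }
    val (rewritten, aliases) = rewrite(expr, validator)
    println(visited.mkString(" -> ")) // func1 -> func2 -> func3 -> func4
    println(rewritten)
    println(aliases)
  }
}
```

In this sketch, a parent such as func3() is validated with its unsupported child already replaced by an alias reference, so only the nodes the validator actually rejects are pulled out of the native expression.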

With this PR, expressions that Velox doesn't support yet, such as map_from_arrays, can be computed by ColumnarPartialProjectExec.
This PR also adds a config, spark.gluten.sql.columnar.partial.validation, that controls whether native validation is used to find unsupported expressions.

How was this patch tested?

Unit tests.

Related issue: #10636

@github-actions added the CORE, VELOX, and DOCS labels on Sep 5, 2025
@github-actions (Bot) commented Sep 5, 2025

#10636

@github-actions (Bot) commented Sep 5, 2025

Run Gluten Clickhouse CI on x86

@jiangjiangtian (Contributor Author)

@jinchengchenghh @WangGuangxin @zhztheplayer Do you think this is a reasonable way to extend PartialProject to all unsupported expressions? Please take a look, thanks!

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from 4997a7d to ac5a794 on September 5, 2025 10:10

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from ac5a794 to 75518fa on September 5, 2025 11:47

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from 75518fa to e859240 on September 5, 2025 12:05

@zhztheplayer (Member)

I agree we should prefer PartialProject over vanilla Spark project in almost all cases where projects fall back.

@jiangjiangtian (Contributor Author)

> I agree we should prefer PartialProject over vanilla Spark project in almost all cases where projects fall back.

I think so. In this PR, I use validation to find the expressions that the backend doesn't support. Do you think that is a reasonable approach?

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from e859240 to a16cb17 on September 8, 2025 09:33

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from a16cb17 to e5f4e2d on September 12, 2025 08:33
@github-actions removed the DOCS label on Sep 12, 2025

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from e5f4e2d to 9d7475e on September 12, 2025 08:47

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from 9468a3c to d5dccbc on September 15, 2025 02:37

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from d5dccbc to 4f58ad3 on October 21, 2025 09:58

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from 4f58ad3 to fc3d30e on October 21, 2025 10:03

@zhztheplayer (Member)

Kindly ping @jinchengchenghh @WangGuangxin

@jinchengchenghh (Contributor) left a comment

Thanks for your enhancement.
This looks a bit complicated. Could we reuse replaceWithExpressionTransformer to validate?
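A minimal sketch of the idea of validating by reusing a conversion step, assuming a converter that throws when it meets an unsupported expression. ValidateByConversion, convert, and NotSupportedException are hypothetical names, not Gluten APIs, and the real replaceWithExpressionTransformer may report unsupported expressions differently.

```scala
import scala.util.Try

// Hypothetical sketch: validation as "does conversion succeed?".
// None of these names are Gluten APIs.
object ValidateByConversion {
  final class NotSupportedException(msg: String) extends RuntimeException(msg)

  // Toy converter: builds a "native" form, or throws for unsupported functions.
  def convert(funcName: String, unsupported: Set[String]): String =
    if (unsupported.contains(funcName))
      throw new NotSupportedException(s"$funcName is not supported natively")
    else s"native:$funcName"

  // Reusing the converter as the validator: a successful conversion means offloadable.
  def isOffloadable(funcName: String, unsupported: Set[String]): Boolean =
    Try(convert(funcName, unsupported)).isSuccess
}
```

One appeal of this design is that a single code path both builds the native form and decides support, so the validation result cannot drift from what the conversion actually handles.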

case class ColumnarPartialProjectExec(
    projectList: Seq[Expression],
    child: SparkPlan,
    replacedAlias: Seq[Alias])
Contributor

Why did you move replacedAlias into the first argument list?

Contributor Author

Because after this PR, a subquery may be replaced as well. If replacedAlias is not in the first argument list, prepareSubqueries can't find the subquery inside replacedAlias, so I moved it into the first argument list.
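A small Scala illustration of why the argument list matters, using a hypothetical Node class: a case class's productIterator, equals, and hashCode cover only the first parameter list. To our understanding, Spark's tree utilities (including QueryPlan.expressions, which prepareSubqueries walks) enumerate a node's fields through productIterator, which is also why TreeNode provides otherCopyArgs for extra parameter lists. A field kept in a second list is therefore invisible to that traversal.

```scala
// Hypothetical plan-like node: `first` lives in the first parameter list,
// `second` in a curried second parameter list.
final case class Node(first: Seq[String])(val second: Seq[String])

object ParamListDemo {
  // Enumerates fields the way productIterator-based tree utilities do.
  def visibleFields(p: Product): List[Any] = p.productIterator.toList

  def main(args: Array[String]): Unit = {
    val n = Node(Seq("projectExpr"))(Seq("subqueryExpr"))
    println(n.productArity) // 1: only `first` is a product element
    println(visibleFields(n).contains(Seq("subqueryExpr"))) // false
    // Equality also ignores the second parameter list.
    println(Node(Seq("a"))(Seq("x")) == Node(Seq("a"))(Seq("y"))) // true
  }
}
```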

Contributor

The Alias is unique; with this change, the exprId is different. The alias originates from projectList, so the subquery should be found there.

@jiangjiangtian (Contributor Author) Oct 23, 2025

> The Alias is unique, with this change, the exprId is different

Sorry, I don't follow. Do you mean that the Alias's exprId differs from the original exprId?

> the alias origins from projectList, the subquery should be found from it.

I get it. Thanks!

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from ea63e28 to fde57c7 on October 24, 2025 09:50

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from ed6e63c to 78af02c on October 28, 2025 06:36

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from 78af02c to fa50849 on October 28, 2025 09:32

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from a0ef2c0 to 84919be on October 30, 2025 06:41

@FelixYBW (Contributor)

Good enhancement!

@jiangjiangtian force-pushed the find_unsupported_using_validation branch from 84919be to 53ec176 on October 31, 2025 11:07

@jiangjiangtian (Contributor Author)

@jinchengchenghh Please take another look. The CI failure seems unrelated to this PR. Thanks!

@jinchengchenghh (Contributor) left a comment

Thanks for your enhancement!

@jinchengchenghh merged commit f4b7f25 into apache:main on Nov 11, 2025 (101 of 102 checks passed)
Labels: CORE (works for Gluten Core), VELOX

4 participants