[SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules #32686

Closed
wants to merge 31 commits into from

Contributor

@sigmod sigmod commented May 27, 2021

What changes were proposed in this pull request?

Added the following TreePattern enums:

  • AGGREGATE_EXPRESSION
  • ALIAS
  • GROUPING_ANALYTICS
  • GENERATOR
  • HIGH_ORDER_FUNCTION
  • LAMBDA_FUNCTION
  • NEW_INSTANCE
  • PIVOT
  • PYTHON_UDF
  • TIME_WINDOW
  • TIME_ZONE_AWARE_EXPRESSION
  • UP_CAST
  • COMMAND
  • EVENT_TIME_WATERMARK
  • UNRESOLVED_RELATION
  • WITH_WINDOW_DEFINITION
  • UNRESOLVED_ALIAS
  • UNRESOLVED_ATTRIBUTE
  • UNRESOLVED_DESERIALIZER
  • UNRESOLVED_ORDINAL
  • UNRESOLVED_FUNCTION
  • UNRESOLVED_HINT
  • UNRESOLVED_SUBQUERY_COLUMN_ALIAS
  • UNRESOLVED_FUNC

Added tree pattern pruning to the following Analyzer rules:

  • ResolveBinaryArithmetic
  • WindowsSubstitution
  • ResolveAliases
  • ResolveGroupingAnalytics
  • ResolvePivot
  • ResolveOrdinalInOrderByAndGroupBy
  • LookupFunction
  • ResolveSubquery
  • ResolveSubqueryColumnAliases
  • ApplyCharTypePadding
  • UpdateOuterReferences
  • ResolveCreateNamedStruct
  • TimeWindowing
  • CleanupAliases
  • EliminateUnions
  • EliminateSubqueryAliases
  • HandleAnalysisOnlyCommand
  • ResolveNewInstances
  • ResolveUpCast
  • ResolveDeserializer
  • ResolveOutputRelation
  • ResolveEncodersInUDF
  • HandleNullInputsForUDF
  • ResolveGenerate
  • ExtractGenerator
  • GlobalAggregates
  • ResolveAggregateFunctions

Why are the changes needed?

Reduce the number of tree traversals and hence improve the query compilation latency.
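To make the idea concrete, here is a minimal sketch of tree pattern pruning on a toy tree. Everything in it (`PruningSketch`, `Node`, `patternBits`, the `visited` counter) is a simplified stand-in invented for illustration, not Spark's actual classes; it only assumes that each node caches the set of patterns present in its subtree, so a traversal can skip any subtree that cannot contain a match.

```scala
import scala.collection.immutable.BitSet

// Toy stand-ins for illustration only; these are not Spark's classes.
object PruningSketch {
  object Pattern extends Enumeration {
    val ALIAS, PIVOT, TIME_WINDOW = Value
  }

  // Demo-only counter: how many nodes were actually offered to the rule.
  var visited = 0

  sealed trait Node {
    def children: Seq[Node]
    def withChildren(newChildren: Seq[Node]): Node

    // Patterns contributed by this node itself.
    def nodePatterns: Seq[Pattern.Value] = Nil

    // Union of this node's patterns and those of all descendants, computed once per node.
    lazy val patternBits: BitSet =
      children.foldLeft(BitSet(nodePatterns.map(_.id): _*))((acc, c) => acc | c.patternBits)

    def containsPattern(p: Pattern.Value): Boolean = patternBits.contains(p.id)

    // Bottom-up transform that returns a subtree untouched when `cond` is false for it.
    def transformUpWithPruning(cond: Node => Boolean)(rule: PartialFunction[Node, Node]): Node =
      if (!cond(this)) this
      else {
        val rewritten = withChildren(children.map(_.transformUpWithPruning(cond)(rule)))
        visited += 1
        rule.applyOrElse(rewritten, identity[Node])
      }
  }

  case class Leaf(name: String) extends Node {
    def children: Seq[Node] = Nil
    def withChildren(newChildren: Seq[Node]): Node = this
  }

  case class Alias(child: Node, name: String) extends Node {
    def children: Seq[Node] = Seq(child)
    override def nodePatterns: Seq[Pattern.Value] = Seq(Pattern.ALIAS)
    def withChildren(newChildren: Seq[Node]): Node = copy(child = newChildren.head)
  }

  case class Add(left: Node, right: Node) extends Node {
    def children: Seq[Node] = Seq(left, right)
    def withChildren(newChildren: Seq[Node]): Node =
      copy(left = newChildren(0), right = newChildren(1))
  }
}
```

On a tree such as `Add(Add(Leaf("a"), Leaf("b")), Alias(Leaf("c"), "x"))`, a rule guarded by `_.containsPattern(Pattern.ALIAS)` only touches the root and the `Alias` node; the alias-free left subtree and the leaf under the alias are returned as-is.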

How was this patch tested?

Existing tests.
Performance diff:

| Rule | Baseline | Experiment | Experiment/Baseline |
| --- | ---: | ---: | ---: |
| ResolveBinaryArithmetic | 43264874 | 34707117 | 0.80 |
| WindowsSubstitution | 3322996 | 2734192 | 0.82 |
| ResolveAliases | 24859263 | 21359941 | 0.86 |
| ResolveGroupingAnalytics | 39249143 | 25417569 | 0.80 |
| ResolvePivot | 6393408 | 2843314 | 0.44 |
| ResolveOrdinalInOrderByAndGroupBy | 10750806 | 3386715 | 0.32 |
| LookupFunction | 22087384 | 15481294 | 0.70 |
| ResolveSubquery | 1129139340 | 944402323 | 0.84 |
| ResolveSubqueryColumnAliases | 5055038 | 2808210 | 0.56 |
| ApplyCharTypePadding | 76285576 | 63785681 | 0.84 |
| UpdateOuterReferences | 6548321 | 3092539 | 0.47 |
| ResolveCreateNamedStruct | 38111477 | 17350249 | 0.46 |
| TimeWindowing | 41694190 | 3739134 | 0.09 |
| CleanupAliases | 48683506 | 39584921 | 0.81 |
| EliminateUnions | 3405069 | 2372506 | 0.70 |
| EliminateSubqueryAliases | 9626649 | 9518216 | 0.99 |
| HandleAnalysisOnlyCommand | 2562123 | 2661432 | 1.04 |
| ResolveNewInstances | 16208966 | 1982314 | 0.12 |
| ResolveUpCast | 14067843 | 1868615 | 0.13 |
| ResolveDeserializer | 17991103 | 2320308 | 0.13 |
| ResolveOutputRelation | 5815277 | 2088787 | 0.36 |
| ResolveEncodersInUDF | 14182892 | 1045113 | 0.07 |
| HandleNullInputsForUDF | 19850838 | 1329528 | 0.07 |
| ResolveGenerate | 5587345 | 1953192 | 0.35 |
| ExtractGenerator | 120378046 | 3386286 | 0.03 |
| GlobalAggregates | 16510455 | 13553155 | 0.82 |
| ResolveAggregateFunctions | 1041848509 | 828049280 | 0.79 |
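
For reading the last column: it is the Experiment value divided by the Baseline value, so anything below 1.0 means the rule took less time with pruning. The helper below is a hypothetical illustration of that arithmetic (the name `RatioSketch.ratio` is made up here, and the PR does not state the unit of the raw numbers).

```scala
object RatioSketch {
  // Experiment/Baseline rounded to two decimals; values below 1.0 mean the rule got cheaper.
  def ratio(baseline: Long, experiment: Long): Double =
    math.round(experiment.toDouble / baseline * 100) / 100.0
}
```

For example, `RatioSketch.ratio(6393408L, 2843314L)` reproduces the 0.44 reported for ResolvePivot, and `RatioSketch.ratio(2562123L, 2661432L)` reproduces the 1.04 for HandleAnalysisOnlyCommand, the one rule that did not improve.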

SparkQA commented May 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43546/

@sigmod sigmod changed the title from "[WIP][SPARK-35544] Add tree pattern pruning to Analyzer rules" to "[WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules" on May 27, 2021

SparkQA commented May 27, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43546/

SparkQA commented May 27, 2021

Test build #139029 has finished for PR 32686 at commit b7d966e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43547/

SparkQA commented May 27, 2021

Test build #139031 has finished for PR 32686 at commit c001574.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 27, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43549/

@sigmod sigmod changed the base branch from master to branch-3.1 May 27, 2021 21:18
@sigmod sigmod changed the base branch from branch-3.1 to master May 27, 2021 21:18

SparkQA commented May 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43547/

SparkQA commented May 28, 2021

Test build #139054 has finished for PR 32686 at commit f57f0fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43591/

SparkQA commented May 28, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43591/

SparkQA commented May 29, 2021

Test build #139070 has finished for PR 32686 at commit ad9dbed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43617/

SparkQA commented May 31, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43617/

@sigmod sigmod changed the title from "[WIP][SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules" to "[SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules" on May 31, 2021
@sigmod
Contributor Author

sigmod commented May 31, 2021

@dbaliafroozeh @gengliangwang @hvanhovell @maryannxue this PR is ready for review. Thanks!

SparkQA commented May 31, 2021

Test build #139096 has finished for PR 32686 at commit 0542922.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43624/

SparkQA commented May 31, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43624/

SparkQA commented May 31, 2021

Test build #139104 has finished for PR 32686 at commit 53d07ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@hvanhovell hvanhovell left a comment

LGTM

@hvanhovell
Contributor

One small ergonomic comment. It would be great if we could create some shorthand for the function closures. I would probably make each individual value a matcher for itself (if Enumeration allows subclassing of the Value class), and create a bunch of functions that allow you to compose them, e.g.: any, all, ...

@@ -423,7 +424,9 @@ class Analyzer(override val catalogManager: CatalogManager)
    */
   object ResolveAliases extends Rule[LogicalPlan] {
     private def assignAliases(exprs: Seq[NamedExpression]) = {
-      exprs.map(_.transformUp { case u @ UnresolvedAlias(child, optGenAliasFunc) =>
+      exprs.map(_.transformUpWithPruning(_.containsPattern(UNRESOLVED_ALIAS))
+      {

Member

Nit: the indentation looks weird here. Shall we move the { up to the line above?

Contributor Author

Done.

@@ -1876,7 +1879,7 @@ class Analyzer(override val catalogManager: CatalogManager)
     private def allowGroupByAlias: Boolean = conf.groupByAliases && !conf.ansiEnabled

     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
-      AlwaysProcess.fn, ruleId) {
+      _.containsAllPatterns(AGGREGATE, UNRESOLVED_ATTRIBUTE), ruleId) {

Member

Let's add a comment saying that mayResolveAttrByAggregateExprs requires the TreePattern UNRESOLVED_ATTRIBUTE.

Contributor Author

Done.
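
As a rough illustration of why the pruning condition above uses "all" rather than "any": the rule can only do useful work when a subtree has both an aggregate and an unresolved attribute. The sketch below models pattern bits with a plain `BitSet`; `PatternBitsSketch`, `PatternBits`, and `bitsOf` are hypothetical names for this example, and only the three `contains*` method names are borrowed from the diff and the obvious variants of it.

```scala
import scala.collection.immutable.BitSet

// Simplified model for illustration; not Spark's implementation.
object PatternBitsSketch {
  object Pattern extends Enumeration {
    val AGGREGATE, UNRESOLVED_ATTRIBUTE, ALIAS = Value
  }

  final case class PatternBits(bits: BitSet) {
    def containsPattern(p: Pattern.Value): Boolean = bits.contains(p.id)
    // True only if every listed pattern occurs somewhere in the subtree.
    def containsAllPatterns(ps: Pattern.Value*): Boolean = ps.forall(containsPattern)
    // True if at least one listed pattern occurs somewhere in the subtree.
    def containsAnyPattern(ps: Pattern.Value*): Boolean = ps.exists(containsPattern)
  }

  // Hypothetical helper to build the bits for a subtree containing the given patterns.
  def bitsOf(ps: Pattern.Value*): PatternBits = PatternBits(BitSet(ps.map(_.id): _*))
}
```

With this model, a subtree that has an aggregate but no unresolved attribute fails `containsAllPatterns(AGGREGATE, UNRESOLVED_ATTRIBUTE)` and would be skipped, while `containsAnyPattern` with the same arguments would still visit it.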

@@ -3736,7 +3744,8 @@ object EliminateUnions extends Rule[LogicalPlan] {
  * rule can't work for those parameters.
  */
 object CleanupAliases extends Rule[LogicalPlan] with AliasHelper {
-  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
+    _.containsPattern(ALIAS)) {

Member

It seems that we also need to set the TreePattern for MultiAlias.

Contributor Author

Done. Thanks for the catch!
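
To illustrate the kind of bug this review comment guards against, here is a toy sketch (all names, including the `declaresPattern` flag, are invented for the example and are not Spark code). If a node type that a rule must rewrite does not advertise the pattern the rule prunes on, the pruned traversal silently skips it.

```scala
// Toy model for illustration; the `declaresPattern` flag exists only to show both outcomes.
object MissingPatternSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    // Stands in for a node declaring the ALIAS pattern.
    def isAliasLike: Boolean = false
    // True if this node or any descendant advertises ALIAS.
    lazy val containsAlias: Boolean = isAliasLike || children.exists(_.containsAlias)
  }

  case class Attr(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }

  case class Alias(child: Expr, name: String) extends Expr {
    def children: Seq[Expr] = Seq(child)
    override def isAliasLike: Boolean = true
  }

  case class MultiAlias(child: Expr, names: Seq[String], declaresPattern: Boolean) extends Expr {
    def children: Seq[Expr] = Seq(child)
    override def isAliasLike: Boolean = declaresPattern
  }

  // Strips alias-like wrappers at the top, but only inspects subtrees that advertise ALIAS.
  def cleanupWithPruning(e: Expr): Expr =
    if (!e.containsAlias) e
    else e match {
      case Alias(c, _) => cleanupWithPruning(c)
      case MultiAlias(c, _, _) => cleanupWithPruning(c)
      case other => other
    }
}
```

In this model, `cleanupWithPruning(MultiAlias(Attr("a"), Seq("x", "y"), declaresPattern = false))` returns the `MultiAlias` unchanged because the pruning check fails first, whereas the same call with `declaresPattern = true` returns `Attr("a")`.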

@@ -117,6 +120,7 @@ case class AggregateExpression(
UnresolvedAttribute(aggregateFunction.toString)
}


Member

Unnecessary change.

Contributor Author

Done.

@sigmod
Contributor Author

sigmod commented Jun 1, 2021

> One small ergonomic comment. It would be great if we could create some shorthand for the function closures. I would probably make each individual value a matcher for itself (if Enumeration allows subclassing of the Value class), and create a bunch of functions that allow you to compose them, e.g.: any, all, ...

I'm not sure exactly what the transformWithPruning interface would look like. IIUC, transformWithPruning may still not be able to take just a composed pattern instead of a lambda, because we also have and, or, and not over all and any, even though they're not frequent. Putting and, or, and not into patterns sounds a bit complex, as we would need to be able to process a tree of such compositions.

Anyway, thanks for the suggestion. I'll think about whether there's a simpler approach and may address it in subsequent PRs.
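
For readers following this exchange, one possible shape for such composable matchers is sketched below. This is purely hypothetical and is not the interface this PR or Spark adopted: `MatcherSketch`, `Matcher`, `has`, `all`, and `any` are invented names. It wraps a predicate over pattern bits and composes by closure, which sidesteps interpreting an explicit tree of and/or/not nodes at the cost of still being a function underneath.

```scala
import scala.collection.immutable.BitSet

// Hypothetical combinators for illustration; not part of the PR.
object MatcherSketch {
  object Pattern extends Enumeration {
    val AGGREGATE, ALIAS, PIVOT = Value
  }

  // A matcher is a predicate over a subtree's pattern bits, plus boolean combinators.
  final case class Matcher(test: BitSet => Boolean) {
    def &&(other: Matcher): Matcher = Matcher(b => test(b) && other.test(b))
    def ||(other: Matcher): Matcher = Matcher(b => test(b) || other.test(b))
    def unary_! : Matcher = Matcher(b => !test(b))
  }

  def has(p: Pattern.Value): Matcher = Matcher(_.contains(p.id))
  def all(ps: Pattern.Value*): Matcher = Matcher(b => ps.forall(p => b.contains(p.id)))
  def any(ps: Pattern.Value*): Matcher = Matcher(b => ps.exists(p => b.contains(p.id)))
}
```

A condition such as `all(AGGREGATE, ALIAS) && !has(PIVOT)` then reads close to the suggestion above, and its `test` function could be passed wherever a lambda over pattern bits is expected.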

SparkQA commented Jun 1, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43645/

@gengliangwang
Member

Thanks, merging to master

SparkQA commented Jun 1, 2021

Test build #139125 has finished for PR 32686 at commit 8252a6a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
