[SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression by nikhilsheoran-db · Pull Request #46248 · apache/spark

nikhilsheoran-db · 2024-04-26T16:52:09Z

What changes were proposed in this pull request?

Instead of calling conf.resolver on every call in resolveExpression, this PR obtains the resolver once and reuses it.

Why are the changes needed?

Consider a view with a large number of columns (in the thousands). In the RuleExecutor metrics and flamegraph for a query that only runs DESCRIBE SELECT * FROM large_view, a large fraction of the time is spent in ResolveReferences and ResolveRelations. Within these rules, most of the driver time goes to initializing the conf to obtain conf.resolver for each column in the view.
Since the same conf is used in each of these calls, the repeated calls to conf.resolver can be avoided by obtaining the resolver once and reusing it.
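
The pattern can be illustrated without a Spark dependency. The following is a minimal sketch, not the code from this PR: `FakeConf`, `matchPerCall`, and `matchHoisted` are hypothetical names, and the lookup counter only stands in for the real cost of obtaining the conf. The `Resolver` alias is assumed to have the same shape as Spark's, a predicate over two names.

```scala
object ResolverHoistSketch {
  // Assumed to mirror the shape of Spark's Resolver alias: a name-equality predicate.
  type Resolver = (String, String) => Boolean

  // Hypothetical stand-in for the SQL conf. It counts how often `resolver` is
  // requested so the two call patterns can be compared.
  final class FakeConf(caseSensitive: Boolean) {
    var resolverLookups: Int = 0

    private val caseSensitiveMatch: Resolver = (a, b) => a == b
    private val caseInsensitiveMatch: Resolver = (a, b) => a.equalsIgnoreCase(b)

    def resolver: Resolver = {
      resolverLookups += 1
      if (caseSensitive) caseSensitiveMatch else caseInsensitiveMatch
    }
  }

  // Pattern before the change: conf.resolver is requested once per column.
  def matchPerCall(conf: FakeConf, columns: Seq[String], name: String): Seq[String] =
    columns.filter(c => conf.resolver(c, name))

  // Pattern after the change: the resolver is requested once and reused.
  def matchHoisted(conf: FakeConf, columns: Seq[String], name: String): Seq[String] = {
    val resolver = conf.resolver
    columns.filter(c => resolver(c, name))
  }

  def main(args: Array[String]): Unit = {
    val columns = (0 until 3000).map(i => s"col$i")
    val before = new FakeConf(caseSensitive = false)
    val after = new FakeConf(caseSensitive = false)
    println(matchPerCall(before, columns, "COL42"))
    println(matchHoisted(after, columns, "COL42"))
    println(s"lookups before: ${before.resolverLookups}, after: ${after.resolverLookups}")
  }
}
```

Both functions return the same matches; only the number of resolver lookups differs (one per column versus one in total). Hoisting is safe here only because the conf does not change between calls, which is the premise stated above.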

Does this PR introduce any user-facing change?

No

How was this patch tested?

Created a dummy view with 3000 columns.
Observed the RuleExecutor metrics using RuleExecutor.dumpTimeSpent().
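
A rough sketch of how such a measurement could be set up is shown below. The exact view definition used for this PR is not given, so the DDL here (a temporary view over literal columns named `c0`, `c1`, ...) and the helper `largeViewDdl` are assumptions for illustration; a persistent view may behave differently. The Spark calls appear only in comments because they need a running SparkSession.

```scala
object LargeViewDdlSketch {
  // Hypothetical helper: builds a CREATE VIEW statement whose SELECT list
  // has `numColumns` literal columns named c0, c1, ...
  def largeViewDdl(viewName: String, numColumns: Int): String = {
    val selectList = (0 until numColumns).map(i => s"$i AS c$i").mkString(", ")
    s"CREATE OR REPLACE TEMPORARY VIEW $viewName AS SELECT $selectList"
  }

  def main(args: Array[String]): Unit = {
    val ddl = largeViewDdl("large_view", 3000)
    println(ddl.take(80) + " ...")

    // With a SparkSession named `spark` in scope (for example in spark-shell),
    // the measurement could then look like this:
    //   import org.apache.spark.sql.catalyst.rules.RuleExecutor
    //   spark.sql(ddl)
    //   RuleExecutor.resetMetrics()
    //   spark.sql("DESCRIBE SELECT * FROM large_view").collect()
    //   println(RuleExecutor.dumpTimeSpent())
  }
}
```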
RuleExecutor metrics before this change (after multiple runs)

=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 1483
Total time: 8.026801698 seconds

Rule                                                                                    Effective Time / Total Time                     Effective Runs / Total Runs

org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                        4060159342 / 4062186814                         1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                       3789405037 / 3809203288                         2 / 6
org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule        0 / 20741164                                    0 / 6
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                                  17800584 / 19431350                             1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                           15036018 / 15060440                             1 / 6
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability                       0 / 14929810                                    0 / 7

RuleExecutor metrics after this change (after multiple runs)

=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 1483
Total time: 2.892630859 seconds

Rule                                                                                    Effective Time / Total Time                     Effective Runs / Total Runs

org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                        1490357745 / 1492398446                         1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                       1212205822 / 1241729981                         2 / 6
org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule        0 / 23857161                                    0 / 6
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                                  16603250 / 18806065                             1 / 6
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability                       0 / 16749306                                    0 / 7
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                           11158299 / 11183593                             1 / 6
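
To put the two dumps side by side, the ratios can be computed directly from the totals above. This is a small arithmetic sketch; the object and helper names are made up for illustration, and the per-rule times are taken to be nanoseconds, which is consistent with the reported totals in seconds.

```scala
object SpeedupSketch {
  // Ratio of time before the change to time after it.
  def speedup(before: Double, after: Double): Double = before / after

  // (label, before, after), copied from the two metric dumps above.
  val measurements: Seq[(String, Double, Double)] = Seq(
    ("Total time (s)", 8.026801698, 2.892630859),
    ("ResolveRelations total (ns)", 4062186814.0, 1492398446.0),
    ("ResolveReferences total (ns)", 3809203288.0, 1241729981.0)
  )

  def main(args: Array[String]): Unit = {
    measurements.foreach { case (label, before, after) =>
      println(f"$label%-30s ${speedup(before, after)}%.2fx")
    }
  }
}
```

On these numbers the overall analysis time drops by a factor of about 2.8, with ResolveRelations about 2.7x and ResolveReferences about 3.1x faster.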

Was this patch authored or co-authored using generative AI tooling?

No

dongjoon-hyun

+1, LGTM. Thank you, @nikhilsheoran-db !

dongjoon-hyun · 2024-04-26T18:23:33Z

Merged to master for Apache Spark 4.0.0.

Avoid repeated calls to conf.resolver in resolveExpression

839bfee

github-actions Bot added the SQL label Apr 26, 2024

dongjoon-hyun approved these changes Apr 26, 2024

dongjoon-hyun closed this in 6098bd9 Apr 26, 2024

nikhilsheoran-db wants to merge 1 commit into apache:master from nikhilsheoran-db:SPARK-48010
