[SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression#46248

Closed
nikhilsheoran-db wants to merge 1 commit into apache:master from nikhilsheoran-db:SPARK-48010

@nikhilsheoran-db nikhilsheoran-db commented Apr 26, 2024

What changes were proposed in this pull request?

  • Instead of calling conf.resolver on every call in resolveExpression, this PR obtains the resolver once and reuses it.

Why are the changes needed?

  • Consider a view with a large number of columns (in the thousands). In the RuleExecutor metrics and flamegraph for a query that only runs DESCRIBE SELECT * FROM large_view, a large fraction of the time is spent in ResolveReferences and ResolveRelations. Within these rules, most of the driver time goes to initializing the conf to obtain conf.resolver for each column in the view.
  • Since the same conf is used in each of these calls, the repeated calls to conf.resolver can be avoided by obtaining the resolver once and reusing it.
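
The idea in the bullets above can be illustrated with a toy model. This is a minimal sketch and not the code changed in this PR: `ToyConf`, `matchesPerCall`, and `matchesHoisted` are hypothetical names, and the `Resolver` alias only mirrors the general shape of a name-comparison function. It counts how often the resolver is requested when it is looked up per column versus obtained once.

```scala
object ResolverReuseSketch {
  // A resolver decides whether two identifiers refer to the same name.
  type Resolver = (String, String) => Boolean

  // Hypothetical stand-in for a conf object; it counts resolver lookups
  // so the difference between the two strategies is observable.
  final class ToyConf(caseSensitive: Boolean) {
    var resolverLookups: Int = 0
    def resolver: Resolver = {
      resolverLookups += 1
      if (caseSensitive) (a: String, b: String) => a == b
      else (a: String, b: String) => a.equalsIgnoreCase(b)
    }
  }

  // Before: the resolver is requested from the conf once per column.
  def matchesPerCall(conf: ToyConf, columns: Seq[String], name: String): Seq[String] =
    columns.filter(c => conf.resolver(c, name))

  // After: the resolver is requested once and reused for every column.
  def matchesHoisted(conf: ToyConf, columns: Seq[String], name: String): Seq[String] = {
    val resolver = conf.resolver
    columns.filter(c => resolver(c, name))
  }

  def main(args: Array[String]): Unit = {
    val columns = (1 to 3000).map(i => s"col_$i")
    val before = new ToyConf(caseSensitive = false)
    val after = new ToyConf(caseSensitive = false)
    val perCall = matchesPerCall(before, columns, "COL_42")
    val hoisted = matchesHoisted(after, columns, "COL_42")
    println(s"same result: ${perCall == hoisted}")
    println(s"lookups per call: ${before.resolverLookups}, hoisted: ${after.resolverLookups}")
  }
}
```

Both variants return the same matches; only the number of lookups differs. In the toy model a lookup is cheap, whereas the description above reports that in Spark each lookup involved initializing the conf, which is why hoisting it pays off for views with thousands of columns.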

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Created a dummy view with 3000 columns.
  • Observed the RuleExecutor metrics using RuleExecutor.dumpTimeSpent().
  • RuleExecutor metrics before this change (after multiple runs)
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 1483
Total time: 8.026801698 seconds

Rule                                                                                    Effective Time / Total Time                     Effective Runs / Total Runs

org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                        4060159342 / 4062186814                         1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                       3789405037 / 3809203288                         2 / 6
org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule        0 / 20741164                                    0 / 6
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                                  17800584 / 19431350                             1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                           15036018 / 15060440                             1 / 6
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability                       0 / 14929810                                    0 / 7
  • RuleExecutor metrics after this change (after multiple runs)
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 1483
Total time: 2.892630859 seconds

Rule                                                                                    Effective Time / Total Time                     Effective Runs / Total Runs

org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                        1490357745 / 1492398446                         1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                       1212205822 / 1241729981                         2 / 6
org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule        0 / 23857161                                    0 / 6
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                                  16603250 / 18806065                             1 / 6
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability                       0 / 16749306                                    0 / 7
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                           11158299 / 11183593                             1 / 6
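
For a quick comparison of the two dumps, the ratios can be computed directly from the reported figures. This is a small sketch: the numbers are copied from the metrics above, the names `SpeedupSketch` and `speedup` are hypothetical, and the per-rule values are assumed to be nanoseconds, which is consistent with the reported totals in seconds.

```scala
object SpeedupSketch {
  // (before, after) pairs copied from the two RuleExecutor dumps.
  val totalSeconds: (Double, Double) = (8.026801698, 2.892630859)
  // Total Time column per rule, assumed to be in nanoseconds.
  val resolveRelationsNs: (Long, Long) = (4062186814L, 1492398446L)
  val resolveReferencesNs: (Long, Long) = (3809203288L, 1241729981L)

  // Ratio of the time before the change to the time after it.
  def speedup(before: Double, after: Double): Double = before / after

  def main(args: Array[String]): Unit = {
    val total = speedup(totalSeconds._1, totalSeconds._2)
    val relations = speedup(resolveRelationsNs._1.toDouble, resolveRelationsNs._2.toDouble)
    val references = speedup(resolveReferencesNs._1.toDouble, resolveReferencesNs._2.toDouble)
    println(f"Total time:        $total%.2fx")
    println(f"ResolveRelations:  $relations%.2fx")
    println(f"ResolveReferences: $references%.2fx")
  }
}
```

On these figures the total analysis time drops by roughly 2.8x, with about 2.7x for ResolveRelations and about 3.1x for ResolveReferences.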

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions Bot added the SQL label Apr 26, 2024

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @nikhilsheoran-db!

@dongjoon-hyun

Merged to master for Apache Spark 4.0.0.
