[SPARK-19290][SQL] add a new extending interface in Analyzer for post-hoc resolution #16645
Test build #71663 has finished for PR 16645 at commit
@@ -106,6 +106,13 @@ class Analyzer(
   */
  val extendedResolutionRules: Seq[Rule[LogicalPlan]] = Nil

  /**
   * Override to provide rules to do post-hoc resolution. Note that these rules will be executed
   * in an individual bach. This batch is run right after the normal resolution batch and execute
bach -> batch
is run -> is to run
My main concern with this PR is whether people will think it is recommended to add new batches to force rules to run in a certain order. For these resolution rules, we can also use conditions to control when they fire, right? If we always replace one logical plan with another in the analysis phase, it seems we should use
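To make the "use conditions" alternative concrete, here is a minimal sketch of a rule that guards itself with a condition so it can live safely inside a fixed-point batch. Every name in it (`Node`, `Insert`, `checkInsert`, `applyRule`) is hypothetical and stands in for Catalyst's plan and rule types; it does not use Spark's API.

```scala
// Toy model only: these types are hypothetical stand-ins, not Catalyst classes.
object GuardedRuleSketch {
  sealed trait Node { def resolved: Boolean }
  final case class UnresolvedTable(name: String) extends Node { val resolved = false }
  final case class ResolvedTable(name: String) extends Node { val resolved = true }
  final case class Insert(table: Node, checked: Boolean = false) extends Node {
    def resolved: Boolean = table.resolved && checked
  }

  // The guard: fire only when the child is resolved and the check has not run yet.
  val checkInsert: PartialFunction[Node, Node] = {
    case i @ Insert(t, false) if t.resolved => i.copy(checked = true)
  }

  // Apply the rule where it matches; otherwise return the node unchanged.
  def applyRule(node: Node): Node =
    checkInsert.applyOrElse(node, (n: Node) => n)
}
```

With this shape, the rule is a no-op until its input is ready and again after it has done its work, at the cost of carrying an explicit marker (`checked` here) in the plan.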
@@ -62,15 +62,17 @@ private[hive] class HiveSessionState(sparkSession: SparkSession)
  override val extendedResolutionRules =
    catalog.ParquetConversions ::
      catalog.OrcConversions ::
How about moving the rules catalog.ParquetConversions and catalog.OrcConversions to the beginning of the postHocResolutionRules batch?
do they need to? Eventually they will be optimizer rules.
These two rules need MetastoreRelation. Ideally, they should run after the rule FindHiveSerdeTable.
I am fine with keeping them where they are if we plan to move them into optimizer rules.
@yhuai yes, we can use conditions and put them in
I also understand @yhuai's concern. But when the number of rules in a single batch keeps growing, using a single condition
Also ping @hvanhovell
Test build #71693 has finished for PR 16645 at commit
retest this please
Test build #71696 has finished for PR 16645 at commit
Test build #71825 has started for PR 16645 at commit
retest this please
Test build #71831 has finished for PR 16645 at commit
LGTM
Thanks! Merging to master.
…-hoc resolution

## What changes were proposed in this pull request?

To implement DDL commands, we added several analyzer rules in the sql/hive module to analyze DDL-related plans. However, our `Analyzer` currently has only one extension interface, `extendedResolutionRules`, which defines extra rules that run together with the other rules in the resolution batch. It doesn't fit DDL rules well, because:

1. DDL rules may do some checking and normalization, but this work may be repeated many times, because the resolution batch runs its rules again and again until it reaches a fixed point, and it's hard to tell whether a DDL rule has already done its checking and normalization. This is correct because DDL rules are idempotent, but it's bad for analysis performance.
2. Some DDL rules may depend on others, and it's pretty hard to write `if` conditions that guarantee the dependencies. It would be good to have a batch that runs its rules in one pass, so that we can guarantee the dependencies through rule order.

This PR adds a new extension interface to `Analyzer`, `postHocResolutionRules`, which defines rules that are run only once, in a batch that runs right after the resolution batch.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#16645 from cloud-fan/analyzer.
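The performance argument in point 1 can be illustrated with a toy rule executor. This is only a sketch under stated assumptions: `Plan`, `Rule`, `Batch`, `Once`, `FixedPoint`, and `execute` below are simplified, hypothetical stand-ins written for this example, not Spark's `RuleExecutor`, and the two rules are invented for illustration.

```scala
// Toy model only: hypothetical stand-ins, not Spark's RuleExecutor API.
object PostHocBatchSketch {
  type Plan = List[String]

  // A rule rewrites a plan; `runs` counts how often the executor invoked it.
  final class Rule(val name: String, f: Plan => Plan) {
    var runs = 0
    def apply(plan: Plan): Plan = { runs += 1; f(plan) }
  }

  sealed trait Strategy
  case object Once extends Strategy
  final case class FixedPoint(maxIterations: Int) extends Strategy

  final case class Batch(name: String, strategy: Strategy, rules: Seq[Rule])

  // Run each batch in order; a batch repeats until the plan stops changing
  // or its iteration limit is reached (the limit is 1 for Once).
  def execute(batches: Seq[Batch], plan: Plan): Plan =
    batches.foldLeft(plan) { (current, batch) =>
      val limit = batch.strategy match {
        case Once          => 1
        case FixedPoint(n) => n
      }
      var result = current
      var changed = true
      var iteration = 0
      while (changed && iteration < limit) {
        val next = batch.rules.foldLeft(result)((p, r) => r(p))
        changed = next != result
        result = next
        iteration += 1
      }
      result
    }

  // Invented resolution rule: resolves at most one entry per pass.
  def resolveOne(plan: Plan): Plan = {
    val i = plan.indexWhere(_.startsWith("unresolved:"))
    if (i < 0) plan else plan.updated(i, "resolved:" + plan(i).stripPrefix("unresolved:"))
  }

  // Normalization placed in the fixed-point resolution batch.
  // Returns the final plan and how many times the normalization rule ran.
  def runInResolutionBatch(plan: Plan): (Plan, Int) = {
    val normalize = new Rule("normalize", _.map(_.toLowerCase))
    val batches = Seq(
      Batch("Resolution", FixedPoint(100), Seq(new Rule("resolveOne", resolveOne), normalize)))
    (execute(batches, plan), normalize.runs)
  }

  // Normalization placed in a separate one-pass batch after resolution.
  def runWithPostHocBatch(plan: Plan): (Plan, Int) = {
    val normalize = new Rule("normalize", _.map(_.toLowerCase))
    val batches = Seq(
      Batch("Resolution", FixedPoint(100), Seq(new Rule("resolveOne", resolveOne))),
      Batch("Post-Hoc Resolution", Once, Seq(normalize)))
    (execute(batches, plan), normalize.runs)
  }
}
```

For the input `List("unresolved:A", "unresolved:B")`, both variants produce the same final plan, but the idempotent normalization rule runs three times inside the fixed-point batch and once in the one-pass batch. The one-pass batch also applies its rules strictly in list order, which is the property point 2 relies on.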