New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier #33200
Conversation
Kubernetes integration test starting |
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test status success |
Test build #140609 has finished for PR 33200 at commit
|
Test build #140614 has finished for PR 33200 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #140766 has finished for PR 33200 at commit
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #140881 has finished for PR 33200 at commit
|
cc @cloud-fan. TIA! |
@@ -229,22 +228,13 @@ case class ReplaceTableAsSelectStatement( | |||
* Column data as parsed by ALTER TABLE ... ADD COLUMNS. | |||
*/ | |||
case class QualifiedColType( | |||
name: Seq[String], | |||
fieldName: FieldName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very sure about this change. This is the column to add and we don't need to resolve the field name, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you summarize what we need to do for it? What I can think of:
- normalize the field name according to the actual table schema
- make sure the column name does not exist in the table schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For AlterTableAddColumns
, we need to 1) resolve the "parent" name if the column being added is a nested one, and 2) check if the column name already exists.
For AlterTableReplaceColumns
, it seems that we do not need to check anything with this new change (I removed it in the recent commit)
The reason I was using FieldName
is so that I can check whether QualifiedColType
is resolved or not so that rule doesn't run if already resolved:
private def hasUnresolvedColumns(cols: Seq[QualifiedColType]): Boolean = {
cols.exists(col => !col.fieldName.resolved || col.position.exists(!_.resolved))
, but I agree that using ResolvedFieldName
is a bit weird since the field name is being "added". Maybe turn QualifiedColType
into an Expression
?
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test status success |
Test build #141512 has finished for PR 33200 at commit
|
Test build #141513 has finished for PR 33200 at commit
|
*/ | ||
case class AlterTableAddColumns( | ||
table: LogicalPlan, | ||
columnsToAdd: Seq[QualifiedColType]) extends AlterTableColumnCommand { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since QualifiedColType
is not an expression, the default QueryPlan.resolved
won't work here. Can we override resolved
?
val table = a.table.asInstanceOf[ResolvedTable] | ||
a.transformExpressions { | ||
case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u) | ||
} | ||
|
||
case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After address https://github.com/apache/spark/pull/33200/files#r676413397 , here can be case a @ ... if !a.resolved
val schema = r.table.schema | ||
val resolvedCols = cols.map { col => | ||
col.path match { | ||
case Some(parent) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's match case Some(parent: UnresolvedFieldName)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for some minor comments
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141689 has finished for PR 33200 at commit
|
@imback82 can you fix the conflicts? thanks! |
Thanks @cloud-fan for the review! I will work on #33200 (comment) after this PR gets in. |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141720 has finished for PR 33200 at commit
|
The test failure in GA is unrelated and jenkins passes. Thanks, merging to master/3.2! (this is the last PR to migrate ALTER TABLE ... COLUMN to v2 command) |
…ds to use UnresolvedTable to resolve the identifier ### What changes were proposed in this pull request? This PR proposes to migrate the following `ALTER TABLE ... ADD/REPLACE COLUMNS` commands to use `UnresolvedTable` as a `child` to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing). ### Why are the changes needed? This is a part of effort to make the relation lookup behavior consistent: [SPARK-29900](https://issues.apache.org/jira/browse/SPARK-29900). ### Does this PR introduce _any_ user-facing change? After this PR, the above `ALTER TABLE ... ADD/REPLACE COLUMNS` commands will have a consistent resolution behavior. ### How was this patch tested? Updated existing tests. Closes #33200 from imback82/alter_add_cols. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 809b88a) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…. COLUMN ### What changes were proposed in this pull request? This a followup of the recent work such as #33200 For `ALTER TABLE` commands, the logical plans do not have the common `AlterTable` prefix in the name and just use names like `SetTableLocation`. This PR proposes to follow the same naming rule in `ALTER TABE ... COLUMN` commands. This PR also moves these AlterTable commands to a individual file and give them a base trait. ### Why are the changes needed? name simplification ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing test Closes #33609 from cloud-fan/dsv2. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
…. COLUMN ### What changes were proposed in this pull request? This a followup of the recent work such as #33200 For `ALTER TABLE` commands, the logical plans do not have the common `AlterTable` prefix in the name and just use names like `SetTableLocation`. This PR proposes to follow the same naming rule in `ALTER TABE ... COLUMN` commands. This PR also moves these AlterTable commands to a individual file and give them a base trait. ### Why are the changes needed? name simplification ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing test Closes #33609 from cloud-fan/dsv2. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit 7cb9c1c) Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request? Now that all the commands that use `UnresolvedV2Relation` have been migrated to use `UnresolvedTable` and `UnresolvedView` (e.g, #33200), `UnresolvedV2Relation` can be removed. ### Why are the changes needed? To remove unused code. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Removing dead code and no code coverage existed before. Closes #33677 from imback82/remove_unresolvedv2relation. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This PR proposes to migrate the following
ALTER TABLE ... ADD/REPLACE COLUMNS
commands to useUnresolvedTable
as achild
to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in JIRA or proposal doc.Why are the changes needed?
This is a part of effort to make the relation lookup behavior consistent: SPARK-29900.
Does this PR introduce any user-facing change?
After this PR, the above
ALTER TABLE ... ADD/REPLACE COLUMNS
commands will have a consistent resolution behavior.How was this patch tested?
Updated existing tests.