[SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier #33200

imback82 · 2021-07-03T05:43:44Z

What changes were proposed in this pull request?

This PR proposes to migrate the ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable as a child to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in JIRA (https://issues.apache.org/jira/browse/SPARK-29900) or the proposal doc (https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).
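
As a rough illustration of what "temp view first" means, the sketch below models identifier lookup with plain Scala collections. Every name in it (`ToyCatalog`, `TempView`, `CatalogTable`, `lookup`) is hypothetical and is not Spark's analyzer API. It only shows that routing all commands through one lookup routine gives them the same precedence.

```scala
// Toy model only: none of these names exist in Spark.
object ResolutionOrderSketch {
  sealed trait Resolved
  final case class TempView(name: String) extends Resolved
  final case class CatalogTable(name: String) extends Resolved

  // Hypothetical catalog holding temp views and persistent tables by name.
  final case class ToyCatalog(tempViews: Set[String], tables: Set[String]) {
    // Temp views take precedence over tables that share the same name.
    def lookup(name: String): Option[Resolved] =
      if (tempViews.contains(name)) Some(TempView(name))
      else if (tables.contains(name)) Some(CatalogTable(name))
      else None
  }

  // "t" exists both as a temp view and as a table; "u" is only a table.
  val catalog: ToyCatalog = ToyCatalog(tempViews = Set("t"), tables = Set("t", "u"))
}
```

In this toy model, `catalog.lookup("t")` yields the temp view even though a table named `t` also exists, which is the precedence the description refers to.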

Why are the changes needed?

This is part of the effort to make relation lookup behavior consistent: SPARK-29900.

Does this PR introduce any user-facing change?

After this PR, the above ALTER TABLE ... ADD/REPLACE COLUMNS commands will have a consistent resolution behavior.

How was this patch tested?

Updated existing tests.

SparkQA · 2021-07-03T06:27:28Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45121/

SparkQA · 2021-07-03T06:31:07Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45122/

SparkQA · 2021-07-03T06:59:32Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45121/

SparkQA · 2021-07-03T07:04:20Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45122/

SparkQA · 2021-07-03T09:03:41Z

Test build #140609 has finished for PR 33200 at commit f729d8d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-03T16:56:20Z

Test build #140614 has finished for PR 33200 at commit 0c83e37.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-03T17:30:01Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45127/

SparkQA · 2021-07-03T18:02:46Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45127/

SparkQA · 2021-07-07T22:52:26Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45278/

SparkQA · 2021-07-07T23:25:27Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45278/

SparkQA · 2021-07-08T02:29:29Z

Test build #140766 has finished for PR 33200 at commit 4e0cf80.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-10T19:53:49Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45392/

SparkQA · 2021-07-10T23:46:08Z

Test build #140881 has finished for PR 33200 at commit 1b78841.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

imback82 · 2021-07-12T01:58:42Z

cc @cloud-fan. TIA!

cloud-fan · 2021-07-12T14:44:40Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala

@@ -229,22 +228,13 @@ case class ReplaceTableAsSelectStatement(
 * Column data as parsed by ALTER TABLE ... ADD COLUMNS.
 */
 case class QualifiedColType(
-    name: Seq[String],
+    fieldName: FieldName,


I'm not very sure about this change. This is the column to add and we don't need to resolve the field name, right?

Can you summarize what we need to do for it? What I can think of:

1. normalize the field name according to the actual table schema
2. make sure the column name does not exist in the table schema

imback82 · 2021-07-17 (edited)

For AlterTableAddColumns, we need to 1) resolve the "parent" name if the column being added is a nested one, and 2) check if the column name already exists.

For AlterTableReplaceColumns, it seems that we do not need to check anything with this new change (I removed it in the recent commit).

The reason I was using FieldName is so that I can check whether a QualifiedColType is already resolved, so that the rule doesn't run again if it is:

private def hasUnresolvedColumns(cols: Seq[QualifiedColType]): Boolean = {
  cols.exists(col => !col.fieldName.resolved || col.position.exists(!_.resolved))
}

That said, I agree that using ResolvedFieldName is a bit weird since the field name is being "added". Maybe turn QualifiedColType into an Expression?
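
The two checks described for AlterTableAddColumns (resolve the parent of a nested column, then make sure the new name is not already taken) can be sketched on a toy schema. The types below (`Struct`, `IntType`) and the helpers `resolveParent` and `checkAdd` are illustrative stand-ins, not Spark's StructType or the code in this PR, and name matching here is exact rather than going through a case-sensitivity-aware resolver.

```scala
// Toy schema model; illustrative stand-ins, not Spark's StructType/StructField.
object AddColumnCheckSketch {
  sealed trait DataType
  case object IntType extends DataType
  final case class Struct(fields: List[(String, DataType)]) extends DataType

  // Walk `path` down through nested structs; None if a step is missing or not a struct.
  def resolveParent(schema: Struct, path: List[String]): Option[Struct] = path match {
    case Nil => Some(schema)
    case head :: tail =>
      schema.fields
        .collectFirst { case (n, s: Struct) if n == head => s }
        .flatMap(resolveParent(_, tail))
  }

  // Right(()) if `name` can be added under `parent`, otherwise an error message.
  def checkAdd(schema: Struct, parent: List[String], name: String): Either[String, Unit] =
    resolveParent(schema, parent) match {
      case None =>
        Left(s"Cannot find parent ${parent.mkString(".")}")
      case Some(p) if p.fields.exists(_._1 == name) =>
        Left(s"Column ${(parent :+ name).mkString(".")} already exists")
      case Some(_) =>
        Right(())
    }

  // Example schema: id INT, point STRUCT<x: INT>
  val schema: Struct = Struct(List("id" -> IntType, "point" -> Struct(List("x" -> IntType))))
}
```

With this schema, adding `point.y` passes both checks, while adding `point.x` or anything under a non-existent parent is rejected.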

SparkQA · 2021-07-22T18:49:41Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46029/

SparkQA · 2021-07-22T18:50:48Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46030/

SparkQA · 2021-07-22T19:23:26Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46030/

SparkQA · 2021-07-22T19:25:47Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46029/

SparkQA · 2021-07-22T22:33:45Z

Test build #141512 has finished for PR 33200 at commit e4248c2.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class ResolvedFieldName(path: Seq[String], field: StructField) extends FieldName

SparkQA · 2021-07-22T22:42:46Z

Test build #141513 has finished for PR 33200 at commit 37de953.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-07-26T08:50:39Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala

+ */
+case class AlterTableAddColumns(
+    table: LogicalPlan,
+    columnsToAdd: Seq[QualifiedColType]) extends AlterTableColumnCommand {


since QualifiedColType is not an expression, the default QueryPlan.resolved won't work here. Can we override resolved?
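
To make the concern concrete, here is a minimal sketch of why a default "all expressions and children are resolved" check can miss state held in a non-expression field. The traits `Expr` and `Plan` and the classes `TableRef`, `ColToAdd`, `AddColumnsDefault` and `AddColumnsOverride` are simplified, hypothetical stand-ins and do not reproduce Spark's QueryPlan.

```scala
// Toy stand-ins for the analyzer types; not Spark's real classes.
object ResolvedOverrideSketch {
  trait Expr { def resolved: Boolean }

  trait Plan {
    def expressions: Seq[Expr]
    def children: Seq[Plan]
    // Default check: it only sees expressions and children.
    def resolved: Boolean = expressions.forall(_.resolved) && children.forall(_.resolved)
  }

  final case class TableRef(isResolved: Boolean) extends Plan {
    def expressions: Seq[Expr] = Nil
    def children: Seq[Plan] = Nil
    override def resolved: Boolean = isResolved
  }

  // A plain case class, deliberately NOT an Expr.
  final case class ColToAdd(name: String, parentResolved: Boolean)

  // Relies on the default: `cols` is invisible to the check.
  final case class AddColumnsDefault(table: Plan, cols: Seq[ColToAdd]) extends Plan {
    def expressions: Seq[Expr] = Nil
    def children: Seq[Plan] = Seq(table)
  }

  // Overrides `resolved` so that the columns are taken into account.
  final case class AddColumnsOverride(table: Plan, cols: Seq[ColToAdd]) extends Plan {
    def expressions: Seq[Expr] = Nil
    def children: Seq[Plan] = Seq(table)
    override def resolved: Boolean = table.resolved && cols.forall(_.parentResolved)
  }
}
```

In the toy model, a node with a resolved table and an unresolved column reports `resolved == true` under the default and `false` once the override is in place.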

cloud-fan · 2021-07-26T08:51:33Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

        val table = a.table.asInstanceOf[ResolvedTable]
        a.transformExpressions {
          case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u)
        }

+      case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) =>


After addressing https://github.com/apache/spark/pull/33200/files#r676413397, this can be case a @ ... if !a.resolved
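
A guard of the form `if !a.resolved` makes a rewrite rule skip nodes it has already handled. The following is a hedged sketch with made-up types (`Col`, `AddColumns`, `rule`) and a counter added purely for demonstration; it is not the Analyzer rule from this PR.

```scala
// Hypothetical types; the counter exists only to show when the rule body runs.
object GuardedRuleSketch {
  final case class Col(name: String, resolved: Boolean)

  final case class AddColumns(cols: Seq[Col]) {
    def resolved: Boolean = cols.forall(_.resolved)
  }

  var invocations = 0

  def rule(plan: AddColumns): AddColumns = plan match {
    // The guard keeps the rule from re-running on an already resolved node.
    case a @ AddColumns(cols) if !a.resolved =>
      invocations += 1
      a.copy(cols = cols.map(_.copy(resolved = true)))
    case other => other
  }
}
```

Applying `rule` twice to the same plan runs the rewrite once; the second application falls through to the default case and returns the node unchanged.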

cloud-fan · 2021-07-26T09:21:31Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+        val schema = r.table.schema
+        val resolvedCols = cols.map { col =>
+          col.path match {
+            case Some(parent) =>


nit: let's match case Some(parent: UnresolvedFieldName)
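
The suggested pattern narrows the match to parents that still need resolution, so anything already resolved passes through untouched. Below is a small sketch under simplifying assumptions: `FieldName`, `UnresolvedFieldName` and `ResolvedFieldName` are reduced stand-ins (the real `ResolvedFieldName` also carries a `StructField`), and the "resolution" step just copies the name parts.

```scala
// Simplified stand-ins for the FieldName hierarchy; not the classes from this PR.
object TypedPatternSketch {
  sealed trait FieldName
  final case class UnresolvedFieldName(name: Seq[String]) extends FieldName
  final case class ResolvedFieldName(path: Seq[String]) extends FieldName

  // Only unresolved parents are rewritten; resolved or absent parents are returned as-is.
  def resolveParent(path: Option[FieldName]): Option[FieldName] = path match {
    case Some(parent: UnresolvedFieldName) => Some(ResolvedFieldName(parent.name))
    case other => other
  }
}
```

Compared with a bare `case Some(parent)`, the typed pattern also gives `parent` the static type `UnresolvedFieldName`, so `parent.name` is available without a cast.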

cloud-fan

LGTM except for some minor comments

SparkQA · 2021-07-27T07:51:20Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46203/

SparkQA · 2021-07-27T08:32:19Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46203/

SparkQA · 2021-07-27T10:09:53Z

Test build #141689 has finished for PR 33200 at commit 8dcc44d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-07-27T10:29:59Z

@imback82 can you fix the conflicts? thanks!

imback82 · 2021-07-27T16:35:08Z

Thanks @cloud-fan for the review! I will work on #33200 (comment) after this PR gets in.

SparkQA · 2021-07-27T17:22:33Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46231/

SparkQA · 2021-07-27T18:01:42Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46231/

SparkQA · 2021-07-27T21:32:33Z

Test build #141720 has finished for PR 33200 at commit d2e6910.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
public class TimerWithCustomTimeUnit extends Timer
public static class AppShuffleInfo
public class RetryingBlockTransferor
public class ShuffleChecksumHelper
class MutableCheckedOutputStream(out: OutputStream) extends OutputStream
case class ShuffleChecksumBlockId(shuffleId: Int, mapId: Long, reduceId: Int) extends BlockId
public final class Aggregation implements Serializable
public final class Count implements AggregateFunc
public final class CountStar implements AggregateFunc
public final class Max implements AggregateFunc
public final class Min implements AggregateFunc
public final class Sum implements AggregateFunc
class Observation(name: String)
case class CoalescedMapperPartitionSpec(
trait AQEShuffleReadRule extends Rule[SparkPlan]
case class CoalesceShufflePartitions(session: SparkSession) extends AQEShuffleReadRule
class BasicWriteTaskStatsTracker(
protected abstract class ConnectionProviderBase extends Logging
case class ScanBuilderHolder(
class ContinuousWriteRDD(var prev: RDD[InternalRow], writerFactory: StreamingDataWriterFactory,
case class WriteToContinuousDataSource(write: StreamingWrite, query: LogicalPlan,
case class WriteToContinuousDataSourceExec(write: StreamingWrite, query: SparkPlan,

cloud-fan · 2021-07-28T06:00:10Z

The test failure in GA is unrelated and jenkins passes.

Thanks, merging to master/3.2! (this is the last PR to migrate ALTER TABLE ... COLUMN to v2 command)

…ds to use UnresolvedTable to resolve the identifier

### What changes were proposed in this pull request?

This PR proposes to migrate the following `ALTER TABLE ... ADD/REPLACE COLUMNS` commands to use `UnresolvedTable` as a `child` to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).

### Why are the changes needed?

This is a part of effort to make the relation lookup behavior consistent: [SPARK-29900](https://issues.apache.org/jira/browse/SPARK-29900).

### Does this PR introduce _any_ user-facing change?

After this PR, the above `ALTER TABLE ... ADD/REPLACE COLUMNS` commands will have a consistent resolution behavior.

### How was this patch tested?

Updated existing tests.

Closes #33200 from imback82/alter_add_cols.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 809b88a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

…. COLUMN

### What changes were proposed in this pull request?

This is a followup of the recent work such as #33200.

For `ALTER TABLE` commands, the logical plans do not have the common `AlterTable` prefix in the name and just use names like `SetTableLocation`. This PR proposes to follow the same naming rule in `ALTER TABLE ... COLUMN` commands.

This PR also moves these AlterTable commands to an individual file and gives them a base trait.

### Why are the changes needed?

name simplification

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing test

Closes #33609 from cloud-fan/dsv2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>

…. COLUMN

### What changes were proposed in this pull request?

This is a followup of the recent work such as #33200.

For `ALTER TABLE` commands, the logical plans do not have the common `AlterTable` prefix in the name and just use names like `SetTableLocation`. This PR proposes to follow the same naming rule in `ALTER TABLE ... COLUMN` commands.

This PR also moves these AlterTable commands to an individual file and gives them a base trait.

### Why are the changes needed?

name simplification

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing test

Closes #33609 from cloud-fan/dsv2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 7cb9c1c)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

### What changes were proposed in this pull request?

Now that all the commands that use `UnresolvedV2Relation` have been migrated to use `UnresolvedTable` and `UnresolvedView` (e.g., #33200), `UnresolvedV2Relation` can be removed.

### Why are the changes needed?

To remove unused code.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Removing dead code and no code coverage existed before.

Closes #33677 from imback82/remove_unresolvedv2relation.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

imback82 added 2 commits June 30, 2021 21:08

initial commit

0017b94

more update

eca48e3

github-actions bot added the SQL label Jul 3, 2021

clean up

f729d8d

Fix tests

0c83e37

imback82 mentioned this pull request Jul 5, 2021

[SPARK-34302][SQL][FOLLOWUP] More code cleanup #33213

Closed

imback82 added 2 commits July 7, 2021 14:25

Merge remote-tracking branch 'upstream/master' into alter_add_cols

eb5c294

remove checks and operation

4e0cf80

imback82 changed the title ~~[SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier~~ [WIP][SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier Jul 7, 2021

refinement

1b78841

imback82 changed the title ~~[WIP][SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier~~ [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier Jul 10, 2021

Add require message

8b94461

cloud-fan reviewed Jul 12, 2021

imback82 added 2 commits July 16, 2021 13:49

remove few checks

1eab11f

Merge branch 'alter_add_cols' of github.com:imback82/spark-4 into alter_add_cols

e791737

imback82 commented Jul 22, 2021

cloud-fan reviewed Jul 26, 2021

cloud-fan approved these changes Jul 26, 2021

Address PR comments

8dcc44d

cloud-fan approved these changes Jul 27, 2021

Merge remote-tracking branch 'upstream/master' into alter_add_cols

d2e6910

cloud-fan closed this in 809b88a Jul 28, 2021

cloud-fan mentioned this pull request Aug 2, 2021

[SPARK-36380][SQL] Simplify the logical plan names for ALTER TABLE ... COLUMN #33609

Closed

imback82 mentioned this pull request Aug 8, 2021

[SPARK-36450][SQL] Remove unused UnresolvedV2Relation #33677

Closed
