Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier #33200

Closed
wants to merge 18 commits into from

Conversation

imback82
Copy link
Contributor

@imback82 imback82 commented Jul 3, 2021

What changes were proposed in this pull request?

This PR proposes to migrate the following ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable as a child to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in JIRA or proposal doc.

Why are the changes needed?

This is a part of effort to make the relation lookup behavior consistent: SPARK-29900.

Does this PR introduce any user-facing change?

After this PR, the above ALTER TABLE ... ADD/REPLACE COLUMNS commands will have a consistent resolution behavior.

How was this patch tested?

Updated existing tests.

@github-actions github-actions bot added the SQL label Jul 3, 2021
@SparkQA
Copy link

SparkQA commented Jul 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45121/

@SparkQA
Copy link

SparkQA commented Jul 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45122/

@SparkQA
Copy link

SparkQA commented Jul 3, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45121/

@SparkQA
Copy link

SparkQA commented Jul 3, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45122/

@SparkQA
Copy link

SparkQA commented Jul 3, 2021

Test build #140609 has finished for PR 33200 at commit f729d8d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 3, 2021

Test build #140614 has finished for PR 33200 at commit 0c83e37.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45127/

@SparkQA
Copy link

SparkQA commented Jul 3, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45127/

@imback82 imback82 changed the title [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier [WIP][SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier Jul 7, 2021
@SparkQA
Copy link

SparkQA commented Jul 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45278/

@SparkQA
Copy link

SparkQA commented Jul 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45278/

@SparkQA
Copy link

SparkQA commented Jul 8, 2021

Test build #140766 has finished for PR 33200 at commit 4e0cf80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82 imback82 changed the title [WIP][SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier Jul 10, 2021
@SparkQA
Copy link

SparkQA commented Jul 10, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45392/

@SparkQA
Copy link

SparkQA commented Jul 10, 2021

Test build #140881 has finished for PR 33200 at commit 1b78841.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

cc @cloud-fan. TIA!

@@ -229,22 +228,13 @@ case class ReplaceTableAsSelectStatement(
* Column data as parsed by ALTER TABLE ... ADD COLUMNS.
*/
case class QualifiedColType(
name: Seq[String],
fieldName: FieldName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very sure about this change. This is the column to add and we don't need to resolve the field name, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you summarize what we need to do for it? What I can think of:

  1. normalize the field name according to the actual table schema
  2. make sure the column name does not exist in the table schema

Copy link
Contributor Author

@imback82 imback82 Jul 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For AlterTableAddColumns, we need to 1) resolve the "parent" name if the column being added is a nested one, and 2) check if the column name already exists.

For AlterTableReplaceColumns, it seems that we do not need to check anything with this new change (I removed it in the recent commit)

The reason I was using FieldName is so that I can check whether QualifiedColType is resolved or not so that rule doesn't run if already resolved:

private def hasUnresolvedColumns(cols: Seq[QualifiedColType]): Boolean = {
      cols.exists(col => !col.fieldName.resolved || col.position.exists(!_.resolved))

, but I agree that using ResolvedFieldName is a bit weird since the field name is being "added". Maybe turn QualifiedColType into an Expression?

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46029/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46030/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46030/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46029/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Test build #141512 has finished for PR 33200 at commit e4248c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ResolvedFieldName(path: Seq[String], field: StructField) extends FieldName

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Test build #141513 has finished for PR 33200 at commit 37de953.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
case class AlterTableAddColumns(
table: LogicalPlan,
columnsToAdd: Seq[QualifiedColType]) extends AlterTableColumnCommand {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since QualifiedColType is not an expression, the default QueryPlan.resolved won't work here. Can we override resolved?

val table = a.table.asInstanceOf[ResolvedTable]
a.transformExpressions {
case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u)
}

case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After address https://github.com/apache/spark/pull/33200/files#r676413397 , here can be case a @ ... if !a.resolved

val schema = r.table.schema
val resolvedCols = cols.map { col =>
col.path match {
case Some(parent) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's match case Some(parent: UnresolvedFieldName)

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for some minor comments

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46203/

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46203/

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Test build #141689 has finished for PR 33200 at commit 8dcc44d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

@imback82 can you fix the conflicts? thanks!

@imback82
Copy link
Contributor Author

Thanks @cloud-fan for the review! I will work on #33200 (comment) after this PR gets in.

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46231/

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46231/

@SparkQA
Copy link

SparkQA commented Jul 27, 2021

Test build #141720 has finished for PR 33200 at commit d2e6910.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class TimerWithCustomTimeUnit extends Timer
  • public static class AppShuffleInfo
  • public class RetryingBlockTransferor
  • public class ShuffleChecksumHelper
  • class MutableCheckedOutputStream(out: OutputStream) extends OutputStream
  • case class ShuffleChecksumBlockId(shuffleId: Int, mapId: Long, reduceId: Int) extends BlockId
  • public final class Aggregation implements Serializable
  • public final class Count implements AggregateFunc
  • public final class CountStar implements AggregateFunc
  • public final class Max implements AggregateFunc
  • public final class Min implements AggregateFunc
  • public final class Sum implements AggregateFunc
  • class Observation(name: String)
  • case class CoalescedMapperPartitionSpec(
  • trait AQEShuffleReadRule extends Rule[SparkPlan]
  • case class CoalesceShufflePartitions(session: SparkSession) extends AQEShuffleReadRule
  • class BasicWriteTaskStatsTracker(
  • protected abstract class ConnectionProviderBase extends Logging
  • case class ScanBuilderHolder(
  • class ContinuousWriteRDD(var prev: RDD[InternalRow], writerFactory: StreamingDataWriterFactory,
  • case class WriteToContinuousDataSource(write: StreamingWrite, query: LogicalPlan,
  • case class WriteToContinuousDataSourceExec(write: StreamingWrite, query: SparkPlan,

@cloud-fan
Copy link
Contributor

The test failure in GA is unrelated and jenkins passes.

Thanks, merging to master/3.2! (this is the last PR to migrate ALTER TABLE ... COLUMN to v2 command)

@cloud-fan cloud-fan closed this in 809b88a Jul 28, 2021
cloud-fan pushed a commit that referenced this pull request Jul 28, 2021
…ds to use UnresolvedTable to resolve the identifier

### What changes were proposed in this pull request?

This PR proposes to migrate the following `ALTER TABLE ... ADD/REPLACE COLUMNS` commands to use `UnresolvedTable` as a `child` to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).

### Why are the changes needed?

This is a part of effort to make the relation lookup behavior consistent: [SPARK-29900](https://issues.apache.org/jira/browse/SPARK-29900).

### Does this PR introduce _any_ user-facing change?

After this PR, the above `ALTER TABLE ... ADD/REPLACE COLUMNS` commands will have a consistent resolution behavior.

### How was this patch tested?

Updated existing tests.

Closes #33200 from imback82/alter_add_cols.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 809b88a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
MaxGekk pushed a commit that referenced this pull request Aug 3, 2021
…. COLUMN

### What changes were proposed in this pull request?

This a followup of the recent work such as #33200

For `ALTER TABLE` commands, the logical plans do not have the common `AlterTable` prefix in the name and just use names like `SetTableLocation`. This PR proposes to follow the same naming rule in `ALTER TABE ... COLUMN` commands.

This PR also moves these AlterTable commands to a individual file and give them a base trait.

### Why are the changes needed?

name simplification

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing test

Closes #33609 from cloud-fan/dsv2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk pushed a commit that referenced this pull request Aug 3, 2021
…. COLUMN

### What changes were proposed in this pull request?

This a followup of the recent work such as #33200

For `ALTER TABLE` commands, the logical plans do not have the common `AlterTable` prefix in the name and just use names like `SetTableLocation`. This PR proposes to follow the same naming rule in `ALTER TABE ... COLUMN` commands.

This PR also moves these AlterTable commands to a individual file and give them a base trait.

### Why are the changes needed?

name simplification

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing test

Closes #33609 from cloud-fan/dsv2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 7cb9c1c)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
cloud-fan pushed a commit that referenced this pull request Aug 9, 2021
### What changes were proposed in this pull request?

Now that all the commands that use `UnresolvedV2Relation` have been migrated to use `UnresolvedTable` and `UnresolvedView` (e.g, #33200), `UnresolvedV2Relation` can be removed.

### Why are the changes needed?

To remove unused code.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Removing dead code and no code coverage existed before.

Closes #33677 from imback82/remove_unresolvedv2relation.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants