[SPARK-30615][SQL] Introduce Analyzer rule for V2 AlterTable column change resolution #27350

Closed
Contributor

@brkyvz brkyvz commented Jan 24, 2020

What changes were proposed in this pull request?

Adds an Analyzer rule to normalize the column names used in V2 AlterTable table changes. We need to handle all ColumnChange operations. We add an extra match statement to future-proof against new changes that may be added. This spares downstream consumers (e.g. catalogs) from having to deal with case sensitivity or to check that columns exist.
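The normalization described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `Field` and `StructType` classes, the `normalize` function, and both resolvers are hypothetical stand-ins for Spark's own types. It walks a multi-part column name through nested structs and returns the names as the schema spells them. Unlike a real analyzer rule, it simply takes the first match and does not report ambiguous names.

```scala
object NormalizeFieldNames {
  // Simplified stand-ins for Spark's schema types (hypothetical, for illustration only).
  sealed trait DataType
  case object IntType extends DataType
  final case class Field(name: String, dataType: DataType)
  final case class StructType(fields: Seq[Field]) extends DataType

  // A resolver decides whether two names refer to the same column.
  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)
  val caseSensitive: Resolver = (a, b) => a == b

  // Returns the name parts as spelled in the schema, or None if any part does not resolve.
  def normalize(
      schema: StructType,
      parts: Seq[String],
      resolver: Resolver): Option[Seq[String]] =
    parts match {
      case Seq() => Some(Seq.empty)
      case head +: tail =>
        schema.fields.find(f => resolver(f.name, head)).flatMap { f =>
          (f.dataType, tail) match {
            // Last name part: the field itself is the target.
            case (_, Seq()) => Some(Seq(f.name))
            // More parts remain: descend into the nested struct.
            case (s: StructType, _) => normalize(s, tail, resolver).map(f.name +: _)
            // More parts remain but the field is not a struct: cannot resolve.
            case _ => None
          }
        }
    }
}
```

With a case-insensitive resolver, a request for `point.x` against a schema that declares `Point.X` comes back as `Seq("Point", "X")`, so a catalog only ever sees the schema's own spelling.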

We also fix the behavior of ALTER TABLE CHANGE COLUMN (Hive-style syntax) when adding comments to complex data types. Currently, the data type must be provided as part of the Hive-style syntax. The command therefore assumes that the data type has changed even when it has not and the user only wants to add a comment, and this fails in CheckAnalysis.

Why are the changes needed?

Currently we do not handle case sensitivity correctly for ALTER TABLE ALTER COLUMN operations.

Does this PR introduce any user-facing change?

No, fixes a bug.

How was this patch tested?

Introduced v2CommandsCaseSensitivitySuite and added a test for Hive-style CHANGE COLUMN to PlanResolutionSuite.

SparkQA commented Jan 24, 2020

Test build #117333 has finished for PR 27350 at commit 54266d9.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 24, 2020

Test build #117364 has finished for PR 27350 at commit f4db1b2.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 24, 2020

Test build #117368 has finished for PR 27350 at commit 5c369af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 28, 2020

Test build #117464 has finished for PR 27350 at commit 9c3022d.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@brkyvz brkyvz changed the title [WIP][SPARK-30615][SQL] Normalize column names in V2 AlterTable [SPARK-30615][SQL] Normalize column names in V2 AlterTable Jan 28, 2020
@brkyvz brkyvz changed the title [SPARK-30615][SQL] Normalize column names in V2 AlterTable [SPARK-30615][SQL] Introduce Analyzer rule for V2 AlterTable column change resolution Jan 28, 2020
Contributor Author

brkyvz commented Jan 28, 2020

cc @rdblue @imback82 @cloud-fan

SparkQA commented Jan 28, 2020

Test build #117490 has finished for PR 27350 at commit 3ba74c5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait HasBlockSize extends Params
  • class HasBlockSize(Params):
  • case class Like(str: Expression, pattern: Expression, escape: Expression)
  • case class RLike(left: Expression, right: Expression)
  • public final class ColumnDictionary implements Dictionary
  • trait AliasAwareOutputPartitioning extends UnaryExecNode

Contributor Author

brkyvz commented Jan 28, 2020

retest this please

SparkQA commented Jan 28, 2020

Test build #117492 has finished for PR 27350 at commit 2a9c1a0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 28, 2020

Test build #117495 has finished for PR 27350 at commit 2a9c1a0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 29, 2020

Test build #117519 has finished for PR 27350 at commit c035012.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 29, 2020

Test build #117521 has finished for PR 27350 at commit c824d15.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 30, 2020

Test build #117529 has finished for PR 27350 at commit 8c1402d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

pos))

case other =>
throw new AnalysisException(
Contributor

@cloud-fan cloud-fan Jan 30, 2020

shall we return Some(add) here as well? CheckAnalysis also throws error for this case.
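As a rough illustration of the pattern under discussion (resolve what can be resolved, otherwise hand the change back untouched so that a later check reports the error), here is a minimal sketch. The `UpdateComment` case class and the `resolve` and `findPath` helpers are hypothetical and are not Spark's TableChange API.

```scala
object ResolveOrKeep {
  // Hypothetical stand-in for a column-comment change; not Spark's TableChange API.
  final case class UpdateComment(fieldNames: Seq[String], newComment: String)

  // Looks up the requested field path among the known schema paths, ignoring case.
  private def findPath(
      schemaPaths: Seq[Seq[String]],
      names: Seq[String]): Option[Seq[String]] =
    schemaPaths.find { path =>
      path.length == names.length &&
        path.zip(names).forall { case (a, b) => a.equalsIgnoreCase(b) }
    }

  // Rewrites the change with the schema's spelling when the field is found.
  // When it is not found, the original change is returned as-is, so a later
  // validation step (not shown here) is the one that raises the error.
  def resolve(schemaPaths: Seq[Seq[String]], change: UpdateComment): Option[UpdateComment] =
    findPath(schemaPaths, change.fieldNames)
      .map(path => change.copy(fieldNames = path))
      .orElse(Some(change))
}
```

Returning an `Option` keeps room for a third outcome, `None`, which a rule could use to drop a change entirely.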

comment.fieldNames(),
TableChange.updateColumnComment(_, comment.newComment())).orElse(Some(comment))

case rename: RenameColumn =>
Contributor

shall we check that the new name doesn't conflict with the existing field names?

Contributor Author

good idea

Contributor Author

left to CheckAnalysis
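For reference, the kind of rename-conflict check suggested in this thread might look like the sketch below. How CheckAnalysis actually performs it is not shown in this conversation, so the `renameConflicts` function and its parameters are assumptions made purely for illustration.

```scala
object RenameConflictCheck {
  type Resolver = (String, String) => Boolean

  // True when some sibling column other than the one being renamed
  // already resolves to the new name.
  def renameConflicts(
      siblings: Seq[String],
      oldName: String,
      newName: String,
      resolver: Resolver): Boolean =
    siblings.exists(s => !resolver(s, oldName) && resolver(s, newName))
}
```

Excluding the column being renamed means a case-only rename such as `id` to `ID` is not flagged under a case-insensitive resolver.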

field: String,
struct: StructType): ColumnPosition = {
position match {
case null => null
Contributor

when will ColumnPosition be null?

Contributor Author

when you're adding a column without a position (so to the end)
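To make the "no position means append at the end" case concrete, here is a minimal sketch that models the missing position with `Option` rather than `null`. The `Position`, `First`, and `After` types and the `resolve` function are hypothetical simplifications and not Spark's `ColumnPosition` API.

```scala
object ResolvePosition {
  // Hypothetical stand-ins for a column position; not Spark's ColumnPosition API.
  sealed trait Position
  case object First extends Position
  final case class After(column: String) extends Position

  type Resolver = (String, String) => Boolean

  // Normalizes the reference column of an AFTER position against the sibling fields.
  // None stands for "no position given", i.e. the column is appended at the end.
  def resolve(
      position: Option[Position],
      siblings: Seq[String],
      resolver: Resolver): Either[String, Option[Position]] =
    position match {
      case None => Right(None)
      case Some(First) => Right(Some(First))
      case Some(After(col)) =>
        siblings.find(resolver(_, col)) match {
          // Use the spelling found in the schema.
          case Some(actual) => Right(Some(After(actual)))
          case None => Left(s"Couldn't find the reference column for AFTER $col")
        }
    }
}
```

An `Option` forces every caller to handle the "append at the end" case explicitly, which is the case the `null` branch in the diff covers.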

(position, dataType) match {
case (after: After, struct: StructType) =>
struct.fieldNames.contains(after.column())
case (after: After, _) => false
Contributor

shall we fail for this case? We can't add fields to a non-struct column.

Contributor Author

Since this is used for non add-column methods as well, I left it a bit more generic for now

Some(field)
includeCollections: Boolean = false,
resolver: Resolver = _ == _): Option[(Seq[String], StructField)] = {
def prettyFieldName(nameParts: Seq[String]): String = {
Contributor

can we reuse CatalogV2Implicits.MultipartIdentifierHelper.quoted ?

Contributor Author

brkyvz commented Jan 30, 2020

@cloud-fan addressed your comments

SparkQA commented Jan 30, 2020

Test build #117578 has finished for PR 27350 at commit 2b364e2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

brkyvz commented Jan 30, 2020

retest this please

SparkQA commented Jan 31, 2020

Test build #117584 has finished for PR 27350 at commit 2b364e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
val (fieldNames, field) = fieldOpt.get
if (field.dataType == typeChange.newDataType()) {
// The user didn't want the field to change, so remove this change
Contributor

I think we can remove other column changes if they are noop, e.g. UpdateColumnNullability without changing nullability. We can address it in a followup.
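A minimal sketch of the no-op pruning idea, covering both the type case handled in the diff and the nullability case proposed for a follow-up. The `Change` hierarchy, the `Column` class, and `dropNoOps` are hypothetical, and data types are plain strings here for brevity.

```scala
object DropNoOpChanges {
  // Hypothetical stand-ins for column changes; not Spark's TableChange API.
  sealed trait Change { def column: String }
  final case class UpdateType(column: String, newType: String) extends Change
  final case class UpdateNullability(column: String, nullable: Boolean) extends Change
  final case class UpdateComment(column: String, comment: String) extends Change

  // Current state of a column; the data type is a plain string for brevity.
  final case class Column(dataType: String, nullable: Boolean)

  // Drops changes that would leave the column exactly as it already is.
  // Changes on unknown columns are kept so that a later check can reject them.
  def dropNoOps(schema: Map[String, Column], changes: Seq[Change]): Seq[Change] =
    changes.filterNot {
      case UpdateType(c, t) => schema.get(c).exists(_.dataType == t)
      case UpdateNullability(c, n) => schema.get(c).exists(_.nullable == n)
      case _ => false
    }
}
```

In the Hive-style case from the description, a CHANGE COLUMN that repeats the existing struct type and adds a comment would reduce to the comment change alone.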

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 290a528 Jan 31, 2020