[SPARK-36449][SQL] v2 ALTER TABLE REPLACE COLUMNS should check duplicates for the user specified columns #33676
if (struct.findNestedField(fieldNames, includeCollections = true, r).isDefined) {
def checkColumnNotExists(op: String, fieldNames: Seq[String], struct: StructType): Unit = {
  if (struct.findNestedField(
      fieldNames, includeCollections = true, alter.conf.resolver).isDefined) {
capturing resolver directly from alter variable to simplify.
test("SPARK-36449: Replacing columns with duplicate name should not be allowed") {
  alterTableTest(
    () => ReplaceColumns(
need to create a new ReplaceColumns. Otherwise, analyzed will be set to true after the first iteration.
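The concern in this comment can be illustrated without Spark. The following is a minimal sketch, not the PR's code: `FakePlan`, `analyze`, and `checksRun` are hypothetical names, and the assumption is only that a plan object carries a mutable "already analyzed" flag that causes later analysis passes to be skipped.

```scala
object FreshPlanSketch {
  // Hypothetical stand-in for a plan that remembers whether it was analyzed.
  final class FakePlan(val columns: Seq[String]) {
    var analyzed: Boolean = false
  }

  // Hypothetical analyzer: returns true if checks actually ran,
  // false if the plan was skipped because it was already marked analyzed.
  def analyze(plan: FakePlan): Boolean = {
    if (plan.analyzed) {
      false
    } else {
      plan.analyzed = true
      true
    }
  }

  // Calls the factory once per iteration and records whether checks ran.
  def checksRun(makePlan: () => FakePlan, iterations: Int): Seq[Boolean] =
    (1 to iterations).map(_ => analyze(makePlan()))
}
```

If the factory returns the same instance every time, only the first iteration runs the checks; if it builds a new instance per call, every iteration does.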
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #142192 has finished for PR 33676 at commit
cc @cloud-fan
}

private def alterTableTest(
    alter: AlterTableCommand,
How about simply changing this to a by-name parameter? `alter: => AlterTableCommand`
thanks, updated.
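For readers unfamiliar with by-name parameters, here is a small sketch of the behavior the suggestion relies on. `FakeCommand`, `constructed`, and `runTwice` are hypothetical names used only for illustration; the sketch does not reproduce the actual `alterTableTest` helper.

```scala
object ByNameSketch {
  // Counts how many FakeCommand instances have been constructed.
  var constructed: Int = 0

  // Hypothetical command; each construction bumps the counter.
  final class FakeCommand {
    constructed += 1
  }

  // `alter` is a by-name parameter: the argument expression is
  // re-evaluated every time `alter` is referenced in the body.
  def runTwice(alter: => FakeCommand): (FakeCommand, FakeCommand) =
    (alter, alter)
}
```

Calling `ByNameSketch.runTwice(new ByNameSketch.FakeCommand)` builds two distinct instances, so the call site stays as simple as passing a value while still getting a fresh object per use, without the explicit `() =>` wrapper.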
Test build #142231 has started for PR 33676 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Refer to this link for build results (access rights to CI server needed):
thanks, merging to master/3.2!
…ates for the user specified columns

### What changes were proposed in this pull request?

Currently, v2 ALTER TABLE REPLACE COLUMNS does not check duplicates for the user specified columns. For example,
```
spark.sql(s"CREATE TABLE $t (id int) USING $v2Format")
spark.sql(s"ALTER TABLE $t REPLACE COLUMNS (data string, data string)")
```
doesn't fail the analysis, and it's up to the catalog implementation to handle it.

### Why are the changes needed?

To check the duplicate columns during analysis.

### Does this PR introduce _any_ user-facing change?

Yes, now the above command will print out the following:
```
org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the user specified columns: `data`
```

### How was this patch tested?

Added new unit tests

Closes #33676 from imback82/replace_cols_duplicates.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e1a5d94)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Currently, v2 ALTER TABLE REPLACE COLUMNS does not check for duplicates among the user specified columns. For example, `ALTER TABLE $t REPLACE COLUMNS (data string, data string)` on a v2 table doesn't fail the analysis, and it's up to the catalog implementation to handle it.
Why are the changes needed?
To check the duplicate columns during analysis.
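As a rough illustration of what checking duplicates at analysis time involves, here is a self-contained sketch. It is not Spark's implementation: `findDuplicates` and `checkNoDuplicates` are hypothetical helpers, `IllegalArgumentException` stands in for `AnalysisException`, and the `Resolver` alias is assumed to be a plain two-name comparison function so that case sensitivity can be swapped in or out.

```scala
object DuplicateColumnSketch {
  // A resolver decides whether two identifiers refer to the same name.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Returns every name that matches some earlier name in the list,
  // in order of appearance, with exact repeats removed.
  def findDuplicates(names: Seq[String], resolver: Resolver): Seq[String] = {
    val dups = names.zipWithIndex.collect {
      case (name, i) if names.take(i).exists(resolver(_, name)) => name
    }
    dups.distinct
  }

  // Throws if any duplicate is found (stand-in for an analysis error).
  def checkNoDuplicates(names: Seq[String], resolver: Resolver): Unit = {
    val dups = findDuplicates(names, resolver)
    if (dups.nonEmpty) {
      throw new IllegalArgumentException(
        "Found duplicate column(s) in the user specified columns: " +
          dups.map(n => s"`$n`").mkString(", "))
    }
  }
}
```

With the case-sensitive resolver, `Seq("data", "Data")` passes; with the case-insensitive one it is rejected, which is why the resolver is threaded through the check rather than hard-coding string equality.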
Does this PR introduce any user-facing change?
Yes, the above command now fails with an AnalysisException: Found duplicate column(s) in the user specified columns: `data`
How was this patch tested?
Added new unit tests