New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-30615][SQL] Introduce Analyzer rule for V2 AlterTable column change resolution #27350
Conversation
Test build #117333 has finished for PR 27350 at commit
|
Test build #117364 has finished for PR 27350 at commit
|
Test build #117368 has finished for PR 27350 at commit
|
Test build #117464 has finished for PR 27350 at commit
|
Test build #117490 has finished for PR 27350 at commit
|
retest this please |
Test build #117492 has finished for PR 27350 at commit
|
Test build #117495 has finished for PR 27350 at commit
|
Test build #117519 has finished for PR 27350 at commit
|
Test build #117521 has finished for PR 27350 at commit
|
Test build #117529 has finished for PR 27350 at commit
|
pos)) | ||
|
||
case other => | ||
throw new AnalysisException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we return Some(add)
here as well? CheckAnalysis
also throws error for this case.
comment.fieldNames(), | ||
TableChange.updateColumnComment(_, comment.newComment())).orElse(Some(comment)) | ||
|
||
case rename: RenameColumn => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we check that the new name doesn't conflict with the existing field names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left to CheckAnalysis
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Show resolved
Hide resolved
field: String, | ||
struct: StructType): ColumnPosition = { | ||
position match { | ||
case null => null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when will ColumnPosition
be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when you're adding a column without a position (so to the end)
(position, dataType) match { | ||
case (after: After, struct: StructType) => | ||
struct.fieldNames.contains(after.column()) | ||
case (after: After, _) => false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we fail for this case? We can't add fields to a non-struct column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is used for non add-column methods as well, I left it a bit more generic for now
Some(field) | ||
includeCollections: Boolean = false, | ||
resolver: Resolver = _ == _): Option[(Seq[String], StructField)] = { | ||
def prettyFieldName(nameParts: Seq[String]): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we reuse CatalogV2Implicits.MultipartIdentifierHelper.quoted
?
@cloud-fan addressed your comments |
Test build #117578 has finished for PR 27350 at commit
|
retest this please |
Test build #117584 has finished for PR 27350 at commit
|
} else { | ||
val (fieldNames, field) = fieldOpt.get | ||
if (field.dataType == typeChange.newDataType()) { | ||
// The user didn't want the field to change, so remove this change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove other column changes if they are noop, e.g. UpdateColumnNullability
without changing nullability. We can address it in a followup.
thanks, merging to master! |
What changes were proposed in this pull request?
Adds an Analyzer rule to normalize the column names used in V2 AlterTable table changes. We need to handle all ColumnChange operations. We add an extra match statement for future proofing new changes that may be added. This prevents downstream consumers (e.g. catalogs) to deal about case sensitivity or check that columns exist, etc.
We also fix the behavior for ALTER TABLE CHANGE COLUMN (Hive style syntax) for adding comments to complex data types. Currently, the data type needs to be provided as part of the Hive style syntax. This assumes that the data type as changed when it may have not and the user only wants to add a comment, which fails in CheckAnalysis.
Why are the changes needed?
Currently we do not handle case sensitivity correctly for ALTER TABLE ALTER COLUMN operations.
Does this PR introduce any user-facing change?
No, fixes a bug.
How was this patch tested?
Introduced v2CommandsCaseSensitivitySuite and added a test around HiveStyle Change columns to PlanResolutionSuite