New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-17910][SQL] Allow users to update the comment of a column #15717
Conversation
Test build #67905 has finished for PR 15717 at commit
|
/** | ||
* Return the full description of the StructField. | ||
*/ | ||
def getDesc(): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes because we have to print the full description of the StructField, but StructField.toString
don't output metadata
field.
Looks like there is an actual test failure. |
Test build #68279 has finished for PR 15717 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ready for review now. Perhaps we could easily add full support for HIVE-style ALTER TABLE CHANGE COLUMN
statement after this PR have been merged.
@@ -93,6 +93,8 @@ statement | |||
SET TBLPROPERTIES tablePropertyList #setTableProperties | |||
| ALTER (TABLE | VIEW) tableIdentifier | |||
UNSET TBLPROPERTIES (IF EXISTS)? tablePropertyList #unsetTableProperties | |||
| ALTER (TABLE | VIEW) tableIdentifier | |||
CHANGE COLUMN? expandColTypeList #changeColumns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't support change partitions for now, will implement it in another PR.
-- Create the origin table | ||
CREATE TABLE test_change(a Int, b String, c Int); | ||
CREATE VIEW test_view(a, b, c) AS | ||
SELECT * FROM VALUES (1, "one", 11), (null, "two", 22) AS testData(a, b, c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to make the column a
nullable here, otherwise it can't be coverted to StructField(a,IntegerType,true,{comment: newComment})
retest this please. - This build error is strange because it successfully built on my local envirement. |
Test build #68284 has finished for PR 15717 at commit
|
cba5bbd
to
fda6d3a
Compare
Test build #68324 has finished for PR 15717 at commit
|
Test build #68327 has finished for PR 15717 at commit
|
Test build #68335 has finished for PR 15717 at commit
|
Any update on this PR? |
@gatorsmile I have been busy on other stuffs recently so no progress has been made on this PR, I will try to find some time to make this PR pass all unit cases this weekend. Thank you for looking at this! |
0097809
to
7976462
Compare
Test build #68565 has finished for PR 15717 at commit
|
@@ -273,6 +273,68 @@ case class AlterTableUnsetPropertiesCommand( | |||
|
|||
} | |||
|
|||
|
|||
/** | |||
* A command to change the columns for a table, only support change column comment for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: support changing column comments
|
||
/** | ||
* A command to change the columns for a table, only support change column comment for now. | ||
* This function creates a [[AlterTableChangeColumnsCommand]] logical plan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line.
* {{{ | ||
* ALTER (TABLE | VIEW) table_identifier | ||
* CHANGE (COLUMN) column_name column_name column_dataType column_comment | ||
* (FIRST | AFTER column_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above is not clear. Please check how Hive documents it.
ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
[COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];
A basic question. Like Hive, we should not support |
@gatorsmile Thank you for clarify, will remove the |
Test build #68586 has finished for PR 15717 at commit
|
@gatorsmile Would you please have a look at this when you have time? Thank you! |
I prefer to doing |
Will review this PR tonight. Thanks! |
Another question about this PR: does it support data source tables? |
It should support data source tables, acturally perhaps I should add test cases in |
Test build #68719 has finished for PR 15717 at commit
|
@gatorsmile I've added tests ensure that it support data source tables, please check when you have time, thank you! |
Will review this PR tomorrow. |
Could you please add a test case for verifying the metadata field of the column will not be lost after adding a comment? Thanks! |
val columnsMap = columns.map { case (oldName: String, newField: StructField) => | ||
// Find the origin column from schema by column name. | ||
val originColumn = findColumn(table.schema, oldName, resolver) | ||
// Throw a Exception if the column name/dataType is changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: please correct a Exception
to an exception
in all the comments you added. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that a
should be changed to an
, but should we also change Exception
to exception
? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not very common to use Exception
here. Or you can change it to -> an AnalysisException
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile That's great! Thank you!
ALTER TABLE test_change CHANGE a a1 STRING COMMENT 'this is column a1' AFTER b; | ||
DESC test_change; | ||
|
||
-- Case sensitive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the comment. This test case is to check the behavior when spark.sql.caseSensitive
is off.
Could you please also add a test case when |
@gatorsmile Thank you for your suggestions! I'll update the code this evening! |
s"'${originColumn.name}' with type '${originColumn.dataType}' to " + | ||
s"'${newField.name}' with type '${newField.dataType}'") | ||
} | ||
// Create a new column from the origin column with new comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: new comment
-> the new comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my poor English... :'(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the rule is very simple.
A countable noun always takes either the indefinite (a, an) or definite (the) article when it is singular. When plural, it takes the definite article if it refers to a definite, specific group and no article if it is used in a general sense.
Test build #70073 has finished for PR 15717 at commit
|
Test build #70089 has finished for PR 15717 at commit
|
LGTM, cc @gatorsmile for final sign-off |
* {{{ | ||
* ALTER TABLE table_identifier | ||
* CHANGE [COLUMN] column_old_name column_new_name column_dataType [COMMENT column_comment] | ||
* [FIRST | AFTER column_name]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the right Hive syntax, but SparkSqlParser.scala
is using the different one.
* {{{ | ||
* ALTER TABLE table [PARTITION partition_spec] | ||
* CHANGE [COLUMN] `col` `col` dataType [COMMENT "comment"] [FIRST | AFTER `otherCol`] | ||
* [, `col2` `col2` dataType [COMMENT "comment"] [FIRST | AFTER `otherCol`], ...] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason we allow users to modify multiple columns in this DDL? This is different from what Hive supports. Should we do it? cc @cloud-fan
In addition, based on the existing syntax, if we really want to support multiple columns, we should change the keywords to CHANGE COLUMNS
. Like Hive, we can do something like
ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we supporting multiple columns syntax, we also need to consider the edge case. For example, duplicate column names in the same DDL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's follow hive's syntax
Sorry for the last minute comment. I did not realize it until I manually run these test cases in Hive. |
@gatorsmile @cloud-fan In fact that's mainly my fault, I should have checked that the syntax is the same as that in HIVE. I'm working on this now. Thank you! |
Test build #70129 has finished for PR 15717 at commit
|
LGTM |
@jiangxb1987 Could you update the PR description to reflect the new syntax? Thanks! |
@gatorsmile I've updated the PR description, thanks! |
Merging in master. |
## What changes were proposed in this pull request? Right now, once a user set the comment of a column with create table command, he/she cannot update the comment. It will be useful to provide a public interface (e.g. SQL) to do that. This PR implements the following SQL statement: ``` ALTER TABLE table [PARTITION partition_spec] CHANGE [COLUMN] column_old_name column_new_name column_dataType [COMMENT column_comment] [FIRST | AFTER column_name]; ``` For further expansion, we could support alter `name`/`dataType`/`index` of a column too. ## How was this patch tested? Add new test cases in `ExternalCatalogSuite` and `SessionCatalogSuite`. Add sql file test for `ALTER TABLE CHANGE COLUMN` statement. Author: jiangxingbo <jiangxb1987@gmail.com> Closes apache#15717 from jiangxb1987/change-column.
## What changes were proposed in this pull request? Right now, once a user set the comment of a column with create table command, he/she cannot update the comment. It will be useful to provide a public interface (e.g. SQL) to do that. This PR implements the following SQL statement: ``` ALTER TABLE table [PARTITION partition_spec] CHANGE [COLUMN] column_old_name column_new_name column_dataType [COMMENT column_comment] [FIRST | AFTER column_name]; ``` For further expansion, we could support alter `name`/`dataType`/`index` of a column too. ## How was this patch tested? Add new test cases in `ExternalCatalogSuite` and `SessionCatalogSuite`. Add sql file test for `ALTER TABLE CHANGE COLUMN` statement. Author: jiangxingbo <jiangxb1987@gmail.com> Closes apache#15717 from jiangxb1987/change-column.
@jiangxb1987 any ideas why this still doesn't work for me in spark 2.2.0?
The same sql query executed from hive works as expected, but from spark it has no effect Thanks |
Can you also show the result of DESC TABLE before and after the command? |
Please file a JIRA issue if you think there is an unexpected behavior. Thank you! |
@jiangxb1987 thanks for so quick response! I filed the issue: |
What changes were proposed in this pull request?
Right now, once a user set the comment of a column with create table command, he/she cannot update the comment. It will be useful to provide a public interface (e.g. SQL) to do that.
This PR implements the following SQL statement:
For further expansion, we could support alter
name
/dataType
/index
of a column too.How was this patch tested?
Add new test cases in
ExternalCatalogSuite
andSessionCatalogSuite
.Add sql file test for
ALTER TABLE CHANGE COLUMN
statement.