[SPARK-17910][SQL] Allow users to update the comment of a column #15717

jiangxb1987 · 2016-11-01T16:08:35Z

What changes were proposed in this pull request?

Right now, once a user sets the comment of a column with the CREATE TABLE command, they cannot update it. It would be useful to provide a public interface (e.g. SQL) to do that.

This PR implements the following SQL statement:

ALTER TABLE table [PARTITION partition_spec]
CHANGE [COLUMN] column_old_name column_new_name column_dataType
[COMMENT column_comment]
[FIRST | AFTER column_name];

As a future extension, we could also support altering the name/dataType/index of a column.

How was this patch tested?

Add new test cases in ExternalCatalogSuite and SessionCatalogSuite.
Add a SQL file test for the ALTER TABLE CHANGE COLUMN statement.

SparkQA · 2016-11-01T17:48:15Z

Test build #67905 has finished for PR 15717 at commit 4acf0d1.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-11-02T06:11:57Z

sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala

+  /**
+   * Return the full description of the StructField.
+   */
+  def getDesc(): String = {


is this really needed?

Yes, because we have to print the full description of the StructField, but StructField.toString doesn't output the metadata field.
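To illustrate the point above, here is a minimal sketch of the difference between a short `toString` and a full description that includes metadata. `FieldSketch` and `fullDescription` are hypothetical stand-ins written for this example; they are not Spark's `StructField` or the `getDesc` method from the patch.

```scala
// Hypothetical stand-in for a schema field; NOT Spark's StructField.
final case class FieldSketch(
    name: String,
    dataType: String,
    nullable: Boolean,
    metadata: Map[String, String] = Map.empty) {

  // Short form: metadata (and therefore the comment) is left out.
  override def toString: String = s"FieldSketch($name,$dataType,$nullable)"

  // Full form: metadata entries are appended in sorted order so the output is stable.
  def fullDescription: String = {
    val meta = metadata.toSeq.sorted
      .map { case (k, v) => s"$k: $v" }
      .mkString("{", ", ", "}")
    s"FieldSketch($name,$dataType,$nullable,$meta)"
  }
}
```

With a comment stored under the `comment` key, only `fullDescription` makes the comment visible.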

rxin · 2016-11-02T06:12:05Z

Looks like there is an actual test failure.

SparkQA · 2016-11-07T16:03:30Z

Test build #68279 has finished for PR 15717 at commit 579025f.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class AlterTableChangeColumnsCommand(

jiangxb1987

This is ready for review now. Perhaps we could easily add full support for the Hive-style ALTER TABLE CHANGE COLUMN statement after this PR has been merged.

jiangxb1987 · 2016-11-07T16:18:12Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

@@ -93,6 +93,8 @@ statement
        SET TBLPROPERTIES tablePropertyList                            #setTableProperties
    | ALTER (TABLE | VIEW) tableIdentifier
        UNSET TBLPROPERTIES (IF EXISTS)? tablePropertyList             #unsetTableProperties
+    | ALTER (TABLE | VIEW) tableIdentifier
+        CHANGE COLUMN? expandColTypeList                               #changeColumns


Changing partitions is not supported for now; I will implement it in another PR.

jiangxb1987 · 2016-11-07T16:28:21Z

sql/core/src/test/resources/sql-tests/inputs/change-column.sql

+-- Create the origin table
+CREATE TABLE test_change(a Int, b String, c Int);
+CREATE VIEW test_view(a, b, c) AS
+SELECT * FROM VALUES (1, "one", 11), (null, "two", 22) AS testData(a, b, c);


We have to make column a nullable here; otherwise it can't be converted to StructField(a,IntegerType,true,{comment: newComment})

jiangxb1987 · 2016-11-07T16:34:31Z

retest this please. This build error is strange, because the build succeeded in my local environment.

SparkQA · 2016-11-07T16:43:38Z

Test build #68284 has finished for PR 15717 at commit 579025f.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class AlterTableChangeColumnsCommand(

SparkQA · 2016-11-08T08:41:02Z

Test build #68324 has finished for PR 15717 at commit cba5bbd.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2016-11-08T08:45:42Z

Test build #68327 has finished for PR 15717 at commit fda6d3a.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-11-08T10:58:08Z

Test build #68335 has finished for PR 15717 at commit 0097809.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-11-12T07:35:47Z

Any update on this PR?

jiangxb1987 · 2016-11-12T08:20:24Z

@gatorsmile I have been busy with other things recently, so no progress has been made on this PR. I will try to find some time this weekend to make it pass all unit tests. Thank you for looking at this!

SparkQA · 2016-11-12T17:45:23Z

Test build #68565 has finished for PR 15717 at commit 7976462.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-11-12T18:04:11Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

@@ -273,6 +273,68 @@ case class AlterTableUnsetPropertiesCommand(

 }

+
+/**
+ * A command to change the columns for a table, only support change column comment for now.


Nit: support changing column comments

gatorsmile · 2016-11-12T18:04:47Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

+
+/**
+ * A command to change the columns for a table, only support change column comment for now.
+ * This function creates a [[AlterTableChangeColumnsCommand]] logical plan.


Remove this line.

gatorsmile · 2016-11-12T18:08:24Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

+ * {{{
+ *   ALTER (TABLE | VIEW) table_identifier
+ *   CHANGE (COLUMN) column_name column_name column_dataType column_comment
+ *   (FIRST | AFTER column_name);


Above is not clear. Please check how Hive documents it.

ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];

gatorsmile · 2016-11-12T18:10:27Z

A basic question. Like Hive, we should not support ALTER VIEW CHANGE COLUMN, right?

jiangxb1987 · 2016-11-13T02:29:18Z

@gatorsmile Thank you for clarifying; I will remove the ALTER VIEW CHANGE COLUMN statement today. BTW, do you think we should add [CASCADE|RESTRICT] in this PR?

SparkQA · 2016-11-13T19:27:09Z

Test build #68586 has finished for PR 15717 at commit fc316cd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jiangxb1987 · 2016-11-15T09:17:38Z

@gatorsmile Would you please have a look at this when you have time? Thank you!

gatorsmile · 2016-11-15T21:36:27Z

I prefer doing [CASCADE|RESTRICT] in a separate PR. However, we still need to add a test case to verify that the behavior follows the default, RESTRICT.

gatorsmile · 2016-11-15T21:39:02Z

Will review this PR tonight. Thanks!

gatorsmile · 2016-11-16T07:46:22Z

Another question about this PR: does it support data source tables?

jiangxb1987 · 2016-11-16T09:42:13Z

It should support data source tables. Actually, I should add test cases in DDLSuite for that; I'll add them this evening. Thank you!

SparkQA · 2016-11-16T18:20:02Z

Test build #68719 has finished for PR 15717 at commit 855d520.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jiangxb1987 · 2016-11-17T07:11:45Z

@gatorsmile I've added tests to ensure that it supports data source tables. Please check when you have time, thank you!

gatorsmile · 2016-11-18T03:46:33Z

Will review this PR tomorrow.

gatorsmile · 2016-12-13T08:20:29Z

Could you please add a test case to verify that the metadata field of the column is not lost after adding a comment? Thanks!
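The property being asked for can be sketched as follows. This is only an illustration under stated assumptions: `CommentedColumn` is a hypothetical model that stores metadata as a plain `Map`, and it assumes the comment lives under a `comment` key. It is not the code from this PR.

```scala
// Hypothetical column model; metadata is a plain Map for illustration only.
final case class CommentedColumn(
    name: String,
    dataType: String,
    metadata: Map[String, String] = Map.empty) {

  // Adds or replaces only the "comment" entry; all other metadata entries are kept.
  def withComment(comment: String): CommentedColumn =
    copy(metadata = metadata.updated("comment", comment))
}
```

A test along these lines would set some unrelated metadata key, change the comment, and check that the unrelated key is still present afterwards.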

gatorsmile · 2016-12-13T08:24:56Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

+    val columnsMap = columns.map { case (oldName: String, newField: StructField) =>
+      // Find the origin column from schema by column name.
+      val originColumn = findColumn(table.schema, oldName, resolver)
+      // Throw a Exception if the column name/dataType is changed.


Nit: please correct "a Exception" to "an exception" in all the comments you added. Thanks!

I understand that "a" should be changed to "an", but should we also change "Exception" to "exception"? Thanks!

It is not very common to use "Exception" here. Alternatively, you can change it to "an AnalysisException".

@gatorsmile That's great! Thank you!
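For readers following the diff excerpt, the check under discussion (look up the original column, reject any change to the name or data type, then apply the new comment) can be sketched roughly as below. All names here are hypothetical, and standard JVM exceptions stand in for the AnalysisException mentioned in the thread.

```scala
object ChangeColumnCheck {
  // Hypothetical column model used only for this sketch.
  final case class Col(name: String, dataType: String, comment: Option[String] = None)

  // Only the comment may change; a different name or data type is rejected.
  def changeComment(schema: Seq[Col], oldName: String, newCol: Col): Seq[Col] = {
    // Find the original column by name (exact match in this sketch).
    val origin = schema.find(_.name == oldName).getOrElse(
      throw new IllegalArgumentException(s"Column '$oldName' not found"))
    if (origin.name != newCol.name || origin.dataType != newCol.dataType) {
      throw new UnsupportedOperationException(
        s"Cannot change column '${origin.name}' with type '${origin.dataType}' to " +
          s"'${newCol.name}' with type '${newCol.dataType}'")
    }
    // Build the new schema, replacing only the comment of the matched column.
    schema.map(c => if (c.name == origin.name) c.copy(comment = newCol.comment) else c)
  }
}
```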

gatorsmile · 2016-12-13T08:29:23Z

sql/core/src/test/resources/sql-tests/inputs/change-column.sql

+ALTER TABLE test_change CHANGE a a1 STRING COMMENT 'this is column a1' AFTER b;
+DESC test_change;
+
+-- Case sensitive


Please update the comment. This test case is to check the behavior when spark.sql.caseSensitive is off.

gatorsmile · 2016-12-13T08:34:17Z

Could you please also add a test case when spark.sql.caseSensitive is on?
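The two settings differ only in how a column name is matched against the schema. A minimal sketch of that difference, with a locally defined `Resolver` type and hypothetical helper names (not Spark's own definitions), might look like this:

```scala
object ResolverSketch {
  // A resolver decides whether two identifiers refer to the same column.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Returns the schema's own spelling of the column, if the resolver finds a match.
  def findColumn(columns: Seq[String], name: String, resolver: Resolver): Option[String] =
    columns.find(resolver(_, name))
}
```

With a schema of `a`, `b`, `c`, a lookup for `A` succeeds under the case-insensitive resolver and fails under the case-sensitive one, which is the behavior the two test cases should pin down.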

jiangxb1987 · 2016-12-13T08:38:25Z

@gatorsmile Thank you for your suggestions! I'll update the code this evening!

gatorsmile · 2016-12-13T08:41:40Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

+            s"'${originColumn.name}' with type '${originColumn.dataType}' to " +
+            s"'${newField.name}' with type '${newField.dataType}'")
+      }
+      // Create a new column from the origin column with new comment.


Nit: new comment -> the new comment

Sorry for my poor English... :'(

Actually, the rule is very simple.

A countable noun always takes either the indefinite (a, an) or definite (the) article when it is singular. When plural, it takes the definite article if it refers to a definite, specific group and no article if it is used in a general sense.

SparkQA · 2016-12-13T10:35:26Z

Test build #70073 has finished for PR 15717 at commit a71683c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-12-13T19:54:13Z

Test build #70089 has finished for PR 15717 at commit 57934d4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-12-14T03:43:58Z

LGTM, cc @gatorsmile for final sign-off

gatorsmile · 2016-12-14T06:32:13Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala

+ * {{{
+ *   ALTER TABLE table_identifier
+ *   CHANGE [COLUMN] column_old_name column_new_name column_dataType [COMMENT column_comment]
+ *   [FIRST | AFTER column_name];


This is the right Hive syntax, but SparkSqlParser.scala uses a different one.

gatorsmile · 2016-12-14T06:35:19Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

+   * {{{
+   *   ALTER TABLE table [PARTITION partition_spec]
+   *   CHANGE [COLUMN] `col` `col` dataType [COMMENT "comment"] [FIRST | AFTER `otherCol`]
+   *   [, `col2` `col2` dataType [COMMENT "comment"] [FIRST | AFTER `otherCol`], ...]


What is the reason we allow users to modify multiple columns in this DDL? This is different from what Hive supports. Should we do it? cc @cloud-fan

In addition, based on the existing syntax, if we really want to support multiple columns, we should change the keywords to CHANGE COLUMNS. Like Hive, we can do something like

ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

When supporting the multi-column syntax, we would also need to consider edge cases, for example duplicate column names in the same DDL.
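As a rough illustration of that edge case, a multi-column statement would need a duplicate check along these lines. `DuplicateCheck` is a hypothetical helper written for this note; it is not part of the PR, which ended up following Hive's single-column syntax.

```scala
object DuplicateCheck {
  // Returns the column names that appear more than once in one statement.
  // When caseSensitive is false, names are compared after lower-casing.
  def duplicates(names: Seq[String], caseSensitive: Boolean): Seq[String] = {
    val keys =
      if (caseSensitive) names
      else names.map(_.toLowerCase(java.util.Locale.ROOT))
    keys.groupBy(identity).collect { case (k, vs) if vs.size > 1 => k }.toSeq.sorted
  }
}
```

Note that whether `a` and `A` count as duplicates depends on the case-sensitivity setting, so the check has to use the same name matching as column resolution.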

Let's follow Hive's syntax.

gatorsmile · 2016-12-14T06:39:43Z

Sorry for the last-minute comment. I did not realize it until I manually ran these test cases in Hive.

jiangxb1987 · 2016-12-14T09:29:17Z

@gatorsmile @cloud-fan In fact, that's mainly my fault: I should have checked that the syntax is the same as in Hive. I'm working on this now. Thank you!

SparkQA · 2016-12-14T13:30:24Z

Test build #70129 has finished for PR 15717 at commit 73c82fc.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class AlterTableChangeColumnCommand(

gatorsmile · 2016-12-14T18:21:21Z

LGTM

gatorsmile · 2016-12-14T18:22:52Z

@jiangxb1987 Could you update the PR description to reflect the new syntax? Thanks!

jiangxb1987 · 2016-12-15T14:16:13Z

@gatorsmile I've updated the PR description, thanks!

rxin · 2016-12-15T18:09:18Z

Merging in master.

## What changes were proposed in this pull request?

Right now, once a user set the comment of a column with create table command, he/she cannot update the comment. It will be useful to provide a public interface (e.g. SQL) to do that.

This PR implements the following SQL statement:
```
ALTER TABLE table [PARTITION partition_spec]
CHANGE [COLUMN] column_old_name column_new_name column_dataType
[COMMENT column_comment]
[FIRST | AFTER column_name];
```
For further expansion, we could support alter `name`/`dataType`/`index` of a column too.

## How was this patch tested?

Add new test cases in `ExternalCatalogSuite` and `SessionCatalogSuite`.
Add sql file test for `ALTER TABLE CHANGE COLUMN` statement.

Author: jiangxingbo <jiangxb1987@gmail.com>

Closes apache#15717 from jiangxb1987/change-column.

skliarpawlo · 2018-02-27T10:28:45Z

@jiangxb1987 Any idea why this still doesn't work for me in Spark 2.2.0?
Sample code:

spark.sql("""
    alter table test_table
    change column metric metric string comment "metic doc"
""")

The same SQL query works as expected when executed from Hive, but from Spark it has no effect.

Thanks

jiangxb1987 · 2018-02-27T10:49:42Z

Can you also show the result of DESC TABLE before and after the command?

jiangxb1987 · 2018-02-27T10:50:22Z

Please file a JIRA issue if you think this is unexpected behavior. Thank you!

skliarpawlo · 2018-02-27T11:14:08Z

@jiangxb1987 Thanks for such a quick response! I filed the issue:
https://issues.apache.org/jira/browse/SPARK-23525

jiangxb1987 changed the title ~~[SPARK-17910][SQL][WIP] Allow users to update the comment of a column~~ [SPARK-17910][SQL] Allow users to update the comment of a column Nov 7, 2016

jiangxb1987 force-pushed the change-column branch from cba5bbd to fda6d3a Compare November 8, 2016 07:05

jiangxb1987 force-pushed the change-column branch from 0097809 to 7976462 Compare November 12, 2016 16:18

add more test cases.

57934d4

modify syntax

73c82fc

asfgit closed this in 01e14bf Dec 15, 2016