Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17910][SQL] Allow users to update the comment of a column #15717

Closed
wants to merge 16 commits into from

Conversation

jiangxb1987
Copy link
Contributor

@jiangxb1987 jiangxb1987 commented Nov 1, 2016

What changes were proposed in this pull request?

Right now, once a user set the comment of a column with create table command, he/she cannot update the comment. It will be useful to provide a public interface (e.g. SQL) to do that.

This PR implements the following SQL statement:

ALTER TABLE table [PARTITION partition_spec]
CHANGE [COLUMN] column_old_name column_new_name column_dataType
[COMMENT column_comment]
[FIRST | AFTER column_name];

For further expansion, we could support alter name/dataType/index of a column too.

How was this patch tested?

Add new test cases in ExternalCatalogSuite and SessionCatalogSuite.
Add sql file test for ALTER TABLE CHANGE COLUMN statement.

@SparkQA
Copy link

SparkQA commented Nov 1, 2016

Test build #67905 has finished for PR 15717 at commit 4acf0d1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
* Return the full description of the StructField.
*/
def getDesc(): String = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes because we have to print the full description of the StructField, but StructField.toString don't output metadata field.

@rxin
Copy link
Contributor

rxin commented Nov 2, 2016

Looks like there is an actual test failure.

@SparkQA
Copy link

SparkQA commented Nov 7, 2016

Test build #68279 has finished for PR 15717 at commit 579025f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class AlterTableChangeColumnsCommand(

Copy link
Contributor Author

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready for review now. Perhaps we could easily add full support for HIVE-style ALTER TABLE CHANGE COLUMN statement after this PR have been merged.

@@ -93,6 +93,8 @@ statement
SET TBLPROPERTIES tablePropertyList #setTableProperties
| ALTER (TABLE | VIEW) tableIdentifier
UNSET TBLPROPERTIES (IF EXISTS)? tablePropertyList #unsetTableProperties
| ALTER (TABLE | VIEW) tableIdentifier
CHANGE COLUMN? expandColTypeList #changeColumns
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't support change partitions for now, will implement it in another PR.

-- Create the origin table
CREATE TABLE test_change(a Int, b String, c Int);
CREATE VIEW test_view(a, b, c) AS
SELECT * FROM VALUES (1, "one", 11), (null, "two", 22) AS testData(a, b, c);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to make the column a nullable here, otherwise it can't be coverted to StructField(a,IntegerType,true,{comment: newComment})

@jiangxb1987
Copy link
Contributor Author

retest this please. - This build error is strange because it successfully built on my local envirement.

@jiangxb1987 jiangxb1987 changed the title [SPARK-17910][SQL][WIP] Allow users to update the comment of a column [SPARK-17910][SQL] Allow users to update the comment of a column Nov 7, 2016
@SparkQA
Copy link

SparkQA commented Nov 7, 2016

Test build #68284 has finished for PR 15717 at commit 579025f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class AlterTableChangeColumnsCommand(

@SparkQA
Copy link

SparkQA commented Nov 8, 2016

Test build #68324 has finished for PR 15717 at commit cba5bbd.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 8, 2016

Test build #68327 has finished for PR 15717 at commit fda6d3a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 8, 2016

Test build #68335 has finished for PR 15717 at commit 0097809.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Any update on this PR?

@jiangxb1987
Copy link
Contributor Author

@gatorsmile I have been busy on other stuffs recently so no progress has been made on this PR, I will try to find some time to make this PR pass all unit cases this weekend. Thank you for looking at this!

@SparkQA
Copy link

SparkQA commented Nov 12, 2016

Test build #68565 has finished for PR 15717 at commit 7976462.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -273,6 +273,68 @@ case class AlterTableUnsetPropertiesCommand(

}


/**
* A command to change the columns for a table, only support change column comment for now.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: support changing column comments


/**
* A command to change the columns for a table, only support change column comment for now.
* This function creates a [[AlterTableChangeColumnsCommand]] logical plan.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line.

* {{{
* ALTER (TABLE | VIEW) table_identifier
* CHANGE (COLUMN) column_name column_name column_dataType column_comment
* (FIRST | AFTER column_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above is not clear. Please check how Hive documents it.

ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
  [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];

@gatorsmile
Copy link
Member

A basic question. Like Hive, we should not support ALTER VIEW CHANGE COLUMN, right?

@jiangxb1987
Copy link
Contributor Author

jiangxb1987 commented Nov 13, 2016

@gatorsmile Thank you for clarify, will remove the ALTER VIEW CHANGE COLUMN statement today. BTW do you think we should add [CASCADE|RESTRICT] in this PR?

@SparkQA
Copy link

SparkQA commented Nov 13, 2016

Test build #68586 has finished for PR 15717 at commit fc316cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor Author

@gatorsmile Would you please have a look at this when you have time? Thank you!

@gatorsmile
Copy link
Member

I prefer to doing [CASCADE|RESTRICT] in a separate PR. However, we still need to add a test case to verify whether the behavior is following the default RESTRICT.

@gatorsmile
Copy link
Member

Will review this PR tonight. Thanks!

@gatorsmile
Copy link
Member

Another question about this PR: does it support data source tables?

@jiangxb1987
Copy link
Contributor Author

It should support data source tables, acturally perhaps I should add test cases in DDLSuite for that - yes I'll add that this evening. Thank you!

@SparkQA
Copy link

SparkQA commented Nov 16, 2016

Test build #68719 has finished for PR 15717 at commit 855d520.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor Author

@gatorsmile I've added tests ensure that it support data source tables, please check when you have time, thank you!

@gatorsmile
Copy link
Member

Will review this PR tomorrow.

@gatorsmile
Copy link
Member

Could you please add a test case for verifying the metadata field of the column will not be lost after adding a comment? Thanks!

val columnsMap = columns.map { case (oldName: String, newField: StructField) =>
// Find the origin column from schema by column name.
val originColumn = findColumn(table.schema, oldName, resolver)
// Throw a Exception if the column name/dataType is changed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please correct a Exception to an exception in all the comments you added. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that a should be changed to an, but should we also change Exception to exception? Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not very common to use Exception here. Or you can change it to -> an AnalysisException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile That's great! Thank you!

ALTER TABLE test_change CHANGE a a1 STRING COMMENT 'this is column a1' AFTER b;
DESC test_change;

-- Case sensitive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the comment. This test case is to check the behavior when spark.sql.caseSensitive is off.

@gatorsmile
Copy link
Member

Could you please also add a test case when spark.sql.caseSensitive is on?

@jiangxb1987
Copy link
Contributor Author

@gatorsmile Thank you for your suggestions! I'll update the code this evening!

s"'${originColumn.name}' with type '${originColumn.dataType}' to " +
s"'${newField.name}' with type '${newField.dataType}'")
}
// Create a new column from the origin column with new comment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: new comment -> the new comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my poor English... :'(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the rule is very simple.

A countable noun always takes either the indefinite (a, an) or definite (the) article when it is singular. When plural, it takes the definite article if it refers to a definite, specific group and no article if it is used in a general sense.

@SparkQA
Copy link

SparkQA commented Dec 13, 2016

Test build #70073 has finished for PR 15717 at commit a71683c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 13, 2016

Test build #70089 has finished for PR 15717 at commit 57934d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

cloud-fan commented Dec 14, 2016

LGTM, cc @gatorsmile for final sign-off

* {{{
* ALTER TABLE table_identifier
* CHANGE [COLUMN] column_old_name column_new_name column_dataType [COMMENT column_comment]
* [FIRST | AFTER column_name];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the right Hive syntax, but SparkSqlParser.scala is using the different one.

* {{{
* ALTER TABLE table [PARTITION partition_spec]
* CHANGE [COLUMN] `col` `col` dataType [COMMENT "comment"] [FIRST | AFTER `otherCol`]
* [, `col2` `col2` dataType [COMMENT "comment"] [FIRST | AFTER `otherCol`], ...]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason we allow users to modify multiple columns in this DDL? This is different from what Hive supports. Should we do it? cc @cloud-fan

In addition, based on the existing syntax, if we really want to support multiple columns, we should change the keywords to CHANGE COLUMNS. Like Hive, we can do something like

ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we supporting multiple columns syntax, we also need to consider the edge case. For example, duplicate column names in the same DDL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's follow hive's syntax

@gatorsmile
Copy link
Member

Sorry for the last minute comment. I did not realize it until I manually run these test cases in Hive.

@jiangxb1987
Copy link
Contributor Author

@gatorsmile @cloud-fan In fact that's mainly my fault, I should have checked that the syntax is the same as that in HIVE. I'm working on this now. Thank you!

@SparkQA
Copy link

SparkQA commented Dec 14, 2016

Test build #70129 has finished for PR 15717 at commit 73c82fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AlterTableChangeColumnCommand(

@gatorsmile
Copy link
Member

LGTM

@gatorsmile
Copy link
Member

@jiangxb1987 Could you update the PR description to reflect the new syntax? Thanks!

@jiangxb1987
Copy link
Contributor Author

@gatorsmile I've updated the PR description, thanks!

@rxin
Copy link
Contributor

rxin commented Dec 15, 2016

Merging in master.

@asfgit asfgit closed this in 01e14bf Dec 15, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
## What changes were proposed in this pull request?

Right now, once a user set the comment of a column with create table command, he/she cannot update the comment. It will be useful to provide a public interface (e.g. SQL) to do that.

This PR implements the following SQL statement:
```
ALTER TABLE table [PARTITION partition_spec]
CHANGE [COLUMN] column_old_name column_new_name column_dataType
[COMMENT column_comment]
[FIRST | AFTER column_name];
```

For further expansion, we could support alter `name`/`dataType`/`index` of a column too.

## How was this patch tested?

Add new test cases in `ExternalCatalogSuite` and `SessionCatalogSuite`.
Add sql file test for `ALTER TABLE CHANGE COLUMN` statement.

Author: jiangxingbo <jiangxb1987@gmail.com>

Closes apache#15717 from jiangxb1987/change-column.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

Right now, once a user set the comment of a column with create table command, he/she cannot update the comment. It will be useful to provide a public interface (e.g. SQL) to do that.

This PR implements the following SQL statement:
```
ALTER TABLE table [PARTITION partition_spec]
CHANGE [COLUMN] column_old_name column_new_name column_dataType
[COMMENT column_comment]
[FIRST | AFTER column_name];
```

For further expansion, we could support alter `name`/`dataType`/`index` of a column too.

## How was this patch tested?

Add new test cases in `ExternalCatalogSuite` and `SessionCatalogSuite`.
Add sql file test for `ALTER TABLE CHANGE COLUMN` statement.

Author: jiangxingbo <jiangxb1987@gmail.com>

Closes apache#15717 from jiangxb1987/change-column.
@skliarpawlo
Copy link

@jiangxb1987 any ideas why this still doesn't work for me in spark 2.2.0?
Sample code:

spark.sql("""
    alter table test_table
    change column metric metric string comment "metic doc"
""")

The same sql query executed from hive works as expected, but from spark it has no effect

Thanks

@jiangxb1987
Copy link
Contributor Author

Can you also show the result of DESC TABLE before and after the command?

@jiangxb1987
Copy link
Contributor Author

jiangxb1987 commented Feb 27, 2018

Please file a JIRA issue if you think there is an unexpected behavior. Thank you!

@skliarpawlo
Copy link

@jiangxb1987 thanks for so quick response! I filed the issue:
https://issues.apache.org/jira/browse/SPARK-23525

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants