Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Nov 15, 2021

What changes were proposed in this pull request?

In the PR, I propose to allow ANSI intervals: year-month and day-time intervals in the ALTER TABLE .. ADD COLUMNS command for tables in v1 and v2 In-Memory catalogs. Also added an unified test suite to migrate related tests in the future.

Why are the changes needed?

To improve user experience with Spark SQL. After the changes, users will be able to add columns with ANSI intervals instead of dropping and creating new table.

Does this PR introduce any user-facing change?

In some sense, yes. After the changes, the command doesn't output any error message.

How was this patch tested?

By running new test suite:

$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableAddColumnsSuite"
$ build/sbt -Phive-2.3 "test:testOnly *HiveDDLSuite"

@github-actions github-actions bot added the SQL label Nov 15, 2021
@sarutak
Copy link
Member

sarutak commented Nov 15, 2021

v1 Hive external catalog doesn't support the command

v1 Hive external catalog supports ALTER TABLE ADD COLUMN doesn't it?

spark-sql> CREATE TABLE tbl1(x INT) using Parquet;
spark-sql> ALTER TABLE tbl1 ADD COLUMN (y STRING);
spark-sql> DESCRIBE tbl1;
x                   	int                 	                    
y                   	string  

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 15, 2021

v1 Hive external catalog supports ALTER TABLE ADD COLUMN doesn't it?

Thanks. I have corrected the description.

@MaxGekk MaxGekk requested review from cloud-fan and sarutak November 15, 2021 12:01
@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 15, 2021

@cloud-fan @sarutak @Peng-Lei @AngersZhuuuu Could you review this PR, please.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 15, 2021

@cloud-fan FYI, Here is the ticket for fully unified tests of ALTER TABLE .. ADD COLUMNS https://issues.apache.org/jira/browse/SPARK-37045

@SparkQA
Copy link

SparkQA commented Nov 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49700/

@sarutak
Copy link
Member

sarutak commented Nov 15, 2021

v1 Hive external catalog doesn't support ANSI intervals

I think Hive external catalog supports ANSI intervals. In this case, the format of metadata is not Hive compatible but Spark specific.

spark-sql> CREATE TABLE tbl2(ym INTERVAL YEAR TO MONTH, dt INTERVAL DAY TO SECOND) using Parquet;
21/11/15 21:05:08 WARN HiveExternalCatalog: Hive incompatible types found: interval year to month, interval day to second. Persisting data source table `default`.`tbl2` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
spark-sql> INSERT INTO tbl2 VALUES(INTERVAL '1-2' YEAR TO MONTH, INTERVAL '1 2:3:4' DAY TO SECOND);
spark-sql> SELECT * FROM tbl2;
1-2	1 02:03:04.000000000
Time taken: 0.381 seconds, Fetched 1 row(s)

@SparkQA
Copy link

SparkQA commented Nov 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49700/

@SparkQA
Copy link

SparkQA commented Nov 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49703/

@SparkQA
Copy link

SparkQA commented Nov 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49703/

@SparkQA
Copy link

SparkQA commented Nov 15, 2021

Test build #145230 has finished for PR 34600 at commit 81fef07.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 15, 2021

Test build #145233 has finished for PR 34600 at commit 84e9a0e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 15, 2021

We do not have one suite for hive catalog?

@cloud-fan I have added one.

@MaxGekk MaxGekk changed the title [SPARK-37332][SQL] Allow ANSI intervals in v2 ALTER TABLE .. ADD COLUMNS [SPARK-37332][SQL] Allow ANSI intervals in ALTER TABLE .. ADD COLUMNS Nov 15, 2021
Copy link
Member

@sarutak sarutak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Pending CIs.

@SparkQA
Copy link

SparkQA commented Nov 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49711/

@SparkQA
Copy link

SparkQA commented Nov 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49711/

@SparkQA
Copy link

SparkQA commented Nov 15, 2021

Test build #145241 has finished for PR 34600 at commit df18813.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Copy link
Member

sarutak commented Nov 16, 2021

I cloned MaxGekk:add-columns-ansi-interval and run GA jobs on my repository, and docker integration tests passed.
https://github.com/sarutak/spark/runs/4219609621?check_suite_focus=true

So, I think the docker integration tests failure on your repository is not related to this change.
https://github.com/MaxGekk/spark/runs/4214187254?check_suite_focus=true

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 16, 2021

Merging to master. Thank you, @sarutak and @cloud-fan for review.

@MaxGekk MaxGekk closed this in 0f20678 Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants