[SPARK-27953][SQL] Save default constraint with Column into table properties when create Hive table #24792

beliefer · 2019-06-04T10:30:01Z

What changes were proposed in this pull request?

Background
A column default constraint is part of the ANSI SQL standard.
Hive 3.0+ supports default constraints; see https://issues.apache.org/jira/browse/HIVE-18726
Spark SQL does not implement this feature yet.

Design
Hive is widely used in production environments and is the de facto standard in the big data field.
However, many Hive versions are in production use, and the supported features differ between versions.

Spark SQL needs to implement default constraints, but there are three points to pay attention to in the design:
First, Spark SQL should reduce its coupling with Hive.
Second, default constraints should be compatible with different versions of Hive.
Third, which default constraint expressions should Spark SQL support? I think it should support literals, current_date(), and current_timestamp(). Maybe other expressions should also be supported, such as Cast(1 as float), 1 + 2, and so on.

We want to save the default constraint metadata into the properties of the Hive table, and then restore it from those properties after the client fetches the latest metadata. The implementation is the same as for other metadata (e.g. partition, bucket, statistics).

Because a default constraint is part of a column, I think we can reuse the metadata of StructField. The default constraint will be cached in the metadata of StructField.
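
As an illustration of the save-and-restore idea described above, here is a minimal Python sketch. It is not the PR's Scala code: columns and table properties are modeled as plain dictionaries, and the key prefix `example.column.default.` and both function names are hypothetical, chosen only for this example.

```python
# Hypothetical property-key prefix; the PR description does not name the real key.
DEFAULT_KEY_PREFIX = "example.column.default."


def save_defaults(columns, properties):
    """Copy each column's default (kept as SQL text) into a flat
    string-to-string property map, returning a new map."""
    out = dict(properties)
    for col in columns:
        default = col["metadata"].get("default")
        if default is not None:
            out[DEFAULT_KEY_PREFIX + col["name"]] = default
    return out


def restore_defaults(columns, properties):
    """Rebuild per-column metadata from the property map after the table
    definition has been read back from the metastore."""
    restored = []
    for col in columns:
        metadata = dict(col.get("metadata", {}))
        key = DEFAULT_KEY_PREFIX + col["name"]
        if key in properties:
            metadata["default"] = properties[key]
        restored.append({**col, "metadata": metadata})
    return restored


columns = [
    {"name": "id", "type": "int", "metadata": {"default": "0"}},
    {"name": "created", "type": "date", "metadata": {"default": "current_date()"}},
    {"name": "note", "type": "string", "metadata": {}},
]
props = save_defaults(columns, {"owner": "etl"})

# Simulate what comes back from the metastore: same columns, no extra metadata.
from_metastore = [
    {"name": c["name"], "type": c["type"], "metadata": {}} for c in columns
]
restored = restore_defaults(from_metastore, props)
```

Keeping the default as SQL text in a string property means the metastore never has to understand the constraint, which is one way to stay compatible with Hive versions that lack native support.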

Detail of this PR
This is a sub-task of implementing default constraints.
This PR addresses saving the default constraint into the properties of a Hive table or data source table.

There are some open issues in this PR:
First, how to check that a user-specified number complies with the precision and range of the data type, such as float or double.
Second, some of the code is not very elegant; I hope to improve it with your suggestions.
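
For the first open issue, one possible approach (a sketch only, not what this PR does) is to round-trip the literal through the target width and compare. The Python example below checks a decimal literal against single-precision float using only the standard library; the function names `fits_float32` and `is_exact_float32` are hypothetical, and a non-numeric literal would raise `ValueError` from `float()`.

```python
import math
import struct
from decimal import Decimal


def fits_float32(literal: str) -> bool:
    """Return True if the literal is a finite value within float32 range."""
    value = float(literal)
    if math.isinf(value) or math.isnan(value):
        return False
    try:
        struct.pack("<f", value)  # raises OverflowError when out of range
    except OverflowError:
        return False
    return True


def is_exact_float32(literal: str) -> bool:
    """Return True if the literal keeps its exact value when stored as float32."""
    if not fits_float32(literal):
        return False
    # Narrow to 4 bytes, then widen back to a Python float.
    rounded = struct.unpack("<f", struct.pack("<f", float(literal)))[0]
    # Decimal(float) is exact, so this compares true mathematical values.
    return Decimal(rounded) == Decimal(literal)
```

Whether a lossy literal such as 0.1 should be rejected or silently rounded is a design decision; the two functions are kept separate so that a caller can enforce only the range check.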

Related PR
This PR is related to https://github.com/apache/spark/pull/24372. Once this PR is finished, target columns that are not specified in an INSERT INTO statement can be populated with their default values.
After this PR, I will continue to open other PRs for default constraints, such as ALTER TABLE and DESC TABLE.

How was this patch tested?

Unit tests.

SparkQA · 2019-06-04T10:39:08Z

Test build #106146 has finished for PR 24792 at commit 0dd717e.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-04T11:01:07Z

Test build #106147 has finished for PR 24792 at commit 1c5ea5c.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-04T11:17:47Z

Test build #106148 has finished for PR 24792 at commit 1c5ea5c.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-04T12:10:21Z

Test build #106151 has finished for PR 24792 at commit 1d9f701.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-04T14:29:58Z

Test build #106153 has finished for PR 24792 at commit 1d9f701.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-04T14:53:10Z

Test build #106156 has finished for PR 24792 at commit a91bf42.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile

@beliefer Thank you for initiating this. This is not a small piece of work. Could you write a design doc? We need to investigate the impact of DEFAULT on all the other DDL/DML commands and on the data source APIs.

Personally, I think we might need to create an umbrella JIRA and estimate the size of the work.

SparkQA · 2019-06-05T04:02:29Z

Test build #106180 has finished for PR 24792 at commit e7a387c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-05T05:38:43Z

Test build #106185 has finished for PR 24792 at commit 3a9aa1e.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-05T10:47:20Z

Test build #106195 has finished for PR 24792 at commit a203a8c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

beliefer · 2019-06-05T10:57:26Z

@gatorsmile Thanks for your review. As you said, this is not a small piece of work. I refined the PR description, created a parent JIRA, SPARK-27943, and briefly described the design there. I created five sub-task JIRAs, one for each task. If I find other sub-tasks, I will add new sub-task JIRAs. This PR is now a sub-task of SPARK-27943 and corresponds to SPARK-27953.
I added some details after discussing with @wangyum.

beliefer · 2019-06-06T04:23:14Z

@srowen Maybe you can help review this PR, thanks! If not, thanks anyway.

srowen · 2019-06-10T13:50:13Z

I don't feel confident enough to review changes to the SQL language support here

beliefer · 2019-06-11T01:56:32Z

> I don't feel confident enough to review changes to the SQL language support here

No problem, thanks.

gatorsmile · 2019-06-11T06:00:15Z

@beliefer Before submitting PRs, could we first start it with a design doc? Ping me if the design doc is ready to review. Thanks!

beliefer · 2019-06-11T08:58:19Z

> @beliefer Before submitting PRs, could we first start it with a design doc? Ping me if the design doc is ready to review. Thanks!

@gatorsmile Thanks for your reply. The design doc is ready. How should I send it to you? Which format is recommended for the design doc?

lipzhu · 2019-06-11T10:05:24Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

@@ -735,7 +735,7 @@ colTypeList
    ;

 colType
-    : identifier dataType (COMMENT STRING)?
+    : identifier dataType (COMMENT STRING)? (DEFAULT defaultExpression=expression)?


Is the scope of defaultExpression=expression too big for a DDL default constraint? As I recall, the common default constraints are NULL, NUMBER, STRING, CURRENT_DATE, and CURRENT_TIMESTAMP.

> Is the scope of defaultExpression=expression too big for a DDL default constraint? As I recall, the common default constraints are NULL, NUMBER, STRING, CURRENT_DATE, and CURRENT_TIMESTAMP.

Thanks for your review. As you said, the description of this PR includes a discussion about the scope of the default constraint. Do we need to support other expressions, such as Cast(1 as float), 1 + 2, and so on?

This is the default constraint in Oracle's Java DB: https://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqlj30540.html#rrefsqlj30540__sqlj64478
You can also take a look at the default constraints of other DB engines.

@lipzhu It's a worthwhile reference, but we need to consider the actual situation in Spark SQL. Thanks.

@lipzhu I reduced the scope of the default constraint. Thanks.
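
To make the narrower scope discussed in this thread concrete, here is a small Python sketch of an allow-list check that accepts only the forms named above (NULL, numbers, strings, CURRENT_DATE, CURRENT_TIMESTAMP). It is only an illustration under those assumptions, not the grammar or analyzer rule used in this PR; the function name `is_allowed_default` and the regular expressions are hypothetical.

```python
import re

# Numeric literal: optional sign, digits with optional fraction, optional exponent.
_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
# Single-quoted string literal; a doubled quote ('') escapes a quote.
_STRING = re.compile(r"'([^']|'')*'")
# Keyword forms, with or without call parentheses.
_KEYWORDS = {
    "NULL",
    "CURRENT_DATE",
    "CURRENT_DATE()",
    "CURRENT_TIMESTAMP",
    "CURRENT_TIMESTAMP()",
}


def is_allowed_default(expr: str) -> bool:
    """Return True if expr is one of the allow-listed default forms."""
    text = expr.strip()
    if text.upper() in _KEYWORDS:
        return True
    return bool(_NUMBER.fullmatch(text) or _STRING.fullmatch(text))
```

With this check, `is_allowed_default("1 + 2")` and `is_allowed_default("CAST(1 AS FLOAT)")` are rejected, while literals and the two date/time keywords are accepted.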

SparkQA · 2019-06-12T06:29:50Z

Test build #106401 has finished for PR 24792 at commit a912b87.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-12T10:56:42Z

Test build #106415 has finished for PR 24792 at commit 9b600a3.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-12T11:06:04Z

Test build #106416 has finished for PR 24792 at commit 3bf06af.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-12T14:48:22Z

Test build #106418 has finished for PR 24792 at commit 545bba0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-06-13T01:30:58Z

Hi, @beliefer . For the umbrella issue, the subtask JIRA ID is enough for the title.

beliefer · 2019-06-13T02:06:13Z

> Hi, @beliefer . For the umbrella issue, the subtask JIRA ID is enough for the title.

OK. Thanks for the reminder.

SparkQA · 2019-06-13T05:45:25Z

Test build #106451 has finished for PR 24792 at commit 9931eb6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-13T06:19:12Z

Test build #106452 has finished for PR 24792 at commit 46c12d8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

beliefer · 2019-06-17T02:14:28Z

@gatorsmile The design doc of default constraint is ready.

SparkQA · 2019-11-12T15:16:23Z

Test build #113627 has finished for PR 24792 at commit 48c8b1e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

LiShuMing · 2019-12-03T12:50:17Z

> @gatorsmile The design doc of default constraint is ready.

What's the progress of this PR? https://issues.apache.org/jira/browse/SPARK-29119 is also associated with this PR.

I think this will be a useful feature for users to handle default values or computed columns.

@beliefer You can put your design doc on Google Docs for more details and add comparisons with other engines, e.g.:

github-actions · 2020-03-13T00:14:16Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

WIP create table with default constraint

0dd717e

fix Scala style.

1c5ea5c

add import

1d9f701

Update Metadata.scala

a91bf42

remove link

gatorsmile reviewed Jun 4, 2019

put default in non-reserved.

e7a387c

Fix bug of UT.

3a9aa1e

Adjust UT.

a203a8c

beliefer changed the title ~~[WIP][SPARK-27943][SQL] Add new feature create table could specify column with default constraint~~ [WIP][SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint Jun 5, 2019

beliefer changed the title ~~[WIP][SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint~~ [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint Jun 6, 2019

lipzhu reviewed Jun 11, 2019

support null as default constraint.

a912b87

beliefer added 2 commits June 12, 2019 18:32

reduce scope of default constraint.

9b600a3

Add correct key of metadata.

3bf06af

Modify () to ( + ).

545bba0

dongjoon-hyun added the NEW FEATURE label Jun 13, 2019

dongjoon-hyun changed the title ~~[SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint~~ [SPARK-27953][SQL] Add new feature create table could specify column with default constraint Jun 13, 2019

dongjoon-hyun changed the title ~~[SPARK-27953][SQL] Add new feature create table could specify column with default constraint~~ [SPARK-27953][SQL] Save default constraint with Column into table properties when create Hive table Jun 13, 2019

beliefer added 2 commits June 13, 2019 10:31

Adjust the order of colType.

9931eb6

Simplify exception info.

46c12d8

dongjoon-hyun added SQL and removed NEW FEATURE labels Jun 14, 2019

dongjoon-hyun mentioned this pull request Aug 7, 2019

[SPARK-27900][K8s] Add jvm oom flag #25229

Closed

Merge branch 'master' into add-default-constraint-for-create-hive-table

48c8b1e

github-actions bot added the Stale label Mar 13, 2020

github-actions bot closed this Mar 14, 2020
