Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-27953][SQL] Save default constraint with Column into table properties when create Hive table #24792

Closed
wants to merge 14 commits into from
Closed

[SPARK-27953][SQL] Save default constraint with Column into table properties when create Hive table #24792

wants to merge 14 commits into from

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Jun 4, 2019

What changes were proposed in this pull request?

Background
Default constraint with column is ANSI standard.
Hive 3.0+ has supported default constraint ref:https://issues.apache.org/jira/browse/HIVE-18726
But Spark SQL implement this feature not yet.

Design
Hive is widely used in production environments and is the standard in the field of big data in fact.
But Hive exists many version used in production and the feature between each version are different.

Spark SQL need to implement default constraint, but there are three points to pay attention to in design:
First, Spark SQL should reduce coupling with Hive.
Second, default constraint could compatible with different versions of Hive.
Thrid, Which expression of default constraint should Spark SQL support? I think should support literal, current_date(), current_timestamp(). Maybe other expression should also supported, like Cast(1 as float), 1 + 2 and so on.

We want to save the metadata of default constraint into properties of Hive table, and then we restore metadata from the properties after client gets newest metadata.The implement is the same as other metadata (e.g. partition,bucket,statistics).

Because default constraint is part of column, so I think could reuse the metadata of StructField. The default constraint will cached by metadata of StructField.

Detail of this PR
This is a sub task to implement default constraint.
This PR will solve the issue that save default constraint into properties of Hive table or data source table.

There exists some issue in this PR:
First, how to check a number specified by somebody compliance with the accuracy and scope of the data type, like float, double.
Second, some code looks not very elegant, I hope to improve it with your suggestions.

Brother PR
This PR is related to https://github.com/apache/spark/pull/24372. If this PR finish, unselected target column can be inserted into the default value, while running insert into.
After this PR, I will continue open other PR about default constraint, like alter table, desc table.

How was this patch tested?

UT

@SparkQA
Copy link

SparkQA commented Jun 4, 2019

Test build #106146 has finished for PR 24792 at commit 0dd717e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 4, 2019

Test build #106147 has finished for PR 24792 at commit 1c5ea5c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 4, 2019

Test build #106148 has finished for PR 24792 at commit 1c5ea5c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 4, 2019

Test build #106151 has finished for PR 24792 at commit 1d9f701.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 4, 2019

Test build #106153 has finished for PR 24792 at commit 1d9f701.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 4, 2019

Test build #106156 has finished for PR 24792 at commit a91bf42.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beliefer Thank you for initiating this. This is not a small work. Could you have a design doc? We need to investigate the impact of DEFAULT on all the other DDL/DML commands and the impact on the data source APIs.

Personally, I think we might need to create an umbrella JIRA and estimate the sizing.

@SparkQA
Copy link

SparkQA commented Jun 5, 2019

Test build #106180 has finished for PR 24792 at commit e7a387c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 5, 2019

Test build #106185 has finished for PR 24792 at commit 3a9aa1e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 5, 2019

Test build #106195 has finished for PR 24792 at commit a203a8c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer beliefer changed the title [WIP][SPARK-27943][SQL] Add new feature create table could specify column with default constraint [WIP][SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint Jun 5, 2019
@beliefer
Copy link
Contributor Author

beliefer commented Jun 5, 2019

@gatorsmile Thanks for your review. As you said, this is not a small work. I refined the description of PR and created a parent jira SPARK-27943 and described the design simply. I created five sub jira used to each task. If I find other sub task, I will add new sub jira. This PR changed to a sub task of SPARK-27943 and related to SPARK-27953.
I supplemented some detail after discussion with @wangyum .

@beliefer beliefer changed the title [WIP][SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint Jun 6, 2019
@beliefer
Copy link
Contributor Author

beliefer commented Jun 6, 2019

@srowen Maybe you can help me review this PR, thanks! If not , thanks too.

@srowen
Copy link
Member

srowen commented Jun 10, 2019

I don't feel confident enough to review changes to the SQL language support here

@beliefer
Copy link
Contributor Author

I don't feel confident enough to review changes to the SQL language support here

It doesn't matter, thanks.

@gatorsmile
Copy link
Member

@beliefer Before submitting PRs, could we first start it with a design doc? Ping me if the design doc is ready to review. Thanks!

@beliefer
Copy link
Contributor Author

beliefer commented Jun 11, 2019

@beliefer Before submitting PRs, could we first start it with a design doc? Ping me if the design doc is ready to review. Thanks!

@gatorsmile Thanks for your reply. The design doc is ready, how I pass it to you? What format of design doc recommended?

@@ -735,7 +735,7 @@ colTypeList
;

colType
: identifier dataType (COMMENT STRING)?
: identifier dataType (COMMENT STRING)? (DEFAULT defaultExpression=expression)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that defaultExpression=expression scope too big for DDL default constraint? In my memory, the common default constraint are NULL, NUMBER, STRING, CURRENT_DATE, CURRENT_TIMESTAMP.

Copy link
Contributor Author

@beliefer beliefer Jun 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that defaultExpression=expression scope too big for DDL default constraint? In my memory, the common default constraint are NULL, NUMBER, STRING, CURRENT_DATE, CURRENT_TIMESTAMP.

Thanks for your review. As your said, the description of this PR contains a discussion about the scope of default constraint. Do we need to implement other expressions, like Cast(1 as float), 1 + 2 and so on ?

Copy link
Contributor

@lipzhu lipzhu Jun 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Oracle's default constraint. https://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqlj30540.html#rrefsqlj30540__sqlj64478
You can take a look at other DB engines' default constraint.

Copy link
Contributor Author

@beliefer beliefer Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lipzhu It's worth to reference, but we need to look at the actual situation on Spark SQL. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lipzhu I reduced the scope of default constraint. Thanks.

@SparkQA
Copy link

SparkQA commented Jun 12, 2019

Test build #106401 has finished for PR 24792 at commit a912b87.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2019

Test build #106415 has finished for PR 24792 at commit 9b600a3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2019

Test build #106416 has finished for PR 24792 at commit 3bf06af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2019

Test build #106418 has finished for PR 24792 at commit 545bba0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint [SPARK-27953][SQL] Add new feature create table could specify column with default constraint Jun 13, 2019
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jun 13, 2019

Hi, @beliefer . For the umbrella issue, the subtask JIRA ID is enough for the title.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-27953][SQL] Add new feature create table could specify column with default constraint [SPARK-27953][SQL] Save default constraint with Column into table properties when create Hive table Jun 13, 2019
@beliefer
Copy link
Contributor Author

Hi, @beliefer . For the umbrella issue, the subtask JIRA ID is enough for the title.

OK. Thanks for your reminder.

@SparkQA
Copy link

SparkQA commented Jun 13, 2019

Test build #106451 has finished for PR 24792 at commit 9931eb6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 13, 2019

Test build #106452 has finished for PR 24792 at commit 46c12d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

@gatorsmile The design doc of default constraint is ready.

@SparkQA
Copy link

SparkQA commented Nov 12, 2019

Test build #113627 has finished for PR 24792 at commit 48c8b1e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LiShuMing
Copy link
Contributor

@gatorsmile The design doc of default constraint is ready.

what's the progress of this pr? As https://issues.apache.org/jira/browse/SPARK-29119 also associate with this pr.

I think this will be a useful function for users to handle default value or computed columns;

@beliefer You can put your design doc on the Google Docs for more details and add comparisons with other engines, eg:

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 13, 2020
@github-actions github-actions bot closed this Mar 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants