[SPARK-43529][SQL] Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values #41191

Closed
wants to merge 48 commits into from

dtenedor
Contributor

@dtenedor dtenedor commented May 16, 2023

What changes were proposed in this pull request?

This PR updates the SQL compiler to support general constant expressions in the syntax for CREATE/REPLACE TABLE OPTIONS values, rather than restricting them to a few types of literals.

  • The analyzer now checks that the provided expressions are in fact foldable, and throws an error otherwise.
  • The error message that users encounter in these cases improves from a general "syntax error at or near <location>" to one indicating that the syntax is valid, but only constant expressions are supported in these contexts.

Why are the changes needed?

This makes it easier to provide OPTIONS lists in SQL, supporting use cases like concatenating strings with ||.
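
As a rough illustration of the foldability check described above, the following self-contained Scala sketch models a tiny expression tree, rejects non-constant option values, and folds a string concatenation such as `'/tmp/' || 'data'`. The names `Expr`, `StringLit`, `ColumnRef`, `Concat`, and `resolveOption` are hypothetical and invented for this example; they are not Spark's Catalyst classes, and the error text is not the message Spark emits.

```scala
// Toy model only: these types are hypothetical, not Spark's Catalyst classes.
object ConstantOptionsSketch {
  sealed trait Expr {
    // An expression is foldable when it can be evaluated without any input row.
    def foldable: Boolean
  }

  final case class StringLit(value: String) extends Expr {
    def foldable: Boolean = true
  }

  final case class ColumnRef(name: String) extends Expr {
    def foldable: Boolean = false
  }

  // Models the SQL `||` operator on strings.
  final case class Concat(left: Expr, right: Expr) extends Expr {
    def foldable: Boolean = left.foldable && right.foldable
  }

  // Evaluates a foldable expression to its constant string value.
  private def eval(e: Expr): String = e match {
    case StringLit(v) => v
    case Concat(l, r) => eval(l) + eval(r)
    case ColumnRef(n) => throw new IllegalStateException(s"column $n is not constant")
  }

  // Returns the folded option value, or an error message naming the offending key.
  def resolveOption(key: String, value: Expr): Either[String, String] =
    if (value.foldable) Right(eval(value))
    else Left(s"Option '$key' must be a constant expression")
}
```

In this sketch the check happens after parsing, so an option value that references a column is syntactically accepted and then rejected with a targeted message, which mirrors the behavior change the description outlines.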

Does this PR introduce any user-facing change?

Yes, the SQL syntax changes.

How was this patch tested?

This PR adds new unit test coverage.

@github-actions github-actions bot added the SQL label May 16, 2023
@dtenedor dtenedor requested a review from MaxGekk May 17, 2023 16:47
@dtenedor dtenedor requested a review from MaxGekk May 17, 2023 19:56
@dtenedor dtenedor changed the title [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser [SPARK-43529][SQL] Support general constant expressions as OPTIONS values in the parser May 17, 2023
@dtenedor
Contributor Author

Note @MaxGekk @gengliangwang: it says the GitHub Actions CI failed, but this was just an unrelated PySpark flake and the checks actually passed :)

Contributor Author

@dtenedor dtenedor left a comment

Thanks @gengliangwang for your review! Please take another look.

@dtenedor dtenedor changed the title [SPARK-43529][SQL] Support general constant expressions as OPTIONS values in the parser [SPARK-43529][SQL] Support general literal expressions as OPTIONS values in the parser May 20, 2023
@dtenedor
Contributor Author

Update on this: I thought of a simpler way to break apart this work into smaller pieces so we can make gradual improvements. Let me make a commit and then I can ping this thread again.

@dtenedor
Contributor Author

This is done, I have deferred the constant folding logic to the analyzer. This is ready for another look.

Contributor Author

@dtenedor dtenedor left a comment

Hi @gengliangwang, thanks again for your reviews, this is ready again :)

val dt = value.dataType
value match {
  case Literal(null, _) =>
    "null"
Member

should it be null or "null"?

Contributor Author

It should be "null", this part computes the string value for the table option.

Member

But a string can be null too. If the result is the string "null", then the literal should be Literal("null", StringType).

Contributor Author

@dtenedor dtenedor left a comment

Hi @gengliangwang I responded to your comments. I think this is ready to merge now, if you agree.

@dtenedor
Contributor Author

dtenedor commented Jun 7, 2023

The main commit responding to the last round of code review comments is here: 65f9a92.

@dtenedor
Contributor Author

dtenedor commented Jun 8, 2023

(Note, this is passing all CI again.)

val dt = value.dataType
value match {
  case Literal(null, _) =>
    "null"
Member

Suggested change: replace "null" with null.

Contributor Author

Sounds good, done.
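
The distinction raised in this thread can be illustrated with a small self-contained Scala sketch. It assumes a hypothetical `Lit` wrapper and an `optionString` helper, neither of which is Spark's actual `Literal` class or the code merged in this PR. The point is only that a SQL NULL mapped to a null reference stays distinguishable from the string "null".

```scala
// Toy model only: Lit is a hypothetical stand-in for a folded literal,
// not Spark's Literal class.
object NullOptionSketch {
  final case class Lit(value: Any)

  // Converts a folded literal to the string stored for a table option.
  // A null value maps to a null reference, so it cannot be confused
  // with the four-character string "null".
  def optionString(lit: Lit): String = lit match {
    case Lit(null) => null
    case Lit(v)    => v.toString
  }
}
```

Had the first case returned the string "null" instead, `Lit(null)` and `Lit("null")` would produce the same option value, which is the ambiguity the reviewer points out.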

Member

@gengliangwang gengliangwang left a comment

LGTM except for the last comment. Sorry for the slow response, I was focusing on another project.

@dtenedor
Contributor Author

dtenedor commented Jun 9, 2023

> LGTM except for the last comment. Sorry for the slow response, I was focusing on another project.

No worries, thanks for the careful reviews :)

@gengliangwang
Member

This has gone through many rounds of iteration. I will make a follow-up PR to simplify the code further.
Merging to master.

gengliangwang added a commit that referenced this pull request Jun 13, 2023
…related plans

### What changes were proposed in this pull request?

Follow-up of #41191 to clean up the code in UnresolvedTableSpec and related plans:
* Rename `OptionsListExpressions` as `OptionList`
* Rename `trait TableSpec` as `TableSpecBase`
* Rename `ResolvedTableSpec` as `TableSpec`, make sure all the physical plans are using `TableSpec` instead of `TableSpecBase`.
* Move option list expressions to UnresolvedTableSpec, so that all the specs are in one class.
* Make `UnresolvedTableSpec` an `UnaryExpression`, so that transforming with `mapExpressions` will transform it and the option list expressions in its child
* Restore the signatures of class `CreateTable`, `CreateTableAsSelect`, `ReplaceTable` and `ReplaceTableAsSelect`
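
To illustrate why giving the spec node a single child helps, here is a self-contained Scala sketch of a generic tree transform. All types here (`Expr`, `Lit`, `Concat`, `OptionList`, `TableSpecNode`) and the `foldConcat` rule are hypothetical stand-ins that only borrow names from the bullets above; they are not the Catalyst implementation. The sketch assumes that a transform walks `children`, so anything exposed as a child is rewritten along with its parent.

```scala
// Toy model only: a hypothetical tree, not Spark's TreeNode or Expression API.
object UnaryNodeSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    def withNewChildren(newChildren: Seq[Expr]): Expr

    // Applies `rule` bottom-up: children are rewritten first, then this node.
    def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
      val rebuilt =
        if (children.isEmpty) this
        else withNewChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rebuilt, (e: Expr) => e)
    }
  }

  final case class Lit(value: String) extends Expr {
    def children: Seq[Expr] = Nil
    def withNewChildren(newChildren: Seq[Expr]): Expr = this
  }

  final case class Concat(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
    def withNewChildren(newChildren: Seq[Expr]): Expr =
      Concat(newChildren(0), newChildren(1))
  }

  // Holds the option values as children, keyed by option name.
  final case class OptionList(options: Seq[(String, Expr)]) extends Expr {
    def children: Seq[Expr] = options.map(_._2)
    def withNewChildren(newChildren: Seq[Expr]): Expr =
      OptionList(options.map(_._1).zip(newChildren))
  }

  // A spec node with exactly one child: the option list.
  final case class TableSpecNode(provider: String, child: Expr) extends Expr {
    def children: Seq[Expr] = Seq(child)
    def withNewChildren(newChildren: Seq[Expr]): Expr = copy(child = newChildren.head)
  }

  // Folds a concatenation of two literals into a single literal.
  val foldConcat: PartialFunction[Expr, Expr] = {
    case Concat(Lit(a), Lit(b)) => Lit(a + b)
  }
}
```

Because the option list is reachable as the child of the spec node, a single `transformUp` call on the spec folds the nested option values without any special-case traversal code.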

### Why are the changes needed?

Make the code implementation simpler.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes #41549 from gengliangwang/optionsFollowUp.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
a0x8o added a commit to a0x8o/spark that referenced this pull request Jun 13, 2023
czxm pushed a commit to czxm/spark that referenced this pull request Jun 19, 2023
…LACE TABLE OPTIONS values

### What changes were proposed in this pull request?

This PR updates the SQL compiler to support general constant expressions in the syntax for CREATE/REPLACE TABLE OPTIONS values, rather than restricting them to a few types of literals.

* The analyzer now checks that the provided expressions are in fact `foldable`, and throws an error message otherwise.
* This error message that users encounter in these cases improves from a general "syntax error at or near <location>" to instead indicate that the syntax is valid, but only constant expressions are supported in these contexts.

### Why are the changes needed?

This makes it easier to provide OPTIONS lists in SQL, supporting use cases like concatenating strings with `||`.

### Does this PR introduce _any_ user-facing change?

Yes, the SQL syntax changes.

### How was this patch tested?

This PR adds new unit test coverage.

Closes apache#41191 from dtenedor/expression-properties.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
czxm pushed a commit to czxm/spark that referenced this pull request Jun 19, 2023