[SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries #49772
@cloud-fan, @stefankandic, please take a look. This is just a revert of PR #48962, as we decided not to proceed with session level collations for now; a follow-up will apply object level collations for queries.
For the wider audience, could you provide a link to this decision, @dejankrak-db?
The decision has since been made not to ship this functionality for now,
@dongjoon-hyun , there are 2 main reasons for this decision:
Therefore, it was decided to pause the session level collation functionality for now. The unused parts of the original PR are reverted to keep the code cleaner going forward, while the parts required to support object level collation resolution are kept. I hope this clarifies the reasoning! I have also updated the PR description with this info, thanks!
...talyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultStringTypes.scala
I'm good with removing this hacky feature. It's too fragile to use.
@stefankandic, when you find some time, please take a look at the latest logic for DDL collation resolution, and at the complete removal of the DML collation resolution, as discussed.
Looks good!
...yst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala
…ysis/ResolveDDLCommandStringTypes.scala Co-authored-by: Stefan Kandic <154237371+stefankandic@users.noreply.github.com>
@cloud-fan, I have removed the entire session-level collation feature and all the associated workarounds and code. Please check whether the implementation looks good now; we would like to support collation resolution for DDL commands with these changes.
Thanks, merging to master/4.0!
… apply object level collation for DDL queries

### What changes were proposed in this pull request?

This PR is a partial revert of the original PR #48962 that introduced the resolution of default session level collation for DDL and DML queries. The part that is reverted is the default collation resolution for DML queries, whereas the part that is kept is the default collation resolution for DDL queries, which is required to apply the object level collation that was introduced as part of PR #49084. As part of this logic, object level collation is now applied to DDL queries accordingly, with the main logic implemented in the ResolveDefaultStringTypes.stringTypeForDDLCommand() method.

### Why are the changes needed?

There were unresolved technical issues when attempting to merge the functionality from PR #48962 on the Delta side, due to its effect on DML queries, so it was decided to pause this functionality for now and partially revert the unused parts to keep the code cleaner going forward. This is also in line with customer feedback, where object level collation is the much more requested functionality. The focus is therefore on resolving object level collation for DDL queries instead: the collation can be specified per table or view when it is created or modified, and the specified default collation propagates to subsequent queries on top of those entities.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests that cover the collations functionality, as well as new dedicated tests for applying object level collation to the underlying columns.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #49772 from dejankrak-db/revert-session-collations.

Lead-authored-by: Dejan Krakovic <dejan.krakovic@databricks.com>
Co-authored-by: Stefan Kandic <stefan.kandic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e92e12a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
… in the given schema"

### What changes were proposed in this pull request?

After removing session-level collation (#49772) we can also revert the PR that changed the behavior of `from_json` and `from_xml` expressions to use json and not sql type representation under the hood (#48750).

### Why are the changes needed?

Now that we don't have correctness problems with session level collation, using `sql` instead of `json` will lead to smaller and more efficient type representation.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50234 from stefankandic/revertFromJsonChange.

Authored-by: Stefan Kandic <stefan.kandic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
… in the given schema" (cherry picked from commit 0094f44) Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
This PR is a partial revert of the original PR #48962 that introduced the resolution of default session level collation for DDL and DML queries.
The part that is reverted is the default collation resolution for DML queries, whereas the part that is kept is the default collation resolution for DDL queries, which is required to apply the object level collation that was introduced as part of PR #49084. As part of this logic, object level collation is now applied to DDL queries accordingly, with the main logic implemented in the ResolveDefaultStringTypes.stringTypeForDDLCommand() method.
Why are the changes needed?
There were unresolved technical issues when attempting to merge the functionality from PR #48962 on the Delta side, due to its effect on DML queries, so it was decided to pause this functionality for now and partially revert the unused parts to keep the code cleaner going forward.
This is also in line with customer feedback, where object level collation is the much more requested functionality. The focus is therefore on resolving object level collation for DDL queries instead: the collation can be specified per table or view when it is created or modified, and the specified default collation propagates to subsequent queries on top of those entities.
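To make the intended behaviour concrete, here is a toy model in Python of how an object level default collation could be applied to string columns that do not specify their own. It is only a sketch and not Spark's implementation: `Column`, `resolve_ddl_columns`, and `FALLBACK_COLLATION` are hypothetical names introduced for illustration, and the assumption is that explicitly collated columns are left untouched while the remaining string columns inherit the collation declared on the table or view.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical fallback used when the object declares no default collation.
FALLBACK_COLLATION = "UTF8_BINARY"


@dataclass(frozen=True)
class Column:
    """Illustrative column definition; collation=None means no explicit COLLATE."""
    name: str
    data_type: str
    collation: Optional[str] = None


def resolve_ddl_columns(
    columns: List[Column], object_collation: Optional[str] = None
) -> List[Column]:
    """Give every uncollated STRING column the object's default collation."""
    default = object_collation or FALLBACK_COLLATION
    resolved = []
    for col in columns:
        if col.data_type == "STRING" and col.collation is None:
            # Assumption: the object level default applies only where the
            # column itself did not request a collation.
            resolved.append(Column(col.name, col.data_type, default))
        else:
            # Non-string columns and explicitly collated columns are unchanged.
            resolved.append(col)
    return resolved


columns = [
    Column("id", "INT"),
    Column("name", "STRING"),
    Column("code", "STRING", "UTF8_LCASE"),
]
resolved = resolve_ddl_columns(columns, object_collation="UNICODE_CI")
print([(c.name, c.collation) for c in resolved])
# [('id', None), ('name', 'UNICODE_CI'), ('code', 'UTF8_LCASE')]
```

In this model only `name` picks up the table's `UNICODE_CI` default; `code` keeps its explicit collation and `id` is not a string column, so neither changes.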
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing tests that cover the collations functionality, as well as new dedicated tests for applying object level collation to the underlying columns.
Was this patch authored or co-authored using generative AI tooling?
No