[SPARK-50214][SQL] From json/xml should not change collations in the given schema #48750
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
…onSQLFunctionsSuite.scala
+1, LGTM.
@stefankandic Could you fix the test failure as it is related to your changes:
[info] - function from_json *** FAILED *** (42 milliseconds)
[info] Expected and actual plans do not match:
Waiting for CI.
val spark = this.spark
import spark.implicits._
Maybe import `testImplicits._` at the beginning of the class, see `JsonFunctionsSuite`.
I think the failed test is not related to the changes:
+1, LGTM. Merging to master.
… in the given schema"

### What changes were proposed in this pull request?
After removing session-level collation (#49772) we can also revert the PR that changed the behavior of `from_json` and `from_xml` expressions to use json and not sql type representation under the hood (#48750).

### Why are the changes needed?
Now that we don't have correctness problems with session level collation, using `sql` instead of `json` will lead to smaller and more efficient type representation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50234 from stefankandic/revertFromJsonChange.

Authored-by: Stefan Kandic <stefan.kandic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
… in the given schema"

### What changes were proposed in this pull request?
After removing session-level collation (#49772) we can also revert the PR that changed the behavior of `from_json` and `from_xml` expressions to use json and not sql type representation under the hood (#48750).

### Why are the changes needed?
Now that we don't have correctness problems with session level collation, using `sql` instead of `json` will lead to smaller and more efficient type representation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50234 from stefankandic/revertFromJsonChange.

Authored-by: Stefan Kandic <stefan.kandic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 0094f44)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?
This fix ensures that `from_json` and `from_xml` return the exact schema provided, even when session collation is set.

### Why are the changes needed?
When serializing schema with the `sql` method, parsing it back can yield a different schema if session collation is set. This fix maintains consistency in schema structure regardless of collation settings.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
New unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.
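
For readers who want to see the two schema round trips the description contrasts, here is a minimal sketch. It is not the code from this PR: the object name `SchemaRoundTrip` and the helper `roundTrips` are hypothetical, and it only uses the public `StructType.sql`, `StructType.json`, `DataType.fromDDL` and `DataType.fromJson` methods. It assumes the Spark SQL types are on the classpath.

```scala
import org.apache.spark.sql.types.{DataType, StringType, StructField, StructType}

// Hypothetical helper object, for illustration only.
object SchemaRoundTrip {

  // Returns the schema re-parsed from its SQL (DDL-style) string and from
  // its JSON string, in that order.
  def roundTrips(schema: StructType): (DataType, DataType) = {
    // The SQL string spells a string field simply as STRING, so the parser
    // decides what STRING means when the text is read back.
    val viaSql = DataType.fromDDL(schema.sql)
    // The JSON string is parsed back without going through the SQL parser.
    val viaJson = DataType.fromJson(schema.json)
    (viaSql, viaJson)
  }

  def main(args: Array[String]): Unit = {
    val schema = StructType(Seq(StructField("a", StringType)))
    val (viaSql, viaJson) = roundTrips(schema)
    println(s"sql form:  ${schema.sql}")
    println(s"json form: ${schema.json}")
    println(s"sql round trip preserved the schema:  ${viaSql == schema}")
    println(s"json round trip preserved the schema: ${viaJson == schema}")
  }
}
```

With default settings both round trips should give back the original schema. The problem described above concerns the first path only: when a session-level collation was set, re-parsing the `sql` text could resolve `STRING` differently from the schema that was passed in.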