Skip to content

Conversation

stefankandic
Copy link
Contributor

What changes were proposed in this pull request?

This fix ensures that from_json and from_xml return the exact schema provided, even when session collation is set.

Why are the changes needed?

When serializing schema with the sql method, parsing it back can yield a different schema if session collation is set. This fix maintains consistency in schema structure regardless of collation settings.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Nov 4, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stefankandic Could you fix the test failure as it is related to your changes:

[info] - function from_json *** FAILED *** (42 milliseconds)
[info]   Expected and actual plans do not match:

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for CI.

Comment on lines 29 to 30
val spark = this.spark
import spark.implicits._
Copy link
Member

@MaxGekk MaxGekk Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe import testImplicits._ at the beginning of the class, see JsonFunctionsSuite.

@MaxGekk
Copy link
Member

MaxGekk commented Nov 5, 2024

I think the failed test is not related to the changes:

[info] OracleIntegrationSuite:
[info] org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite *** ABORTED *** (10 minutes, 19 seconds)
[info]   The code passed to eventually never returned normally. Attempted 589 times over 10.001613945816667 minutes. Last failure 

+1, LGTM. Merging to master.
Thank you, @stefankandic and @dongjoon-hyun @HyukjinKwon for review.

@MaxGekk MaxGekk closed this in 273b02f Nov 5, 2024
MaxGekk pushed a commit that referenced this pull request Mar 11, 2025
… in the given schema"

### What changes were proposed in this pull request?
After removing session-level collation (#49772) we can also revert the PR that changed the behavior of `from_json` and `from_xml` expressions to use json and not sql type representation under the hood (#48750).

### Why are the changes needed?
Now that we don't have correctness problems with session level collation, using `sql` instead of `json` will lead to smaller and more efficient type representation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50234 from stefankandic/revertFromJsonChange.

Authored-by: Stefan Kandic <stefan.kandic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk pushed a commit that referenced this pull request Mar 11, 2025
… in the given schema"

### What changes were proposed in this pull request?
After removing session-level collation (#49772) we can also revert the PR that changed the behavior of `from_json` and `from_xml` expressions to use json and not sql type representation under the hood (#48750).

### Why are the changes needed?
Now that we don't have correctness problems with session level collation, using `sql` instead of `json` will lead to smaller and more efficient type representation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50234 from stefankandic/revertFromJsonChange.

Authored-by: Stefan Kandic <stefan.kandic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 0094f44)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
anoopj pushed a commit to anoopj/spark that referenced this pull request Mar 15, 2025
… in the given schema"

### What changes were proposed in this pull request?
After removing session-level collation (apache#49772) we can also revert the PR that changed the behavior of `from_json` and `from_xml` expressions to use json and not sql type representation under the hood (apache#48750).

### Why are the changes needed?
Now that we don't have correctness problems with session level collation, using `sql` instead of `json` will lead to smaller and more efficient type representation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50234 from stefankandic/revertFromJsonChange.

Authored-by: Stefan Kandic <stefan.kandic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants