Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48150][SQL] try_parse_json output should be declared as nullable #46409

Closed

Conversation

JoshRosen
Copy link
Contributor

@JoshRosen JoshRosen commented May 6, 2024

What changes were proposed in this pull request?

The try_parse_json expression added in #46141 declares improper output nullability: the try_ version's output must be marked as nullable. This PR corrects the nullability and adds a test.

Why are the changes needed?

Incorrectly declaring an expression's output as non-nullable when it is actually nullable may lead to crashes.

Does this PR introduce any user-facing change?

Yes, it affects output nullability and thus may affect query result schemas.

How was this patch tested?

New unit test cases.

Was this patch authored or co-authored using generative AI tooling?

No.

@JoshRosen JoshRosen requested a review from cloud-fan May 6, 2024 22:39
@github-actions github-actions bot added the SQL label May 6, 2024
@dongjoon-hyun
Copy link
Member

cc @harshmotw-db and @cloud-fan from #46141

Copy link
Contributor

@harshmotw-db harshmotw-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an oversight. Thanks for the fix!

@dongjoon-hyun
Copy link
Member

According to the CI result, maybe ProtoToParsedPlanTestSuite seems to become flaky too.

[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error] 	org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite

@JoshRosen
Copy link
Contributor Author

It's not flaky, it's a legitimate test failure:

[info] - function_try_parse_json *** FAILED *** (5 milliseconds)
[info]   Expected and actual plans do not match:
[info]   
[info]   === Expected Plan ===
[info]   Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.variant.VariantExpressionEvalUtils$, VariantType, parseJson, g#0, false, StringType, BooleanType, true, false, true) AS try_parse_json(g)#0]
[info]   +- LocalRelation <empty>, [id#0L, a#0, b#0, d#0, e#0, f#0, g#0]
[info]   
[info]   
[info]   === Actual Plan ===
[info]   Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.variant.VariantExpressionEvalUtils$, VariantType, parseJson, g#0, false, StringType, BooleanType, true, true, true) AS try_parse_json(g)#0]
[info]   +- LocalRelation <empty>, [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] (ProtoToParsedPlanTestSuite.scala:205)

@dongjoon-hyun
Copy link
Member

Wow! Thank you for the fix.

@dongjoon-hyun
Copy link
Member

Merged to master for Apache Spark 4.0.0-preview. Thank you, @JoshRosen and all.

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
### What changes were proposed in this pull request?

The `try_parse_json` expression added in apache#46141 declares improper output nullability: the `try_` version's output must be marked as nullable. This PR corrects the nullability and adds a test.

### Why are the changes needed?

Incorrectly declaring an expression's output as non-nullable when it is actually nullable may lead to crashes.

### Does this PR introduce _any_ user-facing change?

Yes, it affects output nullability and thus may affect query result schemas.

### How was this patch tested?

New unit test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46409 from JoshRosen/fix-try-parse-json-nullability.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants