-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-47922][SQL] Implement the try_parse_json expression #46141
Conversation
…ng with parse_json and variant_get
…authentication errors
…hema is variant type
…essions/jsonExpressions.scala Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
…park into from_json_variant
// scalastyle:on line.size.limit | ||
case class TryParseJson(expr: Expression, replacement: Expression) | ||
extends RuntimeReplaceable with InheritAnalysisRules { | ||
def this(child: Expression) = this(child, TryEval(ParseJson(child))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of the TryEval
approach, as it's more like a temporary workaround. See this TODO: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala#L83
The issue is, TryEval
swallows all the exceptions, including the ones from inputs. e.g. try_parse_json(a function that produces JSON string but fails)
should still fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. I have reimplemented it and made it similar to the implementation of the try_variant_get
expression.
…into try_parse_json
…ython_scalar_variant
thanks, merging to master! |
### What changes were proposed in this pull request? The `try_parse_json` expression added in #46141 declares improper output nullability: the `try_` version's output must be marked as nullable. This PR corrects the nullability and adds a test. ### Why are the changes needed? Incorrectly declaring an expression's output as non-nullable when it is actually nullable may lead to crashes. ### Does this PR introduce _any_ user-facing change? Yes, it affects output nullability and thus may affect query result schemas. ### How was this patch tested? New unit test cases. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46409 from JoshRosen/fix-try-parse-json-nullability. Authored-by: Josh Rosen <joshrosen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request? This pull request implements the `try_parse_json` that runs `parse_json` on string expressions to extract variants. However, if `parse_json` throws an exception on a row, the value `null` is returned. ### Why are the changes needed? Sometimes, columns containing JSON strings may contain some invalid inputs that should be ignored instead of having the whole execution failed because of it. ### Does this PR introduce _any_ user-facing change? Yes, it allows users to run the `try_parse_json` expression. ### How was this patch tested? Unit tests to check if `try_parse_json` works just like `parse_json` on valid inputs, returns `null` on invalid inputs, and fails on incorrect input data types. ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#46141 from harshmotw-db/try_parse_json. Authored-by: Harsh Motwani <harsh.motwani@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? The `try_parse_json` expression added in apache#46141 declares improper output nullability: the `try_` version's output must be marked as nullable. This PR corrects the nullability and adds a test. ### Why are the changes needed? Incorrectly declaring an expression's output as non-nullable when it is actually nullable may lead to crashes. ### Does this PR introduce _any_ user-facing change? Yes, it affects output nullability and thus may affect query result schemas. ### How was this patch tested? New unit test cases. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46409 from JoshRosen/fix-try-parse-json-nullability. Authored-by: Josh Rosen <joshrosen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This pull request implements the
try_parse_json
that runsparse_json
on string expressions to extract variants. However, ifparse_json
throws an exception on a row, the valuenull
is returned.Why are the changes needed?
Sometimes, columns containing JSON strings may contain some invalid inputs that should be ignored instead of having the whole execution failed because of it.
Does this PR introduce any user-facing change?
Yes, it allows users to run the
try_parse_json
expression.How was this patch tested?
Unit tests to check if
try_parse_json
works just likeparse_json
on valid inputs, returnsnull
on invalid inputs, and fails on incorrect input data types.Was this patch authored or co-authored using generative AI tooling?
No