[SPARK-31106][SQL] Support valid_json function #28167
Conversation
@HyukjinKwon @maropu @dongjoon-hyun @MaxGekk
@iRakson, how many functions do you target to add? I would like to avoid adding a bunch of functions just for the sake of matching.
This function and two aggregate functions, if you are OK with these.
Can you clarify the usage and benefit? In particular, considering we can easily work around via …
Although these aggregate functions can be a little faster than working around with to_json, I agree implementing them might not be worth it. So I will not implement them. Thank you.
Is …
Yes. |
ok to test |
...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala
parser => {
  // parse the JSON string
  while (parser.nextToken() != null) {
    parser.skipChildren()
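As context for the excerpt above, here is a minimal, self-contained sketch of the validity check it performs. This is not the PR's actual code: `JsonValidity` and `isValidJson` are hypothetical names, and the sketch assumes Jackson (which Spark already bundles). Driving `nextToken()` over the whole input forces Jackson's `JsonParser` to tokenize everything, including nested fields, and malformed input throws a `JsonParseException` (a subclass of `IOException`).

```scala
import com.fasterxml.jackson.core.JsonFactory

object JsonValidity {
  private val factory = new JsonFactory()

  // Option mirrors the proposed NULL-in/NULL-out behaviour: None for null input.
  def isValidJson(input: String): Option[Boolean] =
    if (input == null) None
    else {
      try {
        val parser = factory.createParser(input)
        try {
          var sawToken = false
          while (parser.nextToken() != null) {
            sawToken = true
            // skipChildren() still tokenizes nested content to find the matching
            // end marker, so malformed nested fields also throw here.
            parser.skipChildren()
          }
          // Empty input produces no tokens at all; treat it as invalid.
          Some(sawToken)
        } finally {
          parser.close()
        }
      } catch {
        case _: java.io.IOException => Some(false)
      }
    }
}
```

Note that Jackson is lenient by default (for example, it accepts a bare scalar like `123` as a root value), so the exact set of strings considered "valid" depends on the parser's feature flags.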
Do you check that only top level fields are valid?
No, nested fields are checked by JsonParser as well.
Test build #121032 has finished for PR 28167 at commit
Test build #121042 has finished for PR 28167 at commit
@iRakson Could you provide the links?
MySQL: … Names may differ across DBMSs.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
How did you decide the name?
Actually, ISJSON seemed more intuitive to me. And in order to follow other JSON function names …
I don't have a strong preference though, …
Test build #121075 has finished for PR 28167 at commit
Hi, @iRakson. Please update the PR description with the following information, which was requested by @gatorsmile.
Updated.
@maropu Updated.
Retest this please. |
I saw the discussion on the naming in the above comments. Naming is always difficult. The current one looks too Apache-Spark-specific.
@maropu, @gatorsmile, and @HyukjinKwon. Although …
Test build #121288 has finished for PR 28167 at commit
Yea, thanks for your suggestion, @dongjoon-hyun. Since all the existing names are totally different, I suggested the one with the clearest meaning above. Actually, I'm still not sure we need this function because we already have a workaround for this purpose, as @HyukjinKwon suggested above.
FYI, IBM Db2 for i is neither Db2 for LUW nor Db2 for z/OS. Db2 for i is not widely used. Please get rid of "IBM Db2: IS JSON".
@iRakson @maropu @dongjoon-hyun @HyukjinKwon @MaxGekk I checked the above reference links. They are very helpful. There are many different JSON-related functions. Could we stop merging new functions for now? Let us do more investigation and discuss which APIs we need to add and why they are needed.
Yeah, I think this is a much better approach.
I am good to stop it for now, yes. I have the same concern (#28167 (comment)).
Thanks, @maropu, @gatorsmile, @iRakson, @HyukjinKwon. +1 for more investigation on the JSON effort.
Since 3.0, we are able to inject functions. Maybe we can create a separate sub-project or third-party project to maintain dialect functions from other platforms instead of adding them directly to Spark. For example, I recently created https://github.com/yaooqinn/spark-func-extras to achieve such a goal (maybe we can move this project to a proper place to maintain it). Functions in this project can overwrite Spark built-in ones to keep the meaning of the platforms they come from if conflicts occur. Also, some function candidates may eventually go into Spark's master branch if they turn out to be quite useful and do not conflict. cc @cloud-fan, @maropu, @gatorsmile, @HyukjinKwon, @dongjoon-hyun, thanks.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
In this PR, I am proposing a JSON function valid_json. This function checks whether a given string is a valid JSON string or not. If the string is valid it returns true; if it is invalid it returns false. For a NULL input, NULL is returned.
Why are the changes needed?
This is one of the most basic and frequently needed functions when dealing with JSON data.
Say a user is reading a table: they can segregate its rows into those containing valid JSON and those containing invalid JSON, and then process only the relevant subset.
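The segregation workflow above can be sketched without Spark. Here `partitionByValidity` and its predicate parameter are hypothetical stand-ins for filtering on the proposed function's boolean result:

```scala
// Toy sketch: split rows into (valid, invalid) by a JSON-validity predicate.
// The predicate stands in for the proposed valid_json check; in Spark this
// would be two filters, valid_json(col) and NOT valid_json(col).
def partitionByValidity(
    rows: Seq[String],
    isValidJson: String => Boolean): (Seq[String], Seq[String]) =
  rows.partition(isValidJson)
```

With such a split, only the valid (or only the invalid) partition needs further processing.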
This function is supported by some popular databases as well. Some of them are:
- MySQL: JSON_VALID
- SQL Server: ISJSON
- Oracle: IS JSON
- IBM Db2: IS JSON
- SQLite: json_valid
- MariaDB: JSON_VALID
- Amazon Redshift: is_valid_json
Does this PR introduce any user-facing change?
Yes. Users can now use the valid_json() function to verify whether a given JSON string is valid or not.
How was this patch tested?
Pass the Jenkins with newly added test cases.