
[BUG] GetJsonObject should return null for invalid query instead of throwing an exception #10212

Closed
revans2 opened this issue Jan 18, 2024 · 8 comments · Fixed by #10581
Assignees
Labels
bug Something isn't working

Comments

@revans2
Collaborator

revans2 commented Jan 18, 2024

Describe the bug
There are some invalid paths, such as this one in Python:

f.get_json_object('jsonStr', '''$.'a''').alias('qsqa2'),

which results in the cudf code throwing an exception saying the path is invalid. Spark, however, just returns null for an invalid path. We should do the same.

ai.rapids.cudf.CudfException: CUDF failure at:/.../json_path.cu:657: Encountered invalid JSONPath input string
	at ai.rapids.cudf.ColumnView.getJSONObject(Native Method)
	at ai.rapids.cudf.ColumnView.getJSONObject(ColumnView.java:2995)

or

f.get_json_object('jsonStr', 'a').alias('just_a'),
ai.rapids.cudf.CudfException: CUDF failure at:/.../json_path.cu:593: Unrecognized JSONPath operator
	at ai.rapids.cudf.ColumnView.getJSONObject(Native Method)
	at ai.rapids.cudf.ColumnView.getJSONObject(ColumnView.java:2995)
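
For context, here is a minimal PySpark sketch of the behavior described above, assuming a local Spark session and a sample JSON column matching the examples (per the report, CPU Spark yields null for these paths instead of raising an exception):

from pyspark.sql import SparkSession
import pyspark.sql.functions as f

# Assumed setup for illustration; not part of the original report.
spark = SparkSession.builder.appName("get_json_object_invalid_path").getOrCreate()
df = spark.createDataFrame([('{"a": 1}',)], ["jsonStr"])

df.select(
    f.get_json_object("jsonStr", "$.'a").alias("qsqa2"),  # quoted-name case from the report
    f.get_json_object("jsonStr", "a").alias("just_a"),    # path missing the leading '$'
).show()
# Per the report, CPU Spark shows null in both columns, while the cudf kernel
# raised "invalid JSONPath input string" / "Unrecognized JSONPath operator".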

@revans2 revans2 added bug Something isn't working ? - Needs Triage Need team to review and classify labels Jan 18, 2024
@revans2 revans2 changed the title [BUG] getJsonObject should return null for invalid query instead of throwing an exception [BUG] GetJsonObject should return null for invalid query instead of throwing an exception Jan 18, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Jan 30, 2024
@SurajAralihalli
Collaborator

SurajAralihalli commented Feb 27, 2024

This should be addressed by cudf PR 15082 for the cases where the JSONPath query contains invalid syntax.

@SurajAralihalli
Collaborator

SurajAralihalli commented Mar 1, 2024

Cudf PR 15082 does fix the issue (it returns null instead of throwing an error) in the cases where cudf considers the query argument invalid. This means the example in the description will no longer result in a cudf exception.

However, PR 10466 attempts to handle this within spark-rapids by validating the path before passing it to cudf. I'll let the authors of PR 10466 close this issue.

@sameerz
Collaborator

sameerz commented Mar 1, 2024

@thirtiseven has this issue been verified to be closed by #10466 ?

cc: @revans2

@thirtiseven
Collaborator

@thirtiseven has this issue been verified to be closed by #10466 ?

Thanks for the reminder. Technically, this issue was fixed by #15082 and #10466, but the test case test_get_json_object_invalid_path still fails, so I didn't notice.

Now the case fails for a different reason: $.'a is a valid path according to Spark's check, so it is normalized to $[''a'] and passed to the kernel by the plugin, which results in

ai.rapids.cudf.CudfException: CUDF failure on: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-690-cuda11/thirdparty/cudf/cpp/src/json/json_path.cu:631: Invalid empty name in JSONPath query string

I'm going to move this case out of test_get_json_object_invalid_path and find or create a separate issue and test for it. Until then, I think we can keep this issue open for a while.

@thirtiseven thirtiseven self-assigned this Mar 4, 2024
@thirtiseven
Collaborator

Filed issue for the above case: #10537

@SurajAralihalli
Collaborator

The PR #15082 doesn't address the issue with $[''a'], but it does fix the problem with $.'a.
The exception "Invalid empty name in JSONPath query string" occurs for $[''a'] both before and after #15082.

This issue remains a limitation of cudf's get_json_object function.

@res-life
Collaborator

The new implementation now parses the path in spark-rapids using Spark's code, so we should check for an invalid path in spark-rapids before calling the kernel code.
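
As a rough illustration of that validate-before-kernel idea, here is a hypothetical, simplified sketch in Python (not the plugin's actual code and not Spark's real path parser): tokenize the path up front and short-circuit to nulls when it cannot be parsed, so the kernel never sees a path it cannot handle.

import re
from typing import Callable, List, Optional

# Hypothetical, simplified JSONPath check for illustration only: accept the
# root '$' followed by .name, ['name'] or [index] steps, and reject anything
# else (including empty names) before the GPU kernel is ever called.
_TOKEN = re.compile(r"""\.([^.\[\]]+)     # .name
                      |\['([^']*)'\]      # ['name']
                      |\[(\d+)\]          # [index]
                    """, re.VERBOSE)

def parse_path(path: str) -> Optional[List[str]]:
    """Return the list of path steps, or None if the path is invalid."""
    if not path.startswith("$"):
        return None                  # e.g. the path 'a' from the report
    pos, steps = 1, []
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            return None              # unrecognized operator at this position
        step = next(g for g in m.groups() if g is not None)
        if step == "":
            return None              # reject empty names such as $['']
        steps.append(step)
        pos = m.end()
    return steps

def get_json_object_safe(rows: List[str], path: str,
                         kernel: Callable[[List[str], str], List[Optional[str]]]):
    """Return all nulls for an invalid path instead of calling the kernel."""
    if parse_path(path) is None:
        return [None] * len(rows)    # mimic Spark: null, never an exception
    return kernel(rows, path)        # 'kernel' stands in for the GPU call

With a guard like this, the 'a' and $[''a'] paths above would produce a column of nulls without reaching the kernel, matching the CPU Spark behavior the issue asks for; in this sketch, $.'a would still pass the check (mirroring the Spark-side behavior thirtiseven describes), which is why #10537 tracks that case separately.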

@res-life
Collaborator

Will be fixed by: #10581
