[BUG] GetJsonObject sees a double quote in a single quoted string as invalid #10219

Closed
revans2 opened this issue Jan 18, 2024 · 1 comment
Labels
bug Something isn't working


revans2 commented Jan 18, 2024

Describe the bug
With the input `{'a':'A"'}` and the JSON path `$`, Spark outputs `{"a":"A\""}`, but we output `null`, which means we hit an error and could not parse the input. I'm not sure whether this happens because the string contains a single, unpaired double quote and quote matching goes wrong, or something else. I can't easily check the paired case: `['a','b','"C"']` fails because of #10218, not because it returns None.
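To make the expected behavior concrete, here is a minimal Java sketch of the quoting conversion that Spark's output above implies for a path of `$`: string literals are re-emitted with double quotes, so a `"` that sat inside a single-quoted string has to be escaped as `\"`. The class `SingleQuoteNormalizer` and its `normalize` method are hypothetical names used only for illustration. This is not Spark's or cuDF's implementation; it does not validate JSON and only rewrites quoting. It returns `null` for an unterminated string, loosely mirroring the "could not parse" result.

```java
public class SingleQuoteNormalizer {

    /**
     * Hypothetical helper: rewrites every string literal (single- or double-quoted)
     * as a double-quoted literal, escaping any bare '"' that appeared inside a
     * single-quoted string. Everything outside string literals is copied unchanged.
     * Returns null if a string literal or escape sequence is not terminated.
     */
    public static String normalize(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            if (c == '\'' || c == '"') {
                char quote = c;          // remember which quote opened this literal
                out.append('"');         // output always uses double quotes
                i++;
                boolean closed = false;
                while (i < n) {
                    char d = input.charAt(i);
                    if (d == '\\') {
                        if (i + 1 >= n) {
                            return null; // dangling backslash at end of input
                        }
                        char next = input.charAt(i + 1);
                        if (next == '\'') {
                            out.append('\''); // \' needs no escape inside double quotes
                        } else {
                            out.append('\\').append(next); // keep other escapes as-is
                        }
                        i += 2;
                    } else if (d == quote) {
                        closed = true;   // matching quote ends the literal
                        i++;
                        break;
                    } else if (d == '"') {
                        out.append("\\\""); // only reachable inside a single-quoted string
                        i++;
                    } else {
                        out.append(d);
                        i++;
                    }
                }
                if (!closed) {
                    return null;         // unterminated string literal
                }
                out.append('"');
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input from this issue: {'a':'A"'}
        System.out.println(normalize("{'a':'A\"'}"));
    }
}
```

Running `main` on the issue's input prints `{"a":"A\""}`, the same text Spark produces. Tracking which quote opened the literal is the key design point: a `"` only closes a double-quoted string, so inside single quotes it is ordinary content that must be escaped.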

revans2 added the `bug` (Something isn't working) and `? - Needs Triage` (Need team to review and classify) labels on Jan 18, 2024
revans2 changed the title from "[BUG] GetJsonObject sees a double quote in since quotes as invalid" to "[BUG] GetJsonObject sees a double quote in a single quoted string as invalid" on Jan 19, 2024
mattahrens removed the `? - Needs Triage` label on Jan 30, 2024

revans2 commented Jan 30, 2024

@SurajAralihalli helped debug this on the cuDF side, and it looks like single-quote support is off by default. And wouldn't you know it: the Java API does not accept configs, so we get the default configs, which do not support single quotes.

https://github.com/rapidsai/cudf/blob/57bbe94e995b9a0365276e4cb26853dce219e22a/java/src/main/native/src/ColumnViewJni.cpp#L2451

We need to fix the JNI API so that configs can be passed in, enable single-quote support in our operator, and add more tests to verify that it works properly with single quotes.

rapids-bot pushed a commit to rapidsai/cudf that referenced this issue on Feb 12, 2024
Resolves [10219](NVIDIA/spark-rapids#10219)

This PR introduces a new class named `GetJsonObjectOptions` that holds the configurations controlling the behavior of the underlying `cudf::get_json_object` function. It adds this class to the `getJSONObject` Java API as an additional argument, while keeping the previous API to maintain backwards compatibility. It also includes a test case, `testGetJSONObjectWithSingleQuotes`, that validates the behavior of `getJSONObject` when single quotes are enabled.
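The design described above (a new options argument plus an unchanged overload that falls back to default options) can be sketched in plain Java. Everything here is hypothetical: `JsonOptions`, its builder, `allowSingleQuotes`, and `quotesAccepted` are illustrative names, not the real cuDF `GetJsonObjectOptions` API, and `quotesAccepted` only checks quoting rather than parsing JSON. The defaults mirror what was found in this issue, where single quotes are rejected unless explicitly enabled.

```java
import java.util.Objects;

public final class JsonOptionsSketch {

    /** Hypothetical options holder; single quotes are off by default. */
    public static final class JsonOptions {
        public static final JsonOptions DEFAULT = builder().build();

        private final boolean allowSingleQuotes;

        private JsonOptions(Builder b) {
            this.allowSingleQuotes = b.allowSingleQuotes;
        }

        public boolean allowSingleQuotes() {
            return allowSingleQuotes;
        }

        public static Builder builder() {
            return new Builder();
        }

        public static final class Builder {
            private boolean allowSingleQuotes = false;

            public Builder allowSingleQuotes(boolean value) {
                this.allowSingleQuotes = value;
                return this;
            }

            public JsonOptions build() {
                return new JsonOptions(this);
            }
        }
    }

    /** Old-style entry point kept for compatibility: silently uses the defaults. */
    public static boolean quotesAccepted(String json) {
        return quotesAccepted(json, JsonOptions.DEFAULT);
    }

    /** New-style entry point: the caller's options reach the (stand-in) parser. */
    public static boolean quotesAccepted(String json, JsonOptions options) {
        Objects.requireNonNull(options, "options");
        char open = 0; // 0 when outside any string literal, else the opening quote
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (open != 0) {
                if (c == '\\') {
                    i++;                  // skip the escaped character
                } else if (c == open) {
                    open = 0;             // string literal closed
                }
            } else if (c == '"') {
                open = '"';
            } else if (c == '\'') {
                if (!options.allowSingleQuotes()) {
                    return false;         // single-quoted string rejected by default
                }
                open = '\'';
            }
        }
        return open == 0;                 // an unterminated string is rejected too
    }

    public static void main(String[] args) {
        String input = "{'a':'A\"'}";
        System.out.println(quotesAccepted(input));
        System.out.println(quotesAccepted(input,
                JsonOptions.builder().allowSingleQuotes(true).build()));
    }
}
```

Keeping the one-argument overload and routing it through `JsonOptions.DEFAULT` leaves existing callers unchanged, while a caller that needs Spark-compatible behavior can opt in with `JsonOptions.builder().allowSingleQuotes(true).build()`.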

Authors:
  - Suraj Aralihalli (https://github.com/SurajAralihalli)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - MithunR (https://github.com/mythrocks)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #14956