Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-5033: Query on JSON That Has Null as Value For Each Key #2731

Closed
wants to merge 1 commit into from

Conversation

unical1988
Copy link

@unical1988 unical1988 commented Dec 29, 2022

DRILL-5033: Query on JSON That Has Null as Value For Each Key

Description

Drill returns same result with or without store.json.all_text_mode=true
Note that each key in the JSON has null as its value.
[root@cent01 null_eq_joins]# cat right_all_nulls.json

{
"intKey" : null,
"bgintKey": null,
"strKey": null,
"boolKey": null,
"fltKey": null,
"dblKey": null,
"timKey": null,
"dtKey": null,
"tmstmpKey": null,
"intrvldyKey": null,
"intrvlyrKey": null
}

Querying the above JSON file results in null as query result.
We should see each of the keys in the JSON as a column in query result.
And in each column the value should be a null value.
Current behavior does not look right.

0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`;
+-------+
|   *   |
+-------+
| null  |
+-------+
1 row selected (0.313 seconds)

Documentation

(Please describe user-visible changes similar to what should appear in the Drill documentation.)

Testing

(Please describe how this PR has been tested.)

@cgivre cgivre changed the title DRILL-8033 DRILL-5033 Dec 29, 2022
@cgivre cgivre changed the title DRILL-5033 DRILL-5033: Query on JSON That Has Null as Value For Each Key Dec 29, 2022
@cgivre
Copy link
Contributor

cgivre commented Dec 29, 2022

I copied the JIRA into the PR description.

Copy link
Member

@vvysotskyi vvysotskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @unical1988, thanks for the contribution.
I'm afraid I have to disagree with this approach of writing nulls as varchar values since it will cause schema change issues when values from the next records have a different type.
Please see the design doc attached to https://issues.apache.org/jira/browse/DRILL-4824 for more details of issues with handling different states within JSON.

@unical1988
Copy link
Author

@vvysotskyi My attempt to deal with this bug is just a quick workaround since the solution as stated by @cgivre might just be the setting of the schema, of the dataset to query, from the start (which requires non trivial updates to the code).

@cgivre
Copy link
Contributor

cgivre commented Dec 30, 2022

@unical1988
You actually don't have to modify the code to get this data to read properly. As I mentioned on the user group, the easiest way would probably be to provide a schema. The good news is that you can do this at query time. Take a look here: https://drill.apache.org/docs/plugin-configuration-basics/#specifying-the-schema-as-table-function-parameter

An example query might be:

select * from table(dfs.tmp.`file.json`(
schema => 'inline=(col0 varchar, col1 date properties {`drill.format` = `yyyy-MM-dd`})
properties {`drill.strict` = `false`}'))

@unical1988
Copy link
Author

Thanks @cgivre for the clarification, but suppose the assumption that considering nulls as strings would solve the issue, were the changes i made (over the class JSONReader.java) adequate (should the methods be changed as i did)? i see that some tests didn't pass.

@cgivre
Copy link
Contributor

cgivre commented Dec 30, 2022

Thanks @cgivre for the clarification, but suppose the assumption that considering nulls as strings would solve the issue, were the changes i made (over the class JSONReader.java) adequate (should the methods be changed as i did)? i see that some tests didn't pass.

For us to merge a pull request, all the unit tests have to pass. (Or be modified with an explanation of why they are being modified) Drill is a very complex beast with a lot of dependencies so even small changes can break things you didn't intend to. Believe me... I know from experience ;-)

One other thing to note is that there is another option drill.exec.functions.cast_empty_string_to_null. Setting this to true forces empty strings to be treated as null. This can have some unintended side effects, but might also help you out.

@cgivre
Copy link
Contributor

cgivre commented Jan 18, 2023

I'm going to close this PR. If there is any objection, we can revisit.

@cgivre cgivre closed this Jan 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants