DRILL-5033: Query on JSON That Has Null as Value For Each Key #2731
I copied the JIRA into the PR description.
Hi @unical1988, thanks for the contribution.
I'm afraid I have to disagree with this approach of writing nulls as varchar values since it will cause schema change issues when values from the next records have a different type.
Please see the design doc attached to https://issues.apache.org/jira/browse/DRILL-4824 for more details of issues with handling different states within JSON.
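To make the schema change concern concrete, here is a minimal sketch of per-column type inference over a stream of JSON records. It is a toy model, not Drill's JSONReader: the function names (`infer_type`, `find_schema_changes`) and the type names are assumptions chosen for illustration.

```python
# Toy illustration only: this is NOT Drill's JSONReader, just a model of
# per-column type inference over a stream of JSON records.
import json


def infer_type(value, null_as="varchar"):
    """Map a JSON value to an illustrative column type name."""
    if value is None:
        return null_as  # the workaround under discussion: treat null as varchar
    if isinstance(value, bool):  # check bool before int, since bool is an int subclass
        return "bit"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "float8"
    return "varchar"


def find_schema_changes(lines, null_as="varchar"):
    """Return (column, old_type, new_type) for every type flip seen."""
    seen = {}
    changes = []
    for line in lines:
        for column, value in json.loads(line).items():
            new_type = infer_type(value, null_as)
            old_type = seen.get(column)
            if old_type is not None and old_type != new_type:
                changes.append((column, old_type, new_type))
            seen[column] = new_type
    return changes


# A null followed by a number: the column was already committed to varchar.
records = ['{"a": null}', '{"a": 1}']
print(find_schema_changes(records))  # [('a', 'varchar', 'bigint')]
```

In this model, committing to varchar on the first null only works if every later value in that column happens to be a string; any other type shows up as a type flip.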
@vvysotskyi My attempt to deal with this bug is just a quick workaround, since the solution, as stated by @cgivre, might simply be to set the schema of the queried dataset up front (which requires non-trivial changes to the code).
@unical1988 An example query might be:
select * from table(dfs.tmp.`file.json`(
schema => 'inline=(col0 varchar, col1 date properties {`drill.format` = `yyyy-MM-dd`})
properties {`drill.strict` = `false`}'))
Thanks @cgivre for the clarification. But supposing that treating nulls as strings would solve the issue, were the changes I made to the JSONReader.java class adequate (should the methods be changed as I did)? I see that some tests didn't pass.
For us to merge a pull request, all the unit tests have to pass (or be modified, with an explanation of why they are being modified). Drill is a very complex beast with a lot of dependencies, so even small changes can break things you didn't intend to. Believe me... I know from experience ;-) One other thing to note is that there is another option
I'm going to close this PR. If there is any objection, we can revisit.
Description
Drill returns the same result with or without store.json.all_text_mode=true.
Note that each key in the JSON has null as its value.
[root@cent01 null_eq_joins]# cat right_all_nulls.json
Querying the above JSON file returns null as the query result.
We should see each of the keys in the JSON as a column in the query result,
and the value in each column should be null.
Current behavior does not look right.
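The expected behavior described above can be sketched as follows. This is an illustration only, not Drill code: `read_rows` is a hypothetical helper, and the sample records are invented because the contents of right_all_nulls.json are not shown here.

```python
import json


def read_rows(lines):
    """Collect column names in first-seen order and return (columns, rows)."""
    parsed = [json.loads(line) for line in lines]
    columns = []
    for record in parsed:
        for key in record:
            if key not in columns:
                columns.append(key)
    # Every key becomes a column; JSON null is kept as a null (None) value.
    rows = [tuple(record.get(column) for column in columns) for record in parsed]
    return columns, rows


# Hypothetical sample data where every key has null as its value.
sample = ['{"c1": null, "c2": null}', '{"c1": null, "c2": null}']
columns, rows = read_rows(sample)
print(columns)
print(rows)
```

The point of the sketch is that the keys alone are enough to produce the column list, even when no record supplies a non-null value.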
Documentation
(Please describe user-visible changes similar to what should appear in the Drill documentation.)
Testing
(Please describe how this PR has been tested.)