Wrongly split results when mapping "" in query #4301
Comments
Related discussion thread: https://groups.google.com/forum/#!msg/druid-user/QqYmUYPduH4/x8ji3RRjBQAJ
I bet it's related to different behavior for actual nulls in actual columns, vs. columns that don't exist.
Yes, that's the reason.
If anyone is interested in fixing this, try starting by searching for calls to There should be some tests too, probably in GroupByQueryRunnerTest and TopNQueryRunnerTest. That just checks reading values though. To check that filtering works too, try adding some tests to the filter test files like SelectorFilterTest and BoundFilterTest.
Mind if I pick this up? I want to contribute. |
@gianm I am struggling to build a reproduction scenario for TDD. You mentioned the root cause, but I don't understand what the "vs." is contrasting. My guess is that "column" refers to the dimension being extracted, so the column can either exist or not exist in a given segment. Could you please explain this in more detail?
I am trying to make a test case which is based on
Loading index from
This is a loaded data from the index file.
This issue has been marked as stale due to 280 days of inactivity.
This issue has been closed due to lack of activity. If you think that
I want to map empty strings to "N/A" in query results. However, the results are strangely split between "null" and "N/A" (query examples below), as if the mapping were not being applied in all cases.
When we don't use the mapping, there is only one category: "null". The raw data, by the way, has only empty strings for nulls.
Interestingly, the rows whose "" values are not mapped to "N/A" seem to be those with timestamps earlier than the first row that has a non-null entry; the corresponding segments contain only "" for that dimension.
This seems like a bug in Druid (tried both 0.8.2 and 0.9.2).
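To make the suspected mechanism concrete, here is a toy model in Python. It is not Druid code and it is only a sketch under one assumption: a segment whose dimension contains nothing but "" behaves like a segment where the column is missing and yields null, so a map keyed on "" never matches there. The function names, the `dim` field, and the row counts are all made up for illustration.

```python
from collections import Counter

# Toy model only: this is NOT Druid code. Assumption (unverified): a segment
# whose dimension holds only "" behaves like a missing column and yields None,
# while segments with at least one real value yield "" for empty entries.
def read_dimension(segment_rows):
    values = [row["dim"] for row in segment_rows]
    if all(v == "" for v in values):
        return [None] * len(values)  # column treated as absent
    return values

def apply_map(value, mapping):
    # A map keyed on "" matches "" but not None; unmatched values pass through.
    if value is None:
        return None
    return mapping.get(value, value)

def group_counts(segments, mapping):
    counts = Counter()
    for rows in segments:
        for value in read_dimension(rows):
            mapped = apply_map(value, mapping)
            # In this model, None and "" are both reported as "null".
            key = "null" if mapped in (None, "") else mapped
            counts[key] += 1
    return dict(counts)

# Illustrative rows: the early segment has only empty values.
early_segment = [{"dim": ""}, {"dim": ""}, {"dim": ""}]
later_segment = [{"dim": ""}, {"dim": "x"}, {"dim": ""}]
segments = [early_segment, later_segment]

split = group_counts(segments, {"": "N/A"})         # mapping on ""
merged = group_counts(segments, {"foobar": "N/A"})  # mapping with no effect
```

In this model the "" mapping splits the empty rows into a "null" group (early segment) and an "N/A" group (later segment), while the no-op mapping puts them all under "null", with the two split counts adding up to the merged count.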
The empty values are split in the result:
Remove the "": "N/A" map and change it to one that has no effect ("foobar" : "N/A"),
and the result groups all the null values:
Note that 100130 + 46534 = 146664
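For anyone trying to reproduce this, the two dimension specs being compared might look roughly like the following Python sketch. The original queries are not shown here, so everything in it is an assumption: the dimension name `my_dim` is a placeholder, `map_dimension_spec` is a hypothetical helper, and the JSON field names follow my understanding of Druid's lookup extraction function with a map lookup, which should be checked against the documentation for the Druid version in use.

```python
import json

def map_dimension_spec(dimension, mapping):
    # Hypothetical helper: builds an extraction dimension spec that applies a
    # map lookup. Field names are assumed; verify them for your Druid version.
    return {
        "type": "extraction",
        "dimension": dimension,
        "outputName": dimension,
        "extractionFn": {
            "type": "lookup",
            "lookup": {"type": "map", "map": mapping},
            "retainMissingValue": True,  # keep values that are not in the map
        },
    }

# "my_dim" is a placeholder; the real dimension name is not given in the issue.
split_case = map_dimension_spec("my_dim", {"": "N/A"})
control_case = map_dimension_spec("my_dim", {"foobar": "N/A"})

print(json.dumps(split_case, indent=2))
print(json.dumps(control_case, indent=2))
```

The only difference between the two specs is the map key, which is what makes the comparison useful: if the results change between them, the "" key is being matched in some segments and not in others.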