Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix issue with auto column grouping #16489

Merged
merged 2 commits into from
May 27, 2024

Conversation

clintropolis
Copy link
Member

Description

changes:

  • fixes bug where AutoTypeColumnIndexer reports incorrect cardinality, allowing it to incorrectly use array grouper algorithm for realtime queries producing incorrect results for strings
  • fixes bug where auto LONG and DOUBLE type columns incorrectly report not having null values, resulting in incorrect null handling when grouping

This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

changes:
* fixes bug where AutoTypeColumnIndexer reports incorrect cardinality, allowing it to incorrectly use array grouper algorithm for realtime queries producing incorrect results for strings
* fixes bug where auto LONG and DOUBLE type columns incorrectly report not having null values, resulting in incorrect null handling when grouping
@pranavbhole
Copy link
Contributor

Thank you very much for PR. Do we still have underline issue in array groupers to treat single value field ?

@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 22, 2024
@clintropolis
Copy link
Member Author

clintropolis commented May 22, 2024

Thank you very much for PR. Do we still have underline issue in array groupers to treat single value field ?

No i think its fine, the issue here was that the indexer was reporting a cardinality through the storage adapter, which made the array grouper think that it was dictionary encoded in a way where it could use the dictionary ids, but the indexer was not providing a dictionary encoded selector that fit that contract, so correcting the cardinality reporting solves that problem

@cryptoe cryptoe merged commit 4e1de50 into apache:master May 27, 2024
86 of 87 checks passed
@clintropolis clintropolis deleted the fix-auto-grouping branch May 27, 2024 05:50
adarshsanjeev pushed a commit to adarshsanjeev/druid that referenced this pull request May 27, 2024
* fix issue with auto column grouping
changes:
* fixes bug where AutoTypeColumnIndexer reports incorrect cardinality, allowing it to incorrectly use array grouper algorithm for realtime queries producing incorrect results for strings
* fixes bug where auto LONG and DOUBLE type columns incorrectly report not having null values, resulting in incorrect null handling when grouping

* fix test
adarshsanjeev added a commit that referenced this pull request May 27, 2024
* fix issue with auto column grouping (#16489)

* fix issue with auto column grouping
changes:
* fixes bug where AutoTypeColumnIndexer reports incorrect cardinality, allowing it to incorrectly use array grouper algorithm for realtime queries producing incorrect results for strings
* fixes bug where auto LONG and DOUBLE type columns incorrectly report not having null values, resulting in incorrect null handling when grouping

* fix test

* Fix checkstyle

---------

Co-authored-by: Clint Wylie <cwylie@apache.org>
clintropolis added a commit to clintropolis/druid that referenced this pull request May 28, 2024
changes:
* fix issue similar to apache#16489 but for NestedDataColumnIndexerV4, which can report STRING type if it only processes a single type of values. this should be less common than the auto indexer problem
* fix some issues with sql benchmarks
ektravel pushed a commit to ektravel/druid that referenced this pull request May 29, 2024
* fix issue with auto column grouping
changes:
* fixes bug where AutoTypeColumnIndexer reports incorrect cardinality, allowing it to incorrectly use array grouper algorithm for realtime queries producing incorrect results for strings
* fixes bug where auto LONG and DOUBLE type columns incorrectly report not having null values, resulting in incorrect null handling when grouping

* fix test
clintropolis added a commit that referenced this pull request Jun 12, 2024
* fix NestedDataColumnIndexerV4 to not report cardinality
changes:
* fix issue similar to #16489 but for NestedDataColumnIndexerV4, which can report STRING type if it only processes a single type of values. this should be less common than the auto indexer problem
* fix some issues with sql benchmarks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants