adjust topn heap algorithm to only use known cardinality path when dictionary is unique by clintropolis · Pull Request #11186 · apache/druid

clintropolis · 2021-04-30T12:55:13Z

Description

This PR adjusts the heap-based topN algorithm to use the "known" cardinality path only when the dictionary contains unique values. The known cardinality path aggregates values with an array-based approach: an array of aggregator arrays, sized to the value cardinality, is created, and the dictionaryId is expected to index the array position that holds the aggregators for that value, as an optimization to avoid a map lookup.

However, some selectors know their cardinality but do not have unique dictionaryIds, such as a selector from an IndexedTable in a join result, which uses the row number as the dictionaryId. When such a selector is aggregated, each dictionaryId is 'new', so it has a null array entry and still incurs the map interaction this path is trying to avoid.

Instead, these selectors will now use the map directly by taking the "unknown" cardinality path.
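The difference between the two paths can be illustrated with a small standalone sketch. This is not the Druid implementation: `TopNPathSketch`, `Result`, `aggregateWithArray`, and `aggregateWithMap` are hypothetical names, a simple row count stands in for the real aggregators, and `mapLookups` exists only to make the cost visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class TopNPathSketch {
    /** Per-value counts plus the number of times the value-keyed map was consulted. */
    static final class Result {
        final Map<String, long[]> counts = new HashMap<>();
        int mapLookups = 0;
    }

    /** "Known" cardinality path: the dictionaryId indexes an array that caches the aggregator. */
    static Result aggregateWithArray(int[] dictionaryIds, int cardinality, IntFunction<String> lookupName) {
        Result result = new Result();
        long[][] aggregatorsById = new long[cardinality][];
        for (int id : dictionaryIds) {
            long[] aggregator = aggregatorsById[id];
            if (aggregator == null) {
                // First time this id is seen: fall back to the map keyed by value.
                result.mapLookups++;
                aggregator = result.counts.computeIfAbsent(lookupName.apply(id), k -> new long[1]);
                aggregatorsById[id] = aggregator;
            }
            aggregator[0]++;
        }
        return result;
    }

    /** "Unknown" cardinality path: every row goes through the map keyed by value. */
    static Result aggregateWithMap(int[] dictionaryIds, IntFunction<String> lookupName) {
        Result result = new Result();
        for (int id : dictionaryIds) {
            result.mapLookups++;
            result.counts.computeIfAbsent(lookupName.apply(id), k -> new long[1])[0]++;
        }
        return result;
    }

    public static void main(String[] args) {
        // Unique dictionary: two ids shared by six rows.
        String[] dictionary = {"a", "b"};
        int[] uniqueIds = {0, 1, 0, 0, 1, 0};

        // Row number used as the id: six ids for six rows, values repeat.
        String[] rowValues = {"a", "b", "a", "a", "b", "a"};
        int[] rowNumberIds = {0, 1, 2, 3, 4, 5};

        System.out.println("array path, unique ids:     "
            + aggregateWithArray(uniqueIds, dictionary.length, i -> dictionary[i]).mapLookups);
        System.out.println("array path, row-number ids: "
            + aggregateWithArray(rowNumberIds, rowValues.length, i -> rowValues[i]).mapLookups);
        System.out.println("map path, row-number ids:   "
            + aggregateWithMap(rowNumberIds, i -> rowValues[i]).mapLookups);
    }
}
```

In this toy input the array path consults the map twice when ids are unique, but six times when the row number is the id. That matches the map path, with the extra cost of allocating an array with one slot per row.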

This PR has:

been self-reviewed.
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
added integration tests.
been tested in a test Druid cluster.


abhishekagarwal87 · 2021-04-30T13:58:22Z

...ing/src/main/java/org/apache/druid/query/topn/types/StringTopNColumnAggregatesProcessor.java

  )
  {
-    if (selector.getValueCardinality() != DimensionDictionarySelector.CARDINALITY_UNKNOWN) {
+    if (capabilities.isDictionaryEncoded().and(capabilities.areDictionaryValuesUnique()).isTrue()) {


so it is not possible to have unique dictionary ids but unknown cardinality?

Hmm, that is a good point. I guess it is implicit in the current state of things that the only things that report having unique dictionary ids right now are things with known cardinality, but I suppose this check needs both to be true. I'll modify this check to test both conditions.

I've updated the check, and added a comment to try to explain what is going on with the aggregation algorithm selection and why
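A check that requires both conditions might look roughly like the sketch below. The updated code itself is not shown in this thread, so this is only an assumption about its shape: `PathSelectionSketch`, `useArrayPath`, and the three-valued `Capable` enum are hypothetical stand-ins written for this example, and `CARDINALITY_UNKNOWN = -1` is an assumed sentinel value.

```java
final class PathSelectionSketch {
    /** Hypothetical stand-in for a three-valued capability flag. */
    enum Capable {
        TRUE, FALSE, UNKNOWN;

        /** TRUE only if both are TRUE; FALSE if either is FALSE; otherwise UNKNOWN. */
        Capable and(Capable other) {
            if (this == FALSE || other == FALSE) {
                return FALSE;
            }
            return (this == TRUE && other == TRUE) ? TRUE : UNKNOWN;
        }

        boolean isTrue() {
            return this == TRUE;
        }
    }

    /** Assumed sentinel for "cardinality not known". */
    static final int CARDINALITY_UNKNOWN = -1;

    /**
     * Use the array-based path only when the cardinality is known AND the column
     * is dictionary encoded with unique dictionary values.
     */
    static boolean useArrayPath(Capable dictionaryEncoded, Capable dictionaryValuesUnique, int valueCardinality) {
        return valueCardinality != CARDINALITY_UNKNOWN
            && dictionaryEncoded.and(dictionaryValuesUnique).isTrue();
    }

    public static void main(String[] args) {
        System.out.println(useArrayPath(Capable.TRUE, Capable.TRUE, 10));
        System.out.println(useArrayPath(Capable.TRUE, Capable.FALSE, 10));
        System.out.println(useArrayPath(Capable.TRUE, Capable.TRUE, CARDINALITY_UNKNOWN));
    }
}
```

Treating an UNKNOWN capability as "not unique" keeps the sketch conservative: anything that cannot prove uniqueness falls back to the map path.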

adjust topn heap algorithm to only use known cardinality path when di…

8cd54a9

…ctionary is unique

clintropolis added the Area - Querying label Apr 30, 2021

abhishekagarwal87 reviewed Apr 30, 2021


clintropolis added 2 commits May 2, 2021 21:05

better check and add comment

e397800

adjust comment more

904816f

jon-wei approved these changes Jun 10, 2021


jon-wei merged commit 6b272c8 into apache:master Jun 10, 2021

clintropolis deleted the topn-heap-cardinality-known-when-unique branch June 10, 2021 23:33

clintropolis added this to the 0.22.0 milestone Aug 12, 2021

clintropolis mentioned this pull request Mar 2, 2022

adjust topn heap operation when string is dictionary encoded, but not uniquely #12291

Merged

2 tasks

vsharathchandra mentioned this pull request Oct 5, 2022

High querytime for topN queries with injective=false namespace after upgrade to druid 0.22 #12135

Closed

jon-wei merged 3 commits into apache:master from clintropolis:topn-heap-cardinality-known-when-unique
