Query perf tuning: improve GroupBy query perf by setting initial size for HashMap #5291

Merged
merged 1 commit into master from set_initial_size_for_group_key_hash_map on Apr 23, 2020

xiangfu0
Contributor

The hash map that DictionaryBasedGroupKeyGenerator uses to hold group keys is created with the default initial size (16), which is far too small for queries that produce many group keys. As the map fills, it has to resize and rehash repeatedly, which degrades performance for group-by queries with many group keys.

Time taken by a select group-by query that scans 36MM entries:

  • Not setting initial size: 1400ms
  • Setting initial size: 900ms
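
The idea can be sketched with a plain `java.util.HashMap`. This is only an illustration, not the code from this PR: the class name, the `capacityFor` helper, and the use of `java.util.HashMap` are assumptions made for the example. It sizes the map up front so that the expected number of group keys fits without any resize at the default load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: pre-sizing a group-key map to avoid repeated resizing.
public class PresizedGroupKeyMap {

  // Smallest requested capacity at which `expectedKeys` entries stay within
  // the default 0.75 load factor, so no resize is triggered while filling.
  static int capacityFor(int expectedKeys) {
    long needed = (long) Math.ceil(expectedKeys / 0.75);
    return (int) Math.min(Integer.MAX_VALUE, needed);
  }

  // Assigns a dense group id (0, 1, 2, ...) to each distinct raw key,
  // in order of first appearance.
  static Map<Long, Integer> buildGroupKeyMap(long[] rawKeys, int expectedKeys) {
    // With `new HashMap<>()` the table would start at 16 buckets and be
    // rebuilt several times as distinct keys are added.
    Map<Long, Integer> groupKeyToId = new HashMap<>(capacityFor(expectedKeys));
    for (long rawKey : rawKeys) {
      groupKeyToId.putIfAbsent(rawKey, groupKeyToId.size());
    }
    return groupKeyToId;
  }

  public static void main(String[] args) {
    long[] rawKeys = {7L, 7L, 3L, 7L, 9L, 3L};
    Map<Long, Integer> ids = buildGroupKeyMap(rawKeys, 3);
    System.out.println(ids.size() + " distinct group keys");
  }
}
```

The cost of pre-sizing is that the full table is allocated even when a query ends up producing only a handful of groups.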

Credit to @npawar , @kishoreg for benchmark

…hMap used in DictionaryBasedGroupKeyGenerator.
@xiangfu0 xiangfu0 force-pushed the set_initial_size_for_group_key_hash_map branch from 9f99a31 to 0bb8d7b Compare April 23, 2020 08:23
Contributor

@jackjlli jackjlli left a comment

LGTM. Nicely done!

@xiangfu0 xiangfu0 merged commit 62799a3 into master Apr 23, 2020
@xiangfu0 xiangfu0 deleted the set_initial_size_for_group_key_hash_map branch April 23, 2020 18:36
snleee pushed a commit to snleee/pinot that referenced this pull request May 20, 2020
For use cases with high QPS and low-selectivity queries,
initializing the hashmap with the upper-bound size incurred
a penalty that outweighed the gain from reducing
the number of hashmap resizes.
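
The commit message above does not say how the trade-off was resolved. One common way to balance the two costs, shown here purely as an assumption, is to cap the initial capacity: small upper bounds are honored exactly, while large ones fall back to a fixed ceiling and let the map grow on demand. The class name and the `MAX_INITIAL_CAPACITY` value are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: bound the up-front allocation of a group-key map.
public class CappedGroupKeyMap {

  // Assumed ceiling for illustration only; a real value would need benchmarking.
  static final int MAX_INITIAL_CAPACITY = 10_000;

  // Uses the group-key upper bound when it is small, otherwise the cap.
  static int initialCapacity(int groupKeyUpperBound) {
    return Math.min(Math.max(groupKeyUpperBound, 0), MAX_INITIAL_CAPACITY);
  }

  static Map<Long, Integer> newGroupKeyMap(int groupKeyUpperBound) {
    // Queries that match few rows no longer pay for a table sized to the
    // worst case; queries with many groups still skip the early resizes.
    return new HashMap<>(initialCapacity(groupKeyUpperBound));
  }

  public static void main(String[] args) {
    System.out.println(initialCapacity(500));       // small bound is kept
    System.out.println(initialCapacity(5_000_000)); // large bound is capped
  }
}
```
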