
Use thread local for groupby raw key holders #5419

Merged
2 commits merged on May 21, 2020


xiangfu0
Contributor

No description provided.

xiangfu0 changed the title from "Use thread local for groupby raw key holders" to "[WIP]Use thread local for groupby raw key holders" on May 20, 2020
xiangfu0 force-pushed the thread_local_hashmap branch 3 times, most recently from 646b149 to 1655c27 on May 20, 2020 10:32
Contributor

Jackie-Jiang left a comment


Let's figure out a way to avoid caching the huge maps created by expensive queries.
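One way to act on this, sketched under assumptions: keep a reusable map per thread, but drop it from the `ThreadLocal` once a query has grown it past a size cap, because `HashMap.clear()` empties a map without shrinking its backing table. `ThreadLocalMapCache` and `MAX_CACHED_SIZE` are hypothetical names, not Pinot code, and the key/value types are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Pinot code): a per-thread reusable map that is
// released instead of cached once an expensive query has grown it too large.
class ThreadLocalMapCache {
  // Assumed threshold for illustration only; not a Pinot setting.
  static final int MAX_CACHED_SIZE = 10_000;

  private static final ThreadLocal<Map<Long, Integer>> CACHE =
      ThreadLocal.withInitial(HashMap::new);

  // Returns this thread's map, emptied and ready for a new query.
  static Map<Long, Integer> acquire() {
    Map<Long, Integer> map = CACHE.get();
    map.clear(); // clear() keeps the existing backing table allocated
    return map;
  }

  // Called when the query finishes: small maps stay cached for reuse, while
  // large ones are removed so their memory is not pinned to the thread.
  static void release(Map<Long, Integer> map) {
    if (map.size() > MAX_CACHED_SIZE) {
      CACHE.remove(); // the next acquire() creates a fresh map
    }
  }
}
```

The size check runs after the query, so it reacts to how large the map actually became rather than to an estimated bound.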

if (longOverflow) {
  _globalGroupIdUpperBound = numGroupsLimit;
  _rawKeyHolder = new ArrayMapBasedHolder(_globalGroupIdUpperBound);
  if (!mapBasedRawKeyHolders.containsKey(ArrayMapBasedHolder.class.getName())) {
    mapBasedRawKeyHolders.put(ArrayMapBasedHolder.class.getName(),
        new ArrayMapBasedHolder(_globalGroupIdUpperBound).getInternal());
Contributor


I think initializing to _globalGroupIdUpperBound was introduced in #5291. For many queries with multiple group-by columns (high-cardinality and/or MV columns), this number can be huge. It's unclear to me whether making this thread-local will protect against cases that may require allocating a huge chunk of memory upfront.
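To illustrate how quickly the bound can grow, here is a hypothetical sketch that assumes the upper bound is the product of the group-by columns' cardinalities and falls back to `numGroupsLimit` on `long` overflow, mirroring the `longOverflow` branch quoted above. `GroupIdBound.upperBound` is an illustrative helper, not the actual Pinot method, and it ignores MV columns.

```java
// Hypothetical sketch (not Pinot code): estimate the group-id upper bound as
// the product of per-column cardinalities, returning numGroupsLimit if the
// product overflows a long.
class GroupIdBound {
  static long upperBound(int[] cardinalities, int numGroupsLimit) {
    long product = 1L;
    for (int cardinality : cardinalities) {
      try {
        // multiplyExact throws ArithmeticException instead of wrapping around
        product = Math.multiplyExact(product, (long) cardinality);
      } catch (ArithmeticException overflow) {
        return numGroupsLimit;
      }
    }
    return product;
  }
}
```

For example, three columns with 100,000 distinct values each give a bound of 10^15 without overflowing, far more than could reasonably be preallocated.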

Contributor Author


True. I think we may need a range for groupIdBound and only use thread-local holders within that range. If the bound is too small or too large, we could just create new objects, without or with an initial size.
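A minimal sketch of the range idea, assuming a map-based holder: reuse the thread-local map only when `groupIdBound` falls within a window, and allocate a fresh map otherwise. The class name, the thresholds, and which side of the window gets a presized map are all assumptions for illustration, since the comment leaves those choices open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Pinot code): reuse a thread-local map only when
// groupIdBound is inside [MIN_REUSE_BOUND, MAX_REUSE_BOUND].
class RangeBasedHolderPolicy {
  // Assumed thresholds for illustration only.
  static final int MIN_REUSE_BOUND = 1_024;
  static final int MAX_REUSE_BOUND = 1_000_000;

  private static final ThreadLocal<Map<Integer, Integer>> REUSABLE =
      ThreadLocal.withInitial(HashMap::new);

  static Map<Integer, Integer> holderFor(int groupIdBound) {
    if (groupIdBound < MIN_REUSE_BOUND) {
      // Small bound (assumed choice): presizing is cheap, so allocate a new map.
      return new HashMap<>(groupIdBound);
    }
    if (groupIdBound > MAX_REUSE_BOUND) {
      // Huge bound (assumed choice): no caching and no upfront sizing.
      return new HashMap<>();
    }
    // In range: reuse this thread's map after emptying it.
    Map<Integer, Integer> map = REUSABLE.get();
    map.clear();
    return map;
  }
}
```

Leaving the oversized case unsized means memory is allocated only as groups actually appear, at the cost of rehashing as the map grows.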

Contributor

snleee left a comment


Please resolve the conflict before checking this in. On my side, I profiled two different use cases locally and verified that this fixes the hotspot in hash map initialization.

@fx19880617 Thank you for working on this so quickly!

xiangfu0 changed the title from "[WIP]Use thread local for groupby raw key holders" to "Use thread local for groupby raw key holders" on May 21, 2020
snleee merged commit a3ebd0c into master on May 21, 2020
snleee deleted the thread_local_hashmap branch on May 21, 2020 03:51
4 participants