More accurate dictionary size estimation in RowBasedKeySerde #4768

Closed
jihoonson opened this issue Sep 9, 2017 · 11 comments

@jihoonson
Contributor

The original discussion is #4704 (comment).

RowBasedKeySerde currently builds a dictionary at query time. To avoid using too much memory, the dictionary size is limited by user configuration. However, the current dictionary size estimation is based on a rough calculation.
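
As a point of reference for what a rough calculation can look like, here is a minimal sketch, not Druid's actual code. The class `BoundedStringDictionary`, its method names, and the 64-byte per-entry overhead are all assumptions made up for illustration. The idea is only that each string is charged its UTF-16 character bytes plus a fixed constant, and the dictionary refuses new entries once the configured limit would be exceeded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a size-limited string dictionary; not Druid's RowBasedKeySerde. */
class BoundedStringDictionary {
    // Assumed flat overhead per entry (object headers, references, map entry).
    // This is a guess for illustration, not a measured value.
    static final long ASSUMED_ENTRY_OVERHEAD_BYTES = 64;

    private final long maxSizeBytes;
    private long estimatedSizeBytes = 0;
    private final Map<String, Integer> idsByValue = new HashMap<>();
    private final List<String> valuesById = new ArrayList<>();

    BoundedStringDictionary(long maxSizeBytes) {
        this.maxSizeBytes = maxSizeBytes;
    }

    /** Rough estimate: two bytes per char plus the assumed flat overhead. */
    static long estimateEntryBytes(String s) {
        return (long) s.length() * Character.BYTES + ASSUMED_ENTRY_OVERHEAD_BYTES;
    }

    /** Returns the id of the value, or -1 if adding it would exceed the limit. */
    int add(String value) {
        Integer existing = idsByValue.get(value);
        if (existing != null) {
            return existing;
        }
        long cost = estimateEntryBytes(value);
        if (estimatedSizeBytes + cost > maxSizeBytes) {
            return -1;
        }
        int id = valuesById.size();
        valuesById.add(value);
        idsByValue.put(value, id);
        estimatedSizeBytes += cost;
        return id;
    }

    long estimatedSizeBytes() {
        return estimatedSizeBytes;
    }

    public static void main(String[] args) {
        BoundedStringDictionary dict = new BoundedStringDictionary(200);
        System.out.println(dict.add("druid"));    // 5 * 2 + 64 = 74 bytes, id 0
        System.out.println(dict.add("presto"));   // 6 * 2 + 64 = 76 bytes, id 1
        System.out.println(dict.add("hotspot"));  // 150 + 78 > 200, rejected: -1
        System.out.println(dict.estimatedSizeBytes()); // 150
    }
}
```

Because a flat constant ignores JVM specifics such as compressed references or compact strings, an estimate like this may drift from the real footprint in either direction.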

JOL might be an alternative for more accurate dictionary size estimation if its overhead is not so large.

@jihoonson jihoonson changed the title from "JOL library for more accurate dictionary size estimation" to "More accurate dictionary size estimation in RowBasedKeySerde" Sep 9, 2017
@leventov
Member

leventov commented Sep 9, 2017

What do you mean by "overhead" here? I thought the only concern with it might be that it only supports HotSpot JVM (though I'm not sure). But it could be fine, if Druid doesn't really aim to support other JVMs. Apart from that, it should be perfectly fine to use JOL.

Note, e.g., that Presto, a high-profile project similar to Druid, uses it.

@leventov
Member

leventov commented Sep 9, 2017

Ok, and another issue could be the license, because JOL is GPL.

@jihoonson
Contributor Author

I meant the overhead of getting the size of each string in the dictionary, because it will be a very frequent operation.

Ok, and another issue could be the license, because JOL is GPL.

This is really sad news.

@leventov
Member

leventov commented Sep 9, 2017

@jihoonson see #4771

@leventov
Member

leventov commented Sep 9, 2017

Or, if we still use this approach, the byte-buddy-agent library could be used to obtain an Instrumentation instance and then call Instrumentation.getObjectSize().
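
A minimal sketch of that idea, under some assumptions: `ByteBuddyAgent.install()` from byte-buddy-agent returns a `java.lang.instrument.Instrumentation`, and `getObjectSize()` then reports an object's size. The class `InstrumentationSizer` and its method names are hypothetical. The agent is looked up reflectively here only so the snippet compiles without the library on the classpath; with a direct dependency one would simply call `ByteBuddyAgent.install()`.

```java
import java.lang.instrument.Instrumentation;
import java.lang.reflect.Method;
import java.util.OptionalLong;

/** Hypothetical helper; names are illustrative and not part of Druid or Byte Buddy. */
class InstrumentationSizer {
    private static Instrumentation cached;
    private static boolean attempted;

    /** Tries ByteBuddyAgent.install() once; returns null if it is unavailable or fails. */
    static synchronized Instrumentation tryInstall() {
        if (!attempted) {
            attempted = true;
            try {
                Class<?> agent = Class.forName("net.bytebuddy.agent.ByteBuddyAgent");
                Method install = agent.getMethod("install");
                cached = (Instrumentation) install.invoke(null);
            } catch (ReflectiveOperationException | RuntimeException | LinkageError e) {
                // Library missing or self-attach not permitted on this JVM.
                cached = null;
            }
        }
        return cached;
    }

    /** Shallow size of the object as reported by the JVM, if instrumentation is available. */
    static OptionalLong shallowSizeOf(Object o) {
        Instrumentation inst = tryInstall();
        return inst == null ? OptionalLong.empty() : OptionalLong.of(inst.getObjectSize(o));
    }

    public static void main(String[] args) {
        OptionalLong size = shallowSizeOf("some dictionary entry");
        System.out.println(size.isPresent()
            ? "shallow size: " + size.getAsLong() + " bytes"
            : "instrumentation not available");
    }
}
```

Note that `getObjectSize()` is shallow: for a `String` it does not include the backing array, so that array would have to be measured separately. Self-attach may also fail depending on JVM settings, in which case this sketch returns an empty result.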

@jihoonson
Contributor Author

#5583 might be related as well.

@stale

stale bot commented Jun 21, 2019

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jun 21, 2019
@jihoonson
Contributor Author

Still relevant.

@stale

stale bot commented Jun 26, 2019

This issue is no longer marked as stale.

@stale stale bot removed the stale label Jun 26, 2019
@stale

stale bot commented Apr 1, 2020

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Apr 1, 2020
@stale

stale bot commented Apr 29, 2020

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@stale stale bot closed this as completed Apr 29, 2020