Cache: Add maxEntrySize config, make groupBy cacheable by default. #5108
Conversation
The idea is that this makes it more feasible to cache query types that can potentially generate large result sets, like groupBy and select, without fear of writing too much to the cache per query. Includes a refactor of the cache population code in CachingQueryRunner and CachingClusteredClient, such that they now use the same CachePopulator interface with two implementations: one for foreground and one for background. The main reason for splitting the foreground / background impls is that the foreground impl can have a more efficient implementation of maxEntrySize: it can stop retaining subvalues for the cache early.
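To make the early-exit idea concrete, here is a hypothetical sketch of a foreground populator. The class and method names echo the PR, but the signatures are simplified for illustration and do not match Druid's actual CachePopulator API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch only: a foreground cache populator that serializes
// results on the query thread and stops retaining subvalues as soon as the
// running total exceeds maxEntrySize.
class ForegroundCachePopulator {
  private final long maxEntrySize;
  private boolean oversized = false;

  ForegroundCachePopulator(long maxEntrySize) {
    this.maxEntrySize = maxEntrySize;
  }

  // Pass results through to the caller while accumulating serialized
  // subvalues for the cache. Once the total exceeds maxEntrySize, drop the
  // buffer and skip the cache write entirely -- the early exit that a
  // background (deferred) impl cannot do as cheaply.
  <T> List<T> wrap(List<T> results, Function<T, byte[]> serialize) {
    List<byte[]> retained = new ArrayList<>();
    long totalSize = 0;
    for (T value : results) {
      if (!oversized) {
        byte[] bytes = serialize.apply(value);
        totalSize += bytes.length;
        if (totalSize > maxEntrySize) {
          retained.clear();  // entry is too large; stop buffering subvalues
          oversized = true;
        } else {
          retained.add(bytes);
        }
      }
    }
    if (!oversized) {
      // a real impl would concatenate `retained` and write it to the cache
    }
    return results;
  }

  boolean wasOversized() {
    return oversized;
  }
}
```

The key property is that query results still flow to the caller unchanged; only the cache write is abandoned when the entry grows past the limit.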
Removed WIP tag -- I have added tests and resolved conflicts.
overall lgtm 👍
@@ -52,10 +52,13 @@
private int cacheBulkMergeLimit = Integer.MAX_VALUE;

@JsonProperty
private int resultLevelCacheLimit = Integer.MAX_VALUE;

private int maxEntrySize = 1_000_000;
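Since the field sits on CacheConfig alongside the other @JsonProperty-mapped cache settings, the limit should be settable through the usual cache runtime properties. A hedged example (the exact property prefix is an assumption here and depends on node type, e.g. broker vs. historical):

```properties
# Assumed runtime.properties fragment: cap cache entries at 1 MB (the default
# shown in the diff above), following the same druid.broker.cache.* naming as
# the existing useCache / populateCache settings.
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.broker.cache.maxEntrySize=1000000
</imports>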
What is a typical cache entry size? (how did you pick this number)
In our cluster the average entry size is about 1KB, but I am not sure how representative that is. I chose 1MB because it is a large number that should still block egregiously large cache entries.
@gianm please fix the licenses.
Oops, missed those. I pushed with new license headers.
LGTM after Travis.
Also includes: put/ok, put/errors, put/oversized.