Background Refresh / Refresh Ahead Policy #34
In my use case, I need the background refresh to happen repeatedly. Stopping after a single refresh really isn't useful.
The background refresh, as it's implemented now, only stops after a single refresh if the data is not requested within the expiry period after the refresh. Can you explain your use case in more detail? How big is the active data set that needs to be refreshed constantly? How does it change over time? What is the expiry time?
My use case is an authentication token cache, with a business requirement that the tokens expire after a few minutes (so that an employee whose system access is deactivated will no longer be able to authenticate with our system). We'll probably only cache a few hundred tokens at most, and it's very rare that the data changes. Loading our authentication tokens takes a second, so we'd like to proactively refresh the tokens upon cache expiry. With this refresh in place, when a user accesses our system (perhaps once a day), they will not face a one-second delay, as their token will already be present in the cache.
Hey, I need the background refresh to happen repeatedly, just as in Chris's case. In my case I cache offers for clients, and for me it's a big performance gain to have more fresh offers in the cache rather than fetching offers from an external source on a cache miss. Each offer is cached for about 5 minutes and we keep about 30k of them. In combination with bulk loaders there's less stress on the external resource than without a cache, or with a cache that expires unused entries. Sometimes offers have to be removed from the cache, because we have fixed-size caches, but LRU is good enough. Even removing the oldest entry would make sense here and would be easy to implement.
Here is an idea about a refresh policy:
Should be merged with #172. If we change to another mechanism to track accesses, we could also recognize a second access after the initial load. This makes it possible to skip the refresh if the value was never accessed a second time.
Another idea:
The method is executed when a value is first loaded or refreshed, after the loader and expiry policy have run. It is counterintuitive but intentional that the policy does not evaluate an access count but returns the required number of hits. This way we only need to keep count until the requirement is met; thus, items that are requested very often have no additional overhead in the critical cache hit path.

Examples

Current behavior:
Require at least one access until next expiry / refresh.
Keep refreshing until not accessed for 12 hours:
Require that the value is accessed at least 5 times per minute:
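The three examples above could be expressed against a policy interface roughly like the following sketch. Note that RefreshAheadPolicy, requiredHits, and everything else here are names invented for illustration; this is not the actual cache2k API.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of the proposed policy: called after a load or refresh,
 * it returns the number of hits that must be observed before the next expiry
 * for refreshing to continue. (All names invented for illustration.)
 */
@FunctionalInterface
interface RefreshAheadPolicy<K, V> {
  /** @return required hit count until the next refresh; 0 keeps refreshing unconditionally */
  long requiredHits(K key, V value, long loadTimeMillis, long expiryTimeMillis);
}

public class RefreshPolicyExamples {

  /** Current behavior: require at least one access until the next expiry / refresh. */
  static final RefreshAheadPolicy<Object, Object> CURRENT =
      (key, value, loadTime, expiryTime) -> 1;

  /** Refresh unconditionally; combined with a separate "not accessed for 12 hours"
      cutoff this would express "keep refreshing until not accessed for 12 hours". */
  static final RefreshAheadPolicy<Object, Object> ALWAYS =
      (key, value, loadTime, expiryTime) -> 0;

  /** Require that the value is accessed at least 5 times per minute of lifetime. */
  static final RefreshAheadPolicy<Object, Object> FIVE_PER_MINUTE =
      (key, value, loadTime, expiryTime) ->
          5 * Math.max(1, TimeUnit.MILLISECONDS.toMinutes(expiryTime - loadTime));

  public static void main(String[] args) {
    long loadTime = 0;
    long expiryTime = TimeUnit.MINUTES.toMillis(4);
    System.out.println(CURRENT.requiredHits("k", "v", loadTime, expiryTime));         // 1
    System.out.println(ALWAYS.requiredHits("k", "v", loadTime, expiryTime));          // 0
    System.out.println(FIVE_PER_MINUTE.requiredHits("k", "v", loadTime, expiryTime)); // 20
  }
}
```

Because the policy hands the cache a target count instead of inspecting counters itself, the cache can stop bookkeeping as soon as the target is reached, which is the point made above about keeping the hot hit path cheap.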
@denghongcai ping. Maybe you'd like to check and comment on the last idea.
Back in 2018, I used an ugly method to deal with my scenario:

```java
private Cache<HelloGetParams, HelloCacheEntry> cache =
    new Cache2kBuilder<HelloGetParams, HelloCacheEntry>() {}
        .name("Hello-cache")
        .entryCapacity(2048)
        .enableJmx(true)
        .expireAfterWrite(4, TimeUnit.MINUTES)
        .keepDataAfterExpired(true)
        .loader(new AdvancedCacheLoader<HelloGetParams, HelloCacheEntry>() {
          @Override
          public HelloCacheEntry load(HelloGetParams helloGetParams, long currentTime,
              CacheEntry<HelloGetParams, HelloCacheEntry> currentEntry) throws Exception {
            if (currentEntry != null && currentEntry.getValue() != null) {
              // return stale data and refresh in the background
              if (currentTime <
                  currentEntry.getValue().getExpireTimeSinceEpoch() + MAX_STALE_TIME) {
                refreshTaskManager.runInBackground(helloGetParams, () -> {
                  try {
                    HelloCacheEntry cacheEntry = buildHelloCacheEntry(helloGetParams);
                    cacheEntry.setExpireTimeSinceEpoch(
                        currentTime + cacheEntry.getCacheDuration().toMillis());
                    cache.put(helloGetParams, cacheEntry);
                  } catch (Exception e) {
                    HelloErrorLogger.error(this.getClass().getSimpleName(), e);
                  }
                });
                return currentEntry.getValue();
              }
            }
            HelloCacheEntry cacheEntry = buildHelloCacheEntry(helloGetParams);
            cacheEntry.setExpireTimeSinceEpoch(
                currentTime + cacheEntry.getCacheDuration().toMillis());
            return cacheEntry;
          }
        })
        .expiryPolicy((helloGetParams, cacheEntry, loadTime, oldEntry) -> {
          if (cacheEntry == null) {
            return ExpiryTimeValues.NO_CACHE;
          }
          return loadTime + cacheEntry.getCacheDuration().toMillis();
        })
        .build();
```

My idea is to have two separate time points: refreshTime < expireTime
@denghongcai thanks for sharing! In cache2k the expiry time is the time when an entry needs to be refreshed; the two are identical. In your logic you would like to refresh the entry before its expiry, or, in other words, before it must be reloaded.
My line of thinking was more along the second idea. Now I realize that there might be different interpretations. The first concept is what Guava and Caffeine implement. That leads to other problems: a policy is missing if you want variable refresh times, see ben-manes/caffeine#504. In cache2k, if refresh ahead is enabled, the cache returns expired entries while the load is ongoing (if sharp expiry is enabled, a parallel request would block instead). An explicit control for how long a stale/expired entry may be served is missing. I will try to incorporate all these thoughts into the further enhancements. @denghongcai: If you have time, can you share a few more details? I'd like to make sure we don't have an XY problem by focusing only on technical properties. What is the allowed stale time, or how far ahead of expiry would you typically do the refresh?
Great job, thanks very much!
@javalover123:
The default policy would require at least one access for a refresh to happen. If you set it to 0, the refresh will always happen, even if nobody needs the value any more. So setting it to 0 is generally not advised and probably not what most people want. However, this is only the simple default policy; you can set your own policy via the builder. It will also be possible to construct quite smart policies, e.g. refresh after 5 minutes and keep refreshing for a maximum of 2 hours if nothing was accessed.
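Such a "smart" policy ("refresh every 5 minutes, but stop once the entry goes unaccessed for 2 hours") can be sketched as a plain decision function. SmartRefreshDecision and all names below are invented for illustration and are not part of the cache2k API:

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not cache2k API): decide whether an entry should be
    refreshed again, given when it was last loaded and last accessed. */
public class SmartRefreshDecision {

  static final long REFRESH_INTERVAL_MILLIS = TimeUnit.MINUTES.toMillis(5);
  static final long MAX_UNACCESSED_MILLIS = TimeUnit.HOURS.toMillis(2);

  /** Refresh every 5 minutes, but stop once nothing accessed the entry for 2 hours. */
  static boolean shouldRefresh(long now, long lastLoadTime, long lastAccessTime) {
    boolean refreshDue = now - lastLoadTime >= REFRESH_INTERVAL_MILLIS;
    boolean stillWanted = now - lastAccessTime <= MAX_UNACCESSED_MILLIS;
    return refreshDue && stillWanted;
  }

  public static void main(String[] args) {
    long now = TimeUnit.HOURS.toMillis(3);
    // accessed one second ago: keep refreshing
    System.out.println(shouldRefresh(now, now - REFRESH_INTERVAL_MILLIS, now - 1000)); // true
    // never accessed since time 0, i.e. more than 2h ago: let it expire
    System.out.println(shouldRefresh(now, now - REFRESH_INTERVAL_MILLIS, 0)); // false
  }
}
```

The point of separating the two constants is exactly the separation discussed in this thread: the refresh cadence and the "how long do we keep an unused entry warm" cutoff are independent knobs.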
Sorry for my delay.
typically, less than a minute
It depends; typically less than ten seconds. I think it's not a major problem here.
Data must always be fresh if it is warm. We had a DataGateway serving HTTP requests; it reads an expensive config from somewhere, then uses it to get/parse/edit data from a backend service. You can imagine it like a GraphQL gateway, except that fetching the GraphQL-ish DSL is a heavy operation.
Current semantics

When enabling background refresh with CacheBuilder.refreshAhead(true):

The old value will be returned by the cache, although it is expired, and will be replaced by the new value once the loader finishes. In case there are not enough threads available to start the loading, the entry expires immediately and the next get() request triggers the load.

Once refreshed, the entry is in a trial period. If it is not accessed before the next expiry, no refresh is done and the entry expires regularly.
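For context, a minimal configuration with this semantic might look as follows, using the Cache2kBuilder style from the comment above. This is a sketch, not a complete program: loadToken is a placeholder for an application loader, and the parameter values are arbitrary.

```java
// Sketch: refresh ahead enabled; the expired value is served while the
// loader reloads it in the background. loadToken() is a placeholder.
Cache<String, String> cache = new Cache2kBuilder<String, String>() {}
    .name("tokens")
    .refreshAhead(true)
    .expireAfterWrite(4, TimeUnit.MINUTES)
    .loader(key -> loadToken(key))
    .build();
```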
Analysis
We have indications from our applications that this is not working well in all scenarios. Example:
This means the second load is mostly useless and the cache always has a miss when the entry is accessed by the application.
How long the refreshing continues should be separated from the normal expiry duration, especially since the access frequency drops at nighttime. Administrators should be able to decide between:
Todo
Any more thoughts?