-
Notifications
You must be signed in to change notification settings - Fork 703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CARBONDATA-3665] Support TimeBased Cache expiration using Guava Cache #3580
Conversation
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1646/ |
f677e3b
to
a3503ea
Compare
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1647/ |
retest this please |
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1663/ |
a3503ea
to
4008b44
Compare
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1672/ |
* object | ||
*/ | ||
private Map<String, Cacheable> lruCacheMap; | ||
private Cache<String, Cacheable> lruCacheMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are scenarios where cache of some table need to be maintained and other tables can be dropped. It will be good if we support table level expiry. and not clear all the cache when expire.
Drop meta cache DML is already supporting table level dropping , but it is manual work now. If this can be handled by guava, it is great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, using guava cache, we can do only time-based or size-based eviction for all values(not specific to table level) loaded to CarbonLruCache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackylk , @ravipesala : what do you think ?
@Indhumathi27 : I still think we must have entry level expiry, just removing all the cache after certain time is not so useful as cache need to be loaded again ?
check this guava issue for the same (here)
I just checked on internet and I found 2 alternatives (under apache license) that supports entry level expiry.
a. caffeine -- This is on top of guava as they support guava adaptor
b. expiringmap -- This is widely used by many companies in their project. see here. I guess we can use this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Indhumathi27 i agree with @ajantha-bhat , may we need to keep option, or give choices to user and we should have implementations based on that like time based, size based.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajantha-bhat @akashrn5 ok. I will check those caching libraries and will raise new PR.
String timeBasedExpiration = CarbonProperties.getInstance() | ||
.getProperty(CarbonCommonConstants.CARBON_LRU_CACHE_EXPIRATION_DURATION_IN_MINUTES); | ||
if (null != timeBasedExpiration) { | ||
duration = Long.parseLong(timeBasedExpiration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forget to validate the content?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Configuration is already validated in core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
. Please check
new LinkedHashMap<String, Cacheable>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE, 1.0f, | ||
true); | ||
CacheBuilder.newBuilder().initialCapacity(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE) | ||
.expireAfterWrite(duration, TimeUnit.MINUTES).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expire after write or access?
besides, guava also support size based cache, will it be supported later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Size-based eviction can be supported later if needed
if (null != timeBasedExpiration) { | ||
duration = Long.parseLong(timeBasedExpiration); | ||
} | ||
// initialise guava cache with time based expiration | ||
lruCacheMap = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to optimize this variable's name -- lruCache
is enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
@@ -327,15 +309,15 @@ public Cacheable get(String key) { | |||
*/ | |||
public void clear() { | |||
synchronized (lruCacheMap) { | |||
for (Cacheable cachebleObj : lruCacheMap.values()) { | |||
for (Cacheable cachebleObj : lruCacheMap.asMap().values()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use method invalidateAll()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the above code is for invalidating Cacheable objects. For LRUCache, cleanUp method is used for clearing the cache.
79ed635
to
754300d
Compare
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1752/ |
754300d
to
fd7c3a3
Compare
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1754/ |
CarbonCommonConstants.CARBON_LRU_CACHE_EXPIRATION_DURATION_IN_MINUTES_DEFAULT); | ||
long duration; | ||
try { | ||
duration = Long.parseLong( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to carbonCacheExpireDuration
.setProperty(CARBON_LRU_CACHE_EXPIRATION_DURATION_IN_MINUTES, | ||
Long.toString(duration)); | ||
} | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this method, only NumberFormatException will be thrown, i dont see any other type of exception. If not instead of generalised exception, use NumberFormatException.
* object | ||
*/ | ||
private Map<String, Cacheable> lruCacheMap; | ||
private Cache<String, Cacheable> lruCache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since now least recently used cache is changed to time based, does this variable name still makes sense? shall we give a better one to depict the time based cache implementation?
Why is this PR needed?
Currently, in Carbon, we follow LRU cache based mechanism. An least-recently used entry will be removed from the cache when it is full. There is no time-based cache expiration supported in carbon. In cloud, all vm's may not have enough memory to cache everything we could cache.
In that case, we can clear cache after a specified duration. This can be achieved by using cache libraries available.
One of the caching library is Guava Cache, which provides flexible and powerful caching features. Please refer GuavaCache for more info.
What changes were proposed in this PR?
Newly added carbon property:
carbon.lru.cache.expiration.duration.in.minutes
which takes long value.For example:
carbon.lru.cache.expiration.duration.in.minutes="5"
-> After 5 minutes, cache will be cleared.Does this PR introduce any user interface change?
Is any new testcase added?