[CARBONDATA-3665] Support TimeBased Cache expiration using Guava Cache #3580

Closed
wants to merge 1 commit into from

Contributor

@Indhumathi27 Indhumathi27 commented Jan 15, 2020

Why is this PR needed?

Currently, Carbon uses an LRU (least recently used) cache: when the cache is full, the least recently used entry is removed. Carbon does not support time-based cache expiration. In cloud deployments, not every VM may have enough memory to cache everything that could be cached.
In that case, we can clear the cache after a specified duration. This can be achieved with an existing caching library.

One such caching library is Guava Cache, which provides flexible and powerful caching features. Please refer to GuavaCache for more info.
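For context, the LRU behaviour described above can be reproduced with an access-ordered `LinkedHashMap`, the structure that the replaced field in this PR was built on. The following is a minimal sketch only: the class names are hypothetical, and limiting the cache by entry count is an assumption made for brevity (this thread does not show how CarbonLruCache decides that it is full).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class, not part of CarbonData.
class LruSketch<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries; // assumption: capacity is counted in entries

  LruSketch(int maxEntries) {
    // accessOrder = true: iteration runs from least to most recently accessed.
    super(16, 0.75f, true);
    this.maxEntries = maxEntries;
  }

  // Invoked by LinkedHashMap after each insertion; returning true drops the eldest entry.
  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries;
  }
}

class LruSketchDemo {
  public static void main(String[] args) {
    LruSketch<String, String> cache = new LruSketch<>(2);
    cache.put("a", "1");
    cache.put("b", "2");
    cache.get("a");      // "a" becomes the most recently used entry
    cache.put("c", "3"); // over capacity, so the least recently used entry "b" is evicted
    System.out.println(cache.keySet()); // prints [a, c]
  }
}
```

Nothing in this structure removes an entry because of its age, which is the gap the PR aims to close.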

What changes were proposed in this PR?

  1. Replaced LinkedHashMap with Guava Cache.
  2. Added a Carbon property that lets the user specify the cache expiration duration in minutes, after which the cache is cleared.
    Newly added Carbon property:
    carbon.lru.cache.expiration.duration.in.minutes, which takes a long value.
    For example:
    carbon.lru.cache.expiration.duration.in.minutes="5" -> After 5 minutes, the cache will be cleared.
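To make the effect of the duration concrete, here is a small standard-library sketch of time-based expiration measured from the moment an entry is written. It is a stand-in for illustration, not Guava and not the code in this PR; the class names, the injectable clock, and the sample key are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a time-based cache; not Guava and not CarbonData code.
class ExpiringCacheSketch<K, V> {
  private static final class Entry<V> {
    final V value;
    final long writtenAtNanos;

    Entry(V value, long writtenAtNanos) {
      this.value = value;
      this.writtenAtNanos = writtenAtNanos;
    }
  }

  private final Map<K, Entry<V>> entries = new HashMap<>();
  private final long ttlNanos;
  private final LongSupplier clock; // injectable so the demo does not need to sleep

  ExpiringCacheSketch(long durationInMinutes, LongSupplier clock) {
    this.ttlNanos = TimeUnit.MINUTES.toNanos(durationInMinutes);
    this.clock = clock;
  }

  synchronized void put(K key, V value) {
    entries.put(key, new Entry<>(value, clock.getAsLong()));
  }

  // Returns the value, or null if the entry is absent or older than the configured duration.
  synchronized V get(K key) {
    Entry<V> entry = entries.get(key);
    if (entry == null) {
      return null;
    }
    if (clock.getAsLong() - entry.writtenAtNanos >= ttlNanos) {
      entries.remove(key);
      return null;
    }
    return entry.value;
  }
}

class ExpiringCacheDemo {
  public static void main(String[] args) {
    long[] now = {0L}; // fake clock, in nanoseconds
    ExpiringCacheSketch<String, String> cache = new ExpiringCacheSketch<>(5, () -> now[0]);
    cache.put("segment-1", "index");
    now[0] = TimeUnit.MINUTES.toNanos(4);
    System.out.println(cache.get("segment-1")); // prints index
    now[0] = TimeUnit.MINUTES.toNanos(5);
    System.out.println(cache.get("segment-1")); // prints null
  }
}
```

In this sketch each entry expires on its own schedule, 5 minutes after it was written, so entries written at different times disappear at different times.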

Does this PR introduce any user interface change?

  • Yes. A new property is added and the documentation is updated.

Is any new testcase added?

  • Yes

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1646/

@Indhumathi27 Indhumathi27 force-pushed the guava_cache branch 2 times, most recently from f677e3b to a3503ea Compare January 16, 2020 01:32
@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1647/

@Indhumathi27
Contributor Author

retest this please

@Indhumathi27 Indhumathi27 changed the title [WIP] Support TimeBased Cache expiration using Guava Cache [CARBONDATA-3665] Support TimeBased Cache expiration using Guava Cache Jan 16, 2020
@Indhumathi27 Indhumathi27 changed the title [CARBONDATA-3665] Support TimeBased Cache expiration using Guava Cache [WIP][CARBONDATA-3665] Support TimeBased Cache expiration using Guava Cache Jan 16, 2020
@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1663/

@Indhumathi27 Indhumathi27 changed the title [WIP][CARBONDATA-3665] Support TimeBased Cache expiration using Guava Cache [CARBONDATA-3665] Support TimeBased Cache expiration using Guava Cache Jan 17, 2020
@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1672/

* object
*/
- private Map<String, Cacheable> lruCacheMap;
+ private Cache<String, Cacheable> lruCacheMap;
Member

There are scenarios where the cache of some tables needs to be maintained while the cache of other tables can be dropped. It would be good to support table-level expiry rather than clearing the whole cache on expiry.

The drop meta cache DML already supports table-level dropping, but it is manual work now. If this can be handled by Guava, that would be great.

Contributor Author

Currently, with Guava cache, we can only do time-based or size-based eviction for all values loaded into CarbonLruCache (not per table).

Member

@jackylk, @ravipesala: what do you think?

@Indhumathi27: I still think we must have entry-level expiry. Simply removing the whole cache after a certain time is not very useful, since the cache then needs to be loaded again.

Check this Guava issue on the same topic (here).
I searched online and found two alternatives (under the Apache license) that support entry-level expiry:
a. caffeine: built on top of Guava, as it supports a Guava adaptor.
b. expiringmap: widely used by many companies in their projects (see here). I guess we can use this one.

Contributor

@Indhumathi27 I agree with @ajantha-bhat. Maybe we need to keep this as an option, or give the user choices, and provide implementations accordingly, such as time-based and size-based.

Contributor Author

@ajantha-bhat @akashrn5 OK. I will check those caching libraries and raise a new PR.
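The table-level expiry requested in this thread could look roughly like the sketch below, where each entry receives its own deadline based on the table it belongs to. This uses only the Java standard library and says nothing about how Guava, caffeine, or expiringmap would do it. Every name here (the class, `setTableTtl`, the `table/key` key layout, the sample table names) is a hypothetical choice for illustration, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch of entry-level expiry keyed by table; not CarbonData code.
class PerTableExpirySketch {
  private final Map<String, Long> ttlNanosByTable = new HashMap<>();
  private final Map<String, Object> values = new HashMap<>();
  private final Map<String, Long> deadlineNanos = new HashMap<>();
  private final long defaultTtlNanos;
  private final LongSupplier clock; // injectable so the demo does not need to sleep

  PerTableExpirySketch(long defaultTtlMinutes, LongSupplier clock) {
    this.defaultTtlNanos = TimeUnit.MINUTES.toNanos(defaultTtlMinutes);
    this.clock = clock;
  }

  // Overrides the expiry duration for one table.
  void setTableTtl(String table, long minutes) {
    ttlNanosByTable.put(table, TimeUnit.MINUTES.toNanos(minutes));
  }

  // Each entry gets its own deadline, computed from its table's duration at write time.
  void put(String table, String key, Object value) {
    long ttl = ttlNanosByTable.getOrDefault(table, defaultTtlNanos);
    String cacheKey = table + "/" + key; // assumed key layout
    values.put(cacheKey, value);
    deadlineNanos.put(cacheKey, clock.getAsLong() + ttl);
  }

  // Returns the value, or null if the entry is absent or past its deadline.
  Object get(String table, String key) {
    String cacheKey = table + "/" + key;
    Long deadline = deadlineNanos.get(cacheKey);
    if (deadline == null) {
      return null;
    }
    if (clock.getAsLong() >= deadline) {
      values.remove(cacheKey);
      deadlineNanos.remove(cacheKey);
      return null;
    }
    return values.get(cacheKey);
  }
}

class PerTableExpiryDemo {
  public static void main(String[] args) {
    long[] now = {0L}; // fake clock, in nanoseconds
    PerTableExpirySketch cache = new PerTableExpirySketch(5, () -> now[0]);
    cache.setTableTtl("hot_table", 60);
    cache.put("hot_table", "seg0", "index-a");
    cache.put("cold_table", "seg0", "index-b");
    now[0] = TimeUnit.MINUTES.toNanos(10);
    System.out.println(cache.get("hot_table", "seg0"));  // prints index-a
    System.out.println(cache.get("cold_table", "seg0")); // prints null
  }
}
```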

String timeBasedExpiration = CarbonProperties.getInstance()
.getProperty(CarbonCommonConstants.CARBON_LRU_CACHE_EXPIRATION_DURATION_IN_MINUTES);
if (null != timeBasedExpiration) {
duration = Long.parseLong(timeBasedExpiration);
Contributor

Did you forget to validate the content?

Contributor Author

The configuration is already validated in core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java. Please check.

- new LinkedHashMap<String, Cacheable>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE, 1.0f,
- true);
+ CacheBuilder.newBuilder().initialCapacity(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE)
+ .expireAfterWrite(duration, TimeUnit.MINUTES).build();
Contributor

Expire after write or after access?

Besides, Guava also supports size-based caching. Will that be supported later?

Contributor Author

Size-based eviction can be supported later if needed
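The write-versus-access question matters because the two policies give different answers for an entry that keeps being read. Guava's `CacheBuilder` offers both `expireAfterWrite` and `expireAfterAccess`; the sketch below does not call Guava but restates the two rules as plain predicates, with hypothetical names and timestamps, to show where they diverge.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper that contrasts the two expiry policies; not Guava code.
class ExpiryPolicySketch {
  // Expire-after-write: the deadline is fixed when the entry is written.
  static boolean expiredAfterWrite(long writeNanos, long nowNanos, long ttlNanos) {
    return nowNanos - writeNanos >= ttlNanos;
  }

  // Expire-after-access: every read or write pushes the deadline forward.
  static boolean expiredAfterAccess(long lastAccessNanos, long nowNanos, long ttlNanos) {
    return nowNanos - lastAccessNanos >= ttlNanos;
  }

  public static void main(String[] args) {
    long ttl = TimeUnit.MINUTES.toNanos(5);
    long written = 0L;                           // entry written at minute 0
    long lastRead = TimeUnit.MINUTES.toNanos(4); // entry read at minute 4
    long now = TimeUnit.MINUTES.toNanos(6);

    System.out.println(expiredAfterWrite(written, now, ttl));   // prints true
    System.out.println(expiredAfterAccess(lastRead, now, ttl)); // prints false
  }
}
```

With expire-after-write, a frequently queried entry is still dropped and reloaded once the duration elapses; with expire-after-access, it stays as long as it keeps being read.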

if (null != timeBasedExpiration) {
duration = Long.parseLong(timeBasedExpiration);
}
// initialise guava cache with time based expiration
lruCacheMap =
Contributor

Better to improve this variable's name; lruCache is enough.

Contributor Author

changed

@@ -327,15 +309,15 @@ public Cacheable get(String key) {
*/
public void clear() {
synchronized (lruCacheMap) {
- for (Cacheable cachebleObj : lruCacheMap.values()) {
+ for (Cacheable cachebleObj : lruCacheMap.asMap().values()) {
Contributor

Why not use the invalidateAll() method?

Contributor Author

The above code is for invalidating Cacheable objects. For LRUCache, the cleanUp method is used for clearing the cache.

@Indhumathi27 Indhumathi27 force-pushed the guava_cache branch 2 times, most recently from 79ed635 to 754300d Compare January 23, 2020 06:18
@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1752/

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1754/

CarbonCommonConstants.CARBON_LRU_CACHE_EXPIRATION_DURATION_IN_MINUTES_DEFAULT);
long duration;
try {
duration = Long.parseLong(
Contributor

rename to carbonCacheExpireDuration

.setProperty(CARBON_LRU_CACHE_EXPIRATION_DURATION_IN_MINUTES,
Long.toString(duration));
}
} catch (Exception e) {
Contributor

In this method, only NumberFormatException can be thrown; I don't see any other type of exception. If that is the case, catch NumberFormatException instead of the generic Exception.
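The narrower catch suggested here might look like the following sketch. The helper class, the method name, and the fallback parameter are hypothetical, and treating negative values as invalid is an assumption; the actual validation in CarbonProperties is not shown in this thread.

```java
// Hypothetical helper; not the validation code from CarbonProperties.
class ExpirationDurationParser {
  // Parses the configured duration in minutes, returning fallbackMinutes for unusable input.
  static long parseDurationMinutes(String raw, long fallbackMinutes) {
    if (raw == null) {
      return fallbackMinutes;
    }
    try {
      long value = Long.parseLong(raw.trim());
      // Assumption: a negative duration is treated as invalid.
      return value < 0 ? fallbackMinutes : value;
    } catch (NumberFormatException e) {
      // Long.parseLong signals malformed input with NumberFormatException only.
      return fallbackMinutes;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseDurationMinutes("5", 10L));   // prints 5
    System.out.println(parseDurationMinutes("abc", 10L)); // prints 10
    System.out.println(parseDurationMinutes("-1", 10L));  // prints 10
  }
}
```

Catching only NumberFormatException keeps unrelated failures visible instead of silently replacing them with the fallback value.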

* object
*/
- private Map<String, Cacheable> lruCacheMap;
+ private Cache<String, Cacheable> lruCache;
Contributor

Since the least-recently-used cache is now changed to a time-based one, does this variable name still make sense? Shall we give it a better name that reflects the time-based cache implementation?
