
Expiry after Access / Time to Idle / Idle Scan #39

Closed
cruftex opened this issue Feb 17, 2016 · 14 comments

cruftex commented Feb 17, 2016

Support for expiry after an entry is accessed is missing in core cache2k.

What is it?

An entry expires and is removed from the cache after not being accessed for a period of time. The main use is freeing resources and shrinking the cache when the application is not in use.

Current state:

The feature is available in the JCache support, but it is not implemented very efficiently. Also, the standard JCache functionality is not very practical: JCache defines TTI, but it cannot be combined with expire-after-write.

Why isn't it available in cache2k yet?

We almost never need it. Typically we run caches with a TTL between 30 seconds and 30 minutes, which means entries expire via the TTL setting and the cache shrinks. TTI would be needed if entries never expire via TTL, or have a TTL higher than the TTI. For example, a web resource might be allowed to be cached for 6 days; however, if it is not accessed, we would like to remove it from memory after one hour.
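The combination described above can be expressed as taking whichever of the two deadlines comes first. A minimal sketch (the helper and its name are invented for illustration; this is not cache2k API):

```java
final class ExpiryCombiner {
  /**
   * Effective expiry when TTL and TTI are combined: the earlier of the two
   * deadlines wins. All times are in the same unit, e.g. minutes.
   */
  static long expiryTime(long writeTime, long lastAccessTime, long ttl, long tti) {
    return Math.min(writeTime + ttl, lastAccessTime + tti);
  }
}
```

With TTL = 6 days and TTI = 1 hour, an entry written at t=0 and never accessed again leaves memory after one hour; each access pushes the TTI deadline forward, but never past write time plus TTL.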

The second reason is overhead. TTI is typically implemented via an LRU-style linked list, which brings lock contention and concurrency issues.
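The typical linked-list approach looks roughly like this minimal, single-threaded sketch (illustrative only, with a simulated clock; `TtiLruCache` is an invented name). Note that every read relinks the entry in the access-ordered list and rewrites its timestamp, which is exactly where the lock contention in a concurrent implementation comes from:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal time-to-idle cache: access-ordered list plus per-entry timestamps. */
class TtiLruCache<K, V> {
  private final long idleMillis;
  // accessOrder=true keeps the least recently used entry at the head
  private final LinkedHashMap<K, long[]> lastAccess = new LinkedHashMap<>(16, 0.75f, true);
  private final Map<K, V> values = new HashMap<>();
  private long now; // simulated clock, for deterministic examples

  TtiLruCache(long idleMillis) { this.idleMillis = idleMillis; }

  void tick(long millis) { now += millis; }

  void put(K key, V value) {
    values.put(key, value);
    lastAccess.put(key, new long[] { now });
  }

  V get(K key) {
    expireIdle();
    long[] t = lastAccess.get(key); // relinks the entry to the list tail
    if (t == null) return null;
    t[0] = now; // timestamp update on every read: the contention point
    return values.get(key);
  }

  /** Walk from the LRU head and drop everything idle longer than idleMillis. */
  private void expireIdle() {
    Iterator<Map.Entry<K, long[]>> it = lastAccess.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<K, long[]> e = it.next();
      if (now - e.getValue()[0] < idleMillis) break; // head is oldest: stop early
      values.remove(e.getKey());
      it.remove();
    }
  }

  int size() { return values.size(); }
}
```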

Solutions

If needed, several options are available via existing cache2k features; e.g., the expiry can be modified in a cache request via the entry processor.
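As a sketch of that entry-processor idea (with a made-up minimal API; cache2k's real entry processor signature differs): a cache request can run a processor that reads the value and pushes the expiry time forward, giving sliding expiry for selected entries only:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/**
 * Toy cache whose invoke() runs a processor inside the cache request.
 * The processor may read the value and move the entry's expiry time.
 * The API is invented for illustration; it is not cache2k's actual API.
 */
class ProcessorCache<K, V> {
  static final class MutableEntry<V> {
    V value;
    long expiryTime;
  }

  private final Map<K, MutableEntry<V>> map = new HashMap<>();
  long now; // simulated clock

  void put(K key, V value, long expiryTime) {
    MutableEntry<V> e = new MutableEntry<>();
    e.value = value;
    e.expiryTime = expiryTime;
    map.put(key, e);
  }

  /** Run the processor against the mutable entry; expired entries read as absent. */
  <R> R invoke(K key, BiFunction<MutableEntry<V>, Long, R> processor) {
    MutableEntry<V> e = map.get(key);
    if (e != null && e.expiryTime <= now) { // lazy expiry check
      map.remove(key);
      e = null;
    }
    return e == null ? null : processor.apply(e, now);
  }
}
```

A read that also bumps the expiry then looks like `cache.invoke(key, (entry, now) -> { entry.expiryTime = now + tti; return entry.value; })`.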

We also plan an efficient implementation of a TTI approximation based on the eviction algorithm.

@cruftex cruftex added this to the v1 milestone Feb 17, 2016
@cruftex cruftex mentioned this issue Feb 17, 2016
@cruftex cruftex changed the title Expiry after Access Expiry after Access / Time to Ilde Mar 29, 2016
@cruftex cruftex modified the milestones: v1.2, v1 May 18, 2016
@cruftex cruftex removed this from the v1.2 milestone Jul 3, 2018
@ricardojlrufino

What is the status of this implementation?
The main feature I needed was this one ...


quaff commented Aug 17, 2018

"Ilde" in the title should be "idle"

@cruftex cruftex changed the title Expiry after Access / Time to Ilde Expiry after Access / Time to Idle Aug 17, 2018

cruftex commented Aug 17, 2018

@quaff Oops, thanks, I changed the title!


cruftex commented Aug 17, 2018

@ricardojlrufino Thanks for reaching out! It's good timing, since I have now stabilized the next version (1.2) and need to do some planning on what important stuff to focus on next.

Two reasons it's not there yet: we rarely need it, and there always has to be a compromise on the performance we have achieved in order to provide this feature.

To drive it in the right direction, I'd love to hear from people who actually need it. Can you share something about the scenario you want to use it for?

@ricardojlrufino

@cruftex My application is in IoT: I cache the devices in use and, after a while, I need to remove them from memory.
I ended up using the Guava library; unfortunately it is very large (2 MB), and I only use it for caching.

Ref: https://github.com/google/guava/wiki/CachesExplained


detohaz commented Jun 3, 2019

@cruftex This is a very important feature that I need now. I have more than 100 million elements to cache, but not all of them are accessed continuously (~5 million). The cache eats my RAM (>70 GB), so I cannot run two instances simultaneously (125 GB total).
I have divided the data into smaller partitions (32 partitions) to improve access time, and now I'm dealing with memory problems.
I hope this feature will be implemented as soon as possible.
Thanks @cruftex.


cruftex commented Jun 3, 2019

@detohaz, @ricardojlrufino: Thanks for your feedback! Unfortunately, I am a bit held up with other activities, but there will be some progress here during 2019. For the use cases you describe, I still wonder whether TTI is really the best solution. So I'd like to move in two directions here: implement TTI, but maybe also come up with a better cache feature for the 90% of use cases that use TTI now.

@detohaz: You know that around 5 million entries are accessed continuously and should therefore be cached. What about limiting the cache size to 8 million entries to keep the RAM usage in check?


cruftex commented Sep 9, 2019

@detohaz: Did my suggestion to limit the cache size help and solve your problem?

In case you use a TTI-like feature, the actual cache size depends on the usage pattern. This can lower your heap memory usage in times of low utilization; however, you don't have a guarantee of an upper memory limit.

@cruftex cruftex added this to the v2 milestone Sep 9, 2019
@cruftex cruftex self-assigned this Sep 9, 2019
@cruftex cruftex modified the milestones: v2, v2.x Oct 13, 2020
cruftex added a commit that referenced this issue Nov 16, 2021

cruftex commented Nov 16, 2021

I've been tinkering with a neat solution for this over the last few days and weeks.

The main configuration parameter is Cache2kBuilder.idleScanTime. The Javadoc currently says:

Sets the time for a regular scan round of all cache entries, which evicts
idle cache entries that were not accessed since the last scan round.
When set to about half of the time-to-idle duration, the effect is similar
to the time-to-idle or expire-after-access setting in other caches. In contrast
to the typical time-to-idle implementation, there is no additional access
overhead. This is intended to shrink the cache size when entries are unused.
Since it is coupled with the eviction algorithm, the efficiency is better than
a typical time-to-idle implementation.

Early testing results look promising. Just checked in the current state. The solution has the following advantages over a usual TTI implementation based on a queue / linked list:

  • No additional overhead at access time, since we use the already existing structures for eviction
  • Early testing shows lower resource usage (lower average and maximum cache size) with a better hit rate when compared to the established TTI algorithm

I'll add more hard data once the test scenarios stabilize.
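The scan-round mechanics described above can be sketched as follows (an invented, single-threaded model, not the actual cache2k code): each access only writes a round tag into the entry, and the periodic scan evicts entries whose tag is older than the current round, i.e. entries that went a full round without access:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Idle-scan sketch: a periodic round walks all entries; an entry not
 * accessed since the previous round started gets evicted. The only
 * per-access bookkeeping is a single field write, no list relinking.
 */
class IdleScanCache<K, V> {
  private static final class Entry<V> {
    V value;
    long accessedInRound;
  }

  private final Map<K, Entry<V>> map = new HashMap<>();
  private long round = 1;

  void put(K key, V value) {
    Entry<V> e = new Entry<>();
    e.value = value;
    e.accessedInRound = round;
    map.put(key, e);
  }

  V get(K key) {
    Entry<V> e = map.get(key);
    if (e == null) return null;
    e.accessedInRound = round; // cheap: one field write per access
    return e.value;
  }

  /** One scan round: evict everything untouched since the round started. */
  void scanRound() {
    map.values().removeIf(e -> e.accessedInRound < round);
    round++;
  }

  int size() { return map.size(); }
}
```

An entry therefore survives at most one full round plus the remainder of the current round without being accessed, which is why a scan time of roughly half the intended time-to-idle gives a similar effect.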

@xuyenhabinh

Hey cruftex, it's detohaz! This is my new identity.
It's exciting to hear about your improvements. Keep up the good work; I fully support this project.

cruftex added a commit that referenced this issue Nov 18, 2021

cruftex commented Nov 18, 2021

(Numbers in the tables updated after the changes of Nov 23.)

As promised, some comparison data. For the comparison I use a recorded trace, which I included in the last commit.

About the trace:

An access trace on the product catalog of an eCommerce website, covering 24 hours from 4am to 4am the next morning. All robot requests (web spiders from search engines like Google or Bing) were removed, so the trace contains mostly user activity. The requests were reduced by removing a random selection of keys to get a more compact size for inclusion in the project. The first integer of the trace is the time in seconds, the second one the accessed key; both integers are normalized and start at 0. The trace characteristics are:
Request count: 66319, unique keys: 8810, maximum hit rate: 86.72

For the comparison I implemented the established semantics (removal exactly after the idle duration has passed) and ran a playback of the above trace against it. Next, I ran the playback against the current cache2k code using a simulated clock source.

Find the code and traces via the referenced commit above.

Established Time To Idle semantics

| TTI (minutes) | Maximum cache size | Average cache size | Hit rate |
|---:|---:|---:|---:|
| 55 | 3789 | 1370 | 70.09 |
| 56 | 3799 | 1386 | 70.29 |
| 57 | 3825 | 1402 | 70.47 |
| 58 | 3831 | 1417 | 70.66 |
| 59 | 3858 | 1432 | 70.87 |
| 60 | 3867 | 1448 | 71.03 |
| 61 | 3889 | 1463 | 71.18 |
| 62 | 3923 | 1478 | 71.35 |
| 63 | 3984 | 1493 | 71.49 |
| 64 | 4002 | 1507 | 71.65 |
| 65 | 4027 | 1522 | 71.83 |

Time To Idle emulation via scanning in cache2k 2.6

| Scan round time (minutes) | Maximum cache size | Average cache size | Hit rate |
|---:|---:|---:|---:|
| 40 | 3219 | 1254 | 69.48 |
| 41 | 3297 | 1276 | 69.78 |
| 42 | 3357 | 1298 | 70.03 |
| 43 | 3337 | 1320 | 70.33 |
| 44 | 3494 | 1338 | 70.57 |
| 45 | 3478 | 1357 | 70.78 |
| 46 | 3483 | 1373 | 71.03 |
| 47 | 3642 | 1397 | 71.32 |
| 48 | 3559 | 1413 | 71.59 |
| 49 | 3733 | 1434 | 71.78 |
| 50 | 3669 | 1449 | 71.95 |

Using about three quarters of the TTI as scan time produces a similar average cache size, while having a better hit rate and a lower maximum cache size, thus saving resources while being more effective.

With the established TTI semantics, resource usage grows or shrinks linearly with the time setting. With the scanning approach, the resource numbers "jump" a bit, because the results depend on how the scanning activity aligns with the requests.

For the above results cache2k runs with unbounded capacity, so the eviction algorithm is not working as usual: instead of splitting the data between a cold and a hot set, everything is in the hot set, i.e., one clock data structure that is scanned through. The next test combines the idle scan with normal eviction activity by setting the capacity to 2000 entries.

Time To Idle emulation via scanning in cache2k 2.6 with capacity limit of 2000 entries

| Scan round time (minutes) | Maximum cache size | Average cache size | Hit rate |
|---:|---:|---:|---:|
| 40 | 2000 | 1218 | 69.73 |
| 41 | 2000 | 1244 | 70.13 |
| 42 | 2000 | 1260 | 70.32 |
| 43 | 2000 | 1272 | 70.57 |
| 44 | 2000 | 1292 | 70.88 |
| 45 | 2000 | 1305 | 71.02 |
| 46 | 2000 | 1322 | 71.34 |
| 47 | 2000 | 1338 | 71.55 |
| 48 | 2000 | 1359 | 71.95 |
| 49 | 2000 | 1367 | 72.00 |
| 50 | 2000 | 1387 | 72.27 |

Conclusion:

While there is more validation and testing work to do, the general direction makes sense and the approach is a valid alternative to the established, straightforward TTI implementation.

When we cap the cache size at 2000 entries, the scanning TTI emulation cooperates with normal eviction, resulting in a lower maximum cache size but not much lower hit rates. That means there is potential for further improvement by making more use of the eviction algorithm, e.g., by removing "idle" entries that have never seen any hits faster.

I see different directions for improvement:

  • A closer emulation of the original TTI semantics, for example by doing multiple scan rounds and removing entries only after several rounds of idling
  • Going the other direction: instead of more predictable TTI semantics, improving efficiency, e.g. by removing cold entries faster than hot entries

For now I will stabilize the current approach, which seems "good enough" to make it into production, and then we can learn from it.

cruftex added a commit that referenced this issue Nov 23, 2021

cruftex commented Nov 23, 2021

Did some more testing and fixed a logical error that led to alternating one round of scanning and one round of pausing. Corrected the numbers in the previous comment.

Here is a histogram of the actual time from last access to eviction for the 45-minute scan time setting:

| Idle time until eviction (minutes) | Count | Count as bar |
|---:|---:|:---|
| 35 | 112 | : |
| 40 | 5545 | ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: |
| 45 | 7967 | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: |
| 50 | 741 | :::::::: |
| 55 | 655 | ::::::: |
| 60 | 653 | ::::::: |
| 65 | 470 | ::::: |
| 70 | 414 | :::: |
| 75 | 460 | ::::: |
| 80 | 544 | :::::: |
| 85 | 948 | :::::::::: |
| 90 | 711 | ::::::: |
| 95 | 45 | |
| 100 | 16 | |
| 105 | 8 | |

Each row of the time column represents a span of 5 minutes.

Idle evictions between 45 and 90 minutes are expected; however, we also see evictions a bit before and after that window. Longer times happen during a phase of cache growth, shorter times during a phase of cache shrinking.

As an example, let's look at the shrink phase: an entry could be in the middle of the scan round. If most of the other entries are removed in that round, the entry would be near the beginning of the next scan round. This means there is a chance that entries are removed much earlier during shrinking.

These outliers are not problematic; on the contrary, the behavior is in sync with the workload phase. During the insert phase (rising activity) it makes sense to keep entries a bit longer; during the shrink phase (ebbing activity) evictions can happen earlier without doing harm.
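The expected 45-to-90-minute window follows directly from the scan mechanics: with a scan-round time of S, an entry touched just before the scanner reaches it is evicted roughly one round later, while an entry touched just after the scanner passed survives the remainder of the current round plus one full round. A tiny helper (invented for illustration) makes the bounds explicit:

```java
final class IdleWindow {
  /**
   * For a scan-round time of S, the steady-state idle time at eviction
   * lies roughly between S (touched just before being scanned) and
   * 2*S (touched just after being scanned).
   */
  static long[] idleEvictionWindowMinutes(long scanRoundMinutes) {
    return new long[] { scanRoundMinutes, 2 * scanRoundMinutes };
  }
}
```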


cruftex commented Nov 23, 2021

Notes on compensating for inserts and removals while scanning:

  • Inserts: There is no compensation for inserts. We could speed up the scan since the cache size increases; however, this is handled at the start of the next round anyway.
  • Removals: Entries might be removed by Cache.remove or by expiry. In this case we would scan too much if we kept the original scan speed, so we compensate by reducing the number of remaining scans. We need to differentiate between entries that were already scanned in the current round and entries that were not; therefore each entry is tagged with a round counter when it is scanned. Removing a not-yet-scanned entry lowers the remaining scan count accordingly. Inserts always carry the current round tag, so if something is inserted and removed within the same round, no compensation is done.
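The compensation bookkeeping can be sketched like this (an invented, simplified model of the round-counter tagging described above, not the actual cache2k code):

```java
/**
 * Sketch of removal compensation for the idle scan: each entry carries the
 * round counter of the last scan that visited it. Removing an entry that was
 * not yet scanned in the current round reduces the remaining scan budget, so
 * the round does not overshoot. Inserts carry the current round tag, so an
 * insert-then-remove within one round triggers no compensation.
 */
class ScanBudget {
  static final class Entry {
    long scannedInRound; // 0 = never scanned
  }

  private long round = 1;
  private long remainingScans;

  /** Begin a round over the current number of entries. */
  void startRound(int entryCount) { remainingScans = entryCount; }

  /** The scanner visited this entry. */
  void scanned(Entry e) {
    e.scannedInRound = round;
    remainingScans--;
  }

  /** New entries count as already scanned for the current round. */
  Entry insert() {
    Entry e = new Entry();
    e.scannedInRound = round;
    return e;
  }

  /** Entry removed via Cache.remove or expiry. */
  void removed(Entry e) {
    if (e.scannedInRound < round) {
      remainingScans--; // not yet scanned this round: compensate
    }
  }

  boolean roundDone() { return remainingScans <= 0; }

  void finishRound() { round++; }
}
```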

@cruftex cruftex removed this from the v2.x milestone Dec 28, 2021
@cruftex cruftex added this to the v2.6 milestone Dec 28, 2021

cruftex commented Jan 24, 2022

Completed and shipped a while ago in:
https://github.com/cache2k/cache2k/releases/tag/v2.5.1.Alpha
Latest release for testing it:
https://github.com/cache2k/cache2k/releases/tag/v2.5.3.Beta

@cruftex cruftex closed this as completed Jan 24, 2022
@cruftex cruftex changed the title Expiry after Access / Time to Idle Expiry after Access / Time to Idle / Idle Scan Jan 29, 2022