
Bulk Load / Loader that will use loadAll() #116

Closed
cybuch opened this issue Jul 4, 2019 · 36 comments

cybuch commented Jul 4, 2019

Hey,
is there any plan for releasing a Cache implementation that would use the Loader's loadAll()? It would help a lot with my use case, where I fetch data from an HTTP endpoint that supports requests for many IDs at once. For 100k objects in the cache I could do 1k HTTP requests, each fetching 100 IDs, instead of 100k requests one by one.

Expected behaviour:

  • collect all cache entries eligible for refresh and load them via a single loadAll call

A nice-to-have feature would be an option for the batch size OR a time limit, e.g.: the batch size is 1,000, but after 2 minutes only 512 keys are eligible to load, so the cache loads those 512 keys and resets the timer.
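
The size-or-time flush described above can be sketched as a small stand-alone buffer (plain Java; names like BatchBuffer are made up for illustration and are not cache2k API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Collects keys and flushes them as one batch when either the size threshold
// is reached or maxDelayMillis has passed since the first buffered key.
class BatchBuffer<K> {
  private final int maxBatchSize;
  private final long maxDelayMillis;
  private final Consumer<List<K>> bulkLoad; // e.g. a call to loadAll()
  private final List<K> pending = new ArrayList<>();
  private long firstKeyTime;

  BatchBuffer(int maxBatchSize, long maxDelayMillis, Consumer<List<K>> bulkLoad) {
    this.maxBatchSize = maxBatchSize;
    this.maxDelayMillis = maxDelayMillis;
    this.bulkLoad = bulkLoad;
  }

  synchronized void add(K key, long nowMillis) {
    if (pending.isEmpty()) firstKeyTime = nowMillis;
    pending.add(key);
    if (pending.size() >= maxBatchSize || nowMillis - firstKeyTime >= maxDelayMillis) {
      flush();
    }
  }

  synchronized void flush() {
    if (pending.isEmpty()) return;
    bulkLoad.accept(new ArrayList<>(pending)); // one bulk load for the whole batch
    pending.clear(); // also resets the timer
  }
}
```

In the scenario above, maxBatchSize would be 1,000 and maxDelayMillis 120,000; a real implementation would also arm a scheduler to call flush() when the delay elapses, instead of checking only inside add().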

@cruftex cruftex changed the title Loader that will use loadAll() Bulk Load / Loader that will use loadAll() Jul 9, 2019
@cruftex cruftex self-assigned this Jul 9, 2019
cruftex (Member) commented Jul 9, 2019

Thanks for the request! We had this feature in previous versions. We stripped it away for the 1.0 release to get rid of everything non-essential, since we wanted a concise and well-tested cache implementation.
So it's on our list to add it again; however, I doubt it will happen in the next months.
Any feedback and comments on this are welcome. If I can see a bigger need, maybe it will speed things up.

chrisbeach commented

@cruftex this feature would be very useful to me.

To keep the internal cache2k implementation simpler, it could be used whenever the getAll method is called. No need for any complex async operation: just a helpful method to make batch requests through the CacheLoader.

@cruftex cruftex modified the milestones: v2, v2.x Oct 12, 2020
mbechto commented Dec 11, 2020

@cruftex we really need this feature too.

Btw. I have noticed the new org.cache2k.io.CacheLoader interface does not declare loadAll.
So I am wondering: what is the recommended strategy for implementing bulk loads with cache2k?

I can only think of something like using a Cache<K, List<V>> which seems quite inelegant to me.

Your opinion would be greatly appreciated :)

globalworming (Contributor) commented

It may be naive, but can't you just iterate over the list and add all entries?

mbechto commented Dec 11, 2020

@globalworming well, yes, of course. But that would be much slower than a real bulk load.

For example, consider a CacheLoader that does one HTTP request per key (100 requests for 100 keys) versus one that does a single HTTP request to fetch the values for all 100 keys at once. That is what I meant by bulk load.

cruftex (Member) commented Dec 12, 2020

Thanks for reaching out. It is perfect timing: version 2.0 is about to be finished, so I'll prioritise bulk support for 2.2 and postpone other work. Better to work on features that are in demand.

@MatthiasBechtold

Btw. I have noticed the new org.cache2k.io.CacheLoader interface does not declare loadAll.
So I am wondering, what is the recommended strategy implementing bulk loads with cache2k?

Yes, because I tried my best to clean up the interfaces. There will be an extra BulkCacheLoader interface.
Other caches have the bulk method and the single load in one interface; Guava, Caffeine and JCache come to mind.
Since the interface should be functional, loadAll would need to be defined as a default method there. A separate interface
has the advantage that the cache "knows" the loader supports bulk operations, and that you don't need to define
a single load when it's mostly bulk requests.
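
A minimal sketch of how such a separate, functional bulk interface can also provide the single-key load for free (hypothetical names; the interface cache2k actually shipped is discussed later in this thread):

```java
import java.util.Map;
import java.util.Set;

// A functional bulk loader: only loadAll() is abstract, so it can be written
// as a lambda. The single-key load is derived from the bulk method, so
// implementors who mostly serve bulk requests never have to write one.
@FunctionalInterface
interface BulkLoader<K, V> {
  Map<K, V> loadAll(Set<K> keys);

  default V load(K key) {
    return loadAll(Set.of(key)).get(key);
  }
}
```

Because the cache sees a distinct type, it can tell the loader is bulk-capable, which a default loadAll on the single-key interface could not express.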

I can only think of something like using a Cache<K, List<V>> which seems quite inelegant to me.

I don't know your usage scenario. If the list values are always requested together, then it actually makes sense.
@MatthiasBechtold Maybe you can elaborate?

Did you have a chance to look at the async version of the loader?

@cruftex cruftex modified the milestones: v2.x, v2.2 Dec 12, 2020
cruftex added a commit that referenced this issue Dec 17, 2020
cruftex added a commit that referenced this issue Dec 17, 2020
cruftex added a commit that referenced this issue Dec 17, 2020
cruftex (Member) commented Dec 17, 2020

New interfaces are in this commit: ebc6a95

cruftex added a commit that referenced this issue Dec 22, 2020
cruftex added a commit that referenced this issue Dec 24, 2020
cruftex (Member) commented Dec 24, 2020

Just released a preview with bulk cache loader support for getAll, loadAll and invokeAll.
https://github.com/cache2k/cache2k/releases/tag/v2.1.1.Alpha

There will be more tests with concurrency and further optimizations.

mbechto commented Jan 6, 2021

@cruftex thanks a lot for the update! (Sorry for my delay, the festive season took its toll.)
I can see now how it all comes together.

In the meantime I reviewed our use case again. Until we are able to upgrade to 2.1.1 (once it's released, of course), the current workaround should be fine. We eagerly load the whole cache content via putAll, with a little additional code to handle expiration. It is not particularly pretty, though.

cruftex (Member) commented Jan 28, 2021

@MatthiasBechtold 2.1.1 has some critical flaws in the bulk loader support. I'll release 2.1.2 soon. It also has a slightly changed/improved bulk interface. After 2.1.2 I plan to do some stress and concurrency tests, and then it should be ready for production.

2.1.1: void loadAll(Set<K> keys, Set<Context<K, V>> contextSet, BulkCallback<K, V> callback)
2.1.2: void loadAll(Set<K> keys, BulkLoadContext<K, V> context, BulkCallback<K, V> callback)

cruftex added a commit that referenced this issue Jan 28, 2021
dstango commented Feb 16, 2021

Thanks for taking care of bulk loading!
I tried out the 2.1.1.Alpha version and it always seems to trigger loadAll() with only a single key.
My usage scenario is probably somewhat different from what is discussed above:
I prefetch all data from a rather slow system on initial startup of the application. Then I need to do various filter operations on the data in memory, so I iterate over all entries to filter out what I need. I'm usually not accessing the cache by the IDs of the elements (only sometimes). Anyhow, I want my data to be updated from the source system over time; a delay of sorts is very acceptable and no problem.
This scenario doesn't seem to trigger a "real bulk load" at all: only single keys get supplied to loadAll. I defined the same expiration time for batches of entries, hoping cache2k would notice what needs to be updated at roughly the same time and create a bulk load out of it. But so far I was out of luck.
Maybe my scenario doesn't fit the intended idea well, but from my perspective it would be great if the refresh could simply run in the background and automagically create reasonable sets of entries to update in bulk fashion ...

cruftex (Member) commented Feb 17, 2021

@dstango thanks for testing. This version is actually pretty flawed. It would be great if you could take a look at the next version. I am just debugging a nasty concurrency issue.

it always seems to trigger loadAll() with only a single key.

Is that happening at the initial load or after that during the refresh? With which methods do you access the cache?
Do you use the async loader?

Some explanation of the mode of operation of the core cache with bulk support:

Only a bulk access to the cache will cause a bulk access to the loader. The loader will be called with the keys requested, or a subset thereof. Example: a getAll({1,2,3}) on an empty cache will call the loader with the identical set of keys. If key 2 is already loaded, only 1 and 3 are requested. Usually the application works concurrently. Let's say the application is doing two concurrent getAll({1,2,3}) calls on an empty cache. This may result in one loadAll, if the first thread is able to lock the whole key set, or two loadAll operations, if one thread manages to lock key 1 and the other thread locks the other keys.
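
This subset behaviour can be modelled with a toy read-through cache (a plain-Java stand-in for illustration, not the actual cache2k implementation):

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy read-through getAll: only keys missing from the store reach the loader,
// and they reach it as one set, i.e. one bulk call.
class ToyCache<K, V> {
  final Map<K, V> store = new HashMap<>();
  final Function<Set<K>, Map<K, V>> bulkLoader;

  ToyCache(Function<Set<K>, Map<K, V>> bulkLoader) {
    this.bulkLoader = bulkLoader;
  }

  Map<K, V> getAll(Set<K> keys) {
    Set<K> missing = new LinkedHashSet<>(keys);
    missing.removeAll(store.keySet());
    if (!missing.isEmpty()) {
      store.putAll(bulkLoader.apply(missing)); // single bulk load for the subset
    }
    Map<K, V> result = new HashMap<>();
    for (K k : keys) result.put(k, store.get(k));
    return result;
  }
}
```

With key 2 already cached, a getAll({1,2,3}) hands only {1,3} to the loader, matching the description above (the real cache additionally handles per-key locking and concurrency).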

Each entry expires individually. I don't plan something like a bulk expiry at the moment. However, it's clear that for background refreshing it is more efficient to load in bulk as well. I'd like to solve that problem too, but not in the core cache functionality, since that is already very complex. The idea would be to have a loader which wraps the real loader and coalesces several requests into a bigger one by introducing some delay. This way it's modular and can be tested separately.

dstango commented Feb 17, 2021

@cruftex: it happens not at the initial load, but during the refresh.

I access the cache in a maybe unintended way by calling entries() (not getAll()). I don't use getAll(), as I'm actually not interested in the keys, but in the values(). I could probably use cache.getAll(cache.keys()) as a workaround to trigger loadAll().

I'm currently just implementing BulkCacheLoader.

I'm happy to try out a new version when it's available in maven.

Thanks for your efforts! :-)

cruftex (Member) commented Feb 17, 2021

@cruftex: it happens not at the initial load, but during the refresh.

@dstango: That's expected behavior then, because expiry and refresh work on individual entries only. That's something the additional coalescing loader needs to fix. Good that you ask for it, so I know there is demand.

I access the cache in a maybe unintended way by calling entries() (not getAll()). I don't use getAll(), as I'm actually not interested in the keys, but the values(). I could probably use cache.getAll(cache.keys()) as a workaround to trigger loadAll().

A cache.getAll(cache.keys()) would not trigger a load, since the values are already loaded. A cache.reloadAll(cache.keys()) would.

I am a bit concerned about your usage pattern. It seems that by using Cache.entries() you expect the values you initially loaded to still be in the cache; however, a cache is generally allowed to forget arbitrary entries. E.g. with refresh ahead enabled, the cache might still decide to expire and drop an entry, in case not enough threads are available to refresh. Basically, the application should still function (maybe just slowly) if caching is disabled or the maximum size is 0. Maybe you'd like to open another issue or ask on Stack Overflow about your use case? I am happy and interested to discuss it more; it's just off topic for this issue.

dstango commented Feb 17, 2021

@cruftex thanks for thinking it through :-) - that gives me more clarity about what to expect.

Actually I might be trying to "abuse" the cache in my situation. What I'm basically trying to do is mirror all data of a certain type from a slow external system (customer addresses from some SAP system).
I need to be able to access these data through a REST interface I'm providing, and I need to be able to filter all of them quickly. The data don't need to be up to date by the second; minutes or even hours of stale data are totally acceptable in my use case.
So: yes, I want to update them, but I definitely don't want to lose any entry that has been removed from the cache automatically. So I set a cache size big enough to cover all these objects (35,000; they will consume roughly 10 MB of RAM).
I already wrote a hand-made solution yesterday, which I now need to expand with some proper prefetching, since I learned that a regular cache is maybe not a perfect fit for my needs. Anyhow, I have abstracted it enough to be able to plug in either my own thingie or cache2k to play around with.
I don't know if we want to discuss it further in a new issue, as I get the impression I'm moving out of scope for cache2k, and I don't want to hijack your library away from its intended use ;-) -- that's why I just added this as a comment here.

cruftex (Member) commented Mar 26, 2021

@MatthiasBechtold @dstango

The next alpha update with major changes to the bulk loader is out.
https://github.com/cache2k/cache2k/releases/tag/v2.1.2.Alpha

  • It does work and should be close to production quality now
  • I added a CoalescingBulkLoader implementation that will merge single refresh requests into a bulk request. That needs a bit more work and testing, though.

dstango commented Mar 27, 2021

@cruftex Thanks for putting effort into implementing the bulk merge. I will have to check it out.

cybuch (Author) commented Jul 6, 2021

Hey @cruftex

Thanks for your effort in implementing this feature. I've tested your solution and it works well. I had my own solution built on top of Caffeine; I replaced it with cache2k and it passed all the tests. I've also played with cache2k a bit and everything seems to be OK. I haven't tested this solution in terms of performance, but once you release it, I will.

cruftex (Member) commented Jul 6, 2021

@cybuch, that's great news! I was kept busy with other things these last months and will get back to this now.
As far as I remember, a bit more thorough concurrency testing was left.

If you have a bit of time, it would be nice to get feedback on which elements you are using in your scenario. Sync or async bulk loader? loadAll, getAll, invokeAll? Do you use the CoalescingBulkLoader?

cybuch (Author) commented Jul 6, 2021

@cruftex
I tested with the 2.1.2.Alpha version and, to be honest, I couldn't find the CoalescingBulkLoader, so I tested the sync bulk loader.

So in my scenario I tested loadAll and getAll, but with customization. By default, expired entries are loaded once they're needed (e.g. somebody asks for entry X) and the first call triggers the loader. That isn't acceptable for me, so I simply added a CacheEntryExpiredListener which collects expired entries, and when a threshold is met (e.g. 100 entries have expired) it reloads those entries with loadAll.

cruftex (Member) commented Jul 6, 2021

@cybuch
The CoalescingBulkLoader is in the cache2k-addon jar. You can wrap an existing async bulk loader, e.g.:

    AsyncBulkCacheLoader<Integer, Integer> bulkLoader = (keys, context, callback) -> {
      // implement bulk loader here
    };
    CoalescingBulkLoader<Integer, Integer> coalescingLoader = new CoalescingBulkLoader<>(
      bulkLoader,
      TimeReference.DEFAULT,
      100, // delay milliseconds
      50 // batch size
    );
    Cache<Integer, Integer> cache = Cache2kBuilder.of(Integer.class, Integer.class)
      .bulkLoader(coalescingLoader)
      .refreshAhead(true)
      .expireAfterWrite(5, TimeUnit.MINUTES)
      .build();

Maybe you want to test it?

cybuch (Author) commented Jul 6, 2021

@cruftex
Yeah, sure. Give me a few days and I'll play with it.

cybuch (Author) commented Jul 12, 2021

@cruftex
Hey, I've played a bit with the CoalescingBulkLoader and there's a problem. I set it up as you proposed, and when I try to get a single value from the cache it hangs forever. I tried to debug it and set a couple of breakpoints, and actually none of the CoalescingBulkLoader methods were called. I dumped the threads, and it turns out the application hangs in org.cache2k.core.Entry.waitForProcessing(Entry.java:417).

cruftex (Member) commented Jul 12, 2021

Hi @cybuch, thanks for trying. From what you describe, it could be related to the async processing scheme. The CoalescingBulkLoader only works with the AsyncBulkCacheLoader. The async scheme allows processing without an assigned thread, but upon completion a callback method has to be called. If that method is never called, the cache waits forever. cache2k supports mixed sync/async operation. The call cache.get() is synchronous and blocks until a result is available. Maybe that is what you are experiencing. The CoalescingBulkLoader has a basic test case, so I expect the basic functionality works.

Please double-check. If there is still a problem, please provide an example of how I can reproduce it and I will look into it instantly.

cybuch (Author) commented Jul 12, 2021

Hey @cruftex,
indeed, I didn't call the callback. I've fixed the issue and finally tested the CoalescingBulkLoader a bit. In general it works well, but there are some issues (I guess they may be part of the design).

  1. The cache hangs forever if the async bulk cache loader's callback is called with an empty map. I set up my cache to not allow null values and to make values eternal. I tried calling the CoalescingBulkLoader callback with an empty map, and I'd expect it to simply remove the keys for which there is no value from the cache; instead, the get call hung forever.
  2. I set the cache to make entries expire after some time, e.g. 2 seconds, and to make entries eternal. I simply watched what was going on: after the first get call the value was loaded and returned. Then the value expired and was reloaded with an updated value, but strange things happened. The peek call returned null; however, the get call returned the updated value, and the call to cache.requestInterface(CacheControl.class).getSize() returned the proper size. When I skipped the get call and simply waited a bit longer, I'd expect the cache to reload the entry every time it expired, plus or minus some delta. However, the entry was reloaded only once. The cache didn't call load for the entry more than once, and eventually the entry was removed from the cache and cache.requestInterface(CacheControl.class).getSize() returned 0.

cruftex (Member) commented Jul 12, 2021

Oh, yes, it works as designed.

  1. Cache hangs forever if async bulk cache loader's callback was called with empty map.

The bulk loader may return partial results. If you don't call the callback for a key whose load was requested, that key is simply still loading. Either provide data for the key, or call the failure callback onLoadFailure, which will complete all pending requests with the exception.
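
This completion contract can be modelled with futures (a toy stand-in, not the cache2k API): a requested key that is never completed stays pending forever, which is exactly the hang described above.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the async bulk contract: every requested key must eventually
// be completed, either with a value or with a failure; otherwise its future
// (and any cache.get() blocked on it) waits forever.
class PendingBulkLoad<K, V> {
  final Map<K, CompletableFuture<V>> pending = new ConcurrentHashMap<>();

  PendingBulkLoad(Set<K> keys) {
    for (K k : keys) pending.put(k, new CompletableFuture<>());
  }

  // Partial success is fine: completes only the keys present in the map.
  void onLoadSuccess(Map<K, V> data) {
    data.forEach((k, v) -> pending.get(k).complete(v));
  }

  // Completes *all* still-pending keys with the exception.
  void onLoadFailure(Throwable t) {
    pending.values().forEach(f -> {
      if (!f.isDone()) f.completeExceptionally(t);
    });
  }
}
```

Calling onLoadSuccess with an empty map completes nothing, so every key remains pending, which mirrors the "hangs forever" behaviour; onLoadFailure resolves all remaining keys at once.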

I've set the cache to make entries expire after some time e.g. 2 seconds and to make entries eternal. I've simply watched what's going on and after first get call the value was loaded and returned. Then value got expired and was reloaded with updated value., but strange things have happened.

This is intentional as well. With read-through operation you would typically use Cache.get() to either retrieve the current value from the cache or trigger a load. Even with refresh ahead enabled, the cache still needs to behave like a cache and only store mappings that are requested by the application. If the mapping is not requested any more after its first refresh, it expires normally. It's actually documented here: https://cache2k.org/docs/latest/user-guide.html#refresh-ahead
There is also an open issue for improvement to add more flexibility: #34

The behavior is counter-intuitive: you do load a value into the cache via refresh, but it doesn't appear. However, the semantics of peek don't say that anything gets loaded. A "correct" application should behave identically with or without refresh ahead enabled, just a bit faster.

The practical reason is: I want to expire entries that are not accessed any more, so I need to know whether an entry is still being accessed. However, OTOH, I don't want to add more bookkeeping code to the most critical path of a cache, which is the cache hit; that would make every program using the cache slower. That's why a refreshed entry isn't the same as a normal entry until its value is actually requested once via get. The cache takes an internal cache miss, to trigger the bookkeeping, and then stores/reveals the preloaded value.

cybuch (Author) commented Jul 13, 2021

Hey @cruftex, thanks for your answers.

The bulk loader may return partial results. If you don't call the callback for the key which load is requested, its simply still loading. Either provide data for the key, or call the failure call back onLoadFailure which will complete all pending requests with the exception.

That's exactly my case. The cache loader may return partial results, or no results at all, for a given set of keys, and then I'd simply like to remove those keys from the cache. Right now, if I call onLoadFailure instead of passing an empty map to onLoadSuccess, the get caller gets an exception, but they shouldn't. It's not an exceptional situation; the value is simply not there.

The practical reason is: I want to expire entries that are not accessed any more. So I need to know whether an entry is still accessed. However, OTOH I don't want to add more code for bookkeeping into the most critical path of a cache, which is the cache hit. This would make every program using the cache slower. That's why a refreshed entry isn't the same as a normal entry before its value actually is requested once via get. The cache is having an internal cache miss, to trigger bookkeeping and is storing/revealing the preloaded value.

I get your point of view. The question is: could there be another cache implementation that wraps the existing one to provide such a feature? Then it wouldn't bog down cache performance for regular users (although, to be honest, if this behaviour were hidden behind some flag, it would probably be JITted away, so regular users wouldn't notice any difference).

So my usage is: I've got a cache that keeps and refreshes entries forever. Actually it's not forever, because, as stated in the first paragraph, the data may not be available any more. The cache size is limited, and if it's about to exceed its size, some entries are removed using an LRU policy. For me that's a huge performance gain: all the data the user needs is in the cache, even if it's not accessed that often.

cruftex (Member) commented Jul 13, 2021

Hi @cybuch. Although we are now a bit off topic w.r.t. the issue, I am happy to help.
These are actually two good questions for Stack Overflow.

That's exactly my case. Cache loader may return partial results or no results at all for given set of keys and then I'd like simply to remove those keys from the cache.

You have two options here:

Negative caching: In case keys are requested that have no data associated, there is a problem: if you don't have a mapping for them in the cache, you will repeatedly invoke the loader whenever those keys are requested again. The cache remembering that there is no data is called negative caching. cache2k supports this by allowing a null value to be stored. You need to enable null values via permitNullValues, then make the loader return null for non-existing data.

Remove entries via the loader: You can do that as well, with the caveat that repeated requests will invoke the loader again. If this happens only rarely, it might be okay. You can store nothing, or remove a mapping, if a key yields no data. To do so, set permitNullValues to false. For a key that maps to no data, return null from the loader. Establish a special expiry policy that returns 0, meaning immediate expiry, for a null value. This stops the mapping from being stored in the cache. If you don't have an expiry policy yet, it's simple in this case:
.expiryPolicy((key, value, loadTime, currentEntry) -> value == null ? 0 : Long.MAX_VALUE)
For non-null values the expiry time configured with expireAfterWrite will be used.

Side note: if your keys come from user input, you always open the door to DoS attacks here. With negative caching, a user has the opportunity to sweep the cache and exhaust memory resources. Without negative caching, the loader can be invoked continuously by the user and exhaust computing resources. Countermeasures we used are rate limiters that detect a high miss rate from a single IP. Another option is Bloom filters, see: https://en.wikipedia.org/wiki/Bloom_filter#Examples

I get your point of view. The question is, could be there another cache implementation that could wrap existing cache implementation to provide such feature? Then it wouldn't bog down cache performance for regular users.

You would need it whenever refresh ahead is enabled. Refresh ahead makes sense pretty much always when there is expiry and a loader, so every "regular user" would potentially use it. The semantic mismatch would still be there: it would be just as counter-intuitive if peek caused a refresh, since peek isn't meant to invoke the loader.

The current solution adds the functionality elegantly without the need of additional data structures or extending the critical code path.

Let's open another issue for this discussion: #172

So my usage is: I've got a cache, that keeps and refreshes entries forever. .... For me that's a huge performance gain, when all the data that user needs is in cache, even if it's not accessed so often.

Got it. That would be a use case for a RefreshPolicy and issue #34. The expiry of the cache entry and the expiry, a.k.a. refresh interval, of the cached data need to be separated.

We have a similar problem in our applications. Ironically, the response time at night is higher than during the day. The reason: the few requests at night don't keep the caches warm.

Can you add a comment to #34 and roughly describe your use case?

Another thing: You said you were using Caffeine before, is there any particular feature that made you choose cache2k and not Caffeine?

cruftex (Member) commented Jul 13, 2021

@cybuch I thought about your scenario with "endless refresh". You can do that quite elegantly as well. Here is an example of how I would do it:

    AsyncBulkCacheLoader<Integer, Integer> bulkLoader = (keys, context, callback) -> {
      // implement bulk loader here
    };
    CoalescingBulkLoader<Integer, Integer> coalescingLoader = new CoalescingBulkLoader<>(
      bulkLoader,
      TimeReference.DEFAULT, // the cache might have a different time reference
      100, // delay milliseconds before sending the request
      50 // maximum batch size
    );
    AtomicReference<Cache<Integer, Integer>> cacheRef = new AtomicReference<>();
    Cache<Integer, Integer> cache = Cache2kBuilder.of(Integer.class, Integer.class)
      .loader((AsyncCacheLoader<Integer, Integer>) (key, context, callback) -> {
        if (context.getCurrentEntry() == null) {
          // no entry yet, that is the initial load
          coalescingLoader.load(key, context, callback);
        } else {
          // already has an entry, that is a refresh
          // trigger load but complete with current data immediately
          coalescingLoader.load(key, context, new AsyncCacheLoader.Callback<Integer>() {
            @Override
            public void onLoadSuccess(Integer value) { cacheRef.get().put(key, value); }
            @Override
            public void onLoadFailure(Throwable t) {
              // there is still data, ignore or log?
            }
          });
          callback.onLoadSuccess(context.getCurrentEntry().getValue());
        }
      })
      .refreshAhead(true)
      .expireAfterWrite(5, TimeUnit.MINUTES) // trigger a refresh every 5 minutes
      // remove mappings which have no data any more
      .expiryPolicy((key, value, loadTime, currentEntry) -> value == null ? 0 : Long.MAX_VALUE)
      .build();
    cacheRef.set(cache);

This is similar to your approach with the expiry listener. Using the loader and refresh has the advantage that the data stays available. Since the refresh now does a cache.put, the data becomes visible and the normal expiry starts again repeatedly. The entry is in the special "refreshed" state only between expiry and completion of the loader. I didn't test it, but it should run alright.

cybuch (Author) commented Jul 13, 2021

Hi @cruftex

Removing entries from the loader by setting an expiry policy works like a charm for me. Thanks for sharing the info about DoS attacks, but the sources of the data in my case are in-house clients, so that's not a concern for me.

Can you add a comment to #34 and roughly describe your use case?

Sure, I will.

We have a similar problem in our applications. Ironically the response time at night is higher than at day time. The reason is only a few requests at night don't keep the caches warm.

We faced the same problem, but using caches that reload the values forever solved this case.

You said you were using Caffeine before, is there any particular feature that made you choose cache2k and not Caffeine?

The problem was that Caffeine didn't have a solution for my needs, so I built a custom solution around Caffeine. However, it's not easy to maintain, and the code is also hard to understand. I'm about to hand the project over to another team, and it would be nice to use a solution provided by the library (whether it's Caffeine or cache2k), so the other team wouldn't have to worry about cache mechanisms that I implemented.
TBH I'm not sure whether Caffeine has a solution for me right now; I'd have to check. But with the new cache2k feature I'm testing, cache2k has almost everything I need. Maybe I'll be able to live with the fact that unused entries expire; I'd have to do some tests under real, but limited, load.

BTW do you know when you'll be releasing this feature as regular version?

cruftex (Member) commented Jul 22, 2021

Just released another update, mainly for the bulk support. Some concurrency issues fixed for corner cases, plus API improvements. The CoalescingBulkLoader can also be configured declaratively, for example:

    Cache<Integer, Integer> cache = Cache2kBuilder.of(Integer.class, Integer.class)
      .bulkLoader((keys, context, callback) -> { ... })
      .enableWith(CoalescingBulkLoaderSupport.class, b -> b
        .maxBatchSize(100)
        .maxDelay(0, TimeUnit.MILLISECONDS))
      .build();

Link to the release: https://github.com/cache2k/cache2k/releases/tag/v2.1.3.Alpha

cruftex (Member) commented Jul 22, 2021

@cybuch

BTW do you know when you'll be releasing this feature as regular version?

I plan to remove the old loader classes in org.cache2k.integration and some other deprecated things to clean up the interface.
Expect it to be finished in two weeks, since I am on vacation next week.

cruftex (Member) commented Aug 3, 2021

Just released 2.1.4.Beta as a final step to clean up the interface. The old loadAll and reloadAll methods that worked with a listener are removed, in favor of the new methods with CompletableFuture introduced in 2.0.

See: https://github.com/cache2k/cache2k/releases/tag/v2.1.4.Beta

cruftex (Member) commented Aug 5, 2021

Just released 2.1.5.Beta, with concurrency issues and exception handling fixed in the CoalescingBulkLoader. The async bulk loader callback interface was enhanced to support different exceptions per key, to better support scenarios where incoming requests are split into multiple smaller bulk requests. Side note: the CoalescingBulkLoader might split incoming bulk requests and can be used to limit the maximum bulk request size.

See: https://github.com/cache2k/cache2k/releases/tag/v2.1.5.Beta

Possibly, this is the last change for version 2.2.

cruftex (Member) commented Aug 6, 2021

Did some more testing of the latest beta with some of our application code. Everything works correctly.

For the CoalescingBulkLoader as it is designed now, I found that it could have a negative impact in some scenarios. So there is more room for improvement, which I outlined in #173. I don't intend to do it now, since what we have now is worth releasing, and it's better to gather some more feedback first.

The only thing left is a pass over the documentation, which should mention the new bulk interfaces.

cruftex (Member) commented Aug 9, 2021

Decided to do #173 right away. It's now the default that the CoalescingBulkLoader only combines requests triggered by a refresh.
This can be controlled by the configuration option refreshOnly. The documentation about "cache loading" has been updated to at least contain some hints on the new functionality.
