Eviction with CacheWriter.delete does not appear atomic to Cache clients #186

Closed
ksperling opened this issue Sep 19, 2017 · 8 comments

@ksperling

I'm using a LoadingCache to manage meta-data for a cache where the actual data is stored in files on disk; the CacheWriter.delete callback is used to delete the file from disk when the corresponding cache entry is evicted.

Overall this seems to be working OK; however, I am seeing cases where cached entries seem to still be returned even though CacheWriter.delete has already been called, which then results in a FileNotFoundException in the code trying to use that cache entry. Note that I do have additional locking in place outside of the cache in the form of a Map<CacheKey, Lock>, and both the cache read (including the subsequent file access) and the delete callback are guarded by this lock.
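
For illustration, here is a minimal sketch of that arrangement, assuming Caffeine 2.x (where CacheWriter is available); CacheKey, FileMetadata, loadMetadata and the locks map are hypothetical stand-ins for my real code:

```java
// Hypothetical sketch: file-backed cache whose CacheWriter.delete removes the file on eviction.
Map<CacheKey, Lock> locks = new ConcurrentHashMap<>();

LoadingCache<CacheKey, FileMetadata> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .writer(new CacheWriter<CacheKey, FileMetadata>() {
      @Override public void write(CacheKey key, FileMetadata value) {
        // nothing to do on write; the file is created by the loader
      }

      @Override public void delete(CacheKey key, FileMetadata value, RemovalCause cause) {
        Lock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
          Files.deleteIfExists(value.path());   // remove the backing file from disk
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        } finally {
          lock.unlock();
        }
      }
    })
    .build(key -> loadMetadata(key));           // reads the file and returns its meta-data
```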

This gist demonstrates the issue (I'm using Caffeine 2.5.5) https://gist.github.com/ksperling/241c865526a2df9ad0dc9d02dac4393c

I'm not sure whether this is a bug or whether I'm interpreting the guarantees around CacheWriter incorrectly.

@ben-manes (Owner)

It is atomic with respect to other writes, as the removal operation is performed with a compute method. However, a read may be interleaved with writes, e.g. it may occur during the write or traverse a stale link chain (see ConcurrentHashMap). It sounds like you are assuming exclusive access for a read/write, which would incur a high penalty. The CacheWriter is an interceptor for ordering writes to a particular key, but it doesn't block reads. You could perform all reads as a compute that returns the same value, but that degrades reads to the throughput of a write, which is unlikely to be acceptable.
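
For example, a read expressed as an identity compute (just a sketch, reusing the hypothetical names from your description):

```java
// Runs under the entry's lock, so it serializes with the removal's computeIfPresent,
// but every read now pays the cost of a write.
FileMetadata value = cache.asMap().computeIfPresent(key, (k, v) -> v);
```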

Most likely you need to perform some type of liveness validation, e.g. in your gist, retry if the read is determined to be stale. You might instead prefer using phantom references for resource cleanup, to ensure there is no direct usage. It's hard to advise without more details, as unfortunately these are the types of races you have to design for.

@ben-manes (Owner) commented Sep 19, 2017

Actually, in your case of lock striping on the entry, there may not be a large penalty to using cache.asMap().compute(key, (k, v) -> nv). That would stripe on the map's entries, which is documented as 1 / (8 * #elements). So delegating to the cache for locking might actually be simpler and offer similar performance.
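
A sketch of that shape (again with hypothetical names; rewriteFile stands in for whatever file work you do):

```java
// Read-modify-write under the cache's own per-entry lock instead of the external Map<CacheKey, Lock>.
cache.asMap().compute(key, (k, current) -> {
  // Do the file I/O here; no other compute or write on this key can interleave.
  FileMetadata updated = rewriteFile(k, current);
  return updated;   // or return `current` to leave the entry unchanged
});
```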

@ksperling (Author)

Thanks for the lightning fast response.

I guess the bit that puzzled me is why my separate lock doesn't protect me from this case. So what's happening is that the thread doing the eviction does a computeIfPresent, and my Writer.delete method invalidates under my external lock (but has no way of holding that lock until the entry is actually no longer visible in the cache). Then a reader comes along, takes the external lock, and does a plain old get, which can return the stale entry because it's not a compute* call?

In my real code the external lock is actually a ReadWriteUpdateLock, with most read and write operations just taking a shared lock during most of the disk IO, so I can't easily do away with this in favour of just letting the Cache do the locking. Just using a no-op compute instead of get may work though.

The other thought I had was to maybe use a RemovalListener instead of a CacheWriter; presumably this is called after an entry is guaranteed to no longer be visible from the cache. The main issue there might be preventing the callback from clobbering a file that has been evicted but loaded again before the removal listener gets run.

@ben-manes (Owner)

When the Writer.delete call completes, the computeIfPresent hasn't yet returned. That means there is a gap between when delete unlocks and when ConcurrentHashMap fully removes the entry. There is also the possibility of a reader walking a stale bucket chain: in Java 5 the chain was immutably rebuilt on every write, and the Java 8 redesign may have similar properties. This is a small gap, but it is enlarged by threads context switching out at precisely the wrong times.

If all operations were under a compute, then they would be performed within a synchronized (entry) block inside the hash table. Then you would have a guarantee that no read or write sees a stale entry, because each must operate with exclusive access.

I would probably have the read/write operations validate under the lock and retry if they see a dead entry. That requires a loop around the get call to re-read if necessary. This way you don't incur much of a penalty, since the retry case is rare and most often the entry is still valid.
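
A sketch of that retry loop (isLive is a hypothetical staleness check, e.g. verifying the backing file still exists):

```java
// The common path is a plain get; only the rare stale read loops around for a re-read.
FileMetadata meta;
do {
  meta = cache.get(key);   // re-loads once the stale entry has been fully removed
} while (!isLive(meta));
```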

@ksperling (Author)

Just tried using compute in the test code, and it ends up deadlocking because the two threads acquire the external lock and the entry lock in opposite orders. So it seems your suggestion of get-and-retry is the only feasible approach.

Thanks for your help!

@ben-manes (Owner)

Oh, right. The compute suggestion assumed that you didn't need the external lock anymore. But it sounds like you still want that, so it wouldn't help.

The validate-and-retry approach seems to work in your test code. Good luck!

@ksperling (Author)

Just thought I'd mention how I solved this in the end in case anyone runs across a similar scenario:

Instead of a CacheWriter I'm using a RemovalListener. Within the listener I acquire my external lock as before, and then check that the key has not been re-added by testing cache.getIfPresent(key) == null before deleting the file. In my loading function I also handle the case of "finding" that the value I'm about to load actually hasn't been evicted yet; the actual loading of the data can simply be skipped in that case.
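
A sketch of what that looks like (CacheKey, FileMetadata and readOrReuseFile are hypothetical stand-ins; the cache and lock map are fields so the listener can look the key back up):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import com.github.benmanes.caffeine.cache.RemovalCause;

class FileMetadataCache {
  private final Map<CacheKey, Lock> locks = new ConcurrentHashMap<>();
  private final LoadingCache<CacheKey, FileMetadata> cache;

  FileMetadataCache() {
    cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .removalListener((CacheKey key, FileMetadata value, RemovalCause cause) ->
            onRemoval(key, value, cause))
        .build(this::load);
  }

  private FileMetadata load(CacheKey key) {
    // If the file is still on disk from a not-yet-processed eviction, reuse it and skip
    // the actual load; the removal listener's check below then sees the key present again.
    return readOrReuseFile(key);
  }

  private void onRemoval(CacheKey key, FileMetadata value, RemovalCause cause) {
    Lock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
    lock.lock();
    try {
      if (cache.getIfPresent(key) == null) {    // only delete if the key was not re-added
        Files.deleteIfExists(value.path());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      lock.unlock();
    }
  }
}
```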

@ben-manes (Owner)

Nice job. You can also use cache.asMap().containsKey(key).
