Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Values currently being loaded shows up as null entries in synchronous().asMap() #1626

Closed
chaoren opened this issue Apr 2, 2024 · 9 comments

Comments

@chaoren
Copy link
Collaborator

chaoren commented Apr 2, 2024

"A mapping is not present if the value is currently being loaded" according to

* Returns a view of the entries stored in this cache as a synchronous {@link Cache}. A mapping is
* not present if the value is currently being loaded. Modifications made to the synchronous cache

This might be an intentional design choice, but a mapping that is "not present" is not the same as being mapped to null (with regard to containsKey, size(), etc), so this documentation is inconsistent with the actual behaviors of synchronous().asMap().

Here's a test case demonstrating this:

@Test
public void synchronousAsMapLoading() {
  AsyncCache<String, String> cache = Caffeine.newBuilder().buildAsync();
  cache.put("key", new CompletableFuture<>());
  assertThat(cache.synchronous().asMap()).doesNotContainKey("key");
  assertThat(cache.synchronous().asMap()).isEmpty();
}
@ben-manes
Copy link
Owner

I think this is correct in behavior and documentation, but I understand its a little confusing. The aggregated calls like Map.size() and isEmpty() are not strict, as documented by Cache.estimatedSize().

/**
* Returns the approximate number of entries in this cache. The value returned is an estimate; the
* actual count may differ if there are concurrent insertions or removals, or if some entries are
* pending removal due to expiration or weak/soft reference collection. In the case of stale
* entries this inaccuracy can be mitigated by performing a {@link #cleanUp()} first.
*
* @return the estimated number of mappings
*/
@NonNegative
long estimatedSize();

Perhaps the asMap() views should include a snippet about the approximation. An expired or collected entry will also be included in those calls even though they cannot be read. That was also true in Guava. This is somewhat expected to be influx due to cache describing itself as a "semi-persistent mapping".

The synchronous view would be not present in the same way that an in-flight load for ConcurrentHashMap.computeIfAbsent is not yet present. Internally it inserts a reservation node and while the load is in-flight it is not considered present, so a null is returned. In an asynchronous cache the load completion cannot be atomic with the size counter and even ConcurrentHashMap doesn't try on their own (just a much smaller race). You can also call asyncCache.asMap() for the future map and make the distinction, so the synchronous view is a best effort of what is hopefully reasonable behavior.

@chaoren
Copy link
Collaborator Author

chaoren commented Apr 4, 2024

An expired or collected entry will also be included in those calls even though they cannot be read.

I see those inaccuracies as just being slightly out of date. With loading entries, you can have a sequence of observed events like this:

  • cache.synchronous().asMap().containsKey("loading")
  • cache.synchronous().asMap().get("loading") == null
  • cache.synchronous().asMap().containsKey("loading")

which shouldn't be possible to observe unless an entry was inserted and then removed.

It's also especially inaccurate if the loading entry ends up being cancelled for whatever reason. If that happened, the entry shouldn't have been observable at any point. It's not just observing an out-of-date state of the cache in that case.

@ben-manes
Copy link
Owner

If you're asking for containsKey to ignore async loading values, that seems like a reasonable change. I'd agree that key lookups should pretend like the loading value is not present.

If you are asking for the size, isEmpty, or other aggregated methods to mask it then I think that will be problematic. If we maintain our own size counter outside of ConcurrentHashMap, it can't be represented atomically with the future's state change. The dependent actions are run in LIFO order so there will be a delay, which could be large if a user's action is slow. The synchronous view will always have to be best effort and not linearizable due to the nature of the async mappings.

@chaoren
Copy link
Collaborator Author

chaoren commented Apr 4, 2024

it can't be represented atomically with the future's state change

Right, I don't think it needs to be atomically accurate. It can be slightly out of date like how evictions and invalidations are handled.

The synchronous view will always have to be best effort and not linearizable

I agree it has to be best effort, but I don't see why it can't be linearizable.

@ben-manes
Copy link
Owner

Being linearizable can be a good or bad characteristic, because it forces certain atomicity in visible state. A ConcurrentHashMap.computeIfAbsent is linearizable but blocks other writes, like a concurrent put or remove to that loading entry. It is stricter than sequentially consistency by requiring the time order of operations. This would greatly complicate and negatively impact when one wants to work on the async map vs the sync map, as both would have to fall into the worst case with significant delays to maintain a strict time ordering. For the synchronous cache we assert with Lincheck but don't try to make that promise for the async one.

@chaoren
Copy link
Collaborator Author

chaoren commented Apr 4, 2024

I think I'm only asking for sequential consistency and not linearizability. Let's say you start loading an entry, but it gets cancelled before it's finished. You shouldn't be able to see the map as being non-empty at any point, since that never happens from the synchronous view of the cache.

I think these inaccuracies should be tolerated in the synchronous().asMap() view:

  • a completed entry being treated as if it were still loading
  • an evicted entry being treated as if it were still present

but not these

  • a still loading entry being treated as if it had completed
  • a still present entry being treated as if it had been evicted

@ben-manes
Copy link
Owner

Maybe you can enumerate the changes you'd like to see? Right now containsKey returning false if still loading seems like a mistake that violates those assumptions. I'm unsure what else might, as I think we agree and it probably works mostly correct.

@chaoren
Copy link
Collaborator Author

chaoren commented Apr 4, 2024

Maybe you can enumerate the changes you'd like to see?

For AsyncCache.synchronous().asMap():

  • containsKey(loading) should return false for loading keys.
  • size() should not include loading entries.
  • keySet(), values(), and entrySet() should not contain loading entries.

Let's say we have a cache that's completely empty except for one entry that's currently loading and will not finish any time soon. I expect:

  • cache.synchronous().asMap().containsKey(loading) to be false
  • cache.synchronous().asMap().size() to be 0
  • cache.synchronous().asMap().isEmpty() to be true
  • cache.synchronous().asMap().keySet().isEmpty() to be true
  • cache.synchronous().asMap().values().isEmpty() to be true
  • cache.synchronous().asMap().entrySet().isEmpty() to be true

Basically, I just want the documentation on synchronous() to be accurate: "A mapping is not present if the value is currently being loaded."

Right now containsKey returning false if still loading seems like a mistake that violates those assumptions.

You meant to say that right now it's returning true, right? We want it to return false.

Interestingly, right now cache.synchronous().asMap().values() would show a size() of 1, but cache.synchronous().asMap().values.iterator().hasNext() would correctly return false.

@ben-manes
Copy link
Owner

Sorry, yes I agree with the containsKey fix and mistyped.

I think only key-based operations should be strict and only allow a successfully completed future value to be treated as present. A loading, failed, or future holding a null value should be treated as absent. So containsKey is a nice bug fix, thanks!

I disagree with size / isEmpty as they cross-cut multiple entries. That is already going to be inconsistent with the iterator for other reasons.

Improvements to the documentation are always worthwhile and any suggestions would of course be welcome.

ben-manes added a commit that referenced this issue Apr 15, 2024
fixes #1626)

The keySet and its iterator will now suppress in-flight entries for
the contains test and by its iterator. The retain and removal methods
will discard in-flight mappings, like the entrySet view does. This is
intentional as it is better to overly discard data then keep stale
contents due to incorrect linearization assumptions (of which async can
offer few). The synchronous views are best-effort, lenient, and try to
match a user's likely expected or intended behavior while being
conservative by being cautiously safe. Thus, small leaks of in-flight
behavior is preferrable, especially given the concurrent nature of the
use-case, and we encourage using the async asMap view when
orchastrating more nuanced behavior.
ben-manes added a commit that referenced this issue Apr 15, 2024
 #1626)

The keySet and its iterator will now suppress in-flight entries for
the contains test and by its iterator. The retain and removal methods
will discard in-flight mappings, like the entrySet view does. This is
intentional as it is better to overly discard data then keep stale
contents due to incorrect linearization assumptions (of which async can
offer few). The synchronous views are best-effort, lenient, and try to
match a user's likely expected or intended behavior while being
conservative by being cautiously safe. Thus, small leaks of in-flight
behavior is preferrable, especially given the concurrent nature of the
use-case, and we encourage using the async asMap view when
orchastrating more nuanced behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants