Mitigate LRU struct tearing using SeqLock #593

Merged on May 26, 2024 (14 commits)

bitfaster (Owner) commented May 20, 2024

If the cached value is a value type larger than the native pointer size (e.g. a Guid), writes are not atomic. If such a value is updated and read concurrently, readers may see a torn value (part old, part new).

There are at least two ways to solve this:

  1. Lock the item containing the value on read. This PR implements option 1 using a SeqLock.
  2. Make a new LruItem and update the dictionary. See PR Mitigate value type torn writes during LRU update (enqueue) #545.

Option 1 is preferred, because option 2 can make cache size unstable (stale values consume queue slots, pushing out live cache entries).

SeqLock pros and cons:

  • Pros: reads are lock free and cannot starve writes. The impact on Read and Read+Write throughput is low, and lookup latency hardly changes. A good choice when reads outnumber writes or writes are infrequent.
  • Cons:
    • Readers can live lock. This is reproducible in a unit test that updates an LruItem in a very tight loop, but the impact is smaller when exercised through the cache code path (see ConcurrentLruSoakTests.WhenValueIsBigStructNoLiveLock vs LruItemSoakTests.DetectTornStruct). Updating the same item in an extremely tight loop is not the common case. Live lock is mitigated in this improved version, but the mitigation requires more memory and returns a stale result.
    • LruItem size increases by 4 bytes. For the common case (x64, object/Guid key with object reference), this fits inside the existing padding. In other cases the overhead is always paid, and a user has no way to remove it.

Atomic/Scoped etc.

The update code paths for atomic/scoped caches generate new wrapper class instances and call cache update to replace the object. They are therefore not susceptible to torn reads - the structs inside them are not changed after the wrapper is created.
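
The reasoning above can be illustrated with a minimal sketch. It is not the library's atomic/scoped implementation: `ImmutableBox` and `Slot` are hypothetical names invented for this example. It assumes only that .NET reference assignments are atomic, so a reader sees either the old wrapper or the new one, never a mix of the two.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: the struct is set once in the constructor
// and never changed afterwards.
public sealed class ImmutableBox<V> where V : struct
{
    public ImmutableBox(V value) { Value = value; }

    public V Value { get; }
}

// Hypothetical holder standing in for a cache slot.
public sealed class Slot<V> where V : struct
{
    private ImmutableBox<V> box;

    public Slot(V initial) { box = new ImmutableBox<V>(initial); }

    // Reads one reference atomically, then copies the struct out of a
    // wrapper that no writer will ever modify.
    public V Read() => Volatile.Read(ref box).Value;

    // An update allocates a new wrapper and swaps the reference; the
    // struct inside the old wrapper is left untouched.
    public void Update(V value) => Volatile.Write(ref box, new ImmutableBox<V>(value));
}
```

The trade-off in this sketch is an allocation per update in exchange for needing no lock or sequence counter on the read path.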

Using a lock statement (be8f0ed)

A naive implementation that adds a C# lock statement to the read path gives LRU roughly the same latency as LFU for a Guid value. Since LruItem is already locked on update, taking the same lock on read prevents torn reads. The cost is the overhead of the lock itself, plus lock contention between concurrent reads.
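
A rough sketch of the lock-on-read approach, for illustration only. `LockedItem` and its `gate` field are hypothetical and are not the actual LruItem code from commit be8f0ed; the sketch only shows readers and writers sharing one monitor.

```csharp
using System;

// Hypothetical stand-in for LruItem; names are assumptions for this example.
public sealed class LockedItem<K, V>
{
    private readonly object gate = new object();
    private V value;

    public LockedItem(K key, V value)
    {
        Key = key;
        this.value = value;
    }

    public K Key { get; }

    public V Value
    {
        // The reader takes the same monitor as the writer, so a read can
        // never interleave with a partially completed multi-word write.
        get { lock (gate) { return value; } }

        // Writers were already serialized; readers now queue behind them
        // and behind each other, which is where the contention comes from.
        set { lock (gate) { this.value = value; } }
    }
}
```

Every read in this sketch acquires and releases the monitor even when no write is in flight, which is consistent with the reduced read throughput reported for this variant.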

[Column chart: BitFaster.Caching.Benchmarks.LruJustGetOrAddGuid, .NET 6.0]
[Column chart: BitFaster.Caching.Benchmarks.LruJustGetOrAddGuid, .NET Framework 4.8]

Read throughput is significantly reduced:
[Chart: Results_Read_500]

Using SeqLock (87fcd06)

See SeqLock. This is a good fit for our scenario: updates are already serialized to a single writer at a time, and reads can remain lock free and fast.

  • We continue to lock on update (to support synchronization between two writers) and add a counter to enable consistency for readers.
  • LruItem now contains an extra 32-bit integer field to represent the SeqLock counter. This fits inside the padding for LruItem<object,object>, so the common case incurs no memory overhead.
  • Reads are lock free and retry if the sequence number changed while data was being read.
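
The three points above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code merged in this PR: `SeqLockItem`, `writeGate` and `sequence` are hypothetical names, and the live lock mitigation mentioned earlier is left out.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for LruItem; names are assumptions for this example.
public sealed class SeqLockItem<V> where V : struct
{
    private readonly object writeGate = new object();
    private int sequence; // even = stable, odd = write in progress
    private V value;

    public SeqLockItem(V value) { this.value = value; }

    // Lock-free read: copy the value, then accept the copy only if the
    // sequence number was even and did not change during the copy.
    public V Read()
    {
        var spin = new SpinWait();
        while (true)
        {
            int start = Volatile.Read(ref sequence);
            if ((start & 1) == 0)
            {
                V copy = value;
                Thread.MemoryBarrier(); // finish the copy before re-checking
                if (Volatile.Read(ref sequence) == start)
                {
                    return copy;
                }
            }
            spin.SpinOnce(); // a write overlapped the read, so retry
        }
    }

    // Writers still take a lock so that two writers cannot interleave.
    public void Write(V newValue)
    {
        lock (writeGate)
        {
            Interlocked.Increment(ref sequence); // now odd: write in progress
            value = newValue;
            Interlocked.Increment(ref sequence); // now even: value is stable
        }
    }
}
```

In this sketch a reader never blocks a writer, but a reader can spin indefinitely if writes to the same item never pause, which is the live lock risk listed under the cons.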

[Column chart: BitFaster.Caching.Benchmarks.LruJustGetOrAddGuid, .NET 6.0]
[Column chart: BitFaster.Caching.Benchmarks.LruJustGetOrAddGuid, .NET Framework 4.8]
[Chart: Results_Read_500]
[Chart: Results_ReadWrite_500]

coveralls commented May 20, 2024

Coverage Status

coverage: 99.141% (+0.01%) from 99.13%
when pulling 7a88edf on users/alexpeck/tornwrite2
into 8934324 on main.

bitfaster changed the title from "Mitigate value type torn writes during LRU update" to "Mitigate value type torn writes during LRU update (SeqLock)" on May 20, 2024
bitfaster marked this pull request as ready for review on May 21, 2024 01:05
bitfaster changed the title from "Mitigate value type torn writes during LRU update (SeqLock)" to "Mitigate LRU update value type tearing (SeqLock)" on May 21, 2024
bitfaster changed the title from "Mitigate LRU update value type tearing (SeqLock)" to "Mitigate LRU value type tearing (SeqLock)" on May 21, 2024
bitfaster changed the title from "Mitigate LRU value type tearing (SeqLock)" to "Mitigate LRU struct tearing using SeqLock" on May 24, 2024
bitfaster merged commit 4c0e26a into main on May 26, 2024
13 checks passed