A high-performance, concurrent in-memory cache for Go with self-tuning frequency-based eviction.
CloxCache uses a self-tuning protected-frequency eviction strategy that learns optimal thresholds for your workload, achieving 5-30% better hit rates than Otter (S3-FIFO) while providing 3-5x better concurrent throughput.
- Linearizable: Synchronous writes guarantee that a `Get()` immediately after a `Put()` returns the value. Many caches (including Otter) use async/buffered writes that break this guarantee.
- Self-tuning thresholds: Learns optimal graduation-rate thresholds using gradient descent on hit rate
- Adaptive eviction threshold: Automatically adjusts protection level (k) based on workload characteristics
- Scan resistant: Maintains high hit rates even under scan-heavy workloads
- Lock-free reads: Reads use only atomic operations, enabling massive read concurrency
- Sharded writes: Minimises write contention across CPU cores
- Generic: Supports `string` or `[]byte` keys with any value type
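For instance, a byte-keyed cache is declared the same way as a string-keyed one (a minimal sketch; `*Page` is an illustrative value type, and the constructor is shown in the quick start below):

```go
cfg := cache.ConfigFromCapacity(10_000)
pages := cache.NewCloxCache[[]byte, *Page](cfg) // keys: []byte, values: *Page
defer pages.Close()
```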
CloxCache protects frequently accessed items from eviction. The protection threshold (k) adapts automatically:
- Items with `freq > k` are protected from eviction
- `k` adapts per shard based on the observed graduation rate
- The graduation-rate thresholds themselves are learned based on whether changes to `k` improve hit rate
```go
// Items with freq > k are protected from eviction
// k adapts per-shard based on observed graduation rate
// Thresholds self-tune via gradient descent on hit rate
if item.freq <= k {
    // Candidate for eviction
} else {
    // Protected (likely working set)
}
```

Install:

```sh
go get github.com/bottledcode/cloxcache
```

A minimal example:

```go
package main

import (
    "fmt"

    "github.com/bottledcode/cloxcache/cache"
)

func main() {
    // Create a cache for 10,000 entries
    cfg := cache.ConfigFromCapacity(10000)
    c := cache.NewCloxCache[string, *MyData](cfg)
    defer c.Close()

    c.Put("user:123", &MyData{Name: "Alice"})

    if data, ok := c.Get("user:123"); ok {
        fmt.Println(data.Name)
    }
}

type MyData struct {
    Name string
}
```

Tested on real-world cache traces (4x back-to-back iterations to warm the cache):
| Trace | Capacity | CloxCache | Otter (S3-FIFO) | vs Otter |
|---|---|---|---|---|
| OLTP | 20% | 71.34% | 70.54% | +0.80% |
| OLTP | 50% | 84.24% | 80.99% | +3.25% |
| OLTP | 70% | 92.00% | 80.15% | +11.85% |
| DS1 | 1% | 11.21% | 6.42% | +4.79% |
| DS1 | 50% | 58.10% | 43.86% | +14.24% |
| DS1 | 70% | 78.83% | 48.70% | +30.13% |
| Twitter52 | 30% | 84.84% | 84.11% | +0.73% |
| Twitter52 | 50% | 89.94% | 87.25% | +2.69% |
| Twitter52 | 70% | 94.57% | 87.09% | +7.48% |
CloxCache outperforms Otter (S3-FIFO) on all three traces at medium-to-high cache capacities, where the ghost queue provides excellent scan resistance.
Raw benchmark output
=== RUN TestCapacitySweep/OLTP
Trace: OLTP, Accesses: 500000 (4x = 2000000), Unique keys: 121783
| Capacity | CloxCache | SimpleLRU | Otter | Clox vs LRU | Clox vs Otter | Avg K |
|----------|-----------|-----------|-------|-------------|---------------|-------|
| 1217 (1%) | 36.70% | 36.44% | 45.11% | +0.25% | -8.41% | 2.53 |
| 6089 (5%) | 55.02% | 55.45% | 58.23% | -0.42% | -3.21% | 2.59 |
| 12178 (10%) | 61.54% | 61.65% | 64.16% | -0.11% | -2.62% | 2.58 |
| 24356 (20%) | 71.34% | 67.32% | 70.54% | +4.02% | +0.80% | 2.91 |
| 36534 (30%) | 74.30% | 70.28% | 74.46% | +4.01% | -0.17% | 2.97 |
| 48713 (40%) | 78.10% | 71.90% | 79.27% | +6.20% | -1.16% | 2.98 |
| 60891 (50%) | 84.24% | 74.26% | 80.99% | +9.98% | +3.25% | 2.70 |
| 73069 (60%) | 91.19% | 76.33% | 79.13% | +14.86% | +12.07% | 2.03 |
| 85248 (70%) | 92.00% | 78.24% | 80.15% | +13.77% | +11.85% | 2.00 |
| 97426 (80%) | 92.65% | 79.83% | 84.13% | +12.82% | +8.52% | 2.00 |
=== RUN TestCapacitySweep/DS1
Trace: DS1, Accesses: 500000 (4x = 2000000), Unique keys: 320208
| Capacity | CloxCache | SimpleLRU | Otter | Clox vs LRU | Clox vs Otter | Avg K |
|----------|-----------|-----------|-------|-------------|---------------|-------|
| 3202 (1%) | 11.21% | 1.51% | 6.42% | +9.70% | +4.79% | 2.70 |
| 16010 (5%) | 16.12% | 5.54% | 17.07% | +10.57% | -0.96% | 2.64 |
| 32020 (10%) | 16.89% | 7.35% | 18.77% | +9.55% | -1.88% | 2.81 |
| 64041 (20%) | 22.94% | 9.66% | 27.71% | +13.28% | -4.77% | 2.78 |
| 96062 (30%) | 28.31% | 10.71% | 42.07% | +17.60% | -13.76% | 2.95 |
| 128083 (40%) | 40.32% | 13.76% | 43.94% | +26.56% | -3.62% | 2.75 |
| 160104 (50%) | 58.10% | 15.33% | 43.86% | +42.76% | +14.24% | 3.09 |
| 192124 (60%) | 77.01% | 17.06% | 50.40% | +59.96% | +26.61% | 7.00 |
| 224145 (70%) | 78.83% | 17.32% | 48.70% | +61.51% | +30.13% | 1.00 |
| 256166 (80%) | 80.46% | 17.87% | 50.12% | +62.59% | +30.34% | 1.06 |
=== RUN TestCapacitySweep/Twitter52
Trace: Twitter52, Accesses: 500000 (4x = 2000000), Unique keys: 82416
| Capacity | CloxCache | SimpleLRU | Otter | Clox vs LRU | Clox vs Otter | Avg K |
|----------|-----------|-----------|-------|-------------|---------------|-------|
| 824 (1%) | 60.21% | 62.60% | 67.57% | -2.39% | -7.36% | 2.59 |
| 4120 (5%) | 72.79% | 72.36% | 76.03% | +0.43% | -3.24% | 2.67 |
| 8241 (10%) | 76.85% | 76.25% | 79.56% | +0.60% | -2.71% | 2.53 |
| 16483 (20%) | 80.51% | 80.11% | 82.61% | +0.40% | -2.10% | 2.41 |
| 24724 (30%) | 84.84% | 81.84% | 84.11% | +2.99% | +0.73% | 2.70 |
| 32966 (40%) | 86.69% | 83.08% | 87.25% | +3.61% | -0.56% | 2.73 |
| 41208 (50%) | 89.94% | 83.85% | 87.25% | +6.10% | +2.69% | 2.52 |
| 49449 (60%) | 94.10% | 84.47% | 86.53% | +9.64% | +7.57% | 2.00 |
| 57691 (70%) | 94.57% | 84.93% | 87.09% | +9.64% | +7.48% | 2.00 |
| 65932 (80%) | 95.02% | 85.34% | 89.31% | +9.68% | +5.71% | 2.00 |
Where CloxCache really shines: lock-free reads scale with goroutine count.
| Goroutines | CloxCache | Sharded LRU | Otter | vs LRU | vs Otter |
|---|---|---|---|---|---|
| 1 | 11M ops/s | 26M ops/s | 9M ops/s | 0.4x | 1.2x |
| 4 | 34M ops/s | 22M ops/s | 18M ops/s | 1.5x | 1.9x |
| 16 | 58M ops/s | 26M ops/s | 19M ops/s | 2.2x | 3.1x |
| 32 | 57M ops/s | 24M ops/s | 17M ops/s | 2.4x | 3.3x |
| 64 | 77M ops/s | 22M ops/s | 14M ops/s | 3.5x | 5.3x |
| 128 | 79M ops/s | 25M ops/s | 13M ops/s | 3.2x | 6.3x |
| 256 | 71M ops/s | 23M ops/s | 10M ops/s | 3.1x | 7.4x |
(90% reads, 10% writes workload)
Raw benchmark output
=== RUN TestConcurrentReadHeavy
Read-heavy concurrent test: 90% reads, 10% writes, 1000000 ops total
| Goroutines | CloxCache (ops/s) | SimpleLRU (ops/s) | Otter (ops/s) |
|------------|-------------------|-------------------|---------------|
| 1 | 11418211 | 26172066 | 9385530 |
| 2 | 18357213 | 18090921 | 12668403 |
| 4 | 34375650 | 21548263 | 18485753 |
| 8 | 54131157 | 23462212 | 17976868 |
| 16 | 57887904 | 25676727 | 18815351 |
| 32 | 56646010 | 23950319 | 17098614 |
| 64 | 77095575 | 22233853 | 14445578 |
| 128 | 79179151 | 24930167 | 12561981 |
| 256 | 71140637 | 23032562 | 9674261 |
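The concurrency numbers above come from the project's test harness; a similar read-heavy measurement can be sketched using only the public API (the goroutine count, key space, and capacity below are illustrative, not the exact harness used above):

```go
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"

    "github.com/bottledcode/cloxcache/cache"
)

func main() {
    const goroutines, opsPerG, keySpace = 16, 100_000, 100_000

    // Size the cache to 10% of the key space so some traffic misses.
    c := cache.NewCloxCache[string, int](cache.ConfigFromCapacity(keySpace / 10))
    defer c.Close()

    var wg sync.WaitGroup
    start := time.Now()
    for g := 0; g < goroutines; g++ {
        wg.Add(1)
        go func(seed int64) {
            defer wg.Done()
            rng := rand.New(rand.NewSource(seed))
            for i := 0; i < opsPerG; i++ {
                key := fmt.Sprintf("key:%d", rng.Intn(keySpace))
                if rng.Intn(10) == 0 {
                    c.Put(key, i) // ~10% writes
                } else {
                    c.Get(key) // ~90% reads
                }
            }
        }(int64(g))
    }
    wg.Wait()

    elapsed := time.Since(start).Seconds()
    fmt.Printf("%.0f ops/s\n", float64(goroutines*opsPerG)/elapsed)
}
```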
Many high-performance caches achieve speed through async/buffered writes. This means:
cache.Set("key", "value")
v, ok := cache.Get("key") // May return ok=false!This is fine for web caches where eventual consistency is acceptable. But for database buffer pools, transaction caches, or any system requiring strong consistency, this is a correctness bug waiting to happen.
CloxCache guarantees linearizability: every `Get()` sees all prior `Put()` operations. This makes it safe for:
- Database page caches
- Write-ahead log buffers
- Transaction result caches
- Any system where "I just wrote it, why can’t I read it?" would be a bug
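Concretely, with the quick-start API the read-after-write case always holds (a minimal sketch; the key and value here are illustrative):

```go
c := cache.NewCloxCache[string, int](cache.ConfigFromCapacity(1000))
defer c.Close()

c.Put("txn:42", 7)

// Writes are synchronous, so this Get is guaranteed to observe the
// Put above: no window exists where ok == false due to write buffering.
if v, ok := c.Get("txn:42"); !ok || v != 7 {
    panic("impossible with linearizable writes")
}
```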
On Twitter production traces with simulated 100μs disk I/O on cache misses:
| Trace | CloxCache | Otter | Speedup |
|---|---|---|---|
| Twitter17 (4 iter) | 3.84M ops/s | 2.10M ops/s | 1.8x |
| Twitter44 (4 iter) | 3.15M ops/s | 1.25M ops/s | 2.5x |
CloxCache also achieves +2% higher hit rates due to linearizable writes (no "false misses" from async buffering).
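Roughly, the miss-penalty model charges a simulated 100μs disk read on every miss before populating the cache. A sketch of the idea, where the `*cache.CloxCache` type name and the `loadFromDisk` helper are assumptions for illustration:

```go
// On a hit, serve from memory; on a miss, pay a simulated 100µs disk
// read (loadFromDisk is a hypothetical stand-in for the backing store),
// then populate the cache so later reads hit.
func getWithDiskPenalty(c *cache.CloxCache[string, []byte], key string) []byte {
    if v, ok := c.Get(key); ok {
        return v
    }
    time.Sleep(100 * time.Microsecond) // simulated disk I/O
    v := loadFromDisk(key)
    c.Put(key, v)
    return v
}
```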
At datacenter scale, this can translate to (rough estimates):
- ~50% fewer servers needed for the same throughput
- ~20% lower cost per request (fewer disk I/O from better hit rates)
- Lower tail latencies from fewer cache misses
Best for:
- Database caches requiring linearizability
- Read-heavy concurrent workloads (4+ goroutines)
- Web/API response caches
- Session stores
- Any workload where reads vastly outnumber writes
Consider alternatives when:
- Cache is tiny relative to the working set (<10%)
- Single-threaded applications (LRU is simpler and faster)
- Eventual consistency is acceptable and raw write throughput is critical
From a target entry count:

```go
// Create a cache for N entries (automatically configures optimal settings)
cfg := cache.ConfigFromCapacity(10000)
c := cache.NewCloxCache[string, *MyValue](cfg)
```

From a memory budget:

```go
// Create a cache for a specific memory budget
cfg := cache.ConfigFromMemorySize(256 * 1024 * 1024) // 256MB
c := cache.NewCloxCache[string, *MyValue](cfg)
```

Manual configuration:

```go
cfg := cache.Config{
    NumShards:     64,    // Must be a power of 2; 64-256 recommended
    SlotsPerShard: 4096,  // Must be a power of 2
    Capacity:      10000, // Max entries (distributed across shards)
    CollectStats:  true,  // Enable hit/miss/eviction counters
    SweepPercent:  15,    // Percent of a shard scanned during eviction (1-100)
}
c := cache.NewCloxCache[string, *MyValue](cfg)
```
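With `CollectStats: true`, cache effectiveness can be monitored at runtime. A minimal sketch using the accessors listed in the API summary below (assuming the counters are integers and `AverageK` returns a float):

```go
hits, misses, evictions := c.Stats()
if total := hits + misses; total > 0 {
    fmt.Printf("hit rate: %.2f%%, evictions: %d, avg k: %.2f\n",
        100*float64(hits)/float64(total), evictions, c.AverageK())
}
```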
```go
// Store a value (returns false if eviction failed)
ok := c.Put(key, value)

// Retrieve a value (lock-free)
value, found := c.Get(key)

// Get statistics (requires CollectStats: true)
hits, misses, evictions := c.Stats()

// Get adaptive threshold stats per shard
adaptiveStats := c.GetAdaptiveStats()

// Get the average k across all shards
avgK := c.AverageK()

// Get the average learned thresholds across all shards
rateLow, rateHigh := c.AverageLearnedThresholds()

// Clean shutdown
c.Close()
```

For the full story of how CloxCache was developed and the theory behind it, see Frequency Thresholds and Sharded Eviction: Building CloxCache.
MIT License - see LICENSE