LRU Bounded Map
This repository contains several Haskell implementations of associative containers (maps / dictionaries) which retire elements in least recently used (LRU) order when growing past a specified limit (bounded).
A two-level cache for images fetched over HTTP / from disk into system memory and then finally into GPU memory. Images are looked up far more frequently than they are inserted / deleted, most of the lookups should be cache hits. The cache is expected to be at capacity most of the time, the size of the images dwarfs any structural overhead from the cache. Required operations are insert, update, lookup, delete (all updating the LRU timestamp) and retrieval of the newest and oldest item.
With the requirements above in mind, there are four different implementations in various stages of completion, polish and optimization. All containers are purely functional data structures.
Perhaps the most simplistic and obvious data structure satisfying the above requirements is a pair of maps, indexing the values with both the lookup keys and an incrementing tick to keep track of their age.
It's a bit tricky to keep both maps synchronized, and the important lookup operation needs five O(log n) operations for the lookup + LRU update. To facilitate retrieval of the oldest / newest element, at least the map containing the keys needs to be an ordered container, forcing us to chose the slower
Data.Map over the faster
The double map on top of a
Data.Map.Strict is simple, but the slowest of the solutions evaluated here.
Inspired by the lrucache package, this container maintains a doubly linked-list over the entries of the map. Like the double map, it also needs five O(log n) operations for a lookup + reshuffling of the linked-list during the LRU update. Since there is no requirement for the underlying map to be ordered anymore, we can now use
Data.HashMap.Strict for a speed boost.
Note that the size operation of the hash map is O(n) instead of the regular map's O(1). This causes the insert operation of our bounded map to be O(n) as well, as it has to check if the size of the map has crossed the limit after the insertion. This could easily be fixed by maintaining a separate size field (as done by the two Trie based containers).
Possible optimization, besides a constant-time size operation, would be implementing an underlying hash map container supporting a updateLookupWithKey function and re-use of hashed key values between operations. Also, if implemented on top of a mutable array hash table with indices / pointers instead of keys for the linked-list links, the performance would likely be far better than any of the purely functional data structures evaluated here (1x O(1) lookup + 5x updating pointer).
The implementation of this container is a bit more polished and optimized than the prior double map.
Data.HashMap is implemented as a Hash Array Mapped Trie (HAMT), and this is an attempt to extend this data structure to support LRU / MRU retrieval. Only an attempt, because some early benchmarks and tests indicated that a bitwise Trie seems like a better choice here. For approximate-LRU behavior, this Reddit post has some interesting ideas.
The current implementation does not implement any LRU / bounded behavior so far, and is basically a 16-way HAMT without the bitmap indexing for partially filled nodes present in
Like the double map, this container uses an incrementing tick to timestamp its entries. One issue to deal with is the overflow of the tick. Using a 64 bit tick is sufficient to guarantee it will never overflow, but creates a fair bit of memory overhead and a significant performance drop on 32 bit GHC, as it has very poorly performing support for 64 bit operations. The latter could be addressed by abusing a Double as a 53 bit integer, but a better overall solution seems to be to simply compact the tick range by rebuilding the map every time the 32 bit tick overflows (rather infrequently).
This container is the fastest / most polished and the one I actually ended up using for my use case.
If the performance of the containers here is not sufficient, the two most promising ideas seem to be the mutable linked-list hash table as described above, or this interesting map + MRU list data structure: Reddit post / Direct link to code.
Criterion has been used as the benchmarking framework.
keys.txt contains 5000 URL strings like this one
which are used as
ByteString keys during the benchmark. Care is taken that at least one hash collision exists among the keys. Values are simple Ints.
The benchmarked operations are insertion, deletion and lookup.
Some important notes on interpreting the results:
All measurements performed on a 2.26Ghz Core 2 Duo, compiled with GHC 7.6.3 / HP 2013.2.0.0
NOTE: Same system with GHC 7.10.3 and Stackage LTS 4.0 libraries gets considerably worse performance across all benchmarks. Not sure if there was just a performance regression across the board or I'm doing something wrong. Could also just be the cache due to the 32/64 bit switch.
The optimization effort put in differs between the containers, and priorities have been determined by the specific use case
Timings are always for performing the operation with all 5000 keys
(lim 5k)labels indicate the element limit specified for the container. Regardless, all benchmarks still measure time for 5000 keys. For instance, the insertion benchmarks are faster for limit 5k as the limit 1k container will have to retire old elements during insertion. The lookup benchmarks are faster for limit 1k, as the limit 5k will have to perform 4000 additional LRU updates for keys already retired from the smaller container
Data.IntMap.Strictwere included as a speed-of-light reference, even though they do not support any kind of LRU / bounded behavior. Some of our custom containers have non-LRU updating versions of their operations and are also included in those benchmark groups. The HAMT data structure has an incomplete implementation of the LRU / bounded behavior, and is only measured as such. Benchmarks are labelled as
w/ LRU updor
w/o LRU updto indicate which behavior is active during the measurement
Insertion benchmarks for the linked-list hash map are omitted as they run very slowly (see description in Implementations)
The order of the keys for delete / lookup operations is a random permutation of their insertion order, except for the
lookup (w/ LRU upd, ord keys)benchmarks. Those exist to show how the linked-list hash map gives an unrealistic result if keys are looked up in insertion order, as moving the oldest element to the front of LRU linked-list is faster than doing so with items from the middle of the list
See the full Criterion report from
Other useful bits
Two pieces of code which might useful outside of the LRU bounded map implementations:
FisherYatesShuffle.hscontains an implementation of the famous permutation algorithm
DoubleMap.hsis the two-key map used internally by
This program is published under the MIT License.
Developed by Tim C. Schroeder, visit my website to learn more.