"LRUCacheStrong" optimization.
This commit by wunderbar wunderkind constant contributor @Heliotrop3
optimizes the implementation of our strong LRU cache to internally
leverage an efficient non-reentrant `threading.Lock` primitive in place
of the prior inefficient reentrant `threading.RLock` primitive.
Specifically, this commit:

* Changes the locking mechanism used to ensure thread safety from
  `RLock` to `Lock`.
* Streamlines the commentary around the strong LRU cache, its methods,
  and its test cases.
* Corrects the `LRUCacheStrong.__init__()` method docstring with respect
  to exception types raised by this method.

## Features Optimized

* **Strong LRU cache threading primitives.** The private
  `beartype._util.cache.utilcachelru.LRUCacheStrong` class has now been
  optimized to internally leverage an efficient non-reentrant
  `threading.Lock` primitive in place of the prior inefficient reentrant
  `threading.RLock` primitive.

Thanks for all the munificent magnificence, @Heliotrop3! (*Magenta magpies!*)
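The practical difference between the two primitives can be seen in a short sketch (illustrative only, not beartype code): a reentrant `RLock` lets the thread already holding it acquire it again, at the cost of per-acquisition owner bookkeeping, while a plain `Lock` refuses re-acquisition even by its holder. That is exactly why the refactored cache methods must never call one another while holding the lock.

```python
from threading import Lock, RLock

# A reentrant RLock may be re-acquired by the thread already holding it,
# at the cost of per-acquisition owner-tracking bookkeeping.
rlock = RLock()
rlock.acquire()
assert rlock.acquire(blocking=False)  # succeeds: same thread re-acquires
rlock.release()
rlock.release()

# A non-reentrant Lock refuses re-acquisition even by its holder, so a
# locked method calling another locked method on the same lock would
# deadlock the calling thread.
lock = Lock()
lock.acquire()
assert not lock.acquire(blocking=False)  # fails: Lock is not reentrant
lock.release()
```

This is the deadlock the commit sidesteps by duplicating `__getitem__()` logic inside `__contains__()` rather than deferring to it.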
Heliotrop3 committed Mar 6, 2021
1 parent 7bd1593 commit 99225ee
Showing 2 changed files with 159 additions and 264 deletions.
beartype/_util/cache/utilcachelru.py: 308 changes (117 additions, 191 deletions)
```diff
@@ -3,76 +3,63 @@
 # Copyright (c) 2014-2021 Beartype authors.
 # See "LICENSE" for further details.
 
-'''
+"""
 **Beartype Least Recently Used (LRU) caching utilities.**
 
 This private submodule implements supplementary cache-specific utility
 functions required by various :mod:`beartype` facilities, including callables
 generated by the :func:`beartype.beartype` decorator.
 
 This private submodule is *not* intended for importation by downstream callers.
-'''
-
-# ....................{ TODO }....................
-#FIXME: Refactor "LRUCacheStrong" to use a fast "threading.Lock" rather than
-#slow "threading.RLock". As @Heliotrop3 suggests, this should absolutely be
-#doable -- probably by violating DRY and just repeating the logic of
-#__getitem__() in __contains__(). After doing so, there should be *NO*
-#reentrancy concerns anywhere. In theory. Maybe. Would you like to play a game?
+"""
 
 # ....................{ IMPORTS }....................
-from beartype.roar import _BeartypeUtilLRUCacheException
-from threading import RLock
+from threading import Lock
 from typing import Hashable
+from beartype.roar import _BeartypeUtilLRUCacheException
 
 
 # ....................{ CLASSES }....................
 class LRUCacheStrong(dict):
-    '''
-    **Thread-safe strong Least Recently Used (LRU) cache** (i.e., a mapping
-    from strong references to arbitrary keys onto strong references to
-    arbitrary values, limited to some maximum capacity of key-value pairs by
-    implicitly and thread-safely removing the least recently accessed key-value
-    pair from this mapping on any setting of a new key-value pair that would
-    cause the size of this mapping to exceed this capacity).
+    """
+    **Thread-safe strong Least Recently Used (LRU) cache**: a mapping from
+    strongly referenced arbitrary keys onto strongly referenced arbitrary
+    values, limited to some maximum capacity of key-value pairs that is
+    implicitly and thread-safely enforced.
 
     Design
     ------
-    LRU cache implementations typically employ weak references for safety.
-    LRU cache implementations employing strong references invite memory leaks
-    by preventing stale cached keys and values that would otherwise be
-    garbage-collected from being garbage-collected.
+    LRU cache implementations typically employ weak references for safety;
+    employing strong references invites memory leaks by preventing objects
+    *only* referenced by the cache (cache-only objects) from being
+    garbage-collected.
     Nonetheless, this cache intentionally employs strong references to persist
-    **cache-only objects** (i.e., objects *only* referenced by this cache)
-    across calls to callables decorated by the :func:`beartype.beartype`
-    decorator. Since cache-only objects are referenced only by this cache
-    rather than by both this cache and an external parent object, a cache-only
-    object cached under a weak reference would have *no* strong referents and
-    thus be immediately garbage-collected with all other short-lived objects in
-    the first generation (i.e., generation 0). The standard example of a
-    cache-only object is a container iterator (e.g., the items view returned by
-    the :meth:`dict.items` method).
-
-    Note that the equivalent LRU cache employing weak references to keys and/or
-    values may be trivially implemented by inheriting from the standard
-    :class:`weakref.WeakKeyDictionary` or :class:`weakref.WeakValueDictionary`
-    classes rather than the builtin :class:`dict` type instead.
+    these cache-only objects across calls to callables decorated with
+    :func:`beartype.beartype`.
+    In theory, caching an object under a weak reference would result in
+    immediate garbage collection as, with no external strong referents, the
+    object would be collected with all other short-lived objects in the first
+    generation (i.e., generation 0).
+
+    Notes
+    -----
+    - The equivalent LRU cache employing weak references to keys and/or values
+      may be trivially implemented by swapping this class's inheritance from
+      the builtin :class:`dict` to either :class:`weakref.WeakKeyDictionary`
+      or :class:`weakref.WeakValueDictionary`.
+    - The standard example of a cache-only object is a container iterator
+      (e.g., the items view returned by :meth:`dict.items`).
 
     Attributes
     ----------
     _size : int
-        **Cache capacity** (i.e., maximum positive number of key-value pairs
-        persisted by this cache).
-    _thread_lock : Lock
-        **Reentrant instance-specific thread lock** (i.e., low-level thread
-        locking mechanism implemented as a highly efficient C extension,
-        defined as an instance variable for non-reentrant reuse by the public
-        API of this class). Although CPython, the canonical Python interpreter,
-        *does* prohibit conventional multithreading via its Global Interpreter
-        Lock (GIL), CPython still coercively preempts long-running threads at
-        arbitrary execution points. Ergo, multithreading concerns are *not*
-        safely ignorable -- even under CPython.
-    '''
+        **Cache capacity** (i.e., maximum number of key-value pairs persisted
+        by this cache).
+    _lock : Lock
+        **Instance-specific thread lock**: a low-level thread locking
+        mechanism implemented as a highly efficient C extension, defined as
+        an instance variable for non-reentrant reuse by the public API of
+        this class.
+        Although the canonical Python interpreter (CPython) prohibits
+        conventional multithreading via its Global Interpreter Lock (GIL), it
+        still coercively preempts long-running threads at arbitrary execution
+        points. Ergo, multithreading concerns are *not* safely ignorable.
+    """
 
     # ..................{ CLASS VARIABLES }..................
     # Slot all instance variables defined on this object to minimize the time
```
```diff
@@ -81,76 +68,46 @@ class LRUCacheStrong(dict):
     # costs by approximately ~10%, which is non-trivial.
     __slots__ = (
         '_size',
-        '_thread_lock',
+        '_lock',
    )
 
     # ..................{ DUNDERS }..................
     def __init__(self, size: int) -> None:
-        '''
-        Initialize this cache to the empty cache with the passed capacity.
+        """
+        Initialize this cache to an empty cache with a capacity of this size.
 
         Parameters
         ----------
         size : int
-            **Cache capacity** (i.e., maximum positive number of key-value
-            pairs persisted by this cache).
+            **Cache capacity** (i.e., maximum number of key-value pairs held
+            in this cache).
 
         Raises
         ------
-        _BeartypeUtilLRUCacheException
-            If this capacity is either:
-            * *Not* an integer.
-            * A **non-positive integer** (i.e., either negative or zero).
-        '''
-
-        # If this size is *NOT* an integer, raise an exception.
+        _BeartypeUtilLRUCacheException:
+            If the capacity is either *not* an integer or a **non-positive
+            integer** (i.e., less than 1).
+        """
         if not isinstance(size, int):
             raise _BeartypeUtilLRUCacheException(
                 f'LRU cache capacity {repr(size)} not integer.')
-        # Else, this size is an integer.
-        #
-        # If this size is non-positive, raise an exception.
-        elif size <= 0:
+        elif size < 1:
             raise _BeartypeUtilLRUCacheException(
                 f'LRU cache capacity {size} not positive.')
-        # Else, this size is positive.
 
-        # Initialize our superclass to the empty dictionary.
         super().__init__()
-
-        # Classify this parameter.
         self._size = size
+        self._lock = Lock()
 
-        # Initialize all remaining instance variables.
-        self._thread_lock = RLock()
-
-    def __getitem__(
-        self,
-        key: Hashable,
-
-        # Superclass methods efficiently localized as default parameters.
-        __dict_delitem=dict.__delitem__,
-        __dict_getitem=dict.__getitem__,
-        __dict_setitem=dict.__setitem__,
-    ) -> object:
-        '''
-        Value previously cached under the passed key if this key been recently
-        cached *or* raise an exception otherwise (i.e., if this key has either
-        not been previously cached *or* has but has since been removed due to
-        not having been recently accessed).
-
-        Specifically, this method (in order):
-
-        * If this key has *not* been recently cached, raises the standard
-          :class:`KeyError` exception.
-        * Else:
-          #. Gets the value previously cached under this key.
-          #. Prioritizes this key by removing and re-adding this key back to
-             the tail of this cache.
-          #. Returns this value.
+    def __getitem__(self,
+                    key: Hashable,
+
+                    # Superclass methods efficiently localized as default parameters.
+                    __contains=dict.__contains__,
+                    __getitem=dict.__getitem__,
+                    __delitem=dict.__delitem__,
+                    __pushitem=dict.__setitem__,
+                    ) -> object:
+        """
+        Return the item previously cached under the passed key *or* raise an
+        exception otherwise.
 
         Parameters
         ----------
```
```diff
@@ -160,52 +117,42 @@ def __getitem__(
 
         Returns
         ----------
         object
-            Arbitrary value recently cached under this key.
+            Arbitrary value cached under this key.
 
         Raises
         ----------
         TypeError
-            If this key is unhashable.
-        '''
-
-        # In a thread-safe manner...
-        with self._thread_lock:
-            # Value previously cached under this key if this key has already
-            # been cached *OR* raise the usual "KeyError" exception otherwise.
-            value = __dict_getitem(self, key)
-
-            # Prioritize this key by removing and re-adding this key back to
-            # the end of this cache.
-            __dict_delitem(self, key)
-            __dict_setitem(self, key, value)
-
-            # Return this value.
-            return value
-
-    def __setitem__(
-        self,
-        key: Hashable,
-        value: object,
-
-        # Superclass methods efficiently localized as default parameters.
-        __dict_hasitem=dict.__contains__,
-        __dict_delitem=dict.__delitem__,
-        __dict_setitem=dict.__setitem__,
-        __dict_iter=dict.__iter__,
-        __dict_len=dict.__len__,
-    ) -> None:
-        '''
-        Cache the passed key-value pair while preserving LRU constraints.
-
-        Specifically, this method (in order):
-
-        #. If this key has already been cached, prioritize this key by removing
-           this key from this cache.
-        #. (Re-)add this key to the tail of this cache.
-        #. If adding this key caused this cache to exceed its maximum capacity,
-           silently remove the first (and thus least recently used) key-value
-           pair from this cache.
+            If this key is not hashable.
+        KeyError
+            If this key isn't cached.
+
+        Note
+        ----
+        - **Practically** identical to :meth:`self.__contains__` except we
+          return an object rather than a bool.
+        """
+        with self._lock:
+            # Reset the key if it exists.
+            if __contains(self, key):
+                val = __getitem(self, key)
+                __delitem(self, key)
+                __pushitem(self, key, val)
+                return val
+            raise KeyError(f'Key Error: {key}')
+
+    def __setitem__(self,
+                    key: Hashable,
+                    value: object,
+
+                    # Superclass methods efficiently localized as default parameters.
+                    __contains=dict.__contains__,
+                    __delitem=dict.__delitem__,
+                    __pushitem=dict.__setitem__,
+                    __iter=dict.__iter__,
+                    __len=dict.__len__,
+                    ) -> None:
+        """
+        Cache this key-value pair while preserving size constraints.
 
         Parameters
         ----------
```
```diff
@@ -217,46 +164,32 @@ def __setitem__(
 
         Raises
         ----------
         TypeError
-            If this key is unhashable.
-        '''
-
-        # In a thread-safe manner...
-        with self._thread_lock:
-            # If this key has already been cached, prioritize this key by
-            # removing this key from this cache *BEFORE* re-adding this key.
-            if __dict_hasitem(self, key):
-                __dict_delitem(self, key)
-
-            # (Re-)add this key back to the end of this cache.
-            __dict_setitem(self, key, value)
-
-            # If adding this key caused this cache to exceed its maximum
-            # capacity, silently remove the first (and thus least recently
-            # used) key-value pair from this cache.
-            if __dict_len(self) > self._size:
-                __dict_delitem(self, next(__dict_iter(self)))
-
-    def __contains__(
-        self,
-        key: Hashable,
-
-        # Superclass methods efficiently localized as default parameters.
-        __dict_contains=dict.__contains__,
-        __self_getitem__=__getitem__,
-    ) -> bool:
-        '''
-        ``True`` only if the passed key has been recently cached.
-
-        Specifically, this method (in order):
-
-        * If this key has *not* been recently cached, returns ``False``.
-        * Else:
-          #. Gets the value previously cached under this key.
-          #. Prioritizes this key by removing and re-adding this key-value
-             pair back to the tail of this cache.
-          #. Returns ``True``.
+            If this key is not hashable.
+        """
+        with self._lock:
+            if __contains(self, key):
+                __delitem(self, key)
+            __pushitem(self, key, value)
+
+            # Prune the cache.
+            if __len(self) > self._size:
+                __delitem(self, next(__iter(self)))
+
+    def __contains__(self,
+                     key: Hashable,
+
+                     # Superclass methods efficiently localized as default parameters.
+                     __contains=dict.__contains__,
+                     __getitem=dict.__getitem__,
+                     __delitem=dict.__delitem__,
+                     __pushitem=dict.__setitem__,
+                     ) -> bool:
+        """
+        Return a boolean indicating whether this key is cached.
+
+        If this key is cached, it's popped and pushed back into the cache.
 
         Parameters
         ----------
```
```diff
@@ -266,24 +199,17 @@ def __contains__(
 
         Returns
         ----------
         bool
-            ``True`` only if this key has been recently cached.
+            ``True`` if this key is cached, ``False`` otherwise.
 
         Raises
         ----------
         TypeError
             If this key is unhashable.
-        '''
-
-        # In a thread-safe manner...
-        with self._thread_lock:
-            # If this key has *NOT* been recently cached, return false.
-            if not __dict_contains(self, key):
-                return False
-            # Else, this key has been recently cached.
-
-            # Prioritize this key by deferring to the __getitem__() dunder
-            # method.
-            __self_getitem__(self, key)
-
-            # Return true.
-            return True
+        """
+        with self._lock:
+            if __contains(self, key):
+                val = __getitem(self, key)
+                __delitem(self, key)
+                __pushitem(self, key, val)
+                return True
+            return False
```
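Read as a whole, the refactored class follows a standard pattern: an insertion-ordered `dict`, a single non-reentrant lock, and pop-and-reinsert on every access so the least recently used pair sits at the front. A self-contained sketch of that pattern (illustrative only; the `MiniLRU` name and shape are hypothetical, not beartype's actual API):

```python
from threading import Lock

class MiniLRU(dict):
    """Illustrative thread-safe strong LRU cache (hypothetical sketch)."""

    def __init__(self, size: int) -> None:
        super().__init__()
        self._size = size
        self._lock = Lock()  # non-reentrant: no method may call another

    def __getitem__(self, key):
        with self._lock:
            value = dict.__getitem__(self, key)  # KeyError if missing
            dict.__delitem__(self, key)
            dict.__setitem__(self, key, value)   # re-insert at MRU position
            return value

    def __setitem__(self, key, value):
        with self._lock:
            if dict.__contains__(self, key):
                dict.__delitem__(self, key)
            dict.__setitem__(self, key, value)
            if dict.__len__(self) > self._size:
                # Evict the least recently used (first-inserted) pair.
                dict.__delitem__(self, next(dict.__iter__(self)))

    def __contains__(self, key):
        # Deliberately repeats __getitem__()'s logic instead of calling it:
        # "self[key]" would try to re-acquire the non-reentrant lock held
        # by this method and deadlock the calling thread.
        with self._lock:
            if not dict.__contains__(self, key):
                return False
            value = dict.__getitem__(self, key)
            dict.__delitem__(self, key)
            dict.__setitem__(self, key, value)
            return True
```

The duplication in `__contains__()` is the DRY violation the commit deliberately accepts: it is the price of dropping from `RLock` to the cheaper `Lock`.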
