"LRUCacheStrong" optimization.
This commit by wunderbar wunderkind constant contributor @Heliotrop3
optimizes the implementation of our strong LRU cache to internally
leverage an efficient non-reentrant `threading.Lock` primitive in place
of the prior inefficient reentrant `threading.RLock` primitive.
Specifically, this commit:

* Changes the locking mechanism used to ensure thread safety from
  `RLock` to `Lock`.
* Streamlines the commentary around the strong LRU cache, its methods,
  and its test cases.
* Corrects the `LRUCacheStrong.__init__()` method docstring with respect
  to exception types raised by this method.

## Features Optimized

* **Strong LRU cache threading primitives.** The private
  `beartype._util.cache.utilcachelru.LRUCacheStrong` class has now been
  optimized to internally leverage an efficient non-reentrant
  `threading.Lock` primitive in place of the prior inefficient reentrant
  `threading.RLock` primitive.

Thanks for all the munificent magnificence, @Heliotrop3! (*Magenta magpies!*)
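The practical difference between the two primitives can be seen in a short sketch (illustrative only, not beartype code): a reentrant `RLock` lets the thread already holding it acquire it again, at the cost of per-acquisition owner bookkeeping, while a plain `Lock` refuses re-acquisition even by its holder. That is exactly why the refactored cache methods must never call one another while holding the lock.

```python
from threading import Lock, RLock

# A reentrant RLock may be re-acquired by the thread already holding it,
# at the cost of per-acquisition owner-tracking bookkeeping.
rlock = RLock()
rlock.acquire()
assert rlock.acquire(blocking=False)  # succeeds: same thread re-acquires
rlock.release()
rlock.release()

# A non-reentrant Lock refuses re-acquisition even by its holder, so a
# locked method calling another locked method on the same lock would
# deadlock the calling thread.
lock = Lock()
lock.acquire()
assert not lock.acquire(blocking=False)  # fails: Lock is not reentrant
lock.release()
```

This is the deadlock the commit sidesteps by duplicating `__getitem__()` logic inside `__contains__()` rather than deferring to it.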
Heliotrop3 committed Mar 6, 2021
1 parent 7bd1593 commit 99225ee
Showing 2 changed files with 159 additions and 264 deletions.
beartype/_util/cache/utilcachelru.py: 308 changes (117 additions, 191 deletions)
```diff
@@ -3,76 +3,63 @@
 # Copyright (c) 2014-2021 Beartype authors.
 # See "LICENSE" for further details.
 
-'''
+"""
 **Beartype Least Recently Used (LRU) caching utilities.**
 
 This private submodule implements supplementary cache-specific utility
 functions required by various :mod:`beartype` facilities, including callables
 generated by the :func:`beartype.beartype` decorator.
 
 This private submodule is *not* intended for importation by downstream callers.
-'''
-
-# ....................{ TODO }....................
-#FIXME: Refactor "LRUCacheStrong" to use a fast "threading.Lock" rather than
-#slow "threading.RLock". As @Heliotrop3 suggests, this should absolutely be
-#doable -- probably by violating DRY and just repeating the logic of
-#__getitem__() in __contains__(). After doing so, there should be *NO*
-#reentrancy concerns anywhere. In theory. Maybe. Would you like to play a game?
+"""
 
 # ....................{ IMPORTS }....................
-from beartype.roar import _BeartypeUtilLRUCacheException
-from threading import RLock
+from threading import Lock
 from typing import Hashable
+from beartype.roar import _BeartypeUtilLRUCacheException
 
 
 # ....................{ CLASSES }....................
 class LRUCacheStrong(dict):
-    '''
-    **Thread-safe strong Least Recently Used (LRU) cache** (i.e., a mapping
-    from strong references to arbitrary keys onto strong references to
-    arbitrary values, limited to some maximum capacity of key-value pairs by
-    implicitly and thread-safely removing the least recently accessed key-value
-    pair from this mapping on any setting of a new key-value pair that would
-    cause the size of this mapping to exceed this capacity).
+    """
+    **Thread-safe strong Least Recently Used (LRU) cache**: a mapping from
+    strongly referenced arbitrary keys onto strongly referenced arbitrary
+    values, limited to some maximum capacity of key-value pairs that is
+    implicitly and thread-safely enforced.
 
     Design
     ------
-    LRU cache implementations typically employ weak references for safety.
-    LRU cache implementations employing strong references invite memory leaks
-    by preventing stale cached keys and values that would otherwise be
-    garbage-collected from being garbage-collected.
+    LRU cache implementations typically employ weak references for safety;
+    employing strong references invites memory leaks by preventing objects
+    *only* referenced by the cache (cache-only objects) from being
+    garbage-collected.
     Nonetheless, this cache intentionally employs strong references to persist
-    **cache-only objects** (i.e., objects *only* referenced by this cache)
-    across calls to callables decorated by the :func:`beartype.beartype`
-    decorator. Since cache-only objects are referenced only by this cache
-    rather than by both this cache and an external parent object, a cache-only
-    object cached under a weak reference would have *no* strong referents and
-    thus be immediately garbage-collected with all other short-lived objects in
-    the first generation (i.e., generation 0). The standard example of a
-    cache-only object is a container iterator (e.g., the items view returned by
-    the :meth:`dict.items` method).
-
-    Note that the equivalent LRU cache employing weak references to keys and/or
-    values may be trivially implemented by inheriting from the standard
-    :class:`weakref.WeakKeyDictionary` or :class:`weakref.WeakValueDictionary`
-    classes rather than the builtin :class:`dict` type instead.
+    these cache-only objects across calls to callables decorated with
+    :func:`beartype.beartype`.
+    In theory, caching an object under a weak reference would result in
+    immediate garbage collection as, with no external strong referents, the
+    object would be collected with all other short-lived objects in the first
+    generation (i.e., generation 0).
+
+    Notes
+    -----
+    - The equivalent LRU cache employing weak references to keys and/or values
+      may be trivially implemented by swapping this class's inheritance from
+      the builtin :class:`dict` to either :class:`weakref.WeakKeyDictionary`
+      or :class:`weakref.WeakValueDictionary`.
+    - The standard example of a cache-only object is a container iterator
+      (e.g., the items view returned by :meth:`dict.items`).
 
     Attributes
     ----------
     _size : int
-        **Cache capacity** (i.e., maximum positive number of key-value pairs
-        persisted by this cache).
-    _thread_lock : Lock
-        **Reentrant instance-specific thread lock** (i.e., low-level thread
-        locking mechanism implemented as a highly efficient C extension,
-        defined as an instance variable for non-reentrant reuse by the public
-        API of this class). Although CPython, the canonical Python interpreter,
-        *does* prohibit conventional multithreading via its Global Interpreter
-        Lock (GIL), CPython still coercively preempts long-running threads at
-        arbitrary execution points. Ergo, multithreading concerns are *not*
-        safely ignorable -- even under CPython.
-    '''
+        **Cache capacity** (i.e., maximum number of key-value pairs persisted
+        by this cache).
+    _lock : Lock
+        **Instance-specific thread lock**: a low-level thread locking
+        mechanism implemented as a highly efficient C extension, defined as
+        an instance variable for non-reentrant reuse by the public API of
+        this class.
+        Although the canonical Python interpreter (CPython) prohibits
+        conventional multithreading via its Global Interpreter Lock (GIL), it
+        still coercively preempts long-running threads at arbitrary execution
+        points. Ergo, multithreading concerns are *not* safely ignorable.
+    """
 
     # ..................{ CLASS VARIABLES }..................
     # Slot all instance variables defined on this object to minimize the time
```
```diff
@@ -81,76 +68,46 @@ class LRUCacheStrong(dict):
     # costs by approximately ~10%, which is non-trivial.
     __slots__ = (
         '_size',
-        '_thread_lock',
+        '_lock',
    )
 
     # ..................{ DUNDERS }..................
     def __init__(self, size: int) -> None:
-        '''
-        Initialize this cache to the empty cache with the passed capacity.
+        """
+        Initialize this cache to an empty cache with a capacity of this size.
 
         Parameters
         ----------
         size : int
-            **Cache capacity** (i.e., maximum positive number of key-value
-            pairs persisted by this cache).
+            **Cache capacity** (i.e., maximum number of key-value pairs held
+            in this cache).
 
         Raises
         ------
-        _BeartypeUtilLRUCacheException
-            If this capacity is either:
-            * *Not* an integer.
-            * A **non-positive integer** (i.e., either negative or zero).
-        '''
-
-        # If this size is *NOT* an integer, raise an exception.
+        _BeartypeUtilLRUCacheException:
+            If the capacity is either *not* an integer or a **non-positive
+            integer** (i.e., less than 1).
+        """
         if not isinstance(size, int):
             raise _BeartypeUtilLRUCacheException(
                 f'LRU cache capacity {repr(size)} not integer.')
-        # Else, this size is an integer.
-        #
-        # If this size is non-positive, raise an exception.
-        elif size <= 0:
+        elif size < 1:
             raise _BeartypeUtilLRUCacheException(
                 f'LRU cache capacity {size} not positive.')
-        # Else, this size is positive.
 
-        # Initialize our superclass to the empty dictionary.
         super().__init__()
-
-        # Classify this parameter.
         self._size = size
+        self._lock = Lock()
 
-        # Initialize all remaining instance variables.
-        self._thread_lock = RLock()
-
-    def __getitem__(
-        self,
-        key: Hashable,
-
-        # Superclass methods efficiently localized as default parameters.
-        __dict_delitem=dict.__delitem__,
-        __dict_getitem=dict.__getitem__,
-        __dict_setitem=dict.__setitem__,
-    ) -> object:
-        '''
-        Value previously cached under the passed key if this key been recently
-        cached *or* raise an exception otherwise (i.e., if this key has either
-        not been previously cached *or* has but has since been removed due to
-        not having been recently accessed).
-
-        Specifically, this method (in order):
-
-        * If this key has *not* been recently cached, raises the standard
-          :class:`KeyError` exception.
-        * Else:
-          #. Gets the value previously cached under this key.
-          #. Prioritizes this key by removing and re-adding this key back to
-             the tail of this cache.
-          #. Returns this value.
+    def __getitem__(self,
+                    key: Hashable,
+
+                    # Superclass methods efficiently localized as default parameters.
+                    __contains=dict.__contains__,
+                    __getitem=dict.__getitem__,
+                    __delitem=dict.__delitem__,
+                    __pushitem=dict.__setitem__,
+                    ) -> object:
+        """
+        Return the item previously cached under the passed key *or* raise an
+        exception otherwise.
 
         Parameters
         ----------
```
```diff
@@ -160,52 +117,42 @@ def __getitem__(
 
         Returns
         ----------
         object
-            Arbitrary value recently cached under this key.
+            Arbitrary value cached under this key.
 
         Raises
         ----------
         TypeError
-            If this key is unhashable.
-        '''
-
-        # In a thread-safe manner...
-        with self._thread_lock:
-            # Value previously cached under this key if this key has already
-            # been cached *OR* raise the usual "KeyError" exception otherwise.
-            value = __dict_getitem(self, key)
-
-            # Prioritize this key by removing and re-adding this key back to
-            # the end of this cache.
-            __dict_delitem(self, key)
-            __dict_setitem(self, key, value)
-
-            # Return this value.
-            return value
-
-    def __setitem__(
-        self,
-        key: Hashable,
-        value: object,
-
-        # Superclass methods efficiently localized as default parameters.
-        __dict_hasitem=dict.__contains__,
-        __dict_delitem=dict.__delitem__,
-        __dict_setitem=dict.__setitem__,
-        __dict_iter=dict.__iter__,
-        __dict_len=dict.__len__,
-    ) -> None:
-        '''
-        Cache the passed key-value pair while preserving LRU constraints.
-
-        Specifically, this method (in order):
-
-        #. If this key has already been cached, prioritize this key by removing
-           this key from this cache.
-        #. (Re-)add this key to the tail of this cache.
-        #. If adding this key caused this cache to exceed its maximum capacity,
-           silently remove the first (and thus least recently used) key-value
-           pair from this cache.
+            If this key is not hashable.
+        KeyError
+            If this key isn't cached.
+
+        Note
+        ----
+        - **Practically** identical to :meth:`self.__contains__` except we
+          return an object rather than a bool.
+        """
+        with self._lock:
+            # Reset the key if it exists.
+            if __contains(self, key):
+                val = __getitem(self, key)
+                __delitem(self, key)
+                __pushitem(self, key, val)
+                return val
+            raise KeyError(f'Key Error: {key}')
+
+    def __setitem__(self,
+                    key: Hashable,
+                    value: object,
+
+                    # Superclass methods efficiently localized as default parameters.
+                    __contains=dict.__contains__,
+                    __delitem=dict.__delitem__,
+                    __pushitem=dict.__setitem__,
+                    __iter=dict.__iter__,
+                    __len=dict.__len__,
+                    ) -> None:
+        """
+        Cache this key-value pair while preserving size constraints.
 
         Parameters
         ----------
```
```diff
@@ -217,46 +164,32 @@ def __setitem__(
 
         Raises
         ----------
         TypeError
-            If this key is unhashable.
-        '''
-
-        # In a thread-safe manner...
-        with self._thread_lock:
-            # If this key has already been cached, prioritize this key by
-            # removing this key from this cache *BEFORE* re-adding this key.
-            if __dict_hasitem(self, key):
-                __dict_delitem(self, key)
-
-            # (Re-)add this key back to the end of this cache.
-            __dict_setitem(self, key, value)
-
-            # If adding this key caused this cache to exceed its maximum
-            # capacity, silently remove the first (and thus least recently
-            # used) key-value pair from this cache.
-            if __dict_len(self) > self._size:
-                __dict_delitem(self, next(__dict_iter(self)))
-
-    def __contains__(
-        self,
-        key: Hashable,
-
-        # Superclass methods efficiently localized as default parameters.
-        __dict_contains=dict.__contains__,
-        __self_getitem__=__getitem__,
-    ) -> bool:
-        '''
-        ``True`` only if the passed key has been recently cached.
-
-        Specifically, this method (in order):
-
-        * If this key has *not* been recently cached, returns ``False``.
-        * Else:
-          #. Gets the value previously cached under this key.
-          #. Prioritizes this key by removing and re-adding this key-value
-             pair back to the tail of this cache.
-          #. Returns ``True``.
+            If this key is not hashable.
+        """
+        with self._lock:
+            if __contains(self, key):
+                __delitem(self, key)
+            __pushitem(self, key, value)
+
+            # Prune the cache.
+            if __len(self) > self._size:
+                __delitem(self, next(__iter(self)))
+
+    def __contains__(self,
+                     key: Hashable,
+
+                     # Superclass methods efficiently localized as default parameters.
+                     __contains=dict.__contains__,
+                     __getitem=dict.__getitem__,
+                     __delitem=dict.__delitem__,
+                     __pushitem=dict.__setitem__,
+                     ) -> bool:
+        """
+        Return a boolean indicating whether this key is cached.
+
+        If this key is cached, it's popped and pushed back into the cache.
 
         Parameters
         ----------
```
```diff
@@ -266,24 +199,17 @@ def __contains__(
 
         Returns
         ----------
         bool
-            ``True`` only if this key has been recently cached.
+            ``True`` if this key is cached, ``False`` otherwise.
 
         Raises
         ----------
         TypeError
             If this key is unhashable.
-        '''
-
-        # In a thread-safe manner...
-        with self._thread_lock:
-            # If this key has *NOT* been recently cached, return false.
-            if not __dict_contains(self, key):
-                return False
-            # Else, this key has been recently cached.
-
-            # Prioritize this key by deferring to the __getitem__() dunder
-            # method.
-            __self_getitem__(self, key)
-
-            # Return true.
-            return True
+        """
+        with self._lock:
+            if __contains(self, key):
+                val = __getitem(self, key)
+                __delitem(self, key)
+                __pushitem(self, key, val)
+                return True
+            return False
```
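Read as a whole, the refactored class follows a standard pattern: an insertion-ordered `dict`, a single non-reentrant lock, and pop-and-reinsert on every access so the least recently used pair sits at the front. A self-contained sketch of that pattern (illustrative only; the `MiniLRU` name and shape are hypothetical, not beartype's actual API):

```python
from threading import Lock

class MiniLRU(dict):
    """Illustrative thread-safe strong LRU cache (hypothetical sketch)."""

    def __init__(self, size: int) -> None:
        super().__init__()
        self._size = size
        self._lock = Lock()  # non-reentrant: no method may call another

    def __getitem__(self, key):
        with self._lock:
            value = dict.__getitem__(self, key)  # KeyError if missing
            dict.__delitem__(self, key)
            dict.__setitem__(self, key, value)   # re-insert at MRU position
            return value

    def __setitem__(self, key, value):
        with self._lock:
            if dict.__contains__(self, key):
                dict.__delitem__(self, key)
            dict.__setitem__(self, key, value)
            if dict.__len__(self) > self._size:
                # Evict the least recently used (first-inserted) pair.
                dict.__delitem__(self, next(dict.__iter__(self)))

    def __contains__(self, key):
        # Deliberately repeats __getitem__()'s logic instead of calling it:
        # "self[key]" would try to re-acquire the non-reentrant lock held
        # by this method and deadlock the calling thread.
        with self._lock:
            if not dict.__contains__(self, key):
                return False
            value = dict.__getitem__(self, key)
            dict.__delitem__(self, key)
            dict.__setitem__(self, key, value)
            return True
```

The duplication in `__contains__()` is the DRY violation the commit deliberately accepts: it is the price of dropping from `RLock` to the cheaper `Lock`.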
