This notebook demonstrates some of the capabilities of the caching utilities in `utils.caching`.

In [1]:
import logging
from pathlib import Path
from tempfile import mkdtemp

import numpy as np

import utils.caching
from utils.caching import make_cached

In [2]:
logging.basicConfig()
logging.getLogger(utils.caching.__name__).setLevel(logging.DEBUG)

In [3]:
cached = make_cached(Path(mkdtemp()))

# Basic functionality

## Single item calculation

In [4]:
@cached()
def array_of_repeats(i) -> np.ndarray:
    return np.repeat(i, 10)

Calculate item and cache it on disk:

In [5]:
array_of_repeats(3)

INFO:utils.caching:Recalculating: array_of_repeats_3
INFO:utils.caching:Calculation time for array_of_repeats_3: 0.000078 s
DEBUG:utils.caching:Persisting calculation result: array_of_repeats_3
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_of_repeats_3.npy


array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

Repeated calls do not trigger recalculation; the item is served from the cache instead:

In [6]:
array_of_repeats(3)

array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

If necessary, we can force recalculation (the recalculated result will be persisted on disk):

In [7]:
array_of_repeats.recalculate(3)

INFO:utils.caching:Recalculating: array_of_repeats_3
INFO:utils.caching:Calculation time for array_of_repeats_3: 0.000055 s
DEBUG:utils.caching:Persisting calculation result: array_of_repeats_3
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_of_repeats_3.npy


array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

It is also possible to recalculate the result without persisting it:

In [8]:
array_of_repeats.recalculate(3, persist=False)

INFO:utils.caching:Recalculating: array_of_repeats_3
INFO:utils.caching:Calculation time for array_of_repeats_3: 0.000057 s


array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
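
The behaviour shown in this section (calculate on a miss, reuse afterwards, optionally recalculate with or without persisting) can be sketched in a few lines. This is not the implementation in `utils.caching`; the `CachedFunction` class, the `calls` counter and the use of `pickle` in place of `.npy` files are assumptions made purely for illustration.

```python
import pickle
from pathlib import Path
from tempfile import mkdtemp


class CachedFunction:
    """Hypothetical stand-in for the object returned by `@cached()`."""

    def __init__(self, func, cache_dir: Path):
        self.func = func
        self.cache_dir = cache_dir
        self.calls = 0  # number of real calculations, for illustration only

    def _path(self, arg) -> Path:
        # Assumed naming scheme: <function name>_<argument>
        return self.cache_dir / f"{self.func.__name__}_{arg}.pkl"

    def __call__(self, arg):
        path = self._path(arg)
        if path.exists():
            # Cache hit: no calculation takes place.
            return pickle.loads(path.read_bytes())
        return self.recalculate(arg)

    def recalculate(self, arg, persist: bool = True):
        self.calls += 1
        result = self.func(arg)
        if persist:
            self._path(arg).write_bytes(pickle.dumps(result))
        return result


def repeats(i):
    return [i] * 10


sketch_dir = Path(mkdtemp())
cached_repeats = CachedFunction(repeats, sketch_dir)

first = cached_repeats(3)                             # miss: calculated and persisted
second = cached_repeats(3)                            # hit: read back from disk
third = cached_repeats.recalculate(3, persist=False)  # calculated, not written
```

In this sketch `calls` ends up at 2: one calculation for the initial miss and one for the forced recalculation.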

## Batch calculation

In certain cases it is more convenient to calculate several items in one batch rather than individually. For this, the `cached` decorator can be used in batch mode. Note that the `item_type` must also be provided:

In [9]:
@cached(batch=True, item_type=np.ndarray)
def array_batch() -> list[np.ndarray]:
    return [np.repeat(i, 10) for i in range(5)]

When an individual item is requested, the caching infrastructure attempts to read it from disk. If the item is not available on disk, the full batch is recalculated and cached:

In [10]:
array_batch(2)

INFO:utils.caching:Recalculating batch: array_batch
INFO:utils.caching:Calculation time for array_batch: 0.000120 s
DEBUG:utils.caching:Persisting calculation result: array_batch_0
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_0.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_1
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_1.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_2
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_2.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_3
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_3.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_4
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_4.npy


array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Requesting another item from the same batch will not cause recalculation, since the item will already be available on disk:

In [11]:
array_batch(3)

DEBUG:utils.caching:Reading from disk cache: array_batch_3
DEBUG:utils.caching:Reading /tmp/tmplwqv1hxp/array_batch_3.npy


array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

Requesting an item outside the index range provided by the batch function causes the whole batch to be recalculated before the index can be checked:

In [12]:
array_batch(100)

INFO:utils.caching:Recalculating batch: array_batch
INFO:utils.caching:Calculation time for array_batch: 0.000118 s
DEBUG:utils.caching:Persisting calculation result: array_batch_0
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_0.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_1
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_1.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_2
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_2.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_3
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_3.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_4
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_4.npy


IndexError: list index out of range
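
The batch behaviour seen so far, including the recalculation that precedes the `IndexError`, can be reproduced with a small sketch. It is only an approximation under stated assumptions: `make_batch_getter`, the `stats` dictionary and the `pickle` files are hypothetical and do not reflect how `utils.caching` is actually written.

```python
import pickle
from pathlib import Path
from tempfile import mkdtemp


def make_batch_getter(batch_func, cache_dir: Path):
    """Serve single items from disk; recompute the whole batch on a miss."""
    stats = {"batch_runs": 0}  # illustration only

    def path_for(index: int) -> Path:
        return cache_dir / f"{batch_func.__name__}_{index}.pkl"

    def get(index: int):
        path = path_for(index)
        if path.exists():
            return pickle.loads(path.read_bytes())
        # Miss: run the batch function once and persist every item.
        stats["batch_runs"] += 1
        items = batch_func()
        for i, item in enumerate(items):
            path_for(i).write_bytes(pickle.dumps(item))
        # The index is only checked here, after the batch has been computed.
        return items[index]

    return get, stats


def list_batch():
    return [[i] * 10 for i in range(5)]


get_item, stats = make_batch_getter(list_batch, Path(mkdtemp()))
item_2 = get_item(2)  # miss: computes and persists items 0..4
item_3 = get_item(3)  # hit: read from disk
runs_after_hits = stats["batch_runs"]

try:
    get_item(100)     # miss: the batch is recomputed, then indexing fails
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True
```

Because no file exists for index 100, the sketch cannot tell an invalid index from an item that simply has not been cached yet, so it recomputes the batch before failing.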

To prevent the needless recalculation of the batch when the size is known, it is possible to specify the batch size upfront:

In [13]:
@cached(batch=True, item_type=np.ndarray, batch_size=5)
def array_batch() -> list[np.ndarray]:
    return [np.repeat(i, 10) for i in range(5)]

In [14]:
array_batch(100)

IndexError: item index out of range
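
A known batch size makes it possible to reject an invalid index before any work is done. The sketch below shows the idea in isolation, with disk persistence left out for brevity. `make_checked_batch_getter` and `stats` are hypothetical names, and rejecting negative indices is an assumption of this sketch rather than documented behaviour of `utils.caching`.

```python
from typing import Callable, List, Optional


def make_checked_batch_getter(
    batch_func: Callable[[], List], batch_size: Optional[int] = None
):
    """Return a getter that validates the index upfront when the size is known."""
    stats = {"batch_runs": 0}  # illustration only

    def get(index: int):
        # With a known size, reject bad indices before running the batch.
        if batch_size is not None and not 0 <= index < batch_size:
            raise IndexError("item index out of range")
        stats["batch_runs"] += 1
        return batch_func()[index]

    return get, stats


def small_batch():
    return [[i] * 10 for i in range(5)]


checked_get, checked_stats = make_checked_batch_getter(small_batch, batch_size=5)

try:
    checked_get(100)
    rejected_early = False
except IndexError:
    rejected_early = True

runs_after_rejection = checked_stats["batch_runs"]  # the batch never ran
last_item = checked_get(4)
```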

If need be, you can force recalculation of the batch:

In [15]:
array_batch.recalculate(3)

INFO:utils.caching:Recalculating batch: array_batch
INFO:utils.caching:Calculation time for array_batch: 0.000089 s
DEBUG:utils.caching:Persisting calculation result: array_batch_0
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_0.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_1
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_1.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_2
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_2.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_3
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_3.npy
DEBUG:utils.caching:Persisting calculation result: array_batch_4
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/array_batch_4.npy


array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

## Single item without arguments

It is also possible to cache a single item produced by a function that takes no arguments:

In [16]:
@cached()
def single_item() -> np.ndarray:
    return np.arange(10)

In [17]:
single_item()

INFO:utils.caching:Recalculating: single_item
INFO:utils.caching:Calculation time for single_item: 0.000034 s
DEBUG:utils.caching:Persisting calculation result: single_item
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/single_item.npy


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

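## Non-integer arguments

Arguments do not have to be integers. As the log output shows, the argument value becomes part of the cache name: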

In [18]:
@cached()
def custom_argument(key: str) -> np.ndarray:
    if key == 'ones':
        return np.ones(10)
    elif key == 'zeros':
        return np.zeros(10)
    else:
        return np.arange(10)

In [19]:
custom_argument('ones')

INFO:utils.caching:Recalculating: custom_argument_ones
INFO:utils.caching:Calculation time for custom_argument_ones: 0.000051 s
DEBUG:utils.caching:Persisting calculation result: custom_argument_ones
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/custom_argument_ones.npy


array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [20]:
custom_argument('ones')

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [21]:
custom_argument('zeros')

INFO:utils.caching:Recalculating: custom_argument_zeros
INFO:utils.caching:Calculation time for custom_argument_zeros: 0.000012 s
DEBUG:utils.caching:Persisting calculation result: custom_argument_zeros
DEBUG:utils.caching:Writing /tmp/tmplwqv1hxp/custom_argument_zeros.npy


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
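
The cache names in the log output above are consistent with a simple scheme: the function name, followed by the argument when there is one. A minimal sketch of such a scheme follows. `cache_name` is a hypothetical helper, and how the real utility treats several arguments is not shown in this notebook.

```python
def cache_name(func_name: str, *args) -> str:
    """Join the function name and the string form of each argument."""
    # Assumption: arguments are turned into text with str().
    return "_".join([func_name, *map(str, args)])


name_int = cache_name("array_of_repeats", 3)
name_str = cache_name("custom_argument", "ones")
name_none = cache_name("single_item")
```

One consequence of a string-based scheme like this is that distinct arguments with the same text form, such as `1` and `'1'`, would map to the same cache entry.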

## Indexing

For convenience, it's also possible to use the indexing syntax to access cached elements:

In [22]:
array_of_repeats[3]

array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

In [23]:
array_batch[2]

DEBUG:utils.caching:Reading from disk cache: array_batch_2
DEBUG:utils.caching:Reading /tmp/tmplwqv1hxp/array_batch_2.npy


array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
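
In Python, indexing syntax of this kind is typically provided by defining `__getitem__` and delegating to `__call__`. The sketch below shows that pattern. The `Indexable` wrapper is hypothetical, and whether `utils.caching` does it this way is an assumption.

```python
class Indexable:
    """Wrap a one-argument function so that obj[key] behaves like obj(key)."""

    def __init__(self, func):
        self.func = func

    def __call__(self, key):
        return self.func(key)

    def __getitem__(self, key):
        # Indexing simply forwards to the call.
        return self(key)


@Indexable
def repeats(i):
    return [i] * 10


by_call = repeats(3)
by_index = repeats[3]
```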

## Read-only mode

When recalculation is undesirable, the decorator can be used in read-only mode. The decorated function body is never executed, so it can simply raise:

In [24]:
@cached(read_only=True)
def array_of_repeats(i) -> np.ndarray:
    raise NotImplementedError

The items are retrieved in the same way as before:

In [25]:
array_of_repeats[3]

DEBUG:utils.caching:Reading from disk cache: array_of_repeats_3
DEBUG:utils.caching:Reading /tmp/tmplwqv1hxp/array_of_repeats_3.npy


array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

However, attempting to retrieve items that have not been cached previously will result in an error:

In [26]:
array_of_repeats[1]

DEBUG:utils.caching:Reading from disk cache: array_of_repeats_1
DEBUG:utils.caching:Reading /tmp/tmplwqv1hxp/array_of_repeats_1.npy


FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmplwqv1hxp/array_of_repeats_1.npy'
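
The read-only behaviour amounts to reading the persisted file and letting a missing file surface as an error. A minimal sketch, assuming `pickle` files and a hypothetical `make_read_only_getter` helper rather than the actual `utils.caching` code:

```python
import pickle
from pathlib import Path
from tempfile import mkdtemp


def make_read_only_getter(name: str, cache_dir: Path):
    """Return a getter that only reads persisted items and never calculates."""

    def get(arg):
        path = cache_dir / f"{name}_{arg}.pkl"
        # read_bytes raises FileNotFoundError when nothing was persisted.
        return pickle.loads(path.read_bytes())

    return get


ro_dir = Path(mkdtemp())
# Simulate an item persisted by an earlier, writable run.
(ro_dir / "repeats_3.pkl").write_bytes(pickle.dumps([3] * 10))

get_repeats = make_read_only_getter("repeats", ro_dir)
cached_item = get_repeats(3)

try:
    get_repeats(1)
    missing_raised = False
except FileNotFoundError:
    missing_raised = True
```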