# Initialize Once
You can cache a function's initialization so that it runs once and is then reused across multiple runs of the function.

In [1]:
from time import sleep  # We use this to fake larger functions

## Step 1: Make the function
A good pattern for a "warmable" function is to split the initialization part and the computation part into separate functions. That makes it easy to mark which function to cache.

We'll make this function two different ways: explicitly with a global and then a second time with a Python cache.

### Explicit Globals
The strategy here is to have our initialization function cache its state in a global.
If that state exists, the function returns it rather than re-initializing.

In [2]:
state = None

In [3]:
def initialize():
    """Initialize the state for the function"""
    global state
    if state is None:
        sleep(2)  # Faking an expensive function
        state = 5
    return state

In [4]:
%%time
initialize()

CPU times: user 12.7 ms, sys: 894 µs, total: 13.6 ms
Wall time: 2 s


5

The first run takes a few seconds, but after that it is fast.

In [5]:
%%time
initialize()

CPU times: user 13 µs, sys: 3 µs, total: 16 µs
Wall time: 28.4 µs


5

We can use this inside our function to make it "warmable".

In [6]:
def function(x):
    weights = initialize()
    return x + weights

In [7]:
%%time
function(1)

CPU times: user 4 µs, sys: 1 µs, total: 5 µs
Wall time: 8.11 µs


6

It is very fast because the state was already initialized by the earlier calls.

### Implicit Caching
We can use an [LRU Cache](https://docs.python.org/3/library/functools.html#functools.lru_cache) to store the initialization value instead of making our own global.

Python's LRU cache is created by decorating a function with `lru_cache`.

In [8]:
from functools import lru_cache

In [9]:
@lru_cache()
def initialize():
    sleep(2)
    return 5

In [10]:
%%time
initialize()

CPU times: user 1.56 ms, sys: 299 µs, total: 1.86 ms
Wall time: 2 s


5

In [11]:
%%time
initialize()

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 3.1 µs


5

Like the other example, the second call is faster because it reads the result from the cache. This time, Python has created that cache for us.
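To confirm that the decorated body really runs only once, a minimal sketch can count executions and consult `cache_info()`. The `calls` counter and the name `initialize_counted` are illustrative only and do not appear elsewhere in this notebook.

```python
from functools import lru_cache

calls = 0  # hypothetical counter, only here to observe how often the body runs


@lru_cache()
def initialize_counted():
    """Stand-in for an expensive initialization (no sleep, to keep it quick)."""
    global calls
    calls += 1
    return 5


first = initialize_counted()   # cache miss: the body executes
second = initialize_counted()  # cache hit: the stored value is returned
info = initialize_counted.cache_info()
print(calls, info.hits, info.misses)
```

If you need to force a fresh initialization, `initialize_counted.cache_clear()` empties the cache.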

### Initialization Changes with Inputs
There are some cases where you may have a slightly-different initialization step based on the inputs (e.g., a path to a different neural network).
LRU cache makes handling this case easy as it will return different results depending on the inputs.

In [12]:
@lru_cache(maxsize=1)  # maxsize defines how many initializations to cache
def initialize(model_id):
    sleep(2)
    return model_id    

Use it in a new function:

In [13]:
def function(model_id, x):
    weights = initialize(model_id)
    return x + weights

In [14]:
%%time
function(1, 1)

CPU times: user 1.43 ms, sys: 225 µs, total: 1.65 ms
Wall time: 2 s


2

In [15]:
%%time
function(1, 2)

CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 6.44 µs


3

In [16]:
%%time
function(2, 1)

CPU times: user 0 ns, sys: 2.88 ms, total: 2.88 ms
Wall time: 2 s


3

In [17]:
%%time
function(2, 1)

CPU times: user 5 µs, sys: 1 µs, total: 6 µs
Wall time: 8.11 µs


3

Note how re-using the same model does not incur the initialization cost, but changing models does!
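The eviction behind this behavior can be checked with `cache_info()`. The sketch below repeats the call pattern above without the `sleep`; `load_model` is a hypothetical stand-in for `initialize`.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model(model_id):
    """Hypothetical stand-in for `initialize`, without the artificial delay."""
    return model_id


for model_id in (1, 1, 2, 2):
    load_model(model_id)
after_switch = load_model.cache_info()

load_model(1)  # model 1 was evicted when model 2 was loaded, so this is a miss
after_return = load_model.cache_info()
print(after_switch)
print(after_return)
```

Raising `maxsize` keeps several models initialized at once, at the cost of holding all of them in memory.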

## Step 2: Build it into a module
We are going to use our last strategy, an LRU cache for the initialization, to make an example.

We need to use a separate module for this function for it to work in a workflow engine. 
The workers used by this notebook need to know where to look for the definition of the function and the initialization function. 

I put them in a separate file [`module.py`](./module.py) in the same folder as this notebook, which is one of the places Python will look for the module definition.
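The contents of `module.py` are not reproduced in this notebook. A plausible sketch, assuming the file simply collects the definitions from the previous section, would be:

```python
"""module.py (assumed contents; the actual file may differ)"""
from functools import lru_cache
from time import sleep


@lru_cache(maxsize=1)
def initialize(model_id):
    """Run the expensive setup once per model and cache the result."""
    sleep(2)  # Faking an expensive initialization
    return model_id


def function(model_id, x):
    """Computation step: reuses the cached initialization when it can."""
    weights = initialize(model_id)
    return x + weights
```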

In [18]:
from module import function

In [19]:
%%time
function(1, 1)

CPU times: user 621 µs, sys: 135 µs, total: 756 µs
Wall time: 2 s


2

In [20]:
from module import function

In [21]:
%%time
function(1, 2)

CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 6.2 µs


3

It has our desired caching behavior: even though we imported the function twice, there is only one module, so every invocation uses the same `initialize` function.
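The point about there being only one module can be checked directly. Python keeps imported modules in `sys.modules`, so a repeated import returns the existing module object instead of executing the file again. The sketch below uses the standard-library `json` module as a stand-in for `module.py`.

```python
import importlib
import sys

first = importlib.import_module("json")
second = importlib.import_module("json")  # served from sys.modules, not re-executed

same_module = first is second is sys.modules["json"]
same_function = first.dumps is second.dumps
print(same_module, same_function)
```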

In [22]:
from module import initialize

In [23]:
initialize.cache_info()

CacheInfo(hits=1, misses=1, maxsize=1, currsize=1)

Note how the `initialize` function reports having run once and having re-used a cached value once.

And, because it is in a module, our workers will know how to access it. 
The [pickle library](https://docs.python.org/3/library/pickle.html) is how many workflow tools will send this function to workers.

In [24]:
import pickle as pkl
pkl.dumps(function)

b'\x80\x03cmodule\nfunction\nq\x00.'

Note how all it does is pass the location of the function (and not the cache)!
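The same by-reference behavior holds for any importable module-level function. As a small sketch, `json.dumps` stands in here for `module.function`: the payload carries only the module and function names, and unpickling looks the function up again by importing the module.

```python
import json
import pickle

payload = pickle.dumps(json.dumps)  # json.dumps stands in for module.function
restored = pickle.loads(payload)    # resolved by importing json and looking up "dumps"

print(payload)
print(restored is json.dumps)
```

This is why the worker must be able to import the module itself: the message tells it where to find the function, not what the function is.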

## Step 3: Show it off
We'll use Python's [`ProcessPoolExecutor`](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor), which has a similar behavior to many workflow engines in that it passes data to persistent Python processes via pickle-serialized messages.

In [25]:
from concurrent.futures import ProcessPoolExecutor

In [26]:
ex = ProcessPoolExecutor(1)  # Normally you would use this with the context-manager (with) syntax. I'm avoiding that to show you some timings

The first function invocation will take a few seconds as we're running initialization on the worker.

In [27]:
%%time
ex.submit(function, 0, 1).result()

CPU times: user 6.54 ms, sys: 554 µs, total: 7.1 ms
Wall time: 2.05 s


1

The second is fast because the worker has its cache now.

In [28]:
%%time
ex.submit(function, 0, 1).result()

CPU times: user 947 µs, sys: 218 µs, total: 1.17 ms
Wall time: 1.72 ms


1

You're done! We now have a function where each worker will avoid re-doing initialization work.

In [29]:
from dataclasses import dataclass
@dataclass
class X:
    x = None

## Advanced Notes
There are a few things we didn't address here:
- *Configuration is Complex*: Instead of passing a lot of little arguments that define the configuration, bundle them all in a [`dataclass`](https://docs.python.org/3/library/dataclasses.html). Base the options on simple Python data types (e.g., `str`, `int`) so that `lru_cache` reliably detects when you've supplied the same configuration.
- *Different Initialization on Different Hardware*: Some Python functions require special commands to make them run quickly (e.g., placing PyTorch models on the right device). My advice is to have your initialization function detect its environment and then act accordingly, rather than adjusting the source code between different workers or passing different arguments to the function to control how it works.
- *De-initialization*: Sometimes you can only initialize one setting at a time (e.g., loading an array that fills up your whole system's memory). Set your cache size to 1 in those cases (as in this example) and, if any additional work is required beyond letting Python free the memory as the cache is updated, consider using [`cachetools`](https://cachetools.readthedocs.io/) and overriding the `popitem` method.
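
For the configuration note, one detail matters in practice: `lru_cache` needs hashable arguments, and a plain `@dataclass` defines `__eq__` while setting `__hash__` to `None`. Declaring the dataclass with `frozen=True` makes it hashable. The sketch below illustrates this; `ModelConfig`, its fields, and the file names are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)  # frozen=True makes instances hashable, which lru_cache requires
class ModelConfig:
    """Hypothetical configuration built from simple, hashable field types."""
    model_path: str
    batch_size: int = 32


@lru_cache(maxsize=1)
def initialize(config):
    """Stand-in for an expensive setup keyed on the whole configuration."""
    return f"{config.model_path} (batch size {config.batch_size})"


initialize(ModelConfig("model-a.pt"))
initialize(ModelConfig("model-a.pt"))  # equal fields, so this is a cache hit
initialize(ModelConfig("model-b.pt"))  # different configuration, so a miss
info = initialize.cache_info()
print(info)
```

Two separately constructed configurations with equal fields hash and compare equal, so the cache treats them as the same initialization.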