# About This Notebook

This notebook shows how to use `gdbm_example.DBStorage`, an implementation of the `yaci.base_storage.CacheStorage` interface. **NOTE:** This is a reference implementation and should not be used in production without considering a few details that are glossed over here. For example:

* The use of `pickle` for serialization is insecure.
* The reference implementation takes no care to ensure that the location of the cache file is not modified, so you could accidentally invalidate the cache.
* The keys and values for a `gdbm` implementation must be strings.
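As a sketch of how the first and third points might be addressed, the hypothetical `JSONDBMStorage` class below (not part of `yaci` or `gdbm_example`; its `get`/`set`/`close` method names are assumptions, not the real `CacheStorage` interface) stores JSON text instead of pickles. It uses Python 3's standard `dbm.dumb` module, which is available everywhere, rather than the Python 2 `gdbm` module used in this notebook, and it assumes every cached value is JSON-serializable.

```python
import dbm.dumb
import json
import os
import tempfile


class JSONDBMStorage:
    """Hypothetical dbm-backed store that serializes values as JSON, not pickle."""

    def __init__(self, path):
        # The cache location is an explicit argument rather than a hidden default.
        self._db = dbm.dumb.open(path, "c")

    def get(self, key):
        # dbm stores bytes; str keys are encoded to UTF-8 automatically.
        raw = self._db.get(key)
        if raw is None:
            return None
        # json.loads only builds plain data (dicts, lists, numbers, strings),
        # so reading a tampered cache file cannot run code the way pickle can.
        return json.loads(raw.decode("utf-8"))

    def set(self, key, value):
        # Values must be JSON-serializable; the JSON text is stored as a string.
        self._db[key] = json.dumps(value)

    def close(self):
        self._db.close()


store = JSONDBMStorage(os.path.join(tempfile.mkdtemp(), "cache"))
store.set("('mymax', [1, 2, 3, 4])", 4)
print(store.get("('mymax', [1, 2, 3, 4])"))  # 4
store.close()
```

One wrinkle in this sketch: a cached JSON `null` reads back as `None` and is indistinguishable from a miss, so a real backend would need a sentinel for that case.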

There are more reasons why you shouldn't use the reference implementation in production code, but the point is that care must be taken when implementing a cache storage backend, and the examples here are quite naive.

In [1]:
import gdbm_example
from yaci import contexts
from yaci import dict_storage
from yaci import cache_manager

## A Pipeline Example

Here we create a class for invoking functions in a pipeline. It is a bit contrived, but useful for this example. The pipeline invokes the functions passed to its `run` method in order. First, we create an instance that uses the `dict_storage.NoopDictStorage` implementation of `CacheStorage`; this instance runs each function every time. Next, we create an instance that uses our `gdbm_example.DBStorage` implementation, and we see that each function runs only on the first call, while the second call is answered from the cache.

In [2]:
class Pipeline(object):
    def __init__(self, cache=None):
        # If the user does not supply a CacheManager, then by
        # default we do not cache anything.
        self.cache = cache if cache is not None else cache_manager.CacheManager(storage=dict_storage.NoopDictStorage())
    
    def run(self, *inputs):
        """
        Run each (function, args) tuple in order. If args is None, the
        result of the previous step is passed as the args instead.

        pipeline.run((foo, ['a', 'b']), (bar, None))
        """
        
        last_result = None
        for func, args in inputs:
            next_args = args
            if next_args is None:
                next_args = last_result 
            key = str((func.__name__, next_args))
            last_result = self.cache.default_get(key, func, next_args)
            # print('Invoking {0} with arguments {1}'.format(func.__name__, next_args))
        return last_result
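`Pipeline.run` relies on `CacheManager.default_get(key, func, args)`. Its real behavior lives in `yaci`; judging only from how this notebook uses it, it appears to follow the cache-aside pattern: return the stored value on a hit, otherwise call the function, store the result, and return it. A minimal sketch of that pattern with a hypothetical `DictCacheManager`, assuming `func` receives `args` as a single positional argument (consistent with the outputs in this notebook):

```python
class DictCacheManager:
    """Hypothetical stand-in for cache_manager.CacheManager (only default_get)."""

    def __init__(self):
        self._store = {}

    def default_get(self, key, func, args):
        # Cache hit: return the stored result without calling func.
        if key in self._store:
            return self._store[key]
        # Cache miss: call func with args as a single argument, store, return.
        result = func(args)
        self._store[key] = result
        return result


calls = []


def mymax(x):
    calls.append(x)  # record every real invocation
    return max(x)


manager = DictCacheManager()
key = str((mymax.__name__, [1, 2, 3, 4]))  # same key scheme as Pipeline.run
print(manager.default_get(key, mymax, [1, 2, 3, 4]))  # 4
print(manager.default_get(key, mymax, [1, 2, 3, 4]))  # 4, served from the cache
print(len(calls))  # 1
```

Because the key is built from `func.__name__` and the `str()` of the arguments, two different functions that share a name would share cache entries; that is fine for this demo but worth keeping in mind.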

In [3]:
def double(x):
    # Note: despite its name, this squares x (so 4 becomes 16).
    print("Actually executing the function double")
    return x**2

def mymax(x):
    print("Actually executing the function mymax.")
    return max(x)

In [4]:
pipeline = Pipeline()
pipeline.run((mymax, [1, 2, 3, 4]), (double, None))
pipeline.run((mymax, [1, 2, 3, 4]), (double, None))

Actually executing the function mymax.
Actually executing the function double
Actually executing the function mymax.
Actually executing the function double


16
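With `NoopDictStorage`, nothing is ever retained, so every lookup is a miss and both functions run on each call. The sketch below illustrates that idea with a hypothetical dict-like storage (`NoopStorage`) and a hypothetical `StorageCacheManager`; these use Python's mapping protocol and are not `yaci`'s actual `CacheStorage` methods. Swapping in a plain `dict` shows the difference a storage that remembers makes.

```python
class NoopStorage:
    """Hypothetical storage that discards every write, so lookups always miss."""

    def __contains__(self, key):
        return False

    def __getitem__(self, key):
        raise KeyError(key)

    def __setitem__(self, key, value):
        pass  # Intentionally drop the value.


class StorageCacheManager:
    """Hypothetical manager that delegates to any dict-like storage."""

    def __init__(self, storage):
        self.storage = storage

    def default_get(self, key, func, args):
        if key in self.storage:
            return self.storage[key]
        result = func(args)
        self.storage[key] = result
        return result


def count_calls(storage):
    """Call the same cached function twice and report how often it really ran."""
    calls = []

    def square(x):
        calls.append(x)
        return x ** 2

    manager = StorageCacheManager(storage)
    for _ in range(2):
        manager.default_get("('square', 4)", square, 4)
    return len(calls)


print(count_calls(NoopStorage()))  # 2: nothing was stored, so both calls miss
print(count_calls({}))             # 1: a plain dict remembers the first result
```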

In [5]:
storage = gdbm_example.DBStorage()
cache = cache_manager.CacheManager(storage=storage)

pipeline2 = Pipeline(cache=cache)

In [6]:
pipeline2.run((mymax, [1, 2, 3, 4]), (double, None))
pipeline2.run((mymax, [1, 2, 3, 4]), (double, None))

Actually executing the function mymax.
Actually executing the function double


16

In [7]:
pipeline2.cache.close()

### This is why you shouldn't use this implementation in production!

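The reference implementation doesn't manage its resources for you: once the cache is closed, the underlying GDBM file stays closed, and any further use of the same storage raises an error instead of reopening it.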

In [8]:
pipeline2.run((mymax, [1, 2, 3, 4]), (double, None))

error: GDBM object has already been closed
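One way to avoid reusing a closed handle is to scope each database handle with a `with` block, so it is closed deterministically and a fresh handle is opened for the next use. A minimal sketch, assuming Python 3's standard `dbm.dumb` module (this notebook itself uses the Python 2 `gdbm` module), JSON-serializable results, and a hypothetical helper named `cached_call`. Reopening the file on every call trades speed for safety.

```python
import dbm.dumb
import json
import os
import tempfile


def cached_call(path, key, func, arg):
    """Hypothetical helper: look up key in the dbm file at path, or compute and store it."""
    # The with-block closes the database on exit, even if func raises, so no
    # handle is left open and none is ever reused after being closed.
    with dbm.dumb.open(path, "c") as db:
        if key in db:
            return json.loads(db[key])
        result = func(arg)
        db[key] = json.dumps(result)
        return result


path = os.path.join(tempfile.mkdtemp(), "cache")
print(cached_call(path, "('square', 4)", lambda x: x ** 2, 4))  # 16, computed
print(cached_call(path, "('square', 4)", lambda x: -1, 4))      # 16, read back from disk
```

The second call passes a function that would return `-1`, so getting `16` back shows the value came from the file rather than from a fresh computation.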

### But!

We can create a new storage instance and see that the results of our functions have persisted: on the following runs, neither function is actually executed.

In [9]:
storage = gdbm_example.DBStorage()
cache = cache_manager.CacheManager(storage=storage)

In [10]:
pipeline3 = Pipeline(cache=cache)

In [12]:
pipeline3.run((mymax, [1, 2, 3, 4]), (double, None))
pipeline3.run((mymax, [1, 2, 3, 4]), (double, None))

16
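The persistence comes entirely from the file on disk. As a sketch using Python 3's standard `dbm.dumb` module (an assumption; the notebook's `DBStorage` uses `gdbm` and its file location is not shown here), a second handle opened on the same path sees what the first handle wrote, while a handle on a different path starts empty. The latter is exactly how moving or renaming the cache file silently invalidates the cache.

```python
import dbm.dumb
import os
import tempfile

cache_dir = tempfile.mkdtemp()
path = os.path.join(cache_dir, "cache")

# First "session": write one result and close the file.
with dbm.dumb.open(path, "c") as db:
    db["('square', 4)"] = "16"

# Second "session": a new handle on the same path sees the stored value.
with dbm.dumb.open(path, "r") as db:
    print(db["('square', 4)"])  # b'16'

# A handle on a different path starts empty: the cache is effectively invalidated.
with dbm.dumb.open(os.path.join(cache_dir, "other"), "c") as db:
    print(len(db))  # 0
```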

### Now clear the cache and run again

In [13]:
pipeline3.cache.clear()

In [14]:
pipeline3.run((mymax, [1, 2, 3, 4]), (double, None))
pipeline3.run((mymax, [1, 2, 3, 4]), (double, None))

Actually executing the function mymax.
Actually executing the function double


16
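After `clear()`, both functions run again, so clearing has to remove the persisted entries, not just in-memory state. How `gdbm_example.DBStorage` implements this isn't shown here; a minimal sketch for a standard Python 3 `dbm` database, using a hypothetical `clear_store` helper and `dbm.dumb` for portability:

```python
import dbm.dumb
import os
import tempfile


def clear_store(db):
    """Hypothetical helper: delete every key from an open dbm database."""
    # Copy the key list first so the database is not mutated while iterating it.
    for key in list(db.keys()):
        del db[key]


path = os.path.join(tempfile.mkdtemp(), "cache")
with dbm.dumb.open(path, "c") as db:
    db["('mymax', [1, 2, 3, 4])"] = "4"
    db["('double', 4)"] = "16"
    clear_store(db)
    print(len(db))  # 0
```

Deleting keys one by one keeps the file in place; an alternative design is to delete the database file itself, which is simpler but requires knowing every file the backend creates.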