<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Which-cache-to-use?" data-toc-modified-id="Which-cache-to-use?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Which cache to use?</a></span><ul class="toc-item"><li><span><a href="#Compare-performance" data-toc-modified-id="Compare-performance-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Compare performance</a></span></li><li><span><a href="#Unhashable-objects-as-input" data-toc-modified-id="Unhashable-objects-as-input-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Unhashable objects as input</a></span></li><li><span><a href="#Technicalities" data-toc-modified-id="Technicalities-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Technicalities</a></span><ul class="toc-item"><li><span><a href="#Source-code-hashing" data-toc-modified-id="Source-code-hashing-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Source code hashing</a></span></li></ul></li></ul></li></ul></div>

In [1]:
!pip install diskcache

Defaulting to user installation because normal site-packages is not writeable


In [2]:
import numpy as np
import joblib
import diskcache
from cartesian_explorer.caches import JobLibCache, FunctoolsCache_Disk, FunctoolsCache

In [3]:
%load_ext autoreload
%autoreload 2

# Which cache to use?

In [4]:
fcache = FunctoolsCache()
jcache = JobLibCache('/tmp/cache_jl')
dcache = FunctoolsCache_Disk('/tmp/cache_dc')

## Compare performance

In [6]:
def func(x):
    print('func called', x)
    return list(np.linspace(0, x, 1000))

cached_f = fcache.wrap(func)
cached_j = jcache.wrap(func)
cached_d = dcache.wrap(func)
#dcache.clear(cached_d) # clear state since we may have data on disk

In [7]:
%timeit cached_f(10)

func called 10
862 ns ± 107 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [8]:
%timeit cached_d(10)

329 µs ± 96.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [9]:
%timeit cached_j(10)

6.75 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


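- `FunctoolsCache` stores data in RAM and is the fastest (under 1 µs per hit above), but does not preserve data between restarts.
- `FunctoolsCache_Disk` uses `diskcache` for storage and `joblib` for function inspection. It persists to disk and is fast for a disk cache (about 0.3 ms per hit above).
- `JobLibCache` uses `joblib.Memory`. It is the slowest of the three (about 7 ms per hit above), but supports large, un-hashable objects as inputs.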
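The RAM-versus-disk trade-off can be sketched with the standard library alone. This is only an illustration under stated assumptions: `disk_cached` is a hypothetical pickle-file cache standing in for a real disk backend, not how `diskcache`, `joblib`, or `cartesian_explorer` work internally, and "restart" is simulated by creating fresh wrappers.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cached(func, directory):
    """Hypothetical pickle-file cache; a stand-in for a real disk backend."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    @functools.wraps(func)
    def wrapper(*args):
        # Derive a file name from the pickled positional arguments.
        key = hashlib.sha1(pickle.dumps(args)).hexdigest()
        path = directory / f"{key}.pkl"
        if path.exists():
            return pickle.loads(path.read_bytes())
        result = func(*args)
        path.write_bytes(pickle.dumps(result))
        return result

    return wrapper


calls = []


def square(x):
    calls.append(x)  # record every real evaluation
    return x * x


cache_dir = tempfile.mkdtemp()

# "Session 1": both caches are cold, so each evaluates the function once.
ram_1 = functools.lru_cache(maxsize=None)(square)
disk_1 = disk_cached(square, cache_dir)
ram_1(3)
disk_1(3)

# "Session 2": new wrappers simulate a restart.
ram_2 = functools.lru_cache(maxsize=None)(square)
disk_2 = disk_cached(square, cache_dir)
ram_2(3)   # evaluated again: the in-memory cache started empty
disk_2(3)  # served from the pickle file, no evaluation

print(len(calls))  # 3 evaluations in total
```

The sketch keys files on the arguments only, so it would mix up two different functions that share a directory. A real cache also has to account for the function's identity.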

## Unhashable objects as input

In [10]:
def func(x):
    print('func called', x)
    return list(np.linspace(0, x[0], 1000))

cached_f = fcache.wrap(func)
cached_j = jcache.wrap(func)
cached_d = dcache.wrap(func)
#dcache.clear(cached_d) # clear state since we may have data on disk

In [11]:
def print_hashable_support(func):
    try:
        len(func([10]))
    except TypeError as e:
        print('Nope!', e)
    else:
        print('Yep!')

In [12]:
print('Does this cache support un-hashable inputs?')
print('- Functools:')
print_hashable_support(cached_f)
print('- Diskcache:')
print_hashable_support(cached_d)
print('- JobLib:')
print_hashable_support(cached_j)

Does this cache support un-hashable inputs?
- Functools:
Nope! unhashable type: 'list'
- Diskcache:
Nope! unhashable type: 'list'
- JobLib:
func called [10]
Yep!
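The `TypeError` above comes from caches that use the arguments as dictionary keys. A minimal standard-library sketch of the failure and one possible workaround follows. The helpers `freeze` and `call_with_frozen_args` are hypothetical names introduced for illustration and are not part of `cartesian_explorer`.

```python
import functools


@functools.lru_cache(maxsize=None)
def first_item_doubled(x):
    return x[0] * 2


def freeze(value):
    """Recursively convert lists to tuples so the value becomes hashable."""
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value


def call_with_frozen_args(cached_func, *args):
    """Call a hash-based cached function with list arguments frozen to tuples."""
    return cached_func(*(freeze(a) for a in args))


try:
    first_item_doubled([10])  # a list cannot be used as a cache key
except TypeError as err:
    print("Nope!", err)

print(call_with_frozen_args(first_item_doubled, [10]))  # 20
```

Freezing only works when the wrapped function accepts tuples in place of lists. A cache that serializes its inputs to compute the key does not need this step.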


---

## Technicalities

In [13]:
@dcache.wrap
def func_b(x, y):
    print('func called', x)
    return list(np.linspace(0, x, y))
@dcache.wrap
def func_a(x, y):
    print('func called', x)
    y = y*3
    return list(np.linspace(0, x, y+1))


In [14]:
dcache.clear(func_a)
dcache.clear(func_b)

In [15]:
len(func_b(1, 200))

func called 1


200

In [16]:
len(func_a(1, 200))

func called 1


601

### Source code hashing

In [17]:
func.__code__.co_filename


'<ipython-input-10-5dbf27bc577c>'

In [18]:
import inspect

In [19]:
inspect.getsourcelines(func)

(['def func(x):\n',
  "    print('func called', x)\n",
  '    return list(np.linspace(0, x[0], 1000))\n'],
 1)

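Python's built-in `hash()` is salted for strings and bytes, so the same string gets a different hash in each session (unless `PYTHONHASHSEED` is fixed). A hash of the source code that has to stay valid between sessions therefore needs `hashlib`.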

In [20]:
import hashlib

In [21]:
hashlib.sha1('th1'.encode()).hexdigest()

'a08f193dc39e5c9488e0cd6d21f7620bfec99a12'
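Putting `inspect` and `hashlib` together, a session-independent key for a function could be built roughly as follows. This is a sketch under assumptions, not the scheme `cartesian_explorer` necessarily uses: `text_digest`, `source_digest`, and `call_key` are hypothetical helpers, and `repr` is assumed to be a good enough representation of the arguments.

```python
import hashlib
import inspect


def text_digest(text):
    """Deterministic SHA-1 hex digest of a string (same in every session)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def source_digest(func):
    """Digest of a function's source text.

    inspect.getsource raises OSError when the source cannot be retrieved.
    """
    return text_digest(inspect.getsource(func))


def call_key(source_text, args):
    """Hypothetical cache key: source text combined with repr of the arguments."""
    return text_digest(source_text + "\n" + repr(args))


source_v1 = "def f(x):\n    return x + 1\n"
source_v2 = "def f(x):\n    return x + 2\n"

# Same source and arguments give the same key; editing the body changes it.
print(call_key(source_v1, (10,)) == call_key(source_v1, (10,)))  # True
print(call_key(source_v1, (10,)) == call_key(source_v2, (10,)))  # False
```

Including the source text in the key means an edited function body invalidates old entries, while an unchanged function keeps hitting the cache after a restart.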