# Joblib tutorial

Joblib provides:

- Transparent and fast disk-caching of function output values
- An embarrassingly parallel helper
- Logging and tracing
- Fast compressed persistence

In [1]:
from joblib import Memory
import numpy as np

#### 1 - Transparent and fast disk-caching of output values

Joblib offers memoize-like or make-like functionality for Python functions that works well with arbitrary Python objects, including very large NumPy arrays. The idea is to separate persistence and flow-execution logic from domain logic or algorithmic code by writing the operations as a set of steps with well-defined inputs and outputs: Python functions. Joblib can then save their results to disk and rerun a computation only when necessary:

In [2]:
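mem = Memory(location='/tmp/joblib')  # `cachedir` was renamed to `location`
a = np.vander(np.arange(3)).astype(float)  # `np.float` was removed from NumPy
square = mem.cache(np.square)
# The first call evaluates np.square and stores the result on disk
b = square(a)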

In [3]:
%%time
# This call does not trigger a re-evaluation when the result for `a` is already cached
c = square(a)

CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 2.35 ms
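
To make the caching behaviour visible without relying on timings, here is a minimal sketch that counts how often the function body really runs. The names `slow_square` and `calls` are illustrative only, and the cache is placed in a temporary directory (an assumption of this sketch) instead of `/tmp/joblib`.

```python
import tempfile

from joblib import Memory

# Assumption for this sketch: a throwaway temporary directory as the cache location.
cache_dir = tempfile.mkdtemp()
mem = Memory(location=cache_dir, verbose=0)

calls = []  # records every real evaluation of the function body


@mem.cache
def slow_square(x):
    """Hypothetical stand-in for an expensive computation."""
    calls.append(x)  # this side effect only happens when the body runs
    return x * x


first = slow_square(12)   # evaluated, result written to disk
second = slow_square(12)  # same argument: result loaded from the cache
third = slow_square(13)   # new argument: evaluated again

print(first, second, third, len(calls))
```

The second call returns the stored value without running the body, so `calls` only grows when a new argument is seen. Side effects inside a cached function are therefore skipped on a cache hit, which is worth keeping in mind when choosing what to cache.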


#### 2 - Embarrassingly parallel helper

Joblib makes it easy to write readable parallel code and to debug it quickly:

In [4]:
from joblib import Parallel, delayed
from math import sqrt

In [27]:
%%time 
r = Parallel(n_jobs=4)(delayed(sqrt)(i) for i in range(10**3))

CPU times: user 32 ms, sys: 20 ms, total: 52 ms
Wall time: 139 ms


In [30]:
%%time 
r2 = [sqrt(i) for i in range(10**3)]

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 142 µs
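
In the timings above, the parallel version (139 ms) is far slower than the plain list comprehension (142 µs): `sqrt` is so cheap that the cost of dispatching work to the workers dominates. The sketch below, with a hypothetical `hypot` task and made-up inputs, shows the same pattern with positional and keyword arguments, and how `n_jobs=1` gives a sequential run that is convenient for debugging.

```python
from math import sqrt

from joblib import Parallel, delayed


def hypot(x, y=0.0):
    """Hypothetical toy task: Euclidean norm of the vector (x, y)."""
    return sqrt(x * x + y * y)


inputs = [(3, 4), (5, 12), (8, 15)]

# n_jobs=1 runs every task in the calling process, which keeps
# tracebacks and debuggers simple.
sequential = Parallel(n_jobs=1)(delayed(hypot)(x, y=y) for x, y in inputs)

# Same expression with two workers; results come back in input order.
parallel = Parallel(n_jobs=2)(delayed(hypot)(x, y=y) for x, y in inputs)

print(sequential)
print(parallel)
```

Because only `n_jobs` changes between the two calls, a pipeline can be developed sequentially and switched to parallel execution once each task is expensive enough to outweigh the dispatch overhead.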


#### 3 - Logging/tracing

The different functionalities will progressively acquire better logging mechanisms to help track what has been run and to capture I/O easily. In addition, Joblib will provide a few I/O primitives to easily define logging and display streams, and a way of compiling a report. We want to be able to quickly inspect what has been run.
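
Some tracing is already available through the `verbose` argument that `Parallel` accepts. The following is only a small sketch of that option, not the reporting mechanism described above; the value `10` is an arbitrary choice.

```python
from math import sqrt

from joblib import Parallel, delayed

# verbose > 0 makes Parallel emit progress messages while tasks complete;
# larger values report more often.
roots = Parallel(n_jobs=1, verbose=10)(delayed(sqrt)(i) for i in range(5))

print(roots)
```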


#### 4 - Fast compressed persistence

Joblib provides a replacement for pickle that works efficiently on Python objects containing large data (`joblib.dump` and `joblib.load`).
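
A minimal round-trip sketch of `joblib.dump` and `joblib.load` follows. The dictionary contents, the file name `data.joblib`, and the compression level `3` are assumptions chosen for illustration.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical payload: a plain Python object holding a large array.
data = {"weights": np.arange(1_000_000, dtype=np.float64), "label": "demo"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.joblib")  # illustrative file name
    joblib.dump(data, path, compress=3)      # compress takes a level from 0 to 9
    size_on_disk = os.path.getsize(path)
    restored = joblib.load(path)             # fully read into memory here

print(size_on_disk, restored["label"], restored["weights"].shape)
```

Higher `compress` levels trade write speed for smaller files; `compress=0` (the default) stores the data uncompressed.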