## <font color='green'>JOBLIB</font>

Joblib is a set of tools to provide lightweight pipelining in Python. In particular:

1. transparent disk-caching of functions and lazy re-evaluation (memoize pattern)
2. simple, easy-to-use parallel computing

Joblib is optimized to be **fast** and **robust** on large data in particular and has specific optimizations for numpy arrays. It is **BSD-licensed.**
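
As a minimal sketch of the disk-caching point above, the snippet below wraps a function with `joblib.Memory`. The function `slow_square`, the `calls` list, and the temporary cache directory are made up for illustration and do not come from the original text.

```python
import tempfile

from joblib import Memory

# Throwaway cache directory (an assumption for this sketch); a real project
# would usually pick a persistent location.
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, verbose=0)

calls = []  # records every time the function body actually runs


@memory.cache
def slow_square(x):
    """Hypothetical stand-in for an expensive computation."""
    calls.append(x)
    return x ** 2


first = slow_square(4)   # computed, result written to the cache
second = slow_square(4)  # same argument: result loaded from disk
third = slow_square(5)   # new argument: computed again
print(first, second, third, calls)
```

The second call returns the stored result without running the function body, which is the lazy re-evaluation (memoize) behaviour described in point 1.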

Joblib provides a simple helper class to write parallel for loops using multiprocessing. The core idea is to write the code to be executed as a generator expression, and convert it to parallel computing:

In [2]:
from math import sqrt
import time
start_t = time.time()
[sqrt(i ** 2) for i in range(10)]
print("exe_time-sec: ", time.time()-start_t)

exe_time-sec:  0.0


This computation can be spread over 2 CPUs using the following:

In [3]:
from math import sqrt
from joblib import Parallel, delayed

start_t = time.time()
Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
print("exe_time-sec: ", time.time()-start_t)

exe_time-sec:  0.2472999095916748


By default **joblib.Parallel** uses the **'loky'** backend module to start separate Python worker processes to execute tasks concurrently on separate CPUs. This is a reasonable default for generic Python programs but can induce a significant overhead as the input and output data need to be serialized in a queue for communication with the worker processes.

When you know that the function you are calling is based on a compiled extension that releases the Python Global Interpreter Lock (GIL) during most of its computation then it is more efficient to use threads instead of Python processes as concurrent workers. For instance this is the case if you write the CPU intensive part of your code inside a with nogil block of a Cython function.

To hint that your code can efficiently use threads, just pass prefer="threads" as parameter of the joblib.Parallel constructor. In this case joblib will automatically use the "threading" backend instead of the default "loky" backend:

In [4]:
start_t = time.time()
Parallel(n_jobs=2, prefer="threads")(delayed(sqrt)(i ** 2) for i in range(10))
print("exe_time-sec: ", time.time()-start_t)

exe_time-sec:  0.11553597450256348


### Shared-memory semantics

The default backend of joblib runs each function call in an isolated Python process, so the calls cannot mutate a common Python object defined in the main program.

However, if the parallel function really needs to rely on the shared-memory semantics of threads, this should be made explicit with require='sharedmem', for instance:

In [5]:
shared_set = set()
def collect(x):
    shared_set.add(x)

Parallel(n_jobs=2, require='sharedmem')(delayed(collect)(i) for i in range(5))

[None, None, None, None, None]

In [6]:
sorted(shared_set)

[0, 1, 2, 3, 4]

Keep in mind that relying on shared-memory semantics is probably suboptimal from a performance point of view, as concurrent access to a shared Python object will suffer from lock contention.
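
One way to avoid shared state altogether, sketched here under the assumption that each task can simply return its result, is to let `Parallel` gather the return values and build the set in the main program. The helper `compute` is hypothetical and stands in for the real work done per item.

```python
from joblib import Parallel, delayed


def compute(x):
    """Hypothetical task: return the value instead of mutating a global set."""
    return x


# Parallel returns one result per task, in the order the tasks were submitted.
results = Parallel(n_jobs=2)(delayed(compute)(i) for i in range(5))

# The aggregation happens once, in the main program, with no lock contention.
collected = set(results)
print(sorted(collected))
```

Because nothing is shared between workers, this pattern does not need require='sharedmem' and works with the default process-based backend.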

ref : 
https://joblib.readthedocs.io/en/latest/

## <font color='green'>Examples for Joblib</font>

General examples: general-purpose and introductory examples for joblib.

### Random state within joblib.Parallel

The backends differ in how parallel execution affects randomness.

In particular, when using multiple processes, the random sequence can be the same in all processes. This example illustrates the problem and shows how to work around it.

In [7]:
import numpy as np
from joblib import Parallel, delayed

A utility function for the example

In [8]:
def print_vector(vector, backend):
    """Helper function to print the generated vector with a given backend."""
    print('\nThe different generated vectors using the {} backend are:\n {}'
          .format(backend, np.array(vector)))
    

#### Sequential behavior

*stochastic_function* generates five random integers. When calling the function several times, we expect to obtain different vectors. For instance, if we call the function five times sequentially, we can check that the generated vectors are all different.

In [9]:
def stochastic_function(max_value):
    """Randomly generate integer up to a maximum value."""
    return np.random.randint(max_value, size=5)


n_vectors = 5
random_vector = [stochastic_function(10) for _ in range(n_vectors)]
print('\nThe different generated vectors in a sequential manner are:\n {}'
      .format(np.array(random_vector)))


The different generated vectors in a sequential manner are:
 [[5 9 4 1 5]
 [7 1 0 8 2]
 [7 5 0 7 7]
 [0 2 1 7 4]
 [4 3 1 5 8]]


#### Parallel behavior

Joblib provides three different backends: loky (default), threading, and multiprocessing.

In [10]:
backend = 'loky'
random_vector = Parallel(n_jobs=2, backend=backend)(delayed(
    stochastic_function)(10) for _ in range(n_vectors))
print_vector(random_vector, backend)


The different generated vectors using the loky backend are:
 [[9 5 4 4 3]
 [6 3 4 9 9]
 [3 3 6 9 1]
 [3 4 9 2 7]
 [2 0 3 4 4]]


In [11]:
backend = 'threading'
random_vector = Parallel(n_jobs=2, backend=backend)(delayed(
    stochastic_function)(10) for _ in range(n_vectors))
print_vector(random_vector, backend)


The different generated vectors using the threading backend are:
 [[1 8 7 3 7]
 [6 2 0 5 9]
 [8 7 7 4 7]
 [2 1 4 9 9]
 [0 8 2 3 4]]


The loky and threading backends behave exactly as in the sequential case and do not require more care. However, this is not the case for the multiprocessing backend.

With the multiprocessing backend, some of the generated vectors can be exactly the same, which can be a problem for the application.

Technically, the reason is that all forked Python processes share the exact same random seed. As a result, we obtain the same randomly generated vectors twice because we are using n_jobs=2. A solution is to set the random state within the function which is passed to joblib.Parallel.

stochastic_function_seeded accepts a random seed as an argument. We can reset this seed by passing None at every function call. In this case, we see that the generated vectors are all different.
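
The definition of stochastic_function_seeded is not shown in this section, so the following is only a sketch of what such a function could look like, assuming it takes the maximum value and a `random_state` argument. The wrapper `generate_vectors` is a hypothetical helper added here to make the backend explicit.

```python
import numpy as np
from joblib import Parallel, delayed


def stochastic_function_seeded(max_value, random_state):
    """Draw five integers in [0, max_value) from a private generator.

    With random_state=None, NumPy seeds the generator from fresh OS entropy,
    so the draw does not depend on the global state inherited from the parent.
    """
    rng = np.random.RandomState(random_state)
    return rng.randint(max_value, size=5)


def generate_vectors(n_vectors, random_state, backend="multiprocessing"):
    """Hypothetical helper: run the seeded function on two workers."""
    return Parallel(n_jobs=2, backend=backend)(
        delayed(stochastic_function_seeded)(10, random_state)
        for _ in range(n_vectors)
    )


# A fixed seed reproduces the same vector; None asks for a fresh seed instead.
same_a = stochastic_function_seeded(10, 42)
same_b = stochastic_function_seeded(10, 42)
print(np.array_equal(same_a, same_b))  # True
```

Calling `generate_vectors(5, None)` then applies the workaround described above on the multiprocessing backend. On platforms that start workers without forking, that call belongs under an `if __name__ == "__main__":` guard.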

ref : 
https://www.tutorialdocs.com/tutorial/joblib/examples.html