# Tutorial Session 1 - Notebook 4
### Professor: Elwin van 't Wout
### Teaching Assistant: Alberto Almuna Morales (alberto.almuna@uc.cl)

The library `joblib` provides functionality for parallel computing. In this notebook, let us look into the different parallel backends.

In [1]:
from joblib import Parallel, delayed

Let us create the tasks to be performed in parallel by the workers:

In [2]:
def my_task(n):
    my_sum = 0
    for m in range(n):
        my_sum += m
    return my_sum

In [3]:
tasks = [delayed(my_task)(i) for i in range(40000)]
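
To make the role of `delayed` more concrete, the following minimal sketch inspects a single delayed task. The helper `triangular` is a hypothetical stand-in for `my_task`. Calling `delayed(f)(args)` does not execute `f`; it only records the function and its arguments so that `Parallel` can dispatch the call to a worker later.

```python
from joblib import Parallel, delayed

def triangular(n):
    # Hypothetical stand-in for my_task: sum of 0, 1, ..., n-1
    return sum(range(n))

# delayed(...) does not run the function; it packs the call into a tuple
task = delayed(triangular)(5)
func, args, kwargs = task
print(func.__name__, args, kwargs)

# Parallel executes the recorded calls and returns the results in task order
results = Parallel(n_jobs=2)(delayed(triangular)(n) for n in range(6))
print(results)
```

Because the tasks are only descriptions of function calls, the same list can be passed to different `Parallel` objects, which is what the cells below do with the list `tasks`.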

The `joblib` library has different parallel backends that can perform the parallel computations. By default, the `loky` backend is used, which is based on the *multiprocessing* model: it creates separate Python processes and assigns tasks to each of them. Alternatively, the `threading` backend is based on the *multithreading* model, creating different threads within the same process. Generally speaking, creating and managing processes requires more overhead than threads, but processes do not share memory by default, which makes them less prone to race conditions. In standard CPython builds, threads are also subject to the global interpreter lock (GIL), so pure-Python, CPU-bound tasks such as `my_task` typically gain little from multithreading.

Since all tasks in this tutorial are independent, there is no risk of race conditions and both backends can be used.

While the parallel tasks are running, look in the *Activity Monitor* (macOS) or *Task Manager* (Windows) to see the different processes.
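
The difference between the two models can also be observed from within Python. The sketch below asks each task to report the ID of the process it runs in. The helper `worker_pid` is a hypothetical name introduced for this illustration, and the sketch assumes that the platform allows `joblib` to start worker processes.

```python
import os
from joblib import Parallel, delayed

def worker_pid(_):
    # Report the ID of the process that executes this task
    return os.getpid()

main_pid = os.getpid()

# threading backend: tasks run in threads of the main process
thread_pids = Parallel(n_jobs=2, backend='threading')(
    delayed(worker_pid)(i) for i in range(8)
)

# loky backend: tasks run in separate worker processes
process_pids = Parallel(n_jobs=2, backend='loky')(
    delayed(worker_pid)(i) for i in range(8)
)

print('main process:', main_pid)
print('threading   :', set(thread_pids))
print('loky        :', set(process_pids))
```

With the `threading` backend, every task should report the process ID of the notebook itself, whereas the `loky` workers should report different IDs. These are the additional Python processes that appear in the system monitor.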

In [4]:
n_jobs = 4
batch_size = 1000
verbose = 10

In [8]:
with Parallel(n_jobs=n_jobs, batch_size=batch_size, verbose=verbose, backend='loky') as parallel_pool:
    parallel_results = parallel_pool(tasks)

[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   2 tasks      | elapsed:    0.2s
[Parallel(n_jobs=4)]: Done   8 tasks      | elapsed:    0.2s
[Parallel(n_jobs=4)]: Done 5008 tasks      | elapsed:    0.4s
[Parallel(n_jobs=4)]: Done 10008 tasks      | elapsed:    0.9s
[Parallel(n_jobs=4)]: Done 17008 tasks      | elapsed:    2.1s
[Parallel(n_jobs=4)]: Done 24008 tasks      | elapsed:    5.3s
[Parallel(n_jobs=4)]: Done 33008 tasks      | elapsed:   10.2s
[Parallel(n_jobs=4)]: Done 35701 tasks      | elapsed:   11.4s
[Parallel(n_jobs=4)]: Done 36790 tasks      | elapsed:   12.4s
[Parallel(n_jobs=4)]: Done 38780 tasks      | elapsed:   13.2s
[Parallel(n_jobs=4)]: Done 40000 out of 40000 | elapsed:   14.0s finished


In [7]:
with Parallel(n_jobs=n_jobs, batch_size=batch_size, verbose=verbose, backend='threading') as parallel_pool:
    parallel_results = parallel_pool(tasks)

[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done   8 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 5008 tasks      | elapsed:    1.6s
[Parallel(n_jobs=4)]: Done 10008 tasks      | elapsed:    5.0s
[Parallel(n_jobs=4)]: Done 17008 tasks      | elapsed:    7.7s
[Parallel(n_jobs=4)]: Done 24008 tasks      | elapsed:   12.8s
[Parallel(n_jobs=4)]: Done 33008 tasks      | elapsed:   22.9s
[Parallel(n_jobs=4)]: Done 35701 tasks      | elapsed:   25.5s
[Parallel(n_jobs=4)]: Done 37691 tasks      | elapsed:   27.1s
[Parallel(n_jobs=4)]: Done 38780 tasks      | elapsed:   29.2s
[Parallel(n_jobs=4)]: Done 40000 out of 40000 | elapsed:   31.0s finished


The efficiency of each backend strongly depends on the specific architecture of the computer and the tasks to be performed.

Instead of specifying the backend explicitly, one can also nudge `joblib` into using multiprocessing or multithreading by specifying ```prefer='processes'``` or ```prefer='threads'``` when creating the `Parallel` object.
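
As a minimal sketch of this soft hint, the example below runs the same small workload twice, once per preference. The function `square` is a hypothetical toy task. The values accepted by `prefer` in the `joblib` documentation are `'processes'` and `'threads'`.

```python
from joblib import Parallel, delayed

def square(x):
    # Hypothetical toy task
    return x * x

# prefer= is only a hint: joblib picks a suitable backend of that kind
with Parallel(n_jobs=2, prefer='threads') as parallel_pool:
    thread_results = parallel_pool(delayed(square)(i) for i in range(10))

with Parallel(n_jobs=2, prefer='processes') as parallel_pool:
    process_results = parallel_pool(delayed(square)(i) for i in range(10))

# Both preferences must give the same results, in the same order
print(thread_results == process_results)
```

Unlike `backend=...`, which fixes the choice, a preference leaves the final decision to `joblib`, so it can be overridden by the calling context.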

For testing purposes, it can be useful to force sequential code by using ```backend='sequential'```.

In [9]:
with Parallel(n_jobs=n_jobs, batch_size=batch_size, verbose=verbose, backend='sequential') as parallel_pool:
    parallel_results = parallel_pool(tasks)

[Parallel(n_jobs=4)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done   4 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done   7 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  12 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  31 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  40 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  49 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  60 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  71 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  84 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done  97 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 112 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 127 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Done 144 tasks      | elapsed:    0.0s
[Parallel(n_jobs=4)]: Do

In [10]:
# Direct sequential implementation:

import time as tm

ti = tm.time()
sequential_result = [my_task(i) for i in range(40000)]
tf = tm.time()

print(tf-ti)

27.782367944717407
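
Besides comparing run times, it is worth checking that every backend returns the same values. The sketch below repeats the task of this notebook on a deliberately small number of tasks (`n_tasks = 200` is an arbitrary choice for illustration) and compares the results with the closed-form value $n(n-1)/2$ of the sum $0 + 1 + \dots + (n-1)$.

```python
from joblib import Parallel, delayed

def my_task(n):
    # Same task as in the notebook: sum of 0, 1, ..., n-1
    my_sum = 0
    for m in range(n):
        my_sum += m
    return my_sum

n_tasks = 200  # small on purpose, so that the check runs quickly
expected = [n * (n - 1) // 2 for n in range(n_tasks)]

for backend in ('sequential', 'threading', 'loky'):
    results = Parallel(n_jobs=2, backend=backend)(
        delayed(my_task)(i) for i in range(n_tasks)
    )
    # Each backend should reproduce the closed-form values in task order
    print(backend, results == expected)
```

Since `Parallel` returns the results in the order in which the tasks were submitted, a direct list comparison is sufficient here.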
