# Parallel computing

- distributing the workload of a complex process across several workers can speed it up
- many processes can be parallelised
- some subtasks, however, cannot be split up any further
- this puts a fundamental limit on the achievable speed-up
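One common way to quantify this limit is Amdahl's law, which is not named in the notes above and is brought in here as an illustration. The sketch below assumes a fixed fraction of the work can be parallelised perfectly; the function name `amdahl_speedup` and the 90% figure are hypothetical choices for the example.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Upper bound on the speed-up according to Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0..1).
    n_workers: number of workers sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes its full time; only the rest is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Illustrative assumption: 90% of the work can be parallelised.
for n in (2, 4, 16, 1024):
    print(n, "workers:", round(amdahl_speedup(0.9, n), 2))
```

With a 10% serial share the speed-up stays below 10, no matter how many workers are added.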

In [1]:
from joblib import Parallel, delayed
from math import sqrt
import time
import os
# Linux-only: set this process's CPU affinity mask so it may run on CPUs 0-31
os.system("taskset -p 0xFFFFFFFF %d" % os.getpid())
import numpy as np

def f(k):
    return 2*k

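def benchmark(function, function_name):
    """Print the wall-clock time of a single call to `function`."""
    start = time.perf_counter()  # monotonic, high-resolution timer
    function()
    end = time.perf_counter()
    print("{0} seconds for {1}".format((end - start), function_name))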

#### Normal fast code

In [2]:
def list_single_thread(BIG=20000):
    return [f(i) for i in range(BIG)]

#### Parallelized code

In [16]:
def list_multi_thread(n_jobs=2, BIG=20000):  # n_jobs defaults to 2 workers
    return Parallel(n_jobs=n_jobs)(delayed(f)(i for i in np.split(np.arange(BIG), n_jobs)))

#### Do a benchmark

In [12]:
%time _ = list_single_thread()

CPU times: user 3.64 ms, sys: 4.06 ms, total: 7.7 ms
Wall time: 7.55 ms


In [18]:
list_multi_thread()

TypeError: cannot unpack non-iterable function object

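#### But why is it not working?
- the task is not passed to `Parallel` as an iterable of delayed calls: the generator expression sits inside the call to `delayed(f)(...)`, so `Parallel` receives a single `(function, args, kwargs)` tuple and fails when it tries to unpack the first element, the function `f`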
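A minimal sketch of a version that runs, assuming the intent was one task per chunk of the range. The generator expression moves outside `delayed(f)(...)`, so `Parallel` gets an iterable of tasks. The name `list_multi_process_fixed` is hypothetical, and `np.array_split` is swapped in for `np.split` so that `BIG` does not have to be divisible by `n_jobs`.

```python
import numpy as np
from joblib import Parallel, delayed


def f(k):
    # Works on a single number and element-wise on a NumPy array.
    return 2 * k


def list_multi_process_fixed(n_jobs=2, BIG=20000):
    # One chunk of the range per worker.
    chunks = np.array_split(np.arange(BIG), n_jobs)
    # The generator is the argument of Parallel(...)(...), not of delayed(f)(...).
    results = Parallel(n_jobs=n_jobs)(delayed(f)(chunk) for chunk in chunks)
    # Merge the per-chunk arrays back into one flat list.
    return np.concatenate(results).tolist()
```

For a function as cheap as `f`, starting workers and copying data will probably cost more than the computation itself, so this version is not expected to beat the plain list comprehension.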

In [6]:
def parallel_dot(A,B,n_jobs=2):
    """
    Computes the matrix product of A and B using several CPUs.
    This works only when the number of rows of A
    is evenly divisible by n_jobs (a requirement of np.split).
    """
    parallelizer = Parallel(n_jobs=n_jobs)
    # this iterator returns the functions to execute for each task
    tasks_iterator = (delayed(np.dot)(A_block,B) 
                      for A_block in np.split(A,n_jobs))
    result = parallelizer(tasks_iterator)
    # merging the output of the jobs
    return np.vstack(result)

A = np.random.randint(0,high=10,size=(1000,1000))
B = np.random.randint(0,high=10,size=(1000,1000))

In [7]:
%time _ = np.dot(A,B)

CPU times: user 1.92 s, sys: 0 ns, total: 1.92 s
Wall time: 1.92 s


In [8]:
%time _ = parallel_dot(A,B,n_jobs=4)

CPU times: user 34.8 ms, sys: 18.9 ms, total: 53.7 ms
Wall time: 1.04 s
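The two wall times above can be turned into a speed-up and a per-worker efficiency. The sketch below only does the arithmetic on the measured values. Wall time is the figure to compare here, since the CPU times reported for the parallel run appear to cover only the parent process.

```python
serial_wall = 1.92    # seconds, np.dot(A, B)
parallel_wall = 1.04  # seconds, parallel_dot(A, B, n_jobs=4)
n_jobs = 4

speedup = serial_wall / parallel_wall  # how many times faster the parallel run was
efficiency = speedup / n_jobs          # share of the ideal n_jobs-fold speed-up

print(f"speed-up: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

Four workers give less than a twofold speed-up in this run, which fits the point that overhead and non-parallel parts limit the gain.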
