# Parallel computing with Python

https://docs.python.org/2/library/multiprocessing.html

We have seen how one can use OpenMP and shared-memory programming to parallelize Python code (well, Cython).

## Multiprocessing library

The **multiprocessing** library lets a program use multiple processors on a given machine. It provides a Pool object that parallelizes the execution of a function across multiple input values by distributing the input data across processes (**data parallelism**).

### Pool class

The following basic example demonstrates data parallelism using the Pool class:

In [2]:
import multiprocessing as mp

def f(x):
    return x*x

if __name__ == '__main__':
    # Number of child processes to spawn; use at most the number of processors available
    nprocs = mp.cpu_count()
    p = mp.Pool(nprocs)
    print(p.map(f, [1, 2, 3, 4]))
    print(mp.cpu_count())
    # Shut the pool down once the work is done
    p.close()
    p.join()

[1, 4, 9, 16]
4
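
To make the distribution of inputs across processes visible, the sketch below tags each result with the ID of the worker process that computed it. The helper names `square_with_pid` and `parallel_squares` are hypothetical and chosen only for this illustration; the pool size of 2 is likewise an arbitrary assumption.

```python
import multiprocessing as mp
import os

def square_with_pid(x):
    """Return the square of x together with the PID of the worker that computed it."""
    return x * x, os.getpid()

def parallel_squares(values, nprocs=2):
    """Square each value in a pool of nprocs worker processes (hypothetical helper)."""
    pool = mp.Pool(nprocs)
    try:
        # map() returns results in input order, even though the work is spread over processes
        return pool.map(square_with_pid, values)
    finally:
        pool.close()
        pool.join()

if __name__ == '__main__':
    for square, pid in parallel_squares(range(8)):
        print("square=%d computed by process %d" % (square, pid))
```

The PIDs printed will differ from run to run, and how the inputs are split between workers is decided by the pool, so the mapping of inputs to processes should not be relied upon.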


### Process and Queue classes

In `multiprocessing`, individual processes are spawned by creating a `Process` object or by sub-classing it.
The following example uses the `multiprocessing.Queue` class, which returns a queue shared between processes, implemented using a pipe and a few locks. When a process first puts an item on the queue, a feeder thread is started which transfers objects from a buffer into the pipe.

In [6]:
from multiprocessing import Process, Queue

class Worker(Process):

    def __init__(self, queue, idx, data):
        super(Worker, self).__init__()
        self.queue = queue
        self.idx = idx
        self.data = data

    def square(self):
        # Square the data; a list comprehension behaves the same in Python 2 and 3
        self.data = [x*x for x in self.data]
        return "Process idx=%s is called '%s'" % (self.idx, self.name)

    def run(self):
        self.queue.put(self.square())

# Create a list to hold running Worker objects
worker_processes = list()

if __name__ == "__main__":

    data = [1, 2, 3, 4]
    q = Queue()
    for i in range(5):
        p = Worker(queue=q, idx=i, data=data)
        worker_processes.append(p)
        p.start()
    for proc in worker_processes:
        # Read a result before joining so that no worker stays blocked on a full pipe
        print("RESULT: %s" % q.get())
        proc.join()

RESULT: Process idx=0 is called 'Worker-19'
RESULT: Process idx=1 is called 'Worker-20'
RESULT: Process idx=2 is called 'Worker-21'
RESULT: Process idx=3 is called 'Worker-22'
RESULT: Process idx=4 is called 'Worker-23'
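
The example above sub-classes `Process`. The other route, creating a `Process` object directly, can be sketched as follows, assuming the work fits in a plain function passed as `target`. The names `square_all` and `run_workers` are hypothetical helpers introduced for this illustration, and here the squared data itself is sent back through the queue.

```python
from multiprocessing import Process, Queue

def square_all(queue, idx, data):
    """Square every element of data and put (idx, squares) on the shared queue."""
    queue.put((idx, [x * x for x in data]))

def run_workers(data, nworkers=3):
    """Start nworkers processes and collect one result from each (hypothetical helper)."""
    q = Queue()
    procs = [Process(target=square_all, args=(q, i, data)) for i in range(nworkers)]
    for p in procs:
        p.start()
    # Drain the queue before joining: a child cannot exit until its
    # feeder thread has flushed every queued item into the pipe
    results = [q.get() for _ in procs]
    for p in procs:
        p.join()
    # Results arrive in completion order, so sort them by worker index
    return sorted(results)

if __name__ == "__main__":
    for idx, squares in run_workers([1, 2, 3, 4]):
        print("worker %d -> %s" % (idx, squares))
```

Tagging each result with the worker index is a design choice for this sketch: the queue delivers items in the order they arrive, which need not match the order in which the processes were started.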


## Joblib library

https://pythonhosted.org/joblib/

Joblib is a set of tools to provide data parallelism and pipelining in Python. 

Joblib offers:
   1. disk-caching of output values and lazy re-evaluation
   2. simple parallel computing
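
The disk-caching feature can be sketched with `joblib.Memory`, assuming joblib is installed. The temporary cache directory, the function `slow_square`, and the `calls` list used to count real executions are all assumptions made for this illustration.

```python
import tempfile
from joblib import Memory

# Cache location; a throwaway temporary directory is an assumption for this sketch
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, verbose=0)

calls = []  # records every time the function body really executes

@memory.cache
def slow_square(x):
    """Stand-in for an expensive computation (hypothetical function)."""
    calls.append(x)
    return x * x

first = slow_square(3)   # computed, then written to the cache directory
second = slow_square(3)  # same argument: loaded from the cache instead of recomputed
print(first, second, len(calls))
```

Calling the function with a different argument would run the body again, since the cache is keyed on the input values.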

Let us go back to our dot product example and assume that we need to compute it for a list of vectors.

In [9]:
%%time
import numpy as np
from cython_dot2 import dot_product

Nvectors = 10
results = list()

# Compute the dot products serially; %%time measures the whole cell
for _ in range(Nvectors):
    vec1 = np.arange(100000000, dtype=float)
    vec2 = np.arange(100000000, dtype=float)
    results.append(dot_product(vec1, vec2))

ImportError: No module named cython_dot2

In [10]:
from joblib import Parallel, delayed  
import multiprocessing
import numpy as np
from cython_dot2 import dot_product

num_cores = 2 #multiprocessing.cpu_count()-1
print("Running on %d CPU cores" % num_cores)

Nvectors = 10
results = list()

def getProduct():
    vec1 = np.arange(100000000,dtype=float)
    vec2 = np.arange(100000000,dtype=float)
    return dot_product(vec1,vec2)

# Time the parallel computation itself rather than the lookup of `results`
%time results = Parallel(n_jobs=num_cores)(delayed(getProduct)() for i in range(Nvectors))

ImportError: No module named joblib