# Python Multiprocessing
## Martin Robinson
## Oct 2019

# Python multiprocessing

- In CPython, the **Global Interpreter Lock (GIL)** allows only one thread to execute Python bytecode at a time, so threads within one interpreter cannot run Python code in parallel
- However, you can run multiple *processes* in parallel
    - Both processes and threads are independent sequences of executing instructions, but threads are more lightweight and share the address space of the process that created them (i.e. shared memory), whereas each process has its own address space
- The Python **multiprocessing** module provides an easy facility for **process creation, synchronisation and communication**

# Process creation

- Create a `multiprocessing.Process` object to spawn a new process

In [None]:
import multiprocessing as mp

def f(name):
    print('hello', name)

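# the guard stops child processes re-running this code when the
# module is imported by them (required with the 'spawn' start method)
if __name__ == '__main__':
    p = mp.Process(target=f, args=('bob',))
    p.start()
    p.join()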

# Interprocess communication

- Can exchange objects between processes using `Queue` and `Pipe`
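    - `Queue` is a one-way first-in-first-out (FIFO) queue of objects that is safe to use from several threads and processes at once (implemented with a `Pipe` and a few locks/semaphores)
    - `Pipe` returns a pair of connection objects joined by a pipe, which is two-way by default. Data may become corrupted if two processes read from or write to the *same end* of the pipe at the same time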


In [None]:
def f(queue, conn):
    queue.put('hello from queue!')
    conn.send('hello from pipe!')
    conn.close()

if __name__ == '__main__':
    queue = mp.Queue()
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=f, args=(queue,child_conn))
    p.start()
    print(queue.get())
    print(parent_conn.recv())
    p.join()
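
A common way to use queues is to pair a task queue with a result queue. The following is a minimal sketch of that pattern, not a definitive implementation: the `STOP` sentinel and the functions `worker` and `square_via_queues` are hypothetical names chosen for this example, and it assumes `None` is never a real task.

```python
import multiprocessing as mp

STOP = None  # sentinel (an assumption of this sketch) telling the worker to exit


def worker(tasks, results):
    """Square numbers taken from `tasks` until the STOP sentinel arrives."""
    for x in iter(tasks.get, STOP):
        results.put((x, x * x))


def square_via_queues(values):
    """Send `values` (a list) to one worker process and collect the squares."""
    tasks, results = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(tasks, results))
    p.start()
    for v in values:
        tasks.put(v)
    tasks.put(STOP)
    # Drain the results before join(): a process that has put items on a
    # queue does not exit until those items have been flushed to the pipe.
    out = dict(results.get() for _ in values)
    p.join()
    return out


if __name__ == '__main__':
    print(square_via_queues([1, 2, 3]))   # {1: 1, 2: 4, 3: 9}
```

With several workers, one sentinel per worker would be needed, and results could arrive in any order.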

# Aside - Python pickle

- Objects transferred between processes must be serialised (converted to a stream of bytes)
- This is achieved via the `pickle` module
- Any object that is *pickleable* can be transferred between processes. This includes the built-in Python containers (lists, tuples, dicts, sets), provided their contents are pickleable, as well as NumPy arrays
- `pickle` is also very useful for storing arbitrary objects to files

In [None]:
import pickle

favorite_color = { "lion": "yellow", "kitty": "red" }

# note: pickle is a binary serialisation format, so open files in binary mode
with open("save.p", "wb") as file:
    pickle.dump(favorite_color, file)

with open("save.p", "rb") as file:
    saved_favorite_color = pickle.load(file)
print(saved_favorite_color)
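
It can be useful to check whether an object is pickleable before handing it to another process. The helper below, `is_picklable`, is a hypothetical name for this sketch rather than part of the `pickle` module. It simply attempts `pickle.dumps` and reports whether that succeeded. The example also shows an in-memory round trip with `dumps`/`loads`.

```python
import pickle


def square(x):
    return x * x


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds (illustrative helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


data = {'name': 'bob', 'values': [1, 2, 3], 'nested': (4.5, None)}

# round trip through bytes: the copy is equal to, but distinct from, the original
restored = pickle.loads(pickle.dumps(data))
print(restored == data, restored is data)   # True False

# module-level functions are pickled by reference (module name + function name)
print(is_picklable(square))

# a lambda has no importable name, so it cannot be pickled
print(is_picklable(lambda x: x * x))        # False
```

This is why the functions passed to `Process` and `Pool` in these notes are all defined at module level.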

# Process synchronisation

- `multiprocessing` has a number of synchronisation mechanisms: `Lock`, `RLock`, `Semaphore`, `Event`, `Barrier`
- All are useful for concurrent programming, but they are beyond the scope of this lecture, so we will just give a simple example of a lock:



In [None]:
def f(l, i):
    l.acquire()
    try:
        print('hello world', i)
    finally:
        l.release()

if __name__ == '__main__':
    lock = mp.Lock()

    for num in range(10):
        mp.Process(target=f, args=(lock, num)).start()
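
A lock matters most when processes update shared state. The sketch below goes slightly beyond the example above and assumes a shared integer created with `mp.Value` (not used elsewhere in these notes). The function names `add_many` and `count_in_parallel` are hypothetical. Each increment is a read-modify-write, so it is done while holding the lock.

```python
import multiprocessing as mp


def add_many(counter, lock, n):
    """Increment the shared counter n times, holding the lock for each update."""
    for _ in range(n):
        with lock:               # equivalent to acquire() ... release()
            counter.value += 1   # read-modify-write, so it must be protected


def count_in_parallel(n_procs=4, n=1000):
    """Start n_procs processes that each add n to one shared counter."""
    # lock=False gives a raw shared integer with no built-in lock,
    # so the explicit Lock below is the only protection
    counter = mp.Value('i', 0, lock=False)
    lock = mp.Lock()
    procs = [mp.Process(target=add_many, args=(counter, lock, n))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == '__main__':
    print(count_in_parallel())   # 4000
```

Without the `with lock:` line, two processes may read the same old value and one of the updates can be lost.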

# Process pools

- We mainly want to focus on data parallelism, that is, evaluating the same function across many data inputs
- The `multiprocessing` module provides the ability to create *pools* of processes and use these to parallelise function evaluations

In [None]:
def f(x):
    return x*x

if __name__ == '__main__':
    with mp.Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
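
For a pool to pay off, each function call should do enough work to outweigh the cost of pickling its input and output. The following sketch uses a deliberately CPU-bound function. The names `is_prime` and `count_primes` are invented for this example, while `chunksize` is the optional third argument of `Pool.map` that sends inputs to the workers in batches.

```python
import multiprocessing as mp


def is_prime(n):
    """Trial division: deliberately CPU-bound work for each input."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


def count_primes(limit, processes=None, chunksize=1000):
    """Count the primes below `limit` by mapping is_prime over range(limit)."""
    # processes=None lets the pool choose a worker count based on the CPU count
    with mp.Pool(processes) as pool:
        flags = pool.map(is_prime, range(limit), chunksize=chunksize)
    return sum(flags)


if __name__ == '__main__':
    print(count_primes(100_000))   # 9592
```

A larger `chunksize` reduces communication overhead, at the cost of coarser load balancing between workers.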

# Process pools

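- `imap` is a lazy version of `map`: it returns an iterator that yields results in input order as they become available, rather than building the whole list first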

In [None]:
def f(x):
    return x*x

if __name__ == '__main__':
    with mp.Pool(5) as p:
        # imap returns an iterator
        print(p.imap(f, [1, 2, 3]))
        
        # loop is run as the results become available
        for i in p.imap(f, [1, 2, 3]):
            print(i)
            
        # result might be in a different order
        for i in p.imap_unordered(f, [1, 2, 3]):
            print(i)
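
The difference in ordering is easier to see when the tasks take different amounts of time. In this sketch, `slow_square` and `compare_orders` are hypothetical names, and the sleep times are chosen so that later inputs tend to finish first.

```python
import multiprocessing as mp
import time


def slow_square(x):
    """Sleep longer for smaller x, so later inputs tend to finish first."""
    time.sleep(0.1 * max(0, 3 - x))
    return x * x


def compare_orders(values=(0, 1, 2)):
    """Return the results of imap and imap_unordered for the same inputs."""
    with mp.Pool(len(values)) as pool:
        ordered = list(pool.imap(slow_square, values))
        unordered = list(pool.imap_unordered(slow_square, values))
    return ordered, unordered


if __name__ == '__main__':
    ordered, unordered = compare_orders()
    print(ordered)     # always [0, 1, 4]: imap preserves the input order
    print(unordered)   # same values in completion order, often [4, 1, 0]
```

`imap_unordered` is a good fit when each result can be processed independently, since no result has to wait for a slower earlier one.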

# Process pools

- `starmap` is a version of `map` that allows you to use functions expecting multiple inputs

In [None]:
def f(x, y):
    return x*y

if __name__ == '__main__':
    with mp.Pool(5) as p:
        print(p.starmap(f, [(1,1), (2,2), (3,3)]))
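
The argument tuples for `starmap` are often built with `zip`. When only one argument varies, `functools.partial` with plain `map` is an alternative. The sketch below shows both. The names `power` and `demo` are made up for this example.

```python
import multiprocessing as mp
from functools import partial


def power(base, exponent):
    return base ** exponent


def demo():
    """Return (pairwise powers via starmap, squares via map + partial)."""
    bases = [1, 2, 3]
    exponents = [2, 3, 4]
    with mp.Pool(2) as pool:
        # zip builds the (base, exponent) tuples that starmap unpacks
        pairwise = pool.starmap(power, zip(bases, exponents))
        # partial fixes exponent=2, so plain map is enough
        squares = pool.map(partial(power, exponent=2), bases)
    return pairwise, squares


if __name__ == '__main__':
    print(demo())   # ([1, 8, 81], [1, 4, 9])
```

A `partial` of a module-level function is pickleable, so it can be sent to the worker processes like any other function.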

# Summary

- Parallel programming in Python is complicated by the GIL
- **but**, as long as your objects are pickleable, the `multiprocessing` library provides an easy way to do multi-process parallelism, especially with `Pool`
- There is also nothing to stop you using OpenMP within C/C++, then wrapping this to use within Python