# Parallelization

Python uses a single processor by default, but there are many ways to make use of multiple cores. Here we show some common approaches using the multiprocessing module, with examples of its Process and Pool classes.

We also discuss the overhead associated with parallelization. Note that this discussion makes no distinction between concurrency and parallelism.

## Multiprocessing Process

The multiprocessing module has a Process class that lets us spawn individual processes (not threads), each of which may do similar or different work.

In [15]:
import os
from multiprocessing import Process

def squareInput( inputVal ):
    # Runs in a child process; the pid shows which process did the work
    output = inputVal ** 2
    pid = os.getpid()
    print( '{0} squared to {1} by pid: {2}'.format( inputVal, output, pid ) )

if __name__ == '__main__':   # guard needed for the 'spawn' start method (default on Windows and macOS)
    vals = [1, 2, 3, 4, 5]
    procsUsed = []

    # Start one process per value
    for valHere in vals:
        procThis = Process( target = squareInput, args = ( valHere, ) )
        procsUsed.append( procThis )
        procThis.start( )

    # Wait for every process to finish
    for procThis in procsUsed:
        procThis.join( )

1 squared to 1 by pid: 92505
2 squared to 4 by pid: 92506
3 squared to 9 by pid: 92507
4 squared to 16 by pid: 92508
5 squared to 25 by pid: 92509
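A Process target cannot hand a return value back to the parent directly, and the output above arrives in whatever order the children happen to finish. A minimal sketch, assuming we want the results themselves rather than printed lines, is to pass each child a multiprocessing.Queue; the helper name square_to_queue is hypothetical.

```python
import os
from multiprocessing import Process, Queue

def square_to_queue(value, queue):
    # Runs in a child process: put (input, result, pid) on the shared queue
    queue.put((value, value ** 2, os.getpid()))

if __name__ == '__main__':
    vals = [1, 2, 3, 4, 5]
    results = Queue()
    procs = [Process(target=square_to_queue, args=(v, results)) for v in vals]
    for p in procs:
        p.start()

    # Drain the queue before joining so no child is left blocked on the pipe
    collected = [results.get() for _ in vals]
    for p in procs:
        p.join()

    # Children finish in no particular order, so sort by input value
    for value, squared, pid in sorted(collected):
        print(value, 'squared to', squared, 'by pid:', pid)
```

Reading every result before calling join() follows the multiprocessing guideline that a process which has put items on a queue may not exit until those items are consumed.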


## Multiprocessing Pool

The multiprocessing module's Pool class lets you map a function over a sequence of inputs, distributing the calls across a pool of worker processes.

Be sure to close the pool of processes when you are done with it; otherwise you may get errors about too many open files.

In [16]:
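from multiprocessing import Pool

def fxnSer( value ):
    # Square a single value; Pool.map calls this once per input
    return value * value

procs = Pool( 2 )                          # two worker processes
outs = procs.map( fxnSer, range( 10 ) )    # results come back in input order
print( outs )
procs.close( )                             # no more tasks will be submitted
procs.join( )                              # wait for the workers to exit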

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
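An alternative to closing the pool by hand is to use it as a context manager. One caveat, worth knowing before relying on it: leaving the with block calls terminate() rather than close(), which is safe here only because map() has already returned all of its results. The function name square is a hypothetical stand-in for fxnSer.

```python
from multiprocessing import Pool

def square(value):
    # Work done by a worker process for one input
    return value * value

if __name__ == '__main__':
    # Exiting the with block terminates the workers, so collect results inside it
    with Pool(2) as procs:
        outs = procs.map(square, range(10))
    print(outs)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```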


In [17]:
%%timeit
procs = Pool( 1 )
outs = procs.map(fxnSer, range(1000))
procs.close()

14.6 ms ± 2.84 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [18]:
%%timeit
procs = Pool( 2 )
outs = procs.map( fxnSer, range( 1000 ) )
procs.close( )

16.9 ms ± 460 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


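There is overhead associated with creating the parallel processes and passing data to and from them. In the timings above, two processes were slightly slower on average than one, because squaring a number costs far less than that overhead. The work done in the function needs to be compute-intensive enough to outweigh it.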
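To see the other side of this trade-off, the sketch below gives each task enough work to make the start-up cost worthwhile. The function names busy_work and timed_map and the task size of 1_000_000 loop iterations are arbitrary choices for illustration; the actual timings depend entirely on your hardware and on how many cores are free.

```python
import time
from multiprocessing import Pool

def busy_work(n):
    # Deliberately CPU-heavy: sum of squares computed in a pure-Python loop
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_map(n_procs, tasks):
    # Time pool start-up, the map, and shutdown, much like the %%timeit cells above
    start = time.perf_counter()
    with Pool(n_procs) as pool:
        pool.map(busy_work, tasks)
    return time.perf_counter() - start

if __name__ == '__main__':
    tasks = [1_000_000] * 8          # eight equally sized, compute-heavy tasks
    for n_procs in (1, 2, 4):
        print(n_procs, 'process(es):', round(timed_map(n_procs, tasks), 2), 's')
```

With several idle cores, the multi-process runs should now finish sooner than the single-process run, unlike the trivial squaring example.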

Additionally, not all compute resources provide multiple CPUs, so know the limitations of the Jupyter environment you are running in. (Also, just because a CPU is visible doesn't mean you can make full use of it while other tasks and users share the system.)

In [19]:
import multiprocessing as mp
print("#-# How many cpus do I have access to?")
print( mp.cpu_count( ) )

8
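Note that mp.cpu_count() reports the CPUs in the whole machine, which on a shared cluster node can be more than your job is allowed to run on. A minimal sketch, assuming a Linux system where os.sched_getaffinity is available, counts only the CPUs this process may use; usable_cpus is a hypothetical helper name. (Python 3.13 and later also provide os.process_cpu_count() for this purpose.)

```python
import os
import multiprocessing as mp

def usable_cpus():
    # os.sched_getaffinity lists the CPUs this process is allowed to run on;
    # it exists on Linux but not on macOS or Windows
    if hasattr(os, 'sched_getaffinity'):
        return len(os.sched_getaffinity(0))
    # Fallback: total logical CPUs (os.cpu_count() may return None)
    return os.cpu_count() or 1

print('CPUs in the machine:', mp.cpu_count())
print('CPUs usable by this process:', usable_cpus())
```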


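We can also use a pool to process the lines of a file in parallel. Here, we read in the complete works of Shakespeare (available: https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt). Note that Pool.map first turns the file object into a list of lines in the parent process, so the reading itself happens serially; the lines are then handed to the worker processes in chunks.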

In [20]:
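from multiprocessing import Pool

def readLine( line ):
    # Each worker simply returns the line it was given
    return "%s" % line

procs = 4
pool = Pool( procs )
with open( 't8.shakespeare.txt' ) as skspFile:
    # The third argument of Pool.map is chunksize: how many lines go to a worker at a time
    fullText = pool.map( readLine, skspFile, chunksize = procs )

pool.close()
pool.join()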
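In the cell above, every line passes through the parent process before any worker sees it. A sketch of reading separate sections of a file in the workers themselves is below, assuming a plain-text file with newline-terminated lines. The helper names section_bounds and count_lines are hypothetical, and counting lines stands in for whatever per-section work you actually need.

```python
import os
from multiprocessing import Pool

def section_bounds(path, n_sections):
    # Split the file into n_sections (start, end) byte ranges; each interior
    # boundary is moved forward to just past the next newline so no line is cut
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, 'rb') as f:
        for i in range(1, n_sections):
            f.seek(size * i // n_sections)
            f.readline()                      # skip to the end of the current line
            bounds.append(max(f.tell(), bounds[-1]))
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))

def count_lines(task):
    # Runs in a worker: open the file independently and read only its own section
    path, (start, end) = task
    with open(path, 'rb') as f:
        f.seek(start)
        data = f.read(end - start)
    return len(data.splitlines())

if __name__ == '__main__':
    path = 't8.shakespeare.txt'               # assumed to be in the working directory
    if os.path.exists(path):
        procs = 4
        tasks = [(path, byte_range) for byte_range in section_bounds(path, procs)]
        with Pool(procs) as pool:
            counts = pool.map(count_lines, tasks)
        print('lines per section:', counts, 'total:', sum(counts))
    else:
        print(path, 'not found; download it from the link above first')
```

Because each worker opens the file itself, only small (path, byte range) tuples travel between processes, and each worker sends back a single number rather than the text.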

In [21]:
%%timeit
procs = 1
pool = Pool( procs )
with open( 't8.shakespeare.txt' ) as skspFile:
    fullText = pool.map( readLine, skspFile, chunksize = procs )

pool.close()

6.99 s ± 681 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [22]:
%%timeit
procs = 2
pool = Pool( procs )
with open( 't8.shakespeare.txt' ) as skspFile:
    fullText = pool.map( readLine, skspFile, chunksize = procs )

pool.close()

3.32 s ± 222 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [23]:
%%timeit
procs = 4
pool = Pool( procs )
with open( 't8.shakespeare.txt' ) as skspFile:
    fullText = pool.map( readLine, skspFile, chunksize = procs )

pool.close()

1.82 s ± 21.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


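You'll note that the time isn't simply cut in half each time the number of processes doubles. The overhead associated with parallelization (creating the processes, sending lines to the workers, and fitting the results back together) keeps the speedup from being perfect. These cells also change a second variable: procs is passed as the third argument of Pool.map, which is chunksize, so doubling the processes also doubles the number of lines sent to a worker at a time. Larger chunks mean fewer round trips between processes, which may explain why going from 1 to 2 processes here cut the time by more than half. Times can also fluctuate when the compute resources are shared with other tasks or users.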
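To separate the two effects, one can change a single knob at a time: vary chunksize with one process, then vary the process count with chunksize held fixed. The sketch below uses synthetic lines instead of the Shakespeare file; echo_line and time_map are hypothetical names, and the item count and chunk sizes are arbitrary.

```python
import time
from multiprocessing import Pool

def echo_line(line):
    # Same trivial work as readLine: return the input unchanged
    return line

def time_map(n_procs, items, chunksize):
    # Time only the map call, with the pool already running
    with Pool(n_procs) as pool:
        start = time.perf_counter()
        pool.map(echo_line, items, chunksize=chunksize)
        return time.perf_counter() - start

if __name__ == '__main__':
    items = ['to be or not to be\n'] * 20_000   # synthetic stand-in for the file's lines
    # One knob at a time: chunk size with one process, then processes at a fixed chunk size
    for n_procs, chunksize in [(1, 1), (1, 100), (4, 100)]:
        elapsed = time_map(n_procs, items, chunksize)
        print(f'{n_procs} process(es), chunksize {chunksize}: {elapsed:.2f} s')
```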

## Other Parallelization Modules

There are many different parallelization modules for Python. Here are some of the most commonly used:
* mpi4py: Use MPI commands within Python; must link to an existing MPI library
* PyMP: OpenMP-like parallel loops for Python
* VecPy: SIMD extensions for vectorization (Python 3 only)
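The standard library also ships concurrent.futures, whose ProcessPoolExecutor offers a pool interface similar to multiprocessing.Pool. A minimal sketch, with a hypothetical square function:

```python
from concurrent.futures import ProcessPoolExecutor

def square(value):
    # Work done in a worker process
    return value * value

if __name__ == '__main__':
    # The executor waits for its workers and shuts down when the with block exits
    with ProcessPoolExecutor(max_workers=2) as executor:
        outs = list(executor.map(square, range(10)))
    print(outs)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```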

Find more information at: https://wiki.python.org/moin/ParallelProcessing

# Check yourself

In [24]:
# Variables used in this example
import numpy as np
largeArray = np.arange(50000, dtype=np.int64)   # 64-bit so cubes up to 49999**3 do not overflow

Create a function that cubes each number given to it. Use Pool to run it on largeArray with 1, 2, ..., topNum processes, where topNum is the number of cores you have access to.

In [25]:
# Try it here
