# Lab - Parallel Processing in Python

Parallel processing is a mode of operation in which a task is executed simultaneously on multiple processors of the same computer. It is meant to reduce the overall processing time.

However, communicating between processes usually carries some overhead, which can increase the overall time taken for small tasks instead of decreasing it.

In Python, the `multiprocessing` module is used to run independent parallel processes by using subprocesses (instead of threads). It allows you to leverage multiple processors on a machine (both Windows and Unix), and each process runs in its own, completely separate memory space.
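To make the "separate memory" point concrete, here is a minimal sketch. The names `counter` and `bump` are illustrative and not part of the lab. A worker process increments a module-level list, while the parent's copy stays untouched because the worker operates on its own copy. The `if __name__ == "__main__":` guard is included so the script also works on platforms that start workers by re-importing the main module.

```python
import multiprocessing as mp

counter = [0]  # module-level state; every process gets its own copy


def bump(_):
    """Increment this process's copy of `counter` and return the new value."""
    counter[0] += 1
    return counter[0]


if __name__ == "__main__":
    with mp.Pool(processes=1) as pool:
        # bump() runs inside the worker process, not in the parent
        child_view = pool.apply(bump, args=(None,))

    print("value seen by the worker:", child_view)
    print("value seen by the parent:", counter[0])
```

The worker reports an incremented value while the parent's `counter` is unchanged, which is why results have to be returned (or sent through a queue) rather than written to shared variables.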

By the end of this lab you will know:

* How to structure the code and use the syntax that enables parallel processing with `multiprocessing`.
* How to implement synchronous and asynchronous parallel processing.
* How to parallelize a `Pandas` DataFrame.
* How to solve 3 different use cases with the `multiprocessing.Pool()` interface.

Import `multiprocessing`

In [1]:
import multiprocessing as mp

Determine the maximum number of parallel processes you can run:

In [2]:
print("Number of processors: ", mp.cpu_count())

Number of processors:  4


## What is Synchronous and Asynchronous execution?

In parallel processing, there are two types of execution: Synchronous and Asynchronous.

A _synchronous_ execution is one in which the processes are completed in the same order in which they were started. This is achieved by locking the main program until the respective processes are finished.

An _asynchronous_ execution, on the other hand, doesn’t involve locking. As a result, the order of results can get mixed up, but the work usually gets done quicker.

There are 2 main objects in multiprocessing to implement parallel execution of a function: The `Pool` Class and the `Process` Class.

### `Pool` Class

* Synchronous execution
    * `Pool.map()` and `Pool.starmap()`
    * `Pool.apply()`
    
* Asynchronous execution
    * `Pool.map_async()` and `Pool.starmap_async()`
    * `Pool.apply_async()`
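As a rough sketch of the difference between the two groups, the snippet below calls `map()` and `map_async()` on the same toy function (`square` is a hypothetical helper, not part of the lab). `map()` blocks until the whole list is ready, whereas `map_async()` returns a handle immediately and the result is fetched later with `.get()`.

```python
import multiprocessing as mp


def square(x):
    """Toy workload used only for this illustration."""
    return x * x


if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        # Synchronous: the main program waits here until every result is ready.
        sync_result = pool.map(square, range(5))

        # Asynchronous: returns a result handle right away.
        handle = pool.map_async(square, range(5))
        # ... the main program is free to do other work at this point ...
        async_result = handle.get()  # now wait for the results

    print(sync_result)
    print(async_result)
```

Both calls produce the same list here; what changes is when the main program is allowed to continue.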

### `Process` Class

Let’s take up a typical problem and implement parallelization using the above techniques. In this lab, we stick to the `Pool` class, because it is most convenient to use and serves most common practical applications.

More info on these classes [here](https://docs.python.org/3/library/multiprocessing.html).
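Since the rest of the lab uses only `Pool`, here is a brief sketch of what the `Process` route can look like, for comparison. The helper names `square` and `run_with_processes` are illustrative. It assumes one process per input value and distinct input values, and it passes results back through a `multiprocessing.Queue`, because a child process cannot simply return a value to its parent.

```python
import multiprocessing as mp


def square(x, queue):
    """Compute x*x and send (x, result) back to the parent through `queue`."""
    queue.put((x, x * x))


def run_with_processes(values):
    """Start one Process per value and collect the results in input order."""
    queue = mp.Queue()
    procs = [mp.Process(target=square, args=(v, queue)) for v in values]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on an unread item.
    by_input = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return [by_input[v] for v in values]


if __name__ == "__main__":
    print(run_with_processes([1, 2, 3]))
```

Compared with `Pool`, this requires managing start, result collection, and join by hand, which is part of why `Pool` is the more convenient choice for the examples that follow.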

## Example

Given a 2D matrix (or list of lists), count how many numbers in each row fall within a given range. We will work on the list prepared below.

In [3]:
import numpy as np
import time

Generate the data. Create a matrix of 1 million rows and 5 columns filled with random integers. 


In [19]:
np.random.seed(100)  # seed NumPy's global generator so the data is reproducible
arr = np.random.randint(0, 10, size=[1000000, 5])
data = arr.tolist()


In [5]:
len(data)

1000000

In [6]:
data[:10]

[[4, 6, 3, 7, 6],
 [4, 5, 5, 5, 4],
 [4, 2, 5, 3, 9],
 [8, 6, 3, 0, 1],
 [6, 7, 7, 7, 1],
 [3, 2, 9, 3, 6],
 [9, 3, 5, 1, 3],
 [4, 9, 3, 0, 6],
 [5, 2, 4, 7, 4],
 [8, 7, 6, 1, 9]]

### Implement solution without parallelization

Let’s see how long it takes to compute it without parallelization. For this, we call the function `howmany_within_range()` on every row; it checks how many numbers lie within the range and returns the count.

In [7]:
def howmany_within_range(row, minimum, maximum):
    """Returns how many numbers lie within `maximum` and `minimum` in a given `row`"""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count

In [14]:
import time

start = time.time()
results = []
for row in data:
    results.append(howmany_within_range(row, minimum=4, maximum=8))

end = time.time()
print(end - start)

4.150521278381348


In [9]:
len(results)

1000000

In [10]:
print(results[:10])

[4, 5, 2, 2, 4, 1, 1, 2, 4, 3]


## How to parallelize any function?

The general way to parallelize any operation is to take a particular function that should be run multiple times and make it run in parallel using different processors.

To do this, you initialize a _Pool_ with n processors (or cores) and pass the function you want to parallelize to one of the _Pool_'s parallelization methods.

`multiprocessing.Pool()` provides the `apply()`, `map()` and `starmap()` methods to run any function in the pool's worker processes.

Nice! So what’s the difference between `apply()` and `map()`?

Both `apply()` and `map()` take the function to be parallelized as the main argument. The difference is that `apply()` takes an _args_ argument holding the parameters to pass to the _function-to-be-parallelized_, whereas `map()` takes only one _iterable_ and calls the function once per element.

So `map()` is better suited to simple element-wise operations over an iterable, and it does the job faster.

We will get to `starmap()` once we see how to parallelize `howmany_within_range()` function with `apply()` and `map()`.

### Parallelizing using `Pool.apply()`

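Let’s parallelize the `howmany_within_range()` function using `multiprocessing.Pool()`. Note that `Pool.apply()` blocks until each call returns, so the loop below hands rows to the pool one at a time; the timings show it running much slower than the plain loop above.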

In [11]:
# Parallelizing using Pool.apply()

import multiprocessing as mp

start = time.time()

# Step 1: Init multiprocessing.Pool()
pool = mp.Pool(mp.cpu_count())
t1 = time.time()

# Step 2: `pool.apply` the `howmany_within_range()`
results = [pool.apply(howmany_within_range, args=(row, 4, 8)) for row in data]
t2 = time.time()

# Step 3: Don't forget to close
pool.close()    
end = time.time()

print(end - start)
print(t1 - start)
print(t2 - t1)

print(results[:10])

135.67092084884644
0.05547595024108887
135.61527705192566
[4, 5, 2, 2, 4, 1, 1, 2, 4, 3]


### Parallelizing using `Pool.map()`

`Pool.map()` accepts only one iterable as argument. So as a workaround, I modify the `howmany_within_range` function by giving the `minimum` and `maximum` parameters default values, creating a new `howmany_within_range_rowonly()` function that needs only a row as input, so `map()` can be fed an _iterable list_ of rows. This is not a nice use case of `map()`, but it clearly shows how it differs from `apply()`.

In [15]:
# Parallelizing using Pool.map()
import multiprocessing as mp

# Redefine, with only 1 mandatory argument.
def howmany_within_range_rowonly(row, minimum=4, maximum=8):
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count

start = time.time()
pool = mp.Pool(mp.cpu_count())

t1 = time.time()
results = pool.map(howmany_within_range_rowonly, [row for row in data])

t2 = time.time()
pool.close()
end = time.time()

print(end - start)
print(t1 - start)
print(t2 - t1)


print(results[:10])

6.380372047424316
0.1287531852722168
6.251415967941284
[3, 3, 1, 1, 2, 5, 4, 3, 3, 2]
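An alternative to redefining the function with default values is to freeze the extra arguments with `functools.partial`. The sketch below is one way to do that, not the approach used in this lab. The name `count_4_to_8` and the tiny `sample` list are illustrative, and the counting function is restated compactly so the block is self-contained.

```python
import multiprocessing as mp
from functools import partial


def howmany_within_range(row, minimum, maximum):
    """Count the numbers in `row` that lie between `minimum` and `maximum` (inclusive)."""
    return sum(1 for n in row if minimum <= n <= maximum)


# Freeze the two range arguments; the resulting callable takes only `row`,
# which is the single-argument shape that Pool.map() expects.
count_4_to_8 = partial(howmany_within_range, minimum=4, maximum=8)

if __name__ == "__main__":
    sample = [[4, 6, 3, 7, 6], [9, 0, 1, 2, 3]]  # small illustrative data
    with mp.Pool(processes=2) as pool:
        print(pool.map(count_4_to_8, sample))
```

This keeps a single definition of the function and moves the choice of range to the call site.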


### Parallelizing using `Pool.starmap()`

In the previous example, we had to redefine the `howmany_within_range` function so that a couple of its parameters take default values. Using `starmap()`, you can avoid doing this. How, you ask?

Like `Pool.map()`, `Pool.starmap()` also accepts only one iterable as argument, but in `starmap()`, each element of that iterable is itself an iterable. You provide the arguments to the _function-to-be-parallelized_, in order, in this inner iterable, which is in turn unpacked during execution.

So effectively, `Pool.starmap()` is like a version of `Pool.map()` that accepts arguments.

In [17]:
# Parallelizing with Pool.starmap()
import multiprocessing as mp

start = time.time()
pool = mp.Pool(mp.cpu_count())

t1 = time.time()
results = pool.starmap(howmany_within_range, [(row, 4, 8) for row in data])

t2 = time.time()
pool.close()

end = time.time()

print(end - start)
print(t1 - start)
print(t2 - t1)

print(results[:10])

17.380495071411133
0.04459524154663086
17.335542678833008
[3, 3, 1, 1, 2, 5, 4, 3, 3, 2]


## Asynchronous Parallel Processing

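The asynchronous equivalents `apply_async()`, `map_async()` and `starmap_async()` let you execute the processes in parallel asynchronously; that is, the next process can start as soon as the previous one finishes, without regard for the starting order. As a result, there is no guarantee that the tasks complete in the same order as the input. (The lists returned by `map_async().get()` and `starmap_async().get()` are still in input order; it is results gathered through `apply_async()` callbacks that can arrive out of order.)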

### Parallelizing with `Pool.apply_async()`

`apply_async()` is very similar to `apply()`, except that it returns immediately instead of waiting for the result; here we provide a callback function that tells how the computed results should be stored.

However, a caveat with `apply_async()` is that the order of numbers in the result gets jumbled up, indicating that the processes did not complete in the order in which they were started.

A workaround for this is to define a new `howmany_within_range2()` that accepts and returns the iteration number (i) as well, and then sort the final results.

In [20]:
# Parallel processing with Pool.apply_async()

import multiprocessing as mp

start = time.time()
pool = mp.Pool(4)

results = []
t1 = time.time()

# Step 1: Redefine, to accept `i`, the iteration number
def howmany_within_range2(i, row, minimum, maximum):
    """Returns how many numbers lie within `maximum` and `minimum` in a given `row`"""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return (i, count)


# Step 2: Define callback function to collect the output in `results`
def collect_result(result):
    global results
    results.append(result)
t2 = time.time()

# Step 3: Use loop to parallelize
for i, row in enumerate(data):
    pool.apply_async(howmany_within_range2, args=(i, row, 4, 8), callback=collect_result)
t3 = time.time()
# Step 4: Close Pool and let all the processes complete    
pool.close()
pool.join()  # postpones the execution of next line of code until all processes in the queue are done.
t4 = time.time()

# Step 5: Sort results [OPTIONAL]
results.sort(key=lambda x: x[0])
results_final = [r for i, r in results]
end = time.time()

print(end - start)
print(t1 - start)
print(t2 - t1)
print(t3 - t2)
print(t4 - t3)

print(results_final[:10])

54.20174217224121
1.463257074356079
0.0007958412170410156
30.491361141204834
21.98329496383667
[1, 1, 2, 5, 2, 3, 4, 4, 3, 2]
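As a variation on the sort step, the callback can write each result straight into its slot, so no sorting is needed afterwards. This is a sketch under the assumption that the number of rows is known up front. The helpers `count_in_range` and `make_collector`, and the small `sample` list, are illustrative names rather than part of the lab.

```python
import multiprocessing as mp


def count_in_range(i, row, minimum, maximum):
    """Return (i, count) so the caller knows which row the count belongs to."""
    return i, sum(1 for n in row if minimum <= n <= maximum)


def make_collector(slots):
    """Build a callback that stores each (i, count) result at slots[i]."""
    def collect(result):
        i, count = result
        slots[i] = count
    return collect


if __name__ == "__main__":
    sample = [[4, 6, 3], [9, 0, 1], [5, 5, 5]]  # small illustrative data
    slots = [None] * len(sample)                 # one slot per row
    collect = make_collector(slots)

    with mp.Pool(processes=2) as pool:
        for i, row in enumerate(sample):
            pool.apply_async(count_in_range, args=(i, row, 4, 8), callback=collect)
        pool.close()
        pool.join()  # wait until every task and callback has finished

    print(slots)
```

The callback runs in the parent process, so it can safely write into the parent's `slots` list.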


It is possible to use `apply_async()` without providing a `callback` function. If you don’t provide a callback, you get a list of `ApplyResult` objects (documented as `multiprocessing.pool.AsyncResult`), each holding the computed output value from one call. You then use each object's `get()` method to retrieve the desired final result.

In [None]:
# Parallel processing with Pool.apply_async() without callback function

import multiprocessing as mp
pool = mp.Pool(mp.cpu_count())

results = []

# call apply_async() without callback
result_objects = [pool.apply_async(howmany_within_range2, args=(i, row, 4, 8)) for i, row in enumerate(data)]

# result_objects is a list of pool.ApplyResult objects
results = [r.get()[1] for r in result_objects]

pool.close()
pool.join()
print(results[:10])

### Parallelizing with `Pool.starmap_async()`

You saw how `apply_async()` works. Can you imagine and write up an equivalent version for `starmap_async()` and `map_async()`? The implementation is below anyway.

In [None]:
# Parallelizing with Pool.starmap_async()

import multiprocessing as mp
pool = mp.Pool(mp.cpu_count())

# starmap_async() returns a result handle; .get() waits for all the (i, count) tuples
results = pool.starmap_async(howmany_within_range2, [(i, row, 4, 8) for i, row in enumerate(data)]).get()

# With map, use `howmany_within_range_rowonly` instead
# results = pool.map_async(howmany_within_range_rowonly, [row for row in data]).get()

pool.close()
print(results[:10])  # each element is an (i, count) tuple

# Exercises

## Problem 1

Use `Pool.apply()` to get the row wise common items in list_a and list_b:

```
list_a = [[1, 2, 3], [5, 6, 7, 8], [10, 11, 12], [20, 21]]
list_b = [[2, 3, 4, 5], [6, 9, 10], [11, 12, 13, 14], [21, 24, 25]]
```

In [25]:
list_a = [[1, 2, 3], [5, 6, 7, 8], [10, 11, 12], [20, 21]]
list_b = [[2, 3, 4, 5], [6, 9, 10], [11, 12, 13, 14], [21, 24, 25]]

def common_items(list1, list2):
    """Return the set of items common to both lists."""
    list1_unique = set(list1)
    inter = list1_unique.intersection(list2)
    return inter

pool = mp.Pool(mp.cpu_count())
results = [pool.apply(common_items,args=(la,lb)) for la,lb in zip(list_a,list_b)]
pool.close()
print(results[:10])

[{2, 3}, {6}, {11, 12}, {21}]


## Problem 2 

Suppose you have three scripts named `script1.py`, `script2.py`, `script3.py`. Use `Pool.map()` to run them in parallel.


In [None]:
import os

scripts = ('script1.py', 'script2.py', 'script3.py')

def run_python(script):
    """Run one script with the `python` found on the PATH."""
    os.system('python {}'.format(script))

pool = mp.Pool(processes=3)  # one worker per script
pool.map(run_python, scripts)
pool.close()
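A possible variation, sketched here as an assumption rather than the lab's intended solution, is to launch each script with `subprocess.run` and `sys.executable`. That way the scripts run under the same interpreter as the notebook and their exit codes can be inspected. The helper name `run_script` is illustrative.

```python
import multiprocessing as mp
import subprocess
import sys


def run_script(path):
    """Run `path` with the current Python interpreter and return its exit code."""
    completed = subprocess.run([sys.executable, path])
    return completed.returncode


if __name__ == "__main__":
    # File names taken from the problem statement; they must exist to succeed.
    scripts = ["script1.py", "script2.py", "script3.py"]
    with mp.Pool(processes=len(scripts)) as pool:
        exit_codes = pool.map(run_script, scripts)
    print(dict(zip(scripts, exit_codes)))
```

An exit code of 0 conventionally means the script finished without error, so the printed mapping gives a quick per-script status.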

## Problem 3

Normalize each row of a 2D array (list) to vary between 0 and 1. Execute in parallel.

`list_a = [[2, 3, 4, 5], [6, 9, 10, 12], [11, 12, 13, 14], [21, 24, 25, 26]]`

In [29]:
list_a = [[2, 3, 4, 5], [6, 9, 10, 12], [11, 12, 13, 14], [21, 24, 25, 26]]

def normalize(list_1):
    min1 = min(list_1)
    max1 = max(list_1)
    return [(i - min1)/(max1-min1) for i in list_1]

pool = mp.Pool(mp.cpu_count())
results = [pool.apply(normalize,args=(l1,)) for l1 in list_a]
pool.close()
print(results[:10])

[[0.0, 0.3333333333333333, 0.6666666666666666, 1.0], [0.0, 0.5, 0.6666666666666666, 1.0], [0.0, 0.3333333333333333, 0.6666666666666666, 1.0], [0.0, 0.6, 0.8, 1.0]]
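Because `Pool.apply()` in a loop waits for each call before submitting the next, a `Pool.map()` version may be closer to what "execute in parallel" asks for. The sketch below is one possible variant. The name `normalize_row` is illustrative, and mapping a constant row to all zeros is an assumed convention added to avoid dividing by zero.

```python
import multiprocessing as mp


def normalize_row(row):
    """Rescale `row` to the range [0, 1].

    A row whose values are all equal is mapped to zeros (assumed convention).
    """
    lo, hi = min(row), max(row)
    if hi == lo:
        return [0.0 for _ in row]
    return [(x - lo) / (hi - lo) for x in row]


if __name__ == "__main__":
    list_a = [[2, 3, 4, 5], [6, 9, 10, 12], [11, 12, 13, 14], [21, 24, 25, 26]]
    with mp.Pool(processes=2) as pool:
        # map() splits the rows across the workers and keeps the input order
        print(pool.map(normalize_row, list_a))
```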
