# Learn Python Programming Meetup: Parallel Processing

### Hosted by:
Nishant Gandhi (DataRobot Inc)

**Join Us:** <br>
Slack: https://join.slack.com/t/learnpythonboston/shared_invite/zt-cvplmooz-rPBRaXBqh0xuXrGbeCwj~Q

**Learn More:** <br>
Github: https://github.com/Learn-Python-Programming-Meetup/workshop-content-archive <br>
Meetup: https://www.meetup.com/Learn-Python-Programming

### Topics in this Notebook:

+ Overview of Parallel Processing in Python
+ multiprocessing
+ multiprocessing: Programming guidelines
+ Use Case: Building a Machine Learning Model in Parallel


## Overview of Parallel Processing in Python

### Categories of Parallel Processing

+ Thread Based
+ Process Based
+ Distributed Processing Based
+ Concurrency Based (Async)

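We will focus on **Process Based** parallel computing in Python, using the `multiprocessing` module. In standard CPython builds, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so CPU-bound work usually benefits more from separate processes, each with its own interpreter and memory, than from threads.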

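To make the thread-vs-process distinction concrete, here is a minimal sketch (not part of the original notebook) that runs the same function on `multiprocessing.pool.ThreadPool` and on `multiprocessing.Pool`. Both expose the same `map` interface; the difference shows up in the PIDs: thread workers report the parent's PID, while process workers report their own. The helper names `worker_pid` and `compare_pids` are hypothetical. It is written as a standalone script because, with the `spawn` start method, child processes may not find functions defined interactively in a notebook.

```python
import os
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def worker_pid(_):
    # Hypothetical helper: report which OS process executed this task
    return os.getpid()


def compare_pids(n_tasks=8):
    """Return (parent PID, PIDs seen by thread workers, PIDs seen by process workers)."""
    with ThreadPool(4) as tp:
        thread_pids = set(tp.map(worker_pid, range(n_tasks)))
    with Pool(4) as pp:
        process_pids = set(pp.map(worker_pid, range(n_tasks)))
    return os.getpid(), thread_pids, process_pids


if __name__ == '__main__':
    parent, thread_pids, process_pids = compare_pids()
    print("Parent PID:", parent)
    print("Thread workers:", thread_pids)    # only the parent PID
    print("Process workers:", process_pids)  # separate child PIDs
```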
In [30]:
import multiprocessing
import os

from multiprocessing import Process, Pool

In [31]:
print("Number of cpu : ", multiprocessing.cpu_count())
print("Main PID : ", os.getpid())

Number of cpu :  8
Main PID :  56143


### multiprocessing.Process Class

In [59]:
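def f(x):
    result = x*x
    # getppid() is the parent (notebook kernel) PID; getpid() is this child's PID
    print("Parent PID, Current PID, Result: ", os.getppid(), os.getpid(), result)
    return result  # discarded: Process does not pass return values back to the parent

In [60]:
if __name__ == '__main__':
    result = 0.0
    for num in range(2, 5):
        p = Process(target=f, args=(num,))
        p.start()
        # join() right after start() waits for this child to finish before the
        # next one is created, so these three processes run one after another
        p.join()
    # Still 0.0: the child's return value never reaches the parent
    print(result)

Parent PID, Current PID, Result:  56143 85929 4
Parent PID, Current PID, Result:  56143 85930 9
Parent PID, Current PID, Result:  56143 85931 16
0.0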

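In the cell above, the return value of `f` is discarded and calling `join()` inside the loop makes each child finish before the next one starts. A common alternative is to start every process first, send results back through a `multiprocessing.Queue`, and join afterwards. This is a minimal sketch, assuming a hypothetical `square_to_queue` worker and a hypothetical `run_parallel` driver, written as a standalone script:

```python
import os
from multiprocessing import Process, Queue


def square_to_queue(x, q):
    # Hypothetical worker: send (input, result, pid) back to the parent
    q.put((x, x * x, os.getpid()))


def run_parallel(nums):
    q = Queue()
    procs = [Process(target=square_to_queue, args=(n, q)) for n in nums]
    for p in procs:  # start all children before waiting on any of them
        p.start()
    # Drain the queue before join(): joining a process that still has
    # items buffered in a queue can deadlock
    results = [q.get() for _ in procs]
    for p in procs:
        p.join()
    return results


if __name__ == '__main__':
    results = run_parallel(range(2, 5))
    # Arrival order depends on scheduling, so sort by input for display
    for x, sq, pid in sorted(results):
        print(x, sq, pid)
```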

### Interprocess Communication

#### Shared Memory

In [46]:
from multiprocessing import Process, Value, Array

In [68]:
def f(n, a):
    print("Parent PID, Current PID:", os.getppid(), os.getpid())
    n.value = 3.1415927
    # `a[i] = a[i] + 1` is a separate read and write, so hold the array's
    # lock for the whole loop to keep the other process from interleaving
    with a.get_lock():
        for i in range(len(a)):
            a[i] = a[i] + 1

def f1(n, a):
    print("Parent PID, Current PID:", os.getppid(), os.getpid())
    n.value = 2
    with a.get_lock():
        for i in range(len(a)):
            a[i] = a[i] + 4

if __name__ == '__main__':
    # creating shared variables: 'd' = C double, 'i' = C signed int
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    print(num.value)
    print(arr[:])

    p = Process(target=f, args=(num, arr))
    p1 = Process(target=f1, args=(num, arr))

    p.start()
    p1.start()

    p.join()
    p1.join()

    # num.value holds whichever write happened last (2.0 or 3.1415927);
    # every array element has been increased by 1 + 4 = 5
    print(num.value)
    print(arr[:])

0.0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Parent PID, Current PID: 56143 87422
Parent PID, Current PID: 56143 87423
2.0
[5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

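The same read-modify-write hazard applies to `Value`: `counter.value += 1` reads and then writes, so two processes can overwrite each other's updates. This minimal sketch guards the update with the lock that `Value` creates by default (`get_lock()`); the `increment` and `count_in_parallel` helpers and the counts are illustrative, and it is written as a standalone script:

```python
from multiprocessing import Process, Value


def increment(counter, times):
    for _ in range(times):
        # Hold the Value's built-in lock so the read and the write happen together
        with counter.get_lock():
            counter.value += 1


def count_in_parallel(n_procs=4, times=1000):
    counter = Value('i', 0)  # 'i' = C signed int, initial value 0
    procs = [Process(target=increment, args=(counter, times)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == '__main__':
    total = count_in_parallel()
    print(total)  # 4000; without the lock the total can come out lower
```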

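#### Communicating Between Parent and Child Processes: Server Process

`Manager()` starts a separate server process that holds Python objects such as `dict` and `list`; other processes read and modify them through proxy objects.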

In [51]:
from multiprocessing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        l = manager.list(range(10))

        p = Process(target=f, args=(d, l))
        p.start()
        p.join()

        print(d)
        print(l)

{1: '1', '2': 2, 0.25: None}
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

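For contrast, a plain `dict` passed to a child process is copied into the child, so the parent never sees the child's changes, whereas a manager proxy forwards every change to the server process. This is a minimal sketch with hypothetical `fill` and `compare_dicts` functions, written as a standalone script:

```python
from multiprocessing import Manager, Process


def fill(d):
    # Runs in the child process
    d['child'] = 'was here'


def compare_dicts():
    plain = {}
    with Manager() as manager:
        managed = manager.dict()
        for target in (plain, managed):
            p = Process(target=fill, args=(target,))
            p.start()
            p.join()
        # Copy the contents out of the proxy before the manager shuts down
        managed_copy = dict(managed)
    return plain, managed_copy


if __name__ == '__main__':
    plain, managed = compare_dicts()
    print(plain)    # {} because the child modified its own copy
    print(managed)  # {'child': 'was here'}
```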

### multiprocessing.Pool Class

In [94]:
import time

def f(x):
    print("Parent PID, Current PID: ", os.getppid(), os.getpid())
    time.sleep(3)  # simulate a slow task
    return x*x

In [99]:
if __name__ == '__main__':
    with Pool(5) as p:
        # apply() blocks until f returns, so the three ~3 s calls run one after another
        result1 = p.apply(f, [1])
        print("Done-1")
        result2 = p.apply(f, [2])
        print("Done-2")
        result3 = p.apply(f, [3])
        print("Done-3")
        print(result1)
        print(result2)
        print(result3)

Parent PID, Current PID:  56143 95450
Parent PID, Current PID:  56143 95451
Done-1
Parent PID, Current PID:  56143 95452
Done-2
Done-3
1
4
9


In [100]:
if __name__ == '__main__':
    with Pool(5) as p:
        # apply_async() returns an AsyncResult immediately, so all three tasks run concurrently
        result1 = p.apply_async(f, [1])
        print("Done-1")
        result2 = p.apply_async(f, [2])
        print("Done-2")
        result3 = p.apply_async(f, [3])
        print("Done-3")
        # get() waits for the result and raises multiprocessing.TimeoutError after 4 s;
        # call it inside the `with` block, because leaving the block calls terminate()
        print(result1.get(timeout=4))
        print(result2.get(timeout=4))
        print(result3.get(timeout=4))

Parent PID, Current PID:  56143 95504
Parent PID, Current PID:  56143 95505
Parent PID, Current PID:  56143 95506
Done-1
Done-2
Done-3
1
4
9


In [101]:
if __name__ == '__main__':
    with Pool(5) as p:
        # map() blocks and returns a list in input order
        print(p.map(f, [1, 2, 3]))
        print("Done-1")
        # imap() returns a lazy iterator that yields results in input order
        print(list(p.imap(f, [1, 2, 3])))
        print("Done-2")
        # imap_unordered() yields results as they finish, so the order can vary between runs
        print(list(p.imap_unordered(f, [1, 2, 3])))
        print("Done-3")

Parent PID, Current PID:  56143 95643
Parent PID, Current PID:  56143 95641
Parent PID, Current PID:  56143 95642
Parent PID, Current PID:  56143 95642
Parent PID, Current PID:  56143 95645
Parent PID, Current PID:  56143 95644
[1, 4, 9]
Done-1
Parent PID, Current PID:  56143 95642
Parent PID, Current PID:  56143 95643
Parent PID, Current PID:  56143 95641
[1, 4, 9]
Done-2
[1, 4, 9]
Done-3

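The difference between the `apply` and `apply_async` cells above is easiest to see with a timer. This is a rough sketch, assuming a hypothetical `slow_square` that sleeps 0.5 s and a hypothetical `time_apply_vs_async` helper; exact timings vary by machine, and it is written as a standalone script:

```python
import time
from multiprocessing import Pool


def slow_square(x):
    time.sleep(0.5)  # stand-in for real work
    return x * x


def time_apply_vs_async(pool, inputs):
    start = time.perf_counter()
    sync_results = [pool.apply(slow_square, (x,)) for x in inputs]  # one task at a time
    sync_elapsed = time.perf_counter() - start

    start = time.perf_counter()
    pending = [pool.apply_async(slow_square, (x,)) for x in inputs]  # all submitted at once
    async_results = [r.get() for r in pending]
    async_elapsed = time.perf_counter() - start
    return sync_results, sync_elapsed, async_results, async_elapsed


if __name__ == '__main__':
    with Pool(3) as pool:
        sync_results, sync_t, async_results, async_t = time_apply_vs_async(pool, [1, 2, 3])
    print(sync_results, round(sync_t, 1))    # roughly 3 x 0.5 s
    print(async_results, round(async_t, 1))  # roughly 0.5 s
```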

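Because `imap_unordered` yields results in completion order, one common approach is to return each input's index along with its result so the original order can be restored afterwards. This is a minimal sketch with hypothetical `tagged_square` and `squares_in_input_order` helpers; `chunksize=1` hands out one task at a time, and the artificial delays make earlier inputs finish later. It is written as a standalone script:

```python
import time
from multiprocessing import Pool


def tagged_square(item):
    index, x, delay = item
    time.sleep(delay)  # simulated work; duration supplied by the caller
    return index, x * x


def squares_in_input_order(values):
    # Longer delay for earlier items, so completion order tends to differ from input order
    delays = [0.1 * (len(values) - i) for i in range(len(values))]
    tasks = [(i, x, d) for i, (x, d) in enumerate(zip(values, delays))]
    with Pool(len(values)) as p:
        completed = list(p.imap_unordered(tagged_square, tasks, chunksize=1))
    completion_order = [i for i, _ in completed]
    ordered = [sq for _, sq in sorted(completed)]  # sort by index to restore input order
    return completion_order, ordered


if __name__ == '__main__':
    completion_order, ordered = squares_in_input_order([1, 2, 3])
    print(completion_order)  # often [2, 1, 0], but not guaranteed
    print(ordered)           # [1, 4, 9]
```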