Parallel processing lets a program work on several tasks at the same time, which can reduce the overall processing time.

- For parallelism, the main problem is divided into sub-problems that are independent of one another.

- Ways to handle parallel programs:
    1. Shared memory: Sub-units share the same memory, so no explicit communication is needed, although access to shared data still has to be synchronized.
    2. Distributed memory: Each process is completely separate and has its own memory. Communication needs to be handled explicitly.
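
The difference between the two models can be seen in a small sketch that uses Python's standard `threading` and `multiprocessing` modules. The names `shared_items`, `append_item`, `send_item` and `run_demo` are made up for this illustration. A thread writes directly into the parent's list, a child process only changes its own copy, and a `multiprocessing.Queue` is one way to send the value back explicitly.

```python
import multiprocessing
import threading

# Module-level list; every process that loads this file gets its own copy.
shared_items = []


def append_item(value):
    """Append to the module-level list of the *current* process."""
    shared_items.append(value)


def send_item(value, queue):
    """Send a value back to the parent explicitly through a queue."""
    queue.put(value)


def run_demo():
    shared_items.clear()

    # Shared memory: the thread writes straight into the parent's list.
    worker = threading.Thread(target=append_item, args=("from thread",))
    worker.start()
    worker.join()
    after_thread = list(shared_items)

    # Separate memory: the child process appends to its own copy,
    # so the parent's list does not change.
    child = multiprocessing.Process(target=append_item, args=("from process",))
    child.start()
    child.join()
    after_process = list(shared_items)

    # Explicit communication: the child sends the value through a queue.
    queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=send_item, args=("from process", queue))
    child.start()
    received = queue.get()  # read the value before joining the child
    child.join()

    return after_thread, after_process, received


if __name__ == "__main__":
    print(run_demo())
    # (['from thread'], ['from thread'], 'from process')
```

The `if __name__ == "__main__":` guard keeps the script safe on platforms where child processes re-import the main module.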

**Threads** are one of the ways to achieve parallelism. 

The **Global Interpreter Lock (GIL)** in standard CPython allows only one thread to execute Python bytecode at a time, so threads cannot run CPU-bound Python code in parallel.

The GIL limitation can be avoided by using processes instead of threads, because each process has its own interpreter and its own GIL. Processes have a few disadvantages, such as inter-process communication being less efficient than shared memory, but they are more flexible and explicit.
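
A rough way to observe this is to time the same CPU-bound loop in two threads and then in two processes. The sketch below is only an illustration: `count_down`, `time_workers` and the workload size are assumptions, and the measured times depend on the machine. On a GIL-enabled CPython build with at least two cores, the process version would usually be expected to finish sooner.

```python
import multiprocessing
import threading
import time


def count_down(n):
    """CPU-bound busy loop written in pure Python."""
    while n > 0:
        n -= 1


def time_workers(worker_cls, n, workers=2):
    """Run count_down(n) in `workers` threads or processes; return elapsed seconds."""
    jobs = [worker_cls(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000  # arbitrary workload size chosen for illustration
    print(f"threads:   {time_workers(threading.Thread, n):.2f} s")
    print(f"processes: {time_workers(multiprocessing.Process, n):.2f} s")
```

Both `threading.Thread` and `multiprocessing.Process` accept `target` and `args`, which is why one helper can time either kind of worker.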

**Multiprocessing for parallel processing**

Using the standard **multiprocessing** module, we can efficiently parallelize simple tasks by creating child processes. This module provides an easy-to-use interface and contains a set of utilities to handle task submission and synchronization.



**Process and Pool Class**

**Process**

By subclassing ***multiprocessing.Process*** and overriding its run() method, you can create a process that runs independently.

In [1]:
import multiprocessing
import time

In [27]:
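class OCRParallel(multiprocessing.Process):
  def __init__(self, id):
    super().__init__()
    # User-supplied label for this task; it is not the operating system's PID
    self.id = id

  def run(self):
    # run() is executed in the child process after start() is called
    print('Invoked run()')
    time.sleep(1)
    print(f'Process ID is {self.id}')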

In [28]:
task = OCRParallel(100)
task.start()
task.join()

Invoked run()
Process ID is 100
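
Subclassing is not the only option. `multiprocessing.Process` also accepts a plain function through `target` and its arguments through `args`. The sketch below uses hypothetical names (`report`, `run_task`) and additionally shows that the label passed in by the caller differs from the real operating system process ID, which is available as `os.getpid()` inside the child and as `proc.pid` in the parent.

```python
import multiprocessing
import os


def report(task_id, queue):
    """Runs in the child: send back the label and the child's real OS process ID."""
    queue.put((task_id, os.getpid()))


def run_task(task_id):
    """Start one child via target=/args=; return (label, pid seen in child, proc.pid)."""
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=report, args=(task_id, queue))
    proc.start()
    label, child_pid = queue.get()
    proc.join()
    return label, child_pid, proc.pid


if __name__ == "__main__":
    label, child_pid, pid_seen_by_parent = run_task(100)
    print(f"label passed in:       {label}")
    print(f"os.getpid() in child:  {child_pid}")
    print(f"proc.pid in parent:    {pid_seen_by_parent}")
    print(f"os.getpid() in parent: {os.getpid()}")
```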


In [29]:
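# join() waits for a process to finish, so the second process
# only starts after the first one is done.
p = OCRParallel(0)
p.start()
p.join()
p = OCRParallel(1)
p.start()
p.join()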

Invoked run()
Process ID is 0
Invoked run()
Process ID is 1
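
Calling join() right after start() makes the parent wait, so the two processes in this cell run one after the other. To let them overlap, start every process first and join them afterwards. The following sketch compares both patterns with a sleeping worker; `nap`, `run_sequential` and `run_concurrent` are illustrative names, and the exact timings will vary.

```python
import multiprocessing
import time


def nap(seconds):
    """Stand-in for real work: just sleep."""
    time.sleep(seconds)


def run_sequential(count, seconds):
    """start() and join() each process before creating the next one."""
    start = time.perf_counter()
    for _ in range(count):
        proc = multiprocessing.Process(target=nap, args=(seconds,))
        proc.start()
        proc.join()
    return time.perf_counter() - start


def run_concurrent(count, seconds):
    """start() every process first, then join() them all."""
    start = time.perf_counter()
    procs = [multiprocessing.Process(target=nap, args=(seconds,)) for _ in range(count)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(3, 0.5):.2f} s")  # at least 3 * 0.5 s
    print(f"concurrent: {run_concurrent(3, 0.5):.2f} s")  # roughly one nap plus startup cost
```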


**Pool class**

The Pool class can be used to execute a function in parallel over different input data. The multiprocessing.Pool() class spawns a set of processes called workers, and tasks can be submitted to them with the methods ***apply/apply_async*** and ***map/map_async***. For parallel mapping, first initialize a multiprocessing.Pool() object. The first argument is the number of workers; if it is not given, it defaults to the number of CPU cores reported by the system.

In [34]:
# pool = multiprocessing.Pool()  # default: one worker per CPU core reported by the system
pool = multiprocessing.Pool(processes=2)  # here: exactly two workers

In [35]:
def square(x):
  return x*x

In [44]:
res = pool.map(square, [2, 3, 4, 6])  # blocks until all results are ready
res_async = pool.map_async(square, [2, 3, 4, 6])  # returns an AsyncResult immediately

In [47]:
res

[4, 9, 16, 36]

In [46]:
res_async.get()

[4, 9, 16, 36]
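
Outside a notebook, the same steps could be written as a standalone script along the following lines. This is a minimal sketch, not the only way to structure it: `parallel_squares` is a hypothetical helper and the 30 second timeout is an arbitrary choice. The worker function is defined at module level before the pool is created so that the workers can find it, and the `with` block shuts the pool down once the results have been collected.

```python
import multiprocessing
import os


def square(x):
    return x * x


def parallel_squares(values, workers=2):
    """Square each value in a pool that is shut down when the with-block ends."""
    with multiprocessing.Pool(processes=workers) as pool:
        blocking = pool.map(square, values)       # waits for all results
        pending = pool.map_async(square, values)  # returns an AsyncResult at once
        non_blocking = pending.get(timeout=30)    # wait up to 30 s for the results
    return blocking, non_blocking


if __name__ == "__main__":
    print("CPUs reported by os.cpu_count():", os.cpu_count())
    print(parallel_squares([2, 3, 4, 6]))
    # ([4, 9, 16, 36], [4, 9, 16, 36])
```

Results have to be fetched inside the `with` block, because leaving the block terminates the workers.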

***Pool.apply_async*** submits a single function call to one of the workers. It takes the function and its arguments and immediately returns an AsyncResult object; calling get() on that object waits for the result.

In [52]:
res_async = [pool.apply_async(square, args=(i,)) for i in range(100)]
res = [r.get() for r in res_async]
print(res)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801]


In [53]:
99*99

9801
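
Two further details of apply_async can be sketched with made-up helper functions (`power`, `safe_divide` and `run_tasks` are not part of the notebook above). The `args` tuple can carry several positional arguments, and an exception raised inside a worker is re-raised in the parent when get() is called.

```python
import multiprocessing


def power(base, exponent):
    return base ** exponent


def safe_divide(a, b):
    return a / b


def run_tasks():
    with multiprocessing.Pool(processes=2) as pool:
        # Each call is an independent task; args holds the positional arguments.
        pending = [pool.apply_async(power, args=(i, 2)) for i in range(5)]
        # get() is called in submission order, so the results keep that order.
        squares = [p.get(timeout=30) for p in pending]

        # An exception raised in the worker is re-raised by get() in the parent.
        failing = pool.apply_async(safe_divide, args=(1, 0))
        try:
            failing.get(timeout=30)
            error = None
        except ZeroDivisionError as exc:
            error = type(exc).__name__
    return squares, error


if __name__ == "__main__":
    print(run_tasks())
    # ([0, 1, 4, 9, 16], 'ZeroDivisionError')
```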