# A Guided Tour of Ray Core: Multiprocessing Pool

© 2019-2022, Anyscale. All Rights Reserved

### Scaling CPU bound tasks using four different strategies:

 1. A serial approach for a CPU bound task
 2. A multi-threaded approach for a CPU bound task
 3. A multiprocess approach for a CPU bound task
 4. A Ray distributed multiprocess approach for a CPU bound task
 
### Learning objectives:
 * Understand various strategies for scaling CPU bound tasks
 * Understand the pros and cons of each

In [46]:
import concurrent.futures as mt
import multiprocessing as mp
import time
import ray
from ray.util.multiprocessing import Pool
from defs import get_cpu_count, is_prime
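
The `defs` module imported above is not shown in this notebook. As a rough sketch of what it might contain, the block below assumes `is_prime` uses plain trial division and `get_cpu_count` wraps `os.cpu_count()`; both bodies are assumptions, not the original implementation. The sketch treats 0 and 1 as non-prime, so its counts may differ slightly from the outputs recorded in this notebook, which come from the original helper.

```python
import math
import os


def get_cpu_count() -> int:
    """Return the number of logical CPUs, falling back to 1 if unknown."""
    return os.cpu_count() or 1


def is_prime(n: int) -> bool:
    """Trial-division primality test (assumed); numbers below 2 are not prime."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # Only odd divisors up to floor(sqrt(n)) need to be checked.
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True
```

Keeping these functions in a separate importable module also matters for the multiprocessing section later on, where worker processes need to be able to locate `is_prime` by name.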

In [47]:
num = 2000000

### 1. A serial approach for a CPU bound task

In [48]:
start = time.time()
prime_numbers = [n for n in range(num) if is_prime(n)]
end = time.time()
print(f"Serial access: Time elapsed: {end - start:4.2f} sec to compute all primes in {num} are {len(prime_numbers)} ")

Serial access: Time elapsed: 6.21 sec to compute all primes in 2000000 are 148935 


### 2. A multi-threaded approach for a CPU bound task

Python (specifically CPython) comes with the Global Interpreter Lock (GIL). You can spawn as many threads as you like, but the GIL ensures that only one of them executes Python bytecode at any given time. For a CPU bound task, this means only a single thread is computing primes at any moment, so adding threads does not add parallelism.

**Note**: The threaded run takes longer than the serial approach (25.01 sec versus 6.21 sec in the outputs shown here), because the threads add scheduling and lock-contention overhead without adding parallelism.

In [49]:
cpu_count = get_cpu_count()
cpu_count

10

In [50]:
start = time.time()
with mt.ThreadPoolExecutor(cpu_count) as executor:
    prime_numbers = executor.map(is_prime, list(range(num)))
end = time.time()
print(f"Multi Threaded access: Time elapsed: {end - start:4.2f} sec to compute all primes in {num} are {sum(list(prime_numbers))}")

Multi Threaded access: Time elapsed: 25.01 sec to compute all primes in 2000000 are 148935
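
Part of the threaded overhead likely comes from submitting one task per number: `executor.map` creates a future for each of the 2,000,000 inputs. A minimal sketch of a chunked alternative follows. The helpers `count_primes_in_range`, `make_chunks`, and `threaded_prime_count` are hypothetical names introduced for illustration, and the `is_prime` shown is a stand-in for the one in `defs`.

```python
import concurrent.futures as mt
import math


def is_prime(n: int) -> bool:
    # Stand-in for defs.is_prime (assumption: plain trial division).
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def count_primes_in_range(bounds: tuple[int, int]) -> int:
    """Hypothetical helper: count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def make_chunks(limit: int, parts: int) -> list[tuple[int, int]]:
    """Split range(limit) into at most `parts` contiguous (lo, hi) chunks."""
    step = max(1, math.ceil(limit / parts))
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def threaded_prime_count(limit: int, workers: int) -> int:
    # One future per chunk rather than one future per number.
    with mt.ThreadPoolExecutor(workers) as executor:
        return sum(executor.map(count_primes_in_range, make_chunks(limit, workers)))
```

Chunking reduces bookkeeping, but it should not be expected to beat the serial version: the threads still take turns holding the GIL while they compute.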


### 3. A multiprocess approach for a CPU bound task

In a multiprocess setting, you are not encumbered by the GIL: each process has its own Python interpreter and its own GIL, so the processes can run in parallel on separate cores. This ought to be much faster than the above two strategies.

**Note**: Multiprocessing pool does not work reliably with functions defined interactively in [IPython environments](https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror/42383397#42383397). You have to define the function in an external file and import it, as done here with the `defs` module.

In [51]:
# Let's try multiprocessing with one process per core, without being hindered by the GIL.
# Since this is a CPU bound task, we should get better performance
# than the serial and threaded approaches.
#
start = time.time()
# The context manager terminates the pool on exit, so no explicit terminate() call is needed.
with mp.Pool(cpu_count) as p:
    prime_numbers = p.map(is_prime, list(range(num)))
end = time.time()
print(f"Multi Process access: Time elapsed: {end - start:4.2f} sec to compute all primes in {num} are {sum(list(prime_numbers))}")

Multi Process access: Time elapsed: 0.97 sec to compute all primes in 2000000 are 148935
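
Outside a notebook, the same pool can be run as a standalone script. The sketch below assumes it is saved to a file (for example a hypothetical `primes_mp.py`) and uses a stand-in `is_prime`. The worker function lives at module level so it can be pickled, and the pool is only created under the `if __name__ == "__main__":` guard, which is needed on platforms that start workers by spawning a fresh interpreter. The `chunksize` value is an arbitrary illustrative choice.

```python
import math
import multiprocessing as mp
import time


def is_prime(n: int) -> bool:
    # Stand-in for defs.is_prime (assumption: plain trial division).
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def count_primes(limit: int, processes: int, chunksize: int = 10_000) -> int:
    """Count primes below `limit` using a local process pool."""
    with mp.Pool(processes) as pool:
        # chunksize batches the inputs so each worker receives many numbers per message.
        return sum(pool.map(is_prime, range(limit), chunksize=chunksize))


if __name__ == "__main__":
    limit = 200_000
    start = time.perf_counter()
    total = count_primes(limit, mp.cpu_count())
    elapsed = time.perf_counter() - start
    print(f"{total} primes below {limit} in {elapsed:4.2f} sec")
```

Run it with `python primes_mp.py`; timings will vary with the machine and the number of cores.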


### 4. A Ray distributed multiprocess approach for a CPU bound task

This is most beneficial when your task is compute-intensive and you wish to scale horizontally across the nodes of a cluster. Therein lies the merit of using Ray's drop-in replacement for `multiprocessing.Pool`.

**Note**: The Ray distributed version trades a higher initial overhead for the ability to scale out horizontally across a cluster. In the run shown here it is slower than the local multiprocessing pool (3.79 sec versus 0.97 sec) but faster than the serial run (6.21 sec). The overhead is amortized better with a more computationally expensive calculation.

In [52]:
# Let's try that with Ray multiprocessing pool
ray.init()
ray_pool = Pool(cpu_count)
lst = list(range(num))
results = []
start = time.time()
for result in ray_pool.map(is_prime, lst):
    results.append(result)
end = time.time()
ray_pool.terminate()
print(f"Ray Distributed Multi Process access: Time elapsed: {end - start:4.2f} sec to compute all primes in {num} are {sum(results)}")

2022-07-26 20:33:00,576 INFO services.py:1470 -- View the Ray dashboard at http://127.0.0.1:8267


Ray Distributed Multi Process access: Time elapsed: 3.79 sec to compute all primes in 2000000 are 148935


In [53]:
ray.shutdown()

### References

 * [How to use Python for multi-threading and multi-processing applications](https://medium.com/towards-artificial-intelligence/the-why-when-and-how-of-using-python-multi-threading-and-multi-processing-afd1b8a8ecca)