### Scaling CPU bound tasks using four different strategies:

 1. A serial approach for a CPU bound task
 2. A multi-threaded approach for a CPU bound task
 3. A multiprocess approach for a CPU bound task
 4. A Ray distributed multiprocess approach for a CPU bound task

In [1]:
import concurrent.futures as mt
import multiprocessing as mp
import time
import ray
from ray.util.multiprocessing import Pool
from defs import get_cpu_count, is_prime
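
The `defs` module imported above is not shown in this notebook. As a rough guide to what the two helpers might look like, here is a minimal sketch; the bodies below are assumptions, not the author's code, so the author's `is_prime` may handle edge cases differently and the counts printed later may not match this sketch exactly. Keeping the functions in a separate importable file is what lets worker processes locate them by name.

```python
# defs.py (hypothetical contents; the original file is not shown)
import math
import os


def get_cpu_count() -> int:
    """Return the number of logical CPUs, falling back to 1 if unknown."""
    return os.cpu_count() or 1


def is_prime(n: int) -> bool:
    """Trial division by odd candidates up to the integer square root of n."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True
```

Because `is_prime` returns a `bool`, summing the results of a `map` call counts the primes, which is how the later cells use it.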

In [2]:
num = 2000000

### 1. A serial approach for a CPU bound task

In [3]:
# A CPU bound task
start = time.time()
prime_numbers = [n for n in range(num) if is_prime(n)]
end = time.time()
print(f"Serial access: Time elapsed: {end - start:4.2f} sec to compute all primes in {num} are {len(prime_numbers)} ")

Serial access: Time elapsed: 9.51 sec to compute all primes in 2000000 are 148935 


### 2. A multi-threaded approach for a CPU bound task

In [4]:
start = time.time()
with mt.ThreadPoolExecutor(get_cpu_count()) as executor:
    prime_numbers = executor.map(is_prime, list(range(num)))
end = time.time()
print(f"Multi Threaded access: Time elapsed: {end - start:4.2f} sec to compute all primes in {num} are {sum(list(prime_numbers))}")

Multi Threaded access: Time elapsed: 61.41 sec to compute all primes in 2000000 are 148935


### 3. A multiprocess approach for a CPU bound task
**Note**: A multiprocessing pool does not work with functions defined interactively in [IPython environments](https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror/42383397#42383397). You have to define the function in an external file.

In [5]:
# Let's try one process per core, so the work is not hindered by the GIL.
# Since this is a CPU bound task, we should get better performance
# than the serial and threaded versions.
#
start = time.time()
# The context manager terminates the pool on exit, so no explicit terminate() is needed
with mp.Pool(get_cpu_count()) as p:
    prime_numbers = p.map(is_prime, list(range(num)))
end = time.time()
print(f"Multi Process access: Time elapsed: {end - start:4.2f} sec to compute all primes in {num} are {sum(list(prime_numbers))}")

Multi Process access: Time elapsed: 2.03 sec to compute all primes in 2000000 are 148935


### 4. A Ray distributed multiprocess approach for a CPU bound task

**Note**: The Ray distributed version trades higher overhead for the ability to scale out horizontally across a cluster. The benefits would be more pronounced, and the overhead amortized, with a more computationally expensive calculation.
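
One common way to make each task more expensive relative to its scheduling overhead is to hand every worker a range of numbers instead of a single number. The sketch below illustrates that idea serially, using only the standard library. The helpers `make_ranges` and `count_primes_in_range`, the chunk size, and the local `is_prime` stand-in are all assumptions introduced for illustration; they are not part of the original notebook or of Ray.

```python
from typing import Iterator, Tuple


def is_prime(n: int) -> bool:
    """Stand-in for the is_prime imported from defs (assumed behavior)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def make_ranges(limit: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Split [0, limit) into half-open (lo, hi) ranges of at most `chunk` numbers."""
    for lo in range(0, limit, chunk):
        yield lo, min(lo + chunk, limit)


def count_primes_in_range(bounds: Tuple[int, int]) -> int:
    """Count the primes in one range; this is the coarse-grained unit of work."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


# Four coarse tasks instead of 10,000 tiny ones.
ranges = list(make_ranges(10_000, 2_500))
total = sum(map(count_primes_in_range, ranges))
print(f"{len(ranges)} tasks, {total} primes")
```

The built-in `map` here runs serially. Since both `mp.Pool` and Ray's `Pool` expose a `map(func, iterable)` method, passing `count_primes_in_range` and `ranges` to either pool should distribute the same coarse tasks, provided the functions live in an importable module.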

In [6]:
# Let's try that with Ray multiprocessing pool
ray.init()
ray_pool = Pool(get_cpu_count())
lst = list(range(num))
results = []
start = time.time()
for result in ray_pool.map(is_prime, lst):
    results.append(result)
end = time.time()
ray_pool.terminate()
print(f"Ray Distributed Multi Process access: Time elapsed: {end - start:4.2f} sec to compute all primes in {num} are {sum(results)}")

2022-05-20 15:05:07,599	INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265


Ray Distributed Multi Process access: Time elapsed: 5.59 sec to compute all primes in 2000000 are 148935


In [7]:
ray.shutdown()