In [1]:
import numpy as np
import pandas as pd
from datetime import datetime
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

from utils import square, random_number, primeCheck

## Get the number of CPU cores available on your system
It is worth checking this first to confirm that we are logged in to the server.


In [2]:
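# max_workers=None lets the executor pick its default pool size.
# Caveat: _max_workers is a private attribute and may differ from the real core count.
with ThreadPoolExecutor(max_workers=None) as executor:
    num_cores = executor._max_workers
print(f"Number of CPU cores: {num_cores}")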

Number of CPU cores: 14
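
The value above is the executor's default pool size, which is not always the core count: on Python 3.8 and later, `ThreadPoolExecutor` defaults to `min(32, cores + 4)` threads. As a cross-check, here is a minimal sketch that asks the operating system directly. The variable names `logical_cores` and `usable_cores` are illustrative only, and `os.sched_getaffinity` exists only on some Unix systems, hence the guard.

```python
import os

# Logical cores visible to the operating system (None if it cannot be determined).
logical_cores = os.cpu_count()
print(f"Logical CPU cores: {logical_cores}")

# On Linux, the affinity mask lists the cores this process is allowed to use,
# which can be fewer than the total on a shared server.
if hasattr(os, "sched_getaffinity"):
    usable_cores = len(os.sched_getaffinity(0))
    print(f"Cores usable by this process: {usable_cores}")
```

If the two numbers differ, the smaller one is usually the safer choice for the number of worker processes.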


The following helper function, `multiprocessing`, parallelizes the execution of a given function `func` over a sequence of arguments `args`, using up to `workers` worker processes. The `ProcessPoolExecutor` class from the `concurrent.futures` module provides the pool of worker processes and executes the calls asynchronously. Note that the helper shares its name with the standard-library `multiprocessing` module, which this notebook does not import.

Results from the workers are aggregated into a list and returned once all operations have completed.
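
One detail worth seeing in isolation is that `Executor.map` returns results in input order, even when tasks finish out of order. The sketch below uses `ThreadPoolExecutor` instead of `ProcessPoolExecutor` so that it runs in any environment; both classes share the same `map` interface. The function `slow_identity` is a hypothetical stand-in for real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_identity(x):
    # Smaller inputs sleep longer, so tasks complete in reverse input order.
    time.sleep((3 - x) * 0.05)
    return x

with ThreadPoolExecutor(max_workers=3) as ex:
    # map yields results in the order of the inputs, not the order of completion.
    results = list(ex.map(slow_identity, [0, 1, 2]))

print(results)  # [0, 1, 2]
```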

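We'll use the function `primeCheck`, which checks by brute force whether a number is prime.

Note: put the function you're working with in a separate Python file and import it into your notebook. Worker processes must be able to import the function, and with the `spawn` start method (the default on macOS and Windows) they cannot import functions defined in a notebook cell.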
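
The contents of `utils.py` are not shown in this notebook. As an illustration only, a brute-force primality check kept in a separate file might look like the sketch below. The name `prime_check_sketch` and the trial-division loop are assumptions, not the actual `primeCheck` implementation.

```python
# utils.py (hypothetical sketch; the real utils.py is not shown in this notebook)

def prime_check_sketch(n):
    """Return True if n is prime, using deliberately naive trial division."""
    if n < 2:
        return False
    # Try every candidate divisor below n; slow on purpose to create CPU-bound work.
    for divisor in range(2, n):
        if n % divisor == 0:
            return False
    return True

# Quick check on the small values from the notebook's input list.
print([n for n in [2, 7, 13, 28] if prime_check_sketch(n)])  # [2, 7, 13]
```

Saved as a module, such a function would be imported the same way as the notebook's own helpers, for example `from utils import prime_check_sketch`.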

In [3]:
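def multiprocessing(func, args, workers):
    """Apply func to each item of args using up to `workers` processes."""
    with ProcessPoolExecutor(workers) as ex:
        # Collect the results while the pool is open; map preserves input order.
        return list(ex.map(func, args))

def compute_intensive(num_workers):
    """Time primeCheck over the global `numbers` list with num_workers processes."""
    time_init = time.time()
    print(datetime.now())
    output = multiprocessing(primeCheck, numbers, num_workers)
    time_end = time.time()
    print(datetime.now())
    print(f'Multiprocessing with {num_workers} processes took {(time_end - time_init):.4f}s.')
    return output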

In [4]:
numbers = [2, 7, 13, 28, 99991, 188877, 1616161, 4441939, 90870847,
           92525533, 94939291, 98776551, 99999999, 100030001]

num_workers = 6

### Sequential example

In [5]:
num_workers = 1
if __name__ == '__main__':
    out = compute_intensive(num_workers)
    data_frame = pd.DataFrame(out)

2024-04-12 13:53:39.156680
2024-04-12 13:54:02.149227
Multiprocessing with 1 processes took 22.9925s.


### Parallel example

In [6]:
num_workers = 14

if __name__ == '__main__':
    out = compute_intensive(num_workers)
    data_frame = pd.DataFrame(out)

2024-04-12 13:54:02.157429
2024-04-12 13:54:09.887034
Multiprocessing with 14 processes took 7.7296s.


### To maximize resources, double-check Amdahl's law
Amdahl's Law states that the speedup of a parallelized program is limited by the fraction of the program that must be executed sequentially.
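
In formula form, if a fraction `p` of the work can be parallelized across `n` workers, the predicted speedup is `1 / ((1 - p) + p / n)`. The sketch below applies this to the two timings recorded above. The function names are illustrative, and the estimate is rough: it treats process startup and uneven task sizes as if they were serial work.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Predicted speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)

def implied_parallel_fraction(speedup, workers):
    """Invert Amdahl's law to estimate p from a measured speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)

# Timings copied from the two runs above.
t_sequential = 22.9925  # seconds with 1 process
t_parallel = 7.7296     # seconds with 14 processes
workers = 14

measured_speedup = t_sequential / t_parallel
p = implied_parallel_fraction(measured_speedup, workers)

print(f"Measured speedup: {measured_speedup:.2f}x")
print(f"Implied parallel fraction: {p:.2f}")
print(f"Upper bound on speedup as workers grow: {1.0 / (1.0 - p):.2f}x")
```

With these timings the measured speedup is about 2.97x on 14 processes, which corresponds to a parallel fraction near 0.71. Since there are only 14 inputs of very different sizes, the slowest single check likely sets a floor on the parallel run time, so adding more workers would not be expected to help much for this input list.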