# Multiprocessing package in Python

`multiprocessing` is a package in the Python standard library for creating and managing processes. It lets you run code in parallel on multiple CPU cores. See the [official documentation](https://docs.python.org/3/library/multiprocessing.html) for details.

The code cells below demonstrate how to use the `multiprocessing` package with a very simple example.

In [None]:
from multiprocessing import Process
import time
from random import random

In [None]:
def task(arg):
    # generate a random value between 0 and 1
    value = random()
    # pause the process for a fraction of a second
    time.sleep(value)
    # report a message
    print(f'.done {arg}, generated {value}', flush=True)

In [None]:
# Run the task a number of times sequentially using a for loop
if __name__ == '__main__':
    # run tasks sequentially
    for i in range(20):
        task(i)
    print('Done', flush=True)

In [None]:
# Run the task a number of times in parallel using multiprocessing
if __name__ == '__main__':
    # create all tasks
    processes = [Process(target=task, args=(i,)) for i in range(20)]
    # start all processes
    for process in processes:
        process.start()
    # wait for all processes to complete
    for process in processes:
        process.join()
    # report that all tasks are completed
    print('Done', flush=True)

The code above is not ideal because it starts all 20 processes at once, which may be more than the number of cores available on the machine. We can improve this by splitting the tasks into batches: the processes within a batch run in parallel, and the next batch starts only after the current one has finished:

In [None]:
# Run the task a number of times in batches using multiprocessing
if __name__ == '__main__':
    # define batch size
    batch_size = 8
    # execute in batches
    for i in range(0, 20, batch_size):
        # create the tasks of this batch; cap the last batch so only 20 tasks run in total
        processes = [Process(target=task, args=(j,)) for j in range(i, min(i + batch_size, 20))]
        # start all processes
        for process in processes:
            process.start()
        # wait for all processes to complete
        for process in processes:
            process.join()
    # report that all tasks are completed
    print('Done', flush=True)
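
The batch size of 8 used above is hard-coded. As a minimal sketch, it could instead be derived from the number of cores reported by `os.cpu_count()`. The helper name `pick_batch_size` is hypothetical and only for illustration. Note that `os.cpu_count()` reports logical CPUs and may return `None`, so the sketch falls back to 1 in that case.

```python
import os


def pick_batch_size(n_tasks, n_cores=None):
    # hypothetical helper, not part of the multiprocessing package
    if n_cores is None:
        # os.cpu_count() reports logical CPUs and may return None
        n_cores = os.cpu_count() or 1
    # use at most one process per core, at most one per task, and at least 1
    return max(1, min(n_tasks, n_cores))


if __name__ == '__main__':
    print(f'batch size for 20 tasks: {pick_batch_size(20)}', flush=True)
```

The returned value could replace the constant `batch_size = 8` in the cell above.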

Another (and probably cleaner) way to achieve this is to use a `Pool` of workers. The `Pool` class distributes tasks across a fixed number of worker processes, so you never run more processes than you ask for. Its `map` method applies a function to every item of an iterable and blocks until all tasks are completed. If this sounds interesting to you, you can read more about it [here](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool), and try to implement it yourself below.
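
As a starting point, here is a minimal sketch of `Pool.map`. The function `slow_square` and the wrapper `run_pool` are hypothetical names chosen for illustration. They stand in for real work and return a value, so that the list returned by `map` is meaningful. The choice of 4 workers is also just an assumption.

```python
from multiprocessing import Pool
import time


def slow_square(x):
    # hypothetical stand-in for real work: pause briefly, then return a result
    time.sleep(0.01)
    return x * x


def run_pool(n_tasks=20, n_workers=4):
    # the pool starts n_workers processes and reuses them for all tasks
    with Pool(processes=n_workers) as pool:
        # map blocks until every task is done and returns results in input order
        return pool.map(slow_square, range(n_tasks))


if __name__ == '__main__':
    results = run_pool()
    print(results[:5], flush=True)
```

Unlike the batch approach, a worker picks up the next task as soon as it is free, so one slow task does not hold up the rest.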

Now, let's see if we can do the same with a function that actually does something useful. The function below estimates pi using the Monte Carlo method. The idea is to generate random points in a square and count how many of them fall inside the circle inscribed in that square. The ratio of the number of points inside the circle to the total number of points is an estimate of pi/4, so multiplying it by 4 gives an estimate of pi. The more points we generate, the better the estimate.
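
The factor of 4 can be written out as follows. This assumes, as in the code below, that points are drawn uniformly from the square $[-1, 1]^2$ and that the circle is the unit circle centered at the origin. Here $M$ is the number of points inside the circle and $N$ is the total number of points:

$$
\frac{M}{N} \approx \frac{A_\text{circle}}{A_\text{square}} = \frac{\pi \cdot 1^2}{2 \cdot 2} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx \frac{4M}{N}
$$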

In [None]:
import numpy as np


def calc_pi(N):
    M = 0
    for i in range(N):
        # Simulate impact coordinates
        x = np.random.uniform(-1, 1)
        y = np.random.uniform(-1, 1)

        # True if impact happens inside the circle
        if x**2 + y**2 < 1.0:
            M += 1
    return (4 * M / N, N)  # result, iterations

Try to parallelize this code using the `Pool` class from the `multiprocessing` package, and compare the time it takes to run with and without parallelization to measure the speedup. You may run into issues when using multiprocessing in a Jupyter notebook. If that is the case, try running the code in a Python script instead.
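
For the timing part, a minimal sketch of a measurement helper is shown below. The names `time_call` and `speedup` are hypothetical. The sketch assumes that speedup is defined as serial time divided by parallel time, and it leaves the parallel version itself to you.

```python
import time


def time_call(fn, *args):
    # hypothetical helper: run fn(*args) once and return (result, elapsed seconds)
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def speedup(t_serial, t_parallel):
    # a value above 1 means the parallel run was faster
    return t_serial / t_parallel


if __name__ == '__main__':
    total, elapsed = time_call(sum, range(1_000_000))
    print(f'sum = {total}, took {elapsed:.4f} s', flush=True)
```

A single measurement can be noisy, so repeating each run a few times and comparing the best or average times gives a more reliable figure.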

Optional: a more detailed explanation and examples of this problem can be found [here](https://carpentries-incubator.github.io/lesson-parallel-python/04b-threads-and-processes/index.html).