# Exploring Python Multiprocessing

In this notebook we will explore parallel execution in Python. We first import the libraries we will need.

In [1]:
import multiprocessing as mp
import os
import random
from datetime import datetime

random.seed(12345)


### Computing $\pi$

The running example is the computation of $\pi$ by Monte Carlo simulation. We sample points uniformly from the square $[-1, 1]^2$ and count how many land inside the unit circle. The circle has area $\pi$ and the square has area $4$, so the fraction of points inside the circle approaches $\pi / 4$ as the number of samples grows. This gives the estimate $$\pi \approx 4 \times \frac{\text{points in circle}}{\text{total points}}$$
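
As a rough illustration of the estimate above, here is a minimal sketch. The names `in_unit_circle` and `estimate_pi`, and the use of a local `random.Random` generator, are assumptions made for this example and are not part of the notebook's exercises.

```python
import random


def in_unit_circle(x, y):
    """Return True if the point (x, y) lies inside or on the unit circle."""
    return x * x + y * y <= 1.0


def estimate_pi(n, rng):
    """Estimate pi from n points drawn uniformly from [-1, 1]^2."""
    hits = 0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if in_unit_circle(x, y):
            hits += 1
    # Area ratio circle/square is pi/4, so scale the hit fraction by 4.
    return 4.0 * hits / n


rng = random.Random(12345)  # hypothetical local generator, seeded for repeatability
print(f"pi is roughly {estimate_pi(100_000, rng):.4f}")
```

The statistical error shrinks like $1/\sqrt{n}$, so each additional correct digit costs roughly 100 times more samples. This is why the later cells use tens of millions of samples.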

First implement a function `sample()` that generates a random sample in $[-1, 1]^2$ and returns whether it lands inside the unit circle.

In [2]:
def sample():
    pass
    # TODO


### Serial Computation

First we will see how long the Monte Carlo simulation takes when run serially.

Write a function `sample_serial()` that takes `nsamples` as an argument, samples `nsamples` times, and returns the number of samples that land in the unit circle. Then test it for different values of `N`.

In [5]:
# ------------------------------------------------
"""
TODO:
Implement sample_serial(nsamples) here.
"""


# ------------------------------------------------

N = int(2e7)
start = datetime.now()
hits = sample_serial(N)
pi = 4.0 * hits / N
print(f"Time: {datetime.now() - start}, pi: {pi:.9f}")

Time: 0:00:11.287341, pi: 3.141349200
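
One possible shape for the serial version is sketched below. It is only a sketch: passing a generator `rng` into `sample()` and adding a `seed` parameter to `sample_serial()` are assumptions of this example (the exercise signatures take no such arguments), and `time.perf_counter()` is swapped in for `datetime.now()` because it is a monotonic clock intended for measuring intervals.

```python
import random
import time


def sample(rng):
    """Draw one point from [-1, 1]^2 and report whether it hits the unit circle."""
    x = rng.uniform(-1.0, 1.0)
    y = rng.uniform(-1.0, 1.0)
    return x * x + y * y <= 1.0


def sample_serial(nsamples, seed=None):
    """Count hits over nsamples draws, one after another in this process."""
    rng = random.Random(seed)  # hypothetical: a private generator per call
    return sum(sample(rng) for _ in range(nsamples))  # True counts as 1


n = 200_000  # kept small here; the notebook cell uses int(2e7)
start = time.perf_counter()
hits = sample_serial(n, seed=12345)
elapsed = time.perf_counter() - start
print(f"Time: {elapsed:.3f}s, pi: {4.0 * hits / n:.6f}")
```

Running it for several values of `n` should show the time growing roughly linearly with the sample count, which is the baseline the parallel versions try to beat.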


### Using `multiprocessing.Process`

We now look to use `multiprocessing.Process` to execute our sampling in parallel.

- Write a function `process_sample()` that draws `chunk_size` samples and places the resulting hit count in a `multiprocessing.Queue` object.

- Split the total number of samples into `nprocesses` chunks.

- Use `multiprocessing.Process` to spawn `nprocesses` processes, each of which runs `process_sample()` on `chunk_size` samples.

In [7]:

# ------------------------------------------------

# TODO: Implement process_sample(chunk_size, q)


# ------------------------------------------------

N = int(2e7)
hits = 0
nprocesses = 1 # TODO: modify.
chunk_size = 0 # TODO: modify.

start = datetime.now()
# ------------------------------------------------

# TODO: Use multiprocessing.Process to execute
#       process_sample() in parallel.


# ------------------------------------------------
pi = 4.0 * hits / N

print(f"Time: {datetime.now() - start}, pi: {pi:.9f}")

Time: 0:00:03.008394, pi: 3.141758400
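
The three steps listed for `multiprocessing.Process` could be sketched as follows. This is one possible arrangement, not the reference solution: `split_chunks`, `run_with_processes`, and the extra `seed` argument on `process_sample()` are hypothetical additions for this example.

```python
import multiprocessing as mp
import random


def split_chunks(total, parts):
    """Hypothetical helper: chunk sizes that differ by at most 1 and sum to total."""
    base, extra = divmod(total, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


def process_sample(chunk_size, q, seed=None):
    """Count hits over chunk_size draws and put the count on the queue q."""
    rng = random.Random(seed)  # assumption: one private generator per worker
    hits = 0
    for _ in range(chunk_size):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    q.put(hits)


def run_with_processes(nsamples, nprocesses):
    """Spawn one process per chunk and add up the counts they report."""
    q = mp.Queue()
    procs = [
        mp.Process(target=process_sample, args=(size, q, seed))
        for seed, size in enumerate(split_chunks(nsamples, nprocesses))
    ]
    for p in procs:
        p.start()
    # Collect one result per process before joining, so that no worker
    # is left waiting to flush its item into the queue.
    hits = sum(q.get() for _ in procs)
    for p in procs:
        p.join()
    return hits


if __name__ == "__main__":
    n = 200_000  # kept small here; the notebook cell uses int(2e7)
    hits = run_with_processes(n, nprocesses=4)
    print(f"pi: {4.0 * hits / n:.6f}")
```

A reasonable starting value for `nprocesses` is `os.cpu_count()`; using more processes than cores rarely helps for CPU-bound work like this.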


### Using `multiprocessing.Pool`

We now look to use `multiprocessing.Pool` and the `Pool.apply` method.

- Write a function `pool_sample()` that draws `nsamples` samples and returns the number that land in the unit circle.

- Use `Pool.apply` to execute `pool_sample()` in parallel on chunks of `chunk_size` samples.

In [7]:

# ------------------------------------------------

# TODO: implement pool_sample()



# ------------------------------------------------

nsamples = int(2e7)
hits = 0
nprocesses = 1 # TODO: modify.
chunk_size = 1 # TODO: modify.

start = datetime.now()
# ------------------------------------------------

# TODO: Use multiprocessing.Pool and Pool.apply to
#       execute pool_sample() in parallel.



# ------------------------------------------------

pi = 4.0 * hits / nsamples
print(f"Time: {datetime.now() - start}, pi: {pi:.9f}")


Time: 0:00:11.408697, pi: 3.141850200
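
The timing reported above (about 11.4 s) is close to the serial run (about 11.3 s). That is consistent with how `Pool.apply` works: each call blocks until its result is ready, so a loop of `apply` calls runs the chunks one after another. The sketch below illustrates this; `run_with_apply` and the `seed` argument are hypothetical names for this example.

```python
import multiprocessing as mp
import random


def pool_sample(nsamples, seed=None):
    """Return how many of nsamples uniform points in [-1, 1]^2 hit the unit circle."""
    rng = random.Random(seed)  # assumption: a private generator per call
    hits = 0
    for _ in range(nsamples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def run_with_apply(nsamples, nprocesses):
    """Submit one chunk per worker with Pool.apply and sum the counts."""
    chunk_size = nsamples // nprocesses  # assumes nprocesses divides nsamples
    hits = 0
    with mp.Pool(processes=nprocesses) as pool:
        for seed in range(nprocesses):
            # apply() waits for this call to finish before the loop continues,
            # so only one worker is busy at any time.
            hits += pool.apply(pool_sample, args=(chunk_size, seed))
    return hits


if __name__ == "__main__":
    n = 200_000  # kept small here; the notebook cell uses int(2e7)
    print(f"pi: {4.0 * run_with_apply(n, 4) / n:.6f}")
```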


Now run the above using `Pool.apply_async`. Was there a big time difference?

In [None]:
start = datetime.now()

# ------------------------------------------------

# TODO: Use multiprocessing.Pool and Pool.apply_async
#       to execute pool_sample() in parallel.


# ------------------------------------------------

pi = 4.0 * hits / nsamples
print(f"Time: {datetime.now() - start}, pi: {pi:.9f}")
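
For comparison, a possible `Pool.apply_async` version is sketched here. `apply_async` returns an `AsyncResult` immediately, so all chunks can be submitted before any result is collected. As before, `run_with_apply_async` and the `seed` argument are assumptions of this example.

```python
import multiprocessing as mp
import random


def pool_sample(nsamples, seed=None):
    """Return how many of nsamples uniform points in [-1, 1]^2 hit the unit circle."""
    rng = random.Random(seed)  # assumption: a private generator per call
    hits = 0
    for _ in range(nsamples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def run_with_apply_async(nsamples, nprocesses):
    """Submit every chunk first, then wait for all of the results."""
    chunk_size = nsamples // nprocesses  # assumes nprocesses divides nsamples
    with mp.Pool(processes=nprocesses) as pool:
        pending = [
            pool.apply_async(pool_sample, args=(chunk_size, seed))
            for seed in range(nprocesses)
        ]
        # Call get() inside the with block: leaving it terminates the pool.
        return sum(result.get() for result in pending)


if __name__ == "__main__":
    n = 200_000  # kept small here; the notebook cell uses int(2e7)
    print(f"pi: {4.0 * run_with_apply_async(n, 4) / n:.6f}")
```

With several workers and a CPU-bound task, this version would be expected to finish noticeably faster than the blocking `Pool.apply` loop, since the chunks overlap in time.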

We now turn to `Pool.map`. Run `pool_sample()` in parallel using `Pool.map`. How should we construct the input argument in this case?

In [None]:
# Map

# ------------------------------------------------
args = [] # TODO: modify.

# ------------------------------------------------



start = datetime.now()
# ------------------------------------------------

# TODO: Use multiprocessing.Pool and Pool.map
#       to execute pool_sample() in parallel.

# ------------------------------------------------

pi = 4.0 * hits / nsamples
print(f"Time: {datetime.now() - start}, pi: {pi:.9f}")
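
One way to think about the input argument: `Pool.map` calls the function once per element of an iterable, so a list of chunk sizes, one per task, is a natural fit. The sketch below assumes a single-argument `pool_sample(nsamples)` and a hypothetical helper `build_args`.

```python
import multiprocessing as mp
import random


def pool_sample(nsamples):
    """Return how many of nsamples uniform points in [-1, 1]^2 hit the unit circle."""
    rng = random.Random()  # unseeded, so each call gets a fresh generator
    hits = 0
    for _ in range(nsamples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def build_args(nsamples, ntasks):
    """Hypothetical helper: one chunk size per task, summing to nsamples."""
    base, extra = divmod(nsamples, ntasks)
    return [base + 1] * extra + [base] * (ntasks - extra)


if __name__ == "__main__":
    nsamples = 200_000  # kept small here; the notebook cell uses int(2e7)
    args = build_args(nsamples, 4)
    with mp.Pool(processes=4) as pool:
        # map() blocks until every chunk is done and returns the counts in order.
        hits = sum(pool.map(pool_sample, args))
    print(f"pi: {4.0 * hits / nsamples:.6f}")
```

If the worker needed more than one argument (for example a seed), `Pool.starmap` with a list of tuples would be an alternative.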


Now run the above using `Pool.map_async`. Was there a big time difference?

In [12]:
start = datetime.now()
# ------------------------------------------------

# TODO: Use multiprocessing.Pool and Pool.map_async
#       to execute pool_sample() in parallel.

# ------------------------------------------------
pi = 4.0 * hits / nsamples
print(f"Time: {datetime.now() - start}, pi: {pi:.9f}")

Time: 0:00:03.556724, pi: 3.141332800
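
A possible `Pool.map_async` variant is sketched below. It hands the same list of chunk sizes to the same workers as `Pool.map`, but returns an `AsyncResult` straight away, so the timing would be expected to be similar when the parent simply waits for the result. The function `run_with_map_async` is a hypothetical name for this example.

```python
import multiprocessing as mp
import random


def pool_sample(nsamples):
    """Return how many of nsamples uniform points in [-1, 1]^2 hit the unit circle."""
    rng = random.Random()  # unseeded, so each call gets a fresh generator
    hits = 0
    for _ in range(nsamples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def run_with_map_async(nsamples, nprocesses):
    """Return (hits, samples actually drawn) using Pool.map_async."""
    args = [nsamples // nprocesses] * nprocesses
    with mp.Pool(processes=nprocesses) as pool:
        result = pool.map_async(pool_sample, args)  # returns immediately
        # The parent process could do unrelated work here while workers run.
        counts = result.get()  # blocks until every chunk has finished
    return sum(counts), sum(args)


if __name__ == "__main__":
    hits, total = run_with_map_async(200_000, 4)
    print(f"pi: {4.0 * hits / total:.6f}")
```

The asynchronous form pays off mainly when the parent has something useful to do between submitting the work and calling `get()`.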
