# Exercise sheet 11 - Parallelisation

# Exercise 1 - Rigged dice

Create a rigged dice function that returns the number 6 25% of the time. The rest of the time it returns the integers 1, 2, 3, 4, 5 uniformly.
Test your function by calling it **one billion times** ($10^9$) and checking that 6 is returned between 249 and 251 million times (inclusive). You do not need to check that the numbers 1 to 5 are returned uniformly or randomly, but you need to check that your function returns integers in the range 1-6 (inclusive). **Time** how long it takes to run the script.

Now attempt to **parallelise the task with a method of your own choosing** and time how long it takes once more. How does this compare to the previous *un-optimised* run?
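The task asks both for a function that returns a single roll and for a check over $10^9$ rolls. Below is a minimal sketch, assuming NumPy's `Generator` API (`np.random.default_rng`). The hypothetical `rigged_die` returns one roll. The hypothetical `count_rolls` draws rolls as arrays in chunks and tallies them with `np.bincount`, so the full billion never sits in memory at once. The default `chunk` size of $10^7$ is an arbitrary choice.

```python
import numpy as np

FACES = np.arange(1, 7)                                  # the six faces 1..6
PROBS = np.array([0.15, 0.15, 0.15, 0.15, 0.15, 0.25])   # 5 * 0.15 + 0.25 = 1


def rigged_die(rng: np.random.Generator) -> int:
    """Return a single roll: 6 with probability 0.25, otherwise 1-5 with probability 0.15 each."""
    return int(rng.choice(FACES, p=PROBS))


def count_rolls(n_rolls: int, rng: np.random.Generator, chunk: int = 10_000_000) -> np.ndarray:
    """Roll n_rolls times, chunk rolls at a time, and return counts[k] = occurrences of face k + 1."""
    counts = np.zeros(6, dtype=np.int64)
    remaining = n_rolls
    while remaining > 0:
        size = min(chunk, remaining)
        rolls = rng.choice(FACES, size=size, p=PROBS)      # one vectorised draw per chunk
        # Range check required by the exercise: every roll must be an integer in 1..6.
        if rolls.min() < 1 or rolls.max() > 6:
            raise ValueError("roll outside 1..6")
        # bincount with minlength=7 counts the values 0..6; slot 0 is never used, so drop it.
        counts += np.bincount(rolls, minlength=7)[1:]
        remaining -= size
    return counts
```

For the full test, one would call `counts = count_rolls(10**9, np.random.default_rng())` and then check `249_000_000 <= counts[5] <= 251_000_000`. With $10^9$ rolls, the standard deviation of the number of sixes is $\sqrt{10^9 \cdot 0.25 \cdot 0.75} \approx 13{,}700$, so the window of ±1 million is very generous.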


In [None]:
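import numpy as np
import random          # stdlib random (random.uniform is used in Exercise 2); NumPy's sampler is np.random
import time
import timeit          # used to time the Exercise 1 runs
import threading
import multiprocessing as mp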

In [None]:
def rigged_dice():
    counts = [0] * 6            # counts[k] holds how often face k + 1 was rolled
    for _ in range(10**6):      # NOTE: the exercise asks for 10**9 rolls; this run uses 10**6
        # np.random.choice accepts a probability vector p (the stdlib random.choice does not)
        roll = np.random.choice([1, 2, 3, 4, 5, 6], p=[0.15, 0.15, 0.15, 0.15, 0.15, 0.25])
        if not 1 <= roll <= 6:  # range check required by the exercise
            raise ValueError(f"roll out of range: {roll}")
        counts[roll - 1] += 1
    for face, count in enumerate(counts, start=1):
        print(f"{face}: {count}")

# time how long it takes to run the script
execution_time = timeit.timeit(rigged_dice, number=1)
print(f"Execution time: {execution_time:.2f} seconds")

In [None]:
def rigged_dice_chunk(task_id):  # task_id is unused: pool.map passes one item of range(4) to each call
    # A fresh Generator per call draws its own OS entropy, so forked workers do not
    # repeat the random stream they would inherit from the parent's global NumPy state.
    rng = np.random.default_rng()
    counts = [0] * 6
    for _ in range(10**6 // 4):  # each of the 4 processes runs a quarter of the iterations
        roll = rng.choice([1, 2, 3, 4, 5, 6], p=[0.15, 0.15, 0.15, 0.15, 0.15, 0.25])
        counts[roll - 1] += 1
    return counts                # return the counts so the parent process can combine them


def rigged_dice_parallel(chunk_fn):
    # With the 'spawn' start method (macOS, Windows), chunk_fn must be importable by the
    # workers, i.e. defined in a .py module rather than in a notebook cell.
    with mp.Pool(4) as pool:                          # pool of 4 worker processes
        results = pool.map(chunk_fn, range(4))        # one task per process
    # combine the per-process counts face by face
    final_counts = [sum(result[i] for result in results) for i in range(6)]
    print(final_counts)

# execution time of the parallel code
if __name__ == "__main__":
    execution_time_parallel = timeit.timeit(lambda: rigged_dice_parallel(rigged_dice_chunk), number=1)
    print(f"Execution time parallel: {execution_time_parallel:.2f} seconds")
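
When work is split across processes, each worker needs its own random stream. With the 'fork' start method, child processes inherit a copy of NumPy's global random state, so workers that all call `np.random.choice` can produce identical rolls. Below is a minimal sketch of one way to avoid this. It swaps in `concurrent.futures` in place of `mp.Pool` and derives one child seed per worker with `np.random.SeedSequence`. The names `count_chunk`, `parallel_face_counts` and the `executor_cls` parameter are hypothetical. Passing `ThreadPoolExecutor` instead of the default `ProcessPoolExecutor` runs the same split without starting processes, which is convenient for quick checks.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import numpy as np

FACES = np.arange(1, 7)
PROBS = np.array([0.15, 0.15, 0.15, 0.15, 0.15, 0.25])


def count_chunk(task):
    """Worker: roll n_rolls times with a private generator and return the six face counts."""
    entropy, index, n_rolls, chunk = task
    # Rebuild this worker's child seed from the shared root entropy plus a unique spawn_key,
    # the mechanism SeedSequence.spawn uses; only plain ints cross the process boundary.
    rng = np.random.default_rng(np.random.SeedSequence(entropy, spawn_key=(index,)))
    counts = np.zeros(6, dtype=np.int64)
    while n_rolls > 0:
        size = min(chunk, n_rolls)
        rolls = rng.choice(FACES, size=size, p=PROBS)
        counts += np.bincount(rolls, minlength=7)[1:]   # drop the unused count for 0
        n_rolls -= size
    return counts


def parallel_face_counts(n_total, n_workers=4, seed=None, chunk=10_000_000,
                         executor_cls=ProcessPoolExecutor):
    """Split n_total rolls across n_workers and add up the per-worker face counts."""
    base, extra = divmod(n_total, n_workers)            # spread the remainder over the first workers
    entropy = np.random.SeedSequence(seed).entropy      # seed=None draws fresh OS entropy
    tasks = [(entropy, i, base + (1 if i < extra else 0), chunk) for i in range(n_workers)]
    with executor_cls(max_workers=n_workers) as ex:
        results = list(ex.map(count_chunk, tasks))
    return np.sum(results, axis=0)
```

In a script, `parallel_face_counts(10**9)` would go under `if __name__ == "__main__":`. In a notebook using 'spawn', `count_chunk` has to be importable from a .py file for the process pool to find it.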

# Exercise 2 - Calculate $\pi$

Using the **DSMC method**, calculate the value of **$\pi$**.


**Approach:**
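In order to do this, create a 2-dimensional domain (defined by the coordinates $x_{min}, x_{max}, y_{min}, y_{max}$) and launch a number $P$ of particles at random locations within it. Check which particles lie inside a circle with radius $$ \frac{x_{max}-x_{min}}{2}, $$ where $x_{min}, x_{max}$ are the x-limits of your 2D domain. For the formula below to hold, the domain must be a square with the circle centred in it, so that the circle covers a fraction $\pi r^2 / (2r)^2 = \pi/4$ of the domain's area.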

Get your value for $\pi$ by using the following formula:
$\pi = \frac{4 \cdot n_{inside}}{P},$ where $n_{inside}$ is the number of particles inside the circle and $P$ is the total number of particles.

Play around with the number of particles. 
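Before turning to threads and processes, it can help to see the formula applied without a Python loop. Below is a minimal sketch, assuming NumPy. The hypothetical `estimate_pi_vectorised` accepts the general domain limits from the text, rejects non-square domains, and centres the circle in the domain.

```python
import numpy as np


def estimate_pi_vectorised(P, x_min=-1.0, x_max=1.0, y_min=-1.0, y_max=1.0, rng=None):
    """Monte Carlo estimate of pi: 4 * (fraction of P uniform points inside the inscribed circle)."""
    if not np.isclose(x_max - x_min, y_max - y_min):
        raise ValueError("pi = 4 * n_inside / P assumes a square domain")
    rng = np.random.default_rng() if rng is None else rng
    r = (x_max - x_min) / 2                              # radius from the exercise
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2    # circle centred in the domain
    x = rng.uniform(x_min, x_max, size=P)                # P random particle positions
    y = rng.uniform(y_min, y_max, size=P)
    n_inside = np.count_nonzero((x - cx) ** 2 + (y - cy) ** 2 <= r * r)
    return 4 * n_inside / P
```

This keeps all $P$ positions in memory at once (two float arrays of $8P$ bytes each), so very large $P$ would need to be processed in chunks.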

**a)** Try to improve this task by making use of threading (you can use either the **_thread** or **threading** module). What do you find: does the script run faster?

**b)** Now try to improve the running time of the code by employing the **multiprocessing** module. Are there any differences as compared to threading?
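One way to compare threads and processes on identical work is `concurrent.futures`, whose `ThreadPoolExecutor` and `ProcessPoolExecutor` share the same `map` interface. Below is a sketch under that assumption. The hypothetical `estimate_pi_pool` also spreads the remainder of `P` over the workers, so exactly `P` points are drawn. Each task gets its own `random.Random` seeded with `seed + i`, which is a simple illustrative seeding scheme.

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_inside(task):
    """Worker: count how many of n random points in [-1, 1] x [-1, 1] land inside the unit circle."""
    n, seed = task
    rng = random.Random(seed)           # private generator, so results do not depend on scheduling
    inside = 0
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            inside += 1
    return inside


def estimate_pi_pool(P, n_workers=4, executor_cls=ThreadPoolExecutor, seed=0):
    """Split P points into n_workers near-equal chunks and run them on the given executor."""
    base, extra = divmod(P, n_workers)
    tasks = [(base + (1 if i < extra else 0), seed + i) for i in range(n_workers)]
    with executor_cls(max_workers=n_workers) as ex:
        inside = sum(ex.map(count_inside, tasks))
    return 4 * inside / P
```

Passing `executor_cls=ProcessPoolExecutor` is the only change needed to switch from threads to processes. In standard CPython, the global interpreter lock lets only one thread execute Python bytecode at a time, so the pure-Python loop in `count_inside` is not expected to speed up with threads. That is consistent with the similar serial and threaded timings printed below.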

In [1]:
import random
import threading
import time
import multiprocessing as mp

def estimate_pi(P):
    inside = 0
    for _ in range(P):      # draw a random point in the square [-1, 1] x [-1, 1]
        x = random.uniform(-1, 1)
        y = random.uniform(-1, 1)
        if x*x + y*y <= 1:  # the point lies inside the unit circle
            inside += 1
    return 4 * inside / P   # formula from the assignment (circle area / square area = pi/4)

P = 10_000_000

start = time.time()         # wall-clock time before the run
pi_est = estimate_pi(P)     # Monte Carlo estimate of pi
end = time.time()

print(f"Pi estimate: {pi_est}")
print(f"Time: {end - start:.2f} s")



def thread(P, results, idx):
    inside = 0
    for _ in range(P):
        x = random.uniform(-1, 1)
        y = random.uniform(-1, 1)
        if x*x + y*y <= 1:
            inside += 1
    results[idx] = inside       # each thread writes its count to its own slot, so no lock is needed

def estimate_pi_threaded(P, n_threads=4):   # n_threads: number of worker threads
    # In standard CPython the GIL lets only one thread run Python bytecode at a time,
    # so this pure-Python loop is not expected to run faster with more threads.
    threads = []
    results = [0] * n_threads
    P_per_thread = P // n_threads

    for i in range(n_threads):
        t = threading.Thread(target=thread, args=(P_per_thread, results, i))
        threads.append(t)
        t.start()

    for t in threads:
        t.join()                            # wait for all threads to finish

    # divide by the number of points actually drawn, which is smaller than P
    # when P is not a multiple of n_threads
    return 4 * sum(results) / (P_per_thread * n_threads)


start = time.time()
pi_est = estimate_pi_threaded(P, n_threads=4)
end = time.time()

print(f"Pi estimate (threaded): {pi_est}")
print(f"Time: {end - start:.2f} s")



def mproc(P):
    inside = 0
    for _ in range(P):                      # draw a random point in the square [-1, 1] x [-1, 1]
        x = random.uniform(-1, 1)
        y = random.uniform(-1, 1)
        if x*x + y*y <= 1:                  # check whether the point lies inside the unit circle
            inside += 1
    return inside

def estimate_pi_multiprocessing(P, n_processes=4):
    P_per_proc = P // n_processes           # split the work evenly across the processes

    with mp.Pool(processes=n_processes) as pool:                    # one worker process per chunk
        results = pool.map(mproc, [P_per_proc] * n_processes)       # calls mproc(P_per_proc) in each worker

    return 4 * sum(results) / (P_per_proc * n_processes)    # divide by the number of points actually drawn

if __name__ == "__main__":                  # required with 'spawn': workers re-import the main module
    P = 10_000_000

    start = time.time()                                         # wall-clock time before the run
    pi_est = estimate_pi_multiprocessing(P, n_processes=4)      # Monte Carlo estimate of pi
    end = time.time()

    print(f"Pi estimate (multiprocessing): {pi_est}")
    print(f"Time: {end - start:.2f} s")

    # Observed: threaded averages about 3 s, multiprocessing about 7 s.
    # Possible reasons: macOS starts workers with 'spawn' (Linux has traditionally used 'fork');
    # 'spawn' launches a fresh interpreter per worker and re-imports the main module, which adds start-up cost.
    # In a Jupyter notebook, spawned workers also cannot find functions defined in notebook cells
    # (see the AttributeError below), so the worker function has to live in a separate .py module.
    # P and the number of processes may also be too small for the speed-up to outweigh this overhead.

Pi estimate: 3.1415196
Time: 2.78 s
Pi estimate (threaded): 3.141432
Time: 2.64 s


Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/anaconda3/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/lib/python3.12/multiprocessing/pool.py", line 114, in worker
    task = get()
           ^^^^^
  File "/opt/anaconda3/lib/python3.12/multiprocessing/queues.py", line 389, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'worker' on <module '__main__' (<class '_frozen_importlib.BuiltinImporter'>)>
Process SpawnPoolWorker-3:
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/anaconda3/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/

KeyboardInterrupt: 

# Exercise 3 - Mandelbrot fractals

Read about the Mandelbrot set: https://en.wikipedia.org/wiki/Mandelbrot_set. 
This set is defined by repeatedly applying this recurrence:

$z_{n+1} = z_{n}^2 + c$,

which starts with $z_{0} = 0$ for a given complex number $c$ (each pixel corresponds to one $c$). The idea is to check whether a particle "escapes" at a certain iteration: at each iteration, one checks whether the sequence $z_0, z_1, \dots, z_n$ grows or stays bounded. The escape condition is $|z_n| > 2$, which is equivalent to the test ```if (z.real*z.real + z.imag*z.imag) > 4``` and avoids computing a square root. If this condition is fulfilled at any step, we mark the particle as "escaped". If the sequence stays bounded (in practice: it does not escape within a fixed maximum number of iterations, for instance 300), then $c$ is inside the Mandelbrot set.
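The recurrence and escape test translate directly into a scalar function. Below is a minimal sketch. The name `escape_iteration` is hypothetical, and its `max_iter` default of 300 reuses the example value from the text. It returns the first $n$ with $|z_n| > 2$, or `max_iter` if none of $z_0, \dots, z_{max\_iter-1}$ escapes.

```python
def escape_iteration(c: complex, max_iter: int = 300) -> int:
    """Return the first n < max_iter with |z_n| > 2 for z_{n+1} = z_n**2 + c, z_0 = 0, else max_iter."""
    z = 0j
    for n in range(max_iter):
        if z.real * z.real + z.imag * z.imag > 4:   # |z|^2 > 4  <=>  |z| > 2, no square root needed
            return n
        z = z * z + c
    return max_iter
```

For example, $c = 1$ gives $z = 0, 1, 2, 5$, so `escape_iteration(1)` returns 3, while $c = -1$ oscillates between $-1$ and $0$ and never escapes.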

**(A)** Create a script which visualizes the Mandelbrot set. The x-axis is the real part of the complex number, and the y-axis is the imaginary part. You can use the color scheme of your choice. Mark the particles which never escape in one color, and color the escaped particles based on how fast they escaped (that is, use the iteration at which they escaped for your colorbar). You should define the width and height of your image (for instance, 1000 and 700, but you can change it if you like).
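For a whole image, the same escape test can run on every pixel at once with NumPy arrays. Below is a minimal sketch. The hypothetical `mandelbrot_counts` uses the image size from the text, while the axis ranges (-2.5, 1.0) and (-1.25, 1.25) are an assumed framing of the set. Columns map to the real part and rows to the imaginary part. Pixels that never escape keep the value `max_iter`.

```python
import numpy as np


def mandelbrot_counts(width=1000, height=700, x_range=(-2.5, 1.0), y_range=(-1.25, 1.25), max_iter=300):
    """Escape iteration for every pixel (shape height x width); non-escaping pixels get max_iter."""
    xs = np.linspace(x_range[0], x_range[1], width)      # real part -> columns
    ys = np.linspace(y_range[0], y_range[1], height)     # imaginary part -> rows
    c = xs[np.newaxis, :] + 1j * ys[:, np.newaxis]       # one complex c per pixel
    z = np.zeros_like(c)
    counts = np.full(c.shape, max_iter, dtype=np.int32)
    active = np.ones(c.shape, dtype=bool)                # pixels that have not escaped yet
    for n in range(max_iter):
        escaped = active & (z.real * z.real + z.imag * z.imag > 4)
        counts[escaped] = n                              # record the escape iteration
        active &= ~escaped
        z[active] = z[active] * z[active] + c[active]    # iterate only the still-bounded pixels
    return counts
```

One option for plotting is `plt.imshow(counts, extent=(*x_range, *y_range), origin="lower")` from `matplotlib.pyplot`. Here `origin="lower"` puts row 0 (the smallest imaginary part) at the bottom. To give the non-escaping pixels their own color, one could plot `np.ma.masked_equal(counts, max_iter)` and choose the colormap's "bad" color.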

**(B)** Parallelize your Mandelbrot function using the *multiprocessing* module. Experiment with different sizes of the data chunks you give to the separate processes (you can split the data into chunks of columns or rows and process each chunk in a separate process).
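Splitting by rows can be sketched as follows. Each task receives a horizontal strip of `rows_per_chunk` imaginary values, and the strips are stacked back in order (`Executor.map` returns results in submission order). This sketch swaps in `concurrent.futures.ProcessPoolExecutor` for `multiprocessing.Pool`. The names `escape_rows` and `mandelbrot_parallel`, the axis ranges, and the `executor_cls` switch are hypothetical choices. Column chunks would work the same way, with `xs` sliced instead of `ys`.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import numpy as np


def escape_rows(task):
    """Worker: escape iterations for a strip of rows (imaginary values ys) across all columns xs."""
    xs, ys, max_iter = task
    c = xs[np.newaxis, :] + 1j * ys[:, np.newaxis]
    z = np.zeros_like(c)
    counts = np.full(c.shape, max_iter, dtype=np.int32)
    active = np.ones(c.shape, dtype=bool)
    for n in range(max_iter):
        escaped = active & (z.real * z.real + z.imag * z.imag > 4)
        counts[escaped] = n
        active &= ~escaped
        z[active] = z[active] * z[active] + c[active]
    return counts


def mandelbrot_parallel(width, height, rows_per_chunk, x_range=(-2.5, 1.0), y_range=(-1.25, 1.25),
                        max_iter=300, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Split the image into strips of rows_per_chunk rows and compute them on a worker pool."""
    xs = np.linspace(x_range[0], x_range[1], width)
    ys = np.linspace(y_range[0], y_range[1], height)
    tasks = [(xs, ys[start:start + rows_per_chunk], max_iter)
             for start in range(0, height, rows_per_chunk)]
    with executor_cls(max_workers=n_workers) as ex:
        strips = list(ex.map(escape_rows, tasks))        # results arrive in submission order
    return np.vstack(strips)
```

To run the chunk-size experiment, one could time `mandelbrot_parallel(1000, 700, rows_per_chunk=k)` for several values of `k` under `if __name__ == "__main__":`. Strips that contain many non-escaping pixels need more iterations, so smaller strips may spread the load more evenly at the cost of more task overhead.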

