# Exercise sheet 11 - Parallelisation

# Exercise 1 - Rigged dice

Create a rigged dice function that returns the number 6 25% of the time. The rest of the time it returns one of the integers 1, 2, 3, 4, 5 with equal probability.
Test your function by calling it **one billion times** (10^9) and checking that 6 is returned between 249 and 251 million times (inclusive). You do not need to check that the numbers 1 to 5 are returned uniformly or randomly, but you do need to check that your function returns only integers in the range 1-6 (inclusive). **Time** how long the script takes to run.


In [1]:
import random
import time

samples= 10**7  # REDUCED! 10^9 would need about 4 hours on this old hardware


def riggedDice():
  if random.random() < 0.25:
    return 6
  else:
    return random.randint( 1, 5 )


# test riggedDice() sequential
count= [None,0,0,0,0,0,0]
t0= time.time()
for _ in range( samples ):
  count[ riggedDice() ]+= 1
duration= time.time() - t0

# print results and time
print( f'{samples} samples:\n' )
for n in range( 1, 7 ):
  percent= ( count[n] / samples ) * 100
  print( f'{n}: {count[n]} {percent:.3f}%' )

print( f'\nduration: {duration:.3f}s' )

10000000 samples:

1: 1500398 15.004%
2: 1500728 15.007%
3: 1500394 15.004%
4: 1502322 15.023%
5: 1497525 14.975%
6: 2498633 24.986%

duration: 15.257s
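
The task statement asks for two explicit checks: every result is an integer in 1-6, and the number of sixes lies in a tolerance window. The following is a minimal sketch of such a check, not part of the original solution. It assumes a window of `n_sigma` binomial standard deviations around the expected count (a hypothetical substitute for the fixed 249-251 million window, which only applies to 10^9 samples and corresponds there to roughly 73 standard deviations). The names `rigged_dice`, `check_rigged_dice`, and `n_sigma` are illustrative.

```python
import math
import random


def rigged_dice():
    """Return 6 with probability 0.25, otherwise one of 1..5 uniformly."""
    if random.random() < 0.25:
        return 6
    return random.randint(1, 5)


def check_rigged_dice(samples, n_sigma=5.0):
    """Throw the dice `samples` times and validate the results.

    Raises ValueError if a result is not an integer in 1..6.
    Returns (sixes, ok), where ok is True if the number of sixes lies
    within n_sigma standard deviations of the expected 25%.
    The n_sigma window is an assumption made for this sketch.
    """
    sixes = 0
    for _ in range(samples):
        value = rigged_dice()
        if not (isinstance(value, int) and 1 <= value <= 6):
            raise ValueError(f"unexpected result: {value!r}")
        if value == 6:
            sixes += 1

    expected = 0.25 * samples
    sigma = math.sqrt(samples * 0.25 * 0.75)  # binomial standard deviation
    return sixes, abs(sixes - expected) <= n_sigma * sigma


sixes, ok = check_rigged_dice(10**5)
print(f"sixes: {sixes}, within tolerance: {ok}")
```

For the full 10^9 run, the window from the task statement can be checked directly with `249_000_000 <= sixes <= 251_000_000`.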



Now attempt to **parallelise the task with a method of your own choosing** and time how long it takes once more. How does this compare to the previous *un-optimised* run?


In [None]:
#! /usr/bin/python3

import random
import time
import multiprocessing as mp

samples= 10**7  # REDUCED! 10^9 would need about 2 hours on this old hardware

def riggedDice():
  if random.random() < 0.25:
    return 6
  else:
    return random.randint( 1, 5 )


def myTestProcess( n ):
  count= [n,0,0,0,0,0,0]
  for _ in range( n ):
    count[ riggedDice() ]+= 1
  return count


if __name__ == "__main__":
  mp.set_start_method("spawn", force=True)

  cores= mp.cpu_count()
  jobsPerCore= samples // cores
  remaining= samples % cores
  joblist= [jobsPerCore] * cores
  joblist[0]+= remaining

  # test riggedDice() in parallel
  t0= time.time()
  with mp.Pool(cores) as pool:
    result= pool.map( myTestProcess, joblist )
  duration= time.time() - t0

  count= [0] * 7
  for cpu in range( cores ):
    for throw in range( 1, 7 ):
      count[throw]+= result[cpu][throw]

  # print results and time
  print( f'{samples} samples:\n' )

  for n in range( 1, 7 ):
    percent= ( count[n] / samples ) * 100
    print( f'{n}: {count[n]} {percent:.3f}%' )

  print( f'\nduration: {duration:.3f}s' )



'''
thomas@thomas-laptop:~/Uni/NUM/NUM/UE11$ ./ue11ex01.py
10000000 samples:

1: 1500389 15.004%
2: 1501278 15.013%
3: 1499429 14.994%
4: 1499684 14.997%
5: 1499978 15.000%
6: 2499242 24.992%

duration: 8.448s

( about half the execution time, compared to single tasking )
'''
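
The comparison between the two runs can be quantified as a speed-up factor. The sketch below uses the two durations printed above (15.257 s sequential, 8.448 s parallel). The worker count is not stated in the output, so the value `workers=4` in the efficiency call is purely a hypothetical placeholder.

```python
def speedup(t_serial, t_parallel):
    """Speed-up factor: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Parallel efficiency: speed-up divided by the number of workers."""
    return speedup(t_serial, t_parallel) / workers


t_serial = 15.257    # sequential duration from the output above, in seconds
t_parallel = 8.448   # multiprocessing duration from the output above, in seconds

print(f"speed-up: {speedup(t_serial, t_parallel):.2f}x")
# workers=4 is an assumed value; replace it with mp.cpu_count() on the actual machine
print(f"efficiency with 4 workers: {efficiency(t_serial, t_parallel, 4):.2f}")
```

An efficiency well below 1 would suggest that process start-up and result collection take a noticeable share of the run time at this sample count.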

# Exercise 2 - Calculate $\pi$

Using the **DSMC method**, calculate the value of **$\pi$**.


**Approach:**
To do this, create a 2-dimensional domain (defined by the coordinates $x_{min}, x_{max}, y_{min}, y_{max}$) and place $P$ particles at random locations inside it. Check which particles lie inside the circle centered in the domain with radius $\frac{x_{max}-x_{min}}{2}$, where $x_{min}, x_{max}$ are the x-limits of your 2D domain.

Get your value for $\pi$ by using the following formula:
$\pi \approx \frac{4 \cdot n_{inside}}{P},$ where $n_{inside}$ is the number of particles inside the circle and $P$ is the total number of particles.
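
The factor 4 follows from an area ratio. A short derivation, assuming a square domain of side $L = x_{max}-x_{min}$ with the circle of radius $R = L/2$ centered in it (the symbols $L$, $R$, and $\sigma$ are introduced here for illustration only):

$$ \frac{n_{inside}}{P} \approx \frac{A_{circle}}{A_{square}} = \frac{\pi R^2}{L^2} = \frac{\pi (L/2)^2}{L^2} = \frac{\pi}{4} \quad\Rightarrow\quad \pi \approx \frac{4 \cdot n_{inside}}{P} $$

Each particle is an independent hit-or-miss trial with hit probability $\pi/4$, so the standard deviation of the estimate is

$$ \sigma = \sqrt{\frac{\pi (4-\pi)}{P}} \approx \frac{1.64}{\sqrt{P}} $$

A hundredfold increase in $P$ therefore gains only about one more decimal digit.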

Play around with the number of particles. 


In [27]:
import numpy as np
import time
import math

xmin, xmax= 0.0, 1.0
ymin, ymax= 0.0, 1.0

centerX= ( xmin + xmax ) / 2   # midpoint of the domain, also correct when xmin != 0
centerY= ( ymin + ymax ) / 2
R=       ( xmax - xmin ) / 2

np.random.seed( 8325876 )


def myPi( P ):
  x= np.random.uniform( xmin, xmax, P )
  y= np.random.uniform( ymin, ymax, P )

  inside= (x - centerX)**2 + (y - centerY)**2 <= R**2

  return ( 4.0 * inside.sum() ) / P



for P in [ 10**5, 10**6, 10**7, 10**8 ]:
  t0= time.time()
  pi_est= myPi( P )
  duration= time.time() - t0
  err= abs( math.pi - pi_est )
  print( f'P={P} ⇒ π~{pi_est:.7f}, Δ={err:.7f}, time= {duration:.3f}s' )

print( 'done')

#EOF

P=100000 ⇒ π~3.1388000, Δ=0.0027927, time= 0.009s
P=1000000 ⇒ π~3.1420520, Δ=0.0004593, time= 0.095s
P=10000000 ⇒ π~3.1418088, Δ=0.0002161, time= 1.001s
P=100000000 ⇒ π~3.1414857, Δ=0.0001070, time= 6.652s
done



**a)** Try to improve this task by making use of threading (you can use either the **_thread** or the **threading** module). What do you find: does the script run faster?


In [30]:
import numpy as np
import time
import math
import threading

xmin, xmax= 0.0, 1.0
ymin, ymax= 0.0, 1.0

centerX= ( xmin + xmax ) / 2   # midpoint of the domain, also correct when xmin != 0
centerY= ( ymin + ymax ) / 2
R=       ( xmax - xmin ) / 2

np.random.seed( 8325876 )


def myPi( P, retval, i ):
  x= np.random.uniform( xmin, xmax, P )
  y= np.random.uniform( ymin, ymax, P )

  inside= (x - centerX)**2 + (y - centerY)**2 <= R**2

  retval[i]= inside.sum()


def myPiThread( P ):
  perThread= 10**4
  numThreads= P // perThread
  threads= []
  retval= np.zeros( numThreads )
  for i in range( numThreads ):
    threads.append( threading.Thread( target=myPi, args=(perThread,retval,i,) ) )

  # Start each thread
  for t in threads:
    t.start()

  # Wait for all threads to finish
  for t in threads:
    t.join()

  return ( 4.0 * retval.sum() ) / P


for P in [ 10**5, 10**6, 10**7, 10**8 ]:
  t0= time.time()
  pi_est= myPiThread( P )
  duration= time.time() - t0
  err= abs( math.pi - pi_est )
  print( f'P={P} ⇒ π~{pi_est:.7f}, Δ={err:.7f}, time= {duration:.3f}s' )

print( 'done')

#EOF

P=100000 ⇒ π~3.1456400, Δ=0.0040473, time= 0.322s
P=1000000 ⇒ π~3.1405720, Δ=0.0010207, time= 0.374s
P=10000000 ⇒ π~3.1422716, Δ=0.0006789, time= 1.505s
P=100000000 ⇒ π~3.1414305, Δ=0.0001621, time= 16.080s
done
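
The threaded run above starts one thread per 10^4 particles, which means 10,000 threads for P=10^8, and thread creation overhead is one plausible reason for the slowdown. The following is a minimal alternative sketch, not the original solution: it uses a small fixed number of threads through `concurrent.futures.ThreadPoolExecutor` and gives each thread its own `np.random.Generator`, derived from one `SeedSequence`. The names `count_inside`, `pi_threaded`, and the default `num_threads=4` are assumptions, and the unit-square domain is hard-coded. Whether this runs faster than the sequential version depends on how much of the NumPy work runs outside the GIL on a given installation.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def count_inside(rng, n):
    """Count how many of n random points in the unit square fall inside
    the circle of radius 0.5 centered at (0.5, 0.5)."""
    x = rng.random(n)
    y = rng.random(n)
    return int(np.count_nonzero((x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25))


def pi_threaded(P, num_threads=4, seed=8325876):
    """Estimate pi with P particles split across num_threads threads."""
    # One independent generator per thread, so no random state is shared.
    child_seeds = np.random.SeedSequence(seed).spawn(num_threads)
    rngs = [np.random.default_rng(s) for s in child_seeds]

    # Split P as evenly as possible; the first `extra` threads get one more.
    base, extra = divmod(P, num_threads)
    sizes = [base + (1 if i < extra else 0) for i in range(num_threads)]

    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        counts = list(executor.map(count_inside, rngs, sizes))

    return 4.0 * sum(counts) / P


print(f"pi ~ {pi_threaded(10**6):.5f}")
```

For very large P, each thread allocates two arrays of its full share at once, so processing the share in smaller batches inside `count_inside` would keep memory use bounded.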



**b)** Now try to improve the running time of the code by employing the **multiprocessing** module. Are there any differences compared with threading?

In [None]:
#! /usr/bin/python3

import numpy as np
import matplotlib.pyplot as plt
import time
import math
import multiprocessing as mp

xmin, xmax= 0.0, 1.0
ymin, ymax= 0.0, 1.0

centerX= ( xmin + xmax ) / 2   # midpoint of the domain, also correct when xmin != 0
centerY= ( ymin + ymax ) / 2
R=       ( xmax - xmin ) / 2

def myPi( P ):
  x= np.random.uniform( xmin, xmax, P )
  y= np.random.uniform( ymin, ymax, P )

  inside= (x - centerX)**2 + (y - centerY)**2 <= R**2

  return inside.sum()


if __name__ == "__main__":
  mp.set_start_method("spawn", force=True)

  np.random.seed( 8325876 )

  for P in [ 10**5, 10**6, 10**7, 10**8 ]:
    cores= mp.cpu_count()
    jobsPerCore= P // cores
    remaining= P % cores
    joblist= [jobsPerCore] * cores
    joblist[0]+= remaining

    t0= time.time()
    with mp.Pool(cores) as pool:
      result= pool.map( myPi, joblist )
    duration= time.time() - t0
    
    inside= 0
    for cpu in range( cores ):
      inside+= result[cpu]

    pi_est= ( 4.0 * inside ) / P
    err= abs( math.pi - pi_est )
    print( f'P={P} ⇒ π~{pi_est:.7f}, Δ={err:.7f}, time= {duration:.3f}s' )

  print( 'done' )

#EOF


'''
thomas@thomas-laptop:~/Uni/NUM/NUM/UE11$ ./ue11ex02.py
P=100000 ⇒ π~3.1428000, Δ=0.0012073, time= 2.130s
P=1000000 ⇒ π~3.1387720, Δ=0.0028207, time= 2.196s
P=10000000 ⇒ π~3.1416300, Δ=0.0000373, time= 3.084s
P=100000000 ⇒ π~3.1413841, Δ=0.0002085, time= 5.623s
done

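( for P=10^8 about a third of the execution time of the threaded version and slightly faster than single tasking; for smaller P the roughly 2 s of process start-up overhead dominates )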

'''
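
With the `spawn` start method each worker is a fresh interpreter, so a seed set in the parent process does not carry over to the workers' global `np.random` state, and the results are not reproducible from run to run. The following is a minimal sketch of one way to get reproducible, independent streams, assuming each job carries its own child of a single `np.random.SeedSequence`. The names `count_inside`, `make_jobs`, and `parallel_pi` are illustrative, and the unit-square domain is hard-coded.

```python
import multiprocessing as mp

import numpy as np


def count_inside(job):
    """Worker: job is (seed_sequence, n). Count the points inside the circle."""
    seed_seq, n = job
    rng = np.random.default_rng(seed_seq)  # private generator for this job
    x = rng.random(n)
    y = rng.random(n)
    return int(np.count_nonzero((x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25))


def make_jobs(P, workers, seed=8325876):
    """Split P particles into `workers` jobs, each with its own child seed."""
    children = np.random.SeedSequence(seed).spawn(workers)
    base, extra = divmod(P, workers)
    return [(children[i], base + (1 if i < extra else 0))
            for i in range(workers)]


def parallel_pi(P, workers, seed=8325876):
    """Estimate pi with a process pool; same seed gives the same estimate."""
    jobs = make_jobs(P, workers, seed)
    with mp.Pool(workers) as pool:
        counts = pool.map(count_inside, jobs)
    return 4.0 * sum(counts) / P
```

As in the script above, `parallel_pi` should be called from inside an `if __name__ == "__main__":` block, for example `print(parallel_pi(10**7, mp.cpu_count()))`.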

# Exercise 3 - Mandelbrot fractals

Read about the Mandelbrot set: https://en.wikipedia.org/wiki/Mandelbrot_set. 
This set is defined by repeatedly applying this recurrence:

$z_{n+1} = z_{n}^2 + c$,

which starts with $z_{0} = 0$ for a given complex number $c$ (each pixel corresponds to one $c$). The idea is to check whether a particle "escapes" at a certain iteration. At each iteration, one checks whether the sequence $z_0, z_1, \dots, z_n$ is growing or stays bounded. The growth condition is $|z_n| > 2$: if the condition ```if (z.real*z.real + z.imag*z.imag) > 4``` is fulfilled at any step, we mark the particle as "escaped". If the sequence stays bounded (it never meets the growth condition within a given number of iterations, for instance 300), then the point is treated as lying inside the Mandelbrot set.
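
The escape test described above can be sketched as a small function. This is one possible formulation, not a reference solution: the name `escape_iteration` and the convention of returning 0 for points that never escape are assumptions made for this example.

```python
def escape_iteration(c, max_iter=300):
    """Iterate z -> z*z + c starting from z = 0.

    Returns the iteration number (1..max_iter) at which |z| first
    exceeds 2, or 0 if the sequence stays bounded for max_iter iterations.
    The 0 sentinel is a convention chosen for this sketch.
    """
    z = 0j
    for n in range(1, max_iter + 1):
        z = z * z + c
        # |z| > 2 is equivalent to |z|^2 > 4, which avoids a square root
        if z.real * z.real + z.imag * z.imag > 4.0:
            return n
    return 0


print(escape_iteration(0j))       # stays at 0 forever: inside the set
print(escape_iteration(1 + 0j))   # 0, 1, 2, 5: escapes at iteration 3
print(escape_iteration(3 + 0j))   # first iterate is 3: escapes at iteration 1
```

The returned iteration number can serve directly as the value that is mapped to a color, with 0 reserved for the points that never escape.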

**(A)** Create a script which visualizes the Mandelbrot set. The X-axis is the real part of the complex number, the Y-axis is the imaginary part. You can use the colorscheme of your choice. Mark the particles which never escape as one color, and color the escaped particles based on how fast they escaped (that is, use the iteration at which they escaped for your colorbar). You should define the width and height of your image (for instance, 1000 and 700, but you can change it if you like), and 

**(B)** Parallelize your Mandelbrot function using the *multiprocessing* module. Experiment with different sizes of the data chunks you hand to the separate processes (you can split the data into column chunks or row chunks and process each chunk in its own process).
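
One way to split the image into row chunks is sketched below, as a starting point under stated assumptions rather than a finished solution. The viewing window (real part from -2 to 1, imaginary part from -1.05 to 1.05), the 0 sentinel for points that never escape, and all function names (`escape_iteration`, `mandelbrot_rows`, `make_row_tasks`, `mandelbrot_parallel`) are choices made for this example.

```python
import multiprocessing as mp

import numpy as np


def escape_iteration(c, max_iter):
    """Return the iteration at which |z| first exceeds 2, or 0 if it never does."""
    z = 0j
    for n in range(1, max_iter + 1):
        z = z * z + c
        if z.real * z.real + z.imag * z.imag > 4.0:
            return n
    return 0


def mandelbrot_rows(task):
    """Worker: compute the escape iterations for one block of image rows."""
    row_start, row_stop, width, height, max_iter = task
    xs = np.linspace(-2.0, 1.0, width)                          # real axis
    ys = np.linspace(-1.05, 1.05, height)[row_start:row_stop]   # imaginary axis
    block = np.empty((row_stop - row_start, width), dtype=np.int32)
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            block[i, j] = escape_iteration(complex(x, y), max_iter)
    return block


def make_row_tasks(width, height, rows_per_chunk, max_iter=300):
    """Cut the image into consecutive row chunks; the last one may be shorter."""
    return [(start, min(start + rows_per_chunk, height), width, height, max_iter)
            for start in range(0, height, rows_per_chunk)]


def mandelbrot_parallel(width, height, rows_per_chunk, processes=None, max_iter=300):
    """Compute the full image with a process pool, one task per row chunk."""
    tasks = make_row_tasks(width, height, rows_per_chunk, max_iter)
    with mp.Pool(processes) as pool:
        blocks = pool.map(mandelbrot_rows, tasks)   # results keep the task order
    return np.vstack(blocks)
```

Calling `mandelbrot_parallel(1000, 700, rows_per_chunk)` from inside an `if __name__ == "__main__":` block with several values of `rows_per_chunk` and timing each call gives the chunk-size comparison. Small chunks balance the load better, because rows that cross the set take longer, while large chunks reduce the per-task overhead.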

