### Monte Carlo Simulation
One of the classic performance benchmarks for compute engines is estimating pi through Monte Carlo simulation. In this example, you will:
1. Set up an ipyparallel cluster with a maximum of 8 cores.
2. Test that all cores in the cluster respond.
3. Run the Monte Carlo simulation in plain Python.
4. Run the Monte Carlo simulation with Bodo.

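At the end, compare the execution times. In the sample outputs below, Bodo finishes about 34x faster than plain Python (0.13 s versus 4.32 s), which is more than the 8x that eight cores alone would account for. The gain comes from Bodo's compiler combined with MPI and SPMD parallelism in its core compute engine.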


### Spin Up a Cluster Locally

In [1]:
import ipyparallel as ipp
import psutil

n = min(psutil.cpu_count(logical=False), 8)
rc = ipp.Cluster(engines='mpi', n=n).start_and_connect_sync(activate=True)

Starting 8 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
100%|██████████| 8/8 [00:05<00:00,  1.38engine/s]
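
The cell above starts the engines, but it does not by itself confirm that each one answers requests (step 2 in the list at the top). One way to check, sketched here under the assumption that `rc` is the connected client and `n` the engine count from the previous cell, is to ask every engine for its process ID through a direct view. The helper names `engine_pids` and `all_engines_respond` are hypothetical, introduced only for this illustration, and the distinct-PID check assumes all engines run on one machine.

```python
import os


def engine_pids(client):
    """Return one process ID per engine reachable through `client`."""
    view = client[:]  # DirectView that targets every engine
    # apply_sync blocks until each engine has run os.getpid and replied.
    return view.apply_sync(os.getpid)


def all_engines_respond(client, expected):
    """True if exactly `expected` engines replied, each from its own process."""
    pids = engine_pids(client)
    return len(pids) == expected and len(set(pids)) == expected
```

In the notebook, `all_engines_respond(rc, n)` should return `True` once every engine is up, and `rc.ids` lists the engine IDs the client currently sees.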


### Run with Python

In [2]:
import time
import numpy as np

def calc_pi(number_of_samples):
    t1 = time.time()
    # Draw points uniformly from the square [-1, 1) x [-1, 1).
    xx = 2 * np.random.ranf(number_of_samples) - 1
    y = 2 * np.random.ranf(number_of_samples) - 1
    # The fraction of points inside the unit circle approximates pi / 4.
    pi = 4 * np.sum(xx ** 2 + y ** 2 < 1) / number_of_samples
    print("Execution time:", time.time() - t1, "\n result:", pi)

calc_pi(100_000_000)

Execution time: 4.3202431201934814 
 result: 3.14173872
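
The factor of 4 and the accuracy of the result can be explained with a short derivation. It uses standard binomial statistics and is not part of the original notebook. The points $(x_i, y_i)$ are uniform on the square $[-1, 1)^2$, which has area 4, and the unit circle inside it has area $\pi$, so each point lands inside the circle with probability $p = \pi/4$. With $N$ samples:

$$
\hat{\pi} = \frac{4}{N}\sum_{i=1}^{N} \mathbf{1}\left[x_i^2 + y_i^2 < 1\right],
\qquad
\mathbb{E}[\hat{\pi}] = 4p = \pi
$$

$$
\operatorname{sd}(\hat{\pi}) = 4\sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{\pi(4-\pi)}{N}} \approx \frac{1.64}{\sqrt{N}}
$$

For $N = 10^8$ the expected error is therefore on the order of $1.6 \times 10^{-4}$. The result printed above, 3.14173872, differs from $\pi$ by about $1.5 \times 10^{-4}$, which is consistent with that estimate.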


### Run with Bodo in Parallel

To run this code with Bodo, add the @bodo.jit decorator on top of the same function. The cache=True argument caches the compiled binary, so the next time you run the code there is no need to recompile as long as the code text stays the same.
For interactive environments such as Jupyter notebooks, put the %%px (parallel execution) cell magic at the top of the code cell to send the code to the engines in the ipyparallel cluster.

In [3]:
%%px

import time
import numpy as np
import bodo

@bodo.jit(cache=True)
def calc_pi(number_of_samples):
    t1 = time.time()
    xx = 2 * np.random.ranf(number_of_samples) - 1
    y = 2 * np.random.ranf(number_of_samples) - 1
    pi = 4 * np.sum(xx ** 2 + y ** 2 < 1) / number_of_samples
    print("Execution time:", time.time() - t1, "\n result:", pi)

calc_pi(100_000_000)    

[stdout:0] Execution time: 0.12795805931091309 
 result: 3.141279


%px: 100%|██████████| 8/8 [00:04<00:00,  1.85tasks/s]
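
To build intuition for what the eight engines are doing, the sketch below mimics the SPMD pattern in plain NumPy. Each simulated rank draws its own share of the samples and counts hits locally, and the local counts are then summed, which is the role a sum-reduction plays under MPI. This is a conceptual illustration only, run sequentially in a single process. The names `local_hits` and `spmd_style_pi` and the per-rank seeding are assumptions made for the example and do not describe Bodo's internals.

```python
import numpy as np


def local_hits(n_local, seed):
    """Count how many of one rank's points fall inside the unit circle."""
    rng = np.random.default_rng(seed)  # separate stream per simulated rank
    x = 2 * rng.random(n_local) - 1
    y = 2 * rng.random(n_local) - 1
    return int(np.sum(x ** 2 + y ** 2 < 1))


def spmd_style_pi(number_of_samples, n_ranks=8):
    """Split the samples across simulated ranks, then combine the counts."""
    base, extra = divmod(number_of_samples, n_ranks)
    # The first `extra` ranks take one more sample so every sample is used.
    shares = [base + (rank < extra) for rank in range(n_ranks)]
    # Summing the per-rank counts stands in for an MPI sum-reduction.
    total_hits = sum(local_hits(n, seed=rank) for rank, n in enumerate(shares))
    return 4 * total_hits / number_of_samples


print(spmd_style_pi(1_000_000))
```

Because each rank only touches its own slice, the per-rank work shrinks as ranks are added, while the final combination step stays a single small sum.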


### Scale Up Easily
With this speedup, you can handle much larger problems. Let's increase the simulation size by 100x. In plain Python, we would expect the runtime to grow by about 100x as well, from roughly 4 seconds to more than 400 seconds. Run the code cell below instead, and you will see it finish in about 14 seconds.

In [4]:
%%px

import time
import numpy as np
import bodo

@bodo.jit(cache=True)
def calc_pi(number_of_samples):
    t1 = time.time()
    xx = 2 * np.random.ranf(number_of_samples) - 1
    y = 2 * np.random.ranf(number_of_samples) - 1
    pi = 4 * np.sum(xx ** 2 + y ** 2 < 1) / number_of_samples
    print("Execution time:", time.time() - t1, "\n result:", pi)

calc_pi(100 * 100_000_000)  

[stdout:0] Execution time: 13.655680894851685 
 result: 3.1415909328


%px: 100%|██████████| 8/8 [00:12<00:00,  1.52s/tasks]
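
The comparison can be made concrete with the timings printed in the sample outputs above. The snippet below only does arithmetic on those three reported numbers. The variable names are made up for the example, the projection assumes plain Python scales linearly with the sample count, and timings on other hardware will differ.

```python
# Timings (seconds) copied from the sample outputs in this notebook.
python_1e8 = 4.3202431201934814   # plain Python, 100 million samples
bodo_1e8 = 0.12795805931091309    # Bodo on 8 engines, 100 million samples
bodo_1e10 = 13.655680894851685    # Bodo on 8 engines, 10 billion samples

# Speedup of Bodo over plain Python at the same problem size.
speedup_1e8 = python_1e8 / bodo_1e8

# Projected plain-Python time for 100x more samples, assuming linear scaling.
python_1e10_projected = 100 * python_1e8

# How Bodo's own runtime grew when the problem grew by 100x.
bodo_growth = bodo_1e10 / bodo_1e8

print(f"speedup at 1e8 samples: {speedup_1e8:.1f}x")
print(f"projected Python time at 1e10 samples: {python_1e10_projected:.0f} s")
print(f"Bodo runtime growth for 100x samples: {bodo_growth:.0f}x")
```

With these numbers the speedup comes out near 34x, the projected Python time near 430 seconds, and Bodo's runtime grows roughly in proportion to the sample count.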
