# Optimisation: Parallelism

Modern computers often contain several cores, and specialised High Performance Computing (HPC) resources such as Imperial's HPC clusters are designed to enable rapid computations by using many cores simultaneously. Code written to run in this way is referred to as "parallel" code.

Only some problems are amenable to parallel computing, and the degree of speed-up depends on the number of cores available. In addition, writing parallel programs can be complex and can require a lot of skill and knowledge. However, the gains to be made can be very large, so this learning process can be worth the time required.
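One common way to reason about how the speed-up depends on the number of cores is Amdahl's law: if a fraction $p$ of the run time can be parallelised, the ideal speed-up on $N$ cores is $1 / ((1 - p) + p / N)$. The sketch below is only an illustration; the function name `amdahl_speedup` and the fractions used are assumptions, not measurements of any real code.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Ideal speed-up predicted by Amdahl's law.

    parallel_fraction: fraction of the serial run time that can be parallelised (0 to 1).
    n_cores: number of cores sharing the parallel part.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Illustrative (assumed) fractions, not measurements of any real code
for fraction in (0.5, 0.9, 1.0):
    for cores in (2, 8, 64):
        speedup = amdahl_speedup(fraction, cores)
        print(f"parallel fraction {fraction}: {cores} cores -> speed-up {speedup:.2f}x")
```

Under this idealised model, the part of the code that cannot be parallelised limits the achievable speed-up, however many cores are used.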

This notebook does not intend to give you a working knowledge of how to implement parallel algorithms in Python but instead to give an example to demonstrate the power of the method.

## Monte Carlo Calculation of $\pi$

Monte Carlo calculations rely on performing the same procedure repeatedly, using one or more random numbers so that the repetitions produce different results. The ensemble of results can then be analysed to calculate a value of interest.

A classic example of this is estimating the value of $\pi$. Pairs of $x$ and $y$ coordinates are generated at random between 0 and 1, so that each pair is a point in the unit square. The distance of each point from the origin determines whether it lies inside a circle of radius 1 centred on the origin. For each point inside the circle, 1 is added to a tally. Multiplying this tally by 4 and dividing by the total number of points gives an estimate of $\pi$, because the quarter of the circle that lies in the unit square has area $\pi/4$. On average, the more points that are sampled, the more accurate the estimate. A video describing this algorithm can be found [here](https://www.youtube.com/watch?v=ELetCV_wX_c).
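The factor of 4 follows from comparing areas. As a short sketch of the reasoning, write $N$ for the total number of points and $N_\text{in}$ for the tally (symbols introduced here for illustration only). Since the points are spread uniformly over the unit square, the fraction that falls inside the circle approximates the ratio of the two areas:

$$
\frac{N_\text{in}}{N} \approx \frac{\text{area of quarter circle}}{\text{area of unit square}} = \frac{\pi \cdot 1^2 / 4}{1 \cdot 1} = \frac{\pi}{4}
\quad\Rightarrow\quad
\pi \approx \frac{4 N_\text{in}}{N}
$$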

In the first code cell below we implement this algorithm in serial (non-parallel) Python. In the second cell, we use the [joblib](https://joblib.readthedocs.io/en/latest/parallel.html) package to write a parallel implementation.

In [1]:
!pip install line_profiler

%load_ext line_profiler
import random

def calculate_pi(n):
  # Calculate pi by sampling n points in 2D space and seeing if they fall inside a circle with radius 1
  tally = 0
  for i in range(n):
    # Loop over n points, creating values of x and y for each
    x = random.random()
    y = random.random()
    if x ** 2 + y ** 2 < 1:
      # If the sum of the squares of x and y is less than 1, the point is within a circle with radius 1, so increase the tally
      tally = tally + 1

  # Our estimate of pi is equal to 4*tally/n
  return 4 * tally / n

%lprun -f calculate_pi print(calculate_pi(1000000))

3.141964
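To illustrate how the accuracy depends on the number of points, the sketch below repeats the serial calculation for several values of `n`. Each point lands inside the circle with probability $\pi/4$, so the statistical error of the estimate is expected to be roughly $\sqrt{\pi(4-\pi)/n}$, which shrinks like $1/\sqrt{n}$. The function name `estimate_pi`, the seed and the values of `n` are arbitrary choices made for this illustration.

```python
import math
import random


def estimate_pi(n, rng):
    """Estimate pi from n random points in the unit square."""
    tally = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if x ** 2 + y ** 2 < 1:
            tally += 1
    return 4 * tally / n


# A fixed seed (an arbitrary choice) makes the run repeatable
rng = random.Random(12345)
for n in (100, 10_000, 1_000_000):
    estimate = estimate_pi(n, rng)
    # Expected size of the statistical error for n points
    expected_error = math.sqrt(math.pi * (4 - math.pi) / n)
    print(f"n = {n:>9}: estimate = {estimate:.5f}, "
          f"error = {abs(estimate - math.pi):.5f}, expected about {expected_error:.5f}")
```

Because the error falls only as $1/\sqrt{n}$, each extra decimal digit of accuracy needs roughly 100 times more points, which is one reason Monte Carlo calculations benefit from being spread over many cores.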


In [2]:
!pip install line_profiler

%load_ext line_profiler
import random
from joblib import Parallel, delayed
import multiprocessing

def get_tally(n):
  # This function loops over n points and counts how many of them fall within the circle
  tally = 0
  for i in range(n):
    x = random.random()
    y = random.random()
    if x ** 2 + y ** 2 < 1:
      tally = tally + 1

  return tally

def calculate_pi(n):
  # Calculate pi by sampling n points in 2D space and seeing if they fall inside a circle with radius 1
  # First, find the number of cores available
  n_core = multiprocessing.cpu_count()

  # Split the n points as evenly as possible between the cores,
  # so that exactly n points are sampled in total even if n is not a multiple of n_core
  counts = [n // n_core + (1 if i < n % n_core else 0) for i in range(n_core)]

  # Ask for a tally from each core
  tallies = Parallel(n_jobs=n_core)(delayed(get_tally)(count) for count in counts)

  # We can get the total tally by adding the tally from each core
  # Our estimate of pi is equal to 4*tally/n
  return 4 * sum(tallies) / n

%lprun -f calculate_pi print(calculate_pi(1000000))
print("This calculation was performed on", multiprocessing.cpu_count(), "cores")

3.142352
This calculation was performed on 2 cores


The parallel version runs significantly faster than the serial one. When the number of cores is scaled up, such as when the code is deployed to HPC resources, the speed-up can be expected to grow further. This example was fairly simple, as each Monte Carlo repetition was entirely independent. However, some applications can be a lot more complex.
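The independence of the repetitions can be made explicit by giving each task its own random number generator, so that no task relies on shared state. The following is a minimal sketch of that idea rather than the notebook's own method: the function `get_tally_seeded`, the number of tasks and the seeds are all hypothetical, and the tasks are run one after another here simply to show that each depends only on its own arguments.

```python
import random


def get_tally_seeded(n, seed):
    """Count points inside the circle using a generator private to this task."""
    rng = random.Random(seed)  # independent of the global `random` state
    tally = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if x ** 2 + y ** 2 < 1:
            tally += 1
    return tally


n = 100_000
n_tasks = 4  # assumed number of tasks; the notebook uses one per core
n_per_task = n // n_tasks  # n is chosen here to be a multiple of n_tasks
seeds = range(n_tasks)  # hypothetical seeds: any distinct integers would do

# Run the tasks one after another; each call depends only on its own arguments
tallies = [get_tally_seeded(n_per_task, seed) for seed in seeds]
print(4 * sum(tallies) / n)
```

Because each call is self-contained, the same function could be handed to joblib through `delayed(get_tally_seeded)(n_per_task, seed)`, in the same way as `get_tally` in the cell above. Distinct seeds give different random streams and make runs repeatable; for more demanding work, a dedicated seeding scheme may be a more careful choice.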