# Introduction to Monte Carlo integration

In the last notebook, we learnt about cross-sections and how they are related to what we see at a particle collider.

How are these cross-sections calculated? They often involve complicated integrals, and in this notebook we will learn about one of the main methods for calculating these integrals: Monte Carlo integration.

## Monte Carlo integration

Integrals of simple functions can be calculated **analytically**, e.g.:

$$\int_a^b x \, \mathrm{d}x = \left [\frac{x^2}{2}\right ]_a^b = \frac{b^2-a^2}{2}.$$

The integrals we evaluate are much more complex, and are often impossible to evaluate analytically $\Rightarrow$ we have to evaluate them **numerically**.

One such method is the **Monte Carlo** method, which is suited to our complicated integrals.

The method is simple: we sample points **randomly** in the required region and approximate the integral as

$$\int_\mathrm{region} f(x) \mathrm{d}x \approx V \times \frac{1}{N} \sum_{i=1}^N f(x_i),$$

where $N$ points are sampled, and $V$ is the size of the integration region:

$$V = \int_\mathrm{region} \mathrm{d}x.$$
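
The formula above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `mc_integrate` and the integrand $x^2$ on $0 \leq x \leq 2$ are hypothetical choices made here (they are not part of the exercises), and the sampling is done in one vectorised call rather than with a loop.

```python
import numpy as np
from numpy import random


def mc_integrate(f, a, b, n_points):
    """Estimate the integral of f over [a, b] from n_points uniform samples."""
    # Sample n_points values of x uniformly in the integration region
    x = random.uniform(a, b, size=n_points)
    # V is the size of the region; in one dimension this is its length
    volume = b - a
    # V times the average of f over the sampled points
    return volume * np.mean(f(x))


# Illustrative integrand (an assumption, not from the exercises):
# x^2 on [0, 2], whose exact integral is 8/3
estimate = mc_integrate(lambda x: x**2, 0.0, 2.0, 100_000)
print(estimate)
```

The printed value changes from run to run because the points are random, but it should lie close to $8/3 \approx 2.667$.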

___
#### Exercise 2a: Monte Carlo integration 
Write a function that produces a Monte Carlo approximation for the integral of $f(x)=x$ in the region $0 \leq x \leq 1$ and takes the number of points $N$ as an argument.

In [None]:
# We will need some external libraries
# The numpy - numerical python - library
# contains many useful functions
import numpy as np
from numpy import random

# Define a function to integrate x between
# 0 and 1 with Monte Carlo, sampling 'n_points' points

def integrate_x_0_1(n_points):
    
    # Your code here

    # Hints:
    # The numpy function 'random.uniform(a,b)'
    # will generate a uniformly sampled random
    # number between a and b
    
    # Google 'numpy documentation' for help
    # with numpy-specific problems!
    
    # You may need a 'for' loop or a 'while' loop;
    # these allow you to execute the same code
    # for multiple values.
    
    return # Replace comment with return value

Evaluate your function for $N=10$, $N=100$, and $N=1000$. What do you notice?

In [None]:
# Use the function you defined earlier to
# integrate the function for the specified N
# values

# What should the answer be?

#### Exercise 2b: Extending the function

Can you modify your solution to 2a so that it takes the end-points $a$ and $b$ as arguments (such that you can integrate over any region)?

In [None]:
# Define a function to integrate x between
# a and b with Monte Carlo, sampling 'n_points' points

def integrate_x_a_b(n_points, a, b):
    # Your code here
    return # Replace comment with return value

___
### Approximating $\pi$ with Monte Carlo

With the Monte Carlo (MC) method, a fun exercise is to approximate $\pi$!

If we construct a circle with radius $r$, and house it in a square of side-length $2r$, then the ratio of the area of the circle to the square is:

$$\frac{\text{area of circle}}{\text{area of square}} = \frac{\pi r^2}{4r^2} = \frac{\pi}{4}.$$

**Trivia**: This result is independent of the radius!

Then, if we sample $N$ points in the square at random, we can expect the following:

$$\frac{\text{area of circle}}{\text{area of square}} = \frac{\pi}{4} \approx \frac{\text{number of points in the circle}}{\text{total number of points sampled ($N$)}}.$$

Below is an example plot ([link](https://www.geeksforgeeks.org/estimating-value-pi-using-monte-carlo/)) for such a procedure:

<img src="MonteCarlo.png" alt="Monte Carlo approximating pi" width="300"/>

This means we can write:

$$\pi \approx 4\times \frac{\text{number of points in the circle}}{\text{total number of points sampled ($N$)}}$$
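
To count the points in the circle, each sampled point needs an inside/outside test. A natural choice, assuming the circle is centred on the origin (an assumption; the text above does not fix the centre), is to compare the point's squared distance from the origin with the squared radius:

$$x_i^2 + y_i^2 \leq r^2 \quad \Rightarrow \quad \text{point } (x_i, y_i) \text{ is in the circle}.$$

As a purely illustrative example with made-up counts, if $785$ of $N = 1000$ sampled points passed this test, the estimate would be

$$\pi \approx 4 \times \frac{785}{1000} = 3.14.$$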

___
#### Exercise 2c: Approximating $\pi$

Produce a Monte Carlo approximation for $\pi$ as outlined above, taking the total number of points $N$ as an argument.

In [None]:
# We will need the numpy libraries again
import numpy as np
from numpy import random

# Define a function to approximate pi
# with Monte Carlo, sampling 'n_points' points

def approximate_pi(n_points):
    
    # Your code here

    # Hints:
    # You can use a 'for' or 'while' loop as
    # for the previous exercise.
    
    # You may need an 'if' statement; this will
    # run code only when a true/false
    # condition is satisfied
    
    return # Replace comment with return value

Evaluate your function for $N=10$, $N=100$, and $N=1000$. What do you notice?

In [None]:
# Use the function you defined earlier to
# approximate pi for the specified N
# values

# What do you notice about the convergence?

___

## Plotting results

Presenting your findings in a **comprehensible** way is as important as being able to produce sophisticated calculations: **plotting** theoretical predictions and data clearly helps greatly with understanding!

The example below is a distribution showing two Monte Carlo theoretical predictions and experimental data from [(Andersen et al., 2022)](https://link.springer.com/article/10.1007/JHEP03(2023)001), for Higgs production as analysed at the LHC [(ATLAS Collaboration, 2014)](https://link.springer.com/article/10.1007/JHEP09(2014)112):

<img src="example-distribution.png" alt="Example distribution for Higgs production" width="600"/>

While not a strict checklist, a good plot should have the following:
- Clear axis labels, with **units** shown.
- Sensible and accessible choices of colour scheme.
- Clearly highlighted **uncertainties**: in the figure above, theoretical uncertainties (on the theoretical prediction) are shown as shaded bands, and statistical uncertainties (on the data) are shown as error bars.
- A robust legend that clearly describes the analysis.
- Clearly defined and visible **bin edges**.
- Suitable $x$ and $y$ ranges to see the full behaviour of the distributions.
- No clutter from too many lines/features (if the plot is cluttered, consider whether you can split it into two).

Do you think the plot above satisfies all of these?

Python has several useful libraries for plotting; we will use `matplotlib`.
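
As a rough sketch of how the checklist above might translate into `matplotlib` calls, the example below draws a binned prediction with a shaded uncertainty band and data points with error bars. Everything in it is an assumption made for illustration: the function name `make_example_plot`, the axis labels and units, and all of the numbers are made up and are not taken from the figure above.

```python
import numpy as np
import matplotlib.pyplot as plt


def make_example_plot():
    """Draw a toy binned prediction with an uncertainty band and data points."""
    # Made-up numbers, purely for illustration (not real measurements)
    bin_edges = np.array([0.0, 50.0, 100.0, 150.0, 200.0])
    bin_centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    prediction = np.array([4.0, 3.0, 1.5, 0.5])
    prediction_unc = 0.2 * prediction
    data = np.array([4.3, 2.7, 1.6, 0.4])
    data_unc = np.array([0.5, 0.4, 0.3, 0.2])

    def per_edge(values):
        # Repeat the last bin value so there is one value per bin edge,
        # as needed for step-style drawing
        return np.append(values, values[-1])

    fig, ax = plt.subplots()

    # Prediction drawn as steps so that the bin edges are visible
    ax.step(bin_edges, per_edge(prediction), where="post",
            color="tab:blue", label="Prediction")
    # Shaded band for the uncertainty on the prediction
    ax.fill_between(bin_edges,
                    per_edge(prediction - prediction_unc),
                    per_edge(prediction + prediction_unc),
                    step="post", color="tab:blue", alpha=0.3,
                    label="Prediction uncertainty")
    # Data points with error bars for the uncertainty on the data
    ax.errorbar(bin_centres, data, yerr=data_unc, fmt="o",
                color="black", label="Data")

    # Axis labels with units (hypothetical labels), ranges and legend
    ax.set_xlabel("Observable [GeV]")
    ax.set_ylabel("Cross-section per bin [fb]")
    ax.set_xlim(bin_edges[0], bin_edges[-1])
    ax.set_ylim(bottom=0.0)
    ax.legend()
    return fig, ax


fig, ax = make_example_plot()
```

In a notebook the figure should appear below the cell; when running as a plain script, a call to `plt.show()` afterwards displays it.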

___
#### Exercise 2d: Monte Carlo plots

Produce a plot similar to the one that we showed above.

<img src="MonteCarlo.png" alt="Monte Carlo approximating pi" width="300"/>


In [None]:
import numpy as np
from numpy import random
# We need the plotting libraries
import matplotlib.pyplot as plt

# Define a function to approximate pi
# with Monte Carlo, sampling 'n_points' points
# and plotting the results

def approximate_pi(n_points):
    
    # We have added some 'boilerplate' code
    # to set up the plot here, add more code
    # around your pi approximation code from
    # earlier to make a plot
    
    # Hints:
    # Look for the matplotlib documentation
    # online to help
    
    # The best function to use will be the
    # plt.scatter function - the docs can
    # help with this

    # Set aspect ratios to equal
    plt.gca().set_aspect("equal")
    
    # Set x,y labels
    plt.xlabel("Your x axis label")
    plt.ylabel("Your y axis label")
    
    # Set x,y ranges
    plt.xlim([-1, 1])
    plt.ylim([-1, 1])
    
    # Plot square
    plt.plot([1,-1,-1,1,1],[1,1,-1,-1,1], color="white")
    
    # Plot circle
    circle = plt.Circle((0, 0), 1, edgecolor="black", facecolor="white")
    plt.gca().add_patch(circle)

    # Show plot
    plt.show()    
    
    return # Replace comment with return value


N_POINTS = 10
print(approximate_pi(N_POINTS))

#### Bonus Exercise: Accuracy of the approximation

Modify the above code to produce a chart showing how accurate the approximation to $\pi$ is for different values of $N$, similar to the one from [here](https://www.kaggle.com/code/soachishti/monte-carlo-tutorial-calculating-pi):

<img src="pi-approx.png" alt="Accuracy of pi approximation example" width="400"/>
