# IL027 Core Lecture 3 part 1 - Random Numbers

### James Kermode

### School of Engineering 

## Overview

- Review of conditionals: `if`
- Generating random numbers: `rand`, `randn`, `randexp`
- Applications: Monte Carlo estimation of $\pi$, Brownian motion
- The `Distributions.jl` package
- Fitting distribution parameters

In [None]:
using Plots
Plots.gr(fmt=:png);  
# this selects a different plotting backend that we need for 
# animations. See Plots.jl documentation for more detail

## Review: Conditionals 

Before we get into the main topic of the lecture we briefly review a control flow statement that we introduced in **Assignment 1**, but did not cover properly in L1 and L2. For example, consider implementing the Heaviside function 
$$
    H(x) = \begin{cases}
        0, & x < 0  \\ 
        1, & x \geq 0
    \end{cases}
$$
This can be achieved with an `if` statement.

In [None]:
function H(x)
    if x < 0 
        return 0.0 
    else 
        return 1.0 
    end
end

In [None]:
x = range(-1, 1, length = 100)
plot(x, H.(x), lw = 3, xlims = (-1.1, 1.1), ylims = (-0.5, 1.5), label="H")

The general form of writing `if` statements can be looked up in the help text:

In [None]:
?if

As usual, there are many alternative ways to implement `H`, e.g., using the ternary operator, 
```
H1(x) = x < 0 ? 0.0 : 1.0
```
or simply a logical mask, 
```
H2(x) = Float64(x >= 0)
```
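The general `if` form also allows `elseif` branches. As a minimal sketch, the helper `mysign` below (an illustrative name, not part of the lecture code; Julia already provides `sign`) uses a three-way branch:

```julia
# Three-way branch with if / elseif / else.
# `mysign` is a hypothetical helper for illustration only.
function mysign(x)
    if x < 0
        return -1.0
    elseif x > 0
        return 1.0
    else
        return 0.0
    end
end

# broadcast over a vector with the dot syntax, as for H.(x)
println(mysign.([-2.5, 0.0, 3.0]))
```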

**Lecture Question:** write a function `incircle` that checks whether the point with coordinates `x`, `y` lies inside the unit circle.

## Random Numbers


In [None]:
?random

In [None]:
?rand

In [None]:
# draw a single random number in [0, 1)
rand()

In [None]:
# compute many random numbers 
for n = 1:5 
    println(rand())
end 

In [None]:
# or get a vector of random numbers 
x = rand(5)

In [None]:
println("random integer between 1 and 10:")
@show rand(1:10)
println()
println("random element from a collection:")
@show rand(["the", "quick", "brown", "fox"])
println()
println("A 2 x 2 matrix of random integers:")
@show rand(1:5, (2,2))
;

Specifically, `rand()` draws a random number from the **Uniform Distribution** on $[0, 1)$. More precisely, this means that, if `x = rand()` and $[a, b] \subset [0, 1)$, then  
$$
    \mathbb{P}\big( x \in [a, b] \big) = b-a
$$
This is only one of many useful probability distributions. We will learn about other distributions below.
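We can check this property empirically. The following is a minimal sketch, with the interval, the seed and the sample size chosen arbitrarily for illustration: it draws many samples and compares the fraction that lands in $[a, b]$ with $b - a$.

```julia
using Random

Random.seed!(1234)          # fix the seed so the run is reproducible
a, b = 0.2, 0.7             # any interval inside [0, 1)
N = 1_000_000
samples = rand(N)

# fraction of samples that fall into [a, b]
frac = count(x -> a <= x <= b, samples) / N
println("empirical: ", frac, "   exact: ", b - a)
```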

### Application: Compute π



The area of the unit circle is $\pi$. If we draw two random numbers $x, y$, uniformly distributed in $[-1, 1]$, then the probability that the point $(x, y)$ lies inside the unit circle is exactly $\pi / 4$ (the area of the circle divided by the area of the square $[-1, 1]^2$). Or, simpler, if we draw two random numbers $x, y$ uniformly distributed in $[0, 1]$, then the probability that $(x, y)$ lies inside the quarter unit circle is again $\pi/4$. Therefore, to estimate $\pi$, we can simply draw many random points, compute the fraction that falls inside the quarter circle, and multiply it by 4.

<p><a href="https://commons.wikimedia.org/wiki/File:MonteCarloIntegrationCircle.svg#/media/File:MonteCarloIntegrationCircle.svg"><img width="300" src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/MonteCarloIntegrationCircle.svg/1200px-MonteCarloIntegrationCircle.svg.png" alt="MonteCarloIntegrationCircle.svg"></a><br>By <a href="//commons.wikimedia.org/w/index.php?title=User:Yoderj&amp;action=edit&amp;redlink=1" class="new" title="User:Yoderj (page does not exist)">User:Yoderj</a>, <a href="https://en.wikipedia.org/wiki/User:Mysid" class="extiw" title="w:User:Mysid">Mysid</a> - <span class="int-own-work" lang="en">Own work</span>, derived in Inkscape from <a class="mw-selflink selflink">Image:MonteCarloIntegrationCircle.svg</a>, <a href="http://creativecommons.org/publicdomain/zero/1.0/deed.en" title="Creative Commons Zero, Public Domain Dedication">CC0</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=35608043">Link</a></p>


In [None]:
# number of draws
Nsamples = 1_000_000

# specify the function testing that two numbers are in the circle
# NOTE: in arithmetic, true ≡ 1 and false ≡ 0
incircle(x, y) = (x^2 + y^2 < 1)

function estimate_pi(Nsamples)
    count = 0
    for n = 1:Nsamples 
        x, y = rand(), rand()
        count += incircle(x, y) 
    end 
    approx_pi = 4 * count / Nsamples
    return approx_pi
end

approx_pi = estimate_pi(Nsamples)
println("Approximation for π: ", approx_pi)
println("Error: ", abs(π - approx_pi))

The algorithm in the previous cell is in fact the simplest example of a Monte-Carlo Algorithm! See [Wikipedia](https://en.wikipedia.org/wiki/Monte_Carlo_method) for an introduction and for references.
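Each sample is a yes/no trial that succeeds with probability $p = \pi/4$, so the estimate $4 \cdot \text{count} / N$ has standard deviation $4 \sqrt{p (1-p) / N}$. The sketch below returns the estimate together with this standard error, computed from the observed hit fraction. The function name `estimate_pi_with_error` and the seed are illustrative choices, not part of the lecture code.

```julia
using Random

# Returns the Monte Carlo estimate of π and an estimate of its standard
# error, 4 * sqrt(p * (1 - p) / N), where p is the observed hit fraction.
function estimate_pi_with_error(N)
    hits = 0
    for n = 1:N
        x, y = rand(), rand()
        hits += (x^2 + y^2 < 1)
    end
    p = hits / N
    return 4 * p, 4 * sqrt(p * (1 - p) / N)
end

Random.seed!(2024)
est, se = estimate_pi_with_error(1_000_000)
println("estimate: ", est, " ± ", se)
```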

**Lecture Question** Plot a graph to demonstrate that the error in the Monte Carlo estimate for $\pi$ goes down as $1/\sqrt{N}$ as the number of samples $N$ increases.

Another very common distribution is the Gaussian, or Normal, distribution $N(m, \sigma^2)$, given by 
$$
    \mathbb{P}\big( x \in [a, b] \big) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{a}^b e^{- (x-m)^2/(2\sigma^2)} dx
$$
Here, $m$ is the mean and $\sigma^2$ the *variance* of the distribution ($\sigma$ is the standard deviation). Informally, the variance measures how far random numbers are spread out from their average value; see [Wikipedia](https://en.wikipedia.org/wiki/Variance) for more information and references.

The normal distribution can be visualised as $\mathbb{P}\big( x \in [a, b] \big)$ being the shaded area under the curve:

In [None]:
a, b = 0.5, 1.5
x = range(-3, 3, length = 200)
# density of the standard normal distribution N(0, 1)
f(x) = exp(-x^2 / 2) / sqrt(2π)
plot(x, f.(x), lw=5, label = "", xlim=(-3.0, 3.0), ylim = (-0.05, 0.6))
x1 = range(a, b, length = 100)
plot!([x1[1];x1;x1[end];x1[1]], [0.0;f.(x1);0.0;0.0], lw=3, label="", fill=true,
        xticks = ([-3, 0, 0.5, 1.5, 3], 
                  ["-3", "0", "a=0.5", "b=1.5", "3"]), tickfont=font(13))
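The shaded-area interpretation can be checked numerically. `randn()` samples the standard normal distribution (mean 0, variance 1), whose density is $e^{-x^2/2}/\sqrt{2\pi}$. The sketch below compares the fraction of samples in $[a, b]$ with the integral of this density, approximated by the trapezoidal rule; the names `g`, `p_mc` and `p_quad`, the seed and the grid size are illustrative choices.

```julia
using Random

Random.seed!(7)
a, b = 0.5, 1.5
N = 1_000_000

# empirical probability from samples of the standard normal distribution
p_mc = count(z -> a <= z <= b, randn(N)) / N

# the same probability from the density, via the composite trapezoidal rule
g(x) = exp(-x^2 / 2) / sqrt(2π)
xs = range(a, b, length = 1001)
h = step(xs)
p_quad = h * (sum(g, xs) - (g(a) + g(b)) / 2)

println("Monte Carlo: ", p_mc, "   quadrature: ", p_quad)
```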


More generally, if $f : \mathbb{R} \to \mathbb{R}_+$ with $\int_{\mathbb{R}} f(x) dx = 1$ then a probability distribution is defined by 
$$
    \mathbb{P}(x \in [a, b]) = \int_a^b f(x) dx.
$$
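Julia comes out of the box with samplers for the following distributions:
* Uniform distribution on $[0, 1)$: `rand` 
* Standard normal distribution $N(0, 1)$: `randn` 
* Exponential distribution with rate 1: `randexp` (in Julia 1.0 and later, this requires `using Random`)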
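A quick way to get a feeling for the three generators is to compare sample means and variances with the known values of the underlying distributions. This is a minimal sketch; the sample size and seed are arbitrary.

```julia
using Random, Statistics

Random.seed!(42)
N = 1_000_000
u = rand(N)       # uniform on [0, 1):    mean 1/2, variance 1/12
z = randn(N)      # standard normal:      mean 0,   variance 1
w = randexp(N)    # exponential, rate 1:  mean 1,   variance 1

for (name, s) in [("rand", u), ("randn", z), ("randexp", w)]
    println(name, ": mean = ", mean(s), ", var = ", var(s))
end
```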

### Application: Brownian Motion

<p><a href="https://commons.wikimedia.org/wiki/File:Brownian_motion_large.gif#/media/File:Brownian_motion_large.gif"><img src="https://upload.wikimedia.org/wikipedia/commons/c/c2/Brownian_motion_large.gif" alt="Brownian motion large.gif" height="240" width="240"></a><br>By <a href="//commons.wikimedia.org/w/index.php?title=User:Lookang&amp;action=edit&amp;redlink=1" class="new" title="User:Lookang (page does not exist)">Lookang</a> Author of computer model: Francisco Esquembre, Fu-Kwun and lookang - <span class="int-own-work" lang="en">Own work</span>, <a href="https://creativecommons.org/licenses/by-sa/3.0" title="Creative Commons Attribution-Share Alike 3.0">CC BY-SA 3.0</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=19140345">Link</a></p>

The animation above shows a large and heavy particle moving randomly by bouncing off many smaller surrounding particles, for example, pollen grains in water (Brown) or small dust particles in air (Lucretius); see [Wikipedia](https://en.wikipedia.org/wiki/Brownian_motion) for a nice introduction. There are in fact a wide variety of physical scenarios where such random motion occurs naturally. 

Instead of accounting for every particle in the "bath", Einstein proposed to replace the "bath" with a random forcing term. We will not go into the mathematical or physical details of this *Brownian Motion* but instead simulate a time-discrete variant thereof:
$$\begin{align*}
    x_{n+1} &= x_n + \sqrt{\Delta t} R_n, \\ 
    y_{n+1} &= y_n + \sqrt{\Delta t} S_n
\end{align*}$$
where $(x_n, y_n)$ is the position of the particle at time step $n$ and 
$R_n, S_n \sim N(0, 1)$ are independent and normally distributed random samples.

In [None]:
Nsteps = 1_000
dt = 0.1
# allocate arrays 
x = zeros(Nsteps)
y = zeros(Nsteps)
# simulate the random walk with independent N(0, 1) increments
for n = 2:Nsteps 
    R, S = randn(), randn()
    x[n] = x[n-1] + sqrt(dt) * R 
    y[n] = y[n-1] + sqrt(dt) * S 
end 

In [None]:
# an animation of the random walk
# choose the axis limits from the data so that the whole path stays visible
lim = 1.1 * max(maximum(abs, x), maximum(abs, y))
@gif for n = 1:10:Nsteps
    plot(x[1:n], y[1:n], xlim = (-lim, lim), ylim = (-lim, lim), label="")
    plot!([x[n]], [y[n]], lw=0, marker=:o, ms=8, label = "")
end
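Because the increments $\sqrt{\Delta t} R_n$ are independent with variance $\Delta t$, the variance of $x_n$ grows linearly with the number of steps taken. The sketch below checks this for the end point of the walk by simulating many independent walks; the helper `endpoint`, the number of walks `Nwalks` and the seed are illustrative assumptions.

```julia
using Random, Statistics

Random.seed!(3)
Nsteps, dt = 1_000, 0.1
Nwalks = 2_000                 # number of independent walks (illustrative choice)

# end point of one walk started at 0: the sum of Nsteps - 1 increments sqrt(dt) * R_n
endpoint(Nsteps, dt) = sum(sqrt(dt) * randn() for n = 2:Nsteps)

xend = [endpoint(Nsteps, dt) for k = 1:Nwalks]
println("sample variance: ", var(xend), "   expected: ", (Nsteps - 1) * dt)
```

The spread of the particle therefore grows like the square root of the elapsed time.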

In [None]:
# or a completely different interpretation? 
t = range(0, 365, length = Nsteps)
plot(t, x, label="")
# tick marks at the last day of each month (non-leap year)
plot!(t, y, label="", 
      xticks = (cumsum([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]), 
                ["Jan", "Feb", "Mar", "Apr", "May", "Jun", 
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]) )

### Other Distributions

Many distributions can be generated from `rand`, `randn` and `randexp`. However, implementing new distributions in an efficient and numerically robust way is in general a subtle problem. A better approach is to use a library that already implements the distributions needed: `Distributions.jl`.

In [None]:
using Distributions

In [None]:
?Distributions

### Application: speed of Maxwell particles

Suppose a particle has velocity vector $\mathbf{v} = (v_1, v_2, v_3)$ whose components are independent with $v_j \sim N(0, 1)$. Then the *speed* of the particle is distributed according to the [Chi Distribution](https://en.wikipedia.org/wiki/Chi_distribution) with 3 degrees of freedom (up to scaling, this is the Maxwell-Boltzmann distribution): 
$$
    v = |\mathbf{v}| = \sqrt{v_1^2 + v_2^2 + v_3^2}
$$

In [None]:
?Chi

In [None]:
@show D = Chi(3)
@show mean(D)
@show var(D)     # variance 
;

For fun, we can perform a simple test that this is indeed the correct distribution, by creating samples of Gaussian random vectors and putting their lengths into bins.

In [None]:
using LinearAlgebra   # provides `norm` in Julia 1.0 and later
Nsamples = 1_000_000
# generate random speeds
v = [ norm(randn(3))  for n = 1:Nsamples ]
# Categorize into bins and plot the bins
histogram(v, bins = 100, normalize = true, label = "empirical")
# plot the probability density function on top 
x = range(0, 5, length = 100)
plot!(x, pdf.(D, x), lw=3, label="theoretical")
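A complementary check that needs only the standard library compares the sample mean and variance of the speeds with the closed-form moments of the chi distribution with 3 degrees of freedom, namely mean $2\sqrt{2/\pi}$ and variance $3 - 8/\pi$. This is a sketch with an arbitrary seed and sample size; the variable names are illustrative.

```julia
using Random, Statistics, LinearAlgebra

Random.seed!(11)
N = 500_000
speeds = [norm(randn(3)) for n = 1:N]

# closed-form moments of the chi distribution with 3 degrees of freedom
mean_exact = 2 * sqrt(2 / π)
var_exact  = 3 - 8 / π

println("mean: ", mean(speeds), " vs ", mean_exact)
println("var:  ", var(speeds),  " vs ", var_exact)
```

These values should agree with `mean(D)` and `var(D)` for `D = Chi(3)`.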


### Fitting a distribution

Suppose that we have some random (or noisy) data, which we *believe* follows a certain distribution, but we don't know the correct parameters. Then `Distributions.jl` can determine the parameters for you: [Distribution Fitting](https://juliastats.github.io/Distributions.jl/latest/fit.html#Distribution-Fitting-1), implemented using [Maximum Likelihood estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation). We will not cover this method, but only use it.

In [None]:
?fit

**Warning:** Unfortunately, `fit` is not implemented for all distributions, e.g., it is not available for the $\chi$-distribution, so the following call throws an error.

In [None]:
Nsamples = 1_000_000
v = [ norm(randn(3))  for n = 1:Nsamples ]
# `fit` has no method for `Chi`, so this line is expected to fail
D = fit(Chi, v)

However, we can fit the *squared* speeds instead: they follow the $\chi^2$-distribution, which is a special case of the $\Gamma$-distribution, and `fit` is implemented for `Gamma`.

In [None]:
# create many random samples
Nsamples = 1_000_000
v = [ norm(randn(3))^2  for n = 1:Nsamples ]   
# fit the data `v` to the Γ distribution 
D = fit(Gamma, v)

In [None]:
# Visualise the result to make sure the fit is ok!
# Categorize data into bins and plot the height of the bins
histogram(v, bins = 100, normalize = true, label = "empirical")
# plot the probability density function on top 
x = range(0, 20, length = 200)
plot!(x, pdf.(D, x), lw=3, label="fit to Gamma", xlim=(-2, 22), ylim = (0,0.25))

# # REMARK: if we naively fit to a normal (e.g.) then this 
# #         is of course admissible but the fit will be poor!
# Dn = fit(Normal, v)
# xn = range(-2, 15, length = 200)
# plot!(xn, pdf.(Dn, xn), lw=3, ls = :dot, label="fit to Normal", color = :black)
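As a rough cross-check on the fitted parameters, we can estimate the shape and scale of a Gamma distribution by matching moments: a Gamma distribution with shape $k$ and scale $\theta$ has mean $k\theta$ and variance $k\theta^2$. Note that this is the method of moments, a simpler technique than the maximum likelihood estimation used by `fit`, so the numbers will be close to but not identical with those of `fit(Gamma, v)`. For the $\chi^2$-distribution with 3 degrees of freedom we expect $k = 3/2$ and $\theta = 2$. The variable names below are illustrative.

```julia
using Random, Statistics, LinearAlgebra

Random.seed!(5)
N = 200_000
v2 = [norm(randn(3))^2 for n = 1:N]    # squared speeds, chi-squared with 3 dof

m, s2 = mean(v2), var(v2)
shape_mom = m^2 / s2     # Gamma shape; expected near 3/2
scale_mom = s2 / m       # Gamma scale; expected near 2
println("shape ≈ ", shape_mom, ", scale ≈ ", scale_mom)
```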

### Remark on Random Number Generators

A word of warning: a computer algorithm cannot generate *genuine* random numbers; instead it generates so-called pseudo-random numbers. They look (and for most practical purposes act) like real random numbers, but there are some limitations. For more on RNGs see [Wikipedia](https://en.wikipedia.org/wiki/Random_number_generation) and for more on RNGs in Julia see [the documentation](https://docs.julialang.org/en/stable/stdlib/numbers/#Random-Numbers-1).

Look in particular at the documentation for `Random.seed!` (called `srand` in Julia versions before 1.0) in order to understand how to "seed" (i.e., initialise) random number generators.

In [None]:
using Random

In [None]:
?Random.seed!
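The practical consequence of seeding is reproducibility: resetting the generator to the same seed reproduces the same sequence of numbers. A minimal sketch, assuming Julia 1.0 or later (where the function is `Random.seed!`) and an arbitrary seed value:

```julia
using Random

Random.seed!(2718)
first_run = rand(3)

Random.seed!(2718)       # same seed again
second_run = rand(3)

println(first_run == second_run)   # prints true
```

This is useful for debugging and for making simulation results repeatable.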