# Monte Carlo methods: random number generation, *but with more flair*

The Monte Carlo methods are a family of methods that rely on random number generation (usually with specific constraints, such as following a given distribution), either to find an approximate solution to a problem (most famous examples: numerical integration and estimating the digits of $\pi$) or for the sake of the random numbers themselves (such as simulating experiments). The general downside is that a fairly large number of extractions is needed to improve accuracy when they're used for problem solving; the silver lining is that the general principle is *extremely* simple.

(or is it?)

## First taste of RNG: uniformly distributed generation

Let's declare 3 variables:
$$r_0 = 1 \qquad n = 5 \qquad m = 37$$
where $r_i$ is the i-th random value we extracted (with $r_0$ being the "seed" value), $n$ is a fixed multiplier and $m$ is the modulus, which sets a maximum value on our randomly generated numbers (they always fall below $m$). Later on we'll see that this value also puts a cap on the "randomness".
One first formula we can use for *pseudo*random number generation is
$$r_i = (n \cdot r_{i-1}) \bmod m$$



and with the values we chose, this table emerges

| entry | $r$ |
|---|---|
| 1 | 1 |
| 2 | 5 |
| 3 | 25 |
| 4 | 14 |
| 5 | 33 |
| 6 | 17 |
| 7 | 11 |
| 8 | 18 |
| 9 | 16 |
| 10 | 6 |
| 11 | 30 |
| 12 | 2 |



(and so on until the 37th entry cause the professor for whatever reason wasn't happy enough with just 10 entries being a lengthy enough example)

**This is pseudorandomness**. However unpredictable this sequence of numbers appears to be, it ***will repeat***, and in the same order: each value depends only on the previous one, and there are only $m-1$ possible nonzero values, so the sequence has to come back to its starting point after at most $m-1$ extractions. In the table above the period is exactly 36, so the 37th entry is 1 again. On top of that, there's no guarantee that the sequence is **uniformly distributed**, and there may be strong hidden correlations between the extracted values.
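A minimal sketch of the generator above, to reproduce the table and measure the period. The function names are made up for illustration, and the multiplier 5 is read off the table (each entry is 5 times the previous one, modulo 37).

```python
from itertools import islice


def multiplicative_sequence(seed, multiplier, modulus):
    """Yield the seed, then r_i = (multiplier * r_{i-1}) mod modulus, forever."""
    r = seed
    while True:
        yield r
        r = (multiplier * r) % modulus


def period(seed, multiplier, modulus):
    """Count how many steps it takes for the sequence to return to the seed."""
    r = seed
    for steps in range(1, modulus + 1):
        r = (multiplier * r) % modulus
        if r == seed:
            return steps
    raise ValueError("the sequence never returns to the seed")


first_entries = list(islice(multiplicative_sequence(1, 5, 37), 12))
print(first_entries)      # → [1, 5, 25, 14, 33, 17, 11, 18, 16, 6, 30, 2]
print(period(1, 5, 37))   # → 36
```

The period can never exceed $m-1$ here, which is why real generators of this kind use a very large modulus.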

To improve on the randomness (which still does not achieve *true* randomness, but we'll take it) we may choose to constantly change $n$ and $m$ during the generation. An example of this is the (whatshisname) method, which works with 3 seeds: it's slow, but it has an extraordinarily large period (on the order of $10^{12}$). There are better methods for this.

---

### Altering the uniform distribution

We start by setting

$$ 
0 < r_i < 1 \\
s_m < s_i < s_M
$$

where $s_i$ is the rescaled value and $s_m$, $s_M$ are the bounds we want for it. This second "hypothetical" distribution has width $\Delta s = s_M - s_m$.

$$ s_i = r_i\Delta s  + s_m $$

(or, alternatively, we can count down from the upper bound: $s_i = s_M - r_i\Delta s$)

This gives us a number from a uniform distribution that's been stretched to width $\Delta s$ and shifted so that it starts at $s_m$.
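As a quick sketch of the rescaling, assuming Python's standard `random.random()` as the source of uniform values (the function name `rescale` and the bounds $-1$ and $1$ are just picked for illustration):

```python
import random


def rescale(r, s_min, s_max):
    """Map a value r from the unit interval onto the interval from s_min to s_max."""
    return r * (s_max - s_min) + s_min


# 1000 uniform values stretched from [0, 1) to [-1, 1)
samples = [rescale(random.random(), -1.0, 1.0) for _ in range(1000)]
print(min(samples), max(samples))
```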

---




## Transformation method, lookup table method and rejection sampling

We may constrain randomness well beyond uniform distributions: we can force number extractions that follow a more general distribution. The key idea is that probability is conserved under a change of variable: the chance of extracting a uniform number below $r$ has to equal the chance of extracting a value below the corresponding $x$ in the target distribution. Recalling notions from lab1:
$$ \int_{-\infty}^r P(r') dr' = \int_{-\infty}^x g(x')dx' $$

where $P(r)$ is our known distribution so far, the uniform one on $(0,1)$ (so $P(r) = 1$ there and $0$ elsewhere), which means the integral above may be rewritten as

$$
\int_{0}^r dr' = \int_{-\infty}^x g(x')dx' \\
r_i =  \int_{-\infty}^{x_i} g(x')dx'
$$

with $g$ being a generic distribution function. This is an *integral equation* (counterpart to a *differential equation*): given $r_i$, we solve it for $x_i$.

Let's take an example target distribution (playing the role of $g$ above), which we'll call $P(x)$:

$$ P(x)=
\begin{cases}
A(1+ax^2) \quad &-1 <x< 1 \\
0 \quad &\text{otherwise}
\end{cases}
$$

$$\int_{-1}^x A(1+ax^2)dx = A\int_{-1}^x (1+ax^2)dx = A\left[x+\frac{ax^3}{3}\right]_{-1}^x = r$$

and we're solving this last equation for $x$. Expanding the definite integral gives $A\left(x + \frac{ax^3}{3} + 1 + \frac{a}{3}\right) = r$, which is nonlinear in $x$, so we *could* employ one of our known root-finding algorithms (at a glance this is a cubic, so it's analytically solvable, but the idea applies in general to more complicated functions). However, that comes with the risk of creating correlation (lowering the "quality" of randomness).
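Here is a rough sketch of the root-finding route, using plain bisection as the root finder (any other algorithm would do). The value $a = 0.5$ is an arbitrary choice for illustration, and $A = 1/(2 + 2a/3)$ comes from requiring the integral over $(-1, 1)$ to equal 1.

```python
import random


def cdf(x, a):
    """Integral of A (1 + a x^2) from -1 to x, with A fixed by normalization."""
    A = 1.0 / (2.0 + 2.0 * a / 3.0)
    return A * (x + a * x**3 / 3.0 + 1.0 + a / 3.0)


def sample_by_inversion(r, a, tolerance=1e-12):
    """Solve cdf(x) = r for x by bisection on [-1, 1]."""
    low, high = -1.0, 1.0
    while high - low > tolerance:
        middle = 0.5 * (low + high)
        if cdf(middle, a) < r:
            low = middle   # the solution lies to the right of middle
        else:
            high = middle  # the solution lies to the left of middle
    return 0.5 * (low + high)


a = 0.5  # arbitrary shape parameter (needs a > -1 so that P(x) stays positive)
xs = [sample_by_inversion(random.random(), a) for _ in range(5)]
print(xs)
```

Bisection works here because the integrated function only ever grows with $x$, so there is exactly one solution for each $r$.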

Another path is sampling the integrated function at regular steps, obtaining a table of $x_i$ values with the corresponding cumulative values, and then turning a random value $r$ into an $x$ through interpolation between the two nearest table entries. This is called the *lookup table method*, and it's sometimes used. However, the interpolation may introduce some bias.
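A possible sketch of the lookup table idea for the same example distribution, with linear interpolation between table entries. The number of table points (201) and $a = 0.5$ are arbitrary choices, and the helper names are invented for this example.

```python
import bisect
import random


def cdf(x, a):
    """Integral of A (1 + a x^2) from -1 to x, with A fixed by normalization."""
    A = 1.0 / (2.0 + 2.0 * a / 3.0)
    return A * (x + a * x**3 / 3.0 + 1.0 + a / 3.0)


def build_table(a, points=201):
    """Tabulate the cumulative values at evenly spaced x in [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (points - 1) for i in range(points)]
    cs = [cdf(x, a) for x in xs]
    return xs, cs


def sample_from_table(r, xs, cs):
    """Find the table interval that contains r and interpolate linearly inside it."""
    j = bisect.bisect_right(cs, r)
    j = min(max(j, 1), len(cs) - 1)  # clamp so that j - 1 and j are valid indices
    fraction = (r - cs[j - 1]) / (cs[j] - cs[j - 1])
    return xs[j - 1] + fraction * (xs[j] - xs[j - 1])


a = 0.5
xs, cs = build_table(a)
print([sample_from_table(random.random(), xs, cs) for _ in range(5)])
```

The table is built once and every extraction afterwards costs only a binary search, which is the appeal of the method; a coarser table makes the interpolation bias larger.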

---

We can create a so-called rectangular "envelope" (IT: inviluppo) around the curve we're using and generate a pair of values $(r_1, r_2)$ that we map to a point $(x_1, y_1)$ inside the rectangle:
$$ r_1 \rightarrow x_1 =  r_1 \Delta x + x_{min}$$
and similarly $r_2 \rightarrow y_1 =  r_2 \Delta y + y_{min}$.
In both cases, $\Delta$ represents the width of the considered interval.

In our current case

$$x_1 = r_{1} \cdot 2 + (-1)$$
2 being the width of the interval (-1 to 1) and -1 being $x_{min}$.

What we're doing is keeping all the random points that fall *below* the curve and discarding those above: once we obtain the random $x_1$ and $y_1$, we check that $y_1 < g(x_1)$, in which case we keep $x_1$. This is called *rejection sampling*. It's a good method, but a wasteful one: $y_1$ is only used for the inequality check, and every rejected point is a wasted pair of extractions. The efficiency (the fraction of points we keep) equals the area under the curve divided by the area of the rectangle, which can easily be 50% or lower.
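A small sketch of rejection sampling for the example curve $1 + ax^2$ on $(-1, 1)$. The normalization constant is left out on purpose, since it only rescales the curve and the rectangle together; $a = 0.5$ and the function names are arbitrary choices for illustration.

```python
import random


def shape(x, a):
    """Unnormalized example curve 1 + a x^2."""
    return 1.0 + a * x * x


def rejection_sample(a, count):
    """Return `count` accepted x values and the measured efficiency."""
    x_min, x_max = -1.0, 1.0
    y_max = max(1.0, 1.0 + a)  # top of the rectangle: the highest point of the curve
    accepted, tries = [], 0
    while len(accepted) < count:
        tries += 1
        x = random.random() * (x_max - x_min) + x_min
        y = random.random() * y_max
        if y < shape(x, a):    # the point fell below the curve: keep x
            accepted.append(x)
    return accepted, len(accepted) / tries


xs, efficiency = rejection_sample(0.5, 5000)
print(efficiency)
```

For $a = 0.5$ the area under the curve is $2 + 2a/3 = 7/3$ and the rectangle has area $2 \cdot 1.5 = 3$, so the measured efficiency should land near $7/9$.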

## Gaussian distributed values

Let's say we add up $N$ of the usual random values $r_i$, each between 0 and 1, so the sum ranges from $0$ to $N$. The average value of the sum will of course be $\mu = N/2$. A value $r_g$ that (approximately) belongs to a Gaussian distribution is then given by

$$r_g = \sum^N_{i=1} r_i - \mu$$

The width $\sigma$ of the curve will be $\sqrt{N/12}$: a single uniform value between 0 and 1 has variance $1/12$, and the variances of independent values add up. So for a Gaussian centered in 0 and of width 1 we need $\sigma = 1 \implies N=12$, and therefore $\mu = 6$.

This is *spectacularly inefficient*, as we burn 12 uniform values to get a single Gaussian one.
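A quick sketch to check the claim numerically, assuming `random.random()` for the uniform values; the sample size of 20000 is arbitrary.

```python
import random
import statistics


def gaussian_from_uniforms(n_terms=12):
    """Sum n_terms uniform values and subtract the mean n_terms / 2.

    The width is sqrt(n_terms / 12), so only n_terms = 12 gives width 1.
    """
    return sum(random.random() for _ in range(n_terms)) - n_terms / 2.0


values = [gaussian_from_uniforms() for _ in range(20000)]
print(statistics.mean(values), statistics.stdev(values))  # should be close to 0 and 1
```

Note that the values can never leave the range from $-6$ to $6$, so the tails of the true Gaussian are cut off.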

## Box-Muller method
Let's take the standard Gaussian probability density function *in one variable*:

$$G(x) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{x^2}{2}}$$

We can actually create a Gaussian function in 2 variables: recalling again notions from lab1, the probability of obtaining a tuple of independent values equals the product of the probabilities of obtaining each single value

$$G(x,y) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{x^2}{2}}\cdot\frac{1}{\sqrt{2 \pi}} e^{-\frac{y^2}{2}} = \frac{1}{2 \pi} e^{-\frac{x^2+y^2}{2}}$$

and graphically speaking we may examine just a cross section of said surface: this means we're just seeing a circle (with a certain radius $R$) on the cartesian xy plane. Let's swap from cartesian to polar coords $(r, \theta)$. The surface only depends on $r$, so the angle $\theta$ is just a random number taken from a uniform distribution over the $[0, 2\pi]$ interval, and what's left to find is how the radius is distributed, i.e. the probability that a random point falls inside the circle, $r<R$. This is obtained with the usual probability calculation:
$$P(r<R) = \iint_{x^2+y^2<R^2}\frac{1}{{2 \pi}}e^{-\frac{x^2+y^2}{2}}dxdy=\frac{1}{{2 \pi}}\int_0^{2\pi}\int_0^R e^{-r^2/2} rdrd\theta = \int_0^R e^{-r^2/2} rdr$$
where the integral wrt $\theta$ cancels out with the $1/2\pi$. Substituting $r^2/2 = s$ yields $ds = rdr$, so

$$\int_0^R e^{-r^2/2} rdr = \int_0^{R^2/2} e^{-s}ds = 1-e^{-R^2/2}$$

which is the probability we were looking for.

To avoid confusion with the radius, the random number in the uniform distribution (0,1] will now be called $u_i$. Setting this probability equal to a uniform random number gives $1-e^{-R^2/2} = 1 - u_i$, where we're free to write $1-u_i$ instead of $u_i$ because both are uniformly distributed. Dropping the capital letter (from now on $r$ is the radius we extract),

$$u_i = e^{-r^2/2} \implies r = \sqrt{-2\ln {u_i}}$$

and going back to cartesian coordinates we finally obtain the Box-Muller method.

$$x_i = \sqrt{-2\ln {u_i}} \cos(2\pi u_{i+1})\\ y_i=\sqrt{-2\ln {u_i}} \sin(2\pi u_{i+1})$$

So the process is:
- We generate a pair of random numbers $u_{1,2} \in (0,1]$ in the uniform distribution
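- Plug those into the formulas and obtain the pair $g_1(= x), g_2(= y)$, two independent Gaussian values with mean 0 and width 1. In a simulation, what these values do is basically take a point that follows the exact specified law and move it up or down (after multiplying by the desired $\sigma$) to simulate it having "real error"/being gaussian-distributed around the exact function
- Move forward to the next points and reapply, with a fresh pair of uniform numbers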

### Practical example

We have a 10 cm long rod (hehe), one end of it is at 0 K (physically impossible but it's just an example), the other end is at 100 K. We're assuming the temperature along the rod grows linearly as $T(x) = a+bx$, with $T(0) = 0, T(10) = 100$, ergo $a=0$ and $b=10$ K/cm. There's an uncertainty $\sigma_T$ of 1 K, which is the error we associate with the simulated temperatures.

We "took" (or pretend we're taking) 10 measurements, taking $x_{T0} = 0.5 \text{ cm}$ as the first point and using a 1 cm step.

Code example:

```python
import math
import random

sigma_T = 1.0  # uncertainty on each simulated temperature, in K
step = 1.0     # spacing between measurement points, in cm (NOT the same thing as sigma_T: both just happen to be 1 here)
length = 10.0  # length of the rod, in cm
a = 0.0        # intercept of T(x) = a + b*x, in K
b = 10.0       # slope of T(x), in K/cm


def T(x):
    """Exact temperature law: grows linearly from one end of the rod to the other."""
    return a + b * x


def box_muller(u1, u2):
    """Turn two uniform numbers (u1 in (0, 1]) into two independent Gaussian numbers (mean 0, width 1)."""
    # -2*log(u1) is never negative for u1 in (0, 1], so the square root is always real
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


x = 0.5  # first measurement point, in cm
print("x\tT")
while x <= length:
    # draw a fresh pair at every cycle, otherwise every point gets the same offset;
    # random.random() lies in [0, 1), so 1 - random.random() lies in (0, 1] and log never sees 0
    u1 = 1.0 - random.random()
    u2 = random.random()
    g1, g2 = box_muller(u1, u2)

    T_random1 = T(x) + sigma_T * g1
    T_random2 = T(x + step) + sigma_T * g2  # each cycle spits out 2 random temperature values, neat!
    print(f"{x}\t{T_random1}\n{x + step}\t{T_random2}")
    x += 2 * step  # each cycle we move 2 steps ahead to generate the next pair
```

The printed values change at every run: ten $(x, T)$ pairs, from $x = 0.5$ to $x = 9.5$, each scattered around $T = 10x$ with a spread of about 1 K.


## Poisson distributed values

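$$P(x) = \frac{\mu^x}{x!}e^{-\mu}, \quad x \in \{0, 1, 2, \dots\}$$
Usual business, except that $x$ is discrete, so the integral becomes a sum:
$$r = \sum_{x=0}^n P(x) = \sum_{x=0}^n \frac{\mu^x}{x!}e^{-\mu}$$

which converges to 1 for $n \to \infty$.

And we just opt for the look-up table for simplicity: given a random $r$, the extracted value is the first $n$ whose partial sum exceeds $r$.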

For $\mu = 8.4$ (arbitrarily chosen, could be anything else):

- $n$ is our "index" as well as the number of events we're measuring the probability of
- $p_n$ is the Poisson probability of obtaining the number $n$ (so basically $p_n = P(n)$)
- $s_n$ is the partial sum up to term $n$

| $n$ | $p_n$ | $s_n$ |
|---|---|---|
| 0 | 0.0002 | 0.0002 |
| 1 | 0.0019 | 0.0021 |
| 2 | 0.0079 | 0.0100 |
| 3 | 0.022 | 0.032 |
| 4 | 0.047 | 0.079 |
| 5 | 0.078 | 0.157 |
| 6 | 0.11 | 0.267 |
| 7 | ... | ... |

From entry 9 onwards, $p_n$ decreases: each term is the previous one times $\mu/n$, and $9 > \mu$.
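The table and the extraction step could be sketched like this. The cutoff `n_max = 40` is an arbitrary choice (the probabilities beyond it are negligible for $\mu = 8.4$), and the function names are invented for this example.

```python
import bisect
import math
import random


def poisson_table(mu, n_max):
    """Return the probabilities p_n and the partial sums s_n for n = 0..n_max."""
    probabilities, partial_sums = [], []
    p = math.exp(-mu)  # p_0
    total = 0.0
    for n in range(n_max + 1):
        if n > 0:
            p *= mu / n  # p_n = p_{n-1} * mu / n
        total += p
        probabilities.append(p)
        partial_sums.append(total)
    return probabilities, partial_sums


def sample_poisson(r, partial_sums):
    """Return the first n whose partial sum exceeds r (capped at the end of the table)."""
    return min(bisect.bisect_right(partial_sums, r), len(partial_sums) - 1)


p, s = poisson_table(8.4, 40)
for n in range(7):
    print(n, round(p[n], 4), round(s[n], 4))

print([sample_poisson(random.random(), s) for _ in range(10)])
```

Building each term from the previous one avoids computing large factorials, and it is the same $\mu/n$ ratio that decides where the probabilities start to decrease.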