# Homework 12: sampling with random walks/MCMC in low and high dimension, Brownian motion and diffusion

In this homework, you will apply some of the concepts seen in lectures 11 and 12. In exercise 1, you will estimate $\pi$ with Markov chains. Exercise 2 is optional and is a simple illustration of the curse of dimensionality. In exercise 3, you will use MCMC to sample a high-dimensional distribution and discuss the detailed balance condition. In exercise 4, you will numerically approximate the solution of the diffusion equation using random walks and compare it to the analytical solution.

In [3]:
import numpy as np
import matplotlib.pyplot as plt

# Exercise 1: Estimating $\pi$ by sampling the square with Markov chains.

In exercise 8 you estimated $\pi$ by sampling points uniformly on the square and counting the number of points that fell in a circle. In this exercise, we will solve the same problem using Markov chains.

**1.1** We want to sample points uniformly on a square and estimate $\pi$ from the fraction of points that fall inside the unit circle. We will first try to solve this problem the **wrong**, but perhaps more intuitive, way. We use a Markov chain to sample the points. The chain $(\mathbf{x}(t=0), \mathbf{x}(t=1), ..., \mathbf{x}(t=T))$ has the following behavior:
1. Initialize the position of the walker in $\mathbf{x}(t=0)=(1,1)$.
2. Draw a step $\mathbf{\Delta} =(\Delta_x , \Delta_y )$ from a given distribution.
   - If the step brings the walker outside of the square, resample $\mathbf{\Delta} =(\Delta_x , \Delta_y )$ and repeat from step 2.
   - Otherwise, the walker moves and the new position is $\mathbf{x}(t+1)=\mathbf{x}(t)+\mathbf{\Delta}$
3. Sample the point $\mathbf{x}(t+1)$ **(i.e. we sample only when the walker moved)**.
4. Repeat from step 2 until $T$ is reached.

Implement this algorithm with $\Delta_x$ and $\Delta_y$ sampled uniformly between $-0.1$ and $0.1$. The algorithm should output the chain as a $(T+1)\times 2$ array and the total number of points $N_{in}$ inside the unit circle until $t$ as an array of size $T$.
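The steps above can be sketched as follows. This is a minimal sketch rather than a reference solution: it assumes the square is $[-1, 1]^2$ (consistent with the starting point $(1,1)$ and the factor 4 in the estimator of $\pi$), and the name `chain_resample` and the `step` and `seed` parameters are hypothetical.

```python
import numpy as np

def chain_resample(T, step=0.1, seed=None):
    """'Wrong way' walk: redraw any step that would leave the square [-1, 1]^2."""
    rng = np.random.default_rng(seed)
    chain = np.empty((T + 1, 2))
    chain[0] = (1.0, 1.0)            # start in the corner of the square
    n_in = np.empty(T, dtype=int)    # n_in[t] = points inside the circle among x(1..t+1)
    count = 0
    for t in range(T):
        # Keep drawing steps until the proposed position stays in the square.
        while True:
            proposal = chain[t] + rng.uniform(-step, step, size=2)
            if np.all(np.abs(proposal) <= 1.0):
                break
        chain[t + 1] = proposal
        count += int(proposal @ proposal <= 1.0)
        n_in[t] = count
    return chain, n_in

chain, n_in = chain_resample(1000, seed=0)
print(chain.shape, n_in.shape)
```

With this scheme the walker moves at every time step, which is the feature that the rest of the exercise examines.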


In [4]:
# Your solution here:

**1.2** Simulate $3$ chains with $T=10^5$ and plot your estimation of $\pi$ after each step for each of the $3$ chains. The estimation of $\pi$ after $t$ steps is given by
\begin{equation}
    \pi (t) \approx \frac{4 N_{in}}{t}.
\end{equation}
Also draw a line where the real value of $\pi$ is. Do you find the correct value of $\pi$ ?

In [5]:
# Your solution here:

**1.3** We will now do it the **correct way**. Through the detailed balance condition, we can design the random evolution $\mathbf x(t) \to \mathbf x(t+1)$ such that the stationary distribution of $\mathbf x(t)$ is the distribution we want.
The detailed balance condition reads
$$
    P(a \to b) \pi(a) = P(b \to a) \pi(b) \quad \forall \,a,b
$$
where $P(a \to b)$ is the transition probability of going from $a \to b$, $\pi$ is the distribution we want to sample from (the stationary distribution of our MCMC), and $a,b$ are possible states of our random walker.
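To make the condition concrete, here is a small numerical check on a toy problem that is not part of the exercise: a walker on the hypothetical discrete state space $\{0, \dots, 4\}$ that proposes a step of $\pm 1$ with equal probability and stays in place when the proposal falls outside. The sketch builds the transition matrix and checks detailed balance with respect to the uniform distribution.

```python
import numpy as np

n = 5  # hypothetical toy state space {0, ..., 4}
P = np.zeros((n, n))
for a in range(n):
    for b in (a - 1, a + 1):
        if 0 <= b < n:
            P[a, b] = 0.5       # accepted move a -> b
        else:
            P[a, a] += 0.5      # rejected move: the walker stays at a

pi = np.full(n, 1.0 / n)        # uniform target distribution

flow = pi[:, None] * P          # flow[a, b] = pi(a) P(a -> b)
print(np.allclose(flow, flow.T))    # detailed balance holds for every pair (a, b)
print(np.allclose(pi @ P, pi))      # hence pi is stationary
```

Both checks print `True`: the matrix of probability flows is symmetric, and the uniform distribution is left unchanged by one step of the chain.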

Write the detailed balance condition for $a = \mathbf x$, $b = \mathbf x + \Delta$, supposing that both $a$ and $b$ are in the square. In other words, we are considering the case in which we propose a step for our random walker, and the step keeps it in the square. What does it imply for $P(a\to b)$? Answer in 2 lines.

Your answer here:

**1.4** Write the detailed balance condition for $a = \mathbf x$, $b = \mathbf x + \Delta$, supposing that $a$ is in the square, but $b$ is *not* in the square. In other words, we are proposing a step for our random walker that would bring it out of the square. What does it imply for the transition probability? Answer in two lines.

Your answer here:

**1.5** From 1.3 and 1.4 we obtain the following algorithm. It is the same as before except for the fact that we sample the point whether the walker moved or not:
1. Initialize the position of the walker in $\mathbf{x}(t=0)=(1,1)$.
2. Draw a step $\mathbf{\Delta} =(\Delta_x , \Delta_y )$ from a given distribution.
   - If the step brings the walker outside of the square, the walker doesn't move: $\mathbf{x}(t+1)=\mathbf{x}(t)$. (see 1.4)
   - Otherwise, the walker moves and the new position is $\mathbf{x}(t+1)=\mathbf{x}(t)+\mathbf{\Delta}$ (see 1.3)
3. Sample the point $\mathbf{x}(t+1)$ **whether the walker moved or not**.
4. Repeat from step 2 until $T$ is reached.
   
Implement this algorithm with $\Delta_x$ and $\Delta_y$ sampled uniformly between $-0.1$ and $0.1$. The algorithm should output the chain as a $(T+1)\times 2$ array and the total number of points inside the circle until $t$ as an array of size $T$. Additionally, your function should output an array of length $T$ containing $1$ if the walker is inside the circle at time $t$ and $0$ otherwise (this will be useful for the rest of the exercise).
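One possible way to write the rejection version is sketched below. As before, it assumes the square is $[-1, 1]^2$; the name `chain_reject` and the `step_x`, `step_y` and `seed` parameters are hypothetical (separate step sizes are convenient for 1.8).

```python
import numpy as np

def chain_reject(T, step_x=0.1, step_y=0.1, seed=None):
    """Walk on the square [-1, 1]^2 where rejected proposals count as samples."""
    rng = np.random.default_rng(seed)
    chain = np.empty((T + 1, 2))
    chain[0] = (1.0, 1.0)
    inside = np.empty(T, dtype=int)   # 1 if x(t+1) is in the unit circle, else 0
    for t in range(T):
        delta = rng.uniform((-step_x, -step_y), (step_x, step_y))
        proposal = chain[t] + delta
        # A proposal outside the square is rejected: the walker stays in place.
        chain[t + 1] = proposal if np.all(np.abs(proposal) <= 1.0) else chain[t]
        inside[t] = chain[t + 1] @ chain[t + 1] <= 1.0
    n_in = np.cumsum(inside)          # running count of points inside the circle
    return chain, n_in, inside

chain, n_in, inside = chain_reject(10_000, seed=0)
print(4 * n_in[-1] / len(n_in))       # running estimate of pi at the last step
```

The only difference with the resampling scheme is the `else chain[t]` branch: a rejected step still produces a sample at the current position.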


In [6]:
# Your solution here

**1.6** Apply your algorithm for $T=300$ steps and plot the obtained chain. Set the aspect ratio to 'equal' to have a better visualization, and optionally also plot the unit circle.

In [7]:
# Your solution here:

**1.7** Repeat 1.2 with the new algorithm. This time it should converge to $\pi$.

In [8]:
# Your solution here:

**1.8** Do the same but this time sample $\Delta_y$ from a uniform distribution between $-1$ and $1$ (while keeping $\Delta_x$ uniform between $-0.1$ and $0.1$). Do you still converge to $\pi$ ?

In [9]:
# Your solution here:

**1.9** We now discuss how to estimate the error on the value of $\pi$. Consider again that $\Delta_x$ and $\Delta_y$ are sampled uniformly between $-0.1$ and $0.1$. Run a Markov chain of length $T=10^5$ steps. Suppose that each point of the random walk is sampled i.i.d. uniformly on the square. Give the obtained value of $\pi$ and its error. Recall that you can estimate the error using $\frac{1}{\sqrt{T}} s$ where $s^2$ is the unbiased empirical estimator of the variance (see ex. 1.1 of homework 9 if you have doubts on how to compute the error). Alternatively, you can use the variance obtained in section 8.2 of the lecture notes. Is the true value of $\pi$ within the error estimate? Does the obtained error seem reasonable to you?

In [10]:
# Your solution here:

**1.10** We can obtain a better estimation of the error on $\pi$ by running multiple Markov chains and computing an estimation of $\pi$ for each of them. If the chains are long enough, then the values obtained for the different chains are i.i.d. Compute the mean value of $\pi$ after $T=10^4$ steps for $k=10$ runs and its error. We now compute the error as $\frac{1}{\sqrt{k}} s$ where $s^2$ is the unbiased empirical estimator of the variance of our $k$ values of $\pi$.
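The error formula $\frac{1}{\sqrt{k}} s$ can be sketched with NumPy as below. The helper name `mean_and_error` is hypothetical, and the input values are arbitrary toy numbers chosen only so that the arithmetic is easy to follow; they are not estimates of $\pi$.

```python
import numpy as np

def mean_and_error(values):
    """Return the sample mean and its standard error s / sqrt(k)."""
    values = np.asarray(values, dtype=float)
    s = values.std(ddof=1)            # ddof=1 gives the unbiased variance estimator
    return values.mean(), s / np.sqrt(values.size)

# Toy input: k = 4 arbitrary values.
mean, err = mean_and_error([1.0, 2.0, 3.0, 4.0])
print(mean, err)
```

For these toy values the mean is $2.5$, the unbiased variance is $5/3$, and the error is $\sqrt{5/3}/2 \approx 0.645$. In 1.10 the input would instead be the $k$ final estimates of $\pi$, one per chain.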

In [11]:
# Your solution here:

# Exercise 2 (optional): Curse of dimensionality

In this exercise, we present the curse of dimensionality with a simple example.

**2.1** Consider a hypercube with side of length $0.9$ and a hypercube with side of length $1$. What is the ratio of the volume of the small hypercube compared to the large hypercube for $d=1,2,5,10, 25, 50$ ?
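Since the volume of a hypercube of side $\ell$ in dimension $d$ is $\ell^d$, the ratio is $0.9^d$. A short sketch of the computation, with hypothetical variable names:

```python
dims = [1, 2, 5, 10, 25, 50]

# Volume ratio of a hypercube of side 0.9 to one of side 1 in dimension d.
ratios = {d: 0.9 ** d for d in dims}

for d, r in ratios.items():
    print(f"d = {d:2d}: ratio = {r:.4f}")
```

The ratio decays exponentially with $d$, so in high dimension almost all of the volume of the large hypercube lies outside the small one.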

In [12]:
# Your solution here:

# Exercise 3: sampling with MCMC in high dimension (adapted from the mock graded exercise of 2024)

In high dimensions, the curse of dimensionality may hinder the effectiveness of sampling algorithms that work just fine in low dimension. In the previous exercise, we also saw that "high dimension" can mean as little as 10 dimensions!

Let us recall some definitions first. The $d$-dimensional unit cube $H_d$ is the set $H_d = [-1, 1]^d \subset \mathbb{R}^d$, i.e. the centered cube with side of length 2.
The $d$-dimensional disk $D_d$ is the set $D_d = \{ x \in \mathbb{R}^d \text{ such that } ||x|| \leq 1\} \subset \mathbb{R}^d$. 
The $d$-dimensional Euclidean norm is defined as 
$$
    ||x||^2 = \sum_{i=1}^d x_i^2 \, .
$$

In this exercise, we will implement Markov Chain Monte Carlo (MCMC) methods that allow us to sample from a high-dimensional distribution.
We will sample points uniformly distributed in the disk using MCMC: we perform a carefully engineered random walk for $T$ time steps in the disk, such that its stationary distribution is the uniform distribution on the disk.

We will then use the samples that we obtain to compute the average length of a vector uniformly distributed on the disk, i.e. to compute
$$
    \mathbb{E}_{x \sim D_d} [||x||] = \frac{
        \int_{||x|| \leq 1} d^dx \, ||x||
    }{
        \int_{||x|| \leq 1} d^dx 
    }
$$
where the notation $\mathbb{E}_{x \sim D_d} f(x)$ is a shorthand for "average of the function $f(x)$ for $x$ uniformly distributed over the set $D_d$". 
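
As a side remark, this average can be computed in closed form. The sketch below assumes the standard fact that, in spherical coordinates, the volume element of $\mathbb{R}^d$ is proportional to $r^{d-1}\,dr$ with $r = ||x||$; the angular factor cancels between numerator and denominator:
$$
    \mathbb{E}_{x \sim D_d} [||x||] = \frac{\int_0^1 r \, r^{d-1} \, dr}{\int_0^1 r^{d-1} \, dr} = \frac{1/(d+1)}{1/d} = \frac{d}{d+1} \, .
$$
For $d=2$ this gives $2/3$, and the average approaches $1$ as $d$ grows, because most of the volume of the disk sits near its boundary.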


In class, and in exercise 1, we learned that it is possible to use Markov Chain Monte Carlo (MCMC) to sample from a generic distribution. 
MCMC is first of all a random walk, i.e. a sequence of points $x(t) \in \mathbb{R}^d$ that are generated one after the other with some kind of randomness. 
It is also a Markov Chain, which means that the evolution $x(t) \to x(t+1)$ is random, and that $x(t+1)$ depends only on $x(t)$, and not on all previous positions of the chain.

Let us design an MCMC algorithm to sample from $\pi = $ uniform distribution on the disk, i.e.
$$
    \pi(x) = \begin{cases}
        \frac{1}{\Omega_d} & \text{if } x \in D_d \\
        0 & \text{otherwise}
    \end{cases}
$$
where $\Omega_d$ is the volume of the $d$-dimensional disk $D_d$ (and acts as a normalisation of our p.d.f. $\pi$).
We consider a simple random walker, defined by $x(t=0) = 0 \in \mathbb{R}^d$, and $x(t+1) = x(t) + \Delta$.
$\Delta$ is our evolution step, and is a random variable, independent and identically distributed with p.d.f. $\rho(\Delta)$ at each different time step.
The only freedom we have left in the design of the random walk is the distribution $\rho(\Delta)$, so we need to use the detailed balance condition to fix $\rho$ in order to sample from $\pi$.

We saw in exercise 1 that the detailed balance condition gives us the following MCMC algorithm, which is guaranteed to sample the uniform distribution on the disk $D_d$ (after possibly some time steps of equilibration).
- Initialize $x(t=0) = 0$
- Generate randomly an increment $\Delta \in \mathbb{R}^d$ using a symmetric p.d.f. $\rho$ (see exercise 1).
- If $x(t) + \Delta \in D_d$, then set $x(t+1) = x(t)+\Delta$
- If $x(t) + \Delta \notin D_d$, then set $x(t+1) = x(t)$ (see exercise 1)
- Repeat up to a total time $T$.

**Comment:** notice that one could also use a non-symmetric $\rho$, but a symmetric $\rho$ is a simple way of implementing the symmetry of the transition probability.

In what follows, we set $\rho$ to be the uniform distribution in the interval $[-c, c]^d$.
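
The algorithm above, with $\rho$ uniform on $[-c, c]^d$, can be sketched as follows. The name `mcmc_sketch`, the `seed` parameter, and the output shape $(T+1) \times d$ (one row per time step, including $t=0$) are assumptions made for illustration.

```python
import numpy as np

def mcmc_sketch(d, T, c, seed=None):
    """Random walk in the unit disk D_d with uniform steps in [-c, c]^d."""
    rng = np.random.default_rng(seed)
    x = np.zeros((T + 1, d))          # x[0] = 0 is the starting point
    for t in range(T):
        proposal = x[t] + rng.uniform(-c, c, size=d)
        # Accept the move only if the proposal stays in the disk.
        x[t + 1] = proposal if proposal @ proposal <= 1.0 else x[t]
    return x

samples = mcmc_sketch(d=2, T=20_000, c=0.5, seed=0)
print(np.linalg.norm(samples, axis=1).mean())   # empirical average of ||x||
```

Consecutive rows of the output are correlated, so the empirical average converges more slowly than it would with the same number of independent samples.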

**3.1** Write a function that implements the MCMC algorithm. It takes as input $d, T, c$ and returns an array of the positions $x(t)$ in the same format as the output of `direct_sampling`. Print the output of the function for $d=2$, $T=5$ and $c=1$.

In [13]:
# Your code here

**3.2** Write a function that computes the empirical average of the norm $||x||$ on the output of the function `mcmc`. Print its output on a dataset produced with `mcmc` and $d=2$, $T=5000$, $c=0.2$. Print also the corresponding theoretical value, which is $d/(d+1)$.

In [14]:
# Your code here

**3.3** 
Generate a dataset using `mcmc` with $d=2$, $T=10000$ and $c=0.5$.
For `k in np.arange(start=10, stop=10000, step=100)`, compute the empirical estimator for the norm using the first `k` points of the dataset.
Plot the value of the empirical estimator as a function of `k`.
Add a horizontal line with the theoretical value.

In [15]:
# Your code here

**3.4** Answer the following questions in one line per question.
1. Compute the expected square distance $||x(t+1)-x(t)||^2$, i.e. the square of how far the walker moves at each step.
2. How should we set $c$ in order to have expected square distance $||x(t+1)-x(t)||^2 = 1/2$? 

Your answer here:

**3.5** 
Reproduce the plot of 3.3 for `d = [3,5,10,15,20,100]`. Plot each value of $d$ on a different plot. Use $T = 10000$ and $c = 0.5 / \sqrt{d}$ (so that the step length does not scale with $d$, see 3.4).

In [16]:
# Your code here

**3.6** Does the performance of MCMC degrade as the dimension $d$ increases ? Why ? 

Your answer here:

# Exercise 4: Brownian motion and Diffusion

In lecture 11, you saw dynamics of the form
$$
x(t+\tau)=x(t)+\Delta
$$
with $\Delta$ drawn i.i.d. at each time step from a given distribution. This dynamics gives rise to the diffusion equation
$$
\frac{\partial \rho}{\partial t}(x,t)=D \frac{\partial^2\rho}{\partial x^2}(x,t)\,\text{with}\,\, D=\frac{\text{Var}(\Delta)}{2\tau}.
$$
In this exercise, we will find a numerical solution to the diffusion equation by estimating $\rho$ from random walks.

**4.1** Consider a particle in 1 dimension. Set $\tau=1$; the discrete dynamics of the particle is then given by
$$
x(t+1)=x(t)+\Delta.
$$
Suppose that $\Delta=\pm 1$ with equal probability, independently at each time step. Code a function that performs this random walk and returns an array of the positions of the particle after each time step, for a maximum of $T=1000$ time steps. Initialize the position at $x(t=0)=0$. Repeat the experiment 3-5 times and plot the obtained trajectories on a graph with the time on the x-axis and the position on the y-axis.

Hint: To generate $\Delta$ you can define a random number generator using for instance `rng = np.random.default_rng()` and then use the function `rng.choice`.
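
A minimal sketch of the hint, with hypothetical variable names: `rng.choice` draws all $T$ steps at once, and a cumulative sum turns the steps into positions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000

steps = rng.choice([-1, 1], size=T)                  # T independent steps of -1 or +1
positions = np.concatenate(([0], np.cumsum(steps)))  # x(0) = 0, then x(1), ..., x(T)

print(positions[:10])
```

Drawing the steps in one call and using `np.cumsum` avoids an explicit Python loop over time.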

In [17]:
# Your solution here:

**4.2** Do the same, but now with $\Delta$ sampled from a Gaussian with mean $0$ and standard deviation $1$. Compare qualitatively to the previous random walk: do the figures look the same?

In [18]:
# Your solution here:

**4.3** We now want to sample the distribution $\rho$ with $N$ particles. Code a function that returns an array of size $N\times T$ representing the position of the $N$ particles after each timestep. $N$ and $T$ should be inputs of the function. $\Delta$ should be sampled from a uniform distribution between $-1$ and $1$. Initialize the position of each particle at $x=0$.
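
One way to do this without loops is sketched below. The name `walkers` and the `seed` parameter are hypothetical, and the sketch assumes that column $j$ of the output holds the positions after $j+1$ steps (so the initial position $x=0$ is not stored).

```python
import numpy as np

def walkers(N, T, seed=None):
    """Positions of N independent walkers after each of T uniform steps."""
    rng = np.random.default_rng(seed)
    steps = rng.uniform(-1.0, 1.0, size=(N, T))   # one row of steps per particle
    return np.cumsum(steps, axis=1)               # column j holds x(t = j + 1)

x = walkers(N=5000, T=50, seed=0)
print(x.shape, x[:, -1].std())
```

The empirical standard deviation of the last column can be compared with $\sqrt{2Dt} = \sqrt{t/3}$ for $t = 50$.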

In [19]:
# Your solution here:

**4.4** Using your previously defined function, plot the distribution of the positions between $-10$ and $10$ for $N=10000$ particles using `plt.hist` for $t=1,2,5,10$. Make a separate figure for each value of $t$. Fix the y-axis so that you can compare the different figures. Additionally, plot the analytical solution of the diffusion equation 
$$
\rho(x,t)=\frac{1}{\sqrt{4 \pi D t}}\exp\left(-\frac{x^2}{4Dt}\right), \quad D=\frac{\text{Var}(\Delta)}{2\tau}
$$
on top of the histograms. Recall that in our case $\tau=1$.

Hint: The variance for the uniform distribution between $-1$ and $1$ is $1/3$.
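
The analytical solution can be evaluated as in the sketch below. The name `rho_analytical` and its parameters are hypothetical; the default `var_delta=1/3` comes from the hint above. As a sanity check, the sketch verifies numerically that the density integrates to approximately 1 over $[-10, 10]$.

```python
import numpy as np

def rho_analytical(x, t, var_delta=1/3, tau=1.0):
    """Gaussian solution of the diffusion equation with D = Var(Delta) / (2 tau)."""
    D = var_delta / (2 * tau)
    return np.exp(-x**2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

# Riemann-sum approximation of the integral of rho over [-10, 10].
areas = {t: np.sum(rho_analytical(x, t)) * dx for t in (1, 2, 5, 10)}
print(areas)
```

To overlay this curve on a histogram, the histogram must be normalised as a density (for example with the `density=True` option of `plt.hist`).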

In [20]:
# Your solution here:

**4.5** Plot the empirical standard deviation of the positions of the particles as a function of time for $T=100$ on a log-log scale. Also plot the standard deviation $\sqrt{2Dt}$ from the analytical solution, and check that the two agree.

In [21]:
# Your solution here:

**4.6** Repeat 4.4 and 4.5 but this time with $\Delta=\pm 1$ with equal probability. Be careful when choosing the bins and weights of the histogram to take into account the discrete space of the problem. Also plot the result for $t=99$ (adapt the xlim if needed).

In [22]:
# Your solution here: