In [None]:
import numpy as np
import numpy.random as random
import scipy.sparse, scipy.sparse.linalg
import matplotlib.pyplot as plt

## Review

Say we want to sample from some probability density $P(x)$.
There are two especially important parts of MCMC sampling:

* The **transition probability** $Q(x | y)$: given a state $x_k$, the next candidate $x_{k + 1}$ is generated by sampling from $Q(x | x_k)$.
* The **accept/reject** criterion: accept $x_{k + 1}$ with probability $\min(1, P(x_{k + 1})/P(x_k))$.

Usually, $Q(x | y)$ is a normal distribution with mean $y$.
But this choice is arbitrary and we could use anything, so long as the transitions are reversible, which here means the proposal is symmetric: $Q(x | y) = Q(y | x)$.
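
As a reminder of how the two ingredients fit together, here is a minimal sketch of random-walk Metropolis with a normal proposal. The function name `metropolis`, the `step_size` parameter, and the standard-normal target are illustrative assumptions, not part of the original notes; the acceptance test is done in log space to avoid overflow.

```python
import numpy as np

def metropolis(log_density, x0, num_samples, step_size=1.0, seed=0):
    """Random-walk Metropolis with a symmetric normal proposal (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    log_p = log_density(x)
    samples = np.empty((num_samples,) + x.shape)
    num_accepted = 0
    for k in range(num_samples):
        # Propose a candidate from Q(. | x): a normal distribution centered at x
        y = x + step_size * rng.normal(size=x.shape)
        log_p_y = log_density(y)
        # Accept with probability min(1, P(y) / P(x)), evaluated in log space
        if np.log(rng.uniform()) < log_p_y - log_p:
            x, log_p = y, log_p_y
            num_accepted += 1
        # On rejection the chain repeats the current state
        samples[k] = x
    return samples, num_accepted / num_samples

# Assumed example target: a 1D standard normal, given by its unnormalized log-density
samples, acceptance_rate = metropolis(lambda x: -0.5 * np.sum(x**2), np.zeros(1), 20000)
```

Only ratios of $P$ enter the acceptance test, so the density does not need to be normalized.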

**Hamiltonian Monte Carlo is a clever choice of transition probability.**

## Hamiltonian mechanics

**Hamiltonian mechanics** is a particular way of describing classical physical systems.

* **The players**: position $q$, momentum $p$, and the total energy $H(q, p)$ of the system
* **The rules**: Hamilton's equations of motion,
$$\begin{align}
\dot q & = +\frac{\partial H}{\partial p} \\
\dot p & = -\frac{\partial H}{\partial q}
\end{align}$$
* When $p = m\dot q$, and $H = $ kinetic energy + potential energy, Hamilton's equations of motion are equivalent to Newton's.

Some very important things:
* The energy $H$ is conserved along trajectories of the ODE.
* The volume in phase space is conserved.
Take a "blob" $D$ of position/momentum pairs and evolve them all for a time $t$ using Hamilton's equations; this gives a morphed blob, $D_t$.
Then $\mathrm{vol}(D) = \mathrm{vol}(D_t)$.
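
Both facts follow in a line or two from Hamilton's equations. This is a sketch that assumes $H$ is smooth and has no explicit time dependence. For energy conservation, apply the chain rule along a trajectory:

$$\begin{align}
\frac{dH}{dt} & = \frac{\partial H}{\partial q}\dot q + \frac{\partial H}{\partial p}\dot p \\
& = \frac{\partial H}{\partial q}\frac{\partial H}{\partial p} - \frac{\partial H}{\partial p}\frac{\partial H}{\partial q} = 0
\end{align}$$

For volume conservation, the vector field $(\dot q, \dot p)$ on phase space is divergence-free, so the flow neither compresses nor expands volumes:

$$\frac{\partial \dot q}{\partial q} + \frac{\partial \dot p}{\partial p} = \frac{\partial^2 H}{\partial q\,\partial p} - \frac{\partial^2 H}{\partial p\,\partial q} = 0$$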

#### The Verlet algorithm

Suppose that the Hamiltonian is separable into a kinetic energy $K$ and a potential energy $U$:

$$H = K(p) + U(q)$$

The Verlet algorithm advances $(q_n, p_n)$ by one timestep $\delta t$:

$$\begin{align}
q_{n + \frac{1}{2}} & = q_n + \frac{\delta t}{2} \cdot \frac{\partial K}{\partial p}(p_n) \\
p_{n + 1} & = p_n - \delta t \cdot \frac{\partial U}{\partial q}(q_{n + \frac{1}{2}}) \\
q_{n + 1} & = q_{n + \frac{1}{2}} + \frac{\delta t}{2}\cdot\frac{\partial K}{\partial p}(p_{n + 1})
\end{align}$$

**Useful things:**
* The phase volume is preserved, like for Hamiltonian systems.
* The trajectory preserves, to very high accuracy, a perturbed Hamiltonian $H + \delta H$, where $\delta H \sim \delta t^2$ since the method is second-order accurate.
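
To see how the energy error depends on the timestep, here is a small sketch that applies the three-stage update above to a single harmonic oscillator, $H = \frac{1}{2}p^2 + \frac{1}{2}q^2$. The helper names `verlet_step` and `max_energy_error`, the test problem, and the step sizes are all assumptions chosen for illustration.

```python
import numpy as np

def verlet_step(q, p, dt, velocity, potential_gradient):
    """One Verlet step: half step in q, full step in p, half step in q."""
    q_half = q + 0.5 * dt * velocity(p)
    p_new = p - dt * potential_gradient(q_half)
    q_new = q_half + 0.5 * dt * velocity(p_new)
    return q_new, p_new

def max_energy_error(dt, total_time=10.0):
    """Largest deviation of H = p^2/2 + q^2/2 from its initial value."""
    q, p = 1.0, 0.0
    h0 = 0.5 * (p**2 + q**2)
    error = 0.0
    for _ in range(int(round(total_time / dt))):
        q, p = verlet_step(q, p, dt, lambda p: p, lambda q: q)
        error = max(error, abs(0.5 * (p**2 + q**2) - h0))
    return error

# Halve the timestep and compare the worst-case energy errors
error_coarse = max_energy_error(0.1)
error_fine = max_energy_error(0.05)
ratio = error_coarse / error_fine
```

For a second-order method the energy error is expected to shrink by roughly a factor of four when the timestep is halved, and it stays bounded rather than drifting over the run.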

#### Ex: coupled oscillators

The Hamiltonian:

$$H = \frac{|p|^2}{2m} + \frac{1}{2}q^*Lq$$

where $L$ is the stiffness matrix.
The code below uses unit mass, $m = 1$.

In [None]:
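def kinetic_energy(p):
    # K(p) = |p|^2 / 2, i.e. unit mass
    return 0.5 * np.sum(p**2)

def velocity(p):
    # dK/dp = p for unit mass
    return p

n = 128
# Difference operator: -1 on the subdiagonal, 1 on the diagonal, with row 0 zeroed out
diag = np.ones(n)
diag[0] = 0
D = scipy.sparse.diags([diag, -np.ones(n - 1)], [0, -1])
# Diagonal weight matrix (the identity here)
Λ = scipy.sparse.diags([np.ones(n)], [0])
# Stiffness matrix; `@` is matrix multiplication for both sparse matrices and sparse arrays
L = D.T @ Λ @ D

def potential_energy(q):
    return 0.5 * np.dot(L @ q, q)

def potential_gradient(q):
    return L @ q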

In [None]:
q0 = random.RandomState().normal(size=n)
p0 = np.zeros(n)

δt = 0.01
num_steps = int(2 * np.pi / δt)

Let's try to integrate this with forward Euler, backward Euler, and the Verlet method.
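
For reference, here are the two Euler updates written out for this system. This assumes unit mass, so that $\dot q = p$ and $\dot p = -Lq$. Forward Euler evaluates the right-hand side at the old state:

$$\begin{align}
q_{k + 1} & = q_k + \delta t \cdot p_k \\
p_{k + 1} & = p_k - \delta t \cdot Lq_k
\end{align}$$

Backward Euler evaluates it at the new state:

$$\begin{align}
q_{k + 1} & = q_k + \delta t \cdot p_{k + 1} \\
p_{k + 1} & = p_k - \delta t \cdot Lq_{k + 1}
\end{align}$$

Substituting the first equation into the second gives a linear system for the new momentum,

$$(I + \delta t^2 L)\,p_{k + 1} = p_k - \delta t \cdot Lq_k,$$

after which $q_{k + 1}$ follows from the first equation.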

In [None]:
q_verlet, q_feuler, q_beuler = q0.copy(), q0.copy(), q0.copy()
p_verlet, p_feuler, p_beuler = p0.copy(), p0.copy(), p0.copy()

energy_verlet, energy_feuler, energy_beuler = np.zeros(num_steps), np.zeros(num_steps), np.zeros(num_steps)

# Backward Euler matrix I + δt² L; it is the same at every step, so build it once
A_beuler = (scipy.sparse.eye(n) + δt**2 * L).tocsc()

for k in range(num_steps):
    # Verlet: half step in position, full step in momentum, half step in position
    q_verlet += δt / 2 * velocity(p_verlet)
    p_verlet -= δt * potential_gradient(q_verlet)
    q_verlet += δt / 2 * velocity(p_verlet)
    energy_verlet[k] = kinetic_energy(p_verlet) + potential_energy(q_verlet)

    # Forward Euler: both updates use the old state (q_k, p_k)
    qk = q_feuler.copy()
    q_feuler += δt * velocity(p_feuler)
    p_feuler -= δt * potential_gradient(qk)
    energy_feuler[k] = kinetic_energy(p_feuler) + potential_energy(q_feuler)

    # Backward Euler: solve (I + δt² L) p_{k+1} = p_k - δt L q_k, then update q with the new p
    p_beuler = scipy.sparse.linalg.spsolve(A_beuler, p_beuler - δt * (L @ q_beuler))
    q_beuler += δt * velocity(p_beuler)
    energy_beuler[k] = kinetic_energy(p_beuler) + potential_energy(q_beuler)

In [None]:
fig, ax = plt.subplots()
ax.plot(energy_feuler, 'g', label='forward Euler')
ax.plot(energy_beuler, 'b', label='backward Euler')
ax.plot(energy_verlet, 'r', label='Verlet')
fig.legend()
ax.set_xlabel('timestep')
ax.set_ylabel('energy')
plt.show()

In [None]:
print((np.max(energy_verlet) - np.min(energy_verlet)) / np.mean(energy_verlet))

## Hamiltonian Monte Carlo

MCMC simulation works with any reversible transition kernel.
The idea of HMC is to augment the state $q$ with a *pseudo-momentum* variable $p$ and use Hamiltonian dynamics to update both $q$ and $p$.
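
Here is a minimal sketch of how one such update could look, assuming the target density is $P(q) \propto e^{-U(q)}$, the kinetic energy is $K(p) = \frac{1}{2}|p|^2$, and the trajectory is integrated with the Verlet steps from earlier. The function name `hmc_step`, the 2D standard-normal target, and the values of `dt` and `num_steps` are illustrative assumptions rather than anything fixed by these notes.

```python
import numpy as np

def hmc_step(q, potential_energy, potential_gradient, dt, num_steps, rng):
    """One HMC transition targeting P(q) proportional to exp(-U(q)) (illustrative sketch)."""
    # Draw a fresh pseudo-momentum; assumes kinetic energy K(p) = |p|^2 / 2
    p = rng.normal(size=q.shape)
    h_old = 0.5 * np.sum(p**2) + potential_energy(q)

    # Integrate Hamilton's equations with Verlet steps
    q_new, p_new = q.copy(), p.copy()
    for _ in range(num_steps):
        q_new += 0.5 * dt * p_new
        p_new -= dt * potential_gradient(q_new)
        q_new += 0.5 * dt * p_new
    h_new = 0.5 * np.sum(p_new**2) + potential_energy(q_new)

    # Accept with probability min(1, exp(H_old - H_new)), evaluated in log space
    if np.log(rng.uniform()) < h_old - h_new:
        return q_new, True
    return q, False

# Assumed example target: a standard normal in 2 dimensions, U(q) = |q|^2 / 2
rng = np.random.default_rng(0)
q = np.zeros(2)
samples = np.empty((2000, 2))
num_accepted = 0
for k in range(2000):
    q, accepted = hmc_step(q, lambda q: 0.5 * np.sum(q**2), lambda q: q, 0.2, 10, rng)
    samples[k] = q
    num_accepted += accepted
acceptance_rate = num_accepted / 2000
```

Because the Verlet integrator nearly conserves $H$, the difference `h_old - h_new` stays small and most proposals are accepted, even though each proposal can land far from the current state.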