# Hamiltonian Monte Carlo
### (Or: watch your "p's" and "q's")

Hamiltonian physics is a re-imagining (of sorts) of the fundamental idea of the conservation of energy. 

The classical formulation goes something like the following. Assume we have a particle whose position is denoted by the variable $\bf{q}$. The momentum of such a particle is defined by the formula $\bf{p=mv}$, where $\bf{v}$ is the first derivative of the position variable, $\bf{\dot{q}}$.

The kinetic and potential energy of this particle may be represented as:

$$\bf{K(p,q) = \frac{1}{2}m\dot{q}^2 \hspace{1in} U(q)} $$

The $Hamiltonian$ is then written as the sum of the kinetic and potential energies (scalable to many particles, if needed):

$$\bf{H = K(p,q) + U(q) = \frac{1}{2}m\dot{q}^2 + U(q) }$$

or

$$\bf{H = \frac{p^2}{2m} + U(q)}$$

The equations of motion for the particle (or system, for many particles) are given by $Hamilton's~ Equations$:

$$\boxed{\bf{\dot{q}=\frac{\partial H}{\partial p}}} \hspace{1in} \boxed{\bf{\dot{p}=-\frac{\partial H}{\partial q}}}$$
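
As a quick sanity check, we can apply Hamilton's equations to the single-particle Hamiltonian $\bf{H = \frac{p^2}{2m} + U(q)}$ and see that they give back the familiar Newtonian relations:

$$\bf{\dot{q} = \frac{\partial H}{\partial p} = \frac{p}{m}} \hspace{1in} \bf{\dot{p} = -\frac{\partial H}{\partial q} = -\frac{\partial U}{\partial q}}$$

The first is just the definition of momentum, $\bf{p = m\dot{q}}$, and the second is Newton's second law with the force given by $\bf{F = -\partial U / \partial q}$.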

This is all fine and dandy, but really? $Why~do~we~care$?

Keep this in mind as we move forward: the solution of Hamilton's equations yields a trajectory $-$ positions and momenta as functions of time.
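
To make "trajectory" concrete, here is a minimal sketch that integrates Hamilton's equations numerically with the leapfrog scheme. The function name `leapfrog`, the unit mass, and the example potential $U(q) = q^2/2$ (a harmonic oscillator) are all assumptions chosen for illustration; they are not part of the discussion above.

```python
import numpy as np

def leapfrog(q, p, grad_U, step_size, n_steps, mass=1.0):
    """Approximate a trajectory of H = p^2 / (2m) + U(q) with leapfrog steps."""
    q, p = float(q), float(p)
    trajectory = [(q, p)]
    for _ in range(n_steps):
        p -= 0.5 * step_size * grad_U(q)  # half step in momentum
        q += step_size * p / mass         # full step in position
        p -= 0.5 * step_size * grad_U(q)  # second half step in momentum
        trajectory.append((q, p))
    return np.array(trajectory)           # shape (n_steps + 1, 2): columns q, p

# Assumed example potential: U(q) = q^2 / 2, so dU/dq = q
grad_U = lambda q: q

traj = leapfrog(q=1.0, p=0.0, grad_U=grad_U, step_size=0.1, n_steps=100)

# Total energy along the trajectory; it should stay close to its initial value
energy = 0.5 * traj[:, 1] ** 2 + 0.5 * traj[:, 0] ** 2
energy_drift = np.max(np.abs(energy - energy[0]))
print("largest energy deviation:", energy_drift)
```

Plotting `traj[:, 0]` against `traj[:, 1]` should trace out a near-circle in $(q, p)$ space, which is the (approximate) conservation of energy at work.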

### Some (re-)definitions

For purposes here, Markov chain Monte Carlo (MCMC) is a method to determine expectations (some value of interest) from the posterior distribution of our model. To avoid a [traxoline](https://people.physics.tamu.edu/krisciunas/Traxoline.pdf) moment, the best explanation for this that I've found is: the posterior distribution is a probability distribution that represents your updated beliefs about the parameter after having seen the data. From this probability distribution, we can estimate the value of interest, as well as uncertainties in said value. 

We need the MCMC to converge to the true expectation value (true estimate) quickly. Fast convergence requires strong conditions of $\bf{ergodicity}$ - that is, the chain must be able to explore the parameter space well enough, statistically, in a finite amount of time. Specifically, the condition of geometric ergodicity is desirable. $\bf{Geometric~ergodicity}$ means that the distribution of the chain approaches the target (posterior) distribution at a geometric, i.e. exponentially fast, rate. Under this condition, MCMC estimators of reasonably well-behaved quantities follow a central limit theorem: the properly normalized error of the estimator tends towards a normal, or Gaussian, distribution, which is what lets us attach meaningful uncertainties to our estimates.

$Hamiltonian~Monte~Carlo~(\bf{HMC})$ is particularly attractive in that when it fails to converge, the failure tends to be recognisable. One diagnostic is the split $\hat{\bf{R}}$ statistic, which compares the variance between chains to the variance within chains. For well-behaved parameter spaces it should be very near 1.0, and values above 1.1 indicate problems with the fit.
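
For a feel of what this diagnostic measures, here is a minimal sketch of the classic split $\hat{R}$ calculation for a single parameter. The function name `split_rhat` and the synthetic chains are assumptions for illustration; library implementations may differ in detail (for example, by rank-normalizing the draws first).

```python
import numpy as np

def split_rhat(chains):
    """Classic split R-hat for one parameter; chains has shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    # Split every chain into a first and a second half (a middle draw is dropped if odd)
    halves = np.concatenate([chains[:, :half], chains[:, -half:]], axis=0)
    chain_means = halves.mean(axis=1)
    chain_vars = halves.var(axis=1, ddof=1)
    W = chain_vars.mean()                  # within-chain variance
    B = half * chain_means.var(ddof=1)     # between-chain variance
    var_plus = (half - 1) / half * W + B / half
    return np.sqrt(var_plus / W)

# Synthetic (made-up) draws: four chains that agree, and four that do not
rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))
bad = good + np.arange(4)[:, None]         # each chain stuck around a different mean

rhat_good = split_rhat(good)
rhat_bad = split_rhat(bad)
print("agreeing chains:   ", rhat_good)
print("disagreeing chains:", rhat_bad)
```

Chains that have settled on the same distribution give a value close to 1, while chains that disagree about where the mass is push the statistic well above the 1.1 threshold.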

### Hamiltonian Monte Carlo application

Using Hamiltonian dynamics to sample from a distribution requires translating the density function for this distribution to a potential energy function and introducing "momentum" variables to go with the original variables of interest (now seen as "position" variables). We can then simulate a Markov chain in which each iteration resamples the momentum and then does a Metropolis update with a proposal found using Hamiltonian dynamics.
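
Written out, the translation looks like the following, where $\pi(\bf{q})$ stands for the (possibly unnormalized) density we want to sample from and $\bf{M}$ is a "mass" matrix for the momentum variables. Both symbols are notation assumed here for illustration, and the quadratic kinetic energy is the usual (but not the only possible) choice:

$$\bf{U(q) = -\log \pi(q)} \hspace{1in} \bf{K(p) = \frac{1}{2}p^{T}M^{-1}p}$$

$$\bf{P(q,p) \propto e^{-H(q,p)} = e^{-U(q)}\,e^{-K(p)}}$$

Because the joint density factorizes, $\bf{q}$ and $\bf{p}$ are independent, the momentum is Gaussian with covariance $\bf{M}$, and simply discarding $\bf{p}$ from a joint sample leaves a sample of $\bf{q}$ from the distribution of interest.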

The first step of the HMC process changes only the momentum, with new values drawn at random from a Gaussian distribution. In the second step, a Metropolis update is performed, using Hamiltonian dynamics to propose a new state. Care must be taken in choosing the number of steps and step size to avoid problems, such as periodicity in parameter space exploration.
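
The two steps above can be sketched in a few lines. This is a minimal illustration, not a production sampler: it assumes unit masses, a leapfrog integrator for the Hamiltonian dynamics, and a toy 2-dimensional standard Gaussian target. The name `hmc_step` and the tuning values (`step_size=0.2`, `n_steps=8`) are hypothetical choices made for this example.

```python
import numpy as np

def hmc_step(q, U, grad_U, step_size, n_steps, rng):
    """One HMC iteration with unit masses; returns (new_q, accepted)."""
    # Step 1: resample the momentum from a standard Gaussian
    p = rng.standard_normal(q.shape)

    # Step 2: propose a new state by simulating Hamiltonian dynamics (leapfrog)
    q_new, p_new = q.copy(), p.copy()
    p_new -= 0.5 * step_size * grad_U(q_new)      # initial half step in momentum
    for i in range(n_steps):
        q_new += step_size * p_new                # full step in position
        if i < n_steps - 1:
            p_new -= step_size * grad_U(q_new)    # full step in momentum
    p_new -= 0.5 * step_size * grad_U(q_new)      # final half step in momentum

    # Metropolis accept/reject based on the change in total energy H = U + K
    H_old = U(q) + 0.5 * p @ p
    H_new = U(q_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < H_old - H_new:
        return q_new, True
    return q, False

# Assumed toy target: 2-D standard Gaussian, so U(q) = q.q / 2 up to a constant
U = lambda q: 0.5 * q @ q
grad_U = lambda q: q

rng = np.random.default_rng(1)
q = np.zeros(2)
samples, n_accept = [], 0
for _ in range(2000):
    q, accepted = hmc_step(q, U, grad_U, step_size=0.2, n_steps=8, rng=rng)
    samples.append(q)
    n_accept += accepted
samples = np.array(samples)
accept_rate = n_accept / len(samples)
print("acceptance rate:", accept_rate)
print("sample mean:", samples.mean(axis=0), "sample std:", samples.std(axis=0))
```

The trajectory length (`step_size * n_steps`, here 1.6) is where the periodicity warning bites: for this target the exact dynamics are periodic with period $2\pi$, so a trajectory length near $\pi$ would send each proposal to roughly the mirror image of the current point, and the chain would mostly bounce back and forth.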

### Why HMC?

Let's explore, at least graphically, an example of why one might wish to use HMC over other types of MCMC estimation. This example contrasts HMC and random-walk Metropolis MCMC on a 100-dimensional multivariate Gaussian distribution in which the variables are independent, with means of zero and standard deviations of 0.01, 0.02, ..., 0.99, 1.00. The results of the simulations are best seen in a series of plots.

<img src="Location_plot.jpg">
<img src="Accuracy.jpg">
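
For anyone who wants to poke at this example themselves, here is a minimal sketch of the target distribution together with a random-walk Metropolis baseline. The proposal standard deviation (`0.02`), the chain length, and the helper name `rwm_step` are assumptions picked for illustration; they are not the settings behind the plots above.

```python
import numpy as np

# Target from the example: 100 independent Gaussians with mean zero and
# standard deviations 0.01, 0.02, ..., 1.00
sigmas = np.arange(1, 101) / 100.0

def U(q):
    """Potential energy: minus the log density, up to an additive constant."""
    return 0.5 * np.sum((q / sigmas) ** 2)

def grad_U(q):
    """Gradient of U; HMC needs this, random-walk Metropolis does not."""
    return q / sigmas ** 2

def rwm_step(q, proposal_sd, rng):
    """One random-walk Metropolis update with an isotropic Gaussian proposal."""
    q_prop = q + proposal_sd * rng.standard_normal(q.shape)
    if np.log(rng.uniform()) < U(q) - U(q_prop):
        return q_prop, True
    return q, False

rng = np.random.default_rng(0)
q = np.zeros(100)
n_iter = 5000
samples = np.empty((n_iter, 100))
n_accept = 0
for i in range(n_iter):
    q, accepted = rwm_step(q, proposal_sd=0.02, rng=rng)
    n_accept += accepted
    samples[i] = q
accept_rate = n_accept / n_iter
print("random-walk Metropolis acceptance rate:", accept_rate)
```

The tension in this target is that the proposal scale has to stay small enough for the tightest direction (standard deviation 0.01), which means the chain can only creep along the widest direction (standard deviation 1.00). Comparing the spread of `samples[:, -1]` against its true value of 1.00 is a quick way to see how far the chain still has to go.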

### References:

1. http://www.mcmchandbook.net/HandbookChapter5.pdf
2. http://stats.stackexchange.com/questions/58564/help-me-understand-bayesian-prior-and-posterior-distributions
3. http://www.nyu.edu/classes/tuckerman/stat.mech/lectures/lecture_1/node4.html
4. http://mc-stan.org/users/documentation/case-studies/pystan_workflow.html
