**Leapfrog step** from **Algorithm 1**

**function** Leapfrog$(\theta,r,\varepsilon)$

Set $\tilde{r} \leftarrow r + \frac{\varepsilon}{2} \nabla_\theta \mathcal{L}(\theta)$

Set $\tilde{\theta} \leftarrow \theta + \varepsilon \tilde{r}$

Set $\tilde{r} \leftarrow \tilde{r} + \frac{\varepsilon}{2} \nabla_\theta \mathcal{L}(\tilde{\theta})$

**return** $\tilde{\theta},\tilde{r}$

$\mathcal{L}$ is the logarithm of the joint density of the variables of interest $\theta$. The Leapfrog function of Algorithm 1 implements the Störmer-Verlet ("leapfrog") integrator, which advances the state by one step of size $\varepsilon$ through the following updates:

$r^{t + \frac{\varepsilon}{2}} = r^t + \frac{\varepsilon}{2} \nabla_\theta \mathcal{L}(\theta^t)$

$\theta^{t + \varepsilon} = \theta^t + \varepsilon r^{t + \frac{\varepsilon}{2}}$

$r^{t + \varepsilon} = r^{t + \frac{\varepsilon}{2}} + \frac{\varepsilon}{2} \nabla_\theta \mathcal{L}(\theta^{t + \varepsilon})$

where $r^t$ and $\theta^t$ denote the values of the momentum and position variables $r$ and $\theta$ at time $t$, $\nabla_\theta$ denotes the gradient with respect to $\theta$ and $\varepsilon$ is the step size parameter.
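The three updates above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `leapfrog`, the argument `grad_log_p` (standing in for $\nabla_\theta \mathcal{L}$), and the standard normal test target are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def leapfrog(theta: Vector, r: Vector, eps: float,
             grad_log_p: Callable[[Vector], Vector]) -> Tuple[Vector, Vector]:
    """One leapfrog step: half momentum step, full position step, half momentum step."""
    # r^{t + eps/2} = r^t + (eps/2) * grad L(theta^t)
    r_half = [ri + 0.5 * eps * gi for ri, gi in zip(r, grad_log_p(theta))]
    # theta^{t + eps} = theta^t + eps * r^{t + eps/2}
    theta_new = [ti + eps * ri for ti, ri in zip(theta, r_half)]
    # r^{t + eps} = r^{t + eps/2} + (eps/2) * grad L(theta^{t + eps})
    r_new = [ri + 0.5 * eps * gi for ri, gi in zip(r_half, grad_log_p(theta_new))]
    return theta_new, r_new


def grad_std_normal(theta: Vector) -> Vector:
    """Assumed test target: L(theta) = -0.5 * theta . theta, so grad L = -theta."""
    return [-t for t in theta]


theta1, r1 = leapfrog([1.0, -0.5], [0.3, 0.2], 0.1, grad_std_normal)

# Time reversibility: negate the momentum, step again, and the start is recovered
# (with the momentum negated).
theta_back, r_back = leapfrog(theta1, [-x for x in r1], 0.1, grad_std_normal)
```

Note that the second half step updates the half-stepped momentum, not the original $r$. The reversibility check at the end is a quick way to catch that kind of slip.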

The performance of Hamiltonian Monte Carlo (HMC) depends strongly on choosing suitable values for the step size $\varepsilon$ and for $L$, the number of leapfrog steps taken per iteration. If $\varepsilon$ is too large, then the simulation will be inaccurate and yield low acceptance rates. If $\varepsilon$ is too small, then computation will be wasted taking many small steps. If $L$ is too small, then successive samples will be close to one another, resulting in undesirable random walk behavior and slow mixing. If $L$ is too large, then HMC will generate trajectories that loop back and retrace their steps.
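The effect of $\varepsilon$ on simulation accuracy can be illustrated numerically. The sketch below assumes a one-dimensional standard normal target, so that $\nabla_\theta \mathcal{L}(\theta) = -\theta$, and measures how far the energy $H = \frac{1}{2}\theta^2 + \frac{1}{2}r^2$ drifts from its initial value along a leapfrog trajectory. The function names and the particular step sizes are illustrative choices, not values from the paper.

```python
def leapfrog_1d(theta, r, eps):
    """Leapfrog step for the 1-D standard normal, where grad L(theta) = -theta."""
    r = r - 0.5 * eps * theta
    theta = theta + eps * r
    r = r - 0.5 * eps * theta
    return theta, r


def max_energy_error(eps, n_steps, theta=1.0, r=0.0):
    """Largest |H - H0| seen along a trajectory of n_steps leapfrog steps."""
    h0 = 0.5 * theta ** 2 + 0.5 * r ** 2
    worst = 0.0
    for _ in range(n_steps):
        theta, r = leapfrog_1d(theta, r, eps)
        worst = max(worst, abs(0.5 * theta ** 2 + 0.5 * r ** 2 - h0))
    return worst


small = max_energy_error(eps=0.1, n_steps=100)     # accurate, but 100 gradient calls
large = max_energy_error(eps=1.9, n_steps=100)     # bounded but sizeable energy error
unstable = max_energy_error(eps=2.1, n_steps=100)  # the integrator diverges
```

For this particular target the leapfrog integrator is stable only for $\varepsilon < 2$. Because the HMC acceptance probability shrinks as the energy error grows, a large error translates directly into rejected proposals.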

The No-U-Turn Sampler (NUTS) is an extension of HMC that eliminates the need to specify a fixed value of $L$, the number of leapfrog steps. It also incorporates schemes for setting $\varepsilon$ based on a dual averaging algorithm.



**Algorithm 3** Efficient NUTS

Given $\theta^0, \varepsilon, \mathcal{L}, M$:

**for** $m=1$ to $M$ **do**

&nbsp;&nbsp;&nbsp;&nbsp; Resample $r^0 \sim \mathcal{N}(0,I)$

&nbsp;&nbsp;&nbsp;&nbsp; Resample $u \sim \text{Uniform}([0, \text{exp}\{\mathcal{L}(\theta^{m-1}) - \frac{1}{2} r^0 \cdot r^0 \}])$

&nbsp;&nbsp;&nbsp;&nbsp; Initialize $\theta^- = \theta^{m-1},~\theta^+ = \theta^{m-1},~r^- = r^0,~r^+ = r^0,~j = 0,~\theta^m=\theta^{m-1},~n=1,~s=1$

&nbsp;&nbsp;&nbsp;&nbsp; **while** $s=1$ **do**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Choose a direction $v_j \sim \text{Uniform}(\{-1,1\})$

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **if** $v_j=-1$ **then**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $\theta^-,r^-,-,-,\theta',n',s' \leftarrow \text{BuildTree}(\theta^-,r^-,u,v_j,j,\varepsilon)$

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **else**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $-,-,\theta^+,r^+,\theta',n',s' \leftarrow \text{BuildTree}(\theta^+,r^+,u,v_j,j,\varepsilon)$

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **end if**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **if** $s'=1$ **then**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; With probability $\text{min}\big\{1,\frac{n'}{n}\big\}$, set $\theta^m \leftarrow \theta'$

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **end if**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $n \leftarrow n + n'$

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $s \leftarrow s' \mathbb{1}[(\theta^+ - \theta^-) \cdot r^- \geq 0] \mathbb{1}[(\theta^+ - \theta^-) \cdot r^+ \geq 0]$

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $j \leftarrow j+1$

&nbsp;&nbsp;&nbsp;&nbsp; **end while**

**end for**
<br>
<br>
<br>
**function** BuildTree$(\theta,r,u,v,j,\varepsilon)$

**if** $j=0$ **then**

&nbsp;&nbsp;&nbsp;&nbsp; *Base case - take one leapfrog step in the direction $v$*

&nbsp;&nbsp;&nbsp;&nbsp; $\theta',r' \leftarrow \text{Leapfrog}(\theta,r,v\varepsilon)$

&nbsp;&nbsp;&nbsp;&nbsp; $n' \leftarrow \mathbb{1}[u \leq \text{exp}\{\mathcal{L}(\theta') - \frac{1}{2} r' \cdot r' \}]$

&nbsp;&nbsp;&nbsp;&nbsp; $s' \leftarrow \mathbb{1}[\mathcal{L}(\theta') - \frac{1}{2} r' \cdot r' > \text{log}~u - \Delta_{max}]$

&nbsp;&nbsp;&nbsp;&nbsp; **return** $\theta',r',\theta',r',\theta',n',s'$

**else**

&nbsp;&nbsp;&nbsp;&nbsp; *Recursion - implicitly build the left and right subtrees*

&nbsp;&nbsp;&nbsp;&nbsp; $\theta^-,r^-,\theta^+,r^+,\theta',n',s' \leftarrow \text{BuildTree}(\theta,r,u,v,j-1,\varepsilon)$

&nbsp;&nbsp;&nbsp;&nbsp; **if** $s'=1$ **then**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **if** $v=-1$ **then**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $\theta^-,r^-,-,-,\theta'',n'',s'' \leftarrow \text{BuildTree}(\theta^-,r^-,u,v,j-1,\varepsilon)$

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **else**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $-,-,\theta^+,r^+,\theta'',n'',s'' \leftarrow \text{BuildTree}(\theta^+,r^+,u,v,j-1,\varepsilon)$

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **end if**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; With probability $\frac{n''}{n'+n''}$, set $\theta' \leftarrow \theta''$

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $s' \leftarrow s'' \mathbb{1}[(\theta^+ - \theta^-) \cdot r^- \geq 0] \mathbb{1}[(\theta^+ - \theta^-) \cdot r^+ \geq 0]$

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $n' \leftarrow n' + n''$

&nbsp;&nbsp;&nbsp;&nbsp; **end if**

&nbsp;&nbsp;&nbsp;&nbsp; **return** $\theta^-,r^-,\theta^+,r^+,\theta',n',s'$

**end if**
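Algorithm 3 and BuildTree translate fairly directly into Python. The following is a sketch under several assumptions that are not part of the pseudocode above: $\Delta_{max}$ is set to 1000, the slice variable is kept as $\log u$ to avoid underflow, a `max_depth` cap guards against runaway doubling, the subtree-selection probability is skipped when $n'+n''=0$, and the standard normal target used to exercise it is chosen only for convenience. All function and variable names are hypothetical.

```python
import math
import random

DELTA_MAX = 1000.0  # assumed value; the text above does not define Delta_max


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def leapfrog(theta, r, eps, grad):
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r, grad(theta))]
    theta = [ti + eps * ri for ti, ri in zip(theta, r)]
    r = [ri + 0.5 * eps * gi for ri, gi in zip(r, grad(theta))]
    return theta, r


def no_u_turn(tm, tp, rm, rp):
    """Both indicator terms: (theta+ - theta-) . r- >= 0 and (theta+ - theta-) . r+ >= 0."""
    d = [p - m for p, m in zip(tp, tm)]
    return int(dot(d, rm) >= 0 and dot(d, rp) >= 0)


def build_tree(theta, r, log_u, v, j, eps, logp, grad, rng):
    if j == 0:
        # Base case: one leapfrog step in direction v.
        t1, r1 = leapfrog(theta, r, v * eps, grad)
        log_joint = logp(t1) - 0.5 * dot(r1, r1)
        n1 = int(log_u <= log_joint)
        s1 = int(log_joint > log_u - DELTA_MAX)
        return t1, r1, t1, r1, t1, n1, s1
    # Recursion: build the first subtree, then the second only if s' = 1.
    tm, rm, tp, rp, t1, n1, s1 = build_tree(theta, r, log_u, v, j - 1, eps, logp, grad, rng)
    if s1 == 1:
        if v == -1:
            tm, rm, _, _, t2, n2, s2 = build_tree(tm, rm, log_u, v, j - 1, eps, logp, grad, rng)
        else:
            _, _, tp, rp, t2, n2, s2 = build_tree(tp, rp, log_u, v, j - 1, eps, logp, grad, rng)
        # Guard against 0/0 when neither subtree holds a state inside the slice.
        if n1 + n2 > 0 and rng.random() < n2 / (n1 + n2):
            t1 = t2
        s1 = s2 * no_u_turn(tm, tp, rm, rp)
        n1 += n2
    return tm, rm, tp, rp, t1, n1, s1


def nuts(theta0, eps, logp, grad, M, seed=0, max_depth=10):
    rng = random.Random(seed)
    samples, theta = [], list(theta0)
    for _ in range(M):
        r0 = [rng.gauss(0.0, 1.0) for _ in theta]
        # log u = log joint + log(U) with U in (0, 1].
        log_u = logp(theta) - 0.5 * dot(r0, r0) + math.log(1.0 - rng.random())
        tm, tp, rm, rp = theta, theta, r0, r0
        j, n, s, theta_m = 0, 1, 1, theta
        while s == 1 and j < max_depth:  # max_depth is a safeguard added here
            v = rng.choice((-1, 1))
            if v == -1:
                tm, rm, _, _, t1, n1, s1 = build_tree(tm, rm, log_u, v, j, eps, logp, grad, rng)
            else:
                _, _, tp, rp, t1, n1, s1 = build_tree(tp, rp, log_u, v, j, eps, logp, grad, rng)
            if s1 == 1 and rng.random() < min(1.0, n1 / n):
                theta_m = t1
            n += n1
            s = s1 * no_u_turn(tm, tp, rm, rp)
            j += 1
        theta = theta_m
        samples.append(theta)
    return samples


# Assumed test target: 1-D standard normal.
draws = nuts([0.0], 0.5, lambda t: -0.5 * dot(t, t), lambda t: [-x for x in t], M=2000)
```

The recursion mirrors the pseudocode line by line, so the two can be read side by side. In practice one would vectorize the state with an array library and tune $\varepsilon$ rather than fix it by hand.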


Before developing Algorithm 3, Efficient NUTS, the paper develops Algorithm 2, Naive NUTS. Algorithm 2 introduces a slice variable $u$ with conditional distribution $p(u|\theta,r)=\text{Uniform}(u;[0,\text{exp}\{\mathcal{L}(\theta) - \frac{1}{2} r \cdot r \}])$, which renders the conditional distribution $p(\theta,r|u) = \text{Uniform} (\theta,r;\{\theta',r'|\text{exp}\{\mathcal{L}(\theta') - \frac{1}{2} r' \cdot r' \} \geq u \})$, that is, uniform over the states whose unnormalized joint density is at least $u$. After resampling $u|\theta,r$, NUTS uses the leapfrog integrator to trace out a path forwards or backwards, first for 1 step, then 2 steps, then 4 steps, and so on, with the direction of each doubling chosen at random. This doubling process builds a balanced binary tree whose leaf nodes correspond to position-momentum states. The process is halted when the trajectory starts to double back on itself.
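To make the doubling concrete, the sketch below tracks which leapfrog indices the trajectory covers after each doubling, with index 0 denoting the initial state. The function name `doubling_extents` and the particular sequence of directions are illustrative assumptions. No dynamics are simulated here, only the bookkeeping of the tree.

```python
def doubling_extents(directions):
    """Return (lo, hi, n_leaves) after each doubling.

    Doubling j appends 2**j new states on the side chosen by directions[j]
    (-1 extends backwards, +1 extends forwards), so the tree of height j + 1
    always has 2**(j + 1) leaves.
    """
    lo, hi = 0, 0
    history = []
    for j, v in enumerate(directions):
        if v == -1:
            lo -= 2 ** j
        else:
            hi += 2 ** j
        history.append((lo, hi, hi - lo + 1))
    return history


# Example: forward, backward, backward, forward.
extents = doubling_extents([1, -1, -1, 1])
```

Each entry shows the total number of states doubling, while the covered range grows on only one side at a time.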

In summary, Algorithm 2 leaves the target distribution $p(\theta) \propto \text{exp}\{\mathcal{L}(\theta)\}$ invariant. It achieves this by resampling the momentum and slice variables $r$ and $u$, simulating a Hamiltonian trajectory forwards and backwards in time until that trajectory either begins retracing its steps or encounters a state with very low probability, selecting a subset of the states encountered on that trajectory that lie within the slice defined by the slice variable $u$, and finally choosing the next position and momentum variables $\theta^m$ and $r$ uniformly at random from the subset of the states encountered.

Algorithm 3 improves on Algorithm 2 by breaking out of the recursion as soon as the stop indicator $s$ takes the value zero: in BuildTree, the second subtree is built only if the first one returned $s'=1$.