# Debugging

## `stanc` compile errors

Check for

* forgotten semicolons.
* incompatible types.
* missing braces (`{...}`) or parentheses.

## Runtime errors (and warnings)

Save the output of `cmdstan` (the `stdout` and `stderr` files) in a directory with a known location:
```py
sam = sm.sample(output_dir="stan-cache", ...)
```
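
Once the files are on disk, they can be searched for messages. The following is a minimal sketch, assuming the console output ends up in `*.txt` files inside the output directory; the helper name `find_messages` and the keyword list are hypothetical and not part of `cmdstanpy`.

```py
from pathlib import Path

# Assumed keywords; adjust them to the messages you are hunting for.
KEYWORDS = ("Rejecting", "Exception", "Warning")


def find_messages(output_dir, keywords=KEYWORDS):
    """Return (file name, line number, text) for every matching line in the *.txt files."""
    hits = []
    for path in sorted(Path(output_dir).glob("*.txt")):
        lines = path.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            if any(key in line for key in keywords):
                hits.append((path.name, number, line.strip()))
    return hits


# Only scan if the directory from the example above exists.
if Path("stan-cache").is_dir():
    for name, number, text in find_messages("stan-cache"):
        print(f"{name}:{number}: {text}")
```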
Common Warnings:

* **Initial guess rejected:** The initial parameter guess has a *very* small or zero posterior density. Make sure the parameters are declared with the correct domains, and try specifying an initial guess.
* **Zero likelihood during sampling:** A proposed sample has a zero (or very small) likelihood, that is, a log likelihood of $-\infty$ (or a very negative one).
* **Dimension mismatch:** Stan does not check the dimensions of arrays and vectors at compile time, so a size mismatch only shows up as a runtime error.
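
For the rejected initial guess, it can help to check the guess against the parameter domains before sampling. The sketch below assumes scalar parameters only; the parameter names, the `bounds` table and the helper `out_of_domain` are hypothetical and have to be kept in sync with the model by hand.

```py
# Hypothetical parameters with their declared (lower, upper) bounds; None means unbounded.
bounds = {"mu": (None, None), "sigma": (0.0, None), "theta": (0.0, 1.0)}

# Hypothetical initial guess, one value per parameter.
inits = {"mu": 0.0, "sigma": 1.0, "theta": 0.5}


def out_of_domain(inits, bounds):
    """Return the names of parameters whose initial value is on or outside its bounds."""
    bad = []
    for name, value in inits.items():
        lower, upper = bounds[name]
        below = lower is not None and value <= lower
        above = upper is not None and value >= upper
        if below or above:
            bad.append(name)
    return bad


print(out_of_domain(inits, bounds))

# With cmdstanpy the dictionary could then be passed on, e.g. sm.sample(inits=inits).
```

Values exactly on a bound are flagged as well, since they map to infinity on the unconstrained scale.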

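**`print` function**: Use Stan's `print()` function inside the model to inspect intermediate values. Don't forget to remove the print statements afterwards: they are executed on every evaluation of the log density and slow down sampling!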

## Diagnosis

**Gelman-Rubin R-statistic $\hat{R}$** is a measure of the convergence of the chains. It works best if you have multiple independent chains with different initial guesses. Stan computes $\hat{R}$ for each parameter. Good $\hat{R}$ values are close to $1$.

*If $\hat{R}$ is NOT close to $1$, try using longer chains*
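
To make the statistic concrete, here is a minimal sketch of the classic (non-split) Gelman-Rubin formula for a single parameter. It is an illustration only and not the exact estimator Stan reports; the function name `gelman_rubin` and the toy chains are assumptions.

```py
import random
import statistics


def gelman_rubin(chains):
    """Classic R-hat for one parameter; `chains` is a list of equal-length lists of draws."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    chain_vars = [statistics.variance(c) for c in chains]
    within = statistics.fmean(chain_vars)            # W: mean within-chain variance
    between = n * statistics.variance(chain_means)   # B: between-chain variance
    pooled = (n - 1) / n * within + between / n      # pooled estimate of the variance
    return (pooled / within) ** 0.5


rng = random.Random(0)

# Four chains that sample the same distribution.
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]

# Four chains that are stuck around two different values.
stuck = [[rng.gauss(shift, 1) for _ in range(1000)] for shift in (0, 0, 3, 3)]

print(f"mixed: {gelman_rubin(mixed):.3f}, stuck: {gelman_rubin(stuck):.3f}")
```

The first value should land close to $1$, while the chains that disagree about the mean give a value well above $1$.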

**Maximum tree depth reached** during transitions. The NUTS algorithm uses a binary tree to avoid "U-turns". This tree has a maximum "depth" to avoid spending too much time in one transition. Stan warns if this maximum is reached. If this happens the result is not "wrong", but you'll get a smaller effective sample size.

*Try increasing the max tree depth*

```py
sm.sample(max_treedepth=12, ...)  # might result in slow sampling!
```
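
The cost of a deeper tree can be estimated with a little arithmetic: each extra level doubles the number of leapfrog steps a single transition may take. The sketch below uses hypothetical helper names, and the list of per-draw tree depths is made up; with `cmdstanpy` it would come from the `treedepth__` sampler variable.

```py
def max_leapfrog_steps(max_treedepth):
    """Upper bound on leapfrog steps in one transition: a full binary tree of this depth."""
    return 2 ** max_treedepth - 1


def fraction_at_max_depth(treedepths, max_treedepth):
    """Share of transitions whose tree depth reached the maximum."""
    hits = sum(1 for depth in treedepths if depth >= max_treedepth)
    return hits / len(treedepths)


# Made-up tree depths for eight transitions.
example_depths = [3, 4, 10, 5, 10, 4, 3, 6]

print(max_leapfrog_steps(10), max_leapfrog_steps(12))
print(fraction_at_max_depth(example_depths, max_treedepth=10))
```

Going from a depth of 10 to 12 allows roughly four times as many gradient evaluations per transition, which is where the slowdown comes from.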

## Diagnosis

**Divergent transitions.** Not only is the HMC proposal "volume preserving", the exact Hamiltonian dynamics also preserve the "energy", which in this case is $\mathcal{H}(\theta, p) = -\log(Q(\theta|D)) + \tfrac12 p^\top M^{-1} p$. The energy at the end of the trajectory, $(\theta_i', p_i')$, therefore equals the energy at the start, $(\theta_i, p_i)$, and the Metropolis-Hastings acceptance ratio is always $1$. So in theory, we could apply the Metropolis-Hastings step **before** we integrate Hamilton's equations.

\begin{equation}
\exp(-\mathcal{H}(\theta_{i}', p_{i}') + \mathcal{H}(\theta_i, p_i)) = \exp(-\mathcal{H}(\theta_i, p_i) + \mathcal{H}(\theta_i, p_i)) = 1
\end{equation}

*However*, the leapfrog integration scheme **does not** preserve energy exactly.

A transition is called **divergent** if $\mathcal{H}$ deviates too much from its initial value, which indicates that the step size is too large.
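
The effect of the step size on the energy error can be seen in a toy example. The sketch below runs leapfrog on a one-dimensional standard normal target with unit mass, so $\mathcal{H}(\theta, p) = \tfrac12\theta^2 + \tfrac12 p^2$; the target, the starting point and the function names are assumptions chosen for illustration, not Stan's implementation.

```py
def hamiltonian(theta, p):
    """Energy for a standard normal target with unit mass: theta^2 / 2 + p^2 / 2."""
    return 0.5 * theta**2 + 0.5 * p**2


def max_energy_error(step_size, n_steps, theta=1.0, p=1.0):
    """Run leapfrog and return the largest |H - H_initial| seen along the trajectory."""
    h0 = hamiltonian(theta, p)
    worst = 0.0
    for _ in range(n_steps):
        p -= 0.5 * step_size * theta  # half step for the momentum (gradient is theta)
        theta += step_size * p        # full step for the position
        p -= 0.5 * step_size * theta  # second half step for the momentum
        worst = max(worst, abs(hamiltonian(theta, p) - h0))
    return worst


small = max_energy_error(step_size=0.1, n_steps=20)
large = max_energy_error(step_size=2.5, n_steps=20)

print(f"step size 0.1: max energy error {small:.4f}")
print(f"step size 2.5: max energy error {large:.3g}")
```

With the small step size the energy only wobbles slightly, while the large step size makes the integrator unstable and the energy error explodes, which is the signature of a divergent transition.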

*Try increasing `adapt_delta`* (default 0.8), which sets the target **acceptance ratio** of the MH algorithm during adaptation. A larger target acceptance ratio leads to a smaller step size.

**Low ESS**. A low effective sample size means that your chains have high auto-correlation. You can increase the effective sample size by increasing the chain length.
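
The link between auto-correlation and effective sample size can be illustrated with a crude single-chain estimate, $n / (1 + 2\sum_k \rho_k)$, where the sum is cut off at the first non-positive auto-correlation $\rho_k$. This is a simplified sketch and not the estimator Stan uses; the function name and the toy chains are assumptions.

```py
import random


def effective_sample_size(x):
    """Crude single-chain ESS: n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / n
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


rng = random.Random(1)

# Independent draws: almost no autocorrelation.
independent = [rng.gauss(0, 1) for _ in range(2000)]

# AR(1) chain: each draw keeps 90% of the previous one, so neighbours are strongly correlated.
correlated = [0.0]
for _ in range(1999):
    correlated.append(0.9 * correlated[-1] + rng.gauss(0, 1))

print(round(effective_sample_size(independent)), round(effective_sample_size(correlated)))
```

Both chains hold 2000 draws, but the correlated chain is worth far fewer independent ones.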


### Further reading

[A Conceptual Introduction to Hamiltonian Monte Carlo](https://arxiv.org/pdf/1701.02434.pdf)