In [None]:
import matplotlib.pyplot as plt
import scipy.stats as sts
import numpy as np
import os

import sys
sys.path.append("..")

import stancourse.utilities as util

import cmdstanpy ## Stan interface for Python (must be imported before use below)

if os.name == "nt": ## adds compiler to path in Windows
    cmdstanpy.utils.cxx_toolchain_path()

# The Stan Programming Language

* Hello World
* Basic blocks
* Special blocks
* Variables
* Functions
* Quirks

# Hello World

Example: a very simple Stan model

In [None]:
util.show_stan_model("../stan-models/hello_world.stan")

* Code is divided into **blocks**
* `parameters` block: declaration of free parameters
* `model` block: defines the (un-normalized) posterior density $Q(x|D)$, on the log scale
* Variables are **strongly, statically typed** (like C/C++, unlike R and Python)
* Statements must end with a semicolon (`;`)

In [None]:
def hello_world_plot(xs):
    fig, axs = plt.subplots(2, 1, figsize=(7,5))
    axs[0].plot(xs, color='k')
    axs[0].set_ylabel("$x$")
    axs[0].set_xlabel("iteration")
    axs[1].hist(xs, density=True, bins=50, label="sample")
    xs = np.linspace(-3,3, 1000)
    axs[1].plot(xs, sts.norm.pdf(xs), linewidth=3, label="PDF")
    axs[1].legend()
    axs[1].set_xlabel("$x$")
    axs[1].set_ylabel("density")
    fig.tight_layout()

In [None]:
import cmdstanpy ## import stan interface for Python
if os.name == "nt": ## adds compiler to path in Windows
    cmdstanpy.utils.cxx_toolchain_path() 
## compile the stan model
sm = cmdstanpy.CmdStanModel(stan_file="../stan-models/hello_world.stan")
sam = sm.sample(chains=1) ## sample 
xs = sam.stan_variable("x") ## extract "trace"
hello_world_plot(xs) ## plot result

## Basic Blocks

**Data block**: Declare model input (data, constants, settings)

In [None]:
util.show_stan_model("../stan-models/data_block.stan")

In [None]:
sm = cmdstanpy.CmdStanModel(stan_file="../stan-models/data_block.stan")

N = 100
sigma = 1
mu_gt = 0.5

data = {
    "N" : N,
    "X" : sts.norm.rvs(loc=mu_gt, scale=sigma, size=N),
    "sigma" : sigma
}

sam = sm.sample(data=data, chains=1)

mu = sam.stan_variable("mu")
m, l, u = np.mean(mu), *np.percentile(mu, [2.5, 97.5])
print(f"E[mu] = {m:0.2f}, 95% CrI: [{l:0.2f},{u:0.2f}]")

## Basic Blocks

**Parameters block:** Declare free parameters

* We can specify initial values for the parameters.
* Otherwise, they are drawn uniformly at random from the interval $[-2,2]$ on the unconstrained scale.

In [None]:
util.show_stan_model("../stan-models/hello_world.stan")

In [None]:
sm = cmdstanpy.CmdStanModel(stan_file="../stan-models/hello_world.stan")

init_params = {
    "x" : 10.0
}

sam = sm.sample(inits=init_params, chains=1)

x = sam.stan_variable("x")
m, l, u = np.mean(x), *np.percentile(x, [2.5, 97.5])
print(f"E[x] = {m:0.2f}, 95% CrI: [{l:0.2f},{u:0.2f}]")

## Basic Blocks

**Model block:** Define the log-posterior density (up to a constant)

In [None]:
util.show_stan_model("../stan-models/model_block.stan")

## Special Blocks

**transformed data:** pre-process the data, define constants. Block is executed only once prior to sampling.

In [None]:
util.show_stan_model("../stan-models/transformed_data_block.stan")

## Special Blocks

**transformed parameters:** compute compound parameters. Keeps tedious computations out of the model block.
Transformed parameters are saved to the Stan output files.

In [None]:
util.show_stan_model("../stan-models/transformed_parameters_block.stan")

In [None]:
def trans_params_block_fig(chain):
    fig, axs = plt.subplots(1,2, figsize=(7,3))
    axs[0].scatter(chain["beta"], chain["gamma"], s=1)
    axs[0].set_xlabel("$\\beta$"); axs[0].set_ylabel("$\\gamma$")
    axs[1].hist(chain["R0"], bins=50)
    axs[1].set_xlabel("$R_0$")

In [None]:
N = 100
data = {
    "N" : N,
    "SndInfections" : sts.poisson.rvs(1.2, size=N)
}

sm = cmdstanpy.CmdStanModel(stan_file="../stan-models/transformed_parameters_block.stan")
sam = sm.sample(chains=1, data=data)
chain = sam.stan_variables() ## get dictionary with ALL variables
trans_params_block_fig(chain) ## make figure to show samples

## Special Blocks
**generated quantities block:** additional output. Executed only once per sample. RNGs can be used here (and in `transformed data`), but not in the `model` block.

In [None]:
util.show_stan_model("../stan-models/generated_quantities_block.stan")

In [None]:
sm = cmdstanpy.CmdStanModel(stan_file="../stan-models/generated_quantities_block.stan")
sam = sm.sample(chains=1)
fig, ax = plt.subplots(1,1, figsize=(7,3))

ks = np.linspace(0, 10, 11, dtype=int)
ax.hist(sam.stan_variable("k"), bins=ks, density=True, 
        width=0.4, label="Stan sample")
ax.bar(ks+0.7, sts.nbinom.pmf(ks, 1, 0.5), width=0.4, 
       color='tab:orange', label="Negative Binomial PMF")
ax.legend()

## Special Blocks
**functions block:** user-defined functions (custom PDFs, system of ODEs, etc)

In [None]:
util.show_stan_model("../stan-models/functions_block.stan")

## Variables

Apart from `int` and `real`, Stan lets us use the following types
```cpp
vector[n] v; // real vector of length n
matrix[n,m] A; // n x m real matrix
simplex[n] s; // n-dimensional (real) simplex
```
\begin{equation}
s_1 + s_2 + \cdots + s_n = 1\,,\quad 0 < s_i < 1
\end{equation}
```cpp
ordered[n] x; // sorted vector
positive_ordered[n] y; // positive, sorted vector
```
\begin{equation}
x_1 < x_2 < \cdots < x_n\,,\quad 0 < y_1 < y_2 < \cdots < y_n
\end{equation}
```cpp
cov_matrix[n] Sigma; // Covariance matrix (positive definite, symmetric)
```
and other special linear-algebra-related types.

We can add upper and lower bounds to variables
```cpp
real<lower=0> x; // x is positive
real<lower=0, upper=1> y; // 0 < y < 1 (excludes bounds)
int<lower=0, upper=10> n; // for integers, bounds are included
vector<lower=0>[n] v; // each element is positive
```
Stan's HMC requires unbounded variables, so bounds are implemented with $\log$ and ${\rm logit}$ transforms.
Let $x'$ and $y'$ be unbounded variables, then
\begin{equation}
x = \exp(x') \,,\quad y = \frac{1}{1 + \exp(-y')}
\end{equation}
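
The transforms above can be sketched in plain Python. This is only an illustration of the idea, not Stan's internal implementation, and the function names are made up for this example.

```python
import math

def constrain_lower0(x_unc):
    """Map an unbounded value to (0, inf) via exp."""
    return math.exp(x_unc)

def constrain_unit(y_unc):
    """Map an unbounded value to (0, 1) via the inverse logit."""
    return 1.0 / (1.0 + math.exp(-y_unc))

def unconstrain_unit(y):
    """Inverse of constrain_unit: the logit function."""
    return math.log(y / (1.0 - y))

# any real number ends up inside the declared bounds
for z in [-5.0, 0.0, 5.0]:
    print(z, constrain_lower0(z), constrain_unit(z))
```

The sampler moves freely in the unbounded variables, and the declared variables are obtained by applying the transform.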

We can make **$n$-dimensional arrays** from all types

```cpp
vector[n] vs[m]; // array of m vectors, each of length n
real A[n,m,k]; // n x m x k array of reals
matrix[n,m] M[k,l]; // 2d array of n x m matrices
```
Indexing is as follows
```cpp
vector[n] v; // vector with n elements
v[1]; // first element of v (indexing starts at 1)
vector[3] u = v[1:3]; // multi-indexing, we can omit the "1": v[:3]
vector[n-3] w = v[4:]; // w contains the final n-3 elements of v

matrix[n,m] M; // n x m matrix
matrix[4, 4] K = M[4:7, 2:5]; // select 4 x 4 block of M
```
We can use integer arrays for multi-indexing
```cpp
int idxs[3] = {1,3,7}; // array of indices
vector[n] u; // vector with n elements
vector[3] z = u[idxs]; // z = [u[1], u[3], u[7]]'
```


## Functions

**probability density functions**
Real-valued argument. The argument and the parameters are separated by a pipe `|`; the result is on the log scale.
```cpp
real x = normal_lpdf(0.1 | 0.0, 1.0); // standard normal distribution
real y = weibull_lpdf(0.2 | 1.5, 3.2); // Weibull distribution
```
**probability mass functions**
integer argument
```cpp
real z = binomial_lpmf(10 | 25, 0.1); // binomial distribution
```
**(complementary) cumulative distribution function**
\begin{equation}
{\rm CDF}_f(x) = \int_{-\infty}^x f(y)dy\,,\quad {\rm CCDF}_f(x) = \int_{x}^{\infty} f(y)dy
\end{equation}
again, on the log scale
```cpp
real a = normal_lcdf(0 | 0, 1); // a = log(0.5)
real b = student_t_lccdf(-1 | 3, 0, 1); // Student t-distribution
```
CDF and CCDF are useful for left- and right-censored data.
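
To illustrate how the CCDF enters a likelihood with right-censored data, here is a small Python sketch, assuming normally distributed observations. The helper names mirror the Stan functions but are defined here by hand, and the data are invented. An exactly observed value contributes its log-density; a value that is only known to exceed the recorded threshold contributes the log-CCDF at that threshold.

```python
import math

def normal_lpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def normal_lccdf(x, mu, sigma):
    """log P(X > x) for X ~ Normal(mu, sigma), computed via erfc."""
    return math.log(0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2))))

def censored_loglik(obs, censored, mu, sigma):
    """Sum log-densities for observed values and log-CCDFs for censored ones."""
    total = 0.0
    for x, is_cens in zip(obs, censored):
        if is_cens:
            total += normal_lccdf(x, mu, sigma)
        else:
            total += normal_lpdf(x, mu, sigma)
    return total

obs = [0.3, 1.2, 2.0, 2.0]               # invented data, censoring threshold 2.0
censored = [False, False, True, True]
print(censored_loglik(obs, censored, mu=1.0, sigma=1.0))
```

For left-censored data the log-CDF plays the same role.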

**Random number generators**

* Pre-defined distributions in Stan have associated random number generators (RNGs)
* RNGs can be used in the `generated quantities` block (and in `transformed data`), but not in the `model` block.
```cpp
real x = exponential_rng(2.0); // x ~ Exponential(2.0)
int b = bernoulli_rng(0.5); // b = 1 with probability 1/2, else 0
```

## Functions

**mathematical functions**

Stan has lots of built-in mathematical and special functions
```
exp, log, sin, cos, logit, tgamma, beta, lambert_w0, ...
```
Many are "vectorized", e.g.
```cpp
real x[3] = {1,2,3}; // x is an array
real y[3] = exp(x); // also works for vectors x
```

**compound functions**
* Stan defines a number of compound functions that are equivalent to simple user-defined expressions
* Use these compound functions for better numerical stability
* Especially useful for keeping values on the log-scale
```cpp
log_sum_exp(x,y) == log(exp(x) + exp(y));
log_sum_exp(v) == log(sum(exp(v))); // v is a vector
log1m_exp(x) == log(1 - exp(x));
log1p(x) == log(1+x); // important when x is small
log_mix(p, x, y) == log(p * exp(x) + (1-p) * exp(y)); // mixture models
```
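
To see why the compound versions matter, the following Python sketch compares a naive evaluation of $\log(e^x + e^y)$ with a stable one. The function is written by hand for illustration and only mimics what Stan's `log_sum_exp` computes; it is not Stan's implementation.

```python
import math

def log_sum_exp(x, y):
    """Stable log(exp(x) + exp(y)): factor out the larger argument."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

x, y = -1000.0, -1001.0

# naive version: exp(-1000) underflows to 0.0, so log(0.0) fails
try:
    naive = math.log(math.exp(x) + math.exp(y))
except ValueError:
    naive = float("-inf")

stable = log_sum_exp(x, y)
print(naive, stable)
```

The naive result loses all information, while the stable version returns a finite value close to $-1000$.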

## Functions

**manipulating arrays, vectors and matrices**

* Appending rows to matrices (and similarly columns)
```cpp
matrix[n, m] A = rep_matrix(1.0, n, m); // n x m matrix filled with 1s
row_vector[m] v = rep_row_vector(0.0, m); // row vector with m 0s
matrix[n+1, m] B = append_row(A, v);
```
* We have to manually convert between arrays and vectors
```cpp
functions {
    real my_function(real[] x) {
        return x[1];
    }
}
model {
    vector[n] x;
    real z = my_function(to_array_1d(x));
}
```

## Functions

**linear algebra**

Addition of vectors and matrices is component-wise
```cpp
vector[n] v;
vector[n] w;
/* ... */
vector[n] u = v + w;
```
Multiplication is matrix-multiplication
```cpp
matrix[n,m] A;
matrix[m,k] B;
/* ... */
matrix[n,k] C = A * B;
```
Use `.*` for component-wise multiplication (cf MATLAB, Julia)
```cpp
vector[n] v;
vector[n] w;
vector[n] u = v .* w;
vector[n] x = v * w; // ERROR!!! vector * vector is not defined (use v' * w for the dot product)
```

## Control flow

**for-loops**
```cpp
for ( i in 1:N ) { // i does not need a type
    /* ... */
}
```
**if-else**

```cpp
if ( C[i] == 0 ) {
    X[i] ~ normal(mu, sigma);
} else if ( C[i] == 1 ) {
    X[i] ~ cauchy(mu, sigma);
} else {
    X[i] ~ double_exponential(mu, sigma);
}
```


## Exercise

Let $E_1, E_2, \dots, E_n$ be independent, exponentially distributed random variables with rates $a_1, a_2,\dots, a_n$ respectively. The sum $X = E_1 + E_2 + \cdots + E_n$ has a so-called *hypoexponential* distribution. A special case is the Erlang distribution, for which $a_1 = a_2 = \cdots = a_n$. Stan does not provide the hypoexponential distribution, so we have to implement it ourselves. On [Wikipedia](https://en.wikipedia.org/wiki/Hypoexponential_distribution), we find that the PDF of $X$ is given by

\begin{equation}
 f_X(t) = - \alpha \exp(t Q) Q 1
\end{equation}
where $\alpha = (1, 0, 0, \dots, 0) \in \mathbb{R}^n$ is a row vector, and $1$ is a vector of $n$ ones $(1, 1, \dots, 1)^T \in \mathbb{R}^n$. The matrix $Q$ is given by
\begin{equation}
 Q = \left(\begin{array}{ccccc}
 -a_1 & a_1 & 0 & \cdots & 0\\
 0 & -a_2 & a_2 & \cdots & 0 \\
 \vdots & \ddots & \ddots & \ddots & \vdots \\
 0 & \cdots & 0 & -a_{n-1} & a_{n-1} \\
 0 & \cdots & 0 & 0 & -a_n
 \end{array}\right)
\end{equation}
The function $\exp$ denotes matrix exponentiation and is available in Stan as `matrix_exp`.

Implement the hypoexponential distribution in the `functions` block of a Stan model

```cpp
functions {
    real hypoexponential_lpdf(real t, vector a) {
        // your code goes here...
    }
}
```
Define a positive parameter `t` in the parameters block, and give `t` a hypoexponential distribution in the `model` block. 
In the `generated quantities` block, generate hypoexponential random numbers `x` using the fact that $X$ is the sum of $n$ exponential RVs. Check that the histograms of `t` and `x` are very similar.

## Quirks

**The user is responsible for specifying valid parameter domains**

* Make sure the model is well defined by specifying valid parameter domains (using `<lower=a, upper=b>` syntax)
* For instance, if $x \sim {\rm Beta}(\alpha, \beta)$, then we should make sure that $x\in [0,1]$
* Mistakes will show up as rejected proposals, reported as informational messages in the sampler's stderr output

In [None]:
util.show_stan_model("../stan-models/invalid_domain.stan")

In [None]:
sm = cmdstanpy.CmdStanModel(stan_file="../stan-models/invalid_domain.stan")
sam = sm.sample(chains=1, output_dir="../stan-cache/")

**content of `invalid_domain-TIMESTAMP-1-stderr.txt`**
```
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: beta_lpdf: Random variable is -1.0463, but must be in the interval [0, 1] (in '/home/chris/Projects/StanWorkshop/stan-models/invalid_domain.stan', line 6, column 4 to column 23)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

```

In [None]:
util.show_stan_model("../stan-models/valid_domain.stan")

## Quirks

**Jacobian corrections**

Let's compile the following Stan model

In [None]:
util.show_stan_model("../stan-models/no_jac_correction.stan")

In [None]:
## compile and sample
sm = cmdstanpy.CmdStanModel(stan_file="../stan-models/no_jac_correction.stan")
sam = sm.sample(chains=1)

In [None]:
def plot_ecdf(ax, xs, **kwargs):
    ys = sorted(xs)
    fs = np.linspace(0, 1, len(ys))
    ax.step(ys, fs, where='post', **kwargs)

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(4,3))

plot_ecdf(ax, sam.stan_variable("theta1"), label="$\\theta_1$")
plot_ecdf(ax, sam.stan_variable("theta2"), label="$\\theta_2$")
ax.set_xscale('log'); ax.set_ylabel("$P(\\theta \\leq x)$"); ax.set_xlabel("x") ## the ECDF estimates P(theta <= x)
ax.legend(); pass

**Solution:** correct for transformation using the log of the Jacobian of the parameter transformation

In [None]:
util.show_stan_model("../stan-models/jac_correction.stan")

In [None]:
## compile and sample
sm = cmdstanpy.CmdStanModel(stan_file="../stan-models/jac_correction.stan")
sam = sm.sample(chains=1)

fig, ax = plt.subplots(1, 1, figsize=(4,3))

plot_ecdf(ax, sam.stan_variable("theta1"), label="$\\theta_1$")
plot_ecdf(ax, sam.stan_variable("theta2"), label="$\\theta_2$")
ax.set_xscale('log'); ax.set_ylabel("$P(\\theta \\leq x)$"); ax.set_xlabel("x") ## the ECDF estimates P(theta <= x)
ax.legend(); pass

**How to compute the correction term?**

* Suppose $x$ has a prior $\pi(x)$. The probability of sampling $x$ in the infinitesimal interval $[x, x+dx]$ is 
\begin{equation}
 \pi(x) dx
\end{equation}
* Let $y = f(x)$ for some transformation function $f$. By specifying a prior $\pi$ for $y$, we say that the probability of sampling $y$ in the interval $[y, y+dy]$ is
\begin{equation}
 \pi(y)dy
\end{equation}
* Stan only knows how to sample $x$ and so we have to tell Stan the density of $x$
\begin{equation}
 \pi(y) dy = \pi(f(x)) \left|df(x)\right| = \pi(f(x)) \left|\frac{\partial f}{\partial x}(x)\right| dx
\end{equation}
* Stan works on log scale, so the log-prior for $x$ is then
\begin{equation}
 \log(\pi(f(x))) + \log\left|\frac{\partial f}{\partial x}(x)\right|
\end{equation}
* Second term has to be added to `target` manually

**Example 1:**
* $f(x) = \log(x)$ has Jacobian $\frac{\partial f}{\partial x} = 1/x$.
* Correction term is
\begin{equation}
 \log\left| \frac{\partial f}{\partial x} \right| = \log(1/x) = -\log(x)
\end{equation}
* Jacobian correction in Stan: `target += -log(x);`
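
A quick numerical check of Example 1, as a Python sketch under the assumption that the prior is $\log(x) \sim \mathcal{N}(0,1)$ (the prior and all function names are chosen for this illustration). With the correction term, the density for $x$ reproduces $P(x \leq c) = \Phi(\log c)$; without it, it does not.

```python
import math

def normal_pdf(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)

def log_density_x(x, jacobian=True):
    """Log density assigned to x when log(x) ~ normal(0, 1)."""
    lp = math.log(normal_pdf(math.log(x)))
    if jacobian:
        lp += -math.log(x)  # correction term: log|d log(x) / dx|
    return lp

def integrate(f, a, b, n=200_000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

c = 1.5
exact = 0.5 * (1 + math.erf(math.log(c) / math.sqrt(2)))  # P(log x <= log c)
with_jac = integrate(lambda x: math.exp(log_density_x(x, True)), 0.0, c)
without_jac = integrate(lambda x: math.exp(log_density_x(x, False)), 0.0, c)
print(exact, with_jac, without_jac)
```

Only the corrected density agrees with the exact probability.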

**Example 2:** Suppose we have a model with states $A, B, C$ 

\begin{equation}
A \xrightarrow{x} B\,,\quad A \xrightarrow{y} C 
\end{equation}

and we have some prior information about the fraction $x / (x+y)$ of individuals that end up in state $B$, and about the average time $1/(x+y)$ to exit state $A$. Hence, we have priors

\begin{equation}
 u = \frac{x}{x+y} \sim \mathcal{D}_1\,, \quad
 v = \frac{1}{x+y} \sim \mathcal{D}_2
\end{equation}

* Compute the Jacobian of the transformation $(x, y) \mapsto (u,v)$
\begin{equation}
J = \frac{\partial(u,v)}{\partial(x,y)} = \frac{1}{(x+y)^2}
\left(\begin{array}{cc}
y & -x \\ -1 & -1
\end{array}\right)
\end{equation}
* Correction term is given by 
\begin{equation}
\log|\det(J)| = \log|-(x+y)^{-3}| = -3\log(x+y)
\end{equation}
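
The result $-3\log(x+y)$ can be checked numerically. The Python sketch below estimates the Jacobian of $(x,y) \mapsto (u,v)$ with central finite differences at an arbitrarily chosen point; the function names and the test point are invented for this example.

```python
import math

def transform(x, y):
    """(x, y) -> (u, v) = (x / (x + y), 1 / (x + y))."""
    s = x + y
    return x / s, 1.0 / s

def log_abs_det_jacobian_fd(x, y, h=1e-6):
    """Central finite-difference estimate of log|det J|."""
    u_xp, v_xp = transform(x + h, y)
    u_xm, v_xm = transform(x - h, y)
    u_yp, v_yp = transform(x, y + h)
    u_ym, v_ym = transform(x, y - h)
    du_dx = (u_xp - u_xm) / (2 * h)
    dv_dx = (v_xp - v_xm) / (2 * h)
    du_dy = (u_yp - u_ym) / (2 * h)
    dv_dy = (v_yp - v_ym) / (2 * h)
    det = du_dx * dv_dy - du_dy * dv_dx
    return math.log(abs(det))

x, y = 0.7, 1.9  # arbitrary positive rates
numeric = log_abs_det_jacobian_fd(x, y)
analytic = -3 * math.log(x + y)
print(numeric, analytic)
```

Both values should agree up to the finite-difference error.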


## Quirks

**HMC does not allow estimation of discrete parameters**

* Only **real** parameters (vectors, matrices) can be declared in the `parameters` block.
* Workaround: "integrate out" discrete variables $x \in \{0,1,2,3,\dots,n\}$. Define a simplex $p \in \mathbb{R}^{n+1}$ with $\mathbb{P}[x = i] = p_i$, and sum the likelihood over all values of $x$.
* Only works with finite domains.
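
As a sketch of what "integrating out" means in practice, the Python example below computes the marginal log-likelihood of a count $k$ when a discrete label $x$ selects one of several Poisson rates. The Poisson setup, the rates, and the function names are assumptions made for this illustration. In a Stan model, the same sum would be added to `target` with `log_sum_exp`.

```python
import math

def log_sum_exp(values):
    """Stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def poisson_lpmf(k, lam):
    """Log probability mass of Poisson(lam) at k."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def marginal_loglik(k, p, rates):
    """log sum_i p[i] * Poisson(k | rates[i]): the label x is summed out."""
    return log_sum_exp([math.log(p_i) + poisson_lpmf(k, lam)
                        for p_i, lam in zip(p, rates)])

p = [0.2, 0.5, 0.3]        # simplex: P[x = i]
rates = [1.0, 4.0, 10.0]   # hypothetical rate for each value of x
print(marginal_loglik(3, p, rates))
```

The discrete variable never appears as a parameter; only the simplex `p` would be sampled.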

## Quirks

**Ragged data structures**

Suppose we have data 
\begin{equation}
    x_1 = (1,2,4,2,1) \quad
    x_2 = (1,1,5) \quad
    x_3 = (6,9,1,3) \quad
    x_4 = (2,7)    
\end{equation}
In Python, R, etc., we can define a ragged array

In [None]:
X = [[1,2,4,2,1], [1,1,5], [6,9,1,3], [2,7]]

But in Stan, arrays have to be **rectangular**. Two workarounds:

1. padding
2. concatenation

**Padding**

\begin{equation}
 X = \left(
  \begin{array}{ccccc}
   1 & 2 & 4 & 2 & 1 \\
   1 & 1 & 5 & 0 & 0 \\
   6 & 9 & 1 & 3 & 0 \\
   2 & 7 & 0 & 0 & 0 
  \end{array}
 \right)\,,
 \quad M = (5, 3, 4, 2)
\end{equation}

In [None]:
util.show_stan_model("../stan-models/ragged_padding.stan")

**Concatenation**

\begin{equation}
    X = (1, 2, 4, 2, 1, 1, 1, 5, 6, 9, 1, 3, 2, 7)\,, \quad M = (5,3,4,2)
\end{equation}

In [None]:
util.show_stan_model("../stan-models/ragged_concatenation.stan")
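
Both workarounds are easy to prepare on the Python side before passing the data to Stan. The sketch below builds the padded and the concatenated version of the ragged list `X`; the variable and function names are chosen for this example and need not match the data names used in the Stan models above.

```python
X = [[1, 2, 4, 2, 1], [1, 1, 5], [6, 9, 1, 3], [2, 7]]

M = [len(row) for row in X]  # row lengths
max_len = max(M)

# padding: fill short rows with a dummy value (0 here) up to the longest row
X_padded = [row + [0] * (max_len - len(row)) for row in X]

# concatenation: one flat list; M tells us where each row starts and ends
X_concat = [value for row in X for value in row]

def get_row(flat, lengths, i):
    """Recover row i (0-based here; Stan uses 1-based indices) from the flat list."""
    start = sum(lengths[:i])
    return flat[start:start + lengths[i]]

print(M)
print(X_padded)
print(X_concat)
print(get_row(X_concat, M, 2))
```

With padding, the Stan model must use `M` to ignore the dummy entries; with concatenation, it must keep track of the running start position.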

## Quirks

**No declaration after assignment**
 
The Stan compiler (stanc) does not allow any declarations after an assignment in the same lexical block (`{ ... }`).
Example:
```cpp
model {
    real x;
    x = 10;
    int y; // ERROR!!!
}
```
Exception to this rule is a declaration-assignment statement
```cpp
real x = 10; // declaration-assignment counts as declaration
int y; // OK!
```
Declarations are allowed in new blocks 
```cpp
real x;
x = 10;
{ // we can declare variables in the new scope
    int y = 3; // OK
} // y is no longer in scope
x *= y; // ERROR!!!
```

## Further reading

**[Stan user guide](https://mc-stan.org/docs/2_27/stan-users-guide-2_27.pdf)** (399 pages) Lots of examples

**[Stan language reference manual](https://mc-stan.org/docs/2_27/reference-manual-2_27.pdf)** (190 pages) detailed language rules

**[Stan language functions reference](https://mc-stan.org/docs/2_27/functions-reference-2_27.pdf)** (207 pages) documentation of all built-in functions

## Exercises

1. An inverse-gamma distributed random variable $X$ has the property that $1/X \sim {\rm Gamma}(\alpha, \beta)$. Suppose your Stan model has the sampling statement `1/X ~ gamma(alpha, beta);`. What Jacobian correction term is required?

2. Look in the folder `stan-models` for the file `circular_density.stan`. This model was used in the Introduction to sample from the "moon-shaped" distribution. Identify the Jacobian correction term and show that this is indeed the right correction.