## *Disclaimer* {background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}

I am by no means an expert in Bayesian inference. I would describe myself
as a Bayesian "enthusiast" at best.

## Aims {background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}

- To encourage you to consider Bayesian approaches for your analyses.
- To put Stan on your radar as a flexible and powerful tool for Bayesian inference.
- To give an overview of how Stan works.
- To provide some examples of how it can be used.

## Overview {background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}

- Why Bayes?
- Why Stan?
- How does it work?
- Some Examples.

## The Canonical Formula {background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}

Bayes' rule:
$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

![](Thomas_Bayes3.jpg){.absolute bottom=80 left=200 height="45%"}
![](Laplace.jpg){.absolute bottom=80 right=200 height="45%"}

:::{style="margin-top: 350px"}
<center><small>Thomas Bayes (1701--1761; left) and Pierre-Simon Laplace (1749--1827; right)</small></center>
:::

## In practice

$$\underbrace{P(\boldsymbol{\theta}|x)}_{\text{posterior}}\propto \overbrace{\mathcal{L}(x|\boldsymbol{\theta})}^{\text{Likelihood}}\underbrace{P(\boldsymbol{\theta})}_{\text{prior}}$$

- $x$ is the observed data
- $\boldsymbol{\theta}$ are the parameters of interest
- Note 1: The evidence $P(x)$ is usually intractable, but it is constant with respect to $\boldsymbol{\theta}$, so it can be dropped
- Note 2: A uniform prior results in $P(\boldsymbol{\theta}|x)\propto \mathcal{L}(x|\boldsymbol{\theta})$
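
The proportionality above can be sketched with a grid approximation: multiply likelihood and prior pointwise, then normalise, which stands in for dividing by $P(x)$. The coin-flip data (7 heads in 10 tosses), the grid size, the Beta(5, 5)-shaped prior, and the function name `grid_posterior` are illustrative assumptions, not part of the original slides.

```python
def grid_posterior(heads, tosses, prior, n_grid=101):
    """Approximate P(theta | x) for a coin's heads-probability on a grid."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    # Binomial likelihood up to a constant (the binomial coefficient cancels)
    likelihood = [t**heads * (1 - t) ** (tosses - heads) for t in grid]
    unnormalised = [lik * prior(t) for lik, t in zip(likelihood, grid)]
    total = sum(unnormalised)  # plays the role of P(x)
    return grid, [u / total for u in unnormalised]


# Hypothetical data: 7 heads in 10 tosses
grid, post_flat = grid_posterior(7, 10, prior=lambda t: 1.0)
_, post_informed = grid_posterior(7, 10, prior=lambda t: t**4 * (1 - t) ** 4)

mode_flat = grid[post_flat.index(max(post_flat))]
mode_informed = grid[post_informed.index(max(post_informed))]
print(mode_flat, mode_informed)
```

With the flat prior the posterior mode coincides with the maximum of the likelihood (Note 2), while the informative prior pulls the mode towards 0.5.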


## Bayesian versus Frequentist {auto-animate=true background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}

**Frequentist:**
$$\hat{\boldsymbol{\theta}}_{\text{MLE}}=\underset{\boldsymbol{\theta}}{\text{argmax}} \;\mathcal{L}(x|\boldsymbol{\theta})$$

- The true parameter is treated as a fixed but unknown constant.
- The estimate maximizes the likelihood of the observed data.

## Bayesian versus Frequentist {auto-animate=true background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}

**Bayesian:**
$$\hat{\boldsymbol{\theta}}_{\text{MAP}}=\underset{\boldsymbol{\theta}}{\text{argmax}}  \;P(\boldsymbol{\theta}|x)$$

- The parameter is treated as a random variable with its own distribution.
- The estimate is chosen as the posterior mode.
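
To make the MLE/MAP contrast concrete, here is a minimal sketch for the mean of normally distributed data with a known standard deviation and a normal prior on the mean, where both estimates have closed forms. The height data, the known `sigma=0.1`, and the function names are assumptions made for illustration.

```python
from statistics import mean


def mle_mean(y):
    """MLE of a normal mean: the sample average."""
    return mean(y)


def map_mean(y, sigma, prior_mean, prior_sd):
    """MAP of a normal mean with known sigma and a normal prior.

    The posterior is normal, so its mode is the precision-weighted
    average of the prior mean and the sample mean.
    """
    prior_precision = 1 / prior_sd**2
    data_precision = len(y) / sigma**2
    weighted = prior_precision * prior_mean + data_precision * mean(y)
    return weighted / (prior_precision + data_precision)


# Hypothetical heights in metres
heights = [1.62, 1.75, 1.80, 1.68, 1.91]
print(mle_mean(heights))
print(map_mean(heights, sigma=0.1, prior_mean=1.7, prior_sd=0.3))
```

The MAP estimate sits between the prior mean and the sample mean, and approaches the MLE as the prior becomes flatter or the sample grows.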

## Benefits of Bayes over Frequentist {background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}

- The ability to incorporate prior knowledge
- More interpretable
    - Credible intervals vs. confidence intervals
    - Estimation of probability of hypotheses
    - Resolves some of the limitations of p-values
- More flexible
    - Hierarchical models are more straightforward
    - Easier to take measurement uncertainty into account
    - Non-standard hypothesis testing by probing the posterior
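
"Probing the posterior" can be sketched with plain posterior draws: probabilities of hypotheses and credible intervals are just summaries of the samples. The example assumes hypothetical coin-flip data (7 heads in 10 tosses) with a uniform prior, for which the posterior is Beta(8, 4); the seed and the number of draws are arbitrary choices.

```python
import random

random.seed(1)

# Posterior draws for the heads-probability theta: Beta(8, 4)
draws = sorted(random.betavariate(8, 4) for _ in range(20_000))

# Probability of a hypothesis: the share of draws that satisfy it
prob_biased = sum(d > 0.5 for d in draws) / len(draws)

# 95% credible interval from the empirical quantiles of the draws
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]

print(f"P(theta > 0.5 | x) = {prob_biased:.3f}")
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```

The same pattern applies to draws produced by any sampler: any question about the parameters becomes a computation on the draws.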
    
## Drawbacks of the Bayes approach {background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}

- Computationally complex \& demanding
- Prior specification is very important
    - An overly strong prior can bias results
    - Can affect tractability (remedied by conjugate priors)
- Not what people are used to seeing

## Stan {background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}

- Probabilistic programming language for facilitating Bayesian inference 
- Has interfaces for R, Python, shell, MATLAB, Julia, and Stata
- Multithreaded and compiles down to C++
- Intuitive model specification 
  - e.g., `y[i] ~ normal(mu,sigma);`
- Amazing [reference](https://mc-stan.org/docs/reference-manual/language.html) and [user's guide](https://mc-stan.org/docs/2_19/stan-users-guide/)

## How it works: HMC and NUTS {background-image="HAM.gif" background-opacity="0.15" background-size="50%" background-position="100% 0%"}

- Hamiltonian Monte Carlo (HMC) is an alternative to Metropolis-Hastings or Gibbs sampling
  - Allows for more efficient sampling of high-dimensional spaces
  - Analogous to [simulating a particle](https://chi-feng.github.io/mcmc-demo/app.html) moving over the posterior density
- The No-U-Turn Sampler (NUTS) is an extension of HMC that automatically adapts the number of leapfrog steps
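
The particle analogy can be sketched in a few lines. This is a bare-bones HMC with a fixed number of leapfrog steps on a one-dimensional standard normal target, not NUTS and not Stan's actual implementation; the step size, trajectory length, seed, and function names are all assumptions chosen for illustration.

```python
import math
import random


def leapfrog(q, p, grad_log_density, step_size, n_steps):
    """Simulate the particle: half step in momentum, full steps in position."""
    p = p + 0.5 * step_size * grad_log_density(q)
    for step in range(n_steps):
        q = q + step_size * p
        if step < n_steps - 1:
            p = p + step_size * grad_log_density(q)
    p = p + 0.5 * step_size * grad_log_density(q)
    return q, p


def hmc_sample(log_density, grad_log_density, q0, n_draws,
               step_size=0.15, n_steps=10, seed=0):
    rng = random.Random(seed)
    q, draws = q0, []
    for _ in range(n_draws):
        p = rng.gauss(0.0, 1.0)  # fresh random momentum for each trajectory
        q_new, p_new = leapfrog(q, p, grad_log_density, step_size, n_steps)
        # Hamiltonian = potential (-log density) + kinetic energy
        h_old = -log_density(q) + 0.5 * p**2
        h_new = -log_density(q_new) + 0.5 * p_new**2
        # Metropolis correction for the discretisation error
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        draws.append(q)
    return draws


# Target: standard normal, log density -q^2/2 up to a constant
draws = hmc_sample(lambda q: -0.5 * q**2, lambda q: -q, q0=3.0, n_draws=5000)
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
print(sample_mean, sample_var)
```

Here `n_steps` is fixed by hand; choosing it automatically for each trajectory is the part that NUTS takes care of.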

## Program Structure {auto-animate=true background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}


```{stan}
data {
  // Define the inputs
}

parameters {
  // Declare the unknown parameters to estimate
}

model {
  // Define the model
}
```


## Program Structure {auto-animate=true background-image="logo_tm.png" background-opacity="0.2" background-size="40%" background-position="0% 0%"}


```{stan}
data {
  // Define the inputs
  int<lower=0> n;      // number of observations
  array[n] real y;     // observations (array syntax for Stan 2.26 and later)
}

parameters {
  // Declare the unknown parameters to estimate
  real mu;
  real<lower=0> sigma;
}

model {
  // Define the model
  for (i in 1:n)
    y[i] ~ normal(mu,sigma);
  mu ~ normal(1.7,0.3);
  sigma ~ cauchy(0,1);
}
```
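
One way to read the `model` block: each `~` statement adds a log-density term, and the sum is the log posterior up to a constant. The sketch below mirrors the program above in plain Python for illustration only; it is not how Stan evaluates the model internally, and the height data and helper names are hypothetical.

```python
import math


def normal_lpdf(x, mu, sigma):
    """Log density of a normal distribution."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)


def cauchy_lpdf(x, loc, scale):
    """Log density of a Cauchy distribution."""
    return -math.log(math.pi * scale) - math.log1p(((x - loc) / scale) ** 2)


def log_posterior(mu, sigma, y):
    """Unnormalised log posterior matching the model block above."""
    if sigma <= 0:  # mirrors the real<lower=0> constraint on sigma
        return -math.inf
    lp = sum(normal_lpdf(y_i, mu, sigma) for y_i in y)  # y[i] ~ normal(mu, sigma)
    lp += normal_lpdf(mu, 1.7, 0.3)                     # mu ~ normal(1.7, 0.3)
    lp += cauchy_lpdf(sigma, 0.0, 1.0)                  # sigma ~ cauchy(0, 1)
    return lp


# Hypothetical heights in metres
heights = [1.62, 1.75, 1.80, 1.68, 1.91]
print(log_posterior(1.75, 0.1, heights) > log_posterior(1.2, 0.1, heights))
```

Parameter values that explain the data well and agree with the priors receive a higher log posterior, and this is the surface the sampler explores.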