# Section 1: Bayesian refresher and introduction to ArviZ

_What does an end-to-end Bayesian workflow look like?_

## Learning Objectives
* Refresh our understanding of Bayes' Theorem
* Fit a small binomial model
* Show how a full statistical workflow, even outside of Bayesian methods, requires more steps than just model fitting
* Introduce ArviZ

## Bayes' Theorem

### The most common formulation

$$
\Large
P(\theta \mid y) = \frac{P(y \mid \theta)P(\theta)}{P(y)}
$$

This comes from a simple rearranging of terms for joint probabilities:

$$
P(\theta, y) = P(\theta)P(y \mid \theta) = P(y)P(\theta \mid y)
$$

This formula becomes interesting when we interpret $y$ as _data_ and $\theta$ as _parameters_ for a model. 
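
As a quick sanity check, the identity can be verified numerically on a small discrete example. The sketch below uses a made-up joint table (the labels `theta_a`, `theta_b`, `y0`, `y1` and all probabilities are hypothetical) and compares the posterior read directly off the joint with the one computed through Bayes' Theorem.

```python
# Hypothetical joint distribution P(theta, y) over two parameter values
# and two outcomes; the numbers are made up for illustration.
joint = {
    ("theta_a", "y0"): 0.10,
    ("theta_a", "y1"): 0.30,
    ("theta_b", "y0"): 0.40,
    ("theta_b", "y1"): 0.20,
}


def marginal_theta(theta):
    """P(theta): sum the joint over every outcome y."""
    return sum(p for (t, _), p in joint.items() if t == theta)


def marginal_y(y):
    """P(y): sum the joint over every parameter value theta."""
    return sum(p for (_, obs), p in joint.items() if obs == y)


def likelihood(y, theta):
    """P(y | theta) = P(theta, y) / P(theta)."""
    return joint[(theta, y)] / marginal_theta(theta)


def posterior(theta, y):
    """P(theta | y) via Bayes' Theorem."""
    return likelihood(y, theta) * marginal_theta(theta) / marginal_y(y)


# Posterior read directly from the joint table: P(theta, y) / P(y)
direct = joint[("theta_a", "y1")] / marginal_y("y1")
via_bayes = posterior("theta_a", "y1")
print(direct, via_bayes)
```

Both routes should agree up to floating-point error, since they are the same rearrangement of the joint probability.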

### Breaking it down
####  $P(\theta)$ -> Prior
_"What is the probability of parameters given no observations?"_  
Before we've observed any data, what is a plausible probability distribution for the parameters? This may come from physical constraints (temperatures are above 0 Kelvin) or domain expertise (summer high temperatures in Austin are between 80 and 110 degrees Fahrenheit).

####  $P(y \mid \theta )$ -> Likelihood
_"What is the probability of the observed data given a model parameter value?"_  

Likelihood functions tell us how "likely" the observed data is for each possible parameter value. Likelihoods play roughly the same role as loss functions in machine learning: they evaluate how well a set of model parameters explains the data. Indeed, many common loss functions are derived from likelihoods.
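
To make the link to loss functions concrete, here is a minimal sketch assuming a Normal likelihood with a known noise scale. The data, `sigma`, and the candidate grid are all made up for illustration. Under those assumptions the negative log-likelihood is a constant plus a scaled sum of squared errors, so it should rank candidate values of the mean exactly as mean squared error does.

```python
import math

# Made-up observations, for illustration only.
y = [2.1, 1.9, 2.4, 2.0, 1.6]
sigma = 1.0  # assumed known noise scale


def neg_log_likelihood(mu):
    """Negative log-likelihood of y under Normal(mu, sigma)."""
    n = len(y)
    sse = sum((yi - mu) ** 2 for yi in y)
    return 0.5 * n * math.log(2 * math.pi * sigma**2) + sse / (2 * sigma**2)


def mean_squared_error(mu):
    """The familiar squared-error loss for a constant prediction mu."""
    return sum((yi - mu) ** 2 for yi in y) / len(y)


# Candidate values for mu from 0.00 to 4.00 in steps of 0.01
candidates = [i / 100 for i in range(0, 401)]
best_by_nll = min(candidates, key=neg_log_likelihood)
best_by_mse = min(candidates, key=mean_squared_error)
print(best_by_nll, best_by_mse)
```

Both criteria select the same candidate, which is the sense in which squared error can be read as a Gaussian likelihood in disguise.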

####  $P(\theta \mid y)$ -> Posterior distribution
_"What is the distribution of parameters given the observed data?"_  

After obtaining data, or making observations, what is our belief regarding the parameters of the underlying statistical model? 

* Estimating the posterior distribution is the goal of Bayesian analysis. 
* The process of estimating the posterior distribution is often referred to as **Inference**
* There are numerous ways to perform inference, [each with their own pros and cons](http://canyon289.github.io/pages/InferenceCheatsheet.html)
    * In this tutorial we will only be using Markov Chain Monte Carlo (MCMC)

####  $P(y)$ -> Marginal Probability of Evidence
_"What is the probability distribution of data?"_

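In most cases this term is difficult or impossible to calculate, because it requires summing or integrating the likelihood times the prior over every possible parameter value. Most inference techniques are therefore designed to avoid computing it. MCMC is one of those techniques.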
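
To see how MCMC can sidestep $P(y)$, here is a minimal sketch of random-walk Metropolis, one of the simplest MCMC algorithms. It is not necessarily the sampler used later in this tutorial. It assumes a binomial likelihood with a flat prior on $\theta$; the data (7 successes in 10 trials), the step size, and the function names are hypothetical. The acceptance rule only ever compares two posterior values, so the normalizing constant cancels and never has to be computed.

```python
import math
import random


def log_unnormalized_posterior(theta, successes, trials):
    """log of P(y | theta) * P(theta) for a binomial likelihood and flat prior.

    The binomial coefficient and P(y) do not depend on theta, so both are dropped.
    """
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior probability outside (0, 1)
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis(successes, trials, draws=5000, step=0.1, seed=0):
    """Random-walk Metropolis over theta; returns the full chain of samples."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_unnormalized_posterior(theta, successes, trials)
    samples = []
    for _ in range(draws):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_unnormalized_posterior(proposal, successes, trials)
        # Accept with probability min(1, ratio of posteriors); P(y) cancels in the ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


samples = metropolis(successes=7, trials=10)
kept = samples[1000:]  # discard early draws as warm-up
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

For this particular model the exact posterior is Beta(8, 4), with mean 2/3, so the sample mean should land close to that value.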


## Alternative formulations

### Likelihood notation
I particularly like this formulation because it clearly demarcates the difference between likelihood and probability terms  

$$ P(\theta \mid y) = \frac{L(\theta \mid y)P(\theta)}{P(y)} $$

### Defined as a proportion
While the posterior, likelihood, and prior are usually *distributions*, the denominator is a scalar that normalizes the numerator. In many modern Bayesian inference methods we try to avoid calculating it.

$$ P(\theta \mid y) \propto P(y \mid \theta)P(\theta) $$
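
A grid approximation makes the proportional form tangible. The sketch below assumes a binomial likelihood, a flat prior, and made-up data (7 successes in 10 trials). It evaluates likelihood times prior at each grid point, then divides by the total so the values sum to one. That total plays the role of the scalar denominator on this grid.

```python
import math

successes, trials = 7, 10  # made-up data, for illustration only
grid = [i / 200 for i in range(1, 200)]  # candidate theta values inside (0, 1)


def binomial_likelihood(theta):
    """P(y | theta) for `successes` out of `trials`."""
    coefficient = math.comb(trials, successes)
    return coefficient * theta**successes * (1 - theta) ** (trials - successes)


def prior(theta):
    """Flat prior: every theta in (0, 1) is equally plausible."""
    return 1.0


# Numerator of Bayes' Theorem at each grid point
unnormalized = [binomial_likelihood(t) * prior(t) for t in grid]

# Normalizing by the total turns the proportional form into probabilities
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

mode = grid[posterior.index(max(posterior))]
print(mode)
```

Dividing by `total` rescales every value by the same amount, so the location of the peak is already fixed by the unnormalized product. This is why many methods can work with the proportional form alone.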

### Defined with puppies
Even if you hate math, you'd have to be a monster to hate puppies. This pictorial formula is taken from John Kruschke's excellent book [Doing Bayesian Data Analysis](https://www.amazon.com/Doing-Bayesian-Data-Analysis-Tutorial/dp/0124058884). Do note the lazy puppy on the right. The laziness is an indication of how little work this puppy does in most Bayesian inference methods.  
![BayesianPuppies](../../img/Doing-DBA.png)
