(econmt-bayesian)=
# Bayesian Inference

In this chapter, we'll look at how to perform analysis and regressions using Bayesian techniques.

Let's import a few of the packages we'll need first. Two key packages that we'll be using that you might not have seen before are [**pymc3**](https://docs.pymc.io/), a Bayesian inference package, and [**Bambi**](https://bambinos.github.io/), which stands for *BAyesian Model-Building Interface*. We'll also use [**arviz**](https://arviz-devs.github.io/) to visualise some Bayesian inference results, but this will get installed when you install **pymc3**. You should follow the install instructions for these carefully and, if you're confident with using different Python environments, it's a good idea to spin up a new 'bayes' environment to try them out in. The chapter on {ref}`code-preliminaries` covers basic information on how to install new packages.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
import pymc3 as pm
import arviz as az

# Plot settings
plt.style.use(
    "https://github.com/aeturrell/coding-for-economists/raw/main/plot_style.txt"
)
az.style.use("arviz-darkgrid")

# Pandas: Set max rows displayed for readability
pd.set_option("display.max_rows", 6)

# Set seed for random numbers
seed_for_prng = 78557
prng = np.random.default_rng(seed_for_prng)  # prng = pseudo-random number generator
# Turn off warnings
warnings.filterwarnings('ignore')

## Introduction

The biggest difference between the Bayesian and frequentist approaches (that we've seen in the other chapters) is probably that, in Bayesian models, the parameters are not assumed to be fixed but instead are treated as random variables whose uncertainty is described using probability distributions. The data are considered fixed. You might see the 'inverse probability' formulation of a Bayesian model written as $p(\theta | y)$ where the $y$ are the data, and the $\theta$ are the model parameters. An interesting aspect of Bayesianism is that there is just one estimator: Bayes' theorem.

This contrasts with the frequentist view, which holds that the data are random but the model parameters are fixed, and in which models are often expressed as functions, for example $f(y | \theta)$. Frequentist inference typically involves deriving estimators for the model parameters, and these are usually created to minimise the bias, minimise the variance, or maximise the efficiency.

As a reminder, Bayes' theorem says that ${\displaystyle P(A\mid B)={\frac {P(B\mid A)P(A)}{P(B)}}}$, where $A$ and $B$ are distinct events, $P(B) \neq 0$, and $P(A)$ is the probability of event $A$ happening. When dealing with data and model parameters, and ignoring the normalising factor $p({\boldsymbol{y}})$ (which does not depend on the parameters), this can be written as:

$$
p({\boldsymbol{\theta }}|{\boldsymbol{y}})\propto p({\boldsymbol{y}}|{\boldsymbol{\theta }})p({\boldsymbol{\theta }}).
$$

In these equations:

1. $p({\boldsymbol{\theta}})$ is the prior put on model parameters: what we believe their distribution looks like before seeing the data.
2. $p({\boldsymbol{y}}|{\boldsymbol{\theta }})$ is the likelihood of the data given a particular set of model parameters.
3. $p({\boldsymbol{\theta }}|{\boldsymbol{y}})$ is the posterior probability of those model parameters given the observed data.
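
To make the three ingredients concrete, here is a minimal sketch that multiplies a prior by a likelihood on a grid of parameter values and normalises the result to get a posterior. Everything in it is an assumption made for illustration: the data are simulated, there is a single parameter $\theta$ (a mean), the noise standard deviation is treated as known, and the seed and grid range are arbitrary. Grid approximation is only practical for one or two parameters; it is not how **pymc3** works.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for this sketch only

# Hypothetical data: 50 draws with unknown mean theta and known noise sd
noise_sd = 1.5
y = rng.normal(loc=1.0, scale=noise_sd, size=50)

# Grid of candidate values for theta
theta_grid = np.linspace(-5, 5, 2001)
dtheta = theta_grid[1] - theta_grid[0]

# Log prior p(theta): Normal(0, 10), dropping additive constants
prior_sd = 10.0
log_prior = -0.5 * (theta_grid / prior_sd) ** 2

# Log likelihood p(y | theta): sum over observations, dropping constants
log_lik = -0.5 * (((y[:, None] - theta_grid[None, :]) / noise_sd) ** 2).sum(axis=0)

# Posterior is proportional to likelihood times prior; work in logs for stability
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtheta  # normalise so the density integrates to 1

# Posterior mean and standard deviation from the grid
post_mean = (theta_grid * post).sum() * dtheta
post_sd = np.sqrt((((theta_grid - post_mean) ** 2) * post).sum() * dtheta)

# For this Normal-Normal case there is a closed form to check against
precision = 1 / prior_sd**2 + y.size / noise_sd**2
exact_mean = (y.sum() / noise_sd**2) / precision
exact_sd = np.sqrt(1 / precision)

print(f"grid:  mean={post_mean:.4f}, sd={post_sd:.4f}")
print(f"exact: mean={exact_mean:.4f}, sd={exact_sd:.4f}")
```

The grid and closed-form answers should agree closely. Note that the output is a whole distribution over $\theta$, summarised here by its mean and standard deviation, rather than a single point estimate.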

Bayesian modelling proceeds as highlighted in the review article, *Bayesian statistics and modelling* {cite}`van2021bayesian`:

![The Bayesian research cycle.](https://media.springernature.com/lw685/springer-static/image/art%3A10.1038%2Fs43586-020-00001-2/MediaObjects/43586_2020_1_Fig1_HTML.png?as=webp)

One key strength of the Bayesian approach is that it preserves uncertainty: by construction, it includes the degree of belief you have in a parameter. This makes it especially useful in cases where uncertainty is important. One disadvantage is that it is often slower to compute than a frequentist estimate, because the posterior usually has to be approximated by drawing many samples rather than read off from a closed-form formula.


## Bayesian Modelling: A Simple Example

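We're going to set up a very simple linear model: we simulate data from known parameter values, put priors on those parameters, and then use **pymc3** to sample from the posterior and see how well the true values are recovered.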

In [None]:
# True parameter values
alpha_true, beta_true, gamma_true = 1, 2.5, 1.5

# Size of dataset
size = 100

# Predictor variable
X1 = prng.standard_normal(size)

# Simulate outcome variable
Y = alpha_true + beta_true * X1 + prng.standard_normal(size) * gamma_true

with pm.Model() as linear_model:
    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta = pm.Normal("beta", mu=0, sigma=10)
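    # Noise scale: the Python variable is gamma, but the model registers it
    # under the name "sigma", which is the name that appears in the results
    gamma = pm.HalfNormal("sigma", sigma=1)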

    # Expected value of outcome
    mu = alpha + beta * X1

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=gamma, observed=Y)

    trace = pm.sample(1000, return_inferencedata=True)
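
Because the priors on `alpha` and `beta` above are wide, we would expect their posterior means to land close to ordinary least squares estimates. The sketch below is a rough sanity check under that assumption, using only **numpy**. It simulates data in the same way as the cell above, with its own generator (called `rng` here), so the draws match the chapter's only if `prng` had not been used for anything else beforehand.

```python
import numpy as np

# Simulate data the same way as in the chapter (own generator for this sketch)
rng = np.random.default_rng(78557)
alpha_true, beta_true, gamma_true = 1, 2.5, 1.5
size = 100
X1 = rng.standard_normal(size)
Y = alpha_true + beta_true * X1 + rng.standard_normal(size) * gamma_true

# Design matrix with a column of ones for the intercept
design = np.column_stack([np.ones(size), X1])

# Least squares estimates of the intercept and slope
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
alpha_hat, beta_hat = coef

# Residual standard deviation (two estimated coefficients, so size - 2)
resid = Y - design @ coef
sigma_hat = np.sqrt(resid @ resid / (size - 2))

print(f"alpha_hat={alpha_hat:.2f}, beta_hat={beta_hat:.2f}, sigma_hat={sigma_hat:.2f}")
```

If the sampler has worked, the posterior means for `alpha`, `beta`, and `sigma` should be in the same neighbourhood as these numbers, with `sigma` pulled slightly by its tighter half-normal prior.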

In [None]:
az.plot_trace(trace)

In [None]:
az.summary(trace, round_to=2)
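
The summary table reports, among other things, a mean, a standard deviation, and a highest density interval (HDI) for each parameter; in the **arviz** versions we're aware of, the default is a 94% interval shown as `hdi_3%` and `hdi_97%`. As a rough illustration of what an HDI is, the sketch below finds the shortest interval containing 94% of a set of draws. The function `hdi` and the synthetic draws are hypothetical stand-ins written for this example; this is not **arviz**'s implementation, and it assumes a unimodal posterior.

```python
import numpy as np


def hdi(draws, prob=0.94):
    """Shortest interval containing a fraction `prob` of the draws."""
    sorted_draws = np.sort(np.asarray(draws))
    n = sorted_draws.size
    n_in = int(np.ceil(prob * n))  # number of draws inside the interval
    # Width of every candidate interval that spans n_in consecutive draws
    widths = sorted_draws[n_in - 1:] - sorted_draws[: n - n_in + 1]
    start = int(np.argmin(widths))
    return sorted_draws[start], sorted_draws[start + n_in - 1]


# Synthetic stand-in for posterior draws of a single parameter
rng = np.random.default_rng(1)  # arbitrary seed, for this sketch only
draws = rng.normal(loc=2.5, scale=0.15, size=4000)

lower, upper = hdi(draws)
print(f"mean={draws.mean():.2f}, sd={draws.std(ddof=1):.2f}")
print(f"94% HDI: [{lower:.2f}, {upper:.2f}]")
```

For a symmetric, bell-shaped posterior the HDI is close to the equal-tailed interval; the two differ more when the posterior is skewed.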

In [None]:
type(trace)

In [None]:
trace

In [None]:
pm_data = az.from_pymc3(
    prior=pm.sample_prior_predictive(model=linear_model),
    posterior_predictive=pm.sample_posterior_predictive(trace, model=linear_model),
    model=linear_model,
)

In [None]:
pm_data.extend(trace)

In [None]:
pm.model_to_graphviz(linear_model)

In [None]:
az.plot_dist_comparison(pm_data, var_names=["sigma"])

In [None]:
az.plot_ppc(pm_data, group="posterior", figsize=(12, 6))