# **Class 1 in Bayesian Econometrics**

## **Motivation**

This is the first in a series of notebooks which document my journey learning Bayesian econometrics from Gary Koop's 2003 "Bayesian Econometrics". It is intended for people who, like me, have zero background in Bayesian statistics but do have a background in statistics or econometrics. The point is not to rewrite the textbook, but to show transparently how I slowly began to understand things. My hope is that I can closely track the difficulties other new students are likely to encounter, and that the reader can learn from them. I decided to learn Bayesian statistics because Bayesian techniques yield practical advantages when applied to models many economists care about (e.g. dynamic macro models or state space time series models), though I do not yet understand how or why. Some Bayesian advantages include:

- Less over-fitting in small samples
- Fewer of the computational issues that traditional (frequentist) methods run into in non-linear contexts
- More explicit treatment of parameter uncertainty
- Access to the nice properties of simulation methods, such as Monte Carlo methods

I leverage the power of Python notebooks to do an important part of Bayesian econometrics: computation. A large goal of this series is therefore to help new students learn to use powerful Bayesian tools in Python, chiefly `pyMC`.

My goal is to go over the necessary Bayesian theory first and to get to applications as soon as possible. However, I am a firm believer that economists should not use techniques whose statistical properties they do not understand. So, I plan to give the theory a thorough treatment. If the reader struggles with this, I hope they may find comfort in knowing that I did too.

Happy learning.

## **A Bayesian Interpretation of Probability**

The traditional interpretation of probability is that it represents a long-run frequency. This school of statistical thought is *frequentist* statistics. For example, the probability of a coin landing on heads is the number of times it lands on heads divided by the number of times you flip the coin (often called trials), as the number of trials approaches infinity. This emphasises the objective nature of probability: probability is a byproduct of the system being studied.
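To make the long-run frequency idea concrete, here is a minimal sketch that simulates coin flips and reports the share of heads as the number of trials grows. The function name `heads_frequency`, the fair-coin probability of 0.5 and the fixed seed are my own assumptions for illustration.

```python
import random

def heads_frequency(n_trials, p_heads=0.5, seed=0):
    """Share of heads in n_trials simulated flips of a coin with P(heads) = p_heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < p_heads for _ in range(n_trials))
    return heads / n_trials

# The share of heads should settle near p_heads as the number of trials grows.
for n in (10, 100, 10_000, 100_000):
    print(n, heads_frequency(n))
```

With only a handful of trials the share of heads can be far from 0.5; it is the limit as the number of trials grows that the frequentist calls the probability.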

An alternative approach to statistics is the *Bayesian* approach. It interprets probability as a subjective degree of belief in an event given current information. The task then becomes updating one's beliefs as new information is observed.

However, Bayesian statistics does not discard any of the rules of probability. The fundamental axioms of probability still hold (i.e. the set-theoretic foundations laid down by Kolmogorov). From an econometric point of view, the goal is still the same. We want to use observed data to learn about parameters, compare models, and make out-of-sample predictions.

## **Some Bayesian Basics**

This section covers some Bayesian basics. Its main purpose is to familiarise the reader with the objects of interest in Bayesian statistics.

The fundamental object in Bayesian statistics is $P (\boldsymbol{\theta}_i | \boldsymbol{Y}, M_i )$, the probability distribution (a measure of belief) over parameters given both the observed data and model. It is known as the *posterior distribution*. Some notation:
- $M_i$ is a model which relates the data and the parameters
- $\boldsymbol{Y}$ is a matrix (or vector) of data
- $\boldsymbol{\theta}_i$ is a vector of parameters of the model $M_i, \forall i \in \{1, \cdots, m\}$.

I found the concept of $M_i$ quite confusing. In the Bayesian world a model is a pair of objects:

1. A family of probability distributions for the data, $P(\boldsymbol{Y} | \boldsymbol{\theta}_i, M_i)$, indexed by the possible parameters $\boldsymbol{\theta}_i \in \boldsymbol{\Theta}$
2. A prior distribution over the model parameters given the model, $P( \boldsymbol{\theta}_i | M_i) $.

The key aim in Bayesian statistics is to learn about the *posterior distribution* from the other objects. Unsurprisingly, Bayes' rule is at the heart of this:

$$P(\boldsymbol{\theta}_i | M_i, \boldsymbol{Y})  = \frac{ P(\boldsymbol{Y}|\boldsymbol{\theta}_i, M_i) P(\boldsymbol{\theta}_i | M_i) }{P(\boldsymbol{Y}|M_i)}$$

So, using Bayes' rule we can update our beliefs about the model parameters.
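As a hedged illustration of this updating step, the sketch below applies Bayes' rule on a grid for a toy model that is not in Koop's text: a coin with unknown heads probability $\theta$, a flat prior, and made-up data of 7 heads in 10 flips. The function name `grid_posterior` and the grid size are assumptions of mine.

```python
from math import comb

def grid_posterior(heads, flips, n_grid=1001):
    """Posterior over a coin's heads probability on an evenly spaced grid."""
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    prior = [1.0 / n_grid] * n_grid  # flat prior over the grid (assumption)
    likelihood = [
        comb(flips, heads) * t**heads * (1 - t) ** (flips - heads) for t in grid
    ]
    unnormalised = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalised)  # discrete stand-in for P(Y | M)
    posterior = [u / evidence for u in unnormalised]
    return grid, posterior

grid, posterior = grid_posterior(heads=7, flips=10)  # hypothetical data
posterior_mean = sum(t * p for t, p in zip(grid, posterior))
print(posterior_mean)
```

For this toy model the exact posterior is a Beta(8, 4) distribution with mean 2/3, so the grid answer can be checked against it.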

Some more notation:
- $P(\boldsymbol{Y}|\boldsymbol{\theta}_i, M_i)$ is a *likelihood function*, a familiar object!
- $P(\boldsymbol{Y}|M_i)$ is called the *marginal likelihood*; it is the normalising constant that ensures the posterior is a valid probability distribution

The *marginal likelihood* can be thought of as the likelihood averaged over the prior. More precisely,

$$ P(\boldsymbol{Y}|M_i)= \int_{\boldsymbol{\theta}_i \in \boldsymbol{\Theta}} P(\boldsymbol{Y} | \boldsymbol{\theta}_i, M_i) P(\boldsymbol{\theta}_i|M_i)\,d\boldsymbol{\theta}_i$$

In principle, the marginal likelihood can be computed from the *prior distribution* and the *likelihood function* alone. In practice, the integral is often hard to evaluate.
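In very simple models the integral can be approximated numerically. The sketch below does this with a midpoint rule for a toy coin model of my own choosing (binomial likelihood, Uniform(0, 1) prior, made-up data of 7 heads in 10 flips). For this particular case the integral has the known closed form $1/(n+1)$, where $n$ is the number of flips, which gives something to check against.

```python
from math import comb

def marginal_likelihood(heads, flips, n_steps=10_000):
    """Midpoint-rule approximation of the integral of likelihood x prior over [0, 1]."""
    width = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        theta = (k + 0.5) * width  # midpoint of the k-th slice
        likelihood = comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)
        prior_density = 1.0  # Uniform(0, 1) prior density (assumption)
        total += likelihood * prior_density * width
    return total

approx = marginal_likelihood(heads=7, flips=10)  # hypothetical data
exact = 1 / (10 + 1)                             # closed form for the uniform prior
print(approx, exact)
```

This brute-force approach only works because there is a single parameter. With many parameters the number of grid points explodes, which is one reason the integral is hard in realistic models.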

Another object we care about is the *posterior model probability*, $P(M_i|\boldsymbol{Y})$: the probability that a model is the true one given the data. Believe it or not, it too can be recast using Bayes' rule:

$$P(M_i|\boldsymbol{Y})=\frac{P(\boldsymbol{Y}|M_i)P(M_i)}{P(\boldsymbol{Y})}$$

Because the unconditional probability of the data $P(\boldsymbol{Y})$ is hard to compute, often the *posterior odds ratio* is used to compare models. That is:

$$\text{PO}_{i,j}=\frac{P(M_i|\boldsymbol{Y})}{P(M_j|\boldsymbol{Y})}=\frac{P(\boldsymbol{Y}|M_i)P(M_i)}{P(\boldsymbol{Y}|M_j)P(M_j)}$$

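If the models under consideration are assumed to be exhaustive, so that their posterior model probabilities sum to one, the posterior odds can be used to recover each *posterior model probability*. For example, with two models, $P(M_i|\boldsymbol{Y}) = \frac{\text{PO}_{i,j}}{1+\text{PO}_{i,j}}$.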
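A small sketch of this calculation, assuming just two models that are treated as exhaustive. The marginal likelihood values and equal prior model probabilities are made-up numbers, and `posterior_model_probs` is a hypothetical helper name.

```python
def posterior_model_probs(marginal_likelihoods, prior_probs):
    """Posterior model probabilities, assuming the listed models are exhaustive."""
    weights = [ml * pr for ml, pr in zip(marginal_likelihoods, prior_probs)]
    total = sum(weights)  # plays the role of P(Y) under the exhaustiveness assumption
    return [w / total for w in weights]

marginal_likelihoods = [0.02, 0.06]  # hypothetical P(Y | M_1), P(Y | M_2)
prior_probs = [0.5, 0.5]             # hypothetical P(M_1), P(M_2)

probs = posterior_model_probs(marginal_likelihoods, prior_probs)
posterior_odds = probs[0] / probs[1]                   # PO_{1,2}
prob_from_odds = posterior_odds / (1 + posterior_odds)  # recovers P(M_1 | Y)
print(probs, posterior_odds, prob_from_odds)
```

Note that $P(\boldsymbol{Y})$ never has to be computed separately: under the exhaustiveness assumption it is just the sum of the weights.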

The ratio of the marginal likelihoods of two models is known as the *Bayes factor*:

$$\text{BF}_{i,j} = \frac{P(\boldsymbol{Y}|M_i)}{P(\boldsymbol{Y}|M_j)} $$

If one assumes that the *prior odds ratio* is unity, that is $\frac{P(M_i)}{P(M_j)}=1$, then the posterior odds ratio equals the Bayes factor.
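With prior odds of one, comparing two models comes down to the ratio of their marginal likelihoods. Here is a minimal sketch for two toy coin models of my own invention: $M_1$ says the coin is exactly fair, while $M_2$ puts a Uniform(0, 1) prior on the heads probability. The data (7 heads in 10 flips) are made up, and the $M_2$ marginal likelihood uses the known closed form $1/(n+1)$ for a binomial likelihood under a uniform prior.

```python
from math import comb

heads, flips = 7, 10  # hypothetical data

# M_1: theta fixed at 0.5, so the marginal likelihood is just the likelihood at 0.5.
ml_fair = comb(flips, heads) * 0.5**flips

# M_2: theta ~ Uniform(0, 1); the binomial likelihood integrates to 1 / (flips + 1).
ml_uniform = 1.0 / (flips + 1)

bayes_factor = ml_fair / ml_uniform  # BF_{1,2}; equals PO_{1,2} when prior odds are 1
print(bayes_factor)
```

A value above one favours $M_1$ and a value below one favours $M_2$.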

Finally, suppose $\boldsymbol{Y}^*$ is some unobserved data we want to predict. Then:

$$P(\boldsymbol{Y}^*|\boldsymbol{Y}) = \int_{\boldsymbol\theta \in \boldsymbol\Theta } \underbrace{P(\boldsymbol{Y}^*, \boldsymbol\theta | \boldsymbol Y )}_{P(\boldsymbol{Y}^*|\boldsymbol\theta, \boldsymbol{Y}) P(\boldsymbol\theta | \boldsymbol{Y})} d\boldsymbol\theta $$
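The integral suggests a simple simulation recipe: draw $\boldsymbol\theta$ from the posterior, then draw $\boldsymbol{Y}^*$ given that $\boldsymbol\theta$, and average. The sketch below does this for a toy coin example of my own, assuming the posterior is Beta(8, 4) (a flat prior updated with 7 heads in 10 flips) and that, given $\theta$, the next flip does not depend on past flips. The name `predictive_prob_heads` is hypothetical.

```python
import random

def predictive_prob_heads(alpha_post, beta_post, n_draws=50_000, seed=0):
    """Monte Carlo estimate of P(next flip is heads | Y) under a Beta posterior."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    for _ in range(n_draws):
        theta = rng.betavariate(alpha_post, beta_post)  # draw theta from P(theta | Y)
        heads += rng.random() < theta                   # draw Y* from P(Y* | theta)
    return heads / n_draws

estimate = predictive_prob_heads(8, 4)
print(estimate)
```

In this case the exact answer is the posterior mean of $\theta$, which is 2/3, so the simulated estimate can be compared with it.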

According to Koop, one of the main advantages of the Bayesian framework is its generality. He says the analysis always follows the same familiar steps:

1. Choose a *prior distribution* and *likelihood function*
2. Obtain the *posterior distribution*
3. Do whatever statistical inference you like!
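The three steps can be sketched end to end for a case where step 2 has a known closed form: a Beta prior combined with a binomial likelihood gives a Beta posterior. The Beta(2, 2) prior, the data (7 heads in 10 flips) and the helper names below are all assumptions of mine rather than anything from Koop.

```python
import random

def beta_binomial_update(a_prior, b_prior, heads, flips):
    """Step 2: a Beta(a, b) prior with binomial data gives a Beta posterior."""
    return a_prior + heads, b_prior + (flips - heads)

def credible_interval(a, b, level=0.95, n_draws=20_000, seed=0):
    """Step 3: equal-tailed credible interval estimated from posterior draws."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    lower = draws[int((1 - level) / 2 * n_draws)]
    upper = draws[int((1 + level) / 2 * n_draws) - 1]
    return lower, upper

# Step 1: choose a prior (Beta(2, 2)) and a likelihood (binomial coin flips).
a_post, b_post = beta_binomial_update(2, 2, heads=7, flips=10)
posterior_mean = a_post / (a_post + b_post)
lower, upper = credible_interval(a_post, b_post)
print(a_post, b_post, posterior_mean, lower, upper)
```

Most models of interest have no such closed form for step 2, which is where simulation tools come in.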

One of the biggest critiques of Bayesian methods concerns the *prior distribution*. Critics argue that the choice of prior introduces arbitrariness into the analysis. I will keep this in mind as I proceed.