# Lecture 1: What does it mean to be Bayesian?

My objective in this course is to get you to a position where you are able to:
1. Contrast classical and Bayesian thinking
2. Estimate models using Bayesian methods
3. Create original pieces of research using Bayesian methods

This lecture is aimed at the first goal, which we will (hopefully) have achieved by the end of it. To that end, we will
1. Recap how to compute and interpret probabilities and contrast Bayesian and frequentist interpretations of probability
2. Develop an understanding of Bayesian econometric theory
3. Apply the theory in an experiment

## Motivating questions
Answer the following questions:
1. Suppose I pull a coin out of my pocket, flip it and catch it. What is the probability that the outcome (i.e. Heads or Tails) is Heads? Explain your reasoning. 
2. Now suppose that I look at the outcome, but don't show it to you. What is the probability that I am observing Heads? Explain your reasoning. 
3. What are the probabilities of Donald Trump and Joe Biden, respectively, winning the 2020 US election?
4. Suppose we estimate that the return to an additional year of education is, on average, 7%, and that [5%, 10%] is a 95% interval estimate around this value. What is the probability that this interval estimate contains the true population value?

### What do your answers suggest about your thought process?
1. Under the assumption that the coin is fair, the correct answer from either a Bayesian or a frequentist view is 1/2. The difference between the two approaches lies in the reasoning behind the answer. If you reasoned that the probability is 1/2 because it represents the expected long-run frequency of Heads if I repeatedly tossed the coin, then you're thinking like a frequentist. This is because frequentists view probabilities as representing the long-run frequency of an outcome over a large number of repeated experiments. In contrast, if you reasoned that the probability is 1/2 based on your past experience in probability classes, discussions with others, reading books, etc., then you're thinking like a Bayesian. This is because Bayesians use their existing knowledge about the world to form a *prior belief* about the probability, and then update this belief after seeing evidence (in this case, outcomes of the coin flip).

2. If you answered 1/2, or any other value in $(0,1)$, then you are thinking like a Bayesian. If you answered either 0 or 1, then it depends on your reasoning. This is because Bayesians use probabilities to represent their uncertainty about outcomes, in this case about whether I am observing Heads. Thus, any probability in $(0,1)$ is in line with a Bayesian interpretation of uncertainty about the outcome, and so is a probability of 0 or 1, provided the reasoning is that it represents absolute certainty about the outcome. In contrast, to a frequentist the outcome of the experiment has already been determined: I'm either looking at Heads (probability = 1) or I'm not (probability = 0). Notice that in this case the Bayesian and frequentist probability estimates can be the same (i.e. values of 0 or 1). This often occurs in practice.

3. If you answered this question with any probability, then you are thinking like a Bayesian. This is because you have used your existing knowledge about the world to infer a probability. In contrast, if you stated that we can't assign a probability to this question, then you are thinking like a frequentist. This is because frequentists only assign probabilities to repeatable events, while Bayesians can assign probabilities to either repeatable or non-repeatable events.

4. If you answered 95%, then you are thinking like a Bayesian. This is because Bayesians view population parameters as random variables and can therefore use probabilities to represent their degree of uncertainty about them. In contrast, if you answered either 0 or 1, then you are thinking like a frequentist. This is because frequentists view population parameters as fixed, predetermined quantities and consequently don't assign probabilities to them: the interval either contains the true value or it doesn't. Note that a common interpretation error among users of frequentist methods is to think that a *confidence interval* expresses the probability that the population parameter lies within the interval. Instead, the correct interpretation is that if the experiment were repeated over and over again, then 95% of the confidence intervals generated would cover the population parameter.


## What is a probability?
Recall that the *sample space* $S$ of an experiment is the set of all possible outcomes, and an *event* $E$ is a subset of the sample space $S$.

**Example:**
- Experiment: flip a coin and record the result, i.e. Heads (H) or Tails (T).
- Sample space: $S=\{H,T\}$
- Event: observe Heads, i.e. $E=\{H\}$

The earliest definition of the probability of an event was to count the number of ways that the event could happen and divide by the total number of possible outcomes of the experiment. Written as an equation, we have
$$P(E)=\frac{\text{number of outcomes favorable to } E}{\text{total number of outcomes in } S}.$$
This definition is formally known as the *naive definition of probability*.

**Example:** According to the naive definition, the probability that a coin flip will show Heads is 0.5. To see this, first note that the sample space $S=\{H,T\}$ has two outcomes. Next, note that the event of showing Heads, $E=\{H\}$, has one outcome. Applying the definition, the probability that a flip shows Heads is $1/2=0.5$.
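As a quick check of this arithmetic, the naive definition can be written as a small Python function. This is a minimal sketch: the function name `naive_probability` is hypothetical, and it assumes a finite sample space whose outcomes are equally likely, which are exactly the conditions the naive definition requires. Using `Fraction` keeps the answers as exact ratios rather than rounded decimals.

```python
from fractions import Fraction

def naive_probability(event, sample_space):
    """Naive definition |E| / |S|: only valid for a finite S with equally likely outcomes."""
    event, sample_space = set(event), set(sample_space)
    if not event <= sample_space:
        raise ValueError("the event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

coin = {"H", "T"}
print(naive_probability({"H"}, coin))  # 1/2

die = {1, 2, 3, 4, 5, 6}
print(naive_probability({1}, die))  # 1/6
```

Nothing in the function can express that some outcomes are more likely than others, which is the limitation discussed next.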

The naive definition is very restrictive in that it requires the sample space to be finite and the outcomes to be equally likely. This is fine in some cases, e.g. our coin flip example, but it does not apply to many others. For instance, consider an experiment in which we roll a six-sided die. In this case, the sample space is $S=\{1,2,3,4,5,6\}$ and, by applying the naive definition of probability, we can compute that the probability of rolling any single outcome is 1/6. Suppose, however, that the die has been loaded with a weight so that the probability of rolling a one is 1/3, while the remaining outcomes are equally likely, each with probability $(1-1/3)/5=2/15$. There is no way of computing such probabilities using the naive definition.

For this reason, a *general definition of probability* was proposed that relies on the notion of a *probability space*. A probability space consists of a sample space $S$ and a probability function $P$, which takes an event $E\subseteq S$ as an input and returns a real number $P(E)\in[0,1]$ as an output, and which satisfies the following axioms:
1. $P(\emptyset) = 0$: the probability of nothing occurring (AKA the empty event) is zero.
2. $P(S) = 1$: the probability of the sample space occurring is one.
3. If $E_1, E_2, \dots$ are events with $E_i\cap E_j = \emptyset$ for all $i\neq j$, then $P(\cup_{i=1}^{\infty}E_i) = \sum_{i=1}^{\infty}P(E_i)$: the probability that at least one of a collection of mutually exclusive events (AKA disjoint sets) occurs is the sum of their probabilities.

Unlike the naive definition of probability, the general definition allows for infinite sample spaces and for each outcome to have a different probability of occurrence. Moreover, by applying logical arguments to these three axioms, we can derive the general rules of probability, e.g. for complements, unions, intersections, etc.
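To make the general definition concrete, the sketch below builds a probability function for the loaded die described above and checks the axioms on it. It is only an illustration under stated assumptions: the names `outcome_probs` and `P` are hypothetical, events are represented as Python sets, and because the sample space is finite, axiom 3 can only be checked for a finite union of disjoint events. The last lines check one derived rule, the complement rule $P(E^c)=1-P(E)$.

```python
from fractions import Fraction

# Loaded die from the text: P(1) = 1/3, and the remaining 2/3 is split equally over 2..6.
outcome_probs = {1: Fraction(1, 3)}
outcome_probs.update({k: Fraction(2, 15) for k in range(2, 7)})

S = frozenset(outcome_probs)  # sample space {1, ..., 6}

def P(event):
    """Probability of an event (a subset of S): the sum of its outcome probabilities."""
    event = frozenset(event)
    if not event <= S:
        raise ValueError("the event must be a subset of the sample space")
    return sum((outcome_probs[o] for o in event), Fraction(0))

# Axioms 1 and 2
assert P(set()) == 0
assert P(S) == 1
# Axiom 3, finite case only: disjoint events add
A, B = {1}, {2, 3}
assert P(A | B) == P(A) + P(B)

print(P({1}))  # 1/3, whereas the naive definition would give 1/6

# Derived rule: P(complement of E) = 1 - P(E)
E = {1, 2}
print(P(E), P(S - E), 1 - P(E))  # 7/15 8/15 8/15
```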

Further reading for those interested:
1. [A Short History of Probability](http://homepages.wmich.edu/~mackey/Teaching/145/probHist.html)

## Two interpretations of probability
What do we mean when we say that the probability that a coin flip will show Heads is equal to 1/2?

While the general definition of probability tells us what a probability function is, it doesn't tell us how probabilities should be interpreted. Two main schools of thought exist: 
1. The *frequentist* view of probability is that it represents a *long-run frequency* over a large number of repetitions of an experiment. This means that if a frequentist says that a coin flip has probability of showing Heads equal to 1/2, then they mean that: *if the coin were flipped over and over again and the results recorded, then the coin would land Heads 50% of the time in the long run*.
2. The *Bayesian* view of probability is that it represents a *degree of belief* about the event in question. This means that if a Bayesian says that a coin flip has probability of showing Heads equal to 1/2, then they mean that: *if the coin is flipped once, then I believe that there's a 50% chance that it shows Heads*.
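The frequentist reading of "probability 1/2" can be illustrated by simulation: flip a simulated coin many times and track the proportion of Heads. This sketch assumes a fair coin generated by Python's pseudo-random number generator (`random.Random`), with a fixed seed so the run is reproducible; the function name `running_frequency_of_heads` is hypothetical. The running proportion wanders for small numbers of flips and settles near 0.5 as the number of flips grows.

```python
import random

def running_frequency_of_heads(n_flips, p_heads=0.5, seed=0):
    """Flip a simulated coin n_flips times; return the proportion of Heads after each flip."""
    rng = random.Random(seed)
    heads = 0
    freqs = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < p_heads  # True counts as 1, False as 0
        freqs.append(heads / i)
    return freqs

freqs = running_frequency_of_heads(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(freqs[n - 1], 4))
```

Note that the simulation only speaks to the frequentist statement; the Bayesian statement is about a single flip and does not require any repetition.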

**Remarks**: 
1. Regardless of which perspective we take, the general rules of probability, e.g. for complements, unions, intersections, etc., remain the same.
2. Both the Bayesian and frequentist perspectives are non-falsifiable. Which school of thought you choose to subscribe to is therefore a matter of personal preference. In this course we will adopt a Bayesian perspective and draw comparisons with the frequentist view along the way.
3. Frequentists often argue that the concept of an individual *degree of belief* is problematic, because different people may hold differing degrees of belief about the same hypothesis, whereas science is about finding the correct answers to hypotheses. The first point to note is that the Bayesian perspective of probability is *subjective* in the sense that it requires people to form a *prior belief* about an unknown quantity before seeing any results from an experiment, e.g. to formulate the probability of the coin flip showing Heads before seeing any outcomes from flipping the coin. Clearly, this prior belief might be "wrong" in the sense that it might differ from the *true probability*. The second point to note, however, is that Bayesian thinking does not stop here, as we will see later in the lecture. Instead, we conduct experiments to gain evidence for or against the hypothesis (just like frequentists) and update our prior beliefs in a manner that is consistent with the rules of probability. This means that while a researcher might begin with a subjective *a priori* belief about an unknown quantity before seeing the data, they will then update their beliefs after seeing the data, and these *a posteriori* beliefs about the unknown quantity of interest will generally converge to the "truth" as more data arrive. Thus, while the Bayesian interpretation of probability is fundamentally subjective, the process of inference is objective in the sense that it is consistent with the *likelihood principle*: the proposition that all relevant information about an unknown quantity obtained from an experiment is contained in the likelihood function. In fact, at the end of this lecture we will see that Bayesian and frequentist methods often provide the same point estimates of unknown quantities. In such cases, the difference lies in how we interpret the estimates. Finally, it's important to note that frequentists do have a valid point.
If a Bayesian analysis is based on a *dogmatic prior belief*, in the sense that the person is not willing to update their belief in the face of evidence, and this prior is far away from the truth, then this will inevitably skew the results away from the truth. Fortunately, scientists know better than to stop here. Others will repeat the analysis with non-dogmatic priors and see if the results are robust. If they are not, then they will claim that the initial analysis is wrong. Others will then repeat the experiment again, and again, and confidence will grow that they have settled on the true outcome. This is *Bayesian updating* in practice: we start with limited knowledge, make some observations, update our knowledge, and repeat.
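The contrast between a non-dogmatic and a dogmatic prior can be sketched numerically, ahead of the formal theory developed later in the lecture. The sketch assumes a coin whose true probability of Heads is 0.5, approximates the unknown probability $\theta$ on a grid of 99 candidate values, and updates beliefs with the rules of probability (posterior proportional to prior times likelihood). The grid, the flat prior, the dogmatic prior placing all its mass on $\theta=0.9$, and the function names are all illustrative choices, not the lecture's method.

```python
import math
import random

# Illustrative grid of candidate values for theta = P(Heads): 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]

def grid_posterior(prior, n_heads, n_tails):
    """Update prior weights on `grid` by Bayes' rule: posterior ∝ prior × likelihood."""
    log_post = []
    for p, theta in zip(prior, grid):
        if p == 0:
            log_post.append(-math.inf)  # zero prior mass can never become positive
        else:
            log_post.append(math.log(p) + n_heads * math.log(theta)
                            + n_tails * math.log(1 - theta))
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]  # subtract the max to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]

def posterior_mean(post):
    return sum(theta * w for theta, w in zip(grid, post))

flat_prior = [1 / len(grid)] * len(grid)                           # non-dogmatic prior
dogmatic_prior = [1.0 if i == 90 else 0.0 for i in range(1, 100)]  # all mass on theta = 0.9

true_theta = 0.5  # assumed true probability of Heads
rng = random.Random(1)
flips = [rng.random() < true_theta for _ in range(1000)]

for n in (0, 10, 100, 1000):
    n_heads = sum(flips[:n])
    flat = posterior_mean(grid_posterior(flat_prior, n_heads, n - n_heads))
    dogmatic = posterior_mean(grid_posterior(dogmatic_prior, n_heads, n - n_heads))
    print(f"{n:>5} flips: flat prior mean = {flat:.3f}, dogmatic prior mean = {dogmatic:.3f}")
```

With the flat prior, the posterior mean moves towards the true value as flips accumulate; with the dogmatic prior it stays at 0.9 no matter what the data show, because outcomes given zero prior probability can never gain posterior probability.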


Further reading for those interested:
1. A brief history of Bayesian thinking, titled: [When Did Bayesian Inference Become "Bayesian"?](https://projecteuclid.org/download/pdf_1/euclid.ba/1340371071), by Professor Stephen E. Fienberg
2. [Bayesian Epistemology](https://plato.stanford.edu/entries/epistemology-bayesian/) in the Stanford Encyclopedia of Philosophy

## Practical implications
While the distinction between frequentist and Bayesian interpretations of probability might seem abstract and irrelevant from the perspective of a pragmatic researcher, there are major practical implications for the way in which each school does statistics, and therefore econometrics. Three of the most important differences are:

1. Both Bayesians and frequentists believe that there is some true data generating process (DGP) that governs the outcomes of an observed sample. However, their beliefs differ over whether the parameters or the data are random. Frequentists believe that the parameters are fixed, non-random values and that the data are random. In contrast, Bayesians treat all unknown quantities, e.g. population parameters, as random variables and all known quantities, e.g. observed data, as fixed. An implication of this difference is that Bayesians make probability statements about unknown quantities, e.g. population parameters, while frequentists make probability statements about functions of the data, e.g. estimators.

**Example**: If we toss a coin and think about the probability of showing Heads, then a Bayesian will view the unknown probability as a random variable, while a frequentist will view it as a fixed number. A 95% Bayesian interval estimate around a point estimate, known as a *credible interval*, is interpreted by saying that: there is a 95% probability that the true value lies within the interval, given the evidence provided by the observed data. In contrast, an analogous interval estimate from a frequentist perspective, known as a *confidence interval*, is interpreted by saying that: before the data are observed, there is a 95% probability that the random interval (computed from the random data) will contain the true value.

**Remark**: While the [theory underlying confidence intervals](https://pdfs.semanticscholar.org/6281/be0dff2f86781fcda53f8d5263cb98000797.pdf) is sound, in practice, many people misinterpret them as Bayesian credible intervals. To avoid making this mistake ever again, remember that frequentists only place probabilities on data or functions of the data, e.g. the confidence interval, and never on hypotheses that involve non-random values, e.g. the population parameter.
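The frequentist interpretation of a confidence interval is a statement about repeated experiments, so it can be checked by simulation. The sketch below fixes a true probability of Heads, repeats a 100-flip experiment many times, computes an approximate 95% interval each time, and reports the fraction of intervals that contain the fixed true value. The choice of interval is an assumption: it uses the simple normal-approximation (Wald) interval $\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}$, whose actual coverage is only approximately 95%. The function names are hypothetical.

```python
import math
import random

def wald_interval(n_heads, n, z=1.96):
    """Approximate 95% confidence interval for P(Heads): p_hat ± z * sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = n_heads / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(true_theta=0.5, n=100, n_experiments=10_000, seed=0):
    """Fraction of repeated experiments whose interval contains the fixed true_theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        n_heads = sum(rng.random() < true_theta for _ in range(n))
        lo, hi = wald_interval(n_heads, n)
        hits += lo <= true_theta <= hi  # True counts as 1
    return hits / n_experiments

print(coverage())  # close to, but not exactly, 0.95
```

Any single interval either contains the true value or it does not; the 95% describes the long-run behaviour of the procedure, not the probability that one particular interval is correct.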

![](Cartoon_bvf.png)
Source: [Agoston Torok's discussion of Bayesianism vs Frequentism](https://agostontorok.github.io/2017/03/26/bayes_vs_frequentist/)


2. Since frequentists define probability as a long-run frequency of occurrence, they only assign probabilities to repeatable random events. In contrast, Bayesians can assign probabilities to either repeatable or non-repeatable events.

**Example**: A political scientist may be interested in answering the question: What is the probability that Donald Trump will win the 2020 US election? Since the 2020 election is not a repeatable event, a frequentist cannot answer this question. A Bayesian, however, can use evidence from related sources, e.g. polls, to estimate the probability. This is one of the primary reasons that Bayesian statistics is popular among researchers in a range of disciplines, including business analytics, finance, machine learning and the social sciences.

3. As we will see in the next section, all Bayesian statistical notions (estimation, inference and prediction) stem from Bayes' theorem. In contrast, frequentists have distinct methods for each.

Further reading for those interested:
1. [Frequentist and Subjectivist Perspectives on the Problems of Model Building in Economics](https://www.jstor.org/stable/pdf/1942744.pdf?refreqid=excelsior%3Aaa23cccaa6b6092cb8e65d0c0d714888), by Dale J. Poirier
2. [Bayesian Methods in Applied Econometrics, or, Why Econometrics Should Always and Everywhere Be Bayesian](http://sims.princeton.edu/yftp/EmetSoc607/AppliedBayes.pdf), by Professor Chris Sims
3. [Why isn't everyone a Bayesian?](https://www.jstor.org/stable/pdf/2683105.pdf?refreqid=excelsior%3Ace3dc05a001f7fa7f16220451838ea89), by Professor Brad Efron
4. [Objections to Bayesian statistics](https://projecteuclid.org/download/pdf_1/euclid.ba/1340370429), by Professor Andrew Gelman

![](Image_cartoon.png)
Source: [RevBayes](https://twitter.com/revbayes/status/514231641300955137)

<!--
## Homework Problems:
1. Exercises 1.1 and 1.2 of [Bayesian Econometric Methods (Econometric Exercises)](https://www.amazon.com/Bayesian-Econometric-Methods-Exercises/dp/0521671736)
-->