# Unit 2 Exercises: Bayesian Building Blocks

This first set of exercises focuses on conceptual understanding of the three parts of Bayesian statistics we'll manipulate the most: the prior, likelihood, and posterior.

These vocabulary words will help us categorize and explain all statistical models, even models that don't fit inside the standard Bayesian framework.

**Task1**:

Why do we make guesses? In other words, what is the benefit of trying to predict something we are uncertain about?

In making decisions, we often care about the outcomes of hypothetical events or events that haven't happened yet. If we can predict the outcomes of these events (which are necessarily uncertain because they have not occurred), then we can make better-informed choices.

**Task2**:

Is it possible to make a guess/prediction without making an assumption?

If yes, then give an example of such a guess, and state whether that guess would be useful or not.

If no, then briefly justify why we need assumptions to make predictions.

I would say that we need assumptions to make any prediction. Any information you consider in making your prediction immediately becomes an assumption. If you think that the sun will rise tomorrow because it has risen every previous day, you assume that whether the sun rose in the past has some bearing on whether it will rise tomorrow. To extrapolate, you also assume that tomorrow's sunrise happens by the same mechanism as past sunrises.

Without assumptions, all predictions become totally baseless, and we have no reason to believe any of them.

**Task3**:

Should we use all the available information we have to make a guess/prediction? Justify your answer.

I think we should. Not all available information is directly relevant to the outcome we predict, but it is still worth considering, if only to decide to give it little or no weight. Take the free throw percentage example. It isn't very likely that the phase of the moon affects a player's free throw percentage, but it isn't unreasonable to consider it and then decide that it isn't very important. If we do find some slight relationship, we can use the phase of the moon as an input that carries very little weight. I think the only reason not to consider literally everything is that doing so is computationally impossible and resource-expensive.

**Task4**:

What is a prior? How are priors related to
- context?
- assumptions?
- predictions?

A prior is the probability distribution we assume before taking into account additional data or evidence.

Priors are related to assumptions in that they are assumptions themselves. When we choose a prior, we assume that the distribution, whatever it is based on, is in some way indicative of the thing we're trying to predict now.

Priors are a product of context. We can often take some data set from the past and make the (pretty reasonable) assumption that our new event will follow patterns similar to those in the past. This context from past data allows us to make a better guess at what our current probability is.

Priors help us make predictions: they give a starting "guess" for a probability distribution that we can then update with more information. In another sense, they are predictions themselves. We try to make our priors as close as possible to what the final distribution will be once we add the extra data, so we predict the distribution as best we can without that information, acknowledging that it's fine to be wrong.

**Task5**:

What is a likelihood? How are likelihoods related to:

- context?
- assumptions?
- predictions?

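For a fixed "true probability" θ of an outcome, the probability of that outcome is the fraction of attempts that produce it over many repetitions of the event:

$$\lim_{n \to \infty} \frac{\text{number of times the outcome occurs in } n \text{ attempts}}{n} = p(\text{outcome} \mid θ)$$

The likelihood, $p(y|θ)$, applies this idea to the data we actually observed, $y$: it is the probability of seeing $y$ if the true probability were θ. Read as a function of θ with $y$ held fixed, it tells us how well each candidate value of θ explains the data.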

Likelihoods apply context to our assumptions, giving us a better picture of the overall probabilities (the posterior), which is an important part of making good predictions. Specifically, we use the likelihood to adjust our prior with new information: for each candidate "true probability" of an outcome, we multiply its prior probability by the likelihood that this "true probability" would generate the outcome we actually observed. This grounds our assumptions back in reality and improves our predictive power.
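To connect the long-run-frequency idea to a concrete likelihood, here is a minimal Python sketch that simulates draws with replacement and estimates $p(y|θ)$ for $y$ = red-white-red as the fraction of attempts that reproduce $y$. It assumes θ is the chance of drawing red on each draw, as in the marble exercises below; the function `estimate_likelihood` is a hypothetical helper written for illustration.

```python
import random

def estimate_likelihood(theta, observed, attempts, seed=0):
    """Estimate p(observed | theta) as the fraction of simulated attempts
    that reproduce the observed color sequence exactly.

    theta:    probability of drawing red on any single draw
    observed: color sequence such as "RWR"
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    matches = 0
    for _ in range(attempts):
        # Draws are with replacement, so each draw is independent with P(red) = theta.
        draws = "".join("R" if rng.random() < theta else "W" for _ in observed)
        if draws == observed:
            matches += 1
    return matches / attempts

# A bag of 2 red and 2 white marbles has theta = 1/2; the exact likelihood of RWR is 1/8.
for n in (100, 10_000, 100_000):
    print(n, estimate_likelihood(0.5, "RWR", n))
```

As the number of attempts grows, the printed estimates should settle near the exact value of 0.125.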

**Task6**:

What is a posterior? How are posteriors related to:

- context?
- assumptions?
- predictions?

The posterior is the distribution we get once we reincorporate the context of real-life data into our assumptions. We reality-check our prior by weighting each candidate true probability by the likelihood that it produced the data we observed. We can then use this new distribution to make better predictions.

**Task7**:

Why would anyone want to define a prior and a likelihood in order to make a prediction? In other words, what's the point of using a likelihood and a prior to form a posterior?

Using a prior and a likelihood makes our predictions better informed by letting us integrate both our prior knowledge and new data. It also lets us update our understanding of the situation as new data arrives: the posterior from one round of data becomes the prior for the next, so we just keep iterating through more posteriors as more data comes in.
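A minimal Python sketch of that iteration, assuming the five bag compositions and the uniform prior from the marble exercises below (the helper `update` is hypothetical, not from the course notes). It updates on red, white, red one draw at a time and checks that the result matches updating on the whole sequence at once.

```python
from fractions import Fraction

# Candidate proportions of red marbles: 0, 1/4, 1/2, 3/4, 1.
THETAS = [Fraction(k, 4) for k in range(5)]

def update(prior, draw):
    """Return the posterior after one draw ('R' or 'W'), drawing with replacement."""
    unnormalized = {
        t: p * (t if draw == "R" else 1 - t)  # prior x likelihood of this single draw
        for t, p in prior.items()
    }
    total = sum(unnormalized.values())  # p(draw), the normalizing constant
    return {t: u / total for t, u in unnormalized.items()}

uniform = {t: Fraction(1, 5) for t in THETAS}

# Sequential: each posterior becomes the prior for the next draw.
sequential = uniform
for draw in "RWR":
    sequential = update(sequential, draw)

# Batch: multiply the prior by the likelihood of the whole sequence at once.
batch_unnorm = {t: uniform[t] * t * (1 - t) * t for t in THETAS}
batch_total = sum(batch_unnorm.values())
batch = {t: u / batch_total for t, u in batch_unnorm.items()}

print(sequential == batch)         # True
print(sequential[Fraction(1, 2)])  # 2/5
```

Exact `Fraction` arithmetic is used so the two routes can be compared with `==` instead of a floating-point tolerance.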

## Bayes' Rule Math

The following exercises will be graded for completion, with no accuracy component. That said, correct answers below will replace mistakes in tasks 1-7.



### Mathematical Framing

In this series of exercises, we'll calculate a probability using the full version of Bayes' Rule.

The version seen in the notes, $p(θ|y) ∝ p(y|θ)p(θ)$, ignores the normalizing constant found in the full equation: $p(θ|y) = \frac{p(y|θ)p(θ)}{p(y)}$.

As stated in the notes, in practical applications we don't need to worry too much about $p(y)$, AKA the marginal likelihood, AKA the prior predictive density, AKA the normalizing constant. And when we do, we'll approximate like we do everything else.

But these exercises are closer to theoretical abstraction than to practical application.

So why do them?

These exercises will hopefully help you gain additional intuition for probability and how it behaves.

As you work through the exercises, consider $p(y)$, the prior predictive density, and why dividing $p(y|θ)p(θ)$ by it guarantees we get a probability.

Additionally, wonder about:
- the likelihood $p(y|θ)$,
- the prior $p(θ)$,
- why multiplying the likelihood and prior (almost) gives us the posterior $p(θ|y)$.
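A short derivation, assuming θ takes finitely many values as in the marble problem below, shows why dividing by $p(y)$ guarantees a probability: by the law of total probability, $p(y)$ is exactly the sum of the unnormalized terms $p(y|θ)p(θ)$, so the posterior values must sum to one.

$$p(y) = \sum_{θ} p(y|θ)\,p(θ)$$

$$\sum_{θ} p(θ|y) = \sum_{θ} \frac{p(y|θ)\,p(θ)}{p(y)} = \frac{1}{p(y)}\sum_{θ} p(y|θ)\,p(θ) = \frac{p(y)}{p(y)} = 1$$

For a continuous θ, such as a free throw percentage, the sums become integrals.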

### Problem Setting

Imagine we have a bag of red and white marbles, identical in every other way. Let's assume there are 4 total marbles, we can't see inside the bag, and when we grab a ball from the bag, we replace it and shake the bag to scramble the balls.

Additionally:

- we draw three balls in this order: red-white-red. Call these the data, $y$. Remember, we replaced the ball and shook the bag between each draw.
- we are interested in finding the true proportion of red balls in the bag, called $θ$

**Task8**:

Write out all the possible color compositions of the marbles in the bag, before we observed our data $y$.

Let each of these possible color compositions be a possible $θ$, or true proportion of red marbles.

- 4 red (RRRR): $θ_0 = 1$
- 3 red, 1 white (RRRW): $θ_1 = \frac{3}{4}$
- 2 red, 2 white (RRWW): $θ_2 = \frac{1}{2}$
- 1 red, 3 white (RWWW): $θ_3 = \frac{1}{4}$
- 4 white (WWWW): $θ_4 = 0$

**Task9**:

Which color compositions are possible after seeing the data $y$?

Only $θ_1$, $θ_2$, and $θ_3$ remain possible. $θ_0$ (all red) and $θ_4$ (all white) are ruled out because we drew both colors from the bag.
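The same filtering as a small Python sketch, assuming θ is encoded as the proportion of red marbles (so $θ_0 = 1$ for all red and $θ_4 = 0$ for all white); the helper `consistent` is hypothetical. A composition survives only if every observed color has a nonzero chance of being drawn.

```python
from fractions import Fraction

def consistent(theta, observed):
    """Return True if a bag whose red proportion is theta can produce every draw in observed."""
    # A red draw needs theta > 0; a white draw needs theta < 1.
    return all(theta > 0 if color == "R" else theta < 1 for color in observed)

# θ_0 = 1 (all red) down to θ_4 = 0 (all white).
thetas = [Fraction(k, 4) for k in range(4, -1, -1)]
possible = [t for t in thetas if consistent(t, "RWR")]
print([str(t) for t in possible])  # ['3/4', '1/2', '1/4']
```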

**Task10**:

How many ways can you select red-white-red, assuming that there are 2 red marbles and 2 white marbles?

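Label the marbles $r_1, r_2, w_1, w_2$.

There are 8 different ways: each red draw can be either $r_1$ or $r_2$, and the white draw can be either $w_1$ or $w_2$, so there are $2 \times 2 \times 2 = 8$ ordered selections.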
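A brute-force check of this count, as a minimal sketch that labels the marbles `r1`, `r2`, `w1`, `w2` (labels chosen here for illustration) and uses `itertools.product` to list every ordered red-white-red selection.

```python
from itertools import product

marbles = ["r1", "r2", "w1", "w2"]
reds = [m for m in marbles if m.startswith("r")]
whites = [m for m in marbles if m.startswith("w")]

# Every ordered way to draw red, then white, then red (with replacement).
rwr = list(product(reds, whites, reds))
print(len(rwr))  # 8
```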

**Task11**:

How many different ways can you select three balls so that order matters, given that there are 2 red marbles and 2 white marbles?

$4 \times 4 \times 4 = 64$ ordered selections (4 choices for each draw, since we replace the marble each time).
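Because every ordered selection is equally likely when we draw with replacement, the ratio of the two counts gives the probability asked for in Task 12. A minimal sketch, reusing the illustrative labels `r1`, `r2`, `w1`, `w2`:

```python
from fractions import Fraction
from itertools import product

marbles = ["r1", "r2", "w1", "w2"]

# All ordered sequences of three draws with replacement: 4 choices per draw.
all_draws = list(product(marbles, repeat=3))

# Keep the sequences whose colors read red-white-red (first letter of each label).
rwr = [d for d in all_draws if [m[0] for m in d] == ["r", "w", "r"]]

print(len(all_draws), len(rwr), Fraction(len(rwr), len(all_draws)))  # 64 8 1/8
```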

**Task12**:

What's the probability you select red-white-red, given that there are 2 red marbles and 2 white marbles?

Stated differently, find the likelihood $p(y|θ)$, where $θ=RRWW$.

(1/2) * (1/2) * (1/2) = 1/8

Because we replace the marbles after each pull, there is always a 1/2 chance of getting the color we want.

**Task13**:

Find:

- $p(y|WWWW)$
- $p(y|RWWW)$
- $p(y|RRWW)$
- $p(y|RRRW)$
- $p(y|RRRR)$

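- $p(y|WWWW) = 0$ (a red draw has a $\frac{0}{4}$ chance)
- $p(y|RWWW) = \frac{1}{4} \times \frac{3}{4} \times \frac{1}{4} = \frac{3}{64}$
- $p(y|RRWW) = \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{8}$
- $p(y|RRRW) = \frac{3}{4} \times \frac{1}{4} \times \frac{3}{4} = \frac{9}{64}$
- $p(y|RRRR) = \frac{4}{4} \times \frac{0}{4} \times \frac{4}{4} = 0$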
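These five likelihoods can be computed in one loop. A minimal sketch, assuming draws with replacement (so each draw is independent) and a hypothetical helper `likelihood` that takes the number of red marbles in a four-marble bag:

```python
from fractions import Fraction

def likelihood(n_red, observed, total=4):
    """p(observed | bag with n_red red marbles out of total), drawing with replacement."""
    p_red = Fraction(n_red, total)
    result = Fraction(1)
    for color in observed:
        # Independent draws: multiply the probability of each color in turn.
        result *= p_red if color == "R" else 1 - p_red
    return result

for n_red, name in [(0, "WWWW"), (1, "RWWW"), (2, "RRWW"), (3, "RRRW"), (4, "RRRR")]:
    print(name, likelihood(n_red, "RWR"))
```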

**Task14**:

Find the probability of getting red-white-red, $p(y)$

Since we aren't given a prior, I'll assume that all of the potential compositions of the bag are equally likely, so $p(θ) = \frac{1}{5}$ for each. Because $p(y) = \sum_θ p(y|θ)p(θ)$, this makes $p(y)$ the average of all the likelihoods:

$$p(y) = \frac{1}{5} × \left(0 +\frac{3}{64}+\frac{8}{64}+\frac{9}{64}+0\right) = \frac{1}{5} × \frac{20}{64}$$
$$p(y) = \frac{1}{16}$$
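A minimal Python check of this sum, assuming the uniform prior above and the likelihoods from Task 13 entered by hand:

```python
from fractions import Fraction

# Likelihoods p(y | θ) for y = red-white-red, keyed by composition (Task 13).
likelihoods = {
    "WWWW": Fraction(0),
    "RWWW": Fraction(3, 64),
    "RRWW": Fraction(1, 8),
    "RRRW": Fraction(9, 64),
    "RRRR": Fraction(0),
}

# Assumed uniform prior: each composition has probability 1/5.
prior = {name: Fraction(1, 5) for name in likelihoods}

# Law of total probability: p(y) = sum over θ of p(y | θ) p(θ).
p_y = sum(likelihoods[name] * prior[name] for name in likelihoods)
print(p_y)  # 1/16
```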

**Task15**:

Given that all color compositions are equally likely, and after observing a draw of red-white-red, find the probability that there are two red marbles and two white marbles in the bag.

In other words, find $p(θ|y)$, where $θ=RRWW$.

$$p(\theta = RRWW|y) ∝ p(y|\theta)p(\theta)=\frac{1}{8}\times\frac{1}{5} = \frac{1}{40}$$

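The posterior probabilities of RRWW and "not RRWW" must sum to one:

$$p(\theta|y) + p(\neg\theta|y) = 1$$

Under the uniform prior, the four other compositions share $p(\neg\theta) = \frac{4}{5}$, and their average likelihood is $p(y|\neg\theta) = \frac{1}{4}\left(0 + \frac{3}{64} + \frac{9}{64} + 0\right) = \frac{3}{64}$, so

$$p(\neg\theta|y) ∝ p(y|\neg\theta)p(\neg\theta)=\frac{3}{64}\times\frac{4}{5} = \frac{3}{80}$$

The two unnormalized terms add up to the normalizing constant $p(y)$ from Task 14:

$$\frac{1}{40}+\frac{3}{80}=\frac{1}{16}$$

Dividing by it gives the posterior:

$$p(\theta|y)=\frac{1/40}{1/16}=\frac{16}{40}=\frac{2}{5}$$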
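The same calculation for every composition at once, as a minimal sketch assuming the uniform prior and the Task 13 likelihoods. Dividing each prior × likelihood product by $p(y)$ yields a posterior that sums to one.

```python
from fractions import Fraction

# Likelihoods p(y | θ) for y = red-white-red (Task 13).
likelihoods = {
    "WWWW": Fraction(0),
    "RWWW": Fraction(3, 64),
    "RRWW": Fraction(1, 8),
    "RRRW": Fraction(9, 64),
    "RRRR": Fraction(0),
}
prior = {name: Fraction(1, 5) for name in likelihoods}  # assumed uniform prior

unnormalized = {name: likelihoods[name] * prior[name] for name in likelihoods}
p_y = sum(unnormalized.values())  # normalizing constant, 1/16
posterior = {name: u / p_y for name, u in unnormalized.items()}

print(posterior["RRWW"])        # 2/5
print(sum(posterior.values()))  # 1
```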

**Task16**:

Story time: The marble factory produces bags of four marbles. They want to make red marbles rare, so that people will get excited about them.  Therefore, for each 1 bag containing four red, they made 2 that contain three red, 3 that contain two red, 4 that contain one red, and 5 that contain zero red.

With this new prior information, find $p(θ|y)$, where $θ=RRWW$.

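At first glance the new prior seems to change nothing: $\frac{3}{15}=\frac{1}{5}$ of bags are RRWW, exactly as under the uniform prior. However, the prior probabilities of the *other* compositions have changed (RRRR $\frac{1}{15}$, RRRW $\frac{2}{15}$, RWWW $\frac{4}{15}$, WWWW $\frac{5}{15}$), and those change the normalizing constant:

$$p(y) = \frac{1}{15}(0) + \frac{2}{15}\cdot\frac{9}{64} + \frac{3}{15}\cdot\frac{8}{64} + \frac{4}{15}\cdot\frac{3}{64} + \frac{5}{15}(0) = \frac{18 + 24 + 12}{960} = \frac{54}{960} = \frac{9}{160}$$

So

$$p(\theta = RRWW|y) = \frac{\frac{1}{8}\times\frac{3}{15}}{\frac{9}{160}} = \frac{1/40}{9/160} = \frac{160}{360} = \frac{4}{9}$$

The posterior for RRWW rises from $\frac{2}{5}$ to $\frac{4}{9}$ because the new prior moves weight from RRRW (likelihood $\frac{9}{64}$) to RWWW (likelihood $\frac{3}{64}$), so the competitors to RRWW explain the data less well overall.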
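A minimal sketch that computes the posterior under the factory prior, assuming the 1 : 2 : 3 : 4 : 5 bag counts from the story translate directly into prior probabilities out of 15 bags, and reusing the Task 13 likelihoods.

```python
from fractions import Fraction

# Likelihoods p(y | θ) for y = red-white-red (Task 13).
likelihoods = {
    "WWWW": Fraction(0),
    "RWWW": Fraction(3, 64),
    "RRWW": Fraction(1, 8),
    "RRRW": Fraction(9, 64),
    "RRRR": Fraction(0),
}

# Factory prior: 1, 2, 3, 4, 5 bags with four, three, two, one, zero red marbles.
bag_counts = {"RRRR": 1, "RRRW": 2, "RRWW": 3, "RWWW": 4, "WWWW": 5}
total_bags = sum(bag_counts.values())  # 15
prior = {name: Fraction(count, total_bags) for name, count in bag_counts.items()}

unnormalized = {name: likelihoods[name] * prior[name] for name in likelihoods}
p_y = sum(unnormalized.values())
posterior = {name: u / p_y for name, u in unnormalized.items()}

print(p_y)               # 9/160
print(posterior["RRWW"]) # 4/9
```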

**Task17**:

Write down similarities and differences between this marble example and the Victor Wembanyama FT example.

Both problems involve trying to find a true proportion (free throw percentage or red marbles), and in both situations we compute likelihoods from observations. However, the examples differ in that in the marble example we are told the probability distribution of our possible θ values, while in the free throw example we choose one ourselves using a beta distribution. Also, the marble example is discrete while the free throw example is continuous.