<a href="https://colab.research.google.com/github/AlyW8/Data-Science-Exercises/blob/main/Unit2ExercisesSF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Unit 2 Exercises: Bayesian Building Blocks

This first set of exercises focuses on conceptual understanding of the three parts of Bayesian statistics we'll manipulate the most: the prior, the likelihood, and the posterior.

These vocabulary words will help us categorize and explain all statistical models, even models that don't fit inside the standard Bayesian framework.

**Task1**:

Why do we make guesses? In other words, what is the benefit of trying to predict something we are uncertain about?

We make guesses so that we can make plans when facing an uncertain future.

**Task2**:

Is it possible to make a guess/prediction without making an assumption?

If yes, then give an example of such a guess, and state whether that guess would be useful or not.

If no, then briefly justify why we need assumptions to make predictions.

No, we have to at least assume that the future will follow the pattern of our past data to use the past data to predict the future.

**Task3**:

Should we use all the available information we have to make a guess/prediction? Justify your answer.

We should use all relevant, unbiased data to make a prediction. That way, we avoid biasing our prediction by selecting only specific data points, and we avoid contaminating our data set with biased data.

**Task4**:

What is a prior? How are priors related to:
- context?
- assumptions?
- predictions?

A prior is a probability distribution over the possible values of theta (the true probability) that maps out what we think theta might be before we look at the observed data.
We choose the prior based on the context, and that choice is an assumption. If the context gives us earlier information about theta, we can encode it in an informative prior such as a beta distribution. If we have no useful information, we can use a uniform distribution, which assumes that every potential theta is equally likely to be true.
The prior distribution is our prediction of what theta, the true probability, might be before the data are taken into account.

**Task5**:

What is a likelihood? How are likelihoods related to:

- context?
- assumptions?
- predictions?

A likelihood is the probability of observing a specific data set given a particular theta (the truth), written $p(y|θ)$. It can be used to judge whether a proposed theta is reasonable, based on the probability that theta would give the observed data set.
We have to assume that the observed data were generated under the same conditions as whatever informed our proposed theta, so that the same true probability applies to both. We use the likelihood to determine whether predictions of theta are reasonable.
We use the prior and the likelihood together to generate the posterior.

**Task6**:

What is a posterior? How are posteriors related to:

- context?
- assumptions?
- predictions?

The posterior, $p(θ|y)$, is the probability that a proposed theta is the true probability, given the observed data set. It is proportional to the likelihood multiplied by the prior, so it carries the context and assumptions of both. The posterior is our updated prediction of theta after seeing the data, and it also reflects the uncertainty that comes from having only a limited data set.

**Task7**:

Why would anyone want to define a prior and a likelihood in order to make a prediction? In other words, what's the point of using a likelihood and a prior to form a posterior?

The posterior tells us how likely it is that each estimate of theta is the true theta, or close to it, once the data have been taken into account. We need both the prior and the likelihood in order to calculate the posterior.

## Bayes' Rule Math


---



The following exercises will be graded for completion, with no accuracy component. That said, correct answers below will replace mistakes in tasks 1-7.



### Mathematical Framing

In this series of exercises, we'll calculate a probability using the full version of Bayes' Rule.

The version seen in the notes, $p(θ|y) ∝ p(y|θ)p(θ)$, ignores the normalizing constant found in the full equation: $p(θ|y) = \frac{p(y|θ)p(θ)}{p(y)}$.

As stated in the notes, in practical applications we don't need to worry too much about $p(y)$, AKA the marginal likelihood, AKA the prior predictive density, AKA the normalizing constant. And when we do, we'll approximate like we do everything else.

The following exercises are closer to theoretical abstraction, rather than practicality.

So why do them?

These exercises will (hopefully) help you gain additional intuition for probability, and how it behaves.

As you work through the exercises, consider $p(y)$, the prior predictive density, and why using it to divide $p(y|θ)p(θ)$ guarantees that we get a probability.

Additionally, think about:
- the likelihood $p(y|θ)$,
- the prior $p(θ)$,
- why multiplying the likelihood and prior *almost* gives us the posterior $p(θ|y)$.
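
One way to see why dividing by $p(y)$ guarantees a probability, sketched here under the assumption that $\theta$ takes only a finite set of values: the marginal likelihood is the sum of likelihood times prior over every candidate $\theta$, so the posterior values must sum to 1.

$$
p(y) = \sum_{\theta} p(y|\theta)\,p(\theta)
\quad\Longrightarrow\quad
\sum_{\theta} p(\theta|y) = \frac{\sum_{\theta} p(y|\theta)\,p(\theta)}{p(y)} = \frac{p(y)}{p(y)} = 1
$$

Without the division, the products $p(y|\theta)p(\theta)$ have the right relative sizes but do not sum to 1, which is why they only *almost* give the posterior.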

### Problem Setting

Imagine we have a bag of red and white marbles, identical in every other way. Each individual marble is either entirely white or entirely red.

Let's assume that there are 4 marbles in total, that we can't see inside the bag, and that whenever we grab a marble from the bag, we replace it and shake the bag to scramble the marbles.

Additionally:

- we draw three marbles in this order: red-white-red. Call these the data, $y$. Remember, we replaced the marble and shook the bag between each draw.
- we are interested in finding the true proportion of red marbles in the bag, called $θ$.

**Task8**:

Write out all the possible color compositions of the marbles in the bag, before we observed our data $y$ (which are the marbles drawn in the order of red-white-red).

Let each of these possible color compositions be a possible $θ$, or true proportion of red marbles.

RRRR

RRRW

RRWW

RWWW

WWWW
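
As a rough cross-check, the list above can be generated in Python. This is only a sketch: the names `N_MARBLES`, `compositions`, and `thetas` are illustrative and not part of the exercise.

```python
from fractions import Fraction

N_MARBLES = 4  # bag size from the problem setting

# One composition per possible count of red marbles, from 4 down to 0.
compositions = ["R" * k + "W" * (N_MARBLES - k) for k in range(N_MARBLES, -1, -1)]

# theta is the proportion of red marbles in each composition.
thetas = {c: Fraction(c.count("R"), N_MARBLES) for c in compositions}

for c in compositions:
    print(c, thetas[c])
```

Each composition corresponds to exactly one value of $θ$: 1, 3/4, 1/2, 1/4, or 0.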

**Task9**:

Which color compositions are possible after seeing the data $y$?

RRRW (3/4)

RRWW (2/4)

RWWW (1/4)
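
The reasoning here is that a bag can only produce red-white-red if it holds at least one marble of each color that was drawn. A minimal sketch of that filter, with illustrative variable names:

```python
compositions = ["RRRR", "RRRW", "RRWW", "RWWW", "WWWW"]
data = "RWR"  # observed draws, in order

# A composition can produce the data only if it contains every color drawn.
possible = [c for c in compositions if set(data) <= set(c)]

print(possible)  # ['RRRW', 'RRWW', 'RWWW']
```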

**Task10**:

How many ways can you select red-white-red, assuming that there are 2 red marbles and 2 white marbles?

8

**Task11**:

How many different ways can you select three marbles so that order matters, given that there are 2 red marbles and 2 white marbles?

64
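
Both counts can be checked by brute force. The sketch below assumes we label the four marbles (`R1`, `R2`, `W1`, `W2`, labels made up for illustration) so that each physical marble is distinguishable, then lists every ordered sequence of three draws with replacement.

```python
from itertools import product

# Label the marbles so that each physical marble is distinguishable.
bag = ["R1", "R2", "W1", "W2"]
target = ("R", "W", "R")  # red-white-red

# Every ordered sequence of three draws with replacement.
draws = list(product(bag, repeat=3))

# Keep the sequences whose colors (first character of each label) match the target.
matches = [d for d in draws if tuple(m[0] for m in d) == target]

print(len(matches), len(draws))  # 8 64
```

The 8 comes from 2 × 2 × 2 choices (two reds, then two whites, then two reds), and the 64 from 4 × 4 × 4.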

**Task12**:

What's the probability you select red-white-red, given that there are 2 red marbles and 2 white marbles?

Stated differently,

Find the likelihood $p(y|θ)$, where $θ=RRWW$

8/64
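
A simulation offers a different kind of check on this likelihood: draw three marbles with replacement many times and count how often red-white-red appears. The function name `simulate_rwr`, the trial count, and the seed are arbitrary choices for this sketch.

```python
import random

def simulate_rwr(bag, n_trials=100_000, seed=0):
    """Estimate the probability of drawing red-white-red with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_trials):
        # rng.choice puts the marble back implicitly: the bag never changes.
        draws = [rng.choice(bag) for _ in range(3)]
        if draws == ["R", "W", "R"]:
            hits += 1
    return hits / n_trials

estimate = simulate_rwr(["R", "R", "W", "W"])
print(estimate)  # should land near 8/64 = 0.125
```

The estimate will not equal 0.125 exactly, but it should get closer as `n_trials` grows.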

**Task13**:

If--before seeing the data--all color compositions are equally likely,

then what is $p(\theta)$, if $\theta = RRWW$?

1/5

**Task14**:

Find:

- $p(y|WWWW)$
- $p(y|RWWW)$
- $p(y|RRWW)$
- $p(y|RRRW)$
- $p(y|RRRR)$

p(y|WWWW) = 0

p(y|RWWW) = 1/4 * 3/4 * 1/4 = 3/64

p(y|RRWW) = 2/4 * 2/4 * 2/4 = 8/64

p(y|RRRW) = 3/4 * 1/4 * 3/4 = 9/64

p(y|RRRR) = 0
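
Because the marble is replaced after each draw, the draws are independent and the likelihood is a product of one factor per draw: $θ$ for a red and $1-θ$ for a white. A small sketch using exact fractions (the helper name `likelihood` is illustrative):

```python
from fractions import Fraction

def likelihood(n_red, data="RWR", n_marbles=4):
    """p(data | bag has n_red red marbles), drawing with replacement."""
    theta = Fraction(n_red, n_marbles)
    p = Fraction(1)
    for color in data:
        # Each draw contributes theta for red, (1 - theta) for white.
        p *= theta if color == "R" else 1 - theta
    return p

for n_red in range(5):
    print(n_red, likelihood(n_red))
```

Note that `Fraction` reduces automatically, so 8/64 prints as 1/8.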

**Task15**:

Assume that each color composition is equally likely before seeing the data.

Find:

- $p(y|WWWW)p(WWWW)$
- $p(y|RWWW)p(RWWW)$
- $p(y|RRWW)p(RRWW)$
- $p(y|RRRW)p(RRRW)$
- $p(y|RRRR)p(RRRR)$


p(y|WWWW)p(WWWW) = 0

p(y|RWWW)p(RWWW) = 3/64 * 1/5 = 3/320

p(y|RRWW)p(RRWW) = 8/64 * 1/5 = 8/320

p(y|RRRW)p(RRRW) = 9/64 * 1/5 = 9/320

p(y|RRRR)p(RRRR) = 0


**Task16**:

Find the probability of getting red-white-red, $p(y)$, given each possible color combination is equally likely.

3/320 + 8/320 + 9/320 = 20/320

**Task17**:

After observing a draw of red-white-red, find the probability that there are two red marbles and two white marbles in the bag. Assume that all color compositions were equally likely before the draw.

In other words, find $p(θ|y)$, where $θ=RRWW$.

(8/320) / (20/320) = 8/20 = 2/5
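
The steps from Tasks 15 to 17 (multiply likelihood by prior, sum to get $p(y)$, then divide) can be written as one small function. This is a sketch for a discrete set of hypotheses; the function name `posterior` and the dictionary layout are assumptions made for illustration.

```python
from fractions import Fraction

def posterior(prior, likelihoods):
    """Apply the full Bayes' rule over a discrete set of hypotheses."""
    joint = {h: likelihoods[h] * prior[h] for h in prior}  # p(y|theta) p(theta)
    marginal = sum(joint.values())                         # p(y)
    return {h: joint[h] / marginal for h in joint}, marginal

likelihoods = {
    "WWWW": Fraction(0),
    "RWWW": Fraction(3, 64),
    "RRWW": Fraction(8, 64),
    "RRRW": Fraction(9, 64),
    "RRRR": Fraction(0),
}
uniform_prior = {h: Fraction(1, 5) for h in likelihoods}

post, p_y = posterior(uniform_prior, likelihoods)
print(p_y)           # 1/16
print(post["RRWW"])  # 2/5
```

Summing `post.values()` gives exactly 1, which is the effect of dividing by $p(y)$.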

**Task18**:

Story time: The marble factory produces bags of four marbles. They want to make red marbles rare, so that people will get excited about them. Therefore, for every 1 bag containing four red marbles, they make 2 that contain three red, 3 that contain two red, 4 that contain one red, and 5 that contain zero red.

With this new prior information, find $p(θ|y)$, where $θ=RRWW$.

**NOTE**: You MUST calculate a new marginal likelihood $p(y)$ with the new prior information.

p(y) = p(rwr)

p(WWWW) = 5/15

p(RWWW) = 4/15

p(RRWW) = 3/15

p(RRRW) = 2/15

p(RRRR) = 1/15

---

p(y|WWWW) = 0

p(y|RWWW) = 1/4 * 3/4 * 1/4 = 3/64

p(y|RRWW) = 2/4 * 2/4 * 2/4 = 8/64

p(y|RRRW) = 3/4 * 1/4 * 3/4 = 9/64

p(y|RRRR) = 0

---

p(y|WWWW)p(WWWW) = 0

p(y|RWWW)p(RWWW) = 3/64 * 4/15 = 12/960

p(y|RRWW)p(RRWW) = 8/64 * 3/15 = 24/960

p(y|RRRW)p(RRRW) = 9/64 * 2/15 = 18/960

p(y|RRRR)p(RRRR) = 0

---

p(rwr) = p(y) = 12/960 + 24/960 + 18/960 = 54/960

p(θ|y) = (24/960) / (54/960) = 24/54 = 4/9
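
The arithmetic for this task is easy to slip on, so here is a sketch that recomputes it with exact fractions. The prior comes from the factory's production ratio (5:4:3:2:1 for zero through four red); the variable names are illustrative.

```python
from fractions import Fraction

# Bags produced per batch, keyed by composition (from the factory story).
bag_counts = {"WWWW": 5, "RWWW": 4, "RRWW": 3, "RRRW": 2, "RRRR": 1}
total = sum(bag_counts.values())  # 15 bags in all
prior = {h: Fraction(n, total) for h, n in bag_counts.items()}

# Likelihood of red-white-red under each composition (same as Task 14).
likelihoods = {
    "WWWW": Fraction(0),
    "RWWW": Fraction(3, 64),
    "RRWW": Fraction(8, 64),
    "RRRW": Fraction(9, 64),
    "RRRR": Fraction(0),
}

joint = {h: likelihoods[h] * prior[h] for h in prior}
p_y = sum(joint.values())         # new marginal likelihood
post_rrww = joint["RRWW"] / p_y   # p(theta = RRWW | y)

print(p_y, post_rrww)  # 9/160 4/9
```

With this prior, $p(y) = 54/960 = 9/160$ and $p(RRWW|y) = 4/9$. The prior for RRRW is 2/15, so its joint term is 18/960.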

**Task19**:

Write down similarities and differences between this marble example, and the Victor Wembanyama FT example.

- We didn't graph either version of the marble example, whereas the FT example was graphed, and we didn't have multiple data points
- We knew p(θ) in the last example, and not in the others
- We didn't take a second data set for the likelihood in the marble examples