# Problem 1: Legal Reasoning

Suppose a crime has been committed and blood is found at the scene. This blood type is present in 1% of the population.

The above scenario can be modeled with Bayes' theorem as

$$P(C|B) = \frac{P(B|C)P(C)}{P(B)}$$ 

where the random variable $C$ represents whether an individual committed the crime, and the random variable $B$ represents whether that individual has this rare blood type.

## Part 1

The prosecution claims that "there is a 1% chance that the defendant would have the crime blood type if he were innocent. Thus, there is a 99% chance that he is guilty."

The prosecutor computed the likelihood term in Bayes' theorem: given that the accused is innocent, there is a 1% chance of observing the crime blood type, $P(B=\textrm{rare type} | C = \textrm{innocent}) = 0.01$. Taking $1 - 0.01 = 0.99$, the prosecutor concludes that the accused is likely guilty.

However, the prosecutor has committed an error by going from the probability of observing a blood type given innocence to the probability of being guilty given a blood type. Establishing that $P(B=\textrm{rare type} | C = \textrm{innocent}) = 0.01$ does not let us say that $P(C = \textrm{guilty} | B= \textrm{rare type}) = 1 - 0.01$.

This is apparent when we consider Bayes' theorem as written above, because $P(C = \textrm{guilty} | B= \textrm{rare type}) \propto \textrm{likelihood} \times \textrm{prior}$. The posterior depends on the prior probability of guilt, not on the likelihood alone.
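To see numerically how much the prior matters, here is a minimal sketch. It assumes, beyond what the problem states, that the guilty person certainly has the crime blood type ($P(B|C=\textrm{guilty}) = 1$). The function name and the prior values in the loop are illustrative only.

```python
def posterior_guilty(prior_guilty,
                     p_blood_given_guilty=1.0,
                     p_blood_given_innocent=0.01):
    """P(guilty | blood match) via Bayes' theorem.

    p_blood_given_guilty=1.0 is an assumption (the culprit left the blood);
    p_blood_given_innocent=0.01 is the population frequency from the problem.
    """
    numerator = p_blood_given_guilty * prior_guilty
    evidence = numerator + p_blood_given_innocent * (1.0 - prior_guilty)
    return numerator / evidence


# Illustrative priors only; none of these come from the problem statement.
for prior in (0.5, 0.01, 1e-4):
    print(f"prior={prior:g}  posterior={posterior_guilty(prior):.4f}")
```

Under these assumptions the posterior only comes out near the prosecutor's 99% when the prior probability of guilt is already about 50%. For smaller priors the same blood evidence yields a much smaller posterior.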

## Part 2

The defendant claims that because the crime occurred in a city of 800,000 people, the blood type would be found in approximately 8,000 people. Thus, there is a $1/8000$ chance that the accused is guilty.

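In this case the defendant treats each of the roughly 8,000 people with the crime blood type as equally likely to be guilty, so that $P(C=\textrm{guilty}|B=\textrm{rare type}) = 1/8000$. Note that this number is not $P(B=\textrm{rare type})$, which is $0.01$. Under Bayes' theorem, $1/8000$ is the posterior only if the prior is uniform over the city, $P(C=\textrm{guilty}) = 1/800{,}000$, and the guilty person certainly has the blood type, $P(B=\textrm{rare type}|C=\textrm{guilty}) = 1$. The defendant's error is therefore in the prior: it ignores every other piece of evidence that distinguishes the accused from the rest of the city.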
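As a rough check on where the defendant's figure sits in Bayes' theorem, the sketch below plugs a uniform prior over the city's 800,000 residents into the formula. The assumption that the culprit certainly has the crime blood type, and all variable names, are mine rather than part of the problem.

```python
from fractions import Fraction

city_population = 800_000
p_blood_given_innocent = Fraction(1, 100)  # population frequency from the problem
p_blood_given_guilty = Fraction(1)         # assumption: the culprit left the blood

# Uniform prior: every resident is equally likely to be the culprit.
prior_guilty = Fraction(1, city_population)

evidence = (p_blood_given_guilty * prior_guilty
            + p_blood_given_innocent * (1 - prior_guilty))
posterior_guilty = p_blood_given_guilty * prior_guilty / evidence

print(posterior_guilty, float(posterior_guilty))
```

The result is very close to $1/8000$, which suggests the defendant's number is what Bayes' theorem returns only when every resident is treated as equally suspect before the blood evidence is considered.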

# Problem 2: Poisson Distribution MLE

### Part 1

Derive the maximum likelihood estimator for $\lambda$ for the Poisson Distribution

$$P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Suppose that we have a data set $X_1, X_2, \dots, X_m$, and that the counts are independent draws from the same Poisson distribution. Then the likelihood factorizes:

$$P(X_1 = x_1, \dots, X_m = x_m | \lambda) = \prod_{i=1}^{m} P(X_i = x_i|\lambda) = \prod_{i=1}^{m} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}$$

In order to find the value of $\lambda$ that maximizes this function, I start by taking the logarithm of the expression above.

$$\ln\left[\prod_{i=1}^{m} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}\right] = \sum_{i=1}^{m} \left[ x_i \ln{(\lambda)} - \lambda - \ln{(x_i!)} \right]$$

Next, I take the derivative of the summed logarithm with respect to $\lambda$

$$\frac{\partial}{\partial \lambda} \sum_{i=1}^{m} \left[ x_i \ln{(\lambda)} - \lambda - \ln{(x_i!)} \right] = \sum_{i=1}^{m} \left( \frac{x_i}{\lambda} - 1 \right)$$

Finally I set the derivative equal to $0$ and solve for $\lambda$ in order to find the maximum likelihood

$$
\begin{aligned}
& \sum_{i=1}^{m} \left( \frac{x_i}{\lambda} - 1 \right) = 0 \\
\Rightarrow\ & \frac{1}{\lambda} \sum_{i=1}^{m}x_i - \sum_{i=1}^{m} 1 = 0 \\
\Rightarrow\ & \frac{1}{\lambda} \sum_{i=1}^{m}x_i - m = 0 \\
\Rightarrow\ & \boxed{\lambda = \frac{1}{m} \sum_{i=1}^{m}x_i}
\end{aligned}
$$

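This gives us our maximum likelihood estimate for $\lambda$, which is simply the sample mean. The second derivative of the log-likelihood, $-\frac{1}{\lambda^2}\sum_{i=1}^{m} x_i$, is negative whenever at least one count is nonzero, so this stationary point is indeed a maximum.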

### Part 2 

Evaluate our estimator for the dataset $[1, 2, 2, 3]$. In this example $m=4$ because there are 4 observations, and the observed values are $x_1 = 1$, $x_2 = 2$, $x_3 = 2$, $x_4 = 3$. Thus, using the equation above I can compute $\lambda$ as

$$\lambda = \frac{1}{4}(1 + 2 + 2 + 3) = \frac{8}{4} = \boxed{2}$$
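As a sanity check on the derivation, the following sketch compares the closed-form estimate with a brute-force search over a grid of candidate $\lambda$ values. The grid range and spacing are arbitrary choices of mine, and `poisson_log_likelihood` is just a helper name for the log-likelihood written out above.

```python
import math


def poisson_log_likelihood(lam, data):
    """Sum of ln P(X = x_i | lam) over the data, for lam > 0."""
    # math.lgamma(x + 1) equals ln(x!) for non-negative integers x.
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)


data = [1, 2, 2, 3]
lambda_mle = sum(data) / len(data)  # closed-form estimator: the sample mean

# Coarse grid over (0, 5] in steps of 0.01.
grid = [i / 100 for i in range(1, 501)]
lambda_grid = max(grid, key=lambda lam: poisson_log_likelihood(lam, data))

print(lambda_mle, lambda_grid)
```

Both values should agree at $\lambda = 2$, since the grid happens to contain the sample mean exactly.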

# Problem 3: Naive Bayes

### Part 1: The Setup

In our Naive Bayes Classifier we take our Bayesian classifier

$$P(C=C_k | \boldsymbol{x}) \propto P(\boldsymbol{x}|C=C_k)P(C=C_k)$$

for a vector of words $\boldsymbol{x} = (x_1, \dots, x_n)$. The naive portion of our classifier is the assumption that the words are conditionally independent given the class. With this assumption we can rewrite the above as

$$P(C=C_k | \boldsymbol{x}) \propto P(x_1|C=C_k) \cdot \dots \cdot P(x_n|C=C_k) P(C=C_k)$$

However, a problem arises when we evaluate a specific sequence of words using our classifier. Each per-word likelihood is estimated from the training data as

$$P(x_i | C = C_k) = \frac{\textrm{number of occurrences of } x_i \textrm{ in } C_k}{\textrm{total number of words in } C_k}$$

Suppose we encounter a word $x_{s}$ that we haven't seen before. Then using our estimate of the likelihood above we would get $P(x_s | C = C_k) = \frac{0}{\textrm{total number of words in } C_k} = 0$.

This means that for any vector of words $\boldsymbol{x}$ containing $x_s$, the whole product vanishes and $P(C=C_k | \boldsymbol{x}) = 0$ for every class $C_k$, so the classifier can no longer rank the classes.
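The zero-probability problem can be reproduced with a small sketch. The two classes, their word lists, the equal class priors, and the function names below are all hypothetical toy choices, not data from the problem.

```python
from collections import Counter

# Hypothetical toy training data: the words observed in each class.
training = {
    "spam": ["win", "money", "now", "win"],
    "ham": ["meeting", "now", "lunch"],
}
counts = {label: Counter(words) for label, words in training.items()}
totals = {label: len(words) for label, words in training.items()}
priors = {label: 0.5 for label in training}  # assumed equal class priors


def word_likelihood(word, label):
    """Relative frequency of `word` among all words observed in `label`."""
    return counts[label][word] / totals[label]


def unnormalized_posterior(words, label):
    """Class prior times the product of per-word likelihoods."""
    score = priors[label]
    for word in words:
        score *= word_likelihood(word, label)
    return score


message = ["win", "now", "crypto"]  # "crypto" never appears in training
for label in training:
    print(label, unnormalized_posterior(message, label))
```

Every class scores exactly zero for this message, because the single unseen word zeroes out the product regardless of how informative the other words are.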

### Part 2: The Sting

As a way to ameliorate this problem, we can set the likelihood of any unseen word equal to $1$. This approach has two benefits:

1) The classifier will now evaluate to a non-zero value given an unseen word $x_s \in \boldsymbol{x}$.

2) Unseen words <b>do not</b> change the value of the likelihood. Since $P(x_s | C = C_k) = 1$ in this approach, the product including $P(x_s | C = C_k)$ is exactly the same as the product that excludes this term.

In this way we can still use our already trained classifier to evaluate probabilities on a vector of words.
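A minimal sketch of this rule follows. It interprets "unseen" as absent from the training data of every class, which is my reading rather than something the problem specifies, and the toy word lists and equal class priors are hypothetical.

```python
from collections import Counter

# Hypothetical toy training data: the words observed in each class.
training = {
    "spam": ["win", "money", "now", "win"],
    "ham": ["meeting", "now", "lunch"],
}
counts = {label: Counter(words) for label, words in training.items()}
totals = {label: len(words) for label, words in training.items()}
priors = {label: 0.5 for label in training}  # assumed equal class priors

# Assumption: a word is "unseen" if it appears in no class at training time.
vocabulary = set().union(*training.values())


def unnormalized_posterior(words, label):
    """Class prior times per-word likelihoods, treating unseen words as 1."""
    score = priors[label]
    for word in words:
        if word not in vocabulary:
            continue  # likelihood of 1: the factor leaves the product unchanged
        score *= counts[label][word] / totals[label]
    return score


for label in training:
    print(label, unnormalized_posterior(["now", "crypto"], label))
```

With the unseen word skipped, each class gets the same score it would get for the message without that word. Note that under this reading a word seen in one class but not another still contributes a zero for the class that lacks it.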

# Problem 4: Monty Hall

Define a random variable $D$, where $D=d$ means that the prize is behind door $d$. Since each of the 3 doors is equally likely to contain the prize, $P(D=d) = 1/3 \textrm{ for } d \in \{1, 2, 3\}$. Thus, at the game's first step we can pick any of the three doors and be equally likely to win. Suppose without loss of generality that we pick door 1.

In the next move of the game, Monty must open one of the doors we did not pick to reveal its contents. <i>However, Monty cannot open the door containing the prize</i>. This fact is critical for our Bayesian analysis. We can model the second step in the game using two random variables: $D$ for the door hiding the prize, and $O$ for the door that Monty opened. Thus $P(D=d|O=o)$ is the probability of winning by selecting door $d$ given that Monty opened door $o$. Once again suppose without loss of generality that Monty opens door 2. Thus, our two posteriors are

$$P(D=1 | O=2)\textrm{ and }P(D=3 | O=2)$$

It is important to note that $P(D=1 | O=2)\textrm{ and }P(D=3 | O=2)$ are the only two options because $P(D=2 | O=2) = 0$: opening the prize door is against the rules of the game, and likely Monty's employment contract. Next, I expand the first posterior using Bayes' theorem:

$$P(D=1 | O=2) = \frac{P(O=2 | D=1)P(D=1)}{P(O=2)}$$

The prior term $P(D=1)$ in this case is the easiest to consider. As stated above, all doors are equally likely to contain the prize, so $P(D=1) = 1/3$. We can use the fact that Monty must open a door containing a goat to compute the likelihood. If $D=1$, then Monty can open either door 2 or door 3, and we assume he chooses between them with equal probability. Thus, $P(O=2|D=1)=1/2$. Lastly, apply the sum and product rules to the evidence term

$$
\begin{aligned}
P(O=2) &= P(O=2, D=1) + P(O=2, D=3) \\
&= P(O=2|D=1)P(D=1) + P(O=2|D=3)P(D=3)
\end{aligned}
$$

I know from above that $P(D=1)=P(D=3)=1/3$. In addition, I know that $P(O=2|D=1)=1/2$. Consider the term $P(O=2|D=3)$. Given that we've selected door 1, and that Monty cannot open the door with the prize, then if the prize is behind door 3 the only door that Monty can open is door 2. Thus, $P(O=2|D=3) = 1$. With these numbers, we can solve for the denominator of Bayes' theorem

$$P(O=2|D=1)P(D=1) + P(O=2|D=3)P(D=3) = (1/2)(1/3) + (1)(1/3) = 1/2$$

Combining the numerator and denominator we can solve for the posterior

$$P(D=1 | O=2) = \frac{P(O=2 | D=1)P(D=1)}{P(O=2)} = \frac{(1/2)(1/3)}{1/2} = 1/3$$

Finally, $P(D=3 | O=2) = 1 - P(D=1 | O=2) = 2/3$.

This result tells us that after Monty has opened a door, we have a $1/3$ probability of winning the prize if we stay with door 1, but a $2/3$ probability of winning the prize if we switch to door 3.
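The same posterior can be reproduced with exact arithmetic. The sketch below encodes the rules used above (Monty never opens our door or the prize door) and assumes he chooses uniformly when two doors are available. The helper name `p_open` is my own.

```python
from fractions import Fraction

doors = (1, 2, 3)
picked, opened = 1, 2  # we pick door 1, Monty opens door 2

prior = {d: Fraction(1, 3) for d in doors}


def p_open(opened_door, prize_door, picked_door):
    """P(O = opened_door | D = prize_door) when the player picked picked_door.

    Monty never opens the picked door or the prize door, and is assumed to
    choose uniformly at random among the doors that remain.
    """
    allowed = [d for d in doors if d not in (picked_door, prize_door)]
    if opened_door not in allowed:
        return Fraction(0)
    return Fraction(1, len(allowed))


# Bayes' theorem: posterior = likelihood * prior / evidence.
joint = {d: p_open(opened, d, picked) * prior[d] for d in doors}
evidence = sum(joint.values())
posterior = {d: joint[d] / evidence for d in doors}

for d in doors:
    print(f"P(D={d} | O={opened}) = {posterior[d]}")
```

Changing `picked` and `opened` to any other valid pair gives the same $1/3$ versus $2/3$ split, which is consistent with the "without loss of generality" steps in the argument.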