This assignment will be reviewed by peers based upon a given rubric. Make sure to keep your answers clear and concise while demonstrating an understanding of the material. Be sure to give all requested information in markdown cells. It is recommended to use LaTeX.

### Problem 1

What does it mean for one event $C$ to cause another event $E$, for example, smoking ($C$) to cause cancer ($E$)? There is a long history in philosophy, statistics, and the sciences of trying to clearly analyze the concept of a cause. One tradition says that causes raise the probability of their effects; we may write this symbolically as
$$
P(E | C) > P(E). \qquad \qquad (1)
$$

#### Part a)

Does equation (1) imply that $P(C | E) > P(C)$? If so, prove it. If not, give a counterexample.

Yes. Equation (1) requires $P(C) > 0$ and implies $P(E) \geq P(E \cap C) > 0$, so Bayes' rule applies:


\begin{align*}
P(C\vert E) & =\frac{P(E\vert C)P(C)}{P(E)}\\
 & >\frac{P(E)P(C)}{P(E)}\\
 & =P(C)
\end{align*}

Thus probability raising is symmetric: if we subscribe to the above definition of causality, cancer also causes smoking!
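
A quick numerical check can make the symmetry concrete. The sketch below uses made-up probabilities (they are assumptions for illustration, not data about smoking or cancer) and confirms that when equation (1) holds, the mirrored inequality holds as well.

```r
# Hypothetical probabilities, chosen only for illustration
p_C <- 0.30             # P(C)
p_E_given_C <- 0.20     # P(E | C)
p_E_given_notC <- 0.05  # P(E | C^C)

# Law of total probability gives P(E)
p_E <- p_E_given_C * p_C + p_E_given_notC * (1 - p_C)

# Bayes' rule gives P(C | E)
p_C_given_E <- p_E_given_C * p_C / p_E

# Equation (1) and its mirror image, P(C | E) > P(C)
c(eq1 = p_E_given_C > p_E, mirrored = p_C_given_E > p_C)
```

Any other choice with $P(E \vert C) > P(E \vert C^C)$ and $0 < P(C) < 1$ should behave the same way.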

#### Part b)

Another way to formulate a probabilistic theory of causation is to say that
$$P (E | C) > P(E | C^C). \qquad \qquad (2)$$ Show that equation (1) implies equation (2).

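Expand $P(E)$ with the law of total probability, then collect the $P(E\vert C)$ terms:

\begin{align*}
P(E\vert C) & >P(E)\\
 & =P(E\vert C)P(C)+P(E\vert C^{C})P(C^{C})\\
\implies P(E\vert C)\left(1-P(C)\right) & >P(E\vert C^{C})P(C^{C})\\
\implies P(E\vert C)P(C^{C}) & >P(E\vert C^{C})P(C^{C})\\
\implies P(E\vert C) & >P(E\vert C^{C})
\end{align*}

The last step divides by $P(C^{C})$, which is positive whenever $P(E\vert C^{C})$ is defined.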
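
As a complementary sketch, using only the law of total probability and assuming $0 < P(C) < 1$ so that both conditional probabilities exist, the gap in equation (1) can be written in terms of the gap in equation (2):

\begin{align*}
P(E\vert C)-P(E) & =P(E\vert C)-P(E\vert C)P(C)-P(E\vert C^{C})P(C^{C})\\
 & =P(E\vert C)P(C^{C})-P(E\vert C^{C})P(C^{C})\\
 & =P(C^{C})\left[P(E\vert C)-P(E\vert C^{C})\right]
\end{align*}

Because $P(C^{C})>0$, the two differences share a sign, so equation (2) also implies equation (1): under this assumption the two formulations are equivalent.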

#### Part c)

Let $C$ be the drop in the level of mercury in a barometer and let $E$ be a storm. Briefly describe why this leads to a problem with using equation (1) (or equation (2)) as a theory of causation.

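The probability of a storm conditioned on a drop in the barometer reading is likely greater than the unconditional probability of a storm occurring, so equations (1) and (2) would label the barometer drop a cause of the storm. But the falling mercury plainly does not bring about the storm; probability raising alone cannot distinguish causation from mere correlation.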

#### Part d)

Let $A$, $C$, and $E$ be events. If $P(E | A \cap C) = P(E |C )$, then $C$ is said to screen $A$ off from $E$. Suppose that $P (E \cap C) > 0.$ Show that screening off is equivalent to saying that $P(A \cap E | C)=P(A|C)P(E | C).$ What does this latter equation say in terms of independence?

This one seems complicated, but all it requires is the definition of conditional probability. Each step below is reversible, assuming $P(A\cap C)>0$ so that $P(E\vert A\cap C)$ is defined, which is what makes the two statements equivalent:

\begin{align*}
P(E\vert A\cap C) & =P(E\vert C)\\
\iff\frac{P\left(E\cap A\cap C\right)}{P(A\cap C)} & =P(E\vert C)\\
\iff\frac{P\left(E\cap A\cap C\right)}{P(A\cap C)P(C)} & =\frac{P(E\vert C)}{P(C)}\\
\iff\frac{P\left(E\cap A\cap C\right)}{P(C)} & =P(E\vert C)\frac{P(A\cap C)}{P(C)}\\
\iff P\left(E\cap A\vert C\right) & =P(E\vert C)P(A\vert C)
\end{align*}

The final equation says that $A$ and $E$ are conditionally independent given $C$.


#### Part e)

Now let $A$ be the drop in the level of mercury in a barometer, $E$ be a storm, and $C$ be a drop in atmospheric pressure. Does the result from part (d) help fix the problem suggested in part (c)?

Yes, it helps. The takeaway is that $E$ and $A$ are conditionally independent given $C$. That is, once we know there has been a drop in the pressure, the barometer can't tell us anything new about the storm, and vice versa. Perhaps the existence of an event $C$ that makes $E$ and $A$ conditionally independent precludes a directly causal relationship between them.
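
A small simulation can illustrate screening off. This is a minimal sketch under assumed numbers: the probabilities below are hypothetical, and the barometer reading and the storm are each generated from the pressure drop alone, so neither one influences the other directly.

```r
set.seed(1)
n <- 100000

# Hypothetical probabilities, chosen only for illustration
C <- runif(n) < 0.3                     # drop in atmospheric pressure
A <- runif(n) < ifelse(C, 0.95, 0.02)   # barometer drop, depends only on C
E <- runif(n) < ifelse(C, 0.60, 0.05)   # storm, depends only on C

# Unconditionally, the barometer drop "raises the probability" of a storm
c(P_E = mean(E), P_E_given_A = mean(E[A]))

# Among the pressure-drop cases, also conditioning on A barely changes anything
c(P_E_given_C = mean(E[C]), P_E_given_A_and_C = mean(E[A & C]))
```

The first pair of estimates should differ substantially, while the second pair should agree up to simulation noise, which is the pattern described by $P(E \vert A \cap C) = P(E \vert C)$.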

# Problem 2

Suppose you have two bags of marbles that are in a box. Bag 1 contains 7 white marbles, 6 black marbles, and 3 gold marbles. Bag 2 contains 4 white marbles, 5 black marbles, and 15 gold marbles. The probability of grabbing Bag 1 from the box is twice the probability of grabbing Bag 2.

If you close your eyes, grab a bag from the box, and then grab a marble from that bag, what is the probability that it is gold?

**Part a)** 

Solve this problem by hand. This should give us a theoretical value for pulling a gold marble.

Let $G$ be the event that we pull a gold marble, $B_1$ be the event that we select Bag 1, and $B_2$ be the event that we select Bag 2. Since $P(B_1) = 2P(B_2)$ and $P(B_1) + P(B_2) = 1$, we have $P(B_1) = 2/3$ and $P(B_2) = 1/3$. Bag 1 holds $7 + 6 + 3 = 16$ marbles and Bag 2 holds $4 + 5 + 15 = 24$, so the Law of Total Probability gives:

\begin{align*}
    P(G) &= P(G|B_1)P(B_1) + P(G|B_2)P(B_2) \\
    &= (3/16)(2/3) + (15/24)(1/3) \\
    &= 1/8 + 5/24 \\
    &= 1/3
\end{align*}
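
The arithmetic can be double-checked in R. In this short sketch the variable names are illustrative; the numbers come directly from the problem statement.

```r
# P(B_1) and P(B_2): Bag 1 is twice as likely to be grabbed as Bag 2
p_bag <- c(bag1 = 2/3, bag2 = 1/3)

# P(G | B_1) and P(G | B_2): gold marbles divided by total marbles in each bag
p_gold_given_bag <- c(bag1 = 3 / (7 + 6 + 3), bag2 = 15 / (4 + 5 + 15))

# Law of Total Probability
p_gold <- sum(p_gold_given_bag * p_bag)
p_gold
```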

**Part b)**

Create a simulation to estimate the probability of pulling a gold marble. Assume you put the marble back in the bag each time you pull one out. Make sure to run the simulation enough times to be confident in your final result.

Note: To generate $n$ random values in $[0,1]$, use the `runif(n)` function. This function generates $n$ random values from the Uniform(0,1) distribution, which we will learn more about later in this course!

In [30]:
prob.gold.marbles = function(n){
    # Initialize a numeric vector recording whether each draw is gold (1) or not (0)
    marbles = numeric(n)
    # This for loop is the simulation.
    # Each iteration picks a bag and pulls 1 marble from it.
    # Repeat the simulation n times
    for(sim in 1:n){
        # Generate a random variable on [0,1].
        x = runif(1)
        # If x is less than 2/3, pick bag 1, else pick bag 2.
        if(x < 2/3){
            # This is bag 1
            # A marble drawn from Bag 1 is gold with probability 3/16
            # Generate another value on [0,1]
            # if less than 3/16, then the marble is gold
            y = runif(1)
            if(y < 3/16){
                # Picked a gold marble
                marbles[sim] = 1
            }
            else{
                # Picked a non-gold marble
                marbles[sim] = 0
            }
        }
        else{
            # This is Bag 2
            # A marble drawn from Bag 2 is gold with probability 15/24
            # Repeat the process of Bag 1
            y = runif(1)
            if(y < 15/24){
                # Picked a gold marble
                marbles[sim] = 1
            }
            else{
                # Picked a non-gold marble
                marbles[sim] = 0
            }
        }
    }
    # When we exit the loop, we've finished the simulation
    # marbles now holds n values, where each 1 marks a gold marble being pulled.
    # To get the probability, divide the number of golds by the total number of simulations
    gold.prob = sum(marbles) / n
    return(gold.prob)
}

prob.gold.marbles(10000)
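
As an alternative sketch (not a replacement for the loop above), the same experiment can be vectorized, and a standard error can be attached to the estimate to judge whether enough simulations were run. The seed, the sample size, and the variable names are arbitrary choices made for this illustration.

```r
set.seed(2024)   # arbitrary seed, only for reproducibility
n <- 100000

# Pick a bag for every trial: Bag 1 with probability 2/3, Bag 2 with probability 1/3
bag <- sample(1:2, size = n, replace = TRUE, prob = c(2/3, 1/3))

# Probability of gold for each trial, looked up from the chosen bag
p_gold_in_bag <- c(3/16, 15/24)[bag]

# A trial yields gold when its uniform draw falls below that probability
gold <- runif(n) < p_gold_in_bag

estimate <- mean(gold)
std_error <- sqrt(estimate * (1 - estimate) / n)   # binomial standard error
c(estimate = estimate, std_error = std_error)
```

If the estimate sits within a few standard errors of the theoretical value $1/3$, the simulation size is probably adequate; quadrupling `n` roughly halves the standard error.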

# Problem 3

Suppose you roll a fair die two times. Let $A$ be the event "the sum of the throws equals 5" and $B$ be the event "at least one of the throws is a $4$".

**Part a)** 

By hand, solve for the probability that the sum of the throws equals 5, given that at least one of the throws is a 4. That is, solve $P(A|B)$.

We want to compute 

\begin{align*}
P(A|B) = \dfrac{P(A \cap B)}{P(B)}
\end{align*}

So we need to solve for $P(A \cap B)$ and $P(B)$.

$A \cap B$ is the event "at least one of the throws is a 4 and the sum of the throws equals 5." It occurs only if one roll is a 4 and the other is a 1, which leaves two outcomes in the sample space: $(4,1)$ and $(1,4)$. There are $6 \times 6 = 36$ equally likely ordered pairs of rolls in total. Therefore, $P(A \cap B) = 2/36$.

$P(B)$ can be found the same way, by counting the outcomes that contain a 4. There are 11 outcomes in which at least one of the rolls is a 4: $(4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (1,4), (2,4), (3,4), (5,4), (6,4)$. That means $P(B) = 11/36$.

Plugging these values into our original equation, we get:

\begin{align*}
P(A|B) &= \dfrac{2/36}{11/36} \\
&= \dfrac{2}{11} \\
&\approx 0.1818
\end{align*}
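
The counting argument can also be checked by enumerating the sample space. The sketch below lists all 36 ordered pairs with `expand.grid` and counts the outcomes in each event; the variable names are illustrative.

```r
# All 36 equally likely ordered pairs (first roll, second roll)
outcomes <- expand.grid(first = 1:6, second = 1:6)

# Event A: the sum equals 5. Event B: at least one roll is a 4.
A <- (outcomes$first + outcomes$second) == 5
B <- (outcomes$first == 4) | (outcomes$second == 4)

# Counts of outcomes in B and in A and B
c(n_B = sum(B), n_A_and_B = sum(A & B))

# Conditional probability P(A | B) by counting
p_A_given_B <- sum(A & B) / sum(B)
p_A_given_B
```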

**Part b)**

Write a simple simulation to confirm our result. Make sure you run your simulation enough times to be confident in your result.

Hint: Think about the definition of conditional probability.

In [29]:
set.seed(1)
samples = 10000
# Generate lists of independent rolls
rolls.1 = sample(1:6, samples, replace=TRUE)
rolls.2 = sample(1:6, samples, replace=TRUE)
# For each pair of rolls, determine if event A or event B occurred
event.A = ((rolls.1 + rolls.2) == 5)
event.B = ((rolls.1 == 4) | (rolls.2 == 4))
# Use the definition of conditional probability
sum(event.A & event.B) / sum(event.B)
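
One way to judge whether the simulation ran long enough is to attach a standard error to the conditional estimate. The sketch below repeats the simulation with a larger, arbitrarily chosen sample size and treats the rolls where $B$ occurred as the effective sample; the binomial standard error used here is an approximation.

```r
set.seed(1)
samples <- 100000   # arbitrary choice, larger than above to reduce noise

# Generate independent rolls
rolls.1 <- sample(1:6, samples, replace = TRUE)
rolls.2 <- sample(1:6, samples, replace = TRUE)

event.A <- (rolls.1 + rolls.2) == 5
event.B <- (rolls.1 == 4) | (rolls.2 == 4)

# Only the trials where B occurred inform the conditional estimate
n.B <- sum(event.B)
estimate <- sum(event.A & event.B) / n.B
std.error <- sqrt(estimate * (1 - estimate) / n.B)

c(estimate = estimate, std.error = std.error, exact = 2/11)
```

Since only about $11/36$ of the trials land in $B$, the effective sample size is well below `samples`, which is worth keeping in mind when choosing how many rolls to simulate.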