In [0]:
#@title Imports
!pip install -q symbulate
from symbulate import *

# Law of Total Probability

Let $X$ be a discrete random variable and $A$ be any event. Then:

$$ P(A) = \sum_x P(X=x) P(A | X=x). $$

This strategy is useful when $X$ represents information _you wish you knew_ for calculating $P(A)$ so that $P(A | X = x)$ is particularly simple.

## Example 1.

The ELISA test is used to screen blood for HIV. 

- When the blood contains HIV, it gives a positive result 99\% of the time. 
- When the blood does not contain HIV, it gives a negative result 94\% of the time. 

If the prevalence of HIV is 0.7\% in the adult male population, what is the probability that a randomly selected adult male will test positive?

Suppose an adult male patient has just tested positive and wants to know the probability that he has HIV. What would you tell him?

To calculate the probability of $A = \{ \text{test positive}\}$, it would be useful to know whether the person has HIV. So let $X = 1$ if he has HIV and $X=0$ if he does not. We are given in the problem that 

- $P(A | X=1) = .99$
- $P(A | X=0) = 1 - .94 = .06$

By the Law of Total Probability, the probability that a randomly selected adult male tests positive is

$$ P(A) = P(X=1) P(A|X=1) + P(X=0) P(A|X=0) = .007 (.99) + (1 - .007) (.06) = 0.06651. $$

The patient wants to know $P(X=1|A)$. We can use the conditional probability formula:

$$ P(X=1|A) = \frac{P(X=1 \cap A)}{P(A)} = \frac{P(X=1) P(A|X=1)}{P(A)} = \frac{.007 (.99)}{.06651} = .10419, $$

so the patient only has about a 10% chance of having the disease, even though he tested positive. This simple application of conditional probability and the Law of Total Probability is called Bayes' rule.
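
The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using exact fractions; the helper names `total_probability` and `posterior` are illustrative choices, not part of Symbulate.

```python
from fractions import Fraction

def total_probability(prior, likelihood):
    """P(A) = sum over x of P(X=x) * P(A | X=x); both arguments are dicts keyed by x."""
    return sum(prior[x] * likelihood[x] for x in prior)

def posterior(prior, likelihood, x):
    """P(X=x | A) = P(X=x) * P(A | X=x) / P(A)."""
    return prior[x] * likelihood[x] / total_probability(prior, likelihood)

# X = 1 means the person has HIV, X = 0 means he does not.
prior = {1: Fraction(7, 1000), 0: Fraction(993, 1000)}     # P(X=x)
likelihood = {1: Fraction(99, 100), 0: Fraction(6, 100)}   # P(test positive | X=x)

p_positive = total_probability(prior, likelihood)
p_hiv_given_positive = posterior(prior, likelihood, 1)

print(float(p_positive))             # probability of a positive test
print(float(p_hiv_given_positive))   # probability of HIV given a positive test
```

Changing the prevalence in `prior` shows how strongly the answer depends on how rare the disease is.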

## Example 2.

You draw two cards from a well-shuffled deck of cards. What is the probability that the 2nd card is a heart?

In [0]:
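# Each outcome is an ordered pair of cards; each card is a (rank, suit) tuple.
model = DeckOfCards(size=2)

def second_card_is_heart(draws):
  # draws[1] is the 2nd card; its suit is at index 1.
  return draws[1][1] == "Hearts"

# Simulate 10,000 two-card draws and tabulate how often the 2nd card is a heart.
model.sim(10000).apply(second_card_is_heart).tabulate()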

### Argument by Symmetry

Without any other information, the 2nd card is equally likely to be any of the 52 cards. So the probability must be $13 / 52$.

### Argument by Law of Total Probability

It is true that the probability that the 2nd card is a heart changes, depending on what the 1st card was. But in the absence of any information about the 1st card, the probability that the 2nd card is a heart is $13/52$.

If you are not convinced by this argument, let's do a calculation. Let $A$ be the event that the 2nd card is a heart, and let $X = 1$ if the 1st card is a heart and $X = 0$ if it is not. If we condition on $X$, then the probabilities are uncontroversial:

- $P(A | X=1) = 12/51$
- $P(A | X=0) = 13/51$

Now we use the Law of Total Probability:

$$ P(A) = P(X=0) P(A|X=0) + P(X=1) P(A|X=1) = \frac{39}{52} \frac{13}{51} + \frac{13}{52} \frac{12}{51} = \frac{13}{52}. $$
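
For a check that does not rely on simulation, we can enumerate every ordered pair of distinct cards and count. The sketch below uses only the standard library; the deck representation (ranks 1 to 13 paired with a suit name) is an assumption made for illustration and is separate from Symbulate's `DeckOfCards`.

```python
from fractions import Fraction
from itertools import permutations

suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]

# All 52 * 51 equally likely ordered (1st card, 2nd card) pairs.
pairs = list(permutations(deck, 2))

def prob(event, given=lambda pair: True):
    """Conditional probability of `event` given `given`, by counting outcomes."""
    conditioned = [pair for pair in pairs if given(pair)]
    return Fraction(sum(1 for pair in conditioned if event(pair)), len(conditioned))

first_is_heart = lambda pair: pair[0][1] == "Hearts"
second_is_heart = lambda pair: pair[1][1] == "Hearts"

p_a_given_heart = prob(second_is_heart, given=first_is_heart)
p_a_given_not_heart = prob(second_is_heart, given=lambda pair: not first_is_heart(pair))
p_a = prob(second_is_heart)

print(p_a_given_heart, p_a_given_not_heart, p_a)
```

The two conditional probabilities differ, yet the unconditional probability that the 2nd card is a heart comes out to exactly one quarter.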

# Sums of Independent Random Variables

Suppose $X$ and $Y$ are independent random variables. What is the distribution of their sum $S = X + Y$?

We can use the Law of Total Probability, conditioning on the value of $X$:
\begin{align}
P(S = s) &= \sum_x P(X=x) P(S=s | X=x) \\
&= \sum_x P(X=x) P(X+Y=s | X=x) & \text{(definition of $S$)} \\
&= \sum_x P(X=x) P(Y=s-x | X=x) & \text{(substituting the given value $X = x$)} \\
&= \sum_x P(X=x) P(Y=s-x) & \text{(by independence)}
\end{align}

If $p_X$ is the p.m.f. of $X$ and $p_Y$ is the p.m.f. of $Y$, then the p.m.f. of the sum is given by 
$$ p_S[s] = \sum_x p_X[x] p_Y[s - x]. $$
This is called the _convolution_ of $p_X$ and $p_Y$. Convolution is an important operation in signal processing.
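
The convolution formula translates almost directly into code. Here is a minimal sketch that stores each p.m.f. as a dictionary from values to probabilities; the function name `convolve_pmfs` and the two-dice example are illustrative assumptions, not taken from the text above.

```python
from fractions import Fraction

def convolve_pmfs(p_x, p_y):
    """p.m.f. of S = X + Y for independent X and Y: p_S[s] = sum_x p_X[x] * p_Y[s - x]."""
    support = sorted({x + y for x in p_x for y in p_y})
    return {
        s: sum(px * p_y.get(s - x, 0) for x, px in p_x.items())
        for s in support
    }

# Illustration: the sum of two independent fair six-sided dice.
die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve_pmfs(die, die)

for s, prob in two_dice.items():
    print(s, prob)
```

Values of $s - x$ outside the support of $Y$ contribute zero, which is what `p_y.get(s - x, 0)` encodes.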

## Example 3.

Let $X \sim \text{Binomial}(n, p)$ and $Y \sim \text{Binomial}(m, p)$. What is the distribution of $S = X + Y$?

\begin{align}
P(S=s) &= \sum_{x=0}^n P(X=x) P(Y=s-x) \\
&= \sum_{x=0}^n \binom{n}{x} p^x (1-p)^{n-x} \binom{m}{s-x} p^{s-x} (1-p)^{m-(s-x)} \\
&= \sum_{x=0}^n \binom{n}{x} \binom{m}{s-x} p^s (1-p)^{n + m - s} \\
&= \binom{n+m}{s} p^s (1-p)^{n+m-s}
\end{align}
In the last step, we used the fact that the hypergeometric p.m.f. 
$$ p[x] = \frac{\binom{n}{x} \binom{m}{s-x}}{\binom{n+m}{s}} $$
must sum to 1, so the numerator must sum to the denominator.

Now, we recognize this as the p.m.f. of a $\text{Binomial}(n + m, p)$ distribution.
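
As a sanity check on the algebra, we can convolve two binomial p.m.f.s with exact arithmetic and compare against the $\text{Binomial}(n+m, p)$ p.m.f. The particular values of `n`, `m`, and `p` below are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """List whose k-th entry is P(X = k) for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

n, m, p = 4, 6, Fraction(3, 10)   # arbitrary illustrative parameters
p_x = binomial_pmf(n, p)
p_y = binomial_pmf(m, p)

# Convolution: p_S[s] = sum_x p_X[x] * p_Y[s - x], skipping terms where s - x is out of range.
p_s = [
    sum(p_x[x] * p_y[s - x] for x in range(n + 1) if 0 <= s - x <= m)
    for s in range(n + m + 1)
]

# The counting identity used in the last step of the derivation.
identity_holds = all(
    sum(comb(n, x) * comb(m, s - x) for x in range(n + 1) if 0 <= s - x <= m)
    == comb(n + m, s)
    for s in range(n + m + 1)
)

print(p_s == binomial_pmf(n + m, p), identity_holds)
```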

Here's an intuitive way to see the above result that avoids all computation.

Picture a box of tickets labeled 0 or 1, in which a fraction $p$ of the tickets are labeled 1. Let $X_1, ..., X_n$ be $n$ individual draws from the box, so that each $X_i$ is either 0 or 1. These random variables are independent, since the draws are made with replacement. Then the binomial random variable $X$ can be expressed as a sum of these independent random variables:
$$ X = X_1 + ... + X_n. $$

We can represent $Y$ as the sum of $m$ more draws from the same box:
$$ Y = Y_1 + ... + Y_m. $$
(The $Y_i$s cannot overlap with the $X_i$s because we need $Y$ to be independent of $X$.)

Now their sum is just the sum of $m + n$ draws from the box
$$ X + Y = X_1 + ... + X_n + Y_1 + ... + Y_m, $$
which must therefore be $\text{Binomial}(n+m, p)$.
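
The same argument can be acted out in a simulation: make $n$ draws for $X$, then $m$ more for $Y$, and look at the total. This is a rough sketch using Python's standard `random` module rather than Symbulate; the parameter values, the seed, and the helper `draw` are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)        # fixed seed so the run is repeatable
n, m, p = 4, 6, 0.3           # arbitrary illustrative parameters
trials = 20000

def draw():
    """One draw from the box: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

sums = []
for _ in range(trials):
    x = sum(draw() for _ in range(n))   # X = X_1 + ... + X_n
    y = sum(draw() for _ in range(m))   # Y = Y_1 + ... + Y_m, from separate draws
    sums.append(x + y)

mean_s = sum(sums) / trials
print(mean_s, (n + m) * p)    # simulated mean of X + Y vs. the Binomial(n + m, p) mean
```

The simulated mean should land close to $(n+m)p$, as expected if $X + Y$ behaves like $n + m$ draws from the same box.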
