In [None]:
!pip install -q symbulate
from symbulate import *
%matplotlib inline

# Named Distributions

We've seen how the distribution of a random variable $X$ is described by its p.m.f.

$$ p[x] \overset{\text{def}}{=} P(X = x), $$

which gives the probability that $X$ takes on each possible value $x$.

Some distributions arise so frequently that they have names. For example, a random variable $X$ that only takes on two possible values, 0 and 1, is said to follow a **Bernoulli distribution**. Its p.m.f. is 

$$ p[x] = \begin{cases} 1 - p & x = 0 \\ p & x = 1 \\ 0 & \text{otherwise} \end{cases}. $$

To completely specify this distribution, we need to specify the value of $p$, which is called a **parameter** of the Bernoulli distribution.
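As a quick illustration, the Bernoulli p.m.f. can be written as a small Python function of the value $x$ and the parameter $p$. This is a plain-Python sketch. The function name `bernoulli_pmf` is hypothetical and is not part of Symbulate.

```python
def bernoulli_pmf(x, p):
    """Return P(X = x) for X ~ Bernoulli(p), following the cases above."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if x == 0:
        return 1 - p
    if x == 1:
        return p
    return 0  # every value other than 0 and 1 has probability 0

# Evaluate the p.m.f. at a few values for p = 0.3.
print(bernoulli_pmf(0, 0.3), bernoulli_pmf(1, 0.3), bernoulli_pmf(2, 0.3))
```

Changing `p` changes the whole distribution, which is what it means for $p$ to be a parameter.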

# The Binomial Distribution

To appreciate the usefulness of named distributions, let's consider a more complex example. Suppose we draw $n$ tickets, with replacement, from the following box:

$$ \fbox{$\overbrace{\underbrace{\fbox{0}\ \fbox{0}\ \cdots\ \fbox{0}}_{\text{$N_0$ tickets}}\ \underbrace{\fbox{1}\ \fbox{1}\ \cdots\ \fbox{1}}_{\text{$N_1$ tickets}}}^{\text{$N$ tickets}}$}. $$

The _number of $\fbox{1}$s_ we get in these $n$ draws is a random variable. It is said to follow a $\text{Binomial}(n, N_1, N_0)$ distribution. What is the p.m.f. of this distribution?
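To make the definition concrete, here is a minimal simulation sketch. It uses Python's standard-library `random` module rather than Symbulate, and the helper name `count_ones` is hypothetical. It builds the box, draws $n$ tickets with replacement, and counts the $\fbox{1}$s, producing one realization of $X$ per call.

```python
import random

def count_ones(n, N1, N0, rng):
    """Draw n tickets with replacement from a box of N1 ones and N0 zeros;
    return how many of the draws are 1s (one realization of X)."""
    box = [1] * N1 + [0] * N0
    draws = rng.choices(box, k=n)  # random.choices samples with replacement
    return sum(draws)

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [count_ones(5, 4, 6, rng) for _ in range(10)]
print(samples)  # each value is an integer between 0 and n = 5
```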

## A Special Case

To be concrete, let's first suppose $X \sim \text{Binomial}(n=5, N_1=4, N_0=6)$ and calculate $p[2]$. That is, we consider the probability that we get exactly 2 $\fbox{1}$s in $5$ draws with replacement from the following box:

$$ \fbox{$\overbrace{\underbrace{\fbox{0}\ \fbox{0}\ \fbox{0}\ \fbox{0}\ \fbox{0}\ \fbox{0}}_{\text{$N_0 = 6$}}\ \underbrace{\fbox{1}\ \fbox{1}\ \fbox{1}\ \fbox{1}}_{\text{$N_1=4$}}}^{\text{$N=10$}}$}. $$

If we treat the 10 tickets as distinguishable and keep track of the order of the draws, there are $10 \times 10 \times 10 \times 10 \times 10 = 10^5$ equally likely ways to draw 5 tickets from this box. How many of them result in exactly 2 $\fbox{1}$s, our event of interest?

One possibility is to draw the 2 $\fbox{1}$s first, followed by 3 $\fbox{0}$s, i.e., 

$$ \fbox{1}\ \fbox{1}\ \fbox{0}\ \fbox{0}\ \fbox{0}. $$ 

The number of outcomes like this is 

$$ 4 \times 4 \times 6 \times 6 \times 6 = 4^2 \cdot 6^3. $$

But maybe the 2 $\fbox{1}$s and 3 $\fbox{0}$s were observed in some other order, like 

$$ \fbox{0}\ \fbox{1}\ \fbox{0}\ \fbox{0}\ \fbox{1}. $$ 

The number of outcomes like this is 

$$ 6 \times 4 \times 6 \times 6 \times 4 = 4^2 \cdot 6^3. $$

**No matter how the 2 $\fbox{1}$s and 3 $\fbox{0}$s are arranged, there will be $4^2 \cdot 6^3$ such outcomes.**
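The bolded claim can be checked by brute force. The sketch below assumes, for illustration only, that the 10 tickets are labeled 0 through 9, with tickets 0 through 5 showing a $\fbox{0}$ and tickets 6 through 9 showing a $\fbox{1}$. It enumerates all $10^5$ ordered outcomes and counts how many outcomes produce each arrangement of 2 $\fbox{1}$s and 3 $\fbox{0}$s.

```python
from collections import Counter
from itertools import product

# Hypothetical labeling: tickets 0..5 show a 0, tickets 6..9 show a 1.
face = [0] * 6 + [1] * 4

counts = Counter()
for outcome in product(range(10), repeat=5):   # all 10**5 ordered draws
    pattern = tuple(face[t] for t in outcome)  # e.g. (0, 1, 0, 0, 1)
    if sum(pattern) == 2:                      # exactly two 1s
        counts[pattern] += 1

# Number of distinct arrangements, and the set of counts per arrangement.
print(len(counts), set(counts.values()))  # 10 {3456}
```

Every arrangement is produced by exactly $4^2 \cdot 6^3 = 3456$ outcomes, as claimed.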

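Therefore, the number of ways of getting exactly 2 $\fbox{1}$s is 

\begin{align*}
(\text{number of ways to get 2 $\fbox{1}$s in 5 draws}) &= (\text{number of arrangements of 2 $\fbox{1}$s and 3 $\fbox{0}$s}) \cdot 4^2 \cdot 6^3 \\
&= \binom{5}{2} \cdot 4^2 \cdot 6^3.
\end{align*}

The number of arrangements is $\binom{5}{2} = 10$, because an arrangement is completely determined by choosing which 2 of the 5 draws are the $\fbox{1}$s.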
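The arrangements counted by $\binom{5}{2}$ can be listed explicitly. This sketch uses `itertools.combinations` to pick the positions of the $\fbox{1}$s and compares the count with `math.comb` (available in Python 3.8 and later).

```python
import math
from itertools import combinations

n, k = 5, 2
arrangements = []
for positions in combinations(range(n), k):  # which of the n draws are 1s
    seq = [1 if i in positions else 0 for i in range(n)]
    arrangements.append(seq)

for seq in arrangements:
    print(seq)
print(len(arrangements), math.comb(n, k))
```

Both arrangements shown above, $\fbox{1}\ \fbox{1}\ \fbox{0}\ \fbox{0}\ \fbox{0}$ and $\fbox{0}\ \fbox{1}\ \fbox{0}\ \fbox{0}\ \fbox{1}$, appear in the list.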

To convert this to a probability, we simply divide by the total number of outcomes:

$$ P(X = 2) = \frac{\binom{5}{2} \cdot 4^2 \cdot 6^3}{10^5} = \frac{10 \cdot 16 \cdot 216}{100000} = 0.3456. $$
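As a sanity check, the fraction can be evaluated exactly with Python's `fractions.Fraction` and `math.comb`, and compared with a Monte Carlo estimate from repeatedly drawing 5 tickets with replacement. This is a standard-library sketch; the seed and number of repetitions are arbitrary choices.

```python
import math
import random
from fractions import Fraction

# Exact value of the formula: C(5, 2) * 4^2 * 6^3 / 10^5
exact = Fraction(math.comb(5, 2) * 4**2 * 6**3, 10**5)
print(exact, float(exact))  # 216/625 0.3456

# Monte Carlo estimate: draw 5 tickets with replacement, many times.
rng = random.Random(42)
box = [0] * 6 + [1] * 4
reps = 100_000
hits = sum(sum(rng.choices(box, k=5)) == 2 for _ in range(reps))
print(hits / reps)  # should be close to the exact value
```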

## The General Case

Now let's extend the derivation above to arbitrary $n$, $N_1$, and $N_0$, with $N = N_1 + N_0$. (In the example above, $n = 5$, $N_1 = 4$, and $N_0 = 6$.) The same argument shows that the probability of getting exactly $x$ $\fbox{1}$s, for $x = 0, 1, \dots, n$, is 

$$ p[x] = P(X = x) = \frac{\binom{n}{x} N_1^x N_0^{n-x}}{N^n}. $$
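The general formula translates directly into code. In this sketch the function name `binomial_pmf_box` is hypothetical; it uses exact fractions so that the result can be compared with the hand calculation for the special case.

```python
import math
from fractions import Fraction

def binomial_pmf_box(x, n, N1, N0):
    """P(X = x) for the number of 1s in n draws with replacement
    from a box with N1 tickets labeled 1 and N0 tickets labeled 0."""
    if not 0 <= x <= n:
        return Fraction(0)  # values outside 0..n are impossible
    N = N1 + N0
    return Fraction(math.comb(n, x) * N1**x * N0**(n - x), N**n)

# The special case from above: n = 5, N1 = 4, N0 = 6, x = 2.
print(binomial_pmf_box(2, 5, 4, 6))  # 216/625

# The probabilities over x = 0, ..., n add up to 1.
print(sum(binomial_pmf_box(x, 5, 4, 6) for x in range(6)))
```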

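Notice that this formula can be rewritten in terms of $p = N_1 / N$, the proportion of $\fbox{1}$s in the box. Since $N_0 / N = 1 - p$,

\begin{align*}
P(X = x) &= \frac{\binom{n}{x} N_1^x N_0^{n-x}}{N^n} \\
&= \binom{n}{x} \frac{ N_1^x N_0^{n-x}}{N^x N^{n-x}} \\
&= \binom{n}{x} \left(\frac{N_1}{N}\right)^x \left(\frac{N_0}{N}\right)^{n-x} \\
&= \binom{n}{x} p^x (1 - p)^{n-x}.
\end{align*}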
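A quick numerical check that the two forms agree, and that the p.m.f. depends on $N_1$ and $N_0$ only through $p = N_1 / N$: the sketch below compares the box with $N_1 = 4, N_0 = 6$, a different box with $N_1 = 2, N_0 = 3$ (same $p = 2/5$), and the $p$-form. The function names `pmf_from_box` and `pmf_from_p` are hypothetical.

```python
import math
from fractions import Fraction

def pmf_from_box(x, n, N1, N0):
    # First form: count favorable outcomes, divide by N^n.
    return Fraction(math.comb(n, x) * N1**x * N0**(n - x), (N1 + N0)**n)

def pmf_from_p(x, n, p):
    # Second form: C(n, x) p^x (1 - p)^(n - x).
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n = 5
for x in range(n + 1):
    a = pmf_from_box(x, n, 4, 6)          # box with 4 ones and 6 zeros
    b = pmf_from_box(x, n, 2, 3)          # different box, same p = 2/5
    c = pmf_from_p(x, n, Fraction(2, 5))  # p-form with p = 2/5
    print(x, a, a == b == c)
```

Using `Fraction` for $p$ keeps the comparison exact; with a float such as `0.4`, the two forms would agree only up to rounding error.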

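This is the more common way of expressing the binomial p.m.f. It also shows that the distribution depends on $N_1$ and $N_0$ only through the proportion $p$, which is why the binomial distribution is usually written $\text{Binomial}(n, p)$.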

Once you specify the parameters of the binomial distribution, you can plot or evaluate its p.m.f. in Symbulate. For the box above, $n = 5$ and $p = 4/10 = 0.4$:

In [None]:
Binomial(n=5, p=.4).plot()

In [None]:
Binomial(n=5, p=.4).pdf(2)  # P(X = 2); for a discrete distribution, .pdf() evaluates the p.m.f.

This is why named distributions are so useful. Once you identify a random variable as following a binomial distribution, you can calculate probabilities by plugging numbers into the formula for the p.m.f. (or have software do it for you).

## Example. Roulette

In roulette, the ball is equally likely to land in any one of 38 pockets: 18 red, 18 black, and 2 green. What is the probability that the ball lands in a red pocket **at least once** in 4 spins of the roulette wheel? 

Tips:
- Set up a box model, identify the binomial distribution and its parameters, and use its p.m.f.
- There are multiple ways to answer this question. I encourage you to try multiple methods and make sure they agree.

In [None]:
# YOUR CODE HERE