In [0]:
!pip install -q symbulate
from symbulate import *
%matplotlib inline

# The Binomial Model

In the previous lecture, we imagined tossing a coin with probability $p$ of coming up heads. We modeled the coin tosses as draws (with replacement) from a box of $\fbox{0}$s and $\fbox{1}$s, where a $\fbox{1}$ represented heads.

$$ \fbox{$\overbrace{\underbrace{\fbox{0}\ \fbox{0}\ \cdots\ \fbox{0}}_{\text{$N_0$ tickets}}\ \underbrace{\fbox{1}\ \fbox{1}\ \cdots\ \fbox{1}}_{\text{$N_1$ tickets}}}^{\text{$N$ tickets}}$} $$

If the coin is fair ($p = 1/2$), then we can take $N_0 = N_1 = 1$. But in general, the composition of the box depends on the value of $p$. We choose $N_0$ and $N_1$ so that the fraction of $\fbox{1}$s in the box matches the value of $p$, i.e.,

$$ p = \frac{N_1}{N}. $$
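As a minimal sketch of how one might pick a box for a given $p$, the snippet below converts $p$ to a fraction in lowest terms and reads off the ticket counts. The helper name `box_composition` and the cap `max_tickets` are hypothetical, introduced only for illustration, and the approach assumes $p$ is (or is well approximated by) a rational number with a small denominator.

```python
from fractions import Fraction

def box_composition(p, max_tickets=1000):
    """Return (N_0, N_1) for the smallest box whose fraction of 1s equals p.

    Assumes p can be written as a fraction with denominator <= max_tickets;
    otherwise the result is only the closest such approximation.
    """
    frac = Fraction(str(p)).limit_denominator(max_tickets)
    n1 = frac.numerator                      # number of 1 tickets
    n0 = frac.denominator - frac.numerator   # number of 0 tickets
    return n0, n1

print(box_composition(0.5))  # fair coin
print(box_composition(0.4))  # coin with P(heads) = 0.4
```

Any box with the same proportions works equally well. For $p = 0.4$, the smallest box has 3 $\fbox{0}$s and 2 $\fbox{1}$s, and doubling both counts gives an equivalent box with $N_0 = 6$ and $N_1 = 4$.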

For example, to simulate $n = 5$ tosses of a coin with a probability $0.4$ of coming up heads, we did the following.

In [0]:
model = BoxModel([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], size=5, replace=True)
X = RV(model, sum)
xs = X.sim(10000)
xs

In [0]:
xs.plot()

We can approximate any probability by computing the fraction of simulations in which the event of interest occurs. For example, here's how we could approximate $P(X = 2)$ and $P(X < 2)$ using the 10000 simulations above:

In [0]:
xs.count_eq(2) / 10000, xs.count_lt(2) / 10000

The problem with simulations is that they are not exact. We can get closer to the exact probabilities by increasing the number of simulations, but we will never get exact probabilities by simulation. (Not to mention that we get a different number each time the simulation is run.)
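To see this variability concretely, here is a rough sketch that repeats the simulation with different random seeds and different numbers of simulations. It uses Python's standard `random` module rather than Symbulate so that it is self-contained. The function name `estimate_p_x_equals_2` and the seed values are illustrative choices, not part of the lecture.

```python
import random

def estimate_p_x_equals_2(num_sims, seed):
    """Approximate P(X = 2) for 5 draws with replacement from the box above."""
    rng = random.Random(seed)
    box = [0] * 6 + [1] * 4
    hits = 0
    for _ in range(num_sims):
        draws = rng.choices(box, k=5)  # choices() samples with replacement
        if sum(draws) == 2:
            hits += 1
    return hits / num_sims

# Two runs per sample size: the estimates differ between runs, and the
# disagreement tends to shrink as the number of simulations grows.
for num_sims in (100, 10_000, 100_000):
    first = estimate_p_x_equals_2(num_sims, seed=1)
    second = estimate_p_x_equals_2(num_sims, seed=2)
    print(num_sims, first, second)
```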

In this lecture, we derive an exact formula for the probabilities in the binomial model, in terms of $n$, $N_1$, and $N_0$. To summarize, the **binomial model** describes:

- the number of $\fbox{1}$s you get
- when you draw $n$ tickets _with_ replacement 
- from a box of $\fbox{0}$s and $\fbox{1}$s.

## A Special Case

Let's work out the probability that we get exactly 2 $\fbox{1}$s when we draw 5 tickets with replacement from the box above:

$$ \fbox{$\overbrace{\underbrace{\fbox{0}\ \fbox{0}\ \fbox{0}\ \fbox{0}\ \fbox{0}\ \fbox{0}}_{\text{$N_0 = 6$}}\ \underbrace{\fbox{1}\ \fbox{1}\ \fbox{1}\ \fbox{1}}_{\text{$N_1=4$}}}^{\text{$N=10$}}$}. $$

There are $10 \times 10 \times 10 \times 10 \times 10 = 10^5$ equally likely ways to draw 5 tickets from this box. How many of them result in exactly 2 $\fbox{1}$s, our event of interest?

One possibility is to draw the 2 $\fbox{1}$s first, followed by 3 $\fbox{0}$s, i.e., 

$$ \fbox{1}\ \fbox{1}\ \fbox{0}\ \fbox{0}\ \fbox{0}. $$ 

The number of such outcomes is 

$$ 4 \times 4 \times 6 \times 6 \times 6 = 4^2 \cdot 6^3. $$

But maybe the 2 $\fbox{1}$s and 3 $\fbox{0}$s were observed in some other order, such as 

$$ \fbox{0}\ \fbox{1}\ \fbox{0}\ \fbox{0}\ \fbox{1}. $$ 

The number of such outcomes is 

$$ 6 \times 4 \times 6 \times 6 \times 4 = 4^2 \cdot 6^3. $$

**No matter how the 2 $\fbox{1}$s and 3 $\fbox{0}$s are arranged, there will be $4^2 \cdot 6^3$ such outcomes.**

Therefore, the number of ways of getting exactly 2 $\fbox{1}$s is 

\begin{align*}
(\text{# ways to get 2 $\fbox{1}$s in 5 draws}) &= (\text{# ways to arrange 2 $\fbox{1}$s and 3 $\fbox{0}$s}) \cdot 4^2 \cdot 6^3 \\
&= \binom{5}{2} \cdot 4^2 \cdot 6^3
\end{align*}

To convert this to a probability, we simply divide by the total number of outcomes:

$$ P(X = 2) =  \frac{\binom{5}{2} \cdot 4^2 \cdot 6^3}{10^5}. $$

In [0]:
factorial(5) / (factorial(2) * factorial(3)) * (4 ** 2) * (6 ** 3) / (10 ** 5)

This probability is exact, unlike the one we simulated earlier.
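One way to double-check the counting argument is to enumerate all $10^5$ outcomes by brute force. The sketch below does this with the standard library. It is feasible only because the box and the number of draws are small, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

box = [0] * 6 + [1] * 4  # N_0 = 6 zeros and N_1 = 4 ones

# All equally likely ordered outcomes of 5 draws with replacement.
outcomes = list(product(box, repeat=5))

# Count the outcomes that contain exactly two 1s.
favorable = sum(1 for outcome in outcomes if sum(outcome) == 2)

# The count predicted by the formula derived above.
formula_count = comb(5, 2) * 4**2 * 6**3

# Fraction keeps the probability exact instead of rounding to a float.
exact_probability = Fraction(favorable, len(outcomes))

print(len(outcomes), favorable, formula_count)
print(exact_probability, float(exact_probability))
```

The brute-force count and the formula both give 34560 favorable outcomes out of 100000.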

## The General Case

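Now let's generalize the derivation above to arbitrary $n$, $N_1$, and $N_0$, with $N = N_0 + N_1$. (In the example above, $n = 5$, $N_1 = 4$, and $N_0 = 6$.) There are $N^n$ equally likely ways to draw $n$ tickets. Each arrangement of $x$ $\fbox{1}$s and $n - x$ $\fbox{0}$s accounts for $N_1^x N_0^{n-x}$ of these outcomes, and there are $\binom{n}{x}$ such arrangements. So the probability of getting exactly $x$ $\fbox{1}$s is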

$$ p_{n, N_1, N_0}(x) = P(X = x) = \frac{\binom{n}{x} N_1^x N_0^{n-x}}{N^n}. $$

Notice that this formula can be rewritten in terms of $p = N_1 / N$, using the fact that $N_0 / N = 1 - p$:

\begin{align*}
P(X = x) &= \frac{\binom{n}{x} N_1^x N_0^{n-x}}{N^n} \\
&= \binom{n}{x} \frac{ N_1^x N_0^{n-x}}{N^x N^{n-x}} \\
&= \binom{n}{x} \left(\frac{N_1}{N}\right)^x \left(\frac{N_0}{N}\right)^{n-x} \\
&= \binom{n}{x} p^x (1 - p)^{n-x} = p_{n, p}(x).
\end{align*}

This is the more common way of expressing the **probability mass function** (p.m.f.) of the binomial distribution. The p.m.f. is the function that takes a possible value $x$ and returns the probability $P(X = x)$.
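Both forms of the formula are straightforward to code directly. The following is a minimal sketch in plain Python. The function names `binomial_pmf_box` and `binomial_pmf` are made up for this illustration and are not Symbulate functions.

```python
from math import comb

def binomial_pmf_box(x, n, n1, n0):
    """P(X = x) in terms of the box composition (n1 ones, n0 zeros)."""
    total = n1 + n0
    return comb(n, x) * n1**x * n0**(n - x) / total**n

def binomial_pmf(x, n, p):
    """P(X = x) in terms of p, the fraction of 1s in the box."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.4
for x in range(n + 1):
    # The two columns should agree up to floating-point rounding.
    print(x, binomial_pmf_box(x, n, n1=4, n0=6), binomial_pmf(x, n, p))

# The probabilities over all possible values of x should add up to 1
# (up to floating-point rounding).
print(sum(binomial_pmf(x, n, p) for x in range(n + 1)))
```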

You can plot or evaluate the p.m.f. in Symbulate:

In [0]:
Binomial(n=5, p=.4).plot()

In [0]:
Binomial(n=5, p=.4).pdf(2)

## Example. Roulette

In roulette, the ball is equally likely to land in any one of 38 pockets: 18 red, 18 black, and 2 green. What is the probability that the ball lands in a red pocket **at least once** in 4 spins of the roulette wheel?

Tips:
- Set up a box model, identify the binomial distribution, and use the p.m.f. of the binomial distribution.
- There are multiple ways to answer this question. I encourage you to try multiple methods and make sure they agree.

In [0]:
# YOUR CODE HERE