# MIT 18.05: Selected Problems & Exercises
Ravi Dayabhai

In [1]:
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'

In [2]:
import numpy as np
import pandas as pd
from scipy.stats import geom

## Discrete Distributions

### Binomial $\rightarrow$ Normal

Let's explore how, for sufficiently large $n$, the Binomial distribution is well approximated by the Normal distribution.

#### Claim

Because $Y \sim \text{Bin}(n, p)$ can be expressed as $\sum_{i=1}^{n} X_{i}$ for $X_{i} \stackrel{\text{i.i.d.}}{\sim} \text{Bern}(p)$, we can apply the Central Limit Theorem (CLT) provided that $n$ is sufficiently large. Specifically, with $\mu_{Y} = np$, $\sigma_{Y} = \sqrt{npq}$, and $q = 1 - p$, the CLT tells us:

$$
\begin{align}
Z  &= \frac{Y - \mu_{Y}}{\sigma_{Y}} \stackrel{d}{\rightarrow} \mathcal{N}(0,1)\\
Z  &= \frac{Y - np}{\sqrt{npq}}\\
\end{align}
$$
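
As a rough numerical check of the claim, the sketch below compares the exact Binomial CDF with the standard Normal CDF evaluated at the standardized values $z = \frac{y - np}{\sqrt{npq}}$. The choice $p = 0.3$, the values of $n$, and the name `max_cdf_gaps` are illustrative assumptions, not part of the original problem. No continuity correction is applied, so the gap should shrink only slowly as $n$ grows.

```python
import numpy as np
from scipy.stats import binom, norm

p = 0.3      # illustrative success probability (an assumption)
q = 1 - p

max_cdf_gaps = {}
for n in (10, 100, 1000, 10000):
    y = np.arange(n + 1)                     # every possible value of Y
    z = (y - n * p) / np.sqrt(n * p * q)     # standardize: (Y - np) / sqrt(npq)
    # Largest gap between the exact Binomial CDF and the standard Normal CDF
    max_cdf_gaps[n] = np.max(np.abs(binom.cdf(y, n, p) - norm.cdf(z)))
    print(f"n = {n:>5}: max |F_Y(y) - Phi(z)| = {max_cdf_gaps[n]:.4f}")
```

If the approximation behaves as the CLT suggests, the printed gap should decrease as $n$ increases.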

#### Justification

Notice that getting from the above to a more familiar incarnation of the CLT for a Bernoulli random variable $X$ is just a matter of transforming $Y$; said another way, $\frac{Y}{n} = \bar{X_{n}}$ is a sample mean (here, the sample proportion $\hat{p_{n}}$), and by the CLT the sampling distribution of the sample mean is approximately Normal for large $n$.

$$
\begin{align}
\frac{Y}{n} = \bar{X_{n}} \implies Z &= \frac{\frac{Y}{n} - \mu_{\bar{X_{n}}}}{\sigma_{\bar{X_{n}}}} \stackrel{d}{\rightarrow} \mathcal{N}(0,1)\\
Z  &= \sqrt{n}\frac{\bar{X_{n}} - p}{\sqrt{pq}} = \sqrt{n}\frac{\hat{p_{n}} - p}{\sqrt{pq}}\\
\end{align}
$$

So, to get from the bottom relationship to the top, we can simply multiply the numerator and denominator on the right-hand side by $n$, recalling that $n\bar{X_{n}} = Y$!
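
Written out step by step (with $q = 1 - p$ and $\bar{X_{n}} = \frac{Y}{n}$), the algebra looks like this:

$$
\begin{align}
\sqrt{n}\frac{\bar{X_{n}} - p}{\sqrt{pq}} &= \sqrt{n}\frac{\frac{Y}{n} - p}{\sqrt{pq}} \cdot \frac{n}{n} = \frac{\sqrt{n}\,(Y - np)}{n\sqrt{pq}}\\
&= \frac{Y - np}{\sqrt{n}\sqrt{pq}} = \frac{Y - np}{\sqrt{npq}}
\end{align}
$$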

### Geometric

Suppose that the inhabitants of an island plan their families by having babies until the first girl is born. Assume the probability of having a girl with each pregnancy is 0.5, independent of other pregnancies, that all babies survive, and that there are no multiple births. What is the probability that a family has $k$ boys?

In [3]:
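# Geometric PMF for k = 0, 1, ..., 20 boys
p = 0.5   # probability of a girl on each birth
loc = -1  # scipy's geom counts trials up to and including the first success, so shift by one to count boys
for k in range(20 + 1):
    print(f"The probability of having {k} boys:", f"{geom.pmf(k, p, loc):.5f}")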

The probability of having 0 boys: 0.50000
The probability of having 1 boys: 0.25000
The probability of having 2 boys: 0.12500
The probability of having 3 boys: 0.06250
The probability of having 4 boys: 0.03125
The probability of having 5 boys: 0.01562
The probability of having 6 boys: 0.00781
The probability of having 7 boys: 0.00391
The probability of having 8 boys: 0.00195
The probability of having 9 boys: 0.00098
The probability of having 10 boys: 0.00049
The probability of having 11 boys: 0.00024
The probability of having 12 boys: 0.00012
The probability of having 13 boys: 0.00006
The probability of having 14 boys: 0.00003
The probability of having 15 boys: 0.00002
The probability of having 16 boys: 0.00001
The probability of having 17 boys: 0.00000
The probability of having 18 boys: 0.00000
The probability of having 19 boys: 0.00000
The probability of having 20 boys: 0.00000
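
These values follow the closed form $P(k \text{ boys}) = (1 - p)^{k} p$: $k$ boys in a row, then one girl. The sketch below checks that this closed form agrees with SciPy's geometric PMF once it is shifted by `loc=-1`. The names `closed_form` and `shifted_geom` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import geom

p = 0.5                  # probability of a girl on each birth (from the problem)
k = np.arange(20 + 1)    # number of boys born before the first girl

closed_form = (1 - p) ** k * p           # k boys in a row, then one girl
shifted_geom = geom.pmf(k, p, loc=-1)    # SciPy counts births, so shift by one

print(np.allclose(closed_form, shifted_geom))
```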


What about the ratio of boys to girls on the island?

In [4]:
# Expectation of Geometric distribution
geom.mean(p, loc)

1.0
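
Every family stops at exactly one girl, so an expected 1.0 boys per family corresponds to a boys-to-girls ratio of about 1:1 across the island. A small simulation can make this concrete. It is only a sketch: the number of families, the seed, and the variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily for reproducibility
n_families = 100_000                  # hypothetical island size
p = 0.5                               # probability of a girl on each birth

# NumPy's geometric sampler returns the number of births up to and including
# the first girl, so subtracting one leaves the number of boys.
births_per_family = rng.geometric(p, size=n_families)
boys = births_per_family - 1
girls = np.ones(n_families, dtype=int)   # every family stops at exactly one girl

ratio = boys.sum() / girls.sum()
print(f"Simulated boys-to-girls ratio: {ratio:.3f}")
```

The simulated ratio should land close to 1, despite the stopping rule.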

### Expectation

Suppose you are playing a game where you roll two fair dice. If the dice show $i$ and $j$, the payoff is:

$$
Y_{i, j} = ij - 10
$$

1. What's the expected payoff for this game? 
2. Is this a better game to play than one where you win \\$500 if the two dice sum to 7 and lose \\$100 otherwise?

In [5]:
# Create two dice arrays
dice_1 = np.arange(1, 6+1)
dice_2 = dice_1.copy()

# Construct sample space, probability space, and outcomes
sample_space = np.array([np.tile(dice_1, len(dice_2)), 
                     np.repeat(dice_2, len(dice_1))])
rv_value = np.prod(sample_space, axis=0, keepdims=True) - 10
prob_distribution = np.full_like(rv_value, 1/sample_space.shape[1], dtype='float')

# Expected value of playing this dice game
expected_value = rv_value.dot(prob_distribution.transpose()).item(0)
expected_value

2.25
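
One way to sanity-check this by hand is to write $I$ and $J$ for the two dice values (notation introduced here) and use the fact that independent, fair dice each have mean $3.5$:

$$
\begin{align}
E[Y] &= E[IJ] - 10 = E[I]\,E[J] - 10\\
&= (3.5)(3.5) - 10 = 2.25
\end{align}
$$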

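This is a better game than the alternative: a sum of 7 comes up with probability $\frac{6}{36} = \frac{1}{6}$, so the alternative's expected value is $500 \cdot \frac{1}{6} - 100 \cdot \frac{5}{6} = 0$, compared with $2.25$ here!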
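
The comparison can also be checked by enumerating all 36 equally likely outcomes for both games. This is a minimal sketch; names such as `totals`, `alt_payoff`, and `product_expected_value` are introduced here for illustration.

```python
import numpy as np

faces = np.arange(1, 6 + 1)
totals = faces[:, None] + faces[None, :]        # 6x6 grid of dice sums

# Alternative game: win 500 on a sum of 7, lose 100 otherwise
alt_payoff = np.where(totals == 7, 500, -100)
alt_expected_value = alt_payoff.mean()          # all 36 outcomes are equally likely

# Original game: payoff is the product of the dice minus 10
product_expected_value = (faces[:, None] * faces[None, :] - 10).mean()

print(alt_expected_value, product_expected_value)
```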