### Spread

$\huge{\sqrt{\frac{p(1-p)}{n}}}$

The standard deviation of the sample proportion, also called its standard error.
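As a rough check on this formula, the sketch below draws many random samples and compares the spread of the resulting sample proportions with the formula's value. The values `p = .62` and `n = 30` are borrowed from the student-loan example in these notes; the number of trials and the seed are arbitrary assumptions.

```python
import random
from statistics import pstdev

p, n, trials = 0.62, 30, 2000
rng = random.Random(1)  # arbitrary fixed seed so reruns give the same numbers

# Each trial: draw n individuals, count successes, record the sample proportion.
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

empirical = pstdev(p_hats)            # spread observed in the simulation
formula = (p * (1 - p) / n) ** 0.5    # spread predicted by the formula
print(f'empirical: {empirical:.4f}  formula: {formula:.4f}')
```

The two numbers should be close but not identical, since the simulation is itself a random sample of possible samples.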

### Shape

A normal model is a good fit if the expected numbers of successes and failures are each at least 10:

$ \huge{np \ge 10 \text{ and } n(1-p) \ge 10}$  

In [7]:
# 62% of all graduates from public universities have student loans.
p = .62

# n = 20
n = 30
s = f' n = {n} '
expected_success, expected_failure = n*p, n*(1-p)
print(f'{s:#^20}')
print(expected_success, expected_failure)

###### n = 30 ######
18.6 11.4


### Try It: Distribution of Sample Proportions (4 of 6)

In [23]:
n = 30
s = f' n = {n} '
p = .62
std_error = (p*(1-p)/n)**.5

print(f'{s:#^20}', '\n', std_error)
p - std_error, p + std_error

###### n = 30 ###### 
 0.08861903482510591


(0.5313809651748941, 0.7086190348251059)

### 5 students selecting 50 M&M's. p = .2 for orange M&M's.

In [28]:
def simulate_one_selection():
    n = 50
    p = .2
    std_err = (p*(1-p)/n)**.5
    # Use the local std_err, not the global std_error left over from the p = .62 example.
    return round(p - std_err, 4), round(p + std_err, 4)

In [29]:
n = 50
p = .2
std_err = (p*(1-p)/n)**.5
std_err

0.05656854249492381

In [30]:
outcomes = []
for _ in range(5):
    outcomes.append(simulate_one_selection())
outcomes

[(0.1434, 0.2566),
 (0.1434, 0.2566),
 (0.1434, 0.2566),
 (0.1434, 0.2566),
 (0.1434, 0.2566)]
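The function above returns the same interval on every call because nothing in it is random. A minimal sketch of an actual simulation, assuming each of the 50 M&M's is orange independently with probability .2, might look like the following. The helper name `simulate_selection` and the seed are hypothetical choices for illustration.

```python
import random

def simulate_selection(n=50, p=0.2, rng=random):
    """Draw n candies, each orange with probability p; return the sample proportion."""
    orange = sum(rng.random() < p for _ in range(n))
    return orange / n

rng = random.Random(0)  # arbitrary fixed seed so reruns match
p_hats = [simulate_selection(rng=rng) for _ in range(5)]

std_err = (0.2 * 0.8 / 50) ** 0.5
for p_hat in p_hats:
    # Flag whether this student's proportion falls within one standard error of p.
    within = abs(p_hat - 0.2) <= std_err
    print(f'{p_hat:.2f} within one standard error: {within}')
```

Each student gets a different sample proportion, and some of them can be expected to land outside the one-standard-error interval.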

### Distribution of Sample Proportions (5 of 6) 

$\huge{z = \frac{\text{statistic}-\text{parameter}}{\text{standard error}}}$

$\huge{z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}}$

The z-score is the number of standard errors a sample proportion lies from the parameter.

In [38]:
# Of 3.1 million 18 - 24 year-olds, 10% are enrolled in community college.

p = .1

# Randomly selecting 100 young adults in this age group, 15% of the sample are in community college.

n = 100
p_hat = .15

std_err = ( p*(1-p)/n)**.5
z_score = (p_hat-p)/std_err

std_err, z_score

(0.030000000000000002, 1.666666666666666)

$ \large{P(\hat{p} \ge 0.15 \mid p = 0.10) \approx 0.0475} $
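The tail probability can also be computed directly instead of being read from a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+), which is an assumption about the environment rather than something these notes use. Because it does not round z to two decimals first, the result may differ from the table value in the third or fourth decimal place.

```python
from statistics import NormalDist

p, n, p_hat = 0.10, 100, 0.15

std_err = (p * (1 - p) / n) ** 0.5
z = (p_hat - p) / std_err

# Area to the right of z under the standard normal curve.
upper_tail = 1 - NormalDist().cdf(z)
print(f'z = {z:.2f}, P(p_hat >= {p_hat}) = {upper_tail:.4f}')
```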

### Try It: Distribution of Sample Proportions (5 of 6)

In [40]:
p = .4

# What is the probability that a sample of 200 has a sample proportion below .35?

p_hat = .35
n = 200

std_error = (p*(1-p)/n)**.5
z = (p_hat - p)/std_error

print(f'{std_error:.4f} {z:.4f}')

def check_normality(p, n):
    # A normal model is reasonable when expected successes and failures are both at least 10.
    condition_1 = (n*p) >= 10
    condition_2 = n*(1-p) >= 10

    return condition_1 and condition_2

0.0346 -1.4434


In [42]:
check_normality(p,n)

True
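With the normality condition met, the question in this cell can be answered by taking the area to the left of z. This is a sketch using the standard library's `statistics.NormalDist` (an assumption, not part of the original cells); `lower_tail` is a name introduced here for illustration.

```python
from statistics import NormalDist

p, n, p_hat = 0.4, 200, 0.35

std_error = (p * (1 - p) / n) ** 0.5
z = (p_hat - p) / std_error

# Area to the left of z: the chance a sample proportion falls below .35.
lower_tail = NormalDist().cdf(z)
print(f'z = {z:.2f}, P(p_hat < {p_hat}) = {lower_tail:.4f}')
```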

### Distribution of Sample Proportions (6 of 6)

#### Overweight Men example

In [48]:
p = .3 # overweight

n = 100
p_hat = .25 # or .35

std_error = (p*(1-p)/n) **.5
z = (p_hat - p)/std_error

print(std_error, z)

check_normality(p,n)

# What is the probability that the sample proportion differs from p by more than .05
# (below .25 or above .35)? 0.1379 is the z-table area below z = -1.09.
prob = 0.1379 * 2
prob

0.0458257569495584 -1.0910894511799616


0.2758
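The two-sided probability can be computed without a table lookup by doubling one tail of the standard normal distribution, which is valid because .25 and .35 are equally far from p. The sketch assumes `statistics.NormalDist` is available; its result may differ slightly from 0.2758 because z is not rounded to -1.09 first.

```python
from statistics import NormalDist

p, n = 0.3, 100
margin = 0.05  # distance from p to either .25 or .35

std_error = (p * (1 - p) / n) ** 0.5
z = margin / std_error

# By symmetry, the two tails (below .25 and above .35) have equal area.
two_sided = 2 * (1 - NormalDist().cdf(z))
print(f'z = +/-{z:.2f}, P(|p_hat - p| > {margin}) = {two_sided:.4f}')
```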