In [None]:
import statsmodels.stats.proportion as stm 
import scipy.stats as st

| | $H_0$ false | $H_0$ true |
|---|---|---|
| Reject $H_0$ | Correct rejection | Type I error (probability $\alpha$) |
| Fail to reject $H_0$ | Type II error | Correct decision |

### Hypothesis Testing (1 of 5)

There are two roles for inference so far:
- estimate a population parameter with a confidence interval
- test a claim about a population parameter with a hypothesis test

#### Example - Research Question about testing claims

- Is the average course load for community college students less than 12 semester hours?


- Do the majority of community college students qualify for federal student loans?

- In community colleges, do female students and male students have different mean GPAs?

- Are college athletes more likely than nonathletes to receive academic advising? 

#### What is the null hypothesis?

A claim about the value of a parameter.

It gives the value of the parameter we will use to create a sampling distribution.

It states what we assume to be true about the population.

#### What is the alternative hypothesis?

A claim about the value of a parameter.

It states that the parameter is greater than, less than, or not equal to the value assumed in the null hypothesis.

#### Example - Stating Hypotheses

Is the average course load for community college students less than 12 semester hours?

$H_0: \mu = 12 \text{ semester hours}$

$H_a: \mu < 12 \text{ semester hours}$

Do the majority of community college students qualify for federal student loans?

$H_0: p = 0.5$

$H_a: p > 0.5$

*When the research question contains a claim that compares two populations, the null hypothesis states the parameters are equal. "No difference" in parameter values.*

In community colleges, do female students and male students have different mean GPAs?

$H_0: \mu_{\text{male}} = \mu_{\text{female}}$

$H_a: \mu_{\text{male}} \neq \mu_{\text{female}}$

Are college athletes more likely than nonathletes to receive academic advising?

$H_0: p_{\text{athletes}} = p_{\text{nonathletes}}$

$H_a: p_{\text{athletes}} > p_{\text{nonathletes}}$

#### Comments about Hypotheses

- the hypotheses are competing claims about the parameter(s)
- both hypotheses are statements about the same parameter(s)
- null hypothesis contains an equal sign
- alternative hypothesis is always an inequality statement


According to the Centers for Disease Control and Prevention, the proportion of U.S. adults age 25 or older who smoke is 0.22. A researcher suspects that the rate is lower among U.S. adults 25 or older who have a bachelor's degree or higher education level.

Let $p$ be the proportion of smokers among U.S. adults 25 or older with a bachelor's degree or higher.

$H_0: p = 0.22$

$H_a: p < 0.22$


### Hypothesis Testing (2 of 5)

### Example - Data Use on Smart Phones

$H_0: \mu = 62 \text{ MB}$

$H_a: \mu > 62 \text{ MB}$

In [None]:
n = 50       # sample size
x_bar = 75   # sample mean (MB)
std = 45     # standard deviation (MB)

What is the p-value, $P(\bar{x} \geq 75 \mid \mu = 62)$?
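The p-value here is the probability of seeing a sample mean of 75 MB or more when $\mu = 62$. A minimal sketch of that calculation follows. It assumes the sample of 50 is large enough for a normal model of the sample mean and that `std` is a reasonable estimate of the population standard deviation; `mu_0` and `standard_error` are names introduced only for this illustration. A t-distribution with 49 degrees of freedom would give a slightly larger p-value.

```python
from math import sqrt
from statistics import NormalDist

n = 50        # sample size
x_bar = 75    # sample mean (MB)
std = 45      # standard deviation (MB), used to estimate the standard error
mu_0 = 62     # mean under the null hypothesis (MB)

standard_error = std / sqrt(n)
z = (x_bar - mu_0) / standard_error

# Upper-tail probability, because H_a is mu > 62
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Under these assumptions the sample mean sits about two standard errors above 62, so the p-value comes out near 0.02.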

What is $\alpha$-level?

The significance level. If the p-value is less than the significance level, the result is statistically significant and we reject $H_0$.
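The significance level is also the probability of a Type I error: rejecting $H_0$ when it is actually true. The simulation below is a rough sketch of that idea, not part of the original example. It borrows $\mu = 62$, a standard deviation of 45, and $n = 50$ from the smart phone example, and it assumes a normally distributed population, a known standard deviation, and $\alpha = 0.05$.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

mu_0, sigma, n = 62, 45, 50   # H_0 mean, assumed known sigma, sample size
alpha = 0.05                  # assumed significance level
trials = 2000

rejections = 0
for _ in range(trials):
    # Draw a sample from a population where H_0 is true
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu_0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)   # upper-tail test, H_a: mu > 62
    if p_value < alpha:
        rejections += 1                 # a Type I error

type_i_rate = rejections / trials
print(f"Rejected a true H_0 in {type_i_rate:.1%} of trials")
```

The rejection rate should land close to `alpha`, which is what it means to run a test at that significance level.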

### Hypothesis Testing (3 of 5)

### Example - Community College Students and Federal Student Loans

Is the proportion of community colleges that do not participate in federal loan programs less than 25%, as reported? Let’s conduct a hypothesis test to find out.

$H_0: p = 0.25$

$H_a: p < 0.25$



In [None]:
n = 80           # sample size
p_hat = 16/80    # sample proportion (not the null value p = 0.25)

But what is the p-value?

That is, what is $P(\hat{p} \leq 0.20 \mid p = 0.25)$?
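The p-value for this example is the probability of a sample proportion of 0.20 or less when $p = 0.25$. One way to approximate it is with a normal model, sketched below. The sketch assumes the normal approximation is acceptable here (the expected counts $80 \times 0.25 = 20$ and $80 \times 0.75 = 60$ are both at least 10) and computes the standard error from the null value; `p_0` and `standard_error` are names introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

n = 80          # sample size
p_hat = 16 / n  # sample proportion, 0.20
p_0 = 0.25      # proportion under the null hypothesis

# Standard error computed from the null value p_0
standard_error = sqrt(p_0 * (1 - p_0) / n)
z = (p_hat - p_0) / standard_error

# Lower-tail probability, because H_a is p < 0.25
p_value = NormalDist().cdf(z)

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

With a p-value of roughly 0.15, a sample proportion of 0.20 is not unusual when the true proportion is 0.25, so at common significance levels such as 0.05 we would not reject $H_0$.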

### Hypothesis Testing (4 of 5)

### Example - What is a P-value?

$H_0: p = 0.4$

$H_a: p < 0.4$

In [32]:
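n = 200          # sample size
p_hat = .35      # sample proportion (70 successes out of 200)
p_value = 0.078  # p-value given in the example

# One-sided z-test of H_0: p = 0.4 against H_a: p < 0.4.
# By default (prop_var=False) the standard error uses the sample proportion.
stm.proportions_ztest(count=70, nobs=200, value=.4, alternative='smaller')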

(-1.4824986333222037, 0.06910383348701273)
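The p-value returned here (about 0.069) differs from the 0.078 listed in the cell. One thing worth checking is which standard error the test uses. By default, `proportions_ztest` builds the standard error from the sample proportion rather than from the null value. The sketch below recomputes the test both ways with the standard library only; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

count, n, p_0 = 70, 200, 0.4
p_hat = count / n

# Standard error from the sample proportion (the statsmodels default)
se_sample = sqrt(p_hat * (1 - p_hat) / n)
# Standard error from the null value
se_null = sqrt(p_0 * (1 - p_0) / n)

z_sample = (p_hat - p_0) / se_sample
z_null = (p_hat - p_0) / se_null

# Lower-tail p-values, because H_a is p < 0.4
p_sample = NormalDist().cdf(z_sample)
p_null = NormalDist().cdf(z_null)

print(f"sample-based SE: z = {z_sample:.4f}, p-value = {p_sample:.4f}")
print(f"null-based SE:   z = {z_null:.4f}, p-value = {p_null:.4f}")
```

The sample-based line reproduces the statsmodels output above. The null-based p-value, about 0.074, is closer to the listed 0.078 but still does not match it. Passing `prop_var=0.4` to `proportions_ztest` should select the null-based standard error.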

In [33]:
# Template: test for a proportion based on a normal (z) test
# zstat, p_value = stm.proportions_ztest(
#     count,        # number of successes
#     nobs,         # number of observations
#     value,        # proportion under the null hypothesis
#     alternative,  # 'two-sided', 'smaller', or 'larger'
#     prop_var,     # False (use the sample proportion) or a proportion in (0, 1)
# )

### Hypothesis Testing (5 of 5)

## Hypothesis Test for a Population Proportion