In [1]:
from math import sqrt
from scipy.stats import norm, t, binom

## Tax Filing Example: lower-tail test with population SD
### Hypotheses Development and Information Collection

$H_0: \mu \geq \$1056$ (last-minute filers receive a mean refund of **at least** \$1056)

$H_a: \mu < \$1056$ (last-minute filers receive a mean refund of **less than** \$1056)

This is a lower-tail test.

$n = 400$, $\bar{X} = 910$, $\sigma = 1600$, $\alpha = 0.05$ (the population SD is known; therefore, we use the z-statistic)

### Compute the test statistic, p-value and critical value

$z = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \dfrac{910 - 1056}{1600 / \sqrt{400}} = -1.825$

p-value $= 0.034 < \alpha = 0.05$ (we reject $H_0$)

$z = -1.825 <$ critical value $= -1.645$ (we reject $H_0$)



In [2]:
# z = (sample mean - hypothesized mean) / SE, with SE = 1600/sqrt(400) = 80
(910-1056)/80.

-1.825

In [3]:
norm.cdf(-1.825)  # lower-tail p-value

0.034000514669822422

In [4]:
norm.ppf(0.05)  # lower-tail critical value at alpha = 0.05

-1.6448536269514729
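
The three steps above (test statistic, p-value, critical value) can be gathered into one small helper. The sketch below is only an illustration: the function name `z_test_mean` and its `tail` argument are assumptions, not part of the original notebook. It uses the standard library's `statistics.NormalDist` (Python 3.8+) in place of `scipy.stats.norm` so that it runs without SciPy; `cdf` and `inv_cdf` play the roles of `norm.cdf` and `norm.ppf`.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def z_test_mean(x_bar, mu0, sigma, n, alpha=0.05, tail="lower"):
    """z-test for a mean with known population SD (illustrative helper).

    Returns (z, p_value, critical_value, reject_h0).
    For tail="two", critical_value is the positive cutoff for |z|.
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    if tail == "lower":
        p_value = STD_NORMAL.cdf(z)
        critical = STD_NORMAL.inv_cdf(alpha)
    elif tail == "upper":
        p_value = 1 - STD_NORMAL.cdf(z)
        critical = STD_NORMAL.inv_cdf(1 - alpha)
    elif tail == "two":
        p_value = 2 * STD_NORMAL.cdf(-abs(z))
        critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'lower', 'upper', or 'two'")
    return z, p_value, critical, p_value < alpha


# Tax filing example: n = 400, sample mean 910, sigma = 1600, alpha = 0.05
z, p_value, critical, reject = z_test_mean(910, 1056, 1600, 400, tail="lower")
print(z, p_value, critical, reject)
```

With the tax filing numbers this should reproduce z = -1.825, a p-value of about 0.034, and a critical value of about -1.645, so $H_0$ is rejected by both criteria.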

## Wall Street Example: two-tailed test with population SD

In [5]:
30000./sqrt(40)  # standard error: sigma / sqrt(n), with sigma = 30000 and n = 40

4743.416490252569

In [6]:
# z = (sample mean - hypothesized mean) / SE
z = (118000. - 125500)/30000*sqrt(40)
print(z)

-1.5811388300841898


In [7]:
cz_lower = norm.ppf(0.05/2)
cz_upper = norm.ppf(1 - 0.05/2)
print(cz_lower, cz_upper)

-1.95996398454 1.95996398454


In [8]:
p_value = 2*norm.cdf(-abs(z))  # two-tailed p-value; -abs(z) keeps it valid for either sign of z
print(p_value)

0.113846298007


## Unemployment Benefit Example: lower-tail test w/o population SD

In [9]:
# t = (sample mean - hypothesized mean) / (s / sqrt(n)), with s = 80 and sqrt(n) = 10
t_score = (231. - 238)/80*10
print(t_score)

-0.875


In [10]:
critical_t = t.ppf(0.05, 99)
print(critical_t)

-1.660391156


In [11]:
p_value = t.cdf(t_score, 99)
print(p_value)

0.191845893845


## CNN Viewership Example: two-tailed test w/o population SD

In [12]:
65000/sqrt(40)

10277.402395547233

In [13]:
t_score = (612000. - 600000)/65000*sqrt(40)
print(t_score)

1.167610212985248


In [14]:
p_value = 2*(1 - t.cdf(t_score, 39))
print(p_value)

0.250053080758


In [15]:
critical_t = t.ppf(0.005, 39)
print(critical_t)

-2.70791318352


## Union Membership Example: upper-tail test for proportion

In [16]:
p_bar = 52/400
print(p_bar)

0.13


In [17]:
SE = sqrt(p_bar*(1-p_bar)/400)
print(SE)

0.016815171720800236


In [18]:
z_score = (p_bar - 0.125)/SE
p_value = 1 - norm.cdf(z_score)
print(z_score)
print(p_value)

0.2973505167250266
0.383099458961


In [19]:
critical_z = norm.ppf(.9)
print(critical_z)

1.28155156554


## Brand Name Example: two-tailed test or one-tailed test for proportion?
* A study by Consumer Reports showed that 64% of supermarket shoppers believe supermarket brands to be as good as national name brands. To investigate whether this result applies to its own product, the manufacturer of a national name-brand ketchup asked a sample of shoppers whether they believed that supermarket ketchup was as good as the national brand ketchup.
* Formulate the hypotheses that could be used to determine whether the percentage of supermarket shoppers who believe that the supermarket ketchup was as good as the national brand ketchup differed from 64%.


$H_0$: p = 64%

$H_a$: p $\neq$ 64%

#### If a sample of 100 shoppers showed 52 stating that the supermarket brand was as good as the national brand, what is the p-value?
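$n = 100$, $\bar{p} = 52\%$, $\sigma_{\bar{p}}$ = SE = $\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n}}$

$z = \dfrac{\bar{p} - p_0}{\sigma_{\bar{p}}}$, where $p_0 = 64\%$ is the value of $p$ under $H_0$

p-value = Prob($|Z| \geq |z|$) = $2 \times$ norm.cdf($-|z|$), which equals $2 \times$ norm.cdf($z$) here because $z < 0$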

In [20]:
from math import sqrt
from scipy.stats import norm

In [21]:
p_bar = 52./100
SE = sqrt(p_bar*(1-p_bar)/100)
z = (p_bar - 0.64)/SE
p_value = 2*norm.cdf(-abs(z))  # two-tailed p-value; -abs(z) keeps it valid for either sign of z
print(p_bar, SE)
print(z)
print(p_value)

0.52 0.049959983987187186
-2.401922307076307
0.0163091718778
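
The calculation above can be wrapped in a small function. This is a sketch under stated assumptions: the name `proportion_z_test` and the flag `se_from_null` are hypothetical, and `statistics.NormalDist` from the standard library stands in for `scipy.stats.norm`. By default the standard error is computed from the sample proportion, as in the cell above. Many textbooks instead compute it from the hypothesized proportion $p_0$, since the test statistic is evaluated under $H_0$; the flag switches to that convention so the two can be compared.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, se_from_null=False):
    """Two-tailed z-test for a proportion (illustrative helper).

    se_from_null=False: SE from the sample proportion, as in the cell above.
    se_from_null=True:  SE from the hypothesized proportion p0.
    Returns (p_bar, se, z, two_tailed_p_value).
    """
    p_bar = successes / n
    p_for_se = p0 if se_from_null else p_bar
    se = sqrt(p_for_se * (1 - p_for_se) / n)
    z = (p_bar - p0) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p_bar, se, z, p_value


sample_based = proportion_z_test(52, 100, 0.64)
null_based = proportion_z_test(52, 100, 0.64, se_from_null=True)
print(sample_based)
print(null_based)
```

With 52 of 100 shoppers, the sample-based version should match the output above (z of about -2.40), while the $p_0$-based version gives z = -2.5. Both p-values are below .05, so the choice of convention does not change the conclusion in this example.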


#### At $\alpha$ = .05, what is your conclusion?
Since the p-value (about 0.016) is less than $\alpha$ = .05, we, the national brand ketchup manufacturer, reject $H_0$ and conclude that the percentage of shoppers who believe the supermarket brand is as good as our national brand is different from 64%.

#### Should the national brand ketchup manufacturer be pleased with this conclusion? Explain.
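Not quite. The two-tailed test only tells us that the percentage differs from 64%; it does not tell us in which direction.
From the perspective of the national brand manufacturer, the lower the percentage is, the better. Therefore, we can set up a lower-tail test to check whether the percentage is below 64%.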

$H_0$: p $\geq$ 64%

$H_a$: p < 64%

In [22]:
# Note that SE and z are the same as in the two-tailed test.
# The only difference is the p-value.
# Because z is negative, the lower-tail p-value is exactly half of the two-tailed p-value.
p_value = norm.cdf(z)
print(p_value)

0.00815458593888
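
The halving rule holds only when the sample statistic falls on the side named in $H_a$. The short sketch below illustrates this; `tail_p_values` is a hypothetical helper name, and `statistics.NormalDist` is used here instead of `scipy.stats.norm` so the block has no external dependencies.

```python
from statistics import NormalDist

std_normal = NormalDist()


def tail_p_values(z):
    """Return (lower_tail_p, upper_tail_p, two_tailed_p) for a z statistic."""
    lower = std_normal.cdf(z)
    upper = 1 - lower
    two_tailed = 2 * min(lower, upper)
    return lower, upper, two_tailed


# z from the ketchup example (negative: the sample proportion is below 64%)
lower, upper, two_tailed = tail_p_values(-2.401922307076307)
print(lower, two_tailed / 2)  # these two agree because z < 0

# With a positive z of the same size, the lower-tail p-value is large instead
lower_pos, _, two_tailed_pos = tail_p_values(2.401922307076307)
print(lower_pos, 1 - two_tailed_pos / 2)
```

For a negative z the lower-tail p-value equals half the two-tailed p-value; for a positive z it equals one minus that half, so a lower-tail test would not reject $H_0$.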


#### With the new p-value being much smaller than the significance level of 5%, we reject $H_0$ and conclude that the percentage of shoppers who believe the supermarket brand was as good as our national brand is LESS than 64%.









  

## Calculation of Type II Error and Power: An Example
Fowle Marketing Research, Inc., bases charges to a client on the assumption that telephone surveys can be completed in 15 minutes or less. If more time is required, a premium rate is charged. With a sample of 35 surveys, a population standard deviation of 4 minutes, and a level of significance of .01, the sample mean will be used to test the null hypothesis $H_0: \mu \leq 15$.

$H_0: \mu \leq 15$ (mean survey time is **at most** 15 minutes)

$H_a: \mu > 15$ (mean survey time is **greater than** 15 minutes)

This is an upper-tail test.

#### What is your interpretation of the Type II error for this problem? What is its impact on the firm?
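* $\beta$ = Prob(Type II error) = Prob(Accept $H_0$ | $H_0$ is false)
* $\beta$ = Prob(Accept $\mu \leq 15$ | $\mu > 15$)
* Impact on the firm: a Type II error means concluding that surveys take 15 minutes or less when they actually take longer, so the firm would not charge the premium rate even though more time is required.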

#### What is the probability of making a Type II error when the actual mean time is μ = 17 minutes?

In [23]:
# First compute the critical z value for alpha = .01.
# This is an upper-tail test, so we take the 99th percentile.
crit_z = norm.ppf(.99)
print(crit_z)

2.32634787404


In [24]:
# From the critical z, compute the critical sample mean, above which we reject H0
# critical sample mean = hypothesized mean (15) + critical z * SE
crit_sample_mean = 15. + crit_z*4/sqrt(35)
print(crit_sample_mean)

16.572898243


#### For a sample mean lower than 16.573 minutes, $H_0$ will be accepted; otherwise, we reject $H_0$.
#### Now suppose the actual mean is μ = 17 (the same steps work for any value greater than 15). We can compute $\beta$, the probability of making a Type II error.
#### This is the probability that the sample mean will be less than 16.573 given μ = 17.
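
Written out as equations, the two steps look as follows. Here $\bar{x}_c$ is a symbol introduced only for this sketch to denote the critical sample mean, $z_{\alpha}$ is the upper-tail critical z value for $\alpha = .01$, and $\Phi$ is the standard normal CDF (what `norm.cdf` computes).

$\bar{x}_c = \mu_0 + z_{\alpha}\dfrac{\sigma}{\sqrt{n}} = 15 + 2.326 \times \dfrac{4}{\sqrt{35}} \approx 16.573$

$\beta = P(\bar{X} < \bar{x}_c \mid \mu = 17) = \Phi\left(\dfrac{16.573 - 17}{4/\sqrt{35}}\right) = \Phi(-0.632) \approx 0.264$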

In [25]:
# Standardize the critical sample mean under the actual mean mu = 17
z_score = (crit_sample_mean - 17)/4*sqrt(35)
print(z_score)

-0.631692017509


In [26]:
beta = norm.cdf(z_score)
print(beta)

0.263794072565


In [27]:
# Repeat the calculation for an actual mean of mu = 18
z_score = (crit_sample_mean - 18)/4*sqrt(35)
print(z_score)

-2.11071196328


In [28]:
beta = norm.cdf(z_score)
print(beta)

0.0173985385396
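
The power of the test is $1 - \beta$, the probability of correctly rejecting $H_0$ when the actual mean exceeds 15. The sketch below repeats the steps above for both actual means and reports the power alongside $\beta$. The function name `upper_tail_beta` is an assumption made for illustration, and `statistics.NormalDist` is used in place of `scipy.stats.norm` so the block runs on the standard library alone.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def upper_tail_beta(mu0, mu_actual, sigma, n, alpha):
    """Type II error probability for an upper-tail z-test (illustrative)."""
    se = sigma / sqrt(n)
    # Critical sample mean: H0 is rejected when the sample mean exceeds it
    crit_sample_mean = mu0 + std_normal.inv_cdf(1 - alpha) * se
    # beta = P(sample mean < critical sample mean | actual mean = mu_actual)
    return std_normal.cdf((crit_sample_mean - mu_actual) / se)


results = {}
for mu_actual in (17, 18):
    beta = upper_tail_beta(15, mu_actual, 4, 35, 0.01)
    results[mu_actual] = (beta, 1 - beta)  # (Type II error, power)
    print(mu_actual, round(beta, 4), round(1 - beta, 4))
```

For μ = 17 and μ = 18 this should give $\beta$ of about 0.264 and 0.017, matching the cells above, which corresponds to power of about 0.736 and 0.983. The farther the actual mean is from the hypothesized 15 minutes, the more likely the test is to detect it.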
