In [1]:
from scipy.stats import norm, t

## Cruise Ship Rating Example

Condé Nast Traveler conducts an annual survey in which readers rate their favorite cruise ship. All ships are rated on a 100-point scale, with higher values indicating better service. A sample of 37 ships that carry fewer than 500 passengers resulted in an average rating of 85.36, and a sample of 44 ships that carry 500 or more passengers provided an average rating of 81.40. Assume that the population standard deviation is 4.55 for ships that carry fewer than 500 passengers and 3.97 for ships that carry 500 or more passengers.

a. What is the point estimate of the difference between the population mean rating for ships that carry fewer than 500 passengers and the population mean rating for ships that carry 500 or more passengers?

b. At 95% confidence, what is the margin of error?

c. What is a 95% confidence interval estimate of the difference between the population mean ratings for the two sizes of ships?

d. (New Question) Test whether small cruise ships have a better rating than large cruise ships at the 1% level of significance.

#### population 1: ships that carry fewer than 500 passengers
$\bar{X}_1 = 85.36$, $n_1 = 37$, $\sigma_1 = 4.55$

#### population 2: ships that carry 500 or more passengers
$\bar{X}_2 = 81.40$, $n_2 = 44$, $\sigma_2 = 3.97$

In [2]:
x1 = 85.36; n1 = 37; sigma1 = 4.55; x2 = 81.40; n2 = 44; sigma2 = 3.97; CL = 0.95; alpha = 0.01

In [3]:
# a: What is the point estimate of the difference between the population mean ratings?

x1 - x2

3.9599999999999937

In [4]:
# b: At 95% confidence, what is the margin of error?

def se_2means(s1, s2, n1, n2):
    """Standard error of the difference between two independent sample means."""
    return pow(s1*s1/n1 + s2*s2/n2, 0.5)
SE = se_2means(sigma1, sigma2, n1, n2)

print(SE)

0.957981889053389


In [5]:
crit_z = norm.ppf(0.5 + CL/2)
print(crit_z)

1.95996398454


In [6]:
MOE = crit_z*SE
print(MOE)

1.87761000039


In [7]:
# c: What is a 95% confidence interval estimate of the difference between the population mean ratings?
[x1-x2 - MOE, x1-x2 +MOE]

[2.0823899996137056, 5.8376100003862819]
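Parts a through c can be collected into one helper. The sketch below is a minimal version, assuming independent samples and known population standard deviations; the function name `z_interval_2means` is illustrative and does not appear in the cells above.

```python
from math import sqrt
from scipy.stats import norm

def z_interval_2means(x1, x2, sigma1, sigma2, n1, n2, cl=0.95):
    """Return (point estimate, margin of error, lower, upper) for mu1 - mu2.

    Assumes independent samples with known population standard deviations.
    """
    diff = x1 - x2
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    crit_z = norm.ppf(0.5 + cl / 2)   # two-sided critical value
    moe = crit_z * se
    return diff, moe, diff - moe, diff + moe

diff, moe, lower, upper = z_interval_2means(85.36, 81.40, 4.55, 3.97, 37, 44)
print(round(diff, 2), round(moe, 2), round(lower, 2), round(upper, 2))
```

Rounded to two decimals, this reproduces the point estimate 3.96, the margin of error 1.88, and the interval from 2.08 to 5.84.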

#### Suppose we want to test whether small cruise ships have a better rating than large cruise ships.
$H_0: \mu_1 - \mu_2 \leq 0$

$H_1: \mu_1 - \mu_2 > 0$

In [8]:
# Upper-tail z test: the hypothesized difference under H0 is 0
z = (x1-x2)/SE
p_value = 1 - norm.cdf(z)
print(z, p_value)

4.13368983824213 1.78492582694e-05
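To finish part d, the p-value is compared with the significance level. The following is a minimal sketch of that decision step, restating the cruise-ship inputs so it runs on its own. It uses `norm.sf` for the upper-tail area, which equals `1 - norm.cdf(z)` but is more accurate for large `z`. The variable names `crit_z` and `reject_h0` are illustrative.

```python
from math import sqrt
from scipy.stats import norm

alpha = 0.01
se = sqrt(4.55**2 / 37 + 3.97**2 / 44)
z = (85.36 - 81.40) / se          # test statistic for H0: mu1 - mu2 <= 0
p_value = norm.sf(z)              # upper-tail area
crit_z = norm.ppf(1 - alpha)      # one-sided critical value

reject_h0 = p_value < alpha       # equivalent to z > crit_z
print(reject_h0)
```

With a test statistic of about 4.13 against a critical value of about 2.33, $H_0$ is rejected at the 1% level, which supports the claim that smaller ships are rated higher.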


## Commute Distance Example
The U.S. Department of Transportation provides the number of miles that residents of the 75 largest metropolitan areas travel per day in a car. Suppose that for a simple random sample of 50 Buffalo residents the mean is 22.5 miles a day and the standard deviation is 8.4 miles a day, and for an independent simple random sample of 40 Boston residents the mean is 18.6 miles a day and the standard deviation is 7.4 miles a day.

a. What is the point estimate of the difference between the mean number of miles that Buffalo residents travel per day and the mean number of miles that Boston residents travel per day?

b. What is the 95% confidence interval for the difference between the two population means?

In [9]:
x1 = 22.5; n1 = 50; s1 = 8.4; x2 = 18.6; n2 = 40; s2 = 7.4; CL = 0.95

In [10]:
# a: What is the point estimate of the difference?

print(x1 - x2)

3.8999999999999986


In [11]:
# b: What is the 95% confidence interval for the difference between the two population means?

SE = se_2means(s1, s2, n1, n2)
print(SE)

1.6673931749890307


In [12]:
def df_2means(s1, s2, n1, n2):
    """Approximate degrees of freedom for the two-sample t interval with unequal variances."""
    v1, v2 = s1**2/n1, s2**2/n2
    return (v1 + v2)**2 / (v1**2/(n1-1) + v2**2/(n2-1))
df = df_2means(s1, s2, n1, n2)
print(df)

87.14418174007069
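Because the population standard deviations are unknown in this example, the interval uses the t distribution with approximate degrees of freedom. Written out, `se_2means` and `df_2means` compute the following (the df expression is commonly called the Welch-Satterthwaite approximation, a name the cells above do not use):

$$
SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad
df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}
$$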


In [13]:
t_val = t.ppf((1 + CL)/2, df)
MOE = t_val*SE

print(t_val, MOE)
print([x1-x2 - MOE, x1-x2 +MOE])

1.98756191451 3.31404717113
[0.58595282887375699, 7.2140471711262402]
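The three cells above can be folded into one function. This is a sketch under the same assumptions (independent simple random samples, unknown and possibly unequal population variances); the name `welch_interval` is hypothetical.

```python
from math import sqrt
from scipy.stats import t

def welch_interval(x1, x2, s1, s2, n1, n2, cl=0.95):
    """Return (df, lower, upper) for mu1 - mu2 when the sigmas are unknown."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # estimated variance of each sample mean
    se = sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    moe = t.ppf((1 + cl) / 2, df) * se       # two-sided t critical value times SE
    diff = x1 - x2
    return df, diff - moe, diff + moe

df, lower, upper = welch_interval(22.5, 18.6, 8.4, 7.4, 50, 40)
print(round(df, 1), round(lower, 2), round(upper, 2))
```

The interval excludes 0, so at 95% confidence the data suggest that Buffalo residents travel more miles per day on average than Boston residents. Some textbooks round the degrees of freedom down to an integer; this sketch keeps the fractional value, as the cells above do.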


In [14]:
# Standard error for another pair of samples: s1 = 6000, s2 = 7000, n1 = 40, n2 = 50
se_2means(6000, 7000, 40, 50)

1371.1309200802089

In [15]:
# Test statistic: difference in sample means divided by the standard error
(56100-59400)/se_2means(6000, 7000, 40, 50)

-2.40677235971526

In [16]:
# Approximate degrees of freedom for the same pair of samples
df_2means(6000, 7000, 40, 50)

87.55182926829268

## PGA 6-foot Putt Example
The Professional Golf Association (PGA) measured the putting accuracy of professional golfers playing on the PGA Tour and the best amateur golfers playing in the World Amateur Championship (Golf Magazine, January 2007). A sample of 1075 6-foot putts by professional golfers found 688 made putts. A sample of 1200 6-foot putts by amateur golfers found 696 made putts.

a. Estimate the proportion of made 6-foot putts by professional golfers. Estimate the proportion of made 6-foot putts by amateur golfers. Which group had a better putting accuracy?

b. What is the point estimate of the difference between the proportions of the two populations? What does this estimate tell you about the percentage of putts made by the two groups of golfers?

c. What is the 95% confidence interval for the difference between the two population proportions? Interpret this confidence interval in terms of the percentage of putts made by the two groups of golfers.

d. (New Question) Test whether pros are significantly better than amateurs in putting accuracy (alpha = 0.05).


In [40]:
# a: Estimate the proportion of made 6-foot putts by professional golfers. 
# Estimate the proportion of made 6-foot putts by amateur golfers. Which group had a better putting accuracy?

p1 = 688./1075; n1 = 1075; p2 = 696./1200; n2 = 1200; CL = 0.95; alpha = 0.05
print(p1, p2)

0.64 0.58


In [41]:
# b: What is the point estimate of the difference between the proportions of the two populations? 
# What does this estimate tell you about the percentage of putts made by the two groups of golfers?

print(p1-p2)

0.06000000000000005


In [42]:
# c: What is the 95% confidence interval for the difference between the two population proportions? 
# Interpret this confidence interval in terms of the percentage of putts made by the two groups of golfers.

def se_2proportions_ci(p1, p2, n1, n2):
    """Unpooled standard error of p1 - p2, used for the confidence interval."""
    return pow(p1*(1-p1)/n1 + p2*(1-p2)/n2, 0.5)

SE = se_2proportions_ci(p1, p2, n1, n2)
print(SE)

0.020428548195976844


In [43]:
z = norm.ppf(0.5 + CL/2)
MOE = z*SE
print(z, MOE)
print([p1-p2 - MOE, p1-p2+MOE])

1.95996398454 0.0400392187206
[0.019960781279444749, 0.10003921872055535]


In [44]:
# d: Test whether pros are significantly better than amateurs in putting accuracy (alpha = 0.05).

# H0: p1 <= p2; Ha: p1 > p2

# Compute SE: under H0 the two proportions are equal, so use the pooled proportion
def pbar(p1, p2, n1, n2):
    return (n1*p1 + n2*p2)/(n1 + n2)

def se_2proportions_ht(p1, p2, n1, n2):
    p = pbar(p1, p2, n1, n2)
    return pow(p*(1-p)*(1/n1 + 1/n2), 0.5)

SE = se_2proportions_ht(p1, p2, n1, n2)
print(SE)


0.020498465033880177
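This standard error differs from the one used for the confidence interval in part c. Under $H_0$ the two population proportions are treated as equal, so a single pooled estimate $\bar{p}$ replaces $\bar{p}_1$ and $\bar{p}_2$. Written out (the labels $SE_{CI}$ and $SE_{HT}$ are introduced here only to match `se_2proportions_ci` and `se_2proportions_ht`):

$$
SE_{CI} = \sqrt{\frac{\bar{p}_1(1-\bar{p}_1)}{n_1} + \frac{\bar{p}_2(1-\bar{p}_2)}{n_2}}, \qquad
\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}, \qquad
SE_{HT} = \sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
$$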


In [45]:
z = (p1 - p2)/SE
p_val = 1 - norm.cdf(z)
print(z, p_val)

2.927048435130686 0.00171097792167
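The decision for part d follows from comparing the p-value with alpha. The sketch below repeats the test starting from the raw counts, so it runs independently of the earlier cells; the names `made1`, `made2`, `pooled`, and `reject_h0` are illustrative.

```python
from math import sqrt
from scipy.stats import norm

made1, n1 = 688, 1075     # professionals: made putts, attempts
made2, n2 = 696, 1200     # amateurs: made putts, attempts
alpha = 0.05

p1, p2 = made1 / n1, made2 / n2
pooled = (made1 + made2) / (n1 + n2)                  # pooled proportion under H0
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_val = norm.sf(z)                                    # upper-tail p-value

reject_h0 = p_val < alpha
print(reject_h0)
```

Since the p-value of about 0.0017 is below 0.05, $H_0$ is rejected: the samples support the claim that professionals make a higher proportion of 6-foot putts than amateurs.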
