# Hypothesis Testing Exercises

---

### For each of the following questions, formulate a null and alternative hypothesis (be as specific as you can be), then give an example of what a true positive, true negative, type I and type II errors would look like. Note that some of the questions are intentionally phrased in a vague way. It is your job to reword these as more precise questions that could be tested.

---

### Has the network latency gone up since we switched internet service providers?

#### Source of Data
* Query response times (latency), measured before and after the switch

#### Hypotheses
* $H_0$: Average query latency hasn't changed since we switched internet service providers.
* $H_a$: Average query latency has gone up since we switched internet service providers.

#### Possible Results
* **True Positive**: We conclude that latency has gone up since the switch, and it really has.
* **True Negative**: We conclude that latency hasn't changed since the switch, and it really hasn't.
* **False Positive (Type I Error)**: We conclude that the new provider increased latency, but it didn't; the slowdown we measured came from extra server requests from a new department.
* **False Negative (Type II Error)**: We conclude that latency hasn't changed, but the new provider really is slower; an upgrade to fiber optics at the same time masked the increase.
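
The four outcomes above depend on only two things: whether the test rejected $H_0$, and whether $H_0$ is actually true. A minimal sketch of that mapping follows; the function name `classify_outcome` and its arguments are illustrative and not part of the exercise, and in practice the truth of $H_0$ is unknown.

```python
def classify_outcome(rejected_h0: bool, h0_is_true: bool) -> str:
    """Label a test result given the decision and the (normally unknown) truth."""
    if rejected_h0 and not h0_is_true:
        return "true positive"
    if not rejected_h0 and h0_is_true:
        return "true negative"
    if rejected_h0 and h0_is_true:
        return "false positive (Type I error)"
    # Remaining case: H0 is false but we failed to reject it
    return "false negative (Type II error)"


# Rejecting a null hypothesis that is actually true is a Type I error
print(classify_outcome(rejected_h0=True, h0_is_true=True))
```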

---

### Is the website redesign any good?

#### Source of Data

* Number of visits

#### Hypotheses
* $H_0$: The number of visits has not changed since the redesign.
* $H_a$: The number of visits is higher since the redesign.

#### Possible Results
* **True Positive**: We find that the number of visits is higher than for the previous website, and it really is higher.
* **True Negative**: We find that the number of visits hasn't changed, and it really hasn't.
* **False Positive (Type I Error)**: We find that the number of visits is higher, but the increase came from an internet ad campaign we recently ran, not from the redesign.
* **False Negative (Type II Error)**: We find that the number of visits hasn't changed, but the redesign did help; the end of our ad campaign offset the gain.

---

### Is our television ad driving more sales?

#### Source of Data
* Total sales

#### Hypotheses
* $H_0$: Our total sales haven't changed since the television ad started airing.
* $H_a$: Our total sales have gone up since the television ad started airing.

#### Possible Results
* **True Positive**: We find that total sales have gone up, and they really have.
* **True Negative**: We find that total sales haven't changed, and they really haven't.
* **False Positive (Type I Error)**: We find that total sales have gone up, but the increase came from a new store we just opened, not from the ad.
* **False Negative (Type II Error)**: We find that total sales haven't changed, but the ad did drive more sales; closing a store at the same time offset the gain.
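
The Type I error rate is set by the significance level: when $H_0$ is true, a test at level $\alpha$ still rejects it in roughly a fraction $\alpha$ of samples. The sketch below illustrates this with simulated data, not with any of the scenarios above. It uses a z-test with a known standard deviation, a simplification chosen so that only the standard library is needed, and the function name and parameters are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def false_positive_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of tests that reject H0 even though H0 (true mean = 0) holds."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Draw a sample for which the null hypothesis is true by construction
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) * sqrt(n)  # z-statistic with known sigma = 1
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p < alpha:
            rejections += 1
    return rejections / trials


rate = false_positive_rate()
print(rate)  # should land close to alpha
```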

# T-Test Exercises
---
## Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices. A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. Use a .05 level of significance.

$H_0$: The average time it takes to sell a home is not different between the two offices

$H_a$: The average time it takes to sell a home is different between the two offices.

$\alpha$ = 0.05
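
For reference, the pooled-variance statistic computed in the cells that follow can be written out as below. This is a sketch that assumes equal population variances, which the exercise does not state; $\bar{x}_i$, $s_i$ and $n_i$ are the sample mean, standard deviation and size for office $i$.

$$
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
    = \sqrt{\frac{39 \cdot 15^2 + 49 \cdot 20^2}{88}} \approx 17.96
$$

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
  = \frac{90 - 100}{17.96 \sqrt{\frac{1}{40} + \frac{1}{50}}} \approx -2.63,
\qquad \text{with } n_1 + n_2 - 2 = 88 \text{ degrees of freedom}
$$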

In [1]:
from math import sqrt

In [2]:
xbar1 = 90
xbar2 = 100

n1 = 40
n2 = 50

s1 = 15
s2 = 20

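# Degrees of freedom for a pooled-variance t-test (equal variances assumed)
degf = n1 + n2 - 2

# Pooled standard deviation
s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / degf)

# Unpooled standard error, shown for comparison only; it is not used below
standard_error = se = sqrt(s1**2 / n1 + s2**2 / n2)

# t-statistic for the difference in sample means
t = (xbar1 - xbar2) / (s_p * sqrt(1/n1 + 1/n2))
t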

In [3]:
from scipy import stats

In [4]:
# Two-tailed p-value; -abs(t) keeps this correct even if t were positive
p = stats.t(degf).cdf(-abs(t)) * 2
p

0.01020985244923939

In [5]:
alpha = 0.05

p > alpha

False

Because $p < \alpha$, we reject $H_0$: the average time it takes to sell a home differs between the two offices.
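
The manual calculation can be cross-checked with `scipy.stats.ttest_ind_from_stats`, which takes summary statistics directly. The sketch below also runs Welch's variant (`equal_var=False`) as a comparison; whether the two offices really have equal variances is an assumption the exercise leaves open.

```python
from scipy import stats

# Summary statistics quoted in the exercise
pooled = stats.ttest_ind_from_stats(
    mean1=90, std1=15, nobs1=40,
    mean2=100, std2=20, nobs2=50,
    equal_var=True,   # pooled variance, as in the manual calculation
)
welch = stats.ttest_ind_from_stats(
    mean1=90, std1=15, nobs1=40,
    mean2=100, std2=20, nobs2=50,
    equal_var=False,  # Welch's t-test: does not assume equal variances
)

print(pooled.statistic, pooled.pvalue)
print(welch.statistic, welch.pvalue)
```

The pooled result should match the t-statistic and p-value computed by hand above.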

## Load the mpg dataset and use it to answer the following questions:

In [6]:
from pydataset import data

mpg = data('mpg')
mpg

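| | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 230 | volkswagen | passat | 2.0 | 2008 | 4 | auto(s6) | f | 19 | 28 | p | midsize |
| 231 | volkswagen | passat | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | midsize |
| 232 | volkswagen | passat | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | midsize |
| 233 | volkswagen | passat | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | midsize |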


### Is there a difference in fuel-efficiency in cars from 2008 vs 1999?

In [7]:
mpg['avg_mpg'] = (mpg.cty + mpg.hwy) / 2 
mpg

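| | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | avg_mpg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact | 23.5 |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact | 25.0 |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact | 25.5 |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact | 25.5 |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact | 21.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 230 | volkswagen | passat | 2.0 | 2008 | 4 | auto(s6) | f | 19 | 28 | p | midsize | 23.5 |
| 231 | volkswagen | passat | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | midsize | 25.0 |
| 232 | volkswagen | passat | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | midsize | 21.0 |
| 233 | volkswagen | passat | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | midsize | 22.0 |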


$H_0$: There is no difference between fuel-efficiency of cars from 2008 and 1999

$H_a$: There is a difference between fuel-efficiency of cars from 2008 and 1999

$\alpha$ = 0.01

In [8]:
x1 = mpg[mpg.year == 2008].avg_mpg
x2 = mpg[mpg.year == 1999].avg_mpg

tstat, p = stats.ttest_ind(x1, x2)
tstat, p

(-0.21960177245940962, 0.8263744040323578)

In [9]:
alpha = 0.01

p > alpha

True

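Because $p > \alpha$, we fail to reject $H_0$: the data show no evidence of a difference in average fuel efficiency between cars from 2008 and cars from 1999.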
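
`stats.ttest_ind` assumes equal variances in the two groups by default. One way to guard against that assumption is sketched below: run Levene's test first and fall back to Welch's t-test when the variances look unequal. The helper name `compare_means`, the reuse of the same `alpha` for the variance check, and the sample numbers are all illustrative choices, not part of the exercise.

```python
from scipy import stats


def compare_means(x1, x2, alpha=0.01):
    """Two-sample t-test that switches to Welch's test if variances look unequal."""
    _, levene_p = stats.levene(x1, x2)
    equal_var = bool(levene_p > alpha)  # no evidence against equal variances
    tstat, p = stats.ttest_ind(x1, x2, equal_var=equal_var)
    return {
        "equal_var": equal_var,
        "t": float(tstat),
        "p": float(p),
        "reject_h0": bool(p < alpha),
    }


# Made-up numbers purely to show the call; not taken from the mpg data
group_a = [20.0, 21.5, 19.0, 22.0, 20.5, 21.0]
group_b = [20.5, 21.0, 19.5, 22.5, 20.0, 21.5]
result = compare_means(group_a, group_b)
print(result)
```

With the notebook's data, the call would be `compare_means(x1, x2)` using the two `avg_mpg` series defined above.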

### Are compact cars more fuel-efficient than the average car?

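$H_0$: Compact cars are no more fuel-efficient than the average car (mean compact `avg_mpg` $\le$ overall mean `avg_mpg`)

$H_a$: Compact cars are more fuel-efficient than the average car (mean compact `avg_mpg` $>$ overall mean `avg_mpg`)

Because the question is one-sided, $H_0$ is rejected only if the t-statistic is positive and half of the two-sided p-value is below $\alpha$.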

$\alpha$ = 0.01

In [10]:
compact = mpg[mpg['class'] == 'compact'].avg_mpg

tstat, p = stats.ttest_1samp(compact, mpg.avg_mpg.mean())
tstat, p

(7.896888573132535, 4.1985637943171336e-10)

In [11]:
alpha = 0.01

p > alpha

False

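Because $p < \alpha$, we reject $H_0$. The t-statistic is positive, so compact cars have a higher average fuel efficiency than the overall average.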
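
Since the question asks whether compact cars are *more* efficient, a one-sided p-value is the natural quantity to compare with $\alpha$. Because the t-distribution is symmetric, it can be derived from the two-sided result, as in this sketch; the helper name `one_sided_p_greater` is hypothetical.

```python
def one_sided_p_greater(tstat: float, p_two_sided: float) -> float:
    """p-value for H_a: sample mean > reference mean, from a two-sided t-test."""
    if tstat > 0:
        return p_two_sided / 2
    # The sample mean is below the reference, so the evidence points the other way
    return 1 - p_two_sided / 2


# Values printed by stats.ttest_1samp above
tstat, p = 7.896888573132535, 4.1985637943171336e-10
alpha = 0.01

p_greater = one_sided_p_greater(tstat, p)
print(p_greater < alpha)
```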

### Do manual cars get better gas mileage than automatic cars?

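$H_0$: Manual cars do not get better gas mileage than automatic cars (mean manual `avg_mpg` $\le$ mean automatic `avg_mpg`).

$H_a$: Manual cars get better gas mileage than automatic cars (mean manual `avg_mpg` $>$ mean automatic `avg_mpg`).

Because the question is one-sided, $H_0$ is rejected only if manual cars have the higher sample mean and half of the two-sided p-value is below $\alpha$.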

$\alpha$ = 0.01

In [12]:
mpg

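| | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | avg_mpg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact | 23.5 |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact | 25.0 |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact | 25.5 |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact | 25.5 |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact | 21.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 230 | volkswagen | passat | 2.0 | 2008 | 4 | auto(s6) | f | 19 | 28 | p | midsize | 23.5 |
| 231 | volkswagen | passat | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | midsize | 25.0 |
| 232 | volkswagen | passat | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | midsize | 21.0 |
| 233 | volkswagen | passat | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | midsize | 22.0 |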


In [13]:
# Drop the 4-character gear suffix, e.g. 'auto(l5)' -> 'auto', 'manual(m6)' -> 'manual'
mpg.trans = mpg.trans.str[:-4]
mpg.trans

1        auto
2      manual
3      manual
4        auto
5        auto
        ...  
230      auto
231    manual
232      auto
233    manual
234      auto
Name: trans, Length: 234, dtype: object
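
Slicing off the last four characters works here because every suffix, such as `(l5)` or `(av)`, is exactly four characters long. A slightly more defensive alternative, sketched on a few raw values copied from the table above, is to keep everything before the opening parenthesis. The variable names are illustrative.

```python
import pandas as pd

# A few raw values as they appear in the `trans` column before cleaning
trans = pd.Series(["auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto(s6)"])

# Keep only the text before the opening parenthesis
trans_type = trans.str.split("(").str[0]
print(trans_type.tolist())  # ['auto', 'manual', 'manual', 'auto', 'auto']
```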

In [14]:
x1 = mpg[mpg.trans == 'auto'].avg_mpg
x2 = mpg[mpg.trans == 'manual'].avg_mpg

tstat, p = stats.ttest_ind(x1, x2)
tstat, p

(-4.593437735750014, 7.154374401145683e-06)

In [15]:
alpha = 0.01

p > alpha

False

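Because $p < \alpha$, we reject $H_0$. The t-statistic is negative with automatic cars as the first sample, so manual cars get better average gas mileage than automatic cars.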