### For each of the following questions, formulate a null and alternative hypothesis (be as specific as you can be), then give an example of what a true positive, true negative, type I and type II errors would look like. Note that some of the questions are intentionally phrased in a vague way. It is your job to reword these as more precise questions that could be tested.

#### * Has the network latency gone up since we switched internet service providers?

* Null hypothesis:  There is no difference in network latency since we switched ISPs.
* Alternative hypothesis:  Network latency has increased since we switched ISPs.
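* True positive:  Average network latency is higher with the new ISP than with the previous ISP, and the test detects the increase.
* True negative:  Average network latency is the same with the new ISP as with the previous ISP, and the test finds no increase.
* Type 1 error:  The test reports higher latency, but only because the sample happened to include periods of excessive traffic; the true average has not changed.
* Type 2 error:  The test reports no change, but only because the sample happened to include periods of extremely low traffic; the true average has increased.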

#### * Is the website redesign any good?

* Null hypothesis:  There is no difference in the click-through rate before and after the redesign.
* Alternative hypothesis:  The website redesign has increased the click-through rate.
* True positive:  Click-through rate has increased by at least 10%, and the redesign is the cause.
* True negative:  Click-through rate is within 10% of the previous value, and the redesign made no real difference.
* Type 1 error:  Click-through rate has increased, but only because of a sale; the previous design had a similar click-through rate during sale periods.
* Type 2 error:  The redesign does improve click-through rate, but website downtime depresses the measured rate, so the increase is too small to count as positive.

#### * Is our television ad driving more sales?

* Null hypothesis:  Sales revenue remained the same with the TV ad.
* Alternative hypothesis:  The TV ad has increased sales revenue.
* True positive:  Daily revenue is 20% higher during the period of the TV ad.
* True negative:  Daily revenue is within 20% of the daily revenue with no ad.
* Type 1 error:  Daily revenue increases because of a large recurring order during the period of the TV ad.
* Type 2 error:  Daily revenue is within 20% of the daily revenue with no ad because of an economic recession causing customers to buy less.
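
The four outcomes described above can be made concrete with a small simulation. The sketch below uses the network latency question as its setting and is only an illustration: the latency values (40 ms baseline, 3 ms increase, 10 ms standard deviation), the sample size, and the `rejection_rate` helper are all assumptions, not measurements from any real ISP.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_trials = 2000


def rejection_rate(mean_old, mean_new, sd=10.0, n=50):
    """Fraction of simulated experiments in which a one-tailed t-test rejects the null."""
    rejections = 0
    for _ in range(n_trials):
        old = rng.normal(mean_old, sd, n)  # hypothetical latencies, previous ISP
        new = rng.normal(mean_new, sd, n)  # hypothetical latencies, new ISP
        # Alternative hypothesis: mean latency is higher with the new ISP
        _, p = stats.ttest_ind(new, old, alternative="greater")
        rejections += p < alpha
    return rejections / n_trials


# Null is true (no real change): every rejection is a Type I error
type_1_rate = rejection_rate(mean_old=40.0, mean_new=40.0)

# Null is false (latency really rose by 3 ms): every non-rejection is a Type II error
type_2_rate = 1 - rejection_rate(mean_old=40.0, mean_new=43.0)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

When the null is true, the Type I error rate should land close to `alpha`. The Type II error rate depends on the effect size, the noise, and the sample size, so it changes if any of the assumed values above change.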

In [25]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats
from env import user, password, host, get_db_url
from hypothesis import evaluate_hypothesis

### 1. Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices. A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. Use a .05 level of significance.

In [20]:
null_hypothesis = "There is no difference in the average time it takes to sell houses between the two offices."
alternative_hypothesis = "The average time it takes to sell houses differs between the two offices."

confidence = .95
alpha = 1 - confidence

In [21]:
# Two-sample, two-tailed t-test computed from summary statistics (pooled variance by default)
t, p = stats.ttest_ind_from_stats(mean1=90, std1=15, nobs1=40, mean2=100, std2=20, nobs2=50)

In [22]:
evaluate_hypothesis(null_hypothesis, alternative_hypothesis, alpha, p, t)

t:  -2.6252287036468456, p:  0.01020985244923939, a:  0.050000000000000044

We reject the null hypothesis.
We move forward with the alternative hypothesis:  The average time it takes to sell houses differs between the two offices.


#### The average time it takes to sell a house IS different between the two offices.
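
As a cross-check on the result above, the same statistic can be computed by hand from the summary statistics. This is a minimal sketch that assumes equal population variances (the default for `stats.ttest_ind_from_stats`); the variable names are chosen here for illustration.

```python
import math
from scipy import stats

# Summary statistics from the problem statement
mean1, std1, n1 = 90, 15, 40   # office #1
mean2, std2, n2 = 100, 20, 50  # office #2

# Pooled variance weights each sample variance by its degrees of freedom
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * std1**2 + (n2 - 1) * std2**2) / df

# Standard error of the difference between the two sample means
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = (mean1 - mean2) / std_err
# Two-tailed p-value: probability of a |t| at least this large under the null
p_manual = 2 * stats.t.sf(abs(t_manual), df)

print(t_manual, p_manual)
```

The two sample standard deviations differ (15 vs. 20 days), so it may also be worth rerunning the test with `equal_var=False` (Welch's t-test) to see whether the conclusion holds without the equal-variance assumption.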

### 2. Load the mpg dataset and use it to answer the following questions:

In [11]:
from pydataset import data

# Same confidence level for all questions
confidence = .95
alpha = 1 - confidence

mpg = data("mpg")
mpg['avg_mpg'] = mpg[['cty', 'hwy']].mean(axis = 1)

mpg.head()

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class,avg_mpg
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact,23.5
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact,25.0
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact,25.5
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact,25.5
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact,21.0


In [12]:
# Alternative solution
# harmonic mean = 2 / (1/cty + 1/hwy)
#   use it when the unit of the numerator (miles) is held constant across the values being averaged
# arithmetic mean = (cty + hwy) / 2
#   use it when the unit of the denominator (gallons of fuel) is held constant across the values being averaged
mpg['hmean_mpg'] = stats.hmean(mpg[['cty', 'hwy']], axis = 1)
mpg.head()

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class,avg_mpg,hmean_mpg
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact,23.5,22.212766
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact,25.0,24.36
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact,25.5,24.313725
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact,25.5,24.705882
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact,21.0,19.809524
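
The distinction between the two means in the comment above can be checked with plain arithmetic. The sketch below takes the first row of the table (18 mpg city, 29 mpg highway) and assumes two hypothetical trips: one that covers the same distance in each mode, and one that burns the same amount of fuel in each mode. The 100-mile and 1-gallon figures are arbitrary.

```python
from scipy import stats

cty, hwy = 18, 29  # first row of the mpg table

# Same distance in each mode: 100 miles in the city, 100 miles on the highway
miles = 100
gallons_used = miles / cty + miles / hwy
same_distance_mpg = (2 * miles) / gallons_used

# Same fuel in each mode: 1 gallon in the city, 1 gallon on the highway
gallons = 1
miles_driven = gallons * cty + gallons * hwy
same_fuel_mpg = miles_driven / (2 * gallons)

harmonic = stats.hmean([cty, hwy])
arithmetic = (cty + hwy) / 2

print(same_distance_mpg, harmonic)   # the two values agree
print(same_fuel_mpg, arithmetic)     # the two values agree
```

Overall miles per gallon for the equal-distance trip matches the harmonic mean, while the equal-fuel trip matches the arithmetic mean, which is why the choice depends on which quantity is held constant.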


* Is there a difference in fuel-efficiency in cars from 2008 vs 1999?

In [None]:
# Two-sample, two-tailed test
null_hypothesis = "There is no difference in avg_mpg between 1999 and 2008 cars"
alt_hypothesis = "There is a difference in avg_mpg between 1999 and 2008 cars"

mpg_2008 = mpg[mpg.year == 2008]
mpg_1999 = mpg[mpg.year == 1999]

t, p = stats.ttest_ind(mpg_1999.avg_mpg, mpg_2008.avg_mpg)

In [13]:
# Pass alt_hypothesis, the variable defined for this question
evaluate_hypothesis(null_hypothesis, alt_hypothesis, alpha, p, t)


#### We cannot reject the null hypothesis that there is no difference in avg_mpg between 1999 and 2008 cars.

* Are compact cars more fuel-efficient than the average car?

In [14]:
# One-sample, one-tailed test
null_hypothesis = "Compact cars are just as fuel-efficient as the average car"
alt_hypothesis = "Compact cars are more fuel-efficient than average cars"

mpg_compact = mpg[mpg['class'] == "compact"]
mpg_non_compact = mpg[mpg['class'] != "compact"]

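# Two-sample comparison of compact vs. non-compact cars (two-tailed by default);
# t and p are overwritten by the one-sample test in the next cell
t, p = stats.ttest_ind(mpg_compact.avg_mpg, mpg_non_compact.avg_mpg)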

In [15]:
# Alternative solution
# One-sample, one-tailed test of compact cars against the mean avg_mpg of non-compact cars
t, p = stats.ttest_1samp(mpg_compact.avg_mpg, mpg_non_compact.avg_mpg.mean(), alternative = "greater")

In [16]:
evaluate_hypothesis(null_hypothesis, alt_hypothesis, alpha, p, t, "greater")

t:  9.881668054080286, p:  2.967609174837242e-13, a:  0.050000000000000044

We reject the null hypothesis.
We move forward with the alternative hypothesis:  Compact cars are more fuel-efficient than average cars


#### We reject the null hypothesis that compact cars are just as fuel-efficient as the average car.

* Do manual cars get better gas mileage than automatic cars?

In [17]:
# Two-sample, one-tailed test
null_hypothesis = "Manual cars have the same avg_mpg as automatic cars"
alt_hypothesis = "Manual cars have a higher avg_mpg than automatic cars"

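# Strip the gear detail, e.g. "auto(l5)" -> "auto"; n is passed by keyword, as pandas 2.0+ requires
mpg['simple_trans'] = mpg.trans.str.rsplit("(", n = 1).str[0]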

mpg_manual = mpg[mpg.simple_trans == "manual"]
mpg_auto = mpg[mpg.simple_trans == "auto"]

t, p = stats.ttest_ind(mpg_manual.avg_mpg, mpg_auto.avg_mpg)

In [18]:
evaluate_hypothesis(null_hypothesis, alt_hypothesis, alpha, p, t, "greater")

t:  4.593437735750014, p:  7.154374401145683e-06, a:  0.050000000000000044

We reject the null hypothesis.
We move forward with the alternative hypothesis:  Manual cars have a higher avg_mpg than automatic cars


#### We reject the null hypothesis that manual cars have the same avg_mpg as automatic cars and move forward with the hypothesis that manual cars have a higher avg_mpg than automatic cars.
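
`stats.ttest_ind` returns a two-tailed p-value by default, and how the `evaluate_hypothesis` helper handles the `"greater"` argument is not shown here. As a hedged illustration of the usual one-tailed decision rule, the sketch below compares the default two-tailed p-value with SciPy's own `alternative="greater"` option (available in SciPy 1.6 and later). The data are synthetic stand-ins, not the mpg dataset, and the group sizes and means are arbitrary assumptions.

```python
import math

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the avg_mpg of manual and automatic cars (hypothetical values)
manual = rng.normal(22.0, 5.0, 80)
auto = rng.normal(19.0, 5.0, 150)

# Default: two-tailed test (H_a: the means differ)
t_two, p_two = stats.ttest_ind(manual, auto)

# One-tailed test (H_a: the manual mean is greater than the automatic mean)
t_one, p_one = stats.ttest_ind(manual, auto, alternative="greater")

alpha = 0.05
# Equivalent decision rule when only the two-tailed p-value is available
reject_from_two_tailed = (p_two / 2 < alpha) and (t_two > 0)
reject_from_one_tailed = p_one < alpha

print(t_two, p_two)
print(t_one, p_one)
print(reject_from_two_tailed, reject_from_one_tailed)
```

The t statistic is the same in both calls; when it is positive, the one-tailed p-value is half the two-tailed one, so the two decision rules agree.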