### For each of the following questions, formulate a null and alternative hypothesis (be as specific as you can be), then give an example of what a true positive, true negative, type I and type II errors would look like. Note that some of the questions are intentionally phrased in a vague way. It is your job to reword these as more precise questions that could be tested.

#### * Has the network latency gone up since we switched internet service providers?

* Null hypothesis:  There is no difference in network latency since we switched ISPs.
* Alternative hypothesis:  Network latency has increased since we switched ISPs.
* True positive:  Average network latency is higher with the new ISP than the previous ISP.
* True negative:  Average network latency is the same with the new ISP as with the previous ISP.
* Type 1 error:  Average network latency appears higher with the new ISP, but only because it was measured during periods of excessive traffic; the true latency is unchanged.
* Type 2 error:  Average network latency appears the same because it was measured during periods of extremely low traffic; the true latency has increased.

#### * Is the website redesign any good?

* Null hypothesis:  There is no difference in the click-through rate before and after the redesign.
* Alternative hypothesis:  The website redesign has increased the click-through rate.
* True positive:  Click-through rate has increased by 10%
* True negative:  Click-through rate is within 10% of the previous value.
* Type 1 error:  Click-through rate increased because of a sale rather than the redesign; the previous design had a similar click-through rate during sale periods.
* Type 2 error:  The redesign does improve click-through rate, but website downtime lowers the measured rate, so the increase is not large enough to be considered positive.

#### * Is our television ad driving more sales?

* Null hypothesis:  Sales revenue remained the same with the TV ad.
* Alternative hypothesis:  The TV ad has increased sales revenue.
* True positive:  Daily revenue is 20% higher during the period of the TV ad.
* True negative:  Daily revenue is within 20% of the daily revenue with no ad.
* Type 1 error:  Daily revenue increases because of a large recurring order during the period of the TV ad.
* Type 2 error:  Daily revenue is within 20% of the daily revenue with no ad because of an economic recession causing customers to buy less.
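
The error types described above can also be illustrated with a small simulation. The sketch below is only an illustration under assumed numbers: the means, standard deviation, sample size, and effect size are all hypothetical and not taken from any of the questions. When both samples come from the same distribution, every rejection is a type I error; when the second sample truly has a higher mean, every failure to reject is a type II error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_experiments = 2_000

false_positives = 0  # type I errors: null is true but we reject it
false_negatives = 0  # type II errors: null is false but we fail to reject it

for _ in range(n_experiments):
    # Hypothetical measurements; all parameters are assumptions for illustration.
    before = rng.normal(loc=50, scale=10, size=30)

    # Case 1: no real change, so the null hypothesis is true.
    after_same = rng.normal(loc=50, scale=10, size=30)
    _, p = stats.ttest_ind(before, after_same)
    if p < alpha:
        false_positives += 1

    # Case 2: a real (small) increase, so the null hypothesis is false.
    after_higher = rng.normal(loc=53, scale=10, size=30)
    _, p = stats.ttest_ind(before, after_higher)
    if p >= alpha:
        false_negatives += 1

type_1_rate = false_positives / n_experiments
type_2_rate = false_negatives / n_experiments
print(f"Observed type I error rate:  {type_1_rate:.3f}")
print(f"Observed type II error rate: {type_2_rate:.3f}")
```

The observed type I rate should land near `alpha`, while the type II rate depends on how large the real effect is relative to the noise and the sample size.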

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats
from env import user, password, host, get_db_url

### 1. Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices. A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. Use a .05 level of significance.

In [2]:
office_1_mean = 90
office_1_std = 15
office_1_sales = 40
office_1_outcomes = np.random.normal(loc = office_1_mean, scale = office_1_std, size = office_1_sales)

office_2_mean = 100
office_2_std = 20
office_2_sales = 50  # the problem statement gives a sample of 50 sales for office #2
office_2_outcomes = np.random.normal(loc = office_2_mean, scale = office_2_std, size = office_2_sales)

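n_simulations = 100_000

# Resample (with replacement) from each office's simulated outcomes.
# Caution: a t-test on these 100,000-point resamples treats each office as if it
# had 100,000 observations, so its p-value is far smaller than the real sample
# sizes (40 and 50) can support.
office_1_simulations = np.random.choice(office_1_outcomes, size = n_simulations)
office_2_simulations = np.random.choice(office_2_outcomes, size = n_simulations)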

In [3]:
null_hypothesis = "There is no difference in the average time it takes to sell houses between the two offices."
alternative_hypothesis = "The average time it takes to sell houses differs between the two offices."

confidence = .95
alpha = 1 - confidence

# def evaluate_p_value(p, alpha, null_hypothesis, alternative_hypothesis):
#     if p < alpha:
#         print("Reject the null hypothesis.")
#         print(f"Move forward with the alternative hypothesis:  {alternative_hypothesis}")
#     else:
#         print(f"Fail to reject null hypothesis:  {null_hypothesis}")
        
t, p = stats.ttest_ind(office_1_simulations, office_2_simulations)
t, p

print(f"t:  {t}, p:  {p}, a:  {alpha}")

t:  -146.30810889686924, p:  0.0, a:  0.050000000000000044


In [4]:
if p < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

Reject the null hypothesis


#### The average time it takes to sell a house IS different between the two offices.
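
Because the problem gives only summary statistics (mean, standard deviation, and sample size for each office), the test can also be run on those numbers directly, without simulating data. The following is a minimal sketch using `scipy.stats.ttest_ind_from_stats`; choosing `equal_var=False` (Welch's t-test) is an assumption made here because the two standard deviations differ, not something the problem specifies.

```python
from scipy import stats

alpha = 0.05

# Summary statistics taken from the problem statement.
t, p = stats.ttest_ind_from_stats(
    mean1=90, std1=15, nobs1=40,    # office #1
    mean2=100, std2=20, nobs2=50,   # office #2
    equal_var=False,                # assumption: do not pool the variances (Welch)
)

print(f"t:  {t:.4f}, p:  {p:.4f}, a:  {alpha}")

if p < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

This approach uses the actual sample sizes, so the resulting p-value reflects the evidence in 40 and 50 sales rather than in a much larger resample.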

### 2. Load the mpg dataset and use it to answer the following questions:

In [5]:
from pydataset import data

# Same confidence level for all questions
confidence = .95
alpha = 1 - confidence

mpg = data("mpg")
mpg['avg_mpg'] = mpg[['cty', 'hwy']].mean(axis = 1)

mpg.head()

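|   | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | avg_mpg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact | 23.5 |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact | 25.0 |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact | 25.5 |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact | 25.5 |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact | 21.0 |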


* Is there a difference in fuel-efficiency in cars from 2008 vs 1999?

In [6]:
# Two-sample, two-tailed test
null_hypothesis = "There is no difference in avg_mpg between 1999 and 2008 cars"
alt_hypothesis = "There is a difference in avg_mpg between 1999 and 2008 cars"

mpg_2008 = mpg[mpg.year == 2008]
mpg_1999 = mpg[mpg.year == 1999]

t, p = stats.ttest_ind(mpg_1999.avg_mpg, mpg_2008.avg_mpg)
print(f"t:  {t}, p:  {p}, a:  {alpha}")

if p < alpha:
    print("We reject the null hypothesis")
else:
    print("We cannot reject the null hypothesis")

t:  0.21960177245940962, p:  0.8263744040323578, a:  0.050000000000000044
We cannot reject the null hypothesis


#### We cannot reject the null hypothesis that there is no difference in avg_mpg between 1999 and 2008 cars.
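
By default, `stats.ttest_ind` assumes the two groups have equal variances. One common, though debated, way to handle this is to run Levene's test first and fall back to Welch's t-test when the variances look different. The sketch below shows that pattern; the helper name `ttest_with_variance_check` and the sample data are hypothetical and are not part of the mpg analysis.

```python
import numpy as np
from scipy import stats

def ttest_with_variance_check(a, b, alpha=0.05):
    """Two-sample t-test that uses Welch's version when Levene's test
    suggests the group variances differ (hypothetical helper)."""
    _, p_levene = stats.levene(a, b)
    equal_var = p_levene >= alpha  # keep the pooled test only if variances look equal
    t, p = stats.ttest_ind(a, b, equal_var=equal_var)
    return t, p, equal_var

# Hypothetical illustration data, NOT the mpg dataset.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=20, scale=2, size=100)
group_b = rng.normal(loc=20, scale=8, size=100)

t, p, equal_var = ttest_with_variance_check(group_a, group_b)
print(f"t:  {t:.4f}, p:  {p:.4f}, equal_var:  {equal_var}")
```

With the mpg data, the same helper could be called as `ttest_with_variance_check(mpg_1999.avg_mpg, mpg_2008.avg_mpg)`.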

* Are compact cars more fuel-efficient than the average car?

In [None]:
# Two-sample, one-tailed test: compact cars vs. all other cars
null_hypothesis = "Compact cars are just as fuel-efficient as the average car"
alt_hypothesis = "Compact cars are more fuel-efficient than average cars"

mpg_compact = mpg[mpg['class'] == "compact"]
mpg_non_compact = mpg[mpg['class'] != "compact"]

t, p = stats.ttest_ind(mpg_compact.avg_mpg, mpg_non_compact.avg_mpg)
print(f"t:  {t}, p:  {p}, a:  {alpha}")

In [18]:
if p/2 < alpha and t > 0:
    print("We reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")

t:  6.731177612837954, p:  1.3059121585018135e-10, a:  0.050000000000000044
We reject the null hypothesis


#### We reject the null hypothesis that compact cars are just as fuel-efficient as the average car.
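
The cell above compares compact cars with non-compact cars using `stats.ttest_ind`. Since the question asks about "the average car", an alternative reading is a one-sample test of the compact cars against the overall mean. The sketch below shows that approach with `stats.ttest_1samp`; the helper name `one_sample_greater` and the sample data are hypothetical, and the one-sided p-value is derived from the two-sided one in the same way as the `p/2 < alpha and t > 0` check.

```python
import numpy as np
from scipy import stats

def one_sample_greater(sample, popmean, alpha=0.05):
    """One-sample, one-tailed t-test: is the sample mean greater than popmean?
    (hypothetical helper)"""
    t, p_two_sided = stats.ttest_1samp(sample, popmean)
    # Convert the two-sided p-value into a one-sided ("greater") p-value.
    p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
    return t, p_one_sided, p_one_sided < alpha

# Hypothetical illustration data, NOT the mpg dataset.
rng = np.random.default_rng(0)
all_cars = rng.normal(loc=20, scale=5, size=200)
subgroup = rng.normal(loc=24, scale=3, size=40)

t, p, reject = one_sample_greater(subgroup, all_cars.mean())
print(f"t:  {t:.4f}, one-sided p:  {p:.4g}, reject:  {reject}")
```

With the mpg data, the equivalent call would be `one_sample_greater(mpg_compact.avg_mpg, mpg.avg_mpg.mean())`.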

* Do manual cars get better gas mileage than automatic cars?

In [30]:
# Two-sample, one-tailed test
null_hypothesis = "Manual cars have the same avg_mpg as automatic cars"
alt_hypothesis = "Manual cars have a higher avg_mpg than automatic cars"

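# Strip the parenthesized detail, e.g. "auto(l5)" -> "auto"; n is passed by keyword
# because newer pandas versions no longer accept it positionally.
mpg['simple_trans'] = mpg.trans.str.rsplit("(", n=1).str[0]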

mpg_manual = mpg[mpg.simple_trans == "manual"]
mpg_auto = mpg[mpg.simple_trans == "auto"]

t, p = stats.ttest_ind(mpg_manual.avg_mpg, mpg_auto.avg_mpg)
print(f"t:  {t}, p:  {p}, a:  {alpha}")

t:  4.593437735750014, p:  7.154374401145683e-06, a:  0.050000000000000044


In [31]:
if p/2 < alpha and t > 0:
    print("We reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")

We reject the null hypothesis


#### We reject the null hypothesis that manual cars have the same avg_mpg as automatic cars and move forward with the hypothesis that manual cars have a higher avg_mpg than automatic cars.