# Hypothesis Testing Exercise

By Lindy Castellaw, Jessica Gardin, David Schneemann, Meredith Wang

# 1 Has the network latency gone up since we switched internet service providers?

H0 = Network latency is equal or has increased.

Ha = Network latency has decreased.

True Positive

    small p-value -- < alpha (.001)
    reject 𝐻0

    (The data shows that latency did decrease)
    avg latency before switching internet service: 65 ms
    avg latency after switching internet service: 60 ms

False Positive

    small p-value
    reject 𝐻0
    
    avg latency before switching internet service: 65 ms
    avg latency after switching internet service: 60 ms

    the original test was taken while internet activity was unusually high, which inflated the "before" ping time

True Negative

    higher p-value
    fail to reject 𝐻0

   
    avg latency before switching internet service: 65 ms
    avg latency after switching internet service: 60 ms

False Negative

    higher p-value
    fail to reject 𝐻0
    
    the re-test was taken while internet activity was unusually high, which inflated the "after" ping time and hid a real decrease
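
The four labels above combine two things: the decision the test makes (reject or fail to reject H0) and whether the effect is actually real, which in practice we rarely know. A minimal sketch of that decision table follows. The function name `classify_outcome` and its `effect_is_real` argument are assumptions made for illustration; they are not part of the exercise.

```python
def classify_outcome(p_value, effect_is_real, alpha=0.001):
    """Label a test result, given a ground truth we normally cannot observe."""
    rejected = p_value < alpha  # reject H0 only when p falls below alpha
    if rejected and effect_is_real:
        return "True Positive"
    if rejected and not effect_is_real:
        return "False Positive"  # Type I error
    if not rejected and effect_is_real:
        return "False Negative"  # Type II error
    return "True Negative"


# Same small p-value, different ground truth
print(classify_outcome(0.0002, effect_is_real=True))
print(classify_outcome(0.0002, effect_is_real=False))
```

The p-value alone never distinguishes a true positive from a false positive; only knowledge about how the data was collected (for example, the confounds described above) can do that.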



# 2 Is the website redesign any good?

H0 = Website has same or fewer visits.

Ha = Website has more visits.

True Positive

    small p-value -- < alpha (.001)
    reject 𝐻0

    (The data shows that redesign did improve the website)
    avg visits before redesign: 1,000
    avg visits after redesign: 5,000
    
    website design is genuinely better

False Positive

    small p-value
    reject 𝐻0

    avg visits before redesign: 1,000
    avg visits after redesign: 5,000
    
    Second test was done on Cyber Monday

True Negative

    higher p-value
    fail to reject 𝐻0

    (The data shows that the website did worse after redesign)
    avg visits before redesign: 5,000
    avg visits after redesign: 1,000
    
    website is genuinely worse

False Negative

    higher p-value
    fail to reject 𝐻0
    
    (The data shows that the website did worse after redesign)
    avg visits before redesign: 5,000
    avg visits after redesign: 1,000

    website crashed during re-test

# 3 Is our television ad driving more sales?

H0 = Average monthly sales stayed the same or decreased after airing the ad.

Ha = Average monthly sales increased after airing the ad.

True Positive

    small p-value -- < alpha (.001)
    reject 𝐻0

    (The data shows that sales increased after airing ad)
    avg monthly sales before ad: 10,000
    avg monthly sales after ad: 15,000

False Positive

    small p-value
    reject 𝐻0

    (The data shows that sales increased after airing ad)
    avg monthly sales before ad: 10,000
    avg monthly sales after ad: 15,000
    
    a competitor across the street was closed during the second measurement period

True Negative

    higher p-value
    fail to reject 𝐻0

    (The data shows that sales decreased after airing ad)
    avg monthly sales before ad: 15,000
    avg monthly sales after ad: 10,000

False Negative

    higher p-value
    fail to reject 𝐻0
    
    (The data shows that sales decreased after airing ad)
    avg monthly sales before ad: 15,000
    avg monthly sales after ad: 10,000

    extreme weather caused lower customer traffic 
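
The false positives above are explained by confounds, but a test can also reject a true H0 purely by chance: at significance level alpha, that happens in roughly an alpha share of repeated experiments. The simulation below is a rough sketch of this, not part of the exercise. The sales figures, sample size, seed, and `alpha = 0.05` are all assumed values, and `false_positive_rate` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def false_positive_rate(n_trials=2000, n_obs=30, alpha=0.05, seed=0):
    """Share of experiments that reject H0 even though H0 is true."""
    rng = np.random.default_rng(seed)
    false_positives = 0
    for _ in range(n_trials):
        # Both samples share one distribution, so H0 holds by construction
        before = rng.normal(loc=10_000, scale=2_000, size=n_obs)
        after = rng.normal(loc=10_000, scale=2_000, size=n_obs)
        # One-sided test of Ha: mean(after) > mean(before)
        _, p = stats.ttest_ind(after, before, alternative="greater")
        if p < alpha:
            false_positives += 1
    return false_positives / n_trials


rate = false_positive_rate()
print(f"share of false positives: {rate:.3f}")
```

The printed share should land near the chosen alpha, which is why a smaller alpha (such as the .001 used above) makes false positives rarer at the cost of more false negatives.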

In [5]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# T-Test Exercises

Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices. A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. Use a .05 level of significance.

In [14]:
# H0: the mean time to sell a home is the same for office 1 and office 2 (=)
# Ha: the mean time to sell a home differs between office 1 and office 2 (!=)

mean1 = 90
std1 = 15
nobs1 = 40
mean2 = 100
std2 = 20
nobs2 = 50
alpha = 0.05

In [15]:
t,p = stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=True, alternative='two-sided')

In [16]:
t, p, alpha

(-2.6252287036468456, 0.01020985244923939, 0.05)

In [19]:
if p < alpha:
    print("We reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")

We reject the null hypothesis
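
As a cross-check, the same statistic can be computed by hand from the summary numbers. The sketch below assumes the pooled-variance (equal variance) form of the two-sample t-test, which is what `equal_var=True` selects; the names `pooled_var`, `std_err`, `t_manual`, and `p_manual` are introduced here for illustration.

```python
import math

from scipy import stats

mean1, std1, nobs1 = 90, 15, 40
mean2, std2, nobs2 = 100, 20, 50

# Pooled variance: the two sample variances weighted by their degrees of freedom
dof = nobs1 + nobs2 - 2
pooled_var = ((nobs1 - 1) * std1**2 + (nobs2 - 1) * std2**2) / dof

# Standard error of the difference between the two sample means
std_err = math.sqrt(pooled_var * (1 / nobs1 + 1 / nobs2))

t_manual = (mean1 - mean2) / std_err
# Two-sided p-value: probability mass in both tails beyond |t|
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

print(t_manual, p_manual)
```

These values should agree with the `t` and `p` returned by `stats.ttest_ind_from_stats` above, up to floating-point rounding.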


#### Load the mpg dataset and use it to answer the following questions:

  * Is there a difference in fuel-efficiency in cars from 2008 vs 1999?
  * Are compact cars more fuel-efficient than the average car?
  * Do manual cars get better gas mileage than automatic cars?

In [2]:
from pydataset import data

In [3]:
data('mpg', show_doc=True) # view the documentation for the dataset
mpg = data('mpg') # load the dataset and store it in a variable

mpg

PyDataset Documentation (adopted from R Documentation. The displayed examples are in R)

## Fuel economy data from 1999 and 2008 for 38 popular models of car

### Description

This dataset contains a subset of the fuel economy data that the EPA makes
available on http://fueleconomy.gov. It contains only models which had a new
release every year between 1999 and 2008 - this was used as a proxy for the
popularity of the car.

### Usage

    data(mpg)

### Format

A data frame with 234 rows and 11 variables

### Details

  * manufacturer. 

  * model. 

  * displ. engine displacement, in litres 

  * year. 

  * cyl. number of cylinders 

  * trans. type of transmission 

  * drv. f = front-wheel drive, r = rear wheel drive, 4 = 4wd 

  * cty. city miles per gallon 

  * hwy. highway miles per gallon 

  * fl. 

  * class. 




In [4]:
mpg.head()

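| | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |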


In [25]:
mpg['average_mileage'] = (mpg.cty + mpg.hwy)/2
mpg.head()

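| | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | average_mileage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact | 23.5 |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact | 25.0 |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact | 25.5 |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact | 25.5 |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact | 21.0 |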


In [36]:
# year is an integer column, so compare against ints; the string '2008' matches no rows
sample_2008 = mpg[mpg['year'] == 2008].average_mileage
sample_1999 = mpg[mpg['year'] == 1999].average_mileage

In [38]:
sample_1999

In [37]:
sample_2008.var(), sample_1999.var()

In [29]:
# Welch's t-test: does not assume the two years have equal variance
t, p = stats.ttest_ind(sample_2008, sample_1999, equal_var=False)

In [30]:
t, p, alpha
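
The remaining two questions could be approached along the following lines. This is a sketch under stated assumptions, not the authors' solution: the helper names `compact_vs_overall` and `manual_vs_automatic` are hypothetical, both tests are framed as one-sided, and transmission type is inferred from the prefix of the `trans` string. To keep the block runnable on its own, it uses only the five rows shown by `mpg.head()`, which is far too few to draw any conclusion.

```python
import pandas as pd
from scipy import stats

# The five rows shown by mpg.head(), reduced to the columns used below.
# Illustration only: pass the full mpg DataFrame for a real analysis.
head_rows = pd.DataFrame({
    "trans": ["auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto(l5)"],
    "class": ["compact"] * 5,
    "average_mileage": [23.5, 25.0, 25.5, 25.5, 21.0],
})


def compact_vs_overall(df):
    """One-sample, one-sided t-test. Ha: compact cars beat the overall mean."""
    compact = df[df["class"] == "compact"].average_mileage
    overall_mean = df.average_mileage.mean()
    return stats.ttest_1samp(compact, overall_mean, alternative="greater")


def manual_vs_automatic(df):
    """Two-sample, one-sided Welch t-test. Ha: manual cars beat automatic cars."""
    is_manual = df["trans"].str.startswith("manual")
    manual = df[is_manual].average_mileage
    automatic = df[~is_manual].average_mileage
    return stats.ttest_ind(manual, automatic, equal_var=False, alternative="greater")


t_compact, p_compact = compact_vs_overall(head_rows)
t_trans, p_trans = manual_vs_automatic(head_rows)
print(t_trans, p_trans)
```

Because all five sample rows are compact cars, the first test is uninformative here; it only becomes meaningful on the full dataset, where compact cars are one class among several. In both cases the result would then be compared against `alpha` in the same way as in the Ace Realty exercise.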