The null hypothesis is the "status quo" and usually reflects no change or no difference, while the alternative hypothesis states that there is a difference or change.

Some examples:

H0: There is no difference in height between right-handed and left-handed people.

Ha: There is a difference in height between right-handed and left-handed people.

H0: The amount of sleep a student gets the night before an exam has no effect on the student's exam score.

Ha: Less sleep the night before an exam leads to a lower exam score.
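As a rough sketch of how a hypothesis pair like the height example becomes a test, the snippet below draws two synthetic samples and runs a two-sample t-test. The data are simulated rather than real measurements, and the names `right_handed`, `left_handed`, and the 0.05 threshold are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic heights in cm (assumption: not real data). Both groups come from
# the same distribution, so H0 is true by construction.
right_handed = rng.normal(loc=170, scale=10, size=200)
left_handed = rng.normal(loc=170, scale=10, size=200)

alpha = 0.05  # assumed significance level

# Two-tailed test because Ha only says "there is a difference"
t, p = stats.ttest_ind(right_handed, left_handed)

decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"t = {t:.3f}, p = {p:.3f}: {decision}")
```

The decision is always phrased in terms of H0: we either reject it or fail to reject it, and never "accept" the alternative outright.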

## Reformulate to more meaningful/clear questions that are testable.
    
   Has the network latency gone up since we switched internet service providers?
        
        What was the average ping during peak hours with our old ISP compared to our new ISP?
        
        Null Hypothesis: There is no change in network latency since changing ISP.
        Alternative hypothesis: There is a change in network latency since changing ISP.
        Type I error (False Positive): We conclude that network latency has changed (in fact, latency is the same)
        Type II error (False Negative): No, network latency is the same (in fact, latency has increased or decreased)
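The two error types for the latency question can be illustrated with a small simulation. This is only a sketch: the ping values (mean 50 ms, standard deviation 10 ms, 30 measurements per ISP) and the 5 ms shift are made-up assumptions, and `rejection_rate` is a hypothetical helper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_trials = 2000

def rejection_rate(true_shift_ms):
    """Fraction of simulated experiments in which H0 (no latency change) is rejected."""
    rejections = 0
    for _ in range(n_trials):
        # Hypothetical peak-hour pings in ms for the old and new ISP
        old_isp = rng.normal(loc=50, scale=10, size=30)
        new_isp = rng.normal(loc=50 + true_shift_ms, scale=10, size=30)
        _, p = stats.ttest_ind(old_isp, new_isp)
        rejections += int(p < alpha)
    return rejections / n_trials

# H0 is true (no shift): every rejection is a false positive
type_1_rate = rejection_rate(0)
# H0 is false (5 ms shift): every non-rejection is a false negative
type_2_rate = 1 - rejection_rate(5)

print(f"Type I rate: {type_1_rate:.3f}, Type II rate: {type_2_rate:.3f}")
```

When H0 is true, the Type I rate should land close to alpha. The Type II rate depends on the size of the real shift and on the sample size.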

    Is the website redesign any good?
        What was the average website traffic before the website redesign? 
        Average time on the site? 
        Average number of items sitting in cart and abandoned? How does that compare to the new site?
            
       Null Hypothesis: The average daily website traffic has remained unchanged since the redesign.
       Alternative Hypothesis: The average daily website traffic has changed since the redesign.
       Type I error(False Positive):
       Type II error(False Negative):
    
    Is our television ad driving more sales?
        Have sales increased or decreased compared to season adjusted average sales before the ad?
        
        Null Hypothesis: The sales are exactly the same as they were before the ad.
        Alternative Hypothesis: Sales have increased since the television ad.
        Type I error(False Positive):
        Type II error(False Negative):



In [1]:
import numpy as np
import seaborn as sns
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import env

In [2]:
# Ace Realty wants to determine whether the average time it takes to sell homes is different for its two 
#offices. A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. 
#A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. 
#Use a .05 level of significance.

#sample 1
alpha = 0.05

mean1 = 90
std1 = 15
sample1 = 40

#sample 2

mean2 = 100
std2 = 20
sample2 = 50

t, p = stats.ttest_ind_from_stats(mean1, std1, sample1, mean2, std2, sample2, equal_var = False)
t,p

(-2.7091418459143854, 0.00811206270346016)
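To make the result above less of a black box, here is a sketch that recomputes it by hand, assuming the standard Welch formulas (unequal variances, Welch-Satterthwaite degrees of freedom), which is what `equal_var = False` selects. The variable names `v1`, `v2`, `t_manual`, and `p_manual` are introduced here for illustration.

```python
import math
from scipy import stats

mean1, std1, n1 = 90, 15, 40    # office #1
mean2, std2, n2 = 100, 20, 50   # office #2

# Variance of each sample mean
v1 = std1**2 / n1
v2 = std2**2 / n2

# Welch t statistic: difference in means over the combined standard error
t_manual = (mean1 - mean2) / math.sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom
dof = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Two-tailed p-value from the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

t_scipy, p_scipy = stats.ttest_ind_from_stats(
    mean1, std1, n1, mean2, std2, n2, equal_var=False
)
print(t_manual, p_manual)
```

Because `p_manual` is below 0.05, the null hypothesis of equal average selling times would be rejected at that level.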

In [3]:
from pydataset import data

In [4]:
mpg = data('mpg')



Load the mpg dataset and use it to answer the following questions:

    Is there a difference in fuel-efficiency in cars from 2008 vs 1999?
    Are compact cars more fuel-efficient than the average car?
    Do manual cars get better gas mileage than automatic cars?



In [5]:
mpg.head()

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact


In [6]:
# Is there a difference in fuel-efficiency in cars from 2008 vs 1999?
    # can use harmonic mean bc calculating the rate of something(mpg)
    # axis = 1 (row-wise)

mpg['afe'] = stats.hmean(mpg[['cty', 'hwy']], axis = 1) #harmonic mean is appropriate here
mpg['arithmetic_afe'] = (mpg.cty + mpg.hwy) / 2
mpg.head()

#could've calculated the arithmetic mean by using: df['avg_mpg'] = (df.cty + df.hwy)/2

Unnamed: 0,manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class,afe,arithmetic_afe
1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact,22.212766,23.5
2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact,24.36,25.0
3,audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact,24.313725,25.5
4,audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact,24.705882,25.5
5,audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact,19.809524,21.0
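The first row of the table can be checked by hand. The sketch below assumes equal city and highway distance (one mile each), which is the situation where the harmonic mean of two mpg figures equals total miles divided by total gallons. The variable names are illustrative.

```python
from scipy import stats

cty, hwy = 18, 29  # first row of the mpg table (audi a4, 1999)

# Harmonic mean of two values: 2ab / (a + b)
harmonic = 2 * cty * hwy / (cty + hwy)

# Arithmetic mean for comparison
arithmetic = (cty + hwy) / 2

# Same quantity from first principles: drive 1 mile in the city and 1 mile
# on the highway, then divide total miles by total gallons used
gallons_used = 1 / cty + 1 / hwy
combined_mpg = 2 / gallons_used

print(round(harmonic, 6), arithmetic, round(combined_mpg, 6))
```

The harmonic mean is always at or below the arithmetic mean, which is why `afe` is smaller than `arithmetic_afe` in every row shown.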


In [7]:
fuel_economy_99 = mpg[mpg.year == 1999].afe
fuel_economy_08= mpg[mpg.year == 2008].afe

In [8]:
t,p = stats.ttest_ind(fuel_economy_99, fuel_economy_08) #ttest_ind bc comparing the means of two independent samples

t,p #is there a difference? (no directionality); use the two-tailed p as-is

print(f'p ({p:.3f}) is > alpha (0.05)') #bc p is 0.764, which is higher than 0.05, fail to reject null (no evidence of a difference)

p (0.764) is > alpha (0.05)


In [9]:
# Are compact cars more fuel-efficient than the average car? one-sample test; asking "more than" implies directionality, so one-tailed test

compact_fuel_economy = mpg[mpg['class'] == 'compact'].afe #class is a reserved word in Python, so bracket notation is needed instead of mpg.class
μ = mpg.afe.mean()

t,p = stats.ttest_1samp(compact_fuel_economy, μ) #more fuel efficient (directionality); use p/2 and check the sign of t
t,p
print(f' p/2({p/2:.9f}) < alpha (0.05)') #one-tailed: reject null when t > 0 and p/2 is less than alpha

 p/2(0.000000001) < alpha (0.05)
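Instead of halving p by hand, recent SciPy versions (1.6 and later, as far as I know) accept an `alternative` argument on `ttest_1samp`. The sketch below uses synthetic data rather than the mpg dataset, so the numbers are not the ones above. `sample` and `popmean` are hypothetical stand-ins for the compact cars and the overall mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-in for a subgroup's fuel economy (assumption: not the mpg data)
sample = rng.normal(loc=24, scale=3, size=47)
popmean = 20  # hypothetical overall average

# Two-tailed test (the default)
t_two, p_two = stats.ttest_1samp(sample, popmean)

# One-tailed test: Ha is "sample mean is greater than popmean"
t_one, p_one = stats.ttest_1samp(sample, popmean, alternative='greater')

print(f"two-tailed p = {p_two:.3g}, one-tailed p = {p_one:.3g}")
```

When t is positive, the one-tailed p equals half of the two-tailed p. Passing `alternative` also handles the case where t has the wrong sign, which manual halving silently gets wrong.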


In [10]:
# Do manual cars get better gas mileage than automatic cars? two-sample test with directionality, so one-tailed test

auto_fuel_economy = mpg[mpg.trans.str.startswith('a')].afe
manual_fuel_economy = mpg[mpg.trans.str.startswith('m')].afe

t,p = stats.ttest_ind(manual_fuel_economy, auto_fuel_economy) #better gas mileage (directionality); use p/2 and check the sign of t
t,p
print(f' p/2({p/2:.7f}) < alpha (0.05)') #one-tailed: reject null when t > 0 and p/2 is less than alpha

 p/2(0.0000028) < alpha (0.05)
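The p/2 shortcut is only valid when the t statistic points in the direction of the alternative hypothesis. A minimal sketch of a helper that handles both cases is shown below. `one_tailed_p` is a hypothetical function, and the samples are synthetic rather than taken from the mpg dataset.

```python
import numpy as np
from scipy import stats

def one_tailed_p(t, p_two_sided, direction="greater"):
    """Convert a two-sided p-value into a one-sided one using the sign of t."""
    if direction not in ("greater", "less"):
        raise ValueError("direction must be 'greater' or 'less'")
    # Does the observed effect point the way the alternative hypothesis claims?
    agrees = t > 0 if direction == "greater" else t < 0
    return p_two_sided / 2 if agrees else 1 - p_two_sided / 2

rng = np.random.default_rng(7)

# Synthetic fuel economy values (assumption: illustrative only)
manual = rng.normal(loc=22, scale=4, size=80)
auto = rng.normal(loc=19, scale=4, size=150)

t, p = stats.ttest_ind(manual, auto)
p_greater = one_tailed_p(t, p, "greater")  # Ha: manual > auto
p_less = one_tailed_p(t, p, "less")        # Ha: manual < auto

print(f"t = {t:.3f}, p_greater = {p_greater:.3g}, p_less = {p_less:.3g}")
```

The two one-sided p-values always sum to 1, so a very small `p_greater` implies a `p_less` close to 1.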


---

In [17]:
#import os
#os.getcwd()