# Hypothesis Testing

In [1]:

import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats



plt.rc('figure', figsize=(16, 9))
plt.rc('axes.spines', top=False, right=False)
plt.rc('patch', ec='black', force_edgecolor=True)
plt.rc('font', size=13)

np.random.seed(13)




## Simulation Based Approach

- Example: Codeup starts producing a scratch-off lottery ticket called "Golden Codys". The odds of winning a prize must be advertised to the state, and since we are manufacturing the tickets, we know exactly how many winners and losers there are.
   - Theory: our population is the 1,000 tickets we have manufactured.
   - Implementation: create a boolean pandas Series.
       - We model a win as `True` and a loss as `False`.
       - Because `True` counts as 1 and `False` as 0, the mean of the Series is the proportion of winning tickets, i.e. the probability of winning.



In [None]:
tickets = pd.Series([True] * 250 + [False] * 750)
tickets.value_counts().set_axis(['Lose', 'Win']).plot.bar()
plt.xticks(rotation=0)
print('The actual probability of winning is %.2f' % tickets.mean())

## Scenario: Each student in a classroom is given a Golden Codys scratch-off ticket at random
## Theory

When we take a random sample and calculate the win rate, we shouldn't expect to see exactly 25%, even though we know that is the overall rate.

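Random sampling only protects us from chance variation; a sample can also be biased. Common sources of sampling and response bias:

   - non-response: people who don't respond may differ systematically from those who do
   - undercoverage: some groups are under-represented in the sample
   - voluntary response: people who volunteer for surveys may differ from those who don't
   - leading questions: the wording of a question pushes respondents toward an answer
   - social desirability: respondents give the answer they think is socially acceptable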
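A quick way to see why the sample must be random: because our ticket Series is ordered (all winners first), a convenience sample of the first 25 tickets gives the losing tickets no chance of being selected, an extreme form of undercoverage. A minimal sketch comparing it with a simple random sample, assuming the same 250/750 population as above:

```python
import pandas as pd

# Same population as above: 250 winners followed by 750 losers
tickets = pd.Series([True] * 250 + [False] * 750)

# Convenience sample: the first 25 tickets off the production line.
# Because the Series is ordered, every one of them is a winner.
convenience_win_rate = tickets.head(25).mean()

# Simple random sample: every ticket has the same chance of selection
random_win_rate = tickets.sample(25, random_state=13).mean()

print(convenience_win_rate)  # 1.0
print(random_win_rate)
```

The convenience sample reports a 100% win rate no matter how many times we repeat it; the random sample varies around 25%, which is the behavior the rest of this section relies on.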
   
- Implementation: Use the .sample method on our series to randomly choose 25 tickets, representing 25 students in a classroom. Use .mean to calculate what percentage of the values are True, or are winning tickets.

In [None]:
n = 25
tickets.sample(n).mean()

- Scenario: each student in 5 different classrooms is given a Golden Codys scratch-off ticket, and we want to calculate the win rate for each classroom.
- Theory: we don't expect any classroom to match the 25% win rate exactly; instead we will see slightly different numbers for each classroom.
- Implementation: loop 5 times, each time drawing a sample and calculating the win rate. Each of the 5 numbers produced is the calculated win rate for one classroom of 25 students.

In [None]:
[tickets.sample(n).mean() for _ in range(5)]

- Theory

    The win rates we see are never exactly 25% (with 25 students, the win rate moves in steps of 4%), but we would expect them to fall closer to 25% than further away. That is, it is more likely that a classroom will come up with a win rate of 24% than a win rate of 12%.

    If we were to repeat this experiment a whole bunch of times, say 10,000, we would find that the calculated win rates are approximately normally distributed.

    That is, we would expect the calculated win rate for a classroom to approximately follow a normal distribution, even though the underlying random process (each ticket is either a win or a loss) is not normally distributed.

- Implementation: To visualize this, we'll create a series full of calculated winrates for 10,000 classrooms, each classroom made up of 25 students. That is, we will simulate 10,000 classrooms of 25 students getting Golden Codys and calculate the average winrate for each of the 10,000 classrooms.

In [None]:
calculated_winrates = pd.Series([tickets.sample(n).mean() for _ in range(10_000)])
calculated_winrates.plot.hist(bins=15)



- Theory

    The histogram above represents the sampling distribution of the win rate of Golden Codys for a class of 25.

    The Central Limit Theorem says that the sampling distribution of the sample mean will be approximately normal for a large enough sample size, even if the underlying random variable isn't.




Mini Exercise: CLT

   - Calculate, theoretically (in your head is fine), the expected value of the average of 6 dice rolls.
   - Write the code necessary to simulate taking the average of 6 dice rolls 10,000 times. Store each of the 10,000 averages in a list.
   - Plot the distribution of the calculated averages. What do you notice about its shape? What kind of distribution does the underlying random variable (i.e. a single die roll) follow?
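One possible approach to the mini exercise, for checking your work. It assumes NumPy's `default_rng` generator (any seeded random source would do) and compares the simulation with theory: the expected value of one fair die roll is (1 + 2 + ... + 6) / 6 = 3.5, so the average of 6 rolls also has expected value 3.5, and its standard error is the single-roll standard deviation divided by √6.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(13)

# Each row is one experiment: 6 rolls of a fair six-sided die (integers 1..6)
rolls = rng.integers(1, 7, size=(10_000, 6))

# Average the 6 rolls in each experiment -> 10,000 sample means
dice_averages = pd.Series(rolls.mean(axis=1))

# Population standard deviation of a single fair die roll, about 1.708
one_roll_sd = np.std(np.arange(1, 7))
theoretical_se = one_roll_sd / np.sqrt(6)

print(dice_averages.mean())                  # close to 3.5
print(dice_averages.std(), theoretical_se)   # both close to 0.70
```

Plotting `dice_averages.plot.hist(bins=30)` shows a roughly bell-shaped distribution, even though a single roll is uniformly distributed over 1 to 6.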

   - Theory: in practice, we won't have 10,000 samples to work with, but rather just 1. We can still rely on the sampling distribution being approximately normal, and this has some interesting properties:

   - the mean of the sampling distribution of the sample means will be the "true" mean
   - the variance of the sampling distribution is the population variance divided by the sample size
   - the standard deviation of the sampling distribution, aka the standard error, is $\sqrt{pq/n}$ for our example (where $q = 1 - p$)
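Plugging in the Golden Codys numbers, where a single ticket is a win with probability p = 0.25 (so its variance is pq = p(1 − p)) and a classroom has n = 25 students:

$$
SE = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.25 \times 0.75}{25}} = \sqrt{0.0075} \approx 0.0866
$$

This ignores the finite-population correction for drawing 25 of 1,000 tickets without replacement, which multiplies the standard error by $\sqrt{975/999} \approx 0.988$, so the simulated standard deviation may come out slightly smaller than the formula.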

In [None]:
se = math.sqrt((.25 * .75) / n)

Let's compare our calculated standard error (i.e. the standard deviation of the sample means) to the experimental value:

In [None]:
print('SE (formula):          %.4f' % se)
print('SE (simulated, std()): %.4f' % calculated_winrates.std())

- The sampling distribution being approximately normal is very powerful and has important implications:

    - confidence intervals for population parameter estimates (e.g. estimating the population mean)

       - as n goes up, the interval becomes narrower
       - as the confidence level goes up, the interval becomes wider
       - with a 99% confidence level, about 99% of calculated CIs will contain the true mean (i.e. if we re-sampled and recalculated the CI many times, about 99% of those intervals would contain the true mean)
    
    - hypothesis testing

       - One classroom calculates its win rate at 20% and claims the tickets were distributed unfairly. What is the probability of observing a 20% (or smaller) win rate just due to chance?


In [None]:
mean = calculated_winrates.mean()
sd = calculated_winrates.std()

stats.norm(mean, sd).cdf(.2)

The probability we've calculated is a p-value: the probability of observing data at least as extreme as ours if the null hypothesis is true.
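The cell above estimates the p-value from 10,000 simulated classrooms. With only one real classroom we would instead use the theoretical standard error, and because a classroom holds only 25 tickets we can also count the probability exactly. A minimal sketch comparing the two; it uses `scipy.stats.hypergeom`, which models drawing 25 tickets without replacement from 1,000 tickets of which 250 are winners:

```python
import math
from scipy import stats

p_win = 0.25      # advertised win rate under the null hypothesis
class_size = 25   # tickets in one classroom
observed = 0.20   # the classroom's observed win rate (5 winners out of 25)

# Normal approximation using the theoretical standard error sqrt(pq / n)
se = math.sqrt(p_win * (1 - p_win) / class_size)
p_normal = stats.norm(p_win, se).cdf(observed)

# Exact probability of 5 or fewer winners:
# hypergeom(M=total tickets, n=winning tickets, N=tickets drawn)
winners = round(observed * class_size)
p_exact = stats.hypergeom(1000, 250, class_size).cdf(winners)

print(f'normal approximation:   {p_normal:.3f}')
print(f'exact (hypergeometric): {p_exact:.3f}')
```

The two disagree noticeably because a win rate from 25 tickets can only move in 4% steps, which the continuous normal curve smooths over. Both are far above 0.05, so a 20% win rate is not strong evidence that the tickets were distributed unfairly.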

## Null and Alternative Hypothesis

- H0: status quo: no difference or no effect
- Ha: alternative: there is a difference or effect
- The form of Ha (≠, <, or >) determines whether the test is two-tailed or one-tailed:
  - < or > == one-tailed
  - ≠ == two-tailed
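To see how the choice of Ha changes the p-value, here is a minimal sketch using a hypothetical t statistic of 1.8 with 40 degrees of freedom (both values are made up for illustration):

```python
from scipy import stats

t_stat = 1.8   # hypothetical t statistic
degf = 40      # hypothetical degrees of freedom
dist = stats.t(degf)

# Two-tailed (Ha: mu1 != mu2): area in both tails beyond |t|
p_two = dist.sf(abs(t_stat)) * 2

# One-tailed (Ha: mu1 > mu2): area in the upper tail only
p_greater = dist.sf(t_stat)

# One-tailed (Ha: mu1 < mu2): area in the lower tail only
p_less = dist.cdf(t_stat)

print(round(p_two, 4), round(p_greater, 4), round(p_less, 4))
```

With these made-up numbers the one-tailed p-value falls below 0.05 while the two-tailed one does not, which is why Ha should be chosen before looking at the data. SciPy 1.6 and later also accept `alternative='two-sided'`, `'less'`, or `'greater'` in `stats.ttest_ind`, which returns the matching p-value directly instead of halving it by hand.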
  
### Hypothesis Testing Results

![image.png](attachment:image.png)

# Null Hypothesis Questions

## 1. Has the network latency gone up since we switched internet service providers?
  

In [None]:
# null hypothesis
"""
Mean network latency with the new internet service provider is no higher than with the old provider.
"""
# alternative hypothesis
"""
Mean network latency with the new internet service provider is higher than with the old provider.
"""
# true positive (REJECT the null, and the null is FALSE)
"""
When latency was tested, the p-value was lower than our significance level (0.05), so we rejected the null
hypothesis, and latency really has gone up since we switched providers.
"""
# true negative (FAIL TO REJECT the null, and the null is TRUE)
"""
When latency was tested, the p-value was greater than our significance level (0.05), so we failed to reject
the null hypothesis, and latency really has not gone up since we switched providers.
"""
# type I error (REJECT the null hypothesis, but, in reality, the null hypothesis is TRUE)
"""
When latency was tested, the p-value was lower than our significance level (0.05), so we rejected the null
hypothesis, but in reality latency has not gone up; the low p-value was due to chance.
"""
# type II error (FAIL TO REJECT the null hypothesis when it is actually FALSE)
"""
When latency was tested, the p-value was greater than our significance level (0.05), so we failed to reject
the null hypothesis, but in reality latency has gone up and our test missed it.
"""

## 2. Is the website redesign any good?
   

In [None]:
# null hypothesis
"""
The new design performs no better than the old design on our chosen success metric
(e.g. conversion rate or time on site).
"""
# alternative hypothesis
"""
The new design performs better than the old design on our chosen success metric.
"""
# true positive (REJECT the null, and the null is FALSE)
"""
When the designs were compared, the p-value was lower than our significance level (0.05), so we rejected the null
hypothesis, and the redesign really is an improvement.
"""
# true negative (FAIL TO REJECT the null, and the null is TRUE)
"""
When the designs were compared, the p-value was greater than our significance level (0.05), so we failed to reject
the null hypothesis, and the redesign really is no better than the old design.
"""
# type I error (REJECT the null, but the null is TRUE)
"""
When the designs were compared, the p-value was lower than our significance level (0.05), so we rejected the null
hypothesis, but in reality the redesign is no better; the apparent improvement was due to chance.
"""
# type II error (FAIL TO REJECT the null, but the null is FALSE)
"""
When the designs were compared, the p-value was greater than our significance level (0.05), so we failed to reject
the null hypothesis, but in reality the redesign is an improvement that our test did not detect.
"""

## 3.   Is our television ad driving more sales?

In [None]:
# null hypothesis
"""
Sales while the ad is running are no higher than sales without the ad.
"""
# alternative hypothesis
"""
Sales while the ad is running are higher than sales without the ad.
"""
# true positive (REJECT the null, and the null is FALSE)
"""
When sales were examined, the p-value was lower than our significance level (0.05), so we rejected the null
hypothesis, and the ad really is driving more sales.
"""
# true negative (FAIL TO REJECT the null, and the null is TRUE)
"""
When sales were examined, the p-value was greater than our significance level (0.05), so we failed to reject
the null hypothesis, and the ad really is not driving more sales.
"""
# type I error (REJECT the null, but the null is TRUE)
"""
When sales were examined, the p-value was lower than our significance level (0.05), so we rejected the null
hypothesis, but in reality the ad is not driving more sales; the apparent increase was due to chance.
"""
# type II error (FAIL TO REJECT the null, but the null is FALSE)
"""
When sales were examined, the p-value was greater than our significance level (0.05), so we failed to reject
the null hypothesis, but in reality the ad is driving more sales and our test missed it.
"""

# T-Test Problems

In [2]:
from math import sqrt
from scipy import stats

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pydataset import data

## 1. Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices.
 - A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. 
 
 - A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. Use a .05 level of significance.

In [None]:
# x1 = office #1
# x2 = office #2

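# Mean days to sell
xbar1 = 90
xbar2 = 100

# Number of sales sampled
n1 = 40
n2 = 50

# Standard deviation of days to sell
s1 = 15
s2 = 20

degf = n1 + n2 - 2

# Pooled standard deviation (assumes both offices share the same population variance)
s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
s_p

# Unpooled (Welch) standard error, for comparison only; the pooled test below doesn't use it
se_welch = sqrt(s1**2 / n1 + s2**2 / n2)

t = (xbar1 - xbar2) / (s_p * sqrt(1/n1 + 1/n2))
t

# Two-tailed p-value: use |t|, since sf() of a negative t is > 0.5 and would give p > 1
p = stats.t(degf).sf(abs(t)) * 2

print(f't = {t:.5f}')
print(f'p = {p:.5f}')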


In [None]:
"""
P value is greater than 0.05 so we fail to reject the null:

No statistically significant difference between offices
"""

## 2. Load the mpg dataset and use it to answer the following questions:

   - Is there a difference in fuel-efficiency in cars from 2008 vs 1999?
   - Are compact cars more fuel-efficient than the average car?
   - Do manual cars get better gas mileage than automatic cars?

In [None]:
mpg = data('mpg')
# Add the column to mpg itself, since every subset below is filtered from mpg
mpg['average_mileage'] = (mpg.cty + mpg.hwy) / 2

In [None]:
bool_series = mpg.year == 2008
oh_eight = mpg[bool_series]

In [None]:
bool_series = mpg.year == 1999
nine_nine = mpg[bool_series]

# Is there a difference in fuel-efficiency in cars from 2008 vs 1999?

In [None]:
# Plot Distribution, Establish Hypotheses, Set Significance, Verify Assumptions, Compute T-Test, Decide

### Plot Distributions

In [None]:
oh_eight['average_mileage'].hist()

In [None]:
nine_nine['average_mileage'].hist()

### Establish Hypotheses
$H_{0}$: Mean fuel efficiency of cars in 2008 = Mean fuel efficiency of cars in 1999

$H_{a}$: Mean fuel efficiency of cars in 2008 $\neq$ Mean fuel efficiency of cars in 1999

### Set Alpha

In [None]:
alpha = .05

###  Verify Assumptions

In [None]:
# of Observations
print(oh_eight.average_mileage.shape)
print(nine_nine.average_mileage.shape)

In [None]:
# Variance (2 Sample T-Test)
print(oh_eight.average_mileage.var())
print(nine_nine.average_mileage.var())
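The variance printout above is an informal check of the equal-variance assumption behind the pooled `ttest_ind`. A more formal option is Levene's test (`stats.levene`). A minimal sketch on hypothetical normally distributed data standing in for the two mileage columns:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical stand-ins for oh_eight.average_mileage and nine_nine.average_mileage
mileage_2008 = rng.normal(20, 5, size=100)
mileage_1999 = rng.normal(20, 5, size=100)

# Levene's test: H0 is that the two groups have equal population variances
stat, p_levene = stats.levene(mileage_2008, mileage_1999)

# No evidence of unequal variances -> pooled test; otherwise use Welch's t-test
equal_var = bool(p_levene > 0.05)
t, p = stats.ttest_ind(mileage_2008, mileage_1999, equal_var=equal_var)
print(f'Levene p = {p_levene:.3f}, equal_var = {equal_var}, t = {t:.3f}, p = {p:.3f}')
```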

### Compute Test Statistic

In [None]:
t, p = stats.ttest_ind(oh_eight.average_mileage, nine_nine.average_mileage)
t, p   # Ha uses !=, so this is a two-tailed test: keep the full p-value

### Decide

In [None]:
print("is p < alpha? ", p < alpha)

In [None]:
# Two-tailed test: the sign of t doesn't matter, only whether p < alpha
if p < alpha:
    print("We reject H0")
else:
    print("We fail to reject H0")

# Are compact cars more fuel-efficient than the average car?

In [None]:
# Split the data into compact cars and all other (non-compact) cars

bool_series_1 = mpg['class'] == 'compact'
compact = mpg[bool_series_1]

bool_series_2 = mpg['class'] != 'compact'
not_compact = mpg[bool_series_2]

In [None]:
# Order of operations
'''
Plot Distribution 
Establish Hypotheses
Set alpha
Verify Assumptions
Compute T-Test
Decide
'''

In [None]:
#plot distributions
compact['average_mileage'].hist()

In [None]:
not_compact['average_mileage'].hist()

### Establish Hypotheses

$H_{0}$: Mean fuel efficiency of compact cars = Mean fuel efficiency of non-compact cars

$H_{a}$: Mean fuel efficiency of compact cars > Mean fuel efficiency of non-compact cars

In [None]:
# Set alpha
alpha = .05

In [None]:
# verify assumptions

print(compact.average_mileage.shape)
print(not_compact.average_mileage.shape)

In [None]:
# Variance (2 Sample T-Test)
print(compact.average_mileage.var())
print(not_compact.average_mileage.var())

In [None]:
#compute test statistic
t, p = stats.ttest_ind(compact.average_mileage, not_compact.average_mileage)
t, p

In [None]:
# decide

print("is p/2 < alpha? ", p < alpha)
print("is t > 0? ", t > 0)

In [None]:
if p/2 > alpha:
    print("We fail to reject $H_{0}$")
elif t < 0:
    print("We fail to reject $H_{0}$")
else:
    print("We reject $H_{0}$")

# Do manual cars get better gas mileage than automatic cars?

In [None]:
# Creating a dataframe for manual automobiles
# trans values look like 'manual(m5)' or 'auto(l4)', so match on the prefix
mpg_m_df = mpg[mpg.trans.str.startswith('manual')]

In [None]:
# Creating a dataframe for automatic automobiles
# startswith catches every automatic variant (auto(l4), auto(s6), auto(av), ...)
mpg_a_df = mpg[mpg.trans.str.startswith('auto')]


In [None]:
# Order of operations
'''
Plot Distribution 
Establish Hypotheses
Set alpha
Verify Assumptions
Compute T-Test
Decide
'''

In [None]:
# Plot distributions

In [None]:
manual_mpg = mpg_m_df.average_mileage
auto_mpg = mpg_a_df.average_mileage

In [None]:
manual_mpg.hist()

In [None]:
auto_mpg.hist()

### Establish Hypotheses

$H_{0}$: Mean gas mileage of manual cars = Mean gas mileage of automatic cars

$H_{a}$: Mean gas mileage of manual cars > Mean gas mileage of automatic cars


In [None]:
# Set alpha
alpha = .05

In [None]:
# verify assumptions

print(manual_mpg.shape)
print(auto_mpg.shape)

In [None]:
# Variance (2 Sample T-Test)
print(manual_mpg.var())
print(auto_mpg.var())

In [None]:
#compute test statistic
t, p = stats.ttest_ind(manual_mpg,auto_mpg)
t, p/2

In [None]:
# decide

print("is p/2 < alpha? ", p < alpha)
print("is t > 0? ", t > 0)

In [None]:
if p/2 > alpha:
    print("We fail to reject $H_{0}$")
elif t < 0:
    print("We fail to reject $H_{0}$")
else:
    print("We reject $H_{0}$")

# Correlation Problems

## 1. Use the telco_churn data. Does tenure correlate with monthly charges? Total charges? What happens if you control for phone and internet service?


In [3]:
telco = pd.read_csv("telco_data.csv")
telco.head()

Unnamed: 0,customer_id,gender,is_senior_citizen,partner,dependents,phone_service,internet_service,contract_type,payment_type,monthly_charges,...,has_phone,has_internet,has_phone_and_internet,partner_dependents,start_date,average_monthly_charges,contract_details,phone_service_details,internet_details,product_key
0,2923-ARZLG,Male,0,Yes,Yes,1,0,1,Mailed check,$19.70,...,True,False,False,3,2020-08-27,#DIV/0!,1 Year,One Line,No Internet Service,1
1,2775-SEFEE,Male,0,No,Yes,2,1,2,Bank transfer (automatic),$61.90,...,True,True,True,2,2020-08-27,#DIV/0!,2 Year,Two or More Lines,DSL,1
2,3115-CZMZD,Male,0,No,Yes,1,0,2,Mailed check,$20.25,...,True,False,False,2,2020-08-27,#DIV/0!,2 Year,One Line,No Internet Service,1
3,3213-VVOLG,Male,0,Yes,Yes,2,0,2,Mailed check,$25.35,...,True,False,False,3,2020-08-27,#DIV/0!,2 Year,Two or More Lines,No Internet Service,1
4,4367-NUYAO,Male,0,Yes,Yes,2,0,2,Mailed check,$25.75,...,True,False,False,3,2020-08-27,#DIV/0!,2 Year,Two or More Lines,No Internet Service,1


In [4]:
# First step is to clean up the monthly charges data (strip the leading '$' and convert to float)
telco_m = telco.monthly_charges.str.strip('$')
telco_m = telco_m.astype(float)
telco['monthly_charges_clean'] = telco_m
telco.drop(columns = ['monthly_charges'], inplace = True)
telco.head()

Unnamed: 0,customer_id,gender,is_senior_citizen,partner,dependents,phone_service,internet_service,contract_type,payment_type,total_charges,...,has_internet,has_phone_and_internet,partner_dependents,start_date,average_monthly_charges,contract_details,phone_service_details,internet_details,product_key,monthly_charges_clean
0,2923-ARZLG,Male,0,Yes,Yes,1,0,1,Mailed check,,...,False,False,3,2020-08-27,#DIV/0!,1 Year,One Line,No Internet Service,1,19.7
1,2775-SEFEE,Male,0,No,Yes,2,1,2,Bank transfer (automatic),,...,True,True,2,2020-08-27,#DIV/0!,2 Year,Two or More Lines,DSL,1,61.9
2,3115-CZMZD,Male,0,No,Yes,1,0,2,Mailed check,,...,False,False,2,2020-08-27,#DIV/0!,2 Year,One Line,No Internet Service,1,20.25
3,3213-VVOLG,Male,0,Yes,Yes,2,0,2,Mailed check,,...,False,False,3,2020-08-27,#DIV/0!,2 Year,Two or More Lines,No Internet Service,1,25.35
4,4367-NUYAO,Male,0,Yes,Yes,2,0,2,Mailed check,,...,False,False,3,2020-08-27,#DIV/0!,2 Year,Two or More Lines,No Internet Service,1,25.75


In [5]:
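# Next step is to clean up the total charges data

# regex=False: strip '$' and ',' as literal characters, not regular expressions
telco_t = telco.total_charges.str.replace('$', '', regex=False).str.replace(',', '', regex=False)
telco_t = telco_t.astype(float)
telco_t = telco_t.fillna(0)  # treat missing total charges as 0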
telco['total_charges_clean'] = telco_t
telco.drop(columns = ['total_charges'], inplace = True)
telco.head()

Unnamed: 0,customer_id,gender,is_senior_citizen,partner,dependents,phone_service,internet_service,contract_type,payment_type,churn,...,has_phone_and_internet,partner_dependents,start_date,average_monthly_charges,contract_details,phone_service_details,internet_details,product_key,monthly_charges_clean,total_charges_clean
0,2923-ARZLG,Male,0,Yes,Yes,1,0,1,Mailed check,No,...,False,3,2020-08-27,#DIV/0!,1 Year,One Line,No Internet Service,1,19.7,0.0
1,2775-SEFEE,Male,0,No,Yes,2,1,2,Bank transfer (automatic),No,...,True,2,2020-08-27,#DIV/0!,2 Year,Two or More Lines,DSL,1,61.9,0.0
2,3115-CZMZD,Male,0,No,Yes,1,0,2,Mailed check,No,...,False,2,2020-08-27,#DIV/0!,2 Year,One Line,No Internet Service,1,20.25,0.0
3,3213-VVOLG,Male,0,Yes,Yes,2,0,2,Mailed check,No,...,False,3,2020-08-27,#DIV/0!,2 Year,Two or More Lines,No Internet Service,1,25.35,0.0
4,4367-NUYAO,Male,0,Yes,Yes,2,0,2,Mailed check,No,...,False,3,2020-08-27,#DIV/0!,2 Year,Two or More Lines,No Internet Service,1,25.75,0.0
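Why `regex=False` matters when cleaning currency strings: as a regular expression, `'$'` means "end of string", so the dollar signs would silently stay in place. A minimal sketch of the same cleaning on a few hypothetical values in the telco CSV's format, using `pd.to_numeric` for the conversion:

```python
import pandas as pd

# Hypothetical raw values in the same style as the telco CSV ('$' prefix, ',' thousands separator, a blank)
raw = pd.Series(['$19.70', '$1,234.50', None])

# Remove '$' and ',' literally, convert to numbers, and treat missing values as 0
clean = pd.to_numeric(
    raw.str.replace('$', '', regex=False).str.replace(',', '', regex=False)
).fillna(0)

print(clean.tolist())  # [19.7, 1234.5, 0.0]
```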


# Reference Information for Correlations

In [6]:
n = telco.shape[0]     # number of observations
degf = n - 2        # degrees of freedom: the # of values in the final calculation of a statistic that are free to vary.
conf_interval = .95 # desired confidence level
α = 1 - conf_interval

In [None]:
telco

### Does tenure correlate with monthly charges?

In [None]:
"""
H0: There is no statistically significant correlation between tenure and monthly charges
"""
x = telco['monthly_charges_clean']
y = telco['monthly_tenure']

def stdev(x):
    variance = ((x - x.mean()) ** 2).sum() / n
    return sqrt(variance)

r_xy = (((x - x.mean()) * (y - y.mean())).sum() / n) / (stdev(x) * stdev(y))
r_xy

In [None]:
t = (r_xy * sqrt(n - 2)) / sqrt(1 - r_xy**2)
t

In [None]:
p = stats.t.sf(abs(t), df=degf) * 2 # *2 for a two-tailed test; abs() keeps p <= 1 if r is negative
p

In [None]:
p < α

In [None]:
corr, p = stats.pearsonr(x, y)
corr, p

### The p-value is less than alpha, so we reject the null: there is a significant correlation between tenure and monthly charges
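The hand-computed r, t, and p above should match `stats.pearsonr`. A minimal sketch that checks this on hypothetical random data (the variables and effect size are made up), using the same formulas: r = cov(x, y) / (s_x s_y) and t = r √(n − 2) / √(1 − r²) with n − 2 degrees of freedom:

```python
from math import sqrt

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data standing in for tenure and monthly charges
x = rng.normal(size=500)
y = 0.3 * x + rng.normal(size=500)
n = len(x)

# Pearson's r: covariance over the product of standard deviations
# (the 1/n factors cancel, so the choice of ddof doesn't matter if it's consistent)
r = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())

# t statistic for H0: rho = 0, with n - 2 degrees of freedom, two-tailed
t = r * sqrt(n - 2) / sqrt(1 - r**2)
p = stats.t.sf(abs(t), df=n - 2) * 2

r_scipy, p_scipy = stats.pearsonr(x, y)
print(r, r_scipy)
print(p, p_scipy)
```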

# Does tenure correlate with total charges? 

In [None]:
"""
H0: There is no statistically significant correlation between tenure and total charges.
"""
x = telco.total_charges_clean
y = telco.monthly_tenure

def stdev(x):
    variance = ((x - x.mean()) ** 2).sum() / n
    return sqrt(variance)

r_xy = (((x - x.mean()) * (y - y.mean())).sum() / n) / (stdev(x) * stdev(y))
r_xy

t = (r_xy * sqrt(n - 2)) / sqrt(1 - r_xy**2)
t

p = stats.t.sf(abs(t), df=degf) * 2 # *2 for a two-tailed test; abs() keeps p <= 1 if r is negative
p

p < α

In [None]:
corr, p = stats.pearsonr(x, y)
corr, p

### The p-value is less than alpha, so we reject the null: there is a significant correlation between tenure and total charges

# Does tenure correlate with monthly charges for phone service?

In [15]:
"""
H0: There is no statistically significant correlation between tenure and monthly charges for phone service
"""
# First must make boolean mask for phone service 

telco_p = telco[telco.has_phone == True]

In [18]:
x = telco_p.monthly_charges_clean
y = telco_p.monthly_tenure

# Use the size of this subset, not the full dataset, for n and the degrees of freedom
n_p = len(x)
degf_p = n_p - 2

def stdev(x):
    variance = ((x - x.mean()) ** 2).sum() / len(x)
    return sqrt(variance)

r_xy = (((x - x.mean()) * (y - y.mean())).sum() / n_p) / (stdev(x) * stdev(y))
r_xy

t = (r_xy * sqrt(n_p - 2)) / sqrt(1 - r_xy**2)
t

p = stats.t.sf(abs(t), df=degf_p) * 2 # *2 for a two-tailed test
p

p < α

True

In [19]:
corr, p = stats.pearsonr(x, y)
corr, p

(0.24459127886286897, 2.6787677242042115e-87)

### The p-value is less than alpha, so we reject the null: there is a significant correlation between tenure and monthly charges for customers with phone service

# Does tenure correlate with monthly charges for internet service?

In [22]:
"""
H0: There is no statistically significant correlation between tenure and monthly charges for internet service
"""
# First must make boolean mask for internet service

telco_i = telco[telco.has_internet == True]

In [23]:
x = telco_i.monthly_charges_clean
y = telco_i.monthly_tenure

# Use the size of this subset, not the full dataset, for n and the degrees of freedom
n_i = len(x)
degf_i = n_i - 2

def stdev(x):
    variance = ((x - x.mean()) ** 2).sum() / len(x)
    return sqrt(variance)

r_xy = (((x - x.mean()) * (y - y.mean())).sum() / n_i) / (stdev(x) * stdev(y))
r_xy

t = (r_xy * sqrt(n_i - 2)) / sqrt(1 - r_xy**2)
t

p = stats.t.sf(abs(t), df=degf_i) * 2 # *2 for a two-tailed test
p

p < α

True

In [24]:
corr, p = stats.pearsonr(x, y)
corr, p

(0.372004034289178, 1.2123250564761397e-180)

### The p-value is less than alpha, so we reject the null: there is a significant correlation between tenure and monthly charges for customers with internet service
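One way to "control for" phone or internet service is to compute the correlation separately within each service group, as the cells above do for one group at a time. A minimal sketch that does this for every group at once with `groupby`; the DataFrame here is hypothetical random data that only mimics the telco column names:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(3)
n = 600

# Hypothetical stand-in for telco: a service flag, tenure in months, and monthly charges
df = pd.DataFrame({
    'has_internet': rng.random(n) < 0.7,
    'monthly_tenure': rng.integers(0, 73, size=n),
})
df['monthly_charges_clean'] = (
    20 + 50 * df.has_internet + 0.3 * df.monthly_tenure + rng.normal(0, 10, size=n)
)

# Pearson correlation computed separately within each service group
rows = []
for has_internet, group in df.groupby('has_internet'):
    r, p = stats.pearsonr(group.monthly_tenure, group.monthly_charges_clean)
    rows.append({'has_internet': has_internet, 'n': len(group), 'r': r, 'p': p})

by_group = pd.DataFrame(rows)
print(by_group)
```

Grouping by `['has_phone', 'has_internet']` instead would give one row per combination of services; each group's p-value then uses that group's own sample size.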

## 2. Use the employees database.

   - Is there a relationship between how long an employee has been with the company and their salary?
   - Is there a relationship between how long an employee has been with the company and the number of titles they have had?

## 3. Use the sleepstudy data. Is there a relationship between days and reaction time?