# Null Hypothesis Questions

## 1. Has the network latency gone up since we switched internet service providers?
  

In [1]:
# null hypothesis 
"""
Mean network latency is no higher with the new service provider than it was with the old one.
"""
# alternate hypothesis
"""
Mean network latency is higher with the new service provider than it was with the old one.
"""
# true positive (REJECT the null, and the null is FALSE)
"""
The latency test returns a p-value lower than our significance level (0.05), so we reject the null hypothesis,
and latency really did go up after the switch.
"""
# true negative (FAIL TO REJECT the null, and the null is TRUE)
"""
The latency test returns a p-value greater than our significance level (0.05), so we fail to reject the null hypothesis,
and latency really did not go up after the switch.
"""
# type I error (REJECT the null hypothesis, but, in reality, the null hypothesis is TRUE)
"""
The latency test returns a p-value lower than our significance level (0.05), so we reject the null hypothesis,
but in reality latency did not go up (a false positive).
"""
# type II error (FAIL TO REJECT the null hypothesis when it is actually FALSE)
"""
The latency test returns a p-value greater than our significance level (0.05), so we fail to reject the null hypothesis,
but in reality latency did go up (a false negative).
"""
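The four outcomes above can be made concrete with a small simulation. The sketch below is only illustrative: the latency means, standard deviation, and sample size are made-up assumptions, not measurements from either provider. It runs many two-sample t-tests, first with the null true and then with the null false, and counts how often each kind of error occurs.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05
rng = np.random.default_rng(42)

def rejection_rate(mean_old, mean_new, n=30, sd=10.0, trials=2000):
    """Fraction of simulated experiments in which the t-test rejects the null."""
    rejections = 0
    for _ in range(trials):
        old = rng.normal(mean_old, sd, n)  # hypothetical latencies (ms), old provider
        new = rng.normal(mean_new, sd, n)  # hypothetical latencies (ms), new provider
        _, p = stats.ttest_ind(new, old)
        rejections += int(p < ALPHA)
    return rejections / trials

# Null is TRUE (same mean): every rejection is a type I error.
type_1_rate = rejection_rate(mean_old=50, mean_new=50)

# Null is FALSE (new provider is 5 ms slower): every non-rejection is a type II error.
type_2_rate = 1 - rejection_rate(mean_old=50, mean_new=55)

print(f'type I error rate  ~ {type_1_rate:.3f}')
print(f'type II error rate ~ {type_2_rate:.3f}')
```

When the null is true, the type I error rate should land close to the significance level itself. The type II rate depends on the assumed effect size, spread, and sample size, so it changes as those inputs change.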

## 2. Is the website redesign any good?
   

In [2]:
# null hypothesis
"""
The new design performs no differently from the old design on the metric we choose to measure.
"""
# alternate hypothesis
"""
The new design performs differently from the old design on the metric we choose to measure.
"""
# true positive
"""
The test returns a p-value lower than our significance level (0.05), so we reject the null hypothesis,
and the redesign really did change the metric.
"""
# true negative
"""
The test returns a p-value greater than our significance level (0.05), so we fail to reject the null hypothesis,
and the redesign really made no difference.
"""
# type I error
"""
The test returns a p-value lower than our significance level (0.05), so we reject the null hypothesis,
but in reality the redesign made no difference (a false positive).
"""
# type II error
"""
The test returns a p-value greater than our significance level (0.05), so we fail to reject the null hypothesis,
but in reality the redesign did change the metric (a false negative).
"""

## 3. Is our television ad driving more sales?

In [3]:
# null hypothesis
"""
Sales are no higher since the television ad began running.
"""
# alternate hypothesis
"""
Sales are higher since the television ad began running.
"""
# true positive
"""
The sales test returns a p-value lower than our significance level (0.05), so we reject the null hypothesis,
and the ad really is driving more sales.
"""
# true negative
"""
The sales test returns a p-value greater than our significance level (0.05), so we fail to reject the null hypothesis,
and the ad really is not driving more sales.
"""
# type I error
"""
The sales test returns a p-value lower than our significance level (0.05), so we reject the null hypothesis,
but in reality the ad is not driving more sales (a false positive).
"""
# type II error
"""
The sales test returns a p-value greater than our significance level (0.05), so we fail to reject the null hypothesis,
but in reality the ad is driving more sales (a false negative).
"""

# T-Test Problems

In [4]:
from math import sqrt
from scipy import stats

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pydataset import data

## 1. Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices. A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. Use a .05 level of significance.

In [5]:
# Subscript 1 = office #1, subscript 2 = office #2

# Sample means (days to sell)
xbar1 = 90
xbar2 = 100

# Sample sizes (number of sales)
n1 = 40
n2 = 50

# Sample standard deviations (days)
s1 = 15
s2 = 20

degf = n1 + n2 - 2

# Pooled standard deviation (assumes both offices share the same variance)
s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Unpooled (Welch) standard error, kept for reference; the pooled test below does not use it
standard_error = se = sqrt(s1**2 / n1 + s2**2 / n2)

t = (xbar1 - xbar2) / (s_p * sqrt(1/n1 + 1/n2))

# Two-tailed p-value: use abs(t), otherwise a negative t pushes p above 1
p = stats.t(degf).sf(abs(t)) * 2

print(f't = {t:.5f}')
print(f'p = {p:.5f}')


t = -2.62523
p = 0.01021


In [6]:
"""
P value is less than 0.05 so we reject the null:

There is a statistically significant difference in average time to sell between the two offices
"""
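As a cross-check on the hand calculation, SciPy can run the same two-sample test directly from summary statistics with `stats.ttest_ind_from_stats`. The sketch below uses only the numbers given in the problem statement. The Welch variant is an extra, included because the two sample standard deviations (15 and 20) differ.

```python
from scipy import stats

# Summary statistics from the problem statement
result = stats.ttest_ind_from_stats(
    mean1=90, std1=15, nobs1=40,
    mean2=100, std2=20, nobs2=50,
    equal_var=True,  # pooled variance, matching the hand calculation
)
print(f'pooled: t = {result.statistic:.5f}, p = {result.pvalue:.5f}')

# Welch's version drops the equal-variance assumption
welch = stats.ttest_ind_from_stats(90, 15, 40, 100, 20, 50, equal_var=False)
print(f'Welch:  t = {welch.statistic:.5f}, p = {welch.pvalue:.5f}')
```

A two-tailed p-value always lies between 0 and 1, so a hand-computed value outside that range usually means the sign of `t` was not handled before calling `sf`. Here both variants give a p-value below 0.05.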

## 2. Load the mpg dataset and use it to answer the following questions:

   - Is there a difference in fuel-efficiency in cars from 2008 vs 1999?
   - Are compact cars more fuel-efficient than the average car?
   - Do manual cars get better gas mileage than automatic cars?

In [7]:
mpg = data('mpg')
# data('mpg') already returns a DataFrame, so add the new column to it directly
mpg['average_mileage'] = (mpg.cty + mpg.hwy) / 2

In [8]:
bool_series = mpg.year == 2008
oh_eight = mpg[bool_series]

In [9]:
bool_series = mpg.year == 1999
nine_nine = mpg[bool_series]

In [10]:
#Is there a difference in fuel-efficiency in cars from 2008 vs 1999?

x1 = oh_eight.average_mileage
x2 = nine_nine.average_mileage

xbar1 = x1.mean()
xbar2 = x2.mean()

n1 = x1.shape[0]
n2 = x2.shape[0]

s1 = x1.std()
s2 = x2.std()

degf = n1 + n2 - 2

s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
s_p

standard_error = se = sqrt(s1**2 / n1 + s2**2 / n2)

t = (xbar1 - xbar2) / (s_p * sqrt(1/n1 + 1/n2))
t

p = stats.t(degf).sf(abs(t)) * 2  # two-tailed; abs(t) keeps p between 0 and 1

stats.ttest_ind(x1, x2)

Ttest_indResult(statistic=-0.21960177245940962, pvalue=0.8263744040323578)

In [11]:
"""
P value is greater than 0.05 so we fail to reject the null:

No statistically significant difference in average fuel economy between 2008 and 1999 cars
"""
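The other two mpg questions follow the same pattern. The sketch below is one possible approach, not a worked answer: a one-sample t-test compares compact cars with the overall mean, and a two-sample t-test compares manual with automatic cars. Both are converted to one-sided p-values because the questions ask about "more" and "better". The helper names are hypothetical, the `demo` frame is made up purely so the code runs on its own, and the code assumes the real `mpg` frame has `class` and `trans` columns plus the `average_mileage` column computed earlier.

```python
import pandas as pd
from scipy import stats

def one_sided_p(t, p_two_sided):
    """Convert a two-sided p-value into one for the alternative 'mean is greater'."""
    return p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2

def compact_vs_average(df):
    """One-sample t-test: compact cars against the overall mean mileage."""
    compact = df[df['class'] == 'compact'].average_mileage
    t, p = stats.ttest_1samp(compact, df.average_mileage.mean())
    return t, one_sided_p(t, p)

def manual_vs_automatic(df):
    """Two-sample t-test: manual against automatic transmissions."""
    is_manual = df.trans.str.startswith('manual')
    t, p = stats.ttest_ind(df[is_manual].average_mileage,
                           df[~is_manual].average_mileage)
    return t, one_sided_p(t, p)

# Made-up rows for illustration only; pass the real `mpg` DataFrame instead.
demo = pd.DataFrame({
    'class': ['compact'] * 4 + ['suv'] * 4 + ['midsize'] * 2,
    'trans': ['manual(m5)', 'auto(l5)', 'manual(m6)', 'auto(av)', 'auto(l4)',
              'auto(l5)', 'manual(m5)', 'auto(l4)', 'manual(m6)', 'auto(l5)'],
    'average_mileage': [25.0, 24.5, 26.0, 23.5, 15.0, 16.5, 17.0, 14.5, 22.0, 21.5],
})

t_compact, p_compact = compact_vs_average(demo)
t_manual, p_manual = manual_vs_automatic(demo)
print(f'compact vs average:  t = {t_compact:.3f}, one-sided p = {p_compact:.4f}')
print(f'manual vs automatic: t = {t_manual:.3f}, one-sided p = {p_manual:.4f}')
```

With the real data, calling `compact_vs_average(mpg)` and `manual_vs_automatic(mpg)` should be enough. The decision rule is the same as before: reject the null only when the one-sided p-value is below 0.05.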

# Correlation Problems

## 1. Use the telco_churn data. Does tenure correlate with monthly charges? Total charges? What happens if you control for phone and internet service?
 

In [13]:
telco = pd.read_csv("telco_data.csv")
telco

| Unnamed: 0 | customer_id | gender | is_senior_citizen | partner | dependents | phone_service | internet_service | contract_type | payment_type | monthly_charges | ... | has_phone | has_internet | has_phone_and_internet | partner_dependents | start_date | average_monthly_charges | contract_details | phone_service_details | internet_details | product_key |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2923-ARZLG | Male | 0 | Yes | Yes | 1 | 0 | 1 | Mailed check | $19.70 | ... | True | False | False | 3 | 2020-08-27 | #DIV/0! | 1 Year | One Line | No Internet Service | 1 |
| 1 | 2775-SEFEE | Male | 0 | No | Yes | 2 | 1 | 2 | Bank transfer (automatic) | $61.90 | ... | True | True | True | 2 | 2020-08-27 | #DIV/0! | 2 Year | Two or More Lines | DSL | 1 |
| 2 | 3115-CZMZD | Male | 0 | No | Yes | 1 | 0 | 2 | Mailed check | $20.25 | ... | True | False | False | 2 | 2020-08-27 | #DIV/0! | 2 Year | One Line | No Internet Service | 1 |
| 3 | 3213-VVOLG | Male | 0 | Yes | Yes | 2 | 0 | 2 | Mailed check | $25.35 | ... | True | False | False | 3 | 2020-08-27 | #DIV/0! | 2 Year | Two or More Lines | No Internet Service | 1 |
| 4 | 4367-NUYAO | Male | 0 | Yes | Yes | 2 | 0 | 2 | Mailed check | $25.75 | ... | True | False | False | 3 | 2020-08-27 | #DIV/0! | 2 Year | Two or More Lines | No Internet Service | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7038 | 7083-MIOPC | Female | 0 | No | No | 1 | 0 | 2 | Credit card (automatic) | $20.25 | ... | True | False | False | 0 | 2014-03-27 | $20.35 | 2 Year | One Line | No Internet Service | 1 |
| 7039 | 8207-VVMYB | Female | 0 | Yes | No | 2 | 0 | 2 | Bank transfer (automatic) | $26.00 | ... | True | False | False | 1 | 2014-03-27 | $26.06 | 2 Year | Two or More Lines | No Internet Service | 1 |
| 7040 | 6010-DDPPW | Male | 0 | Yes | No | 2 | 0 | 2 | Bank transfer (automatic) | $25.15 | ... | True | False | False | 1 | 2014-03-27 | $25.21 | 2 Year | Two or More Lines | No Internet Service | 1 |
| 7041 | 3910-MRQOY | Female | 0 | Yes | No | 1 | 0 | 2 | Bank transfer (automatic) | $19.40 | ... | True | False | False | 1 | 2014-03-27 | $19.43 | 2 Year | One Line | No Internet Service | 1 |
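The preview shows charges stored as `$`-prefixed strings and some cells holding `#DIV/0!`, so the columns need cleaning before any correlation can be computed. The sketch below is a possible starting point, not the notebook's own solution: `to_dollars` and `correlation_by_group` are hypothetical helpers, the `demo` rows are made up, and the `tenure` column name is an assumption because that column is hidden behind the `...` in the preview. "Controlling" for a service is approximated here by computing Pearson's r separately within each group.

```python
import pandas as pd
from scipy import stats

def to_dollars(series):
    """Turn strings like '$19.70' into floats; unparseable cells become NaN."""
    cleaned = series.astype(str).str.replace(r'[$,]', '', regex=True)
    return pd.to_numeric(cleaned, errors='coerce')

def correlation_by_group(df, x, y, by):
    """Pearson r and p-value of x against y within each group of `by`."""
    rows = []
    for key, group in df.groupby(by):
        pair = group[[x, y]].dropna()
        r, p = stats.pearsonr(pair[x], pair[y])
        rows.append({'group': key, 'n': len(pair), 'r': r, 'p': p})
    return pd.DataFrame(rows)

# Made-up rows for illustration only; use the real telco DataFrame instead.
demo = pd.DataFrame({
    'tenure': [1, 5, 12, 24, 2, 10, 30, 48],
    'monthly_charges': ['$20.00', '$20.50', '$19.80', '$20.30',
                        '$60.00', '$65.50', '$70.25', '$80.00'],
    'has_internet': [False] * 4 + [True] * 4,
})
demo['monthly_charges'] = to_dollars(demo.monthly_charges)

r, p = stats.pearsonr(demo.tenure, demo.monthly_charges)
by_group = correlation_by_group(demo, 'tenure', 'monthly_charges', by='has_internet')
print(f'overall: r = {r:.3f}, p = {p:.3f}')
print(by_group)
```

Comparing the overall r with the per-group values shows whether the relationship holds within each service type or is mostly driven by differences between the groups.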


## 2. Use the employees database.

   - Is there a relationship between how long an employee has been with the company and their salary?
   - Is there a relationship between how long an employee has been with the company and the number of titles they have had?

## 3. Use the sleepstudy data. Is there a relationship between days and reaction time?
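One way to approach this question is a simple linear regression, which reports the correlation coefficient together with the estimated change in reaction time per day. The sketch below uses made-up numbers so that it runs on its own. With the real data, the inputs would be the day and reaction-time columns of the sleepstudy frame (assumed here to be named `Days` and `Reaction`).

```python
import numpy as np
from scipy import stats

# Made-up values for illustration: reaction time (ms) on each of 10 days
days = np.arange(10)
reaction = 250 + 10 * days + np.array([3, -2, 1, -4, 2, 0, -1, 4, -3, 0])

fit = stats.linregress(days, reaction)
print(f'slope = {fit.slope:.2f} ms per day')
print(f'r = {fit.rvalue:.3f}, p = {fit.pvalue:.5f}')
```

The null hypothesis is that there is no linear relationship between days and reaction time (slope of zero). A p-value below 0.05 would lead us to reject it, and the sign of `r` gives the direction of the relationship.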