# Hypothesis Testing 

## Step One: State the Null and Alternative Hypotheses

* $H_0$: Getting more sleep does not increase efficiency rating
* $H_1$: Getting more sleep increases efficiency rating

## Step Two: Decide on a Level of Significance 

* $\alpha$ = 0.05


## Step Three: Select the Appropriate Test Statistic

* Use the t-test for two independent samples
* $t = \frac{\overline{X}_1-\overline{X}_2}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$
* In code: `scipy.stats.ttest_ind()`
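
For reference, the pooled variance $s_p^2$ in the statistic above is the standard equal-variance estimate, which is also what the manual calculation later in this notebook computes. Here $s_1^2$ and $s_2^2$ denote the two sample variances (symbols introduced for this note):

$$s_p^2 = \frac{(n_1-1)\,s_1^2 + (n_2-1)\,s_2^2}{n_1+n_2-2}, \qquad \text{df} = n_1 + n_2 - 2$$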

## Step Four: Formulate the Decision Rule

* Using the t-distribution table, the one-tailed t-critical value for degrees of freedom = 120 and level of significance = 0.05 is **1.658**
* If *t-computed* $> 1.658$, then reject $H_0$
* If *t-computed* $\leq 1.658$, then do not reject $H_0$
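
The table lookup can also be checked in code. The following is a minimal sketch, assuming SciPy is installed; it uses `scipy.stats.t.ppf` to find the upper-tail critical value, and the variable names are illustrative only.

```python
from scipy import stats

alpha = 0.05   # level of significance from Step Two
dof = 120      # degrees of freedom used in the decision rule

# Upper-tail critical value: the point with probability alpha to its right
t_critical = stats.t.ppf(1 - alpha, dof)
print(round(t_critical, 3))
```

The printed value should agree with the table entry to three decimal places.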

In [79]:
import pandas as pd
import numpy as np 
import utils 
import importlib
importlib.reload(utils)

<module 'utils' from 'c:\\Users\\jskye\\OneDrive - Gonzaga University\\Spring 2022\\CPSC 222 - Intro to Data Science\\AutoSleep Project\\utils.py'>

In [80]:
df = pd.read_csv("AutoSleep.csv")
df_2 = pd.read_csv("AutoSleep_test.csv")
join = [df, df_2]
df = pd.concat(join)

# Drop every column that contains at least one null value
df = df.dropna(axis=1, how="any")


df.drop(['ISO8601', 'toDate', 'inBed',
       'awake', 'fellAsleepIn', 'sessions', 'asleepAvg7', 'efficiencyAvg7', 
       'qualityAvg7', 'sleepBPMAvg7', 'wakingBPMAvg7','hrvAvg7'], axis=1, inplace=True)

df = utils.clean_sleep(df, 'asleep')
sleep = df['asleep']
efficiency = df['efficiency']
print(efficiency.head())

0     98.8
1     92.7
2     97.4
3    100.0
4     98.7
Name: efficiency, dtype: float64
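
The cleaning cell above removes every column that contains a null value, which is stricter than dropping only the columns that are entirely null. A small sketch on a made-up frame (the column names here are hypothetical and unrelated to the AutoSleep data) shows the difference using `DataFrame.dropna`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the AutoSleep data
toy = pd.DataFrame({
    "full": [1.0, 2.0, 3.0],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# how="any": drop a column if it has at least one null value
any_null_dropped = toy.dropna(axis=1, how="any")
# how="all": drop a column only if every value in it is null
all_null_dropped = toy.dropna(axis=1, how="all")

print(list(any_null_dropped.columns))
print(list(all_null_dropped.columns))
```

With `how="any"` only the fully populated column survives, while `how="all"` also keeps the partially populated one.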


In [84]:
x_bar_sleep = sleep.mean()
x_bar_eff = efficiency.mean()
s_sleep = sleep.std()
s_eff = efficiency.std()
n_sleep = len(sleep)
n_eff = len(efficiency)
dof = n_sleep + n_eff - 2
print(dof)

sp2 = ((n_sleep - 1) * s_sleep**2 + (n_eff - 1) * s_eff**2) / (dof)
t = (x_bar_sleep - x_bar_eff) / np.sqrt(sp2 * (1/n_sleep + 1/n_eff))
print("t:", t)

120
t: 28.61919290443943
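
The manual calculation above can be wrapped in a small helper so it is easy to reuse and check. This is only a sketch: `pooled_t` is a hypothetical name, and the toy samples are invented for illustration. Note that NumPy's `var` defaults to the population variance, so `ddof=1` is passed to match the sample standard deviation that pandas' `.std()` returns by default.

```python
import numpy as np

def pooled_t(sample_1, sample_2):
    """Return (t, dof) for the equal-variance two-sample t statistic."""
    a = np.asarray(sample_1, dtype=float)
    b = np.asarray(sample_2, dtype=float)
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # ddof=1 gives the sample variance, matching pandas' default for .std()
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
    t_stat = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t_stat, dof

# Hypothetical toy samples, not the AutoSleep data
t_stat, dof = pooled_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(t_stat, dof)
```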


In [83]:
from scipy import stats 


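# One-tailed test: ttest_ind returns a two-sided p-value by default
t, pval = stats.ttest_ind(sleep, efficiency)
pval /= 2  # halve it for a single rejection region (valid here because t > 0, the direction of H1)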
print("t:", t, "pval:", pval)
alpha = 0.05
if pval < alpha:
    print("reject H0")
else:
    print("do not reject H0")

t: 28.619192904439423 pval: 9.525936029646529e-56
reject H0
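
Halving the two-sided p-value is only appropriate when the t statistic points in the direction of $H_1$. As an alternative sketch, SciPy 1.6.0 and later accept an `alternative` argument in `ttest_ind` that performs the one-sided test directly. The samples below are made up for illustration and are not the AutoSleep data.

```python
import numpy as np
from scipy import stats

# Hypothetical toy samples, not the AutoSleep data
group_a = np.array([7.1, 6.8, 8.0, 7.4, 6.5, 7.9])
group_b = np.array([6.2, 5.9, 7.0, 6.4, 6.1, 6.6])

# Two-sided test (the default)
t_two, p_two = stats.ttest_ind(group_a, group_b)

# One-sided test: H1 is that the mean of group_a is greater than that of group_b
t_one, p_one = stats.ttest_ind(group_a, group_b, alternative="greater")

# When t is positive, the one-sided p-value equals half the two-sided one
print(t_one > 0, np.isclose(p_one, p_two / 2))
```

Using `alternative="greater"` avoids the manual division and gives the correct p-value even when the statistic turns out negative.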


## Step Five: Make a Decision

* Since *t-computed* $= 28.619 > 1.658$ (p-value $\approx 9.5 \times 10^{-56} < 0.05$), $H_0$ is rejected. At the 0.05 level of significance, the test supports the alternative hypothesis that getting more sleep increases the efficiency rating for that sleep session.