### Hypothesis Tests in Python
In this assessment, you will look at data from a study on toddler sleep habits.

The hypothesis tests you create and the questions you answer in this Jupyter notebook will be used to answer questions in the following graded assignment.

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import t
pd.set_option('display.max_columns', 30)  # show all columns of the DataFrame

Your goal is to analyse data from a study that examined differences in a number of sleep variables between napping and non-napping toddlers. These sleep variables included: Bedtime (lights-off time, in decimalized time), Night Sleep Onset Time (in decimalized time), Wake Time (sleep end time, in decimalized time), Night Sleep Duration (interval between sleep onset and sleep end, in minutes), and Total 24-Hour Sleep Duration (in minutes). Note: decimalized time expresses the time of day in decimal hours on a 24-hour clock, so 19.5 means 19:30 and 20.45 means 20:27.
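To make decimalized times easier to read, a minimal sketch of two helper functions follows. `decimal_hours_to_clock` and `clock_difference_minutes` are hypothetical names, not part of the notebook or any library. The sketch assumes decimal hours on a 24-hour clock, which matches the data: participant 1's night sleep onset (20.68) and wake time (7.17) are 10.49 hours, or 629.4 minutes, apart, which is the recorded night sleep duration.

```python
def decimal_hours_to_clock(hours: float) -> str:
    """Convert decimalized clock time (e.g. 20.45) to an HH:MM string."""
    # Work in whole minutes so rounding can never produce a ":60" minute value.
    total_minutes = round(hours * 60)
    hh, mm = divmod(total_minutes, 60)
    # Wrap hours past midnight back onto a 24-hour clock.
    return f"{hh % 24:02d}:{mm:02d}"


def clock_difference_minutes(start: float, end: float) -> float:
    """Minutes from start to end, allowing end to fall after midnight."""
    return ((end - start) % 24) * 60


print(decimal_hours_to_clock(20.45))  # participant 1 bedtime -> 20:27
print(decimal_hours_to_clock(7.17))   # participant 1 wake time -> 07:10
print(round(clock_difference_minutes(20.68, 7.17), 1))  # -> 629.4
```

The modulo in `clock_difference_minutes` handles intervals that cross midnight, which every night-time interval in this study does.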

The 20 study participants were healthy, normally developing toddlers with no sleep or behavioral problems. These children were categorized as napping or non-napping based upon parental report of children’s habitual sleep patterns. Researchers then verified napping status with data from actigraphy (a non-invasive method of monitoring human rest/activity cycles by wearing a sensor on the wrist) and sleep diaries during the 5 days before the study assessments were made.

You are specifically interested in the results for the Bedtime and Total 24-Hour Sleep Duration.

Reference: Akacem LD, Simpkin CT, Carskadon MA, Wright KP Jr, Jenni OG, Achermann P, et al. (2015) The Timing of the Circadian Clock and Sleep Differ between Napping and Non-Napping Toddlers. PLoS ONE 10(4): e0125181. https://doi.org/10.1371/journal.pone.0125181

In [2]:
df = pd.read_csv("nap_no_nap.csv") 
df.head()

| Unnamed: 0 | id | sex | age (months) | dlmo time | days napped | napping | nap lights outl time | nap sleep onset | nap midsleep | nap sleep offset | nap wake time | nap duration | nap time in bed | night bedtime | night sleep onset | sleep onset latency | night midsleep time | night wake time | night sleep duration | night time in bed | 24 h sleep duration | bedtime phase difference | sleep onset phase difference | midsleep phase difference | wake time phase difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | female | 33.7 | 19.24 | 0 | 0 |  |  |  |  |  |  |  | 20.45 | 20.68 | 0.23 | 1.92 | 7.17 | 629.4 | 643.0 | 629.4 | -1.21 | -1.44 | 6.68 | 11.93 |
| 1 | 2 | female | 31.5 | 18.27 | 0 | 0 |  |  |  |  |  |  |  | 19.23 | 19.48 | 0.25 | 1.09 | 6.69 | 672.4 | 700.4 | 672.4 | -0.96 | -1.21 | 6.82 | 12.42 |
| 2 | 3 | male | 31.9 | 19.14 | 0 | 0 |  |  |  |  |  |  |  | 19.6 | 20.05 | 0.45 | 1.29 | 6.53 | 628.8 | 682.6 | 628.8 | -0.46 | -0.91 | 6.15 | 11.39 |
| 3 | 4 | female | 31.6 | 19.69 | 0 | 0 |  |  |  |  |  |  |  | 19.46 | 19.5 | 0.05 | 1.89 | 8.28 | 766.6 | 784.0 | 766.6 | 0.23 | 0.19 | 6.2 | 12.59 |
| 4 | 5 | female | 33.0 | 19.52 | 0 | 0 |  |  |  |  |  |  |  | 19.21 | 19.65 | 0.45 | 1.3 | 6.95 | 678.0 | 718.0 | 678.0 | 0.31 | -0.13 | 5.78 | 11.43 |


In [3]:
len(df)

20

### Hypothesis tests
We will look at two hypothesis tests, each with $\alpha = 0.05$:

1. Is the average bedtime for toddlers who nap later than the average bedtime for toddlers who don't nap?

2. Is the average 24 h sleep duration (in minutes) for toddlers who nap different from that of toddlers who don't nap?
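Written formally, with $\mu_{nap}$ and $\mu_{nonap}$ denoting population means (notation introduced here for illustration), the first test is one-sided (upper-tailed) on mean bedtime and the second is two-sided on mean 24 h sleep duration:

$$\text{Test 1 (bedtime):}\quad H_0: \mu_{nap} - \mu_{nonap} = 0 \qquad H_1: \mu_{nap} - \mu_{nonap} > 0$$

$$\text{Test 2 (24 h sleep duration):}\quad H_0: \mu_{nap} - \mu_{nonap} = 0 \qquad H_1: \mu_{nap} - \mu_{nonap} \neq 0$$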

First, split night bedtime into two Series: one for toddlers who nap and one for toddlers who do not nap.

In [4]:
# Select the column by name so the result does not depend on column order
nap_bedtime = df.loc[df.napping == 1, "night bedtime"]
nap_bedtime.head()

5    19.95
6    20.60
7    22.01
8    20.24
9    20.78
Name: night bedtime, dtype: float64

In [5]:
no_nap_bedtime = df.loc[df.napping == 0, "night bedtime"]
no_nap_bedtime.head()

0    20.45
1    19.23
2    19.60
3    19.46
4    19.21
Name: night bedtime, dtype: float64

Now find the sample mean bedtime for each group.

In [6]:
nap_mean_bedtime = nap_bedtime.mean()
nap_mean_bedtime

20.304

In [7]:
no_nap_mean_bedtime = no_nap_bedtime.mean()
no_nap_mean_bedtime

19.590000000000003
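The per-group sample size, mean and standard deviation can also be obtained in a single `groupby` call. The sketch below runs on a hypothetical mini-dataset built only from the first five bedtimes of each group shown in the outputs above, not the full 20-child sample, so its numbers are illustrative only.

```python
import pandas as pd

# Hypothetical mini-dataset: only the first five bedtimes of each group
# shown above, NOT the full study sample.
toy = pd.DataFrame({
    "napping": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "night bedtime": [20.45, 19.23, 19.60, 19.46, 19.21,
                      19.95, 20.60, 22.01, 20.24, 20.78],
})

# One row per group: sample size, sample mean and sample SD (pandas std uses ddof=1).
summary = toy.groupby("napping")["night bedtime"].agg(["count", "mean", "std"])
print(summary)
```

Applied to the notebook's own data, `df.groupby("napping")["night bedtime"].agg(["count", "mean", "std"])` should give the same group sizes, means and standard deviations as the separate cells in this section.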

**Question**: What is the sample difference of mean bedtime for nappers minus no nappers?

In [8]:
mean_bedtime_diff = round(nap_mean_bedtime - no_nap_mean_bedtime, 3)
mean_bedtime_diff

0.714

Now find the sample standard deviations of $X_{nap}$ and $X_{nonap}$.

In [9]:
# pandas' Series.std uses ddof=1 by default, which gives the sample standard
# deviation (dividing by n - 1). np.std defaults to ddof=0, the population
# standard deviation (dividing by n), which is not the correct estimator here,
# so with NumPy ddof=1 must be passed explicitly. Passing it here makes the choice clear.
nap_s_bedtime = nap_bedtime.std(ddof = 1)

In [10]:
no_nap_s_bedtime = no_nap_bedtime.std(ddof = 1)
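The difference between the NumPy and pandas defaults can be seen on a tiny made-up list whose mean is 2.5 and whose squared deviations sum to 5.0:

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, 3.0, 4.0]  # toy data, not from the study

pop_sd = np.std(values)              # NumPy default ddof=0: sqrt(5 / 4)
sample_sd_np = np.std(values, ddof=1)  # sample SD: sqrt(5 / 3)
sample_sd_pd = pd.Series(values).std()  # pandas default ddof=1: sqrt(5 / 3)

print(pop_sd, sample_sd_np, sample_sd_pd)
```

With only 10 children per group or fewer, the gap between dividing by $n$ and by $n - 1$ is not negligible, which is why the sample estimator matters here.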

**Question**: What is the s.e. $(\bar X_{nap} - \bar X_{nonap})$?

We assume the population variance of bedtime is the same for toddlers who nap and toddlers who don't nap, so we use a pooled standard error.
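The pooled standard error computed in the cells below corresponds to the following formulas, with $n_1, s_1$ for nappers and $n_2, s_2$ for non-nappers:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad \mathrm{s.e.}(\bar X_{nap} - \bar X_{nonap}) = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$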

In [11]:
n1 = len(df[df.napping == 1])
n2 = len(df[df.napping == 0])
s1 = nap_s_bedtime
s2 = no_nap_s_bedtime

In [12]:
pooled_se = np.sqrt((((n1-1)*s1**2 + (n2-1)*s2**2)/(n1+n2-2))*(1/n1 + 1/n2))
pooled_se

0.2961871280370147

**Question**: Given our sample size of $n$, how many degrees of freedom ($df$) are there for the associated $t$ distribution?

Now calculate the $t$ test statistic for the first hypothesis test.
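For reference, the statistic computed in the next cell is the pooled two-sample $t$ statistic, with the hypothesized difference of 0 subtracted explicitly. It is compared against a $t$ distribution whose degrees of freedom are the total sample size minus the two estimated means:

$$t = \frac{(\bar X_{nap} - \bar X_{nonap}) - 0}{\mathrm{s.e.}(\bar X_{nap} - \bar X_{nonap})}, \qquad df = n_1 + n_2 - 2$$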

In [13]:
tstat = (nap_mean_bedtime - no_nap_mean_bedtime) / pooled_se
tstat

2.4106381824626966

**Question**: What is the p-value for the first hypothesis test?

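The function `t.cdf(tstat, df)` returns the lower-tail probability $P(T \le t)$ for a $t$ distribution with `df` degrees of freedom. The first test is upper-tailed (the alternative is that nappers go to bed *later*), so its p-value is the upper-tail probability, $1 -$ `t.cdf(tstat, df)`.

Use `t.cdf` in this way to find the p-value for the first hypothesis test.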
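A short sketch comparing the tail probabilities that `scipy.stats.t` provides. The $t$ statistic printed in this notebook is hard-coded so the snippet runs on its own; `dof` is used instead of `df` to avoid clashing with the DataFrame name. `t.sf` is SciPy's survival function, $1 -$ CDF, and `t.ppf` is the inverse CDF.

```python
from scipy.stats import t

tstat = 2.4106381824626966  # t statistic from the first hypothesis test
dof = 18                    # n1 + n2 - 2 with 20 children in total

p_lower = t.cdf(tstat, dof)               # P(T <= tstat), the lower tail
p_upper = t.sf(tstat, dof)                # P(T >= tstat), same as 1 - t.cdf(tstat, dof)
p_two_sided = 2 * t.sf(abs(tstat), dof)   # for a "different from" alternative

t_crit = t.ppf(0.95, dof)  # one-sided critical value at alpha = 0.05

print(p_upper, p_two_sided, t_crit)
print("reject H0" if p_upper < 0.05 else "fail to reject H0")
```

`t.sf` is often preferred over `1 - t.cdf` because it avoids subtracting two nearly equal numbers when the p-value is very small; at this p-value the two agree.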

In [14]:
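# Degrees of freedom: n1 + n2 - 2 = 20 - 2 = 18.
# The alternative is one-sided (nappers' bedtime is later), so take the upper tail.
pvalue = 1 - t.cdf(tstat, n1 + n2 - 2)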
pvalue

0.013417041438843036
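As a cross-check on the manual calculation, `scipy.stats.ttest_ind` with `equal_var=True` computes the same pooled two-sample $t$ statistic, and its default p-value is two-sided, the form needed for the second hypothesis test. The sketch below uses made-up durations (not study data); the variable names are hypothetical and chosen so they do not overwrite `n1`, `s1` or `pooled_se` from the cells above.

```python
import numpy as np
from scipy.stats import t, ttest_ind

# Made-up 24 h sleep durations in minutes (illustrative only, NOT study data).
nap_toy = np.array([700.0, 690.0, 720.0, 705.0, 680.0])
no_nap_toy = np.array([660.0, 675.0, 650.0, 690.0, 665.0])

# Manual pooled two-sample t statistic, following the same formula as above.
na, nb = len(nap_toy), len(no_nap_toy)
sa, sb = nap_toy.std(ddof=1), no_nap_toy.std(ddof=1)
pooled_var = ((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2)
se_toy = np.sqrt(pooled_var * (1 / na + 1 / nb))
t_manual = (nap_toy.mean() - no_nap_toy.mean()) / se_toy

# Two-sided p-value for a "different from" alternative.
p_manual = 2 * t.sf(abs(t_manual), na + nb - 2)

# SciPy's equal-variance (Student) two-sample t-test; two-sided by default.
result = ttest_ind(nap_toy, no_nap_toy, equal_var=True)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```

On the notebook's data, the corresponding call would be `ttest_ind(df.loc[df.napping == 1, "24 h sleep duration"], df.loc[df.napping == 0, "24 h sleep duration"], equal_var=True)`. Recent SciPy versions (1.6 and later) also accept `alternative="greater"` for a one-sided test such as the bedtime comparison.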