# Hypothesis Tests

In this notebook, we will look at data from a study on toddler sleep habits.

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import t

pd.set_option("display.max_columns", 30)  # So we can see all the columns of the data

Our goal is to analyse data from a study that examined differences in a number of sleep variables between napping and non-napping toddlers. These variables included: Bedtime (lights-off time in decimalized time), Night Sleep Onset Time (in decimalized time), Wake Time (sleep end time in decimalized time), Night Sleep Duration (interval between sleep onset and sleep end, in minutes), and Total 24-Hour Sleep Duration (in minutes). Note: Decimalized time represents the time of day in decimally related units; here it is hours with a decimal fraction, so 19.5 means 19:30.
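
To make the decimalized time convention concrete, here is a minimal sketch of converting between clock time and decimal hours. The helper names `to_decimal_time` and `to_clock_time` are hypothetical and are not part of the study's data pipeline; the dataset already stores times in decimal form.

```python
def to_decimal_time(clock: str) -> float:
    """Convert an 'HH:MM' clock string to decimal hours (hypothetical helper)."""
    hours, minutes = clock.split(":")
    return int(hours) + int(minutes) / 60


def to_clock_time(decimal_hours: float) -> str:
    """Convert decimal hours back to 'HH:MM', rounding to the nearest minute."""
    total_minutes = round(decimal_hours * 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours % 24:02d}:{minutes:02d}"


print(to_decimal_time("20:27"))  # a bedtime of 20:27 expressed in decimal hours
print(to_clock_time(19.5))       # decimal 19.5 expressed as a clock time
```

Working in decimal hours lets means and differences of bedtimes be computed with ordinary arithmetic.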

The 20 study participants were healthy, normally developing toddlers with no sleep or behavioral problems. These children were categorized as napping or non-napping based upon parental report of the children’s habitual sleep patterns. Researchers then verified napping status with data from actigraphy (a non-invasive method of monitoring human rest/activity cycles using a sensor worn on the wrist) and sleep diaries during the 5 days before the study assessments were made.

We are specifically interested in the results for Bedtime, Night Sleep Duration, and Total 24-Hour Sleep Duration.

ref: Akacem LD, Simpkin CT, Carskadon MA, Wright KP Jr, Jenni OG, Achermann P, et al. (2015) The Timing of the Circadian Clock and Sleep Differ between Napping and Non-Napping Toddlers. PLoS ONE 10(4): e0125181. https://doi.org/10.1371/journal.pone.0125181

In [2]:
# import the data
df = pd.read_excel("nap_no_nap.xlsx")
df


Unnamed: 0,id,sex,age (months),dlmo time,days napped,napping,nap lights outl time,nap sleep onset,nap midsleep,nap sleep offset,nap wake time,nap duration,nap time in bed,night bedtime,night sleep onset,sleep onset latency,night midsleep time,night wake time,night sleep duration,night time in bed,24 h sleep duration,bedtime phase difference,sleep onset phase difference,midsleep phase difference,wake time phase difference
0,1,female,33.7,19.24,0.0,0.0,,,,,,,,20.45,20.68,0.23,1.92,7.17,629.4,643.0,629.4,-1.21,-1.44,6.68,11.93
1,2,female,31.5,18.27,0.0,0.0,,,,,,,,19.23,19.48,0.25,1.09,6.69,672.4,700.4,672.4,-0.96,-1.21,6.82,12.42
2,3,male,31.9,19.14,0.0,0.0,,,,,,,,19.6,20.05,0.45,1.29,6.53,628.8,682.6,628.8,-0.46,-0.91,6.15,11.39
3,4,female,31.6,19.69,0.0,0.0,,,,,,,,19.46,19.5,0.05,1.89,8.28,766.6,784.0,766.6,0.23,0.19,6.2,12.59
4,5,female,33.0,19.52,0.0,0.0,,,,,,,,19.21,19.65,0.45,1.3,6.95,678.0,718.0,678.0,0.31,-0.13,5.78,11.43
5,6,female,36.2,18.22,4.0,1.0,14.0,14.22,15.0,15.78,16.28,93.75,137.0,19.95,20.25,0.29,1.26,6.28,602.2,653.8,695.95,-1.73,-2.03,7.05,12.06
6,7,male,36.3,19.28,1.0,1.0,14.75,15.03,15.92,16.8,16.08,106.0,80.0,20.6,20.96,0.36,2.12,7.27,618.4,655.4,724.4,-1.32,-1.68,6.84,11.99
7,8,male,30.0,21.06,5.0,1.0,13.09,13.43,14.44,15.46,15.82,121.6,163.8,22.01,22.53,0.51,2.92,7.31,526.8,582.4,648.4,-0.95,-1.47,5.86,10.25
8,9,male,33.2,19.38,2.0,1.0,14.41,14.42,15.71,17.01,16.6,155.5,131.25,20.24,20.37,0.13,1.6,6.82,626.8,660.33,782.3,-0.86,-0.99,6.22,11.44
9,10,female,37.1,19.93,3.0,1.0,13.12,13.42,14.31,15.19,15.3,106.67,130.67,20.78,21.63,0.84,2.2,6.52,549.5,626.0,656.17,-0.76,-1.82,6.21,10.59


In [3]:
# Drop rows 20-22 so that only the 20 participant rows remain
df.drop([20, 21, 22], axis=0, inplace=True)

In [4]:
df.columns

Index(['id', 'sex', 'age (months)', 'dlmo time', 'days napped', 'napping',
       'nap lights outl time', 'nap sleep onset', 'nap midsleep',
       'nap sleep offset', 'nap wake time', 'nap duration', 'nap time in bed',
       'night bedtime', 'night sleep onset', 'sleep onset latency',
       'night midsleep time', 'night wake time', 'night sleep duration',
       'night time in bed', '24 h sleep duration', 'bedtime phase difference',
       'sleep onset phase difference', 'midsleep phase difference',
       'wake time phase difference'],
      dtype='object')

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 25 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   id                            20 non-null     object 
 1   sex                           20 non-null     object 
 2   age (months)                  20 non-null     float64
 3   dlmo time                     20 non-null     float64
 4   days napped                   20 non-null     float64
 5   napping                       20 non-null     float64
 6   nap lights outl time          15 non-null     float64
 7   nap sleep onset               15 non-null     float64
 8   nap midsleep                  15 non-null     float64
 9   nap sleep offset              15 non-null     float64
 10  nap wake time                 15 non-null     float64
 11  nap duration                  15 non-null     float64
 12  nap time in bed               15 non-null     float64
 13  night b

**Question:** What value is used in the column 'napping' to indicate that a toddler takes a nap? 1

**Question:** What is the sample size $n$? 20

In [6]:
df.napping.unique()

array([0., 1.])

In [7]:
df.shape

(20, 25)

## Hypothesis Tests

We will look at two hypothesis tests, each with $\alpha=.05$:

1. Is the average bedtime for toddlers who nap later than the average bedtime for toddlers who don't nap?

$$H_0 : \mu_{\text{nap}} = \mu_{\text{no nap}} \qquad H_a : \mu_{\text{nap}} > \mu_{\text{no nap}}$$

Or equivalently:

$$H_0 : \mu_{\text{nap}} - \mu_{\text{no nap}}=0 \qquad H_a : \mu_{\text{nap}} - \mu_{\text{no nap}}>0$$

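2. Is the average 24 h sleep duration (in minutes) for napping toddlers different from the average for toddlers who don't nap?

$$H_0 : \mu_{\text{nap}} = \mu_{\text{no nap}} \qquad H_a : \mu_{\text{nap}} \neq \mu_{\text{no nap}}$$

Or equivalently:

$$H_0 : \mu_{\text{nap}} - \mu_{\text{no nap}}=0 \qquad H_a : \mu_{\text{nap}} - \mu_{\text{no nap}} \neq 0$$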

Aside: This $\alpha$ level is equivalent to starting with $\alpha = .1$ and then applying the Bonferroni correction for two tests.
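
The Bonferroni correction mentioned in the aside divides a family-wise significance level by the number of tests. A minimal sketch, assuming a family-wise level of 0.10 spread across the two tests in this notebook (the helper name `bonferroni_alpha` is hypothetical):

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


# Assumed family-wise level of 0.10, shared by the two hypothesis tests
per_test_alpha = bonferroni_alpha(0.10, 2)
print(per_test_alpha)
```

Each individual test is then compared against the per-test level rather than the family-wise one.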

The 'night bedtime' column is already recorded in decimalized time, so we can split it directly by napping status.

In [8]:
nap_bedtime = df[df["napping"] == 1]["night bedtime"]
no_nap_bedtime = df[df["napping"] == 0]["night bedtime"]

Now we find the sample mean bedtime for the nap and no-nap groups.

In [9]:
nap_mean_bedtime = nap_bedtime.mean()
no_nap_mean_bedtime = no_nap_bedtime.mean()

print(nap_mean_bedtime, no_nap_mean_bedtime)

20.304 19.590000000000003


**Question:** What is the sample difference of mean bedtime for nappers minus no nappers? 

In [10]:
mean_bedtime_difference = nap_mean_bedtime - no_nap_mean_bedtime
mean_bedtime_difference

0.7139999999999951

Now we find the sample standard deviation for $X_{\text{nap}}$ and $X_{\text{no nap}}$.

In [11]:
nap_sd_bedtime = nap_bedtime.std()
no_nap_sd_bedtime = no_nap_bedtime.std()

print(nap_sd_bedtime, no_nap_sd_bedtime)

0.5910619981984009 0.5075923561284187
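
pandas `Series.std()` divides by $n-1$ by default, which is the sample standard deviation the pooled formula expects. As a rough cross-check, the sketch below recomputes the no-nap value with the standard library, using the five non-napping bedtimes shown in the table above, and contrasts it with the population version that divides by $n$.

```python
import statistics

# Night bedtimes of the five non-napping toddlers shown in the table above
no_nap_bedtimes = [20.45, 19.23, 19.6, 19.46, 19.21]

sample_sd = statistics.stdev(no_nap_bedtimes)       # divides by n - 1
population_sd = statistics.pstdev(no_nap_bedtimes)  # divides by n

print(sample_sd, population_sd)
```

With only five observations the two versions differ noticeably, so it matters which one goes into the standard error.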


**Question:** Standard Error

We expect the variance in sleep time for toddlers who nap and toddlers who don't nap to be the same, so we use the pooled standard error. Calculate the pooled standard error:
$$S.E. = \sqrt{\frac{(n_1-1)s_{1}^2 + (n_2-1)s_{2}^2}{n_1+n_2-2}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$

In [12]:
n_nap = len(nap_bedtime)
n_no_nap = len(no_nap_bedtime)
print(n_nap, n_no_nap)

15 5


In [13]:
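# Pooled variance: weighted average of the two sample variances
va = ((n_nap - 1) * nap_sd_bedtime**2 + (n_no_nap - 1) * no_nap_sd_bedtime**2) / (n_nap + n_no_nap - 2)
# Standard error of the difference in means
pooled_se = np.sqrt(va * (1/n_nap + 1/n_no_nap))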

pooled_se

0.2961871280370147

In [14]:
tstat = mean_bedtime_difference / pooled_se
tstat

2.4106381824626966
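
The pooled standard error and test statistic can also be computed from summary statistics alone. The sketch below collects those steps into one function; `pooled_t_statistic` is a hypothetical helper name, and the inputs are the means, standard deviations, and group sizes printed earlier in this notebook.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Return (pooled standard error, t statistic, degrees of freedom)."""
    dof = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / dof
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return se, (mean1 - mean2) / se, dof


# Summary statistics for night bedtime: nap group first, then no-nap group
se, t_stat, dof = pooled_t_statistic(20.304, 0.5910619981984009, 15,
                                     19.59, 0.5075923561284187, 5)
print(se, t_stat, dof)
```

Working from summary statistics is handy when only a published table of means and standard deviations is available.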

In [15]:
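# Scratch cell: this term uses the raw bedtimes rather than no_nap_sd_bedtime,
# so it is not part of the pooled variance calculation
+ (n_no_nap - 1) * no_nap_bedtime**2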

0    1672.8100
1    1479.1716
2    1536.6400
3    1514.7664
4    1476.0964
Name: night bedtime, dtype: float64

In [16]:
# One-sided (upper-tail) p-value with n_nap + n_no_nap - 2 degrees of freedom
pvalue = 1 - t.cdf(tstat, n_nap + n_no_nap - 2)
pvalue

0.013417041438843036

In [17]:
import statsmodels.api as sm
sm.stats.ttest_ind(nap_bedtime, no_nap_bedtime, alternative="larger")

(2.4106381824626966, 0.013417041438843019, 18.0)
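
The last step of the test is to compare the p-value with the significance level $\alpha=.05$ stated earlier. A minimal sketch of that decision rule (the function name `decide` is hypothetical):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision at significance level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# One-sided p-value for the bedtime test, as computed above
print(decide(0.013417041438843036))
```

Since the p-value of about 0.013 is below .05, the data support the alternative that napping toddlers have a later average bedtime.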

In [19]:
df.head()

Unnamed: 0,id,sex,age (months),dlmo time,days napped,napping,nap lights outl time,nap sleep onset,nap midsleep,nap sleep offset,nap wake time,nap duration,nap time in bed,night bedtime,night sleep onset,sleep onset latency,night midsleep time,night wake time,night sleep duration,night time in bed,24 h sleep duration,bedtime phase difference,sleep onset phase difference,midsleep phase difference,wake time phase difference
0,1,female,33.7,19.24,0.0,0.0,,,,,,,,20.45,20.68,0.23,1.92,7.17,629.4,643.0,629.4,-1.21,-1.44,6.68,11.93
1,2,female,31.5,18.27,0.0,0.0,,,,,,,,19.23,19.48,0.25,1.09,6.69,672.4,700.4,672.4,-0.96,-1.21,6.82,12.42
2,3,male,31.9,19.14,0.0,0.0,,,,,,,,19.6,20.05,0.45,1.29,6.53,628.8,682.6,628.8,-0.46,-0.91,6.15,11.39
3,4,female,31.6,19.69,0.0,0.0,,,,,,,,19.46,19.5,0.05,1.89,8.28,766.6,784.0,766.6,0.23,0.19,6.2,12.59
4,5,female,33.0,19.52,0.0,0.0,,,,,,,,19.21,19.65,0.45,1.3,6.95,678.0,718.0,678.0,0.31,-0.13,5.78,11.43


In [28]:
nap_24 = df[df["napping"] == 1]["24 h sleep duration"]
no_nap_24 = df[df["napping"] == 0]["24 h sleep duration"]

nap_mean_24 = nap_24.mean()
no_nap_mean_24 = no_nap_24.mean()

mean_difference_24 = nap_mean_24 - no_nap_mean_24

print(nap_mean_24, no_nap_mean_24, mean_difference_24)

708.8653333333333 675.04 33.82533333333333


In [24]:
nap_sd_24 = nap_24.std()
no_nap_sd_24 = no_nap_24.std()

print(nap_sd_24, no_nap_sd_24)

40.164759049741726 56.169635925471354


In [25]:
# Pooled variance and standard error for 24 h sleep duration
va = ((n_nap - 1) * nap_sd_24**2 + (n_no_nap - 1) * no_nap_sd_24**2) / (n_nap + n_no_nap - 2)
pooled_se = np.sqrt(va * (1/n_nap + 1/n_no_nap))

pooled_se

22.837598035900864

In [27]:
# Test statistic
test_stat = mean_difference_24 / pooled_se
test_stat

1.4811248223284985

In [29]:
# Two-sided p-value, since H_a states that the means differ
pvalue = 2 * (1 - t.cdf(abs(test_stat), n_nap + n_no_nap - 2))
pvalue

0.1558664953018476

In [30]:
import scipy.stats

In [34]:
scipy.stats.ttest_ind(nap_24, no_nap_24, equal_var=True)

Ttest_indResult(statistic=1.4811248223284985, pvalue=0.1558664953018476)
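
`scipy.stats.ttest_ind` reports a two-sided p-value by default, which is the right one for an alternative of the form "the means differ". The sketch below reproduces it from the test statistic and shows how it relates to the one-sided upper-tail probability; it uses the survival function `t.sf`, which is $1 - \text{cdf}$.

```python
from scipy.stats import t

t_stat = 1.4811248223284985  # test statistic for 24 h sleep duration, from above
dof = 18                     # n_nap + n_no_nap - 2 = 15 + 5 - 2

upper_tail = t.sf(t_stat, dof)          # P(T > t), a one-sided p-value
two_sided = 2 * t.sf(abs(t_stat), dof)  # P(|T| > |t|), for H_a: means differ

print(upper_tail, two_sided)
```

The two-sided p-value of about 0.156 is above $\alpha=.05$, so we fail to reject the null hypothesis that napping and non-napping toddlers have the same average 24 h sleep duration.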