# Reproduce Table 4
- Python kernel that calls an R script through `rpy2`.
- Verify the results against [Table 4 in the original analysis paper](https://www.nature.com/articles/s41598-021-87029-w?proof=t%25C2%25A0) and by running `sleep.R` ([original version](https://github.com/usc-sail/tiles-day-night/blob/main/code/sleep/table4-sleep.R)). Be sure to configure your file paths first.
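
Since the notebook depends on several relative paths, a quick existence check before running the cells can save a failed run. The sketch below is only an illustration: the three paths are the ones used in this notebook, while `REQUIRED_PATHS` and `find_missing_paths` are hypothetical names introduced here, not part of the original code.

```python
from pathlib import Path

# Paths taken from this notebook; adjust them to your own checkout.
REQUIRED_PATHS = [
    "../data/tiles_datasets/table_4/sleep.csv.gz",  # sleep data loaded below
    "../generateSpecificQuestions.ipynb",           # notebook executed with %run
    "table4-sleep.R",                               # R script sourced through rpy2
]


def find_missing_paths(paths, base_dir="."):
    """Return the entries of `paths` that do not exist relative to `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).exists()]


missing = find_missing_paths(REQUIRED_PATHS)
if missing:
    print("Missing files, check your paths:", missing)
else:
    print("All required files found.")
```

The check is relative to the directory the kernel was started in, which is also what `pd.read_csv`, `%run`, and the R `source` call below resolve against.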

In [1]:
import pandas as pd
import numpy as np

import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()

# Load Data

In [2]:
path_to_file = "../data/tiles_datasets/table_4/sleep.csv.gz"

In [3]:
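def load_data(file):
    '''Read the sleep CSV and return a copy of the loaded DataFrame.'''
    
    # pandas infers gzip compression from the ".gz" extension
    original_data = pd.read_csv(file)
    copy_of_data = original_data.copy()
    
    return copy_of_data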

In [4]:
sleep_df = load_data(path_to_file)

# Load Generated Specific Questions

In [5]:
%run "../generateSpecificQuestions.ipynb"

In [6]:
table_4_sqs

['what are differences between *work* days and *off* days for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on sleep *duration* ?',
 'what are differences between *work* days and *off* days for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on sleep *efficiency* ?',
 'what are differences between *work* days and *off* days for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on sleep *mid* ?']
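
Each question marks its key terms with asterisks, and `table_4()` below relies on that convention when it splits a question on `"*"`. A minimal sketch of what the split yields, using the first question above (`extract_keywords` is a hypothetical helper, not part of the notebook):

```python
def extract_keywords(question):
    """Return the terms wrapped in asterisks, in order of appearance."""
    parts = question.split("*")
    # After splitting on "*", the emphasized terms sit at the odd indices.
    return parts[1::2]


question = (
    "what are differences between *work* days and *off* days for primarily "
    "*day-shift* nurses and primarily *night-shift* nurses with covariate "
    "*age*, *gender* on sleep *duration* ?"
)

keywords = extract_keywords(question)
print(keywords)
# ['work', 'off', 'day-shift', 'night-shift', 'age', 'gender', 'duration']
print("duration" in keywords)  # True
print("mid" in keywords)       # False
```

Because the membership test is run against a list, a sleep variable only matches when it equals a whole asterisk-delimited term, not when it merely appears as a substring of the question.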

In [7]:
physiological_sleep_values = ontology_mappings["physiological_sleep"]
# physiological_sleep_values

# Produce Table 4

In [8]:
def calculate_anova(sleep_df, physiological_sleep_value):
    '''Integrate Python and R to run the analysis of variance (and its p-values)
    for one sleep variable: duration, efficiency, or mid
    
    Arguments:
    sleep_df -- pd DataFrame
    physiological_sleep_value -- py str, one of "duration", "efficiency", "mid"
    
    Return:
    result of the matching R model function (None for any other value)
    '''
    
    # Split the sleep dataframe into workday and offday rows
    work_df = sleep_df[sleep_df["work"] == "workday"]
    off_df = sleep_df[sleep_df["work"] == "offday"]
    
    # R integration: the outcome is printed inside the R script
    r_objects = robjects.r
    r_objects.source("table4-sleep.R")
    if physiological_sleep_value == "duration":
        duration_df = r_objects.sleep_duration_model(work_df, off_df)
        # print(duration_df)
        return duration_df
    elif physiological_sleep_value == "efficiency":
        efficiency_df = r_objects.sleep_efficiency_model(work_df, off_df)
        # print(efficiency_df)
        return efficiency_df
    elif physiological_sleep_value == "mid":
        mid_df = r_objects.sleep_mid_model(sleep_df)
        # print(mid_df)
        return mid_df
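
The `if`/`elif` chain above returns `None` without warning when the value is not one of the three sleep variables. One possible alternative, shown here only as a sketch, is a lookup table that fails loudly on unknown values. The R function names are the ones called above; `select_model_call` and the string stand-ins for the DataFrames are hypothetical, so the example runs without R or the real data.

```python
def select_model_call(physiological_sleep_value, sleep_df, work_df, off_df):
    """Return (R function name, positional arguments) for one sleep variable."""
    dispatch = {
        "duration": ("sleep_duration_model", (work_df, off_df)),
        "efficiency": ("sleep_efficiency_model", (work_df, off_df)),
        "mid": ("sleep_mid_model", (sleep_df,)),
    }
    if physiological_sleep_value not in dispatch:
        raise ValueError(
            f"Unknown sleep variable: {physiological_sleep_value!r}; "
            f"expected one of {sorted(dispatch)}"
        )
    return dispatch[physiological_sleep_value]


# Stand-in values instead of real DataFrames, for illustration only.
name, args = select_model_call("mid", "all_rows", "work_rows", "off_rows")
print(name, args)  # sleep_mid_model ('all_rows',)
```

Inside `calculate_anova`, the selected function could then be invoked with `getattr(r_objects, name)(*args)`, which is the same attribute lookup the original code performs explicitly.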
        

In [9]:
def table_4(table_4_specific_questions, sleep_df, physiological_sleep_values):
    '''Reproduce Table 4
    
    Arguments:
    table_4_specific_questions -- py list
    sleep_df -- pd DataFrame
    physiological_sleep_values -- iterable of py str (sleep variables to look for)
    
    Functions:
    calculate_anova()
    
    Return:
    nothing; the results are printed by the R script called in calculate_anova()
    '''
    
    for table_4_specific_question_idx, table_4_specific_question in enumerate(table_4_specific_questions):
        print(table_4_specific_question_idx, "table_4_specific_question : ", table_4_specific_question)
        word_in_specific_question = table_4_specific_question.split("*")
        for physiological_sleep_value in physiological_sleep_values:
            
            if physiological_sleep_value in word_in_specific_question:
                print(physiological_sleep_value, True)
                calculate_anova(sleep_df, physiological_sleep_value)
                print("#####################")
                print()
        

In [10]:
table_4(table_4_sqs, sleep_df, physiological_sleep_values)

R[write to console]: Loading required package: Matrix



0 table_4_specific_question :  what are differences between *work* days and *off* days for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on sleep *duration* ?
duration True
The ANOVA (formula: duration ~ shift + age + gender) suggests that:

  - The main effect of shift is statistically significant and large (F(1, 90) =
65.55, p < .001; Eta2 (partial) = 0.42, 95% CI [0.30, 1.00])
  - The main effect of age is statistically not significant and very small (F(1,
90) = 0.67, p = 0.415; Eta2 (partial) = 7.41e-03, 95% CI [0.00, 1.00])
  - The main effect of gender is statistically significant and small (F(1, 90) =
4.79, p = 0.031; Eta2 (partial) = 0.05, 95% CI [2.47e-03, 1.00])

Effect sizes were labelled following Field's (2013) recommendations.
The ANOVA (formula: duration ~ shift + age + gender) suggests that:

  - The main effect of shift is statistically not significant and small (F(1, 90)
= 2.65, p = 0.107; Eta2 (partial) = 0.03, 95% CI 