# Reproduce Table 4
- Python kernel that calls an R script.
- Verify the output by comparing it with [Table 4 in the Original Analysis paper](https://www.nature.com/articles/s41598-021-87029-w?proof=t%25C2%25A0) and by running `sleep.R` ([original version](https://github.com/usc-sail/tiles-day-night/blob/main/code/sleep/table4-sleep.R)). Be sure to configure your file paths first.

In [1]:
import pandas as pd
import numpy as np

import sys
sys.path.insert(1, '/Users/brinkley97/Documents/development/')
import my_created_functions

# Load Data

In [2]:
path_to_file = "lab-kcad/datasets/tiles_dataset/table_4/sleep/"
name_of_file = "sleep.csv.gz"

In [3]:
sleep_df = my_created_functions.load_gzip_csv_data(path_to_file, name_of_file)
# sleep_df
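
The helper `my_created_functions.load_gzip_csv_data` is not shown in this notebook. A minimal sketch of what such a loader might look like, assuming it simply joins the directory and file name and reads the gzip-compressed CSV with pandas (the body below is a hypothetical stand-in, not the actual helper):

```python
import os

import pandas as pd


def load_gzip_csv_data(path_to_file, name_of_file):
    """Hypothetical stand-in for my_created_functions.load_gzip_csv_data."""
    # Join directory and file name; works with or without a trailing slash
    full_path = os.path.join(path_to_file, name_of_file)
    # compression is stated explicitly; pandas can also infer it from ".gz"
    return pd.read_csv(full_path, compression="gzip")
```

With this stand-in, the call in cell `In [3]` would return a regular pandas DataFrame.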

# Load Generated Specific Questions

In [4]:
base = "/Users/brinkley97/Documents/development/lab-kcad/"
generated_sq_file = "TGN10Plus/generateSpecificQuestions.ipynb"
table_4_specific_questions_path = base + generated_sq_file
# table_4_specific_questions_path

In [5]:
%run "../generateSpecificQuestions.ipynb"

In [6]:
table_4_sqs

['what are differences between *work* days and *off* days for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on sleep *duration* ?',
 'what are differences between *work* days and *off* days for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on sleep *efficiency* ?',
 'what are differences between *work* days and *off* days for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on sleep *mid* ?']

In [7]:
physiological_sleep_values = ontology_mappings["physiological_sleep"]
# physiological_sleep_values

# Integrate R

In [8]:
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages

# Import pandas2ri and call activate(); otherwise rpy2 raises the error
# "Conversion 'py2rpy' not defined for objects of type '<class 'pandas.core.frame.DataFrame'>'"
from rpy2.robjects import pandas2ri
pandas2ri.activate()

report = rpackages.importr('report')

# Produce Table 4

In [9]:
def calculate_anova(sleep_df, physiological_sleep_value):
    '''Integrate Python and R to calculate the ANOVA p-values for one sleep variable: duration, efficiency, or mid

    Arguments:
    sleep_df -- pd DataFrame
    physiological_sleep_value -- py str; one of "duration", "efficiency", "mid"

    Return:
    analysis of variance (aov) for the requested sleep variable;
    the outcome is also printed by the R script
    '''

    # Split the sleep dataframe into workday and offday rows
    work_df = sleep_df[sleep_df["work"] == "workday"]
    off_df = sleep_df[sleep_df["work"] == "offday"]

    # R integration: source the script so its model functions can be called
    r_objects = robjects.r
    r_objects.source("table4-sleep.R")
    if physiological_sleep_value == "duration":
        return r_objects.sleep_duration_model(work_df, off_df)
    elif physiological_sleep_value == "efficiency":
        return r_objects.sleep_efficiency_model(work_df, off_df)
    elif physiological_sleep_value == "mid":
        # Unlike the other two models, this call passes the full dataframe
        return r_objects.sleep_mid_model(sleep_df)
    # Fail loudly instead of silently returning None for an unknown variable
    raise ValueError(f"Unknown sleep variable: {physiological_sleep_value!r}")
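
The branching in `calculate_anova()` selects one model per sleep variable. One possible alternative is a lookup table. The sketch below uses hypothetical Python stand-ins for the R functions (the real ones come from `table4-sleep.R` through rpy2), and `run_model` is an illustrative name that does not appear in the notebook:

```python
# Hypothetical stand-ins for the R model functions; they only echo their inputs.
def sleep_duration_model(work_df, off_df):
    return ("duration", work_df, off_df)


def sleep_efficiency_model(work_df, off_df):
    return ("efficiency", work_df, off_df)


def sleep_mid_model(sleep_df):
    return ("mid", sleep_df)


def run_model(physiological_sleep_value, sleep_df, work_df, off_df):
    """Pick the model for one sleep variable from a lookup table."""
    # Each entry defers the call so only the selected model actually runs
    dispatch = {
        "duration": lambda: sleep_duration_model(work_df, off_df),
        "efficiency": lambda: sleep_efficiency_model(work_df, off_df),
        "mid": lambda: sleep_mid_model(sleep_df),
    }
    if physiological_sleep_value not in dispatch:
        raise ValueError(f"Unknown sleep variable: {physiological_sleep_value!r}")
    return dispatch[physiological_sleep_value]()
```

A lookup table keeps the list of supported variables in one place, which may make it easier to add another sleep variable later.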
        

In [10]:
def table_4(table_4_specific_questions, sleep_df, physiological_sleep_values):
    '''Reproduce Table 4

    Arguments:
    table_4_specific_questions -- py list
    sleep_df -- pd DataFrame
    physiological_sleep_values -- iterable of py str; sleep variables to look for in each question

    Functions:
    calculate_anova()

    Return:
    nothing; the results are printed by the R script called in calculate_anova()
    '''

    for table_4_specific_question_idx, table_4_specific_question in enumerate(table_4_specific_questions):
        print(table_4_specific_question_idx, "table_4_specific_question : ", table_4_specific_question)
        # Key terms are wrapped in asterisks, so splitting on "*" isolates them
        words_in_specific_question = table_4_specific_question.split("*")
        for physiological_sleep_value in physiological_sleep_values:

            if physiological_sleep_value in words_in_specific_question:
                print(physiological_sleep_value, True)
                calculate_anova(sleep_df, physiological_sleep_value)
                print("#####################")
                print()
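
The term matching in `table_4()` can be tried on its own, without R or the dataset. The sketch below assumes the sleep values are the plain strings `"duration"`, `"efficiency"`, and `"mid"` (the actual contents of `ontology_mappings["physiological_sleep"]` are not shown here), and `matched_sleep_values` is an illustrative name:

```python
def matched_sleep_values(question, sleep_values):
    """Return the sleep values that appear as *starred* terms in the question."""
    # Splitting on "*" leaves each starred term as its own list element
    tokens = question.split("*")
    # List membership is an exact comparison against whole tokens
    return [value for value in sleep_values if value in tokens]


question = (
    "what are differences between *work* days and *off* days for primarily "
    "*day-shift* nurses and primarily *night-shift* nurses with covariate "
    "*age*, *gender* on sleep *duration* ?"
)
sleep_values = ["duration", "efficiency", "mid"]  # assumed contents

print(matched_sleep_values(question, sleep_values))  # prints ['duration']
```

Because the check is made against the split tokens rather than the raw string, a sleep value only matches when it is the entire starred term.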
        

In [11]:
table_4(table_4_sqs, sleep_df, physiological_sleep_values)

0 table_4_specific_question :  what are differences between *work* days and *off* days for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on sleep *duration* ?
duration True
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] report_0.5.1.3

loaded via a namespace (and not attached):
[1] compiler_4.2.1   datawizard_0.5.0 insight_0.19.0  
The ANOVA (formula: duration ~ shift + age + gender) suggests that:

  - The main effect of shift is statistically significant and large (F(1, 90) =
65.55, p < .001; Eta2 (partial) = 0.42, 95% CI [0.30, 1.00])
  - The main effect 