# Reproduce Table 2
- Python kernel that calls an R script through `rpy2`
- Verify the output against [Table 2 in the original analysis paper](https://www.nature.com/articles/s41598-021-87029-w?proof=t%25C2%25A0) and by running the corresponding R scripts:
    1. **Rest activity ratio**: `anova_physical.R` ([original version](https://github.com/usc-sail/tiles-day-night/blob/main/code/physical/anova_physical.R))
    2. **Walk activity ratio**: `anova_step.R` ([original version](https://github.com/usc-sail/tiles-day-night/blob/main/code/physical/anova_step.R))
    3. **Vigorous activity ratio**: `physical_vigorous_lm.R` ([original version](https://github.com/usc-sail/tiles-day-night/blob/main/code/physical/physical_vigorous_lm.R))
    
    Be sure to configure your file paths.
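
One way to keep the paths in one place is to resolve them from a single base directory. The sketch below is only an illustration: the environment variable `TILES_BASE_DIR` and the helper `resolve_data_path` are hypothetical and not part of this notebook; the directory and file names are the ones used in the Load Data cell.

```python
import os
from pathlib import Path

# Hypothetical environment variable; falls back to the current directory.
BASE_DIR = Path(os.environ.get("TILES_BASE_DIR", "."))


def resolve_data_path(*parts: str) -> Path:
    """Join path components under BASE_DIR (hypothetical helper)."""
    return BASE_DIR.joinpath(*parts)


physical_dir = resolve_data_path(
    "lab-kcad", "datasets", "tiles_dataset", "table_2", "physical"
)
print(physical_dir / "stats_lm.csv.gz")
```

With this layout, only `TILES_BASE_DIR` has to change when the notebook moves to another machine.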
    

In [1]:
import pandas as pd
import numpy as np

import sys
sys.path.insert(1, '/Users/brinkley97/Documents/development/')
import my_created_functions

# Load Data

In [2]:
path_to_file = "lab-kcad/datasets/tiles_dataset/table_2/physical/"
slm_file = "stats_lm.csv.gz"

In [3]:
physical_activity_df = my_created_functions.load_gzip_csv_data(path_to_file, slm_file)
# physical_activity_df
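
`my_created_functions` is a local module that is not shown in this notebook. The following is a minimal sketch of what a loader like `load_gzip_csv_data` might do, assuming it joins the directory and file name and reads the gzip-compressed CSV with pandas. The function name `load_gzip_csv_sketch` and the demo values are hypothetical.

```python
import os
import tempfile

import pandas as pd


def load_gzip_csv_sketch(path_to_file: str, file_name: str) -> pd.DataFrame:
    """Read a gzip-compressed CSV into a DataFrame (assumed behavior)."""
    full_path = os.path.join(path_to_file, file_name)
    return pd.read_csv(full_path, compression="gzip")


# Round-trip demo with made-up values so the sketch can be run on its own.
demo_df = pd.DataFrame({"shift": ["day", "night"], "rest": [0.5, 0.25]})
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_df.to_csv(
        os.path.join(tmp_dir, "demo.csv.gz"), index=False, compression="gzip"
    )
    loaded_df = load_gzip_csv_sketch(tmp_dir, "demo.csv.gz")

print(loaded_df)
```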

# Load Generated Specific Questions

In [4]:
base = "/Users/brinkley97/Documents/development/lab-kcad/"
generated_sq_file = "TGN10Plus/generateSpecificQuestions.ipynb"
table_2_specific_questions_path = base + generated_sq_file
# table_2_specific_questions_path

In [5]:
%run "../generateSpecificQuestions.ipynb"

In [6]:
table_2_sqs

['what are differences between *work* day and *off* day for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on *rest*',
 'what are differences between *work* day and *off* day for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on *step_ratio*',
 'what are differences between *work* day and *off* day for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on *run_ratio*',
 'what are differences between *work* day and *off* day for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on *vigorous_min*']
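
Each question marks its variables with asterisks, and `table_two` (defined under Produce Table 2) uses `str.split("*")` to match them against the DataFrame columns. A small sketch of that step follows; the column list is made up for illustration, since the real columns come from `physical_activity_df`.

```python
question = (
    "what are differences between *work* day and *off* day for primarily "
    "*day-shift* nurses and primarily *night-shift* nurses with covariate "
    "*age*, *gender* on *rest*"
)

# Splitting on "*" alternates plain text and starred tokens,
# so the starred tokens sit at the odd indices.
starred_tokens = question.split("*")[1::2]
print(starred_tokens)

# Hypothetical column names, for illustration only.
example_columns = ["id", "work", "rest", "step_ratio", "age", "gender"]
matched_columns = [col for col in example_columns if col in starred_tokens]
print(matched_columns)
```

The matched columns come back in the order of the column list, not in the order they appear in the question.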

# Integrate R

In [7]:
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages

# Activate the pandas converter; without it, passing a DataFrame to R raises:
#   Conversion 'py2rpy' not defined for objects of type '<class 'pandas.core.frame.DataFrame'>'
from rpy2.robjects import pandas2ri
pandas2ri.activate()

report = rpackages.importr('report')

# Produce Table 2

In [8]:
def table_two(table_2_specific_questions, physical_activity_df):
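    '''Run the R models for each Table 2 question and print their results, including p-values.

    For every question, select the matching columns, split the rows into
    work-day and off-day tables, and pass both to the R function for that
    activity (rest, step_ratio, or vigorous_min).

    Arguments:
    table_2_specific_questions -- list of question strings
    physical_activity_df -- pandas DataFrame

    Return:
    None; the R functions sourced from table2-activity.R print the results
    '''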
    
    ontology_values = list(physical_activity_df.keys())
    
    
    for t2_specific_question in table_2_specific_questions:
        print(t2_specific_question)
        store_matching_columns = ['shift']
        for specific_ontology_value in ontology_values:
            # Keep a column when its name appears as a *starred* token in the question
            if specific_ontology_value in t2_specific_question.split("*"):
                store_matching_columns.append(specific_ontology_value)

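        form_table = physical_activity_df.loc[:, store_matching_columns]
        # print(form_table)
        
        # The outcome (activity) column is assumed to sit at position 2 of form_table,
        # which depends on the column order of physical_activity_df
        activity_ontologies = list(form_table.keys())[2]
        # print(activity_ontologies)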
        
        work_or_off = form_table.set_index("work")
        # print(work_or_off)
        
        at_work_df = work_or_off.loc["work"]
        # print(at_work_df)
                     
        off_work_df = work_or_off.loc["off"]
        # print(off_work_df)

        # R integration: the functions sourced from table2-activity.R print the results
        
        r_objects = robjects.r
        r_objects.source("table2-activity.R")
        
        print("\n==============activity======================", activity_ontologies)
        
        if activity_ontologies == "rest":
            r_objects.rest_model(at_work_df, off_work_df)
            
        elif activity_ontologies == "step_ratio":
            r_objects.step_model(at_work_df, off_work_df)
            
        elif activity_ontologies == "vigorous_min":
            r_objects.vigorous_model(at_work_df, off_work_df)
        # Any other activity matches no branch above, so no R model is run for it
            
        print("-----------------------------------------------------------------------------------------------------\n")
    return 
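
The work/off split and the per-activity dispatch in `table_two` can be tried without R. The sketch below uses made-up rows and a hypothetical stand-in (`describe_rest`) for the R function. It also shows dictionary dispatch as one possible alternative to the `if`/`elif` chain; this is not how the notebook itself is written.

```python
import pandas as pd

# Made-up rows, for illustration only; the real data comes from stats_lm.csv.gz.
demo_df = pd.DataFrame(
    {
        "shift": ["day", "night", "day", "night"],
        "work": ["work", "work", "off", "off"],
        "rest": [0.40, 0.35, 0.55, 0.50],
    }
)

# Same split as in table_two: index by the "work" column, then select by label.
by_work = demo_df.set_index("work")
at_work_df = by_work.loc["work"]
off_work_df = by_work.loc["off"]


def describe_rest(at_work, off_work):
    """Hypothetical stand-in for r_objects.rest_model."""
    return f"rest: {len(at_work)} work rows, {len(off_work)} off rows"


# Dictionary dispatch: activity name -> function to call.
models = {"rest": describe_rest}


def run_model(activity, at_work, off_work):
    handler = models.get(activity)
    if handler is None:
        return f"no model defined for {activity}"
    return handler(at_work, off_work)


print(run_model("rest", at_work_df, off_work_df))
print(run_model("run_ratio", at_work_df, off_work_df))
```

One caveat of the `.loc["work"]` pattern: if a label matches exactly one row, pandas returns a Series instead of a DataFrame. Selecting with a list, as in `by_work.loc[["work"]]`, always returns a DataFrame.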

In [9]:
table_two(table_2_sqs, physical_activity_df)

what are differences between *work* day and *off* day for primarily *day-shift* nurses and primarily *night-shift* nurses with covariate *age*, *gender* on *rest*
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] report_0.5.1.3

loaded via a namespace (and not attached):
[1] compiler_4.2.1   datawizard_0.5.0 insight_0.19.0  

The ANOVA (formula: rest ~ shift + age + gender) suggests that:

  - The main effect of shift is statistically significant and medium (F(1, 101) =
9.77, p = 0.002; Eta2 (partial) = 0.09, 95% CI [0.02, 1.00])
  - The main effect of age is statistically not significant and small (F(1, 101)