# Notebook File Overview

In this notebook, we will import the *final_diabetes_ptsd* data frame we created in the third notebook.

We will use the code snippet provided by *All of Us* to import our data frame from the workspace bucket.

We will then process and clean the data and create a summary table, also known as a *Table 1*.

# Add the code snippet from the All of Us R and Cloud Storage snippets

**Step 1: Run *Setup***

In [None]:
library(tidyverse)  # Data wrangling packages.

**Step 2: Run the *copy_file_from_workspace_bucket.R* code snippet**

This will import **final_diabetes_ptsd.csv**

**NOTE: The new data frame will be called *my_dataframe***

In [None]:
# This snippet assumes that you run setup first

# This code copies a file from your Google Bucket into a dataframe

# replace 'test.csv' with the name of the file in your google bucket (don't delete the quotation marks)
name_of_file_in_bucket <- 'final_diabetes_ptsd.csv'

########################################################################
##
################# DON'T CHANGE FROM HERE ###############################
##
########################################################################

# Get the bucket name
my_bucket <- Sys.getenv('WORKSPACE_BUCKET')

# Copy the file from the bucket to the current workspace
system(paste0("gsutil cp ", my_bucket, "/data/", name_of_file_in_bucket, " ."), intern=T)

# Load the file into a dataframe
my_dataframe  <- read_csv(name_of_file_in_bucket)
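
If the copy step fails (for example, because the file name is misspelled), `read_csv()` will stop with a file-not-found error. As a minimal sketch, the lines below build the same `gsutil` command string and check for the local file before it is read. The bucket value `gs://example-bucket` is a hypothetical placeholder; in the workspace it comes from `Sys.getenv('WORKSPACE_BUCKET')`.

```r
# Hypothetical placeholder; in the workspace use Sys.getenv('WORKSPACE_BUCKET')
my_bucket <- "gs://example-bucket"
name_of_file_in_bucket <- "final_diabetes_ptsd.csv"

# Build the same command string the snippet passes to system()
copy_command <- paste0("gsutil cp ", my_bucket, "/data/", name_of_file_in_bucket, " .")
print(copy_command)

# After the copy has run, confirm the file is in the working directory
file_was_copied <- file.exists(name_of_file_in_bucket)
if (!file_was_copied) {
  message("File not found locally; check the file name and the /data/ folder in the bucket.")
}
```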


# Inspect data

**Step 1: Use head() to see the first 6 rows of data**

In [None]:
head(my_dataframe)

**Step 2: Use str() to see the structure of the dataset**

In [None]:
str(my_dataframe)

**Step 3: Make a quick table using *dplyr* functions to see a summary of the values of each column**

In [None]:
library(dplyr)

# Quick summary of all categorical variables
my_dataframe %>%
  select(gender, race, ethnicity, ptsd_doctor, ptsd_treatment, diabetes) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value") %>%
  count(variable, value) %>%
  arrange(variable)

**FINDINGS: we need to remove the *PMI: Skip* responses from the survey questions, and we need to add *age groups* to improve our analysis**
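
To see at a glance which columns contain *PMI: Skip*, one option is to count the skips per column. This is a minimal base R sketch on a small made-up data frame (`toy_survey` and its values are hypothetical, not the workspace data).

```r
# Hypothetical toy data standing in for the survey columns
toy_survey <- data.frame(
  gender = c("Female", "Male", "Female", "Male"),
  ptsd_doctor = c("Yes", "No", "PMI: Skip", "No"),
  ptsd_treatment = c("Yes", "No", "No", "PMI: Skip"),
  stringsAsFactors = FALSE
)

# Compare every cell to "PMI: Skip" and add up the matches in each column
skip_counts <- colSums(toy_survey == "PMI: Skip", na.rm = TRUE)
print(skip_counts)
```

Columns with a count above zero are the ones that need filtering before building the table.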

## Clean data before creating our *Table 1*

**Step 1: Clean data - Drop *PMI: Skip* and create a new variable called *age_group***

Note: the age range for our cohort is 25 to 65 years old
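
The age bins in this step are built with `cut()`. With `right = FALSE`, each bin includes its lower bound and excludes its upper bound, and `include.lowest = TRUE` additionally closes the last bin so that age 65 is kept. A short sketch on a few boundary ages (chosen here only for illustration), using the default interval labels so the bin edges are visible:

```r
# Boundary ages picked for illustration; 66 is outside the cohort range
boundary_ages <- c(25, 34, 35, 64, 65, 66)
breaks <- c(25, 35, 45, 55, 65)

# Default labels show the interval notation for each bin
age_bins <- cut(boundary_ages, breaks = breaks, right = FALSE, include.lowest = TRUE)
print(data.frame(age = boundary_ages, bin = age_bins))
```

Ages outside the breaks (such as 66 here) become `NA` rather than raising an error, so it is worth checking for unexpected missing values in the new column.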

In [None]:
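my_table_one <- my_dataframe %>%
  # Keep only rows with a substantive answer to both PTSD questions
  filter(ptsd_doctor != "PMI: Skip" & ptsd_treatment != "PMI: Skip") %>%
  # right = FALSE makes bins [25,35), [35,45), [45,55); include.lowest = TRUE
  # closes the last bin, so age 65 falls in the final group
  mutate(age_group = cut(age,
                         breaks = c(25, 35, 45, 55, 65),
                         labels = c("25-34", "35-44", "45-54", "55-65"),
                         right = FALSE,
                         include.lowest = TRUE))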

head(my_table_one)
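
One side effect of filtering with `!=` is that rows with a missing (`NA`) answer are dropped as well, because the comparison returns `NA` and `filter()` keeps only rows where the condition is `TRUE`. The sketch below shows this on a hypothetical toy tibble (`toy_answers` is made up for illustration), assuming *dplyr* is installed.

```r
library(dplyr)

# Hypothetical toy data with one skipped and one missing answer
toy_answers <- tibble(
  person = 1:4,
  ptsd_doctor = c("Yes", "PMI: Skip", NA, "No")
)

# A plain != comparison returns NA for missing values, and filter() drops those rows
dropped_na <- toy_answers %>%
  filter(ptsd_doctor != "PMI: Skip")

# Keeping missing values requires saying so explicitly
kept_na <- toy_answers %>%
  filter(is.na(ptsd_doctor) | ptsd_doctor != "PMI: Skip")

print(nrow(dropped_na))
print(nrow(kept_na))
```

Whether missing answers should be kept depends on the analysis; the point is only that the default behavior removes them silently.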

**Step 2: Install and load the *tableone* and *kableExtra* packages**

In [None]:
#install.packages("tableone")
library(tableone)

#install.packages("kableExtra")
library(kableExtra)

**Step 3: Create our *Table 1* using the *CreateTableOne()* and *kableone()* functions from the *tableone* package**

Kable stands for **K**nitr T**able**, a nicely formatted table

In [None]:
# Define variable types
catVars <- c("age_group", "gender", "race", "ethnicity", "ptsd_doctor", "ptsd_treatment")
contVars <- c("age")

# Create stratified Table 1
table1_stratified <- CreateTableOne(vars = c(contVars, catVars),
                                   strata = "diabetes",
                                   data = my_table_one,
                                   factorVars = catVars)


kableone(table1_stratified, caption = "Table 1. Baseline Characteristics by Diabetes Status")
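
To make the contents of a stratified *Table 1* more concrete, here is a rough base R sketch of the kind of summaries it reports: a mean for a continuous variable and counts with column percentages for a categorical variable, each split by the stratum. The data frame `toy_cohort` is hypothetical, and this is an illustration of the idea rather than how *tableone* computes its output internally.

```r
# Hypothetical toy cohort for illustration only
toy_cohort <- data.frame(
  diabetes = c("Yes", "Yes", "No", "No", "No"),
  age = c(30, 50, 40, 60, 35),
  gender = c("Female", "Male", "Female", "Female", "Male")
)

# Continuous variable: mean age within each diabetes group
age_mean <- tapply(toy_cohort$age, toy_cohort$diabetes, mean)
print(age_mean)

# Categorical variable: counts and column percentages within each diabetes group
gender_counts <- table(toy_cohort$gender, toy_cohort$diabetes)
gender_percents <- round(100 * prop.table(gender_counts, margin = 2), 1)
print(gender_counts)
print(gender_percents)
```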

**Step 4: Save your new table to your Jupyter Notebook files to download locally later**

Go to *File* then *Open* to find the saved file

**NOTE: This saves your table as a *Text* file (.txt)**

Other options:
* .md
* .markdown
* .Rmd

In [9]:
#Save tableone to your jupyter notebook files
kable_output <- kableone(table1_stratified, 
                        caption = "Table 1. Baseline Characteristics by Diabetes Status")

# Save as a text file (.txt)
kable_output %>%
  save_kable("table1_baseline_characteristics.txt")
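
For the other text formats listed above, saving amounts to writing the table's lines to a file with a different extension. The sketch below illustrates this with a hypothetical toy table and a temporary file path, assuming the *knitr* package is installed (it is a dependency of *kableExtra*). It uses `writeLines()` directly rather than `save_kable()`, so treat it as an illustration of the idea, not a replacement for the step above.

```r
# Hypothetical toy table for illustration only
toy_table <- data.frame(
  characteristic = c("Female", "Male"),
  n = c(3, 2)
)

# Render the table as markdown (pipe) text
markdown_table <- knitr::kable(toy_table, format = "pipe")

# Write the lines to a .md file in a temporary folder, then read them back
output_path <- file.path(tempdir(), "toy_table1.md")
writeLines(markdown_table, output_path)
saved_lines <- readLines(output_path)
print(saved_lines)
```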