Vignette for harmonizing full dataset #126
base: dev
Changes from all commits
1132b05
a85579e
26be060
7a96197
0fb49ed
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,120 @@ | ||||||||
--- | ||||||||
title: "How to harmonize across survey cycles" | ||||||||
output: rmarkdown::html_vignette | ||||||||
vignette: > | ||||||||
%\VignetteIndexEntry{How to harmonize across survey cycles} | ||||||||
%\VignetteEngine{knitr::rmarkdown} | ||||||||
%\VignetteEncoding{UTF-8} | ||||||||
--- | ||||||||
|
||||||||
```{r setup, include = FALSE} | ||||||||
knitr::opts_chunk$set( | ||||||||
echo = TRUE, | ||||||||
collapse = TRUE, | ||||||||
comment = "#>" | ||||||||
) | ||||||||
``` | ||||||||
|
||||||||
## Introduction | ||||||||
|
||||||||
This vignette explains how to transform variables across multiple Canadian Community Health Survey (CCHS) cycles by applying the _cchsflow_ package to the full datasets. The full Public Use Microdata File (PUMF) datasets can be found [here](https://odesi.ca/). A full harmonized dataset of all _cchsflow_ variables | ||||||||
Review comment:
Suggested change
I'm not sure if I've correctly described the relationship between CCHS and PUMF, but something like this would provide more context to someone new to this area of study. |
||||||||
can be found [here](https://osf.io/j5wgu). When working with the original PUMF datasets, rename each data file so that its name specifies the survey and cycle year, following the format of the `_p` sample data (e.g., `cchs2001_p`, `cchs2013_2014_p`). | ||||||||
|
||||||||
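The naming convention above can be sketched with a small helper. This is a minimal illustration in base R, not part of _cchsflow_: the function name `cchs_object_name()` is hypothetical, and the file path in the commented line is a placeholder.

```r
# Hypothetical helper (not part of cchsflow): build the object name for a
# CCHS cycle following the "_p" sample-data naming pattern.
cchs_object_name <- function(start_year, end_year = NULL) {
  years <- if (is.null(end_year)) {
    as.character(start_year)
  } else {
    paste(start_year, end_year, sep = "_")
  }
  paste0("cchs", years, "_p")
}

cchs_object_name(2001)        # "cchs2001_p"
cchs_object_name(2013, 2014)  # "cchs2013_2014_p"

# Assumed usage: load a downloaded file under the conventional name.
# The path below is a placeholder, not a real location.
# cchs2001_p <- read.csv("path/to/your/cchs2001_file.csv")
```

Keeping object names aligned with the `_p` sample data should let the examples in this vignette run on the full files without renaming anything inside the code.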
To harmonize the data files, the `rec_with_table()` function is used to transform the indicated variables. | ||||||||
Review comment:
Suggested change
I know eventually we want users to use |
||||||||
|
||||||||
Note: Harmonizing cycles from 2014 and earlier with cycles from 2015 onward is not advised because Statistics Canada made major changes to the survey design. | ||||||||
|
||||||||
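One way to respect the note above is to group cycles by design era before deciding which ones to combine. The sketch below uses base R only. It assumes that the 2013-2014 cycle belongs to the earlier design and that 2015-2016 is the first redesigned cycle; the era labels are made up for illustration.

```r
# Cycle labels as they appear in the "_p" object names used in this vignette.
cycles <- c("2001", "2003", "2005", "2007_2008", "2009_2010",
            "2011_2012", "2013_2014", "2015_2016", "2017_2018")

# First year of each cycle, e.g. "2013_2014" -> 2013.
start_year <- as.integer(substr(cycles, 1, 4))

# Assumption: cycles starting in 2015 or later use the redesigned survey.
design_era <- ifelse(start_year < 2015, "2001_to_2014", "2015_onward")

# Named list with one character vector of cycle labels per era.
cycles_by_era <- split(cycles, design_era)
cycles_by_era
```

Each element of `cycles_by_era` can then be harmonized and merged on its own, so that results from the two designs are not pooled by accident.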
## How to combine a single variable across multiple cycles | ||||||||
yulric marked this conversation as resolved.
|
||||||||
|
||||||||
In this example, the sex variable from the 2001 to 2018 CCHS datasets is transformed and labeled using `rec_with_table()`. The transformed datasets are then combined into one dataset and labeled using `merge_rec_data()`. | ||||||||
Review comment: I'm a little confused as to why we're harmonizing this variable from 2001 to 2018 when, in the previous section, users were advised not to harmonize data from cycles before 2014 with those from 2015 onward. |
||||||||
|
||||||||
```{r results= 'hide', message = FALSE, warning=FALSE} | ||||||||
library(cchsflow) | ||||||||
``` | ||||||||
|
||||||||
|
||||||||
```{r } | ||||||||
# Harmonize individual datasets | ||||||||
sex2001 <- rec_with_table(cchs2001_p, "DHH_SEX", log = TRUE) | ||||||||
sex2003 <- rec_with_table(cchs2003_p, "DHH_SEX", log = TRUE) | ||||||||
sex2005 <- rec_with_table(cchs2005_p, "DHH_SEX", log = TRUE) | ||||||||
sex2007_2008 <- rec_with_table(cchs2007_2008_p, "DHH_SEX", log = TRUE) | ||||||||
sex2009_2010 <- rec_with_table(cchs2009_2010_p, "DHH_SEX", log = TRUE) | ||||||||
sex2011_2012 <- rec_with_table(cchs2011_2012_p, "DHH_SEX", log = TRUE) | ||||||||
sex2013_2014 <- rec_with_table(cchs2013_2014_p, "DHH_SEX", log = TRUE) | ||||||||
sex2015_2016 <- rec_with_table(cchs2015_2016_p, "DHH_SEX", log = TRUE) | ||||||||
sex2017_2018 <- rec_with_table(cchs2017_2018_p, "DHH_SEX", log = TRUE) | ||||||||
|
||||||||
# Merge harmonized data | ||||||||
combined_sex <- merge_rec_data(sex2001, sex2003, sex2005, sex2007_2008, sex2009_2010, sex2011_2012, sex2013_2014, sex2015_2016, sex2017_2018) | ||||||||
|
||||||||
# Summary statistics of combined dataset | ||||||||
summary(combined_sex) | ||||||||
``` | ||||||||
|
||||||||
|
||||||||
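Conceptually, harmonizing maps each cycle's own variable name and codes onto a common variable, and merging stacks the results. The toy sketch below shows that idea in base R with made-up data. It does not call _cchsflow_ or reproduce its implementation: the helper `harmonize_sex()`, the raw column names, the code labels, and the `cycle` column are all assumptions chosen for illustration.

```r
# Toy data standing in for two survey cycles (values are made up).
toy_2001 <- data.frame(DHHA_SEX = c(1, 2, 2))
toy_2003 <- data.frame(DHHC_SEX = c(2, 1))

# Hypothetical helper: rename a cycle-specific column to the common name,
# label the codes, and record which cycle each row came from.
harmonize_sex <- function(data, raw_name, cycle) {
  data.frame(
    DHH_SEX = factor(data[[raw_name]], levels = c(1, 2),
                     labels = c("Male", "Female")),
    cycle = cycle,
    stringsAsFactors = FALSE
  )
}

sex_2001 <- harmonize_sex(toy_2001, "DHHA_SEX", "cchs2001_p")
sex_2003 <- harmonize_sex(toy_2003, "DHHC_SEX", "cchs2003_p")

# Stack the harmonized cycles into one data frame.
toy_combined <- rbind(sex_2001, sex_2003)

# Cross-tabulate cycle by harmonized sex as a quick sanity check.
table(toy_combined$cycle, toy_combined$DHH_SEX)
```

A cross-tabulation by cycle like the last line is a cheap way to confirm that every cycle contributed rows and that no category was lost in the recode.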
## How to combine multiple variables across multiple cycles | ||||||||
|
||||||||
In this example, the continuous age and sex variables from the 2001 to 2018 CCHS datasets are transformed and labeled using `rec_with_table()`. The transformed datasets are then combined into one dataset and labeled using `merge_rec_data()`. | ||||||||
|
||||||||
```{r, eval=FALSE, results = "hide"} | ||||||||
# Harmonize individual datasets | ||||||||
age_sex2001 <- rec_with_table(cchs2001_p, c("DHHGAGE_cont", "DHH_SEX")) | ||||||||
age_sex2003 <- rec_with_table(cchs2003_p, c("DHHGAGE_cont", "DHH_SEX")) | ||||||||
age_sex2005 <- rec_with_table(cchs2005_p, c("DHHGAGE_cont", "DHH_SEX")) | ||||||||
age_sex2007_2008 <- rec_with_table(cchs2007_2008_p, c("DHHGAGE_cont", "DHH_SEX")) | ||||||||
age_sex2009_2010 <- rec_with_table(cchs2009_2010_p, c("DHHGAGE_cont", "DHH_SEX")) | ||||||||
age_sex2011_2012 <- rec_with_table(cchs2011_2012_p, c("DHHGAGE_cont", "DHH_SEX")) | ||||||||
age_sex2013_2014 <- rec_with_table(cchs2013_2014_p, c("DHHGAGE_cont", "DHH_SEX")) | ||||||||
age_sex2015_2016 <- rec_with_table(cchs2015_2016_p, c("DHHGAGE_cont", "DHH_SEX")) | ||||||||
age_sex2017_2018 <- rec_with_table(cchs2017_2018_p, c("DHHGAGE_cont", "DHH_SEX")) | ||||||||
|
||||||||
# Merge harmonized data | ||||||||
combined_age_sex <- merge_rec_data(age_sex2001, age_sex2003, age_sex2005, age_sex2007_2008, age_sex2009_2010, age_sex2011_2012, age_sex2013_2014, age_sex2015_2016, age_sex2017_2018) | ||||||||
|
||||||||
``` | ||||||||
|
||||||||
## How to combine all variables in the variable_details sheet across multiple cycles | ||||||||
|
||||||||
To combine a large number of variables, it is best to use `variables.csv` and `variable_details.csv`. Other vignettes describe these two sheets in more detail, including how to add or customize transformed variables. | ||||||||
|
||||||||
### Option 1: Using _cchsflow_ variable_details sheet | ||||||||
|
||||||||
When the `variables` argument in `rec_with_table()` is not specified, all variables listed in `variables.csv` and `variable_details.csv` are transformed. In this example, all variables in the _cchsflow_ `variables.csv` and `variable_details.csv` sheets are transformed and labeled for the 2001 to 2018 CCHS datasets using `rec_with_table()`. The transformed datasets are then combined into one dataset and labeled using `merge_rec_data()`. | ||||||||
Review comment: Where will
Reply: The sheets will be in the |
||||||||
|
||||||||
```{r, eval=FALSE, results = "hide"} | ||||||||
# Harmonize individual datasets | ||||||||
harmonized_2001 <- rec_with_table(cchs2001_p) | ||||||||
harmonized_2003 <- rec_with_table(cchs2003_p) | ||||||||
harmonized_2005 <- rec_with_table(cchs2005_p) | ||||||||
harmonized_2007_2008 <- rec_with_table(cchs2007_2008_p) | ||||||||
harmonized_2009_2010 <- rec_with_table(cchs2009_2010_p) | ||||||||
harmonized_2011_2012 <- rec_with_table(cchs2011_2012_p) | ||||||||
harmonized_2013_2014 <- rec_with_table(cchs2013_2014_p) | ||||||||
harmonized_2015_2016 <- rec_with_table(cchs2015_2016_p) | ||||||||
harmonized_2017_2018 <- rec_with_table(cchs2017_2018_p) | ||||||||
|
||||||||
# Merge harmonized data | ||||||||
combined_all_cycles <- merge_rec_data(harmonized_2001, harmonized_2003, harmonized_2005, harmonized_2007_2008, harmonized_2009_2010, harmonized_2011_2012, harmonized_2013_2014, harmonized_2015_2016, harmonized_2017_2018) | ||||||||
``` | ||||||||
|
||||||||
### Option 2: Using your own variable_details sheet | ||||||||
|
||||||||
In this example, all variables in personalized `variables.csv` and `variable_details.csv` sheets are transformed and labeled for the 2001 to 2018 CCHS datasets using `rec_with_table()`. The transformed datasets are then combined into one dataset and labeled using `merge_rec_data()`. | ||||||||
Review comment: I would consider showing the relationship between |
||||||||
|
||||||||
```{r, eval=FALSE, results = "hide"} | ||||||||
# Harmonize individual datasets | ||||||||
harmonized_2001 <- rec_with_table(cchs2001_p, variables = sample_variables, variable_details = sample_variable_details) | ||||||||
harmonized_2003 <- rec_with_table(cchs2003_p, variables = sample_variables, variable_details = sample_variable_details) | ||||||||
harmonized_2005 <- rec_with_table(cchs2005_p, variables = sample_variables, variable_details = sample_variable_details) | ||||||||
harmonized_2007_2008 <- rec_with_table(cchs2007_2008_p, variables = sample_variables, variable_details = sample_variable_details) | ||||||||
harmonized_2009_2010 <- rec_with_table(cchs2009_2010_p, variables = sample_variables, variable_details = sample_variable_details) | ||||||||
harmonized_2011_2012 <- rec_with_table(cchs2011_2012_p, variables = sample_variables, variable_details = sample_variable_details) | ||||||||
harmonized_2013_2014 <- rec_with_table(cchs2013_2014_p, variables = sample_variables, variable_details = sample_variable_details) | ||||||||
harmonized_2015_2016 <- rec_with_table(cchs2015_2016_p, variables = sample_variables, variable_details = sample_variable_details) | ||||||||
harmonized_2017_2018 <- rec_with_table(cchs2017_2018_p, variables = sample_variables, variable_details = sample_variable_details) | ||||||||
|
||||||||
# Merge harmonized data | ||||||||
combined_all_cycles <- merge_rec_data(harmonized_2001, harmonized_2003, harmonized_2005, harmonized_2007_2008, harmonized_2009_2010, harmonized_2011_2012, harmonized_2013_2014, harmonized_2015_2016, harmonized_2017_2018) | ||||||||
``` | ||||||||
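When building personalized sheets, it may help to check that the two sheets cover the same variables before passing them to `rec_with_table()`. The sketch below uses toy data frames in base R. The `variable` column name, the `row_id` column, and `OTHER_VAR` are assumptions for illustration and do not reflect the full layout of the real sheets.

```r
# Toy stand-ins for the two sheets; the real sheets have more columns.
toy_variables <- data.frame(
  variable = c("DHH_SEX", "DHHGAGE_cont", "OTHER_VAR"),
  stringsAsFactors = FALSE
)
toy_variable_details <- data.frame(
  variable = c("DHH_SEX", "DHH_SEX", "DHHGAGE_cont", "OTHER_VAR"),
  row_id = 1:4,
  stringsAsFactors = FALSE
)

# Variables to keep in the personalized sheets.
keep <- c("DHH_SEX", "DHHGAGE_cont")

my_variables <- toy_variables[toy_variables$variable %in% keep, , drop = FALSE]
my_variable_details <- toy_variable_details[
  toy_variable_details$variable %in% keep, , drop = FALSE
]

# TRUE when both sheets list exactly the same set of variables.
sheets_in_sync <- setequal(my_variables$variable,
                           unique(my_variable_details$variable))
sheets_in_sync
```

Objects built this way would be passed in the same manner as `sample_variables` and `sample_variable_details` in the chunk above, through the `variables` and `variable_details` arguments.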
|
Review comment:
Consider writing out the first instances of acronyms like CCHS and PUMF in full.