Vignette for harmonizing full dataset #126

kittychenn · 2023-09-11T18:30:34Z

This vignette provides examples of how to harmonize variables using the full dataset. It's a good starting point to harmonize variables, but users can also implement different pipeline tools to harmonize data. Should we include pipeline methods in the example or leave it as is?

JuanLiOHRI

I think this vignette is clear. Only one small change from me: the current merged dataset is saved as .RData and I need an extra step to save it as .rds so I can work with it in targets. Not sure if we should just save it as .rds.
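
For reference, the extra step can be sketched in base R. This is only an illustration: the file and object names below are hypothetical stand-ins, since the name of the merged dataset file is not given in this thread.

```r
# Hypothetical stand-in for the merged dataset; the real file name is not stated here.
rdata_path <- file.path(tempdir(), "merged_example.RData")
merged_example <- data.frame(id = 1:3, sex = c(1, 2, 1))
save(merged_example, file = rdata_path)

# load() restores objects under their saved names, so load into a
# fresh environment to avoid overwriting anything in the workspace.
env <- new.env()
loaded_names <- load(rdata_path, envir = env)

# Write the (single) restored object back out as .rds.
rds_path <- sub("\\.RData$", ".rds", rdata_path)
saveRDS(env[[loaded_names[1]]], rds_path)

# readRDS() returns the object directly, which lets the caller choose its name.
merged_from_rds <- readRDS(rds_path)
```

Saving as .rds directly would remove the need for the load-into-environment step.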

yulric

Can you tell me what command you use to build and view the vignettes?
I would move one of these code samples to the examples on the front page? I think it's an important use case for users. I would keep this vignette though and link to it from the home page example.

yulric · 2023-09-18T11:54:10Z

vignettes/how_to_harmonize.Rmd

+
+## Introduction 
+
+This vignette explains how you can transform variables across multiple CCHS datasets using the full datasets to the _cchsflow_ package. The full PUMF datasets can be found [here](https://odesi.ca/). A full harmonized dataset of all cchsflow variables


Is it possible to link to the actual dataset on odesi?

There are a couple of CCHS cycles on odesi, but I wasn't sure whether to add links to each individual cycle in this vignette or have the general link.

It looks like odesi updated their website: https://odesi.ca/en/browse. Maybe just add text that says to go to this link and search for the cycle you want?

yulric · 2023-10-13T12:27:50Z

Sorry @kittychenn, I think you missed these comments,

Can you tell me what command you use to build and view the vignettes?
I would move one of these code samples to the examples on the front page? I think it's an important use case for users. I would keep this vignette though and link to it from the home page example.

yulric · 2023-10-13T18:24:25Z

@reikookamoto Adding you to this PR on Doug's suggestion, good to get an "outside" perspective on this feature. For some reason I can't add you as a reviewer (you don't show up when I search your username), maybe it's because you're not part of the team? In any case I sent an invitation to join the GitHub team.

reikookamoto · 2023-10-13T18:37:33Z

Can you try adding me as a reviewer now? Reiko

yulric · 2023-10-13T18:57:30Z

Adding you to this PR on Doug's suggestion, good to get an "outside" perspective on this feature. For some reason I can't add you as a reviewer (you don't show up when I search your username), maybe it's because you're not part of the team? In any case I sent an invitation to join the GitHub team - Yulric

Can you try adding me as a reviewer now? Reiko

Should be there now. I think it was because you weren't a collaborator on the repo and not because you were not on the team....

kittychenn · 2023-10-13T19:28:56Z

@yulric I used knit to HTML to build and view the vignettes. The code sample is also available on the 'Get Started' page, so should I include it on the main page too?

reikookamoto

Left my comments from an "outside" perspective @yulric

reikookamoto · 2023-10-18T14:27:09Z

vignettes/how_to_harmonize.Rmd

+
+## Introduction 
+
+This vignette explains how you can transform variables across multiple CCHS datasets using the full datasets to the _cchsflow_ package. The full PUMF datasets can be found [here](https://odesi.ca/). A full harmonized dataset of all _cchsflow_ variables


Consider writing out the first instances of acronyms like CCHS and PUMF in full.

reikookamoto · 2023-10-18T14:41:08Z

vignettes/how_to_harmonize.Rmd

+
+## Introduction 
+
+This vignette explains how you can transform variables across multiple CCHS datasets using the full datasets to the _cchsflow_ package. The full PUMF datasets can be found [here](https://odesi.ca/). A full harmonized dataset of all _cchsflow_ variables


Suggested change

-This vignette explains how you can transform variables across multiple CCHS datasets using the full datasets to the _cchsflow_ package. The full PUMF datasets can be found [here](https://odesi.ca/). A full harmonized dataset of all _cchsflow_ variables
+This vignette explains how you can transform variables across multiple Canadian Community Health Survey (CCHS) cycles using complete datasets with the _cchsflow_ package. The Public Use Microdata Files (PUMF) containing the complete data can be found [here](https://odesi.ca/). A full harmonized dataset of all _cchsflow_ variables

I'm not sure if I've correctly described the relationship between CCHS and PUMF, but something like this would provide more context to someone new to this area of study.

reikookamoto · 2023-10-18T14:49:57Z

vignettes/how_to_harmonize.Rmd

+This vignette explains how you can transform variables across multiple CCHS datasets using the full datasets to the _cchsflow_ package. The full PUMF datasets can be found [here](https://odesi.ca/). A full harmonized dataset of all _cchsflow_ variables
+can be found [here](https://osf.io/j5wgu). With the original PUMF datasets, data file should be renamed such that it specifies the survey and cycle year, which follows the format of the _p sample data (ex. cchs2001_p, cchs2013_2014_p).
+
+To harmonize the data files, the `rec_with_table()` function is used to transform the indicated variables. 


Suggested change

-To harmonize the data files, the `rec_with_table()` function is used to transform the indicated variables.
+To harmonize the data files, the `cchsflow::rec_with_table()` function is used to transform the indicated variables.

I know eventually we want users to use recodeflow::rec_with_table(), but, for the time being, we could specify the package name to avoid confusion.

reikookamoto · 2023-10-18T14:54:38Z

vignettes/how_to_harmonize.Rmd

+
+## How to combine a single variable across multiple cycles
+
+In this example, the sex variable from 2001 to 2018 CCHS datasets will be transformed and labeled using  `rec_with_table()`, which is then combined into one dataset and labeled using  `merge_rec_data()`.


I'm a little confused as to why we're harmonizing this variable from 2001 to 2018 when, in the previous section, users were advised not to harmonize data from cycles before 2014 with those from 2015 and onwards.

reikookamoto · 2023-10-18T14:56:06Z

vignettes/how_to_harmonize.Rmd

+
+### Option 1: Using _cchsflow_ variable_details sheet
+
+When the variable argument in `rec_with_table()` is not specified, all variables listed in `variables.csv` and `variable_details.csv` will be transformed. In this example, all variables from the _cchsflow_ `variables.csv` and `variable_details.csv` sheets from 2001 to 2018 CCHS datasets will be transformed and labeled using  `rec_with_table()`, which is then combined into one dataset and labeled using  `merge_rec_data()`.


Where will variables.csv and variable_details.csv be on the user's computer when they install/load the package (i.e., expected file path)?

The sheets will be in the inst/extdata folder. rec_with_table() uses the sheets from that folder if the user does not pass in those parameters.
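
To find the expected file path on a given machine, something like the following could work. It is a sketch that assumes _cchsflow_ is installed and that the sheets ship under inst/extdata as described above (R exposes that folder as extdata once a package is installed).

```r
# Ask R where the installed package keeps each sheet.
# system.file() returns "" if the package or the file cannot be found.
variables_path <- system.file("extdata", "variables.csv", package = "cchsflow")
variable_details_path <- system.file("extdata", "variable_details.csv", package = "cchsflow")

if (nzchar(variables_path)) {
  # Read the sheet to inspect which variables are defined.
  variables_sheet <- read.csv(variables_path, stringsAsFactors = FALSE)
  print(head(variables_sheet))
} else {
  message("variables.csv not found; is cchsflow installed?")
}
```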

reikookamoto · 2023-10-18T15:00:18Z

vignettes/how_to_harmonize.Rmd

+
+### Option 2: Using your own variable_details sheet
+
+In this example, all variables from personalized `variables.csv` and `variable_details.csv` sheets from 2001 to 2018 CCHS datasets will be transformed and labeled using  `rec_with_table()`, which is then combined into one dataset and labeled using  `merge_rec_data()`.


I would consider showing the relationship between variables.csv and sample_variables and variable_details.csv and sample_variable_details. Is the user expected to do something like sample_variables <- readr::read_csv('variables.csv') in their workspace before using the personalized spreadsheets?
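
To make the question concrete, here is a rough sketch of what that step might look like. The sheet contents, the object names `my_variables` and `my_variable_details`, and the commented-out call are all assumptions for illustration; the argument names should be checked against the `rec_with_table()` documentation.

```r
# Hypothetical personalized sheets, written to a temp folder so the sketch is
# self-contained. Real sheets would contain the full set of columns that the
# package expects; the two columns here are placeholders only.
sheet_dir <- tempdir()
variables_csv <- file.path(sheet_dir, "variables.csv")
variable_details_csv <- file.path(sheet_dir, "variable_details.csv")
write.csv(data.frame(variable = "my_var", label = "Example"),
          variables_csv, row.names = FALSE)
write.csv(data.frame(variable = "my_var", recTo = "1"),
          variable_details_csv, row.names = FALSE)

# Read the personalized sheets into the workspace as data frames.
my_variables <- read.csv(variables_csv, stringsAsFactors = FALSE)
my_variable_details <- read.csv(variable_details_csv, stringsAsFactors = FALSE)

# Assumed call shape (not run here): pass the data frames instead of relying
# on the sheets bundled with the package.
# cchsflow::rec_with_table(
#   cchs2001_p,
#   variables = my_variables,
#   variable_details = my_variable_details
# )
```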

yulric · 2023-11-22T19:36:05Z

@kittychenn Sorry about getting back so late. All of Reiko's suggestions look good, can you address them?

In addition I was building the website using the following commands,

devtools::document()
pkgdown::build_site()

and I'm getting an error in the getting_started.Rmd vignette. Are you able to reproduce it?

Finally, I pushed some commits to fix some of the website build issues.

[Feature] vignette to harmonize data

1132b05

kittychenn requested review from CBjerke, DougManuel, JuanLiOHRI and yulric September 11, 2023 18:30

kittychenn added the enhancement New feature or request label Sep 11, 2023

kittychenn changed the title ~~Vignette for harmonize data~~ Vignette for harmonizing full dataset Sep 11, 2023

kittychenn added this to the V2.2 milestone Sep 11, 2023

JuanLiOHRI reviewed Sep 13, 2023

yulric reviewed Sep 18, 2023

[Refactor] update vignette with comments

a85579e

To show outputs in first chunk only, fix 2011 and 2012 outputs, use sample data

yulric requested review from DougManuel and reikookamoto and removed request for DougManuel October 13, 2023 18:53

reikookamoto reviewed Oct 18, 2023

yulric added 3 commits November 22, 2023 14:33

Added @Keywords internal to private functions used by rec_with_table …

26be060

…so that pkgdown would not complaing about including them in the references when building the documentation website

Added the new how_to_haromnize vignette to the articles list in the d…

7a96197

…ocumentation website

Added extra functions to the references list in the documentation web…

0fb49ed

…site
