Label Data
These are functions for adding metadata, such as variable and value labels, to your data. This is helpful when working with (importing/exporting) SPSS, SAS, or Stata datasets that allow for this kind of embedded metadata.
Almost all functions I cover here come from the `haven` or `labelled` package (with a brief dip into the `rio` and `sjPlot` packages). I use `labelled` mainly because it works best for my workflow, where I typically import/export data using the `haven` package, and because it works well with the `%>%` operator.
However, I do not cover the `labelled::labelled_spss()` function in my examples because I find it has compatibility issues with other functions in the `labelled` package. You can read about it here for more information.
The examples below can apply to SPSS, SAS, or Stata datasets. However, the missing value functions I cover are SPSS-specific. Functions for working with SAS and Stata missing values (such as tagged NAs) are not covered here, but information on those functions can be found here.
Several Notes:
- When you add value labels using the `labelled` package, the class for those variables will become haven_labelled, unless you add value labels using `labelled::labelled_spss()`, in which case the class will be haven_labelled_spss.
- When you add missing value labels to a variable using any function in `labelled`, the class for that variable will become haven_labelled_spss.
- When you add variable labels to a dataset, those variables will not change class to haven_labelled or haven_labelled_spss unless you also add value labels or missing value labels using any `labelled` function, or add variable labels using `labelled::labelled_spss()`.
- When you import data from SPSS, SAS, or Stata with labels using `haven`, the same rules as above will apply. Any variable with only a variable label will not change class (ex: numeric). However, any variable with a value label will be haven_labelled. Also, if you import an SPSS file with `haven` using the user_na=TRUE option and you have missing value labels in your data, then the class for those variables will be haven_labelled_spss.
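The class rules in the notes above can be checked directly. This is a minimal sketch, assuming the `labelled` package is installed; the data frame `df`, its columns (`id`, `q1`), and the code 9 for "Refused" are made up for illustration.

```r
library(labelled)

# Hypothetical example data: 9 is used here as a "Refused" code
df <- data.frame(
  id = 1:4,
  q1 = c(1, 2, 9, 1)
)

# A variable label on its own should leave the underlying class untouched
df <- set_variable_labels(df, id = "Participant ID")
inherits(df$id, "haven_labelled")

# Adding value labels should convert the column to haven_labelled
df <- set_value_labels(df, q1 = c(Yes = 1, No = 2, Refused = 9))
inherits(df$q1, "haven_labelled")

# Declaring SPSS-style missing values on the labelled column
# should convert it to haven_labelled_spss
df <- set_na_values(df, q1 = 9)
inherits(df$q1, "haven_labelled_spss")
```

Using `inherits()` rather than comparing `class()` directly keeps the check robust, since labelled vectors carry several classes at once.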
There is another package, `sjlabelled`, that has similar label-adding functions, but those functions do not update the variable class. The `sjlabelled` package can be a great one for adding labels for the purposes of plotting, when you don't necessarily want to change your variable classes. More information on `sjlabelled` can be found here.
A word of warning: the order in which you apply labels can matter. Every once in a while I have labels disappear (for example, if I apply the variable labels first and later apply the value labels, my variable labels may disappear; I'm not sure why). If you have issues with labels disappearing, consider applying them in this order to preserve information:
- value labels
- na values
- variable labels
- Calculate row sums or means with labelled NA values (see Calculate Row Values)
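The suggested order (value labels, then na values, then variable labels) can be sketched as a single pipeline. This is only an illustration, assuming `labelled` and `dplyr` are installed; the `survey` data, the `yes_no` label set, and the variable label text are hypothetical.

```r
library(labelled)
library(dplyr)  # loaded here for the %>% operator

# Hypothetical survey data where 9 means "Refused"
survey <- data.frame(
  q1 = c(1, 2, 9, 1),
  q2 = c(2, 2, 1, 9)
)

# One shared set of value labels for both items
yes_no <- c(Yes = 1, No = 2, Refused = 9)

labelled_survey <- survey %>%
  # 1. value labels
  set_value_labels(q1 = yes_no, q2 = yes_no) %>%
  # 2. na values
  set_na_values(q1 = 9, q2 = 9) %>%
  # 3. variable labels
  set_variable_labels(
    q1 = "Owns a car",
    q2 = "Owns a bike"
  )

# Inspect what was stored on the first item
val_labels(labelled_survey$q1)
na_values(labelled_survey$q1)
var_label(labelled_survey)
```

If a label goes missing after a step, calling `var_label()` and `val_labels()` between steps is a quick way to find where it was dropped.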
Main functions used in examples
Package | Functions |
---|---|
haven | read_sav(); write_sav() |
labelled | set_value_labels(); val_labels(); add_value_labels(); labelled(); set_na_values(); na_values(); set_variable_labels(); var_label(); look_for(); copy_labels_from() |
sjPlot | view_df() |
rio | characterize() |
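As a rough illustration of how the `haven` and `labelled` functions in this table fit together, the sketch below writes a small labelled data frame to a temporary `.sav` file and reads it back with and without `user_na = TRUE`. It assumes both packages are installed; the data, labels, and file path are invented for the example.

```r
library(haven)
library(labelled)

# Hypothetical data with a user-defined missing code (9 = "Refused")
df <- data.frame(q1 = c(1, 2, 9, 1))
df <- set_value_labels(df, q1 = c(Yes = 1, No = 2, Refused = 9))
df <- set_na_values(df, q1 = 9)
df <- set_variable_labels(df, q1 = "Owns a car")

# Export to a temporary SPSS file
path <- tempfile(fileext = ".sav")
write_sav(df, path)

# Default import: user-defined missing values are converted to NA
default_import <- read_sav(path)

# user_na = TRUE: user-defined missing values are kept as labelled codes
user_na_import <- read_sav(path, user_na = TRUE)

class(default_import$q1)
class(user_na_import$q1)
na_values(user_na_import$q1)
```

Comparing the two imports side by side shows why `user_na = TRUE` matters when the distinction between missing-value codes needs to survive the import.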
Other functions used in examples
Package | Functions |
---|---|
dplyr | across(); mutate(); filter(); select() |
snakecase | to_sentence_case() |
tidyselect | starts_with(); everything() |
knitr | kable() |
base | as.list() |
openxlsx | write.xlsx() |
purrr | map() |
stringr | str_replace_all() |
tibble | deframe() |
Resources
- https://www.pipinghotdata.com/posts/2020-12-23-leveraging-labelled-data-in-r/
- https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html
- https://cran.r-project.org/web/packages/labelled/labelled.pdf
- https://www.rdocumentation.org/packages/labelled/versions/2.7.0
- https://martinctc.github.io/blog/working-with-spss-labels-in-r/
- https://joseph.larmarange.net/intro_labelled.html
- http://larmarange.github.io/labelled/reference/var_label.html
- https://raw.githubusercontent.com/rstudio/cheatsheets/main/labelled.pdf
- https://wlm.userweb.mwn.de/SPSS/wlmsmiss.htm
- https://stackoverflow.com/questions/43529972/set-missing-values-for-multiple-labelled-variables