
Label Data

Cghlewis edited this page Feb 2, 2023 · 20 revisions

These are functions for adding metadata, such as variable and value labels, to your data. This is helpful when working with (importing/exporting) SPSS, SAS, or Stata datasets that allow for this kind of embedded metadata.

Almost all of the functions I cover here come from the haven or labelled packages (with a brief dip into the rio and sjPlot packages). I mainly use labelled because it fits my workflow: I typically import and export data with haven, and labelled works well with the %>% operator.

However, I do not cover the labelled::labelled_spss() function in my examples because I find it has compatibility issues with other functions in the labelled package. You can read about it here for more information.

The examples below can apply to SPSS, SAS, or Stata datasets. However, the missing value functions I cover are SPSS-specific. Functions for working with SAS and Stata missing values (such as tagged NAs) are not covered here, but information on those functions can be found here.

Several Notes:

  1. When you add value labels using the labelled package, the class of those variables becomes haven_labelled. The exception is labelled::labelled_spss(), which produces the class haven_labelled_spss.
  2. When you add missing value labels to a variable using any function in labelled, the class of that variable becomes haven_labelled_spss.
  3. When you add only variable labels to a dataset, the variables keep their original class. They change to haven_labelled or haven_labelled_spss only if you also add value labels or missing value labels with a labelled function, or if you add the variable labels using labelled::labelled_spss().
  4. When you import labelled data from SPSS, SAS, or Stata using haven, the same rules apply. A variable with only a variable label keeps its class (ex: numeric), while a variable with value labels becomes haven_labelled. If you import an SPSS file with haven using the user_na=TRUE option and the data contain missing value labels, the class of those variables will be haven_labelled_spss.
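The class rules in the notes above can be checked directly. The following is a minimal sketch, not taken from the original examples: the data frame `df`, its columns, and the labels are made up for illustration, and the temporary .sav file exists only to demonstrate the round trip through haven.

```r
library(labelled)
library(haven)

# Hypothetical example data; -99 stands in for a "refused" code
df <- data.frame(
  id = 1:4,
  q1 = c(1, 2, -99, 1),
  q2 = c(3, 1, 2, 3)
)

# Note 3: a variable label on its own leaves the class unchanged
df <- set_variable_labels(df, q2 = "Hours of sleep")
class_label_only <- class(df$q2)

# Note 1: adding value labels changes the class to haven_labelled
df <- set_value_labels(df, q1 = c(yes = 1, no = 2, refused = -99))
is_labelled_after_values <- inherits(df$q1, "haven_labelled")
is_spss_after_values <- inherits(df$q1, "haven_labelled_spss")

# Note 2: declaring missing values changes the class to haven_labelled_spss
df <- set_na_values(df, q1 = -99)
is_spss_after_na <- inherits(df$q1, "haven_labelled_spss")

# Note 4: the same rules hold after writing to SPSS and importing again
path <- tempfile(fileext = ".sav")
write_sav(df, path)
imported <- read_sav(path, user_na = TRUE)

class(imported$q1)  # should include "haven_labelled_spss"
class(imported$q2)  # variable label only, so still numeric
```

Using inherits() rather than comparing the full class vector keeps the checks stable, since haven attaches additional helper classes alongside haven_labelled.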

There is another package, sjlabelled, with similar functions for adding labels that do not update the variable class. The sjlabelled package can be a great choice for adding labels for plotting, when you don't necessarily want to change your variable classes. More information on sjlabelled can be found here.


A word of warning: the order in which you apply labels may sometimes matter. Every once in a while I have labels disappear. For example, if I apply variable labels first and value labels later, my variable labels may disappear (I’m not sure why). If you have issues with labels disappearing, consider applying them in this order to preserve information:

  1. value labels
  2. na values
  3. variable labels
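The ordering above can be written as a single pipeline. This is a minimal sketch under assumed data: the `survey` data frame, its columns, and all label text are hypothetical, and -99 is assumed to be the missing value code.

```r
library(labelled)
library(dplyr)  # provides the %>% operator

# Hypothetical example data
survey <- data.frame(
  gender = c(1, 2, 2, -99),
  grade  = c(3, 4, -99, 5)
)

labelled_survey <- survey %>%
  # 1. value labels
  set_value_labels(
    gender = c(male = 1, female = 2, missing = -99),
    grade  = c(third = 3, fourth = 4, fifth = 5, missing = -99)
  ) %>%
  # 2. na values
  set_na_values(gender = -99, grade = -99) %>%
  # 3. variable labels
  set_variable_labels(
    gender = "Student gender",
    grade  = "Student grade level"
  )

# Confirm that all three kinds of metadata are present
var_label(labelled_survey)
val_labels(labelled_survey$gender)
na_values(labelled_survey$grade)
```

If a label goes missing in your own data, running the three review calls at the end after each step is a quick way to see which step dropped it.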

Add value labels

Add variable labels

Review labelled data

Copy labels

Convert numeric values to labels

Import/Export labelled data

Calculating variables with labelled NA


Main functions used in examples

| Package | Functions |
| --- | --- |
| haven | read_sav(); write_sav() |
| labelled | set_value_labels(); val_labels(); add_value_labels(); labelled(); set_na_values(); na_values(); set_variable_labels(); var_label(); look_for(); copy_labels_from() |
| sjPlot | view_df() |
| rio | characterize() |

Other functions used in examples

| Package | Functions |
| --- | --- |
| dplyr | across(); mutate(); filter(); select() |
| snakecase | to_sentence_case() |
| tidyselect | starts_with(); everything() |
| knitr | kable() |
| base | as.list() |
| openxlsx | write.xlsx() |
| purrr | map() |
| stringr | str_replace_all() |
| tibble | deframe() |

Resources
