Create New Variables

Cghlewis edited this page Jan 31, 2023 · 64 revisions

There are endless reasons why you may need to create a new variable. Some examples are:

  • Create an ID variable
  • Create an indicator variable based on one or more existing variables in your data
    • Ex: control/treatment or at-risk/not at-risk
  • Calculate a rowwise summary score
  • Recode or collapse categories of existing variables into a new variable

The most common function for creating new (and updating existing) variables in a tidyverse fashion is dplyr::mutate(). You will see this function used in almost every scenario in this resource: almost any time you recode a variable, create one, or change its class, you will be using dplyr::mutate(). I will also demonstrate a few other ways to add columns.
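
As a rough illustration of the pattern, here is a minimal sketch of dplyr::mutate() creating and updating columns, followed by tibble::add_column() as one alternative way to add a column. The data frame `d` and all of its column names are made up for this example and do not come from the scenarios in this resource.

```r
library(dplyr)
library(tibble)

# Hypothetical example data (invented for illustration)
d <- tibble(
  id    = 1:4,
  item1 = c(3, 1, 4, 2),
  item2 = c(2, 5, 1, 2)
)

# mutate() can add new columns and update existing ones in a single call
d_new <- d %>%
  mutate(
    item_sum   = item1 + item2,                # new calculated column
    high_score = if_else(item_sum >= 5, 1, 0), # new indicator column
    id         = as.character(id)              # change the class of an existing column
  )

# add_column() is another way to add a column, here a constant value
# placed directly after the id column
d_new <- d_new %>%
  add_column(cohort = 1, .after = "id")

d_new
```

Note that columns created earlier in a mutate() call (such as `item_sum` here) can be used by later expressions in the same call, and that assigning to an existing name (such as `id`) updates that column in place rather than adding a new one.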

Create a randomly generated column

Create a calculated column

Recode existing variables into indicator columns

Create a constant value column


Main functions used in examples

Package    Functions
dplyr      mutate()
tidyr      pivot_wider()
tibble     add_column()

Other functions used in examples

Package      Functions
dplyr        case_when(); across(); select(); if_else()
stringr      str_count(); str_detect()
tidyselect   contains()
janitor      clean_names()
lubridate    interval()
base         months(); round(); ceiling(); as.numeric()
tidyr        separate_rows()

Resources
