In [3]:
library(tidyverse)

# Indirection

The main challenge of data masking arises when you introduce some indirection, i.e. when, instead of typing the name of a variable directly, you want to supply it through a function argument or a character vector.
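To see why indirection is a problem, here is a minimal sketch of what tends to happen when a function passes its argument straight through to a data-masking verb. The helper name `naive_summary` is hypothetical, and the example assumes there is no object called `cyl` in the calling environment.

```r
library(dplyr)

# Hypothetical helper that passes its argument on without any special handling.
naive_summary <- function(df, var) {
  df %>% summarize(min = min(var), max = max(var))
}

# `cyl` is a column of mtcars, not an object in the calling environment
# (an assumption of this sketch), so evaluating the argument fails.
result <- tryCatch(
  naive_summary(mtcars, cyl),
  error = function(e) "lookup failed"
)
result
```

Inside `summarize()`, `var` is looked up as an ordinary function argument, so R tries to evaluate `cyl` outside the data frame instead of treating it as a column.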

There are two main cases:

* If you want the user to supply the variable (or function of variables) in a function argument, embrace the argument: `{{argument}}`

In [13]:
dist_summary <- function(df, var) {
    df %>% summarize(min = min({{var}}), max = max({{var}}))
}

In [14]:
dist_summary(mtcars, cyl)

min,max
4,8
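Because the embraced argument is evaluated inside the data mask, the same function should also accept an expression built from columns, and it should respect grouping. The following is a small sketch under that assumption; it redefines `dist_summary` so that it runs on its own and loads only dplyr rather than the full tidyverse.

```r
library(dplyr)

# Same function as above, repeated so this block is self-contained.
dist_summary <- function(df, var) {
  df %>% summarize(min = min({{ var }}), max = max({{ var }}))
}

# A function of variables: horsepower per unit of weight.
ratio_summary <- dist_summary(mtcars, hp / wt)

# Grouped input: one row of min/max mpg per value of cyl.
by_cyl <- mtcars %>% group_by(cyl) %>% dist_summary(mpg)
```

`ratio_summary` holds a single row, while `by_cyl` holds one row per distinct value of `cyl`.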


* If you have the column name as a character vector, use the `.data` pronoun, e.g. `summarize(df, mean = mean(.data[[var]]))`

In [18]:
var <- 'disp'
# Get the minimum and maximum value of column `disp`
mtcars %>% summarize(min_disp = min(.data[[var]]), max_disp = max(.data[[var]]))

min_disp,max_disp
71.1,472
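The `.data` pronoun is handy when the column names come from somewhere else, for example a vector you loop over. Here is a minimal sketch; the helper name `col_range` and the choice of columns are illustrative assumptions, not part of the original notebook.

```r
library(dplyr)

# Hypothetical helper: min and max of a column given by name (a string).
col_range <- function(df, col) {
  df %>% summarize(
    column = .env$col,        # the name itself, taken from the function environment
    min = min(.data[[col]]),  # the column of `df` with that name
    max = max(.data[[col]])
  )
}

cols <- c("mpg", "disp")
ranges <- bind_rows(lapply(cols, function(col) col_range(mtcars, col)))
ranges
```

Using `.env$col` alongside `.data[[col]]` makes it explicit which lookups refer to the function's own variable and which refer to columns of the data frame.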


# Dot-dot-dot (...)

Functions that take their arguments through `...`, such as `mutate()`, support one other useful technique, which solves the problem of creating a new variable with a name supplied by the user: the interpolation syntax from the glue package, `"{var}" := expression`. (Note the use of `:=` instead of `=` to enable this syntax.)

In [23]:
# 1 mile per gallon = 0.425143707 kilometers per liter

var <- 'km/l'

mtcars %>%
    mutate('{var}' := 0.425143707 * mpg) %>%
    head()

mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb,km/l
21.0,6,160,110,3.9,2.62,16.46,0,1,4,4,8.928018
21.0,6,160,110,3.9,2.875,17.02,0,1,4,4,8.928018
22.8,4,108,93,3.85,2.32,18.61,1,1,4,1,9.693277
21.4,6,258,110,3.08,3.215,19.44,1,0,3,1,9.098075
18.7,8,360,175,3.15,3.44,17.02,0,0,3,2,7.950187
18.1,6,225,105,2.76,3.46,20.22,1,0,3,1,7.695101
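The name interpolation can be combined with embracing inside a function, so that the caller supplies both the source column and the name of the new column. The sketch below illustrates this; the function `add_scaled` and its argument names are hypothetical.

```r
library(dplyr)

# Hypothetical helper: add a column called `name` equal to `var * multiplier`.
add_scaled <- function(df, var, name, multiplier) {
  df %>% mutate("{name}" := {{ var }} * .env$multiplier)
}

# Convert miles per gallon to kilometers per liter, as in the cell above.
converted <- add_scaled(mtcars, mpg, "km/l", 0.425143707)
head(converted)
```

Here `{name}` is filled in from a string, while `{{ var }}` forwards an unquoted column, so the two kinds of indirection stay separate.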
