This article is on writing sweet R code using the wrapr package.
Consider the following R puzzle. You are given: a data.frame, the name of a column in which you wish to find missing values (NA), and the name of a column in which to land the result. For instance:
d <- data.frame(x = c(1, NA))
print(d)
##    x
## 1  1
## 2 NA
cname <- 'x'
print(cname)
## [1] "x"
rname <- paste(cname, 'isNA', sep = '_')
print(rname)
## [1] "x_isNA"
How do you write generic code to populate the column x_isNA with an indicator of which rows of x are missing?
In “base R” (R without additional packages) this is easy.
When you know the column names while writing the code:
d2 <- d
d2$x_isNA <- is.na(d2$x)
print(d2)
##    x x_isNA
## 1  1  FALSE
## 2 NA   TRUE
And when you don’t know the column names while writing the code (but know they will arrive in variables later):
d2 <- d
d2[[rname]] <- is.na(d2[[cname]])
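The bracket form drops straight into a reusable function. Here is a minimal sketch of that idea; the helper name add_na_indicator and its default result name are my own illustrative choices, not part of base R or any package:

```r
# Hypothetical helper (the name is illustrative, not from any package):
# add a logical column flagging the missing values of another column.
add_na_indicator <- function(d, cname,
                             rname = paste(cname, "isNA", sep = "_")) {
  # Fail early if the input is not a data.frame or the column is absent.
  stopifnot(is.data.frame(d), cname %in% names(d))
  # Both column names arrive as strings, so [[ ]] indexing is all we need.
  d[[rname]] <- is.na(d[[cname]])
  d
}

d <- data.frame(x = c(1, NA))
d2 <- add_na_indicator(d, "x")
print(d2)
```

Because the column names are ordinary string values here, no quoting or unquoting machinery is involved.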
The “base R” solution really is quite elegant.
As far as I can tell the “all in” non-standard evaluation dplyr::mutate solution is something like the following.
When you know the column names while writing the code:
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
d %>% mutate(x_isNA = is.na(x))
##    x x_isNA
## 1  1  FALSE
## 2 NA   TRUE
And when you don’t know the column names while writing the code (but know they will arrive in variables later):
d %>%
  mutate_(.dots =
            stats::setNames(list(lazyeval::interp(
              ~ is.na(VAR),
              VAR = as.name(cname)
            )),
            rname))
## Warning: mutate_() is deprecated.
## Please use mutate() instead
##
## The 'programming' vignette or the tidyeval book can help you
## to program with mutate() : https://tidyeval.tidyverse.org
## This warning is displayed once per session.
##    x x_isNA
## 1  1  FALSE
## 2 NA   TRUE
rlang obsoleted the above and now suggests code such as the following:
d %>%
  mutate(!!rlang::sym(rname) := is.na(!!rlang::sym(cname)))
##    x x_isNA
## 1  1  FALSE
## 2 NA   TRUE
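As far as I know, the tidyverse also offers the .data pronoun for looking a column up by its string name, which avoids the explicit rlang::sym() conversion on the right-hand side. The following is a sketch under the assumption that the installed dplyr supports .data inside mutate() (I believe this holds from dplyr 0.7.0 onward); it is not taken from the code above:

```r
library("dplyr")

d <- data.frame(x = c(1, NA))
cname <- "x"
rname <- paste(cname, "isNA", sep = "_")

# .data[[cname]] looks the source column up by its string name;
# !!rname on the left of := supplies the name of the result column.
d3 <- d %>%
  mutate(!!rname := is.na(.data[[cname]]))
print(d3)
```

The left-hand side can take the string directly; only the right-hand side has to be told that the string names a column.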
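Note that “the obvious” rlang solution does not work. It runs without error, but !!cname injects the string "x" rather than the column x, so is.na() tests a constant and every row comes back FALSE: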
d %>%
  mutate(!!rname := is.na(!!cname))
##    x x_isNA
## 1  1  FALSE
## 2 NA  FALSE
(though the above sort of notation is allowed “in select contexts”.)
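The difference between splicing in a string and splicing in a symbol can be seen without dplyr at all. The sketch below uses base R's bquote() as a stand-in for the unquoting step; that substitution is my own choice for illustration, not how rlang is implemented:

```r
d <- data.frame(x = c(1, NA))
cname <- "x"

# Splicing the string itself builds is.na("x"): a test on a constant.
wrong <- bquote(is.na(.(cname)))
# Converting to a symbol first builds is.na(x): a test on the column.
right <- bquote(is.na(.(as.name(cname))))

print(wrong)
print(right)

# Evaluate both expressions with the data.frame's columns in scope.
wrong_result <- eval(wrong, d)
right_result <- eval(right, d)
print(wrong_result)
print(right_result)
```

The first expression yields a single FALSE, which a data-frame verb would then recycle down the whole column; the second yields one value per row.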
Now for the wrapr::let solution. We will only work the harder “when you don’t yet know the column name” (or parametric) version:
library("wrapr")
##
## Attaching package: 'wrapr'
## The following object is masked from 'package:dplyr':
##
## coalesce
let(list(CNAME = cname, RNAME = rname),
    d %>% mutate(RNAME = is.na(CNAME))
)
##    x x_isNA
## 1  1  FALSE
## 2 NA   TRUE
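Conceptually, let() rewrites the code by replacing the placeholder names with the real ones and then evaluates the result. The sketch below imitates that idea with base R's substitute(); it illustrates the concept only and is not wrapr's actual implementation. The template is a base R assignment rather than a mutate() call, because substitute() replaces symbols but not argument names:

```r
d <- data.frame(x = c(1, NA))
cname <- "x"
rname <- paste(cname, "isNA", sep = "_")

# Code written against placeholder names, captured unevaluated.
template <- quote(d2$RNAME <- is.na(d2$CNAME))

# Map each placeholder to the symbol it should become.
mapping <- list(CNAME = as.name(cname), RNAME = as.name(rname))

# Rewrite the captured code, replacing placeholders with the real names.
code <- do.call(substitute, list(template, mapping))
print(code)

# Run the rewritten code against a copy of the data.
d2 <- d
eval(code)
print(d2)
```

Printing the rewritten code before evaluating it is a handy way to check that the name mapping did what you intended.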
I think that this is pretty sweet, and can really level up your dplyr game.
wrapr::let is available from CRAN and already has a number of satisfied users.
If function behavior depends on variable names, then convenient control of functions is eventually going to require convenient control of variable names; so needing to re-map variable names at some point is inevitable.