# Putting dplyr code into functions

dplyr uses non-standard evaluation (NSE), which lets you reference column names without quotation marks but introduces a million other problems, especially if you want to put dplyr code into functions.

Since I think this is a terrible idea and the documentation on how to do this is useless, here's my hate-fuelled cheatsheet.

In [2]:
library(dplyr)
library(rlang)
data(iris)

Here's an example using dplyr to filter a column.

In [3]:
out = iris %>%
    filter(Sepal.Width > 4)

head(out, 20)

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.7,4.4,1.5,0.4,setosa
5.2,4.1,1.5,0.1,setosa
5.5,4.2,1.4,0.2,setosa


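Note that `Sepal.Width` is not a string, but isn't an ordinary variable either. dplyr looks names up as columns of the data frame first, so defining variables with the same names changes nothing.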

In [4]:
Species = "this shouldn't work"
Petal.Length = "but it does"

iris %>%
    filter(Species == "setosa" & Petal.Length > 1.5)

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.4,3.9,1.7,0.4,setosa
4.8,3.4,1.6,0.2,setosa
5.7,3.8,1.7,0.3,setosa
5.4,3.4,1.7,0.2,setosa
5.1,3.3,1.7,0.5,setosa
4.8,3.4,1.9,0.2,setosa
5.0,3.0,1.6,0.2,setosa
5.0,3.4,1.6,0.4,setosa
4.7,3.2,1.6,0.2,setosa
4.8,3.1,1.6,0.2,setosa


The problem is when you naively stick this code into a function.

In [5]:
get_wide_sepal <- function(data, colname, val = 4) {
    ans <- data %>%
        filter(colname > val)
    return(ans)
}

get_wide_sepal(iris, colname=Sepal.Width)

ERROR: Error in filter_impl(.data, quo): Evaluation error: object 'Sepal.Width' not found.


Here R thinks that `Sepal.Width` is an ordinary variable, looks for it outside the data frame, and doesn't find it. We can't pass the column name as a string either, as dplyr treats a string as just a string, not as a column name.

In [6]:
out = get_wide_sepal(iris, colname="Sepal.Width")

head(out, 20)

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa


This is even worse as it doesn't tell us anything is wrong and returns the entire dataframe without filtering.
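
Why every row comes back: inside the function `colname` is just the string, so the filter condition is a single comparison between a string and a number. R coerces the number to a string and compares the two as text, which (on any locale I'd expect you to be using) gives one `TRUE` that is recycled across all rows. A minimal sketch; the names `cond` and `n_kept` are mine, for illustration only:

```r
library(dplyr)

# The condition dplyr effectively sees when colname = "Sepal.Width":
# 4 is coerced to "4" and the two strings are compared as text.
cond <- "Sepal.Width" > 4

# A single TRUE is recycled to every row, so nothing gets filtered out.
n_kept <- nrow(filter(iris, "Sepal.Width" > 4))
```

Here `n_kept` equals `nrow(iris)`, which matches the unfiltered output above.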

-----------

## So how do you do it?

So to get around NSE and use variables with dplyr, you have to use quosures.

### What the hell is a quosure?
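No idea, beyond the one-line version: it's an expression captured along with the environment it was written in.  
I could rant about this for hours, but instead I'll just demonstrate how to use it rather than try to explain the nonsense.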

In [7]:
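get_wide_sepal <- function(data, colname, val = 4) {
    # Capture the expression the caller typed (e.g. Sepal.Width) as a quosure
    colname <- enquo(colname)
    ans <- data %>%
        # !! unquotes the captured expression inside filter()
        filter((!!colname) > val)
    return(ans)
}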

Capture your argument with `enquo()`, then unquote it by putting two exclamation marks (`!!`) in front of it, and wrap it in brackets -- obviously...

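If you leave out the `!!` or the brackets then your code can silently fail and just return the wrong answer. The brackets matter because plain R parses `!!colname > val` as `!!(colname > val)`; newer versions of rlang work around this, but the brackets cost nothing.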
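
If your rlang is version 0.4.0 or later, the embrace operator `{{ }}` does the `enquo()` and `!!` steps in one go. A sketch of the same function written that way, assuming a reasonably recent dplyr; the name `get_wide_sepal_embrace` is mine and not part of any package:

```r
library(dplyr)

# Hypothetical variant of get_wide_sepal using the embrace operator.
get_wide_sepal_embrace <- function(data, colname, val = 4) {
    data %>%
        # {{ colname }} captures and unquotes the argument in one step
        filter({{ colname }} > val)
}

out_embrace <- get_wide_sepal_embrace(iris, Sepal.Width)
```

It is still an NSE function, so everything said about nesting such functions applies to it too.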


You now have a function that also uses NSE, so you refer to the column name as a variable. **This is a terrible idea**: if you ever want to use this function within another function, you'll have to use 2 layers of this quosure nonsense. **Don't write functions that use NSE.**
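
To make the "2 layers" concrete, here is a rough sketch of what wrapping an NSE function looks like. The outer function `count_wide_sepal` and the variable `n_wide` are made up for this example:

```r
library(dplyr)

# Inner function: the enquo()/!! pattern from the cell above.
get_wide_sepal <- function(data, colname, val = 4) {
    colname <- enquo(colname)
    data %>%
        filter((!!colname) > val)
}

# Hypothetical outer function: it has to capture the column again
# and unquote it when handing it on to the inner function.
count_wide_sepal <- function(data, colname, val = 4) {
    colname <- enquo(colname)                       # layer 1: capture
    inner <- get_wide_sepal(data, !!colname, val)   # layer 2: unquote
    nrow(inner)
}

n_wide <- count_wide_sepal(iris, Sepal.Width)
```

Every extra wrapper needs the same capture-and-unquote dance, which is exactly the overhead being complained about here.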

In [8]:
out = get_wide_sepal(iris, Sepal.Width)

head(out, 20)

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.7,4.4,1.5,0.4,setosa
5.2,4.1,1.5,0.1,setosa
5.5,4.2,1.4,0.2,setosa


### But NSE is the devil, I want to pass the column name as a string!!

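So you have to use another package ... `rlang`, and convert your string to a symbol with `sym()`, then unquote it with two exclamation marks before it, and wrap it in brackets. Note that it is `!!` again, not the three-mark `!!!`: that one is for splicing a list of arguments, and using it on a single symbol is deprecated in recent rlang.

If you do any of this wrong your code can silently fail and return the wrong answer without you knowing.

In [9]:
get_wide_sepal <- function(data, colname, val = 4) {
    # Turn the string into a symbol so dplyr treats it as a column name
    colname <- rlang::sym(colname)
    ans <- data %>%
        # !! unquotes the single symbol (!!! is for splicing lists)
        filter((!!colname) > val)
    return(ans)
}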

In [10]:
out = get_wide_sepal(iris, "Sepal.Width")

head(out, 20)

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.7,4.4,1.5,0.4,setosa
5.2,4.1,1.5,0.1,setosa
5.5,4.2,1.4,0.2,setosa
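
Another option for string column names, if your dplyr is recent enough, is the `.data` pronoun, which lets you subset the current data frame by name and skips `sym()` and `!!` altogether. A minimal sketch; `get_wide_sepal_str` is a name I made up for it:

```r
library(dplyr)

# Hypothetical variant that takes the column name as a string.
get_wide_sepal_str <- function(data, colname, val = 4) {
    data %>%
        # .data[[colname]] looks the column up by its string name
        filter(.data[[colname]] > val)
}

out_str <- get_wide_sepal_str(iris, "Sepal.Width")
```

Because `colname` is an ordinary string here, this version can be called from other functions without any extra quoting layers.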
