# Lecture 8: Tidy evaluation

### Lecture learning objectives:

By the end of this lecture & worksheet 8, students should be able to:
* Describe data masking as it relates to the `dplyr` functions. Explain the problems it solves for interactive programming and the problems it creates for programming in a non-interactive setting
* Explain what the `enquo()` function and the `!!` operator do in R in the context of data masking as it relates to the `dplyr` functions
* Use the `{{` (read: curly curly) operator (abstracts quote-and-unquote into a single interpolation step) in R to write functions which wrap the `dplyr` functions


In [None]:
library(gapminder)
library(tidyverse)
options(repr.matrix.max.rows = 5)

### What metaprogramming lets you do in R

- write `library(purrr)` instead of `library("purrr")`
- enable `plot(x, sin(x))` to automatically label the axes with `x` and `sin(x)`
- create a model object via `lm(y ~ x1 + x2, data = df)`
- and much much more (that you will see in Data Wrangling as we explore the tidyverse)
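As a small illustration of the axis-label bullet, the sketch below shows how a function can capture the code of its argument rather than its value, using base R's `substitute()` and `deparse()`. The helper name `label_of` is hypothetical and only used here for illustration.

```r
# label_of() is a hypothetical helper: it returns the code the caller typed
# for `arg`, as a string, instead of the value of `arg`
label_of <- function(arg) {
    deparse(substitute(arg))
}

x <- seq(0, 2 * pi, length.out = 5)
axis_label <- label_of(sin(x))
axis_label
```

The string returned is the expression as typed, which is the kind of information a plotting function needs to label an axis.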

#### What is metaprogramming?

Code that writes code/code that mutates code.
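A minimal base R sketch of both halves of that definition: code is captured as an object with `quote()`, run with `eval()`, and then modified before being run again. The variable names are illustrative only.

```r
# quote() captures code without running it
code <- quote(x + 1)

x <- 10
before <- eval(code)   # runs x + 1

# a call can be indexed like a list, so its function can be swapped:
# this rewrites x + 1 into x * 1
code[[1]] <- as.name("*")
after <- eval(code)    # runs x * 1

deparse(code)
```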

#### Our narrow focus on metaprogramming for this course:

Tidy evaluation

#### Why focus on tidy evaluation

In the rest of MDS you will be relying on functions from the tidyverse to do a lot of:
- data wrangling
- statistics
- data visualization

## Tidy evaluation

The functions from the tidyverse are beautiful to use interactively.

In [None]:
gapminder

With base R:
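A sketch of a base R version, assuming the goal is the row for Canada in 1952 (the same row the `filter` call later in this lecture selects). Note that the data frame name has to be repeated for every column:

```r
library(gapminder)

# every column must be prefixed with the data frame name
canada_1952_base <- gapminder[gapminder$country == "Canada" & gapminder$year == 1952, ]
canada_1952_base
```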

In the tidyverse:
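A sketch of the tidyverse version of the same subsetting, again assuming the goal is the row for Canada in 1952:

```r
library(dplyr)
library(gapminder)

# columns are referred to by bare name, with no gapminder$ prefix
canada_1952_tidy <- filter(gapminder, country == "Canada", year == 1952)
canada_1952_tidy
```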

#### How does that even work?

- When functions like `filter` are called, evaluation of their arguments is delayed, and the columns of the data frame are temporarily promoted to first-class objects; we say the data masks the workspace (global environment)

- Because the data frame masks the workspace, R looks for a name among the columns first, and only then in the workspace

- This is how R finds the relevant columns for the computation

*This is referred to as data masking*
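Data masking can be seen in isolation with `rlang::eval_tidy()`, which evaluates a quoted expression with a data frame as the mask. This is only a sketch of the idea, not a description of how `dplyr` is implemented internally; the data frame `df` and the variables `x` and `y` are made up for the example.

```r
library(rlang)

df <- data.frame(x = 1:3)
x <- 100   # a workspace variable with the same name as a column
y <- 10    # a workspace variable with no matching column

# the column df$x masks the workspace x
masked <- eval_tidy(quo(x * 2), data = df)

# y is not a column, so it is found in the workspace
mixed <- eval_tidy(quo(x + y), data = df)

masked
mixed
```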

#### Back to our example:

What is going on here?

- code evaluation is delayed 
- the `filter` function quotes columns `country` and `year`
- the `filter` function then creates a data mask (to mingle variables from the environment and the data frame)
- the columns `country` and `year` are unquoted and evaluated within the data mask

In [None]:
filter(gapminder, country == "Canada", year == 1952)

#### Trade-off of the lovely interactivity of tidyverse functions...

#### programming with them can be more challenging.

Let's try writing a function which wraps filter for gapminder:
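A sketch of a naive first attempt, which passes the arguments straight through to `filter`. The call is wrapped in `tryCatch()` only so that the error can be inspected without stopping the session.

```r
library(dplyr)
library(gapminder)

# naive wrapper: col is passed straight through to filter
filter_gap <- function(col, val) {
    filter(gapminder, col == val)
}

# tryCatch() returns the error object instead of halting
result <- tryCatch(
    filter_gap(country, "Canada"),
    error = function(e) e
)
inherits(result, "error")
```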

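Why does `filter` work with non-quoted variable names, but our function `filter_gap` fail?

Inside `filter_gap`, `filter` quotes the expression `col == val`. `col` is not a column of `gapminder`, so it is evaluated as an ordinary function argument, and R looks for an object called `country` in the workspace, where none exists.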

### Defining functions using tidy eval's `enquo` and `!!`:

Use `enquo` to quote the column names, and then `!!` to unquote them in context.
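A minimal sketch of the quote-then-unquote pattern, using a hypothetical wrapper `col_max` around `summarise` (the name and the choice of `max` are illustrative assumptions, not part of the lecture's examples):

```r
library(dplyr)
library(gapminder)

# col_max() is a hypothetical wrapper used only for illustration
col_max <- function(col) {
    # quote: capture the expression the caller typed, not its value
    col <- enquo(col)
    # unquote: insert that expression so it is evaluated within the data mask
    summarise(gapminder, max = max(!!col))
}

latest <- col_max(year)
latest
```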

### Defining functions by embracing column names: `{{ }}`

- Since `rlang` 0.4.0, there is a `{{` (pronounced "curly curly") operator.

- It does the same thing as `enquo` and `!!`, but in a single step that is (hopefully) easier to use.

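In [None]:
# quote-and-unquote version, for comparison
filter_gap <- function(col, val) {
    col <- enquo(col)
    filter(gapminder, !!col == val)
}

# the same wrapper with the embrace operator: one step instead of two
filter_gap <- function(col, val) {
    filter(gapminder, {{ col }} == val)
}

filter_gap(country, "Canada")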

### (Optional) Creating functions that handle column names as strings:

Sometimes you want to pass a column name into a function as a string (often useful when you are programming and have the column names as a character vector).

You can do this by using symbols + unquoting with `sym` + `!!` :

In [None]:
# example of what we want to wrap: filter(gapminder, country == "Canada")
filter_gap <- function(col, val) {
    col <- sym(col)
    filter(gapminder, !!col == val)
}

filter_gap("country", "Canada")

### The walrus operator `:=` is needed when assigning values
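- `:=` is needed when assigning values with tidy evaluation: an embraced or unquoted name such as `{{ col }}` is not allowed on the left-hand side of `=` in a function call, but it is allowed on the left-hand side of `:=`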
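A short sketch of the walrus operator in use, with a hypothetical wrapper `double_col` around `mutate` and the built-in `mtcars` data:

```r
library(dplyr)

# double_col() is a hypothetical wrapper that overwrites a column with twice its value
double_col <- function(data, col) {
    # `{{ col }} = ...` would be a syntax error,
    # so := is used to put the embraced column on the left-hand side
    mutate(data, {{ col }} := {{ col }} * 2)
}

doubled <- double_col(mtcars, mpg)
head(doubled$mpg, 3)
```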

### Pass the dots when you can

If you are only passing variables on to a tidyverse function, and those variables are not used in logical comparisons or in variable assignment, you can get away with passing the dots:
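A minimal sketch of passing the dots, assuming a wrapper around `arrange` for the `gapminder` data:

```r
library(dplyr)
library(gapminder)

# the dots are forwarded untouched, so arrange() does the quoting itself
sort_gap <- function(...) {
    arrange(gapminder, ...)
}

sorted <- sort_gap(year, continent, country)
head(sorted, 3)
```

No `enquo`, `!!` or `{{` is needed here, because the wrapper never touches the column names itself.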

#### Notes on passing the dots

- the dots should be the last function argument (or you will not be able to use positional arguments)
- they are useful because you can add multiple arguments

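For example, when the dots come first, any argument after them can only be matched by name (`x = 2`); a value passed by position is swallowed by the dots:

In [None]:
# x comes after the dots, so it cannot be matched by position
sort_gap <- function(..., x) {
    print(x + 1)
    arrange(gapminder, ...)
}

# errors: 2 is captured by the dots, so x is missing
sort_gap(year, continent, country, 2)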

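#### Pass the dots is not always the solution...

The dots are forwarded as a whole, so one set of dots cannot supply one column to `mutate` and a different set of columns to `select`. Code such as `mutate(... := (... - mean(...))^2)` followed by `select(...)` hands every argument to both calls, which is not what is intended.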

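### When passing in different column names to different functions, embrace multiple column names

In [None]:
# each role gets its own argument, instead of one shared `...`
square_diff_n_select <- function(data, col_to_change, col_range) {
    data %>% 
        mutate({{ col_to_change }} := ({{ col_to_change }} - mean({{ col_to_change }}))^2) %>% 
        select({{ col_range }})
}

square_diff_n_select(mtcars, mpg, mpg:hp)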

#### Combining embracing with pass the dots:

In [None]:
# the column to change is embraced; the columns to select are passed with the dots
square_diff_n_select <- function(data, col_to_change, ...) {
    data %>% 
        mutate({{ col_to_change }} := ({{ col_to_change }} - mean({{ col_to_change }}))^2) %>% 
        select(..., {{ col_to_change }})
}

square_diff_n_select(mtcars, mpg, mpg:hp)

## Programming defensively with tidy evaluation

You can embrace `{{` the column names in an `if` + `stop` statement to check user input when unquoted column names are used as function arguments.

First, we demonstrate how to check if a column is numeric using `DATA_FRAME  %>% pull({{ COLUMN_NAME }})` to access the column:
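A sketch of that check on its own, wrapped in a hypothetical helper `col_is_numeric` so the embraced column name can be passed unquoted:

```r
library(dplyr)
library(gapminder)

# col_is_numeric() is a hypothetical helper name used only for illustration
col_is_numeric <- function(data, col) {
    # pull() extracts the embraced column as a plain vector
    is.numeric(data %>% pull({{ col }}))
}

life_ok <- col_is_numeric(gapminder, lifeExp)
continent_ok <- col_is_numeric(gapminder, continent)

life_ok
continent_ok
```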

Next, we add an `if` + `stop` to this, to throw an error in our `square_diff_n_select` function when the column type is not what our function is designed to handle. Here our function works, as the `lifeExp` column in the `gapminder` data set is numeric.

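In [None]:
square_diff_n_select <- function(data, col_to_change, ...) {
    # check the user input before doing any work
    if (!is.numeric(data %>% pull({{ col_to_change }}))) {
        stop("col_to_change must be numeric")
    }
    data %>% 
        mutate({{ col_to_change }} := ({{ col_to_change }} - mean({{ col_to_change }}))^2) %>% 
        select(..., {{ col_to_change }})
}

square_diff_n_select(gapminder, lifeExp, country, year)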

Here our function throws an error, as the `continent` column in the `gapminder` data set is **not** numeric.

In [None]:
square_diff_n_select(gapminder, continent, country, year)

## What did we learn?
- data masking and its role in tidy evaluation
- programming with tidy-evaluated functions by embracing column names `{{ }}`
- the walrus `:=` operator for assignment when programming with tidy-evaluated functions
- more useful examples of pass the dots `...`

## Attribution:

- [Tidy evaluation](https://tidyeval.tidyverse.org/) by Lionel Henry & Hadley Wickham
- [Tidy eval in context](https://speakerdeck.com/jennybc/tidy-eval-in-context)  talk by Jenny Bryan
- [Programming in the tidyverse](https://dplyr.tidyverse.org/articles/programming.html) 
- [Advanced R](https://adv-r.hadley.nz/) by Hadley Wickham