## Announcements

- Sample quiz I will release today
- Lab 3 grades before Friday morning. (I don't want to give you a warning, so please read https://ubc-mds.github.io/policies/#re-grading )
- All solutions will be posted before Sunday morning.
- Today's events 
    - until approx 1 PM, I will be here. 
    - 1 - 2 PM Andy ORCH 4074, 
    - 2 - 4 PM Lab ORCH 4074, 
    - 4 - 5 PM Yulia ORCH 4074
    - 4 - New programmers session 
- Evaluation (10 min)

## Lecture 8 theme

Key theme of this lecture:

1. Tidy evaluation

In [1]:
library(gapminder)
library(tidyverse)
options(repr.matrix.max.rows = 5)
# library(rlang)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.8     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.1
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


## Tidy evaluation comes in two forms:
- Data masking (we are going to cover this)
- Tidy selection (we will not cover this)

This is a deep topic, and there is A LOT that can be learned.

My goal: make you capable of writing functions that call tidyverse functions.

If you want to go further, go for it! But that is beyond the scope of MDS.

## Some terms before we get started:
- ***Data masking:*** Lets you refer to data variables as if they were environment variables, which blurs the line between the two.
- ***Environment variables:*** Variables that live in an environment and usually get created with `<-`.
- ***Data variables:*** Statistical variables that live in a data frame.

## Example to aid with definitions:

In [2]:
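dataframe <- data.frame(col1 = 1:3, col2 = 5:7)  # dataframe is an environment variable
dataframe$col1            # base R: reach the data variable col1 through the data frame
# col1                    # on its own this fails: no environment variable is named col1
select(dataframe, col1)   # data masking: col1 is written bare, like an environment variable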

col1
<int>
1
2
3
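
To make the distinction concrete, here is a minimal sketch that mixes both kinds of variable in one call. The names `df` and `threshold` are assumptions made for illustration and are not part of the lecture code. The `.data` and `.env` pronouns come from rlang and can be used inside dplyr verbs when you want to say explicitly which kind of variable you mean.

```r
library(dplyr)

df <- data.frame(col1 = 1:3, col2 = 5:7)
threshold <- 2  # an environment variable, created with `<-`

# Inside filter(), col1 is found in the data frame (data variable)
# and threshold is found in the calling environment (environment variable)
above <- filter(df, col1 > threshold)

# The same call with the lookup made explicit through the pronouns
above_explicit <- filter(df, .data$col1 > .env$threshold)
```

Both calls should keep only the row where `col1` is 3. The explicit form is mainly useful when a data variable and an environment variable happen to share a name.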


## Writing functions with unquoted column names

Let's use `gapminder` again so that we can keep our focus on the task at hand:


In [3]:
gapminder

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Afghanistan,Asia,1952,28.801,8425333,779.4453
Afghanistan,Asia,1957,30.332,9240934,820.8530
Afghanistan,Asia,1962,31.997,10267083,853.1007
⋮,⋮,⋮,⋮,⋮,⋮
Zimbabwe,Africa,2002,39.989,11926563,672.0386
Zimbabwe,Africa,2007,43.487,12311143,469.7093


Let's now write a function that summarises the central tendency of a numeric column, here with the mean and the median. Because the column name is passed in as a function argument, we need to embrace it with `{{ }}`:



In [4]:
central_tendency <- function(data, col) {
  data %>% 
    summarise(mean = mean( {{ col }}, na.rm = TRUE),
              median = median( {{ col }} , na.rm = TRUE))
}

Now let's use that function to learn about the `lifeExp` column of the `gapminder` data set:



In [5]:
central_tendency(gapminder, lifeExp)

mean,median
<dbl>,<dbl>
59.47444,60.7125
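
To see why the embrace is needed, the sketch below compares the function with a hypothetical variant that leaves out `{{ }}`. It uses the built-in `mtcars` data and its `hp` column as stand-ins (an assumption, so the example runs without the `gapminder` package), and `central_tendency_no_embrace` is a name invented for this illustration.

```r
library(dplyr)

# Hypothetical variant that forgets to embrace `col`
central_tendency_no_embrace <- function(data, col) {
  data %>%
    summarise(mean = mean(col, na.rm = TRUE))
}

# Embraced version, as defined in the lecture
central_tendency <- function(data, col) {
  data %>%
    summarise(mean = mean({{ col }}, na.rm = TRUE),
              median = median({{ col }}, na.rm = TRUE))
}

# Without {{ }}, R looks for an environment variable called hp and fails
no_embrace_result <- tryCatch(
  central_tendency_no_embrace(mtcars, hp),
  error = function(e) "failed"
)

# With {{ }}, hp is looked up as a column of the data frame
embrace_result <- central_tendency(mtcars, hp)
```

Here `no_embrace_result` ends up holding the string `"failed"`, while `embrace_result` is a one-row data frame with the mean and median of `hp`.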


## Assignment with `:=`

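When an embraced name appears on the left-hand side of an assignment inside a tidyverse function, you need to use `:=` instead of `=`. R's syntax does not allow an expression such as `{{ name }}` to the left of `=` in a function call, whereas `:=` parses fine. Let's try adding the option of giving our summarised data frame flexible column names to the function we wrote above: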


In [6]:
central_tendency <- function(data, col, mean_col_name, median_col_name) {
  data %>% 
    summarise({{ mean_col_name }} := mean({{ col }}, na.rm = TRUE),
              {{ median_col_name }} := median({{ col }}, na.rm = TRUE))
}

Now let's use the new version on the `lifeExp` column of the `gapminder` data set, this time supplying names for the two summary columns:



In [7]:
central_tendency(gapminder, lifeExp, mean_life_exp, med_life_exp)

mean_life_exp,med_life_exp
<dbl>,<dbl>
59.47444,60.7125
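
The sketch below checks the syntax claim directly and shows one related variation. It goes slightly beyond the lecture code: the helper `mean_of` is hypothetical, `mtcars` is used as stand-in data, and the glue-style string name on the left of `:=` is assumed to be available (it is documented for recent rlang and dplyr versions).

```r
library(dplyr)

# An embraced name to the left of `=` is a syntax error, so parse() fails
parses_with_equals <- tryCatch(
  { parse(text = "summarise(mtcars, {{ name }} = mean(hp))"); TRUE },
  error = function(e) FALSE
)

# The same call written with `:=` is valid R syntax
parses_with_walrus <- tryCatch(
  { parse(text = "summarise(mtcars, {{ name }} := mean(hp))"); TRUE },
  error = function(e) FALSE
)

# Hypothetical helper: derive the output name from the embraced column
mean_of <- function(data, col) {
  data %>%
    summarise("mean_{{ col }}" := mean({{ col }}, na.rm = TRUE))
}

named_result <- mean_of(mtcars, hp)
```

With this helper the caller does not have to pass a separate name; the result column for `hp` should be called `mean_hp`.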


## Passing the dots

Embracing column names adds a lot of braces, and it is easy to make mistakes. So, when possible, passing the dots (`...`) is a nice alternative. It also lets your function accept multiple expressions separated by commas, which can be helpful when wrapping things like `filter` and `select`.

Here we write a function that passes the dots on to `filter` and embraces the column given to `arrange`:


In [8]:
filter_and_arrange <- function(data, arrange_by, ...) {
  data %>% 
    filter(...) %>% 
    arrange({{ arrange_by }})
}

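Now let's use that function. With no filtering conditions supplied, `filter` keeps every row, so the data is only sorted by `pop`: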



In [9]:
filter_and_arrange(gapminder, pop)

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Sao Tome and Principe,Africa,1952,46.471,60011,879.5836
Sao Tome and Principe,Africa,1957,48.945,61325,860.7369
Djibouti,Africa,1952,34.812,63149,2669.5295
⋮,⋮,⋮,⋮,⋮,⋮
China,Asia,2002,72.028,1280400000,3119.281
China,Asia,2007,72.961,1318683096,4959.115


And again this time with two criteria to filter on:



In [10]:
filter_and_arrange(gapminder, pop, continent == "Oceania", year > 1999)

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
New Zealand,Oceania,2002,79.11,3908037,23189.8
New Zealand,Oceania,2007,80.204,4115771,25185.01
Australia,Oceania,2002,80.37,19546792,30687.75
Australia,Oceania,2007,81.235,20434176,34435.37
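
Passing the dots is not limited to `filter`. As a further sketch, the hypothetical helper `grouped_mean` below forwards the dots to `group_by`, so the caller can choose any number of grouping columns. The helper name and the use of the built-in `mtcars` data are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical helper: group by whatever columns are passed through the dots,
# then take the mean of one embraced column
grouped_mean <- function(data, col, ...) {
  data %>%
    group_by(...) %>%
    summarise(mean = mean({{ col }}, na.rm = TRUE), .groups = "drop")
}

# One grouping column
by_cyl <- grouped_mean(mtcars, hp, cyl)

# Two grouping columns, separated by commas
by_cyl_am <- grouped_mean(mtcars, hp, cyl, am)
```

Putting the dots last in the argument list, as in `filter_and_arrange`, keeps the named arguments easy to supply by position.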


## What did we learn?
- Data masking and its role in tidy evaluation
- Programming with tidy-evaluated functions by embracing column names with `{{ }}`
- The walrus operator `:=` for assignment when programming with tidy-evaluated functions
- Examples of passing the dots with `...`

<img src="data/thanks.png">