## Lecture 6 theme

There are 4 key themes to this lecture, plus an outro:

1. Writing functions

2. Documentation

3. Testing functions

4. Exception handling

5. Outro (if time permits)

### Clicker 1

In [4]:
library(gapminder)
library(testthat)
library(tidyverse)
options(repr.matrix.max.rows = 10)

## Theme 1: Writing functions

Let's say we have code that we use repeatedly. Let's write a function to make this less work in the future (and to make fewer mistakes)!

Imagine a world before the {tidyverse} and the thing we keep repeating is filtering for rows that exactly match a value:

In [5]:
## add some comments (roxygen style)
exact_match <- function(dataframe,col,value) {
    ## add exception
    ## WOW my exciting function will be here - body
    ## return dataframe
}

In [6]:
gapminder

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Afghanistan,Asia,1952,28.801,8425333,779.4453
Afghanistan,Asia,1957,30.332,9240934,820.8530
Afghanistan,Asia,1962,31.997,10267083,853.1007
Afghanistan,Asia,1967,34.020,11537966,836.1971
Afghanistan,Asia,1972,36.088,13079460,739.9811
⋮,⋮,⋮,⋮,⋮,⋮
Zimbabwe,Africa,1987,62.351,9216418,706.1573
Zimbabwe,Africa,1992,60.377,10704340,693.4208
Zimbabwe,Africa,1997,46.809,11404948,792.4500
Zimbabwe,Africa,2002,39.989,11926563,672.0386


In [7]:
gapminder[gapminder[['country']] == "Canada", ]

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Canada,Americas,1952,68.75,14785584,11367.16
Canada,Americas,1957,69.96,17010154,12489.95
Canada,Americas,1962,71.30,18985849,13462.49
Canada,Americas,1967,72.13,20819767,16076.59
Canada,Americas,1972,72.88,22284500,18970.57
⋮,⋮,⋮,⋮,⋮,⋮
Canada,Americas,1987,76.860,26549700,26626.52
Canada,Americas,1992,77.950,28523502,26342.88
Canada,Americas,1997,78.610,30305843,28954.93
Canada,Americas,2002,79.770,31902268,33328.97


We can abstract this to a function, which we'll call `exact_match`:

In [8]:
exact_match <- function(df,col,value) {
    ## WOW my exciting function will be here
    df[df[[col]] == value, ]
}

Now let's use our function, which is more expressive (and more tidyverse-like) than the base R subsetting above:


In [9]:
exact_match(gapminder, 'country', 'Canada')

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Canada,Americas,1952,68.75,14785584,11367.16
Canada,Americas,1957,69.96,17010154,12489.95
Canada,Americas,1962,71.30,18985849,13462.49
Canada,Americas,1967,72.13,20819767,16076.59
Canada,Americas,1972,72.88,22284500,18970.57
⋮,⋮,⋮,⋮,⋮,⋮
Canada,Americas,1987,76.860,26549700,26626.52
Canada,Americas,1992,77.950,28523502,26342.88
Canada,Americas,1997,78.610,30305843,28954.93
Canada,Americas,2002,79.770,31902268,33328.97


In [10]:
# Arguments can be matched by position or by name
exact_match(gapminder, col = 'country', value = 'Canada')
# Swapping unnamed arguments changes their meaning:
# exact_match(gapminder, 'Canada','country')

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Canada,Americas,1952,68.75,14785584,11367.16
Canada,Americas,1957,69.96,17010154,12489.95
Canada,Americas,1962,71.30,18985849,13462.49
Canada,Americas,1967,72.13,20819767,16076.59
Canada,Americas,1972,72.88,22284500,18970.57
⋮,⋮,⋮,⋮,⋮,⋮
Canada,Americas,1987,76.860,26549700,26626.52
Canada,Americas,1992,77.950,28523502,26342.88
Canada,Americas,1997,78.610,30305843,28954.93
Canada,Americas,2002,79.770,31902268,33328.97
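
As a small sketch of how R matches arguments, the block below repeats the definition of `exact_match` and uses a tiny hypothetical data frame (`pets`, which is not part of the lecture data). Named arguments can be given in any order, while unnamed ones are matched by position, so swapping them silently changes what the call means.

```r
# Copy of the function, repeated so this block runs on its own
exact_match <- function(df, col, value) {
    df[df[[col]] == value, ]
}

# Hypothetical toy data, used only for illustration
pets <- data.frame(name = c("Rex", "Tom", "Ace"),
                   kind = c("dog", "cat", "dog"))

# Named arguments may be given in any order
by_name <- exact_match(value = "dog", col = "kind", df = pets)

# Unnamed arguments are matched by position: df, then col, then value
by_position <- exact_match(pets, "kind", "dog")

identical(by_name, by_position)

# Swapping the last two arguments asks for a column called "dog".
# That column does not exist, so no rows are selected.
swapped <- exact_match(pets, "dog", "kind")
nrow(swapped)
```

Note that the swapped call does not raise an error here; it quietly returns a data frame with zero rows, which is one reason input checks are worth adding.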


### Lexical scoping in R (clicker)

R’s lexical scoping follows several rules. We will cover the following:

- Name masking - Names defined inside a function mask names defined outside it.
- Dynamic lookup - R looks up values when a function is run, not when it is created, so the output of a function can differ depending on objects outside the function’s environment.
- A fresh start - Every call creates a new environment, so a function has no way to tell what happened the last time it was run.
- Lazy evaluation - Function arguments are lazily evaluated: they’re only evaluated if accessed.

In [11]:
x <- 1
y <- 1
z <- 2
exp_masking <- function(x, name, z) {
    x <- 3
    # print(z)
    paste0("Hello ", name, " ", x, y)
}

x <- 2
y <- 2
name <- "Huang"

try({ exp_masking( name = "Elisa" ) })

try({ exp_masking( "Elisa" ) })

try({ paste0("Hello ",name," ",x,y) })

Error in paste0("Hello ", name, " ", x, y) : 
  argument "name" is missing, with no default


## Theme 2: Documenting our functions

We use `roxygen2` style documentation to do this. Why? Because when we package our code this will be useful for autogenerating our package documentation.

The basic `roxygen2` skeleton looks like this:

```
#' Title
#'
#' @param dataframe 
#' @param col 
#' @param value 
#'
#' @return
#' @export
#'
#' @examples
```

Let's customize this for our function:

In [12]:
#' Exact match
#'
#' Returns rows of a data frame where there is
#' an exact match of a value in a specified column.
#'
#' @param dataframe A data frame or tibble, from which to search for the matches.
#' @param col A quoted column name where the value will be searched for.
#' @param value A value to match on in the specified column.
#'
#' @return A data frame or tibble, with the rows where there is an exact match
#'   of a value in a specified column.
#' @export
#'
#' @examples
#' exact_match(mtcars, "cyl", 4)
exact_match <- function(dataframe, col, value) {
    dataframe[dataframe[[col]] == value, ]
}

In [13]:
# ?mean

## Theme 3: Testing our functions

We should write tests to prove to our current and future selves, and to our collaborators, that our code works! Let's do this now for the `exact_match` function above.

First, to make it easier to test, I am going to create some helper data:

In [14]:
helper_df <- tibble(course = c("DSCI 511",
                               "DSCI 521",
                               "DSCI 523",
                               "DSCI 551",
                               "DSCI 522"),
                    instructor = c("Arman",
                                   "Florencia",
                                   "Gittu",
                                   "Alexi",
                                   "Gittu"))
helper_df

course,instructor
<chr>,<chr>
DSCI 511,Arman
DSCI 521,Florencia
DSCI 523,Gittu
DSCI 551,Alexi
DSCI 522,Gittu


Now I can use this helper data to test my function for the type of object returned by my function:

In [15]:
test_that('function should return a data frame', {
    expect_s3_class(exact_match(helper_df, 'instructor', 'Gittu'), "data.frame")
})
# test_that("statement what you return",{
# enter all test cases
# })

Test passed 🥇


I should also test that expected values are returned. I will create two more helper data sets for this, one for what I would expect to see if I tried to subset the rows where "Gittu" was the instructor, and one for what I would expect to see if I tried to subset the rows where "DSCI 511" was the course:

In [16]:
helper_gittu_df <- tibble(course = c('DSCI 523', 'DSCI 522'), instructor = c('Gittu', 'Gittu'))
helper_arman_df <- tibble(course = c('DSCI 511'), instructor = c('Arman'))

helper_gittu_df
helper_arman_df

course,instructor
<chr>,<chr>
DSCI 523,Gittu
DSCI 522,Gittu


course,instructor
<chr>,<chr>
DSCI 511,Arman


In [17]:
test_that('function should return rows matching the value in query column', {
    expect_equal(exact_match(helper_df, 'instructor', 'Gittu'), helper_gittu_df)
    expect_equal(exact_match(helper_df, 'course', 'DSCI 511'), helper_arman_df)
})

Test passed 🥇

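Edge cases are worth testing too. As a sketch, the block below checks what happens when nothing matches. It assumes that a zero-row data frame with the original columns is the desired behaviour, which the lecture does not state explicitly. `small_df` is a hypothetical stand-in for `helper_df`, built with base R so the block only needs `testthat`.

```r
library(testthat)

# Copy of the function, repeated so this block runs on its own
exact_match <- function(dataframe, col, value) {
    dataframe[dataframe[[col]] == value, ]
}

# Hypothetical stand-in for the helper data
small_df <- data.frame(course = c("DSCI 511", "DSCI 523"),
                       instructor = c("Arman", "Gittu"))

# "Nobody" does not appear in the instructor column
no_match <- exact_match(small_df, "instructor", "Nobody")

test_that("function should return zero rows when nothing matches", {
    expect_s3_class(no_match, "data.frame")
    expect_equal(nrow(no_match), 0L)
    expect_equal(names(no_match), names(small_df))
})
```
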

## Theme 4: Exception handling

We should program defensively and make our functions fail hard and fast when things go wrong. What often goes wrong? The user inputs the wrong type! Let's write exceptions that are thrown when that happens!

In [18]:
#' Exact match
#'
#' Returns rows of a data frame where there is
#' an exact match of a value in a specified column.
#'
#' @param dataframe A data frame or tibble, from which to search for the matches.
#' @param col A quoted column name where the value will be searched for.
#' @param value A value to match on in the specified column.
#'
#' @return A data frame or tibble, with the rows where there is an exact match
#'   of a value in a specified column.
#' @export
#'
#' @examples
#' exact_match(mtcars, "cyl", 4)
exact_match <- function(dataframe, col, value) {
    if (!is.data.frame(dataframe)) {
        stop('exact_match expects a data frame object')
    }
    if (!is.character(col) || length(col) != 1) {
        stop('col should be a character vector of length 1')
    }
    dataframe[dataframe[[col]] == value, ]
}
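
To see one of the checks fire, a call with bad input can be wrapped in base R's `tryCatch()`, which captures the error so that its message can be inspected. This is a minimal sketch; the character vector passed in place of a data frame is made up for illustration.

```r
# Copy of the function with its input checks,
# repeated so this block runs on its own
exact_match <- function(dataframe, col, value) {
    if (!is.data.frame(dataframe)) {
        stop('exact_match expects a data frame object')
    }
    if (!is.character(col) || length(col) != 1) {
        stop('col should be a character vector of length 1')
    }
    dataframe[dataframe[[col]] == value, ]
}

# A character vector is not a data frame, so the first check triggers;
# the error handler returns the message instead of halting execution
msg <- tryCatch(
    exact_match(c("a", "b"), "instructor", "Gittu"),
    error = function(e) conditionMessage(e)
)
msg
```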

Now that we have written these exceptions, we should test whether they are doing what we expect!

In [19]:
test_that('function should throw an error if the dataframe argument is not a data.frame', {
    expect_error(exact_match(c('course', 'instructor'), 'instructor', 'Gittu'))
})
test_that('function should throw an error if the col argument is not a character vector of length 1', {
    expect_error(exact_match(helper_df, c('course', 'instructor'), 'Gittu'))
    expect_error(exact_match(helper_df, 1, 'Gittu'))
})

Test passed 🎉
Test passed 😀

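Called with no further arguments, `expect_error()` passes for any error, including one we did not intend. A slightly stricter sketch passes a `regexp` so the test also checks which message was thrown. The function is repeated so the block runs on its own, and `small_df` is a hypothetical stand-in for `helper_df`.

```r
library(testthat)

# Copy of the function with its input checks
exact_match <- function(dataframe, col, value) {
    if (!is.data.frame(dataframe)) {
        stop('exact_match expects a data frame object')
    }
    if (!is.character(col) || length(col) != 1) {
        stop('col should be a character vector of length 1')
    }
    dataframe[dataframe[[col]] == value, ]
}

# Hypothetical stand-in for the helper data
small_df <- data.frame(course = c("DSCI 511", "DSCI 523"),
                       instructor = c("Arman", "Gittu"))

test_that("errors carry the expected message", {
    # Wrong type for the first argument
    expect_error(exact_match(1:3, "instructor", "Gittu"),
                 regexp = "expects a data frame")
    # Wrong type for the column name
    expect_error(exact_match(small_df, 1, "Gittu"),
                 regexp = "character vector of length 1")
})
```
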

## Theme 5: Outro (if time permits)

- Make sure it is tidy
I have a function ready with all the great pieces, but is it tidy? Make sure it follows the [tidyverse style guide](https://style.tidyverse.org/) and uses roxygen2-style comments (I already dealt with comments). The {[styler](https://styler.r-lib.org/)} package is super helpful, but don't rely on it alone.
- Introduction to R packages
- Why do I care about writing comments in roxygen2 style? It is useful when you learn about writing packages
- Functions - demo with weathermetrics

### What did we learn today?

- How to write and test functions in R

- Lexical scoping in R
 
- How to handle exceptions

- How to source functions from other files

- A little bit about what R packages are

Remember, when it comes to function names, a function that you define yourself takes precedence over functions with the same name from packages.

In [20]:
library(weathermetrics)

In [21]:
## Right now this uses the function from the weathermetrics package
fahrenheit.to.celsius(-40)

I am going to define my own function with the same name, `fahrenheit.to.celsius`:

In [22]:
fahrenheit.to.celsius <- function(temp) {
    if (!is.numeric(temp)) {
        stop("Cannot convert a non-numeric Fahrenheit temperature to Celsius")
    }
    paste0((temp - 32) * 5 / 9, " ", "C")
}

In [23]:
## Now it is using the function that you defined
fahrenheit.to.celsius(-40)

Now there is a conflict between the function you defined and the function from the `weathermetrics` package. If you want to use the one from `weathermetrics`, call it as `weathermetrics::fahrenheit.to.celsius()`:

In [26]:
## Here you go
weathermetrics::fahrenheit.to.celsius(-40)
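
The same lookup order applies to any package, including the ones that ship with R. As a sketch that does not need `weathermetrics`, the block below masks `stats::median()` with a deliberately silly user-defined version (hypothetical, for illustration only), reaches the original through `::`, and then removes the mask.

```r
# A user-defined function is found before the one from the package
median <- function(x) "my own median"

masked <- median(c(1, 5, 9))            # calls the user-defined version
original <- stats::median(c(1, 5, 9))   # calls the stats version explicitly

# Removing our definition makes the package version visible again
rm(median)
restored <- median(c(1, 5, 9))
```

Removing the user-defined object with `rm()` is often the quickest fix when a function has been masked by accident.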

It is very important to keep this in mind because the same applies to functions with the same name from different packages. To see what I mean, restart your Jupyter kernel and load the `tidyverse` package:

In [27]:
library(tidyverse)

Did you see the conflicts?

```
────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
```

So if you call just `filter()`, you are using `dplyr::filter()` instead of `stats::filter()`. If you want to use `filter()` from the `stats` package, you have to call it explicitly as `stats::filter()`. (Note: `stats::filter()` does something completely different; see `?stats::filter`.)
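
To make the difference concrete, here is a minimal sketch of what `stats::filter()` does: it applies a linear filter to a series (here a 3-point moving average) rather than selecting rows of a data frame. The input vector is made up for illustration.

```r
# stats::filter() applies a linear filter to a numeric series;
# with three equal weights it computes a centred moving average
x <- c(1, 2, 3, 4, 5)
smoothed <- stats::filter(x, rep(1 / 3, 3))

# The two ends are NA because the 3-point window does not fit there
as.numeric(smoothed)
```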

You might run into this issue (very rarely, though!) in the future when you use other packages and accidentally end up calling a function from a different package than the one you intended.