# Lecture 6: Functions, Testing and Documentation in R

## Attribution:
- [Advanced R](https://adv-r.hadley.nz/) by Hadley Wickham
- [The Tidynomicon](http://tidynomicon.tech/) by Greg Wilson

### Lecture learning objectives:

* In R, define and use a named function that accepts parameters and returns values
* Describe lazy evaluation (variable arguments) and how it affects functions in R
* Explain the importance of scoping and environments in R as they relate to functions
* Handle errors gracefully via exception handling
* Use `roxygen2` friendly function documentation to describe parameters, return values, description and example(s).
* Write comments within a function to improve readability

## Lexical scoping in R

R’s lexical scoping is a set of rules that determines how R looks up the value of a symbol. Of these rules, we will look at the following three:

- Name masking
- A fresh start
- Dynamic lookup

#### Name masking

- Names defined inside a function mask names defined outside a function
- If a name isn’t defined inside a function, R looks one level up (and then all the way up into the global environment and even loaded packages!)
    

In [None]:
f <- function() {
  x <- 1
  y <- 2
  c(x, y)
}
f()
rm(f)

In [None]:
# When a name is not defined inside the function, R looks one level up
x <- 1
g04 <- function() {
  y <- 2
  c(x,y)
}

In [None]:
g04()
rm(g04,x)

In [None]:
# The same rules apply if a function is defined inside another function:
x <- 1
g04 <- function() {
  y <- 2
  i <- function() {
    z <- 3
    c(x, y, z)
  }
  i()
}

In [None]:
g04()
rm(g04,x)
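
The cells above show R looking one level up for a missing name. A minimal sketch of masking itself, where the same name exists both inside and outside the function, might look like this (the names `outer_value` and `mask_demo` are illustrative and not part of the original notes):

```r
# A name defined in the global environment
outer_value <- 10

mask_demo <- function() {
  # This local definition masks the global one inside the function body
  outer_value <- 20
  outer_value
}

inside <- mask_demo()   # value seen inside the function
outside <- outer_value  # the global binding is left untouched
c(inside = inside, outside = outside)
```

The assignment inside the function creates a new local binding, so the global `outer_value` keeps its original value.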

#### Dynamic lookup

- Lexical scoping determines where to look for values, not when to look for them. 
- R looks for values when the function is run, not when the function is created. 
- This means that the output of a function can differ depending on the objects outside the function’s environment.

In [None]:
g12 <- function() x + 1
x <- 15
g12()

x <- 20
g12()

In [None]:
x <- 10
f1 <- function(x) {
  function() {
    x + 10
  }
}
f1(1)()
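
Because of dynamic lookup, a function that silently depends on a global object can be hard to debug. One way to spot such dependencies, sketched here under the assumption that the `codetools` package is available (it normally ships with R as a recommended package), is `codetools::findGlobals()`. The function `depends_on_global` and the name `scale_factor` are made up for this illustration.

```r
# `scale_factor` is not defined inside the function, so it must be found elsewhere
depends_on_global <- function() scale_factor + 1

# List the names the function looks up outside its own body
globals <- codetools::findGlobals(depends_on_global)
globals
```

The result lists both functions (such as `+`) and variables, since R uses the same lookup rules for both.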

#### A fresh start

- Every time a function is called a new environment is created to host its execution.

- This means that a function has no way to tell what happened the last time it was run; each invocation is completely independent.

- NOTE: ``exists()`` returns ``TRUE`` if there’s a variable of that name, otherwise it returns ``FALSE``

Talk through the following code with your neighbour and predict the output, then let's confirm the result by running the code.

In [None]:
g11 <- function() {
  if (!exists("a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  a
}

g11()
g11()
g11()
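
A small sketch can make the "new environment per call" rule visible. Here the hypothetical helper `call_env` simply returns the environment created for its own execution, so two calls can be compared directly:

```r
# Return the environment created for this particular call
call_env <- function() {
  environment()
}

env_first <- call_env()
env_second <- call_env()

# Two calls produce two distinct environments
same_env <- identical(env_first, env_second)
same_env
```

Since each call gets its own environment, nothing assigned locally in one call is visible to the next.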

## Defining functions in R

- Use `variable <- function(…arguments…) { …body… }` to create a function and give it a name

Example:

In [None]:
add <- function(x, y) {
  print(paste("The sum of", x, "+", y, "is", x + y))
  return(x + y)
}

add(5, 10)

- As in Python, functions in R are objects. This is referred to as “first-class functions”.
- A function returns the value of the last expression it evaluates. To return a value early, call `return()`.

In [None]:
add <- function(x, y) {
    if (!is.numeric(x) || !is.numeric(y)) {
        return("ERROR: one argument is not numeric") # early return with a message string
    }
    x + y
}

add(5, 2)
#add(5, "a")

### Lazy evaluation

In R, function arguments are lazily evaluated: they’re only evaluated if accessed.



In [None]:
add_one <- function(x, y) {
    x <- x + 1
    x
} 

add_one(5)
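
Another way to see lazy evaluation, as a minimal sketch with the illustrative function `first_only`, is to pass an argument that would fail if R ever evaluated it:

```r
first_only <- function(x, y) {
  # `y` is never accessed, so the expression supplied for it is never evaluated
  x * 2
}

# The second argument would raise an error if it were evaluated
result <- first_only(5, stop("y was evaluated!"))
result
```

The call completes normally because the `stop()` expression is only a promise that is never forced.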

Knowing that, now consider the `add_one` function written in Python below:

```
# Python code (would this work?)
def add_one(x, y):
    x = x + 1
    return x

add_one(5)
```

From the list below, select the reason why the above `add_one` function works in R, but the equivalent version of the function in Python would break.

1. Python evaluates the function arguments before it evaluates the function and because it doesn't know what y is, it will break even though it is not used in the function.
2. R performs lazy evaluation, meaning it delays the evaluation of the function arguments until its value is needed within/inside the function.
3. The question is wrong, both functions would work in their respective languages. 
4. Answers 1 and 2 are correct.

### Anonymous functions
- If you choose not to give a function a name, you get an anonymous function. 
- Anonymous functions are not bound to an identifier.
- This is useful when it’s not worth the effort to figure out a name:


In [None]:
# mtcars is a built-in dataset from the `datasets` package, which ships with R
# lapply: applies a Function over a List or Vector. This is a functional
lapply(mtcars, function(x) length(unique(x)))
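
As a further sketch using only base R, an anonymous function can be passed to another function or called immediately after it is defined (the variable names here are illustrative):

```r
# Pass an anonymous function directly to sapply()
squares <- sapply(1:4, function(v) v^2)

# Define and call an anonymous function in a single expression
incremented <- (function(v) v + 1)(41)

squares
incremented
```

The extra parentheses around the function definition are needed so that R treats it as a value to call.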


## Functionals

A functional is a function that takes a function (and other things) as an input and returns a vector as output.

Base R provides several functionals, such as `lapply`, `apply`, `tapply`, `integrate` and `optim`. In the tidyverse, the `purrr` package provides a family of functionals.
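
To illustrate that functionals are not limited to the `apply` family, here is a small sketch with `integrate` and `optim`, each of which takes a function as input. The integrand and the objective function are made up for the example.

```r
# integrate() takes a function and returns an object whose `value` is the estimate
area <- integrate(function(x) x^2, lower = 0, upper = 1)
area$value  # close to 1/3

# optim() takes a function to minimise; this one has its minimum at p = 3
fit <- optim(par = 0, fn = function(p) (p - 3)^2, method = "BFGS")
fit$par     # close to 3
```

Note that these two return richer objects than a bare vector, so the result of interest is extracted with `$`.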

### How do we apply a function to all columns of a data frame?

Say, for example we wanted to calculate the median for each column in the `mtcars` data frame:

In [None]:
head(mtcars, 2)

In [None]:
medians <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
    #How do we tell `median` to ignore NA's? Using `na.rm = TRUE`!
    medians[i] <- median(mtcars[[i]], na.rm = TRUE)
}

OK, then next we want to calculate the mean for all of the columns:

In [None]:
means <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
    means[i] <- mean(mtcars[[i]], na.rm = TRUE)
}

OK, and then the variance...

In [None]:
variances <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
    variances[i] <- var(mtcars[[i]], na.rm = TRUE)
}

This is getting a little repetitive... What are we repeating?

### Can we write this as a function?

Given that functions are objects in R, this seems reasonable!

In [None]:
mds_map <- function(x, fun)  {
    out <- vector("double", ncol(x))
    for (i in seq_along(x)) {
        out[i] <- fun(x[[i]], na.rm = TRUE)
    }
    out
}
mds_map(mtcars, min)

This is essentially the guts of `purrr::map_dbl`.

In [None]:
# This looks different from our mds_map output: map_dbl returns a named double vector (names come from the columns).
library(purrr)
map_dbl(mtcars, min)
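
If `purrr` is not available, base R's `vapply` behaves similarly: it applies a function to each column and checks that every result has the declared type and length. This is offered as a rough base-R counterpart, not as how `purrr` is implemented.

```r
# Apply `min` to each column; FUN.VALUE declares that each result is a single double
col_mins <- vapply(mtcars, min, FUN.VALUE = numeric(1))

# Like map_dbl(), the result is a named double vector
col_mins[c("mpg", "cyl")]
```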

### What if our data frame had missing values?

Let's make some to see the consequences.

In [None]:
mtcars_NA <- mtcars
mtcars_NA[1, 1] <- NA
head(mtcars_NA, 2)

In [None]:
map_dbl(mtcars_NA, median)

`map_dbl` returns a vector of type double.

How do we tell `median` to ignore NA's? Using `na.rm = TRUE`! But how do we add this to our `map_dbl` call?

### Solution!

Creating an anonymous function within the `purrr::map_dbl` function!

In [None]:
map_dbl(mtcars_NA, function(x) median(x, na.rm  = TRUE))
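
An alternative design, sketched here with a hypothetical base-R functional called `col_summary`, is to let the functional forward extra arguments to the supplied function through `...`. The data frame `scores` is invented for the example.

```r
# Hypothetical functional that forwards extra arguments to `fun` via `...`
col_summary <- function(df, fun, ...) {
  vapply(df, function(col) fun(col, ...), FUN.VALUE = numeric(1))
}

scores <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))

with_na <- col_summary(scores, median)                   # NA propagates for column a
without_na <- col_summary(scores, median, na.rm = TRUE)  # na.rm is passed on to median()

with_na
without_na
```

The anonymous-function approach is more explicit about which function receives `na.rm`, while `...` forwarding is more compact.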

### Function composition

You have 3 options in R:

- assigning values to intermediate objects,
- nested function calls, or 
- the binary operator `%>%`, which is called the pipe and is pronounced as “and then”.


For example, imagine you want to compute the population standard deviation using `sqrt()` and `mean()` as building blocks, together with two helper functions that we define ourselves:

In [None]:
square <- function(x) {
    x^2
}
deviation <- function(x) {
    x - mean(x)
}
x <- runif(100) # runif(n) draws n random values from a uniform distribution on [0, 1]

Option 1: assigning values to intermediate objects

In [None]:
out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)
out

Option 2: nested function calls

In [None]:
sqrt(mean(square(deviation(x))))

Option 3: the binary operator `%>%`, which is called the pipe and is pronounced as “and then”.

In [None]:
library(magrittr, quietly = TRUE) # also loaded as a dependency of dplyr and tidyverse

x %>%
  deviation() %>%
  square() %>%
  mean() %>%
  sqrt()
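
As a sanity check, the sketch below (base R only, with a fixed seed chosen arbitrarily) confirms that the intermediate-object and nested versions agree, and that both match `sd()` once its sample (n - 1) denominator is rescaled to the population version.

```r
square <- function(x) x^2
deviation <- function(x) x - mean(x)

set.seed(1)           # arbitrary seed so the check is reproducible
values <- runif(100)

# Option 1: intermediate objects
step <- deviation(values)
step <- square(step)
step <- mean(step)
via_steps <- sqrt(step)

# Option 2: nested calls
via_nesting <- sqrt(mean(square(deviation(values))))

# sd() divides by n - 1, so rescale it to the population (divide by n) version
n <- length(values)
via_sd <- sd(values) * sqrt((n - 1) / n)

all.equal(via_steps, via_nesting)
all.equal(via_steps, via_sd)
```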

#### What to choose?

Each of the three options has its own strengths and weaknesses:

Intermediate objects:
- requires you to name intermediate objects. This is a strength when objects are important, but a weakness when values are truly intermediate.

Nesting:
- is concise, and well suited for short sequences. 
- But longer sequences are hard to read because they are read inside out and right to left. 

Piping:
- allows you to read code in straightforward left-to-right fashion and doesn’t require you to name intermediate objects. 
- But you can only use it with linear sequences of transformations of a single object.
- The `%>%` pipe also requires a third-party package (`magrittr`) and assumes that the reader understands piping. R 4.1.0 and later also provide a native pipe, `|>`.

## Writing tests in R with test_that

- The industry standard tool for writing tests in R is the [`testthat` package](https://testthat.r-lib.org/).
- To use an R package, we typically load the package into R using the `library` function:

In [None]:
library(testthat)

#### How to write a test with `testthat::test_that`

```
test_that("Message to print if test fails", expect_*(...))
```

Often our `test_that` function calls are longer than 80 characters, so we use `{` to split the code across multiple lines, for example:

In [None]:
x <- c(3.5, 3.5, 3.5)
y <- c(3.5, 3.5, 3.49999)
test_that("x and y should contain the same values", {
    expect_equal(x, y)
})

#### Common `expect_*` statements for use with `test_that`

#### Is the object equal to a value? 
- `expect_identical` - test two objects for being exactly equal
- `expect_equal` - compare R objects x and y testing ‘near equality’ (can set a tolerance)
- `expect_equivalent` - compare R objects x and y testing ‘near equality’ (can set a tolerance) without assessing attributes (deprecated in the 3rd edition of `testthat` in favour of `expect_equal(ignore_attr = TRUE)`)
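
The difference between exact and near equality can be previewed with base R's `identical()` and `all.equal()`. This is only an analogy: the comparison machinery inside `testthat` may differ between editions.

```r
a <- 0.1 + 0.2
b <- 0.3

exactly_equal <- identical(a, b)         # exact comparison of the stored doubles
nearly_equal <- isTRUE(all.equal(a, b))  # comparison within a small numeric tolerance

c(exactly_equal = exactly_equal, nearly_equal = nearly_equal)
```

Floating-point rounding makes the two values differ in their last bits, which is why exact comparison is usually too strict for computed numbers.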

#### Does code produce an output/message/warning/error?
- `expect_error` - tests if an expression throws an error
- `expect_warning` - tests whether an expression outputs a warning
- `expect_output` - tests that ``print()`` output matches a specified value

In [None]:
#expect_error
f_error <- function() stop("My error!")

expect_error(f_error())
expect_error(f_error(), "My error!")

In [None]:
f <- function(x) {
  if (x < 0) {
    warning("*x* is already negative")
    return(x)
  }
  -x
}
expect_warning(f(-1))
expect_warning(f(-1), "is already negative")
expect_warning(f(1), NA) # NA asserts that no warning is raised


In [None]:
str(mtcars)

In [None]:
expect_output(str(mtcars),"32 obs.")
expect_output(str(mtcars),"$ mpg", fixed = TRUE)

#### Is the object true/false?

These are fall-back expectations that you can use when none of the other more specific expectations apply. The disadvantage is that you may get a less informative error message.

- `expect_true` - tests if the object returns `TRUE`
- `expect_false` - tests if the object returns `FALSE`

#### Challenge 1: 

Add a tolerance argument to the `expect_equal` statement such that the observed difference between these very similar vectors doesn't cause the test to fail.

In [None]:
x <- c(3.5, 3.5, 3.5)
y <- c(3.5, 3.5, 3.49999)
test_that("x and y should contain the same values", {
    expect_equal(x, y)
})

#### Unit test example 

In [None]:
celsius_to_fahr <- function(temp) {
  (temp * (9 / 5)) + 32
}

In [None]:
test_that("Temperature should be the same in Celsius and Fahrenheit at -40", {
    expect_identical(celsius_to_fahr(-40), -40)
})
test_that("Room temperature should be about 23 degrees in Celsius and 73 degrees Fahrenheit", {
    expect_equal(celsius_to_fahr(23), 73, tolerance = 1)
})

### Exception handling in R

How to check type and throw an error if not the expected type:

In [None]:
if (!is.numeric(c(1, 2, "c")))
  stop("Cannot compute on a vector of characters.")

Example of defensive programming at the beginning of a function:

In [None]:
fahr_to_celsius <- function(temp) {
    if(!is.numeric(temp)){
        stop("Cannot calculate temperature in Celsius for non-numeric values")
    }
    (temp - 32) * 5/9
}

In [None]:
fahr_to_celsius("thirty")

If you wanted to issue a warning instead of an error, you could use `warning` in place of `stop` in the example above. However, in most cases it is better practice to throw an error than to print a warning, because a warning lets the function carry on with input it cannot handle.
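
To make the trade-off concrete, here is a sketch of a warning-based variant. The function name `fahr_to_celsius_warn` and its choice to return `NA` are assumptions made for this illustration, not part of the lecture's code.

```r
# Hypothetical variant that warns instead of stopping
fahr_to_celsius_warn <- function(temp) {
  if (!is.numeric(temp)) {
    warning("Non-numeric input; returning NA")
    return(NA_real_)
  }
  (temp - 32) * 5 / 9
}

# With a warning, the call still returns a value and later code keeps running
value <- suppressWarnings(fahr_to_celsius_warn("thirty"))

# tryCatch() shows which kind of condition was signalled
outcome <- tryCatch(
  fahr_to_celsius_warn("thirty"),
  warning = function(w) paste("warning:", conditionMessage(w)),
  error = function(e) paste("error:", conditionMessage(e))
)
outcome
```

The `NA` can travel a long way through later calculations before anyone notices, which is one reason an error is often the safer choice.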

#### We can test our exceptions using test_that:

In [None]:
test_that("Non-numeric values for temp should throw an error", {
    expect_error(fahr_to_celsius("thirty"))
    expect_error(fahr_to_celsius(list(4)))
})

### Test-driven development (TDD) review

1. Write your tests first (that call the function you haven't yet written), based on edge cases you expect or can calculate by hand

2. If necessary, create some "helper" data to test your function with (this might be done in conjunction with step 1)

3. Write your function to make the tests pass (in this process you might think of more tests that you want to add)
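
The steps above can be sketched for a made-up function, `kelvin_to_celsius`. To keep the sketch free of package dependencies it uses base R's `stopifnot()`; in the workflow of this lecture the same checks would live inside `test_that()` blocks.

```r
# Step 1: write the checks first, for a function that does not exist yet
check_kelvin_to_celsius <- function() {
  stopifnot(
    isTRUE(all.equal(kelvin_to_celsius(273.15), 0)),   # freezing point of water
    isTRUE(all.equal(kelvin_to_celsius(0), -273.15)),  # absolute zero
    inherits(try(kelvin_to_celsius("cold"), silent = TRUE), "try-error")
  )
  TRUE
}

# Step 3: write the function so that the checks pass
kelvin_to_celsius <- function(temp) {
  if (!is.numeric(temp)) {
    stop("temp must be numeric")
  }
  temp - 273.15
}

tests_pass <- check_kelvin_to_celsius()
tests_pass
```

Calling `check_kelvin_to_celsius()` before the function is defined would fail, which is the expected starting point in TDD.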

#### `try` in R

Similar to Python, R has a `try` function to attempt to run code, and continue running subsequent code even if code in the try block does not work:

```
try({
    # some code
    # that can be 
    # split across several
    # lines
})

# code to continue even if error in code 
# in try code block above
```

This code normally results in an error that stops following code from running:

In [None]:
x <- data.frame(col1 = c(1, 2, 3, 2, 1), 
                col2 = c(0, 1, 0, 0 , 1))
x[3]
dim(x)

`try` lets the code following the error run:

In [None]:
try({x <- data.frame(col1 = c(1, 2, 3, 2, 1), 
                     col2 = c(0, 1, 0, 0 , 1))
     x[3]
})
dim(x)
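
Two related base-R tools are worth a brief sketch: checking whether a `try` call failed, and using `tryCatch` to supply a fallback value. The data frame `scores` and the choice of `NULL` as the fallback are assumptions for the example.

```r
scores <- data.frame(col1 = c(1, 2, 3), col2 = c(0, 1, 0))

# try() returns an object of class "try-error" when the expression fails
attempt <- try(scores[3], silent = TRUE)
failed <- inherits(attempt, "try-error")

# tryCatch() runs a handler instead, here falling back to NULL
third_col <- tryCatch(
  scores[3],
  error = function(e) {
    message("Falling back to NULL: ", conditionMessage(e))
    NULL
  }
)

failed
is.null(third_col)
```

`tryCatch` is usually the better fit when you want to react to the error, while `try` is convenient when you only want execution to continue.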

### `roxygen2` friendly function documentation 

In [None]:
#' Converts temperatures from Fahrenheit to Celsius.
#'    
#' @param temp a vector of temperatures in Fahrenheit
#' 
#' @return a vector of temperatures in Celsius
#' 
#' @examples
#' fahr_to_celsius(-20)
fahr_to_celsius <- function(temp) {
    (temp - 32) * 5/9
}

Why `roxygen2` documentation? If you document your functions like this, *when* you create an R package to share them they will be set up to have the fancy documentation that we get using `?function_name`.
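
As a second sketch, the same documentation pattern is applied below to the `celsius_to_fahr` function from earlier, with a defensive type check added. The `@export` tag only takes effect when the function is part of a package; it is included here on the assumption that the function would eventually be shared that way.

```r
#' Convert temperatures from Celsius to Fahrenheit.
#'
#' @param temp A numeric vector of temperatures in Celsius.
#'
#' @return A numeric vector of temperatures in Fahrenheit.
#'
#' @examples
#' celsius_to_fahr(100)
#'
#' @export
celsius_to_fahr <- function(temp) {
  # Defensive check before doing any arithmetic
  if (!is.numeric(temp)) {
    stop("temp must be numeric")
  }
  (temp * 9 / 5) + 32
}

boiling <- celsius_to_fahr(100)
boiling
```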

#### RStudio has a template for `roxygen2` documentation

<img src="insert_roxygen.png" width=500>