# **Lab 9: Functions, vectors and lists**

Brian Manzo (thank you to Yanxin for preparing this week's lab) 

Wednesday 8:30-9:50am, ~~USB 2260~~ [Zoom](https://umich.instructure.com/courses/387338/external_tools/25194)

In [1]:
library(tidyverse)

Registered S3 methods overwritten by 'ggplot2':
  method         from 
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
Registered S3 method overwritten by 'rvest':
  method            from
  read_xml.response xml2
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.1     ✔ purrr   0.3.2
✔ tibble  3.0.3     ✔ dplyr   1.0.2
✔ tidyr   1.1.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


## Anatomy of a function
To write a function, first think about its inputs and output. A function takes one or more inputs, does something to them, and then returns an output.

For more information on functions, [R for Data Science](https://r4ds.had.co.nz/functions.html) is a good reference.

Suppose we want to rescale a variable (that is, restrict it to the range 0-1). What are the input(s) and output of our rescale function?

```
df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
```

In [6]:
rescale01 <- function(x) {
#  ^ function name   ^ function argument (input vector)
    (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
#   ^ function output
}
x = c(1:10)
rescale01(x)

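This version has a bug when `x` contains an infinite value. For example, with `x = c(1:10, Inf)`, `max(x)` is `Inf`, so every finite entry is rescaled to 0 and the infinite entry yields NaN. Computing the range with `finite = TRUE` ignores infinite values: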

In [5]:
rescale01 = function(x) {
  rng = range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
rescale01(x)
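
To see how the two versions differ, the sketch below runs both on a vector that contains `Inf`. The names `rescale01_naive`, `rescale01_finite`, and `x_inf` are hypothetical and are used only so both definitions can sit side by side.

```r
# Hypothetical names, used only to keep both versions side by side.
rescale01_naive <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

rescale01_finite <- function(x) {
  # finite = TRUE omits non-finite values when computing the range
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

x_inf <- c(1:10, Inf)
rescale01_naive(x_inf)   # finite entries collapse to 0, the Inf entry becomes NaN
rescale01_finite(x_inf)  # finite entries are spread over 0-1, the Inf entry stays Inf
```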

Why would we want to use a function in the first place, rather than just doing all of this as needed?

## Conditions

The condition part of the `if` statement must evaluate to a single `TRUE` or `FALSE`. If it has length greater than one, older versions of R use only the first element and give a warning (since R 4.2.0 this is an error):

In [13]:
if (c(TRUE, FALSE)) { 
    1 
}

“the condition has length > 1 and only the first element will be used”

Similarly, a condition of NA will generate an error:

In [17]:
if (NA) { 
    1 
}

ERROR: Error in if (NA) {: missing value where TRUE/FALSE needed
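
One possible way to guard against both problems is to wrap the condition in `isTRUE()`, which returns `TRUE` only for a single, non-missing `TRUE`. The helper name `safe_check` below is hypothetical; this is a sketch, not the only way to handle such input.

```r
# safe_check() is a hypothetical helper name.
safe_check <- function(cond) {
  # isTRUE() returns TRUE only for a single, non-missing TRUE value
  if (isTRUE(cond)) {
    "condition is TRUE"
  } else {
    "condition is FALSE, NA, or not a single value"
  }
}

safe_check(3 < 5)
safe_check(NA)
safe_check(c(TRUE, FALSE))
```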


An example of a working conditional statement is 

In [29]:
if(3 < 5){
    print("3 is less than 5")
} else {
    print("3 is at least 5")
}

[1] "3 is less than 5"


In [30]:
3 < 5

### Logical operators

Often you will need to combine multiple logical conditions in an if statement. To do this we have the `&&` and `||` operators, which compute the logical AND and OR, respectively, of several logical conditions:

In [32]:
TRUE && FALSE && TRUE

In [33]:
FALSE || TRUE || FALSE

There is a subtle but important difference between the single and double versions of these operators. The single `&` performs entrywise AND over logical vectors:

In [39]:
c(TRUE, TRUE, FALSE) & c(FALSE, TRUE, FALSE)

In contrast, the double ampersand `&&` works on single logical values. It evaluates its operands from left to right and returns `FALSE` as soon as it encounters an operand that is `FALSE`:

In [44]:
TRUE && FALSE && TRUE

It only returns `TRUE` if it reaches the last operand without finding any `FALSE` values:

In [51]:
TRUE && TRUE && TRUE

This is known as "short-circuiting": R can stop evaluating as soon as it hits one false value, since this will cause the `&&` to return false. Note that passing vectors of length greater than one to `&&` or `||` is an error as of R 4.3.0; older versions used only the first element of each vector.

What is the expected output of the last two lines of code below?

In [60]:
f = function() { print("f called"); FALSE }
g = function() { print("g called"); TRUE }

f() && g()
g() && f()

[1] "f called"


[1] "g called"
[1] "f called"


The or operator works similarly:

In [61]:
g() || f()

f() || g()

[1] "g called"


[1] "f called"
[1] "g called"
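
Short-circuiting is useful for putting a cheap or protective check in front of one that would otherwise fail. The sketch below assumes a hypothetical helper, `is_positive_number`, that should return a single `TRUE` or `FALSE` for any input.

```r
# is_positive_number() is a hypothetical helper name.
is_positive_number <- function(x) {
  # Each check runs only if all the checks to its left were TRUE,
  # so `x > 0` is never evaluated for NULL, character, or NA input.
  is.numeric(x) && length(x) == 1 && !is.na(x) && x > 0
}

is_positive_number(3)
is_positive_number(NULL)
is_positive_number("a")
is_positive_number(NA_real_)
```

Without the earlier checks, `NULL > 0` would produce a zero-length logical and `NA > 0` would produce `NA`, neither of which is a valid `if` condition.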


### Testing for equality

Be careful when testing for equality in conditionals. The `==` operator is vectorized, so comparing two vectors returns a vector of logicals rather than a single value. To check whether any or all entries are `TRUE`, use the `any()` or `all()` functions:

In [71]:
v1 = c(1, 2, 3)
v2 = c(1, 1, 2)
if (v1 == v2) { print("Wrong!") }else{print("Right!")}
if (all(v1 == v2)) { print("All!") }else{ print("Not all!")}
if (any(v1 == v2)) { print("Any!") }

“the condition has length > 1 and only the first element will be used”

[1] "Wrong!"
[1] "Not all!"
[1] "Any!"
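
Another option, when the question is whether two objects are exactly the same, is base R's `identical()`, which always returns a single `TRUE` or `FALSE`. A short sketch:

```r
v1 <- c(1, 2, 3)
v2 <- c(1, 1, 2)

identical(v1, v2)          # a single FALSE, safe to use in if ()
identical(v1, c(1, 2, 3))  # a single TRUE

# identical() is strict about type: 1L is an integer, 1 is a double
identical(1L, 1)
```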


Also be wary of testing floating point numbers for equality:

In [74]:
2 == sqrt(2) ^ 2

In [75]:
sqrt(2) ^ 2

If you need to do this, use the `near()` function from dplyr instead, which compares the two numbers up to a small tolerance:

In [80]:
near(2, sqrt(2) ^ 2)
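
The idea behind a tolerance-based comparison can be sketched in base R. The function name `near_sketch` and its default tolerance are assumptions made for illustration; it is not dplyr's implementation.

```r
# near_sketch() is a hypothetical name; it illustrates the idea of comparing
# the absolute difference of two numbers with a small tolerance.
near_sketch <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}

2 == sqrt(2)^2                   # FALSE because of rounding error
near_sketch(2, sqrt(2)^2)        # TRUE
isTRUE(all.equal(2, sqrt(2)^2))  # base R alternative, also TRUE
```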

### Multiple conditions

Sometimes you will want to check multiple conditions using an if statement. For example, let's define the function:
$$
sign(x)=\begin{cases}
-1, & x<0\\
0, & x=0\\
1, & x>0
\end{cases}$$

The general form is

```
if (this) {
   do that
} else if (that) {
   do something else
} else {
   
}
```

**Exercise:** Write an R function `sign_fn(x)` that replicates the behavior of $sign(x)$.

In [81]:
sign_fn = function(x){
    if(x < 0){
        sign_x = -1
    } else if (x == 0){
        sign_x = 0
    } else {
        sign_x = 1
    }
    return(sign_x)
}

In [84]:
sign_fn(0.00000001)
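
Because `if` needs a single `TRUE` or `FALSE`, `sign_fn` handles one number at a time. A vectorized variant can be sketched with nested `ifelse()` calls; the name `sign_vec` is hypothetical, and base R already provides `sign()` for this purpose.

```r
sign_vec <- function(x) {
  # ifelse() is evaluated element by element, unlike if ()
  ifelse(x < 0, -1, ifelse(x == 0, 0, 1))
}

x <- c(-2.5, 0, 0.00000001, 7)
sign_vec(x)
sign(x)   # base R's built-in version, for comparison
```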

## Function arguments

Functions can take multiple arguments. Generally they fall into one of two categories:

*   Data to be processed by the function, and
*   Options, which affect how the data gets processed.


### Rules for function arguments

Generally:

*   The data parameters should come first; and
*   The options should come second, and have sensible defaults.

Default parameter values are specified with the `option = default` notation:

In [86]:
mean_ci <- function(x, conf = 0.95) {
  se <- sd(x) / sqrt(length(x))
  alpha <- 1 - conf
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

In [97]:
set.seed(1)
x = rnorm(1000, 5, 3)
mean_ci(x, conf=0.95)


When you call a function, you can omit any arguments that have defaults. If you override a default, name the argument explicitly, writing the argument name and the new value with an `=` in between:



```
mean_ci(c(1, 2, 3, 4), conf=.99) #yes
mean_ci(c(1, 2, 3, 4), .99)  # no

```
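
As a quick check, the sketch below redefines `mean_ci` as above and compares the calling styles. The positional call returns the same result as the named one; naming the option is a readability convention, not a requirement of R.

```r
mean_ci <- function(x, conf = 0.95) {
  se <- sd(x) / sqrt(length(x))
  alpha <- 1 - conf
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

x <- c(1, 2, 3, 4)
mean_ci(x)               # uses the default, conf = 0.95
mean_ci(x, conf = 0.99)  # overrides the default by name

# Positional matching gives the same result, but is harder to read
identical(mean_ci(x, 0.99), mean_ci(x, conf = 0.99))
```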

**Exercise:** Write a function which takes two arguments, a vector of numbers `x`, and a percentage `p` by which to multiply each number. Make the default value `p=0.5`.

In [123]:
percent_x <- function(x, p=0.5){
    return(x*p)
}

## Validation

When writing functions it's a good idea to validate the input -- that is, make sure it matches your assumptions about what is being passed to the function. Consider the following function which returns the weighted average of a vector:

In [114]:
w_mean = function(x, w) {
    sum(x * w) / sum(w)
}

This function relies implicitly on the fact that the weight vector `w` is the same length as the input vector `x`. If it's not, you'll get a warning and unexpected behavior.

In [115]:
w_mean(c(1,2,3), w=c(1, 2))

“longer object length is not a multiple of shorter object length”


It's best to make the assumption of equal length explicit by checking it. The `stopifnot()` function raises an error, and halts execution, if its argument is not `TRUE`:

In [111]:
stopifnot(1==2)
print(1)

ERROR: Error: 1 == 2 is not TRUE


**Exercise:** Use `stopifnot` in the function `w_mean` to validate the input (ensure that `w` has the same length as `x`)

In [119]:
w_mean = function(x, w) {
    stopifnot(length(x)==length(w))
    sum(x * w) / sum(w)
}

In [122]:
w_mean(c(1,2,3), w=c(1, 2))
# uncomment after completing exercise

ERROR: Error in w_mean(c(1, 2, 3), w = c(1, 2)): length(x) == length(w) is not TRUE
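
The message produced by `stopifnot()` is fairly terse. If a friendlier message is wanted, one alternative is an explicit `if` with `stop()`. The sketch below uses the hypothetical name `w_mean_checked` and computes the weighted mean as `sum(x * w) / sum(w)`.

```r
# w_mean_checked() is a hypothetical name for this variant.
w_mean_checked <- function(x, w) {
  if (length(x) != length(w)) {
    # call. = FALSE leaves the function call out of the error message
    stop("`x` and `w` must be the same length", call. = FALSE)
  }
  sum(x * w) / sum(w)
}

w_mean_checked(c(1, 2, 3), w = c(1, 1, 2))  # (1 + 2 + 6) / 4 = 2.25

# tryCatch() lets us show the error message without stopping the script
tryCatch(
  w_mean_checked(c(1, 2, 3), w = c(1, 2)),
  error = function(e) conditionMessage(e)
)
```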


## Dot-dot-dot (`...`)

Some functions are designed to take a variable number of inputs. We saw this, for example, with the `str_c` function:

In [124]:
stringr::str_c("a", "b")
stringr::str_c("a", "b", "c", "d")

To construct a function that takes a variable number of arguments we use the `...` notation:

```
f = function(...) {
    <do something with variable arguments>
}

```
One thing you can do with the `...` is pass it to another function:

In [125]:
commas <- function(...) stringr::str_c(..., collapse = ", ")
commas(letters[1:10])
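
Besides forwarding `...`, a function can capture it with `list(...)` and inspect what was passed. The sketch below uses only base R; the names `count_args` and `commas_base` are hypothetical.

```r
# count_args() and commas_base() are hypothetical names.
count_args <- function(...) {
  args <- list(...)   # collect everything passed through ... into a list
  length(args)
}

commas_base <- function(...) {
  # c(...) flattens the arguments into one vector before paste() joins them
  paste(c(...), collapse = ", ")
}

count_args("a", "b", "c")
commas_base("a", "b", "c")
commas_base(letters[1:3])
```

One caveat with `...` is that a misspelled argument name is silently absorbed instead of raising an error, so it is worth using sparingly.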