## Function components

1. R functions have 3 parts:  
(1) **body()**: the code inside the function<br>
(2) **formals()**: the list of arguments, which controls how you can call the function<br>
(3) **environment()**: the "map" of the location of the function's variables
<p>
<p>
    
Note:<br>
* When you print a function, it shows the 3 components. <br>
* If the environment isn't displayed, it means the function was created in the global environment.
* Primitive functions (e.g. sum()) call C code directly with .Primitive() and contain no R code.<br>So their formals(), body(), and environment() are all NULL.
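
As a small sketch of the notes above (the names `make_square` and `sq` are made up for illustration): a function created inside another function carries a non-global environment, and `is.primitive()` can be used to check whether a function is primitive.

```r
# hypothetical helper: returns a function that was created inside another function
make_square <- function() {
    function(x) x^2
}
sq <- make_square()

# the environment of sq is the execution environment of make_square(),
# not the global environment
identical(environment(sq), globalenv())

# primitive functions have no R-level components
is.primitive(sum)
is.null(formals(sum))
```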

In [9]:
# one-line function format
# f <- function(x) x^2
# f <- function(x) x + 1
# f <- function(x) x / 2

# ------ Example 1: a function created in global environment

# create function: square a number
f <- function(x) x^2
cat("The function is:")
f

cat("\nThe body() of function is:")
body(f)

cat("\nThe formals() of function is:")
formals(f)

cat("\nThe environment() of function is:")
environment(f)

The function is:


The body() of function is:

x^2


The formals() of function is:

$x




The environment() of function is:

<environment: R_GlobalEnv>

In [15]:
# Example 2: function sum()

cat("The function is:")
sum

cat("\nThe body() of function is:")
body(sum)

cat("\nThe formals() of function is:")
formals(sum)

cat("\nThe environment() of function is:")
environment(sum)

The function is:


The body() of function is:

NULL


The formals() of function is:

NULL


The environment() of function is:

NULL

## Lexical scoping

1. Scoping:<br> 
(1) **Definition**: scoping is the set of rules that govern how R looks up the value of a symbol.<br>
(2) **Understanding scoping allows you to**: <br>
a. build tools by composing functions<br>
b. overrule the usual evaluation rules and do non-standard evaluation.
<p>
2. Two types of scoping<br>
(1) **Lexical scoping**: implemented automatically at the language level.<br>
(2) **Dynamic scoping**: used in select functions to save typing during interactive analysis.

Note: <br>
* Lexical scoping looks up symbol values based on how functions were nested when they were created, <br>not how they are nested when they are called. <br>With lexical scoping, you don't need to know how the function is called to <br>figure out where the value of a variable will be looked up. <br>You just need to look at the function's definition.<br>

3. Four basic principles behind R's implementation of lexical scoping<br>
(1) name masking <br>
(2) functions vs. variables <br>
(3) a fresh start <br>
(4) dynamic lookup
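
A minimal sketch of the note on lexical scoping above (the function names `show_x` and `caller` are hypothetical): the value of `x` is found by looking at where `show_x()` was defined, not at where it is called from.

```r
x <- "global"

# show_x() is defined at the top level, so a free x is looked up
# in the environment where show_x() was created
show_x <- function() x

# caller() has its own local x, but that does not affect show_x()
caller <- function() {
    x <- "local"
    show_x()
}

caller()
```

With dynamic scoping the call would see the caller's `x`; with R's lexical scoping it returns the `x` from the defining environment.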

In [16]:
# Example 1: scoping is the set of rules that R applies to go from symbol x to its value 10
x <- 10
x

### Name masking

1. **Name masking rules**<br>
(1) R first looks up a name inside the current function<br>
(2) If a name isn't defined inside a function, R will look one level up<br>
(3) If a function is defined inside another function: look inside the current function, <br>
then where that function was defined, and so on, all the way up to the global environment, <br>
and then on to other loaded packages.<br>
(4) The same rules apply to closures, functions created by other functions.

In [25]:
# multiple-line function format
# f <- function() {
#      details here
# }


# Example 1: a basic function that illustrates name masking principle

# define function: vector contain x & y
f <- function() {
    # define x
    x <- 1
    # define y
    y <- 2
    c(x, y)
}

# call function
f()

In [20]:
# remove function
# rm(): use this function to delete objects from memory
rm(f)

In [4]:
# Example 2: If a name isn't defined inside a function, R will look one level up

# define x 
x <- 2

# define function: returns a vector containing x & y, but only y is defined inside the function
g <- function() {
    # define y
    y <- 1
    c(x, y)
}

# call function
g()

In [22]:
# remove function & vector
rm(x, g)

In [23]:
# Example 3: R looks up names defined inside a function, then inside the enclosing function, 
# and then one level up

# define x
x <- 1 

# define function: 
h <- function() {
    # define y
    y <- 2
    i <- function() {
        # define z
        z <- 3
        c(x, y, z)
    }
    i()
}

# call function
h()

In [24]:
# remove function
rm(x, h)

In [46]:
# Example 4: the same rules apply to a function created by another function (a closure)

# define x
x <- 1

# define function: j
j <- function() {
    y <- 2
    function() {
        c(x, y)
    }
}

# define function: k is the function returned by calling j()
k <- j()

# call function: k
k()

In [45]:
# remove function
rm(x, k)

### Functions vs. variables

1. The same principles apply regardless of the type of the associated value <br>
(1) Finding functions works exactly the same way as finding variables.<br>
(2) If you are using a name in a context where it's obvious that you want a function (e.g. f(3)), <br>R will ignore objects that are not functions while it is searching.

In [47]:
# Example 1: 

# define function: l
l <- function(x) x + 1

# define function: m
m <- function() {
    l  <- function(x) x * 2
    l(10)
}

# call function: m
m()

In [48]:
# remove function 
rm(l, m)

In [50]:
# Example 2: 

# define function: n
n <- function(x) x / 2

# define function: o
o <- function() {
    n <- 10
    n(n)
}

# call function: o
o()

# Note: it's better to name function and objects differently, 
# instead of naming both function and vector as "n" here

In [51]:
# remove function
rm(n, o)

### A fresh start

Every time a function is called, a new environment is created to host execution.<br> 
A function has no way to tell what happened the last time it was run. <br>
Each invocation (call) is completely independent. 

In [53]:
# Example 1: 

# define function: j
j <- function() {
    if (!exists("a")) {
        a <- 1
    } else {
        a <- a + 1
    }
    print(a)
}

# call function: j
j()

# Every time it just returns 1 (as long as no object named a exists in the global environment). 
# This is because every time a function is called, a new environment is created to host execution. 
# A function has no way to tell what happened the last time it was run. 
# Each invocation (call) is completely independent. 

[1] 1


### Dynamic lookup

1. Lexical scoping determines **where** to look for values, not **when** to look for them.<br>
R looks for values when the function is run, not when it's created.<br>
This means the output of a function can be different depending on objects outside its environment.
2. You generally want to avoid this behavior because it means the function is no longer self-contained. Two things to try: <br>
(1) List all external dependencies of a function: use `codetools::findGlobals()`<br>
(2) Manually change the environment of the function to one which contains absolutely nothing: use `emptyenv()`. <br>
This doesn't work because R relies on lexical scoping to find everything, even the `+` operator. It's never possible to make a function completely self-contained because you must always rely on functions defined in base R or other packages. 

In [3]:
# Example 1: behavior you generally want to avoid

# define function: f
f <- function() x

# define variable outside function: x
x <- 15
f()

# define variable outside function: x (new value but same name)
x <- 20
f()

In [6]:
# Example 2: list dependencies of a function

f <- function() x + 1
codetools::findGlobals(f)

In [8]:
# Example 3: change the function's environment to one that contains nothing
# This of course won't work. 
environment(f) <- emptyenv()
f()

ERROR: Error in x + 1: could not find function "+"


## Every operation is a function call

1. To understand computations in R, two slogans are helpful: <br>
(1) Everything that exists is an object.<br>
(2) Everything that happens is a function call.
2. Every operation in R is a function call, whether or not it looks like one.<br>
This includes infix operators like `+`, control flow operators like `for`, `if`, `while`, subsetting operators like `[ ]` and `$`, and even the curly brace `{` and the parenthesis `(`. <br>
This means that each pair of statements in Examples 1-5 is equivalent. <br>
Note that the backtick (`` ` ``) lets you refer to functions or variables that have otherwise reserved or illegal names.
3. When to override the definitions of these special functions: only when you want to do something that would otherwise be impossible.<br>
For example, this feature makes it possible for `dplyr` to translate R expressions into SQL expressions.
4. It's more often useful to treat special functions as ordinary functions. See Examples 6-7.
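
A few more pairs in the same spirit, as a sketch (the objects `lst` and `z` are made up for illustration): parentheses, `$`, and even assignment can be written as ordinary function calls.

```r
# parentheses are a function call too
(1 + 2)
`(`(1 + 2)

# so is `$`
lst <- list(a = 1, b = 2)
lst$a
`$`(lst, a)

# and so is assignment
`<-`(z, 5)
z
```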

In [14]:
# Example 1: sum 2 variables

# define variables: x, y
x <- 10; y <- 5

# method 1: 
x + y

# method 2:
`+`(x, y)

In [11]:
# Example 2: print a vector

# method 1: 
for (i in 1:2) print(i)

# method 2:
`for`(i, 1:2, print(i))

[1] 1
[1] 2
[1] 1
[1] 2


In [20]:
# Example 3: print key based on a value 

# method 1: 
if (i == 1) print("Yes!") else print("No!") 

# method 2:
`if` (i == 1, print("Yes!"), print("No!"))

[1] "No!"
[1] "No!"


In [22]:
# Example 4: subset a vector 

# define vector: x
x <- 10

# method 1: 
x[3]

# method 2:
`[`(x, 3)

In [26]:
# Example 5: print 1-3

# method 1: 
{ print(1); print(2); print(3) }

# method 2:
`{` (print(1), print(2), print(3))

[1] 1
[1] 2
[1] 3
[1] 1
[1] 2
[1] 3


In [30]:
# Example 6: sapply() function

# Method 1
# define function: add 
add <- function(x, y) x + y
sapply(1:10, add, 3)

# Method 2
sapply(1:10, `+`, 3)

# Method 3
sapply(1:10, "+", 3)

# Note: 
# `+` refers to the value of the object called +
# "+" is a string containing the character + 
# Method 3 works because sapply() can be given the name of a function 
# instead of the function itself (it looks the name up with match.fun())

In [31]:
# Example 7: combine lapply(), sapply() with subsetting

# define list: x
x <- list(1:3, 4:9, 10:12)

# extract 2nd element of each vector in the list x

# Method 1
sapply(x, "[", 2)

# Method 2
sapply(x, function(x) x[2])


## Function arguments

1. **Formal arguments**: are a property of the function.<br>
They are fixed when the function is defined, before it is ever called.
2. **Actual arguments** (calling arguments): can vary each time you call the function.<br>
After a function is defined, it can be called in many different ways. 
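
To make the distinction concrete, here is a small sketch with a hypothetical function `power()`: the formal arguments are inspected with `formals()`, while the actual arguments change from call to call.

```r
# hypothetical function with two formal arguments
power <- function(base, exponent = 2) base^exponent

# formal arguments: a property of the function itself
names(formals(power))

# actual arguments: different on each call
power(3)
power(2, 3)
power(exponent = 3, base = 2)
```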

### Calling functions

1. When calling a function, you can specify arguments in several ways. <br>
Arguments are matched in this order: <br>
(1) exact name (perfect matching)<br>
(2) prefix matching <br>
(3) position
2. Generally, you only want to use positional matching for the first one or two arguments. <br>
3. Avoid using positional matching for less commonly used arguments, and only use readable abbreviations with partial matching. <br>
4. Named arguments should always come after unnamed arguments. <br> 
If a function uses `...`, arguments listed after `...` can only be specified with their full name. 
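
The last rule can be illustrated with base `paste()`, whose `sep` argument comes after `...`. This is only a sketch of the behavior described above.

```r
# sep comes after ... in paste(), so it must be spelled out in full
paste("a", "b", sep = "-")

# the abbreviation `se` is not matched to sep; it is absorbed by ...
# and pasted as a third string, using the default separator " "
paste("a", "b", se = "-")
```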

In [14]:
# Example 1: 

# define function: f
# long names: abcdef, bcde1, bcde2
# short names: a, b1, b2
f <- function(abcdef, bcde1, bcde2) {
    list(a = abcdef, b1 = bcde1, b2 = bcde2)
}

# method 1: position matching
str(f(1, 2, 3))

# method 2: exact name matching
str(f(2, 3, abcdef = 1))

# method 3: partial (prefix) name matching
str(f(2, 3, a = 1))

List of 3
 $ a : num 1
 $ b1: num 2
 $ b2: num 3
List of 3
 $ a : num 1
 $ b1: num 2
 $ b2: num 3
List of 3
 $ a : num 1
 $ b1: num 2
 $ b2: num 3


In [22]:
# this doesn't work: b is a prefix of both bcde1 and bcde2, so the partial match is ambiguous
str(f(2, 3, b = 1))

ERROR: Error in f(2, 3, b = 1): argument 3 matches multiple formal arguments


In [5]:
# Example 2: calling function - good example 

mean(1:10)
mean(1:10, trim = 0.05)

In [6]:
# Example 3: calling function - overkill, not very good 

mean(x = 1:10)

In [10]:
# Example 4: calling function - confusing, bad example 

mean(1:10, n = T)
mean(1:10, FALSE)
mean(1:10, 0.05)
mean(, TRUE, x = c(1:10, NA))

### Calling a function given a list of arguments

1. If you have a list of function arguments, use `do.call()`.

In [12]:
# Example 1: 

# suppose we have a list of function arguments
args <- list(1:10, na.rm = TRUE)

# To send this list to mean()

# method 1: use do.call()
do.call(mean, args)

# method 2: the equivalent direct call
mean(1:10, na.rm = TRUE)

### Default and missing arguments

1. Function arguments in R can have default values. 
2. Since arguments in R are evaluated lazily, the default value can be defined in terms of other arguments.<br>
Default arguments can even be defined in terms of variables created within the function. Generally avoid this, because it is hard to understand without reading the whole function body. 
3. You can determine whether an argument was supplied with the `missing()` function. <br> If a default value takes several lines of code to compute, you could use `missing()` to compute it conditionally inside the body instead of writing that code in the function definition. But this makes it hard to know which arguments are required and which are optional without carefully reading the documentation. <br>
A better way: set the default value to `NULL` and use `is.null()` to check whether the argument was supplied. 
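
A minimal sketch of the `NULL`-default pattern, using a hypothetical function `scale_by()`: the signature shows that `factor` is optional, and the non-trivial default is computed in the body only when needed.

```r
# hypothetical function: `factor` is optional and defaults to NULL
scale_by <- function(x, factor = NULL) {
    # compute the default only when the argument was not supplied
    if (is.null(factor)) {
        factor <- max(abs(x))
    }
    x / factor
}

# default: scale by the largest absolute value
scale_by(c(1, 2, 4))

# explicit value
scale_by(c(1, 2, 4), factor = 2)
```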


In [13]:
# Example 1: 

# define function
f <- function(a = 1, b = 2) {
    c(a, b)
}

# call function
f()

In [14]:
# Example 2: default value defined in terms of another argument

# define function
g <- function(a = 1, b = a * 2) {
    c(a, b)
}

# call function
g()
g(10)

In [17]:
# Example 3: default value defined in terms of a variable created inside the function (generally avoid this)

# define function 
h <- function(a = 1, b = d) {
    d <- (a + 1) ^ 2
    c(a, b)
}

# call function 
h()
h(10)

In [25]:
# Example 4: 

# define function: 
i <- function(a, b) {
    c(missing(a), missing(b))
}

# call function
i()

# call function: a is supplied
i(a = 1)

# call function: b is supplied
i(b = 2)

# call function: a, b are supplied
i(a = 1, b = 2)

### Lazy evaluation

1. By default, R function arguments are lazy: they're only evaluated if they're actually used. See example 1.
2. If you want to ensure that an argument is evaluated, use `force()`. See example 2. 
3. Using `force()` is important when creating closures with `lapply()` or a loop. According to the book, example 3 method 1 fails without `force()`. It works here because, since R 3.2.0, `lapply()` and the other apply functions force the arguments of the functions they call.
4. In older versions of R, `x` in example 3 method 1 was lazily evaluated the first time you called one of the adder functions. By that point the loop was complete and the final value of `x` was 10, so every adder added 10. You can manually force evaluation to avoid this. See example 3 method 2.
5. Example 3 method 3 works for the same reason: `force()` is simply defined as `force <- function(x) x`, so mentioning `x` on its own line has the same effect. 
6. Default arguments are evaluated inside the function. This means that if the expression depends on the current environment, the results will differ depending on whether you use the default value or explicitly provide one. See example 4. 
7. An unevaluated argument is called a **promise** (or **thunk**). A promise is made up of 2 parts: <br>
(1) The expression: which gives rise to the delayed computation. <br>
(2) The environment: where the expression was created and where it should be evaluated.
8. Laziness is useful in **if** statements: in example 5, the second condition is evaluated only if the first is true. If it weren't, the statement would return an error because `NULL > 0` is a logical vector of length 0 and not a valid input to **if**.
9. Sometimes you can use laziness to eliminate an **if** statement altogether. See example 7. 
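
One consequence of promises that the examples do not show: the expression is evaluated at most once, and its value is then reused. The sketch below assumes a hypothetical function `use_twice()` and a global `counter` used only to make the evaluation visible.

```r
counter <- 0

# hypothetical function that uses its argument twice
use_twice <- function(x) {
    x + x
}

# the argument expression increments counter each time it is evaluated
result <- use_twice({
    counter <- counter + 1
    10
})

result

# counter is 1, not 2: the promise was evaluated only once
counter
```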

In [26]:
# Example 1

# define function
f <- function(x) {
    10
}

# call function
f(stop("This is an error!"))

In [28]:
# Example 2

# define function
f <- function(x) {
    force(x)
    10
}

# call function
f(stop("This is an error!"))

ERROR: Error in force(x): This is an error!


In [35]:
# Example 3:  

# Method 1: without force()
# The book says this fails, but it works in R >= 3.2.0, where lapply() forces the argument.

# define function 
add <- function(x) {
    function(y) x + y
}

# lapply 
adders <- lapply(1:10, add)

# call function: extract first element "1" then add by 10
adders[[1]](10)

# call function: extract last element "10" then add by 10
adders[[10]](10)

# Method 2: use force()

# define function 
add2 <- function(x) {
    force(x)
    function(y) x + y
}

# lapply
adders2 <- lapply(1:10, add2)

# call function: extract first element "1" then add by 10
adders2[[1]](10)

# call function: extract last element "10" then add by 10
adders2[[10]](10)

# Method 3: 

# define function 
add3 <- function(x) {
    x
    function(y) x + y
}

# lapply
adders3 <- lapply(1:10, add3)

# call function: extract first element "1" then add by 10
adders3[[1]](10)

# call function: extract last element "10" then add by 10
adders3[[10]](10)


In [37]:
# Example 4: 

# define function
f <- function(x = ls()) {
    a <- 1
    x
}

# call function: 
# ls() is evaluated inside the f function
f()

# call function: 
# ls() is evaluated in the global environment (where f is called)
f(ls())

In [39]:
# Example 5: if statement will be evaluated only if x is not NULL

# define: x
x <- NULL

# if statement
# 
if (!is.null(x) && x > 0) {
}

In [48]:
# Example 6: 

# define function
`&&` <- function(x, y) {
    if (!x) return(FALSE)
    if (!y) return(FALSE)
    
    TRUE
}

# define: a
a <- NULL

# call function
# This function would not work without lazy evaluation,
# because both x and y would always be evaluated, 
# testing a > 0 even when a is NULL
!is.null(a) && a > 0

# remove the custom definition so that later code uses the base && again
rm("&&")

In [49]:
# Example 7: each method stops with an error since a is NULL

# method 1: with if statement
if (is.null(a)) stop("a is null")

# method 2: without if statement
!is.null(a) || stop("a is null")

ERROR: Error in eval(expr, envir, enclos): a is null


### Special argument (...)

1. The `...` argument matches any arguments not otherwise matched, and can be easily passed on to other functions. <br>
This is useful if you want to collect arguments to call another function, but you don't want to specify their possible names.<br>
`...` is often used in conjunction with S3 generic functions to allow individual methods to be more flexible. <br>
Example: the base `plot()` function is a generic function with arguments **x**, **y**, and **...**
2. To capture `...` in a form that is easier to work with, you can use `list(...)`. See example 1.
3. Using `...` comes at a price: any misspelled arguments will not raise an error (see example 2), and any arguments after `...` must be fully named.<br>
4. It's often better to be explicit rather than implicit.
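
As a sketch of passing `...` on to another function (the wrapper `paste_dash()` is hypothetical): the wrapper fixes one argument and forwards everything else to `paste()` without naming it.

```r
# hypothetical wrapper: fixes sep and forwards all other arguments to paste()
paste_dash <- function(...) {
    paste(..., sep = "-")
}

paste_dash("a", "b", "c")

# named arguments such as collapse are forwarded as well
paste_dash(c("x", "y"), 1:2, collapse = "+")
```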

In [36]:
# Example 1

# define function
f <- function(...) {
    names(list(...))
}

# call function
f(a = 1, b = 2)

In [52]:
# Example 2: if there's typo, it will not raise error

# typo: mr
sum(1, 2, NA, na.mr = TRUE)

## Special calls

### Infix functions

1. Most functions in R are "prefix" operators: the name of the function comes before the arguments. <br>
2. You can create infix functions where the function name comes in between its arguments (e.g. + or -).
3. Pre-defined infix functions in R:<br>
(1) Pre-defined infix functions that include **%**:<br>
`%%`, `%*%`, `%/%`, `%in%`, `%o%`, `%x%`<br>
(2) Pre-defined infix functions that do not include **%**:<br>
`::`, `:::`, `$`, `@`, `^`, `*`, `/`, `+`, `-`, `>`, `>=`, `<`, `<=`, `==`, `!=`, `!`, `&`, `&&`, `|`, `||`, `~`, `<-`, `<<-`.
4. All user-created infix functions must start and end with **\%**. <br>
Note that when creating a new infix function, you have to put the name in backticks because it's a special name. See example 1.
5. The names of infix functions are more flexible than regular R functions: they can contain any sequence of characters (except **\%**). You'll need to escape any special characters in the string used to define the function, but not when you call it. See example 3.
6. R's default precedence rules mean that infix operators are composed from left to right. See example 4.


In [55]:
# Example 1: create a new operator that pastes strings together

# define operator (function)
`%+%` <- function(a, b) paste0(a, b)

# method 1: use operator
"new" %+% " string"

# method 2: use operator 
`%+%`("new", " string")

In [56]:
# Example 2: use backtick for sum

# method 1
1 + 5

# method 2
`+`(1, 5)

In [67]:
# Example 3: 

# define operator
`% %` <- function(a, b) paste(a, b)
# call operator
"a" % % "b"

# define operator
`%'%` <- function(a, b) paste(a, b)
# call operator
"a" %'% "b"

# define operator
`%/\\%` <- function(a, b) paste(a, b)
# call operator
"a" %/\% "b"

In [52]:
# Example 4: infix operators are composed from left to right in R

# define operator
`%-%` <- function(a, b) paste0("(", a, " %-% ", b, ")")

# call operator
"a" %-% "b" %-% "c"

In [None]:
# Example 5: an operator inspired by Ruby's || logical-or operator
# It's useful as a way of providing a default value in case the output of 
# another function is NULL.

# define operator
`%||%` <- function(a, b) if (!is.null(a)) a else b 
# call operator (schematic usage, with placeholder names):
# function_that_might_return_null() %||% default_value
NULL %||% "default value"

### Replacement functions

1. Replacement functions act like they modify their arguments in place, and have the special name **xxx<-**. They typically have 2 arguments (**x** and **value**), although they can have more, and they must return the modified object. <br>
In example 1, when R evaluates the assignment `second(x) <- 5L`, it notices that the left-hand side of **<-** is not a simple name, so it looks for a function named **second<-** to do the replacement.
2. They only "act" like they modify in place: they actually create a modified copy, which examples 2 and 3 check by comparing memory addresses. It's important to be aware of this behavior since copying can affect performance. <br>
If you want to supply additional arguments, they go in between **x** and **value**. See example 4.
3. It's often useful to combine replacement and subsetting. See example 5.<br>
It works because the expression `names(x)[2] <- "two"` is evaluated as if you had written the expression in example 6.

In [70]:
# Example 1: modify second element of vector

# define function
`second<-` <- function(x, value) {
    x[2] <- value
    x
}

# define x
x <- 1:10

# call function
# 2nd element is changed to assigned value
second(x) <- 5L
x

In [76]:
# Example 2

# load package: find memory address of the underlying object
library("pryr")

# define x
x <- 1:10
# find x address
address(x)

# change 2nd element of x
second(x) <- 6L
# find new x address: different 
address(x)

In [81]:
# Example 3: 

# define x
x <- 1:10
# address of x
address(x)

# change 2nd element with the built-in `[<-`
# the book says that built-in functions implemented with `.Primitive()`
# modify in place, so these 2 addresses should be the same;
# in this session the addresses were different
x[2] <- 7L
# address of new x
address(x)

In [84]:
# Example 4

# define x 
x <- 1:10

# define function
`modify<-` <- function(x, position, value) {
    x[position] <- value
    x
}

# call function: change first element to 10
modify(x, 1) <- 10
x

# behind the scene: it works like this
x <- `modify<-`(x, 1, 10)
x

In [55]:
# Example 5

# define x
x <- c(a = 1, b = 2, c = 3)
names(x)

# change name for 2nd element
names(x)[2] <- "two"
names(x)

In [91]:
# Example 6

# R does create a local variable named *tmp*, which is removed afterwards. 
`*tmp*` <- names(x)
`*tmp*`[2] <- "two"
names(x) <- `*tmp*`

## Return values

1. The last expression evaluated in a function becomes the return value, the result of invoking the function. See example 1.
2. Generally, it's good to reserve the use of an explicit `return()` for when you are returning early, such as for an error or a simple case of the function. See example 2.<br>
This style of programming can also reduce the level of indentation, and make functions easier to understand because you can reason about them locally.
3. A function can return only a single object.<br>
But you can return a list containing any number of objects. 
4. **Pure functions** are the easiest to understand, because they always map the same input to the same output and have no other impact on the workspace. <br>
Pure functions have no side effects: they don't affect the state of the world in any way apart from the value they return. <br>
5. R protects you from one type of side effect: most R objects have copy-on-modify semantics. So modifying a function argument does not change the original value. See example 3.  
6. There are 2 important exceptions to the copy-on-modify rule: environments and reference classes. These can be modified in place, so extra care is needed.
7. Most base R functions are pure. These are exceptions: <br>
(1) `library()`, which loads a package and hence modifies the search path. <br>
(2) `setwd()`, `Sys.setenv()`, `Sys.setlocale()`, which change the working directory, environment variables, and the locale.<br>
(3) `plot()` and friends, which produce graphical output. <br>
(4) `write()`, `write.csv()`, `saveRDS()`, etc., which save output to disk.<br>
(5) `options()`, `par()`, which modify global settings.<br>
(6) **S4** related functions, which modify global tables of classes and methods.<br> 
(7) Random number generators, which produce different numbers each time you run them. 
8. Functions can return `invisible` values, which are not printed by default when you call the function. See example 4.
9. You can force an invisible value to be displayed by wrapping it in parentheses. See example 5.
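
Two of the points above have no example in these notes, so here is a short sketch (the names `summarise_vec`, `set_a`, and `e` are made up): returning several values through a list, and an environment being modified in place, in contrast to the list in example 3.

```r
# return several results at once by wrapping them in a list
summarise_vec <- function(x) {
    list(min = min(x), max = max(x))
}
res <- summarise_vec(c(3, 1, 2))
res$min
res$max

# environments are modified in place: the change is visible to the caller
set_a <- function(env) {
    env$a <- 2
    invisible(NULL)
}
e <- new.env()
e$a <- 1
set_a(e)
e$a
```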

In [93]:
# Example 1

# define function
f <- function(x) {
    if (x < 10) {
        0
    } else {
        10
    }
}

# call function
f(5)

# call function
f(15)

In [None]:
# Example 2

# define function
f <- function(x, y) {
    if (!x) return(y)
    # complicated processing here
}

In [95]:
rm(x)

In [101]:
# Example 3: 

# define function
f <- function(x) {
    x$a <- 2
    x
}

# define list: x
x <- list(a = 1)

# call function
f(x)

# call list: x
x$a

In [106]:
# Example 4

# define functions 
f1 <- function() 1
f2 <- function() invisible(1)

# call functions
f1()
f2()
f1() == 1
f2() == 1

In [108]:
# Example 5: show invisible value

(f2())

(a <- 2)

In [114]:
# Example 6: why it is possible to assign one value to multiple variables 

a  <- b <- c <- d <- 2
a
b
c
d

# because it is parsed as: 
(a <- (b <- (c <- (d <- 2))))

### On exit

Functions can set up other triggers to run when the function finishes, using `on.exit()`. This is often used as a way to guarantee that changes to global state are restored when the function exits.<br>
The code registered with `on.exit()` is run regardless of how the function exits, whether with an explicit return, an error, or simply reaching the end of the function body. 
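
A minimal sketch of the "regardless of how the function exits" claim, assuming a hypothetical function `risky()` and a vector `events` that records what ran: the cleanup code still runs when the function stops with an error.

```r
events <- character()

# hypothetical function that registers cleanup code and then fails
risky <- function() {
    on.exit(events <<- c(events, "cleanup ran"))
    stop("something went wrong")
}

# try() catches the error so that the script can continue
try(risky(), silent = TRUE)

# the on.exit() code ran even though risky() ended with an error
events
```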

In [66]:
# Example 

# define function
in_dir <- function(dir, code) {
    old <- setwd(dir)
    on.exit(setwd(old))
    
    force(code)
}

# get current working directory
getwd()

# call function
in_dir("~", getwd())