**Introduction**: <br>
1. R, at its heart, is a **functional programming (FP)** language. <br>
This means that it provides many tools for creating and manipulating functions.
2. In particular, R has what's known as **first class functions**. <br>
You can do anything with functions that you can do with vectors: <br>
you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function.
3. The FP approach: start with small, easy-to-understand building blocks, combine them into more complex structures, and apply them with confidence.<br><br>
4. Three FP techniques:<br>
(1) **Anonymous functions**: functions created without a name, typically because they are too small to be worth naming. <br>
(2) **Closures**: functions written by other functions. <br>
(3) **Lists of functions**: in R, functions can be stored in lists.
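Here is a minimal sketch of the first-class function capabilities described above: assigning a function to a variable, storing it in a list, passing it as an argument, creating it inside another function, and returning it. The names `add_one`, `apply_twice`, and `make_adder` are hypothetical and exist only for illustration.

```r
# Assign a function to a variable
add_one <- function(x) x + 1

# Store functions in a list, just like vectors
fun_list <- list(add_one = add_one, square = function(x) x^2)

# Pass a function as an argument to another function
apply_twice <- function(f, x) f(f(x))

# Create a function inside another function and return it
make_adder <- function(n) {
    function(x) x + n
}
add_ten <- make_adder(10)

add_one(1)               # 2
fun_list$square(3)       # 9
apply_twice(add_one, 5)  # 7
add_ten(5)               # 15
```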

## Motivation

1. Imagine you have loaded a dataset in which -99 encodes missing values. You want to replace every -99 with `NA`. <br>
2. To prevent bugs and write more flexible code, follow the "don't repeat yourself" (DRY) principle. <br><br>

**Attempt 1**: copy-paste. <br>
The problem with copy-paste is that it makes mistakes easy and changes hard. For example, if the missing-value code changes from -99 to 9999, you have to make the change in multiple places again. <br><br>

**Attempt 2**: create a function that fixes missing values in a single vector. <br>
Now you can't mistype -99, but you can still mistype a column name (for example, `df$b <- fix_missing(df$a)`). <br><br>

**Attempt 3**: combine two functions, `fix_missing()` and `lapply()`.<br>
`lapply()` is a **functional** because it takes a function as an argument.<br>
The key idea is function composition. Take two simple functions, one that does something to every column and one that fixes missing values, and combine them to fix missing values in every column.<br><br>

**Attempt 4**: use closures to make functions from a template, so that any number can serve as the missing-value code.<br>

In [2]:
# Example 1: replace missing values -99

# create data 
set.seed(1014)
df <- data.frame(replicate(6, sample(c(1:10, -99), 6, rep = TRUE)))
names(df) <- letters[1:6]
cat("Original data frame: ")
df 

Original data frame: 

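| a | b | c | d | e | f |
|---|---|---|---|---|---|
| 7 | 5 | -99 | 2 | 5 | 2 |
| 5 | 5 | 5 | 3 | 6 | 1 |
| 6 | 8 | 5 | 9 | 9 | 4 |
| 4 | 2 | 2 | 6 | 6 | 8 |
| 6 | 7 | 6 | -99 | 10 | 6 |
| 9 | -99 | 4 | 7 | 5 | 1 |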


In [None]:
# Attempt 1: copy-paste 

# Copy-paste: easy to mistype a column name or the code -99
df$a[df$a == -99] <- NA 
df$b[df$b == -99] <- NA 
df$c[df$c == -99] <- NA 
df$d[df$d == -99] <- NA 
df$e[df$e == -99] <- NA 
df$f[df$f == -99] <- NA 


In [None]:
# Attempt 2: create fix_missing() function

# Notice:
# each line in Attempt 1 has two inputs:
# (1) the column: df$a, df$b, etc.
# (2) the missing-value code: -99

# create function 
fix_missing <- function(x) {
    x[x == -99] <- NA
    x
}

# apply function
df$a <- fix_missing(df$a)
df$b <- fix_missing(df$b)
df$c <- fix_missing(df$c)
df$d <- fix_missing(df$d)
df$e <- fix_missing(df$e)
df$f <- fix_missing(df$f) 

In [None]:
# Attempt 3: combine fix_missing() with lapply()

# create function fix_missing()
fix_missing <- function(x) {
    x[x == -99] <- NA 
    x
}

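# apply fix_missing() to every column; assigning to df[] keeps
# the result a data frame (lapply() on its own returns a list)
df[] <- lapply(df, fix_missing)

# the same technique works on a subset of columns
df[1:5] <- lapply(df[1:5], fix_missing)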

In [4]:
# Attempt 4: create closure to make functions based on template

# create function
missing_fixer <- function(na_value) {
    function(x) {
        x[x == na_value] <- NA
        x
    }
}

# fixer that replaces -99 with NA
fix_missing_99 <- missing_fixer(-99)

# fixer that replaces -999 with NA
fix_missing_999 <- missing_fixer(-999)

# try the fixers: each one replaces only its own code
fix_missing_99(c(-99, -999))    # NA -999
fix_missing_999(c(-99, -999))   # -99 NA

# the data uses -99, so apply fix_missing_99() to every column
df[] <- lapply(df, fix_missing_99)
cat("New data with missing value replaced with NA: ")
df

New data with missing value replaced with NA: 

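| a | b | c | d | e | f |
|---|---|---|---|---|---|
| 7 | 5 | NA | 2 | 5 | 2 |
| 5 | 5 | 5 | 3 | 6 | 1 |
| 6 | 8 | 5 | 9 | 9 | 4 |
| 4 | 2 | 2 | 6 | 6 | 8 |
| 6 | 7 | 6 | NA | 10 | 6 |
| 9 | NA | 4 | 7 | 5 | 1 |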
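`missing_fixer()` builds fixers from a template, so each column can have its own missing-value code. The sketch below assumes a small hypothetical data frame, `survey`, in which column `a` uses -99 and column `b` uses -999. `Map()` pairs each column with its own fixer. The `force()` call is not in the original: it makes `na_value` evaluate when the fixer is created rather than lazily on first use.

```r
missing_fixer <- function(na_value) {
    force(na_value)  # evaluate the argument now, not lazily later
    function(x) {
        x[x == na_value] <- NA
        x
    }
}

# Hypothetical data: each column uses a different missing-value code
survey <- data.frame(a = c(1, -99, 3), b = c(-999, 5, 6))

# One fixer per column, built from the same template
fixers <- list(a = missing_fixer(-99), b = missing_fixer(-999))

# Map() calls fixers[[i]](survey[[i]]) for each column i;
# assigning to survey[] keeps the result a data frame
survey[] <- Map(function(fix, col) fix(col), fixers, survey)
survey
```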


In [5]:
# Example 2: get numerical summaries for a data frame

# create data 
set.seed(1014)
df <- data.frame(replicate(6, sample(c(1:10, -99), 6, rep = TRUE)))
names(df) <- letters[1:6]
cat("Original data frame: ")
df 

Original data frame: 

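| a | b | c | d | e | f |
|---|---|---|---|---|---|
| 7 | 5 | -99 | 2 | 5 | 2 |
| 5 | 5 | 5 | 3 | 6 | 1 |
| 6 | 8 | 5 | 9 | 9 | 4 |
| 4 | 2 | 2 | 6 | 6 | 8 |
| 6 | 7 | 6 | -99 | 10 | 6 |
| 9 | -99 | 4 | 7 | 5 | 1 |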


In [None]:
# Attempt 1: get summary for each variable separately 

mean(df$a)
median(df$a)
sd(df$a)
mad(df$a)
IQR(df$a)

# do this for each variable 

In [None]:
# Attempt 2: create a function 

# still repetitive: na.rm = TRUE is written five times
# (note: this definition masks base::summary())
summary <- function(x) {
    c(mean(x, na.rm = TRUE), 
      median(x, na.rm = TRUE),
      sd(x, na.rm = TRUE),
      mad(x, na.rm = TRUE),
      IQR(x, na.rm = TRUE))
}

In [None]:
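# Attempt 3: store the functions in a list

# combine simple functions with lapply(); naming the list
# labels each summary in the result
summary <- function(x) {
    funs <- list(mean = mean, median = median, sd = sd, mad = mad, IQR = IQR)
    lapply(funs, function(f) f(x, na.rm = TRUE))
}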

## Anonymous functions

1. R doesn't have a special syntax for creating a named function: when you create a function, you use the regular assignment operator to give it a name.<br>
2. If you don't give the function a name, you get an **anonymous function**.
3. You use an anonymous function when it's not worth the effort to give it a name. 
4. Like all functions, anonymous functions have formals (`formals()`), a body (`body()`), and a parent environment (`environment()`).
5. You can call anonymous functions with named arguments, but doing so is a sign that your function needs a name. 
6. One common use for anonymous functions is to create closures, functions made by other functions.
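Here is a short sketch of points 4 and 5. It calls an anonymous function directly (the extra pair of parentheses is required), calls one with named arguments, and inspects its formals and body.

```r
# Call an anonymous function directly: the first parentheses wrap
# the function definition, the second pair calls it
(function(x) x + 3)(10)                  # 13

# Named arguments work, but needing them hints the function deserves a name
(function(x, y) x / y)(y = 2, x = 10)    # 5

# Anonymous functions still have formals and a body
formals(function(x = 4) x^2)
body(function(x = 4) x^2)
```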

In [None]:
# Example 1: simple anonymous functions

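# count the unique values in each column
lapply(mtcars, function(x) length(unique(x)))
# keep only the non-numeric columns (every mtcars column is numeric,
# so the result is a data frame with 0 columns)
Filter(function(x) !is.numeric(x), mtcars)
# integrate sin(x)^2 from 0 to pi (exact value: pi / 2)
integrate(function(x) sin(x) ^ 2, 0, pi)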

## Closures

1. Closures are functions written by functions.
2. Closures get their name because they enclose the environment of the parent function and can access all its variables. <br>
This is helpful because it allows two levels of parameters:<br>
(1) a parent level that controls the operation<br>
(2) a child level that does the work. 
3. In R, almost every function is a closure. <br>
All functions remember the environment in which they were created, typically either the global environment (for functions you write) or a package environment. <br>
The exception is primitive functions, which call C code directly and have no environment.
4. Closures are useful for making function factories, and are one way to manage mutable state in R.
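You can check point 3 with `environment()`, which reports the environment a function remembers. In this minimal sketch, `double_it` is a hypothetical function, while `sd()` (from the stats package) and `sum()` (a primitive) come with base R.

```r
# A function defined at top level remembers the global environment
double_it <- function(x) x * 2
environment(double_it)

# A package function remembers its package namespace
environment(sd)      # <environment: namespace:stats>

# Primitive functions call C directly and have no environment
is.primitive(sum)    # TRUE
environment(sum)     # NULL
```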

In [29]:
# Example 1: 
# parent function power() creates 2 child functions:
# square() and cube()

# ------ (1) create: parent function 
power <- function(exponent) {
    function(x) {
        x ^ exponent
    }
}

# child function 1: square 
square <- power(2)
# call
square(2)   # 4

# child function 2: cube 
cube <- power(3)
# call
cube(2)     # 8
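You can inspect the enclosing environments that `square()` and `cube()` captured directly. This sketch repeats `power()` with an added `force(exponent)`, which the example above does not have. `force()` makes the argument evaluate when the child is created, a common safeguard against lazy-evaluation surprises in factories.

```r
power <- function(exponent) {
    force(exponent)  # evaluate now so each child captures its own value
    function(x) {
        x ^ exponent
    }
}

square <- power(2)
cube <- power(3)

# Each child has its own enclosing environment, created when power()
# ran, holding the `exponent` from that call
ls(environment(square))                        # "exponent"
get("exponent", envir = environment(square))   # 2
get("exponent", envir = environment(cube))     # 3

# The two enclosing environments are distinct objects
identical(environment(square), environment(cube))   # FALSE
```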

### Function factories

1. Function factories are most helpful when: <br>
(1) The different levels are more complex, with multiple arguments and complicated bodies; <br>
(2) Some work only needs to be done once, when the function is generated. <br>
Function factories are particularly good for maximum likelihood problems.
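The factory below sketches both points. It does the data-dependent work (`length()` and `sum()`) once, when the child is generated, and returns a function of the parameter alone. It assumes a Poisson model, whose negative log-likelihood is $n\lambda - \left(\sum_i x_i\right)\log\lambda + \sum_i \log(x_i!)$, and uses made-up count data `x1`. The names `poisson_nll` and `nll1` are hypothetical. The last term is dropped because it does not involve `lambda`, so it does not change where the minimum is.

```r
# Factory: precompute data summaries once, return a function of lambda
poisson_nll <- function(x) {
    n <- length(x)
    sum_x <- sum(x)
    function(lambda) {
        # Poisson negative log-likelihood, up to an additive constant
        n * lambda - sum_x * log(lambda)
    }
}

# Made-up count data for illustration
x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38)
nll1 <- poisson_nll(x1)

# Minimize over a plausible range; the analytic MLE is mean(x1)
fit <- optimise(nll1, c(0, 100))
fit$minimum
mean(x1)   # 32.1
```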

### Mutable state

1. The key to managing variables at different levels is the double-arrow assignment operator `<<-`. <br>
(1) `<-` always assigns in the current environment. <br>
(2) `<<-` keeps looking up the chain of parent environments until it finds a matching name; if it finds none, it assigns in the global environment. 
2. Together, a static parent environment and `<<-` make it possible to maintain state across function calls.<br><br>
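Two nested functions show the lookup rule in (2), including the fallback to the global environment. This is a minimal sketch, and the names `outer_fun`, `inner_fun`, `make_global`, `msg`, and `brand_new_var` are hypothetical.

```r
msg <- "global"

outer_fun <- function() {
    msg <- "outer"
    inner_fun <- function() {
        # `<<-` skips inner_fun's own environment, finds `msg` in
        # outer_fun's environment, and modifies it there
        msg <<- "modified by inner"
    }
    inner_fun()
    msg
}

outer_fun()   # "modified by inner"
msg           # still "global": the global `msg` was never touched

# If no matching name exists anywhere up the chain,
# `<<-` creates the variable in the global environment
make_global <- function() {
    brand_new_var <<- 42
}
make_global()
exists("brand_new_var", envir = globalenv())   # TRUE
```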

In Example 1: <br>
(i) `new_counter()` creates a counter function that records how many times it has been called. <br>
(ii) Each time `new_counter()` runs, it creates an environment, initializes the counter `i` in that environment, and then creates a new function. <br>
(iii) Ordinarily, a function's execution environment is temporary: it is created fresh on each call and discarded afterward. A closure, however, keeps access to the environment in which it was created. <br>
(iv) `counter_one()` and `counter_two()` each get their own **enclosing environment** when `new_counter()` creates them, so they keep separate counts.<br>
(v) The counters get around the "fresh start" limitation by not modifying variables in their local (execution) environment. <br>
(vi) Because the changes are made in the unchanging parent (enclosing) environment, they are preserved across function calls.

In [32]:
# Example 1: use a closure to create a counter

# create the counter factory
new_counter <- function() {
    i <- 0
    function() {
        i <<- i + 1
        i
    }
}

# create closure 1
counter_one <- new_counter()
# test: print each result on the same line as its label
cat("counter one first run:", counter_one(), "\n")
cat("counter one second run:", counter_one(), "\n")

# create closure 2
counter_two <- new_counter()
# test
cat("counter two first run:", counter_two(), "\n")

counter one first run: 1
counter one second run: 2
counter two first run: 1

In [41]:
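# Example 2: without a closure

# i lives in the global environment
i <- 0

# create counter function
new_counter2 <- function() {
    i <<- i + 1
    i
}

# test results:
# the counter increases, but there is only one counter, and it is
# stored in the global environment, where any other code can
# overwrite i (for example, by running i <- 0 again)
new_counter2()   # 1
new_counter2()   # 2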

In [43]:
# Example 3: without <<-

# create counter function
new_counter3 <- function() {
    i <- 0
    function() {
        i <- i + 1
        i
    }
}

# test results:
# `<-` creates a new local i in the execution environment on every call,
# so the enclosing i stays 0 and every call returns 1
counter_1 <- new_counter3()
counter_2 <- new_counter3()
counter_1()   # 1
counter_1()   # 1
counter_2()   # 1

## Lists of functions

1. In R, functions can be stored in lists. This makes it easier to work with groups of related functions, in the same way a data frame makes it easier to work with groups of related vectors.<br><br>
(In short: a list of functions is to functions what a data frame is to vectors.)
2. Uses of lists of functions: <br><br>
(1) Comparing the performance of multiple ways of computing the arithmetic mean: store each approach (function) in a list.<br><br>
(2) Summarizing an object in multiple ways: store each summary function in a list and run them all with `lapply()`.

In [55]:
# Example 1: compare performance of multiple ways of computing arithmetic mean

# store three implementations of the mean in a named list
compute_mean <- list(
    base = function(x) mean(x),
    sum = function(x) sum(x) / length(x),
    manual = function(x) {
        total <- 0
        n <- length(x)
        for (i in seq_along(x)) {
            total <- total + x[i] / n
        }
        total
    }
)

# create a test vector of 100,000 uniform random numbers
x <- runif(1e5)

In [57]:
# Attempt 1: call the functions one by one
cat("Base compute time: ")
system.time(compute_mean$base(x))
cat("Sum compute time: ")
system.time(compute_mean$sum(x))
cat("Manual compute time: ")
system.time(compute_mean$manual(x))

Base compute time: 

   user  system elapsed 
  0.013   0.000   0.013 

Sum compute time: 

   user  system elapsed 
  0.002   0.000   0.003 

Manual compute time: 

   user  system elapsed 
  0.009   0.000   0.009 

In [58]:
# Attempt 2: anonymous function with lapply()

lapply(compute_mean, function(f) f(x))

In [59]:
# Attempt 3: named function with lapply()

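# helper that calls f with whatever arguments follow it
call_fun <- function(f, ...) f(...)
# extra arguments to lapply() (here, x) are passed on to call_fun()
lapply(compute_mean, call_fun, x)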

In [60]:
# Now combine lapply() with system.time()

lapply(compute_mean, function(f) system.time(f(x)))

$base
   user  system elapsed 
  0.013   0.000   0.013 

$sum
   user  system elapsed 
  0.003   0.000   0.003 

$manual
   user  system elapsed 
  0.007   0.000   0.007 


In [62]:
# Example 2: summarize a vector in multiple ways

# create vector 
x <- 1:10 

# store the functions in a named list 
funs <- list(
    sum = sum, 
    mean = mean, 
    median = median
)

# ------ (1) apply each function to x
lapply(funs, function(f) f(x))

# ------ (2) pass na.rm = TRUE to every function
lapply(funs, function(f) f(x, na.rm = TRUE))
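The last call above passes `na.rm = TRUE`, but `x <- 1:10` has no missing values, so the argument changes nothing. This sketch uses a hypothetical `x_na` that does contain an `NA` to show the difference. It calls `vapply()` in place of `lapply()` so that the result is a named numeric vector rather than a list.

```r
x_na <- c(1:10, NA)

funs <- list(
    sum = sum,
    mean = mean,
    median = median
)

# Without na.rm, every summary is NA
vapply(funs, function(f) f(x_na), numeric(1))

# With na.rm = TRUE, the NA is dropped before summarizing
res <- vapply(funs, function(f) f(x_na, na.rm = TRUE), numeric(1))
res   # sum 55, mean 5.5, median 5.5
```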

In [80]:
# Note: 
# difference between seq_along() and seq_len()

# seq_along()
# takes a vector and returns 1, 2, ..., up to the vector's length
x <- c(1, 3, 4, 5)
seq_along(x)   # 1 2 3 4

# seq_len()
# takes a single non-negative integer n and returns 1, 2, ..., n
seq_len(10)
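One practical reason to prefer `seq_along()` over `1:length(x)` (as the `manual` mean function above does) is the empty-vector case. This is a minimal sketch, and `empty`, `n_iterations`, and `idx` are hypothetical names.

```r
empty <- numeric(0)

# 1:length(x) counts down when x is empty: it yields c(1, 0)
1:length(empty)            # 1 0

# seq_along() and seq_len() correctly yield an empty sequence
seq_along(empty)           # integer(0)
seq_len(length(empty))     # integer(0)

# So a loop over seq_along() simply does not run for an empty vector
n_iterations <- 0
for (idx in seq_along(empty)) n_iterations <- n_iterations + 1
n_iterations               # 0
```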