## Apply family

1. The apply family comprises: <br>
(1) apply(): <br>
`apply(X, MARGIN, FUN, …)` <br><br>
(2) lapply(): (list apply)<br>
`lapply(X, FUN, …)` <br> <br>
(3) sapply(): (simplified apply) <br>
`sapply(X, FUN, …, simplify = TRUE, USE.NAMES = TRUE)` <br><br>
(4) vapply(): (vector apply)<br>
`vapply(X, FUN, FUN.VALUE, …, USE.NAMES = TRUE)` <br><br>
(5) tapply(): (table apply [factors])<br>
`tapply(X, INDEX, FUN = NULL, …, default = NA, simplify = TRUE)`<br> <br>
(6) mapply(): (multivariate apply)<br>
`mapply(FUN, …, MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)` <br><br>
(7) rapply(): (recursive apply)<br>
`rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...)` 

2. About the apply family:<br>
These functions run the loop for you, which minimizes the need to write explicit loops. <br>
Each one applies a specified function to a data object. They differ mainly in the class of object the function is applied to (list vs. matrix, etc.) and in the class of object that is returned.
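
As a minimal sketch of what "avoiding an explicit loop" means, the following compares a `for` loop with the equivalent `sapply()` call. The vector `values` and the helper `square` are illustrative names, not part of the text above.

```r
# Illustrative data and function (hypothetical names)
values <- c(2, 4, 6)
square <- function(v) v^2

# Explicit loop: pre-allocate the result, then fill it element by element
loop_result <- numeric(length(values))
for (i in seq_along(values)) {
  loop_result[i] <- square(values[i])
}

# Same computation with sapply(): the loop is hidden inside the call
apply_result <- sapply(values, square)

identical(loop_result, apply_result)
```

Both versions do the same work; the apply version mainly saves the bookkeeping (pre-allocation and indexing).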

### apply()

1. apply() input: matrix / data frame / array. <br>
apply() output: a vector, a matrix / array, or a list, depending on what FUN returns.<br><br>

2. Basic syntax: 
`apply(X, MARGIN, FUN, ...)`<br>
X: matrix / data frame / array<br>
MARGIN: for a matrix, 1 indicates rows, 2 indicates columns, and c(1, 2) indicates rows and columns<br>
FUN: the function to be applied<br>
...: any other arguments to be passed to the function<br><br>

3. apply() is not necessarily faster than an explicit loop, but it is compact and can often be written in one line.
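
One detail worth a small sketch: when X is a data frame, apply() first converts it with as.matrix(), so a single non-numeric column turns every value into a character string. The data frame `scores` below is made up for illustration.

```r
# Small illustrative data frame with one non-numeric column (hypothetical data)
scores <- data.frame(name = c("a", "b", "c"),
                     math = c(90, 80, 70),
                     art  = c(60, 75, 85))

# Because of the character column, the converted matrix is character
is.character(as.matrix(scores))

# Restrict to the numeric columns before calling apply()
numeric_cols <- scores[sapply(scores, is.numeric)]
col_means <- apply(numeric_cols, 2, mean)   # MARGIN = 2: one value per column
row_sums  <- apply(numeric_cols, 1, sum)    # MARGIN = 1: one value per row

col_means
row_sums
```

The examples that follow use `mtcars`, where every column is numeric, so this conversion is harmless there.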

In [None]:
# Example: apply()

# ------ (1) get mean of each column
apply(mtcars, 2, mean)

# ------ (2) get the sum of each row
apply(mtcars, 1, sum)

# ------ (3) get column quantiles 
apply(mtcars, 2, quantile, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))

### lapply()
(list apply)

1. lapply() input: a list or vector (a data frame counts as a list of columns). <br>
lapply() output: a list.<br><br>

2. Basic syntax: 
`lapply(X, FUN, ...)`<br>
X: a list or vector<br>
FUN: the function to be applied<br>
...: any other arguments to be passed to the function.<br><br>

3. lapply() does the following: <br>
(1) it loops over a list, iterating over each element in that list<br>
(2) it applies a function to each element of the list (a function that you specify)<br> 
(3) returns a list.
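
The three steps above can be sketched as a hand-written function. `my_lapply` is a hypothetical helper written only for illustration; it is not how lapply() is implemented internally, and it ignores edge cases such as FUN returning NULL.

```r
# Hypothetical helper that mimics the three steps of lapply()
my_lapply <- function(X, FUN, ...) {
  result <- vector("list", length(X))   # (3) the result is always a list
  for (i in seq_along(X)) {             # (1) loop over the elements
    result[[i]] <- FUN(X[[i]], ...)     # (2) apply FUN to each element
  }
  names(result) <- names(X)             # keep the element names
  result
}

numbers  <- list(a = 1:4, b = c(2, NA, 6))
by_hand  <- my_lapply(numbers, mean, na.rm = TRUE)
built_in <- lapply(numbers, mean, na.rm = TRUE)

identical(by_hand, built_in)
```

Note how `na.rm = TRUE` travels through `...` to `mean()` in both versions.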

In [8]:
# Example: lapply()

# ------ (1) get mean for each sub-list
# Create a list with 4 elements: 
# each element is a numeric vector 
data <- list(item1 = 1:4, item2 = rnorm(10),
              item3 = rnorm(20, 1), item4 = rnorm(100, 5))

# get mean for each sub-list
cat("Mean for each sub-list is: ")
lapply(data, mean)

# ------ (2) embed an apply function within an lapply function

# Create a list with 2 data frames:
# each element is a data frame
beaver_data <- list(beaver1 = beaver1, beaver2 = beaver2)
cat("Structure of each sub-list is a data frame: \n")
cat("Sub-list 1: \n")
str(beaver_data$beaver1)
cat("Sub-list 2: \n")
str(beaver_data$beaver2)

# get column mean for each sub-list (data frame)
cat("Mean for each sub-list is: ")
lapply(beaver_data, function(x) round(apply(x, 2, mean), 2))

Mean for each sub-list is: 

Structure of each sub-list is a data frame: 
Sub-list 1: 
'data.frame':	114 obs. of  4 variables:
 $ day  : num  346 346 346 346 346 346 346 346 346 346 ...
 $ time : num  840 850 900 910 920 930 940 950 1000 1010 ...
 $ temp : num  36.3 36.3 36.4 36.4 36.5 ...
 $ activ: num  0 0 0 0 0 0 0 0 0 0 ...
Sub-list 2: 
'data.frame':	100 obs. of  4 variables:
 $ day  : num  307 307 307 307 307 307 307 307 307 307 ...
 $ time : num  930 940 950 1000 1010 1020 1030 1040 1050 1100 ...
 $ temp : num  36.6 36.7 36.9 37.1 37.2 ...
 $ activ: num  0 0 0 0 0 0 0 0 0 0 ...
Mean for each sub-list is: 

### sapply()
(simplified apply)

1. sapply() input: a list or vector. <br>
sapply() output: a vector or matrix when the result can be simplified; otherwise a list.<br><br>

2. Basic syntax: `sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)`<br>
X: a list or vector<br>
FUN: the function to be applied<br>
...: for any other arguments to be passed to the function.<br><br>
simplify = TRUE: a logical value: <br>
should the result be simplified to a vector or matrix if possible?<br><br>
USE.NAMES = TRUE: logical: <br>
if TRUE and if X is character, use X as names for the result unless it had names already.<br><br>

3. sapply() calls lapply() and then tries to simplify the result to a vector or matrix; if that is not possible, it returns a list.
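
A short sketch of the three possible outcomes of simplification. The list `items` and the other names are illustrative only.

```r
items <- list(a = 1:3, b = 4:6)

# One value per element: simplified to a named vector
sums <- sapply(items, sum)

# Equal-length vectors per element: simplified to a matrix (one column per element)
ranges <- sapply(items, range)

# Results of different lengths cannot be simplified, so a list is returned
ragged <- sapply(list(a = 1:3, b = 1:5), function(v) v[v > 1])

class(sums)
dim(ranges)
class(ragged)
```

Because the return type depends on the data, sapply() is convenient interactively but less predictable inside functions.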

In [26]:
# Example: sapply() 

# ------ (1) get mean for each sub-list (data frame) 
# output: matrix
cat("Mean for each column for each sub-list (data frame)\n")
cat("Default sapply() simplifies output to a matrix")
sapply(beaver_data, function(x) round(apply(x, 2, mean), 2))
cat("If specify not to simplify output, sapply() returns a list just like lapply()")
sapply(beaver_data, function(x) round(apply(x, 2, mean), 2), simplify = FALSE)

Mean for each column for each sub-list (data frame)
Default sapply() simplifies output to a matrix

Unnamed: 0,beaver1,beaver2
day,346.2,307.13
time,1312.02,1446.2
temp,36.86,37.6
activ,0.05,0.62


If specify not to simplify output, sapply() returns a list just like lapply()

### vapply()
(vector apply)

1. vapply() input: a list or vector (including a data frame, which is a list of columns). <br>
vapply() output: a vector, matrix, or array whose type is fixed by `FUN.VALUE` (numeric(1), integer(1), character(1), etc.)<br><br>

2. Basic syntax: `vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)`<br>
X: a list or vector<br>
FUN: the function to be applied<br>
FUN.VALUE: a template for the return value of the specified function; <br>
for example, numeric(1) means each call must return a single numeric value. <br>
...: any other arguments to be passed to the function.<br><br>
USE.NAMES = TRUE: logical: <br>
if TRUE and if X is character, use X as names for the result unless it had names already.<br><br>
Unlike sapply(), vapply() has no `simplify` argument: the shape of the result is determined by `FUN.VALUE`.<br><br>

3. When to use vapply()?  <br><br>
(1) If a list mixes numeric and character elements, sapply() silently returns a character vector (the numeric results are coerced to character) and gives no error, while vapply() with `FUN.VALUE = numeric(1)` stops with an error.<br><br> 
(2) vapply() requires the additional argument `FUN.VALUE` to specify the output type. <br>
If you want to state the type of result you expect and have R check it, use vapply().
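
A minimal sketch of why the fixed output type matters, using an empty input and a mixed-type input. All object names are illustrative.

```r
empty <- list()

# sapply() has nothing to simplify, so it returns an empty list
s_out <- sapply(empty, length)

# vapply() always returns the type given by FUN.VALUE
v_out <- vapply(empty, length, integer(1))

class(s_out)
class(v_out)

# A result of the wrong type is reported as an error;
# tryCatch() is used here only so the script keeps running
res <- tryCatch(vapply(list(1:3, letters), max, numeric(1)),
                error = function(e) "error")
res
```

This predictability is the main reason vapply() is often preferred over sapply() inside functions and packages.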

In [34]:
# Example: vapply()

# ------ (1) vapply() gives error while sapply() doesn't 
# if some variable types are non-numeric

# Create a list 
test <- list(a = c(1, 3, 5), b = c(2,4,6), c = c(9,8,7), d = c("a", "b", "c"))
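# sapply(): no error, but the result is a character vector
cat("sapply() doesn't raise any errors: ")
sapply(test, max) 
# vapply(): wrapped in try() so the error is printed without stopping the script
cat("vapply() raises an error since variable 'd' is character: ")
try(vapply(test, max, numeric(1)))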

sapply() doesn't raise any errors: 

In [52]:
# ------ (2) return vector (numeric / double / integer, etc.)

# Create a list 
test <- list(a = c(1, 3.3, 5.2), b = c(2,4.3,6.5), c = c(9.5,8,7))

# specify type: numeric / double
vapply(test, max, double(1))  
vapply(test, max, numeric(1))          

# Create a list 
test <- list(a = c(1L, 3L, 5L), b = c(2L, 4L, 6L), c = c(9L, 8L, 7L))

# specify type:  integer
# also works for: numeric / double 
vapply(test, max, integer(1))  

### tapply()
(table apply)

1. tapply() input: a vector (for example, one column of a data frame) plus one or more grouping factors. <br>
tapply() output: an array or a list<br><br>

2. Basic syntax: `tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)`<br>

X: an R object for which a split method exists. Typically vector-like, allowing subsetting with \[.<br><br>

INDEX: a list of $\geq$ 1 factors, each of same length as X. The elements are coerced to factors by as.factor.<br><br>

FUN: a function (or name of a function) to be applied, or NULL. In the case of functions like `+`, `%*%`, etc., the function name must be backquoted or quoted. If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces. <br><br>

...: for any other arguments to be passed to the function.<br><br>

default = NA: the value used to initialize the result array, so it is what appears in cells for empty groups (only used when simplify = TRUE). <br><br>

simplify = TRUE: a logical value: <br>
should the result be simplified to a vector or matrix if possible?<br><br>

3. When to use tapply()?  <br>
Use tapply() when you want group summaries based on factor levels: <br>
(i) you have a dataset that can be broken up into groups (via categories / factors)<br>
(ii) you want to break the dataset up into those groups<br>
(iii) within each group, you want to apply a function
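
The split-then-apply idea can be sketched by doing the two steps by hand with split() and sapply() and comparing the result with tapply(). The vectors `weight` and `group` are made-up illustration data.

```r
# Illustrative data (hypothetical values)
weight <- c(10, 12, 20, 22, 30)
group  <- factor(c("a", "a", "b", "b", "c"))

# (ii) break the data into groups, (iii) apply a function within each group
pieces  <- split(weight, group)
by_hand <- sapply(pieces, mean)

# tapply() performs both steps in one call
by_tapply <- tapply(weight, group, mean)

by_hand
by_tapply
```

The values agree; the difference is that tapply() returns a one-dimensional array rather than a plain named vector.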

In [73]:
# Example: tapply(), 1 factor 

# create data frame 
set.seed(2)

df <- data.frame(price = round(rnorm(25, sd = 10, mean = 30)),
                       type = sample(1:4, size = 25, replace = TRUE),
                       store = sample(paste("Store", 1:4),
                                      size = 25, replace = TRUE))
cat("Head of data frame: ")
head(df, 3)

# store 3 variables in vectors 
price <- df$price
store <- df$store
# relabel "type" with appropriate labels
type <- factor(df$type,
               labels = c("toy", "food", "electronics", "drinks"))

# ------ (1) Mean price by product type
cat("Mean prices by product type (in array): ")
(mean_prices_array <- tapply(price, type, mean))

# class of the mean
class(mean_prices_array)
# you can access elements with []
mean_prices_array[2]

# ------ (2) Mean price by product type: 
# simplify = FALSE: returns a list instead of array
cat("Mean prices by product type (in list): ")
(mean_prices_list <- tapply(price, type, mean, simplify = FALSE))

# ------ (3) if there are missing values, you can pass na.rm = TRUE to FUN

# Adding NA values to data frame
df[1, 1] <- NA
df[2, 3] <- NA

# get mean price for each store
cat("Mean prices for each store: ")
tapply(df$price, df$store, mean, na.rm = TRUE)

Head of data frame: 

price,type,store
21,2,Store 2
32,3,Store 3
46,4,Store 4


Mean prices by product type (in array): 

Mean prices by product type (in list): 

In [75]:
# Example: tapply(), multiple factors

# ------ (1) Mean price by product type and store
cat("Default mean price by product type and store: ")
tapply(price, list(type, store), mean)

# ------ (2) Mean price by product type and store, 
# filling empty type/store combinations with 0 instead of NA
cat("Mean price by product type and store, missing = 0: ")
tapply(price, list(type, store), mean, default = 0)

Default mean price by product type and store: 

Unnamed: 0,Store 1,Store 2,Store 3,Store 4
toy,46,31.0,49,36.66667
food,26,30.33333,39,
electronics,50,29.0,32,25.0
drinks,22,40.0,20,36.0


Mean price by product type and store, missing = 0: 

Unnamed: 0,Store 1,Store 2,Store 3,Store 4
toy,46,31.0,49,36.66667
food,26,30.33333,39,0.0
electronics,50,29.0,32,25.0
drinks,22,40.0,20,36.0


In [78]:
# Example: tapply: 
# mean for every variable grouped by a factor variable 

# ------ (1) mean for every variable grouped by cyl variable 
cat("Mean for every variable grouped by cyl variable: ")
apply(mtcars, 2, function(x) tapply(x, mtcars$cyl, mean))

Mean for every variable grouped by cyl variable: 

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
4,26.66364,4,105.1364,82.63636,4.070909,2.285727,19.13727,0.9090909,0.7272727,4.090909,1.545455
6,19.74286,6,183.3143,122.28571,3.585714,3.117143,17.97714,0.5714286,0.4285714,3.857143,3.428571
8,15.1,8,353.1,209.21429,3.229286,3.999214,16.77214,0.0,0.1428571,3.285714,3.5


In [14]:
# Why is tapply() called "table apply"?
# Because it generalizes table(): table() counts the observations in each group,
# while tapply() applies any function to each group.

# You can see this by comparing the following calls:
x <- sample(letters, 100, replace = TRUE)
table(x)
tapply(x, x, length)

x
a b c d e f g h i j k l m n o p q r s t u v w x y z 
4 5 1 1 6 6 2 6 2 3 2 3 8 5 4 5 4 3 4 5 3 1 3 3 4 7 

### mapply()
(multivariate apply)

1. mapply() input: one or more vectors or lists. <br>
mapply() output: a vector, a matrix, or a list.<br><br>

2. Basic syntax: `mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)`<br><br>
FUN: the function to apply.<br>
...: the arguments to vectorize over (vectors or lists); FUN is called with their first elements, then their second elements, and so on.<br>
MoreArgs: a list of other arguments to FUN that are passed unchanged to every call.<br><br>
SIMPLIFY = TRUE: a logical value: <br>
should the result be simplified to a vector or matrix if possible?<br><br>
USE.NAMES = TRUE: logical:<br>
if TRUE and the first ... argument has names, use its names for the result; if it is a character vector without names, use the character vector itself as the names.<br><br>

3. When to use mapply()?  <br>
mapply() is a multivariate apply of sorts which applies a function in parallel over a set of arguments. Here "in parallel" means element by element across the arguments, not on multiple cores. <br><br>
(1) lapply() iterates over a single R object. If you want to iterate over multiple R objects in parallel, mapply() is the function for you. <br>
(2) mapply() gives us a way to call a non-vectorized function in a vectorized way. It is a multivariate version of sapply(). mapply() applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.
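
A small sketch of element-wise pairing, recycling, and `MoreArgs`. The anonymous functions and object names are illustrative.

```r
# Element-wise pairing: FUN is called as FUN(1, 4), FUN(2, 5), FUN(3, 6)
sums <- mapply(function(x, y) x + y, 1:3, 4:6)

# Recycling: the shorter argument (10) is reused for every call
scaled <- mapply(function(x, k) x * k, 1:3, 10)

# MoreArgs holds arguments that are passed unchanged to every call
padded <- mapply(paste, c("a", "b"), c("x", "y"),
                 MoreArgs = list(sep = "-"))

sums
scaled
padded
```

In the last call the first argument is a character vector, so its values are used as the names of the result.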

In [5]:
# Example: mapply()

# repeat each of 1:4 the corresponding number of times in 4:1

# method 1: use list 
cat("Method 1: use list: ")
list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))


# method 2: use mapply()
cat("Method 2: use mapply(): ")
mapply(rep, 1:4, 4:1)

Method 1: use list: 

Method 2: use mapply(): 

In [6]:
# Example 2: 

set.seed(1)

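# create function: draw n normal values with the given mean and standard deviation
noise <- function(n, mean, std) {
  rnorm(n, mean, std)
}

# calls noise(1, 1, 2), noise(2, 2, 2), ..., noise(5, 5, 2)
mapply(noise, 1:5, 1:5, 2)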

In [8]:
# Example: mapply

# create lists 
values1 <- list(a = c(1, 21, 3), b = c(4, 5, 6), c = c(7, 8, 9))
values2 <- list(a = c(10, 11, 12), b = c(13, 14, 15), c = c(16, 17, 18)) 

# mapply: find the max value across each pair of matching elements
mapply(function(num1, num2) max(c(num1, num2)), values1, values2)

### rapply()
(recursive apply)

1. rapply() input: a list (possibly nested) or expression. <br>
rapply() output: a vector, or a list.<br><br>

2. Basic syntax: `rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...)`<br><br>
object: a list or expression, i.e., “list-like”.<br>
f: a function of one “principal” argument, passing further arguments via ... <br><br>
classes = "ANY": character vector of class names, or "ANY" to match any class.<br>
deflt = NULL: the default result (not used if how = "replace").<br>
how = c("unlist", "replace", "list"): If how = "unlist", a vector, otherwise “list-like” of similar structure as object. If how = "replace", each element of object which is not itself list-like and has a class included in classes is replaced by the result of applying f to the element.

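3. When to use rapply()?  <br>
Use rapply() when you want to apply a function to every non-list element of a nested list, optionally only to the elements whose class is listed in `classes`.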
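
As a minimal sketch of how `classes` and `how` interact, the following applies toupper() only to the character elements of a nested list. The list `nested` is made-up illustration data.

```r
# Illustrative nested list (hypothetical data)
nested <- list(id = "abc",
               stats = list(mean = 2.5, labels = c("lo", "hi")))

# how = "replace": keep the structure, transform only the character elements
upper <- rapply(nested, toupper, classes = "character", how = "replace")

# how = "unlist": drop the non-matching elements and flatten the result
flat <- rapply(nested, toupper, classes = "character", how = "unlist")

str(upper)
flat
```

With `how = "replace"` the numeric element `mean` is left untouched; with `how = "unlist"` it is dropped because `deflt` is NULL.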


In [26]:
# Example 1: rapply()

# create list 
X <- list(list(a = pi, b = list(c = 1L)), d = "a test")

# ------ (1) rapply
cat("rapply(), how = 'replace': ")
rapply(X, sqrt, classes = "numeric", how = "replace")

# ------ (2) rapply
cat("rapply(), how = 'unlist': ")
rapply(X, sqrt, classes = "numeric", how = "unlist")

# ------ (3) rapply
cat("rapply(), how = 'list': ")
rapply(X, sqrt, classes = "numeric", how = "list")

rapply(), how = 'replace': 

rapply(), how = 'unlist': 

rapply(), how = 'list': 

In [24]:
# Example 2: rapply()

# create list 
x <- list(1, 2, 3, 4, "a")

# rapply()
# the character element is skipped because its class is not in `classes`
rapply(x, function(v) v ^ 2, classes = "numeric")

In [21]:
# compare different apply() variants

# compare apply(), lapply(), sapply(), vapply()

# ------ (1) apply()
# output: vector 
# note: needs the additional argument "1" to specify rows or "2" to specify columns
cat("apply(), sapply(), vapply() return vectors: ")
apply(mtcars, 2, mean)

# ------ (2) sapply()
# returns vector 
sapply(mtcars, mean)

# ------ (3) vapply()
# returns vector 
# note: need additional argument "numeric(1)"
vapply(mtcars, mean, numeric(1))

# ------ (4) lapply()
# returns list 
cat("lapply() returns a list: ")
lapply(mtcars, mean)

apply(), sapply(), vapply() return vectors: 

lapply() returns a list: 

### Other "Loop-like" functions

1. Simplified apply() functions: <br><br>

(1) Base R provides: <br>
colSums(x, na.rm = FALSE) <br>
rowSums(x, na.rm = FALSE) <br>
colMeans(x, na.rm = FALSE) <br>
rowMeans(x, na.rm = FALSE)<br><br>

(2) `miscTools` package gives (for data frames): <br>
colMedians()<br>
rowMedians()<br><br>

(3) `matrixStats` package gives (for matrices): <br>
colMedians() and rowMedians()<br>
colSds() and rowSds()<br>
colVars() and rowVars()<br>
colRanges() and rowRanges()<br>
colQuantiles() and rowQuantiles()<br>
along with several additional summary statistic functions
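
A quick base-R sketch showing that a dedicated function such as colMeans() gives the same result as the corresponding apply() call. The matrix `m` is made-up illustration data.

```r
# Illustrative matrix with one missing value (hypothetical data)
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2,
            dimnames = list(NULL, c("a", "b", "c")))

# The dedicated function and the apply() call give the same result
fast <- colMeans(m, na.rm = TRUE)
slow <- apply(m, 2, mean, na.rm = TRUE)

all.equal(fast, slow)
```

The dedicated functions are usually the better choice when one exists, since they are written for this specific task.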

In [27]:
# Example: `miscTools`

# load package 
library(miscTools)

# example
colMedians(mtcars)
rowMedians(mtcars)

In [31]:
# Example: `matrixStats`

# load package
library(matrixStats)

# create matrix
A <- matrix(1:15, nrow=3, ncol=5)

# examples 
colMedians(A)
colSds(A)
rowSds(A)
colVars(A)
rowVars(A)
colRanges(A)
rowRanges(A)
colQuantiles(A)
rowQuantiles(A)

# calculate column sums 
# Method 1: colSums() is much faster
colSums(mtcars)

# Method 2
apply(mtcars, 2, sum)

0,1
1,3
4,6
7,9
10,12
13,15


0,1
1,13
2,14
3,15


0%,25%,50%,75%,100%
1,1.5,2,2.5,3
4,4.5,5,5.5,6
7,7.5,8,8.5,9
10,10.5,11,11.5,12
13,13.5,14,14.5,15


0%,25%,50%,75%,100%
1,4,7,10,13
2,5,8,11,14
3,6,9,12,15
