# Loop Functions
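R provides a family of loop functions that apply a function to each piece of an object without an explicit `for` loop. Several of them (for example, lapply) do the actual looping internally in C code, which can make them faster than an equivalent loop written in R; apply, by contrast, is ordinary R code whose main advantage is brevity.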

|Loop Function|Function|
|:-|:-|
|lapply|Loop over a list and evaluate a function on each element.|
|sapply|Same as lapply but try to simplify the result.|
|vapply|Whereas sapply tries to 'guess' the correct format of the result, vapply allows you to specify it explicitly.|
|apply|Apply a function over the margins of an array.|
|tapply|Apply a function over subsets of a vector.|
|mapply|Multivariate version of lapply.|
|split|Split a vector or other object into groups determined by a factor or list of factors.|
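To make the comparison concrete, here is a small sketch (the list `lst` is made up for illustration) that computes the mean of each element once with an explicit `for` loop and once with lapply. Both produce the same named list; lapply just does it in one line.

```r
# Made-up example data: a named list of numeric vectors
lst <- list(a = 1:5, b = c(2, 4, 6))

# Explicit loop: preallocate a named list, then fill it element by element
out_loop <- vector("list", length(lst))
names(out_loop) <- names(lst)
for (nm in names(lst)) {
  out_loop[[nm]] <- mean(lst[[nm]])
}

# The same computation with lapply: names are carried over automatically
out_lapply <- lapply(lst, mean)

identical(out_loop, out_lapply)  # TRUE
```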

## lapply
Loop over a list and evaluate a function on each element.

```r
str(lapply)
```

```
function (X, FUN, ...)
```


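It takes three arguments:
- **`X`** is a list.
- **`FUN`** is the function to apply to each element of **`X`**.
- **`...`** contains other arguments to be passed to **`FUN`**.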
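To see what the three arguments do, here is a simplified, pure-R sketch of lapply's behaviour. The name `my_lapply` is hypothetical; the real lapply does its looping in C and handles more edge cases (for instance, this version does not cope with a `FUN` that returns `NULL`, because assigning `NULL` with `[[<-` deletes the list element).

```r
# Hypothetical helper: a simplified pure-R version of lapply()
my_lapply <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)             # accept a function or a function name
  X <- as.list(X)                   # non-lists are coerced with as.list()
  out <- vector("list", length(X))
  for (i in seq_along(X)) {
    out[[i]] <- FUN(X[[i]], ...)    # extra arguments are forwarded via ...
  }
  names(out) <- names(X)            # keep the element names, if any
  out
}

my_lapply(list(a = 1:3, b = 4:6), mean)        # list(a = 2, b = 5)
my_lapply(1:3, function(v, k) v * k, k = 10)   # list(10, 20, 30)
```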

```r
x <- list(a = 1:5, b = rnorm(10))
lapply(x, mean)
```

```r
# if X is not a list, it is coerced to a list using as.list();
# the extra arguments (min, max) are passed on to runif()
x <- 1:4
lapply(x, runif, min = 0, max = 10)
```

```r
# use an anonymous function to extract the first column of each matrix
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
lapply(x, function(elt) elt[, 1])
```

## sapply
Same as lapply, but tries to simplify the result if possible:

- If the result is a list where every element has length 1, a vector is returned.
- If the result is a list where every element is a vector of the same length (> 1), a matrix is returned.
- If it can’t figure things out, a list is returned.
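The three simplification rules can be seen side by side with a small made-up list (`lst` here is for illustration only):

```r
lst <- list(a = 1:4, b = 5:8)

s1 <- sapply(lst, mean)                   # each result has length 1 -> named vector
s2 <- sapply(lst, range)                  # each result has length 2 -> 2 x 2 matrix
s3 <- sapply(lst, function(v) v[v > 2])   # lengths 2 and 4 differ -> list

s1          # a = 2.5, b = 6.5
s2          # column a = c(1, 4), column b = c(5, 8)
class(s3)   # "list"
```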

```r
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
```

```r
print(sapply(x, mean))
```

```
        a         b         c         d
2.5000000 0.1176109 1.1759061 5.1322004
```


## vapply
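vapply works like sapply, but you state the type and length of each result explicitly through its third argument, `FUN.VALUE`. Here `numeric(1)` requires every call to mean to return a single number.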

```r
print(vapply(x, mean, numeric(1)))
```

```
        a         b         c         d
2.5000000 0.1176109 1.1759061 5.1322004
```
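A sketch of where the explicit template pays off, using a made-up list `lst`: with `numeric(2)` each call must return two numbers, a mismatch is an error rather than a silently different result shape, and an empty input still gives a result of the declared type where sapply falls back to an empty list.

```r
lst <- list(a = c(1, 5, 3), b = c(10, 20))

# FUN.VALUE = numeric(2): every call to range() must return 2 numbers
r <- vapply(lst, range, numeric(2))
r     # 2 x 2 matrix with columns a and b

# A result that does not match the template is an error, not a surprise shape
res <- tryCatch(vapply(lst, range, numeric(1)), error = function(e) "error")
res   # "error"

# On empty input, vapply still returns the declared type; sapply returns list()
vapply(list(), length, integer(1))   # integer(0)
sapply(list(), length)               # list()
```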


## apply
Apply a function over the margins of an array.

- It is most often used to apply a function to the rows or columns of a matrix.
- It can be used with general arrays, e.g. taking the average of an array of matrices.
- It is generally no faster than writing a loop, but it fits in one line.
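To illustrate the last point, here is a sketch computing column means three ways on a random matrix (the seed and the name `m` are arbitrary choices): an explicit loop, apply, and the specialised colMeans. All three agree; apply mostly buys brevity.

```r
set.seed(1)                         # arbitrary seed, for reproducibility
m <- matrix(rnorm(200), 20, 10)

# Explicit loop over the columns
means_loop <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  means_loop[j] <- mean(m[, j])
}

# The same thing in one line with apply (MARGIN = 2 means "columns")
means_apply <- apply(m, 2, mean)

all.equal(means_loop, means_apply)    # TRUE
all.equal(means_apply, colMeans(m))   # TRUE
```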

```r
str(apply)
```

```
function (X, MARGIN, FUN, ...)
```


- **`X`** is an array
- **`MARGIN`** is an integer vector indicating which margins should be “retained”.
- **`FUN`** is a function to be applied.
- **`...`** is for other arguments to be passed to **`FUN`**.
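A small integer matrix makes the meaning of `MARGIN` easy to check by hand (`m2` and `arr` are illustrative names): `MARGIN = 1` keeps the rows, `MARGIN = 2` keeps the columns, and for a 3-d array `c(1, 2)` keeps the first two dimensions and collapses the third.

```r
m2 <- matrix(1:6, nrow = 2)   # columns are (1, 2), (3, 4), (5, 6)

apply(m2, 1, sum)   # one value per row:    9 12
apply(m2, 2, sum)   # one value per column: 3 7 11

arr <- array(1:24, dim = c(2, 3, 4))
dim(apply(arr, c(1, 2), sum))   # 2 3: the third dimension is summed over
```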

```r
x <- matrix(rnorm(200), 20, 10)
apply(x, 2, mean)
apply(x, 1, sum)
```

```r
# However, for sums and means of matrix dimensions, we have some shortcuts:
# rowSums(x)  is equivalent to apply(x, 1, sum)
# rowMeans(x) is equivalent to apply(x, 1, mean)
# colSums(x)  is equivalent to apply(x, 2, sum)
# colMeans(x) is equivalent to apply(x, 2, mean)
rowSums(x)
```

```r
x <- matrix(rnorm(200), 20, 10)
apply(x, 1, quantile, probs = c(0.25, 0.75))
```

| | [,1] | [,2] | [,3] | [,4] | [,5] | [,6] | [,7] | [,8] | [,9] | [,10] | [,11] | [,12] | [,13] | [,14] | [,15] | [,16] | [,17] | [,18] | [,19] | [,20] |
|:-|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|-:|
| 25% | -1.26588308 | -1.5716399 | -0.9697316 | -0.7560281 | -0.4894893 | -0.1415978 | -1.025518 | 0.2852734 | 0.2619959 | -0.3601064 | -1.1072965 | -0.0673048 | -0.3866871 | -1.0863424 | -0.6065425 | -0.9444925 | -0.1211762 | -1.3992334 | -0.08954942 | -0.7340517 |
| 75% | -0.04466169 | 0.2496246 | 0.2707867 | 0.1404011 | 0.6845154 | 0.5346277 | 0.9245223 | 1.0518022 | 1.1929142 | 0.7334745 | 0.4720414 | 0.9800842 | 1.1324641 | 0.2695238 | 0.9869443 | 0.2148389 | 0.7564675 | 0.7046646 | 0.57625922 | 0.8407462 |


```r
a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))
apply(a, c(1, 2), mean)
rowMeans(a, dims = 2)
```

Both calls return the same 2 x 2 matrix:

| | [,1] | [,2] |
|:-|-:|-:|
| [1,] | 0.005414237 | 0.1636575 |
| [2,] | 0.260209521 | -0.4407803 |


## tapply
Apply a function over subsets of a vector.

```r
str(tapply)
```

```
function (X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
```


- **`X`** is a vector.
- **`INDEX`** is a factor or a list of factors (or else they are coerced to factors).
- **`FUN`** is a function to be applied.
- **`...`** contains other arguments to be passed to **`FUN`**.
- **`simplify`** indicates whether the result should be simplified.
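One way to think about tapply is as split followed by sapply. The sketch below (with made-up data `v`, `grp`, and `g2`) checks that on a small vector, and also passes a list of two factors as `INDEX`, which gives a two-way table of results.

```r
v   <- c(1, 2, 3, 10, 20, 30)
grp <- gl(2, 3)                                   # factor: 1 1 1 2 2 2
g2  <- factor(c("a", "b", "a", "b", "a", "b"))

t1 <- tapply(v, grp, mean)            # group means: 2 and 20
t2 <- sapply(split(v, grp), mean)     # same numbers via split + sapply

# A list of factors in INDEX gives one cell per combination of levels
t3 <- tapply(v, list(grp, g2), sum)
t3   # rows "1", "2" and columns "a", "b"; e.g. t3["1", "a"] is 1 + 3 = 4
```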

```r
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)   # factor with 3 levels, 10 observations each
print(f)
print(tapply(x, f, mean))
print(tapply(x, f, mean, simplify = FALSE))
```

```
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Levels: 1 2 3
         1          2          3
-0.5504630  0.4189813  1.4164064
$`1`
[1] -0.550463

$`2`
[1] 0.4189813

$`3`
[1] 1.416406
```

```r
print(tapply(x, f, range))
```

```
$`1`
[1] -3.2615202  0.7038547

$`2`
[1] 0.03530037 0.84991271

$`3`
[1] -1.478466  3.080520
```



## mapply
Multivariate version of lapply.

```r
str(mapply)
```

```
function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
```


- **`FUN`** is a function to apply.
- **`...`** contains the arguments to apply over.
- **`MoreArgs`** is a list of other arguments to **`FUN`**.
- **`SIMPLIFY`** indicates whether the result should be simplified.
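A minimal sketch of the remaining arguments: the vectors in `...` are walked in parallel, `MoreArgs` holds arguments that stay fixed for every call, and `SIMPLIFY = FALSE` keeps the result as a list. The anonymous functions and the names `r1` to `r3` are made up for illustration.

```r
# Walk two vectors in parallel; z stays fixed at 100 for every call
r1 <- mapply(function(x, y, z) x + y + z, 1:3, 4:6, MoreArgs = list(z = 100))
r1   # 105 107 109

# Results that all have length 1 are simplified to a vector by default...
r2 <- mapply(function(x, y) x * y, 1:3, 4:6)
r2   # 4 10 18

# ...while SIMPLIFY = FALSE always returns a list
r3 <- mapply(function(x, y) x * y, 1:3, 4:6, SIMPLIFY = FALSE)
r3   # list(4, 10, 18)
```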

```r
# The following is tedious to type:
# list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))
# Instead we can do
print(mapply(rep, 1:4, 4:1))
```

```
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4
```



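```r
# Vectorizing a function: noise(1:5, 1:5, 2) would not do what we want,
# because rnorm() treats a vector n as a request for length(n) values
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

noise(5, 1, 2)

# mapply() calls noise() once per (n, mean) pair, recycling sd = 2
mapply(noise, 1:5, 1:5, 2)

# which is the same as
# list(noise(1, 1, 2), noise(2, 2, 2),
#      noise(3, 3, 2), noise(4, 4, 2),
#      noise(5, 5, 2))
```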
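Base R also has `Vectorize()`, which wraps mapply and returns a vectorized version of a function. A sketch, repeating the `noise()` definition so the block runs on its own (`vnoise`, `v1`, and `v2` are hypothetical names); with the same seed it draws the same numbers as the `mapply(noise, 1:5, 1:5, 2)` call above.

```r
# Same noise() as above, repeated so this block is self-contained
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

# Vectorize() builds a wrapper that calls mapply() for us
vnoise <- Vectorize(noise)

set.seed(42)
v1 <- vnoise(1:5, 1:5, 2)
set.seed(42)
v2 <- mapply(noise, 1:5, 1:5, 2)

lengths(v1)        # 1 2 3 4 5
identical(v1, v2)  # TRUE: same calls in the same order, same random draws
```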

## split
split takes a vector or other object and splits it into groups determined by a factor or list of factors.
Because it returns a list, it works well with lapply.
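A tiny made-up example (`vals`, `grp2`, and `parts` are illustrative names) shows the shape of the result: one list element per group, named after the group levels in sorted order. `unsplit()`, also in base R, reverses the operation.

```r
vals <- c(5, 1, 4, 2, 3)
grp2 <- c("b", "a", "b", "a", "b")   # coerced to a factor with levels "a", "b"

parts <- split(vals, grp2)
parts                  # $a: 1 2   $b: 5 4 3

sapply(parts, sum)     # a = 3, b = 12

unsplit(parts, grp2)   # 5 1 4 2 3: the original order is restored
```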

```r
str(split)
```

```
function (x, f, drop = FALSE, ...)
```


- **`x`** is a vector (or list) or data frame.
- **`f`** is a factor (or coerced to one) or a list of factors.
- **`drop`** indicates whether empty factor levels should be dropped.

```r
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)
split(x, f)
```

```r
# split() returns a list, so its result can be passed straight to lapply()
lapply(split(x, f), mean)
```

```r
# splitting a data frame
head(airquality)
s <- split(airquality, airquality$Month)
print(lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")])))
```

| | Ozone | Solar.R | Wind | Temp | Month | Day |
|:-|-:|-:|-:|-:|-:|-:|
| 1 | 41 | 190 | 7.4 | 67 | 5 | 1 |
| 2 | 36 | 118 | 8.0 | 72 | 5 | 2 |
| 3 | 12 | 149 | 12.6 | 74 | 5 | 3 |
| 4 | 18 | 313 | 11.5 | 62 | 5 | 4 |
| 5 | NA | NA | 14.3 | 56 | 5 | 5 |
| 6 | 28 | NA | 14.9 | 66 | 5 | 6 |

```
$`5`
   Ozone  Solar.R     Wind
      NA       NA 11.62258

$`6`
    Ozone   Solar.R      Wind
       NA 190.16667  10.26667

$`7`
     Ozone    Solar.R       Wind
        NA 216.483871   8.941935

$`8`
   Ozone  Solar.R     Wind
      NA       NA 8.793548

$`9`
   Ozone  Solar.R     Wind
      NA 167.4333  10.1800
```


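```r
# colMeans() returns NA for any column that contains missing values...
sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
# ...unless na.rm = TRUE drops them before averaging
sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")],
                               na.rm = TRUE))
```

First call (missing values propagate to `NA`):

| | 5 | 6 | 7 | 8 | 9 |
|:-|-:|-:|-:|-:|-:|
| Ozone | NA | NA | NA | NA | NA |
| Solar.R | NA | 190.16667 | 216.483871 | NA | 167.4333 |
| Wind | 11.62258 | 10.26667 | 8.941935 | 8.793548 | 10.18 |

Second call (`na.rm = TRUE`):

| | 5 | 6 | 7 | 8 | 9 |
|:-|-:|-:|-:|-:|-:|
| Ozone | 23.61538 | 29.44444 | 59.115385 | 59.961538 | 31.44828 |
| Solar.R | 181.2963 | 190.16667 | 216.483871 | 171.857143 | 167.43333 |
| Wind | 11.62258 | 10.26667 | 8.941935 | 8.793548 | 10.18 |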
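Since every month yields exactly three means, the same summary can also be written with vapply, which checks that shape explicitly. A sketch (`s_month`, `cols`, and `monthly` are illustrative names); because the template `numeric(3)` is unnamed, vapply takes the row names from the first result.

```r
s_month <- split(airquality, airquality$Month)
cols <- c("Ozone", "Solar.R", "Wind")

# Each call must return exactly 3 numbers; row names come from the first result
monthly <- vapply(s_month,
                  function(d) colMeans(d[, cols], na.rm = TRUE),
                  numeric(3))

# Same values as the sapply(..., na.rm = TRUE) result above
round(monthly, 2)
```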


```r
# splitting on more than one factor
x <- rnorm(10)
f1 <- gl(2, 5)
f2 <- gl(5, 2)
interaction(f1, f2)
str(split(x, list(f1, f2)))
# empty levels can be dropped
str(split(x, list(f1, f2), drop = TRUE))
```

```
List of 10
 $ 1.1: num [1:2] -2.069 0.996
 $ 2.1: num(0)
 $ 1.2: num [1:2] 0.906 -1.785
 $ 2.2: num(0)
 $ 1.3: num -1.76
 $ 2.3: num -1.1
 $ 1.4: num(0)
 $ 2.4: num [1:2] -0.212 -0.312
 $ 1.5: num(0)
 $ 2.5: num [1:2] -0.329 -0.237
```

```
List of 6
 $ 1.1: num [1:2] -2.069 0.996
 $ 1.2: num [1:2] 0.906 -1.785
 $ 1.3: num -1.76
 $ 2.3: num -1.1
 $ 2.4: num [1:2] -0.212 -0.312
 $ 2.5: num [1:2] -0.329 -0.237
```
