[From here: https://www.datacamp.com/community/tutorials/r-tutorial-apply-family]

The apply family of functions lets you traverse data in several ways without writing explicit loop constructs. Each function acts on an input list, matrix or array and applies a named function to it, optionally with one or more extra arguments.
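As a minimal sketch of what avoiding an explicit loop looks like, the snippet below computes column means twice, once with a `for` loop and once with `apply()`. The matrix `scores` and the `trim` value are illustrative assumptions, not taken from the tutorial.

```r
# Hypothetical example data: 4 rows, 3 columns
scores <- matrix(1:12, nrow = 4, ncol = 3)

# Explicit loop: preallocate the result, then fill it column by column
loop_means <- numeric(ncol(scores))
for (j in seq_len(ncol(scores))) {
  loop_means[j] <- mean(scores[, j])
}

# Same computation with apply(): the second argument (2) means "over columns"
apply_means <- apply(scores, 2, mean)

# Arguments given after the function are passed on to it on every call
trimmed_means <- apply(scores, 2, mean, trim = 0.25)

# Both approaches should agree
all.equal(loop_means, apply_means)
```

The loop version needs a preallocated result and an index variable; the `apply()` version states only the data, the direction and the function.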

#### apply()

In [1]:
X <- matrix(rnorm(30), nrow=5, ncol=6)

In [2]:
X

0,1,2,3,4,5
-1.7648698,-0.9681232,0.8652812,-1.3385158,0.15191537,-0.7293546
-0.2429392,0.1016771,0.2233147,0.8072886,-0.04300274,-0.7057317
-1.1247899,0.193524,-0.370222,-0.9269893,-0.37576651,1.3872886
0.7997437,0.3726808,-0.5780711,-0.7692512,-0.367801,-1.3405953
0.5810471,-0.1703864,1.5247963,-1.0809034,-0.1581783,-0.9181016


In [3]:
apply(X, 2, sum)

#### lapply()

The difference from apply() is that:

- It can be used for other objects like dataframes, lists or vectors; and
- The output returned is a list (which explains the “l” in the function name), which has the same number of elements as the object passed to it.
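A short sketch of both points, assuming three small inputs (`df`, `lst` and `vec` are names made up for this example): whatever the input type, `lapply()` should hand back a list with one element per input element.

```r
# Illustrative inputs (all names are assumptions made for this sketch)
df  <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
lst <- list(x = 1:5, y = c(10, 20))
vec <- c(1, 4, 9)

# A data frame is a list of columns, so lapply() visits each column
col_means <- lapply(df, mean)

# For a list, lapply() visits each element
lens <- lapply(lst, length)

# For an atomic vector, lapply() visits each value
roots <- lapply(vec, sqrt)

# In every case the result is a list as long as the input
is.list(col_means)
length(col_means) == length(df)
length(roots) == length(vec)
```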

In [4]:
A <- matrix(data = 1:9, nrow = 3, ncol = 3)
B <- matrix(data = 4:15, nrow = 4, ncol = 3)
C <- matrix(data = 8:10, nrow = 3, ncol = 2)

MyList <- list(A, B, C)

In [5]:
MyList

0,1,2
1,4,7
2,5,8
3,6,9

0,1,2
4,8,12
5,9,13
6,10,14
7,11,15

0,1
8,8
9,9
10,10


The empty argument between the commas is where the row index would go. Leaving it blank selects all rows, so here we extract only the second column of each matrix.

In [6]:
lapply(MyList,"[", ,2)

In this case we select the first row and leave the column index blank, so every column of that row is returned.

In [7]:
lapply(MyList, "[", 1, )
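The `"["` calls above work because subsetting is itself a function in R. If the blank argument reads awkwardly, the same selection can be written with an anonymous function. This is a sketch that rebuilds `MyList` so it runs on its own; the parameter name `mat` is an arbitrary choice.

```r
# Same list as above, rebuilt so the snippet runs on its own
A <- matrix(data = 1:9, nrow = 3, ncol = 3)
B <- matrix(data = 4:15, nrow = 4, ncol = 3)
C <- matrix(data = 8:10, nrow = 3, ncol = 2)
MyList <- list(A, B, C)

# Second column of every matrix, written two ways
cols_bracket <- lapply(MyList, "[", , 2)
cols_anon    <- lapply(MyList, function(mat) mat[, 2])

# First row of every matrix, written two ways
rows_bracket <- lapply(MyList, "[", 1, )
rows_anon    <- lapply(MyList, function(mat) mat[1, ])

# The two spellings should give identical results
identical(cols_bracket, cols_anon)
identical(rows_bracket, rows_anon)
```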

#### sapply()

The sapply() function works like lapply(), but it tries to simplify the output to the simplest data structure possible: a vector or matrix where it can, and a list otherwise. Indeed, sapply() is a ‘wrapper’ function for lapply().

In [8]:
# Compare lapply() and sapply()

lapply(MyList,"[", 2, 1 )

In [9]:
typeof(lapply(MyList,"[", 2, 1 ))

In [10]:
sapply(MyList, "[", 2, 1)

In [11]:
typeof(sapply(MyList, "[", 2, 1))

In [12]:
# When simplify is FALSE, behaviour is the same as with lapply()

sapply(MyList, "[", 2, 1, simplify = FALSE)
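To make "simplify" more concrete, here is a small sketch of the three outcomes one would typically expect from `sapply()`, depending on what each call returns. The inputs are arbitrary illustrative values.

```r
# Each call returns one number: simplified to an atomic vector
one_each <- sapply(1:3, function(i) i * 10)

# Each call returns two numbers: simplified to a matrix
# with one column per input element
two_each <- sapply(1:3, function(i) c(i, i^2))

# Calls return results of different lengths: nothing to simplify,
# so the result stays a list, as lapply() would return it
ragged <- sapply(1:3, function(i) seq_len(i))

is.vector(one_each)
dim(two_each)
is.list(ragged)
```

Because the shape of the result depends on the data, code that needs a guaranteed type may be better served by `lapply()` or by `sapply(..., simplify = FALSE)`.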

Good article: https://www.r-bloggers.com/using-apply-sapply-lapply-in-r/. Implemented below:

In [13]:
m <- matrix(data=cbind(rnorm(30, 0), rnorm(30, 2), rnorm(30, 5)), nrow=30, ncol=3)

In [14]:
m

0,1,2
0.71279691,1.51749294,4.113684
-0.81720056,1.63225061,4.208022
0.46750026,0.49209897,4.485332
0.71779986,2.06275902,4.970641
1.17039595,3.03498602,5.206474
-1.32376309,0.01582787,3.653192
0.4562336,1.97765534,5.888694
0.06133481,2.31621796,3.933522
-0.9251331,3.79546636,5.64441
0.04869757,2.88742681,4.862328


In [15]:
apply(m, 2, mean)

In [16]:
apply(m, 2, function(x) length(x[x<0]))

An anonymous function is not always required. When an existing function such as is.vector takes the data as its only argument, we can pass it by name, because it is already a function. Let’s check that each column arrives as a vector, as we would expect.

In [17]:
apply(m, 2, is.vector) == apply(m, 2, function(x) is.vector(x))

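Why, then, did we need to wrap up our length expression? apply() expects a function, and length(x[x<0]) is a call that R tries to evaluate straight away, before any column has been passed in. When we define our own handling function for apply, we must at a minimum give a name to the incoming data, so we can use it in our function.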

In [19]:
# This will break

apply(m, 2, length(x[x<0]))

ERROR: Error in match.fun(FUN): object 'x' not found
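The call fails because `x` does not exist at the point where R evaluates the third argument. Below is a sketch of three working alternatives on a small fixed matrix, so the counts are easy to verify by eye. The names `m_small` and `count_negative` are hypothetical and used only for this illustration.

```r
# Small fixed matrix (illustrative values); matrix() fills column by column,
# so column 1 is (-1, 2, -3) and column 2 is (4, -5, 6)
m_small <- matrix(c(-1, 2, -3,
                     4, -5, 6), nrow = 3, ncol = 2)

# Option 1: a named helper; apply() calls it once per column
count_negative <- function(x) sum(x < 0)
by_helper <- apply(m_small, 2, count_negative)

# Option 2: the anonymous-function form used earlier
by_anon <- apply(m_small, 2, function(x) length(x[x < 0]))

# Option 3: no apply() at all; colSums() on a logical matrix counts TRUEs
by_colsums <- colSums(m_small < 0)
```

All three should report two negative values in the first column and one in the second.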


In [20]:
apply(m, 2, function(x) mean(x[x>0]))

Some of the other functions:

In [21]:
sapply(1:3, function(x) x^2)

In [22]:
lapply(1:3, function(x) x^2)

In [23]:
sapply(1:3, function(x) x^2, simplify = FALSE)

In [24]:
unlist(lapply(1:3, function(x) x^2))

In [25]:
# unlist() flattens the list into an atomic vector; typeof() reports its element type

typeof(unlist(lapply(1:3, function(x) x^2)))

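#### In sum:

apply : used with matrices and arrays; gives back a vector, a matrix/array, or a list, depending on what the applied function returns

sapply : takes vectors or lists; gives back a vector or matrix when the results can be simplified, otherwise a list

lapply : takes vectors or lists (including data frames); always gives back a list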
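The summary can be checked quickly in a session. This is a rough sketch with a throwaway matrix `mat`, an assumed name used only here.

```r
mat <- matrix(1:6, nrow = 2, ncol = 3)   # illustrative data

# apply(): one value per column gives a plain vector
col_totals <- apply(mat, 2, sum)

# apply(): two values per column (min and max) gives a matrix
col_ranges <- apply(mat, 2, range)

# sapply() simplifies to a vector; lapply() on the same input keeps a list
squares_vec  <- sapply(1:3, function(x) x^2)
squares_list <- lapply(1:3, function(x) x^2)

is.matrix(col_totals)
is.matrix(col_ranges)
is.list(squares_vec)
is.list(squares_list)
```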

In [26]:
add2 <- function(x) {
    x + 2
}

In [27]:
v <- 1:10
lapply(X = v, FUN = add2)

In [28]:
m

0,1,2
0.71279691,1.51749294,4.113684
-0.81720056,1.63225061,4.208022
0.46750026,0.49209897,4.485332
0.71779986,2.06275902,4.970641
1.17039595,3.03498602,5.206474
-1.32376309,0.01582787,3.653192
0.4562336,1.97765534,5.888694
0.06133481,2.31621796,3.933522
-0.9251331,3.79546636,5.64441
0.04869757,2.88742681,4.862328


#### MARGIN
MARGIN is a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns. Where X has named dimnames, it can be a character vector selecting dimension names.

In [29]:
apply(X = m, MARGIN = c(1, 2), FUN = add2)

0,1,2
2.71279691,3.517493,6.113684
1.18279944,3.632251,6.208022
2.46750026,2.492099,6.485332
2.71779986,4.062759,6.970641
3.17039595,5.034986,7.206474
0.67623691,2.015828,5.653192
2.4562336,3.977655,7.888694
2.06133481,4.316218,5.933522
1.0748669,5.795466,7.64441
2.04869757,4.887427,6.862328
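To see the three MARGIN settings side by side, here is a small sketch on a 2 x 3 matrix, where `mat` is an illustrative name. With `c(1, 2)` the function receives one cell at a time, which for simple arithmetic should match R's ordinary vectorised operation.

```r
# Small illustrative matrix with 2 rows and 3 columns
mat <- matrix(1:6, nrow = 2, ncol = 3)

# MARGIN = 1: the function receives each row, giving one result per row
row_sums <- apply(mat, 1, sum)

# MARGIN = 2: the function receives each column, giving one result per column
col_sums <- apply(mat, 2, sum)

# MARGIN = c(1, 2): the function receives each individual cell
plus_two <- apply(mat, c(1, 2), function(x) x + 2)

# Arithmetic in R is already vectorised, so the cell-wise call
# gives the same values as adding 2 to the whole matrix
all.equal(plus_two, mat + 2)
```

For element-wise arithmetic like `add2`, writing `m + 2` directly is usually simpler; `MARGIN = c(1, 2)` is more useful when the function is not vectorised.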


#### Playing with strings..

In [37]:
str1 <- "hey"
str2 <- "yo"
str3 <- "pal"

In [38]:
paste(str1, str2, sep = "")

In [39]:
paste0(str1, str2, str3)

In [40]:
library(stringr)

In [42]:
names <- c("Rossi", "John", "Sarah", "Celia")

In [43]:
str_sub(names, start = 3, end = 3)