
* **apply** - apply a function over the margins of an array (e.g. the rows or columns of a matrix).
* **lapply** - apply a function to each element of a list in turn and get a list back.
* **sapply** - apply a function to each element of a list and get a simplified object, such as a vector, back rather than a list.
* **tapply** - apply a function to subsets of a vector, where the subsets are defined by some other vector, usually a factor.
* **mapply** - apply a function to the first elements of each argument, then to the second elements of each, and so on.



In [224]:
set.seed(111)
m <- matrix(data=cbind(rnorm(30, 0), rnorm(30, 2), rnorm(30, 5)), nrow=30, ncol=3)
m

0,1,2
0.23522071,-1.1132173,4.783571
-0.33073587,1.0586426,6.446478
-0.31162382,3.4002588,5.40971
-2.30234566,0.37953,5.910917
-0.17087604,-0.265996,6.430358
0.14027823,3.1629936,4.618708
-1.49742666,1.883845,5.202307
-1.01018842,2.334256,4.193801
-0.9484756,1.3791419,5.294634
-0.49396222,0.6901551,6.404883


### apply
* When you want to apply a function to the rows or columns of a matrix (and higher-dimensional analogues).
* Not generally advisable for data frames, as it will coerce them to a matrix first.

In [225]:
# MARGIN = 1: traverse row-wise (one mean per row)
apply(m, 1, mean)

In [226]:
# MARGIN = 2: traverse column-wise
# using our own anonymous function to count the negative values in each column
apply(m, 2, function(x) length(x[x<0]))
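`apply()` is not limited to functions that return a single number. A minimal sketch, assuming a small hypothetical matrix `m2` (used instead of the random `m` so the results are predictable), showing what happens when `FUN` returns two values per column:

```r
# m2 is a hypothetical 4 x 3 matrix, filled column by column with 1..12
m2 <- matrix(1:12, nrow = 4, ncol = 3)

# MARGIN = 2: range() returns (min, max) for each column,
# so apply() binds the results into a 2 x 3 matrix, one column per input column
col_ranges <- apply(m2, 2, range)
print(col_ranges)

# MARGIN = 1: one sum per row, the same values as rowSums(m2)
row_totals <- apply(m2, 1, sum)
print(row_totals)
```

Because `apply()` always stores each call's result as a column, `apply(m2, 1, range)` would give a 2 x 4 matrix, i.e. rows and columns swapped relative to `m2`.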

### lapply
* When you want to apply a function to each element of a list/vector in turn and get a list back

![](img/apply_fun.jpg)


In [227]:
lapply(1:3, function(x) x^2)

In [228]:
# you can use unlist with lapply to get a vector
unlist(lapply(1:3, function(x) x^2))

In [229]:
# simplify2array() is the helper sapply() uses to simplify the list that lapply() returns
simplify2array(lapply(1:3, function(x) x^2))
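When the input list has names, `lapply()` keeps them, and `unlist()` carries them over to the resulting vector. A short sketch with a hypothetical named list `scores`:

```r
# scores is a hypothetical named list of integer vectors
scores <- list(a = 1:3, b = 4:6, c = 7:9)

# lapply() returns a list with the same names as its input
totals_list <- lapply(scores, sum)
str(totals_list)

# unlist() flattens it into a named vector
totals_vec <- unlist(totals_list)
print(totals_vec)
```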

### sapply
* When you want to apply a function to each element of a list in turn, but you want a vector back rather than a list

In [230]:
sapply(1:3, function(x) x^2)

In [231]:
print(sapply(1:3, function(x) x^2,simplify = FALSE))

[[1]]
[1] 1

[[2]]
[1] 4

[[3]]
[1] 9
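`sapply()` only returns a plain vector when every call returns a single value. A sketch of the two other outcomes, using hypothetical objects `res` and `ragged`: equal-length results are simplified to a matrix (one column per input element), while results of different lengths cannot be simplified and come back as a list.

```r
# Each call returns two named values, so sapply() builds a 2 x 3 matrix
res <- sapply(1:3, function(x) c(value = x, square = x^2))
print(res)

# seq_len(1), seq_len(2), seq_len(3) have different lengths,
# so sapply() falls back to returning a list
ragged <- sapply(1:3, seq_len)
str(ragged)
```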



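### tapply
* When you want to apply a function to subsets of a vector, where the subsets are defined by another vector (usually a factor)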

In [232]:
print(tapply(mtcars$wt,mtcars$cyl,mean))

       4        6        8 
2.285727 3.117143 3.999214 


In [233]:
print(tapply(iris$Sepal.Length,iris$Species,mean))

    setosa versicolor  virginica 
     5.006      5.936      6.588 


In [None]:
# With simplify = FALSE, tapply always returns an array of mode "list", i.e. a list with a dim attribute
tapply(iris$Sepal.Length,iris$Species,mean, simplify = FALSE)

In [236]:
# split() splits a vector into groups defined by a factor. Using split()
# and then applying a function with lapply() gives the same values as tapply(), returned as a list
lapply(split(iris$Sepal.Length,iris$Species), mean) # instead of tapply
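`tapply()` also accepts a list of factors as `INDEX`, in which case it returns a table with one dimension per factor. A sketch on `mtcars` (`am` is the transmission: 0 = automatic, 1 = manual); the object names are hypothetical. The last lines check that `split()` followed by `lapply()` gives the same values as `tapply()`:

```r
# Mean mpg for every combination of cylinders (rows) and transmission (columns)
mpg_tab <- tapply(mtcars$mpg, list(cyl = mtcars$cyl, am = mtcars$am), mean)
print(mpg_tab)

# split() + lapply() gives a named list; tapply() gives a named 1-d array
via_split <- unlist(lapply(split(iris$Sepal.Length, iris$Species), mean))
via_tapply <- tapply(iris$Sepal.Length, iris$Species, mean)

# Compare the values only, ignoring names and array attributes
same_values <- isTRUE(all.equal(unname(via_split), as.vector(via_tapply)))
print(same_values)
```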

### aggregate
* The aggregate function splits the data into subsets and computes a summary statistic for each of them.
* The output of aggregate is a data.frame, with one column for each grouping variable (here, Species) followed by the summarised columns.

In [257]:
# ?tapply
# ?aggregate

In [268]:
iris.x <- subset(iris, select= -Species) # all columns except Species
iris.s <- subset(iris, select= Species) # only the Species column
aggregate(iris.x, iris.s, mean)


Species,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
setosa,5.006,3.428,1.462,0.246
versicolor,5.936,2.77,4.26,1.326
virginica,6.588,2.974,5.552,2.026
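The same grouping can also be written with the formula interface of `aggregate()`. A sketch comparing it with `tapply()` on `mtcars` (object names `agg` and `tap` are hypothetical): the numbers agree, but `aggregate()` returns a data frame with the grouping variable as a column, while `tapply()` returns a named vector.

```r
# Formula interface: mean mpg for each number of cylinders, as a data frame
agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(agg)

# tapply() computes the same means but returns a named 1-d array
tap <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(tap)
```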


In [265]:
aggregate(x = mtcars, by = list(mtcars$carb), FUN = median)

Group.1,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
1,22.8,4,108.0,93,3.85,2.32,19.47,1.0,1,4.0,1
2,22.1,4,143.75,111,3.73,3.17,17.175,0.5,0,4.0,2
3,16.4,8,275.8,180,3.07,3.78,17.6,0.0,0,3.0,3
4,15.25,8,350.5,210,3.815,3.505,17.22,0.0,0,3.5,4
6,19.7,6,145.0,175,3.62,2.77,15.5,0.0,1,5.0,6
8,15.0,8,301.0,335,3.54,3.57,14.6,0.0,1,5.0,8


In [266]:
## example with character variables and NAs
testDF <- data.frame(v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9),
                     v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99) )
by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)
by2 <- c("wet", "dry", 99, 95, NA, "damp", 95, 99, "red", 99, NA, NA)
aggregate(x = testDF, by = list(by1, by2), FUN = "mean")


Group.1,Group.2,v1,v2
1,95,5.0,55.0
2,95,7.0,77.0
1,99,5.0,55.0
2,99,,
big,damp,3.0,33.0
blue,dry,3.0,33.0
red,red,4.0,44.0
red,wet,1.0,11.0


In [267]:
## Formulas, one ~ one, one ~ many, many ~ one, and many ~ many:
aggregate(weight ~ feed, data = chickwts, mean)
aggregate(breaks ~ wool + tension, data = warpbreaks, mean)
aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, mean)
aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, sum)

## Dot notation:
aggregate(. ~ Species, data = iris, mean)
aggregate(len ~ ., data = ToothGrowth, mean)

feed,weight
casein,323.5833
horsebean,160.2
linseed,218.75
meatmeal,276.9091
soybean,246.4286
sunflower,328.9167


wool,tension,breaks
A,L,44.55556
B,L,28.22222
A,M,24.0
B,M,28.77778
A,H,24.55556
B,H,18.77778


Month,Ozone,Temp
5,23.61538,66.73077
6,29.44444,78.22222
7,59.11538,83.88462
8,59.96154,83.96154
9,31.44828,76.89655


alcgp,tobgp,ncases,ncontrols
0-39g/day,0-9g/day,9,261
40-79,0-9g/day,34,179
80-119,0-9g/day,19,61
120+,0-9g/day,16,24
0-39g/day,10-19,10,84
40-79,10-19,17,85
80-119,10-19,19,49
120+,10-19,12,18
0-39g/day,20-29,5,42
40-79,20-29,15,62


Species,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
setosa,5.006,3.428,1.462,0.246
versicolor,5.936,2.77,4.26,1.326
virginica,6.588,2.974,5.552,2.026


supp,dose,len
OJ,0.5,13.23
VC,0.5,7.98
OJ,1.0,22.7
VC,1.0,16.77
OJ,2.0,26.06
VC,2.0,26.14


### Difference between how apply and sapply handle a data frame

In [237]:
# "2L" is quoted, so c("2L", 6L, 7L) is a character vector; fctr is logical
df <- data.frame(ingr=c("2L",6L, 7L),fctr=c(TRUE,FALSE,TRUE))

In [238]:
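apply(df,2,class)   # apply() first converts df to a matrix, which holds a single type, so every column reports "character"
sapply(df,class)    # sapply() treats df as a list of columns, so each column keeps its own class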
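The pitfall is easier to see with a data frame whose columns clearly have different types. A sketch with a hypothetical `df2`; `stringsAsFactors = FALSE` is set explicitly so the text column stays character in every R version:

```r
# df2 is a hypothetical data frame: one integer column, one character column
df2 <- data.frame(n = c(2L, 6L, 7L), s = c("a", "b", "c"), stringsAsFactors = FALSE)

# apply() calls as.matrix() first; a matrix holds one type, so everything becomes character
apply_classes <- apply(df2, 2, class)
print(apply_classes)

# sapply() iterates over the columns as list elements, so each keeps its own class
sapply_classes <- sapply(df2, class)
print(sapply_classes)
```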

### mapply
* Multivariate version of sapply
* It applies FUN to the first elements of each `...` argument, then the second elements, the third elements, and so on.
* Note that the **first argument of mapply() is the function** (in sapply() the function comes second)
* Useful when you have several data structures (e.g. vectors, lists) and want to apply a function to their corresponding elements


In [239]:
l1 <- list(a = c(1:10), b = c(11:20))
l2 <- list(c = c(21:30), d = c(31:40))
# sum the corresponding elements of l1 and l2
print(mapply(sum, l1$a, l1$b, l2$c, l2$d))

 [1]  64  68  72  76  80  84  88  92  96 100


In [240]:
print(mapply(sum, l1))
# equivalent to c(a = sum(1:10), b = sum(11:20))

  a   b 
 55 155 


In [241]:
print(mapply(sum, l1,l2))
# equivalent to c(a = sum(1:10, 21:30), b = sum(11:20, 31:40))

  a   b 
310 510 


In [242]:
print(mapply(sum, l1$a, l1$b))

 [1] 12 14 16 18 20 22 24 26 28 30
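Two more `mapply()` behaviours worth knowing, sketched with hypothetical objects: when the results have different lengths, `mapply()` cannot simplify them and returns a list, and shorter arguments are recycled to the length of the longest one.

```r
# Element-wise addition: 1 + 4, 2 + 5, 3 + 6
added <- mapply(function(x, y) x + y, 1:3, 4:6)
print(added)

# rep(1, 3), rep(2, 2), rep(3, 1): the results differ in length,
# so mapply() returns a list instead of a vector or matrix
reps <- mapply(rep, 1:3, 3:1)
str(reps)

# Recycling: 1:4 is paired with 10, 20, 10, 20
recycled <- mapply(function(x, y) x * y, 1:4, c(10, 20))
print(recycled)
```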


In [244]:
# Map - A wrapper to mapply with SIMPLIFY = FALSE, so it is guaranteed to return a list
print(Map(sum, l1$a, l1$b, l2$c, l2$d))

[[1]]
[1] 64

[[2]]
[1] 68

[[3]]
[1] 72

[[4]]
[1] 76

[[5]]
[1] 80

[[6]]
[1] 84

[[7]]
[1] 88

[[8]]
[1] 92

[[9]]
[1] 96

[[10]]
[1] 100
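A quick check that `Map()` behaves like `mapply()` with `SIMPLIFY = FALSE` (object names hypothetical):

```r
# Map() keeps every result as a separate list element ...
m_list <- Map(`+`, 1:3, 4:6)

# ... which is what mapply() returns when SIMPLIFY = FALSE
m_same <- mapply(`+`, 1:3, 4:6, SIMPLIFY = FALSE)

# With the default SIMPLIFY = TRUE, mapply() collapses the results to a vector
m_vec <- mapply(`+`, 1:3, 4:6)
print(m_vec)
```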



![](img/merge_from_ofoct.jpg)

![](img/merge_from_ofoct1.jpg)

**Further References on apply functions:**   
https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/  
https://www.datacamp.com/community/tutorials/r-tutorial-apply-family  
https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family/7701638  

### by
* applies a function to each level of a factor or factors
* splits your data by factors and does the calculation on each subset
* similar to GROUP BY in SQL: the rows are grouped by factor levels and the operation is applied to each group separately

In [245]:
# we apply the colMeans() function to all the observations of the iris dataset, grouped by Species.
by(iris[, 1:4], iris$Species, colMeans)

iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.006        3.428        1.462        0.246 
------------------------------------------------------------ 
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.936        2.770        4.260        1.326 
------------------------------------------------------------ 
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       6.588        2.974        5.552        2.026 
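Unlike `tapply()`, which passes `FUN` a vector, `by()` passes each subset as a whole data frame, so `FUN` can use several columns at once. A sketch (hypothetical object `slopes`) that fits a separate regression of `mpg` on `wt` within each cylinder group of `mtcars` and keeps only the slope:

```r
# Each subset d is a data frame holding the mtcars rows for one value of cyl
slopes <- by(mtcars, mtcars$cyl, function(d) coef(lm(mpg ~ wt, data = d))[["wt"]])
print(slopes)
```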

### with
* evaluates an expression in an environment built from a data frame, so its columns can be referred to by name  
**syntax:** `with(data, expression)`

In [246]:
with(mtcars, mpg[cyl == 8  &  disp > 350])
# is the same as, but nicer than
mtcars$mpg[mtcars$cyl == 8  &  mtcars$disp > 350]
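`with()` combines naturally with the other functions in this notebook. A sketch (object names hypothetical) that runs `tapply()` on `mtcars` without repeating `mtcars$`, and checks that the two forms above select the same values:

```r
# Columns of mtcars are visible by name inside the expression
by_cyl <- with(mtcars, tapply(mpg, cyl, mean))
print(by_cyl)

# The with() form and the $ form select the same rows
short_form <- with(mtcars, mpg[cyl == 8 & disp > 350])
long_form <- mtcars$mpg[mtcars$cyl == 8 & mtcars$disp > 350]
print(short_form)
```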

### Let’s take one variable, square it and add 100
> "How many ways might an R beginner screw up such a simple computation"

In [247]:
#let's create a data frame
mydata <- data.frame(x = 1:5)
print(mydata)

  x
1 1
2 2
3 3
4 4
5 5


#### Using dollar notation: `mydata$x`

In [249]:
mydata.new <- mydata # copy the original data into a new data frame

In [250]:
mydata.new$x2 <- mydata.new$x  ^ 2
mydata.new$x3 <- mydata.new$x2 + 100
print(mydata.new)

  x x2  x3
1 1  1 101
2 2  4 104
3 3  9 109
4 4 16 116
5 5 25 125


#### Using the attach function
* The attach function allows you to use short names to refer to the variables in a data frame.
* For example, you can write `x` instead of `mydata$x`.

In [251]:
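# the attach function is tricky to use. Here is the most common mistake made by beginners:
mydata.new <- mydata
attach(mydata.new)
x2 <- x  ^ 2     # x is read from mydata.new, but x2 is created in the global environment
x3 <- x2 + 100   # so is x3; mydata.new itself is never modified
print(mydata.new)
detach(mydata.new)  # undo attach() so repeated runs do not stack copies on the search path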

The following object is masked from mydata.new (pos = 3):

    x




  x
1 1
2 2
3 3
4 4
5 5
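One attach-free way to get the intended result is to evaluate the expression with `with()` and assign it back into the data frame explicitly. A minimal sketch; `mydata` is rebuilt so the block runs on its own:

```r
# Rebuild the example data so this block is self-contained
mydata <- data.frame(x = 1:5)
mydata.new <- mydata

# with() reads x from mydata.new; the $<- assignment stores the result in the data frame
mydata.new$x2 <- with(mydata.new, x^2)
mydata.new$x3 <- with(mydata.new, x2 + 100)
print(mydata.new)
```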


**For more on this topic, see the following article:**   
http://r4stats.com/2013/01/22/comparing-tranformation-styles/