## Loop Functions
- *lapply*: Loop over a list and evaluate a function on each element
- *sapply*: Same as `lapply` but try to simplify the result
- *apply*: Apply a function over the margins of an array
- *tapply*: Apply a function over subsets of a vector
- *mapply*: Multivariate version of lapply

### lapply

`lapply` **always** returns a list, regardless of the class of the input.

In [7]:
x <- list(a = 1:5, b = rnorm(10))
x

In [8]:
lapply(x, mean)

In [6]:
mean(x$a)

In [9]:
mean(x$b)

In [10]:
x <- 1:4
lapply(x, runif)

In [11]:
x <- 1:4
lapply(x, runif, min = 0, max = 10)

In [12]:
x <- list(a = matrix(1:4, 2,2), b = matrix(1:6, 3, 2))
x

$a
1,3
2,4

$b
1,4
2,5
3,6


In [13]:
lapply(x, function(elt) elt[,1])
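
A quick check of the claim that `lapply` returns a list even when the input is an atomic vector. This is a minimal sketch; the name `squares` is illustrative and not part of the original notes.

```r
# Input is an integer vector, not a list
squares <- lapply(1:3, function(i) i^2)

is.list(squares)   # TRUE
length(squares)    # 3, one element per input element

# unlist() flattens the result back to a plain vector when that is wanted
unlist(squares)    # 1 4 9
```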

### sapply

`sapply` will try to simplify the result of `lapply` if possible

- If the result is a list where every element is length 1, then a vector is returned
- If the result is a list where every element is a vector of the same length (>1), a matrix is returned
- If it can't figure things out, a list is returned
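
The three simplification cases above can be sketched on a small list. The variable names `v`, `m`, and `l` are only for illustration.

```r
x <- list(a = 1:4, b = 5:8)

v <- sapply(x, mean)                  # every result has length 1 -> named vector
m <- sapply(x, range)                 # every result has length 2 -> 2 x 2 matrix
l <- sapply(x, function(e) e[e > 2])  # lengths differ (2 and 4) -> list

is.null(dim(v))   # TRUE: a plain vector
dim(m)            # 2 2
is.list(l)        # TRUE
```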

In [15]:
x <- list(
    a = 1:4,
    b = rnorm(10),
    c = rnorm(20, 1),
    d = rnorm(100, 5)
)
#x

In [16]:
lapply(x, mean)

In [17]:
sapply(x, mean)

In [18]:
mean(x) # not an error: returns NA with a warning, because x is a list

“argument is not numeric or logical: returning NA”


### apply

`apply` is used to evaluate a function (often an anonymous one) over the margins of an array

- It is most often used to apply a function to the rows or columns of a matrix
- It can be used with general arrays, e.g. taking the average of an array of matrices
- It is not really faster than writing a loop, but it works in one line!

For sums and means of matrix dimensions, here are some shortcuts:
- `rowSums(x)` = `apply(x, 1, sum)`
- `rowMeans(x)` = `apply(x, 1, mean)`
- `colSums(x)` = `apply(x, 2, sum)`
- `colMeans(x)` = `apply(x, 2, mean)`
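
The equivalences in this list can be checked directly. A minimal sketch, with an arbitrarily chosen seed so the run is reproducible:

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
x <- matrix(rnorm(200), 20, 10)

# Each shortcut should agree with the corresponding apply() call
all.equal(rowSums(x),  apply(x, 1, sum))
all.equal(rowMeans(x), apply(x, 1, mean))
all.equal(colSums(x),  apply(x, 2, sum))
all.equal(colMeans(x), apply(x, 2, mean))
```

The dedicated functions are generally the better choice on large matrices, since they are implemented for speed.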

In [19]:
str(apply)

function (X, MARGIN, FUN, ...)  


The `MARGIN` argument is an integer vector indicating which margins should be retained; the remaining dimensions are collapsed by `FUN`.
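
To make "retained" concrete, here is a small sketch on a 2 x 3 x 4 array (the array `a3` is made up for this illustration): the dimensions named in `MARGIN` survive in the result and the others are summarised away.

```r
a3 <- array(1:24, dim = c(2, 3, 4))

kept12 <- apply(a3, c(1, 2), sum)  # keep dims 1 and 2, collapse dim 3
kept3  <- apply(a3, 3, sum)        # keep dim 3, collapse dims 1 and 2

dim(kept12)   # 2 3
kept3         # 21 57 93 129
```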

In [21]:
x <- matrix(rnorm(200), 20, 10)
#x

In [26]:
apply(x, 2, mean) # column mean (with argument margin = 2)

In [24]:
apply(x, 1, sum) # row sum (with argument margin = 1)

In [28]:
x <- matrix(rnorm(200), 20, 20)
apply(x, 1, quantile, probs = c(0.25, 0.75))

,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
25%,-0.3293164,-1.208235,-0.2148967,-0.3936341,-1.076517,-0.1608993,-0.70520445,0.2119727,-0.4498982,-1.4504287,-1.1201617,-0.5653239,-0.5070608,-0.59826957,0.2684981,-1.3097704,-0.79940918,0.2076668,-0.7916373,-0.9790552
75%,0.2981084,1.463501,0.5982626,0.6451128,0.188986,0.5748976,0.03444989,1.2433275,0.8032708,0.1033605,-0.1305911,0.6716183,0.3150976,-0.09251304,0.4977738,0.3394357,0.06618663,1.1759835,1.1109227,0.02259531


In [29]:
a <- array(rnorm(2 * 2 * 10), c(2,2,10))
a

In [30]:
apply(a, c(1,2), mean)

0,1
0.03831172,-0.1367765
-0.25377791,-0.5193281


In [31]:
rowMeans(a, dims = 2)

0,1
0.03831172,-0.1367765
-0.25377791,-0.5193281


### mapply

`mapply` is a multivariate apply of sorts which applies a function in parallel over a set of arguments

In [32]:
str(mapply)

function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)  


Arguments:
- `FUN` is a function to apply
- `...` contains arguments to apply over
- `MoreArgs` is a list of other arguments to `FUN`
- `SIMPLIFY` indicates whether the result should be simplified

In [33]:
list(rep(1,4), rep(2,3), rep(3,2), rep(4,1))

In [34]:
mapply(rep, 1:4, 4:1)
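
A sketch of how `MoreArgs` fits in, since the cells above only use the varying arguments. The helper `noise` is hypothetical and exists only for this illustration: `n` and `mean` vary per call, while `sd` is held fixed through `MoreArgs`.

```r
# mapply(rep, 1:4, 4:1) calls rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
via_mapply <- mapply(rep, 1:4, 4:1)
unlist(via_mapply)   # 1 1 1 1 2 2 2 3 3 4

# Hypothetical helper: n draws around a given mean with a shared sd
noise <- function(n, mean, sd) rnorm(n, mean, sd)

# n and mean are iterated over in parallel; sd = 1 is passed to every call
draws <- mapply(noise, 1:3, c(0, 10, 100), MoreArgs = list(sd = 1))
lengths(draws)       # 1 2 3
```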

### tapply

`tapply` is used to apply a function over subsets of a vector. (I don't know why it's called `tapply`.)

In [35]:
str(tapply)

function (X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)  


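Arguments
- `X` is a vector
- `INDEX` is a factor or a list of factors (or else they are coerced to factors)
- `FUN` is the function to be applied
- `...` contains other arguments to be passed to `FUN`
- `simplify` indicates whether the result should be simplified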

In [37]:
x <- c(rnorm(10), runif(10), rnorm(10,1))
f <- gl(3,10)
f

In [39]:
x

In [38]:
tapply(x, f, mean)

In [40]:
tapply(x, f, mean, simplify = FALSE)
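
When `FUN` returns more than one value per group, `tapply` cannot simplify to a vector. A minimal sketch using `range`, with an arbitrary seed and illustrative names `group_means` and `group_ranges`:

```r
set.seed(42)                        # arbitrary seed, only for reproducibility
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)                      # 3 levels, each repeated 10 times

group_means  <- tapply(x, f, mean)   # one number per group
group_ranges <- tapply(x, f, range)  # two numbers per group, kept as list elements

length(group_means)    # 3
group_ranges[[2]]      # min and max of the runif() group, both inside [0, 1]
```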

### split

`split` takes a vector or other object and splits it into groups determined by a factor or a list of factors.

In [42]:
str(split)

function (x, f, drop = FALSE, ...)  


Arguments
- `x` is a vector (or list) or data frame
- `f` is a factor (or coerced to one) or a list of factors
- `drop` indicates whether empty factor levels should be dropped

In [43]:
x <- c(rnorm(10), runif(10), rnorm(10, 1))
x

In [44]:
f <- gl(3, 10)

In [45]:
split(x, f)

In [46]:
lapply(split(x,f), mean)
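
Splitting and then applying a function gives the same group summaries as `tapply`. A small sketch to confirm this, with an arbitrary seed and illustrative variable names:

```r
set.seed(1)                         # arbitrary seed, only for reproducibility
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

via_split  <- sapply(split(x, f), mean)  # split first, then summarise each piece
via_tapply <- tapply(x, f, mean)         # both steps in one call

# Compare the values only; the two results carry their names differently
all.equal(as.numeric(via_split), as.numeric(via_tapply))
```

The `split` route is the more flexible one when the pieces are data frames or when the per-group function needs more than one column.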

In [51]:
library(datasets)
head(airquality) # data frame

,Ozone,Solar.R,Wind,Temp,Month,Day
,<int>,<int>,<dbl>,<int>,<int>,<int>
1,41.0,190.0,7.4,67,5,1
2,36.0,118.0,8.0,72,5,2
3,12.0,149.0,12.6,74,5,3
4,18.0,313.0,11.5,62,5,4
5,,,14.3,56,5,5
6,28.0,,14.9,66,5,6


In [50]:
# split it by month
s <- split(airquality, airquality$Month)
# calculate each column's mean value
lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))

In [52]:
sapply(s, function(x) colMeans(x[, c("Ozone", 'Solar.R', 'Wind')]))

,5,6,7,8,9
Ozone,,,,,
Solar.R,,190.16667,216.483871,,167.4333
Wind,11.62258,10.26667,8.941935,8.793548,10.18


In [53]:
sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")],
                              na.rm = TRUE)) # ignore missing values

,5,6,7,8,9
Ozone,23.61538,29.44444,59.115385,59.961538,31.44828
Solar.R,181.2963,190.16667,216.483871,171.857143,167.43333
Wind,11.62258,10.26667,8.941935,8.793548,10.18


Splitting on more than one level

In [54]:
x <- rnorm(10)
f1 <- gl(2,5)
f2 <- gl(5,2)

In [55]:
f1

In [56]:
f2

In [57]:
interaction(f1,f2)

In [58]:
str(split(x, list(f1, f2)))

List of 10
 $ 1.1: num [1:2] -0.985 -1.297
 $ 2.1: num(0) 
 $ 1.2: num [1:2] 0.762 -0.447
 $ 2.2: num(0) 
 $ 1.3: num -0.21
 $ 2.3: num 0.529
 $ 1.4: num(0) 
 $ 2.4: num [1:2] 0.513 2.066
 $ 1.5: num(0) 
 $ 2.5: num [1:2] -0.534 0.229


In [59]:
str(
    split(x, list(f1, f2), drop = TRUE)
)

List of 6
 $ 1.1: num [1:2] -0.985 -1.297
 $ 1.2: num [1:2] 0.762 -0.447
 $ 1.3: num -0.21
 $ 2.3: num 0.529
 $ 2.4: num [1:2] 0.513 2.066
 $ 2.5: num [1:2] -0.534 0.229
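
Why `drop` matters in practice: empty groups propagate into later summaries. A minimal sketch with deterministic data (`x <- 1:10` is used here in place of random draws so the counts are easy to verify):

```r
x  <- 1:10
f1 <- gl(2, 5)   # 1 1 1 1 1 2 2 2 2 2
f2 <- gl(5, 2)   # 1 1 2 2 3 3 4 4 5 5

all_groups  <- split(x, list(f1, f2))               # every combination of levels
seen_groups <- split(x, list(f1, f2), drop = TRUE)  # only combinations that occur

length(all_groups)    # 10
length(seen_groups)   # 6

# mean() of an empty group is NaN, so the 4 empty groups show up as NaN
group_means <- sapply(all_groups, mean)
sum(is.nan(group_means))   # 4
```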


## Debugging Tools

In [61]:
log(-1) # warning

“NaNs produced”


### traceback

In [65]:
mean(xx)
traceback()

ERROR: Error in mean(xx): object 'xx' not found


In [66]:
traceback() # no traceback is recorded here; it seems to work only in a terminal session

No traceback available 


In [68]:
lm(y ~ x) # here the error message itself names the internal call that failed

ERROR: Error in eval(predvars, data, env): object 'y' not found
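
Where `traceback()` is not available, one possible alternative is to capture the condition with `tryCatch` and inspect its message. This is only a sketch of that idea, not something from the course material; `undefined_object_xx` is a deliberately undefined, made-up name.

```r
# Capture the warning from log(-1) instead of letting it print
w <- tryCatch(log(-1), warning = function(w) conditionMessage(w))
w   # "NaNs produced" in an English locale

# Capture the error raised by referencing an object that does not exist
e <- tryCatch(mean(undefined_object_xx),
              error = function(e) conditionMessage(e))
e   # the message names the missing object
```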


## Week 3 Quiz

In [70]:
library(datasets)
data(iris)

In [83]:
# ?iris
head(iris)

,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
,<dbl>,<dbl>,<dbl>,<dbl>,<fct>
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3.0,1.4,0.2,setosa
3,4.7,3.2,1.3,0.2,setosa
4,4.6,3.1,1.5,0.2,setosa
5,5.0,3.6,1.4,0.2,setosa
6,5.4,3.9,1.7,0.4,setosa


In [75]:
# ?split
str(split)

function (x, f, drop = FALSE, ...)  


In [82]:
#?sapply

In [79]:
apply(split(iris$Sepal.Length, iris$Species), mean)

ERROR: Error in match.fun(FUN): argument "FUN" is missing, with no default


In [90]:
# split iris by Species
s <- split(iris, iris$Species)
irisMeans <- sapply(s, function(x) colMeans(x[, c("Sepal.Length", 'Sepal.Width', 'Petal.Length', 'Petal.Width')]))
#sapply(s, function(x) colMeans(x[, c("Ozone", 'Solar.R', 'Wind')], na.rm = T))
round(irisMeans,1)

,setosa,versicolor,virginica
Sepal.Length,5.0,5.9,6.6
Sepal.Width,3.4,2.8,3.0
Petal.Length,1.5,4.3,5.6
Petal.Width,0.2,1.3,2.0


In [95]:
# colMeans(iris)
# apply(iris[,1:4], 1, mean)
apply(iris[,1:4], 2, mean)
# apply(iris, 2, mean)

In [100]:
library(datasets)
data(mtcars)

In [97]:
?mtcars

mtcars {datasets},R Documentation

Column,Name,Description
"[, 1]",mpg,Miles/(US) gallon
"[, 2]",cyl,Number of cylinders
"[, 3]",disp,Displacement (cu.in.)
"[, 4]",hp,Gross horsepower
"[, 5]",drat,Rear axle ratio
"[, 6]",wt,Weight (1000 lbs)
"[, 7]",qsec,1/4 mile time
"[, 8]",vs,"Engine (0 = V-shaped, 1 = straight)"
"[, 9]",am,"Transmission (0 = automatic, 1 = manual)"
"[,10]",gear,Number of forward gears


In [101]:
mtcars

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


In [129]:
# split mtcars by cyl
s <- split(mtcars, mtcars$cyl)
#s
mtcarsMeans <- sapply(s, function(x) colMeans(x[, c("mpg", 'hp')]))
mtcarsMeans
#sapply(s, function(x) colMeans(x[, c("Ozone", 'Solar.R', 'Wind')], na.rm = T))
#round(irisMeans,1)

,4,6,8
mpg,26.66364,19.74286,15.1
hp,82.63636,122.28571,209.2143


In [106]:
sapply(
    split(mtcars$mpg, mtcars$cyl), mean
)

In [107]:
tapply(
    mtcars$mpg, mtcars$cyl, mean
)

In [108]:
lapply(mtcars, mean)

In [109]:
split(mtcars, mtcars$cyl)

$`4`
,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Fiat 128,32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
Honda Civic,30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
Toyota Corolla,33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
Toyota Corona,21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1
Fiat X1-9,27.3,4,79.0,66,4.08,1.935,18.9,1,1,4,1
Porsche 914-2,26.0,4,120.3,91,4.43,2.14,16.7,0,1,5,2
Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2

$`6`
,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
Merc 280C,17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
Ferrari Dino,19.7,6,145.0,175,3.62,2.77,15.5,0,1,5,6

$`8`
,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 450SE,16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3
Merc 450SL,17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3
Merc 450SLC,15.2,8,275.8,180,3.07,3.78,18.0,0,0,3,3
Cadillac Fleetwood,10.4,8,472.0,205,2.93,5.25,17.98,0,0,3,4
Lincoln Continental,10.4,8,460.0,215,3.0,5.424,17.82,0,0,3,4
Chrysler Imperial,14.7,8,440.0,230,3.23,5.345,17.42,0,0,3,4
Dodge Challenger,15.5,8,318.0,150,2.76,3.52,16.87,0,0,3,2
AMC Javelin,15.2,8,304.0,150,3.15,3.435,17.3,0,0,3,2


In [110]:
apply(mtcars, 2, mean)

In [111]:
sapply(mtcars, cyl, mean)

ERROR: Error in match.fun(FUN): object 'cyl' not found


In [112]:
mean(mtcars$mpg, mtcars$cyl)

ERROR: Error in mean.default(mtcars$mpg, mtcars$cyl): 'trim' must be numeric of length one


In [113]:
tapply(mtcars$cyl, mtcars$mpg, mean)

In [114]:
with(mtcars, tapply(mpg, cyl, mean))

In [115]:
# split mtcars by cyl
s <- split(mtcars, mtcars$cyl)
#s
mtcarsMeans_hp <- sapply(s, function(x) colMeans(x[, c("mpg", 'hp')]))
mtcarsMeans_hp
#sapply(s, function(x) colMeans(x[, c("Ozone", 'Solar.R', 'Wind')], na.rm = T))
#round(irisMeans,1)

,4,6,8
mpg,26.66364,19.74286,15.1
hp,82.63636,122.28571,209.2143


In [119]:
mtcarsMeans_hp[2,]

In [123]:
round(abs(mtcarsMeans_hp[2,1] - mtcarsMeans_hp[2,3]),1)

In [126]:
debug(ls)
ls

## Week 3 Programming Assignment

In [134]:
# makeVector.R
makeVector <- function(x = numeric()){
    m <- NULL
    set <- function(y) {
        x <<- y
        m <<- NULL
    }
    get <- function() x
    setmean <- function(mean) m <<- mean
    getmean <- function() m
    list(set = set, get = get, 
        setmean = setmean,
        getmean = getmean)
}

In [135]:
# cachemean.R
cachemean <- function(x, ...){
    m <- x$getmean()
    if(!is.null(m)){
        message('getting cached data')
        return(m)
    }
    data <- x$get()
    m <- mean(data, ...)
    x$setmean(m)
    m  # return the mean visibly; the assignment in setmean() returns it invisibly
}
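
A usage sketch for the pair above, restated compactly so the block runs on its own. The point of interest is that `set` uses `<<-` to replace the stored vector and reset the cached mean in the enclosing environment. The variable `v` and the sample values are made up for illustration.

```r
# Compact restatement of makeVector / cachemean so this block is self-contained
makeVector <- function(x = numeric()) {
    m <- NULL
    list(
        set = function(y) { x <<- y; m <<- NULL },  # new data invalidates the cache
        get = function() x,
        setmean = function(mean) m <<- mean,
        getmean = function() m
    )
}

cachemean <- function(x, ...) {
    m <- x$getmean()
    if (!is.null(m)) {
        message("getting cached data")
        return(m)
    }
    m <- mean(x$get(), ...)
    x$setmean(m)
    m
}

v <- makeVector(c(1, 2, 3, 4))
first  <- cachemean(v)   # computed: 2.5
second <- cachemean(v)   # same value, served from the cache with a message

v$set(c(10, 20))         # replacing the data clears the cached mean
v$getmean()              # NULL
third <- cachemean(v)    # recomputed: 15
```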

In [137]:
# cacheMatrix.R
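## makeCacheMatrix and cacheSolve cache the inverse of a matrix so that it
## is computed at most once for a given matrix.

## makeCacheMatrix creates a special "matrix" object: a list of functions
## that get/set the matrix and get/set its cached inverse.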

makeCacheMatrix <- function(x = matrix()) {
    # Initialize the inverse property
    i <- NULL
    
    # Method to set the matrix
    set <- function(matrix) {
        # Store in x, the argument of makeCacheMatrix, so get() can see it
        x <<- matrix
        i <<- NULL
    }
    
    # Method to get the matrix
    get <- function() {
        ## return the matrix
        x
    }
    
    # Method to set the inverse of the matrix
    setInverse <- function(inverse) {
        i <<- inverse
    }
    
    ## Method to get the inverse of the matrix
    getInverse <- function() {
        ## Return the inverse property
        i
    }

    # Return a list of the methods
    list(set = set, get = get, setInverse = setInverse, getInverse = getInverse)
}

In [138]:
# cacheSolve.R
# Compute the inverse of the special matrix returned by `makeCacheMatrix`
# above. If the inverse has already been calculated (and the matrix has not
# changed), then `cacheSolve` retrieves the inverse from the cache.

cacheSolve <- function(x, ...) {
    ## Return a matrix that is the inverse of 'x'
    m <- x$getInverse()
    
    ## Just return the inverse if it already exists
    if(!is.null(m)){
        message("getting cached data")
        return(m)
    }
    
    ## Get the matrix from our object
    data <- x$get()
    
    ## Calculate the inverse; solve() with a single matrix argument returns its inverse
    m <- solve(data, ...)
    
    ## Set the inverse to the object
    x$setInverse(m)
    
    ## Return the matrix
    m
}
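
A usage sketch for the caching pair, written as a compact, self-contained variant so it can be run in isolation. It relies on `solve(a)` with a single matrix argument returning the inverse of `a`; the 2 x 2 diagonal matrix and the names `cm`, `first`, and `second` are made up for illustration.

```r
# Compact variant of makeCacheMatrix / cacheSolve so this block runs on its own
makeCacheMatrix <- function(x = matrix()) {
    inv <- NULL
    list(
        set = function(y) { x <<- y; inv <<- NULL },  # new matrix invalidates the cache
        get = function() x,
        setInverse = function(inverse) inv <<- inverse,
        getInverse = function() inv
    )
}

cacheSolve <- function(x, ...) {
    inv <- x$getInverse()
    if (!is.null(inv)) {
        message("getting cached data")
        return(inv)
    }
    inv <- solve(x$get(), ...)   # inverse of the stored matrix
    x$setInverse(inv)
    inv
}

cm <- makeCacheMatrix(matrix(c(2, 0, 0, 4), 2, 2))
first  <- cacheSolve(cm)   # computed
second <- cacheSolve(cm)   # served from the cache, with a message

first %*% cm$get()         # identity matrix, confirming first is the inverse
```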