### Looping on the Command Line
**Functions include**
1. ``lapply``: loop over a list and evaluate a function on each element
2. ``sapply``: same as ``lapply``, but it tries to simplify the result
    - if the result is a list where every element has length 1, a vector is returned
    - if the result is a list where every element is a vector of the same length (> 1), a matrix is returned
    - if it cannot simplify the result, a list is returned
3. ``apply``: apply a function over the margins of an array
4. ``tapply``: apply a function over subsets of a vector
5. ``mapply``: multivariate version of ``lapply``
- the auxiliary function ``split`` is also useful, particularly in conjunction with ``lapply``

``lapply`` takes three arguments: a list, a function, and any other arguments to pass to that function via its ``...`` argument.
- the output is always a list

In [1]:
x <- list(a = 1:5, b=rnorm(10))
x

In [2]:
lapply(x, mean)

In [3]:
lapply(x, max)

In [4]:
# min and max are passed through lapply's ... argument to runif
lapply(x, runif, min=0, max=10)

In [5]:
# lapply is often used with anonymous functions
x <- list(a=matrix(1:4, 2, 2), b=matrix(1:6, 3, 2))
x

0,1
1,3
2,4

0,1
1,4
2,5
3,6


In [6]:
# an anonymous function that extracts the first column of each matrix
lapply(x, function(elt) elt[,1])

In [7]:
# sapply
sapply(x, runif, min=0, max=10)
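The three simplification cases of ``sapply`` can be seen side by side in a minimal sketch. The list and the variable names below are made up for illustration and are not part of the cells above.

```r
x <- list(a = 1:5, b = c(2, 4, 6))

# Every result has length 1, so sapply() returns a named numeric vector.
means <- sapply(x, mean)

# Every result has the same length (2), so sapply() returns a 2 x 2 matrix
# with one column per list element.
ranges <- sapply(x, range)

# The results have different lengths (5 and 3), so sapply() cannot simplify
# and falls back to a list, just like lapply().
reversed <- sapply(x, rev)

means
ranges
reversed
```

Checking ``class()`` or ``str()`` on each result is a quick way to see which case applied.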

### ``apply`` function
- used to evaluate a function (often an anonymous one) over the margins of an array
- most often used to apply a function to the rows or columns of a matrix
- can be used with general arrays
- not really faster than writing a loop, but it fits on one line

In [1]:
x <- matrix(rnorm(20), 4, 5)
x

0,1,2,3,4
-0.6313846,0.1131204,0.4797403,-1.2079088,-0.9019968
-1.2472822,0.458781,-0.6054469,0.1006162,-0.3789392
0.7189516,0.2380875,1.9855186,1.5486129,0.3131404
-1.4130351,-1.9885703,-1.5128419,-1.7156004,-1.9057925


In [2]:
# MARGIN = 2: apply the function to each column
apply(x, 2, mean)

In [4]:
# MARGIN = 1: apply the function to each row
apply(x, 1, sum)
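``apply`` also forwards extra arguments to the function and accepts anonymous functions. The sketch below assumes a fresh 4 x 5 matrix ``m`` and an arbitrary seed; both are illustrative choices rather than part of the cells above.

```r
set.seed(1)                       # arbitrary seed so the sketch is repeatable
m <- matrix(rnorm(20), 4, 5)

# Arguments after the function name are passed through to it.
# quantile() returns two values per row, so the result is a 2 x 4 matrix
# with one column for each row of m.
q <- apply(m, 1, quantile, probs = c(0.25, 0.75))

# An anonymous function works the same way: spread (max - min) of each column.
spread <- apply(m, 2, function(col) max(col) - min(col))

q
spread
```

Note that when the function returns more than one value, the per-row results come back as columns of the output.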

**Shortcut functions for common cases**
1. ``rowSums(x)`` = ``apply(x, 1, sum)``
2. ``rowMeans(x)`` = ``apply(x, 1, mean)``
3. ``colSums(x)`` = ``apply(x, 2, sum)``
4. ``colMeans(x)`` = ``apply(x, 2, mean)``
- these shortcut functions run *much* faster, although the difference mostly shows on large matrices
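A quick way to convince yourself of these equivalences is to compare both forms on a small matrix. The matrix ``m`` here is a made-up example.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix; rows are (1, 3, 5) and (2, 4, 6)

row_totals <- rowSums(m)
col_means  <- colMeans(m)

# all.equal() compares values with a numeric tolerance,
# so integer and double results are treated as equal.
all.equal(row_totals, apply(m, 1, sum))
all.equal(col_means, apply(m, 2, mean))
```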

In [6]:
a <- array(rnorm(2*2*10), c(2,2,10))
dim(a)

In [7]:
apply(a, c(1,2), mean)

0,1
0.02032909,0.4165731
0.01911213,0.5053509


In [8]:
rowMeans(a, dims=2)

0,1
0.02032909,0.4165731
0.01911213,0.5053509
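To see what ``apply(a, c(1,2), mean)`` computes, it helps to use an array with known values. The sketch below uses a small hypothetical array ``arr`` instead of random numbers: margins 1 and 2 are kept, and the third dimension is averaged away.

```r
arr <- array(1:8, c(2, 2, 2))   # values 1..8, filled in column-major order

# Keep dimensions 1 and 2, collapse dimension 3.
avg <- apply(arr, c(1, 2), mean)

# Entry [1, 1] is the mean of the values stacked along the third dimension.
manual <- mean(arr[1, 1, ])     # arr[1, 1, ] is c(1, 5)

avg
manual
all.equal(avg, rowMeans(arr, dims = 2))
```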


### ``mapply()`` is a multivariate apply of sorts which applies a function in parallel over a set of arguments
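- can replace a ``for`` loop that steps through several argument vectors at once, passing the i-th element of each to the function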

In [9]:
rep(2, 3)

In [10]:
list(rep(1,4), rep(2, 3), rep(3, 2), rep(4, 1))

In [11]:
mapply(rep, 1:4, 4:1)
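The call above produces the same list as the hand-written version in the previous cell. The sketch below checks that, and also shows ``MoreArgs`` for arguments that should stay fixed across calls. The helper ``make_label`` is a hypothetical function written only for this illustration.

```r
# mapply() calls rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1) in turn.
reps   <- mapply(rep, 1:4, 4:1)
manual <- list(rep(1L, 4), rep(2L, 3), rep(3L, 2), rep(4L, 1))
identical(reps, manual)

# Hypothetical helper: join a prefix and a number with a separator.
make_label <- function(prefix, n, sep) paste(prefix, n, sep = sep)

# prefix and n vary element by element; sep is held constant via MoreArgs.
labels <- mapply(make_label, c("a", "b", "c"), 1:3,
                 MoreArgs = list(sep = "-"))
labels
```

Because every call to ``make_label`` returns a single string, ``mapply`` simplifies the result to a character vector, in the same way ``sapply`` does.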

### ``tapply()`` is used to apply a function over subsets of a vector

In [22]:
a <- rnorm(10)
b <- runif(10)
c <- rnorm(10)
print(mean(a))
print(mean(b))
print(mean(c))
a
b
c

[1] 0.2033137
[1] 0.4494579
[1] -0.1150005


In [18]:
x <- c(a, b, c)
x

In [15]:
f <- gl(3, 10)
f

In [20]:
tapply(x, f, mean)
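With deterministic data the grouping is easier to follow. This sketch uses made-up values; it also shows what happens when the function returns more than one value per group.

```r
vals <- c(1, 2, 3, 10, 20, 30)
grp  <- gl(2, 3)                 # factor with levels 1 and 2: 1 1 1 2 2 2

# One number per level: the result is a named vector-like array.
group_means <- tapply(vals, grp, mean)

# range() returns two numbers per level, so the result cannot be
# simplified and holds one length-2 vector per level instead.
group_ranges <- tapply(vals, grp, range)

group_means
group_ranges
```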

### ``split()`` takes a vector or other object and splits it into groups determined by a factor or list of factors

In [21]:
split(x, f)

In [23]:
lapply(split(x, f), mean)
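``split`` also accepts a list of factors, in which case the groups are the combinations of their levels. A minimal sketch with made-up data and hypothetical factor names ``f1`` and ``f2``:

```r
v  <- 1:8
f1 <- gl(2, 4)        # 1 1 1 1 2 2 2 2
f2 <- gl(2, 2, 8)     # 1 1 2 2 1 1 2 2

# Groups are named "<level of f1>.<level of f2>".
s <- split(v, list(f1, f2))
names(s)

# Apply a function to each combination, as with a single factor.
sums <- sapply(s, sum)
sums
```

If some combinations of levels never occur, ``split(v, list(f1, f2), drop = TRUE)`` leaves out the empty groups.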

**Splitting a Data Frame**
```
# split the data frame by one column, then average selected columns within each group
s <- split(data, data$columnname)
lapply(s, function(x) colMeans(x[, c("columnname1", "columnname2", "columnname3")]))
```
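The template above can be tried on a small data frame. Everything in this sketch (the data frame ``df`` and its column names) is invented for illustration.

```r
# Hypothetical data: two groups with two numeric measurements each.
df <- data.frame(
  group  = c("a", "a", "b", "b"),
  height = c(1, 3, 5, 7),
  weight = c(10, 20, 30, 40)
)

# One data frame per level of df$group.
pieces <- split(df, df$group)

# Column means of the numeric columns within each group.
means <- lapply(pieces, function(d) colMeans(d[, c("height", "weight")]))
means

# sapply() simplifies the same result to a matrix with one column per group.
means_matrix <- sapply(pieces, function(d) colMeans(d[, c("height", "weight")]))
means_matrix
```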

### Debugging
![image.png](attachment:image.png)
