# Apply family of functions

R is a vectorized language: the same function can be applied to every element of a vector or a matrix in a single call.
- Vectorization is the main route to good performance in R.
- R is interpreted, so executing explicit loops is comparatively costly.
- The apply family helps avoid explicit loops and can improve performance.
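
As a minimal sketch of the point above, the snippet below computes square roots once with an explicit loop and once with a single vectorized call. The names `x`, `roots_loop` and `roots_vec` are illustrative only and do not appear elsewhere in this notebook.

```r
x <- c(1, 4, 9, 16)

# Loop version: one interpreted iteration per element.
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Vectorized version: sqrt() operates on the whole vector in one call.
roots_vec <- sqrt(x)

# Both approaches produce the same values.
identical(roots_loop, roots_vec)
```

The vectorized form is shorter and leaves the iteration to R's internals.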

<br>

- Atomic vectors are typed: all elements of a vector have the same type. Once R knows the type of the vector, it knows the type of every element and does not need to inspect each one.
- At the low level, R's data objects are built on vectors; even a single number is a vector of length 1.
- Memory reallocation is expensive, so if we know ahead of time how large a vector will be, we can preallocate it.
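
A short sketch of what "typed" means in practice: when values of different types are combined with `c()`, R coerces them to one common type. The variable names here are illustrative.

```r
# Mixing a number, a string and a logical: everything becomes character.
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"

# Mixing an integer and a double: everything becomes double ("numeric").
nums <- c(1L, 2.5)
class(nums)    # "numeric"

# A single number is itself a vector of length 1.
is.vector(42)  # TRUE
length(42)     # 1
```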

In [1]:
rm(list = ls())

In [2]:
a = c()

In [3]:
length(a)

In [4]:
a[5] = 7

In [5]:
length(a)

In [6]:
a = rep(NA, 5)

In [7]:
length(a)

In [8]:
a[5] = 7
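
The cells above show that assigning to `a[5]` on an empty vector forces R to extend it, while `rep(NA, 5)` reserves the space up front. The following sketch contrasts the two patterns inside a loop; `n`, `grown` and `filled` are illustrative names, and no timing figures are claimed here.

```r
n <- 1000

# Growing: the vector is extended (and copied) on every iteration.
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Preallocated: storage is created once and then filled in place.
filled <- numeric(n)
for (i in seq_len(n)) {
  filled[i] <- i^2
}

# Same result either way; only the memory behaviour differs.
identical(grown, filled)
```

Wrapping each loop in `system.time()` is one way to compare the two on your own machine.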

# sapply

Goal - Apply an operation to every element of the vector.

In [9]:
b = c('I','work','at','cisco')

In [10]:
nchar("xyz")

In [11]:
# Option 1 - call nchar() on each element by hand

result = c(nchar(b[1]),nchar(b[2]),nchar(b[3]),nchar(b[4]))

In [12]:
result

In [None]:
# Option 2 - Use sapply

In [15]:
result = sapply(b, nchar)
result
mode(result)
class(result)

In [21]:
# Option 3 - use a vectorized function directly
# nchar() is vectorized, but not every function is.

In [22]:
result = nchar(b)

In [23]:
result

In [24]:
mode(result)
class(result)
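
One difference between Option 2 and Option 3 is that `sapply` on a character vector attaches the input strings as names, while `nchar` returns a plain vector. The sketch below shows this, and also uses `vapply`, a stricter sibling of `sapply` that requires the expected type of each result to be declared. The variable names are illustrative.

```r
words <- c("I", "work", "at", "cisco")

# sapply keeps the input strings as names on the result.
by_sapply <- sapply(words, nchar)
names(by_sapply)
unname(by_sapply)  # 1 4 2 5

# vapply checks that every result is a single integer.
by_vapply <- vapply(words, nchar, integer(1), USE.NAMES = FALSE)

# With names switched off, the result matches the vectorized call.
identical(by_vapply, nchar(words))
```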

# Passing args to functions in apply

In [25]:
inner_product = function(rowvec, colvec){
    product = rowvec * colvec
    inner = sum(product)
    return(inner)
}

In [27]:
a = matrix(c(1,2,3,1,2,3,1,2,3,1,2,3), nrow = 3, ncol = 4)
a

| [,1] | [,2] | [,3] | [,4] |
|------|------|------|------|
| 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 |
| 3 | 3 | 3 | 3 |


In [28]:
b = c(4,5,6,7)

In [29]:
inner_product(a[1,],b)

In [30]:
inner_product(a[2,],b)

In [32]:
# Arguments to apply():
#   a             - the matrix
#   1             - the margin; 1 means apply the function to each row
#   inner_product - the function to apply
#   b             - extra argument passed on to inner_product (as colvec)

apply(a,1,inner_product,b)
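
Extra arguments to `apply` can also be passed by name, which makes it explicit which parameter they fill. The sketch below redefines the function and data under illustrative names (`m`, `v`) so it runs on its own, and cross-checks the result against a matrix product.

```r
inner_product <- function(rowvec, colvec) {
  sum(rowvec * colvec)
}

m <- matrix(rep(1:3, times = 4), nrow = 3, ncol = 4)
v <- c(4, 5, 6, 7)

# Each row of m becomes rowvec; v is passed by name as colvec.
by_apply <- apply(m, 1, inner_product, colvec = v)
by_apply  # 22 44 66

# The same numbers come out of a matrix-vector product.
by_matmul <- as.vector(m %*% v)
all.equal(by_apply, by_matmul)
```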

# Checking validity of arguments

It is our responsibility to check that the objects we pass to a function have the mode and structure the function expects.

In [35]:
matrixRowSum = function(data){
    # is.matrix() already returns TRUE or FALSE, so no comparison is needed
    if(is.matrix(data)){
        margin = 1  # 1 = operate on rows
        sums = apply(data, margin, sum)
        return(sums)
    }else{
        print("ERROR: Argument must be a matrix")
        return(NA)
    }
}

In [36]:
matrixRowSum(a)

In [37]:
matrixRowSum(c(1:10))

[1] "ERROR: Argument must be a matrix"
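
Printing a message and returning `NA` lets the caller carry on with a bad value. An alternative, sketched here under the hypothetical name `matrixRowSumStrict`, is to signal a real error with `stop()` and let the caller decide how to handle it with `tryCatch`.

```r
# Hypothetical stricter variant: raises an error instead of returning NA.
matrixRowSumStrict <- function(data) {
  if (!is.matrix(data)) {
    stop("Argument must be a matrix")
  }
  apply(data, 1, sum)
}

# The caller can catch the error and recover the message.
outcome <- tryCatch(
  matrixRowSumStrict(1:10),
  error = function(e) conditionMessage(e)
)
outcome
```

Whether to return a sentinel value or raise an error is a design choice; raising makes the failure harder to overlook.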


# Safe coercion

In [38]:
df = data.frame(c(1:4),c(4:7),c(8:11))

In [39]:
class(df)

In [40]:
mode(df)

In [41]:
df

| c.1.4. | c.4.7. | c.8.11. |
|--------|--------|---------|
| 1 | 4 | 8 |
| 2 | 5 | 9 |
| 3 | 6 | 10 |
| 4 | 7 | 11 |


In [42]:
matrixRowSum(df)

[1] "ERROR: Argument must be a matrix"


In [43]:
mm = as.matrix(df)

In [44]:
mm

| c.1.4. | c.4.7. | c.8.11. |
|--------|--------|---------|
| 1 | 4 | 8 |
| 2 | 5 | 9 |
| 3 | 6 | 10 |
| 4 | 7 | 11 |


In [45]:
matrixRowSum(mm)
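
Coercion with `as.matrix` works cleanly here because every column is numeric. If a data frame contains a text column, the whole matrix becomes character, so it is worth checking the result. The helper `toNumericMatrix` below is a hypothetical sketch of such a check, not part of base R.

```r
numeric_df <- data.frame(x = 1:4, y = 4:7)
mixed_df <- data.frame(x = 1:4, label = c("a", "b", "c", "d"))

is.numeric(as.matrix(numeric_df))  # TRUE
is.numeric(as.matrix(mixed_df))    # FALSE: coerced to a character matrix

# Hypothetical helper: coerce a data frame, then verify the result is numeric.
toNumericMatrix <- function(data) {
  if (is.data.frame(data)) {
    data <- as.matrix(data)
  }
  if (!is.matrix(data) || !is.numeric(data)) {
    stop("Argument cannot be coerced to a numeric matrix")
  }
  data
}

apply(toNumericMatrix(numeric_df), 1, sum)
```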

# lapply

In [46]:
a = c(1:10)
b = exp(-3:3)
c = c(TRUE, FALSE,TRUE)

x = list(a = a, b = b, c = c)

In [47]:
x

In [48]:
class(x)

In [50]:
result = lapply(x, mean)

In [52]:
class(result)
result

In [53]:
result2 = sapply(x, mean)

In [55]:
class(result2)
result2
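
The difference between the two calls is the return type: `lapply` always returns a list, while `sapply` tries to simplify the result to a vector or matrix and falls back to a list when it cannot. A small sketch with an illustrative list named `values`:

```r
values <- list(a = 1:10, b = exp(-3:3), flags = c(TRUE, FALSE, TRUE))

# lapply: always a list, one element per input.
as_list <- lapply(values, length)
class(as_list)    # "list"

# sapply: one number per input, so the result simplifies to a vector.
as_vector <- sapply(values, length)
class(as_vector)  # "integer"

# Results of different lengths cannot be simplified, so sapply returns a list.
reversed <- sapply(values, rev)
class(reversed)   # "list"
```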

# Avoid loops

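Use the apply family of functions instead of explicit loops. The function below computes row sums with an explicit loop, growing `sums` on every iteration; this is the pattern the apply family replaces.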

In [87]:
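avoidLoops = function(data){

    if(is.matrix(data)){
        sums = c()

        # seq_len() is safe for a zero-row matrix; 1:nrow(data) would give c(1, 0)
        for(i in seq_len(nrow(data))){
            rowSum = sum(data[i,])
            # growing sums with c() reallocates the vector on every iteration
            sums = c(sums,rowSum)
        }
        return(sums)
    } else {
        return(NULL)
    }
}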

In [88]:
a = matrix(c(1,2,3,1,2,3,1,2,3,1,2,3), nrow = 3, ncol = 4)

In [89]:
a

| [,1] | [,2] | [,3] | [,4] |
|------|------|------|------|
| 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 |
| 3 | 3 | 3 | 3 |


In [90]:
x = avoidLoops(a)
x
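
For comparison, here is a sketch of the same row sums without an explicit loop, first with `apply` and then with the built-in `rowSums`. The matrix is rebuilt under the illustrative name `m` so the snippet runs on its own.

```r
m <- matrix(rep(1:3, times = 4), nrow = 3, ncol = 4)

# apply over margin 1 (rows), summing each row.
by_apply <- apply(m, 1, sum)
by_apply  # 4 8 12

# rowSums() is a dedicated vectorized function for this task.
by_rowsums <- rowSums(m)

all.equal(by_apply, by_rowsums)
```

When a dedicated vectorized function such as `rowSums` exists, it is usually the simplest choice; `apply` remains useful for functions that have no vectorized equivalent.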