# Apply Family of Functions

Source: "A brief introduction to apply in R", by Neil Saunders, on his blog _What You're Doing Is Rather Desperate_
- https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/

## Contents
1. `lapply` and `sapply`
2. `apply`
3. `mapply`

In this notebook, you will learn to use functions from the apply family.  

We will cover `lapply`, `sapply` and `mapply`, and explain the difference between them.   

In addition, we briefly cover `apply`, which takes a matrix or dataframe as input.

The _apply family of functions_ provides an alternative, more succinct way of running a single function on several pieces of data. 

This would typically be done with a loop, such as a `for` or `while` loop, but the apply functions usually require less code and are often easier to read. With less code to write, there are fewer places for errors to creep in. 
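To make the comparison concrete, here is a minimal sketch that computes the same result with a `for` loop and with `lapply`. The list `pieces` and the result names are hypothetical, chosen only for this illustration.

```r
# Hypothetical input data, used only for this illustration.
pieces = list(1:3, 4:6, 7:9)

# Loop version: create an empty result list, then fill it element by element.
loop_result = vector("list", length(pieces))
for (i in seq_along(pieces)) {
  loop_result[[i]] = sum(pieces[[i]])
}

# Apply version: a single call does the same work.
apply_result = lapply(pieces, sum)

# Both approaches produce the same list of sums.
identical(loop_result, apply_result)
```

The loop needs an index variable and a pre-built container, while the `lapply` call only names the data and the function.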

There are several different apply functions. The examples below only cover the use of `lapply`, `sapply`, `apply` and `mapply`.

## 1. `lapply` and `sapply`

The general structure of the `lapply` function is `lapply(X, FUN, ...)`. 

The `lapply` and `sapply` functions:
- run a function (their second argument i.e. FUN) 
- on every element of a list or vector (their first argument i.e. X)


The `sapply` function returns a vector if possible and has the general structure 
- `sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)`

Two lists are created for the examples below.

In [7]:
%r
list_of_vectors         = list(1:5, c("a", "b", "c"), c(TRUE, FALSE))
list_of_numeric_vectors = list(1:3, 1:10)

In [8]:
%r
list_of_vectors

In [9]:
%r
list_of_numeric_vectors

Recall the `length` and `sum` functions.

In [11]:
%r
length(1:5)

In [12]:
%r
sum(1:5)

In [13]:
%r
length(c("a", "b", "c"))

Now we apply the `length` function to every element of the list `list_of_vectors`. Each element is a vector.

In [15]:
%r
lapply(list_of_vectors, length)

Compare this to the elements of `list_of_vectors`.

In [17]:
%r
list_of_vectors

Now we apply the `sum` function to every element of the list `list_of_numeric_vectors`. Each element is a numeric vector.

In [19]:
%r
list_of_numeric_vectors

In [20]:
%r
lapply(list_of_numeric_vectors, sum)

Compare this to the elements of `list_of_numeric_vectors` above.

The `lapply` and `sapply` functions are said to _apply_ the function given in the second argument to every element of the list, or vector, given in the first argument.

The only difference between `sapply` and `lapply` is in what they return:
- `lapply` always returns a list
- `sapply` returns a vector if possible

First recall `list_of_vectors` and define a function that retrieves the first element of a vector.

In [24]:
%r
list_of_vectors

Define a function to be applied to each item of a list.

In [26]:
%r
first_element_of_vector = function(item) { 
  item[1]
}

Now apply the function.

In [28]:
%r
lapply(list_of_vectors, first_element_of_vector)

Compare the `lapply` output above with the `sapply` output below.

In [30]:
%r
sapply(list_of_vectors, first_element_of_vector)

R has _coerced_ the values `1` and `TRUE` to character strings, because all elements of a vector must have the same type and the result also contains `"a"`.
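The phrase "a vector if possible" is worth a closer look. As a rough sketch, the example below shows what `sapply` returns when each result has more than one element: a matrix if all results have the same length, and a plain list otherwise. The list `two_vectors` and the result names are hypothetical, used only here.

```r
# Hypothetical input list, used only for this illustration.
two_vectors = list(1:3, 1:10)

# range() always returns two values (min and max), so every result has length 2
# and sapply simplifies them into a matrix with one column per list element.
same_length_results = sapply(two_vectors, range)
is.matrix(same_length_results)

# rev() returns a vector as long as its input, so the results have lengths 3 and 10.
# They cannot be simplified, and sapply falls back to returning a list.
different_length_results = sapply(two_vectors, rev)
is.list(different_length_results)
```

When a predictable return type matters, `lapply` is the safer choice because it always returns a list.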

Lists can be complicated. For instance, the following list contains three dataframes.

In [33]:
%r
list_of_dataframe = list(iris, trees, mtcars)

The following code applies the `dim` (dimension) function to each element of the list (each one a dataframe) and returns a list of dimension vectors.

In [35]:
%r
list_of_dataframe_dim = lapply(list_of_dataframe, dim)
list_of_dataframe_dim

Now apply the _anonymous function_ `function (item) { item[1] }` below to each element of `list_of_dataframe_dim`. 

This anonymous function returns the first element of the input vector.

In [37]:
%r
sapply(list_of_dataframe_dim, function (item) { item[1] })
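The general structures given at the start of this section include `...` and `USE.NAMES`, which the examples so far have not used. The sketch below illustrates both. The data and variable names are hypothetical and serve only this illustration.

```r
# Hypothetical data containing a missing value.
numbers_with_gaps = list(c(1, NA, 3), c(4, 5))

# Arguments placed after FUN (here na.rm = TRUE) are passed on to every call of sum.
totals = sapply(numbers_with_gaps, sum, na.rm = TRUE)
totals

# When the input is a character vector, USE.NAMES = TRUE (the default)
# labels each result with the corresponding input string.
named_lengths = sapply(c("a", "bb", "ccc"), nchar)
names(named_lengths)

# With USE.NAMES = FALSE the result carries no names.
unnamed_lengths = sapply(c("a", "bb", "ccc"), nchar, USE.NAMES = FALSE)
names(unnamed_lengths)
```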

__Exercise:__ Create a single call to `sapply` that returns a vector containing the number of rows in each dataframe/item of the input list.

## 2. `apply`

The `apply` function is used on matrices and dataframes. An example of its use on a dataframe is as follows:

In [41]:
%r
my_df=data.frame(Col_1=c(1:3), 
                 Col_2=c(4:6))
my_df

The general structure of the `apply` function is 
- `apply(X, MARGIN, FUN, ...)` 

When `MARGIN=1`, the function is applied to each row; when `MARGIN=2`, it is applied to each column.

In [43]:
%r
apply(my_df, MARGIN=1, sum)

In [44]:
%r
apply(my_df, MARGIN=2, sum)
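The examples above use a dataframe. As a small sketch of the matrix case, the code below builds a hypothetical matrix `my_matrix` with the same values as `my_df` and applies `sum` along each margin.

```r
# Hypothetical matrix with the same values as my_df.
# matrix() fills column by column, so column 1 is 1:3 and column 2 is 4:6.
my_matrix = matrix(1:6, nrow = 3, ncol = 2)
my_matrix

# MARGIN = 1: one sum per row.
row_totals = apply(my_matrix, MARGIN = 1, sum)
row_totals

# MARGIN = 2: one sum per column.
column_totals = apply(my_matrix, MARGIN = 2, sum)
column_totals
```

For sums specifically, base R also provides `rowSums` and `colSums`, which give the same totals for this matrix.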

## 3. `mapply`

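The `lapply` and `sapply` functions iterate over a single list or vector, so the applied function, such as `length` or `sum`, receives one element per call. 

The `mapply` function iterates over several lists or vectors in parallel, so it can apply a function that takes more than one argument, passing one element from each input per call. 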

The general structure of the `mapply` function is 
- `mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)`

To demonstrate `mapply` we use the `rep` function, which repeats its first argument the number of times specified by its second argument. 

For instance,

In [47]:
%r
rep("hello", 3)

Now we _apply_ the `rep` function to the elements of two vectors:
- the first contains the string to repeat: `c("hello", "goodbye")`
- the second contains the number of times to repeat: `c(2,3)`

In [49]:
%r
list_of_vectors_of_repeated_strings = mapply(rep, 
                                             c("hello", "goodbye"), 
                                             c(      2,         3))
list_of_vectors_of_repeated_strings

Notice that `list_of_vectors_of_repeated_strings` is a list of two elements, each the result of one call to `rep`. It is a list rather than a vector because the two results have different lengths.

In [51]:
%r
str(list_of_vectors_of_repeated_strings)
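The `SIMPLIFY` and `MoreArgs` arguments in the general structure are not used above. The sketch below shows one way they might be used. The anonymous function and the variable names are hypothetical, chosen only for this illustration.

```r
# Each call returns a single number, so with SIMPLIFY = TRUE (the default)
# the results are combined into a vector.
pairwise_sums = mapply(function(x, y) { x + y }, 1:3, 4:6)
pairwise_sums

# With SIMPLIFY = FALSE the result is always a list.
pairwise_sums_list = mapply(function(x, y) { x + y }, 1:3, 4:6, SIMPLIFY = FALSE)
is.list(pairwise_sums_list)

# MoreArgs holds arguments that stay the same for every call.
# Here each string is repeated twice, so both results have length 2
# and are simplified into a matrix with one column per string.
repeated_twice = mapply(rep, c("hello", "goodbye"), MoreArgs = list(times = 2))
repeated_twice
```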

There are several additional apply functions that do things the functions covered here cannot. 

See the link at the top of the page for examples of these functions.

__Exercise__: Use `mapply` to return a list where:
- the first element is the first column of the first dataframe of `list_of_dataframe`
- the second element is the second column of the second dataframe of `list_of_dataframe`
- the third element is the third column of the third dataframe of `list_of_dataframe`

__The End__