# Functional programming

## Why functional programming?

1. Rely on domain knowledge
2. Use variables - to generalize the process
3. Extract out common code

### for loops are like pages in the recipe book

```
out1 <- vector("double", ncol(mtcars))

for (i in seq_along(mtcars)) {
    out1[[i]] <- mean(mtcars[[i]], na.rm = TRUE)
}

out2 <- vector("double", ncol(mtcars))

for (i in seq_along(mtcars)) {
    out2[[i]] <- median(mtcars[[i]], na.rm = TRUE)
}
```
* Emphasizes the objects, pattern of implementation
* Hides actions

### Functional programming is like the meta-recipe

* It allows you to focus on what is different, rather than what is the same.

```
library(purrr)

means   <- map_dbl(mtcars, mean)
medians <- map_dbl(mtcars, median)
```

* Give equal weight to verbs and nouns
* Abstract away the details of implementation

### Using a for loop to remove duplication

In [1]:
df <- data.frame(a = rnorm(10),
                 b = rnorm(10),
                 c = rnorm(10),
                 d = rnorm(10))

# Initialize output vector
output <- vector("double", ncol(df))  

# Fill in the body of the for loop
for (i in seq_along(df)) {            
  output[i] <- median(df[[i]], na.rm = TRUE)
}

# View the result
output

### Turning the for loop into a function

In [6]:
# Turn this code into col_median()
output <- vector("double", ncol(df))  
# for (i in seq_along(df)) {            
#   output[[i]] <- median(df[[i]])      
# }
col_median <- function(df) {
    output <- numeric(length(df))
    for (i in seq_along(df)) {
        output[[i]] <- median(df[[i]])
    }
    return (output)
}

output <- col_median(df)

output

### What about column means?

In [8]:
# Create col_mean() function to find column means
col_mean <- function(df) {
  output <- numeric(length(df))
  for (i in seq_along(df)) {
    output[[i]] <- mean(df[[i]])
  }
  output
}

### What about column standard deviations?

In [9]:
# Define col_sd() function
col_sd <- function(df) {
  output <- numeric(length(df))
  for (i in seq_along(df)) {
    output[[i]] <- sd(df[[i]])
  }
  output
}

### Uh oh... time to write a function again

In [10]:
# Add a second argument called power
# f <- function(x) {
#     # Edit the body to return absolute deviations raised to power
#     abs(x - mean(x))
# }

f <- function(x, power) {
    # Edit the body to return absolute deviations raised to power
    abs(x - mean(x)) ^ power
}

## Functions can be arguments too

### Removing duplication with arguments

```
f1 <- function(x) abs(x - mean(x)) ^ 1
f2 <- function(x) abs(x - mean(x)) ^ 2
f3 <- function(x) abs(x - mean(x)) ^ 3

f <- function(x, power) abs(x - mean(x)) ^ power

col_median <- function(df) {
    output <- numeric(length(df))
    for (i in seq_along(df)) {
        output[[i]] <- median(df[[i]])
    }
    output
}

# Create col_mean() function to find column means
col_mean <- function(df) {
  output <- numeric(length(df))
  for (i in seq_along(df)) {
    output[[i]] <- mean(df[[i]])
  }
  output
}

# Define col_sd() function
col_sd <- function(df) {
  output <- numeric(length(df))
  for (i in seq_along(df)) {
    output[[i]] <- sd(df[[i]])
  }
  output
}
```

* The previous summary functions, col_median, col_mean, col_sd can be replaced by arguments

```
col_summary <- function(df, fun) {
    output <- numeric(length(df))
    for (i in seq_along(df)) {
        output[i] <- fun(df[i])
    }
    output
}

col_summary(df, fun = median)
col_summary(df, fun = mean)
col_summary(df, fun = sd)
```

### Using a function as an argument

In [13]:
# Define col_summary function
col_summary <- function(df, fun) {
    output <- numeric(length(df))
    for (i in seq_along(df)) {
        output[i] <- fun(df[[i]])
    }
    output
}

# Find the column medians using col_median() and col_summary()
col_median(df)
col_summary(df, median)

# Find the column means using col_mean() and col_summary()
col_mean(df)
col_summary(df, mean)

# Find the column IQRs using col_summary()
col_summary(df, IQR)

## Introducing purrr

### Passing functions as arguments

```
sapply(df, mean)

# Has very similar syntax to 
col_summary(df, mean)

# Using purrr
library(purrr)
map_dbl(df, mean)
```

### Every map function works the same way


> map_dbl(.x, .f, ...)

1. Loop over a vector .x
2. Do something to each elevemnt .f
3. Return the results

### The map functions differ in their return type

There's one function for each type of vector:
    - map() returns a list
    - map_dbl() returns a double vector
    - map_lgl() returns a logical vector
    - map_int() returns a integer vector
    - map_chr() returns a character vector
    
### Different types of vector input

> map(.x, .f, ...)

* .x is always a vector

```
df <- data.frame(a = 1:10, b = 11:20)
map(df, mean)
```

* Data frames, iterate over columns

```
l <- list(a = 1:10, b = 11:20)
map(l, mean)
```

* Lists, iterate over elements

```
vec <- c(a = 1, b = 2)
map(vec, mean)
```

* Vectors, iterate over elements

### Advantages of the map functions in purrr

* Handy shorcuts for specifying .f
* More consistent than sapply(), lapply(), which makes them better for programming(Chapter 5)
* Takes much less time to solve iteration problems

### The map functions

In [15]:
# Load the purrr package
library(purrr)

# Use map_dbl() to find column means
map_dbl(df, mean)

# Use map_dbl() to column medians
map_dbl(df, median)

# Use map_dbl() to find column standard deviations
map_dbl(df, sd)

### The ... argument to the map function

In [21]:
library(nycflights13)
# Find the mean of each column
map_dbl(df, mean)

# Find the mean of each column, excluding missing values
map_dbl(df, mean, na.rm = TRUE)

# Find the 5th percentile of each column, excluding missing values
map_dbl(df, quantile, probs = 0.05, na.rm = TRUE)

### Picking the right map function

In [22]:
df3 <- data.frame(A = c(0.82, 0.21, 1.04, -0.85, 0.86,
                       -1.82, 0.77, 1.24, 1.73, 0.91),
                 B = c("A", "B"),
                 C = 1:10,
                 D = c(0.88, -0.65, -0.10, -0.31, -0.31,
                       -0.76, 1.23, -0.63, 0.2, -0.13))

# Find the columns that are numeric
map_lgl(df3, is.numeric)

# Find the type of each column
map_chr(df3, typeof)

# Find a summary of each column
map(df3, summary)

$A
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -1.820   0.350   0.840   0.491   1.008   1.730 

$B
A B 
5 5 

$C
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    3.25    5.50    5.50    7.75   10.00 

$D
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -0.760  -0.550  -0.220  -0.058   0.125   1.230 


## Shortcuts

### Shortcuts for specifying .f

### Specifying .f

> map(df, summary)

An existing function

> map(df, rescale01)

An existing function you defined

> map(df, function(x) sum(is.na(x)))

An anonymous function defined on the fly

> map(df, ~ sum(is.na(.)))

An anonymous function defined using a formula shortcut

### Shortcuts when .f is [[

```
list_of_results <- list(list(a = 1, b = "A"),
                        list(a = 2, b = "C"),
                        list(a = 3, b = "D"))

# Extract elements under a
map_dbl(list_of_results, function(x) x[["a"]]) # An anonymous function

map_dbl(list_of_results, "a") # Shortcut: string subsetting

map_dbl(list_of_results, 1)   # Shortcut: integer subsetting
```

### A list of data frames

```
# Split the data frame mtcars based on the unique values in the cyl column
cyl <- split(mtcars, mtcars["cyl"])

str(cyl)

cyl[[1]]
```

In [24]:
# Split the data frame mtcars based on the unique values in the cyl column
cyl <- split(mtcars, mtcars["cyl"])

str(cyl)

cyl[[1]]
cyl

List of 3
 $ 4:'data.frame':	11 obs. of  11 variables:
  ..$ mpg : num [1:11] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26 30.4 ...
  ..$ cyl : num [1:11] 4 4 4 4 4 4 4 4 4 4 ...
  ..$ disp: num [1:11] 108 146.7 140.8 78.7 75.7 ...
  ..$ hp  : num [1:11] 93 62 95 66 52 65 97 66 91 113 ...
  ..$ drat: num [1:11] 3.85 3.69 3.92 4.08 4.93 4.22 3.7 4.08 4.43 3.77 ...
  ..$ wt  : num [1:11] 2.32 3.19 3.15 2.2 1.61 ...
  ..$ qsec: num [1:11] 18.6 20 22.9 19.5 18.5 ...
  ..$ vs  : num [1:11] 1 1 1 1 1 1 1 1 0 1 ...
  ..$ am  : num [1:11] 1 0 0 1 1 1 0 1 1 1 ...
  ..$ gear: num [1:11] 4 4 4 4 4 4 3 4 5 5 ...
  ..$ carb: num [1:11] 1 2 2 1 2 1 1 1 2 2 ...
 $ 6:'data.frame':	7 obs. of  11 variables:
  ..$ mpg : num [1:7] 21 21 21.4 18.1 19.2 17.8 19.7
  ..$ cyl : num [1:7] 6 6 6 6 6 6 6
  ..$ disp: num [1:7] 160 160 258 225 168 ...
  ..$ hp  : num [1:7] 110 110 110 105 123 123 175
  ..$ drat: num [1:7] 3.9 3.9 3.08 2.76 3.92 3.92 3.62
  ..$ wt  : num [1:7] 2.62 2.88 3.21 3.46 3.44 ...
  ..$ qsec: 

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Fiat 128,32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
Honda Civic,30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
Toyota Corolla,33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
Toyota Corona,21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1
Fiat X1-9,27.3,4,79.0,66,4.08,1.935,18.9,1,1,4,1
Porsche 914-2,26.0,4,120.3,91,4.43,2.14,16.7,0,1,5,2
Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2


Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Fiat 128,32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
Honda Civic,30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
Toyota Corolla,33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
Toyota Corona,21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1
Fiat X1-9,27.3,4,79.0,66,4.08,1.935,18.9,1,1,4,1
Porsche 914-2,26.0,4,120.3,91,4.43,2.14,16.7,0,1,5,2
Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
Merc 280C,17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
Ferrari Dino,19.7,6,145.0,175,3.62,2.77,15.5,0,1,5,6

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 450SE,16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3
Merc 450SL,17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3
Merc 450SLC,15.2,8,275.8,180,3.07,3.78,18.0,0,0,3,3
Cadillac Fleetwood,10.4,8,472.0,205,2.93,5.25,17.98,0,0,3,4
Lincoln Continental,10.4,8,460.0,215,3.0,5.424,17.82,0,0,3,4
Chrysler Imperial,14.7,8,440.0,230,3.23,5.345,17.42,0,0,3,4
Dodge Challenger,15.5,8,318.0,150,2.76,3.52,16.87,0,0,3,2
AMC Javelin,15.2,8,304.0,150,3.15,3.435,17.3,0,0,3,2


### Goal

* Fit regression to each of the data frame in cyl
* Quantify relationship between mpg and wt

### Solve a simple problem first

In [26]:
# Examine the structure of cyl
str(cyl)

# Extract the first element into four_cyls
four_cyls <- cyl[[1]]

# Fit a linear regression of mpg on wt using four_cyls
lm(data = four_cyls, formula = mpg ~ wt)

List of 3
 $ 4:'data.frame':	11 obs. of  11 variables:
  ..$ mpg : num [1:11] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26 30.4 ...
  ..$ cyl : num [1:11] 4 4 4 4 4 4 4 4 4 4 ...
  ..$ disp: num [1:11] 108 146.7 140.8 78.7 75.7 ...
  ..$ hp  : num [1:11] 93 62 95 66 52 65 97 66 91 113 ...
  ..$ drat: num [1:11] 3.85 3.69 3.92 4.08 4.93 4.22 3.7 4.08 4.43 3.77 ...
  ..$ wt  : num [1:11] 2.32 3.19 3.15 2.2 1.61 ...
  ..$ qsec: num [1:11] 18.6 20 22.9 19.5 18.5 ...
  ..$ vs  : num [1:11] 1 1 1 1 1 1 1 1 0 1 ...
  ..$ am  : num [1:11] 1 0 0 1 1 1 0 1 1 1 ...
  ..$ gear: num [1:11] 4 4 4 4 4 4 3 4 5 5 ...
  ..$ carb: num [1:11] 1 2 2 1 2 1 1 1 2 2 ...
 $ 6:'data.frame':	7 obs. of  11 variables:
  ..$ mpg : num [1:7] 21 21 21.4 18.1 19.2 17.8 19.7
  ..$ cyl : num [1:7] 6 6 6 6 6 6 6
  ..$ disp: num [1:7] 160 160 258 225 168 ...
  ..$ hp  : num [1:7] 110 110 110 105 123 123 175
  ..$ drat: num [1:7] 3.9 3.9 3.08 2.76 3.92 3.92 3.62
  ..$ wt  : num [1:7] 2.62 2.88 3.21 3.46 3.44 ...
  ..$ qsec: 


Call:
lm(formula = mpg ~ wt, data = four_cyls)

Coefficients:
(Intercept)           wt  
     39.571       -5.647  


### Using an anonymous function

In [27]:
# Rewrite to call an anonymous function
# map(cyl, reg_fit)
map(cyl, function(df) lm(mpg ~ wt, data = df))

$`4`

Call:
lm(formula = mpg ~ wt, data = df)

Coefficients:
(Intercept)           wt  
     39.571       -5.647  


$`6`

Call:
lm(formula = mpg ~ wt, data = df)

Coefficients:
(Intercept)           wt  
      28.41        -2.78  


$`8`

Call:
lm(formula = mpg ~ wt, data = df)

Coefficients:
(Intercept)           wt  
     23.868       -2.192  



### Using a formula

In [28]:
# Rewrite to use the formula shortcut instead
# map(cyl, function(df) lm(mpg ~ wt, data = df))
map(cyl, ~ lm(mpg ~ wt, .))

$`4`

Call:
lm(formula = mpg ~ wt, data = .)

Coefficients:
(Intercept)           wt  
     39.571       -5.647  


$`6`

Call:
lm(formula = mpg ~ wt, data = .)

Coefficients:
(Intercept)           wt  
      28.41        -2.78  


$`8`

Call:
lm(formula = mpg ~ wt, data = .)

Coefficients:
(Intercept)           wt  
     23.868       -2.192  



### Using a string

In [29]:
# Save the result from the previous exercise to the variable models
models <- map(cyl, ~ lm(mpg ~ wt, data = .))

# Use map and coef to get the coefficients for each model: coefs
coefs <- map(models, coef)

# Use string shortcut to extract the wt coefficient 
map(coefs, "wt")

### Using a numeric vector

In [30]:
coefs <- map(models, coef)

# use map_dbl with the numeric shortcut to pull out the second element
map_dbl(coefs, 2)

### Putting it together with pipes

In [44]:
# Define models (don't change)
models <- mtcars %>% 
  split(mtcars$cyl) %>%
  map(~ lm(mpg ~ wt, data = .))

# Rewrite to be a single command using pipes 
# summaries <- map(models, summary)
# map_dbl(summaries, "r.squared")
summaries <- models %>%
                 map(summary) %>%
                 map_dbl("r.squared")