# Worksheet B-3: Nesting, List Columns, and `purrr`


By the end of this topic, students should be able to:

- Use the `map` family of functions from the purrr package to iteratively apply a function.
- Create and operate on list columns in a tibble using `nest()`, `unnest()`, and the `map` family of functions.
- Define functions on-the-fly within a `map` function using shortcuts.
- Apply list columns to cases in data analysis: columns of models, columns of nested lists (JSON-style data), and operating on entire groups within a tibble.

Load the worksheet requirements:

In [55]:
suppressPackageStartupMessages(library(testthat))
suppressPackageStartupMessages(library(digest))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(palmerpenguins))
suppressPackageStartupMessages(library(glue))
suppressPackageStartupMessages(library(gapminder))
suppressPackageStartupMessages(library(broom))
suppressPackageStartupMessages(library(distplyr)) # install with devtools::install_github('vincenzocoia/distplyr')
suppressPackageStartupMessages(library(repurrrsive))

The following code chunk has been unlocked, to give you the flexibility to start this document with some of your own code. Remember, it's bad manners to keep a call to `install.packages()` in your source code, so don't forget to delete these lines if you ever need to run them. 

Most likely you will need to install `devtools` in order to install `distplyr`.

# Part 1: Exploring `purrr` Fundamentals

The `purrr` package is also part of the `tidyverse`.

Apply a function to each element in a list/vector with `map`.

General usage: `purrr::map(VECTOR_OR_LIST, YOUR_FUNCTION)`

Note:

- `map` always returns a list.
- `YOUR_FUNCTION` can return anything!

There are many variations of `map_*`, which you can find in this [cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/purrr.pdf).
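As a small illustration of how the return type changes across the `map` family, the sketch below applies simple functions to a made-up character vector `words` (not part of the worksheet data).

```r
library(purrr)

# Hypothetical input, used only for illustration
words <- c("nest", "unnest", "map")

as_list <- map(words, nchar)        # always a list, one element per input
as_int  <- map_int(words, nchar)    # an integer vector
as_chr  <- map_chr(words, toupper)  # a character vector

str(as_list)
as_int
as_chr
```

The typed variants fail loudly if the function returns something of the wrong type or length, which is often what you want.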

For the next few tasks, you will be converting for-loop(s) to vectorized expressions that reproduce the output (numbers should be the same, the format can be different).

## QUESTION 1

Without using vectorization, take the square root of the following vector:

In [4]:
x <- 1:10

The result should be a list of the calculations. Store your answer in `answer1`.

```r
answer1 <- map(FILL_THIS_IN, FILL_THIS_IN)
```

In [5]:
answer1 <- map(x, sqrt)
answer1

In [6]:
test_that('Question 1', {
    expect_known_hash(mode(answer1), '086ebc4c59c08c43e75bae74f1e16897')
    expect_known_hash(round(unlist(answer1), 4), 'ad16817e39d61cdf2ce38234f61306de')
})

Test passed


## QUESTION 2 

In Question 1, we used the generic `map` function, and got a list. Let's use a more specific `map_*` function this time.

Again without using vectorization, square each component of `x`. The result should be a numeric vector. Store your answer in `answer2`:

```r
answer2 <- map_dbl(FILL_THIS_IN, FILL_THIS_IN)
```

_Hint:_ The last `FILL_THIS_IN` corresponds to an anonymous function!

In [7]:
answer2 <- map_dbl(x, function(x) x^2)
print(answer2)

 [1]   1   4   9  16  25  36  49  64  81 100


In [8]:
test_that('Question 2', {
    expect_known_hash(mode(answer2), '46606ee201b428a3fa6c8a0d3d9e671c')
    expect_known_hash(round(unlist(answer2), 4), '84a2193460cb35ff884e4c3144abf122')
})

Test passed


We've now used both `map` and the more specific `map_dbl`: `map` always returns a list, whereas `map_dbl` returns a numeric vector, which is the better fit when every result is a single number. Now it's your turn to choose! Remember to use the [cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/purrr.pdf) if you need it.
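The anonymous function from Question 2 can be written in several equivalent ways. The sketch below, on a made-up vector `v`, compares a full `function(x)` definition, the `purrr` formula shortcut `~ .x`, and the base R `\(x)` shorthand (which assumes R 4.1 or later).

```r
library(purrr)

v <- c(2, 4, 6)  # hypothetical input

full    <- map_dbl(v, function(x) x / 2)  # explicit anonymous function
formula <- map_dbl(v, ~ .x / 2)           # purrr formula shortcut; .x is the current element
lambda  <- map_dbl(v, \(x) x / 2)         # base R shorthand (R >= 4.1)

identical(full, formula) && identical(formula, lambda)
```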

## QUESTION 3

Below is sample code that computes the mean of every column in the `mtcars` dataset. Use the appropriate `purrr` function to vectorize this task.

In [9]:
mtcars_means <- numeric()
for (c in seq_along(mtcars)){
  mtcars_means[[c]] <- mean(mtcars[[c]])
}
mtcars_means

Store your answer in `answer3`; as above, your answer should be a vector. _Hint_: note that a tibble / data frame is just a list, where each entry is a column (a vector).

```r
answer3 <- FILL_THIS_IN(datasets::mtcars, FILL_THIS_IN)
```
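To see why the hint above holds, here is a minimal sketch on a made-up data frame `df`: because a data frame is a list of columns, `map_*` functions iterate over it column by column.

```r
library(purrr)

# A small made-up data frame
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

is.list(df)                      # a data frame is a list of columns
length(df)                       # the number of columns, not rows
col_classes <- map_chr(df, class)  # one result per column, named by column
col_classes
```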

In [10]:
answer3 <- map_dbl(datasets::mtcars, mean)
print(answer3)

       mpg        cyl       disp         hp       drat         wt       qsec 
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
        vs         am       gear       carb 
  0.437500   0.406250   3.687500   2.812500 


In [11]:
test_that('Question 3', {
    expect_known_hash(floor(unname(answer3)), '9a69e180a47954630685d24f403fe3af')
})

Test passed


## QUESTION 4

Below is sample code that divides the values in each column of the `mtcars` dataset by the maximum in that column. Underneath it is a vectorized method using `purrr`, returning a list, but we want a data frame instead.

In [13]:
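for (i in seq_along(mtcars)){
  # Overwrite column i (not a stale index from an earlier loop) with its scaled values
  mtcars[[i]] <- mtcars[[i]] / max(mtcars[[i]], na.rm = TRUE)
}
head(mtcars)

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 0.619469 | 0.75 | 0.3389831 | 0.3283582 | 0.7910751 | 0.4830383 | 0.7187773 | 0 | 1 | 0.8 | 0.5 |
| Mazda RX4 Wag | 0.619469 | 0.75 | 0.3389831 | 0.3283582 | 0.7910751 | 0.5300516 | 0.7432314 | 0 | 1 | 0.8 | 0.5 |
| Datsun 710 | 0.6725664 | 0.5 | 0.2288136 | 0.2776119 | 0.7809331 | 0.4277286 | 0.8126638 | 1 | 1 | 0.8 | 0.125 |
| Hornet 4 Drive | 0.6312684 | 0.75 | 0.5466102 | 0.3283582 | 0.6247465 | 0.592736 | 0.8489083 | 1 | 0 | 0.6 | 0.125 |
| Hornet Sportabout | 0.5516224 | 1.0 | 0.7627119 | 0.5223881 | 0.6389452 | 0.6342183 | 0.7432314 | 0 | 0 | 0.6 | 0.25 |
| Valiant | 0.5339233 | 0.75 | 0.4766949 | 0.3134328 | 0.5598377 | 0.6379056 | 0.8829694 | 1 | 0 | 0.6 | 0.125 |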


In [14]:
map(mtcars, ~ .x / max(.x)) %>% 
    print(max = 5)

$mpg
[1] 0.6194690 0.6194690 0.6725664 0.6312684 0.5516224
 [ reached getOption("max.print") -- omitted 27 entries ]

$cyl
[1] 0.75 0.75 0.50 0.75 1.00
 [ reached getOption("max.print") -- omitted 27 entries ]

$disp
[1] 0.3389831 0.3389831 0.2288136 0.5466102 0.7627119
 [ reached getOption("max.print") -- omitted 27 entries ]

$hp
[1] 0.3283582 0.3283582 0.2776119 0.3283582 0.5223881
 [ reached getOption("max.print") -- omitted 27 entries ]

$drat
[1] 0.7910751 0.7910751 0.7809331 0.6247465 0.6389452
 [ reached getOption("max.print") -- omitted 27 entries ]

 [ reached getOption("max.print") -- omitted 6 entries ]


Find a way to do this in a _`purrr`-like_ style using only `dplyr` functions, so that the result is a data frame. Store your answer in `answer4`:

```r
answer4 <- datasets::mtcars %>%
  mutate(FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN))
```

_Hint_: The last `FILL_THIS_IN` corresponds to an anonymous function.
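As a related sketch (on a made-up tibble `df`, not the worksheet data), `across()` inside `mutate()` applies one function to a selection of columns and keeps the result as a data frame. Here `where(is.numeric)` restricts the operation to numeric columns, and each one is centred on its mean.

```r
library(dplyr)

# Hypothetical data: one character column and two numeric columns
df <- tibble(id = c("a", "b"), x = c(1, 3), y = c(10, 30))

centred <- df %>%
  mutate(across(where(is.numeric), ~ .x - mean(.x)))  # .x is a whole column
centred
```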

In [15]:
answer4 <- datasets::mtcars %>%
  mutate(across(everything(), function(x) x / max(x)))
head(answer4)

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 0.619469 | 0.75 | 0.3389831 | 0.3283582 | 0.7910751 | 0.4830383 | 0.7187773 | 0 | 1 | 0.8 | 0.5 |
| Mazda RX4 Wag | 0.619469 | 0.75 | 0.3389831 | 0.3283582 | 0.7910751 | 0.5300516 | 0.7432314 | 0 | 1 | 0.8 | 0.5 |
| Datsun 710 | 0.6725664 | 0.5 | 0.2288136 | 0.2776119 | 0.7809331 | 0.4277286 | 0.8126638 | 1 | 1 | 0.8 | 0.125 |
| Hornet 4 Drive | 0.6312684 | 0.75 | 0.5466102 | 0.3283582 | 0.6247465 | 0.592736 | 0.8489083 | 1 | 0 | 0.6 | 0.125 |
| Hornet Sportabout | 0.5516224 | 1.0 | 0.7627119 | 0.5223881 | 0.6389452 | 0.6342183 | 0.7432314 | 0 | 0 | 0.6 | 0.25 |
| Valiant | 0.5339233 | 0.75 | 0.4766949 | 0.3134328 | 0.5598377 | 0.6379056 | 0.8829694 | 1 | 0 | 0.6 | 0.125 |


In [16]:
test_that('Question 4', {
    expect_known_hash(class(answer4), '555434c8748e07b094500256087cdcc5')
    expect_known_hash(dimnames(answer4), '3a51b37e4731153f63a1f5f9dc188269')
    expect_known_hash(round(answer4$mpg, 3), 'af82f570a0aa02d8abcbbd14386e98b0')
})

Test passed


## QUESTION 5

Below is sample code that computes the number of unique values in each column of `mtcars` as a named vector, using for-loops. Use the appropriate `purrr` function to vectorize this task.

In [17]:
mtcars_unique <- numeric()
for (c in seq_along(datasets::mtcars)){
  mtcars_unique[[c]] <- length(unique(datasets::mtcars[[c]]))
}
names(mtcars_unique) <- names(datasets::mtcars)
mtcars_unique

Store your answer in `answer5`:

```r
answer5 <- datasets::mtcars %>% 
    FILL_THIS_IN(FILL_THIS_IN) %>% 
    FILL_THIS_IN(FILL_THIS_IN)
```

In [18]:
answer5 <- datasets::mtcars %>% 
    map(unique) %>% 
    map_dbl(length)
print(answer5)

 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
  25    3   27   22   22   29   30    2    2    3    6 


In [19]:
test_that('Question 5', {
    expect_known_hash(mode(answer5), '46606ee201b428a3fa6c8a0d3d9e671c')
    expect_known_hash(as.integer(answer5), '1981b33e1151073e1c227fe95218c6f5')
})

Test passed


## QUESTION 6

Let's use `purrr` to make probability distributions. The Generalized Pareto Distribution (GPD) is a three-parameter distribution, so to make several of them we need a `purrr` function that can plug in all three parameters at once. To construct each GPD, we can use the function `dst_gpd()` from the `distplyr` R package. Here are the parameters of our 5 GPDs:

In [20]:
(parameters <- tibble(location = c(105, 99, 120, 119, 111),
                      scale = c(12.2, 13.5, 18.5, 9.2, 15.5),
                      shape = c(0.4, 0.9, 0.5, 0.6, 0.4)))

| location | scale | shape |
|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` |
| 105 | 12.2 | 0.4 |
| 99 | 13.5 | 0.9 |
| 120 | 18.5 | 0.5 |
| 119 | 9.2 | 0.6 |
| 111 | 15.5 | 0.4 |


Store your answer in `answer6`. It should be a list of distributions.

```r
answer6 <- FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN)
```
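When a function takes more than two arguments, the `pmap` family iterates over the rows of a data frame (or a list of equal-length vectors) and matches columns to arguments by name. A minimal sketch, where `describe()` and the `inputs` tibble are hypothetical and unrelated to `distplyr`:

```r
library(purrr)
library(tibble)

# Hypothetical three-argument function; argument names match the columns below
describe <- function(name, n, unit) paste(n, unit, "of", name)

inputs <- tibble(name = c("rice", "milk"),
                 n    = c(2, 1),
                 unit = c("cups", "litre"))

# pmap_chr() calls describe() once per row, matching columns to arguments by name
described <- pmap_chr(inputs, describe)
described
```

Because matching is by name, the column order in `inputs` does not need to follow the argument order of `describe()`.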

In [21]:
answer6 <- pmap(parameters, dst_gpd)
print(answer6)


[[1]]
gpd parametric dst

 name :
[1] "gpd"

[[2]]
gpd parametric dst

 name :
[1] "gpd"

[[3]]
gpd parametric dst

 name :
[1] "gpd"

[[4]]
gpd parametric dst

 name :
[1] "gpd"

[[5]]
gpd parametric dst

 name :
[1] "gpd"



In [22]:
test_that('Question 6', {
    walk(answer6, ~ expect_equal(class(.x), c("gpd", "parametric", "dst")))
    tibble(p = map(answer6, distionary::parameters)) %>% 
        unnest_wider(p) %>% 
        expect_equal(parameters)
})

Test passed


## Question 7

Introducing the Big Bang `!!!` and `rlang::exec()`.

Let's make a mixture distribution of the above 5 GPD's using the function `distplyr::mix()`. The straightforward way to do this would be to do:

In [23]:
distplyr::mix(answer6[[1]], answer6[[2]], answer6[[3]], answer6[[4]], answer6[[5]])

Mixture Distribution

Components: 
# A tibble: 5 x 2
  distributions probs
  <named list>  <dbl>
1 <gpd>           0.2
2 <gpd>           0.2
3 <gpd>           0.2
4 <gpd>           0.2
5 <gpd>           0.2

But this code is error-prone and not robust against a differing number of distributions. Inputting the list itself via `distplyr::mix(answer6)` throws an error, because `distplyr::mix()` is not expecting a list input. What's the alternative?

Your task: use the big bang operator (`!!!`) in front of the list argument to get the desired result. This effectively takes the elements of a list and passes them as separate arguments to a function.

```
answer7 <- distplyr::mix(FILL_THIS_IN)
```

__FYI__: Conveniently, `distplyr::mix()` recognizes the big bang operator. If you have a function that doesn't recognize it (like `sum()`), use `rlang::exec(function, !!!list_of_arguments)` instead.
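A minimal sketch of the `rlang::exec()` route with `sum()`, using a made-up list `numbers`:

```r
library(rlang)

numbers <- list(1, 2, 3)  # hypothetical list of arguments

# sum(numbers) would fail, because sum() cannot add up a list directly.
# exec() splices the list elements in as separate arguments, like sum(1, 2, 3).
total <- exec(sum, !!!numbers)
total
```

Further arguments can follow the spliced list, for example `exec(sum, !!!numbers, na.rm = TRUE)`.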

In [24]:
answer7 <- distplyr::mix(!!!answer6)
print(answer7)

Mixture Distribution

Components: 
# A tibble: 5 x 2
  distributions probs
  <named list>  <dbl>
1 <gpd>           0.2
2 <gpd>           0.2
3 <gpd>           0.2
4 <gpd>           0.2
5 <gpd>           0.2


In [25]:
test_that('Question 7', {
    expect_equal(class(answer7), c("mix", "dst"))
    map_chr(answer7$components$distributions, ~ class(.x)[1]) %>%
        unique() %>% 
        expect_equal("gpd")
})

Test passed


# Part 2: Nesting and List Columns

_One_ of the ways a list-column can be made is by using `nest()`.
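As a minimal sketch of what `nest()` does, consider a small made-up tibble `scores`: the selected columns are bundled into a list column, leaving one row per combination of the remaining columns.

```r
library(tidyr)
library(tibble)

# Small made-up tibble
scores <- tibble(group = c("a", "a", "b"),
                 x     = c(1, 2, 3),
                 y     = c(10, 20, 30))

# Columns x and y are bundled into a list column named `data`
nested <- nest(scores, data = c(x, y))
nested

# Each entry of the list column is itself a tibble
nested$data[[1]]
```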

## QUESTION 8

Create a tibble that bundles everything in `gapminder` except for `country` and `continent` into a list-column. Name your list column `other` (without using `rename()` or `mutate()`). Store your answer in `answer8`.

```r
answer8 <- gapminder %>%
   nest(FILL_THIS_IN = FILL_THIS_IN)
```

In [26]:
answer8 <- gapminder %>%
   nest(other = c(-country, -continent))
head(answer8, n = 3)

| country | continent | other |
|---|---|---|
| `<fct>` | `<fct>` | `<list>` |
| Afghanistan | Asia | `<tibble [12 x 4]>` |
| Albania | Europe | `<tibble [12 x 4]>` |
| Algeria | Africa | `<tibble [12 x 4]>` |


In [27]:
test_that('Question 8', {
    expect_known_hash(enc2utf8(sapply(answer8$other, colnames)), 'ceba7fd58def34a537b5b13430a7ec2a')
    expect_known_hash(sapply(answer8$other, dim), '388a8eae98b3cb184d3fe8ed8dd46916')
    expect_known_hash(sapply(answer8$other, `[[`, 'year'), '0370844f5c0d097891d284949811883e')
})

Test passed


## QUESTION 9

_Reproducibly_ sample 5 countries in the `gapminder` tibble at random. Store your answer in `answer9`, and set the seed as 123.

```r
FILL_THIS_IN(123)
answer9 <- gapminder %>%
    nest(FILL_THIS_IN = FILL_THIS_IN) %>% 
    sample_n(5) %>% 
    unnest(cols = FILL_THIS_IN)
```
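Reproducibility here comes from seeding R's random number generator. A minimal sketch with base R's `sample()` (the vector `1:100` is made up for illustration): resetting the same seed before each draw yields the same sample.

```r
set.seed(123)
first <- sample(1:100, 5)

set.seed(123)               # same seed, so the generator restarts from the same state
second <- sample(1:100, 5)

identical(first, second)
```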

In [28]:
set.seed(123)
answer9 <- gapminder %>%
    nest(other = !country) %>% 
    sample_n(5) %>% 
    unnest(cols = other)
head(answer9)

| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| `<fct>` | `<fct>` | `<int>` | `<dbl>` | `<int>` | `<dbl>` |
| Botswana | Africa | 1952 | 47.622 | 442308 | 851.2411 |
| Botswana | Africa | 1957 | 49.618 | 474639 | 918.2325 |
| Botswana | Africa | 1962 | 51.52 | 512764 | 983.654 |
| Botswana | Africa | 1967 | 53.298 | 553541 | 1214.7093 |
| Botswana | Africa | 1972 | 56.024 | 619351 | 2263.6111 |
| Botswana | Africa | 1977 | 59.319 | 781472 | 3214.8578 |


In [29]:
test_that('Question 9', {
    expect_known_hash(sort(enc2utf8(as.character(answer9$country)), method = 'radix'), 'ca060e5983d51a09aeb24c2393462353')
    expect_known_hash(round(answer9$gdpPercap[order(enc2utf8(as.character(answer9$country)), method = 'radix')], 3), '75d94ea77a21140b54a374ed59f2253a')
})

Test passed


## QUESTION 10

For each `gapminder` continent, fit a linear model of `lifeExp` from `log(gdpPercap)` and put this as a new column. Store your answer into `answer10`:

```r
answer10 <- gapminder %>% 
  select(continent, gdpPercap, lifeExp) %>% 
  nest(data = c(FILL_THIS_IN, FILL_THIS_IN)) %>% 
  mutate(model = FILL_THIS_IN(data, ~ lm(FILL_THIS_IN ~ FILL_THIS_IN, data = FILL_THIS_IN)))
```
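The same nest-then-map pattern can be sketched on the built-in `mtcars` data instead of `gapminder`. This is an illustration rather than the worksheet answer: it fits `mpg ~ wt` within each `cyl` group and adds an extra `map_dbl()` step that pulls the slope out of each model.

```r
library(dplyr)
library(tidyr)
library(purrr)

by_cyl <- datasets::mtcars %>%
  select(cyl, mpg, wt) %>%
  nest(data = c(mpg, wt)) %>%                            # one row per cyl value
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)),   # list column of lm objects
         slope = map_dbl(model, ~ coef(.x)[["wt"]]))     # one number per model

by_cyl
```

Keeping the models in a list column means every later summary stays aligned with its group.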

In [30]:
answer10 <- gapminder %>% 
  select(continent, gdpPercap, lifeExp) %>% 
  nest(data = c(lifeExp, gdpPercap)) %>% 
  mutate(model = map(data, ~ lm(lifeExp ~ log(gdpPercap), data = .x)))
print(answer10)

# A tibble: 5 x 3
  continent data               model 
  <fct>     <list>             <list>
1 Asia      <tibble [396 x 2]> <lm>  
2 Europe    <tibble [360 x 2]> <lm>  
3 Africa    <tibble [624 x 2]> <lm>  
4 Americas  <tibble [300 x 2]> <lm>  
5 Oceania   <tibble [24 x 2]>  <lm>  


In [31]:
test_that('Question 10', {
    expect_known_hash(sapply(answer10$model, class), '2fe5bf6c6fb725f272c801e5f7560afe')
    expect_known_hash(round(unlist(lapply(answer10$model, coef)), 3), 'e536d3378586d3c54b920504b3238cde')
})

Test passed


## QUESTION 11

Using your models from Question 10, make predictions using `augment()` from the `broom` package, and then `unnest()` the result. Store your answer in `answer11`:

```r
answer11 <- answer10 %>% 
  transmute(continent, yhat = map(FILL_THIS_IN, FILL_THIS_IN)) %>% 
  unnest(FILL_THIS_IN)
```

In [32]:
answer11 <- answer10 %>% 
  transmute(continent, yhat = map(model, broom::augment)) %>% 
  unnest(yhat)
print(answer11)

# A tibble: 1,704 x 8
   continent lifeExp `log(gdpPercap)` .fitted    .hat .sigma .cooksd .std.resid
   <fct>       <dbl>            <dbl>   <dbl>   <dbl>  <dbl>   <dbl>      <dbl>
 1 Asia         28.8             6.66    51.2 0.00548   8.54 0.0188      -2.61 
 2 Asia         30.3             6.71    51.6 0.00527   8.55 0.0162      -2.47 
 3 Asia         32.0             6.75    51.8 0.00511   8.56 0.0137      -2.31 
 4 Asia         34.0             6.73    51.7 0.00519   8.57 0.0110      -2.06 
 5 Asia         36.1             6.61    50.9 0.

In [33]:
test_that('Question 11', {
    expect_known_hash(dimnames(answer11), '2a26d04409e37c9a7c8065dba02d3b3a')
    expect_known_hash(round(with(answer11, .fitted[order(lifeExp)]), 3), 'f8db1a712fe6b69f7577c88882248dc5')
    expect_known_hash(round(with(answer11, .sigma[order(lifeExp)]), 3), '83e927e2f02fd18c293c3a1ddb7a0ed2')
})

Test passed


## Question 12: `map2`

Using the `palmerpenguins::penguins` tibble: 

1. Make a Normal distribution using `distplyr::dst_norm()` for the body mass of each species, using estimates taken from `mean()` and `var()`. Title the column "distribution".
2. Calculate the 0.975-quantile from each distribution using `distplyr::eval_quantile()`, under a column named "quantile".

Starter code:

```
answer12  <- penguins %>% 
  group_by(species) %>% 
  summarise(mean = mean(body_mass_g, na.rm = TRUE),
            var  = var(body_mass_g, na.rm = TRUE)) %>% 
  FILL_THIS_IN
```
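`map2()` and its typed variants iterate over two inputs in parallel. As a minimal sketch that deliberately avoids `distplyr`, the example below uses base R's `qnorm()` on made-up vectors `means` and `sds`. Note that `qnorm()` is parameterised by the standard deviation, which is an assumption of this sketch and differs from the variance computed in the starter code.

```r
library(purrr)

# Hypothetical paired inputs
means <- c(0, 10)
sds   <- c(1, 2)

# .x comes from `means` and .y from `sds`, one pair at a time
upper <- map2_dbl(means, sds, ~ qnorm(0.975, mean = .x, sd = .y))
upper
```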

In [34]:
answer12  <- penguins %>% 
  group_by(species) %>% 
  summarise(mean = mean(body_mass_g, na.rm = TRUE),
            var  = var(body_mass_g, na.rm = TRUE)) %>% 
  mutate(distribution = map2(mean, var, ~ dst_norm(.x, .y)), 
        quantile = map_dbl(distribution, ~eval_quantile(.x, at = 0.975)))
print(answer12)

# A tibble: 3 x 5
  species    mean     var distribution quantile
  <fct>     <dbl>   <dbl> <list>          <dbl>
1 Adelie    3701. 210283. <norm>          4599.
2 Chinstrap 3733. 147713. <norm>          4486.
3 Gentoo    5076. 254133. <norm>          6064.


In [35]:
test_that('Question 12', {
    answer12 %>% 
        pull(distribution) %>% 
        map(class) %>% 
        map_chr(1) %>% 
        unique() %>% 
        expect_equal("norm")
    answer12 %>% 
        pull(quantile) %>% 
        round(4) %>% 
        expect_known_hash("020c1c78b457b2b18a95b9417ae90e67")
    expect_true("species" %in% names(answer12))
})

Test passed


## Question 13: `unnest()`

`unnest()` need not always be paired with `nest()`. For the above distributions, evaluate the 0.25, 0.50, and 0.75 quantiles using the `distplyr::enframe_quantile()` function.

Starter code:

```
answer13 <- answer12 %>% 
  mutate(quantile = FILL_THIS_IN) %>% 
  unnest(FILL_THIS_IN)
```
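To illustrate `unnest()` on a list column that did not come from `nest()`, the sketch below builds one with `map()` on a made-up tibble `sizes`, where each entry is a small tibble with a different number of rows.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical tibble; the list column is built with map(), not nest()
sizes <- tibble(n = c(1, 3)) %>%
  mutate(values = map(n, ~ tibble(index = seq_len(.x), square = index^2)))

# unnest() expands each inner tibble into rows, repeating `n` as needed
expanded <- unnest(sizes, values)
expanded
```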

In [36]:
answer13 <- answer12 %>% 
  mutate(quantile = map(distribution, enframe_quantile, at = c(0.25, 0.5, 0.75))) %>% 
  unnest(quantile)
print(answer13)

# A tibble: 9 x 6
  species    mean     var distribution  .arg quantile
  <fct>     <dbl>   <dbl> <list>       <dbl>    <dbl>
1 Adelie    3701. 210283. <norm>        0.25    3391.
2 Adelie    3701. 210283. <norm>        0.5     3701.
3 Adelie    3701. 210283. <norm>        0.75    4010.
4 Chinstrap 3733. 147713. <norm>        0.25    3474.
5 Chinstrap 3733. 147713. <norm>        0.5     3733.
6 Chinstrap 3733. 147713. <norm>        0.75    3992.
7 Gentoo    5076.

In [37]:
test_that('Question 13', {
    answer13 %>% 
        pull(distribution) %>% 
        map(class) %>% 
        map_chr(1) %>% 
        unique() %>% 
        expect_equal("norm")
    answer13 %>% 
        pull(quantile) %>% 
        round(4) %>% 
        expect_known_hash("0808f88ec09d4a111bccb0ccc684cbbe")
    expect_true("species" %in% names(answer13))
})

Test passed


## Question 14

Output a list of gapminder tibbles, one for each continent. Do not include the `continent` column in the divided tibbles -- the name of each list entry should be the continent name. 

_Hint_: Check out the `enframe()` and `deframe()` functions. 
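As a minimal sketch of the two functions in the hint, on a made-up named vector `heights`: `enframe()` turns a named vector or list into a two-column tibble, and `deframe()` goes the other way, using the first column as names.

```r
library(tibble)

# Hypothetical named vector
heights <- c(ann = 160, bob = 175)

tbl <- enframe(heights)  # named vector -> tibble with columns `name` and `value`
tbl

back <- deframe(tbl)     # two-column tibble -> named vector
back
```

When the second column is a list column, `deframe()` returns a named list instead of a named vector.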

Starter code:

```
answer14 <- gapminder %>% 
    nest(FILL_THIS_IN) %>% 
    FILL_THIS_IN()
```


In [38]:
answer14 <- gapminder %>% 
    nest(data = !continent) %>% 
    deframe()
print(answer14)

$Asia
# A tibble: 396 x 5
   country      year lifeExp      pop gdpPercap
   <fct>       <int>   <dbl>    <int>     <dbl>
 1 Afghanistan  1952    28.8  8425333      779.
 2 Afghanistan  1957    30.3  9240934      821.
 3 Afghanistan  1962    32.0 10267083      853.
 4 Afghanistan  1967    34.0 11537966      836.
 5 Afghanistan  1972    36.1 13079460      740.
 6 Afghanistan  1977    38.4 14880372      786.
 7 Afghanistan  1982    39.9 12881816      978.
 8 Afghanistan  1987    40.8 13867957      852.
 9 Afghanis

In [39]:
test_that('Question 14', {
    expect_known_hash(names(answer14), '90da4aa25e5abc752edec3d524ea2677')
    map(answer14, pull, gdpPercap) %>% 
        unlist() %>% 
        unname() %>% 
        round(4) %>% 
        expect_known_hash("a621bfc9dba8da1f02e4dc19fa4083f6")
})

Test passed


## Question 15 

Sometimes the vector/list we're iterating over has names, and it's useful to use those names. To access these names, use the `imap` family.

For the list of tibbles made in the above question, save each one to file using the appropriate purrr function, using the names as the file names.
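As a minimal sketch of the `imap` family on a made-up named list `counts`: inside the formula shortcut, `.x` is the element and `.y` is its name.

```r
library(purrr)

# Hypothetical named list
counts <- list(apples = 3, pears = 5)

# imap_chr() returns a character vector, keeping the input names
labels <- imap_chr(counts, ~ paste0(.y, ": ", .x))
labels

# iwalk() is the same idea, but is called for side effects
# and returns its input invisibly
iwalk(counts, ~ message(.y, " has ", .x))
```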

Starter code:

```
answer15 <- FILL_THIS_IN(answer14, ~ write_csv(FILL_THIS_IN, glue::glue(FILL_THIS_IN, ".csv")))
```

In [41]:
answer15 <- iwalk(answer14, ~ write_csv(.x, glue::glue(.y, ".csv")))
dir()

In [42]:
test_that('Question 15', {
    expect_true("Africa.csv" %in% dir())
    expect_true("Americas.csv" %in% dir())
    expect_true("Asia.csv" %in% dir())
    expect_true("Europe.csv" %in% dir())
    expect_true("Oceania.csv" %in% dir())
})

Test passed


# Part 3: Recursive Lists

We won't focus much on recursive lists in this course, but here is a little taste of it.

## Question 16: `unnest_wider()` and `unnest_longer()`

Explore the `repurrrsive::got_chars` nested list. It contains information about Game of Thrones characters.

In [46]:
str(got_chars, list.len = 4)

List of 30
 $ :List of 18
  ..$ url        : chr "https://www.anapioficeandfire.com/api/characters/1022"
  ..$ id         : int 1022
  ..$ name       : chr "Theon Greyjoy"
  ..$ gender     : chr "Male"
  .. [list output truncated]
 $ :List of 18
  ..$ url        : chr "https://www.anapioficeandfire.com/api/characters/1052"
  ..$ id         : int 1052
  ..$ name       : chr "Tyrion Lannister"
  ..$ gender     : chr "Male"
  .. [list output truncated]
 $ :List of 18
  ..$ url        : chr "https://www.anapioficeandfire.com/api/characters/1074"
  ..$ id         : int 1074
  ..$ name       : chr "Victarion Greyjoy"
  ..$ gender     : chr "Male"
  .. [list output truncated]
 $ :List of 18
  ..$ url        : chr "https://www.anapioficeandfire.com/api/characters/1109"
  ..$ id         : int 1109
  ..$ name       : chr "Will"
  ..$ gender     : chr "Male"
  .. [list output truncated]
  [list output truncated]


Put the list in a tibble:

In [47]:
got_chars_tbl <- tibble(character = repurrrsive::got_chars)
print(got_chars_tbl, n = 5)

# A tibble: 30 x 1
  character        
  <list>           
1 <named list [18]>
2 <named list [18]>
3 <named list [18]>
4 <named list [18]>
5 <named list [18]>
# ... with 25 more rows


Would widening the list column work best, or lengthening? Do it.

```
answer16 <- FILL_THIS_IN(got_chars_tbl, character)
```
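To contrast the two verbs, here is a minimal sketch on made-up data (the `people` and `tags` tibbles are hypothetical): `unnest_wider()` suits records that share the same named fields, while `unnest_longer()` suits entries that hold a varying number of values.

```r
library(tidyr)
library(tibble)

# Hypothetical JSON-style records: each list entry has the same named fields
people <- tibble(person = list(
  list(name = "Ann", age = 30),
  list(name = "Bob", age = 25)
))

# unnest_wider(): one column per field, one row per record
wide <- unnest_wider(people, person)
wide

# unnest_longer(): one row per element of each entry
tags <- tibble(id = 1:2, tag = list(c("x", "y"), "z"))
long <- unnest_longer(tags, tag)
long
```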

In [52]:
answer16 <- unnest_wider(got_chars_tbl, character)
print(answer16, n = 5)

# A tibble: 30 x 18
  url         id name   gender culture born   died   alive titles aliases father
  <chr>    <int> <chr>  <chr>  <chr>   <chr>  <chr>  <lgl> <list> <list>  <chr> 
1 https:/~  1022 Theon~ Male   "Ironb~ "In 2~ ""     TRUE  <chr ~ <chr [~ ""    
2 https:/~  1052 Tyrio~ Male   ""      "In 2~ ""     TRUE  <chr ~ <chr [~ ""    
3 https:/~  1074 Victa~ Male   "Ironb~ "In 2~ ""     TRUE  <chr ~ <chr [~ ""    
4 https:/~  1109 Will   Male   

In [53]:
test_that('Question 16', {
    answer16 %>% 
      pull(culture) %>% 
      expect_known_hash("239b2663946d88db14fb52f017d749da")
    answer16 %>% 
      pull(url) %>% 
      expect_known_hash("40d4d84edde6c1573c6eef61b2bd49c2")
    answer16 %>% 
      pull(name) %>% 
      expect_known_hash("9fa482de54b3e866524eff35d7e4dee9")
})

Test passed


### Attributions

Thanks to Diana Lin for putting this worksheet together, Icíar Fernandez Boyano for reviewing, and David Kepplinger for assistance implementing these questions. Thanks to Firas Moosvi for providing a bunch of the questions on this worksheet. Thanks to Andy Tai for implementing the autograder for many of these questions.