21-iteration.Rmd


```{r setup21, include = FALSE, message = FALSE, warning = FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)

library(tidyverse)
library(ggplot2)
library(dplyr)
library(tidyr)
library(nycflights13)
library(babynames)
library(nasaweather)
library(lubridate)
library(purrr)
library(readr)
library(stringr)
```

# Ch. 21: Iteration

```{block2, type='rmdimportant'}
**Key questions:**  
  
* 21.2.1. #1, 2
* 21.3.5. #1, 3
* 21.4.1. #2
* 21.5.3. #1
* 21.9.4. #2
```

```{block2, type='rmdtip'}
**Functions and notes:**
```

* Common `for` loop template:
```{r, eval = FALSE}
output <- vector("double", ncol(df)) # common for loop style  
for (i in seq_len(length(df))){
  output[[i]] <- fun(df[[i]])
  }  
```

* Common `while` loop template:
```{r, eval = FALSE}
i <- 1
while (i <= length(x)){
  # body
  i <- i + 1
}  
```

* `seq_along(df)` does essentially same as `seq_len(length(df))`
* `unlist` flatten list of vectors into single vector
    + `flaten_dbl` is stricter alternative
* `dplyr::bind_rows` save output in a list of dfs and then append all at end rather than sequential `rbind`ing
* `sample(c("T", "H"), 1)`
* `sapply` is wrapper around `lapply` that automatically simplifies output -- problematic in that never know what ouptut will be
* `vapply` is safe alternative to `sapply` e.g. for logical `vapply(df, is.numeric, logical(1))`, but `map_lgl(df, is.numeric)` is more simple
* `map()`     makes a list.
  * `map_lgl()` makes a logical vector.
  * `map_int()` makes an integer vector.
  * `map_dbl()` makes a double vector.
  * `map_chr()` makes a character vector.
* shortcuts for applying functions in `map`:
```{r, eval = FALSE}
models <- mtcars %>% 
  split(.$cyl) %>% 
  map(function(df) lm(mpg ~ wt, data = df))

models <- mtcars %>% 
  split(.$cyl) %>% 
  map(~lm(mpg ~ wt, data = .))
```

* extracting by named elements from `map`:
```{r, eval = FALSE}
models %>% 
  map(summary) %>% 
  map_dbl("r.squared")
```

* extracting by positions from `map`
```{r, eval = FALSE}
x <- list(list(1, 2, 3), list(4, 5, 6), list(7, 8, 9))
x %>% 
  map_dbl(2)
```

* `map2` let's you iterate through two components at once
* `pmap` allows you to iterate over p components -- works well to hold inputs in a dataframe
* `safely` takes funciton returns two parts, result and error object
  + similar to `try` but more consistent
* `possibly` similar to safely, but provide it a default value to return for errors
* `quietly` is similar to safely but captures all printed output messages and warnings
* `purrr::transpose` allows you to do things like get all 2nd elements in list, e.g. show later
* `invoke_map` let's you iterate over both the functions and the parameters, have an `f` and a `param` input, e.g. 
```{r, eval = FALSE}
f <- c("runif", "rnorm", "rpois")
param <- list(
  list(min = -1, max = 1), 
  list(sd = 5), 
  list(lambda = 10)
)

invoke_map(f, param, n = 5) %>% str()
```

* `walk` is alternative to `map` that you call for side effects. Also have `walk2` and `pwalk` that are generally more useful 
  + all invisibly return `.x (the first argument) so can used in the middle of pipelines
* `keep` and `discard` keep or discard elements in the input based off if `TRUE` to predicate
* `some` and `every` determine if the predicte is true for any or for all of our elements
* `detect` finds the first element where the predicate is true, `detect_index` returns its position
* `head_while` and `tail_while` take elements from the start or end of a vector while a predicate is true
* `reduce` is good for applying two table rule repeatedly, e.g. joins
  * `accumulate` is similar but keeps all the interim results

## 21.2: For loops

### 21.2.1

1.  Write for loops to (think about the output, sequence, and body __before__ you start writing the loop):
    
    1. Compute the mean of every column in `mtcars`.
    
    ```{r}
    output <- vector("double", length(mtcars))
    for (i in seq_along(mtcars)){
      output[[i]] <- mean(mtcars[[i]])
    }
    output
    
    ```
    
    1. Determine the type of each column in `nycflights13::flights`.
    
    ```{r}
    output <- vector("character", length(flights))
    for (i in seq_along(flights)){
      output[[i]] <- typeof(flights[[i]])
    }
    output
    ```
    
    1. Compute the number of unique values in each column of `iris`.
    
    ```{r}
    output <- vector("integer", length(iris))
    for (i in seq_along(iris)){
      output[[i]] <- unique(iris[[i]]) %>% length()
    }
    output
    ```
    
    1. Generate 10 random normals for each of $\mu = -10$, $0$, $10$, and $100$.
    ```{r}
    output <- vector("list", 4)
    input_means <- c(-10, 0, 10, 100)
    for (i in seq_along(output)){
      output[[i]] <- rnorm(10, mean = input_means[[i]])
    }
    output
    
    ```
    

1.  Eliminate the for loop in each of the following examples by taking advantage of an existing function that works with vectors:
    
    *example:*
    ```{r, eval = FALSE}
    out <- ""
    for (x in letters) {
      out <- stringr::str_c(out, x)
    }
    out
    ```
    
    * collabse letters into length-one character vector with all characters concatenated
    ```{r}
    str_c(letters, collapse = "")
    ```
    
    *example:*
    ```{r}
    x <- sample(100)
    sd <- 0
    for (i in seq_along(x)) {
      sd <- sd + (x[i] - mean(x)) ^ 2
    }
    sd <- sqrt(sd / (length(x) - 1))
    sd
    ```
    
    * calculate standard deviaiton of x
    ```{r}
    sd(x)
    ```
    
    *example:*
    ```{r}
    x <- runif(100)
    out <- vector("numeric", length(x))
    out[1] <- x[1]
    for (i in 2:length(x)) {
      out[i] <- out[i - 1] + x[i]
    }
    out
    ```
    
    * calculate cumulative sum
    ```{r}
    cumsum(x)
    ```
    

1.  Combine your function writing and for loop skills:
    
    1. Write a for loop that `prints()` the lyrics to the children's song "Alice the camel".
    
    ```{r}
    num_humps <- c("five", "four", "three", "two", "one", "no")
    
    for (i in seq_along(num_humps)){
      
      paste0("Alice the camel has ", num_humps[[i]], " humps.") %>% 
        rep(3) %>% 
        writeLines()
      
      writeLines("So go, Alice, go.\n")
    }
    ```
    
    2. Convert the nursery rhyme "ten in the bed" to a function. Generalise it to any number of people in any sleeping structure.
    
    ```{r}
    nursery_bed <- function(num, y) {
      output <- vector("character", num)
      for (i in seq_along(output)) {
        output[[i]] <- str_replace_all(
        'There were x in the _y\n And the little one said, \n"Roll over! Roll over!"\n So they all rolled over and\n one fell out.', c("x" = (length(output) - i + 1), "_y" = y))
      } 
      str_c(output, collapse = "\n\n") %>% 
        writeLines()
    }
    
    nursery_bed(3, "asteroid")
    ```
    
    3. Convert the song "99 bottles of beer on the wall" to a function. Generalise to any number of any vessel containing any liquid on any surface.  
       
    * This is a little bit of a lazy version...
       
    ```{r}
    beer_rhyme <- function(x, y, z){
      output <- vector("character", x)
      for (i in seq_along(output)){
        output[i] <-
          str_replace_all("x bottles of y on the z.\n One fell off...", c(
          "x" = (x - i + 1),
          "y" = y,
          "z" = z
          ))
      }
      output <- (str_c(output, collapse = "\n") %>% 
                   str_c("\nNo more bottles...", collapse = ""))
      writeLines(output)
    }
    
    beer_rhyme(4, "soda", "toilet")
    ```
    
1.  It's common to see for loops that don't preallocate the output and instead increase the length of a vector at each step. How does this affect performance? Design and execute an experiment.
    
    ```{r}
    preallocate <- function(){
    x <- vector("double", 100)
      for (i in seq_along(x)){
        x[i] <- rnorm(1)
      }
    }
    
    growing <- function(){
      x <- c(0)
        for (i in 1:100){
          x[i] <- rnorm(1)
      }
    }
    
    microbenchmark::microbenchmark(
      space = preallocate(),
      no_space = growing(),
      times = 20
    )  
    
    ```

    * see roughly 35% better performance when creating ahead of time
    * note: if you can do these operations with vectorized approach though -- they're often much faster
    
    ```{r}
    microbenchmark::microbenchmark(
      space = preallocate(),
      no_space = growing(),
      vector = rnorm(100),
      times = 20
    )
    ```
    
    * vectorized was > 10x faster

## 21.3 For loop variations

### 21.3.5

1.  Imagine you have a directory full of CSV files that you want to read in. You have their paths in a vector, `files <- dir("data/", pattern = "\\.csv$", full.names = TRUE)`, and now want to read each one with `read_csv()`. Write the for loop that will load them into a single data frame. 
    
    * To start this problem, I first created a file directory, and then wrote in 26 csvs each with the most popular name from each year since 1880 for a particular letter[^WalkExample].
    * Next I read these into a single dataframe with a for loop
        
    [^WalkExample]: 
        Below is the code that accomplished this. I used `walk2` and methods we learn later in the chapter.
        
        ```{r, eval = FALSE}
        dir.create("ch21_csvs_example")
        
        babynames %>% 
          mutate(first_letter = str_sub(name, 1, 1)) %>% 
          group_by(first_letter, year) %>% 
          filter(dplyr::min_rank(-prop) == 1) %>%  
          split(.$first_letter) %>% 
          # map(~select(.x, -first_letter)) %>% 
          walk2(.x = ., .y = names(.), 
                ~write_csv(.x,
                           paste0("ch21_csvs_example/", "letter_", .y, ".csv"))
                )
        ```
        
        ```{r}
        append_csvs <- function(dir){
          #input vector of file paths name and output appended file
          
          out <- vector("list", length(dir))
          for (i in seq_along(out)){
            out[[i]] <- read_csv(dir[[i]], col_types = cols(.default = "c"))
          }
          out <-  bind_rows(out) %>% 
            type_convert()
          out
        }
        
        dir_examp <- dir("ch21_csvs_example", 
            pattern = "csv$",
            full.names = TRUE)
        
        names_appended <- append_csvs(dir_examp)
        names_appended
        ```
        
        * See [Using map] for example of how this could be accomplished using `map()` and `map(safely(read_csv))`.
    
    
2.  *What happens if you use `for (nm in names(x))` and `x` has no names?*
    
    ```{r}
    x <- list(1:10, 11:18, 19:25)
    for (nm in names(x)) {
      print(x[[nm]])
    }
    ```
    
    * each iteration produces an error, so nothing is written
    
    *What if only some of the elements are named?*
    
    ```{r}
    x <- list(a = 1:10, 11:18, c = 19:25)
    for (nm in names(x)) {
      print(x[[nm]])
    }
    ```
    
    * you have output for those with names and NULL for those without
    
    *What if the names are not unique?*
    ```{r}
    x <- list(a = 1:10, a = 11:18, c = 19:25)
    for (nm in names(x)) {
      print(x[[nm]])
    }
    ```
    
    * it prints the first position with the name
    
3.  Write a function that prints the mean of each numeric column in a data frame, along with its name. For example, `show_mean(iris)` would print:
    
    ```{r, eval = FALSE}
    show_mean(iris)
    #> Sepal.Length: 5.84
    #> Sepal.Width:  3.06
    #> Petal.Length: 3.76
    #> Petal.Width:  1.20
    ```
    
    (Extra challenge: what function did I use to make sure that the numbers lined up nicely, even though the variable names had different lengths?)
    
    ```{r}
    show_mean <- function(df){
      # select just cols that are numeric
      out <- vector("logical", length(df))
      for (i in seq_along(df)) {
        out[[i]] <- is.numeric(df[[i]])
      } 
      df_select <- df[out]
      # keep/discard funs would have made this easy
      
      # make list of values w/ mean
      means <- vector("list", length(df_select))
      names(means) <- names(df_select)
      for (i in seq_along(df_select)){
        means[[i]] <- mean(df_select[[i]], na.rm = TRUE) %>%
          round(digits = 2)
      }
      
      # print out, use method to identify max chars for vars printed
      means_names <- names(means)
      chars_max <- (str_count(means_names) + str_count(as.character(means))) %>%
        max()
      
      chars_pad <- chars_max - (str_count(means_names) + str_count(as.character(means)))
      
      names(chars_pad) <- means_names
      
    str_c(means_names, ": ", str_dup(" ", chars_pad), means) %>% 
      writeLines()
    }
    
    show_mean(flights)
    ```

4.  What does this code do? How does it work?

    ```{r, eval = FALSE}
    trans <- list( 
      disp = function(x) x * 0.0163871,
      am = function(x) {
        factor(x, labels = c("auto", "manual"))
      }
    )
    for (var in names(trans)) {
      mtcars[[var]] <- trans[[var]](mtcars[[var]])
    }
    mtcars
    ```
    
    * first part builds list of functions, 2nd applies those to a dataset
    * are storing the data transformations as a function and then applying this to a dataframe ^[This is a very powerful practice because it allows you to save / keep track of your manipulations and apply them at other locations, while keeping the logic very well organized -- go and use this for documenting your work / transformations]
    
## 21.4: For loops vs. functionals

### 21.4.1

1.  Read the documentation for `apply()`. In the 2d case, what two for loops does it generalise?
    
    * It allows you to input either 1 or 2 for the `MARGIN` argument, which corresponds with looping over either the rows or the columns.
    
    
1.  Adapt `col_summary()` so that it only applies to numeric columns You might want to start with an `is_numeric()` function that returns a logical vector that has a TRUE corresponding to each numeric column.
    
    ```{r}
    col_summary_gen <- function(df, fun, ...) {
      #find cols that are numeric
      out <- vector("logical", length(df))
      for (i in seq_along(df)) {
        out[[i]] <- is.numeric(df[[i]])
      }
      #make list of values w/ mean
      df_select <- df[out]
      output <- vector("list", length(df_select))
      names(output) <- names(df_select)
      for (nm in names(output)) {
        output[[nm]] <- fun(df_select[[nm]], ...) %>% 
          round(digits = 2)
      }
      
      as_tibble(output)
    }
    
    col_summary_gen(flights, fun = median, na.rm = TRUE) %>% 
      gather() # trick to gather all easily
    ```
    
    * the `...` makes this so you can add arguments to the functions.    
    
## 21.5: The map functions

### 21.5.3

1.  Write code that uses one of the map functions to:

    *Compute the mean of every column in `mtcars`.*
    ```{r}
    purrr::map_dbl(mtcars, mean)
    ```
    
    *Determine the type of each column in `nycflights13::flights`.*
    
    ```{r}
    purrr::map_chr(flights, typeof)
    ```
    
    *Compute the number of unique values in each column of `iris`.*
    
    ```{r}
    purrr::map(iris, unique) %>% 
      map_dbl(length)
    ```
    
    *Generate 10 random normals for each of $\mu = -10$, $0$, $10$, and $100$.*
    
    ```{r}
    purrr::map(c(-10, 0, 10, 100), rnorm, n = 10)
    # purrr::map_dbl(flights, ~mean(is.na(.x)))
    ```

1.  How can you create a single vector that for each column in a data frame indicates whether or not it's a factor?
    
    ```{r}
    purrr::map_lgl(iris, is.factor)
    ```
    

1.  What happens when you use the map functions on vectors that aren't lists? What does `map(1:5, runif)` do? Why?
    
    ```{r}
    purrr::map(1:5, rnorm)
    ```
    
    * It runs on each item in the vector. 
    * `map()` runs on each element item within the input, i.e .x[[1]], .x[[2]], .x[[n]]. The elements of a numeric vector are scalars (or technically length 1 numeric vectors)
    * In this case then it is passing the values 1, 2, 3, 4, 5 into the first argument of `rnorm` for each run, hence pattern above.
      
1.  What does `map(-2:2, rnorm, n = 5)` do? Why?
    
    ```{r}
    map(-2:2, rnorm, n = 5)
    ```
    
    * It makes 5 vectors each of length 5 with the values centered at the means of -2,-1, 0, 1, 2 respectively. 
    * The reason is that the default filling of the first argument is already named by the defined input of 'n = 5', therefore, the inputs are instead going to the 2nd argument, and hence become the mean of the different rnorm calls.
    
1.  Rewrite `map(x, function(df) lm(mpg ~ wt, data = df))` to eliminate the anonymous function. 
    
    ```{r, eval = FALSE}
    mtcars %>% 
      purrr::map( ~ lm(mpg ~ wt, data = .))
    ```
    
## 21.9 Other patterns of for loops

### 21.9.3

1.  Implement your own version of `every()` using a for loop. Compare it with `purrr::every()`. What does purrr's version do that your version doesn't?
    
    ```{r}
    every_loop <- function(x, fun, ...) {
      output <- vector("list", length(x))
      for (i in seq_along(x)) {
      output[[i]] <- fun(x[[i]])
      }
      total <- flatten_lgl(output)
      sum(total) == length(x)
    }
    
    x <- list(flights, mtcars, iris)
    every_loop(x, is.data.frame)
    every(x, is.data.frame)
    
    ```
    
1.  Create an enhanced `col_sum()` that applies a summary function to every numeric column in a data frame.
    
    ```{r}
    col_summary_enh <- function(x,fun){
      x %>% 
        keep(is.numeric) %>% 
        purrr::map_dbl(fun)
    }
    col_summary_enh(mtcars, median)
    ```

1.  A possible base R equivalent of `col_sum()` is:
    ```{r, eval = FALSE}
    col_sum3 <- function(df, f) {
      is_num <- sapply(df, is.numeric)
      df_num <- df[, is_num]

      sapply(df_num, f)
    }
    ```
    
    But it has a number of bugs as illustrated with the following inputs:
    
    ```{r, eval = FALSE}
    df <- tibble(
      x = 1:3, 
      y = 3:1,
      z = c("a", "b", "c")
    )
    # OK
    col_sum3(df, mean) 
    # Has problems: don't always return numeric vector
    col_sum3(df[1:2], mean) 
    col_sum3(df[1], mean) 
    col_sum3(df[0], mean)
    ```
    
    What causes the bugs?
    
    * The vector output is not always consistent in it's output type. Also, returns error when inputting an empty list due to indexing issue.
    
## Appendix

### 21.3.5.1

#### Using map

```{r, eval = FALSE}
    outputted_csv <- files_example %>% 
      mutate(csv_data = map(file_paths, read_csv))
    
    outputted_csv <- files_example %>% 
      mutate(csv_data = map(file_paths, safely(read_csv)))
```

#### Plot of names

* Below is a plot of the proportion of individuals named the most popular letter in each year. This suggests that the top names by letter do not have as large of a proportion of the population ocmpared to historically.

```{r}
names_appended %>% 
  ggplot(aes(x = year, y = prop, colour = first_letter))+
  geom_line()
```

#### csv other example

The code below might be used to read csvs from a shared drive. I added on the 'file_path_pull' and 'files_example' components to add in information on the file paths and other details that were relevant. You might also add this data into a new column on the output...
```{r, eval = FALSE}
files_path_pull <- dir("//companydomain.com/directory/", 
                       pattern = "csv$",
                       full.names = TRUE)

files_example <- tibble(file_paths = files_path_pull[1:2]) %>% 
  extract(file_paths, into = c("path", "name"), regex = "(.*)([0-9]{4}-[0-9]{2}-[0-9]{2})", remove = FALSE)

read_dir <- function(dir){
  #input vector of file paths name and output appended file
  out <- vector("list", length(dir))
  for (i in seq_along(out)){
    out[[i]] <- read_csv(dir[[i]])
  }
  out <-  bind_rows(out)
  out
}

read_dir(files_example$file_paths)
```


### 21.3.5.2 (with purrr)

```{r}
purrr::map_lgl(iris, is.factor) %>% 
  tibble::enframe()
```


Slightly less attractive printing
```{r}
show_mean2 <- function(df) {
  df %>% 
    keep(is.numeric) %>% 
    map_dbl(mean, na.rm = TRUE)
}

show_mean2(flights)
```

Maybe slightly better printing and in df
```{r}
show_mean3 <- function(df){
  df %>% 
    keep(is.numeric) %>% 
    map_dbl(mean, na.rm = TRUE) %>% 
    as_tibble() %>% 
    mutate(names = row.names(.))
}

show_mean3(flights)
```

Other method is to take advantage of the `gather()` function
```{r}
flights %>% 
  keep(is.numeric) %>% 
  map(mean, na.rm = TRUE) %>% 
  as_tibble() %>% 
  gather()
```

### 21.9.3.1
* mine can't handle shortcut formulas or new functions    
```{r}
z <- sample(10)
z %>% 
  every( ~ . < 11)

# e.g. below would fail
# z %>%
#   every_loop( ~ . < 11)
```

### 21.9 mirroring `keep`

* below is one method for passing multiple, more complex arguments through keep, though you can also use function shortcuts (`~`) in `keep` and `discard`
    ```{r}
    ##how to pass multiple functions through keep?
    #can use map to subset columns by multiple criteria and then subset at end
    flights %>%
      purrr::map(is.na) %>% 
      purrr::map_dbl(sum) %>% 
      purrr::map_lgl(~.>10) %>% 
      flights[.]
    ```

### invoke examples

Let's change the example to be with quantile...

```{r}
invoke(runif, n = 10)

list("01a", "01b") %>%
  invoke(paste, ., sep = "-")

set.seed(123)
invoke_map(list(runif, rnorm), list(list(n = 10), list(n = 5)))
set.seed(123)
invoke_map(list(runif, rnorm), list(list(n = 10), list(5, 50)))
```

```{r}
list(m1 = mean, m2 = median) %>% invoke_map(x = rcauchy(100))

rcauchy(100)
```

Let's store everything in a dataframe...

```{r}
set.seed(123)
tibble(funs = list(rn = "rnorm", rp = "rpois", ru = "runif"),
       params = list(list(n = 20, mean = 10), list(n = 20, lambda = 3), list(n = 20, min = -1, max = 1))) %>% 
  with(invoke_map_df(funs, params))
```

```{r}
map_df(iris, ~.x*2)

select(iris, -Species) %>% 
  flatten_dbl() %>% 
  mean()
```

```{r}
mean.and.median <- function(x){
  list(mean = mean(x, na.rm = TRUE), 
       median = median(x, na.rm = TRUE))
}
```

Difference between dfr and dfc, taken from here: https://bio304-class.github.io/bio304-fall2017/control-flow-in-R.html 
```{r}
iris %>%
  select(-Species) %>%
  map_dfr(mean.and.median) %>% 
  bind_cols(tibble(names = names(select(iris, -Species))))

iris %>%
  select(-Species) %>%
  map_dfr(mean.and.median) %>% 
  bind_cols(tibble(names = names(select(iris, -Species))))
```

```{r}
iris %>%
  select(-Species) %>%
  map_dfc(mean.and.median)
```

### indexing nms caution

When creating your empty list, use indexes rather than names if you are creating values, otherwise you are creating new values on the list. E.g. in the example below I the output ends up being length 6 because you have the 3 `NULL` values plus the 3 newly created named positions.
```{r}
x <- list(a = 1:10, b = 11:18, c = 19:25)
output <- vector("list", length(x))
for (nm in names(x)) {
  output[[nm]] <- x[[nm]] * 3
}
output
```

### in-class notes

the `map_*` functions are essentially like running a `flatten_*` after running `map`. E.g. the two things below are equivalent

```{r}
map(flights, typeof) %>% 
  flatten_chr()

map_chr(flights, typeof)
```

Calculate the number of unique values for each level
```{r, eval = FALSE}
iris %>% 
  map(unique) %>% 
  map_dbl(length)

map_int(iris, ~length(unique(.x)))
```
    
Iterate through different min and max values
```{r}
min_params <- c(-1, 0, -10)
max_params <- c(11:13)
map2(.x = min_params, .y = max_params, ~runif(n = 10, min = .x, max = .y))
```

When using `pmap` it's often best to keep the parameters in a dataframe
```{r}
min_df_params <- tibble(n = c(10, 15, 20, 50 ), 
                        min = c(-1, 0, 1, 2), 
                        max = c(0, 1, 2, 3))

pmap(min_df_params, runif)
```

You can often use `map` a bunch of output that can then be stored in a tibble
```{r}
tibble(type = map_chr(mtcars, typeof),
       means = map_dbl(mtcars, mean),
       median = map_dbl(mtcars, median),
       names = names(mtcars))
```

*Provide the number of unique values for all columns excluding columns with numeric types or date types.*

```{r}
num_unique <- function(df) {
  df %>% 
  keep(~is_character(.x) | is.factor(.x)) %>% 
  map(~length(unique(.x))) %>% 
  as_tibble() %>% 
  gather() %>% 
  rename(field_name = key, num_unique = value)
}

num_unique(flights)
num_unique(iris)
num_unique(mpg)
```