In [3]:
library(tidyverse)

# Expand data frame to include all possible combinations of values

`expand()` generates all combinations of the variables found in a dataset. It is paired with the `nesting()` and `crossing()` helpers. `crossing()` is a wrapper around `expand_grid()` that de-duplicates and sorts its inputs; `nesting()` is a helper that only finds combinations already present in the data.
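
To make the difference between `expand_grid()` and `crossing()` concrete, here is a minimal sketch. The vectors `x` and `y` are hypothetical and exist only for illustration; they are not part of the `fruits` data used in the examples.

```r
library(tidyr)

# Hypothetical inputs: `x` contains a duplicate and neither vector is sorted
x <- c("b", "a", "b")
y <- c(2, 1)

# expand_grid() keeps duplicates and the original order of its inputs
grid <- expand_grid(x = x, y = y)

# crossing() de-duplicates and sorts each input before combining
cross <- crossing(x = x, y = y)

nrow(grid)   # 3 values of x times 2 values of y
nrow(cross)  # 2 unique values of x times 2 unique values of y
```

`nesting()` is not shown here because it works on combinations that occur together in a data frame, not on independent vectors.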

**`expand()`** is often useful in conjunction with joins:

* use it with `right_join()` to convert implicit missing values to explicit missing values (e.g., fill in gaps in your data frame).

* use it with `anti_join()` to figure out which combinations are missing (e.g., identify gaps in your data frame).

```r
expand(data, ..., .name_repair = "check_unique")

crossing(..., .name_repair = "check_unique")

nesting(..., .name_repair = "check_unique")
```

# Examples

In [4]:
fruits <- tibble(
  type    = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year    = c(2010, 2010, 2012, 2010, 2010, 2012),
  size    = factor(
    c("XS", "S",  "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)

fruits

type,year,size,weights
apple,2010,XS,3.807436
orange,2010,S,4.212495
apple,2012,M,3.103962
orange,2010,S,3.551894
orange,2010,S,1.83174
orange,2012,M,3.62332


In [6]:
# All possible combinations ---------------------------------------
# Note that all defined, but not necessarily present, levels of the
# factor variable `size` are retained.

fruits %>% expand(type)

type
apple
orange


In [12]:
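# Cartesian product of the unique values (or factor levels) of 2 columns.
# e.g. if `type` has 2 unique values 1, 2
#      and `size` has 2 unique values 3, 4,
# then the result is (1, 3), (1, 4), (2, 3), (2, 4).

# Python analogy: itertools.product(fruits['type'].unique(), fruits['size'].unique()),
# except that for a factor all defined levels are used, not only those present.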

fruits %>% expand(type, size)

type,size
apple,XS
apple,S
apple,M
apple,L
orange,XS
orange,S
orange,M
orange,L


In [8]:
fruits %>% expand(type, size, year)

type,size,year
apple,XS,2010
apple,XS,2012
apple,S,2010
apple,S,2012
apple,M,2010
apple,M,2012
apple,L,2010
apple,L,2012
orange,XS,2010
orange,XS,2012


In [9]:
# Only combinations that already appear in the data ---------------
fruits %>% expand(nesting(type))

type
apple
orange


In [13]:

fruits %>% expand(nesting(type, size))

type,size
apple,XS
apple,M
orange,S
orange,M


In [15]:
# Similar to count(): the same combinations, plus how often each one occurs
fruits %>% count(type, size)

type,size,n
apple,XS,1
apple,M,1
orange,S,3
orange,M,1


In [16]:
fruits %>% expand(nesting(type, size, year))

type,size,year
apple,XS,2010
apple,M,2012
orange,S,2010
orange,M,2012
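
`nesting()` can also be mixed with bare columns inside `expand()`: the nested columns are limited to combinations present in the data, and the result is then crossed with the remaining columns. The sketch below assumes a small stand-in tibble, `fruit_keys` (a hypothetical name), that holds the same `type`, `year`, and `size` values as `fruits` but omits the random `weights` column.

```r
library(tidyverse)

# Stand-in for `fruits` without the random `weights` column (hypothetical name)
fruit_keys <- tibble(
  type = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year = c(2010, 2010, 2012, 2010, 2010, 2012),
  size = factor(
    c("XS", "S", "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  )
)

# Only the (type, size) pairs that occur in the data, crossed with every year
combos <- fruit_keys %>% expand(nesting(type, size), year)

nrow(combos)  # 4 observed (type, size) pairs times 2 years
```

Unlike `expand(type, size, year)`, this never produces rows for the unused level `L` or for pairs such as (apple, S) that do not occur in the data.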


In [18]:
# Other uses -------------------------------------------------------
# Use with `full_seq()` to fill in values of continuous variables


# Python analogy: itertools.product(unique types, size levels, full_seq(year, 1))
fruits %>% expand(type, size, full_seq(year, 1))

type,size,"full_seq(year, 1)"
apple,XS,2010
apple,XS,2011
apple,XS,2012
apple,S,2010
apple,S,2011
apple,S,2012
apple,M,2010
apple,M,2011
apple,M,2012
apple,L,2010


In [19]:
# Use `anti_join()` to determine which observations are missing

In [23]:
all <- fruits %>% expand(type, size, year)

all %>% anti_join(fruits)

Joining, by = c("type", "size", "year")


type,size,year
apple,XS,2012
apple,S,2010
apple,S,2012
apple,M,2010
apple,L,2010
apple,L,2012
orange,XS,2010
orange,XS,2012
orange,S,2012
orange,M,2010


In [24]:
# Use with `right_join()` to fill in missing rows
fruits %>% dplyr::right_join(all)

Joining, by = c("type", "year", "size")


type,year,size,weights
apple,2010,XS,3.807436
orange,2010,S,4.212495
apple,2012,M,3.103962
orange,2010,S,3.551894
orange,2010,S,1.83174
orange,2012,M,3.62332
apple,2012,XS,
apple,2010,S,
apple,2012,S,
apple,2010,M,
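
The `expand()` plus join pattern for filling in missing rows can also be written with tidyr's `complete()`, which wraps `expand()`, `dplyr::full_join()`, and `replace_na()`. The sketch below is one way to do it, not the only one. It assumes a stand-in tibble, `fruits_demo` (a hypothetical name), whose `weights` are copied from the printed output above so that the example does not depend on `rnorm()`.

```r
library(tidyverse)

# Stand-in for `fruits` with fixed weights copied from the printed output
# (hypothetical name, so the notebook's own `fruits` is left untouched)
fruits_demo <- tibble(
  type = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year = c(2010, 2010, 2012, 2010, 2010, 2012),
  size = factor(
    c("XS", "S", "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = c(3.807436, 4.212495, 3.103962, 3.551894, 1.83174, 3.62332)
)

# Adds one row per missing (type, size, year) combination, with NA weights
completed <- fruits_demo %>% complete(type, size, year)

nrow(completed)                 # original rows plus the missing combinations
sum(is.na(completed$weights))   # number of combinations that were filled in
```

Here `expand()` yields 16 combinations, 4 of which occur in the data (one of them three times), so `complete()` keeps the 6 original rows and adds 12 rows with missing `weights`.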
