In [2]:
library(tidyverse)

# [Column wise operations](https://dplyr.tidyverse.org/articles/colwise.html)

It’s often useful to perform the same operation on multiple columns, but copying and pasting is both tedious and error-prone:

In [3]:
iris %>% summarize(
    mean_sepal_length = mean(Sepal.Length),
    mean_sepal_width = mean(Sepal.Width),
    mean_petal_length = mean(Petal.Length),
    mean_petal_width = mean(Petal.Width)
)

mean_sepal_length,mean_sepal_width,mean_petal_length,mean_petal_width
5.843333,3.057333,3.758,1.199333


Using **`dplyr::across`**, the same summary fits in a single call:

In [11]:
iris %>% summarize(across(!Species, mean, .names = 'mean_{.col}'))

mean_Sepal.Length,mean_Sepal.Width,mean_Petal.Length,mean_Petal.Width
5.843333,3.057333,3.758,1.199333


# Basic Usage

**`across()`** has two primary arguments:

- The first argument, `.cols`, selects the columns you want to operate on. It uses tidy selection (like `select()`) so you can pick variables by position, name, and type.

- The second argument, `.fns`, is a function or list of functions to apply to each column. This can also be a purrr-style formula (or list of formulas) like `~ .x / 2`. (This argument is optional, and you can omit it if you just want to get the underlying data.)

In [13]:
# Calculate the number of unique values for each character column
starwars %>% summarize(across(where(is.character), ~ length(unique(.))))

name,hair_color,skin_color,eye_color,sex,gender,homeworld,species
87,13,31,15,5,3,49,38
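
The three selection styles mentioned above (by name, by position, and by type) can be sketched as follows. This is a minimal illustration on the built-in `iris` data; the result names `by_name`, `by_position`, and `by_type` are only placeholders for this example.

```r
library(dplyr)

# Select by name: only the two listed columns are summarized
by_name <- iris %>% summarize(across(c(Sepal.Length, Petal.Length), mean))

# Select by position: the first two columns of iris
by_position <- iris %>% summarize(across(1:2, median))

# Select by type: every factor column (only Species in iris)
by_type <- iris %>% summarize(across(where(is.factor), nlevels))
```

Each call returns a one-row data frame whose columns are exactly the ones picked by `.cols`.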


### Multiple functions

You can transform each variable with more than one function by supplying a named list of functions or lambda functions in the second argument:

In [21]:
# Calculate the min and max of each numeric column

min_max = list(
    min = partial(min, na.rm = TRUE),   # using purrr::partial
    max = ~ max(., na.rm = TRUE)        # using a purrr-style formula
)
starwars %>% summarize(across(where(is.numeric), min_max))

height_min,height_max,mass_min,mass_max,birth_year_min,birth_year_max
66,264,15,1358,8,896


Control how the names are created with the `.names` argument, which takes a **`glue`** spec in which `{.col}` stands for the column name and `{.fn}` for the function name:

In [23]:
starwars %>% summarize(across(where(is.numeric), min_max, .names = '{.col}.{.fn}'))

height.min,height.max,mass.min,mass.max,birth_year.min,birth_year.max
66,264,15,1358,8,896


If you’d prefer all summaries with the same function to be grouped together, you’ll have to expand the calls yourself. Name the columns explicitly when you do: with `where(is.numeric)`, the second `across()` would also pick up the `.min` columns created by the first call and produce extra columns such as `height.min.max`.

In [25]:
starwars %>% summarize(
    across(c(height, mass, birth_year), ~ min(., na.rm = TRUE), .names = '{.col}.min'),
    across(c(height, mass, birth_year), ~ max(., na.rm = TRUE), .names = '{.col}.max')
)

height.min,mass.min,birth_year.min,height.max,mass.max,birth_year.max
66,15,8,264,1358,896


### Current column
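
Inside `across()`, `cur_column()` returns the name of the column currently being processed, which is handy for looking up per-column settings. A minimal sketch, assuming a hypothetical tibble `df_mult` and a hypothetical lookup list `mult` (neither appears elsewhere in this notebook):

```r
library(dplyr)

# Hypothetical data and per-column multipliers, for illustration only
df_mult <- tibble(x = 1:3, y = 3:5, z = 5:7)
mult <- list(x = 1, y = 10, z = 100)

# cur_column() yields "x", "y", "z" in turn, so each column is
# multiplied by the entry of `mult` that carries the same name
scaled <- df_mult %>%
    mutate(across(all_of(names(mult)), ~ .x * mult[[cur_column()]]))
```

Here `x` is left unchanged, `y` is multiplied by 10, and `z` by 100.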

### Gotchas

Be careful when combining numeric summaries with `where(is.numeric)`:

In [26]:
df <- data.frame(x = c(1, 2, 3), y = c(1, 4, 9))

In [28]:
df %>% summarize(
    n = n(),
    across(where(is.numeric), sd)
)

n,x,y
,1,4.041452


Here `n` becomes `NA` because `n` is numeric, so `across()` computes its standard deviation too, and the standard deviation of a single value (3) is `NA`. You probably want to compute `n()` last to avoid this problem:

In [29]:
df %>% summarize(
    across(where(is.numeric), sd),
    n = n()
)

x,y,n
1,4.041452,3


Alternatively, you could explicitly exclude `n` from the columns to operate on:

In [30]:
df %>% summarize(
    n = n(),
    across(where(is.numeric) & !n, sd)
)

n,x,y
3,1,4.041452


### Other verbs

So far we’ve focussed on the use of `across()` with `summarise()`, but it works with any other dplyr verb that uses data masking:

* Rescale all numeric variables to range 0-1:

In [36]:
# Linearly rescale x so that its minimum maps to 0 and its maximum to 1
min_max_scaler <- function(x) {
    rng <- range(x, na.rm = TRUE)
    (x - rng[1]) / (rng[2] - rng[1])
}

df <- tibble(x = 1:4, y = rnorm(4))

df %>% mutate(across(everything(), min_max_scaler))

x,y
0.0,0.0
0.3333333,1.0
0.6666667,0.4016147
1.0,0.289363


* Find all rows where no variable has missing values:

In [39]:
starwars %>% filter(across(everything(), negate(is.na)))

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
Luke Skywalker,172,77.0,blond,fair,blue,19.0,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
Darth Vader,202,136.0,none,white,yellow,41.9,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1
Leia Organa,150,49.0,brown,light,brown,19.0,female,feminine,Alderaan,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",Imperial Speeder Bike,
Owen Lars,178,120.0,"brown, grey",light,blue,52.0,male,masculine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
Beru Whitesun lars,165,75.0,brown,light,blue,47.0,female,feminine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
Biggs Darklighter,183,84.0,black,light,brown,24.0,male,masculine,Tatooine,Human,A New Hope,,X-wing
Obi-Wan Kenobi,182,77.0,"auburn, white",fair,blue-gray,57.0,male,masculine,Stewjon,Human,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",Tribubble bongo,"Jedi starfighter , Trade Federation cruiser, Naboo star skiff , Jedi Interceptor , Belbullab-22 starfighter"
Anakin Skywalker,188,84.0,blond,fair,blue,41.9,male,masculine,Tatooine,Human,"Attack of the Clones, The Phantom Menace , Revenge of the Sith","Zephyr-G swoop bike, XJ-6 airspeeder","Trade Federation cruiser, Jedi Interceptor , Naboo fighter"
Chewbacca,228,112.0,brown,unknown,blue,200.0,male,masculine,Kashyyyk,Wookiee,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",AT-ST,"Millennium Falcon, Imperial shuttle"
Han Solo,180,80.0,brown,fair,brown,29.0,male,masculine,Corellia,Human,"The Empire Strikes Back, Return of the Jedi , A New Hope , The Force Awakens",,"Millennium Falcon, Imperial shuttle"
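
To my understanding, newer dplyr releases (1.0.4 and later) also provide `if_all()` and `if_any()` for this kind of row filtering, and recent versions warn when `across()` is used inside `filter()`. The sketch below shows both helpers on a hypothetical toy tibble `toy` (not part of the notebook's data), so treat it as an illustration rather than a drop-in replacement for the cell above.

```r
library(dplyr)

# Hypothetical toy data with one missing value in each of two rows
toy <- tibble(a = c(1, NA, 3), b = c("x", "y", NA))

# Keep rows where every column is non-missing
complete_rows <- toy %>% filter(if_all(everything(), ~ !is.na(.x)))

# Keep rows where at least one column is missing
any_missing <- toy %>% filter(if_any(everything(), is.na))
```

`if_all()` combines the per-column results with AND, while `if_any()` combines them with OR.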


* Find all distinct combinations of the variables matching a given pattern:

In [40]:
starwars %>% distinct(across(contains('color')))

hair_color,skin_color,eye_color
blond,fair,blue
,gold,yellow
,"white, blue",red
none,white,yellow
brown,light,brown
"brown, grey",light,blue
brown,light,blue
,"white, red",red
black,light,brown
"auburn, white",fair,blue-gray


* Count all combinations of variables with a given pattern:

In [41]:
starwars %>% count(across(contains('color')), sort = T)

hair_color,skin_color,eye_color,n
brown,light,brown,6
brown,fair,blue,4
none,grey,black,4
black,dark,brown,3
blond,fair,blue,3
black,fair,brown,2
black,tan,brown,2
black,yellow,blue,2
brown,fair,brown,2
none,white,yellow,2


<b style = 'color:red'>NOTE: **`across()`** doesn’t work with **`select()`** or **`rename()`** because they already use tidy-select syntax.<br>If you want to transform column names with a function, use **`rename_with()`**.</b>
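
As a minimal sketch of the `rename_with()` alternative mentioned in the note, the example below upper-cases a subset of the `iris` column names. The choice of `toupper` and of the `Sepal` columns is only for illustration.

```r
library(dplyr)

# Apply toupper() to the names of the columns that start with "Sepal";
# the column values and the remaining names are left untouched
renamed <- iris %>% rename_with(toupper, .cols = starts_with("Sepal"))

names(renamed)
```

The `.cols` argument uses the same tidy selection as `across()`, so the selection helpers carry over unchanged.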



# Why `dplyr::across`

1. `across()` makes it possible to express useful summaries that were previously impossible:

In [46]:
iris %>% summarize(
    across(where(is.numeric), mean, .names = '{.col}.mean'), 
    across(where(is.factor), nlevels, .names = 'Number of {.col}'),
    `number of observations` = n()
)

Sepal.Length.mean,Sepal.Width.mean,Petal.Length.mean,Petal.Width.mean,Number of Species,number of observations
5.843333,3.057333,3.758,1.199333,3,150


2. `across()` reduces the number of functions that dplyr needs to provide. This makes dplyr easier to use (because there are fewer functions to remember) and makes it easier for the dplyr developers to implement new verbs (since only one function needs to be implemented, not four). (No need to learn functions like `summarize_if`, `summarize_at`, ...)
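
To make the comparison concrete, the sketch below places two of the older scoped verbs next to their `across()` equivalents on `iris`. The scoped verbs are superseded but, as far as I know, still available in current dplyr; the object names are placeholders for this example.

```r
library(dplyr)

# Older scoped verbs
old_if <- iris %>% summarize_if(is.numeric, mean)
old_at <- iris %>% summarize_at(vars(starts_with("Petal")), mean)

# Equivalent calls written with across()
new_if <- iris %>% summarize(across(where(is.numeric), mean))
new_at <- iris %>% summarize(across(starts_with("Petal"), mean))

# Both pairs should agree
all.equal(old_if, new_if)
all.equal(old_at, new_at)
```

The predicate of `summarize_if()` moves into `where()`, and the `vars()` selection of `summarize_at()` becomes the `.cols` argument directly.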