In [2]:
library(tidyverse)

# [Row-wise operations](https://dplyr.tidyverse.org/articles/rowwise.html#row-wise-summary-functions)

# Creating

Row-wise operations require a special type of grouping where each group consists of a single row. You create this with **`rowwise()`**:

In [3]:
df <- tibble(x = 1:2, y = 3:4, z = 5:6)

In [4]:
df %>% rowwise()

x,y,z
1,3,5
2,4,6


Like `group_by()`, `rowwise()` doesn’t really do anything itself; it just changes how the other verbs work.

In [5]:
# Calculate the mean of each row

df %>% rowwise() %>% mutate(m = mean(c(x, y, z)))

x,y,z,m
1,3,5,3
2,4,6,4
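
To see what `rowwise()` changes, it can help to run the same `mutate()` with and without it. The following is an illustrative sketch; the names `plain` and `by_row` are introduced here and are not part of the original notebook.

```r
library(tidyverse)

df <- tibble(x = 1:2, y = 3:4, z = 5:6)

# Without rowwise(): c(x, y, z) combines the full columns, so mean() sees
# all six values and the single result is recycled to every row.
plain <- df %>% mutate(m = mean(c(x, y, z)))

# With rowwise(): c(x, y, z) contains only the current row's three values.
by_row <- df %>% rowwise() %>% mutate(m = mean(c(x, y, z)))

plain$m   # 3.5 3.5
by_row$m  # 3 4
```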


You can optionally supply “identifier” variables in your call to `rowwise()`. These variables are preserved when you call `summarise()`, so they behave somewhat similarly to the grouping variables passed to `group_by()`:

In [9]:
df <- tibble(name = c("Mara", "Hadley"), x = 1:2, y = 3:4, z = 5:6)
df

name,x,y,z
Mara,1,3,5
Hadley,2,4,6


In [10]:
df %>% rowwise() %>% summarize(m = mean(c(x, y, z)))

`summarise()` ungrouping output (override with `.groups` argument)


m
3
4


In [11]:
df %>% rowwise(name) %>% summarize(m = mean(c(x, y, z)))

`summarise()` regrouping output by 'name' (override with `.groups` argument)


name,m
Mara,3
Hadley,4


<b style = 'color:red'>NOTE: `rowwise()` is just a special form of grouping, so if you want to remove it from a data frame, just call `ungroup()`.</b>
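
As a small sketch of the note above, the row-wise grouping can be checked through the `rowwise_df` class that dplyr attaches to the data frame. The variable names `rf`, `flat`, and `flat_means` are hypothetical, chosen only for this illustration.

```r
library(tidyverse)

df <- tibble(name = c("Mara", "Hadley"), x = 1:2, y = 3:4, z = 5:6)

rf <- df %>% rowwise(name)
inherits(rf, "rowwise_df")     # TRUE: row-wise grouping is active

flat <- rf %>% ungroup()
inherits(flat, "rowwise_df")   # FALSE: back to an ordinary tibble

# After ungroup(), mean() sees whole columns again.
flat_means <- flat %>% mutate(m = mean(c(x, y, z)))
flat_means$m                   # 3.5 3.5
```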

# Per row summary statistics

In [12]:
df <- tibble(id = 1:6, w = 10:15, x = 20:25, y = 30:35, z = 40:45)
df

id,w,x,y,z
1,10,20,30,40
2,11,21,31,41
3,12,22,32,42
4,13,23,33,43
5,14,24,34,44
6,15,25,35,45


In [13]:
# for each row, calculate the mean of w, x, y, z
df %>% rowwise(id) %>% summarize(m = mean(c(w, x, y, z)))

`summarise()` regrouping output by 'id' (override with `.groups` argument)


id,m
1,25
2,26
3,27
4,28
5,29
6,30


Of course, if you have a lot of variables, it’s going to be tedious to type in every variable name. Instead, you can use `c_across()`, which uses tidy selection syntax so you can succinctly select many variables:

In [17]:
df %>% rowwise(id) %>% summarize(m = mean(c_across(w:z)))

`summarise()` regrouping output by 'id' (override with `.groups` argument)


id,m
1,25
2,26
3,27
4,28
5,29
6,30
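
Because `c_across()` accepts tidy selection, helpers such as `where()` should also work. The sketch below assumes that `id`, being declared as an identifier in `rowwise()`, is treated as a grouping variable and therefore left out of the selection; the name `totals` is hypothetical.

```r
library(tidyverse)

df <- tibble(id = 1:6, w = 10:15, x = 20:25, y = 30:35, z = 40:45)

# Select every numeric column except the identifier id, then sum per row.
totals <- df %>%
  rowwise(id) %>%
  mutate(total = sum(c_across(where(is.numeric))))

totals$total  # 100 104 108 112 116 120
```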


You can combine `c_across()` with column-wise operations such as `across()` to compute each column’s proportion of the row total:

In [19]:
df %>% rowwise() %>% mutate(total = sum(c_across(w:z))) %>% mutate(across(w:z, ~ . / total))

id,w,x,y,z,total
1,0.1,0.2,0.3,0.4,100
2,0.1057692,0.2019231,0.2980769,0.3942308,104
3,0.1111111,0.2037037,0.2962963,0.3888889,108
4,0.1160714,0.2053571,0.2946429,0.3839286,112
5,0.1206897,0.2068966,0.2931034,0.3793103,116
6,0.125,0.2083333,0.2916667,0.375,120


### Row-wise summary functions

The `rowwise()` approach will work for any summary function. But if you need greater speed, it’s worth looking for a built-in row-wise variant of your summary function. These are more efficient because they operate on the data frame as a whole; they don’t split it into rows, compute the summary, and then join the results back together again.

In [20]:
df %>% mutate(total = rowSums(across(w:z)))

id,w,x,y,z,total
1,10,20,30,40,100
2,11,21,31,41,104
3,12,22,32,42,108
4,13,23,33,43,112
5,14,24,34,44,116
6,15,25,35,45,120


In [22]:
df %>% mutate(mean = rowMeans(across(w:z)))

id,w,x,y,z,mean
1,10,20,30,40,25
2,11,21,31,41,26
3,12,22,32,42,27
4,13,23,33,43,28
5,14,24,34,44,29
6,15,25,35,45,30
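
A quick way to gain confidence in a built-in variant is to check that it agrees with the `rowwise()` version. This is a minimal sketch; `slow` and `fast` are hypothetical names, and no timing claim is made here.

```r
library(tidyverse)

df <- tibble(id = 1:6, w = 10:15, x = 20:25, y = 30:35, z = 40:45)

# Row-by-row computation
slow <- df %>%
  rowwise(id) %>%
  mutate(mean = mean(c_across(w:z))) %>%
  ungroup()

# Whole-data-frame computation
fast <- df %>% mutate(mean = rowMeans(across(w:z)))

all.equal(slow$mean, fast$mean)  # TRUE
```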


# List-columns

### Motivation

In [23]:
df <- tibble(
  x = list(1, 2:3, 4:6)
)

df

x
1
"2, 3"
"4, 5, 6"


In [24]:
# the length of each element in column x

df %>% rowwise() %>% mutate(length = length(x))

x,length
1,1
"2, 3",2
"4, 5, 6",3


In [26]:
# equivalent using purrr

df %>% mutate(length = x %>% map_int(length))

x,length
1,1
"2, 3",2
"4, 5, 6",3

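For this particular task, base R also offers `lengths()`, which computes the length of every list element in one call. The sketch below is an optional alternative to the two approaches above; `with_len` is a hypothetical name.

```r
library(tidyverse)

df <- tibble(x = list(1, 2:3, 4:6))

# lengths() is vectorised over the list, so neither rowwise() nor a
# map_*() call is needed.
with_len <- df %>% mutate(length = lengths(x))

with_len$length  # 1 2 3
```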

### Subsetting

There’s an important difference between a grouped data frame where each group happens to have one row, and a row-wise data frame where every group always has one row. Take these two data frames:

In [32]:
df <- tibble(g = 1:2, y = list(1:3, "a"))
df

g,y
1,"1, 2, 3"
2,a


In [27]:
gf <- df %>% group_by(g)
rf <- df %>% rowwise(g)

If we compute some properties of `y`, the results look different:

In [29]:
gf %>% mutate(type = typeof(y), length = length(y))

g,y,type,length
1,"1, 2, 3",list,1
2,a,list,1


In [31]:
rf %>% mutate(type = typeof(y), length = length(y))

g,y,type,length
1,"1, 2, 3",integer,3
2,a,character,1
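
The difference comes down to how each row's piece of `y` is extracted: the grouped `mutate()` behaves like `[` (which keeps the list wrapper), while the row-wise `mutate()` behaves like `[[` (which pulls out the element). The base R sketch below mimics this for the first row; `grouped_slice` and `rowwise_slice` are hypothetical names.

```r
library(tidyverse)

df <- tibble(g = 1:2, y = list(1:3, "a"))

# `[` keeps the list wrapper: a list of length 1
grouped_slice <- df$y[1]
typeof(grouped_slice)   # "list"
length(grouped_slice)   # 1

# `[[` extracts the element itself: an integer vector of length 3
rowwise_slice <- df$y[[1]]
typeof(rowwise_slice)   # "integer"
length(rowwise_slice)   # 3
```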


### Modelling

# Repeated function calls

`rowwise()` doesn’t just work with functions that return a length-1 vector (aka summary functions); it can work with any function if the result is a list. This means that `rowwise()` and `mutate()` provide an elegant way to call a function many times with varying arguments, storing the outputs alongside the inputs.

In [34]:
df <- tribble(
  ~ n, ~ min, ~ max,
    1,     0,     1,
    2,    10,   100,
    3,   100,  1000,
)

df

n,min,max
1,0,1
2,10,100
3,100,1000


In [36]:
df %>% rowwise() %>% mutate(sample = list(runif(n, min, max)))

n,min,max,sample
1,0,1,0.1193836
2,10,100,"20.31088, 85.50538"
3,100,1000,"771.5930, 197.3408, 293.5938"


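Note the use of `list()`. `runif()` returns multiple values, but in a row-wise `mutate()` each expression has to return something of length 1. Wrapping the result in `list()` gives a list-column in which each row holds a vector of values.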
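
To confirm that each row really stores a whole vector, one can inspect the list-column after the call. This is a sketch under the same inputs as above; the seed value and the name `out` are arbitrary choices made here, not part of the original.

```r
library(tidyverse)

set.seed(1)  # arbitrary seed, only so that reruns give the same draws

df <- tribble(
  ~n, ~min, ~max,
   1,    0,    1,
   2,   10,  100,
   3,  100, 1000
)

out <- df %>%
  rowwise() %>%
  mutate(sample = list(runif(n, min, max))) %>%
  ungroup()

typeof(out$sample)    # "list"
lengths(out$sample)   # 1 2 3, matching the n column

# An individual row's draws are retrieved with [[ ]]
out$sample[[2]]
```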

### Multiple combinations

In [37]:
df <- expand_grid(mean = c(-1, 0, 1), sd = c(1, 10, 100))
df

mean,sd
-1,1
-1,10
-1,100
0,1
0,10
0,100
1,1
1,10
1,100


In [40]:
df %>% rowwise() %>% mutate(sample = list(rnorm(10, mean, sd)))

mean,sd,sample
-1,1,"-0.22471285, -0.30996150, -0.06898567, -1.46914925, -0.61824616, -0.27017763, -0.43321641, -0.58437822, -0.76692078, 0.13604184"
-1,10,"0.7530198, -1.0602098, 2.4141938, -17.8469462, 5.6150016, 7.0875617, -17.5819421, 11.6992405, 6.1962554, 6.4414500"
-1,100,"-5.619504, -50.891112, 17.299161, 169.273241, -63.690020, -151.137164, -44.118924, -199.529634, -95.491138, -53.365946"
0,1,"0.083067730, 0.179244292, 0.182316255, 1.326328958, -0.425857270, 0.003974857, -0.641008557, -1.433796526, 0.817382289, -0.080969036"
0,10,"6.8041879, 0.8434895, -1.6479602, 2.5171706, 0.4781909, 3.3695991, 8.5346329, -4.7405180, 4.9195873, 4.3896954"
0,100,"-87.787070, 12.645020, -110.218653, -44.056024, -57.157618, -225.539459, -19.679948, 121.568075, -1.983079, -2.839947"
1,1,"1.6752758, 1.9775224, 0.8780981, 1.5284610, -0.5147356, 0.2220964, 0.7432625, -0.5891559, 2.1530528, 0.1522253"
1,10,"-5.584778, 11.273510, -17.287649, 21.166979, -2.185932, 9.321539, 8.573130, 8.782625, 13.367115, 14.604982"
1,100,"121.211715, 71.293742, 35.728676, -86.356419, 127.590632, 60.205311, 106.472898, -9.274463, -57.897382, -67.233160"
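
Once the samples are stored alongside their parameters, a second row-wise `mutate()` can summarise each one. The sketch below is an illustrative follow-up; the column names `sample_mean` and `sample_sd`, the seed, and the name `sims` are assumptions introduced here.

```r
library(tidyverse)

set.seed(1)  # arbitrary seed for repeatability

df <- expand_grid(mean = c(-1, 0, 1), sd = c(1, 10, 100))

sims <- df %>%
  rowwise() %>%
  mutate(sample = list(rnorm(10, mean, sd))) %>%
  # In a row-wise data frame, `sample` refers to the current row's numeric
  # vector, so ordinary summary functions can be applied to it directly.
  mutate(sample_mean = mean(sample), sample_sd = sd(sample)) %>%
  ungroup()

sims %>% select(mean, sd, sample_mean, sample_sd)
```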


### Varying functions