In [2]:
library(tidyverse)

# Group by one or more variables

Most data operations are done on groups defined by variables. `group_by()` takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". `ungroup()` removes grouping.

```R
group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

ungroup(x, ...)
```

**Arguments**  
`...`	
In `group_by()`, variables or computations to group by. In `ungroup()`, variables to remove from the grouping.

# Examples

In [8]:
TF <- data.frame(
    clan = c('VNC', 'VN', 'King Allool', 'VNC', 'King Allool', 'VN'),
    player = c('Meomeo888', 'VN Van Du', 'GHOST', 'VN Pikachu', 'Dr Strange', 'VN Wanie'),
    power = c(95, 78, 75, 100, 90, 85),
    country = c('VN', 'VN', 'USA', 'VN', 'INDIA', 'VN')
)
TF

clan,player,power,country
VNC,Meomeo888,95,VN
VN,VN Van Du,78,VN
King Allool,GHOST,75,USA
VNC,VN Pikachu,100,VN
King Allool,Dr Strange,90,INDIA
VN,VN Wanie,85,VN


Grouping doesn't change how the data looks (apart from listing
how it's grouped):

In [9]:
TF %>% group_by(clan)

clan,player,power,country
VNC,Meomeo888,95,VN
VN,VN Van Du,78,VN
King Allool,GHOST,75,USA
VNC,VN Pikachu,100,VN
King Allool,Dr Strange,90,INDIA
VN,VN Wanie,85,VN


In [10]:
# Average power for each clan
TF %>% group_by(clan) %>% summarize(power = mean(power))

`summarise()` ungrouping output (override with `.groups` argument)


clan,power
King Allool,82.5
VN,81.5
VNC,97.5
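
The message printed above points at the `.groups` argument of `summarise()`. As a minimal sketch, assuming dplyr 1.0.0 or later (where `.groups` is available), the grouping of the result can be stated explicitly, which also silences the message. The name `clan_power` is only illustrative.

```R
library(dplyr)

# Same toy data as above, rebuilt so the snippet runs on its own.
# stringsAsFactors = FALSE keeps the text columns as character on older R versions.
TF <- data.frame(
    clan = c('VNC', 'VN', 'King Allool', 'VNC', 'King Allool', 'VN'),
    player = c('Meomeo888', 'VN Van Du', 'GHOST', 'VN Pikachu', 'Dr Strange', 'VN Wanie'),
    power = c(95, 78, 75, 100, 90, 85),
    country = c('VN', 'VN', 'USA', 'VN', 'INDIA', 'VN'),
    stringsAsFactors = FALSE
)

# .groups = "drop" asks for an ungrouped result explicitly
clan_power <- TF %>%
    group_by(clan) %>%
    summarise(power = mean(power), .groups = "drop")

group_vars(clan_power)   # no grouping variables remain
```

Other accepted values include `"drop_last"` (the default behaviour shown in the message) and `"keep"`.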


In [11]:
# Keep players with power >= 90; the result stays grouped by clan
TF %>% group_by(clan) %>% filter(power >= 90)

clan,player,power,country
VNC,Meomeo888,95,VN
VNC,VN Pikachu,100,VN
King Allool,Dr Strange,90,INDIA
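
The condition `power >= 90` selects the same rows with or without grouping. Grouping starts to matter when the condition uses a summary function, because that function is then evaluated once per group. The following is a small sketch of that difference; the names `top_per_clan` and `top_overall` are illustrative only.

```R
library(dplyr)

# Same toy data as above, rebuilt so the snippet runs on its own
TF <- data.frame(
    clan = c('VNC', 'VN', 'King Allool', 'VNC', 'King Allool', 'VN'),
    player = c('Meomeo888', 'VN Van Du', 'GHOST', 'VN Pikachu', 'Dr Strange', 'VN Wanie'),
    power = c(95, 78, 75, 100, 90, 85),
    country = c('VN', 'VN', 'USA', 'VN', 'INDIA', 'VN'),
    stringsAsFactors = FALSE
)

# Grouped: max(power) is computed within each clan,
# so one top player per clan is kept
top_per_clan <- TF %>%
    group_by(clan) %>%
    filter(power == max(power))

# Ungrouped: max(power) is computed over the whole table,
# so only the single strongest player is kept
top_overall <- TF %>%
    filter(power == max(power))

top_per_clan$player   # "VN Pikachu" "Dr Strange" "VN Wanie"
top_overall$player    # "VN Pikachu"
```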


Each call to `summarise()` removes a layer of grouping:

In [13]:
v <- TF %>% group_by(clan, country) %>% summarize(n = n())
v

`summarise()` regrouping output by 'clan' (override with `.groups` argument)


clan,country,n
King Allool,INDIA,1
King Allool,USA,1
VN,VN,2
VNC,VN,2


In [14]:
v %>% summarize(n = sum(n))

`summarise()` ungrouping output (override with `.groups` argument)


clan,n
King Allool,2
VN,2
VNC,2
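
One way to watch the layers being peeled off is to call `group_vars()` after each step. This is a sketch under the assumption that dplyr 1.0.0 or later is installed; `.groups = "drop_last"` spells out the default behaviour, and the intermediate names are illustrative.

```R
library(dplyr)

# Same toy data as above, rebuilt so the snippet runs on its own
TF <- data.frame(
    clan = c('VNC', 'VN', 'King Allool', 'VNC', 'King Allool', 'VN'),
    player = c('Meomeo888', 'VN Van Du', 'GHOST', 'VN Pikachu', 'Dr Strange', 'VN Wanie'),
    power = c(95, 78, 75, 100, 90, 85),
    country = c('VN', 'VN', 'USA', 'VN', 'INDIA', 'VN'),
    stringsAsFactors = FALSE
)

# Start with two grouping variables
by_clan_country <- TF %>% group_by(clan, country)
group_vars(by_clan_country)   # "clan" "country"

# The first summarise() drops the last variable (country); clan remains
v <- by_clan_country %>% summarise(n = n(), .groups = "drop_last")
group_vars(v)                 # "clan"

# The second summarise() drops clan, leaving an ungrouped tibble
totals <- v %>% summarise(n = sum(n), .groups = "drop_last")
group_vars(totals)            # no grouping variables remain
```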


To remove grouping, use `ungroup()`:

In [16]:
v %>% ungroup() %>% summarize(n = sum(n))

n
6


You can group by expressions: this is just shorthand for
a `mutate()` followed by a `group_by()`:

In [18]:
TF %>% group_by(substr(clan, 1, 1)) %>% summarize(power = mean(power))

`summarise()` ungrouping output (override with `.groups` argument)


"substr(clan, 1, 1)",power
K,82.5
V,89.5
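
To make the shorthand concrete, the sketch below writes the same computation three ways: grouping by the expression directly, spelling out the `mutate()` step, and naming the expression inside `group_by()` so the output column gets a readable name. The column name `initial` is an illustrative choice, not something from the original example.

```R
library(dplyr)

# Same toy data as above, rebuilt so the snippet runs on its own
TF <- data.frame(
    clan = c('VNC', 'VN', 'King Allool', 'VNC', 'King Allool', 'VN'),
    player = c('Meomeo888', 'VN Van Du', 'GHOST', 'VN Pikachu', 'Dr Strange', 'VN Wanie'),
    power = c(95, 78, 75, 100, 90, 85),
    country = c('VN', 'VN', 'USA', 'VN', 'INDIA', 'VN'),
    stringsAsFactors = FALSE
)

# 1. Group by the expression directly (column is named after the expression)
by_expr <- TF %>%
    group_by(substr(clan, 1, 1)) %>%
    summarise(power = mean(power), .groups = "drop")

# 2. The longhand: create the column first, then group by it
by_mutate <- TF %>%
    mutate(initial = substr(clan, 1, 1)) %>%
    group_by(initial) %>%
    summarise(power = mean(power), .groups = "drop")

# 3. Name the expression inside group_by() for a tidier column name
by_named <- TF %>%
    group_by(initial = substr(clan, 1, 1)) %>%
    summarise(power = mean(power), .groups = "drop")

by_named$power   # 82.5 89.5
```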


By default, `group_by()` overrides existing grouping:

In [19]:
TF %>% group_by(clan) %>% group_by(country) %>% group_vars()

Use `.add = TRUE` to append to the existing grouping instead:

In [20]:
TF %>% group_by(clan) %>% group_by(country, .add = TRUE) %>% group_vars()
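
The two calls can be compared side by side by storing what `group_vars()` returns. This is a minimal sketch, assuming a dplyr version that has the `.add` argument (1.0.0 or later); the names `overridden` and `appended` are illustrative.

```R
library(dplyr)

# Same toy data as above, rebuilt so the snippet runs on its own
TF <- data.frame(
    clan = c('VNC', 'VN', 'King Allool', 'VNC', 'King Allool', 'VN'),
    player = c('Meomeo888', 'VN Van Du', 'GHOST', 'VN Pikachu', 'Dr Strange', 'VN Wanie'),
    power = c(95, 78, 75, 100, 90, 85),
    country = c('VN', 'VN', 'USA', 'VN', 'INDIA', 'VN'),
    stringsAsFactors = FALSE
)

# Default: the second group_by() replaces the first
overridden <- TF %>%
    group_by(clan) %>%
    group_by(country) %>%
    group_vars()

# .add = TRUE: the second group_by() is appended to the first
appended <- TF %>%
    group_by(clan) %>%
    group_by(country, .add = TRUE) %>%
    group_vars()

overridden   # "country"
appended     # "clan" "country"
```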

When factors are involved and `.drop = FALSE`, groups can be empty:

In [21]:
tbl <- tibble(
  x = 1:10,
  y = factor(rep(c("a", "c"), each  = 5), levels = c("a", "b", "c"))
)
tbl

x,y
1,a
2,a
3,a
4,a
5,a
6,c
7,c
8,c
9,c
10,c


In [22]:
# .drop = FALSE: the second group (level "b") has no rows, but it is kept
tbl %>%
  group_by(y, .drop = FALSE) %>%
  group_rows()

In [23]:
# .drop = TRUE: groups for unused factor levels are dropped
tbl %>%
  group_by(y, .drop = TRUE) %>%
  group_rows()
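
A practical consequence of `.drop` shows up when counting rows per group. The sketch below rebuilds the same `tbl` and compares the two settings with `summarise(n = n())`; the names `kept` and `dropped` are illustrative, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```R
library(dplyr)

# Same factor example as above: level "b" is declared but never used
tbl <- tibble(
  x = 1:10,
  y = factor(rep(c("a", "c"), each = 5), levels = c("a", "b", "c"))
)

# .drop = FALSE keeps the empty group, which is reported with a count of 0
kept <- tbl %>%
  group_by(y, .drop = FALSE) %>%
  summarise(n = n(), .groups = "drop")

# .drop = TRUE omits the unused level entirely
dropped <- tbl %>%
  group_by(y, .drop = TRUE) %>%
  summarise(n = n(), .groups = "drop")

kept$n      # 5 0 5
dropped$n   # 5 5
```

Keeping empty groups is useful when a report must list every factor level, including those with no observations.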