In [2]:
library(tidyverse)

# Count observations by group

`count()` lets you quickly count the unique values of one or more variables: `df %>% count(a, b)` is roughly equivalent to `df %>% group_by(a, b) %>% summarise(n = n())`. `count()` is paired with `tally()`, a lower-level helper that is equivalent to `df %>% summarise(n = n())`. Supply `wt` to perform weighted counts, switching the summary from `n = n()` to `n = sum(wt)`.

`add_count()` and `add_tally()` are the equivalents of `count()` and `tally()`, but they use `mutate()` instead of `summarise()`, so they add a new column with group-wise counts rather than collapsing the data.
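
As a rough check of the equivalences described above, the sketch below compares `count()` with the explicit `group_by()` + `summarise()` pipeline, with and without `wt`. The `orders` data frame and its columns are hypothetical, made up purely for illustration, and the `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
orders <- tibble(
  customer = c("ann", "ann", "bob", "cy", "cy", "cy"),
  items    = c(2, 1, 5, 1, 1, 3)
)

# Unweighted: count() versus group_by() + summarise(n = n())
via_count <- orders %>% count(customer)
via_summary <- orders %>%
  group_by(customer) %>%
  summarise(n = n(), .groups = "drop")

# Weighted: wt = items switches the summary to n = sum(items)
weighted_count <- orders %>% count(customer, wt = items)
weighted_summary <- orders %>%
  group_by(customer) %>%
  summarise(n = sum(items), .groups = "drop")

# Both comparisons should report TRUE
all.equal(as.data.frame(via_count), as.data.frame(via_summary))
all.equal(as.data.frame(weighted_count), as.data.frame(weighted_summary))
```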

```R
count(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = group_by_drop_default(x)
)

tally(x, wt = NULL, sort = FALSE, name = NULL)

add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = deprecated())

add_tally(x, wt = NULL, sort = FALSE, name = NULL)
```

# Examples

`count()` is a convenient way to get a sense of the distribution of values in a dataset.

In [6]:
#Count the number of observations for each `manufacturer` in `mpg`
mpg %>% count(manufacturer)

manufacturer,n
audi,18
chevrolet,19
dodge,37
ford,25
honda,9
hyundai,14
jeep,8
land rover,4
lincoln,3
mercury,4


In [9]:
#Sort by the result of count, largest groups first
mpg %>% count(manufacturer, sort = TRUE)

manufacturer,n
dodge,37
toyota,34
volkswagen,27
ford,25
chevrolet,19
audi,18
hyundai,14
subaru,14
nissan,13
honda,9


In [11]:
#Count the number of observations for each manufacturer AND model, sort the result
mpg %>% count(manufacturer, model, sort = TRUE)

manufacturer,model,n
dodge,caravan 2wd,11
dodge,ram 1500 pickup 4wd,10
dodge,dakota pickup 4wd,9
ford,mustang,9
honda,civic,9
volkswagen,jetta,9
audi,a4 quattro,8
jeep,grand cherokee 4wd,8
subaru,impreza awd,8
audi,a4,7


In [14]:
#Round `Sepal.Length` then calculate frequency for each value
iris %>% count(round(Sepal.Length))

round(Sepal.Length),n
4,5
5,47
6,68
7,24
8,6
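
When counting by an expression, the output column is named after the expression text (`round(Sepal.Length)` above), which is awkward to refer to later. A minimal sketch of one way around this is to name the expression inside `count()`; the column name `sepal_cm` is an arbitrary choice made for this example.

```r
library(dplyr)

# Name the computed grouping variable directly inside count()
rounded_counts <- iris %>% count(sepal_cm = round(Sepal.Length))

# The first column is now called sepal_cm instead of round(Sepal.Length)
names(rounded_counts)
```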


Use the `wt` argument to perform a weighted count. This is useful
when the data has already been aggregated once.

In [15]:
library(titanic)

"package 'titanic' was built under R version 3.6.3"

In [17]:
df <- titanic_train %>% select(Sex, Survived)

df %>% head()

Sex,Survived
male,0
female,1
female,1
female,1
male,0
male,0


In [18]:
#Number of men and women who survived (Survived is 0/1, so the weighted count sums the survivors)
df %>% count(Sex, wt = Survived)

Sex,n
female,233
male,109
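
To illustrate the "already aggregated" case mentioned above, here is a small sketch with a hypothetical summary table (`sales_summary`, with invented `region`, `product`, and `n_sales` columns) in which each row already carries a count. A plain `count()` only counts the summary rows, while `wt = n_sales` adds up the underlying observations.

```r
library(dplyr)

# Hypothetical pre-aggregated table: one row per (region, product),
# where n_sales is the number of original observations behind that row
sales_summary <- tibble(
  region  = c("north", "north", "south", "south", "south"),
  product = c("a", "b", "a", "b", "c"),
  n_sales = c(10, 4, 7, 1, 3)
)

# Unweighted: number of summary rows per region
rows_per_region <- sales_summary %>% count(region, name = "rows")

# Weighted: total number of underlying observations per region
sales_per_region <- sales_summary %>% count(region, wt = n_sales, name = "sales")

rows_per_region
sales_per_region
```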


`tally()` is a lower-level function that assumes you've already done the grouping.

In [22]:
mpg %>% group_by(manufacturer, model) %>% tally(sort = TRUE)
#Returns the same rows as: mpg %>% count(manufacturer, model, sort = TRUE)

manufacturer,model,n
dodge,caravan 2wd,11
dodge,ram 1500 pickup 4wd,10
dodge,dakota pickup 4wd,9
ford,mustang,9
honda,civic,9
volkswagen,jetta,9
audi,a4 quattro,8
jeep,grand cherokee 4wd,8
subaru,impreza awd,8
audi,a4,7
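
One difference worth knowing when choosing between the two forms: because `tally()` is built on `summarise()`, its result keeps whatever grouping `summarise()` leaves behind, whereas `count()` on an ungrouped data frame returns an ungrouped result. The sketch below checks this with `group_vars()` on a hypothetical `fleet` table; the behaviour shown is what I would expect from dplyr 1.0.0 and later, so treat it as an assumption for other versions.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
fleet <- tibble(
  make  = c("a", "a", "a", "b", "b"),
  model = c("x", "x", "y", "z", "z")
)

via_tally <- fleet %>% group_by(make, model) %>% tally()
via_count <- fleet %>% count(make, model)

# tally() drops only the last grouping level, like summarise()
group_vars(via_tally)

# count() restores the original (here: empty) grouping
group_vars(via_count)
```

If the leftover grouping is not wanted, adding `ungroup()` after `tally()` removes it.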


Both `count()` and `tally()` have `add_` variants that work like
`mutate()` instead of `summarise()`.

In [24]:
iris %>%
  add_count(Species) %>%
  head()

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species,n
5.1,3.5,1.4,0.2,setosa,50
4.9,3.0,1.4,0.2,setosa,50
4.7,3.2,1.3,0.2,setosa,50
4.6,3.1,1.5,0.2,setosa,50
5.0,3.6,1.4,0.2,setosa,50
5.4,3.9,1.7,0.4,setosa,50


In [25]:
iris %>%
  group_by(Species) %>%
  add_tally() %>%
  head()

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species,n
5.1,3.5,1.4,0.2,setosa,50
4.9,3.0,1.4,0.2,setosa,50
4.7,3.2,1.3,0.2,setosa,50
4.6,3.1,1.5,0.2,setosa,50
5.0,3.6,1.4,0.2,setosa,50
5.4,3.9,1.7,0.4,setosa,50
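
Because the `add_` variants keep every row, a common use is to filter rows by the size of their group. The following is a minimal sketch on a hypothetical `visits` table (the `user` and `page` columns and the threshold of 2 are invented for the example).

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
visits <- tibble(
  user = c("u1", "u2", "u1", "u3", "u1", "u2"),
  page = c("home", "home", "docs", "home", "blog", "docs")
)

# Attach the per-user row count, then keep users seen at least twice
frequent <- visits %>%
  add_count(user, name = "n_visits") %>%
  filter(n_visits >= 2)

frequent
```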


# Arguments

```r
count(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = group_by_drop_default(x)
)

tally(x, wt = NULL, sort = FALSE, name = NULL)

add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = deprecated())

add_tally(x, wt = NULL, sort = FALSE, name = NULL)
```

### `x`	

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

<hr>

In [3]:
#Count the number of observations for each species in `iris` dataset
count(x = iris, Species)

Species,n
setosa,50
versicolor,50
virginica,50


In [4]:
#Equivalent
iris %>% count(Species)

Species,n
setosa,50
versicolor,50
virginica,50


### `wt`	

`<data-masking>` Frequency weights. Can be `NULL` or a variable:

* If `NULL` (the default), counts the number of rows in each group.

* If a variable, computes `sum(wt)` for each group.

In [6]:
clan <- data.frame(
    name =  c('VNC', 'VNC', 'Dirilis'),
    reward =  c(100, 96, 10)
)

clan

name,reward
VNC,100
VNC,96
Dirilis,10


In [7]:
#default, wt = NULL
clan %>% count(name)

name,n
Dirilis,1
VNC,2


In [8]:
#wt = reward

clan %>% count(name, wt = reward)

name,n
Dirilis,10
VNC,196


### `sort`	

If TRUE, will show the largest groups at the top.

<hr> 

In [10]:
#default, sort = FALSE
mpg %>% count(manufacturer)

manufacturer,n
audi,18
chevrolet,19
dodge,37
ford,25
honda,9
hyundai,14
jeep,8
land rover,4
lincoln,3
mercury,4


In [12]:
#sort = TRUE, sort the result by count in descending order
mpg %>% count(manufacturer, sort = TRUE)

manufacturer,n
dodge,37
toyota,34
volkswagen,27
ford,25
chevrolet,19
audi,18
hyundai,14
subaru,14
nissan,13
honda,9


### `name`

The name of the new column in the output.

If omitted, it will default to `n`. If there's already a column called `n`, it will error, and require you to specify the name.

In [13]:
#Set the name of the result column to 'Total Reward'
clan %>% count(name, wt = reward, name = 'Total Reward')

name,Total Reward
Dirilis,10
VNC,196


### `.drop`	

For `count()`: if `FALSE`, will include counts for empty groups (i.e. for levels of factors that don't exist in the data). Deprecated in `add_count()` since it didn't actually affect the output.
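
A short sketch of the effect of `.drop`, using a hypothetical `survey` table whose `answer` factor has a level (`"maybe"`) that never occurs in the data:

```r
library(dplyr)

# Hypothetical toy data: "maybe" is a declared level with no rows
survey <- tibble(
  answer = factor(c("yes", "yes", "no"), levels = c("yes", "no", "maybe"))
)

# Default: empty factor levels are dropped from the result
observed_only <- survey %>% count(answer)

# .drop = FALSE: the unused level is kept with a count of 0
with_empty <- survey %>% count(answer, .drop = FALSE)

observed_only
with_empty
```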

# Value