In [3]:
library(tidyverse)

<b style='color:red'>Equivalent `DataFrame.agg()`</b>

**`summarise()`** creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.

```R
summarise(.data, ..., .groups = NULL)

summarize(.data, ..., .groups = NULL)
```

# Useful functions

# Examples

A summary applied to an ungrouped tbl returns a single row.

In [3]:
# Return the mean displacement and the number of observations
mpg %>%
  summarize(mean = mean(displ), n = n())

mean,n
3.471795,234


Usually, you'll want to group first

In [7]:
# For each value of `cyl`, calculate the mean `displ` and the number of observations
mpg %>%
  group_by(cyl) %>%
  summarize(mean_displacement = mean(displ), n = n())

`summarise()` ungrouping output (override with `.groups` argument)


cyl,mean_displacement,n
4,2.145679,81
5,2.5,4
6,3.408861,79
8,5.132857,70


Since dplyr 1.0.0, a single summary expression can return more than one value per group:

In [8]:
mtcars %>%
   group_by(cyl) %>%
   summarise(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75))

`summarise()` regrouping output by 'cyl' (override with `.groups` argument)


cyl,qs,prob
4,78.85,0.25
4,120.65,0.75
6,160.0,0.25
6,196.3,0.75
8,301.75,0.25
8,390.0,0.75

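In dplyr 1.1.0 and later, returning more than one row per group from `summarise()` is deprecated in favour of `reframe()`. The following is a minimal sketch of the same quantile computation, assuming dplyr 1.1.0 or newer is installed; the name `quartiles` is only illustrative.

```r
library(dplyr)

# Same computation as above, written with reframe() (assumes dplyr >= 1.1.0).
# Each group contributes two rows: one per requested probability.
quartiles <- mtcars %>%
  group_by(cyl) %>%
  reframe(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75))

quartiles
```

Unlike `summarise()`, `reframe()` always returns an ungrouped data frame, so no `.groups` argument is needed.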

Each call to `summarise()` removes one level of grouping, since every group
at that level has been collapsed to a single row:

In [9]:
mtcars %>%
  group_by(cyl, vs) %>%
  summarise(cyl_n = n()) %>%
  group_vars()

`summarise()` regrouping output by 'cyl' (override with `.groups` argument)


In [5]:
# tidy-select style via across(); this call fails because the grouping column `Species` cannot be selected inside across()
iris %>% group_by(Species) %>% summarize(res = across(!Species, mean))

ERROR: Error: Problem with `summarise()` input `res`.
x Can't subset columns that don't exist.
x Column `Species` doesn't exist.
i Input `res` is `across(!Species, mean)`.
i The error occurred in group 1: Species = "setosa".
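The error arises because `across()` never sees grouping columns, so `!Species` refers to a column that does not exist in that context. A minimal sketch of a working version selects with `everything()`, which already excludes `Species`; the name `species_means` is illustrative.

```r
library(dplyr)

# Grouping columns are excluded from across() automatically,
# so everything() covers the four numeric measurement columns only.
species_means <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), mean), .groups = "drop")

species_means
```

Leaving the `across()` expression unnamed spreads the result into one column per variable, whereas naming it (as in `res = across(...)`) would pack the means into a single data frame column.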


# Arguments

### `...`

`<data-masking>` Name-value pairs of summary functions. The name will be the name of the variable in the result.

The value can be:

* A vector of length 1, e.g. `min(x)`, `n()`, or `sum(is.na(y))`.

* A vector of length n, e.g. `quantile()`.

* A data frame, to add multiple columns from a single expression.
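The data frame case in the list above can be sketched as follows: one unnamed expression returns a one-row data frame per group, and its columns become columns of the result. The column names `min_disp` and `max_disp` and the variable `disp_range` are made up for illustration.

```r
library(dplyr)

# A single expression that yields a data frame adds several columns at once.
disp_range <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    data.frame(min_disp = min(disp), max_disp = max(disp)),
    .groups = "drop"
  )

disp_range
```

This is handy when a helper function already returns several related statistics together.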

<hr>

A vector of length 1

In [7]:
# The number of observations for each `Species`
iris %>% group_by(Species) %>% summarize(n = n())

`summarise()` ungrouping output (override with `.groups` argument)


Species,n
setosa,50
versicolor,50
virginica,50


A vector of length n (window function)

In [9]:
school <- data.frame(
    class = c('A', 'B', 'A', 'B'),
    name = c('Pikachu', 'ETOGRUL', 'Gravita', 'VIKING'),
    grade = c(10, 5, 6, 2)
)

school

class,name,grade
A,Pikachu,10
B,ETOGRUL,5
A,Gravita,6
B,VIKING,2


In [19]:
# For each class, rank the students by grade, highest first
school %>% group_by(class) %>% summarize(rank = rank(desc(grade)), .groups = 'keep')

class,rank
A,1
A,2
B,1
B,2
