<center><h1>Using <code>group_by()</code> and <code>summarise()</code> in dplyr</h1></center>

# 1. Why use `group_by()` and `summarise()` from _dplyr_?
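  - Aggregating and summarizing by group is an extremely common data-analysis task
  - Together they implement the _split-apply-combine_ pattern: split the rows into groups, apply a summary to each group, then combine the per-group results into one table
  - These operations can be "chained" with other _dplyr_ functions using the pipe, `%>%`
  - This often makes for concise, intuitive, and readable code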
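The following minimal sketch makes the split-apply-combine idea concrete. It uses a small made-up data frame (`toy` and its values are hypothetical, not taken from the arrests data). First it performs the three steps by hand in base R with `split()`, `lapply()`, and `do.call(rbind, ...)`. Then it does the same job with `group_by()` and `summarise()`. The `.groups = "drop"` argument, available in dplyr 1.0.0 and later, simply asks for an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data standing in for the arrests table
toy <- data.frame(
    gender = c("Female", "Male", "Male", "Female", "Male"),
    age    = c(30, 40, 20, 50, 30),
    stringsAsFactors = FALSE
)

# Base R, one step at a time
pieces  <- split(toy, toy$gender)               # split: one data frame per gender
applied <- lapply(pieces, function(d) {         # apply: summarise each piece
    data.frame(gender = d$gender[1], n_rows = nrow(d), mean_age = mean(d$age))
})
combined <- do.call(rbind, applied)             # combine: stack the pieces into one table
rownames(combined) <- NULL

# dplyr: the same three steps in one pipeline
via_dplyr <- toy %>%
    group_by(gender) %>%
    summarise(n_rows = n(), mean_age = mean(age), .groups = "drop")

print(combined)
print(via_dplyr)
```

Both approaches give one row per gender: Female has 2 rows with mean age 40, and Male has 3 rows with mean age 30. The dplyr version just hides the bookkeeping.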

## 1.1 Example of `group_by()` and `summarise()`

```r
library(dplyr)

# Load the arrest records into a data frame
arrests <- read.csv("data/pvd_arrests_2020-10-03.csv")
```

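```r
# One output row per distinct value of `gender`
gender_tbl <- arrests %>%
    group_by(gender) %>%
    summarise(
        n_rows = n(),          # number of rows in the group
        mean_age = mean(age)   # average age within the group
    )

head(gender_tbl)
```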

`summarise()` ungrouping output (override with `.groups` argument)

| gender `<chr>` | n_rows `<int>` | mean_age `<dbl>` |
|---|---:|---:|
|  | 21 | 29.47619 |
| Female | 1906 | 31.99895 |
| Male | 6804 | 33.20988 |
|  | 20 | 28.15 |
| Unknown | 4 | 34.5 |
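Counting rows per group is common enough that dplyr provides a shorthand. The sketch below uses hypothetical toy data to show that `count(gender, name = "n_rows")` should give the same counts as `group_by(gender) %>% summarise(n_rows = n())`. The empty string is included to show that a blank value forms a group of its own, much like the blank-looking `gender` rows in the table above.

```r
library(dplyr)

# Hypothetical toy data; "" stands in for a blank gender value
toy <- data.frame(
    gender = c("Female", "Male", "", "Male", "Female", "Male"),
    stringsAsFactors = FALSE
)

# Long form: group, then count the rows in each group
long_form <- toy %>%
    group_by(gender) %>%
    summarise(n_rows = n(), .groups = "drop")

# Shorthand: count() groups and counts in a single call
short_form <- toy %>%
    count(gender, name = "n_rows")

print(long_form)
print(short_form)
```

`count()` can only count. When you also need statistics such as `mean_age`, use the full `group_by()` + `summarise()` form.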


# 2. Chaining `filter()` with `group_by()` and `summarise()`

```r
gender_tbl <- arrests %>%
    # Keep only rows where both conditions hold
    filter(
        from_city == "Providence",
        year == 2019
    ) %>%
    group_by(gender) %>%
    summarise(
        n_rows = n(),
        mean_age = mean(age),
        mean_cnts = mean(counts, na.rm = TRUE)   # skip missing values of `counts`
    )

head(gender_tbl)
```

`summarise()` ungrouping output (override with `.groups` argument)

| gender `<chr>` | n_rows `<int>` | mean_age `<dbl>` | mean_cnts `<dbl>` |
|---|---:|---:|---:|
|  | 9 | 23.88889 | 1.0 |
| Female | 515 | 33.46602 | 1.064039 |
| Male | 2039 | 33.38941 | 1.098027 |
| Unknown | 1 | 49.0 | 1.0 |
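The pipeline above passes `na.rm = TRUE` to `mean(counts)` but not to `mean(age)`. The sketch below, on hypothetical toy data, shows why the argument matters. Without `na.rm = TRUE`, a single `NA` in a group turns that group's mean into `NA`. Counting the missing values alongside the mean is a cheap way to see how much data `na.rm = TRUE` quietly drops.

```r
library(dplyr)

# Hypothetical toy data with one missing value in `counts`
toy <- data.frame(
    gender = c("Female", "Female", "Male", "Male"),
    counts = c(1, NA, 1, 2),
    stringsAsFactors = FALSE
)

res <- toy %>%
    group_by(gender) %>%
    summarise(
        mean_cnts_raw = mean(counts),                # NA propagates for Female
        mean_cnts     = mean(counts, na.rm = TRUE),  # NA removed before averaging
        n_missing     = sum(is.na(counts)),          # how many values were removed
        .groups = "drop"
    )

print(res)
```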


## 2.1 More Interesting Example of Chaining

```r
# TRUE for June, July, and August (month numbers 6-8), FALSE otherwise
is_summer <- function(month_num) {
    chk <- month_num %in% c(6, 7, 8)
    return(chk)
}
```

```r
is_summer(6)   # TRUE
is_summer(2)   # FALSE
is_summer(8)   # TRUE
```
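`is_summer()` works inside `summarise()` because `%in%` is vectorized. Given a whole column of month numbers, it returns one `TRUE`/`FALSE` per element. `mean()` of a logical vector treats `TRUE` as 1 and `FALSE` as 0, so it gives the proportion of `TRUE` values. The same trick gives `prop_male = mean(gender == "Male")` in the next example. The quick check below uses made-up month values:

```r
is_summer <- function(month_num) {
    chk <- month_num %in% c(6, 7, 8)
    return(chk)
}

months <- c(1, 6, 7, 12)       # hypothetical month values for four rows
flags  <- is_summer(months)    # one TRUE/FALSE per element
print(flags)                   # FALSE TRUE TRUE FALSE

# Share of rows that fall in summer
print(mean(flags))             # 0.5
```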


### 2.1.1 More Interesting Example (cont.)

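```r
vio_tbl <- arrests %>%
    filter(
        statute_desc != "",        # drop rows with a blank statute description
        statute_desc != "NULL",    # ...or the literal string "NULL"
        year == 2020
    ) %>%
    group_by(statute_desc) %>%
    summarise(
        n_vios = n(),                          # rows per statute
        prop_male = mean(gender == "Male"),    # share of rows with gender "Male"
        mean_age = mean(age),
        prop_summer = mean(is_summer(month))   # share of rows in June-August
    ) %>%
    arrange(desc(n_vios))                      # most frequent statutes first

head(vio_tbl, 10)
```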

`summarise()` ungrouping output (override with `.groups` argument)

| statute_desc `<chr>` | n_vios `<int>` | prop_male `<dbl>` | mean_age `<dbl>` | prop_summer `<dbl>` |
|---|---:|---:|---:|---:|
| Driving after Denial, Suspension or Revocation of License | 457 | 0.7374179 | 30.76805 | 0.2669584 |
| DOMESTIC-SIMPLE ASSAULT/BATTERY | 364 | 0.8104396 | 33.91758 | 0.3214286 |
| DISORDERLY CONDUCT | 216 | 0.7453704 | 31.05556 | 0.2962963 |
| SIMPLE ASSAULT OR BATTERY | 199 | 0.638191 | 31.04523 | 0.2763819 |
| BENCH WARRANT ISSUED FROM SUPERIOR COURT | 141 | 0.8014184 | 36.13475 | 0.106383 |
| RESISTING LEGAL OR ILLEGAL ARREST | 123 | 0.7642276 | 30.18699 | 0.2764228 |
| POSSESSION OF SCHEDULE I II III | 116 | 0.8189655 | 36.02586 | 0.137931 |
| BENCH WARRANT ISSUED FROM 6TH DISTRICT COURT | 101 | 0.7821782 | 36.20792 | 0.1881188 |
| SHOPLIFTING-MISD - SHOPLIFTING | 99 | 0.4343434 | 33.75758 | 0.2424242 |
| WARRANT OF ARREST ON AFFIDAVIT - ALL OTH OFFENSE | 93 | 0.8709677 | 33.91398 | 0.1397849 |
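Each `summarise()` call in this notebook printed a note about the `.groups` argument (dplyr 1.0.0 and later). With a single grouping variable, the result comes back ungrouped. With two or more, the default (`"drop_last"`) removes only the last grouping variable, so the result stays grouped and later verbs keep operating per group. The sketch below uses hypothetical toy data and `group_vars()` to inspect the grouping that remains.

```r
library(dplyr)

# Hypothetical toy data with two grouping columns
toy <- data.frame(
    year   = c(2019, 2019, 2020, 2020, 2020),
    gender = c("Female", "Male", "Female", "Male", "Male"),
    age    = c(30, 40, 20, 50, 30),
    stringsAsFactors = FALSE
)

# Default: only the last grouping variable (gender) is dropped,
# so the result is still grouped by `year`
by_year_gender <- toy %>%
    group_by(year, gender) %>%
    summarise(mean_age = mean(age))

# .groups = "drop" returns a plain, ungrouped tibble
dropped <- toy %>%
    group_by(year, gender) %>%
    summarise(mean_age = mean(age), .groups = "drop")

print(group_vars(by_year_gender))  # "year"
print(group_vars(dropped))         # character(0)
```

Passing `.groups` explicitly also silences the message. The exact wording of the message varies across dplyr versions.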
