pak::pak("BigelowLab/congrdist")
This is a tutorial on using sf R package to learn about the many things we can do using spatial math. In particular we’ll work with this tutorial on geometrical operations.
We’ll use congressional district boundary data from Jeffrey B Lewis and others.
Jeffrey B. Lewis, Brandon DeVine, Lincoln Pitcher, and Kenneth C. Martis. (2013) Digital Boundary Definitions of U.S. Congressional Districts, 1789-2012.
We use congressional districts because they are fun and an endless source of interest.
We have cloned the data repository and extracted from it the congressional boundaries for Maine. We then mutated the dataset to add a Congressional (session 1) start date (well, year).
suppressPackageStartupMessages({
library(congrdist)
library(sf)
library(dplyr)
library(units)
library(ggplot2)
library(patchwork)
})
x = read_state(state = "Maine")
x## Simple feature collection with 75 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -71.08481 ymin: 42.97651 xmax: -66.94983 ymax: 47.45969
## Geodetic CRS: NAD83
## # A tibble: 75 × 6
## date pop congress district n geom
## <dbl> <dbl> <dbl> <dbl> <dbl> <MULTIPOLYGON [°]>
## 1 1821 298335 17 1 1 (((-70.60849 42.97915, -70.60943 42.978…
## 2 1821 298335 17 2 1 (((-70.3383 43.54702, -70.33868 43.5471…
## 3 1821 298335 17 3 1 (((-69.8327 43.69528, -69.83285 43.6958…
## 4 1821 298335 17 4 1 (((-68.52323 44.08455, -68.52298 44.084…
## 5 1821 298335 17 5 1 (((-69.85915 43.97955, -69.85823 43.979…
## 6 1821 298335 17 6 1 (((-69.32775 47.35899, -69.32907 47.357…
## 7 1821 298335 17 7 1 (((-70.23268 46.28443, -70.23433 46.283…
## 8 1823 298335 18 1 5 (((-70.69416 43.79901, -70.69381 43.800…
## 9 1823 298335 18 2 5 (((-70.3383 43.54702, -70.33868 43.5471…
## 10 1823 298335 18 3 5 (((-69.8327 43.69528, -69.83285 43.6958…
## # ℹ 65 more rows
So, we have the start date (year), census state population (start of decade), congressional session number, district number, the number of congress with that spatial configuration and then the spatial data (as MULTIPOLYGON, which is perfect for Maine with so many islands.)
Note that it may seem like some congresses are missing, for instance we seem to jump from the 18th to the 23rd. To save memory (mostly on disk) the data are served in a sparse for with redundancies dropped. You can expand to the full set (complete with redundancies!) using
x = read_state(state = 'Maine', sparse = FALSE)Let’s look at Maine’s first Congress, the 17th, when the population was 298,335.
x17 = filter(x, congress == 17)
plot(x17['district'],
main = sprintf("Congressional Districts in %s", x17$date[1]))Soooooo, there were 7 districts with just ~300k people. That’s 42619 counted individuals per district.
Just for fun let’s look at the number of districts per Congress. It’s much faster to drop the geometry when counting spatial tables.
y = st_drop_geometry(x) |>
group_by(congress) |>
summarize(ndistricts = n(),
year = date[1],
ind_per_district = pop[1]/n())
gg1 = ggplot(data = y,
mapping = aes(x = congress, y = ndistricts)) +
geom_col()
gg2 = ggplot(data = y,
mapping = aes(x = congress, y = ind_per_district)) +
geom_col()
gg1/gg2Let’s pull out just district 1 and check that out (since it is the only one always present.)
d1 = filter(x, district == 1)
d1 = mutate(d1,
area = st_area(d1),
popdens = pop/area)
gg1 = ggplot(data = d1,
mapping = aes(x = congress, y = popdens)) +
geom_col()
gg2 = ggplot(data = d1,
mapping = aes(x = congress, y = area)) +
geom_col()
gg1 / gg2Let’s look at the geometry for district 1 for two congresses.
y = filter(x, congress %in% c(17, 119), district == 1)
y## Simple feature collection with 2 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -70.98904 ymin: 42.9776 xmax: -68.55392 ymax: 44.72027
## Geodetic CRS: NAD83
## # A tibble: 2 × 6
## date pop congress district n geom
## * <dbl> <dbl> <dbl> <dbl> <dbl> <MULTIPOLYGON [°]>
## 1 1821 298335 17 1 1 (((-70.60849 42.97915, -70.60943 42.978…
## 2 2023 1362359 119 1 1 (((-68.55919 44.04062, -68.55596 44.039…
Let’s reverse the order (makes for easier plotting results), and add a variable that allows for catagorical coloring of regions.
y = slice(y, 2:1) |>
dplyr::mutate(congr = as.character(congress), .after = congress)
plot(y['congr'])Let’s copy the 17th congress, and move it northeast.
z = slice(y,2) |>
st_translate(d = c(.5, 1)) |>
mutate(congr = paste0(congress, "-t"))
plot(bind_rows(y,z)['congr'])Let’s do again, but move it southeast and rotate it through 180 degrees.
z = slice(y,2) |>
st_translate(d = c(1,-.25)) |>
st_rotate(r = pi) |>
mutate(congr = paste0(congress, "-tr"))
plot(bind_rows(y,z)['congr'])




