Skip to content

BigelowLab/congrdist

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Polygons in Congress

Requirememnts

Installation

pak::pak("BigelowLab/congrdist")

Summary

This is a tutorial on using sf R package to learn about the many things we can do using spatial math. In particular we’ll work with this tutorial on geometrical operations.

Data

We’ll use congressional district boundary data from Jeffrey B Lewis and others.

Jeffrey B. Lewis, Brandon DeVine, Lincoln Pitcher, and Kenneth C. Martis. (2013) Digital Boundary Definitions of U.S. Congressional Districts, 1789-2012.

We use congressional districts because they are fun and an endless source of interest.

A smaller dataset

We have cloned the data repository and extracted from it the congressional boundaries for Maine. We then mutated the dataset to add a Congressional (session 1) start date (well, year).

suppressPackageStartupMessages({
  library(congrdist)
  library(sf)
  library(dplyr)
  library(units)
  library(ggplot2)
  library(patchwork)
})

x = read_state(state = "Maine")
x
## Simple feature collection with 75 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -71.08481 ymin: 42.97651 xmax: -66.94983 ymax: 47.45969
## Geodetic CRS:  NAD83
## # A tibble: 75 × 6
##     date    pop congress district     n                                     geom
##    <dbl>  <dbl>    <dbl>    <dbl> <dbl>                       <MULTIPOLYGON [°]>
##  1  1821 298335       17        1     1 (((-70.60849 42.97915, -70.60943 42.978…
##  2  1821 298335       17        2     1 (((-70.3383 43.54702, -70.33868 43.5471…
##  3  1821 298335       17        3     1 (((-69.8327 43.69528, -69.83285 43.6958…
##  4  1821 298335       17        4     1 (((-68.52323 44.08455, -68.52298 44.084…
##  5  1821 298335       17        5     1 (((-69.85915 43.97955, -69.85823 43.979…
##  6  1821 298335       17        6     1 (((-69.32775 47.35899, -69.32907 47.357…
##  7  1821 298335       17        7     1 (((-70.23268 46.28443, -70.23433 46.283…
##  8  1823 298335       18        1     5 (((-70.69416 43.79901, -70.69381 43.800…
##  9  1823 298335       18        2     5 (((-70.3383 43.54702, -70.33868 43.5471…
## 10  1823 298335       18        3     5 (((-69.8327 43.69528, -69.83285 43.6958…
## # ℹ 65 more rows

So, we have the start date (year), census state population (start of decade), congressional session number, district number, the number of congress with that spatial configuration and then the spatial data (as MULTIPOLYGON, which is perfect for Maine with so many islands.)

Sparse? Expanded?

Note that it may seem like some congresses are missing, for instance we seem to jump from the 18th to the 23rd. To save memory (mostly on disk) the data are served in a sparse for with redundancies dropped. You can expand to the full set (complete with redundancies!) using

x = read_state(state = 'Maine', sparse = FALSE)

A look at one congress

Let’s look at Maine’s first Congress, the 17th, when the population was 298,335.

x17 = filter(x, congress == 17)
plot(x17['district'], 
     main = sprintf("Congressional Districts in %s", x17$date[1]))

Soooooo, there were 7 districts with just ~300k people. That’s 42619 counted individuals per district.

Districts over time

Just for fun let’s look at the number of districts per Congress. It’s much faster to drop the geometry when counting spatial tables.

y = st_drop_geometry(x) |>
  group_by(congress) |>
  summarize(ndistricts = n(), 
            year = date[1], 
            ind_per_district = pop[1]/n())

gg1 = ggplot(data = y,
             mapping = aes(x = congress, y = ndistricts)) + 
  geom_col()
gg2 = ggplot(data = y, 
             mapping = aes(x = congress, y = ind_per_district)) + 
  geom_col()
gg1/gg2

A look at one district

Let’s pull out just district 1 and check that out (since it is the only one always present.)

d1 = filter(x, district == 1)
d1 = mutate(d1, 
            area = st_area(d1),
            popdens = pop/area)
gg1 = ggplot(data = d1,
       mapping = aes(x = congress, y = popdens)) + 
      geom_col()
gg2 = ggplot(data = d1,
       mapping = aes(x = congress, y = area)) + 
      geom_col()
gg1 / gg2

# Geometry

Let’s look at the geometry for district 1 for two congresses.

y = filter(x, congress %in% c(17, 119), district == 1) 
y
## Simple feature collection with 2 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -70.98904 ymin: 42.9776 xmax: -68.55392 ymax: 44.72027
## Geodetic CRS:  NAD83
## # A tibble: 2 × 6
##    date     pop congress district     n                                     geom
## * <dbl>   <dbl>    <dbl>    <dbl> <dbl>                       <MULTIPOLYGON [°]>
## 1  1821  298335       17        1     1 (((-70.60849 42.97915, -70.60943 42.978…
## 2  2023 1362359      119        1     1 (((-68.55919 44.04062, -68.55596 44.039…

Let’s reverse the order (makes for easier plotting results), and add a variable that allows for catagorical coloring of regions.

y = slice(y, 2:1) |>
  dplyr::mutate(congr = as.character(congress), .after = congress)
plot(y['congr'])

Let’s copy the 17th congress, and move it northeast.

z = slice(y,2) |>
  st_translate(d = c(.5, 1)) |>
  mutate(congr = paste0(congress, "-t"))
plot(bind_rows(y,z)['congr'])

Let’s do again, but move it southeast and rotate it through 180 degrees.

z = slice(y,2) |>
  st_translate(d = c(1,-.25)) |>
  st_rotate(r = pi) |>
  mutate(congr = paste0(congress, "-tr"))
plot(bind_rows(y,z)['congr'])

About

Tutorial on spatial polygons using historical congressional districts

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages