# Thematic Mapping using R

In this learning activity, we will cover the basic process of displaying spatial data for your audience using R. While dedicated GIS software can do more, R's mapping functionality is robust enough for most policy use cases. It also has the advantages of being free and of working easily with the other types of data and models you already know.

There are several important packages, including **{sf}** (for interacting with and processing vector data in R), **{terra}** (for working with raster data, although we won't do much with it here), and **{tigris}** and **{tidycensus}** (for accessing spatial data from the U.S. Census). The first time you install these packages, you might need to install some dependencies on your system. Be prepared for it to take a few minutes.

In [None]:
# p_load() from the pacman package checks whether each package is installed,
# installs any that are missing, and then loads them all. Let's try it out!

# install pacman only if it is not already installed
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")

pacman::p_load(
    tidyverse,      # tidy data management + graphing
    sf,             # spatial data management and processing
    tigris,         # downloading GIS files from Census
    tidycensus,     # downloading census data + GIS
    terra,          # raster data functions
    basemaps,       # download and use raster basemaps
    ggnewscale      # add new scale attributes to ggplot
)

We can use **{tidycensus}**'s `load_variables()` function to find variable names from the Census. Let's look for poverty in the 2023 American Community Survey, 5-year estimates.

In [None]:
v2023 <- load_variables(year = 2023, dataset = "acs5")

view(v2023)
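
The full variable table is very long, so scrolling through it is slow. One way to narrow it down is to search the `concept` column for a keyword. The sketch below assumes the table has `name`, `label`, and `concept` columns and uses a small stand-in table, `v_demo`, whose rows are abbreviated and only illustrative, so that it runs without downloading anything. With the real data you would filter `v2023` instead.

```r
library(dplyr)
library(stringr)

# Stand-in for v2023: hypothetical, abbreviated rows for illustration only
v_demo <- tibble(
  name    = c("B01001_001", "B16009_001", "B16009_002"),
  label   = c("Estimate!!Total:",
              "Estimate!!Total:",
              "Estimate!!Total:!!Income in the past 12 months below poverty level:"),
  concept = c("Sex by Age",
              "Poverty Status in the Past 12 Months by Age by Language Spoken at Home",
              "Poverty Status in the Past 12 Months by Age by Language Spoken at Home")
)

# keep rows whose concept mentions poverty, ignoring case
poverty_vars <- v_demo |>
  filter(str_detect(concept, regex("poverty", ignore_case = TRUE)))

poverty_vars
```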

Now we can use `get_acs()` to download the data by U.S. county. However, this time, we will *also* grab the geographic data using the `geometry` argument below.

In [None]:

df <- get_acs(geography = "county", 
              variables = c(total = "B16009_001",
                            poverty.count = "B16009_002"),
              year = 2023,                                  # match the 2023 5-year ACS variables we looked up
              geometry = TRUE,                              # also download the geography, stored as an sf object
              cb = FALSE)                                   # if TRUE, uses simplified (cartographic) boundaries

df2 <- df |> select(-moe) |>                                # drop the `moe` variable from data
             pivot_wider(names_from = variable,             # converts long data to wide data
              values_from = estimate) |>       
             mutate(pct.poverty = 100 * poverty.count / total) |> # creates percent variable
             filter(substr(GEOID, 1,2) != "72" &            # drops Puerto Rico, Alaska, and Hawaii
              substr(GEOID, 1,2) != "02" & 
              substr(GEOID, 1,2) != "15")
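
To see what the reshaping steps do, here is a minimal sketch on a made-up long table (the counts are invented, and there is no geometry column). It mirrors the pipeline above, with the three state filters collapsed into a single `%in%` test.

```r
library(dplyr)
library(tidyr)

# made-up long data in the shape get_acs() returns: one row per county-variable pair
toy_long <- tibble(
  GEOID    = c("01001", "01001", "02013", "02013", "72001", "72001"),
  variable = rep(c("total", "poverty.count"), times = 3),
  estimate = c(1000, 150, 400, 40, 800, 320),
  moe      = c(50, 20, 30, 10, 60, 40)
)

toy_wide <- toy_long |>
  select(-moe) |>                                          # drop margins of error
  pivot_wider(names_from = variable,
              values_from = estimate) |>                   # one row per county
  mutate(pct.poverty = 100 * poverty.count / total) |>     # percent in poverty
  filter(!substr(GEOID, 1, 2) %in% c("72", "02", "15"))    # drop PR, AK, HI

toy_wide
```

Dropping `moe` first matters: otherwise `pivot_wider()` treats it as an identifying column and returns two half-empty rows per county instead of one.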


We can use our friendly neighborhood `ggplot()` for mapping. For a perhaps easier alternative, check out [`tmap`](https://r-tmap.github.io/tmap/). Maps are drawn with `geom_sf()`. As we map, we will make liberal use of **{ggnewscale}**, because `ggplot` allows only one *fill* scale per plot by default. We'll use `new_scale_fill()` to add more.

In [None]:
ggplot(df2, aes(fill = pct.poverty)) +      # identify data and fill variable
    geom_sf() +                             # identifies a map as the `geom`
    scale_fill_viridis_c()                  # uses the continuous viridis color scheme for fill

Not too bad. Let's see what this looks like when we re-project to Albers Equal Area Conic (very typical for maps of the contiguous U.S.). You can look up the EPSG codes for other projections [here](https://epsg.io/?q=), among other places.

In [None]:
# st_transform() will project or transform data to a different coordinate reference system
df2_albers <- st_transform(df2, crs = "EPSG:5069")
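
It is worth confirming that a transformation did what we expect. A minimal sketch, using a single made-up point rather than the county data: `st_crs()` reports the coordinate reference system, and `st_is_longlat()` tells us whether the coordinates are in degrees. The same two checks can be run on `df2` and `df2_albers`.

```r
library(sf)

# one illustrative point (roughly Washington, DC) in NAD83 longitude/latitude
pt <- st_sfc(st_point(c(-77.04, 38.91)), crs = "EPSG:4269")

# project the point to Albers Equal Area Conic for the contiguous U.S.
pt_albers <- st_transform(pt, crs = "EPSG:5069")

st_is_longlat(pt)          # TRUE: coordinates are in degrees
st_is_longlat(pt_albers)   # FALSE: coordinates are now projected
st_crs(pt_albers)$epsg     # EPSG code of the new CRS
```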

Before we remake the map, let's also make the county boundary lines thinner, move the legend around and give it a title, and get rid of the coordinates on the axes (typically not needed for thematic mapping).

In [None]:
ggplot(df2_albers, aes(fill = pct.poverty)) + 
    geom_sf(linewidth = .05) +                          # thins polygon boundaries
    scale_fill_viridis_c() +
    labs(fill = "Poverty (%)") +                        # changes title of fill legend
    guides(fill = guide_legend(position = "inside")) +  # moves legend inside the plot
    theme_void() +                                      # removes axis scales and grids
    theme(
        legend.position.inside = c(0.9, 0.225)          # places legend at 90% of the plot width
    )                                                   # and 22.5% of the plot height

We can have some more fun with this. This time, let's make our own color scale using HEX color codes. In the code below, we replace `scale_fill_viridis_c()` with `scale_fill_gradientn()`, which allows us to specify any number of colors to use in our scale.

In [None]:
ggplot(df2_albers, aes(fill = pct.poverty)) + 
        geom_sf(linewidth = .05) +
        scale_fill_gradientn(colors = c("#283747","white", "#d35400")) +
        labs(fill = "Poverty (%)") +
        guides(fill = guide_legend(position = "inside")) +
        theme_void() +
        theme(
            legend.position.inside = c(0.9, 0.225)
        )

This looks nice. Now you try! Create your own county map of poverty, this time choosing your own, self-designed color scheme using `scale_fill_gradientn()`.
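
One option to experiment with: by default, `scale_fill_gradientn()` spaces its colors evenly across the data range, so the middle color lands at the midpoint of the range. The `values` argument lets you place each color yourself. The sketch below uses three made-up squares with invented poverty rates, and the anchor value of 15 is hypothetical, chosen only to show the idea.

```r
library(sf)
library(ggplot2)

# made-up data: three unit squares with invented poverty rates
squares <- st_make_grid(
  st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 3, ymax = 1))),
  n = c(3, 1)
)
toy <- st_sf(pct.poverty = c(5, 15, 40), geometry = squares)

anchor <- 15   # hypothetical value that should appear white

p <- ggplot(toy, aes(fill = pct.poverty)) +
  geom_sf() +
  scale_fill_gradientn(
    colors = c("#283747", "white", "#d35400"),
    # positions of the three colors on a 0-1 scale: minimum, anchor, maximum
    values = scales::rescale(c(min(toy$pct.poverty), anchor, max(toy$pct.poverty)))
  ) +
  theme_void()

p
```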

Typically, we want to use more than one layer. For example, we might want to include state boundary lines to help our viewers understand where these counties are located. We can load just the geographic data from the Census using functions from the **{tigris}** package.

In [None]:
states <- states(year = 2023)                           # states() from tigris will get state boundaries
states <- states |> filter(as.numeric(GEOID)<=56 & 
                            STUSPS != "AK" &            # filter out territories and AK and HI
                            STUSPS != "HI")
                            
states_albers <- st_transform(states, crs ="EPSG:5069") # project to Albers

To add another polygon layer with its own fill, we use `new_scale_fill()`, which starts a fresh fill scale for the layers added after it.

In [None]:
ggplot() +
    geom_sf(df2_albers, mapping = aes(fill = pct.poverty), 
                         linewidth = .05) +
    scale_fill_gradientn(colors = c("#283747","white", "#d35400")) +
    labs(fill = "Poverty (%)") +
    guides(fill = guide_legend(position = "inside")) +
    theme_void() +
    theme(
          legend.position.inside = c(0.9, 0.225)) +
    new_scale_fill() +                              # layers added after this get their own fill scale
        geom_sf(data = states_albers,           
            color = "grey15",
            fill = "transparent") +                 # makes transparent polygons
            scale_fill_identity()                   # identity scale: fill values are used as given, not mapped from a variable
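
The state layer above uses a constant fill, but `new_scale_fill()` matters most when a second layer maps its own variable to fill. A minimal sketch with made-up squares (all names and values here are invented for illustration): the outer frame gets a discrete fill scale and the inner cells get a continuous one.

```r
library(sf)
library(ggplot2)
library(ggnewscale)

# made-up geometry: a 2 x 2 grid of cells and one larger square around them
cells <- st_sf(
  value = c(5, 10, 20, 40),
  geometry = st_make_grid(
    st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 2, ymax = 2))),
    n = c(2, 2)
  )
)
frame <- st_sf(
  group = "A",
  geometry = st_as_sfc(st_bbox(c(xmin = -0.5, ymin = -0.5, xmax = 2.5, ymax = 2.5)))
)

p <- ggplot() +
  geom_sf(data = frame, aes(fill = group)) +     # first fill scale: discrete
  scale_fill_manual(values = c(A = "grey90")) +
  new_scale_fill() +                             # start a second, independent fill scale
  geom_sf(data = cells, aes(fill = value)) +     # second fill scale: continuous
  scale_fill_viridis_c() +
  theme_void()

p
```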


That was fun. Let's try one more map. This time, we will make a map of poverty in Appalachia, a region that spans portions of over a dozen states. The csv file below lists the county FIPS codes of all counties included in the Appalachian Regional Commission ([source](https://www.arc.gov/appalachian-counties-served-by-arc/)).

In [None]:

appalachia <- read.csv("https://raw.githubusercontent.com/bowendc/512_labs/refs/heads/main/appalachia.csv",
                       col.names = "fips",       # rename the single variable
                       colClasses = "character") # code as character so that leading zeros aren't dropped

fips <- as.list(appalachia$fips)  # convert to list
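
To see why `colClasses = "character"` matters, here is a small sketch that reads the same three illustrative county codes twice from inline text (no download needed), once with R's default type guessing and once as character.

```r
csv_text <- "FIPS\n01007\n21001\n54001"   # three illustrative county codes

# default: read.csv() guesses the column is numeric
as_number <- read.csv(text = csv_text, col.names = "fips")

# forcing character keeps the codes exactly as written
as_text <- read.csv(text = csv_text, col.names = "fips", colClasses = "character")

as_number$fips   # 1007 21001 54001: the leading zero of 01007 is gone
as_text$fips     # "01007" "21001" "54001"
```

Because `GEOID` in the Census data is a character column, only the second version will match it correctly.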


**{sf}** makes it very easy to process geospatial data because we can use the same `dplyr` functions for data management that we are already familiar with.

In [None]:
df3 <- df2 |> 
            filter(GEOID %in% fips) |>       # keep just counties in our appalachia list
            st_transform(crs = st_crs(3857)) # transform to Web Mercator projection
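
A quick way to convince yourself that `dplyr` verbs keep the spatial information intact is to try them on a tiny made-up sf object. In this sketch the point coordinates are invented and `keep` is a hypothetical stand-in for our list of FIPS codes.

```r
library(sf)
library(dplyr)

# made-up sf object: three points with county-style codes
toy <- st_sf(
  GEOID = c("01007", "06037", "54001"),
  geometry = st_sfc(
    st_point(c(-87.1, 33.0)),
    st_point(c(-118.2, 34.3)),
    st_point(c(-80.0, 39.1)),
    crs = "EPSG:4269"
  )
)

keep <- c("01007", "54001")   # hypothetical stand-in for the FIPS list

toy_kept <- toy |>
  filter(GEOID %in% keep) |>          # ordinary dplyr filter
  st_transform(crs = st_crs(3857))    # still an sf object, so it can be transformed

class(toy_kept)   # includes "sf": the geometry column came along
nrow(toy_kept)    # 2
```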


Map it. This time we use a different, pre-programmed viridis color scheme and remove county boundary lines altogether.

In [None]:
ggplot() + 
      geom_sf(data = df3,  aes(fill = pct.poverty),
            linewidth = 0) +
        scale_fill_viridis_c(option = "A") +
      theme_void()

On occasion, you may want to use basemaps: raster data that shows static information like topography, roads, labels, rivers, etc. The **{basemaps}** package lets you download these layers based on the extent of your data and include them directly on your graph. *Note: I had to install the **{terra}** package before I could install and use **{basemaps}**.* Please review the [package website to see what map services and basemap types](https://jakob.schwalb-willmann.de/basemaps/index.html#supported-services-and-maps) are available. Some services require you to register before you can download from them using the package.

For this map, we will use a hillshade basemap. Hillshade is exactly what it sounds like: shading based on topography, so that changes in elevation show up subtly. It is one way of visualizing terrain.

In [None]:
ggplot() + 
  basemap_gglayer(df3,               # downloads a basemap covering df3's extent and adds it as a ggplot layer
    map_service = "esri",            # sets map service
    map_type = "world_hillshade")  + # chooses which basemap to use from service
  scale_fill_identity() +            # required for basemap
  new_scale_fill() +                 # add new data fill layer
  geom_sf(data = df3,  
          aes(fill = pct.poverty), 
          alpha = .4,                 # make partially transparent so we can see hillshade
          linewidth = 0) +
  scale_fill_viridis_c(option = "G",      # choose different viridis scheme
                       direction = -1) +  # and flips so lighter is lower and darker is higher 
    labs(fill = "Poverty (%)") +
    guides(fill = guide_legend(position = "bottom")) +
    theme_void() +
    theme(legend.justification.bottom = "center")


Play around with the basemaps. Find a basemap other than ESRI's hillshade that you like and can use as a layer underneath your poverty data. Display this new map in a Quarto-rendered PDF.
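
As a starting point, you can list the available options from within R instead of the website. This is a minimal sketch that assumes your installed version of **{basemaps}** exports `get_maptypes()`, as the package README describes. The exact services and map types returned may differ by version.

```r
library(basemaps)

# list the map services and map types this version of basemaps knows about
maptypes <- get_maptypes()
str(maptypes)
```

Any service and type pair from that listing can then be tried in place of `map_service = "esri"` and `map_type = "world_hillshade"` in the map above, keeping in mind that some services need registration first.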