
I learned from the [Data is Plural newsletter](https://www.data-is-plural.com/) that the U.S. Postal Service regularly releases numbers of Change of Address (COA) filings by ZIP code. [Here's the data.](https://about.usps.com/who/legal/foia/library.htm)

This is an exploratory data analysis of the USPS's Change of Address data. My hunch: if ZIP codes can be ranked by net change (filings moving in minus filings moving out), then combining that ranking with apartment rental data could be insightful.
      



In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   0.3.4 
✔ tibble  3.1.6      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


In [42]:
# Restricted to 2023 as a proof of concept. Read zipcode as character to keep leading zeros in ZIP codes.

y2023 <- read.csv("y2023_for_eda.csv", colClasses = c(zipcode = "character"), stringsAsFactors = FALSE)
str(y2023)
head(y2023)

In [44]:
# clean up city capitalization

y2023$city <- str_to_title(y2023$city)


# Remove non-state codes (military and territories); there should be 51 left, including DC.
# Then hackily recode the month. This works only because I know there are just 2 values.


length(unique(y2023$state))

y2023 <- filter(y2023, state != "AA" & state != "VI" & state != "AE" & state != "AP" & state != "GU" & state != "MP" & state != "PR")

length(unique(y2023$state))
head(y2023)

unique(y2023$month)

y2023$month <- ifelse(y2023$month == 202301, "Jan.", "Feb.")

unique(y2023$month)


str(y2023)
head(y2023)
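
The `ifelse()` recode above only holds while the file contains exactly two months. As a rough sketch of a more general approach, assuming `month` is always a six-digit `YYYYMM` value like the `202301` seen here, the month number can be taken from the last two digits and looked up in R's built-in `month.abb` constant. The helper name `recode_month` is hypothetical and not part of the original notebook, and the labels come out as "Jan" rather than "Jan.".

```r
# Hypothetical helper (not in the original notebook): map a YYYYMM value
# such as 202301 to an abbreviated month name.
recode_month <- function(yyyymm) {
  month_number <- as.integer(yyyymm) %% 100L  # last two digits are the month
  month.abb[month_number]                     # built-in constant: "Jan", "Feb", ...
}

# Example values in the same YYYYMM shape as the month column
recode_month(c(202301, 202302, 202312))
```

With this in place, something like `y2023$month <- recode_month(y2023$month)` should keep working as more months are added to the file, though that line is untested against the real data.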


In [58]:
# Yes, we can easily calculate net change by ZIP codes or cities, monthly, annually, and/or over time.

y2023 %>%
    group_by(zipcode, city, state) %>%
    summarize(
    net_change = sum(total_to_less_biz) - sum(total_from_less_biz)
    ) %>%
    arrange(desc(net_change)) %>%
    head(25)

`summarise()` has grouped output by 'zipcode', 'city'. You can override using
the `.groups` argument.


zipcode,city,state,net_change
<chr>,<chr>,<chr>,<int>
34135,Bonita Springs,FL,1883
32162,The Villages,FL,1344
32163,The Villages,FL,1332
34145,Marco Island,FL,1331
33928,Estero,FL,1264
34112,Naples,FL,1211
34293,Venice,FL,1174
34114,Naples,FL,1158
33908,Fort Myers,FL,1063
34110,Naples,FL,970
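
The same pattern should extend to a monthly breakdown by adding `month` to the grouping. The sketch below is not run against the USPS file: it uses a toy data frame with made-up counts, and only the column names (`zipcode`, `month`, `total_to_less_biz`, `total_from_less_biz`) are taken from the notebook. It also sets `.groups = "drop"`, which is one way to silence the grouping message shown above.

```r
library(dplyr)

# Toy data with made-up counts; only the column names come from the notebook.
toy <- data.frame(
  zipcode = c("00501", "00501", "99999", "99999"),
  month = c("Jan.", "Feb.", "Jan.", "Feb."),
  total_to_less_biz = c(10L, 12L, 5L, 4L),
  total_from_less_biz = c(7L, 15L, 5L, 2L)
)

monthly_net <- toy %>%
  group_by(zipcode, month) %>%
  summarize(
    net_change = sum(total_to_less_biz) - sum(total_from_less_biz),
    .groups = "drop"  # return an ungrouped result and skip the grouping message
  ) %>%
  arrange(desc(net_change))

monthly_net
```

Dropping the groups also means later steps such as `arrange()` or `head()` operate on the whole table rather than within leftover groups.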
