## Processing 

This notebook reads in the following U.S. Census Bureau county-level datasets on population estimates:

*For the 2000-2010 period*
- [**Vintage 2010 postcensal estimates**](https://www.census.gov/programs-surveys/popest/technical-documentation/research/evaluation-estimates.html): estimates released in 2011 for the 2000-2010 period, with no knowledge of the Census count at the end of the decade
- [**2000-2010 intercensal estimates**](https://www.census.gov/data/datasets/time-series/demo/popest/intercensal-2000-2010-counties.html): revised estimates released in 2012, after the 2010 Census count became available. These are the preferred estimates for the period. [Per the Bureau](https://www.census.gov/programs-surveys/popest/guidance.html): "They differ from the postcensal estimates that are released annually because they rely on a formula that redistributes the difference between the April 1 postcensal estimate and April 1 census count for the end of the decade across the estimates for that decade."
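To make the idea of "redistributing the difference" concrete, here is a minimal sketch in R. It is not the Bureau's formula: the geometric proration, the evenly spaced estimate dates, and every number below are assumptions chosen for illustration, and the Bureau's methodology documentation remains the reference for the actual calculation.

```r
# Hypothetical postcensal estimates for one county, start to end of decade
# (made-up numbers, not real Census data).
postcensal <- c(1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1100)

# Hypothetical census count at the end of the decade.
census_end <- 1155

# Ratio of the census count to the final postcensal estimate.
ratio <- census_end / postcensal[length(postcensal)]

# Share of the decade elapsed at each estimate: 0 at the start, 1 at the end
# (assumes evenly spaced estimate dates, which is a simplification).
elapsed <- seq(0, 1, length.out = length(postcensal))

# Assumed geometric proration: apply a growing share of the ratio over time,
# so the first estimate is unchanged and the last one equals the census count.
intercensal <- postcensal * ratio^elapsed

print(round(intercensal, 1))
```

Under these assumptions the adjustment is zero at the start of the decade and grows until the final value matches the census count exactly.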

*For the 2010-2018 period*
- [**Vintage 2018 postcensal estimates**](https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-total.html): estimates released in 2019 for the 2010-2018 period, produced before the 2020 Census and therefore with no knowledge of its count. For explanations of the data fields in this file, see the [technical documentation](https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2018/co-est2018-alldata.pdf).

The raw datasets are saved in the `input/` folder. The processed datasets are saved in the `output/` folder.

The data are filtered to contain only counties in Maryland. 
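One thing worth checking when filtering by state: Census county files can include a state-level summary row (county code `000`) alongside the county rows, and a filter on state name or state code alone would keep it. The sketch below uses made-up rows and assumes the cleaned column names used in this notebook. Whether the raw files in `input/` contain such a row should be confirmed against the data.

```r
library(dplyr)

# Made-up rows for illustration only; not real Census records.
toy <- tibble(
  sumlev  = c('040', '050', '050', '050'),
  state   = c('24', '24', '24', '51'),
  county  = c('000', '001', '003', '001'),
  stname  = c('Maryland', 'Maryland', 'Maryland', 'Virginia'),
  ctyname = c('Maryland', 'Allegany County', 'Anne Arundel County', 'Accomack County')
)

# Filtering on the state name keeps the state-level row as well.
md_rows <- toy %>% filter(stname == 'Maryland')

# Assumption: county code '000' marks the state total,
# so dropping it leaves county rows only.
md_counties <- md_rows %>% filter(county != '000')

print(md_counties)
```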

In [None]:
suppressMessages(library('tidyverse'))
suppressMessages(library('reshape2'))
suppressMessages(library('janitor'))

2000-2010 period

In [63]:
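# Read one of the 2000-2010 files ('inter' or 'post'), keep the Maryland rows,
# and reshape from wide (one column per estimate) to long (one row per estimate).
process_00_10 <- function (type) {
    df <- read_csv(paste0('input/pop_00_10_', type, '.csv'))  %>% 
          clean_names() %>% 
          select(-sumlev, -state, -region, -division) %>% 
          filter(stname == 'Maryland') 

    # Tag each row with the year found in its column label (NA if none).
    # More than one column may share a year, so `label` is kept alongside it.
    df.m <- melt(df, id.vars = c('stname', 'ctyname', 'county')) %>% 
                 rename(label = variable, tot.pop = value) %>% 
                 mutate(year.label = 
                        case_when(grepl('2000', label) ~ 2000,
                                  grepl('2001', label) ~ 2001,
                                  grepl('2002', label) ~ 2002,
                                  grepl('2003', label) ~ 2003,
                                  grepl('2004', label) ~ 2004,
                                  grepl('2005', label) ~ 2005,
                                  grepl('2006', label) ~ 2006,
                                  grepl('2007', label) ~ 2007,
                                  grepl('2008', label) ~ 2008,
                                  grepl('2009', label) ~ 2009,
                                  grepl('2010', label) ~ 2010),
                        # Left-pad the county code to three characters, e.g. '1' -> '001'.
                        county = str_pad(county, 3, pad = "0"))
    
    return(df.m)
}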

md.inter <- suppressMessages(process_00_10('inter'))
md.post <- suppressMessages(process_00_10('post'))

write_csv(md.inter, 'output/md_inter_00_10.csv')
write_csv(md.post, 'output/md_post_00_10.csv')
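The wide-to-long reshape above relies on `reshape2::melt`, and `reshape2` is a retired package. A roughly equivalent sketch with `tidyr::pivot_longer` is shown below on a toy table. The column names and values are hypothetical stand-ins for the cleaned Census headers, and the regular expression is an assumed shortcut for the `case_when` lookup, not the notebook's own method.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy wide table; names and values are made up for illustration.
toy <- tibble(
  stname  = 'Maryland',
  ctyname = c('Allegany County', 'Anne Arundel County'),
  county  = c('001', '003'),
  popestimate2000 = c(100, 200),
  popestimate2001 = c(110, 210)
)

toy_long <- toy %>%
  # Every column except the identifiers becomes a (label, tot.pop) pair.
  pivot_longer(cols = -c(stname, ctyname, county),
               names_to = 'label', values_to = 'tot.pop') %>%
  # Extract a year between 2000 and 2010 from the label; NA if none is found.
  mutate(year.label = as.numeric(str_extract(label, '20(0\\d|10)')))

print(toy_long)
```

Note that `pivot_longer` orders the output by original row and then by column, whereas `melt` stacks one column at a time, so row order differs even though the contents match.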

2010-2018 period

In [66]:
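# Read the Vintage 2018 file, keep the Maryland rows (state code 24),
# and drop the geography columns that are not needed.
md.18 <- suppressMessages(read_csv('input/CO-EST2018-Alldata.csv')) %>% 
         clean_names() %>% 
         filter(state == '24') %>% 
         select(-sumlev, -region, -division, -state)

# Reshape to long format and extract the first run of digits (the year)
# from each column name. The result is stored as text.
md.18.m <- melt(md.18, 
                id.vars = c('county', 'stname', 'ctyname')) %>% 
           mutate(year.label = str_extract(variable, "\\-*\\d+\\.*\\d*"))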

write_csv(md.18.m, 'output/md_10_18.csv')
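As a quick check of what the year-extraction pattern returns, the sketch below applies it to a few hypothetical column names written in the style of the cleaned Vintage 2018 headers. It also shows that the result is a character vector, whereas `year.label` in the 2000-2010 tables is numeric; converting with `as.numeric` is one possible way to align the two if the tables are later combined.

```r
library(stringr)

# Hypothetical column names, for illustration only.
cols <- c('census2010pop', 'popestimate2014', 'npopchg_2018')

# Same pattern as in the cell above: the first run of digits in each name.
year_chr <- str_extract(cols, "\\-*\\d+\\.*\\d*")

# Convert to numeric to match the type used for the 2000-2010 tables.
year_num <- as.numeric(year_chr)

print(year_chr)
print(year_num)
```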