<a href="https://colab.research.google.com/github/ReidelVichot/LC_identification/blob/main/census_cleaning_11_18_24.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Problem Definition

Background

There appears to be a relationship between the agglomeration of logistics activity and air pollution. To assess this relationship, the author will use a difference-in-differences natural experiment design with a synthetic control group. The unit of analysis is counties in the contiguous US, observed from 1998 to 2022.

Problem

To conduct a synthetic control analysis, we need a set of covariates and control variables from which to construct the synthetic control group. US Census data provide these variables.
For each county and each year, I will collect total population, share of white residents, share of male residents, age groups, industries, and commuting time.
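
The shares listed above can be derived from raw counts once the tables are cleaned. The following is a minimal sketch with invented numbers; the `share_` column names are an assumption for illustration, not names used elsewhere in this notebook.

```python
import pandas as pd

# Toy county-year rows; the counts are made up for illustration only.
panel = pd.DataFrame({
    "GEOID": ["01001", "01003"],
    "year": [2010, 2010],
    "total_population": [1000, 4000],
    "white": [800, 3000],
    "male": [490, 1960],
})

# Convert counts to shares of the total population.
for col in ["white", "male"]:
    panel[f"share_{col}"] = panel[col] / panel["total_population"]

print(panel[["GEOID", "year", "share_white", "share_male"]])
```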

#Data Collection

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# -- import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [3]:
# -- set directory path
dpath = "/content/drive/MyDrive/Disertation/Census/"
fname = "DECENNIALDPSF42000.DP1-Data.csv"
# -- create dataframes
decennial_00 = pd.read_csv(dpath + fname, skiprows=1)
# -- add year variable
decennial_00["year"] = 2000

#Data Cleaning

##Decennial 2000

In [4]:
# -- create GEOID
decennial_00["GEOID"] = decennial_00.Geography.str[-5:]

# -- select necessary columns
cols = ['GEOID','Geography', 'Geographic Area Name', 'Race/Ethnic Group',
       'Population Groups', 'Number!!Total population',
       'Number!!Total population!!SEX AND AGE!!Male',
       'Number!!Total population!!SEX AND AGE!!Female',
       'Number!!Total population!!SEX AND AGE!!Under 5 years',
       'Number!!Total population!!SEX AND AGE!!5 to 9 years',
       'Number!!Total population!!SEX AND AGE!!10 to 14 years',
       'Number!!Total population!!SEX AND AGE!!15 to 19 years',
       'Number!!Total population!!SEX AND AGE!!20 to 24 years',
       'Number!!Total population!!SEX AND AGE!!25 to 34 years',
       'Number!!Total population!!SEX AND AGE!!35 to 44 years',
       'Number!!Total population!!SEX AND AGE!!45 to 54 years',
       'Number!!Total population!!SEX AND AGE!!55 to 59 years',
       'Number!!Total population!!SEX AND AGE!!60 to 64 years',
       'Number!!Total population!!SEX AND AGE!!65 to 74 years',
       'Number!!Total population!!SEX AND AGE!!75 to 84 years',
       'Number!!Total population!!SEX AND AGE!!85 years and over',
       'Number!!Total population!!SEX AND AGE!!Median age (years)',
       'Number!!HOUSEHOLDS BY TYPE!!Households',
       'Number!!HOUSEHOLDS BY TYPE!!Households!!Average household size',
       'year']

decennial_00 = decennial_00[cols].copy()
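
The GEOID slice above keeps the last five characters of the `Geography` string. The sketch below assumes identifiers in the form `0500000US` followed by a five-digit county FIPS code (an assumption about the export format). It also shows how leading zeros can be restored if a GEOID column is later read back as integers.

```python
import pandas as pd

# Hypothetical identifiers in the "0500000US" + 5-digit FIPS layout.
geo = pd.DataFrame({"Geography": ["0500000US01001", "0500000US36061"]})

# Same slice as in the notebook: the last five characters are the county code.
geo["GEOID"] = geo["Geography"].str[-5:]

# If GEOID is ever stored as an integer, the leading zero is lost ...
as_int = geo["GEOID"].astype(int)
# ... and zero-padding to width 5 restores the original string.
restored = as_int.astype(str).str.zfill(5)

print(geo["GEOID"].tolist(), as_int.tolist(), restored.tolist())
```

Keeping GEOID as a string throughout avoids the round trip altogether.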

In [5]:
# -- rename columns for simplicity
renamed_columns = {'Number!!Total population': "total_population",
                   'Number!!Total population!!SEX AND AGE!!Male': "male",
                   'Number!!Total population!!SEX AND AGE!!Female': "female",
                   'Number!!Total population!!SEX AND AGE!!Under 5 years': "under_5_years",
                   'Number!!Total population!!SEX AND AGE!!5 to 9 years': '5_to_9_years',
                   'Number!!Total population!!SEX AND AGE!!10 to 14 years': '10_to_14_years',
                   'Number!!Total population!!SEX AND AGE!!15 to 19 years': '15_to_19_years',
                   'Number!!Total population!!SEX AND AGE!!20 to 24 years': '20_to_24_years',
                   'Number!!Total population!!SEX AND AGE!!25 to 34 years': '25_to_34_years',
                   'Number!!Total population!!SEX AND AGE!!35 to 44 years': '35_to_44_years',
                   'Number!!Total population!!SEX AND AGE!!45 to 54 years': '45_to_54_years',
                   'Number!!Total population!!SEX AND AGE!!55 to 59 years': '55_to_59_years',
                   'Number!!Total population!!SEX AND AGE!!60 to 64 years': '60_to_64_years',
                   'Number!!Total population!!SEX AND AGE!!65 to 74 years': '65_to_74_years',
                   'Number!!Total population!!SEX AND AGE!!75 to 84 years': '75_to_84_years',
                   'Number!!Total population!!SEX AND AGE!!85 years and over': '85_years_and_over',
                   'Number!!Total population!!SEX AND AGE!!Median age (years)': 'median_age',
                   'Number!!HOUSEHOLDS BY TYPE!!Households': 'households',
                   'Number!!HOUSEHOLDS BY TYPE!!Households!!Average household size': 'average_household_size'}
decennial_00.rename(columns=renamed_columns, inplace=True)

# -- create age groups
decennial_00["Age < 15"] = decennial_00['under_5_years'] + decennial_00['5_to_9_years'] + decennial_00['10_to_14_years']
decennial_00["Age 15-24"] = decennial_00['15_to_19_years'] + decennial_00['20_to_24_years']
decennial_00["Age 25-44"] = decennial_00['25_to_34_years'] + decennial_00['35_to_44_years']
decennial_00["Age 45-64"] = decennial_00['45_to_54_years'] + decennial_00['55_to_59_years'] + decennial_00['60_to_64_years']
decennial_00["Age >= 65"] = decennial_00['65_to_74_years'] + decennial_00['75_to_84_years'] + decennial_00['85_years_and_over']

# -- drop unnecessary columns
decennial_00.drop(columns= ['under_5_years', '5_to_9_years', '10_to_14_years',
                            '15_to_19_years', '20_to_24_years', '25_to_34_years',
                            '35_to_44_years', '45_to_54_years', '55_to_59_years',
                            '60_to_64_years', '65_to_74_years', '75_to_84_years',
                            '85_years_and_over'], inplace=True)

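# -- create a variable for white population (rows with Race/Ethnic Group == 2)
# -- note: the values are assigned by position below, so the code-2 rows must
# --       list counties in the same order as the code-1 rows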
white = decennial_00[decennial_00["Race/Ethnic Group"] == 2][["total_population"]].values.copy()

# -- add variable to the decennial dataframe
decennial_00 = decennial_00[decennial_00['Race/Ethnic Group'] == 1].reset_index(drop=True)
decennial_00["white"] = white
del white

decennial_00.drop(columns=['Geography', 'Geographic Area Name',
                           'Race/Ethnic Group', 'Population Groups'], inplace=True)

# -- organize columns
cols = ['GEOID', 'year', 'total_population', 'white', 'male', 'female', 'median_age',
       'households', 'average_household_size', 'Age < 15', 'Age 15-24',
       'Age 25-44', 'Age 45-64', 'Age >= 65']
decennial_00 = decennial_00[cols]
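
The white population above is attached by row position. An alternative is to join on GEOID, which does not depend on row order. This is a sketch on invented numbers, reusing the codes 1 (total) and 2 (white) from the filters above; `long_df`, `totals`, and `wide` are hypothetical names.

```python
import pandas as pd

# Toy long-format table; note the code-2 rows are in a different county order.
long_df = pd.DataFrame({
    "GEOID": ["01001", "01003", "01003", "01001"],
    "Race/Ethnic Group": [1, 1, 2, 2],
    "total_population": [100, 200, 150, 80],
})

totals = long_df[long_df["Race/Ethnic Group"] == 1].drop(columns="Race/Ethnic Group")
white = (long_df[long_df["Race/Ethnic Group"] == 2]
         .rename(columns={"total_population": "white"})[["GEOID", "white"]])

# validate="one_to_one" raises if a county appears more than once on either side.
wide = totals.merge(white, on="GEOID", how="left", validate="one_to_one")
print(wide)
```

In this toy table a positional assignment would give county 01001 the value 150, while the keyed join gives the intended 80.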


##ACS 2010

In [82]:
fname = "ACSST5Y2010.S0101-Data.csv"
acs5_10 = pd.read_csv(dpath + fname, skiprows=1)

In [83]:
# -- create GEOID
acs5_10["GEOID"] = acs5_10.Geography.str[-5:]
# -- create year
acs5_10["year"] = 2010

# -- select columns
cols = ['GEOID', 'year',
        'Total!!Estimate!!Total population',
        'Male!!Estimate!!Total population',
        'Female!!Estimate!!Total population',
        'Total!!Estimate!!AGE!!Under 5 years',
        'Total!!Estimate!!AGE!!5 to 9 years',
        'Total!!Estimate!!AGE!!10 to 14 years',
        'Total!!Estimate!!AGE!!15 to 19 years',
        'Total!!Estimate!!AGE!!20 to 24 years',
        'Total!!Estimate!!AGE!!25 to 29 years',
        'Total!!Estimate!!AGE!!30 to 34 years',
        'Total!!Estimate!!AGE!!35 to 39 years',
        'Total!!Estimate!!AGE!!40 to 44 years',
        'Total!!Estimate!!AGE!!45 to 49 years',
        'Total!!Estimate!!AGE!!50 to 54 years',
        'Total!!Estimate!!AGE!!55 to 59 years',
        'Total!!Estimate!!AGE!!60 to 64 years',
        'Total!!Estimate!!AGE!!65 to 69 years',
        'Total!!Estimate!!AGE!!70 to 74 years',
        'Total!!Estimate!!AGE!!75 to 79 years',
        'Total!!Estimate!!AGE!!80 to 84 years',
        'Total!!Estimate!!AGE!!85 years and over',
        'Total!!Estimate!!SUMMARY INDICATORS!!Median age (years)']

acs5_10 = acs5_10[cols]
# -- rename columns for simplicity
renamed_columns = {'Total!!Estimate!!Total population': "total_population",
                   'Male!!Estimate!!Total population': "male",
                   'Female!!Estimate!!Total population': "female",
                   'Total!!Estimate!!AGE!!Under 5 years': "under_5_years",
                   'Total!!Estimate!!AGE!!5 to 9 years': '5_to_9_years',
                   'Total!!Estimate!!AGE!!10 to 14 years': '10_to_14_years',
                   'Total!!Estimate!!AGE!!15 to 19 years': '15_to_19_years',
                   'Total!!Estimate!!AGE!!20 to 24 years': '20_to_24_years',
                   'Total!!Estimate!!AGE!!25 to 29 years': '25_to_29_years',
                   'Total!!Estimate!!AGE!!30 to 34 years': '30_to_34_years',
                   'Total!!Estimate!!AGE!!35 to 39 years': '35_to_39_years',
                   'Total!!Estimate!!AGE!!40 to 44 years': '40_to_44_years',
                   'Total!!Estimate!!AGE!!45 to 49 years': '45_to_49_years',
                   'Total!!Estimate!!AGE!!50 to 54 years': '50_to_54_years',
                   'Total!!Estimate!!AGE!!55 to 59 years': '55_to_59_years',
                   'Total!!Estimate!!AGE!!60 to 64 years': '60_to_64_years',
                   'Total!!Estimate!!AGE!!65 to 69 years': '65_to_69_years',
                   'Total!!Estimate!!AGE!!70 to 74 years': '70_to_74_years',
                   'Total!!Estimate!!AGE!!75 to 79 years': '75_to_79_years',
                   'Total!!Estimate!!AGE!!80 to 84 years': '80_to_84_years',
                   'Total!!Estimate!!AGE!!85 years and over': '85_years_and_over',
                   'Total!!Estimate!!SUMMARY INDICATORS!!Median age (years)': 'median_age'}
acs5_10 = acs5_10.rename(columns=renamed_columns).copy()

# -- create age groups
acs5_10["Age < 15"] = acs5_10['under_5_years'] + acs5_10['5_to_9_years'] + acs5_10['10_to_14_years']
acs5_10["Age 15-24"] = acs5_10['15_to_19_years'] + acs5_10['20_to_24_years']
acs5_10["Age 25-44"] = acs5_10['25_to_29_years'] + acs5_10['30_to_34_years'] + acs5_10['35_to_39_years'] + acs5_10['40_to_44_years']
acs5_10["Age 45-64"] = acs5_10['45_to_49_years'] + acs5_10['50_to_54_years'] + acs5_10['55_to_59_years'] + acs5_10['60_to_64_years']
acs5_10["Age >= 65"] = acs5_10['65_to_69_years'] + acs5_10['70_to_74_years'] + acs5_10['75_to_79_years'] + acs5_10['80_to_84_years'] + acs5_10['85_years_and_over']


# -- drop unnecessary columns
acs5_10.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_29_years',
                      '30_to_34_years', '35_to_39_years', '40_to_44_years',
                      '45_to_49_years', '50_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_to_69_years', '70_to_74_years',
                      '75_to_79_years', '80_to_84_years', '85_years_and_over'], inplace=True)

In [84]:
acs5_10_dp05 = pd.read_csv(dpath + "ACSDP5Y2010.DP05-Data.csv", skiprows=1, low_memory=False)

In [85]:
acs5_10.head()

Unnamed: 0,GEOID,year,total_population,male,female,median_age,Age < 15,Age 15-24,Age 25-44,Age 45-64,Age >= 65
0,1001,2010,53155,25780,27375,36.2,22.1,14.1,27.4,24.9,11.5
1,1003,2010,175791,85902,89889,41.0,19.3,11.6,24.6,28.1,16.5
2,1005,2010,27699,14652,13047,38.0,18.4,13.4,27.3,27.0,13.9
3,1007,2010,22610,12162,10448,38.3,19.1,14.3,28.1,26.2,12.5
4,1009,2010,56692,28080,28612,38.3,20.5,12.4,26.5,26.5,14.3
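
The age-group values in this preview (for example 22.1 for `Age < 15` against a total population of 53,155) look like percentages rather than head counts, whereas the DP05-based tables for later years report counts. If that reading is right, the sketch below converts percentages to approximate counts. It uses the first two preview rows, and `acs_demo` and `counts` are hypothetical names.

```python
import pandas as pd

# First two rows of the preview above (two age groups only, for brevity).
acs_demo = pd.DataFrame({
    "GEOID": ["1001", "1003"],
    "total_population": [53155, 175791],
    "Age < 15": [22.1, 19.3],
    "Age 15-24": [14.1, 11.6],
})

age_cols = ["Age < 15", "Age 15-24"]

# percent of total -> approximate number of persons, rounded to whole persons
counts = (acs_demo[age_cols]
          .mul(acs_demo["total_population"], axis=0)
          .div(100)
          .round()
          .astype(int))
print(counts)
```

Because the published percentages are rounded to one decimal, the recovered counts are approximations.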


In [86]:
# -- add white variable from ACS DP05
acs5_10_dp05 = pd.read_csv(dpath + "ACSDP5Y2010.DP05-Data.csv", skiprows=1, low_memory=False)
# -- .copy() avoids a SettingWithCopyWarning when GEOID is added below
acs5_10_dp05 = acs5_10_dp05[['Geography',"Estimate!!RACE!!White", "Estimate!!Total housing units"]].copy()
acs5_10_dp05['GEOID'] = acs5_10_dp05.Geography.str[-5:]
acs5_10_dp05.drop(columns=['Geography'], inplace=True)
# -- note: 'households' holds total housing units here, not a count of households
acs5_10_dp05.rename(columns={'Estimate!!RACE!!White': 'white', "Estimate!!Total housing units":'households'}, inplace=True)

acs5_10 = acs5_10.merge(acs5_10_dp05, on='GEOID').copy()
del acs5_10_dp05
acs5_10.head()



Unnamed: 0,GEOID,year,total_population,male,female,median_age,Age < 15,Age 15-24,Age 25-44,Age 45-64,Age >= 65,white,households
0,1001,2010,53155,25780,27375,36.2,22.1,14.1,27.4,24.9,11.5,42758,21530
1,1003,2010,175791,85902,89889,41.0,19.3,11.6,24.6,28.1,16.5,153434,101093
2,1005,2010,27699,14652,13047,38.0,18.4,13.4,27.3,27.0,13.9,13972,12011
3,1007,2010,22610,12162,10448,38.3,19.1,14.3,28.1,26.2,12.5,19054,8885
4,1009,2010,56692,28080,28612,38.3,20.5,12.4,26.5,26.5,14.3,54543,23482
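
The merge above uses the default inner join, so a county missing from either table would be dropped without notice. The sketch below shows one way to surface unmatched counties using the `how`, `validate`, and `indicator` arguments of `DataFrame.merge`. The toy tables and the names `left`, `right`, and `checked` are illustrative only.

```python
import pandas as pd

left = pd.DataFrame({"GEOID": ["01001", "01003", "01005"],
                     "total_population": [10, 20, 30]})
right = pd.DataFrame({"GEOID": ["01001", "01003"], "white": [7, 15]})

# how="left" keeps every county from the left table; indicator=True adds a
# "_merge" column recording whether a match was found on the right.
checked = left.merge(right, on="GEOID", how="left",
                     validate="one_to_one", indicator=True)

unmatched = checked.loc[checked["_merge"] == "left_only", "GEOID"].tolist()
print(unmatched)
```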


##ACS 2011

In [20]:
acs5_11 = pd.read_csv(dpath + "ACSDP5Y2011.DP05-Data.csv", skiprows=1, low_memory=False)
# -- create GEOID
acs5_11["GEOID"] = acs5_11.Geography.str[-5:]
# -- create year
acs5_11["year"] = 2011

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!White',
         'Estimate!!SEX AND AGE!!Under 5 years',
         'Estimate!!SEX AND AGE!!5 to 9 years',
         'Estimate!!SEX AND AGE!!10 to 14 years',
         'Estimate!!SEX AND AGE!!15 to 19 years',
         'Estimate!!SEX AND AGE!!20 to 24 years',
         'Estimate!!SEX AND AGE!!25 to 34 years',
         'Estimate!!SEX AND AGE!!35 to 44 years',
         'Estimate!!SEX AND AGE!!45 to 54 years',
         'Estimate!!SEX AND AGE!!55 to 59 years',
         'Estimate!!SEX AND AGE!!60 to 64 years',
         'Estimate!!SEX AND AGE!!65 years and over',
         'Estimate!!SEX AND AGE!!Female',
         'Estimate!!SEX AND AGE!!Male',
         'Estimate!!SEX AND AGE!!Median age (years)',
         'Estimate!!SEX AND AGE!!Total population',
         'Estimate!!Total housing units']
acs5_11 = acs5_11[cols].copy()


In [21]:
renamed_columns = {'Estimate!!RACE!!White': 'white',
         'Estimate!!SEX AND AGE!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Female': 'female',
         'Estimate!!SEX AND AGE!!Male': 'male',
         'Estimate!!SEX AND AGE!!Median age (years)': 'median_age',
         'Estimate!!SEX AND AGE!!Total population': 'total_population',
         'Estimate!!Total housing units': 'households'}
acs5_11.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_11["Age < 15"] = acs5_11['under_5_years'] + acs5_11['5_to_9_years'] + acs5_11['10_to_14_years']
acs5_11["Age 15-24"] = acs5_11['15_to_19_years'] + acs5_11['20_to_24_years']
acs5_11["Age 25-44"] = acs5_11['25_to_34_years'] + acs5_11['35_to_44_years']
acs5_11["Age 45-64"] = acs5_11['45_to_54_years'] + acs5_11['55_to_59_years'] + acs5_11['60_to_64_years']
acs5_11["Age >= 65"] = acs5_11['65_years_and_over']


# -- drop unnecessary columns
acs5_11.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)



##ACS 2012

In [27]:
acs5_12 = pd.read_csv(dpath + "ACSDP5Y2012.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_12["GEOID"] = acs5_12.Geography.str[-5:]
# -- create year
acs5_12["year"] = 2012

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!White',
         'Estimate!!SEX AND AGE!!Under 5 years',
         'Estimate!!SEX AND AGE!!5 to 9 years',
         'Estimate!!SEX AND AGE!!10 to 14 years',
         'Estimate!!SEX AND AGE!!15 to 19 years',
         'Estimate!!SEX AND AGE!!20 to 24 years',
         'Estimate!!SEX AND AGE!!25 to 34 years',
         'Estimate!!SEX AND AGE!!35 to 44 years',
         'Estimate!!SEX AND AGE!!45 to 54 years',
         'Estimate!!SEX AND AGE!!55 to 59 years',
         'Estimate!!SEX AND AGE!!60 to 64 years',
         'Estimate!!SEX AND AGE!!65 years and over',
         'Estimate!!SEX AND AGE!!Female',
         'Estimate!!SEX AND AGE!!Male',
         'Estimate!!SEX AND AGE!!Median age (years)',
         'Estimate!!SEX AND AGE!!Total population',
         'Estimate!!Total housing units']
acs5_12 = acs5_12[cols].copy()

renamed_columns = {'Estimate!!RACE!!White': 'white',
         'Estimate!!SEX AND AGE!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Female': 'female',
         'Estimate!!SEX AND AGE!!Male': 'male',
         'Estimate!!SEX AND AGE!!Median age (years)': 'median_age',
         'Estimate!!SEX AND AGE!!Total population': 'total_population',
         'Estimate!!Total housing units': 'households'}
acs5_12.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_12["Age < 15"] = acs5_12['under_5_years'] + acs5_12['5_to_9_years'] + acs5_12['10_to_14_years']
acs5_12["Age 15-24"] = acs5_12['15_to_19_years'] + acs5_12['20_to_24_years']
acs5_12["Age 25-44"] = acs5_12['25_to_34_years'] + acs5_12['35_to_44_years']
acs5_12["Age 45-64"] = acs5_12['45_to_54_years'] + acs5_12['55_to_59_years'] + acs5_12['60_to_64_years']
acs5_12["Age >= 65"] = acs5_12['65_years_and_over']


# -- drop unnecessary columns
acs5_12.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)
acs5_12.head()

Unnamed: 0,GEOID,year,white,female,male,median_age,total_population,households,Age < 15,Age 15-24,Age 25-44,Age 45-64,Age >= 65
0,1001,2012,43841,28052,26538,37.0,54590,22077,11885,7561,14507,13945,6692
1,1003,2012,160739,93956,89270,41.2,183226,103984,34953,21002,44796,51424,31051
2,1005,2012,13816,12720,14749,38.2,27469,11878,4966,3521,7448,7518,4016
3,1007,2012,17661,10615,12154,39.4,22769,8958,4395,2856,6136,6441,2941
4,1009,2012,55359,28977,28489,39.1,57466,23761,11530,7233,14729,15418,8556
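
The age-group step is repeated almost verbatim for each DP05 year. One way to reduce the repetition is a small helper that sums the raw brackets by label. This is a sketch only: `AGE_GROUPS`, `add_age_groups`, and the `prefix` parameter are hypothetical names, and the prefix would need to change for years whose headers use a different path.

```python
import pandas as pd

# Raw DP05 age brackets that make up each of the five groups used here.
AGE_GROUPS = {
    "Age < 15": ["Under 5 years", "5 to 9 years", "10 to 14 years"],
    "Age 15-24": ["15 to 19 years", "20 to 24 years"],
    "Age 25-44": ["25 to 34 years", "35 to 44 years"],
    "Age 45-64": ["45 to 54 years", "55 to 59 years", "60 to 64 years"],
    "Age >= 65": ["65 years and over"],
}

def add_age_groups(df, prefix="Estimate!!SEX AND AGE!!"):
    """Return a copy with the five age groups added and raw brackets dropped."""
    out = df.copy()
    raw_cols = []
    for group, brackets in AGE_GROUPS.items():
        cols = [prefix + b for b in brackets]
        # min_count=len(cols) yields NaN if any bracket is missing, like "+".
        out[group] = out[cols].sum(axis=1, min_count=len(cols))
        raw_cols.extend(cols)
    return out.drop(columns=raw_cols)

# Toy one-row table: brackets are filled with 1, 2, ..., 11 in order.
brackets = [b for bs in AGE_GROUPS.values() for b in bs]
demo = pd.DataFrame({"Estimate!!SEX AND AGE!!" + b: [n]
                     for n, b in enumerate(brackets, start=1)})
demo.insert(0, "GEOID", ["01001"])

grouped = add_age_groups(demo)
print(grouped)
```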


##ACS 2013

In [31]:
acs5_13 = pd.read_csv(dpath + "ACSDP5Y2013.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_13["GEOID"] = acs5_13.Geography.str[-5:]
# -- create year
acs5_13["year"] = 2013

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!One race!!White',
         'Estimate!!SEX AND AGE!!Under 5 years',
         'Estimate!!SEX AND AGE!!5 to 9 years',
         'Estimate!!SEX AND AGE!!10 to 14 years',
         'Estimate!!SEX AND AGE!!15 to 19 years',
         'Estimate!!SEX AND AGE!!20 to 24 years',
         'Estimate!!SEX AND AGE!!25 to 34 years',
         'Estimate!!SEX AND AGE!!35 to 44 years',
         'Estimate!!SEX AND AGE!!45 to 54 years',
         'Estimate!!SEX AND AGE!!55 to 59 years',
         'Estimate!!SEX AND AGE!!60 to 64 years',
         'Estimate!!SEX AND AGE!!65 years and over',
         'Estimate!!SEX AND AGE!!Total population!!Female',
         'Estimate!!SEX AND AGE!!Total population!!Male',
         'Estimate!!SEX AND AGE!!Median age (years)',
         'Estimate!!SEX AND AGE!!Total population',
         'Estimate!!Total housing units']
acs5_13 = acs5_13[cols].copy()

renamed_columns = {'Estimate!!RACE!!One race!!White': 'white',
         'Estimate!!SEX AND AGE!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Total population!!Female': 'female',
         'Estimate!!SEX AND AGE!!Total population!!Male': 'male',
         'Estimate!!SEX AND AGE!!Median age (years)': 'median_age',
         'Estimate!!SEX AND AGE!!Total population': 'total_population',
         'Estimate!!Total housing units': 'households'}
acs5_13.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_13["Age < 15"] = acs5_13['under_5_years'] + acs5_13['5_to_9_years'] + acs5_13['10_to_14_years']
acs5_13["Age 15-24"] = acs5_13['15_to_19_years'] + acs5_13['20_to_24_years']
acs5_13["Age 25-44"] = acs5_13['25_to_34_years'] + acs5_13['35_to_44_years']
acs5_13["Age 45-64"] = acs5_13['45_to_54_years'] + acs5_13['55_to_59_years'] + acs5_13['60_to_64_years']
acs5_13["Age >= 65"] = acs5_13['65_years_and_over']


# -- drop unnecessary columns
acs5_13.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)


Unnamed: 0,GEOID,year,white,female,male,median_age,total_population,households,Age < 15,Age 15-24,Age 25-44,Age 45-64,Age >= 65
0,1001,2012,42997,28114,26793,37.5,54907,22220,11792,7419,14535,14216,6945
1,1003,2012,161737,95701,91413,41.5,187114,104648,35216,21835,45251,52455,32357
2,1005,2012,12960,12687,14634,38.3,27321,11790,4898,3435,7456,7428,4104
3,1007,2012,17461,10411,12343,39.4,22754,8939,4121,3293,6106,6144,3090
4,1009,2012,54890,29019,28604,39.6,57623,23767,11499,7278,14459,15583,8804
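
The preview above reports year 2012 under the ACS 2013 heading, which may be stale output from an earlier run. Since each yearly table is cleaned by a near-identical cell, a small check on column names and the year value can catch such slips early. The sketch below assumes a target column list (`EXPECTED_COLUMNS`) and a hypothetical helper `check_table`; neither comes from the notebook.

```python
import pandas as pd

# Assumed target columns for the cleaned ACS tables; adjust to the final design.
EXPECTED_COLUMNS = [
    "GEOID", "year", "total_population", "white", "male", "female",
    "median_age", "households",
    "Age < 15", "Age 15-24", "Age 25-44", "Age 45-64", "Age >= 65",
]

def check_table(df, year):
    """Report column-name mismatches and whether every row has the given year."""
    columns = set(df.columns)
    year_ok = "year" in columns and bool(df["year"].eq(year).all())
    return {
        "missing": sorted(set(EXPECTED_COLUMNS) - columns),
        "unexpected": sorted(columns - set(EXPECTED_COLUMNS)),
        "year_ok": year_ok,
    }

# Toy table with two deliberate slips: a space in a column name and a stale year.
toy = pd.DataFrame({c: [0] for c in EXPECTED_COLUMNS}).rename(
    columns={"total_population": "total population"})
toy["year"] = 2012

report = check_table(toy, year=2013)
print(report)
```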


##ACS 2014

In [32]:
acs5_14 = pd.read_csv(dpath + "ACSDP5Y2014.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_14["GEOID"] = acs5_14.Geography.str[-5:]
# -- create year
acs5_14["year"] = 2014

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!One race!!White',
         'Estimate!!SEX AND AGE!!Under 5 years',
         'Estimate!!SEX AND AGE!!5 to 9 years',
         'Estimate!!SEX AND AGE!!10 to 14 years',
         'Estimate!!SEX AND AGE!!15 to 19 years',
         'Estimate!!SEX AND AGE!!20 to 24 years',
         'Estimate!!SEX AND AGE!!25 to 34 years',
         'Estimate!!SEX AND AGE!!35 to 44 years',
         'Estimate!!SEX AND AGE!!45 to 54 years',
         'Estimate!!SEX AND AGE!!55 to 59 years',
         'Estimate!!SEX AND AGE!!60 to 64 years',
         'Estimate!!SEX AND AGE!!65 years and over',
         'Estimate!!SEX AND AGE!!Total population!!Female',
         'Estimate!!SEX AND AGE!!Total population!!Male',
         'Estimate!!SEX AND AGE!!Median age (years)',
         'Estimate!!SEX AND AGE!!Total population',
         'Estimate!!Total housing units']
acs5_14 = acs5_14[cols].copy()

renamed_columns = {'Estimate!!RACE!!One race!!White': 'white',
         'Estimate!!SEX AND AGE!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Total population!!Female': 'female',
         'Estimate!!SEX AND AGE!!Total population!!Male': 'male',
         'Estimate!!SEX AND AGE!!Median age (years)': 'median_age',
         'Estimate!!SEX AND AGE!!Total population': 'total_population',
         'Estimate!!Total housing units': 'households'}
acs5_14.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_14["Age < 15"] = acs5_14['under_5_years'] + acs5_14['5_to_9_years'] + acs5_14['10_to_14_years']
acs5_14["Age 15-24"] = acs5_14['15_to_19_years'] + acs5_14['20_to_24_years']
acs5_14["Age 25-44"] = acs5_14['25_to_34_years'] + acs5_14['35_to_44_years']
acs5_14["Age 45-64"] = acs5_14['45_to_54_years'] + acs5_14['55_to_59_years'] + acs5_14['60_to_64_years']
acs5_14["Age >= 65"] = acs5_14['65_years_and_over']


# -- drop unnecessary columns
acs5_14.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)


Unnamed: 0,GEOID,year,white,female,male,median_age,total_population,households,Age < 15,Age 15-24,Age 25-44,Age 45-64,Age >= 65
0,1001,2014,43011,28362,26774,37.9,55136,22431,11667,7236,14581,14331,7321
1,1003,2014,165673,97976,93229,41.8,191205,105563,35779,22159,46166,53319,33782
2,1005,2014,12806,12557,14562,38.3,27119,11833,4892,3391,7265,7391,4180
3,1007,2014,17379,10433,12220,40.0,22653,8985,3876,3094,6156,6318,3209
4,1009,2014,54927,29071,28574,40.2,57645,23868,11400,7193,14293,15587,9172
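
The DP05 header paths change between releases: the white-population column is `Estimate!!RACE!!White` in 2011 and 2012, `Estimate!!RACE!!One race!!White` from 2013, and `Estimate!!RACE!!Total population!!One race!!White` in 2017. One option is to look columns up by section and final label instead of hard-coding each full path. The helper below (`find_column`, a hypothetical name) is a sketch. It raises when the match is not unique, so ambiguous labels in the full file would be reported and never silently picked.

```python
def find_column(columns, section, label):
    """Return the single 'Estimate' column in `section` whose last part is `label`."""
    matches = []
    for col in columns:
        parts = col.split("!!")
        if parts[0] == "Estimate" and section in parts and parts[-1] == label:
            matches.append(col)
    if len(matches) != 1:
        raise KeyError(f"expected one match for {section}/{label}, found {matches}")
    return matches[0]

# Header variants taken from the column lists in this notebook.
headers = {
    2012: ["Estimate!!RACE!!White", "Estimate!!SEX AND AGE!!Female"],
    2014: ["Estimate!!RACE!!One race!!White",
           "Estimate!!SEX AND AGE!!Total population!!Female"],
    2017: ["Estimate!!RACE!!Total population!!One race!!White",
           "Estimate!!SEX AND AGE!!Total population!!Female"],
}

white_cols = {year: find_column(cols, "RACE", "White") for year, cols in headers.items()}
female_cols = {year: find_column(cols, "SEX AND AGE", "Female") for year, cols in headers.items()}
print(white_cols)
```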


##ACS 2015

In [35]:
acs5_15 = pd.read_csv(dpath + "ACSDP5Y2015.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_15["GEOID"] = acs5_15.Geography.str[-5:]
# -- create year
acs5_15["year"] = 2015

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!One race!!White',
         'Estimate!!SEX AND AGE!!Under 5 years',
         'Estimate!!SEX AND AGE!!5 to 9 years',
         'Estimate!!SEX AND AGE!!10 to 14 years',
         'Estimate!!SEX AND AGE!!15 to 19 years',
         'Estimate!!SEX AND AGE!!20 to 24 years',
         'Estimate!!SEX AND AGE!!25 to 34 years',
         'Estimate!!SEX AND AGE!!35 to 44 years',
         'Estimate!!SEX AND AGE!!45 to 54 years',
         'Estimate!!SEX AND AGE!!55 to 59 years',
         'Estimate!!SEX AND AGE!!60 to 64 years',
         'Estimate!!SEX AND AGE!!65 years and over',
         'Estimate!!SEX AND AGE!!Total population!!Female',
         'Estimate!!SEX AND AGE!!Total population!!Male',
         'Estimate!!SEX AND AGE!!Median age (years)',
         'Estimate!!SEX AND AGE!!Total population',
         'Estimate!!Total housing units']
acs5_15 = acs5_15[cols].copy()

renamed_columns = {'Estimate!!RACE!!One race!!White': 'white',
         'Estimate!!SEX AND AGE!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Total population!!Female': 'female',
         'Estimate!!SEX AND AGE!!Total population!!Male': 'male',
         'Estimate!!SEX AND AGE!!Median age (years)': 'median_age',
         'Estimate!!SEX AND AGE!!Total population': 'total_population',
         'Estimate!!Total housing units': 'households'}
acs5_15.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_15["Age < 15"] = acs5_15['under_5_years'] + acs5_15['5_to_9_years'] + acs5_15['10_to_14_years']
acs5_15["Age 15-24"] = acs5_15['15_to_19_years'] + acs5_15['20_to_24_years']
acs5_15["Age 25-44"] = acs5_15['25_to_34_years'] + acs5_15['35_to_44_years']
acs5_15["Age 45-64"] = acs5_15['45_to_54_years'] + acs5_15['55_to_59_years'] + acs5_15['60_to_64_years']
acs5_15["Age >= 65"] = acs5_15['65_years_and_over']


# -- drop unnecessary columns
acs5_15.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)


In [34]:
acs5_15.head()

Unnamed: 0,GEOID,year,white,female,male,median_age,total_population,households,Age < 15,Age 15-24,Age 25-44,Age 45-64,Age >= 65
0,1001,2014,42741,28476,26745,37.7,55221,22582,11446,7456,14440,14401,7478
1,1003,2014,168646,99807,95314,42.2,195121,106422,36149,21971,47367,54258,35376
2,1005,2014,12756,12435,14497,38.8,26932,11810,4798,3326,7230,7253,4325
3,1007,2014,17327,10531,12073,38.9,22604,8971,4011,2964,6531,5843,3255
4,1009,2014,54881,29198,28512,40.7,57710,23860,11239,7063,14213,15683,9512


##ACS 2016

In [36]:
acs5_16 = pd.read_csv(dpath + "ACSDP5Y2016.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_16["GEOID"] = acs5_16.Geography.str[-5:]
# -- create year
acs5_16["year"] = 2016

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!One race!!White',
         'Estimate!!SEX AND AGE!!Under 5 years',
         'Estimate!!SEX AND AGE!!5 to 9 years',
         'Estimate!!SEX AND AGE!!10 to 14 years',
         'Estimate!!SEX AND AGE!!15 to 19 years',
         'Estimate!!SEX AND AGE!!20 to 24 years',
         'Estimate!!SEX AND AGE!!25 to 34 years',
         'Estimate!!SEX AND AGE!!35 to 44 years',
         'Estimate!!SEX AND AGE!!45 to 54 years',
         'Estimate!!SEX AND AGE!!55 to 59 years',
         'Estimate!!SEX AND AGE!!60 to 64 years',
         'Estimate!!SEX AND AGE!!65 years and over',
         'Estimate!!SEX AND AGE!!Total population!!Female',
         'Estimate!!SEX AND AGE!!Total population!!Male',
         'Estimate!!SEX AND AGE!!Median age (years)',
         'Estimate!!SEX AND AGE!!Total population',
         'Estimate!!Total housing units']
acs5_16 = acs5_16[cols].copy()

renamed_columns = {'Estimate!!RACE!!One race!!White': 'white',
         'Estimate!!SEX AND AGE!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Total population!!Female': 'female',
         'Estimate!!SEX AND AGE!!Total population!!Male': 'male',
         'Estimate!!SEX AND AGE!!Median age (years)': 'median_age',
         'Estimate!!SEX AND AGE!!Total population': 'total_population',
         'Estimate!!Total housing units': 'households'}
acs5_16.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_16["Age < 15"] = acs5_16['under_5_years'] + acs5_16['5_to_9_years'] + acs5_16['10_to_14_years']
acs5_16["Age 15-24"] = acs5_16['15_to_19_years'] + acs5_16['20_to_24_years']
acs5_16["Age 25-44"] = acs5_16['25_to_34_years'] + acs5_16['35_to_44_years']
acs5_16["Age 45-64"] = acs5_16['45_to_54_years'] + acs5_16['55_to_59_years'] + acs5_16['60_to_64_years']
acs5_16["Age >= 65"] = acs5_16['65_years_and_over']


# -- drop unnecessary columns
acs5_16.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)


##ACS 2017

In [43]:
acs5_17 = pd.read_csv(dpath + "ACSDP5Y2017.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_17["GEOID"] = acs5_17.Geography.str[-5:]
# -- create year
acs5_17["year"] = 2017

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!Total population!!One race!!White',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years',
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over',
         'Estimate!!SEX AND AGE!!Total population!!Female',
         'Estimate!!SEX AND AGE!!Total population!!Male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)',
         'Estimate!!RACE!!Total population',
         'Estimate!!Total housing units']
acs5_17 = acs5_17[cols].copy()  # copy so the in-place edits below act on an independent frame

renamed_columns = {'Estimate!!RACE!!Total population!!One race!!White': 'white',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Total population!!Female': 'female',
         'Estimate!!SEX AND AGE!!Total population!!Male': 'male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)': 'median_age',
         'Estimate!!RACE!!Total population': 'total population',
         'Estimate!!Total housing units': 'households'}
acs5_17.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_17["Age < 15"] = acs5_17['under_5_years'] + acs5_17['5_to_9_years'] + acs5_17['10_to_14_years']
acs5_17["Age 15-24"] = acs5_17['15_to_19_years'] + acs5_17['20_to_24_years']
acs5_17["Age 25-44"] = acs5_17['25_to_34_years'] + acs5_17['35_to_44_years']
acs5_17["Age 45-64"] = acs5_17['45_to_54_years'] + acs5_17['55_to_59_years'] + acs5_17['60_to_64_years']
acs5_17["Age >= 65"] = acs5_17['65_years_and_over']


# -- drop unnecessary columns
acs5_17.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)
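
The DP05 column labels differ between the 2016 file and the 2017 file (the 2017 labels add a `Total population!!` level under `SEX AND AGE` and `RACE`), so a selection such as `acs5_17[cols]` raises a `KeyError` when a label does not match. A minimal sketch of a pre-check that lists the missing labels first; the helper name `missing_columns` and the toy frame are assumptions for illustration, not part of the notebook:

```python
import pandas as pd

def missing_columns(df, expected):
    """Return the expected column labels that are absent from df (hypothetical helper)."""
    return [c for c in expected if c not in df.columns]

# toy frame standing in for a file with 2017-style labels (made-up values)
toy = pd.DataFrame({
    "GEOID": ["00001"],
    "Estimate!!SEX AND AGE!!Total population!!Under 5 years": [10],
})

# a 2016-style label is not present in a 2017-style file
expected = ["GEOID", "Estimate!!SEX AND AGE!!Under 5 years"]
missing = missing_columns(toy, expected)
print(missing)
```

Running such a check before the column selection gives a short list of mismatched labels instead of a long `KeyError` message.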

##ACS 2018

In [45]:
acs5_18 = pd.read_csv(dpath + "ACSDP5Y2018.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_18["GEOID"] = acs5_18.Geography.str[-5:]
# -- create year
acs5_18["year"] = 2018

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!Total population!!One race!!White',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years',
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over',
         'Estimate!!SEX AND AGE!!Total population!!Female',
         'Estimate!!SEX AND AGE!!Total population!!Male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)',
         'Estimate!!RACE!!Total population',
         'Estimate!!Total housing units']
acs5_18 = acs5_18[cols].copy()  # copy so the in-place edits below act on an independent frame

renamed_columns = {'Estimate!!RACE!!Total population!!One race!!White': 'white',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Total population!!Female': 'female',
         'Estimate!!SEX AND AGE!!Total population!!Male': 'male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)': 'median_age',
         'Estimate!!RACE!!Total population': 'total population',
         'Estimate!!Total housing units': 'households'}
acs5_18.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_18["Age < 15"] = acs5_18['under_5_years'] + acs5_18['5_to_9_years'] + acs5_18['10_to_14_years']
acs5_18["Age 15-24"] = acs5_18['15_to_19_years'] + acs5_18['20_to_24_years']
acs5_18["Age 25-44"] = acs5_18['25_to_34_years'] + acs5_18['35_to_44_years']
acs5_18["Age 45-64"] = acs5_18['45_to_54_years'] + acs5_18['55_to_59_years'] + acs5_18['60_to_64_years']
acs5_18["Age >= 65"] = acs5_18['65_years_and_over']


# -- drop unnecessary columns
acs5_18.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)

In [46]:
acs5_18.head()

Unnamed: 0,GEOID,year,white,female,male,median_age,total population,households,Age < 15,Age 15-24,Age 25-44,Age 45-64,Age >= 65
0,1001,2018,42437,28326,26874,37.8,55200,23315,10842,7192,14438,14678,8050
1,1003,2018,179526,106919,101188,42.8,208107,111945,37621,23497,48703,57621,40665
2,1005,2018,12216,12085,13697,39.9,25782,11937,4517,3092,6779,6760,4634
3,1007,2018,17268,10375,12152,39.9,22527,9161,3742,3005,5970,6149,3661
4,1009,2018,55054,29211,28434,40.8,57645,24222,11112,6906,13939,15455,10233
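
The rows printed above allow a quick consistency check: the female and male counts, and the five age groups, should each add up to the total population. A minimal sketch, with the five printed rows copied into a small frame (the names `sample`, `sex_ok`, and `age_ok` are only for illustration):

```python
import pandas as pd

# the five rows shown in the head() output for 2018
sample = pd.DataFrame({
    "GEOID": ["1001", "1003", "1005", "1007", "1009"],
    "female": [28326, 106919, 12085, 10375, 29211],
    "male": [26874, 101188, 13697, 12152, 28434],
    "total population": [55200, 208107, 25782, 22527, 57645],
    "Age < 15": [10842, 37621, 4517, 3742, 11112],
    "Age 15-24": [7192, 23497, 3092, 3005, 6906],
    "Age 25-44": [14438, 48703, 6779, 5970, 13939],
    "Age 45-64": [14678, 57621, 6760, 6149, 15455],
    "Age >= 65": [8050, 40665, 4634, 3661, 10233],
})

age_cols = ["Age < 15", "Age 15-24", "Age 25-44", "Age 45-64", "Age >= 65"]

# row-wise checks: sex counts and age groups against the total
sex_ok = (sample["female"] + sample["male"]) == sample["total population"]
age_ok = sample[age_cols].sum(axis=1) == sample["total population"]

print(sex_ok.all(), age_ok.all())
```

Both checks hold for these five rows. Applied to the full frame, the same comparison would flag counties where a column was parsed incorrectly.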


##ACS 2019

In [50]:
acs5_19 = pd.read_csv(dpath + "ACSDP5Y2019.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_19["GEOID"] = acs5_19.Geography.str[-5:]
# -- create year
acs5_19["year"] = 2019

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!Total population!!One race!!White',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years',
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over',
         'Estimate!!SEX AND AGE!!Total population!!Female',
         'Estimate!!SEX AND AGE!!Total population!!Male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)',
         'Estimate!!RACE!!Total population',
         'Estimate!!Total housing units']
acs5_19 = acs5_19[cols].copy()  # copy so the in-place edits below act on an independent frame

renamed_columns = {'Estimate!!RACE!!Total population!!One race!!White': 'white',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Total population!!Female': 'female',
         'Estimate!!SEX AND AGE!!Total population!!Male': 'male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)': 'median_age',
         'Estimate!!RACE!!Total population': 'total population',
         'Estimate!!Total housing units': 'households'}
acs5_19.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_19["Age < 15"]  = acs5_19['under_5_years']  + acs5_19['5_to_9_years']  + acs5_19['10_to_14_years']
acs5_19["Age 15-24"] = acs5_19['15_to_19_years'] + acs5_19['20_to_24_years']
acs5_19["Age 25-44"] = acs5_19['25_to_34_years'] + acs5_19['35_to_44_years']
acs5_19["Age 45-64"] = acs5_19['45_to_54_years'] + acs5_19['55_to_59_years'] + acs5_19['60_to_64_years']
acs5_19["Age >= 65"] = acs5_19['65_years_and_over']


# -- drop unnecessary columns
acs5_19.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)

In [51]:
acs5_19.head()

Unnamed: 0,GEOID,year,white,female,male,median_age,total population,households,Age < 15,Age 15-24,Age 25-44,Age 45-64,Age >= 65
0,1001,2019,42527,28446,26934,38.2,55380,23493,10631,7382,14362,14722,8283
1,1003,2019,183471,109334,103496,43.0,212830,114164,38009,23709,49776,58805,42531
2,1005,2019,11869,11940,13421,40.4,25361,12013,4393,3004,6680,6574,4710
3,1007,2019,17272,10343,12150,40.9,22493,9185,3666,2659,6186,6398,3584
4,1009,2019,55062,29186,28495,40.7,57681,24323,11056,6834,14102,15363,10326


##ACS 2020

In [52]:
acs5_20 = pd.read_csv(dpath + "ACSDP5Y2020.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_20["GEOID"] = acs5_20.Geography.str[-5:]
# -- create year
acs5_20["year"] = 2020

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!Total population!!One race!!White',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years',
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over',
         'Estimate!!SEX AND AGE!!Total population!!Female',
         'Estimate!!SEX AND AGE!!Total population!!Male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)',
         'Estimate!!RACE!!Total population',
         'Estimate!!Total housing units']
acs5_20 = acs5_20[cols].copy()  # copy so the in-place edits below act on an independent frame

renamed_columns = {'Estimate!!RACE!!Total population!!One race!!White': 'white',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Total population!!Female': 'female',
         'Estimate!!SEX AND AGE!!Total population!!Male': 'male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)': 'median_age',
         'Estimate!!RACE!!Total population': 'total population',
         'Estimate!!Total housing units': 'households'}
acs5_20.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_20["Age < 15"]  = acs5_20['under_5_years']  + acs5_20['5_to_9_years']  + acs5_20['10_to_14_years']
acs5_20["Age 15-24"] = acs5_20['15_to_19_years'] + acs5_20['20_to_24_years']
acs5_20["Age 25-44"] = acs5_20['25_to_34_years'] + acs5_20['35_to_44_years']
acs5_20["Age 45-64"] = acs5_20['45_to_54_years'] + acs5_20['55_to_59_years'] + acs5_20['60_to_64_years']
acs5_20["Age >= 65"] = acs5_20['65_years_and_over']


# -- drop unnecessary columns
acs5_20.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)

##ACS 2021

In [54]:
acs5_21 = pd.read_csv(dpath + "ACSDP5Y2021.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_21["GEOID"] = acs5_21.Geography.str[-5:]
# -- create year
acs5_21["year"] = 2021

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!Total population!!One race!!White',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years',
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over',
         'Estimate!!SEX AND AGE!!Total population!!Female',
         'Estimate!!SEX AND AGE!!Total population!!Male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)',
         'Estimate!!RACE!!Total population',
         'Estimate!!Total housing units']
acs5_21 = acs5_21[cols].copy()  # copy so the in-place edits below act on an independent frame

renamed_columns = {'Estimate!!RACE!!Total population!!One race!!White': 'white',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Total population!!Female': 'female',
         'Estimate!!SEX AND AGE!!Total population!!Male': 'male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)': 'median_age',
         'Estimate!!RACE!!Total population': 'total population',
         'Estimate!!Total housing units': 'households'}
acs5_21.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_21["Age < 15"]  = acs5_21['under_5_years']  + acs5_21['5_to_9_years']  + acs5_21['10_to_14_years']
acs5_21["Age 15-24"] = acs5_21['15_to_19_years'] + acs5_21['20_to_24_years']
acs5_21["Age 25-44"] = acs5_21['25_to_34_years'] + acs5_21['35_to_44_years']
acs5_21["Age 45-64"] = acs5_21['45_to_54_years'] + acs5_21['55_to_59_years'] + acs5_21['60_to_64_years']
acs5_21["Age >= 65"] = acs5_21['65_years_and_over']


# -- drop unnecessary columns
acs5_21.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)

In [55]:
acs5_21.head()

Unnamed: 0,GEOID,year,white,female,male,median_age,total population,households,Age < 15,Age 15-24,Age 25-44,Age 45-64,Age >= 65
0,1001,2021,43755,30033,28206,38.5,58239,24170,11038,7587,15178,15621,8815
1,1003,2021,192034,116350,110781,43.4,227131,121763,40172,24982,52642,62530,46805
2,1005,2021,11495,11898,13361,40.2,25259,11667,4343,2921,6744,6450,4801
3,1007,2021,17020,10112,12300,39.7,22412,9013,3700,2655,6396,6067,3594
4,1009,2021,54439,29354,29530,41.1,58884,24527,11239,6977,14169,15915,10584


##ACS 2022

In [56]:
acs5_22 = pd.read_csv(dpath + "ACSDP5Y2022.DP05-Data.csv", skiprows=1, low_memory=False)

# -- create GEOID
acs5_22["GEOID"] = acs5_22.Geography.str[-5:]
# -- create year
acs5_22["year"] = 2022

# -- select columns
cols = ['GEOID', 'year',
         'Estimate!!RACE!!Total population!!One race!!White',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years',
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over',
         'Estimate!!SEX AND AGE!!Total population!!Female',
         'Estimate!!SEX AND AGE!!Total population!!Male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)',
         'Estimate!!RACE!!Total population',
         'Estimate!!Total housing units']
acs5_22 = acs5_22[cols].copy()  # copy so the in-place edits below act on an independent frame

renamed_columns = {'Estimate!!RACE!!Total population!!One race!!White': 'white',
         'Estimate!!SEX AND AGE!!Total population!!Under 5 years': "under_5_years" ,
         'Estimate!!SEX AND AGE!!Total population!!5 to 9 years': '5_to_9_years',
         'Estimate!!SEX AND AGE!!Total population!!10 to 14 years': '10_to_14_years',
         'Estimate!!SEX AND AGE!!Total population!!15 to 19 years': '15_to_19_years',
         'Estimate!!SEX AND AGE!!Total population!!20 to 24 years': '20_to_24_years',
         'Estimate!!SEX AND AGE!!Total population!!25 to 34 years': '25_to_34_years',
         'Estimate!!SEX AND AGE!!Total population!!35 to 44 years': '35_to_44_years',
         'Estimate!!SEX AND AGE!!Total population!!45 to 54 years': '45_to_54_years',
         'Estimate!!SEX AND AGE!!Total population!!55 to 59 years': '55_to_59_years',
         'Estimate!!SEX AND AGE!!Total population!!60 to 64 years': '60_to_64_years',
         'Estimate!!SEX AND AGE!!Total population!!65 years and over': '65_years_and_over',
         'Estimate!!SEX AND AGE!!Total population!!Female': 'female',
         'Estimate!!SEX AND AGE!!Total population!!Male': 'male',
         'Estimate!!SEX AND AGE!!Total population!!Median age (years)': 'median_age',
         'Estimate!!RACE!!Total population': 'total population',
         'Estimate!!Total housing units': 'households'}
acs5_22.rename(columns=renamed_columns, inplace=True)

# -- create age groups
acs5_22["Age < 15"]  = acs5_22['under_5_years']  + acs5_22['5_to_9_years']  + acs5_22['10_to_14_years']
acs5_22["Age 15-24"] = acs5_22['15_to_19_years'] + acs5_22['20_to_24_years']
acs5_22["Age 25-44"] = acs5_22['25_to_34_years'] + acs5_22['35_to_44_years']
acs5_22["Age 45-64"] = acs5_22['45_to_54_years'] + acs5_22['55_to_59_years'] + acs5_22['60_to_64_years']
acs5_22["Age >= 65"] = acs5_22['65_years_and_over']


# -- drop unnecessary columns
acs5_22.drop(columns=['under_5_years', '5_to_9_years', '10_to_14_years',
                      '15_to_19_years', '20_to_24_years', '25_to_34_years',
                      '35_to_44_years', '45_to_54_years', '55_to_59_years',
                      '60_to_64_years', '65_years_and_over'], inplace=True)
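
The cells for 2017 to 2022 repeat the same steps, with only the file name and the year changing. One way to reduce the repetition is a single function. The following is a sketch under stated assumptions, not the notebook's actual code: the function name `clean_dp05`, the constants `PREFIX`, `AGE_BINS`, `AGE_GROUPS`, and `OTHER`, and the toy input are all introduced here for illustration, and the sketch covers only the 2017-style labels.

```python
import pandas as pd

PREFIX = "Estimate!!SEX AND AGE!!Total population!!"
AGE_BINS = ["Under 5 years", "5 to 9 years", "10 to 14 years", "15 to 19 years",
            "20 to 24 years", "25 to 34 years", "35 to 44 years", "45 to 54 years",
            "55 to 59 years", "60 to 64 years", "65 years and over"]
# aggregated age groups, expressed as slices of the raw bins
AGE_GROUPS = {
    "Age < 15": AGE_BINS[0:3],
    "Age 15-24": AGE_BINS[3:5],
    "Age 25-44": AGE_BINS[5:7],
    "Age 45-64": AGE_BINS[7:10],
    "Age >= 65": AGE_BINS[10:],
}
# columns that are only renamed
OTHER = {
    "Estimate!!RACE!!Total population!!One race!!White": "white",
    PREFIX + "Female": "female",
    PREFIX + "Male": "male",
    PREFIX + "Median age (years)": "median_age",
    "Estimate!!RACE!!Total population": "total population",
    "Estimate!!Total housing units": "households",
}

def clean_dp05(raw, year):
    """Build the cleaned county frame from a raw DP05 frame (hypothetical helper)."""
    out = pd.DataFrame({"GEOID": raw["Geography"].str[-5:], "year": year})
    for raw_name, new_name in OTHER.items():
        out[new_name] = raw[raw_name]
    for group, bins in AGE_GROUPS.items():
        # skipna=False propagates missing values, like the `+` used in the cells above
        out[group] = raw[[PREFIX + b for b in bins]].sum(axis=1, skipna=False)
    return out

# toy input with made-up values: every raw age bin holds 10 people
toy = {"Geography": ["0500000US00001"]}
toy.update({PREFIX + b: [10] for b in AGE_BINS})
toy.update({name: [110] for name in OTHER})
cleaned = clean_dp05(pd.DataFrame(toy), 2017)
print(cleaned.T)
```

With the real files, the call would look something like `clean_dp05(pd.read_csv(dpath + "ACSDP5Y2017.DP05-Data.csv", skiprows=1, low_memory=False), 2017)`. The 2016 file uses different labels, so it would need its own mapping.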

#Data Preparation

In [89]:
# There was a mistake in the data collection phase that I could not resolve earlier:
# the dataset acs5_10 reports the age categories as percentages rather than counts.
# This cell converts those percentages to counts using the total population.

temp = acs5_10.copy()
temp['Age < 15'] = temp['Age < 15'] * 0.01 * temp['total_population']
temp['Age 15-24'] = temp['Age 15-24'] * 0.01 * temp['total_population']
temp['Age 25-44'] = temp['Age 25-44'] * 0.01 * temp['total_population']
temp['Age 45-64'] = temp['Age 45-64'] * 0.01 * temp['total_population']
temp['Age >= 65'] = temp['Age >= 65'] * 0.01 * temp['total_population']

acs5_10 = temp.copy()
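
The same percentage-to-count conversion can be written once for all age columns. A minimal sketch on a toy frame with made-up percentages (the names `age_cols` and `toy_10` are illustrative; the real `acs5_10` is not reproduced here):

```python
import pandas as pd

age_cols = ["Age < 15", "Age 15-24", "Age 25-44", "Age 45-64", "Age >= 65"]

# toy frame with age groups stored as percentages of the total (made-up values)
toy_10 = pd.DataFrame({
    "total_population": [1000, 200],
    "Age < 15": [20.0, 25.0],
    "Age 15-24": [10.0, 15.0],
    "Age 25-44": [30.0, 20.0],
    "Age 45-64": [25.0, 30.0],
    "Age >= 65": [15.0, 10.0],
})

# percentage -> count: pct * total population / 100, applied row by row
counts = toy_10[age_cols].mul(toy_10["total_population"], axis=0) / 100
toy_10[age_cols] = counts

print(toy_10)
```

Because published percentages are typically rounded, counts recovered this way on real data are generally not integers and need not sum exactly to the total population.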