2016 Census Data for Selected Variables - Baltimore City

In [1]:
#pip install us

In [2]:
# pip install censusgeocode
# pip install censusdata

In [3]:
#conda install -c conda-forge cenpy

In [4]:
#conda update -n base -c defaults conda


In [5]:
# From https://cenpy-devs.github.io/cenpy/:
# Cenpy (pronounced sen-pie) is a package that automatically discovers US Census Bureau API endpoints and exposes them to Python in a consistent fashion. 
# It also provides easy-to-use access to certain well-used data products, like the American Community Survey (ACS) and 2010 Decennial Census.
#pip install cenpy

In [6]:
# From https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data.html:
# "Due to the impact of the COVID-19 pandemic, the Census Bureau changed the 2020 ACS release. 
# Instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data. 
# Data users should not compare 2020 ACS 1-year experimental estimates with any other data."

In [7]:
# Dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
from census import Census
from us import states
import censusdata 
import censusgeocode as cg
import cenpy
import gmaps
import time
from scipy.stats import linregress

# Census & gmaps API Keys
from config import (api_key, gkey)
c = Census(api_key, year=2016)

# Configure gmaps
#gmaps.configure(api_key=gkey)

  warn("geopandas not available. Some functionality will be disabled.")


In [8]:
# Set this to False if you're working without an internet connection;
# data that would have been fetched from an API query will then be read from cached files instead
INTERNET_IS_WORKING = True

if INTERNET_IS_WORKING:
    con = cenpy.remote.APIConnection('ACSDT5Y2016')
    variables = con.variables
else:
    variables = pd.read_csv('data/ACSDT5Y2016_variables.csv',index_col='Unnamed: 0')

# No matter which product you use, a cenpy APIConnection can show you the variables 
# which it can retrieve for you. They come back as a pandas DataFrame.
print(f"ACSDT5Y2016 provides {len(variables)} variables.") # how many are there?
variables.head()

ACSDT5Y2016 provides 22858 variables.


Unnamed: 0,label,concept,predicateType,group,limit,predicateOnly,hasGeoCollectionSupport,attributes,required,values
for,Census API FIPS 'for' clause,Census API Geography Specification,fips-for,,0,True,,,,
in,Census API FIPS 'in' clause,Census API Geography Specification,fips-in,,0,True,,,,
ucgid,Uniform Census Geography Identifier clause,Census API Geography Specification,ucgid,,0,True,True,,,
B99104_007E,Estimate!!Total!!Not living with own grandchil...,ALLOCATION OF LENGTH OF TIME GRANDPARENT RESPO...,int,B99104,0,,,B99104_007EA,,
B24022_060E,Estimate!!Total!!Female!!Service occupations!!...,SEX BY OCCUPATION AND MEDIAN EARNINGS IN THE P...,int,B24022,0,,,"B24022_060EA,B24022_060M,B24022_060MA",,


In [9]:
# Comments and code in this block are from https://github.com/censusreporter/nicar20-advanced-census-python/blob/master/workshop.ipynb
# (cited as "Nicar20" from here on).
# Values in the 'group' column are ACS table IDs; for this data, 'N/A' marks other kinds of API variables
# (such as the geography clauses 'for', 'in', and 'ucgid'), so those rows are excluded.
# When the variables are read from the cached CSV, pandas parses 'N/A' as NaN, so missing groups are dropped too.
short_vars = variables[variables['group'].notna() & (variables['group'] != 'N/A')]

# Get a list of all of the table IDs and their titles
short_vars[['group', 'concept']].drop_duplicates().sort_values('group').head(10) 

Unnamed: 0,group,concept
B00001_001E,B00001,UNWEIGHTED SAMPLE COUNT OF THE POPULATION
B00002_001E,B00002,UNWEIGHTED SAMPLE HOUSING UNITS
B01001_012E,B01001,SEX BY AGE
B01001A_002E,B01001A,SEX BY AGE (WHITE ALONE)
B01001B_029E,B01001B,SEX BY AGE (BLACK OR AFRICAN AMERICAN ALONE)
B01001C_008E,B01001C,SEX BY AGE (AMERICAN INDIAN AND ALASKA NATIVE ...
B01001D_008E,B01001D,SEX BY AGE (ASIAN ALONE)
B01001E_013E,B01001E,SEX BY AGE (NATIVE HAWAIIAN AND OTHER PACIFIC ...
B01001F_001E,B01001F,SEX BY AGE (SOME OTHER RACE ALONE)
B01001G_022E,B01001G,SEX BY AGE (TWO OR MORE RACES)


In [10]:
# (From Nicar20)
# Use this when you know which group (table) you need but still need the specific API variable codes.
# The "attributes" column shows related variables you can request. The one that ends with M is the margin of error;
# since we want to be responsible when we aggregate data, we'll be sure to aggregate the error as well.
# The other two, which end with A, are "annotations."
short_vars[short_vars['group'] == 'B01001'][['label','attributes']].sort_index() 

Unnamed: 0,label,attributes
B01001_001E,Estimate!!Total,"B01001_001EA,B01001_001M,B01001_001MA"
B01001_002E,Estimate!!Total!!Male,"B01001_002EA,B01001_002M,B01001_002MA"
B01001_003E,Estimate!!Total!!Male!!Under 5 years,"B01001_003EA,B01001_003M,B01001_003MA"
B01001_004E,Estimate!!Total!!Male!!5 to 9 years,"B01001_004EA,B01001_004M,B01001_004MA"
B01001_005E,Estimate!!Total!!Male!!10 to 14 years,"B01001_005EA,B01001_005M,B01001_005MA"
B01001_006E,Estimate!!Total!!Male!!15 to 17 years,"B01001_006EA,B01001_006M,B01001_006MA"
B01001_007E,Estimate!!Total!!Male!!18 and 19 years,"B01001_007EA,B01001_007M,B01001_007MA"
B01001_008E,Estimate!!Total!!Male!!20 years,"B01001_008EA,B01001_008M,B01001_008MA"
B01001_009E,Estimate!!Total!!Male!!21 years,"B01001_009EA,B01001_009M,B01001_009MA"
B01001_010E,Estimate!!Total!!Male!!22 to 24 years,"B01001_010EA,B01001_010M,B01001_010MA"
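The "attributes" strings above can be split to find each estimate's margin-of-error variable. A minimal sketch, assuming the comma-separated format shown in this output; the helper name `moe_variable` is hypothetical and not part of cenpy:

```python
from typing import Optional


def moe_variable(attributes: str) -> Optional[str]:
    """Return the margin-of-error code (the entry ending in 'M') from an attributes string.

    Assumes the format shown above, e.g. "B01001_001EA,B01001_001M,B01001_001MA";
    entries ending in 'A' are annotations and are skipped.
    """
    for code in attributes.split(","):
        code = code.strip()
        if code.endswith("M"):
            return code
    return None


print(moe_variable("B01001_001EA,B01001_001M,B01001_001MA"))  # B01001_001M
```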


In [11]:
# censusdata search function (searches variable metadata for a keyword)
#sample = censusdata.search('acs5', 2016, 'concept', 'age ')
#print(sample)

# To see a list of all tables in the ACS 5-year estimates:
# c.acs5.tables()
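If the cenpy `variables` DataFrame is already loaded (live or from the cached CSV), a keyword search can also be done directly in pandas, similar in spirit to `censusdata.search`. A minimal sketch, assuming the cenpy layout (variable codes in the index, a `concept` column); the helper `search_concept` and the two-row sample are illustrative only:

```python
import pandas as pd


def search_concept(variables: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return rows whose 'concept' contains keyword (case-insensitive, literal match)."""
    mask = variables["concept"].str.contains(keyword, case=False, na=False, regex=False)
    return variables.loc[mask, ["concept", "group"]]


# Illustrative sample built from rows that appear in the outputs above
sample = pd.DataFrame(
    {
        "concept": ["SEX BY AGE", "UNWEIGHTED SAMPLE HOUSING UNITS"],
        "group": ["B01001", "B00002"],
    },
    index=["B01001_001E", "B00002_001E"],
)
print(search_concept(sample, "age"))
```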

In [12]:
censusdata.printtable(censusdata.censustable('acs5', 2016, 'B01001'))

Variable     | Table                          | Label                                                    | Type 
-------------------------------------------------------------------------------------------------------------------
B01001_001E  | SEX BY AGE                     | !! Estimate Total                                        | int  
B01001_002E  | SEX BY AGE                     | !! !! Estimate Total Male                                | int  
B01001_003E  | SEX BY AGE                     | !! !! !! Estimate Total Male Under 5 years               | int  
B01001_004E  | SEX BY AGE                     | !! !! !! Estimate Total Male 5 to 9 years                | int  
B01001_005E  | SEX BY AGE                     | !! !! !! Estimate Total Male 10 to 14 years              | int  
B01001_006E  | SEX BY AGE                     | !! !! !! Estimate Total Male 15 to 17 years              | int  
B01001_007E  | SEX BY AGE                     | !! !! !! Estimate Total Male 18 and 19 years 

In [13]:
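# Run a Census API query to retrieve data on Baltimore City, MD (all census tracts in Baltimore City)
# ***See https://api.census.gov/data/2019/acs/acs5/groups.html for a list of variables and groups in the ACS 5-year estimates***
# (variable codes can change between ACS years, so the 2016 vintage's list is the authoritative one for this query)
# e.g., "B23025_005E" is the unemployment count
# The state FIPS code for MD is 24 and the county FIPS code for Baltimore City is 510; tract="*" pulls data for all census tracts in county 510
census_data = c.acs5.state_county_tract(("NAME",
                                         "B19013_001E",  # median household income
                                         "B01003_001E",  # total population
                                         "B01002_001E",  # median age
                                         "B19301_001E",  # per capita income
                                         "B17001_002E",  # population below the poverty level
                                         "B23025_005E",  # unemployed (civilian labor force)
                                         "B23025_004E",  # employed (civilian labor force)
                                         "B15003_017E",  # age 25+, highest attainment: regular high school diploma
                                         "B15003_022E",  # age 25+, highest attainment: bachelor's degree
                                         "B02001_002E",  # white alone
                                         "B02001_003E",  # Black or African American alone
                                         "B02001_005E",  # Asian alone
                                         "B02001_008E",  # two or more races
                                         "B03001_003E",  # Hispanic or Latino origin
                                         "B25008_002E",  # population in owner-occupied housing units
                                         "B25003_002E",  # owner-occupied housing units
                                         "B25003_003E"), # renter-occupied housing units
                                        state_fips="24",
                                        county_fips="510",
                                        tract="*")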
census_pd = pd.DataFrame(census_data)
census_pd.head()      

Unnamed: 0,NAME,B19013_001E,B01003_001E,B01002_001E,B19301_001E,B17001_002E,B23025_005E,B23025_004E,B15003_017E,B15003_022E,...,B02001_003E,B02001_005E,B02001_008E,B03001_003E,B25008_002E,B25003_002E,B25003_003E,state,county,tract
0,"Census Tract 2008, Baltimore city, Maryland",34886.0,2164.0,39.9,18717.0,563.0,132.0,838.0,483.0,119.0,...,1882.0,0.0,33.0,41.0,1210.0,459.0,400.0,24,510,200800
1,"Census Tract 2402, Baltimore city, Maryland",128839.0,3233.0,36.1,86743.0,99.0,43.0,2340.0,256.0,1086.0,...,274.0,132.0,152.0,133.0,2228.0,1059.0,573.0,24,510,240200
2,"Census Tract 2502.04, Baltimore city, Maryland",14163.0,4083.0,21.5,9208.0,2667.0,240.0,874.0,671.0,95.0,...,3997.0,0.0,10.0,116.0,154.0,81.0,1396.0,24,510,250204
3,"Census Tract 2502.06, Baltimore city, Maryland",38158.0,1827.0,45.1,27793.0,377.0,237.0,882.0,469.0,99.0,...,304.0,25.0,27.0,160.0,1534.0,675.0,147.0,24,510,250206
4,"Census Tract 2505, Baltimore city, Maryland",38967.0,5462.0,31.1,16999.0,1736.0,445.0,2351.0,1264.0,77.0,...,1846.0,0.0,265.0,583.0,1684.0,555.0,1394.0,24,510,250500
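For reference, `c.acs5.state_county_tract` sends a request to the Census Bureau's data API. A rough sketch of an equivalent request URL, assuming the public endpoint `https://api.census.gov/data/2016/acs/acs5` with its `get`, `for`, `in`, and `key` parameters; the helper `acs5_tract_url` is hypothetical and nothing is sent over the network:

```python
from urllib.parse import urlencode


def acs5_tract_url(variables, state_fips, county_fips, year=2016, api_key=None):
    """Build (but do not send) a Census API URL requesting every tract in one county."""
    params = {
        "get": ",".join(variables),                       # comma-separated variable codes
        "for": "tract:*",                                 # all tracts...
        "in": f"state:{state_fips} county:{county_fips}",  # ...within this state and county
    }
    if api_key:
        params["key"] = api_key
    return f"https://api.census.gov/data/{year}/acs/acs5?{urlencode(params)}"


print(acs5_tract_url(["NAME", "B19013_001E"], "24", "510"))
```

Fetching such a URL (for example with `requests.get(url).json()`) should return a list of rows whose first row holds the column names; the census package turns those rows into the list of dictionaries used to build `census_pd`.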


In [14]:
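# Pull the B01001 (sex by age) age-bracket variables for every tract and save them in the variable "age_data":
# B01001_003E through B01001_025E are the male brackets and B01001_027E through B01001_049E the female brackets
# (the male and female subtotals, B01001_002E and B01001_026E, are skipped)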

age_data = c.acs5.state_county_tract(("NAME", "B01001_003E",                                    
                          "B01001_004E",
                          "B01001_005E",
                          "B01001_006E",
                          "B01001_007E",
                          "B01001_008E",
                          "B01001_009E",
                          "B01001_010E",
                          "B01001_011E",
                          "B01001_012E",
                          "B01001_013E",
                          "B01001_014E",
                          "B01001_015E",
                          "B01001_016E",            
                          "B01001_017E",
                          "B01001_018E",
                          "B01001_019E",
                          "B01001_020E",
                          "B01001_021E",
                          "B01001_022E",
                          "B01001_023E",
                          "B01001_024E",
                          "B01001_025E",
                          
                          "B01001_027E",
                          "B01001_028E",
                          "B01001_029E",
                          "B01001_030E",
                          "B01001_031E",
                          "B01001_032E",
                          "B01001_033E",
                          "B01001_034E",
                          "B01001_035E",
                          "B01001_036E",
                          "B01001_037E",
                          "B01001_038E",
                          "B01001_039E",            
                          "B01001_040E",
                          "B01001_041E",
                          "B01001_042E",
                          "B01001_043E",
                          "B01001_044E",
                          "B01001_045E",
                          "B01001_046E",
                          "B01001_047E",
                          "B01001_048E",
                          "B01001_049E"),          
                                     
                          state_fips = "24",
                          county_fips = "510",
                          tract = "*")
age_pd = pd.DataFrame(age_data)
age_pd.head()      

Unnamed: 0,NAME,B01001_003E,B01001_004E,B01001_005E,B01001_006E,B01001_007E,B01001_008E,B01001_009E,B01001_010E,B01001_011E,...,B01001_043E,B01001_044E,B01001_045E,B01001_046E,B01001_047E,B01001_048E,B01001_049E,state,county,tract
0,"Census Tract 2008, Baltimore city, Maryland",30.0,50.0,60.0,31.0,17.0,28.0,4.0,0.0,85.0,...,58.0,18.0,35.0,37.0,21.0,10.0,0.0,24,510,200800
1,"Census Tract 2402, Baltimore city, Maryland",57.0,27.0,9.0,0.0,0.0,0.0,6.0,83.0,197.0,...,69.0,35.0,27.0,55.0,40.0,22.0,33.0,24,510,240200
2,"Census Tract 2502.04, Baltimore city, Maryland",207.0,476.0,355.0,99.0,49.0,0.0,0.0,47.0,71.0,...,61.0,8.0,24.0,25.0,36.0,27.0,24.0,24,510,250204
3,"Census Tract 2502.06, Baltimore city, Maryland",78.0,10.0,43.0,6.0,0.0,0.0,0.0,14.0,77.0,...,50.0,7.0,27.0,15.0,23.0,0.0,5.0,24,510,250206
4,"Census Tract 2505, Baltimore city, Maryland",329.0,257.0,125.0,104.0,7.0,0.0,30.0,189.0,262.0,...,86.0,41.0,84.0,66.0,14.0,27.0,15.0,24,510,250500
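The long lists of B01001 codes in this and the next three cells follow a regular numbering pattern, so they could also be generated. A sketch using a hypothetical helper, `b01001_codes`, which reproduces the same lists as the explicit ones:

```python
def b01001_codes(start: int, stop: int) -> list:
    """Return B01001 estimate codes for line numbers start..stop (inclusive), e.g. 'B01001_003E'."""
    return [f"B01001_{n:03d}E" for n in range(start, stop + 1)]


# Male brackets are lines 3-25 and female brackets lines 27-49;
# the subtotals (line 2 = male, line 26 = female) are skipped, as in the query above.
age_vars = b01001_codes(3, 25) + b01001_codes(27, 49)
age_fields = ("NAME", *age_vars)  # tuple that could be passed to c.acs5.state_county_tract

# Groupings used in the next cells
columns_under18 = b01001_codes(3, 6) + b01001_codes(27, 30)        # under 18
columns_working_age = b01001_codes(7, 19) + b01001_codes(31, 43)   # 18 to 64
columns_senior = b01001_codes(20, 25) + b01001_codes(44, 49)       # 65 and over

print(len(age_vars))  # 46
```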


In [15]:
# Sum columns of age groups that are under 18 years old for male and female and add new column "Pop. <18 years"
columns_under18 = ["B01001_003E",                                    
                   "B01001_004E",
                   "B01001_005E",
                   "B01001_006E",
                   "B01001_027E",
                   "B01001_028E",
                   "B01001_029E",
                   "B01001_030E"]
age_pd['Pop. <18 years']= age_pd[columns_under18].sum(axis=1)
age_pd.head()

Unnamed: 0,NAME,B01001_003E,B01001_004E,B01001_005E,B01001_006E,B01001_007E,B01001_008E,B01001_009E,B01001_010E,B01001_011E,...,B01001_044E,B01001_045E,B01001_046E,B01001_047E,B01001_048E,B01001_049E,state,county,tract,Pop. <18 years
0,"Census Tract 2008, Baltimore city, Maryland",30.0,50.0,60.0,31.0,17.0,28.0,4.0,0.0,85.0,...,18.0,35.0,37.0,21.0,10.0,0.0,24,510,200800,539.0
1,"Census Tract 2402, Baltimore city, Maryland",57.0,27.0,9.0,0.0,0.0,0.0,6.0,83.0,197.0,...,35.0,27.0,55.0,40.0,22.0,33.0,24,510,240200,275.0
2,"Census Tract 2502.04, Baltimore city, Maryland",207.0,476.0,355.0,99.0,49.0,0.0,0.0,47.0,71.0,...,8.0,24.0,25.0,36.0,27.0,24.0,24,510,250204,1896.0
3,"Census Tract 2502.06, Baltimore city, Maryland",78.0,10.0,43.0,6.0,0.0,0.0,0.0,14.0,77.0,...,7.0,27.0,15.0,23.0,0.0,5.0,24,510,250206,259.0
4,"Census Tract 2505, Baltimore city, Maryland",329.0,257.0,125.0,104.0,7.0,0.0,30.0,189.0,262.0,...,41.0,84.0,66.0,14.0,27.0,15.0,24,510,250500,1454.0
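The new "Pop. <18 years" column sums estimates only. Following the Nicar20 note about aggregating margins of error, the Census Bureau's standard approximation for the MOE of a sum is the square root of the sum of the squared component MOEs. A minimal sketch, assuming the matching `_M` columns (e.g. `B01001_003M`) had also been requested, which the query above does not do; the Bureau also has extra guidance for zero-valued estimates that this sketch does not apply:

```python
import numpy as np
import pandas as pd


def moe_of_sum(moe_frame: pd.DataFrame) -> pd.Series:
    """Approximate MOE of a row-wise sum: sqrt of the sum of squared component MOEs."""
    return np.sqrt((moe_frame ** 2).sum(axis=1))


# MOE column names can be derived from the estimate codes by swapping the trailing E for M
under18_moe_cols = [code[:-1] + "M" for code in ["B01001_003E", "B01001_027E"]]

# Hypothetical MOE values for two tracts (not real ACS figures)
example_moes = pd.DataFrame({"B01001_003M": [3.0, 6.0], "B01001_027M": [4.0, 8.0]})
print(moe_of_sum(example_moes).tolist())  # [5.0, 10.0]
```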


In [16]:
# Sum columns of age groups that are 18-64 years old (working age) for male and female and add new column "Pop. working age"
columns_working_age = [                                    
                   "B01001_007E",
                   "B01001_008E",
                   "B01001_009E",
                   "B01001_010E",
                   "B01001_011E",
                   "B01001_012E",   
                   "B01001_013E",
                   "B01001_014E",    
                   "B01001_015E",    
                   "B01001_016E",   
                   "B01001_017E", 
                   "B01001_018E",    
                   "B01001_019E",    
                   "B01001_031E",
                   "B01001_032E",
                   "B01001_033E",
                   "B01001_034E",
                   "B01001_035E",   
                   "B01001_036E",
                   "B01001_037E",    
                   "B01001_038E",    
                   "B01001_039E",   
                   "B01001_040E", 
                   "B01001_041E",    
                   "B01001_042E",    
                   "B01001_043E"]    
                           
age_pd['Pop. working age']= age_pd[columns_working_age].sum(axis=1)
age_pd.head()         
            
               


Unnamed: 0,NAME,B01001_003E,B01001_004E,B01001_005E,B01001_006E,B01001_007E,B01001_008E,B01001_009E,B01001_010E,B01001_011E,...,B01001_045E,B01001_046E,B01001_047E,B01001_048E,B01001_049E,state,county,tract,Pop. <18 years,Pop. working age
0,"Census Tract 2008, Baltimore city, Maryland",30.0,50.0,60.0,31.0,17.0,28.0,4.0,0.0,85.0,...,35.0,37.0,21.0,10.0,0.0,24,510,200800,539.0,1371.0
1,"Census Tract 2402, Baltimore city, Maryland",57.0,27.0,9.0,0.0,0.0,0.0,6.0,83.0,197.0,...,27.0,55.0,40.0,22.0,33.0,24,510,240200,275.0,2568.0
2,"Census Tract 2502.04, Baltimore city, Maryland",207.0,476.0,355.0,99.0,49.0,0.0,0.0,47.0,71.0,...,24.0,25.0,36.0,27.0,24.0,24,510,250204,1896.0,1913.0
3,"Census Tract 2502.06, Baltimore city, Maryland",78.0,10.0,43.0,6.0,0.0,0.0,0.0,14.0,77.0,...,27.0,15.0,23.0,0.0,5.0,24,510,250206,259.0,1336.0
4,"Census Tract 2505, Baltimore city, Maryland",329.0,257.0,125.0,104.0,7.0,0.0,30.0,189.0,262.0,...,84.0,66.0,14.0,27.0,15.0,24,510,250500,1454.0,3611.0


In [17]:
# Sum columns of age groups that are 65+ years old for male and female and add new column "Pop. 65+ years"
columns_senior = ["B01001_020E",
                  "B01001_021E",
                  "B01001_022E",
                  "B01001_023E",                                    
                  "B01001_024E",
                  "B01001_025E",
                  "B01001_044E",
                  "B01001_045E",
                  "B01001_046E",
                  "B01001_047E",                                    
                  "B01001_048E",
                  "B01001_049E"]               
                          
age_pd['Pop. 65+ years']= age_pd[columns_senior].sum(axis=1)
age_pd.head()                           

Unnamed: 0,NAME,B01001_003E,B01001_004E,B01001_005E,B01001_006E,B01001_007E,B01001_008E,B01001_009E,B01001_010E,B01001_011E,...,B01001_046E,B01001_047E,B01001_048E,B01001_049E,state,county,tract,Pop. <18 years,Pop. working age,Pop. 65+ years
0,"Census Tract 2008, Baltimore city, Maryland",30.0,50.0,60.0,31.0,17.0,28.0,4.0,0.0,85.0,...,37.0,21.0,10.0,0.0,24,510,200800,539.0,1371.0,254.0
1,"Census Tract 2402, Baltimore city, Maryland",57.0,27.0,9.0,0.0,0.0,0.0,6.0,83.0,197.0,...,55.0,40.0,22.0,33.0,24,510,240200,275.0,2568.0,390.0
2,"Census Tract 2502.04, Baltimore city, Maryland",207.0,476.0,355.0,99.0,49.0,0.0,0.0,47.0,71.0,...,25.0,36.0,27.0,24.0,24,510,250204,1896.0,1913.0,274.0
3,"Census Tract 2502.06, Baltimore city, Maryland",78.0,10.0,43.0,6.0,0.0,0.0,0.0,14.0,77.0,...,15.0,23.0,0.0,5.0,24,510,250206,259.0,1336.0,232.0
4,"Census Tract 2505, Baltimore city, Maryland",329.0,257.0,125.0,104.0,7.0,0.0,30.0,189.0,262.0,...,66.0,14.0,27.0,15.0,24,510,250500,1454.0,3611.0,397.0
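Because the three age groups together cover every B01001 age bracket, they should add up to each tract's total population (B01003_001E, pulled into `census_pd` earlier). A quick consistency check, sketched here with the first five tracts' values copied from the outputs above:

```python
import pandas as pd

# Values copied from the age_pd and census_pd outputs above (first five tracts)
check = pd.DataFrame(
    {
        "tract": ["200800", "240200", "250204", "250206", "250500"],
        "Pop. <18 years": [539.0, 275.0, 1896.0, 259.0, 1454.0],
        "Pop. working age": [1371.0, 2568.0, 1913.0, 1336.0, 3611.0],
        "Pop. 65+ years": [254.0, 390.0, 274.0, 232.0, 397.0],
        "B01003_001E": [2164.0, 3233.0, 4083.0, 1827.0, 5462.0],
    }
)

age_total = check[["Pop. <18 years", "Pop. working age", "Pop. 65+ years"]].sum(axis=1)
mismatches = check.loc[age_total != check["B01003_001E"], "tract"]
print(f"{len(mismatches)} tract(s) where the age groups do not sum to the total population")
```

On the full data, the same comparison can be run once the age and census frames are aligned by tract.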


In [18]:
age_final = age_pd[[ "tract", "Pop. <18 years", "Pop. working age", "Pop. 65+ years"]]
age_final.head()

Unnamed: 0,tract,Pop. <18 years,Pop. working age,Pop. 65+ years
0,200800,539.0,1371.0,254.0
1,240200,275.0,2568.0,390.0
2,250204,1896.0,1913.0,274.0
3,250206,259.0,1336.0,232.0
4,250500,1454.0,3611.0,397.0


In [19]:
age_final = age_final.rename(columns={"tract": "Census_tract"})
age_final

Unnamed: 0,Census_tract,Pop. <18 years,Pop. working age,Pop. 65+ years
0,200800,539.0,1371.0,254.0
1,240200,275.0,2568.0,390.0
2,250204,1896.0,1913.0,274.0
3,250206,259.0,1336.0,232.0
4,250500,1454.0,3611.0,397.0
...,...,...,...,...
195,030200,341.0,1687.0,219.0
196,070100,1043.0,1930.0,207.0
197,080800,632.0,636.0,53.0
198,080102,590.0,1367.0,130.0


In [20]:
# Daytime population is not added here; ESRI Business Analyst will be used for it instead.
# See https://www.census.gov/topics/employment/commuting/guidance/calculations.html
# "commuter-adjusted daytime population estimates" =
#         total resident population + total workers working in area - total workers living in area

# For "Workers in Workplace Geography," see the same page (https://www.census.gov/topics/employment/commuting/guidance/calculations.html):
# Total workers working in area: B08604 Total Workers for Workplace Geography
# B08604 is only available for data years 2011 and after.
# The tables for workplace geography are only available for the following geographic summary levels: States;
# Counties; Places; County Subdivisions in selected states (not MD); Combined Statistical Areas; Metropolitan
# and Micropolitan Statistical Areas, and their associated Metropolitan Divisions and Principal Cities;
#
# Census tracts are not among these levels, so B08604 is pulled below for Baltimore City as a whole
# (Baltimore City is an independent city, which the Census Bureau treats as a county equivalent with FIPS code 510).

census_data_workers = c.acs5.state_county(("NAME", 
                          "B08604_001E"),               
                          state_fips = "24",
                          county_fips = "510") 

# convert to dataframe
workers_df = pd.DataFrame(census_data_workers)
workers_df
                         

Unnamed: 0,NAME,B08604_001E,state,county
0,"Baltimore city, Maryland",376343.0,24,510
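The commuter-adjusted daytime population formula quoted above is simple arithmetic once its three inputs are known. A minimal sketch with a hypothetical helper; only "workers working in area" (B08604_001E = 376343 for Baltimore City) is pulled here, so the example uses small illustrative numbers rather than real Baltimore figures:

```python
def commuter_adjusted_daytime_population(resident_population: float,
                                         workers_working_in_area: float,
                                         workers_living_in_area: float) -> float:
    """Residents + workers who work in the area - workers who live in the area."""
    return resident_population + workers_working_in_area - workers_living_in_area


# Illustrative numbers only: 1,000 residents, 300 people work here, 200 residents work anywhere
print(commuter_adjusted_daytime_population(1000, 300, 200))  # 1100

# With real data, workers_working_in_area would come from workers_df["B08604_001E"].iloc[0]
```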


In [21]:
# Create Geographic Identifier ("GEOID") for each census tract by adding state fips code + county fips code + census tract #
# see https://www.census.gov/programs-surveys/geography/guidance/geo-identifiers.html#:~:text=The%20full%20GEOID%20for%20many,codes%2C%20in%20which%20they%20nest.
census_pd["GEOID"] = census_pd['state'] + census_pd['county'] + census_pd['tract']
census_pd.head()

Unnamed: 0,NAME,B19013_001E,B01003_001E,B01002_001E,B19301_001E,B17001_002E,B23025_005E,B23025_004E,B15003_017E,B15003_022E,...,B02001_005E,B02001_008E,B03001_003E,B25008_002E,B25003_002E,B25003_003E,state,county,tract,GEOID
0,"Census Tract 2008, Baltimore city, Maryland",34886.0,2164.0,39.9,18717.0,563.0,132.0,838.0,483.0,119.0,...,0.0,33.0,41.0,1210.0,459.0,400.0,24,510,200800,24510200800
1,"Census Tract 2402, Baltimore city, Maryland",128839.0,3233.0,36.1,86743.0,99.0,43.0,2340.0,256.0,1086.0,...,132.0,152.0,133.0,2228.0,1059.0,573.0,24,510,240200,24510240200
2,"Census Tract 2502.04, Baltimore city, Maryland",14163.0,4083.0,21.5,9208.0,2667.0,240.0,874.0,671.0,95.0,...,0.0,10.0,116.0,154.0,81.0,1396.0,24,510,250204,24510250204
3,"Census Tract 2502.06, Baltimore city, Maryland",38158.0,1827.0,45.1,27793.0,377.0,237.0,882.0,469.0,99.0,...,25.0,27.0,160.0,1534.0,675.0,147.0,24,510,250206,24510250206
4,"Census Tract 2505, Baltimore city, Maryland",38967.0,5462.0,31.1,16999.0,1736.0,445.0,2351.0,1264.0,77.0,...,0.0,265.0,583.0,1684.0,555.0,1394.0,24,510,250500,24510250500
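A tract GEOID is the 2-digit state code, 3-digit county code, and 6-digit tract code concatenated (11 characters). Plain string concatenation works above because the census package returns these fields as zero-padded strings; if they were ever read back as integers (for example from a CSV), leading zeros such as the one in tract 030200 would be lost. A sketch of a padded version, using a hypothetical helper name:

```python
import pandas as pd


def tract_geoid(state: pd.Series, county: pd.Series, tract: pd.Series) -> pd.Series:
    """Concatenate zero-padded state (2), county (3), and tract (6) codes into an 11-character GEOID."""
    return (
        state.astype(str).str.zfill(2)
        + county.astype(str).str.zfill(3)
        + tract.astype(str).str.zfill(6)
    )


codes = pd.DataFrame({"state": [24, 24], "county": [510, 510], "tract": [200800, 30200]})
print(tract_geoid(codes["state"], codes["county"], codes["tract"]).tolist())
# ['24510200800', '24510030200']
```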


In [22]:
# number of rows = # of census tracts in the dataframe
print("Number of rows, columns: ", census_pd.shape)

Number of rows, columns:  (200, 22)


In [23]:
# remove extraneous column "tract"
census_pd = census_pd.drop(["tract"], axis=1)
census_pd.head()

Unnamed: 0,NAME,B19013_001E,B01003_001E,B01002_001E,B19301_001E,B17001_002E,B23025_005E,B23025_004E,B15003_017E,B15003_022E,...,B02001_003E,B02001_005E,B02001_008E,B03001_003E,B25008_002E,B25003_002E,B25003_003E,state,county,GEOID
0,"Census Tract 2008, Baltimore city, Maryland",34886.0,2164.0,39.9,18717.0,563.0,132.0,838.0,483.0,119.0,...,1882.0,0.0,33.0,41.0,1210.0,459.0,400.0,24,510,24510200800
1,"Census Tract 2402, Baltimore city, Maryland",128839.0,3233.0,36.1,86743.0,99.0,43.0,2340.0,256.0,1086.0,...,274.0,132.0,152.0,133.0,2228.0,1059.0,573.0,24,510,24510240200
2,"Census Tract 2502.04, Baltimore city, Maryland",14163.0,4083.0,21.5,9208.0,2667.0,240.0,874.0,671.0,95.0,...,3997.0,0.0,10.0,116.0,154.0,81.0,1396.0,24,510,24510250204
3,"Census Tract 2502.06, Baltimore city, Maryland",38158.0,1827.0,45.1,27793.0,377.0,237.0,882.0,469.0,99.0,...,304.0,25.0,27.0,160.0,1534.0,675.0,147.0,24,510,24510250206
4,"Census Tract 2505, Baltimore city, Maryland",38967.0,5462.0,31.1,16999.0,1736.0,445.0,2351.0,1264.0,77.0,...,1846.0,0.0,265.0,583.0,1684.0,555.0,1394.0,24,510,24510250500


In [24]:
# GIS Analyst Patrick provided a CSV (created in ArcGIS) that contains a key matching Baltimore's commercial corridors
# to specific GEOIDs

#Store filepath in a variable
corridor_key = "./CSVs/corr_key.csv"

# Read the file with the pandas library
corr_key_df = pd.read_csv(corridor_key)
corr_key_df.dtypes


GEOID        int64
Corridor    object
dtype: object

In [25]:
corr_key_df

Unnamed: 0,GEOID,Corridor
0,24510260403,Highlandtown
1,24510230200,Hamilton Lauraville
2,24510260102,Pimlico
3,24510260303,Hamilton Lauraville
4,24510260800,Highlandtown
...,...,...
195,24510220100,
196,24510230300,
197,24510250207,
198,24510250303,
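`corr_key_df["GEOID"]` is read as `int64`, while `census_pd["GEOID"]` was built by string concatenation, so the two keys need a common type before they can be matched (recent pandas versions raise a `ValueError` when merging an object column with an int64 column). A minimal sketch, assuming the key will later be merged on GEOID (that step is not shown in this section); the small frames reuse GEOIDs from the outputs above. Reading the file with `pd.read_csv(corridor_key, dtype={"GEOID": str})` would avoid the conversion altogether.

```python
import pandas as pd

# Corridor key as read from CSV: GEOID parsed as int64 (values from the output above)
corr_key_sample = pd.DataFrame(
    {"GEOID": [24510260403, 24510230200], "Corridor": ["Highlandtown", "Hamilton Lauraville"]}
)
# Census side: GEOID built by concatenating strings
census_sample = pd.DataFrame({"GEOID": ["24510260403", "24510200800"]})

# Convert the key to strings so both GEOID columns share a type
corr_key_sample["GEOID"] = corr_key_sample["GEOID"].astype(str)

# Left merge keeps every census tract; tracts outside a corridor get NaN
merged = census_sample.merge(corr_key_sample, on="GEOID", how="left")
print(merged)
```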


In [26]:
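# Combine age data with the rest of the census data.
# pd.concat(axis="columns") pairs rows by index, so this assumes both API responses list the
# tracts in the same order (as they do in the first rows shown above).
census_joined = pd.concat([age_final, census_pd], axis="columns")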


census_joined.shape

(200, 25)
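A key-based merge is a more defensive way to combine the two frames, since it does not depend on row order. A sketch, assuming `age_final` keeps its `Census_tract` column and the 6-digit tract code is recovered as the last six characters of `census_pd["GEOID"]`; `validate="one_to_one"` makes pandas raise if a tract appears more than once. The tiny frames reuse values from the outputs above and deliberately list the tracts in different orders:

```python
import pandas as pd

age_sample = pd.DataFrame(
    {"Census_tract": ["200800", "240200"], "Pop. <18 years": [539.0, 275.0]}
)
census_sample = pd.DataFrame(
    {"GEOID": ["24510240200", "24510200800"], "B01003_001E": [3233.0, 2164.0]}
)

# Recover the 6-digit tract code from the 11-character GEOID (state 2 + county 3 + tract 6)
census_sample["Census_tract"] = census_sample["GEOID"].str[-6:]

joined = age_sample.merge(census_sample, on="Census_tract", how="inner", validate="one_to_one")
print(joined)
```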

In [27]:
# Replace the census variable codes (such as "B19013_001E") in the dataframe with text so it's understandable
census_joined = census_joined.rename(columns={"B01003_001E": "Population",
                                      "tract": "Census Tract",        
                                      "B01002_001E": "Median age",
                                      "B19013_001E": "Median household income",
                                      "B19301_001E": "Per capita income", 
                                      "B17001_002E": "Poverty count",
                                      "B23025_004E": "# employed, age 16+",
                                      "B23025_005E": "Unemployment count",
                                      "B15003_017E": "# persons age 25+ graduated high school",
                                      "B15003_022E": "# persons age 25+ with Bachelor's degree",
                                      "B02001_002E": "Pop. white",
                                      "B02001_003E": "Pop. Black",
                                      "B02001_005E": "Pop. Asian",        
                                      "B02001_008E": "Pop. 2 or more races",
                                      "B03001_003E": "Pop. Hispanic origin",
                                      "B25008_002E": "Total pop. in occupied housing units by tenure",  # note: _002E is the owner-occupied population within table B25008
                                      "B25003_002E": "Total owner-occupied units",
                                      "B25003_003E": "Total renter-occupied units",
                                      "NAME": "Name", "state": "State", "GEOID": "GEOID"})

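# Add a new column for poverty rate (Poverty count / Population)
# Note: the universe of B17001 is the population for whom poverty status is determined (B17001_001E),
# so dividing by total population (B01003_001E) understates the rate, especially in tracts with many
# group-quarters residents such as college dorms
census_joined["Poverty rate"] = 100 * (
    census_joined["Poverty count"].astype(int) / census_joined["Population"].astype(int)
)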

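# Add a new column for unemployed persons as a share of total population (Unemployment count / Population).
# Note: this is not the official unemployment rate, which divides by the civilian labor force
# (employed + unemployed, i.e. B23025_004E + B23025_005E), so these values are lower than the official rate
census_joined["Unemployment rate"] = 100 * (
    census_joined["Unemployment count"].astype(int) / census_joined["Population"].astype(int)
)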
census_joined.head()

Unnamed: 0,Census_tract,Pop. <18 years,Pop. working age,Pop. 65+ years,Name,Median household income,Population,Median age,Per capita income,Poverty count,...,Pop. 2 or more races,Pop. Hispanic origin,Total pop. in occupied housing units by tenure,Total owner-occupied units,Total renter-occupied units,State,county,GEOID,Poverty rate,Unemployment rate
0,200800,539.0,1371.0,254.0,"Census Tract 2008, Baltimore city, Maryland",34886.0,2164.0,39.9,18717.0,563.0,...,33.0,41.0,1210.0,459.0,400.0,24,510,24510200800,26.016636,6.099815
1,240200,275.0,2568.0,390.0,"Census Tract 2402, Baltimore city, Maryland",128839.0,3233.0,36.1,86743.0,99.0,...,152.0,133.0,2228.0,1059.0,573.0,24,510,24510240200,3.062171,1.330034
2,250204,1896.0,1913.0,274.0,"Census Tract 2502.04, Baltimore city, Maryland",14163.0,4083.0,21.5,9208.0,2667.0,...,10.0,116.0,154.0,81.0,1396.0,24,510,24510250204,65.319618,5.878031
3,250206,259.0,1336.0,232.0,"Census Tract 2502.06, Baltimore city, Maryland",38158.0,1827.0,45.1,27793.0,377.0,...,27.0,160.0,1534.0,675.0,147.0,24,510,24510250206,20.634921,12.972085
4,250500,1454.0,3611.0,397.0,"Census Tract 2505, Baltimore city, Maryland",38967.0,5462.0,31.1,16999.0,1736.0,...,265.0,583.0,1684.0,555.0,1394.0,24,510,24510250500,31.78323,8.147199
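The "Unemployment rate" column above divides by total population. The conventional unemployment rate divides the unemployed by the civilian labor force, which in table B23025 is employed (B23025_004E) plus unemployed (B23025_005E); both are already pulled. A sketch using the renamed columns (the helper name is hypothetical), checked against tracts 2008 and 2402 from the output above:

```python
import pandas as pd


def labor_force_unemployment_rate(df: pd.DataFrame) -> pd.Series:
    """Unemployed / (employed + unemployed) * 100, rounded to one decimal place."""
    labor_force = df["# employed, age 16+"] + df["Unemployment count"]
    return (100 * df["Unemployment count"] / labor_force).round(1)


# Tract 2008: 132 unemployed, 838 employed; tract 2402: 43 unemployed, 2340 employed
tracts = pd.DataFrame({"# employed, age 16+": [838.0, 2340.0], "Unemployment count": [132.0, 43.0]})
print(labor_force_unemployment_rate(tracts).tolist())  # [13.6, 1.8]
```

On the full data this would be `labor_force_unemployment_rate(census_joined)`, which is noticeably higher than the population-based column (13.6 versus 6.1 for tract 2008).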


In [28]:
# number of rows = # of census tracts in the dataframe
print("Number of rows, columns: ", census_joined.shape)

Number of rows, columns:  (200, 27)


In [29]:
# Add in home ownership rate (# owner-occupied units / # of occupied housing units)
# sum 2 columns: total owner-occupied units + total renter-occupied units to create additional column "Total occupied units" 
sum_column = census_joined['Total owner-occupied units'] + census_joined['Total renter-occupied units']
census_joined["Total occupied units"] = sum_column

In [30]:
census_joined["Home ownership rate"] = 100 * (
    census_joined["Total owner-occupied units"].astype(int)
    / census_joined["Total occupied units"].astype(int)
)

census_joined.head()

Unnamed: 0,Census_tract,Pop. <18 years,Pop. working age,Pop. 65+ years,Name,Median household income,Population,Median age,Per capita income,Poverty count,...,Total pop. in occupied housing units by tenure,Total owner-occupied units,Total renter-occupied units,State,county,GEOID,Poverty rate,Unemployment rate,Total occupied units,Home ownership rate
0,200800,539.0,1371.0,254.0,"Census Tract 2008, Baltimore city, Maryland",34886.0,2164.0,39.9,18717.0,563.0,...,1210.0,459.0,400.0,24,510,24510200800,26.016636,6.099815,859.0,53.434226
1,240200,275.0,2568.0,390.0,"Census Tract 2402, Baltimore city, Maryland",128839.0,3233.0,36.1,86743.0,99.0,...,2228.0,1059.0,573.0,24,510,24510240200,3.062171,1.330034,1632.0,64.889706
2,250204,1896.0,1913.0,274.0,"Census Tract 2502.04, Baltimore city, Maryland",14163.0,4083.0,21.5,9208.0,2667.0,...,154.0,81.0,1396.0,24,510,24510250204,65.319618,5.878031,1477.0,5.484089
3,250206,259.0,1336.0,232.0,"Census Tract 2502.06, Baltimore city, Maryland",38158.0,1827.0,45.1,27793.0,377.0,...,1534.0,675.0,147.0,24,510,24510250206,20.634921,12.972085,822.0,82.116788
4,250500,1454.0,3611.0,397.0,"Census Tract 2505, Baltimore city, Maryland",38967.0,5462.0,31.1,16999.0,1736.0,...,1684.0,555.0,1394.0,24,510,24510250500,31.78323,8.147199,1949.0,28.476142
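If any tract has zero occupied housing units or zero population (for example, a largely non-residential tract), the divisions above produce `inf` or `NaN`. A hedged sketch of a helper (hypothetical name) that returns `NaN` explicitly for a zero denominator, checked against the first two tracts' owner-occupied and occupied unit counts shown above:

```python
import numpy as np
import pandas as pd


def safe_rate(numerator: pd.Series, denominator: pd.Series, decimals: int = 1) -> pd.Series:
    """Percentage numerator / denominator * 100, with NaN wherever the denominator is zero."""
    denom = denominator.astype(float).replace(0, np.nan)
    return (100 * numerator.astype(float) / denom).round(decimals)


owner = pd.Series([459, 1059, 0])       # owner-occupied units
occupied = pd.Series([859, 1632, 0])    # total occupied units (last row: illustrative zero)
print(safe_rate(owner, occupied).tolist())  # [53.4, 64.9, nan]
```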


In [31]:
# Round the home ownership rate to one decimal place; use "float" rather than "int" so the decimal places are kept
census_joined["Home ownership rate"] = census_joined["Home ownership rate"].astype(float).round(1)


In [32]:
census_joined["Poverty rate"] = census_joined["Poverty rate"].astype(float).round(1)


In [33]:
census_joined["Unemployment rate"] = census_joined["Unemployment rate"].astype(float).round(1)
census_joined.head()

Unnamed: 0,Census_tract,Pop. <18 years,Pop. working age,Pop. 65+ years,Name,Median household income,Population,Median age,Per capita income,Poverty count,...,Total pop. in occupied housing units by tenure,Total owner-occupied units,Total renter-occupied units,State,county,GEOID,Poverty rate,Unemployment rate,Total occupied units,Home ownership rate
0,200800,539.0,1371.0,254.0,"Census Tract 2008, Baltimore city, Maryland",34886.0,2164.0,39.9,18717.0,563.0,...,1210.0,459.0,400.0,24,510,24510200800,26.0,6.1,859.0,53.4
1,240200,275.0,2568.0,390.0,"Census Tract 2402, Baltimore city, Maryland",128839.0,3233.0,36.1,86743.0,99.0,...,2228.0,1059.0,573.0,24,510,24510240200,3.1,1.3,1632.0,64.9
2,250204,1896.0,1913.0,274.0,"Census Tract 2502.04, Baltimore city, Maryland",14163.0,4083.0,21.5,9208.0,2667.0,...,154.0,81.0,1396.0,24,510,24510250204,65.3,5.9,1477.0,5.5
3,250206,259.0,1336.0,232.0,"Census Tract 2502.06, Baltimore city, Maryland",38158.0,1827.0,45.1,27793.0,377.0,...,1534.0,675.0,147.0,24,510,24510250206,20.6,13.0,822.0,82.1
4,250500,1454.0,3611.0,397.0,"Census Tract 2505, Baltimore city, Maryland",38967.0,5462.0,31.1,16999.0,1736.0,...,1684.0,555.0,1394.0,24,510,24510250500,31.8,8.1,1949.0,28.5


In [34]:
# Calculate population density; see https://www.census.gov/quickfacts/fact/note/US/LND110210
# Density is expressed as population per square mile (or per square kilometer):
# divide the total population (or number of housing units) by the land area of the entity, measured in square miles
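A minimal sketch of the density calculation, assuming land area comes from the TIGER/Line `ALAND` field (in square meters), which is not pulled in this notebook; the helper names are hypothetical. One mile is exactly 1,609.344 meters, so one square mile is 2,589,988.110336 square meters:

```python
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2  # 2,589,988.110336 square meters
SQ_METERS_PER_SQ_KM = 1_000_000


def density_per_sq_mile(count: float, land_area_sq_m: float) -> float:
    """Population (or housing units) per square mile of land area."""
    return count / (land_area_sq_m / SQ_METERS_PER_SQ_MILE)


def density_per_sq_km(count: float, land_area_sq_m: float) -> float:
    """Population (or housing units) per square kilometer of land area."""
    return count / (land_area_sq_m / SQ_METERS_PER_SQ_KM)


# Hypothetical tract: 2,164 residents on exactly half a square mile of land
print(density_per_sq_mile(2164, SQ_METERS_PER_SQ_MILE / 2))  # 4328.0
```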

In [35]:
census_joined.count()

Census_tract                                      200
Pop. <18 years                                    200
Pop. working age                                  200
Pop. 65+ years                                    200
Name                                              200
Median household income                           200
Population                                        200
Median age                                        200
Per capita income                                 200
Poverty count                                     200
Unemployment count                                200
# employed, age 16+                               200
# persons age 25+ graduated high school           200
# persons age 25+ with Bachelor's degree          200
Pop. white                                        200
Pop. Black                                        200
Pop. Asian                                        200
Pop. 2 or more races                              200
Pop. Hispanic origin        

In [36]:
census_joined.dtypes

Census_tract                                       object
Pop. <18 years                                    float64
Pop. working age                                  float64
Pop. 65+ years                                    float64
Name                                               object
Median household income                           float64
Population                                        float64
Median age                                        float64
Per capita income                                 float64
Poverty count                                     float64
Unemployment count                                float64
# employed, age 16+                               float64
# persons age 25+ graduated high school           float64
# persons age 25+ with Bachelor's degree          float64
Pop. white                                        float64
Pop. Black                                        float64
Pop. Asian                                        float64
Pop. 2 or more

In [37]:
# Remove the "Name" column: every tract is in Baltimore city, Maryland, and "Census_tract" already identifies each tract
census_joined = census_joined.drop(["Name"], axis=1)

census_joined.dtypes

Census_tract                                       object
Pop. <18 years                                    float64
Pop. working age                                  float64
Pop. 65+ years                                    float64
Median household income                           float64
Population                                        float64
Median age                                        float64
Per capita income                                 float64
Poverty count                                     float64
Unemployment count                                float64
# employed, age 16+                               float64
# persons age 25+ graduated high school           float64
# persons age 25+ with Bachelor's degree          float64
Pop. white                                        float64
Pop. Black                                        float64
Pop. Asian                                        float64
Pop. 2 or more races                              float64
Pop. Hispanic 

In [38]:
# Split the "Name" column into 3 separate columns: "Census_Tract", "County", "State"
# (commented out: "Name" is dropped in the previous cell, so this would have to run before that drop)
#census_joined[["Census_Tract", "County", "State"]] = census_joined["Name"].str.split(",", n=2, expand=True)
#census_joined.head()
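A minimal sketch of the split on two sample "Name" values copied from the output above, run on a standalone Series because "Name" no longer exists at this point. Using `n=2` limits the split to two commas, so each name yields exactly three pieces; stripping the spaces that follow each comma is an addition not in the original cell.

```python
import pandas as pd

# Sample values copied from the "Name" column shown earlier
names = pd.Series([
    "Census Tract 2402, Baltimore city, Maryland",
    "Census Tract 2502.04, Baltimore city, Maryland",
])

# n=2 allows at most two splits, so the result has three columns
parts = names.str.split(",", n=2, expand=True)
parts.columns = ["Census_Tract", "County", "State"]

# Remove the "Census Tract" label, then trim the spaces left after each comma
parts["Census_Tract"] = parts["Census_Tract"].str.replace("Census Tract", "", regex=False)
parts = parts.apply(lambda col: col.str.strip())
print(parts)
```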

In [39]:
# List the columns in the census_joined dataframe
census_joined.columns

Index(['Census_tract', 'Pop. <18 years', 'Pop. working age', 'Pop. 65+ years',
       'Median household income', 'Population', 'Median age',
       'Per capita income', 'Poverty count', 'Unemployment count',
       '# employed, age 16+', '# persons age 25+ graduated high school',
       '# persons age 25+ with Bachelor's degree', 'Pop. white', 'Pop. Black',
       'Pop. Asian', 'Pop. 2 or more races', 'Pop. Hispanic origin',
       'Total pop. in occupied housing units by tenure',
       'Total owner-occupied units', 'Total renter-occupied units', 'State',
       'county', 'GEOID', 'Poverty rate', 'Unemployment rate',
       'Total occupied units', 'Home ownership rate'],
      dtype='object')

In [40]:
# Remove the text "Census Tract" from the values in the "Census_Tract" column created by the split above, then trim spaces (to make calculations easier)
#census_joined["Census_Tract"] = census_joined["Census_Tract"].str.replace("Census Tract", "", regex=False).str.strip()
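Removing the label leaves the display tract number (for example "2502.04"), which is not the same string as the 6-digit "Census_tract" code used elsewhere in this notebook ("250204"). The pairs in the earlier output suggest the code is the 4-digit basic tract number, zero-padded, followed by a 2-digit suffix that defaults to "00". The helper `tract_name_to_code` below is a hypothetical sketch of that conversion.

```python
def tract_name_to_code(name):
    """Convert a tract label such as "Census Tract 2502.04" to a 6-digit code such as "250204".

    Hypothetical helper: 4-digit basic number (zero-padded) plus a 2-digit suffix,
    where the suffix is "00" when the label has no decimal part.
    """
    number = name.replace("Census Tract", "").strip()
    basic, _, suffix = number.partition(".")
    return basic.zfill(4) + suffix.ljust(2, "0")


# Labels from the earlier output: 2402 <-> 240200, 2502.04 <-> 250204, 2505 <-> 250500
for label in ["Census Tract 2402", "Census Tract 2502.04", "Census Tract 2505"]:
    print(label, "->", tract_name_to_code(label))
```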


In [41]:
# Calculate the number of unique census tracts in the DataFrame
tract_count = len(census_joined["Census_tract"].unique())
tract_count
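Here `len(unique())` matches the 200 rows, so each tract appears once. As a sketch on toy values, `nunique()` gives the same count when nothing is missing (it skips NaN/None by default), and `duplicated(keep=False)` lists every row whose tract code repeats.

```python
import pandas as pd

# Toy tract codes with one duplicate and one missing value
tracts = pd.Series(["240200", "250204", "250206", "250204", None])

print(len(tracts.unique()))  # unique() keeps None as its own value
print(tracts.nunique())      # nunique() skips missing values by default
print(tracts[tracts.duplicated(keep=False)].tolist())  # every row whose code repeats
```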

200

In [42]:
# Make sure GEOID has the same data type in both dataframes to be merged, by using .astype
# (int64 is a 64-bit integer; an 11-digit GEOID such as 24510240200 is too large for a 32-bit integer)
census_joined["GEOID"] = census_joined["GEOID"].astype('int64')
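A short sketch of why both keys are cast to int64 before the merge. The two toy frames stand in for census_joined and corr_key_df; the corridor labels "A" and "B" are placeholders. A GEOID here is the 2-digit state code (24, Maryland), 3-digit county code (510, Baltimore city), and 6-digit tract code.

```python
import numpy as np
import pandas as pd

# An 11-digit GEOID does not fit in a 32-bit integer
geoid = 24510240200
print(geoid > np.iinfo(np.int32).max)   # int32 stops at 2,147,483,647
print(geoid <= np.iinfo(np.int64).max)

# Toy frames whose keys start as different types: text on the left, numbers on the right
left = pd.DataFrame({"GEOID": ["24510240200", "24510250204"], "Population": [3233.0, 4083.0]})
right = pd.DataFrame({"GEOID": [24510240200, 24510250204], "Corridor": ["A", "B"]})

# Cast both keys to int64 so equal GEOIDs match during the merge
left["GEOID"] = left["GEOID"].astype("int64")
right["GEOID"] = right["GEOID"].astype("int64")
merged = pd.merge(left, right, on="GEOID")
print(merged)
```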




In [43]:
census_joined.dtypes

Census_tract                                       object
Pop. <18 years                                    float64
Pop. working age                                  float64
Pop. 65+ years                                    float64
Median household income                           float64
Population                                        float64
Median age                                        float64
Per capita income                                 float64
Poverty count                                     float64
Unemployment count                                float64
# employed, age 16+                               float64
# persons age 25+ graduated high school           float64
# persons age 25+ with Bachelor's degree          float64
Pop. white                                        float64
Pop. Black                                        float64
Pop. Asian                                        float64
Pop. 2 or more races                              float64
Pop. Hispanic 

In [44]:
# Convert the corridor key's GEOID to int64 as well, so both merge keys have the same type
corr_key_df["GEOID"] = corr_key_df["GEOID"].astype('int64')

In [45]:
corr_key_df.dtypes

GEOID        int64
Corridor    object
dtype: object

In [46]:
# Merge the census_joined dataframe with the corr_key_df dataframe on the common column "GEOID"
# (an inner join by default, so only tracts listed in corr_key_df are kept)
corridors_df = pd.merge(
    census_joined, corr_key_df, on="GEOID")

# Remove any rows that contain NaN ("Not a Number", used for missing values) by using .dropna()
corridors_df = corridors_df.dropna()
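The merge above silently drops tracts that appear in only one dataframe, and `.dropna()` drops whole rows. A sketch on toy frames: an outer merge with `indicator=True` labels each row "both", "left_only", or "right_only", which shows which tracts the inner join would discard.

```python
import pandas as pd

# Toy frames: GEOID 3 exists only on the left, GEOID 4 only on the right
census = pd.DataFrame({"GEOID": [1, 2, 3], "Population": [100.0, None, 300.0]})
key = pd.DataFrame({"GEOID": [1, 2, 4], "Corridor": ["A", "B", "C"]})

# how="outer" with indicator=True adds a "_merge" column describing each row's source
check = pd.merge(census, key, on="GEOID", how="outer", indicator=True)
print(check["_merge"].value_counts())

# The default how="inner" keeps only keys present in both frames
inner = pd.merge(census, key, on="GEOID")

# dropna() then removes every row with at least one NaN value
cleaned = inner.dropna()
print(cleaned)
```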



In [47]:
# Reset the index so it runs from 0 again (drop=True discards the old index instead of adding it as a column)
corridors_df = corridors_df.reset_index(drop=True)

# remove the "county" column as it is not needed
corridors_df = corridors_df.drop(["county"], axis=1)
corridors_df

Unnamed: 0,Census_tract,Pop. <18 years,Pop. working age,Pop. 65+ years,Median household income,Population,Median age,Per capita income,Poverty count,Unemployment count,...,Total pop. in occupied housing units by tenure,Total owner-occupied units,Total renter-occupied units,State,GEOID,Poverty rate,Unemployment rate,Total occupied units,Home ownership rate,Corridor
0,200800,539.0,1371.0,254.0,34886.0,2164.0,39.9,18717.0,563.0,132.0,...,1210.0,459.0,400.0,24,24510200800,26.0,6.1,859.0,53.4,Irvington
1,260301,1284.0,2622.0,336.0,35759.0,4242.0,30.9,17469.0,1332.0,301.0,...,2387.0,961.0,613.0,24,24510260301,31.4,7.1,1574.0,61.1,Belair Rd
2,270600,933.0,3130.0,623.0,58000.0,4686.0,40.3,27892.0,621.0,247.0,...,3853.0,1433.0,377.0,24,24510270600,13.3,5.3,1810.0,79.2,Hamilton Lauraville
3,070200,852.0,1969.0,202.0,27308.0,3023.0,27.2,14309.0,1317.0,389.0,...,802.0,283.0,739.0,24,24510070200,43.6,12.9,1022.0,27.7,E Monument St
4,070300,249.0,470.0,74.0,22361.0,793.0,34.5,13975.0,425.0,50.0,...,327.0,118.0,158.0,24,24510070300,53.6,6.3,276.0,42.8,E Monument St
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120,130300,642.0,1283.0,234.0,31855.0,2159.0,35.3,17704.0,712.0,190.0,...,975.0,383.0,453.0,24,24510130300,33.0,8.8,836.0,45.8,Penn Ave
121,070100,1043.0,1930.0,207.0,39141.0,3180.0,27.1,13467.0,1141.0,300.0,...,720.0,279.0,523.0,24,24510070100,35.9,9.4,802.0,34.8,E Monument St
122,080800,632.0,636.0,53.0,-666666666.0,1321.0,21.3,12964.0,651.0,74.0,...,193.0,79.0,359.0,24,24510080800,49.3,5.6,438.0,18.0,E Monument St
123,080102,590.0,1367.0,130.0,35132.0,2087.0,33.1,14965.0,579.0,108.0,...,1015.0,330.0,362.0,24,24510080102,27.7,5.2,692.0,47.7,Belair Rd
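Row 122 above shows a median household income of -666666666.0. This is the value the Census API returns when an estimate could not be computed (too few sample observations), not a real income, and `.dropna()` does not catch it because it is not NaN. A minimal sketch that converts it to NaN; the list holds only the code seen in this output, and other negative annotation codes can be added if they appear.

```python
import numpy as np
import pandas as pd

# Census annotation value seen in the output above; extend with other codes found in your data
ACS_SENTINELS = [-666666666.0]


def replace_acs_sentinels(df, sentinels=ACS_SENTINELS):
    """Return a copy of df with Census annotation values replaced by NaN."""
    return df.replace(sentinels, np.nan)


example = pd.DataFrame({"Median household income": [34886.0, -666666666.0, 35132.0]})
cleaned = replace_acs_sentinels(example)
print(cleaned["Median household income"].isna().sum())
print(cleaned["Median household income"].median())
```

Running this after the `.dropna()` step keeps the tract's other values while excluding the bogus income from statistics such as the median; running it before `.dropna()` would drop the tract entirely.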


In [48]:
# Reorder the columns using double brackets (a list of column names); any column not listed,
# such as "State", "Total occupied units", and "Home ownership rate", is dropped
corridors_df = corridors_df[["Corridor", "Census_tract", "GEOID", "Population", "Median household income",
                             "Per capita income", "Poverty count", "Poverty rate", "Unemployment rate",
                             "# employed, age 16+", "Unemployment count",
                             "# persons age 25+ graduated high school", "# persons age 25+ with Bachelor's degree",
                             "Median age", "Pop. white", "Pop. Black", "Pop. 2 or more races", "Pop. Hispanic origin",
                             "Pop. Asian", "Total pop. in occupied housing units by tenure", "Total owner-occupied units",
                             "Total renter-occupied units", "Pop. <18 years", "Pop. working age", "Pop. 65+ years"
                             ]]
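Selecting with a list both reorders and filters columns, so it is easy to drop one by accident. A sketch of a hypothetical helper, `reorder_columns`, that returns the reordered frame together with the names of any columns left out, and raises a `KeyError` with a clear message if a requested name is misspelled.

```python
import pandas as pd


def reorder_columns(df, order):
    """Select columns in the given order and report existing columns that were left out."""
    missing = [col for col in order if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    dropped = [col for col in df.columns if col not in order]
    return df[order], dropped


# Toy frame with a few of this notebook's column names
toy = pd.DataFrame(columns=["Census_tract", "State", "GEOID", "Corridor", "Home ownership rate"])
reordered, dropped = reorder_columns(toy, ["Corridor", "Census_tract", "GEOID"])
print(list(reordered.columns))
print(dropped)
```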

corridors_df.head()

Unnamed: 0,Corridor,Census_tract,GEOID,Population,Median household income,Per capita income,Poverty count,Poverty rate,Unemployment rate,"# employed, age 16+",...,Pop. Black,Pop. 2 or more races,Pop. Hispanic origin,Pop. Asian,Total pop. in occupied housing units by tenure,Total owner-occupied units,Total renter-occupied units,Pop. <18 years,Pop. working age,Pop. 65+ years
0,Irvington,200800,24510200800,2164.0,34886.0,18717.0,563.0,26.0,6.1,838.0,...,1882.0,33.0,41.0,0.0,1210.0,459.0,400.0,539.0,1371.0,254.0
1,Belair Rd,260301,24510260301,4242.0,35759.0,17469.0,1332.0,31.4,7.1,1638.0,...,3688.0,170.0,22.0,47.0,2387.0,961.0,613.0,1284.0,2622.0,336.0
2,Hamilton Lauraville,270600,24510270600,4686.0,58000.0,27892.0,621.0,13.3,5.3,2466.0,...,2631.0,138.0,37.0,10.0,3853.0,1433.0,377.0,933.0,3130.0,623.0
3,E Monument St,70200,24510070200,3023.0,27308.0,14309.0,1317.0,43.6,12.9,915.0,...,2877.0,11.0,196.0,7.0,802.0,283.0,739.0,852.0,1969.0,202.0
4,E Monument St,70300,24510070300,793.0,22361.0,13975.0,425.0,53.6,6.3,257.0,...,694.0,0.0,79.0,0.0,327.0,118.0,158.0,249.0,470.0,74.0


In [49]:
# Sort by "Census_tract" so the rows are easier to compare with the other years, which are pulled in another notebook
census_2016_FINAL = corridors_df.sort_values("Census_tract")
census_2016_FINAL
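"Census_tract" is stored as text (dtype object), so this is an alphabetical sort. That matches numeric order only while every code keeps its six digits, including leading zeros. Later output in this notebook shows codes such as 10200 without the leading zero, and a sketch on toy values shows how that would reorder the rows. The `key=` argument of `sort_values` (pandas 1.1 or later) sorts by numeric value instead.

```python
import pandas as pd

padded = pd.Series(["270600", "010200", "200800", "070200"])
print(padded.sort_values().tolist())  # zero-padded text sorts in numeric order

unpadded = pd.Series(["270600", "10200", "200800", "70200"])
print(unpadded.sort_values().tolist())  # "70200" now sorts last, after "270600"

# Sort by the numeric value of the text instead
print(unpadded.sort_values(key=lambda s: s.astype(int)).tolist())
```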

Unnamed: 0,Corridor,Census_tract,GEOID,Population,Median household income,Per capita income,Poverty count,Poverty rate,Unemployment rate,"# employed, age 16+",...,Pop. Black,Pop. 2 or more races,Pop. Hispanic origin,Pop. Asian,Total pop. in occupied housing units by tenure,Total owner-occupied units,Total renter-occupied units,Pop. <18 years,Pop. working age,Pop. 65+ years
97,Highlandtown,010200,24510010200,3332.0,96171.0,43737.0,354.0,10.6,0.9,2242.0,...,393.0,21.0,246.0,29.0,2575.0,1119.0,322.0,436.0,2701.0,195.0
95,Brooklyn,020300,24510020300,3827.0,100778.0,80498.0,307.0,8.0,2.5,2901.0,...,69.0,198.0,280.0,233.0,1791.0,845.0,1205.0,191.0,3331.0,305.0
109,E Monument St,040100,24510040100,3868.0,55277.0,44017.0,1028.0,26.6,2.7,2506.0,...,575.0,96.0,288.0,786.0,183.0,115.0,2114.0,159.0,3608.0,101.0
110,Hamilton Lauraville,040200,24510040200,835.0,52361.0,20935.0,103.0,12.3,1.6,385.0,...,278.0,16.0,44.0,113.0,0.0,0.0,259.0,23.0,790.0,22.0
111,E Monument St,060200,24510060200,3198.0,65294.0,28985.0,757.0,23.7,6.9,1672.0,...,1340.0,141.0,505.0,7.0,1302.0,509.0,639.0,757.0,2240.0,201.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,Waverly,280302,24510280302,2377.0,40212.0,28363.0,587.0,24.7,5.5,1097.0,...,2220.0,18.0,13.0,6.0,771.0,301.0,756.0,615.0,1519.0,243.0
43,Pimlico,280401,24510280401,3693.0,52176.0,27838.0,444.0,12.0,6.3,1786.0,...,3063.0,124.0,57.0,0.0,2252.0,861.0,631.0,707.0,2442.0,544.0
58,Pigtown,280402,24510280402,1762.0,40769.0,19575.0,149.0,8.5,8.0,801.0,...,1684.0,10.0,13.0,0.0,1137.0,469.0,211.0,269.0,1117.0,376.0
59,Belair Rd,280403,24510280403,6001.0,56577.0,29920.0,407.0,6.8,4.5,3070.0,...,4384.0,53.0,159.0,0.0,2533.0,985.0,1404.0,1559.0,3839.0,603.0


In [50]:
# Export the dataframe as a CSV without the pandas index but with the header row
# Run this cell only after all the cells above are in their final form

census_2016_FINAL.to_csv("CommCorr_Census_Stats_2016.csv", index = False, header=True)
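When this CSV is read back in another notebook, `pd.read_csv` infers "Census_tract" as an integer and drops the leading zeros ("010200" becomes 10200). A sketch using an in-memory buffer in place of the real file: passing `dtype={"Census_tract": str}` keeps the codes as text.

```python
import io

import pandas as pd

df = pd.DataFrame({"Census_tract": ["010200", "270600"], "GEOID": [24510010200, 24510270600]})

# Write to an in-memory buffer instead of a file, same options as the export above
buffer = io.StringIO()
df.to_csv(buffer, index=False, header=True)

buffer.seek(0)
default = pd.read_csv(buffer)  # type inferred: leading zeros lost
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype={"Census_tract": str})  # codes kept as strings
print(default["Census_tract"].tolist())
print(as_text["Census_tract"].tolist())
```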

In [51]:
census_2016_formatted = census_2016_FINAL.copy()  # .copy() so the formatting below does not change census_2016_FINAL
census_2016_formatted.head()

Unnamed: 0,Corridor,Census_tract,GEOID,Population,Median household income,Per capita income,Poverty count,Poverty rate,Unemployment rate,"# employed, age 16+",...,Pop. Black,Pop. 2 or more races,Pop. Hispanic origin,Pop. Asian,Total pop. in occupied housing units by tenure,Total owner-occupied units,Total renter-occupied units,Pop. <18 years,Pop. working age,Pop. 65+ years
97,Highlandtown,10200,24510010200,3332.0,96171.0,43737.0,354.0,10.6,0.9,2242.0,...,393.0,21.0,246.0,29.0,2575.0,1119.0,322.0,436.0,2701.0,195.0
95,Brooklyn,20300,24510020300,3827.0,100778.0,80498.0,307.0,8.0,2.5,2901.0,...,69.0,198.0,280.0,233.0,1791.0,845.0,1205.0,191.0,3331.0,305.0
109,E Monument St,40100,24510040100,3868.0,55277.0,44017.0,1028.0,26.6,2.7,2506.0,...,575.0,96.0,288.0,786.0,183.0,115.0,2114.0,159.0,3608.0,101.0
110,Hamilton Lauraville,40200,24510040200,835.0,52361.0,20935.0,103.0,12.3,1.6,385.0,...,278.0,16.0,44.0,113.0,0.0,0.0,259.0,23.0,790.0,22.0
111,E Monument St,60200,24510060200,3198.0,65294.0,28985.0,757.0,23.7,6.9,1672.0,...,1340.0,141.0,505.0,7.0,1302.0,509.0,639.0,757.0,2240.0,201.0


In [52]:
# Use .map to format columns (helpful resource for this: https://towardsdatascience.com/apply-thousand-separator-and-other-formatting-to-pandas-dataframe-45f2f4c7ab01)
# Note: once you format values in a column, they are changed to strings (see the data types in the cell below)
# Use census_2016_FINAL, which keeps number data types, for any calculations; strings cannot be used in calculations
# "Poverty rate" and "Unemployment rate" already hold percentages (e.g. 10.6 means 10.6%), so append "%"
# instead of using the "%" format spec, which multiplies by 100
census_2016_formatted["Median household income"] = census_2016_formatted["Median household income"].map("${:.2f}".format)
census_2016_formatted["Per capita income"] = census_2016_formatted["Per capita income"].map("${:.2f}".format)
census_2016_formatted["Population"] = census_2016_formatted["Population"].map("{:,.0f}".format)
census_2016_formatted["Poverty count"] = census_2016_formatted["Poverty count"].map("{:,.0f}".format)
census_2016_formatted["Poverty rate"] = census_2016_formatted["Poverty rate"].map("{:.1f}%".format)
census_2016_formatted["Unemployment rate"] = census_2016_formatted["Unemployment rate"].map("{:.1f}%".format)

census_2016_formatted.head()

Unnamed: 0,Corridor,Census_tract,GEOID,Population,Median household income,Per capita income,Poverty count,Poverty rate,Unemployment rate,"# employed, age 16+",...,Pop. Black,Pop. 2 or more races,Pop. Hispanic origin,Pop. Asian,Total pop. in occupied housing units by tenure,Total owner-occupied units,Total renter-occupied units,Pop. <18 years,Pop. working age,Pop. 65+ years
97,Highlandtown,10200,24510010200,3332,$96171.00,$43737.00,354,10.6%,0.9%,2242.0,...,393.0,21.0,246.0,29.0,2575.0,1119.0,322.0,436.0,2701.0,195.0
95,Brooklyn,20300,24510020300,3827,$100778.00,$80498.00,307,8.0%,2.5%,2901.0,...,69.0,198.0,280.0,233.0,1791.0,845.0,1205.0,191.0,3331.0,305.0
109,E Monument St,40100,24510040100,3868,$55277.00,$44017.00,1028,26.6%,2.7%,2506.0,...,575.0,96.0,288.0,786.0,183.0,115.0,2114.0,159.0,3608.0,101.0
110,Hamilton Lauraville,40200,24510040200,835,$52361.00,$20935.00,103,12.3%,1.6%,385.0,...,278.0,16.0,44.0,113.0,0.0,0.0,259.0,23.0,790.0,22.0
111,E Monument St,60200,24510060200,3198,$65294.00,$28985.00,757,23.7%,6.9%,1672.0,...,1340.0,141.0,505.0,7.0,1302.0,509.0,639.0,757.0,2240.0,201.0
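Python's `%` format type multiplies the number by 100 before formatting, so it suits fractions such as 0.106 but not values that are already percentages. A quick check with the first tract's poverty rate:

```python
rate = 10.6  # "Poverty rate" values are already percentages (10.6 means 10.6%)

as_percent_spec = "{:.2%}".format(rate)          # % spec multiplies by 100 first
scaled_first = "{:.2%}".format(rate / 100)       # works once the value is scaled to a fraction
literal_sign = "{:.1f}%".format(rate)            # format the number, then append a literal "%"

print(as_percent_spec, scaled_first, literal_sign)
```

If the goal is only display, pandas' `DataFrame.style.format` can show formatted values while leaving the underlying columns numeric (it requires the jinja2 package).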


In [53]:
census_2016_formatted.dtypes

Corridor                                           object
Census_tract                                       object
GEOID                                               int64
Population                                         object
Median household income                            object
Per capita income                                  object
Poverty count                                      object
Poverty rate                                       object
Unemployment rate                                  object
# employed, age 16+                               float64
Unemployment count                                float64
# persons age 25+ graduated high school           float64
# persons age 25+ with Bachelor's degree          float64
Median age                                        float64
Pop. white                                        float64
Pop. Black                                        float64
Pop. 2 or more races                              float64
Pop. Hispanic 