# Connecticut Inequality

To get a sense of how income inequality in Connecticut's cities compared nationally, we decided to look at the ratio between two benchmarks: the lowest combined income a household could earn while still breaking into the top 5 percent of household income, and the highest combined income a household could earn while still falling in the bottom 20 percent.
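
As a rough illustration of the ratio described above, here is a minimal sketch using two rows copied from the output table further down in this notebook. The column name `ratio_top5_to_bottom20` is our own label for this example, not something provided by the Census.

```python
import pandas

# Two metro areas copied from the ACS output shown later in this notebook
sample = pandas.DataFrame({
    "GEO.display-label": [
        "Chicago-Naperville-Elgin, IL-IN-WI Metro Area",
        "Dallas-Fort Worth-Arlington, TX Metro Area",
    ],
    "Upper_Limit_Bottom_20": [25921, 28601],
    "Lower_Limit_Top_5": [245949.0, 234781.0],
})

# Ratio of the top 5 percent entry point to the bottom 20 percent ceiling.
# "ratio_top5_to_bottom20" is a hypothetical column name used for illustration.
sample["ratio_top5_to_bottom20"] = (
    sample["Lower_Limit_Top_5"] / sample["Upper_Limit_Bottom_20"]
)

print(sample[["GEO.display-label", "ratio_top5_to_bottom20"]])
```

A higher ratio means a wider gap between the two benchmarks. In this sample, Chicago's ratio comes out at about 9.5 and Dallas's at about 8.2.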

The US Census Bureau provides estimates of the numbers we need in a table titled [HOUSEHOLD INCOME QUINTILE UPPER LIMITS](https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_16_5YR_B19080&prodType=table).

In [29]:
import pandas

acs = pandas.read_csv('ACS_16_1YR_B19080-1/ACS_16_1YR_B19080.csv')

# Bottom 20 percent cutoff is HD01_VD02; top 5 percent cutoff is HD01_VD06
acs = acs[["GEO.id2", "GEO.display-label", "HD01_VD02", "HD01_VD06"]].rename(
    columns={"HD01_VD02": "Upper_Limit_Bottom_20", "HD01_VD06": "Lower_Limit_Top_5"}
)

# We're only interested in the 100 biggest metros, so let's join the resident
# population table and keep only the largest.
#
# Link to census: https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=PEP_2017_PEPANNRES&prodType=table
pop_16 = pandas.read_csv("PEP_2017_PEPANNRES-1/PEP_2017_PEPANNRES.csv",encoding = "ISO-8859-1")
acs = acs.join(pop_16[["GEO.id2","respop72016"]].set_index("GEO.id2"),on="GEO.id2")
acs = acs.sort_values("respop72016",ascending=False).reset_index(drop=True)
acs = acs.loc[0:99]

acs

Unnamed: 0,GEO.id2,GEO.display-label,Upper_Limit_Bottom_20,Lower_Limit_Top_5,respop72016
0,35620,"New York-Newark-Jersey City, NY-NJ-PA Metro Area",25717,,20275179
1,31080,"Los Angeles-Long Beach-Anaheim, CA Metro Area",25626,,13328261
2,16980,"Chicago-Naperville-Elgin, IL-IN-WI Metro Area",25921,245949.0,9546326
3,19100,"Dallas-Fort Worth-Arlington, TX Metro Area",28601,234781.0,7253424
4,26420,"Houston-The Woodlands-Sugar Land, TX Metro Area",25785,,6798010
5,47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV M...",41076,,6150681
6,33100,"Miami-Fort Lauderdale-West Palm Beach, FL Metr...",21198,221668.0,6107433
7,37980,"Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Me...",25571,246971.0,6077152
8,12060,"Atlanta-Sandy Springs-Roswell, GA Metro Area",26684,234699.0,5795723
9,14460,"Boston-Cambridge-Newton, MA-NH Metro Area",31367,,4805942


There are a few values missing from this table! The Census Bureau doesn't report estimates higher than $250,000 here. This is an example of [topcoding](link), which the Census Bureau says it does for privacy reasons. Sixteen of the 100 metro areas in our dataset have a topcoded lower limit for the top 5 percent of households.

In [31]:
acs[acs["Lower_Limit_Top_5"].isnull()]

Unnamed: 0,GEO.id2,GEO.display-label,Upper_Limit_Bottom_20,Lower_Limit_Top_5,respop72016
0,35620,"New York-Newark-Jersey City, NY-NJ-PA Metro Area",25717,,20275179
1,31080,"Los Angeles-Long Beach-Anaheim, CA Metro Area",25626,,13328261
4,26420,"Houston-The Woodlands-Sugar Land, TX Metro Area",25785,,6798010
5,47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV M...",41076,,6150681
9,14460,"Boston-Cambridge-Newton, MA-NH Metro Area",31367,,4805942
10,41860,"San Francisco-Oakland-Hayward, CA Metro Area",36353,,4699077
14,42660,"Seattle-Tacoma-Bellevue, WA Metro Area",34576,,3802660
16,41740,"San Diego-Carlsbad, CA Metro Area",30265,,3317200
18,19740,"Denver-Aurora-Lakewood, CO Metro Area",32297,,2851848
20,12580,"Baltimore-Columbia-Towson, MD Metro Area",31209,,2801028


Luckily, we can use the [Public Use Microdata Sample (PUMS)](https://www.census.gov/programs-surveys/acs/technical-documentation/pums.html) to estimate these values ourselves.

According to the Census, PUMS is "a set of untabulated records about individual people or housing units. The Census Bureau produces the PUMS files so that data users can create custom tables that are not available through pretabulated (or summary) ACS data products."

PUMS records are also topcoded, but at $999,999 [CHECK THIS], which should let us get a good estimate of the lower cutoff for the top 5 percent of households. To link the metro areas to the PUMS data, we need to map each GEO.id2 (FIPS code) to the most detailed geographic unit provided in the PUMS data: Public Use Microdata Areas (PUMAs).
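
Because PUMS is a weighted sample, estimating the cutoff for the top 5 percent means taking a weighted 95th percentile rather than a plain one. The sketch below shows one simple way to do that. It is an illustration only: the helper `weighted_quantile` is our own, the toy records are made up, and we assume the household income and household weight columns are named `HINCP` and `WGTP`, which should be checked against the PUMS data dictionary for the file actually used.

```python
import numpy
import pandas

def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative weight share reaches q.

    Hypothetical helper for illustration; it does no interpolation.
    """
    values = numpy.asarray(values, dtype=float)
    weights = numpy.asarray(weights, dtype=float)
    order = numpy.argsort(values)
    values, weights = values[order], weights[order]
    # Share of total weight at or below each sorted value
    cumulative = numpy.cumsum(weights) / weights.sum()
    index = numpy.searchsorted(cumulative, q, side="left")
    return values[min(index, len(values) - 1)]

# Made-up household records; column names are assumed, not taken from a real file
toy_pums = pandas.DataFrame({
    "HINCP": [20000, 50000, 90000, 300000, 600000],
    "WGTP": [40, 30, 20, 8, 2],
})

cutoff = weighted_quantile(toy_pums["HINCP"], toy_pums["WGTP"], 0.95)
print(cutoff)
```

The same helper with `q=0.20` would give the upper limit of the bottom 20 percent, so both benchmarks could in principle be estimated from the same records.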

To do this, we use the Missouri Census Data Center's [Geographic Correspondence Engine](http://mcdc.missouri.edu/websas/geocorr14.html) (Geocorr). Let's pull the mappings for CBSAs in all states, because we'll want them later.

We'll be mapping PUMAs to CBSAs (MSAs are a kind of CBSA).

Most PUMAs lie entirely within a single CBSA. For those that don't, Geocorr estimates the share of the PUMA's population that falls within each CBSA, based on 2014 population estimates.
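
To make the allocation idea concrete, here is a minimal sketch of one way the population share could be applied: each household's weight is split across CBSAs in proportion to the allocation factor for its PUMA. The crosswalk rows are copied from the Geocorr output in this notebook (where the factor appears in the `afact` column). The household records and the `weight` and `cbsa_weight` columns are made up for illustration, and this is not necessarily the approach used later in the analysis.

```python
import pandas

# Crosswalk rows copied from the Geocorr output shown in this notebook
crosswalk = pandas.DataFrame({
    "state": ["01", "01", "01"],
    "puma12": ["00400", "00400", "00200"],
    "cbsa": ["22840", "42460", "26620"],
    "afact": [0.574, 0.426, 1.0],
})

# Hypothetical household records, one per PUMA, with made-up weights
households = pandas.DataFrame({
    "state": ["01", "01"],
    "puma12": ["00400", "00200"],
    "weight": [100, 50],
})

# A PUMA that straddles two CBSAs yields one row per CBSA after the merge
allocated = households.merge(crosswalk, on=["state", "puma12"], how="left")

# Scale each household's weight by the share of its PUMA inside that CBSA
allocated["cbsa_weight"] = allocated["weight"] * allocated["afact"]

cbsa_totals = allocated.groupby("cbsa")["cbsa_weight"].sum()
print(cbsa_totals)
```

Because the allocation factors for a PUMA sum to 1, the total weight is preserved: the household in PUMA 00400 contributes 57.4 to one CBSA and 42.6 to the other.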



In [34]:
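# Geocorr writes its output to a temporary scratch directory, so this link is
# probably not permanent.
# The first row of the file holds column descriptions rather than data, so skip it.
geocorr = pandas.read_csv("http://mcdc.missouri.edu/tmpscratch/18JUN1549689.geocorr14/geocorr14.csv", encoding = "ISO-8859-1")[1:]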


Unnamed: 0,state,puma12,cbsa,stab,cbsaname15,PUMAname,pop14,afact
0,FIPS state,puma12,CBSA current as of CBSAyr,State Postal Code,2015 CBSA Name,PUMA12 Name,Pop 2014 estimate fr county level ests,puma12 to cbsa alloc factor
1,01,00100,,AL,,"Lauderdale, Colbert, Franklin & Marion (Northe...",39326.125,0.21
2,01,00100,22520,AL,"Florence-Muscle Shoals, AL (Metro)","Lauderdale, Colbert, Franklin & Marion (Northe...",147639,0.79
3,01,00200,26620,AL,"Huntsville, AL (Metro)",Limestone & Madison (Outer) Counties--Huntsvil...,183944.849,1
4,01,00301,26620,AL,"Huntsville, AL (Metro)",Huntsville (North) & Madison (East) Cities,124425.297,1
5,01,00302,26620,AL,"Huntsville, AL (Metro)",Huntsville City (Central & South),106218.3,1
6,01,00400,22840,AL,"Fort Payne, AL (Micro)",DeKalb & Jackson Counties,71065,0.574
7,01,00400,42460,AL,"Scottsboro, AL (Micro)",DeKalb & Jackson Counties,52665,0.426
8,01,00500,10700,AL,"Albertville, AL (Micro)",Marshall & Madison (Southeast) Counties--Hunts...,94636,0.781
9,01,00500,26620,AL,"Huntsville, AL (Metro)",Marshall & Madison (Southeast) Counties--Hunts...,26497.554,0.219
