# Ann Arbor Inequality

The Federal Reserve Bank of New York released a [paper](https://www.newyorkfed.org/medialibrary/media/research/epr/2019/epr_2019_wage-inequality_abel-deitz.pdf) on wage inequality across Metropolitan Statistical Areas (MSAs), discussed in [this blog post](https://libertystreeteconomics.newyorkfed.org/2019/10/some-places-are-much-more-unequal-than-others.html).

Ann Arbor ranks 14th among the 15 most unequal MSAs in the country. I wanted to test whether this statistic includes student incomes and, if so, whether that affects the inequality ratio.

In [1]:
import pandas as pd

I used the same data source as the Fed study: IPUMS 2015 ACS data. The extract I downloaded contains only individuals within the Ann Arbor MSA.

Ruggles, S., K. Genadek, R. Goeken, J. Grover, and M. Sobek. 2015. Integrated Public Use Microdata Series: Version 6.0 [dataset]. Minneapolis: University of Minnesota. http://doi.org/10.18128/D010.V6.0

In [2]:
a2 = pd.read_csv("usa_00001.csv")

In [3]:
a2

Unnamed: 0,YEAR,SAMPLE,SERIAL,CBSERIAL,HHWT,MET2013,GQ,PERNUM,PERWT,SCHOOL,EMPSTAT,EMPSTATD,LABFORCE,INCWAGE
0,2015,201501,622532,2076,409,11460,1,1,409,1,1,10,2,114000
1,2015,201501,622532,2076,409,11460,1,2,129,1,1,10,2,23700
2,2015,201501,622532,2076,409,11460,1,3,329,2,0,0,0,999999
3,2015,201501,622540,2357,175,11460,1,1,176,1,3,30,1,0
4,2015,201501,622540,2357,175,11460,1,2,154,1,3,30,1,0
5,2015,201501,622542,2385,170,11460,1,1,170,1,2,20,2,53000
6,2015,201501,622548,2593,49,11460,1,1,50,1,1,10,2,50000
7,2015,201501,622575,3859,110,11460,1,1,109,1,3,30,1,700
8,2015,201501,622584,4115,120,11460,1,1,119,1,1,10,2,45000
9,2015,201501,622584,4115,120,11460,1,2,97,2,3,30,1,0
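
One thing to watch in the preview above: the third row has INCWAGE = 999999. To my understanding of the IPUMS codebook, 999999 is the N/A code for INCWAGE and 999998 marks missing values, so neither should be treated as a wage. Below is a minimal sketch of masking them, assuming those two codes; the toy frame copies a few columns from the first three rows shown above, and the column name INCWAGE_CLEAN is made up for illustration.

```python
import pandas as pd

# Toy frame copied from the first three preview rows (subset of columns).
toy = pd.DataFrame({
    "PERWT": [409, 129, 329],
    "EMPSTAT": [1, 1, 0],
    "INCWAGE": [114000, 23700, 999999],
})

# Assumed IPUMS special codes for INCWAGE: 999999 = N/A, 999998 = missing.
INCWAGE_SPECIAL_CODES = [999998, 999999]

# Replace the special codes with NaN so they drop out of sums and means.
# INCWAGE_CLEAN is a hypothetical column name, not part of the extract.
is_special = toy["INCWAGE"].isin(INCWAGE_SPECIAL_CODES)
toy["INCWAGE_CLEAN"] = toy["INCWAGE"].mask(is_special)

print(toy["INCWAGE"].mean())        # pulled upward by the 999999 placeholder
print(toy["INCWAGE_CLEAN"].mean())  # NaN rows are skipped
```

Filtering on EMPSTAT == 1, as done later in this notebook, most likely removes these rows anyway, but masking them makes any sum or mean over INCWAGE safe.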


### School and Employment
To see whether the wage data could include students, I cross-tabulate school attendance against employment status.
#### School
| Value | Label             |
|-------|-------------------|
| 0     | N/A               |
| 1     | No, not in school |
| 2     | Yes, in school    |
| 9     | Missing           |

#### Empstat
| Value | Label              |
|-------|--------------------|
| 0     | N/A                |
| 1     | Employed           |
| 2     | Unemployed         |
| 3     | Not in Labor Force |

The cell of interest is the intersection of SCHOOL == 2 (in school) and EMPSTAT == 1 (employed).


In [5]:
pd.crosstab(a2.SCHOOL, a2.EMPSTAT)

SCHOOL \ EMPSTAT,0,1,2,3
0,87,0,0,0
1,41,1281,44,690
2,365,308,28,305


The sample contains 308 employed students, about 19% of the 1,589 employed respondents, so students could plausibly influence the results.
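
The crosstab above counts survey rows, not people. Since each row carries a person weight (PERWT), a weighted version gives the estimated number of people in each cell. Here is a minimal sketch on a made-up mini-sample; the toy values are hypothetical, and in the notebook the same call would take a2 instead.

```python
import pandas as pd

# Hypothetical mini-sample; values are illustrative only.
toy = pd.DataFrame({
    "SCHOOL":  [1, 1, 2, 2, 2],
    "EMPSTAT": [1, 3, 1, 1, 3],
    "PERWT":   [100, 50, 80, 20, 60],
})

# Sum person weights in each SCHOOL x EMPSTAT cell instead of counting rows.
weighted = pd.crosstab(
    toy["SCHOOL"], toy["EMPSTAT"],
    values=toy["PERWT"], aggfunc="sum",
).fillna(0)

# Estimated share of employed people (EMPSTAT == 1) who are in school (SCHOOL == 2).
student_share = weighted.loc[2, 1] / weighted[1].sum()

print(weighted)
print(round(student_share * 100, 1))
```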

### Student Population

To gauge the potential impact, I estimate the number of students in the Ann Arbor MSA by summing the person weights (PERWT) for each SCHOOL value.

In [15]:
# Sum only the person weights; summing YEAR, SERIAL, etc. is meaningless.
pop = a2.groupby("SCHOOL")[["PERWT"]].sum()
pop["Pop Pct"] = pop["PERWT"]/pop["PERWT"].sum()*100
pop

SCHOOL,PERWT,Pop Pct
0,9866,2.755566
1,228078,63.701999
2,120095,33.542435


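By person weight, over 33% of the Ann Arbor MSA's estimated population is in school. Note that SCHOOL == 2 covers anyone attending school, which can include K-12 pupils as well as university students.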

### Wage Ratio

The original study uses a ratio of the 90th percentile of wages to the 10th percentile to measure inequality. To measure the impact of students, I create two different samples, one with all wage earners and one that excludes students.

In [49]:
a2_all = a2[a2["EMPSTAT"] == 1].sort_values(by="INCWAGE")
a2_ns = a2_all[a2_all["SCHOOL"] != 2].copy()  # copy so later column assignments do not trigger SettingWithCopyWarning

In [50]:
a2_all["RunningTot"] = a2_all["PERWT"].cumsum()
a2_all["RunningPct"] = a2_all["RunningTot"]/a2_all["RunningTot"].max()*100

In [51]:
a2_all[a2_all["RunningPct"]>=10]

Unnamed: 0,YEAR,SAMPLE,SERIAL,CBSERIAL,HHWT,MET2013,GQ,PERNUM,PERWT,SCHOOL,EMPSTAT,EMPSTATD,LABFORCE,INCWAGE,RunningTot,RunningPct
2606,2015,201501,658993,1261168,200,11460,1,3,93,2,1,10,2,3000,18434,10.004776
2199,2015,201501,653684,1079370,78,11460,1,6,150,2,1,12,2,3000,18584,10.086186
339,2015,201501,626751,144067,59,11460,4,1,59,2,1,10,2,3000,18643,10.118208
2515,2015,201501,657931,1225049,7,11460,4,1,7,2,1,10,2,3100,18650,10.122007
2503,2015,201501,657757,1219268,93,11460,1,1,93,1,1,10,2,3100,18743,10.172481
885,2015,201501,634854,425731,50,11460,1,1,51,1,1,10,2,3100,18794,10.200161
2443,2015,201501,657148,1197933,293,11460,1,3,284,2,1,10,2,3300,19078,10.354297
242,2015,201501,625614,106396,128,11460,1,4,120,2,1,10,2,3300,19198,10.419426
422,2015,201501,628401,203084,43,11460,4,1,43,2,1,10,2,3400,19241,10.442763
3084,2015,201501,665598,1487811,94,11460,1,2,88,1,1,10,2,3400,19329,10.490524


In [52]:
a2_all[a2_all["RunningPct"]>=90]

Unnamed: 0,YEAR,SAMPLE,SERIAL,CBSERIAL,HHWT,MET2013,GQ,PERNUM,PERWT,SCHOOL,EMPSTAT,EMPSTATD,LABFORCE,INCWAGE,RunningTot,RunningPct
452,2015,201501,628909,220585,106,11460,1,1,106,1,1,10,2,104000,165856,90.015848
921,2015,201501,635459,448262,144,11460,1,1,144,1,1,10,2,105000,166000,90.094002
2656,2015,201501,659450,1276867,65,11460,1,2,52,1,1,10,2,105000,166052,90.122224
2727,2015,201501,660499,1313367,93,11460,1,1,93,1,1,10,2,105000,166145,90.172698
1471,2015,201501,643453,724558,133,11460,1,1,132,1,1,10,2,107000,166277,90.244339
998,2015,201501,636431,481466,95,11460,1,1,95,1,1,10,2,108000,166372,90.295899
2043,2015,201501,651401,1001703,42,11460,1,1,42,1,1,10,2,108000,166414,90.318694
2145,2015,201501,653092,1059469,98,11460,1,2,79,1,1,10,2,108000,166493,90.361570
2442,2015,201501,657148,1197933,293,11460,1,2,231,1,1,10,2,108000,166724,90.486942
1125,2015,201501,638243,544146,27,11460,1,2,27,1,1,10,2,109000,166751,90.501596


In [53]:
104000/3000

34.666666666666664

This puts the 90/10 ratio at 34.67, well above the 6.6 given in the report.

In [54]:
a2_ns["RunningTot"] = a2_ns["PERWT"].cumsum()
a2_ns["RunningPct"] = a2_ns["RunningTot"]/a2_ns["RunningTot"].max()*100


In [55]:
a2_ns[a2_ns["RunningPct"]>=10]

Unnamed: 0,YEAR,SAMPLE,SERIAL,CBSERIAL,HHWT,MET2013,GQ,PERNUM,PERWT,SCHOOL,EMPSTAT,EMPSTATD,LABFORCE,INCWAGE,RunningTot,RunningPct
332,2015,201501,626686,141961,69,11460,1,4,99,1,1,10,2,7000,14611,10.013364
1605,2015,201501,645614,798129,80,11460,1,1,81,1,1,10,2,7000,14692,10.068876
1072,2015,201501,637649,522715,41,11460,1,2,48,1,1,10,2,7000,14740,10.101772
520,2015,201501,629771,250304,93,11460,1,2,119,1,1,10,2,7000,14859,10.183326
2759,2015,201501,660712,1321006,26,11460,1,2,28,1,1,10,2,7000,14887,10.202515
2065,2015,201501,651919,1018693,168,11460,1,1,167,1,1,10,2,7200,15054,10.316965
2453,2015,201501,657254,1202403,134,11460,1,1,134,1,1,10,2,7400,15188,10.408800
2356,2015,201501,655995,1157833,155,11460,1,2,156,1,1,10,2,7800,15344,10.515711
2228,2015,201501,654037,1091831,94,11460,1,2,100,1,1,10,2,7800,15444,10.584244
187,2015,201501,625111,90330,64,11460,1,1,64,1,1,10,2,7900,15508,10.628105


In [56]:
a2_all[a2_all["RunningPct"]>=90]

Unnamed: 0,YEAR,SAMPLE,SERIAL,CBSERIAL,HHWT,MET2013,GQ,PERNUM,PERWT,SCHOOL,EMPSTAT,EMPSTATD,LABFORCE,INCWAGE,RunningTot,RunningPct
452,2015,201501,628909,220585,106,11460,1,1,106,1,1,10,2,104000,165856,90.015848
921,2015,201501,635459,448262,144,11460,1,1,144,1,1,10,2,105000,166000,90.094002
2656,2015,201501,659450,1276867,65,11460,1,2,52,1,1,10,2,105000,166052,90.122224
2727,2015,201501,660499,1313367,93,11460,1,1,93,1,1,10,2,105000,166145,90.172698
1471,2015,201501,643453,724558,133,11460,1,1,132,1,1,10,2,107000,166277,90.244339
998,2015,201501,636431,481466,95,11460,1,1,95,1,1,10,2,108000,166372,90.295899
2043,2015,201501,651401,1001703,42,11460,1,1,42,1,1,10,2,108000,166414,90.318694
2145,2015,201501,653092,1059469,98,11460,1,2,79,1,1,10,2,108000,166493,90.361570
2442,2015,201501,657148,1197933,293,11460,1,2,231,1,1,10,2,108000,166724,90.486942
1125,2015,201501,638243,544146,27,11460,1,2,27,1,1,10,2,109000,166751,90.501596


In [57]:
104000/7000

14.857142857142858

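This puts the 90/10 wage ratio at about 14.9, still above the 6.6 reported in the paper. Note that cell 56 filters a2_all rather than a2_ns, so the 104000 numerator is the 90th percentile of all wage earners; the non-student 90th percentile should be read from a2_ns instead and may differ.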
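
The running-total lookup used in the cells above can be wrapped in a small helper so that both samples go through exactly the same steps. This is a sketch under assumptions, not the method used in the paper: the function names weighted_percentile and ratio_90_10 are my own, and the toy data is hypothetical. It takes the first wage whose cumulative weight share reaches the requested percentile, as the manual lookup does.

```python
import pandas as pd

def weighted_percentile(df, value_col, weight_col, pct):
    """Return the first value whose cumulative weight share reaches pct percent."""
    ordered = df.sort_values(value_col)
    running_pct = ordered[weight_col].cumsum() / ordered[weight_col].sum() * 100
    return ordered.loc[running_pct >= pct, value_col].iloc[0]

def ratio_90_10(df, value_col="INCWAGE", weight_col="PERWT"):
    """Ratio of the weighted 90th percentile to the weighted 10th percentile."""
    p90 = weighted_percentile(df, value_col, weight_col, 90)
    p10 = weighted_percentile(df, value_col, weight_col, 10)
    return p90 / p10

# Hypothetical toy sample of employed people; SCHOOL == 2 marks students.
toy = pd.DataFrame({
    "SCHOOL":  [2, 2, 1, 1, 1, 1],
    "INCWAGE": [2000, 4000, 10000, 30000, 50000, 80000],
    "PERWT":   [15, 10, 20, 30, 13, 12],
})

all_ratio = ratio_90_10(toy)
non_student_ratio = ratio_90_10(toy[toy["SCHOOL"] != 2])
print(all_ratio, non_student_ratio)
```

In the notebook, the same two calls would take a2_all and a2_ns, which keeps the numerator and denominator tied to the same sample.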