# Historical Poverty Data by State

This is a quick notebook to pad the years in an otherwise manually processed Excel table.  The table includes total and impoverished population counts by state over the 1980-2014 period.  The main reason for using a notebook at all is to record where the data came from, along with the relevant [footnotes](http://www.census.gov/hhes/www/poverty/histpov/footnotes.html).

In [21]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

These data come from [Table 21 of the Census Historical Poverty Tables](https://www.census.gov/hhes/www/poverty/data/historical/people.html).  Note that the source table has two sets of estimates for 2013.  The processed CSV file read in here retains the set consistent with the 2013 CPS ASEC.  The other one, which reflects redesigned income questions, was dropped.  The reason for this decision is that the current analysis window for the Medicaid imputation project runs from 1980 to 2011.  We may be able to squeeze out a couple more years, but the game changes dramatically in 2014 (due to the Affordable Care Act).  Consequently, consistency won out.  The relevant footnotes are here:

    18. Data are based on the CPS ASEC sample of 68,000 addresses. The 2014 CPS ASEC included redesigned questions for income and health insurance coverage. All of the approximately 98,000 addresses were eligible to receive the redesigned set of health insurance coverage questions. The redesigned income questions were implemented to a subsample of these 98,000 addresses using a probability split panel design. Approximately 68,000 addresses were eligible to receive a set of income questions similar to those used in the 2013 CPS ASEC and the remaining 30,000 addresses were eligible to receive the redesigned income questions. The source of the 2013 data for this table is the portion of the CPS ASEC sample which received the income questions consistent with the 2013 CPS ASEC, approximately 68,000 addresses.

    19. The 2014 CPS ASEC included redesigned questions for income and health insurance coverage. All of the approximately 98,000 addresses were eligible to receive the redesigned set of health insurance coverage questions. The redesigned income questions were implemented to a subsample of these 98,000 addresses using a probability split panel design. Approximately 68,000 addresses were eligible to receive a set of income questions similar to those used in the 2013 CPS ASEC and the remaining 30,000 addresses were eligible to receive the redesigned income questions. The source of data for this table is the portion of the CPS ASEC sample which received the redesigned income questions, approximately 30,000 addresses.
    
The remaining applicable footnotes relate to unavoidable changes, which usually occur around decennial Census years.
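The choice between the two 2013 estimates described above was made by hand in Excel.  Purely as an illustration, the same filter could be sketched in pandas as follows.  The `redesigned_income` flag is a hypothetical column (it is not in the Census table or in the processed CSV), and the numbers are made up.

```python
import pandas as pd

# Toy rows with made-up numbers. `redesigned_income` is a hypothetical flag
# marking the 2013 estimate based on the redesigned questions (footnote 19).
raw = pd.DataFrame({
    'state': ['Alabama', 'Alabama', 'Alabama'],
    'year': [2014, 2013, 2013],
    'redesigned_income': [False, True, False],
    'pct_pov': [1.0, 2.0, 3.0],
})

# Drop the 2013 estimate from the redesigned-question subsample,
# keeping the one consistent with the 2013 CPS ASEC (footnote 18)
drop = (raw['year'] == 2013) & raw['redesigned_income']
consistent = raw[~drop].reset_index(drop=True)
```

After the filter, each state has exactly one row per year, which is what the later `(state, year)` index relies on.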

In [22]:
#Read in data
pov=pd.read_csv('pov_by_state_input_1980_2014.csv',skiprows=1).rename(columns={'Year':'year'})

#Drop rows without data (these have a missing value for state)
pov=pov[pov['state'].notnull()]

#Pad years (note: ffill forward-fills every column, not just year)
pov=pov.ffill()

print(pov.info())

pov.head(10)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1785 entries, 0 to 1784
Data columns (total 7 columns):
state         1785 non-null object
pop           1785 non-null float64
num_pov       1785 non-null float64
st_err_num    1785 non-null float64
pct_pov       1785 non-null float64
st_err_pct    1785 non-null float64
year          1785 non-null float64
dtypes: float64(6), object(1)
memory usage: 111.6+ KB
None


,state,pop,num_pov,st_err_num,pct_pov,st_err_pct,year
0,Alabama,4765,848,53,17.806,1.105,2014
1,Alaska,694,82,8,11.853,1.17,2014
2,Arizona,6657,1409,76,21.164,1.147,2014
3,Arkansas,2891,532,44,18.402,1.533,2014
4,California,38666,6112,218,15.807,0.563,2014
5,Colorado,5376,661,72,12.297,1.347,2014
6,Connecticut,3577,308,44,8.605,1.216,2014
7,Delaware,929,103,10,11.043,1.033,2014
8,DC,657,125,9,19.024,1.328,2014
9,Florida,19694,3282,150,16.667,0.761,2014
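To make the padding step concrete, here is a minimal sketch on toy data.  It assumes the year appears only on the first row of each yearly block, which is a guess about the input layout and not something verified here.  The numbers are made up.  It also shows the difference between forward-filling the whole frame and forward-filling only `year`.

```python
import numpy as np
import pandas as pd

# Toy layout (made-up numbers): year is present only on the first row of
# each block, and one `pop` value is genuinely missing.
toy = pd.DataFrame({
    'state': ['Alabama', 'Alaska', 'Arizona', 'Alabama', 'Alaska', 'Arizona'],
    'pop':   [10.0, np.nan, 30.0, 11.0, 21.0, 31.0],
    'year':  [2014, np.nan, np.nan, 2013, np.nan, np.nan],
})

# Forward-fill only the year column: the missing `pop` stays missing
padded = toy.copy()
padded['year'] = padded['year'].ffill()

# Forward-fill the whole frame: the missing `pop` is also filled,
# here with the value from the row above (a different state)
whole = toy.ffill()
```

In the real data every column is fully populated after padding, so the two approaches may well give the same result.  The column-restricted version is just the more cautious default.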


Let's map in postal abbreviations as well.

In [23]:
#Read in mapping
st_map=pd.read_csv('https://raw.githubusercontent.com/jasonong/List-of-US-States/master/states.csv')

#Convert to dict
st_dict=dict(zip(st_map['State'],st_map['Abbreviation']))

#Include DC
st_dict.update({'District of Columbia':'DC',
                'DC':'DC'})

#Generate new state variable
pov['st']=pov['state'].map(st_dict)

#Set year and state to index
if 'year' in pov.columns:
    pov.set_index(['state','year'],inplace=True)
    
#Sort index
pov.sort_index(level=0,inplace=True)

pov.head(51)

state,year,pop,num_pov,st_err_num,pct_pov,st_err_pct,st
Alabama,1980,3831.0,810.0,80.0,21.2,1.9,AL
Alabama,1981,3878.0,935.0,86.0,24.1,1.9,AL
Alabama,1982,3937.0,849.0,82.0,21.6,1.8,AL
Alabama,1983,3950.0,909.0,89.0,23.0,1.98,AL
Alabama,1984,3875.0,738.0,76.0,19.1,1.8,AL
Alabama,1985,3981.0,821.0,82.0,20.6,2.1,AL
Alabama,1986,4025.0,959.0,87.0,23.8,2.2,AL
Alabama,1987,3989.0,851.0,86.0,21.3,2.2,AL
Alabama,1988,4015.0,775.0,83.0,19.3,2.1,AL
Alabama,1989,4074.0,770.0,83.0,18.9,2.0,AL
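Because `Series.map` returns `NaN` for any key missing from the dictionary, it may be worth checking for unmapped state names after a step like this.  A minimal sketch, using a small stand-in dictionary and a deliberately misspelled name (both hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the state-to-abbreviation dictionary
st_dict = {'Alabama': 'AL', 'Alaska': 'AK',
           'District of Columbia': 'DC', 'DC': 'DC'}

# 'D.C.' is a deliberately unmatched spelling
states = pd.Series(['Alabama', 'Alaska', 'DC', 'D.C.'], name='state')
st = states.map(st_dict)

# Names absent from the dictionary map to NaN; list them for inspection
unmapped = states[st.isnull()].unique().tolist()
```

An empty `unmapped` list would indicate that every state name found an abbreviation.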


Ok, now we can write this to disk.

In [24]:
pov.to_csv('pov_by_state_1980_2014.csv')
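Since the frame is written with a two-level index, anyone reading the file back will probably want to restore that index.  A minimal round-trip sketch, using an in-memory buffer in place of the real file and two rows taken from the output above:

```python
import io
import pandas as pd

# Two rows from the table above, indexed the same way as `pov`
toy = pd.DataFrame({
    'state': ['Alabama', 'Alabama'],
    'year': [1980, 1981],
    'pct_pov': [21.2, 24.1],
}).set_index(['state', 'year'])

# Write to an in-memory buffer (stand-in for the CSV on disk)
buf = io.StringIO()
toy.to_csv(buf)
buf.seek(0)

# Restore the (state, year) index when reading back
restored = pd.read_csv(buf, index_col=['state', 'year'])
```

Without `index_col`, `state` and `year` would come back as ordinary columns.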

In [26]:
# print(pov.to_string())