# Historical Poverty Data by State

This is just a quick notebook to pad the years in an otherwise manually processed Excel table. The table contains total and impoverished population figures for each state over the 1980-2014 period. The main reason I am using a notebook at all is to record where I found the data, along with the relevant [footnotes](http://www.census.gov/hhes/www/poverty/histpov/footnotes.html).

In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

These data come from [Table 21 of the Census Historical Poverty Tables](https://www.census.gov/hhes/www/poverty/data/historical/people.html). Note that the table reports two sets of estimates for 2013. The processed CSV file read in here retains the set consistent with the 2013 CPS ASEC; the other set, which reflects the redesigned income questions, was dropped. The reason for this decision is that the current analysis window for the Medicaid imputation project extends from 1980 to 2011. We may be able to squeeze out a couple more years, but the game changes dramatically in 2014 (due to the Affordable Care Act). Consequently, consistency won out. The relevant footnotes are:

    18. Data are based on the CPS ASEC sample of 68,000 addresses. The 2014 CPS ASEC included redesigned questions for income and health insurance coverage. All of the approximately 98,000 addresses were eligible to receive the redesigned set of health insurance coverage questions. The redesigned income questions were implemented to a subsample of these 98,000 addresses using a probability split panel design. Approximately 68,000 addresses were eligible to receive a set of income questions similar to those used in the 2013 CPS ASEC and the remaining 30,000 addresses were eligible to receive the redesigned income questions. The source of the 2013 data for this table is the portion of the CPS ASEC sample which received the income questions consistent with the 2013 CPS ASEC, approximately 68,000 addresses.

    19. The 2014 CPS ASEC included redesigned questions for income and health insurance coverage. All of the approximately 98,000 addresses were eligible to receive the redesigned set of health insurance coverage questions. The redesigned income questions were implemented to a subsample of these 98,000 addresses using a probability split panel design. Approximately 68,000 addresses were eligible to receive a set of income questions similar to those used in the 2013 CPS ASEC and the remaining 30,000 addresses were eligible to receive the redesigned income questions. The source of data for this table is the portion of the CPS ASEC sample which received the redesigned income questions, approximately 30,000 addresses.
    
The remaining applicable footnotes relate to the unavoidable changes that usually occur in decennial census years.
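Keeping only one of the two 2013 sets matters downstream because a second set would produce duplicate state-year pairs. A minimal check, sketched on a hypothetical toy frame that uses the processed file's `state` and `year` column names (the numbers are placeholders, not Census values): `duplicated(keep=False)` flags every row of a repeated pair, `drop_duplicates` keeps one, and `between` trims the panel to the 1980-2011 analysis window.

```python
import pandas as pd

# Hypothetical toy panel; values are placeholders, not Census figures.
# The second Alabama 2013 row mimics keeping both 2013 sets.
toy = pd.DataFrame({
    'state': ['Alabama', 'Alabama', 'Alabama', 'Alaska', 'Alaska'],
    'year': [2011, 2013, 2013, 2011, 2013],
    'pct_pov': [10.0, 11.0, 12.0, 9.0, 8.0],
})

# Every row belonging to a (state, year) pair that occurs more than once
dupes = toy[toy.duplicated(subset=['state', 'year'], keep=False)]

# Keep the first occurrence of each pair. In the real file the choice of
# which 2013 set to keep was made by hand; 'first' is only a stand-in.
deduped = toy.drop_duplicates(subset=['state', 'year'], keep='first')

# Restrict to the Medicaid imputation analysis window (inclusive bounds)
window = deduped[deduped['year'].between(1980, 2011)]
```

In the real data the same `duplicated` call on the processed frame should come back all `False`.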

In [4]:
#Read in data
pov=pd.read_csv('pov_by_state_input_1980_2014.csv',skiprows=1).rename(columns={'Year':'year'})

#Drop rows without data (state will be missing)
pov=pov[pov['state'].notnull()]

#Replace DC abbreviations with the full name
pov=pov.replace('DC','District of Columbia')
pov=pov.replace('D.C.','District of Columbia')

#Pad years: forward-fill carries each year down to the rows below it
#(note that this fills every column, not just year)
pov=pov.ffill()

#info() prints its own summary and returns None, so no print is needed
pov.info()

pov.head(10)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1785 entries, 0 to 1784
Data columns (total 7 columns):
state         1785 non-null object
pop           1785 non-null float64
num_pov       1785 non-null float64
st_err_num    1785 non-null float64
pct_pov       1785 non-null float64
st_err_pct    1785 non-null float64
year          1785 non-null float64
dtypes: float64(6), object(1)
memory usage: 111.6+ KB


                  state    pop  num_pov  st_err_num  pct_pov  st_err_pct  year
0               Alabama   4765      848          53   17.806       1.105  2014
1                Alaska    694       82           8   11.853        1.17  2014
2               Arizona   6657     1409          76   21.164       1.147  2014
3              Arkansas   2891      532          44   18.402       1.533  2014
4            California  38666     6112         218   15.807       0.563  2014
5              Colorado   5376      661          72   12.297       1.347  2014
6           Connecticut   3577      308          44    8.605       1.216  2014
7              Delaware    929      103          10   11.043       1.033  2014
8  District of Columbia    657      125           9   19.024       1.328  2014
9               Florida  19694     3282         150   16.667       0.761  2014
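The frame-wide `ffill` works as a year padder because, in the manually processed table, each year label appears only on the first row of its block. That layout is my reading of what "pad the years" refers to (the raw file is not shown here), so treat it as an assumption. The toy example below shows the mechanics and one side effect worth knowing about: `DataFrame.ffill()` fills every column, so a genuinely missing estimate would silently take the previous state's value. It also explains why `year` shows up as `float64` in the `info()` output: a column holding NaN cannot use a plain integer dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: each year appears once, on the first state row of
# its block, and is blank (NaN) below. Values are placeholders.
raw = pd.DataFrame({
    'state': ['Alabama', 'Alaska', 'Arizona', 'Alabama', 'Alaska', 'Arizona'],
    'pct_pov': [10.0, 20.0, np.nan, 30.0, 40.0, 50.0],
    'year': [2014, np.nan, np.nan, 2013, np.nan, np.nan],
})

# Frame-wide forward fill (what the cell above does): years are padded,
# but the missing Arizona estimate also inherits Alaska's 20.0.
all_filled = raw.ffill()

# Year-only forward fill: pads the years and leaves other gaps as NaN.
# Once no NaN remains, year can be cast to a plain integer dtype.
year_filled = raw.copy()
year_filled['year'] = year_filled['year'].ffill().astype(int)
```

Filling only `year` keeps real gaps in the estimates visible, which seems safer if the input ever has holes.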


In [5]:
sorted(set(pov['state']))

['Alabama',
 'Alaska',
 'Arizona',
 'Arkansas',
 'California',
 'Colorado',
 'Connecticut',
 'Delaware',
 'District of Columbia',
 'Florida',
 'Georgia',
 'Hawaii',
 'Idaho',
 'Illinois',
 'Indiana',
 'Iowa',
 'Kansas',
 'Kentucky',
 'Louisiana',
 'Maine',
 'Maryland',
 'Massachusetts',
 'Michigan',
 'Minnesota',
 'Mississippi',
 'Missouri',
 'Montana',
 'Nebraska',
 'Nevada',
 'New Hampshire',
 'New Jersey',
 'New Mexico',
 'New York',
 'North Carolina',
 'North Dakota',
 'Ohio',
 'Oklahoma',
 'Oregon',
 'Pennsylvania',
 'Rhode Island',
 'South Carolina',
 'South Dakota',
 'Tennessee',
 'Texas',
 'Utah',
 'Vermont',
 'Virginia',
 'Washington',
 'West Virginia',
 'Wisconsin',
 'Wyoming']
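The list has 51 entries (the 50 states plus the District of Columbia), and 1980-2014 spans 35 years, so a balanced panel should have 51 × 35 = 1,785 rows, which matches the `info()` output above. A small helper makes that check explicit per state rather than only in total; `is_balanced` below is a hypothetical name, not a pandas function, and it runs here on a toy frame.

```python
import pandas as pd

def is_balanced(df, first_year, last_year):
    """Hypothetical helper: True if every state has exactly one row for
    each year in [first_year, last_year], inclusive."""
    expected = set(range(first_year, last_year + 1))
    for _, years in df.groupby('state')['year']:
        # A duplicated year changes the length; a gap changes the set
        if len(years) != len(expected) or set(years) != expected:
            return False
    return True

# Toy panel: two states, three years each
toy = pd.DataFrame({
    'state': ['Alabama'] * 3 + ['Alaska'] * 3,
    'year': [1980, 1981, 1982] * 2,
})

print(is_balanced(toy, 1980, 1982))             # True
print(is_balanced(toy.iloc[:-1], 1980, 1982))   # False: Alaska 1982 dropped
```

On the real frame the call would be `is_balanced(pov, 1980, 2014)` before the index is set; float years such as `2014.0` compare equal to the integers in `range`.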

Let's map in postal abbreviations as well.

In [8]:
#Read in mapping
st_map=pd.read_csv('https://raw.githubusercontent.com/jasonong/List-of-US-States/master/states.csv')

#Convert to dict
st_dict=dict(zip(st_map['State'],st_map['Abbreviation']))

#Include DC
st_dict.update({'District of Columbia':'DC',
                'DC':'DC'})

#Generate new state variable
if 'state' in pov.columns:
    pov['st']=pov['state'].map(st_dict)

#Set year and state to index
if 'year' in pov.columns:
    pov.set_index(['state','year'],inplace=True)
    
#Sort index
#(sort_index replaces sortlevel, which was removed from newer pandas)
pov.sort_index(level=0,inplace=True)

print(pov.to_string())

                                    pop      num_pov  st_err_num    pct_pov  st_err_pct  st
state                year                                                                  
Alabama              1980   3831.000000   810.000000   80.000000  21.200000    1.900000  AL
                     1981   3878.000000   935.000000   86.000000  24.100000    1.900000  AL
                     1982   3937.000000   849.000000   82.000000  21.600000    1.800000  AL
                     1983   3950.000000   909.000000   89.000000  23.000000    1.980000  AL
                     1984   3875.000000   738.000000   76.000000  19.100000    1.800000  AL
                     1985   3981.000000   821.000000   82.000000  20.600000    2.100000  AL
                     1986   4025.000000   959.000000   87.000000  23.800000    2.200000  AL
                     1987   3989.000000   851.000000   86.000000  21.300000    2.200000  AL
                     1988   4015.000000   775.000000   83.000000  19.300000    2
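`Series.map` with a dict returns NaN for any name the dict lacks, so a state spelling that slipped past the two DC replacements would show up as a missing `st` value rather than raising an error. Note that `DataFrame.replace('D.C.', ...)` only matches cells whose entire value is `'D.C.'`, so a variant such as `'Washington, D.C.'` (a hypothetical example) would not be caught. The sketch below uses a small stand-in for `st_dict` rather than the GitHub CSV, and sorts with `sort_index(level=0)`.

```python
import pandas as pd

# Stand-in for st_dict; the notebook builds the real one from states.csv.
st_dict = {'Alabama': 'AL', 'Alaska': 'AK', 'District of Columbia': 'DC'}

# Hypothetical toy rows, including one spelling the dict does not know
toy = pd.DataFrame({
    'state': ['Alaska', 'Alabama', 'Washington, D.C.'],
    'year': [1980.0, 1980.0, 1980.0],
})

# Names absent from the dict map to NaN instead of raising
toy['st'] = toy['state'].map(st_dict)
unmapped = sorted(toy.loc[toy['st'].isna(), 'state'].unique())

# Same (state, year) MultiIndex as the notebook, sorted by state then year
indexed = toy.set_index(['state', 'year']).sort_index(level=0)
```

An empty `unmapped` list on the real frame would confirm that all 51 names found an abbreviation.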

Ok, now we can write this to disk.

In [9]:
pov.to_csv('pov_by_state_1980_2014.csv')
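Because `state` and `year` are in the index, `to_csv` writes them as the first two columns, and reading the file back with `index_col` restores the MultiIndex. This sketch does the round trip through an in-memory `io.StringIO` buffer instead of `pov_by_state_1980_2014.csv`, using a toy frame with placeholder values.

```python
import io
import pandas as pd

# Toy frame indexed like pov; numbers are placeholders
toy = pd.DataFrame({
    'state': ['Alabama', 'Alabama', 'Alaska'],
    'year': [1980.0, 1981.0, 1980.0],
    'pct_pov': [1.0, 2.0, 3.0],
    'st': ['AL', 'AL', 'AK'],
}).set_index(['state', 'year'])

buf = io.StringIO()
toy.to_csv(buf)   # index levels are written as the leading columns
buf.seek(0)       # rewind before reading

restored = pd.read_csv(buf, index_col=['state', 'year'])
```

Since `year` is `float64` here, it is written as `1980.0`; casting it with `astype(int)` before saving would give cleaner year labels, assuming no years are missing.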
