This notebook builds the data needed for descriptive statistics on the debt data; these statistics will carry the narrative of the story. To be useful, the debt figures need to be deflated and then expressed in per capita terms. They will likely also need to be aggregated at the state, region, and national levels, with debt separated into general obligation (GO) and revenue (RV) bonds.

In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import glob
import requests

# Import State and Local Price deflators

State and local price deflators were downloaded [here](https://research.stlouisfed.org/fred2/series/A829RD3A086NBEA#)

In [2]:
# header=10: the column names are on row 10 (0-based); the rows above are FRED notes
defl = pd.read_excel('../data/state_local_deflators.xls', header=10)

In [3]:
defl.rename(columns={'A829RD3A086NBEA': 'SLDefl'}, inplace=True)

In [4]:
defl.head()

Unnamed: 0,observation_date,SLDefl
0,1929-01-01,3.968
1,1930-01-01,3.862
2,1931-01-01,3.632
3,1932-01-01,3.258
4,1933-01-01,3.372


In [5]:
defl['yr'] = defl.observation_date.map(lambda x: x.strftime('%Y'))

In [6]:
defl.drop('observation_date', inplace=True, axis = 1)

Now I need to convert the price index into a multiplier. Since 2009 is the base year (deflator = 100), I need the 2009 value.

In [7]:
defl.loc[defl['yr']=='2009']

Unnamed: 0,SLDefl,yr
80,100,2009



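The multiplier is Base/Comparison: the 2009 deflator (100) divided by the deflator for each year. Multiplying a nominal dollar amount by this multiplier expresses it in 2009 dollars.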

In [8]:
defl['index'] = 100/defl['SLDefl']

In [9]:
defl.head()

Unnamed: 0,SLDefl,yr,index
0,3.968,1929,25.201613
1,3.862,1930,25.89332
2,3.632,1931,27.53304
3,3.258,1932,30.693677
4,3.372,1933,29.655991
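A minimal sketch of the Base/Comparison multiplier on a few deflator values copied from the output above (1929, 1930, and the 2009 base year). The nominal amount of 10 per year is a hypothetical value used only to show how the multiplier restates dollars in 2009 terms.

```python
import pandas as pd

# Deflator values taken from the FRED output above (2009 = 100).
defl = pd.DataFrame({'yr': ['1929', '1930', '2009'],
                     'SLDefl': [3.968, 3.862, 100.0]})

# Multiplier = base-year deflator / deflator for that year.
defl['index'] = 100 / defl['SLDefl']

# Hypothetical nominal amount of 10 in each year, restated in 2009 dollars.
nominal = pd.Series([10.0, 10.0, 10.0])
real_2009 = nominal * defl['index']

print(defl)
print(real_2009.round(2).tolist())  # [252.02, 258.93, 10.0]
```

In the base year the multiplier is exactly 1, so 2009 amounts are unchanged; earlier years, with lower deflators, are scaled up.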


Now I need to import the personal income and population data (BEA table SA1, 1929 to 2014) from [here](http://www.bea.gov/regional/downloadzip.cfm)

In [10]:
inc = pd.read_csv('../data/SA1_1929_2014.csv')

In [11]:
inc['GeoFIPS'] = inc['GeoFIPS'].apply(lambda x: x.zfill(5))

inc['ST']= inc['GeoFIPS'].str.extract('(..)')

In [12]:
inc.head()

Unnamed: 0,GeoFIPS,GeoName,Region,Table,LineCode,IndustryClassification,Description,1929,1930,1931,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,ST
0,0,United States,,SA1,1,...,Personal income (thousands of dollars),85126000,76371000,65507000,...,11381350000,11995419000,12492705000,12079444000,12459613000,13233436000,13904485000,14064468000,14683147000,0
1,0,United States,,SA1,2,...,Population (persons) 1/,121769000,123075000,124038000,...,298379912,301231207,304093966,306771529,309347057,311721632,314112078,316497531,318857056,0
2,0,United States,,SA1,3,...,Per capita personal income (dollars) 2/,699,621,528,...,38144,39821,41082,39376,40277,42453,44266,44438,46049,0
3,1000,Alabama,5.0,SA1,1,...,Personal income (thousands of dollars),842894,697154,583342,...,146661249,153787754,159993535,157141435,163066901,169030399,173601429,174876574,181908767,1
4,1000,Alabama,5.0,SA1,2,...,Population (persons) 1/,2644000,2647000,2649000,...,4628981,4672840,4718206,4757938,4785822,4801695,4817484,4833996,4849377,1


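The `LineCode` column identifies the series in each row:

1 - Personal income (thousands of dollars)
2 - Population (persons)
3 - Per capita personal income (dollars)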
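Further down, `inc` is split into three frames by filtering on each `LineCode` in turn. A minimal alternative sketch, assuming the same column layout, builds all three at once with `groupby`. The `names` mapping and the toy values are illustrative only, not part of the original notebook or BEA data.

```python
import pandas as pd

# Toy frame shaped like the SA1 file: one row per (GeoFIPS, LineCode),
# one column per year. Values are illustrative only.
inc = pd.DataFrame({
    'GeoFIPS': ['01000', '01000', '01000'],
    'LineCode': [1, 2, 3],
    '1984': [100.0, 4.0, 25.0],
})

# Hypothetical names for the three LineCode values.
names = {1: 'personal_income', 2: 'population', 3: 'per_capita_income'}

# One independent DataFrame per series; .copy() means later in-place
# changes act on a real frame rather than a slice of inc.
series = {names[code]: grp.drop(columns='LineCode').copy()
          for code, grp in inc.groupby('LineCode')}

print(sorted(series))
```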

In [13]:
droplist = (['Table','GeoName','Description','Region','IndustryClassification'])

inc.drop(droplist, inplace = True, axis = 1)

The next two cells drop the footnote rows at the bottom of the file (rows 180 to 183) and the years before 1984, where the debt data begin.

In [14]:
inc = inc.drop(inc.index[[180,181,182,183]])

In [15]:
# Drop the 1929-1983 year columns; the analysis starts in 1984
inc.drop([str(y) for y in range(1929, 1984)], axis = 1, inplace = True)

In [16]:
perinc = inc[inc['LineCode']==1]
perpop = inc[inc['LineCode']==2]
perpopinc = inc[inc['LineCode']==3]


In [17]:
# drop() without inplace returns a new frame, which avoids the
# SettingWithCopyWarning raised when a slice of inc is modified in place
perinc = perinc.drop(['LineCode', 'GeoFIPS'], axis = 1)
perpop = perpop.drop(['LineCode', 'GeoFIPS'], axis = 1)
perpopinc = perpopinc.drop(['LineCode', 'GeoFIPS'], axis = 1)


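Each of these frames is wide, with one column per year. To match the debt data, which have one row per FIPS code and year, I need to reshape them so that each state-year is a row.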

In [18]:
perpop.head()

Unnamed: 0,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,ST
1,235824907,237923734,240132831,242288936,244499004,246819222,249622814,252980941,256514224,259918588,...,298379912,301231207,304093966,306771529,309347057,311721632,314112078,316497531,318857056,0
4,3951824,3972520,3991569,4015262,4023842,4030219,4050055,4099156,4154014,4214202,...,4628981,4672840,4718206,4757938,4785822,4801695,4817484,4833996,4849377,1
7,513704,532496,544269,539310,541984,547160,553290,570193,588736,599432,...,675302,680300,687455,698895,713856,722572,731081,737259,736732,2
10,3067134,3183539,3308261,3437103,3535183,3622184,3684097,3788576,3915740,4065440,...,6029141,6167681,6280362,6343154,6411999,6472867,6556236,6634997,6731484,4
13,2319767,2327046,2331988,2342357,2342655,2346354,2356586,2383144,2415984,2456303,...,2821761,2848650,2874554,2896843,2922297,2938430,2949300,2958765,2966369,5


In [19]:
perinc = perinc.set_index(['ST'])

In [20]:
t = perinc.stack()
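`set_index('ST').stack()` produces a Series indexed by (state, year). A minimal sketch of the same wide-to-long reshape using `DataFrame.melt`, which keeps `ST` and the year as ordinary named columns and so is easier to merge on later. The frame is a toy stand-in shaped like `perpop` after the drops; the numbers are made up.

```python
import pandas as pd

# Toy wide frame shaped like perpop: one row per state, one column per
# year (column names are strings, as read from the CSV), plus ST.
perpop = pd.DataFrame({
    'ST': ['01', '02'],
    '1984': [100, 50],
    '1985': [110, 55],
})

# melt turns each (state, year) cell into its own row.
pop_long = perpop.melt(id_vars='ST', var_name='Year', value_name='pop')
print(pop_long)
```

Because the year comes from the column names, `Year` stays a string, which matches the string `Year` used in the debt data.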

In [21]:
df = pd.read_csv('../data/debt_out.csv')

This is the list of variables (largely costats) that are not required for this analysis.

In [22]:
droplist =(['FIPSST','POP_TH18','POP_OV65','TOT_EMP','MFG_EMP','RETL_EMP','T_PUB_SCH',\
            'PUB_SCHL','PVT_SCHL','HSLD_PERS','HSG_UNITS','CH_HS_UNT','PRE_1940','VACANT',\
            'MDHOMEVAL','MED_INC','PC_INC','LANDAREA','GEN_REV','IGR_ST','TAX_REV','PT_REV',\
            'D_GEN_EXP','PC_GEN_EXP','RES_POP','PERS_POVT','SS_PERS','SS_PMT','POVT_PCT','FIPSCO',\
            'ST_MED_H_V','ST_MED_INC','ST_PC_INC','ST_L_GREV','ST_IGR_ST','ST_L_TAX','ST_PT_REV',\
            'ST_L_EXP','ST_P_LEXP','FIPST_N','RATE_L','RATE_L2','MRATE_L','CRATE_L','SRATE_L',\
            'SRATE_L2','MRATE_L2','CRATE_L2','MLEVY_L','CLEVY_L','CLEVY_L2','MLEVY_L2','MLEVY_L3',\
            'SLEVY_L','CLEVY_L3','CLEVY_L4','MLEVY_L4','SLEVY_L2','SLEVY_L3','SLEVY_L4','ASMT_L',\
            'ASMT_L2','ASMT_L3','CREVU_L','MREVU_L','SREVU_L','CGEXP_L','MGEXP_L','SGEXP_L','SGEXP_L2',\
            'MGEXP_L2','CFDISC_L','MFDISC_L','SFDISC_L','HOME_STEAD','HOME_STEAD2','HOME_STEAD3','CB_E',\
            'CB_G','CB_G2','CB_E2','CB_E3','CB_E4','FFDISC_L','TYPE1','TYPE2','BOTH','LIMITS',\
            'TYPE2_Y','SPC_RATE','LEVY_L','REVU_L','GEXP_L','GP_RATE','GP_LEVY','GP_REVU','GP_GEXP',\
            'GP_LMT','SC_LMT','TREND','FOOD_SERV_EMP_PNFARM','PRV_SCHL_KIND','OTH_SERV_EMP_PNFARM',\
            'PRV_SCHL_9_12','MANU_EMP_PNFARM','PUB_SCHL_OV3_M','PRV_SCHL_9_12_F','PRV_SCHL_ELEM_HS',\
            'PUB_SCHL_ELEM_HS','RES_POP1','PUB_SCHL_OV3_F','PRV_SCHL_PREK_F','RETL_EMP_PNFARM',\
            'PROF_SERV_EMP_PNFARM','PRV_SCHL_KIND_M','PRV_SCHL_PREK_M','SUPP_SERV_EMP_PNFARM',\
            'POV_EST_FAM_NUMER','EDUC_SERV_EMP_PNFARM','PRV_SCHL_PREK','PRV_SCHL_KIND_F',\
            'PRV_SCHL_1_8','PRV_SCHL_5_8_M','PUB_SCHL_TOT','PRV_SCHL_1_4_M','HSG_UNITS_ACS',\
            'TOT_AREA','POV_EST_FAM_DENOM','PUB_SCHL_OV3','RES_POP2','PRV_SCHL_9_12_M',\
            'PRV_SCHL_1_4_F','PRV_SCHL_5_8_F','TOT_EMP_PNFARM','RESPOP','DENSITY','POPGROWTH',\
            'PYOUNG','POP65','RESPOP2','PRE1940','PVT_SCH_02','PVT_SCH03_','POVERTY','PC_SSI',\
            'DIVERSITY','EMP_RES','MANU_RES','RETL_RES','SERV_RES'])

In [26]:
# Left commented out, so these columns stay in df and are summed into state_agg below
# df.drop(droplist, inplace = True, axis = 1)

In [27]:
df['FIPS']=df.FIPS.astype('str')
df['Year']=df.Year.astype('str')

In [28]:
df['FIPS'] = df['FIPS'].apply(lambda x: x.zfill(5))

This establishes the list of dollar-denominated columns that need to be deflated.

In [30]:
dollar_list =(['GO','RV','GO_City, Town Vlg','GO_Co-op Utility',\
               'GO_College or Univ','GO_County/Parish','GO_Direct Issuer',\
               'GO_District','GO_Indian Tribe','GO_Local Authority',\
               'GO_State Authority','GO_State/Province','RV_City, Town Vlg',\
               'RV_Co-op Utility','RV_College or Univ','RV_County/Parish',\
               'RV_Direct Issuer','RV_District','RV_Indian Tribe','RV_Local Authority',\
               'RV_State Authority','RV_State/Province','GO_Development','GO_Education',\
               'GO_Electric Power','GO_Environmental Facilities','GO_General Purpose',\
               'GO_Healthcare','GO_Housing','GO_Public Facilities','GO_Transportation',\
               'GO_Utilities','RV_Development','RV_Education','RV_Electric Power',\
               'RV_Environmental Facilities','RV_General Purpose','RV_Healthcare',\
               'RV_Housing','RV_Public Facilities','RV_Transportation','RV_Utilities'])

In [31]:
defl.drop('SLDefl', inplace = True, axis = 1)

In [32]:
df = pd.merge(defl, df, left_on = 'yr', right_on = 'Year', how = 'right')

This converts every dollar column to constant 2009 dollars by multiplying it by that year's multiplier.

In [33]:
for i in dollar_list:
    df[i] = df[i] * df['index']
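Because the merge uses `how='right'`, a debt year with no matching deflator keeps its row but gets a NaN multiplier, and the multiplication then turns that row's dollar values into NaN without any error. A minimal sketch on toy data of a guard for that case, plus a vectorized alternative to the column-by-column loop using `DataFrame.mul(..., axis=0)`. The values are hypothetical.

```python
import pandas as pd

# Toy inputs: the multiplier table and two county-year debt rows.
defl = pd.DataFrame({'yr': ['1984', '1985'], 'index': [2.0, 1.5]})
df = pd.DataFrame({'Year': ['1984', '1985'], 'FIPS': ['01001', '01003'],
                   'GO': [10.0, 4.0], 'RV': [1.0, 2.0]})
dollar_list = ['GO', 'RV']

# Both keys are strings; a str/int mismatch would leave 'index' all NaN.
df = pd.merge(defl, df, left_on='yr', right_on='Year', how='right')
assert df['index'].notna().all(), 'some debt years have no deflator'

# Scale every dollar column by the row's multiplier in one step.
df[dollar_list] = df[dollar_list].mul(df['index'], axis=0)
print(df[['Year', 'GO', 'RV']])
```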

The first two digits of the five-digit FIPS code identify the state, so I break them out into a separate `ST` column for aggregation.

In [34]:
df['ST']= df['FIPS'].str.extract('(..)')
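A minimal sketch of the pad-then-slice step on a few toy FIPS codes, assuming they arrive as integers with the leading zero lost. Slicing with `.str[:2]` is a regex-free way to take the state prefix and always returns a Series, whereas `str.extract` with one group returns a one-column DataFrame by default in current pandas.

```python
import pandas as pd

# Toy county FIPS codes read as integers (leading zero lost).
fips = pd.Series([1001, 6037, 48201])

# Pad back to five characters, then keep the two-digit state prefix.
fips5 = fips.astype(str).str.zfill(5)
st = fips5.str[:2]

print(fips5.tolist())  # ['01001', '06037', '48201']
print(st.tolist())     # ['01', '06', '48']
```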

In [35]:
df.drop(['yr','index'], inplace = True, axis = 1)

In [36]:
state_agg = df.groupby(['Year', 'ST']).sum()

In [37]:
state_agg.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,GO,RV,"GO_City, Town Vlg",GO_Co-op Utility,GO_College or Univ,GO_County/Parish,GO_Direct Issuer,GO_District,GO_Indian Tribe,GO_Local Authority,...,LEVY_L,REVU_L,GEXP_L,GP_RATE,GP_LEVY,GP_REVU,GP_GEXP,GP_LMT,SC_LMT,TREND
Year,ST,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
1984,1,371.058034,3904.337431,202.035888,0,0,162.98452,0,6.037626,0,0.0,...,0,0,0,0,0,0,0,0,0,540
1984,2,79.769121,675.042867,40.246818,0,0,0.0,0,0.0,0,0.0,...,2,0,0,2,2,0,0,2,0,30
1984,4,972.224503,2282.55609,293.945468,0,0,123.39701,0,554.882025,0,0.0,...,13,0,13,0,13,0,13,13,13,195
1984,5,35.027894,394.870433,7.245152,0,0,0.0,0,27.782742,0,0.0,...,0,0,0,0,0,0,0,0,0,345
1984,6,8639.488492,17662.146497,747.347067,0,0,1991.689811,0,1328.548313,0,780.18934,...,0,0,41,0,0,0,41,41,41,615


At this point, personal income, population, and per capita personal income are in three separate dataframes: `perinc`, `perpop`, and `perpopinc`.

The deflated debt data are in the dataframe `df`.

Next, I need to aggregate the debt data at each level of analysis and divide by population to put it in per capita terms. The state-level aggregate is already in `state_agg`.
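`state_agg` sums every numeric column, including non-dollar fields such as `TREND` and the tax-limit indicators, whose sums are not meaningful. A minimal sketch, on toy data, of aggregating only the dollar columns at the state level and then rolling the state totals up to the national level. The frame and values are hypothetical; `dollar_list` is shortened to two columns.

```python
import pandas as pd

# Toy county-year rows after deflation; values are illustrative.
df = pd.DataFrame({
    'Year': ['1984', '1984', '1984', '1985'],
    'ST':   ['01', '01', '02', '01'],
    'GO':   [1.0, 2.0, 5.0, 3.0],
    'RV':   [4.0, 0.0, 1.0, 2.0],
    'TREND': [15, 15, 15, 16],  # non-dollar column that should not be summed
})
dollar_list = ['GO', 'RV']

# State level: sum only the dollar columns within each Year/ST cell.
state_agg = df.groupby(['Year', 'ST'])[dollar_list].sum()

# National level: sum the state totals within each year.
nat_agg = state_agg.groupby(level='Year').sum()

print(state_agg)
print(nat_agg)
```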



In [40]:
perinc.head()

Unnamed: 0_level_0,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,...,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014
ST,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,3268535000,3501927000,3712243000,3940859000,4260753000,4603969000,4890453000,5055766000,5402109000,5639780000,...,10610320000,11381350000,11995419000,12492705000,12079444000,12459613000,13233436000,13904485000,14064468000,14683147000
1,42649118,45942335,48627446,51486400,55388090,60312581,64070364,67662101,72847428,76199461,...,138019022,146661249,153787754,159993535,157141435,163066901,169030399,173601429,174876574,181908767
2,9961459,10772149,10868683,10430514,10875485,11938954,12650093,13199663,14093288,14837766,...,25691736,27380164,29516606,32496796,32283659,34103385,36527487,38213009,37791031,39792685
4,39699293,44054015,48294793,52226134,56265019,60486975,63383028,66575466,70987974,76430689,...,189357627,210103296,221596785,226573627,216064765,219195775,230920326,241192186,243656863,255092928
5,24474975,26082631,27399821,28476005,30546100,32776869,34487482,36434402,39744696,41722523,...,77635263,83182071,88819878,93232570,91625136,93486029,99791639,107032727,108080656,112076107


In [41]:
perpop.head()

Unnamed: 0,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,ST
1,235824907,237923734,240132831,242288936,244499004,246819222,249622814,252980941,256514224,259918588,...,298379912,301231207,304093966,306771529,309347057,311721632,314112078,316497531,318857056,0
4,3951824,3972520,3991569,4015262,4023842,4030219,4050055,4099156,4154014,4214202,...,4628981,4672840,4718206,4757938,4785822,4801695,4817484,4833996,4849377,1
7,513704,532496,544269,539310,541984,547160,553290,570193,588736,599432,...,675302,680300,687455,698895,713856,722572,731081,737259,736732,2
10,3067134,3183539,3308261,3437103,3535183,3622184,3684097,3788576,3915740,4065440,...,6029141,6167681,6280362,6343154,6411999,6472867,6556236,6634997,6731484,4
13,2319767,2327046,2331988,2342357,2342655,2346354,2356586,2383144,2415984,2456303,...,2821761,2848650,2874554,2896843,2922297,2938430,2949300,2958765,2966369,5


In [42]:
perpopinc.head()

Unnamed: 0,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,ST
2,13860,14719,15459,16265,17426,18653,19591,19985,21060,21698,...,38144,39821,41082,39376,40277,42453,44266,44438,46049,0
5,10792,11565,12183,12823,13765,14965,15820,16506,17537,18082,...,31683,32911,33910,33027,34073,35202,36036,36176,37512,1
8,19391,20230,19969,19340,20066,21820,22863,23149,23938,24753,...,40545,43388,47271,46192,47773,50552,52269,51259,54012,2
11,12943,13838,14598,15195,15916,16699,17204,17573,18129,18800,...,34848,35929,36077,34063,34185,35675,36788,36723,37895,4
14,10551,11208,11750,12157,13039,13969,14635,15288,16451,16986,...,29479,31180,32434,31629,31991,33961,36291,36529,37782,5


In [43]:
df.head()

Unnamed: 0,Year,FIPS,GO,RV,"GO_City, Town Vlg",GO_Co-op Utility,GO_College or Univ,GO_County/Parish,GO_Direct Issuer,GO_District,...,REVU_L,GEXP_L,GP_RATE,GP_LEVY,GP_REVU,GP_GEXP,GP_LMT,SC_LMT,TREND,ST
0,1984,1000,0.0,1927.094453,0,0,0,0.0,0,0,...,0,0,0,0,0,0,0,0,15,1
1,1984,1001,0.0,3.924457,0,0,0,0.0,0,0,...,0,0,0,0,0,0,0,0,15,1
2,1984,1003,15.335571,56.34313,0,0,0,15.335571,0,0,...,0,0,0,0,0,0,0,0,15,1
3,1984,1007,0.0,0.96602,0,0,0,0.0,0,0,...,0,0,0,0,0,0,0,0,15,1
4,1984,1021,2.415051,3.441447,0,0,0,2.415051,0,0,...,0,0,0,0,0,0,0,0,15,1


In [44]:
state_agg.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,GO,RV,"GO_City, Town Vlg",GO_Co-op Utility,GO_College or Univ,GO_County/Parish,GO_Direct Issuer,GO_District,GO_Indian Tribe,GO_Local Authority,...,LEVY_L,REVU_L,GEXP_L,GP_RATE,GP_LEVY,GP_REVU,GP_GEXP,GP_LMT,SC_LMT,TREND
Year,ST,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
1984,1,371.058034,3904.337431,202.035888,0,0,162.98452,0,6.037626,0,0.0,...,0,0,0,0,0,0,0,0,0,540
1984,2,79.769121,675.042867,40.246818,0,0,0.0,0,0.0,0,0.0,...,2,0,0,2,2,0,0,2,0,30
1984,4,972.224503,2282.55609,293.945468,0,0,123.39701,0,554.882025,0,0.0,...,13,0,13,0,13,0,13,13,13,195
1984,5,35.027894,394.870433,7.245152,0,0,0.0,0,27.782742,0,0.0,...,0,0,0,0,0,0,0,0,0,345
1984,6,8639.488492,17662.146497,747.347067,0,0,1991.689811,0,1328.548313,0,780.18934,...,0,0,41,0,0,0,41,41,41,615


In [45]:
perinc.stack()

ST      
00  1984     3268535000
    1985     3501927000
    1986     3712243000
    1987     3940859000
    1988     4260753000
    1989     4603969000
    1990     4890453000
    1991     5055766000
    1992     5402109000
    1993     5639780000
    1994     5930316000
    1995     6275761000
    1996     6661697000
    1997     7075132000
    1998     7588703000
    1999     7988183000
    2000     8634847000
    2001     8987890000
    2002     9150761000
    2003     9484225000
    2004    10047876000
    2005    10610320000
    2006    11381350000
    2007    11995419000
    2008    12492705000
    2009    12079444000
    2010    12459613000
    2011    13233436000
    2012    13904485000
    2013    14064468000
               ...     
98  1985      586907512
    1986      628744627
    1987      674634044
    1988      733923275
    1989      796053804
    1990      857059080
    1991      892213369
    1992      946821359
    1993      977869636
    1994     1016044964
    199

# STUCK

I have been unable to merge the income and population data with the debt data.
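One likely obstacle is a shape and key mismatch: `state_agg` is indexed by (`Year`, `ST`), while `perpop` is wide with one column per year, and the stacked `perinc` is indexed in the opposite order (ST, year). A minimal sketch on toy data, assuming both sides use string keys as in the outputs above: reshape population to long form, merge on `Year` and `ST`, then divide. The same steps would apply to `perinc`. The unit of the debt figures is not stated here, so the per capita values carry whatever unit the debt columns use.

```python
import pandas as pd

# Toy stand-ins: state_agg indexed by (Year, ST) as in the notebook, and a
# wide population frame like perpop. All numbers are made up.
state_agg = pd.DataFrame(
    {'GO': [30.0, 10.0], 'RV': [60.0, 5.0]},
    index=pd.MultiIndex.from_tuples([('1984', '01'), ('1984', '02')],
                                    names=['Year', 'ST']))
perpop = pd.DataFrame({'ST': ['00', '01', '02'],
                       '1984': [300, 3, 1]})

# 1. Wide -> long so population has the same keys as the debt data.
pop_long = perpop.melt(id_vars='ST', var_name='Year', value_name='pop')

# 2. Merge on the shared keys; how='left' keeps every debt row, and the
#    US total row (ST '00') simply finds no match.
merged = state_agg.reset_index().merge(pop_long, on=['Year', 'ST'], how='left')

# 3. Divide each dollar column by population.
for col in ['GO', 'RV']:
    merged[col + '_pc'] = merged[col] / merged['pop']

print(merged)
```

If `Year` were an integer on one side and a string on the other, the merge would run without error but leave `pop` as NaN, so checking `merged['pop'].notna().all()` is a cheap sanity test.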

We will be using the BEA region definitions found [here](http://www.bea.gov/regional/docs/regions.cfm). This file was added to the data directory and labeled `region`.
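A minimal sketch of rolling state totals up to BEA regions, assuming a hypothetical lookup frame `region` with columns `ST` and `Region`; the real file's layout is not shown here. The SA1 income file also carried a `Region` column (dropped in In [13]) that appears to hold BEA region codes, e.g. 5.0 for Alabama, which could serve as the lookup if it matches the region file. The codes and debt values below are illustrative.

```python
import pandas as pd

# Hypothetical state -> BEA region lookup (columns assumed, not from the file).
region = pd.DataFrame({'ST': ['01', '02', '04'],
                       'Region': [5, 8, 6]})

# Toy state-level totals indexed like state_agg.
state_agg = pd.DataFrame(
    {'GO': [1.0, 2.0, 3.0]},
    index=pd.MultiIndex.from_tuples(
        [('1984', '01'), ('1984', '02'), ('1984', '04')],
        names=['Year', 'ST']))

with_region = state_agg.reset_index().merge(region, on='ST', how='left')

# States missing from the lookup would get NaN and be dropped by groupby.
assert with_region['Region'].notna().all(), 'some states have no region'

reg_agg = with_region.groupby(['Year', 'Region'])[['GO']].sum()
print(reg_agg)
```

The income file also appears to include region totals (the stacked output runs to ST code 98), which could supply regional population denominators for per capita figures.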