This notebook prepares the data needed to produce descriptive statistics of the debt data. Essentially, this data will be used to tell the narrative of the story. To be useful, the data needs to be deflated and then expressed in per capita terms. It will likely need to be aggregated at the state, region, and national levels, with the debt separated into general obligation and revenue bonds.

In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import glob
import requests

# Import State and Local Price deflators

State and local price deflators were downloaded [here](https://research.stlouisfed.org/fred2/series/A829RD3A086NBEA#)

In [2]:
defl = pd.read_excel('../data/state_local_deflators.xls', header=10 )

In [3]:
defl.rename(columns={'A829RD3A086NBEA': 'SLDefl'}, inplace=True)

In [4]:
defl.head()

Unnamed: 0,observation_date,SLDefl
0,1929-01-01,3.968
1,1930-01-01,3.862
2,1931-01-01,3.632
3,1932-01-01,3.258
4,1933-01-01,3.372


In [5]:
defl['yr'] = defl.observation_date.map(lambda x: x.strftime('%Y'))

In [6]:
defl.drop('observation_date', inplace=True, axis = 1)

Now I need to convert the index into a multiplier. Since 2009 is the base year, we need the value for 2009.

In [7]:
defl.loc[defl['yr']=='2009']

Unnamed: 0,SLDefl,yr
80,100,2009



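The formula for the multiplier is Base/Comparison: the 2009 deflator value (100) divided by the deflator for the year being converted. Multiplying a nominal amount by this multiplier expresses it in 2009 dollars.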

In [8]:
defl['index'] = 100/defl['SLDefl']

In [9]:
defl.head()

Unnamed: 0,SLDefl,yr,index
0,3.968,1929,25.201613
1,3.862,1930,25.89332
2,3.632,1931,27.53304
3,3.258,1932,30.693677
4,3.372,1933,29.655991
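
As a sanity check on the Base/Comparison formula, the sketch below recomputes the multiplier for a few of the deflator values shown above and applies it to a nominal amount. The frame `toy` and the 1,000-dollar figure are illustrative assumptions, not part of the notebook's data.

```python
import pandas as pd

# Deflator values copied from the outputs above (2009 is the base year, value 100)
toy = pd.DataFrame({'yr': ['1929', '1930', '2009'],
                    'SLDefl': [3.968, 3.862, 100.0]})

BASE = 100.0  # deflator value in the base year
toy['index'] = BASE / toy['SLDefl']  # Base / Comparison

# Hypothetical nominal amount: 1,000 dollars in each year, restated in 2009 dollars
toy['real_2009'] = 1000 * toy['index']
print(toy)
```

The base year maps to a multiplier of exactly 1, so amounts from 2009 are left unchanged.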


Now I need to import the personal income and population data from [here](http://www.bea.gov/regional/downloadzip.cfm)

In [10]:
inc = pd.read_csv('../data/SA1_1929_2014.csv')

In [11]:
inc['GeoFIPS'] = inc['GeoFIPS'].apply(lambda x: x.zfill(5))

inc['ST']= inc['GeoFIPS'].str.extract('(..)')

In [12]:
inc.head()

Unnamed: 0,GeoFIPS,GeoName,Region,Table,LineCode,IndustryClassification,Description,1929,1930,1931,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,ST
0,0,United States,,SA1,1,...,Personal income (thousands of dollars),85126000,76371000,65507000,...,11381350000,11995419000,12492705000,12079444000,12459613000,13233436000,13904485000,14064468000,14683147000,0
1,0,United States,,SA1,2,...,Population (persons) 1/,121769000,123075000,124038000,...,298379912,301231207,304093966,306771529,309347057,311721632,314112078,316497531,318857056,0
2,0,United States,,SA1,3,...,Per capita personal income (dollars) 2/,699,621,528,...,38144,39821,41082,39376,40277,42453,44266,44438,46049,0
3,1000,Alabama,5.0,SA1,1,...,Personal income (thousands of dollars),842894,697154,583342,...,146661249,153787754,159993535,157141435,163066901,169030399,173601429,174876574,181908767,1
4,1000,Alabama,5.0,SA1,2,...,Population (persons) 1/,2644000,2647000,2649000,...,4628981,4672840,4718206,4757938,4785822,4801695,4817484,4833996,4849377,1


1 - Personal Income
2 - Population
3 - Per Capita Personal Income

In [13]:
droplist = (['Table','GeoName','Description','IndustryClassification'])

inc.drop(droplist, inplace = True, axis = 1)

The next two cells drop the footnote rows at the end of the dataframe (index 180 to 183), as well as the years before 1984, which are not needed.

In [14]:
inc = inc.drop(inc.index[[180,181,182,183]])

In [15]:
# Drop every year column from 1929 through 1983 (column names are strings)
inc.drop([str(y) for y in range(1929, 1984)], axis = 1, inplace = True)

In [16]:
# .copy() makes each subset an independent frame, so the in-place drops
# below do not raise SettingWithCopyWarning
perinc = inc[inc['LineCode']==1].copy()
perpop = inc[inc['LineCode']==2].copy()
perpopinc = inc[inc['LineCode']==3].copy()

In [17]:
perinc.drop(['LineCode', 'GeoFIPS'], inplace = True, axis = 1)
perpop.drop(['LineCode', 'GeoFIPS'], inplace = True, axis = 1)
perpopinc.drop(['LineCode', 'GeoFIPS'], inplace = True, axis = 1)


I need to reshape this from wide to long so that each year is a row. This is tricky because groupby, stack, and pivot all either change the index or make vital columns unusable. **Melt is the right tool here. That only cost me a day and my sanity...**

In [18]:
perpop = pd.melt(perpop, id_vars=['ST','Region'])
perpop.rename(columns={'variable':'Year', 'value':'population'},inplace=True)
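
To see what melt does, here is a minimal sketch on a toy frame shaped like `perpop`. The frame `wide` and its population figures are made up for illustration.

```python
import pandas as pd

# Toy wide frame: one row per state, one column per year (hypothetical values)
wide = pd.DataFrame({'ST': ['01', '02'],
                     'Region': [5.0, 8.0],
                     '1984': [100, 200],
                     '1985': [110, 210]})

# id_vars stay as columns; every other column becomes a (Year, population) pair
long = pd.melt(wide, id_vars=['ST', 'Region'],
               var_name='Year', value_name='population')
print(long)
```

Passing `var_name` and `value_name` names the new columns directly, which would make the separate `rename` step optional.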

In [19]:
perinc.head()

Unnamed: 0,Region,1984,1985,1986,1987,1988,1989,1990,1991,1992,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,ST
0,,3268535000,3501927000,3712243000,3940859000,4260753000,4603969000,4890453000,5055766000,5402109000,...,11381350000,11995419000,12492705000,12079444000,12459613000,13233436000,13904485000,14064468000,14683147000,0
3,5.0,42649118,45942335,48627446,51486400,55388090,60312581,64070364,67662101,72847428,...,146661249,153787754,159993535,157141435,163066901,169030399,173601429,174876574,181908767,1
6,8.0,9961459,10772149,10868683,10430514,10875485,11938954,12650093,13199663,14093288,...,27380164,29516606,32496796,32283659,34103385,36527487,38213009,37791031,39792685,2
9,6.0,39699293,44054015,48294793,52226134,56265019,60486975,63383028,66575466,70987974,...,210103296,221596785,226573627,216064765,219195775,230920326,241192186,243656863,255092928,4
12,5.0,24474975,26082631,27399821,28476005,30546100,32776869,34487482,36434402,39744696,...,83182071,88819878,93232570,91625136,93486029,99791639,107032727,108080656,112076107,5


In [20]:
perpopinc = pd.melt(perpopinc, id_vars=['ST','Region'])
perpopinc.rename(columns={'variable':'Year', 'value':'per_cap_person_income'}, inplace=True)

In [21]:
perinc = pd.melt(perinc, id_vars=['ST','Region'])
perinc.rename(columns={'variable':'Year', 'value':'person_income'}, inplace=True)

In [22]:
perpopinc.head()

Unnamed: 0,ST,Region,Year,per_cap_person_income
0,0,,1984,13860
1,1,5.0,1984,10792
2,2,8.0,1984,19391
3,4,6.0,1984,12943
4,5,5.0,1984,10551


Now we can merge these back into one DataFrame

In [23]:
inc = pd.merge(perinc, perpop, on=['ST','Region','Year'])

In [24]:
inc = pd.merge(inc, perpopinc, on=['ST','Region','Year'])

In [25]:
inc.head()

Unnamed: 0,ST,Region,Year,person_income,population,per_cap_person_income
0,0,,1984,3268535000,235824907,13860
1,1,5.0,1984,42649118,3951824,10792
2,2,8.0,1984,9961459,513704,19391
3,4,6.0,1984,39699293,3067134,12943
4,5,5.0,1984,24474975,2319767,10551
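
Merges like these can silently drop or duplicate rows when keys do not line up. One way to check, sketched here on toy frames with hypothetical values, is to use the `validate` and `indicator` options of `pd.merge`; the frames `income` and `pop` are assumptions standing in for `perinc` and `perpop`.

```python
import pandas as pd

# Toy long-format frames with hypothetical values
income = pd.DataFrame({'ST': ['01', '02'], 'Year': ['1984', '1984'],
                       'person_income': [10.0, 20.0]})
pop = pd.DataFrame({'ST': ['01', '02'], 'Year': ['1984', '1984'],
                    'population': [4, 5]})

# validate raises MergeError if either side has duplicate keys;
# indicator adds a _merge column recording which side each row came from
merged = pd.merge(income, pop, on=['ST', 'Year'], how='outer',
                  validate='one_to_one', indicator=True)

# Rows present in only one of the two frames
unmatched = merged[merged['_merge'] != 'both']
print(len(unmatched))
```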


In [26]:
df = pd.read_csv('../data/debt_out.csv')

This keeps only the variables required for this analysis; the remaining variables (largely costats) are dropped.

In [27]:
df = df[['Year','FIPS','GO','RV','GO_City, Town Vlg','GO_Co-op Utility',\
        'GO_College or Univ','GO_County/Parish','GO_Direct Issuer','GO_District',\
        'GO_Indian Tribe','GO_Local Authority','GO_State Authority','GO_State/Province',\
        'RV_City, Town Vlg','RV_Co-op Utility','RV_College or Univ','RV_County/Parish',\
        'RV_Direct Issuer','RV_District','RV_Indian Tribe','RV_Local Authority','RV_State Authority',\
        'RV_State/Province','GO_Development','GO_Education','GO_Electric Power','GO_Environmental Facilities',\
        'GO_General Purpose','GO_Healthcare','GO_Housing','GO_Public Facilities','GO_Transportation',\
        'GO_Utilities','RV_Development','RV_Education','RV_Electric Power','RV_Environmental Facilities',\
        'RV_General Purpose','RV_Healthcare','RV_Housing','RV_Public Facilities','RV_Transportation',\
        'RV_Utilities']]

In [28]:
df.head()

Unnamed: 0,Year,FIPS,GO,RV,"GO_City, Town Vlg",GO_Co-op Utility,GO_College or Univ,GO_County/Parish,GO_Direct Issuer,GO_District,...,RV_Development,RV_Education,RV_Electric Power,RV_Environmental Facilities,RV_General Purpose,RV_Healthcare,RV_Housing,RV_Public Facilities,RV_Transportation,RV_Utilities
0,1984,1000,0.0,797.952,0,0,0,0.0,0,0,...,6.0,0,0.0,0,0.0,259.12,123.722,0,409.11,0.0
1,1984,1001,0.0,1.625,0,0,0,0.0,0,0,...,0.0,0,0.0,0,0.0,0.0,0.0,0,0.0,1.625
2,1984,1003,6.35,23.33,0,0,0,6.35,0,0,...,1.9,12,6.395,0,0.0,0.0,0.0,0,0.0,3.035
3,1984,1007,0.0,0.4,0,0,0,0.0,0,0,...,0.4,0,0.0,0,0.0,0.0,0.0,0,0.0,0.0
4,1984,1021,1.0,1.425,0,0,0,1.0,0,0,...,0.0,0,0.0,0,1.425,0.0,0.0,0,0.0,0.0


In [29]:
df['FIPS']=df.FIPS.astype('str')
df['Year']=df.Year.astype('str')

In [30]:
df['FIPS'] = df['FIPS'].apply(lambda x: x.zfill(5))

This establishes the list of dollar values that need to be deflated

In [31]:
dollar_list =(['GO','RV','GO_City, Town Vlg','GO_Co-op Utility',\
               'GO_College or Univ','GO_County/Parish','GO_Direct Issuer',\
               'GO_District','GO_Indian Tribe','GO_Local Authority',\
               'GO_State Authority','GO_State/Province','RV_City, Town Vlg',\
               'RV_Co-op Utility','RV_College or Univ','RV_County/Parish',\
               'RV_Direct Issuer','RV_District','RV_Indian Tribe','RV_Local Authority',\
               'RV_State Authority','RV_State/Province','GO_Development','GO_Education',\
               'GO_Electric Power','GO_Environmental Facilities','GO_General Purpose',\
               'GO_Healthcare','GO_Housing','GO_Public Facilities','GO_Transportation',\
               'GO_Utilities','RV_Development','RV_Education','RV_Electric Power',\
               'RV_Environmental Facilities','RV_General Purpose','RV_Healthcare',\
               'RV_Housing','RV_Public Facilities','RV_Transportation','RV_Utilities'])

In [32]:
defl.drop('SLDefl', inplace = True, axis = 1)

In [33]:
defl.head()

Unnamed: 0,yr,index
0,1929,25.201613
1,1930,25.89332
2,1931,27.53304
3,1932,30.693677
4,1933,29.655991


In [34]:
df = pd.merge(defl, df, left_on = 'yr', right_on = 'Year', how = 'right')

This will deflate the dollar amounts.

In [35]:
for i in dollar_list:
    df[i] = df[i] * df['index']
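
The same deflation can also be written without a Python loop. This is only a sketch on a toy frame; `toy_dollar_list` and the multiplier values are hypothetical, not taken from the deflator file.

```python
import pandas as pd

# Toy frame: two dollar columns plus a per-row multiplier (hypothetical values)
toy = pd.DataFrame({'GO': [1.0, 2.0], 'RV': [10.0, 20.0],
                    'index': [2.0, 1.5]})
toy_dollar_list = ['GO', 'RV']

# Multiply every listed column by the row's multiplier (axis=0 aligns on rows)
toy[toy_dollar_list] = toy[toy_dollar_list].mul(toy['index'], axis=0)
print(toy)
```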

I think it is best to split off the first two digits of the FIPS code, which identify the state

In [36]:
df['ST']= df['FIPS'].str.extract('(..)')
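
For reference, the padding and state extraction can be sketched with plain string slicing instead of a regular expression. The series `fips` below holds illustrative codes and is an assumption, not the notebook's data.

```python
import pandas as pd

# Illustrative FIPS codes read as integers, which loses the leading zero
fips = pd.Series([1000, 1001, 6037])

# Pad back to five characters, then take the first two as the state code
padded = fips.astype(str).str.zfill(5)
state = padded.str[:2]
print(state.tolist())
```

Slicing always returns a Series, whereas the return type of `str.extract` depends on its `expand` argument.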

In [37]:
df.drop(['yr','index'], inplace = True, axis = 1)

In [38]:
df.head()

Unnamed: 0,Year,FIPS,GO,RV,"GO_City, Town Vlg",GO_Co-op Utility,GO_College or Univ,GO_County/Parish,GO_Direct Issuer,GO_District,...,RV_Education,RV_Electric Power,RV_Environmental Facilities,RV_General Purpose,RV_Healthcare,RV_Housing,RV_Public Facilities,RV_Transportation,RV_Utilities,ST
0,1984,1000,0.0,1927.094453,0,0,0,0.0,0,0,...,0.0,0.0,0,0.0,625.78791,298.79489,0,988.021349,0.0,1
1,1984,1001,0.0,3.924457,0,0,0,0.0,0,0,...,0.0,0.0,0,0.0,0.0,0.0,0,0.0,3.924457,1
2,1984,1003,15.335571,56.34313,0,0,0,15.335571,0,0,...,28.980607,15.444249,0,0.0,0.0,0.0,0,0.0,7.329679,1
3,1984,1007,0.0,0.96602,0,0,0,0.0,0,0,...,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,1
4,1984,1021,2.415051,3.441447,0,0,0,2.415051,0,0,...,0.0,0.0,0,3.441447,0.0,0.0,0,0.0,0.0,1


In [39]:
state_agg = df.groupby(['Year', 'ST']).sum()

In [40]:
state_agg.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,GO,RV,"GO_City, Town Vlg",GO_Co-op Utility,GO_College or Univ,GO_County/Parish,GO_Direct Issuer,GO_District,GO_Indian Tribe,GO_Local Authority,...,RV_Development,RV_Education,RV_Electric Power,RV_Environmental Facilities,RV_General Purpose,RV_Healthcare,RV_Housing,RV_Public Facilities,RV_Transportation,RV_Utilities
Year,ST,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
1984,1,371.058034,3904.337431,202.035888,0,0,162.98452,0,6.037626,0,0.0,...,303.970343,63.902239,28.847779,897.287898,12.075253,788.282175,391.955466,51.773855,1013.560509,352.681914
1984,2,79.769121,675.042867,40.246818,0,0,0.0,0,0.0,0,0.0,...,54.978627,14.490304,0.0,0.0,0.0,0.0,605.573937,0.0,0.0,0.0
1984,4,972.224503,2282.55609,293.945468,0,0,123.39701,0,554.882025,0,0.0,...,327.565387,72.705098,63.056971,549.448161,169.232255,349.083488,326.210544,16.398194,223.597459,185.258531
1984,5,35.027894,394.870433,7.245152,0,0,0.0,0,27.782742,0,0.0,...,67.923298,23.447726,0.0,77.643877,0.0,51.017944,125.039245,9.660202,16.905354,23.232787
1984,6,8639.488492,17662.146497,747.347067,0,0,1991.689811,0,1328.548313,0,780.18934,...,1427.657159,1217.103388,2038.242326,1303.173376,2703.134736,1297.15507,4311.973821,570.302123,1621.805492,1171.599005


In [41]:
state_agg = state_agg.reset_index()

In [42]:
inc.head()

Unnamed: 0,ST,Region,Year,person_income,population,per_cap_person_income
0,0,,1984,3268535000,235824907,13860
1,1,5.0,1984,42649118,3951824,10792
2,2,8.0,1984,9961459,513704,19391
3,4,6.0,1984,39699293,3067134,12943
4,5,5.0,1984,24474975,2319767,10551


In [43]:
closing_in=pd.merge(state_agg, inc, left_on=['Year', 'ST'], right_on=['Year', 'ST'])

**FINALLY**

That took much longer than it should have. Since debt is expressed in millions of dollars, I now need to multiply all dollar amounts by 1,000,000 so that they are in actual dollars before dividing by population to get per capita terms. I think the most efficient way to do this is another loop over the 'dollar_list' that was established previously.

In [44]:
for i in dollar_list:
    closing_in[i] = closing_in[i] * 1000000
    closing_in[i +'_percapita']= closing_in[i]/closing_in['population']
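
As a worked check of the unit conversion, the sketch below reproduces one per capita figure from the values shown in the outputs (Alabama, 1984, `RV_Development`). The frame `row` and the column `RV_Development_dollars` are names introduced only for this illustration.

```python
import pandas as pd

# One row taken from the outputs in this notebook: Alabama, 1984.
# RV_Development is in millions of 2009 dollars; population is in persons.
row = pd.DataFrame({'RV_Development': [303.970343],
                    'population': [3951824]})

MILLION = 1_000_000
# Convert to dollars in a new column rather than overwriting the original
row['RV_Development_dollars'] = row['RV_Development'] * MILLION
row['RV_Development_percapita'] = row['RV_Development_dollars'] / row['population']
print(row['RV_Development_percapita'].iloc[0])
```

Writing the converted dollars to a separate column is a design choice in this sketch: a cell that overwrites the column in place multiplies by a million again each time it is re-run.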

In [45]:
closing_in.head()

Unnamed: 0,Year,ST,GO,RV,"GO_City, Town Vlg",GO_Co-op Utility,GO_College or Univ,GO_County/Parish,GO_Direct Issuer,GO_District,...,RV_Development_percapita,RV_Education_percapita,RV_Electric Power_percapita,RV_Environmental Facilities_percapita,RV_General Purpose_percapita,RV_Healthcare_percapita,RV_Housing_percapita,RV_Public Facilities_percapita,RV_Transportation_percapita,RV_Utilities_percapita
0,1984,1,371058000.0,3904337000.0,202035900.0,0,0,162984500.0,0,6037626.0,...,76.918998,16.170315,7.299864,227.056645,3.055615,199.472996,99.183432,13.101255,256.479162,89.245349
1,1984,2,79769120.0,675042900.0,40246820.0,0,0,0.0,0,0.0,...,107.023941,28.207496,0.0,0.0,0.0,0.0,1178.838274,0.0,0.0,0.0
2,1984,4,972224500.0,2282556000.0,293945500.0,0,0,123397000.0,0,554882000.0,...,106.798525,23.704572,20.558923,179.140579,55.176023,113.814228,106.356796,5.346422,72.901106,60.401186
3,1984,5,35027890.0,394870400.0,7245152.0,0,0,0.0,0,27782740.0,...,29.280224,10.107794,0.0,33.47055,0.0,21.992702,53.90164,4.164299,7.287522,10.015138
4,1984,6,8639488000.0,17662150000.0,747347100.0,0,0,1991690000.0,0,1328548000.0,...,55.24049,47.093511,78.865927,50.423826,104.59268,50.190959,166.843661,22.066761,62.752692,45.332805


We will be using the BEA region definitions that are found [here](http://www.bea.gov/regional/docs/regions.cfm). This file was added to the data folder as regions.csv.

In [46]:
regions = pd.read_csv('../data/regions.csv')

In [47]:
regions.head()

Unnamed: 0,State or Region code,State or Region name,Abbreviation,Region code
0,91,New England Region,NENG,1
1,92,Mideast Region,MEST,2
2,93,Great Lakes Region,GLAK,3
3,94,Plains Region,PLNS,4
4,95,Southeast Region,SEST,5


In [48]:
droplist = (['State or Region code','Abbreviation'])
regions.drop(droplist, inplace=True, axis=1)

In [49]:
closing_in = pd.merge(closing_in, regions, left_on='Region', right_on='Region code')

In [50]:
closing_in.head()

Unnamed: 0,Year,ST,GO,RV,"GO_City, Town Vlg",GO_Co-op Utility,GO_College or Univ,GO_County/Parish,GO_Direct Issuer,GO_District,...,RV_Electric Power_percapita,RV_Environmental Facilities_percapita,RV_General Purpose_percapita,RV_Healthcare_percapita,RV_Housing_percapita,RV_Public Facilities_percapita,RV_Transportation_percapita,RV_Utilities_percapita,State or Region name,Region code
0,1984,1,371058000.0,3904337000.0,202035900.0,0,0,162984500.0,0,6037626.0,...,7.299864,227.056645,3.055615,199.472996,99.183432,13.101255,256.479162,89.245349,Southeast Region,5
1,1984,5,35027890.0,394870400.0,7245152.0,0,0,0.0,0,27782740.0,...,0.0,33.47055,0.0,21.992702,53.90164,4.164299,7.287522,10.015138,Southeast Region,5
2,1984,12,1212459000.0,11364130000.0,225152800.0,0,0,287753300.0,0,503632200.0,...,268.307162,257.588553,24.167077,149.411489,131.348363,29.582164,69.811624,86.549964,Southeast Region,5
3,1984,13,758808900.0,3849458000.0,32820540.0,0,0,62646410.0,0,137875200.0,...,39.088193,361.040139,5.198101,24.55222,111.907957,2.692383,1.034736,52.203681,Southeast Region,5
4,1984,21,161325400.0,2635192000.0,12292610.0,0,0,0.0,0,149032800.0,...,6.026747,159.425823,4.747811,78.003961,86.57943,63.665766,211.975241,29.346245,Southeast Region,5
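
The introduction also calls for region-level figures. One possible way to get them, sketched here under the assumption that regional per capita debt should be total regional debt divided by total regional population, is to sum first and divide afterwards. The frame `states` and its values are hypothetical.

```python
import pandas as pd

# Toy state-level rows with hypothetical values (dollars and persons)
states = pd.DataFrame({
    'Year': ['1984', '1984', '1984'],
    'State or Region name': ['Southeast Region', 'Southeast Region',
                             'Far West Region'],
    'GO': [100.0, 300.0, 600.0],
    'population': [10, 30, 50],
})

# Sum dollars and people within each region-year first...
region_agg = (states.groupby(['Year', 'State or Region name'], as_index=False)
                    [['GO', 'population']].sum())

# ...then divide, so each state is weighted by its population
region_agg['GO_percapita'] = region_agg['GO'] / region_agg['population']
print(region_agg)
```

Averaging the state-level `_percapita` columns directly would instead weight every state equally, which is a different statistic.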


# Let's Ship It

In [51]:
closing_in.to_csv('../data/Descriptives.csv')