## Analyzing the Impact of TELs on Debt Issues

This Notebook uses the data constructed in [sas2csv](https://github.com/choct155/TELs_debt/blob/master/code/sas2csv.ipynb) and [DebtDataSeries](https://github.com/choct155/TELs_debt/blob/master/code/DebtDataSeries.ipynb) to evaluate the impact of tax and expenditure limitations on debt issues by county.  This Notebook will do the following:

1. Subset to the variables critical to our analysis (**Data Input**);
2. Build specifications that feature a set of debt-related dependent variables (**Model Design**);
3. Estimate the relationship between TELs and debt by way of pooled and fixed-effects models (**Estimation**).
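
To make the distinction in step 3 concrete, here is a minimal sketch of a pooled slope versus a fixed-effects (within) slope on a toy panel. The data are simulated, the names `toy`, `x`, `y`, and `slope` are hypothetical, and the estimator is a plain least-squares fit via `np.polyfit`. It is not the specification estimated later in this Notebook.

```python
import numpy as np
import pandas as pd

# Toy panel: 3 hypothetical counties x 4 years (simulated values only).
rng = np.random.RandomState(0)
toy = pd.DataFrame({
    'FIPS': np.repeat([1, 2, 3], 4),
    'Year': np.tile([2000, 2001, 2002, 2003], 3),
})
county_effect = toy['FIPS'].map({1: 0.0, 2: 5.0, 3: 10.0})

# The regressor is correlated with the unobserved county effect.
toy['x'] = rng.normal(size=len(toy)) + county_effect
# True slope on x is 2; the county effect also shifts y directly.
toy['y'] = 2.0 * toy['x'] + 3.0 * county_effect + rng.normal(scale=0.1, size=len(toy))

def slope(x, y):
    """Slope from a least-squares fit of y on x with an intercept."""
    return np.polyfit(x, y, 1)[0]

# Pooled: ignores county identity, so the county effect loads onto x.
pooled_slope = slope(toy['x'].values, toy['y'].values)

# Fixed effects (within estimator): demean x and y by county first.
x_within = toy['x'] - toy.groupby('FIPS')['x'].transform('mean')
y_within = toy['y'] - toy.groupby('FIPS')['y'].transform('mean')
fe_slope = slope(x_within.values, y_within.values)

print(pooled_slope)
print(fe_slope)
```

In this toy setup the pooled slope overstates the effect of `x` because it absorbs the county effect, while the within slope stays close to the true value of 2.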

In [37]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import seaborn as sb
import statsmodels.api as sm
import statsmodels.formula.api as smf

%pylab inline

Populating the interactive namespace from numpy and matplotlib


`%matplotlib` prevents importing * from pylab and numpy


## Data Input

Our data is housed in ... the **`data/`** directory.  We are looking for `debt_out.csv`, which contains debt issue, institutional, socioeconomic, and spatial information aggregated to the county level.

In [38]:
!ls -l ../data/

total 400136
-rw-r--r--  1 MarvinW Domain Users    165888 Oct 17 10:18 13slsstab1a.xls
-rw-r--r--  1 MarvinW Domain Users     93112 Oct 17 10:18 2013_GFS_debt.xcf
-rw-r--r--  1 MarvinW Domain Users  12226235 Oct 17 10:18 bonds.csv
-rwxr-xr-x  1 MarvinW Domain Users  86475193 Nov 13 13:40 costat_mod_vars1940_2010.csv
-rwxr-xr-x  1 MarvinW Domain Users   3104008 Nov 13 14:31 cty_coverage.csv
-rw-r--r--  1 MarvinW Domain Users         0 Oct 17 10:18 current_issue_geocode_list.csv
-rwxr-xr-x  1 MarvinW Domain Users  62306853 Nov 13 14:30 debt_out.csv
-rw-r--r--  1 MarvinW Domain Users  47620501 Nov 10 09:05 debt_ts_pre_fips.csv
-rw-r--r--  1 MarvinW Domain Users  49481004 Nov 13 14:30 debt_w_fips.csv
-rw-r--r--  1 MarvinW Domain Users    104148 Nov 10 09:05 fips_st_co_02_07.csv
-rw-r--r--  1 MarvinW Domain Users      7578 Nov 10 09:05 g_api_college.csv
-rw-r--r--  1 MarvinW Domain Users    103193 Nov 10 09:05 g_api_rando.csv
-rw-r--r--  1 MarvinW Domain Users   2874978 Oct 17 10:18 geocorr

Let's go ahead and read in the data.

In [39]:
data_in=pd.read_csv('../data/debt_out.csv')

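#List the available columns
print(sorted(data_in.columns))

#info() prints its own summary and returns None, so call it directly rather than printing it
data_in.info()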

['ASMT_L', 'ASMT_L2', 'ASMT_L3', 'BOTH', 'CB_E', 'CB_E2', 'CB_E3', 'CB_E4', 'CB_G', 'CB_G2', 'CFDISC_L', 'CGEXP_L', 'CH_HS_UNT', 'CLEVY_L', 'CLEVY_L2', 'CLEVY_L3', 'CLEVY_L4', 'CRATE_L', 'CRATE_L2', 'CREVU_L', 'DENSITY', 'DIVERSITY', 'D_GEN_EXP', 'EDUC_SERV_EMP_PNFARM', 'EMP_RES', 'FFDISC_L', 'FIPS', 'FIPSST', 'FIPST_N', 'FOOD_SERV_EMP_PNFARM', 'GEN_REV', 'GEXP_L', 'GO', 'GO_City, Town Vlg', 'GO_Co-op Utility', 'GO_College or Univ', 'GO_County/Parish', 'GO_Development', 'GO_Direct Issuer', 'GO_District', 'GO_Education', 'GO_Electric Power', 'GO_Environmental Facilities', 'GO_General Purpose', 'GO_Healthcare', 'GO_Housing', 'GO_Indian Tribe', 'GO_Local Authority', 'GO_Public Facilities', 'GO_State Authority', 'GO_State/Province', 'GO_Transportation', 'GO_Utilities', 'GP_GEXP', 'GP_LEVY', 'GP_LMT', 'GP_RATE', 'GP_REVU', 'HOME_STEAD', 'HOME_STEAD2', 'HOME_STEAD3', 'HSG_UNITS', 'HSG_UNITS_ACS', 'HSLD_PERS', 'IGR_ST', 'LANDAREA', 'LEVY_L', 'LIMITS', 'MANU_EMP_PNFARM', 'MANU_RES', 'MDHOMEVAL

The variables in play appear in the tables below:

**DEPENDENT VARIABLES**

Concept|Input Variables
-------|---------------
Per capita GO debt issued|*Variables beginning with GO* & `RESPOP`
Per capita revenue debt issued|*Variables beginning with RV* & `RESPOP`
Ratio of GO to revenue debt issued|*Variables beginning with GO or RV*
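
A minimal sketch of how the three dependent variables could be built from the inputs listed above. It assumes the population column is `RESPOP`, the name used in the variable lists later in this Notebook. The output names `GO_PC`, `RV_PC`, and `GO_RV` are hypothetical, and the rows are toy values rather than records from `debt_out.csv`.

```python
import numpy as np
import pandas as pd

# Toy rows; the values are illustrative only.
toy = pd.DataFrame({
    'GO': [0.0, 6.35, 1.0],
    'RV': [797.952, 23.33, 0.0],
    'RESPOP': [50000.0, 12000.0, 8000.0],
})

# Per capita issuance: debt issued divided by resident population.
toy['GO_PC'] = toy['GO'] / toy['RESPOP']
toy['RV_PC'] = toy['RV'] / toy['RESPOP']

# Ratio of GO to revenue debt; left as NaN when no revenue debt was issued.
toy['GO_RV'] = toy['GO'] / toy['RV'].replace(0, np.nan)

print(toy)
```

Leaving the ratio undefined where `RV` is zero is one choice among several; whether those observations should be dropped or recoded is a modeling decision.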

**INSTITUTIONAL VARIABLES**

Concept|Input Variables
-------|---------------
Any TEL|`LIMITS`
Non-binding TEL|`TYPE1`
Potentially binding TEL|`TYPE2`
Both `TYPE1` & `TYPE2`|`BOTH`
Years since `TYPE2` enacted|`TYPE2_Y`
Overall property tax rate limit|`RATE_L`
Overall assessment limit|`ASMT_L`
Limit applied to general purpose gov|`GP_LMT`
Limit applied to school district|`SC_LMT`

*Note that all limits above can be interacted with primary county status (`PRIMARY`; see spatial table below), in which case we append an `i` to the variable name.*
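
The naming convention in the note can be sketched as follows. This assumes `PRIMARY` and the limit variables are 0/1 indicators, which the tables do not state explicitly, and it uses toy rows rather than the actual data.

```python
import pandas as pd

# Toy rows; PRIMARY and the limits are assumed to be 0/1 indicators.
toy = pd.DataFrame({
    'LIMITS': [1, 1, 0, 0],
    'TYPE2': [1, 0, 0, 0],
    'PRIMARY': [1, 0, 1, 0],
})

# Interact each limit with primary county status and append 'i' to the name.
for var in ['LIMITS', 'TYPE2']:
    toy[var + 'i'] = toy[var] * toy['PRIMARY']

print(toy)
```

Each interacted variable equals one only for primary counties subject to the corresponding limit.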

**SCALE & SUPPLY MEASURES**

Concept|Input Variables
-------|---------------
Population|`RESPOP`
<span style="color:red">Population$^2$</span>|`RES_POP2`
Population density|`DENSITY`
Population growth rate|`POPGROWTH`
Household size|`HSLD_PERS`
Pre-1940 housing stock|`PRE1940`

**DEMAND MEASURES**

Concept|Input Variables
-------|---------------
Population under 17|`PYOUNG`
Private school enrollment|`PVT_SCH`
Population over 65|`POP65`
Per capita income|`PC_INC`
Poverty rate|`POVERTY`
Average monthly Social Security payments (to recipients)|`PC_SSI`
Per capita income weighted by poverty rate|`DIVERSITY`

**ECONOMIC ACTIVITY**

Concept|Input Variables
-------|---------------
Employment to population ratio|`EMP_RES`
Manufacturing employment to population ratio|`MANU_RES`
Retail employment to population ratio|`RETL_RES`
Service employment to population ratio|`SERV_RES`

**SPATIAL CHARACTERISTICS**

Concept|Input Variables
-------|---------------
Primary central county in 1974|`PRIMARY`
Co-central county in 1974|`CO_PRIM`
Urban fringe county in 1974|`FRINGE`

Let's grab these in category lists to make them more accessible.

In [40]:
#Capture issuer suffixes
issuers=['City, Town Vlg','Co-op Utility','College or Univ','County/Parish','Direct Issuer','District',\
         'Indian Tribe','Local Authority','State Authority','State/Province']
purposes=['Development','Education','Electric Power','Environmental Facilities','General Purpose','Healthcare',\
          'Housing','Public Facilities','Transportation','Utilities']

#Capture dependent variables
debt_vars=['GO','RV']
go_vars={'iss':['GO_'+var for var in issuers],
         'pur':['GO_'+var for var in purposes]}
rv_vars={'iss':['RV_'+var for var in issuers],
         'pur':['RV_'+var for var in purposes]}

#Capture independent vars
tel_vars={'types':['TYPE1','TYPE2','TYPE2_Y'],
          'either':['LIMITS','BOTH'],
          'hi_res':['RATE_L','ASMT_L','GP_LMT','SC_LMT']}
supply_vars=['RESPOP','DENSITY','POPGROWTH','HSLD_PERS','PRE1940']
demand_vars=['PYOUNG','PVT_SCH','POP65','PC_INC','POVERTY','PC_SSI','DIVERSITY']
economic_vars=['EMP_RES','MANU_RES','RETL_RES','SERV_RES']
spatial_vars=['PRIMARY','CO_PRIM','FRINGE']

#Capture all modeling variables in a single list
mod_vars=debt_vars+go_vars['iss']+go_vars['pur']+rv_vars['iss']+rv_vars['pur']+tel_vars['types']+\
         tel_vars['either']+tel_vars['hi_res']+supply_vars+demand_vars+economic_vars+spatial_vars
    
#For each model variable...
for var in mod_vars:
    #...tell me if it's not in the set
    if var not in data_in.columns:
        print("Is "+var+" in the data set??  Maaaan, we ain't found shit!")
        
#Capture model subset
data=data_in[[var for var in mod_vars if var in data_in.columns]+['Year','FIPS']]

data.head().T

Is PRIMARY in the data set??  Maaaan, we ain't found shit!
Is CO_PRIM in the data set??  Maaaan, we ain't found shit!
Is FRINGE in the data set??  Maaaan, we ain't found shit!


Variable|0|1|2|3|4
--------|-|-|-|-|-
GO|0.000000|0.000000|6.350000|0.000000|1.000000
RV|797.952000|1.625000|23.330000|0.400000|1.425000
GO_City, Town Vlg|0.000000|0.000000|0.000000|0.000000|0.000000
GO_Co-op Utility|0.000000|0.000000|0.000000|0.000000|0.000000
GO_College or Univ|0.000000|0.000000|0.000000|0.000000|0.000000
GO_County/Parish|0.000000|0.000000|6.350000|0.000000|1.000000
GO_Direct Issuer|0.000000|0.000000|0.000000|0.000000|0.000000
GO_District|0.000000|0.000000|0.000000|0.000000|0.000000
GO_Indian Tribe|0.000000|0.000000|0.000000|0.000000|0.000000
GO_Local Authority|0.000000|0.000000|0.000000|0.000000|0.000000


## Accounting for Inflation

The first thing we need to do is adjust all of the dollar figures for inflation.  Taking a look at descriptive statistics can help us identify the variables that need adjusting.

In [41]:
#.loc replaces the deprecated .ix indexer
print(data.describe().loc[['count','mean','min','max']].T.to_string())

                             count            mean          min           max
GO                           61291       53.150500     0.000000  1.667689e+04
RV                           61291       56.103908     0.000000  1.817637e+04
GO_City, Town Vlg            61291       12.160855     0.000000  6.983080e+03
GO_Co-op Utility             61291        0.000364     0.000000  8.000000e+00
GO_College or Univ           61291        0.254717     0.000000  4.000000e+02
GO_County/Parish             61291        6.426371     0.000000  2.185000e+03
GO_Direct Issuer             61291        0.046429     0.000000  4.085000e+02
GO_District                  61291       14.993014     0.000000  4.575779e+03
GO_Indian Tribe              61291        0.003671     0.000000  1.095000e+02
GO_Local Authority           61291        1.319969     0.000000  3.487245e+03
GO_State Authority           61291        2.702028     0.000000  3.901125e+03
GO_State/Province            61291       15.243081     0.000000 
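
As a rough sketch of the inflation adjustment described above, each dollar variable can be scaled by the ratio of a base-year price index to the index for the observation's year. The index values below are placeholders, not actual CPI figures, and the names `price_index`, `base_year`, `deflator`, and `real` are hypothetical.

```python
import pandas as pd

# Placeholder price index by year; these are NOT actual CPI values.
price_index = pd.Series({1984: 50.0, 2000: 80.0, 2015: 100.0})
base_year = 2015

# Toy rows; the dollar values are illustrative only.
toy = pd.DataFrame({
    'Year': [1984, 2000, 2015],
    'GO': [10.0, 10.0, 10.0],
    'RV': [5.0, 8.0, 0.0],
})

# The deflator converts nominal dollars into base-year dollars.
deflator = price_index.loc[base_year] / toy['Year'].map(price_index)

# Scale every dollar-denominated column row by row.
dollar_vars = ['GO', 'RV']
real = toy.copy()
real[dollar_vars] = toy[dollar_vars].mul(deflator, axis=0)

print(real)
```

In practice `dollar_vars` would hold every dollar-denominated column, and the index would come from a published price series covering each year in the panel.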

In [42]:
sorted(set(zip(data['FIPS'],data['Year'])))

[(1000, 1984),
 (1000, 1985),
 (1000, 1986),
 (1000, 1987),
 (1000, 1988),
 (1000, 1989),
 (1000, 1990),
 (1000, 1991),
 (1000, 1992),
 (1000, 1993),
 (1000, 1994),
 (1000, 1995),
 (1000, 1996),
 (1000, 1997),
 (1000, 1998),
 (1000, 1999),
 (1000, 2000),
 (1000, 2001),
 (1000, 2002),
 (1000, 2003),
 (1000, 2004),
 (1000, 2005),
 (1000, 2006),
 (1000, 2007),
 (1000, 2008),
 (1000, 2009),
 (1000, 2010),
 (1000, 2011),
 (1000, 2012),
 (1000, 2013),
 (1000, 2014),
 (1000, 2015),
 (1001, 1984),
 (1001, 1988),
 (1001, 1989),
 (1001, 1990),
 (1001, 1992),
 (1001, 1993),
 (1001, 1994),
 (1001, 1996),
 (1001, 1997),
 (1001, 1998),
 (1001, 1999),
 (1001, 2000),
 (1001, 2001),
 (1001, 2002),
 (1001, 2003),
 (1001, 2005),
 (1001, 2006),
 (1001, 2007),
 (1001, 2008),
 (1001, 2009),
 (1001, 2010),
 (1001, 2011),
 (1001, 2012),
 (1001, 2013),
 (1001, 2014),
 (1001, 2015),
 (1003, 1984),
 (1003, 1985),
 (1003, 1986),
 (1003, 1987),
 (1003, 1988),
 (1003, 1989),
 (1003, 1990),
 (1003, 1991),
 (1003, 19