## Analyzing the Impact of TELs on Debt Issues

This Notebook uses the data constructed in [sas2csv](https://github.com/choct155/TELs_debt/blob/master/code/sas2csv.ipynb) and [DebtDataSeries](https://github.com/choct155/TELs_debt/blob/master/code/DebtDataSeries.ipynb) to evaluate the impact of tax and expenditure limitations on debt issues by county.  This Notebook will do the following:

1. Subset to the variables critical to our analysis (**Data Input**);
2. Build specifications that feature a set of debt related dependent variables (**Model Design**);
3. Estimate the relationship between TELs and debt by way of pooled and fixed effect models (**Estimation**).

In [81]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import seaborn as sb
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas.io.data as web

%pylab inline

Populating the interactive namespace from numpy and matplotlib


`%matplotlib` prevents importing * from pylab and numpy


## Data Input

Our data is housed in ... the **`data/`** directory.  We are looking for `debt_out.csv` which has aggregate debt issue, institutional, socioeconomic, and spatial information aggregated to the county level.

In [82]:
!ls -l ../data/

total 322012
-rw-r--r-- 1 root     root       165888 Nov 10 17:20 13slsstab1a.xls
-rw-r--r-- 1 root     root        93112 Nov 10 17:20 2013_GFS_debt.xcf
-rw-r--r-- 1 root     root     12226235 Nov 10 17:20 bonds.csv
-rwxr-xr-x 1 root     root     77725960 Nov 14 17:12 costat_mod_vars1940_2010.csv
-rwxr-xr-x 1 root     root      2966812 Nov 15 13:27 cty_coverage.csv
-rw-r--r-- 1 root     root            0 Nov 10 17:20 current_issue_geocode_list.csv
-rwxr-xr-x 1 root     root     55434544 Nov 15 13:50 debt_out.csv
-rw-r--r-- 1 root     root     47620501 Nov 10 17:20 debt_ts_pre_fips.csv
-rw-r--r-- 1 root     root     46996855 Nov 15 13:44 debt_w_fips.csv
-rw-r--r-- 1 root     root     55588087 Nov 15 13:50 debt_w_int.csv
-rw-r--r-- 1 root     root       104148 Nov 10 17:20 fips_st_co_02_07.csv
-rw-r----- 1 choct155 choct155      686 Nov 14 21:31 FRB_TREAS_30YR.csv
-rw-r--r-- 1 root     root         7578 Nov 10 17:20 g_api_college.csv
-rw-r--r-- 1 root     root       103193 

Let's go ahead and read in the data.

In [83]:
#Read in data
# data_in=pd.read_csv('../data/debt_out.csv')
data_in=pd.read_csv('../data/debt_w_int.csv')

#Generate variable that captures gap between general revenue and direct general expenditure, normalized by exp
data_in['OSRC_GAP']=(data_in['GEN_REV']-data_in['D_GEN_EXP'])/data_in['D_GEN_EXP']

#Capture issuer suffixes
issuers=['City, Town Vlg','Co-op Utility','County/Parish','Direct Issuer','District',\
         'Indian Tribe','Local Authority']
purposes=['Development','Education','Electric Power','Environmental Facilities','General Purpose','Healthcare',\
          'Housing','Public Facilities','Transportation','Utilities']

#Define replacement suffixes
issuers_new=['GEN_MUNI','COOP_UTIL','CTY','DIRECT','DISTRICT','TRIBE','LOC_AUTH']
purposes_new=['DEV','EDUC','ELECTRIC','ENVIRON','GEN_PUR','HEALTH','HOUS','PUB_FAC','TRANSPORT','UTIL']

#Capture lists of old and new name pairings
goi=zip(['GO_'+var for var in issuers],['GO_'+var for var in issuers_new])
gop=zip(['GO_'+var for var in purposes],['GO_'+var for var in purposes_new])
rvi=zip(['RV_'+var for var in issuers],['RV_'+var for var in issuers_new])
rvp=zip(['RV_'+var for var in purposes],['RV_'+var for var in purposes_new])

#Build renaming dict
rename_dict=dict(goi+gop+rvi+rvp)

#Rename relevant variables
data_in=data_in.rename(columns=rename_dict)
print 'Before subset:',len(data_in)

#Subset to non-null values of RESPOP and counties in MSAs
data_in=data_in[(data_in['RESPOP'].notnull()) & (data_in['MSA']==1)]
print 'After subset:',len(data_in)
print 'The vast majority of observations lost (all but maybe 100) come in 2011 on (after COSTAT data ends)'

print sorted(data_in.columns),'\n\n',data_in.info()

Before subset: 59669
After subset: 23658
The vast majority of observations lost (all but maybe 100) come in 2011 on (after COSTAT data ends)
['ASMT_L', 'ASMT_L2', 'ASMT_L3', 'BOTH', 'CB_E', 'CB_E2', 'CB_E3', 'CB_E4', 'CB_G', 'CB_G2', 'CFDISC_L', 'CGEXP_L', 'CH_HS_UNT', 'CLEVY_L', 'CLEVY_L2', 'CLEVY_L3', 'CLEVY_L4', 'CRATE_L', 'CRATE_L2', 'CREVU_L', 'CTY_INTEREST', 'DENSITY', 'DIVERSITY', 'D_GEN_EXP', 'EDUC_SERV_EMP_PNFARM', 'EMP_RES', 'FFDISC_L', 'FIPS', 'FIPSST', 'FIPST_N', 'FOOD_SERV_EMP_PNFARM', 'GEN_REV', 'GEXP_L', 'GO', 'GO_COOP_UTIL', 'GO_CTY', 'GO_DEV', 'GO_DIRECT', 'GO_DISTRICT', 'GO_EDUC', 'GO_ELECTRIC', 'GO_ENVIRON', 'GO_GEN_MUNI', 'GO_GEN_PUR', 'GO_HEALTH', 'GO_HOUS', 'GO_LOC_AUTH', 'GO_PUB_FAC', 'GO_TRANSPORT', 'GO_TRIBE', 'GO_UTIL', 'GP_GEXP', 'GP_LEVY', 'GP_LMT', 'GP_RATE', 'GP_REVU', 'HOME_STEAD', 'HOME_STEAD2', 'HOME_STEAD3', 'HSG_UNITS', 'HSG_UNITS_ACS', 'HSLD_PERS', 'IGR_ST', 'LANDAREA', 'LEVY_L', 'LIMITS', 'MANU_EMP_PNFARM', 'MANU_RES', 'MDHOMEVAL', 'MED_INC', 'MFDIS

In [84]:
data_in[['GEN_REV','D_GEN_EXP','OSRC_GAP']]

Unnamed: 0,GEN_REV,D_GEN_EXP,OSRC_GAP
5,23020.4,20952.0,0.098721
6,30817.8,26915.6,0.144979
7,32131.6,29090.2,0.104551
8,33445.4,31264.8,0.069746
9,36073.0,35614.0,0.012888
10,40466.0,43001.6,-0.058965
11,44859.0,50389.2,-0.109750
12,53645.0,65164.4,-0.176774
13,58038.0,72552.0,-0.200050
14,63978.2,75520.0,-0.152831


In [85]:
print len(data_in[pd.isnull(data_in).any(axis=1)])
print len(data_in[pd.isnull(data_in).any(axis=1)])/float(len(data_in))

783
0.0330966269338


Subset to complete cases.

In [86]:
print 'Before subset:',len(data_in)
data_in=data_in[pd.notnull(data_in).all(axis=1)]
print 'After subset:',len(data_in)

Before subset: 23658
After subset: 22875


The set of variables in play appear in the table below:

**DEPENDENT VARIABLES**

Concept|Input Variables
-------|---------------
Per capita GO debt issued|*Variables beginning with GO* & `RES_POP`
Per capita revenue debt issued|*Variables beginning with RV* & `RES_POP`
Ratio of GO to revenue debt issued|*Variables beginning with GO or RV*

**INSTITUTIONAL VARIABLES**

Concept|Input Variables
-------|---------------
Any TEL|`LIMITS`
Non-binding TEL|`TYPE1`
Potentially binding TEL|`TYPE2`
Both `TYPE1` & `TYPE2`|`BOTH`
Years since `TYPE2` enacted|`TYPE2_y`
Overall property tax rate limit|`RATE_L`
Overall assessment limit|`SC_LMT`
Limit applied to general purpose gov|`GP_LMT`
Limit applied to school district|`SC_LMT`

*Note that all limits above can be interacted with primary county status (`PRIMARY`; see spatial table below), in which case we append an `i` to the variable name.*

**SCALE & SUPPLY MEASURES**

Concept|Input Variables
-------|---------------
Population|`RES_POP`
<span style="color:red">Population$^2$</span>|`RES_POP2`
Population density|`DENSITY`
Population growth rate|`POPGROW`
Household size|`PERS_HLD`
Pre-1940 housing stock|`PRE1940`

**DEMAND MEASURES**

Concept|Input Variables
-------|---------------
Population under 17|`PYOUNG`
Private school enrollment|`PVT_SCH`
Population over 65|`POP65`
Per capita income|`PCINC`
Povery rate|`POVERTY`
Average monthly Social Security payments (to recipients)|`PC_SSI`
Per capita income weighted by poverty rate|`DIVERSITY`

**ECONOMIC ACTIVITY**

Concept|Input Variables
-------|---------------
Employment to population ratio|`EMP_RESI`
Manufacturing employment to population ratio|`MANU_RES`
Retail employment to population ratio|`RETL_RES`
Service employment to population ratio|`SERV_RES`

**SPATIAL CHARACTERISTICS**

Concept|Input Variables
-------|---------------
Primary central county in 1974|`PRIMARY`
Co-central county in 1974|`CO_PRIM`
Urban fringe county in 1974|`FRINGE`

Let's grab these in category lists to make them more accessible.

In [87]:
#Capture dependent variables
debt_vars=['GO','RV']
go_vars={'iss':['GO_'+var for var in issuers_new],
         'pur':['GO_'+var for var in purposes_new]}
rv_vars={'iss':['RV_'+var for var in issuers_new],
         'pur':['RV_'+var for var in purposes_new]}

#Capture independent vars
tel_vars={'types':['TYPE1','TYPE2','TYPE2_Y'],
          'either':['LIMITS','BOTH'],
          'hi_res':['RATE_L','ASMT_L','GP_LMT','SC_LMT']}
supply_vars=['RESPOP','DENSITY','POPGROWTH','HSLD_PERS','PRE1940']
demand_vars=['PYOUNG','PVT_SCH','POP65','PC_INC','POVERTY','PC_SSI','DIVERSITY']
economic_vars=['EMP_RES','MANU_RES','RETL_RES','SERV_RES','CTY_INTEREST']
spatial_vars=['PRIMARY','CO_PRIM','FRINGE']
fiscal_vars=['GEN_REV','OSRC_GAP']

#Capture all modeling variables in a single list
mod_vars=debt_vars+go_vars['iss']+go_vars['pur']+rv_vars['iss']+rv_vars['pur']+tel_vars['types']+\
         tel_vars['either']+tel_vars['hi_res']+supply_vars+demand_vars+economic_vars+spatial_vars+fiscal_vars
    
#For each model variable...
for var in mod_vars:
    #...tell me if it's not in the set
    if var not in data_in.columns:
        print "Is "+var+" in the data set??  Maaaan, we ain't found shit!"
        
#Capture model subset
data=data_in[[var for var in mod_vars if var in data_in.columns]+['Year','FIPS']]

#Generate populations squared
data['RESPOP2']=data['RESPOP']**2

data.head().T

Is PRIMARY in the data set??  Maaaan, we ain't found shit!
Is CO_PRIM in the data set??  Maaaan, we ain't found shit!
Is FRINGE in the data set??  Maaaan, we ain't found shit!


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,7,8,9,10,11
GO,0.000000e+00,0.000000e+00,3.000000e+00,0.000000e+00,4.340000e+00
RV,0.000000e+00,2.200000e+00,4.450000e+00,4.545000e+00,1.365000e+00
GO_GEN_MUNI,0.000000e+00,0.000000e+00,3.000000e+00,0.000000e+00,4.340000e+00
GO_COOP_UTIL,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
GO_CTY,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
GO_DIRECT,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
GO_DISTRICT,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
GO_TRIBE,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
GO_LOC_AUTH,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
GO_DEV,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00


In [88]:
len(data[pd.isnull(data).any(axis=1)])

0

### Accounting for Inflation

The first thing we need to do is adjust all of the dollar figures for inflation.  The relevant variables are in the following list.

In [89]:
#Capture debt variables
debt_vars=[var for var in data.columns if var.startswith('GO')]+\
          [var for var in data.columns if var.startswith('RV')]

#Capture variables to be deflated
defl_vars=debt_vars+['PC_INC','PC_SSI','DIVERSITY']

We need to pull an index for deflation.  To be consistent with the descriptive analysis, we will use the PCE deflator from [FRED](https://research.stlouisfed.org/fred2/).  Specifically, we are using the chained PCE deflator ([PCECTPI](https://research.stlouisfed.org/fred2/)).  We can pull that directly with pandas FRED API and use it to deflate our data.

**UPDATE** We are converting to the state and local government implicit price deflator (`A829RD3A086NBEA`).  The mechanics are the same as with the PCE, so we are leaving the code undisturbed.  Note that the SLD increases faster than the PCE.

In [90]:
#Capture PCE
# pce=web.DataReader("PCECTPI","fred",'1/1/1980','1/1/2015').reset_index()
pce=web.DataReader("A829RD3A086NBEA","fred",'1/1/1980','1/1/2015').reset_index()

#Pull out year
pce['Year']=pce['DATE'].apply(lambda x: x.year)

#Set index
pce.set_index('Year',inplace=True)

#Drop date
pce.pop('DATE')

#Capture average by year
pce=pce.groupby(level='Year').mean()

#Calculate deflators
# pce['defl']=pce['PCECTPI'].apply(lambda x: pce['PCECTPI'].ix[2009]/x)
pce['defl']=pce['A829RD3A086NBEA'].apply(lambda x: pce['A829RD3A086NBEA'].ix[2009]/x)

#Map deflator into data via Year
data['defl']=data['Year'].map(pce['defl'])

print pce.head()

data[['Year','defl']].head(20)

      A829RD3A086NBEA      defl
Year                           
1980           32.583  3.069085
1981           35.824  2.791425
1982           38.012  2.630748
1983           39.700  2.518892
1984           41.407  2.415051


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,Year,defl
7,1989,1.988941
8,1990,1.894513
9,1992,1.766753
10,1993,1.722683
11,1994,1.678049
12,1996,1.596755
13,1997,1.562598
14,1998,1.531745
15,1999,1.473297
16,2000,1.405284


With the deflators in hand, we can loop through a copy of the data and generate deflated versions of the contents of `defl_vars`.

In [91]:
#Create a data copy to hold deflated values
data_d=data.copy(deep=True)

#For each var to be deflated...
for var in defl_vars:
    #....deflate the variable
    data_d[var]=data_d[var]*data_d['defl']

print data[['GO','RV','Year','defl']].head()    
print '\n',data_d[['GO','RV','Year','defl']].head()

      GO     RV  Year      defl
7   0.00  0.000  1989  1.988941
8   0.00  2.200  1990  1.894513
9   3.00  4.450  1992  1.766753
10  0.00  4.545  1993  1.722683
11  4.34  1.365  1994  1.678049

          GO        RV  Year      defl
7   0.000000  0.000000  1989  1.988941
8   0.000000  4.167930  1990  1.894513
9   5.300260  7.862052  1992  1.766753
10  0.000000  7.829592  1993  1.722683
11  7.282735  2.290537  1994  1.678049


### Accounting for the Business Cycle

We also need to grab recessionary dates, but some assumptions are required since those are tracked on an monthly basis.  Our imperfect approach will be to treat an year with a majority of months in recession as being a recessionary year.  Since the annual average leaves us with the proportion of recessionary months, a recessionary year is any one in which the average exceeds 0.5.

In [92]:
#Capture recessionary dates
usrec=web.DataReader("USREC","fred",'1/1/1980','1/1/2015').reset_index()

#Pull out year
usrec['Year']=usrec['DATE'].apply(lambda x: x.year)

#Set index
usrec.set_index('Year',inplace=True)

#Drop date
usrec.pop('DATE')

#Capture average by year
usrec=usrec.groupby(level='Year').mean()

#Calculate binary recession variable
usrec['BIN_REC']=np.where(usrec['USREC']>=.5,1,0)

#Map bin_rec into data via Year
data_d['BIN_REC']=data_d['Year'].map(usrec['BIN_REC'])

print usrec.head()

print data_d[['Year','BIN_REC']].head()
print data_d[data_d['Year']==2001][['Year','BIN_REC']].head()

         USREC  BIN_REC
Year                   
1980  0.500000        1
1981  0.416667        0
1982  0.916667        1
1983  0.000000        0
1984  0.000000        0
    Year  BIN_REC
7   1989        0
8   1990        0
9   1992        0
10  1993        0
11  1994        0
     Year  BIN_REC
17   2001        1
48   2001        1
98   2001        1
180  2001        1
255  2001        1


## Capturing the Prevailing Interest Rate

The prevailing interest rate speaks to the supply of money, and is therefore a useful indicator to understand the macro context.  The coupon rate is missing all values in the raw data, so we will use the 30-year treasury as a proxy.  The inflation-adjusted history is short, so we will manually adjust the nominal rates.  The Federal Reserve has [data on historic market yield of US Treasuries at 30-year constant maturity](http://www.federalreserve.gov/releases/h15/data.htm).  This data has been downloaded and placed in `../data/`.  The [inflation adjustment](https://research.stlouisfed.org/fred2/series/FPCPITOTLZGUSA) to calculate the real rate of interest comes from FRED.

In [93]:
#Read in data
treas30=pd.read_csv('../data/FRB_TREAS_30YR.csv',skiprows=6,header=None,names=['Year','RATE'])

#Replace ND with null values
treas30['RATE']=treas30['RATE'].str.replace('ND','')

#Convert RATE to float
def to_float(x):
    if x!='':
        return float(x)
    else:
        return np.NaN

# treas30['RATE'].apply(lambda x: to_float(x)) NOT WORKING FOR SOME REASON

#Generate float version of RATE
treas30['RATE']=[to_float(val) for val in treas30['RATE']]

#Interpolate for missing values
treas30['RATE']=treas30['RATE'].interpolate()

#Set index
# treas30.set_index('Year',inplace=True)

#Capture CPI
cpi=web.DataReader("FPCPITOTLZGUSA","fred",'1/1/1980','1/1/2015').reset_index()

#Pull out year
cpi['Year']=cpi['DATE'].apply(lambda x: x.year)

#Set index
cpi.set_index('Year',inplace=True)

#Drop date
cpi.pop('DATE')

#Capture average by year
cpi=cpi.groupby(level='Year').mean()

#Join inflation to rate data
treas30['INFLAT']=treas30['Year'].map(cpi['FPCPITOTLZGUSA'])

#Calculate real rate of interest
treas30['REAL_RATE']=treas30['RATE']-treas30['INFLAT']

#Set index
treas30.set_index('Year',inplace=True)

#Map REAL_RATE into data via Year
data_d['REAL_RATE']=data_d['Year'].map(treas30['REAL_RATE'])

#Map INFLAT into data by year
data_d['INFLAT']=data_d['Year'].map(treas30['INFLAT'])

#Create new real interest rate by county
data_d['R_CTY_INT']=data_d['CTY_INTEREST']-data_d['INFLAT']

#Capture difference between prevailing interest rate and county rate
data_d['R_CTY_INT_DIFF']=data_d['REAL_RATE']-data_d['R_CTY_INT']

data_d[['Year','REAL_RATE','CTY_INTEREST','INFLAT','R_CTY_INT','R_CTY_INT_DIFF']].head()

Unnamed: 0,Year,REAL_RATE,CTY_INTEREST,INFLAT,R_CTY_INT,R_CTY_INT_DIFF
7,1989,3.622997,6.891819,4.827003,2.064816,1.558181
8,1990,3.212044,6.8385,5.397956,1.440544,1.7715
9,1992,4.64118,4.483,3.02882,1.45418,3.187
10,1993,3.638343,5.676,2.951657,2.724343,0.914
11,1994,4.762558,6.269296,2.607442,3.661854,1.100704


In [94]:
data_d['R_CTY_INT_DIFF'].describe()

count    22875.000000
mean         1.004267
std          4.094910
min       -119.647292
25%          0.644457
50%          1.172694
75%          1.700901
max          7.210000
Name: R_CTY_INT_DIFF, dtype: float64

### Mapping in State Labels

Just for facilitating the reading of results, state labels would be useful.  Here is a mapping from that FIPS codes.

In [95]:
#Capture state FIPS codes
data_d['FIPSST']=data_d['FIPS'].apply(lambda x: str(x).zfill(5)[:2])

#Map FIPS to state
fips_st_map={'01': 'AL',
             '02': 'AK',
             '04': 'AZ',
             '05': 'AR',
             '06': 'CA',
             '08': 'CO',
             '09': 'CT',
             '10': 'DE',
             '11': 'DC',
             '12': 'FL',
             '13': 'GA',
             '15': 'HI',
             '16': 'ID',
             '17': 'IL',
             '18': 'IN',
             '19': 'IA',
             '20': 'KS',
             '21': 'KY',
             '22': 'LA',
             '23': 'ME',
             '24': 'MD',
             '25': 'MA',
             '26': 'MI',
             '27': 'MN',
             '28': 'MS',
             '29': 'MO',
             '30': 'MT',
             '31': 'NE',
             '32': 'NV',
             '33': 'NH',
             '34': 'NJ',
             '35': 'NM',
             '36': 'NY',
             '37': 'NC',
             '38': 'ND',
             '39': 'OH',
             '40': 'OK',
             '41': 'OR',
             '42': 'PA',
             '44': 'RI',
             '45': 'SC',
             '46': 'SD',
             '47': 'TN',
             '48': 'TX',
             '49': 'UT',
             '50': 'VT',
             '51': 'VA',
             '53': 'WA',
             '54': 'WV',
             '55': 'WI',
             '56': 'WY'}

#Map in state labels
data_d['STATE']=data_d['FIPSST'].map(fips_st_map)

It would also be useful to cluster by BEA region, so we need to map those in as well.

In [96]:
#Capture BEA regions
bea_reg_map={'NENG':['CT','ME','MA','NH','RI','VT'],
             'MEST':['DE','DC','MD','NJ','NY','PA'],
             'GLAK':['IL','IN','MI','OH','WI'],
             'PLNS':['IA','KS','MN','MO','NE','ND','SD'],
             'SEST':['AL','AR','FL','GA','KY','LA','MS','NC','SC','TN','VA','WV'],
             'SWST':['AZ','NM','OK','TX'],
             'RKMT':['CO','ID','MT','UT','WY'],
             'FWST':['AK','CA','HI','NV','OR','WA']}

#Provide integer labels for each region
bea_reg_ints={'NENG':1,
              'MEST':2,
              'GLAK':3,
              'PLNS':4,
              'SEST':5,
              'SWST':6,
              'RKMT':7,
              'FWST':8}

#Create container for tuples (st to reg)
st_reg_tups=[]

#For each region...
for reg in bea_reg_map.keys():
    #...and for each state in the region...
    for st in bea_reg_map[reg]:
        #...throw the st-reg pair in st_reg_tups
        st_reg_tups.append((st,reg))
        
#Capture reverse BEA region dict
bea_st_map=dict(st_reg_tups)

#Define BEA region variable
data_d['BEA']=data_d['STATE'].map(bea_st_map)

In [97]:
#Subset to states
data_d=data_d[~data_d['FIPSST'].isin(['60', '72', '66', '78'])]

#Define integer version of BEA region
data_d['BEA_INT']=data_d['BEA'].map(bea_reg_ints)

data_d[['FIPSST','FIPS','STATE','BEA']].head()

Unnamed: 0,FIPSST,FIPS,STATE,BEA
7,1,1001,AL,SEST
8,1,1001,AL,SEST
9,1,1001,AL,SEST
10,1,1001,AL,SEST
11,1,1001,AL,SEST


It turns out we have issues in here from American Samoa, Guam, Puerto Rico, and the Virgin Islands.  Since they do not have BEA regions, we will be dropping them.  There are only 85 records of this type.

In [98]:
data_d[data_d['BEA'].isnull()][['FIPSST','FIPS','STATE','BEA']]

Unnamed: 0,FIPSST,FIPS,STATE,BEA


In [99]:
len(data_d[pd.isnull(data_d).any(axis=1)])

0

## Model Design

### Dependent Variables

#### Per Capita Measures

Now that we have our deflated data, we need to generate a few dependent variables:

1. Per Capita GO Debt Issued
2. Per Capita Revenue Debt Issued
3. Proportion of All Debt Issued that is GO

Note also that while we have the totals for all of these items, we also have GO and revenue debt by issuer type and debt purpose.  All of these could serve as dependent variables, so we should generate them systematically.  The first step would be capturing all per capita measures.

In [100]:
data_d[data_d['RESPOP'].isnull()][['Year','FIPS']]

Unnamed: 0,Year,FIPS


In [101]:
#Create container for per capita debt variables
pc_debt_vars=[]

#For each debt variable...
for var in debt_vars:
    #...generate per capita versions...
    data_d[var+'_PC']=data_d[var]/data_d['RESPOP']
    #...and store the variable name
    pc_debt_vars.append(var+'_PC')

print pc_debt_vars

['GO_PC', 'GO_GEN_MUNI_PC', 'GO_COOP_UTIL_PC', 'GO_CTY_PC', 'GO_DIRECT_PC', 'GO_DISTRICT_PC', 'GO_TRIBE_PC', 'GO_LOC_AUTH_PC', 'GO_DEV_PC', 'GO_EDUC_PC', 'GO_ELECTRIC_PC', 'GO_ENVIRON_PC', 'GO_GEN_PUR_PC', 'GO_HEALTH_PC', 'GO_HOUS_PC', 'GO_PUB_FAC_PC', 'GO_TRANSPORT_PC', 'GO_UTIL_PC', 'RV_PC', 'RV_GEN_MUNI_PC', 'RV_COOP_UTIL_PC', 'RV_CTY_PC', 'RV_DIRECT_PC', 'RV_DISTRICT_PC', 'RV_TRIBE_PC', 'RV_LOC_AUTH_PC', 'RV_DEV_PC', 'RV_EDUC_PC', 'RV_ELECTRIC_PC', 'RV_ENVIRON_PC', 'RV_GEN_PUR_PC', 'RV_HEALTH_PC', 'RV_HOUS_PC', 'RV_PUB_FAC_PC', 'RV_TRANSPORT_PC', 'RV_UTIL_PC']


In [102]:
len(data_d[pd.isnull(data_d).any(axis=1)])

0

The set of proportional dependents can also be generated systematically.

In [103]:
#Capture absolute debt pairs
debt_var_pairs=zip(debt_vars[:len(debt_vars)/2],debt_vars[len(debt_vars)/2:])

#Create a container for proportional debt variables
prop_debt_vars=[]

#For each pair...
for dp in debt_var_pairs:
    #...generate the GO proportion of total debt (in that area)...
    data_d[dp[0]+'_PROP']=np.where((data_d[dp[0]]+data_d[dp[1]])>0,
                                    data_d[dp[0]]/(data_d[dp[0]]+data_d[dp[1]]),
                                    0)
    #...and capture the variable name
    prop_debt_vars.append(dp[0]+'_PROP')
    
print prop_debt_vars

['GO_PROP', 'GO_GEN_MUNI_PROP', 'GO_COOP_UTIL_PROP', 'GO_CTY_PROP', 'GO_DIRECT_PROP', 'GO_DISTRICT_PROP', 'GO_TRIBE_PROP', 'GO_LOC_AUTH_PROP', 'GO_DEV_PROP', 'GO_EDUC_PROP', 'GO_ELECTRIC_PROP', 'GO_ENVIRON_PROP', 'GO_GEN_PUR_PROP', 'GO_HEALTH_PROP', 'GO_HOUS_PROP', 'GO_PUB_FAC_PROP', 'GO_TRANSPORT_PROP', 'GO_UTIL_PROP']


In [104]:
len(data_d[pd.isnull(data_d).any(axis=1)]),len(data_d)

(0, 22875)

#### Deflation by General Revenue

Another way of looking at debt loads is normalize them by general revenue, which speaks to debt significance among financing sources.  Let's generate a list similar to our per capita set above.

In [105]:
#Create container for revenue deflated debt variables
rd_debt_vars=[]

#For each debt variable...
for var in debt_vars:
    #...generate revenue deflated versions...
    data_d[var+'_RD']=data_d[var]/data_d['GEN_REV']
    #...and store the variable name
    rd_debt_vars.append(var+'_RD')

print rd_debt_vars

['GO_RD', 'GO_GEN_MUNI_RD', 'GO_COOP_UTIL_RD', 'GO_CTY_RD', 'GO_DIRECT_RD', 'GO_DISTRICT_RD', 'GO_TRIBE_RD', 'GO_LOC_AUTH_RD', 'GO_DEV_RD', 'GO_EDUC_RD', 'GO_ELECTRIC_RD', 'GO_ENVIRON_RD', 'GO_GEN_PUR_RD', 'GO_HEALTH_RD', 'GO_HOUS_RD', 'GO_PUB_FAC_RD', 'GO_TRANSPORT_RD', 'GO_UTIL_RD', 'RV_RD', 'RV_GEN_MUNI_RD', 'RV_COOP_UTIL_RD', 'RV_CTY_RD', 'RV_DIRECT_RD', 'RV_DISTRICT_RD', 'RV_TRIBE_RD', 'RV_LOC_AUTH_RD', 'RV_DEV_RD', 'RV_EDUC_RD', 'RV_ELECTRIC_RD', 'RV_ENVIRON_RD', 'RV_GEN_PUR_RD', 'RV_HEALTH_RD', 'RV_HOUS_RD', 'RV_PUB_FAC_RD', 'RV_TRANSPORT_RD', 'RV_UTIL_RD']


In [106]:
len(data_d[pd.isnull(data_d).any(axis=1)]),len(data_d)

(0, 22875)

In [107]:
print 'Before subset:',len(data_d)
data_d=data_d[pd.notnull(data_d).all(axis=1)]
print 'After subset:',len(data_d)

Before subset: 22875
After subset: 22875


### Debt Proportions

It may also prove useful to capture the proportions of debt by issuer type or purpose.  The idea here would be to explore the impact of debt classifications on the issuance of debt.  (In other words, we are talking about these proportions going on the independent side of the equation.)  Proportionality allows us to separate these regressors from the absolute debt figures on the dependent side.

For this purpose, we will actually focus on total debt to limit model complexity.

In [121]:
data_d[['Year','RESPOP','TOT_DEBT']].notnull().sum()
(data_d['TOT_DEBT']==0).sum()

2498

In [123]:
#Calculate total debt
data_d['TOT_DEBT']=data_d['GO']+data_d['RV']

#Calculate total debt per capita
data_d['TOT_DEBT_PC']=data_d['TOT_DEBT']/data_d['RESPOP']

#Create container for total debt variables
tot_debt_vars=['TOT_DEBT']

#For each of the remaining debt pairs...
for dp in debt_var_pairs[1:]:
    #...calculate the total...
    data_d['TOT'+dp[0][2:]]=data_d[dp[0]]+data_d[dp[1]]
    #...and capture the var name
    tot_debt_vars.append('TOT'+dp[0][2:])
    
#Create a container for proportions of total debt
prop_of_tot_debt_vars=[]

#For each of the total debt subsets...
for td in tot_debt_vars[1:]:
    #...calculate the proportion of total debt...
    data_d['TPROP'+td[3:]]=np.where(data_d['TOT_DEBT']>0,
                                    data_d[td]/data_d['TOT_DEBT'],
                                    0)
    #...and capture the name
    prop_of_tot_debt_vars.append('TPROP'+td[3:])
    
#Split out type from purpose
tprop_vars={'ISSUER':prop_of_tot_debt_vars[:10],
            'PURPOSE':prop_of_tot_debt_vars[10:]}
tprop_vars

{'ISSUER': ['TPROP_GEN_MUNI',
  'TPROP_COOP_UTIL',
  'TPROP_CTY',
  'TPROP_DIRECT',
  'TPROP_DISTRICT',
  'TPROP_TRIBE',
  'TPROP_LOC_AUTH',
  'TPROP_DEV',
  'TPROP_EDUC',
  'TPROP_ELECTRIC'],
 'PURPOSE': ['TPROP_ENVIRON',
  'TPROP_GEN_PUR',
  'TPROP_HEALTH',
  'TPROP_HOUS',
  'TPROP_PUB_FAC',
  'TPROP_TRANSPORT',
  'TPROP_UTIL']}

In [124]:
len(data_d[pd.isnull(data_d).any(axis=1)])

0

### Indepenent Variables

There are a few regressor lists we should construct, which vary largely with respect to how TELs are represented.

In [125]:
#Capture lists of regressors
reg1=['TYPE1','TYPE2','TYPE2_Y','RESPOP','RESPOP2','DENSITY','POPGROWTH','HSLD_PERS','PRE1940',\
        'PYOUNG','PVT_SCH','POP65','PC_INC','POVERTY','PC_SSI','DIVERSITY','EMP_RES','MANU_RES',\
        'RETL_RES','SERV_RES','BIN_REC','OSRC_GAP','REAL_RATE','R_CTY_INT_DIFF','GEN_REV']
reg2=['LIMITS','BOTH']+reg1[3:]
reg3=['RATE_L','ASMT_L','GP_LMT','SC_LMT']+reg1[3:]

#Capture in dict
reg_dict={'TYPE':reg1,
          'AGG':reg2,
          'HIRES':reg3}

### Fixed Effects

We have four fixed effect options which will be included in this analysis: `pooled`, `year fixed effects`, `state fixed effects`, and `both`.  These can be appended to each specification via simple extensions.  These extensions can be captured in a dictionary for easy access.

In [126]:
fe_dict={'POOLED':[],
         'YEAR':['C(Year)'],
         'STATE':['C(FIPSST)'],
         'BOTH':['C(Year)+C(FIPSST)']}

### Specification Builds

We now have host of moving parts on the specification side of the equation.  There are 63 possible dependent variables for the debt type breakouts (`GO` vs `RV`) and one total debt dependent for use in two models evaluating the debt composition impact.  Couple this with three regressor sets and four fixed effect options, and we have a whole mess of specifications (780).  

An orderly collection of formulas would be most useful, to say the least.  It perhaps goes without saying here, but not all of these models will receive the same level of scrutiny.  We will focus on a subset in any detailed discussion of results.  However, running all of these models gives us a chance to test variance in our TEL measures under a large variety of specifications.  Each group of TEL variables will have 260 estimates associated with it.

We will house our specifications in a hierarchical dictionary.  The first level keys will capture four main groups:

1. **Per Capita GO models** (`PC_GO`) are defined by their dependent variables (all per capita GO variables) at the next level down.  Only the simple regressor lists (without debt composition by issuer type or purpose) will be used.
2. **Per Capita RV models** (`PC_RV`) are defined by their dependent variables (all per capita RV variables) at the next level down.  Only the simple regressor lists (without debt composition by issuer type or purpose) will be used.
3. **Proportional Models** (`PROP`) both use total debt issued as the dependent.  The simple regressor lists are augmented by debt composition.  Whether this composition is split by issuer type (`ISSUER`) or debt purpose (`PURPOSE`) defines these models at the next level.

The second level being captured by the definitions within each group of the first level, the third level rotates across each regressor list captured in `reg_dict`.  The fourth level captures the fixed effect combinations.

The first level groups differ strongly due to the proportional group, so we will build each group separetely, and then manually combine them in the master dictionary.

In [127]:
## DEFINE FUNCTION TO CAPTURE PER CAPITA SPECS ##
def spec_dict_build(var_list):
    #Create dictionary to hold outgoing specs
    out_dict={}
    #For each per capita dependent...
    for dep in var_list:
        #...create a temp dict to hold specs for all three regressor groups...
        tmp_reg_dict={}
        #...and for each regressor set...
        for rd in reg_dict.keys():
            #...create a temp dict to hold specs for all four fixed effect types...
            tmp_spec_dict={}
            #...and for each FE type...
            for fe in fe_dict.keys():
                #...build the spec and throw it in tmp_spec_dict...
                tmp_spec_dict.update({fe:dep+'~'+'+'.join(reg_dict[rd]+fe_dict[fe])})
            #...once tmp_spec_dict is full, throw it in tmp_reg_dict...
            tmp_reg_dict.update({rd:tmp_spec_dict})
        #...and once tmp_reg_dict is full, throw it in out_dict
        if dep[3:-3]=='':
            out_dict.update({'Total':tmp_reg_dict})
        else:
            out_dict.update({dep[3:-3]:tmp_reg_dict})
    return out_dict

## CAPTURE DICTIONARY FOR PER CAPITA GO SPECS ##
go_spec_dict=spec_dict_build([var for var in pc_debt_vars if var.startswith('GO')])

## CAPTURE DICTIONARY FOR PER CAPITA GO SPECS ##
rv_spec_dict=spec_dict_build([var for var in pc_debt_vars if var.startswith('RV')])

## CAPTURE DICTIONARY FOR PROPORTIONAL SPECS ##

#Create dictionary to hold prop specs
prop_dict={}

#For each prop key...
for key in ['ISSUER','PURPOSE']:
    #...create a temp dict to hold specs for all three regressor groups...
    tmp_reg_dict={}
    #...and for each regressor set...
    for rd in reg_dict.keys():
        #...create a temp dict to hold specs for all four fixed effect types...
        tmp_spec_dict={}
        #...and for each FE type...
        for fe in fe_dict.keys():
            #...build the spec and throw it in tmp_spec_dict...
            tmp_spec_dict.update({fe:'TOT_DEBT_PC'+'~'+'+'.join(reg_dict[rd]+tprop_vars[key]+fe_dict[fe])})
        #...once tmp_spec_dict is full, throw it in tmp_reg_dict...
        tmp_reg_dict.update({rd:tmp_spec_dict})
    #...and once tmp_reg_dict is full, throw it in prop_dict
    prop_dict.update({key:tmp_reg_dict})
    
## CAPTURE DICTIONARY FOR REVENUE DEFLATED GO SPECS ##
go_rd_spec_dict=spec_dict_build([var for var in rd_debt_vars if var.startswith('GO')])

## CAPTURE DICTIONARY FOR REVENUE DEFLATED GO SPECS ##
rv_rd_spec_dict=spec_dict_build([var for var in rd_debt_vars if var.startswith('RV')])
    
## CAPTURE FIRST LEVEL DICTS IN SPECIFICATIONS DICT ##
spec_dict={'PC_GO':go_spec_dict,
           'PC_RV':rv_spec_dict,
           'PROP':prop_dict,
           'RD_GO':go_rd_spec_dict,
           'RD_RV':rv_rd_spec_dict}

Just to provide a sense of how this works, let's say we want to regress revenue debt per capita for housing on TELs split by type with year fixed effects, we can call that spec with the following:

In [128]:
spec_dict['PC_RV']['HOUS']['TYPE']['YEAR']

'RV_HOUS_PC~TYPE1+TYPE2+TYPE2_Y+RESPOP+RESPOP2+DENSITY+POPGROWTH+HSLD_PERS+PRE1940+PYOUNG+PVT_SCH+POP65+PC_INC+POVERTY+PC_SSI+DIVERSITY+EMP_RES+MANU_RES+RETL_RES+SERV_RES+BIN_REC+OSRC_GAP+REAL_RATE+R_CTY_INT_DIFF+C(Year)'

If I wanted to evaluate the impact of aggregate TEL measures and debt composition by issuer type on total debt per capita, using state fixed effects, this call would work:

In [129]:
spec_dict['PROP']['ISSUER']['AGG']['STATE']

'TOT_DEBT_PC~LIMITS+BOTH+RESPOP+RESPOP2+DENSITY+POPGROWTH+HSLD_PERS+PRE1940+PYOUNG+PVT_SCH+POP65+PC_INC+POVERTY+PC_SSI+DIVERSITY+EMP_RES+MANU_RES+RETL_RES+SERV_RES+BIN_REC+OSRC_GAP+REAL_RATE+R_CTY_INT_DIFF+TPROP_GEN_MUNI+TPROP_COOP_UTIL+TPROP_CTY+TPROP_DIRECT+TPROP_DISTRICT+TPROP_TRIBE+TPROP_LOC_AUTH+TPROP_DEV+TPROP_EDUC+TPROP_ELECTRIC+C(FIPSST)'

## Estimation

The first thing we need to do is dump all of the issues from States and Colleges/Universities.  They are likely large enough to dominate the effects we seek.

In [130]:
data_d

Unnamed: 0,GO,RV,GO_GEN_MUNI,GO_COOP_UTIL,GO_CTY,GO_DIRECT,GO_DISTRICT,GO_TRIBE,GO_LOC_AUTH,GO_DEV,...,TPROP_DEV,TPROP_EDUC,TPROP_ELECTRIC,TPROP_ENVIRON,TPROP_GEN_PUR,TPROP_HEALTH,TPROP_HOUS,TPROP_PUB_FAC,TPROP_TRANSPORT,TPROP_UTIL
7,0.000000,0.000000,0.000000,0,0.000000,0,0.000000,0,0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0,0.000000,0.000000,0.000000
8,0.000000,4.167930,0.000000,0,0.000000,0,0.000000,0,0,0.000000,...,0.000000,1.000000,0.000000,0.000000,0.000000,0.000000,0,0.000000,0.000000,0.000000
9,5.300260,7.862052,5.300260,0,0.000000,0,0.000000,0,0,0.000000,...,0.000000,0.187919,0.000000,0.409396,0.402685,0.000000,0,0.000000,0.000000,0.000000
10,0.000000,7.829592,0.000000,0,0.000000,0,0.000000,0,0,0.000000,...,0.000000,0.660066,0.000000,0.000000,0.000000,0.000000,0,0.000000,0.000000,0.339934
11,7.282735,2.290537,7.282735,0,0.000000,0,0.000000,0,0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.815951,0.000000,0,0.000000,0.000000,0.184049
12,15.967554,11.416801,0.000000,0,0.000000,0,15.967554,0,0,0.000000,...,0.157434,0.583090,0.000000,0.000000,0.000000,0.000000,0,0.000000,0.000000,0.259475
13,23.813988,1.226639,0.000000,0,0.000000,0,23.813988,0,0,0.000000,...,0.000000,1.000000,0.000000,0.000000,0.000000,0.000000,0,0.000000,0.000000,0.000000
14,9.190473,5.920196,9.190473,0,0.000000,0,0.000000,0,0,0.000000,...,0.000000,0.000000,0.000000,0.101368,0.608211,0.000000,0,0.000000,0.000000,0.290421
15,0.000000,12.000000,0.000000,0,0.000000,0,0.000000,0,0,0.000000,...,0.000000,0.444444,0.000000,0.000000,0.000000,0.000000,0,0.000000,0.000000,0.555556
16,0.000000,10.616920,0.000000,0,0.000000,0,0.000000,0,0,0.000000,...,0.000000,0.444077,0.000000,0.555923,0.000000,0.000000,0,0.000000,0.000000,0.000000


Now we are in a position to estimate some models.  We will first estimate some top line models, and then move into the analysis of all specs.  The top line models are as follows:

1. Total GO debt per capita on TELs by type with year and state fixed effects;
2. Total RV debt per capita on TELs by type with year and state fixed effects;
3. Total debt per capita on TELs by type and debt composition by issuer type, with year and state fixed effects;
4. Total debt per capita on TELs by type and debt composition by purpose, with year and state fixed effects.

In [131]:
#Capture topline specs
topline_specs={'GO':spec_dict['PC_GO']['Total']['TYPE']['BOTH'],
               'RV':spec_dict['PC_RV']['Total']['TYPE']['BOTH'],
               'ISSUER':spec_dict['PROP']['ISSUER']['TYPE']['BOTH'],
               'PURPOSE':spec_dict['PROP']['PURPOSE']['TYPE']['BOTH'],
               'RD_GO':spec_dict['RD_GO']['Total']['TYPE']['BOTH'],
               'RD_RV':spec_dict['RD_RV']['Total']['TYPE']['BOTH'],}

#Estimate each model
# topline_mods={'GO':smf.ols(formula=topline_specs['GO'],data=data_d).fit(cov_type='cluster',
#                                                                        cov_kwds={'groups':data_d['BEA_INT']})}
topline_mods={'GO':smf.ols(formula=topline_specs['GO'],data=data_d).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_d['BEA_INT']}),
              'RV':smf.ols(formula=topline_specs['RV'],data=data_d).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_d['BEA_INT']}),
              'ISSUER':smf.ols(formula=topline_specs['ISSUER'],data=data_d).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_d['BEA_INT']}),
              'PURPOSE':smf.ols(formula=topline_specs['PURPOSE'],data=data_d).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_d['BEA_INT']}),
              'RD_GO':smf.ols(formula=topline_specs['RD_GO'],data=data_d).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_d['BEA_INT']}),
              'RD_RV':smf.ols(formula=topline_specs['RD_RV'],data=data_d).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_d['BEA_INT']})}

topline_mods['GO'].summary()

0,1,2,3
Dep. Variable:,GO_PC,R-squared:,0.115
Model:,OLS,Adj. R-squared:,0.112
Method:,Least Squares,F-statistic:,-4954000000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,1.0
Time:,14:00:51,Log-Likelihood:,138370.0
No. Observations:,22875,AIC:,-276600.0
Df Residuals:,22798,BIC:,-276000.0
Df Model:,76,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,-0.0001,0.000,-0.703,0.482,-0.001 0.000
C(Year)[T.1985],7.45e-05,2.62e-05,2.842,0.004,2.31e-05 0.000
C(Year)[T.1986],4.342e-05,1.14e-05,3.816,0.000,2.11e-05 6.57e-05
C(Year)[T.1987],-3.388e-05,1.5e-05,-2.253,0.024,-6.34e-05 -4.4e-06
C(Year)[T.1988],-0.0003,3.76e-05,-7.395,0.000,-0.000 -0.000
C(Year)[T.1989],-0.0003,4.25e-05,-6.709,0.000,-0.000 -0.000
C(Year)[T.1990],-2.788e-05,1.85e-05,-1.509,0.131,-6.41e-05 8.33e-06
C(Year)[T.1991],2.219e-05,2.06e-05,1.078,0.281,-1.81e-05 6.25e-05
C(Year)[T.1992],0.0001,2.32e-05,4.345,0.000,5.53e-05 0.000

0,1,2,3
Omnibus:,65235.078,Durbin-Watson:,1.35
Prob(Omnibus):,0.0,Jarque-Bera (JB):,7349270227.796
Skew:,37.87,Prob(JB):,0.0
Kurtosis:,2778.783,Cond. No.,4.76e+16


In [140]:
topline_mods['RV'].summary()

0,1,2,3
Dep. Variable:,RV_PC,R-squared:,0.044
Model:,OLS,Adj. R-squared:,0.04
Method:,Least Squares,F-statistic:,158500000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,1.5e-49
Time:,14:05:25,Log-Likelihood:,113850.0
No. Observations:,22875,AIC:,-227600.0
Df Residuals:,22798,BIC:,-226900.0
Df Model:,76,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,-0.0003,0.000,-0.928,0.354,-0.001 0.000
C(Year)[T.1985],0.0007,0.000,3.105,0.002,0.000 0.001
C(Year)[T.1986],2.919e-05,0.000,0.185,0.853,-0.000 0.000
C(Year)[T.1987],2.743e-05,9.13e-05,0.301,0.764,-0.000 0.000
C(Year)[T.1988],-0.0003,0.000,-2.542,0.011,-0.001 -6.93e-05
C(Year)[T.1989],-0.0002,9.65e-05,-1.628,0.103,-0.000 3.2e-05
C(Year)[T.1990],0.0002,8.61e-05,1.942,0.052,-1.55e-06 0.000
C(Year)[T.1991],8.009e-05,0.000,0.665,0.506,-0.000 0.000
C(Year)[T.1992],0.0001,5.41e-05,2.051,0.040,4.91e-06 0.000

0,1,2,3
Omnibus:,55879.93,Durbin-Watson:,1.518
Prob(Omnibus):,0.0,Jarque-Bera (JB):,795353547.742
Skew:,25.911,Prob(JB):,0.0
Kurtosis:,915.022,Cond. No.,4.76e+16


In [133]:
topline_mods['ISSUER'].summary()

0,1,2,3
Dep. Variable:,TOT_DEBT_PC,R-squared:,0.076
Model:,OLS,Adj. R-squared:,0.073
Method:,Least Squares,F-statistic:,-113100000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,1.0
Time:,14:00:52,Log-Likelihood:,112050.0
No. Observations:,22875,AIC:,-223900.0
Df Residuals:,22796,BIC:,-223300.0
Df Model:,78,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,-0.0007,0.001,-1.436,0.151,-0.002 0.000
C(Year)[T.1985],0.0007,0.000,3.348,0.001,0.000 0.001
C(Year)[T.1986],0.0001,0.000,0.880,0.379,-0.000 0.000
C(Year)[T.1987],2.46e-05,6.59e-05,0.374,0.709,-0.000 0.000
C(Year)[T.1988],-0.0004,0.000,-3.576,0.000,-0.001 -0.000
C(Year)[T.1989],-0.0003,0.000,-2.483,0.013,-0.001 -6.88e-05
C(Year)[T.1990],0.0002,0.000,1.764,0.078,-2.04e-05 0.000
C(Year)[T.1991],0.0001,0.000,1.158,0.247,-9.85e-05 0.000
C(Year)[T.1992],0.0002,5.92e-05,4.095,0.000,0.000 0.000

0,1,2,3
Omnibus:,52626.1,Durbin-Watson:,1.447
Prob(Omnibus):,0.0,Jarque-Bera (JB):,474538833.539
Skew:,22.447,Prob(JB):,0.0
Kurtosis:,707.174,Cond. No.,3.34e+16


In [134]:
topline_mods['PURPOSE'].summary()

0,1,2,3
Dep. Variable:,TOT_DEBT_PC,R-squared:,0.094
Model:,OLS,Adj. R-squared:,0.09
Method:,Least Squares,F-statistic:,-413100000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,1.0
Time:,14:00:52,Log-Likelihood:,112260.0
No. Observations:,22875,AIC:,-224400.0
Df Residuals:,22793,BIC:,-223700.0
Df Model:,81,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,-0.0002,0.000,-0.323,0.747,-0.001 0.001
C(Year)[T.1985],0.0007,0.000,2.920,0.003,0.000 0.001
C(Year)[T.1986],0.0001,0.000,0.890,0.373,-0.000 0.000
C(Year)[T.1987],-2.485e-05,7.93e-05,-0.313,0.754,-0.000 0.000
C(Year)[T.1988],-0.0006,8.23e-05,-7.447,0.000,-0.001 -0.000
C(Year)[T.1989],-0.0005,8.93e-05,-5.843,0.000,-0.001 -0.000
C(Year)[T.1990],3.493e-05,8.2e-05,0.426,0.670,-0.000 0.000
C(Year)[T.1991],2.327e-05,0.000,0.211,0.833,-0.000 0.000
C(Year)[T.1992],0.0001,6.97e-05,1.736,0.083,-1.56e-05 0.000

0,1,2,3
Omnibus:,53215.624,Durbin-Watson:,1.472
Prob(Omnibus):,0.0,Jarque-Bera (JB):,525404243.89
Skew:,23.04,Prob(JB):,0.0
Kurtosis:,744.027,Cond. No.,5.26e+16


In [135]:
topline_mods['RD_GO'].summary()

0,1,2,3
Dep. Variable:,GO_RD,R-squared:,0.084
Model:,OLS,Adj. R-squared:,0.081
Method:,Least Squares,F-statistic:,3435000000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,3.1699999999999995e-54
Time:,14:00:52,Log-Likelihood:,150990.0
No. Observations:,22875,AIC:,-301800.0
Df Residuals:,22798,BIC:,-301200.0
Df Model:,76,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,-5.276e-05,7.92e-05,-0.666,0.506,-0.000 0.000
C(Year)[T.1985],6.334e-05,1.78e-05,3.559,0.000,2.85e-05 9.82e-05
C(Year)[T.1986],6.767e-05,1.58e-05,4.295,0.000,3.68e-05 9.86e-05
C(Year)[T.1987],2.218e-05,8.32e-06,2.665,0.008,5.87e-06 3.85e-05
C(Year)[T.1988],-0.0002,1.64e-05,-9.644,0.000,-0.000 -0.000
C(Year)[T.1989],-0.0001,7.49e-06,-17.598,0.000,-0.000 -0.000
C(Year)[T.1990],2.328e-05,1.18e-05,1.968,0.049,9.41e-08 4.65e-05
C(Year)[T.1991],2.679e-05,1.5e-05,1.782,0.075,-2.67e-06 5.63e-05
C(Year)[T.1992],4.69e-05,1.51e-05,3.099,0.002,1.72e-05 7.66e-05

0,1,2,3
Omnibus:,70375.924,Durbin-Watson:,1.421
Prob(Omnibus):,0.0,Jarque-Bera (JB):,13762017211.963
Skew:,46.653,Prob(JB):,0.0
Kurtosis:,3801.701,Cond. No.,4.76e+16


In [136]:
topline_mods['RD_GO'].summary()

0,1,2,3
Dep. Variable:,GO_RD,R-squared:,0.084
Model:,OLS,Adj. R-squared:,0.081
Method:,Least Squares,F-statistic:,3435000000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,3.1699999999999995e-54
Time:,14:00:52,Log-Likelihood:,150990.0
No. Observations:,22875,AIC:,-301800.0
Df Residuals:,22798,BIC:,-301200.0
Df Model:,76,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,-5.276e-05,7.92e-05,-0.666,0.506,-0.000 0.000
C(Year)[T.1985],6.334e-05,1.78e-05,3.559,0.000,2.85e-05 9.82e-05
C(Year)[T.1986],6.767e-05,1.58e-05,4.295,0.000,3.68e-05 9.86e-05
C(Year)[T.1987],2.218e-05,8.32e-06,2.665,0.008,5.87e-06 3.85e-05
C(Year)[T.1988],-0.0002,1.64e-05,-9.644,0.000,-0.000 -0.000
C(Year)[T.1989],-0.0001,7.49e-06,-17.598,0.000,-0.000 -0.000
C(Year)[T.1990],2.328e-05,1.18e-05,1.968,0.049,9.41e-08 4.65e-05
C(Year)[T.1991],2.679e-05,1.5e-05,1.782,0.075,-2.67e-06 5.63e-05
C(Year)[T.1992],4.69e-05,1.51e-05,3.099,0.002,1.72e-05 7.66e-05

0,1,2,3
Omnibus:,70375.924,Durbin-Watson:,1.421
Prob(Omnibus):,0.0,Jarque-Bera (JB):,13762017211.963
Skew:,46.653,Prob(JB):,0.0
Kurtosis:,3801.701,Cond. No.,4.76e+16


Let's try it on just the subset of folks with positive total debt.

In [141]:
#Subset to positive total debt
data_pos_debt=data_d[data_d['TOT_DEBT']>0]

topline_mods_debt_sub={'GO':smf.ols(formula=topline_specs['GO'],data=data_pos_debt).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_pos_debt['BEA_INT']}),
              'RV':smf.ols(formula=topline_specs['RV'],data=data_pos_debt).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_pos_debt['BEA_INT']}),
              'ISSUER':smf.ols(formula=topline_specs['ISSUER'],data=data_pos_debt).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_pos_debt['BEA_INT']}),
              'PURPOSE':smf.ols(formula=topline_specs['PURPOSE'],data=data_pos_debt).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_pos_debt['BEA_INT']}),
              'RD_GO':smf.ols(formula=topline_specs['RD_GO'],data=data_pos_debt).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_pos_debt['BEA_INT']}),
              'RD_RV':smf.ols(formula=topline_specs['RD_RV'],data=data_pos_debt).\
                      fit(cov_type='cluster',cov_kwds={'groups':data_pos_debt['BEA_INT']})}

topline_mods_debt_sub['GO'].summary()

0,1,2,3
Dep. Variable:,GO_PC,R-squared:,0.101
Model:,OLS,Adj. R-squared:,0.098
Method:,Least Squares,F-statistic:,-1109000000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,1.0
Time:,14:09:42,Log-Likelihood:,122210.0
No. Observations:,20377,AIC:,-244300.0
Df Residuals:,20303,BIC:,-243700.0
Df Model:,73,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,-3.628e-05,0.000,-0.185,0.853,-0.000 0.000
C(Year)[T.1985],6.301e-05,2.56e-05,2.465,0.014,1.29e-05 0.000
C(Year)[T.1986],1.422e-05,1.63e-05,0.874,0.382,-1.77e-05 4.61e-05
C(Year)[T.1987],-7.543e-05,1.85e-05,-4.077,0.000,-0.000 -3.92e-05
C(Year)[T.1990],-9.859e-05,1.24e-05,-7.956,0.000,-0.000 -7.43e-05
C(Year)[T.1991],-3.143e-05,1.95e-05,-1.615,0.106,-6.96e-05 6.71e-06
C(Year)[T.1992],5.932e-05,2.94e-05,2.015,0.044,1.61e-06 0.000
C(Year)[T.1993],8.349e-05,2.65e-05,3.150,0.002,3.15e-05 0.000
C(Year)[T.1994],-1.923e-05,2.5e-05,-0.770,0.441,-6.82e-05 2.97e-05

0,1,2,3
Omnibus:,57167.896,Durbin-Watson:,1.295
Prob(Omnibus):,0.0,Jarque-Bera (JB):,5419562601.46
Skew:,36.25,Prob(JB):,0.0
Kurtosis:,2528.45,Cond. No.,4.32e+16


In [142]:
topline_mods_debt_sub['RV'].summary()

0,1,2,3
Dep. Variable:,RV_PC,R-squared:,0.044
Model:,OLS,Adj. R-squared:,0.04
Method:,Least Squares,F-statistic:,1335000000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,8.68e-53
Time:,14:09:42,Log-Likelihood:,100290.0
No. Observations:,20377,AIC:,-200400.0
Df Residuals:,20303,BIC:,-199800.0
Df Model:,73,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,8.468e-05,0.000,0.200,0.841,-0.001 0.001
C(Year)[T.1985],0.0007,0.000,2.998,0.003,0.000 0.001
C(Year)[T.1986],-6.075e-06,0.000,-0.039,0.969,-0.000 0.000
C(Year)[T.1987],-1.183e-05,9.22e-05,-0.128,0.898,-0.000 0.000
C(Year)[T.1990],7.548e-05,6.92e-05,1.091,0.275,-6.01e-05 0.000
C(Year)[T.1991],-9.205e-06,0.000,-0.083,0.934,-0.000 0.000
C(Year)[T.1992],2.705e-05,5.65e-05,0.479,0.632,-8.36e-05 0.000
C(Year)[T.1993],0.0002,6.74e-05,2.379,0.017,2.82e-05 0.000
C(Year)[T.1994],-0.0002,5.23e-05,-3.066,0.002,-0.000 -5.79e-05

0,1,2,3
Omnibus:,48569.571,Durbin-Watson:,1.41
Prob(Omnibus):,0.0,Jarque-Bera (JB):,561519994.469
Skew:,24.415,Prob(JB):,0.0
Kurtosis:,814.772,Cond. No.,4.32e+16


In [143]:
topline_mods_debt_sub['ISSUER'].summary()

0,1,2,3
Dep. Variable:,TOT_DEBT_PC,R-squared:,0.069
Model:,OLS,Adj. R-squared:,0.065
Method:,Least Squares,F-statistic:,-47750000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,1.0
Time:,14:09:42,Log-Likelihood:,98677.0
No. Observations:,20377,AIC:,-197200.0
Df Residuals:,20300,BIC:,-196600.0
Df Model:,76,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,-0.0007,0.001,-1.203,0.229,-0.002 0.000
C(Year)[T.1985],0.0007,0.000,3.355,0.001,0.000 0.001
C(Year)[T.1986],0.0001,0.000,0.890,0.373,-0.000 0.000
C(Year)[T.1987],1.636e-05,6.49e-05,0.252,0.801,-0.000 0.000
C(Year)[T.1990],0.0002,0.000,1.448,0.148,-5.44e-05 0.000
C(Year)[T.1991],0.0001,0.000,1.022,0.307,-0.000 0.000
C(Year)[T.1992],0.0002,6.61e-05,3.379,0.001,9.38e-05 0.000
C(Year)[T.1993],0.0004,0.000,4.223,0.000,0.000 0.001
C(Year)[T.1994],-1.482e-05,3.79e-05,-0.391,0.695,-8.91e-05 5.94e-05

0,1,2,3
Omnibus:,45739.186,Durbin-Watson:,1.331
Prob(Omnibus):,0.0,Jarque-Bera (JB):,336606159.816
Skew:,21.193,Prob(JB):,0.0
Kurtosis:,631.218,Cond. No.,3.02e+16


In [144]:
topline_mods_debt_sub['PURPOSE'].summary()

0,1,2,3
Dep. Variable:,TOT_DEBT_PC,R-squared:,0.086
Model:,OLS,Adj. R-squared:,0.083
Method:,Least Squares,F-statistic:,-155800000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,1.0
Time:,14:09:42,Log-Likelihood:,98870.0
No. Observations:,20377,AIC:,-197600.0
Df Residuals:,20298,BIC:,-197000.0
Df Model:,78,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,-2.498e-05,0.001,-0.046,0.963,-0.001 0.001
C(Year)[T.1985],0.0007,0.000,2.799,0.005,0.000 0.001
C(Year)[T.1986],9.7e-05,0.000,0.668,0.504,-0.000 0.000
C(Year)[T.1987],-5.352e-05,9.12e-05,-0.587,0.557,-0.000 0.000
C(Year)[T.1990],-2.208e-05,9.23e-05,-0.239,0.811,-0.000 0.000
C(Year)[T.1991],-4.013e-05,0.000,-0.357,0.721,-0.000 0.000
C(Year)[T.1992],7.409e-05,6.96e-05,1.065,0.287,-6.23e-05 0.000
C(Year)[T.1993],0.0002,8.38e-05,2.843,0.004,7.4e-05 0.000
C(Year)[T.1994],-0.0002,6.09e-05,-3.300,0.001,-0.000 -8.16e-05

0,1,2,3
Omnibus:,46271.465,Durbin-Watson:,1.358
Prob(Omnibus):,0.0,Jarque-Bera (JB):,373327886.749
Skew:,21.767,Prob(JB):,0.0
Kurtosis:,664.672,Cond. No.,5.53e+16


In [145]:
topline_mods_debt_sub['RD_GO'].summary()

0,1,2,3
Dep. Variable:,GO_RD,R-squared:,0.074
Model:,OLS,Adj. R-squared:,0.071
Method:,Least Squares,F-statistic:,-1205000000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,1.0
Time:,14:09:42,Log-Likelihood:,133400.0
No. Observations:,20377,AIC:,-266700.0
Df Residuals:,20303,BIC:,-266100.0
Df Model:,73,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,-6.311e-06,7.8e-05,-0.081,0.936,-0.000 0.000
C(Year)[T.1985],5.946e-05,1.72e-05,3.457,0.001,2.57e-05 9.32e-05
C(Year)[T.1986],5.568e-05,1.7e-05,3.283,0.001,2.24e-05 8.89e-05
C(Year)[T.1987],6.062e-06,9.97e-06,0.608,0.543,-1.35e-05 2.56e-05
C(Year)[T.1990],-5.358e-06,9.87e-06,-0.543,0.587,-2.47e-05 1.4e-05
C(Year)[T.1991],5.008e-06,1.55e-05,0.324,0.746,-2.53e-05 3.53e-05
C(Year)[T.1992],3.081e-05,1.75e-05,1.763,0.078,-3.43e-06 6.51e-05
C(Year)[T.1993],6.222e-05,1.04e-05,5.990,0.000,4.19e-05 8.26e-05
C(Year)[T.1994],-2.91e-05,1.7e-05,-1.715,0.086,-6.24e-05 4.16e-06

0,1,2,3
Omnibus:,61560.238,Durbin-Watson:,1.369
Prob(Omnibus):,0.0,Jarque-Bera (JB):,9955710929.784
Skew:,44.352,Prob(JB):,0.0
Kurtosis:,3426.15,Cond. No.,4.32e+16


In [146]:
topline_mods_debt_sub['RD_RV'].summary()

0,1,2,3
Dep. Variable:,RV_RD,R-squared:,0.057
Model:,OLS,Adj. R-squared:,0.054
Method:,Least Squares,F-statistic:,266800000000000.0
Date:,"Sun, 15 Nov 2015",Prob (F-statistic):,2.43e-50
Time:,14:09:42,Log-Likelihood:,114110.0
No. Observations:,20377,AIC:,-228100.0
Df Residuals:,20303,BIC:,-227500.0
Df Model:,73,,
Covariance Type:,cluster,,

0,1,2,3,4,5
,coef,std err,z,P>|z|,[95.0% Conf. Int.]
Intercept,0.0002,0.000,0.716,0.474,-0.000 0.001
C(Year)[T.1985],0.0005,0.000,3.247,0.001,0.000 0.001
C(Year)[T.1986],1.126e-05,7.83e-05,0.144,0.886,-0.000 0.000
C(Year)[T.1987],4.222e-05,5.32e-05,0.793,0.428,-6.21e-05 0.000
C(Year)[T.1990],9.645e-05,4.84e-05,1.993,0.046,1.61e-06 0.000
C(Year)[T.1991],3.5e-05,6.47e-05,0.541,0.588,-9.17e-05 0.000
C(Year)[T.1992],-2.298e-05,1.95e-05,-1.176,0.239,-6.13e-05 1.53e-05
C(Year)[T.1993],9.088e-05,3.71e-05,2.451,0.014,1.82e-05 0.000
C(Year)[T.1994],-0.0001,3.95e-05,-3.655,0.000,-0.000 -6.7e-05

0,1,2,3
Omnibus:,56949.494,Durbin-Watson:,1.545
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4028258456.583
Skew:,36.044,Prob(JB):,0.0
Kurtosis:,2179.989,Cond. No.,4.32e+16


In [147]:
len(data_d[pd.isnull(data_d).any(axis=1)])

0

## Estimation with a Standardized Set

It would be useful to take scale differences out of the equation by standardizing non-categorical variables.  Let's identify which variables need to standardized.

In [138]:
## Variables to standardize
std_vars=debt_vars+pc_debt_vars+prop_debt_vars+rd_debt_vars+tprop_vars['ISSUER']+tprop_vars['PURPOSE']+\
         tot_debt_vars+fiscal_vars+['DENSITY','DIVERSITY','HSLD_PERS','MANU_RES','PC_INC','PC_SSI','POP65',\
                                    'POPGROWTH','POVERTY','PRE1940','PVT_SCH','PYOUNG','RESPOP','RESPOP2',\
                                    'RETL_RES','SERV_RES','TOT_DEBT_PC']

print sorted([var for var in data_d.columns if var not in std_vars])

['ASMT_L', 'BEA', 'BEA_INT', 'BIN_REC', 'BOTH', 'CTY_INTEREST', 'EMP_RES', 'FIPS', 'FIPSST', 'GP_LMT', 'INFLAT', 'LIMITS', 'RATE_L', 'REAL_RATE', 'R_CTY_INT', 'R_CTY_INT_DIFF', 'SC_LMT', 'STATE', 'TYPE1', 'TYPE2', 'TYPE2_Y', 'Year', 'defl']


To remain consistent with our clustered standard errors by region, our standardization method will subtract from each float type variable the average value for the BEA region, and that difference will be normalized by dividing it by the standard deviation for the region.  If no variation exists, the standard deviation is zero.  In this eventuality, the function (`std_val`) will return zero.  The point of standardization is to express a scale-invariant distance from central tendency.  No such distance can exist if no variation occurs.

In [139]:
#Capture copy of data_d
data_std=data_d.copy(deep=True)

#Capture set of years
yrs=sorted(set(data_std['Year']))

#Capture set of BEA regions
bea_regs=sorted(set(data_std['BEA']))

#Set index
data_std.set_index(['Year','BEA'],inplace=True)

#Sort index
data_std.sortlevel(0,inplace=True)

#Define function to return a standardized value or zero
def std_val(x):
    if np.std(x)>0:
        return (x - np.mean(x)) / np.std(x)
    else:
        return 0

#Create container for std subset
std_subs=[]

#For each year...
for yr in yrs:
    #...for each region...
    for br in bea_regs:
        #...capture the subset...
        tmp_sub=data_std.ix[(yr,br)]
        #...and standardize the relevant variables...
        tmp_sub[std_vars]=tmp_sub[std_vars].apply(lambda x: std_val(x))
        #...and throw it in std_subs
        std_subs.append(tmp_sub)
        
#Concatenate back together
data_std=pd.concat(std_subs)

#Sort index
data_std.sortlevel(0,inplace=True)
            
data_std.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[k1] = value[k2]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


ValueError: Must have equal len keys and value when setting with an iterable

In [None]:
data_std.ix[1990]

In [None]:
np.array([0,0,0,0,0,0,0]).std()

In [None]:
d=DataFrame(np.arange(16).reshape(4,4))
print d
d[[1,2,3]]=d[[1,2,3]].apply(lambda x: (x - np.mean(x)) / np.std(x))
d

In [None]:
len(data_d)

In [None]:
smf.ols(formula=topline_specs['GO'],data=data_d).fit(cov_type='cluster',cov_kwds={'groups':data_d['BEA_INT']})

Add in years

dummy for recessions

cluster by BEA region

implicit price deflator for state and local government services

try total volume as dependent and proportions

interest rates