# Trends in Municipal Bond Issues

This notebook builds tables and graphs that identify trends in the municipal bond issues recorded in the Reuters data.  Here are the views we wish to create:

1. Nation - Total debt issues per capita by year. Include GO and RV subtotals, as well as GO %.
2. Nation - Debt issues per capita by year and issuer type (general vs special purpose governments).  Include GO %.
3. Nation - Debt issues per capita by year and debt purpose.
4. Nation - GO % by year and debt purpose.
5. Region - Total debt issues per capita by year.
6. Region - GO % by year. 

In [57]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import seaborn as sb
import glob
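#Note: pandas.io.data was later moved to the separate pandas-datareader package
#(from pandas_datareader import data as web)
import pandas.io.data as web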
from IPython.display import HTML

## Data Input

### Raw Debt Data

We don't actually need many variables for this analysis.  Let's build a list of what is required and read in our raw data.

In [58]:
#Grab list of files
files=glob.glob('../../debt_data/*.csv')

#Define required variables
req_vars=['Security Type','Issuer Type Description','Bond Buyer UOP30','Amount of Maturity ($ mils)','State','Sale Date']

#Create a dict to map in new names
req_var_new_names=['debt_type','issuer_type','purpose','amount','state','Sale Date']
new_name_dict=dict(zip(req_vars,req_var_new_names))

#Create a container for DFs from all years
df_list=[]

#For each file...
for f in files[:-1]:
    print f
    #...throw the subset into df_list
    df_list.append(pd.read_csv(f,usecols=req_vars))
    
#Concatenate all the years together and rename the variables
debt=pd.concat(df_list).rename(columns=new_name_dict).reset_index()

#Convert sale date to datetime
debt['Sale Date']=pd.to_datetime(debt['Sale Date'])

#Generate a year variable
debt['year']=debt['Sale Date'].dt.year

#Jettison Sale Date and old index
debt.pop('Sale Date')
debt.pop('index')

print debt.info()
debt.head()

../../debt_data/1988to1989.csv
../../debt_data/2004.csv
../../debt_data/2014to2015.csv
../../debt_data/1990to1991.csv
../../debt_data/2006to2007.csv
../../debt_data/1992to1993.csv
../../debt_data/2010to2011.csv
../../debt_data/2012to2013.csv
../../debt_data/2000to2001.csv
../../debt_data/2005.csv
../../debt_data/2008to2009.csv
../../debt_data/1998to1999.csv
../../debt_data/1986to1987.csv
../../debt_data/1994to1995.csv
../../debt_data/1996to1997.csv
../../debt_data/1984to1985.csv
../../debt_data/2002to2003.csv
<class 'pandas.core.frame.DataFrame'>
Int64Index: 465391 entries, 0 to 465390
Data columns (total 6 columns):
purpose        465388 non-null object
amount         465391 non-null object
issuer_type    465357 non-null object
state          465365 non-null object
debt_type      465391 non-null object
year           465391 non-null int64
dtypes: int64(1), object(5)
memory usage: 24.9+ MB
None


| |purpose|amount|issuer_type|state|debt_type|year|
|-|-------|------|-----------|-----|---------|----|
|0|Utilities|0.48|District|MO|RV|1988|
|1|Utilities|0.05|City, Town Vlg|MO|RV|1988|
|2|General Purpose|5.175|District|CO|GO|1988|
|3|Education|0.273|District|OH|GO|1988|
|4|Transportation|0.22|District|IN|GO|1988|


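We need to convert `amount` to float because ... math.  The column is a mix of real floats and strings, some of them comma-formatted (e.g. `1,100.000`).  The `np.where()` approach with Series objects (left commented out below) really struggled here: it blew up *all* of the memory of my home workstation (so, 32 GB worth), even though the data involved is many orders of magnitude smaller.  A likely culprit is that `np.where()` evaluates both branches in full, so `debt['amount']*1000000` also runs on the string entries, and multiplying a Python string by an integer repeats it a million times rather than scaling it.  Looping over plain numpy arrays sidesteps the problem, and it is stupid fast.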

In [59]:
#Remove commas
debt['amt_str']=debt['amount'].str.replace(',','')

##Convert to float
#Capture floats in amount
amt_floats=debt['amount'].apply(lambda x: isinstance(x,float)).values

#Capture values in amount and amt_str
f_vals=debt['amount'].values
s_vals=debt['amt_str'].values

#Create container to hold new float amounts
new_floats=np.empty(len(amt_floats))

#For each amount...
for i in range(len(amt_floats)):
    #...if the value is a float...
    if amt_floats[i]:
        #...use it...
        new_floats[i]=f_vals[i]*1000000.
    #...if it is not a float...
    else:
        #...convert the string version to float
        new_floats[i]=float(s_vals[i])*1000000.
        
#Assign new float values to amt_f
debt['amt_f']=new_floats


# debt['amt_f']=np.where(debt['amount'].apply(lambda x: isinstance(x,float)),
#                        debt['amount']*1000000,
#                        debt['amt_str'].astype(float)*1000000)

# #Drop old amount vars
# debt.pop('amount')
# debt.pop('amt_str')

debt.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 465391 entries, 0 to 465390
Data columns (total 8 columns):
purpose        465388 non-null object
amount         465391 non-null object
issuer_type    465357 non-null object
state          465365 non-null object
debt_type      465391 non-null object
year           465391 non-null int64
amt_str        345063 non-null object
amt_f          465391 non-null float64
dtypes: float64(1), int64(1), object(6)
memory usage: 32.0+ MB
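As an alternative to the explicit loop, the conversion can probably be done in one vectorized pass. The following is only a sketch on a tiny stand-in Series (the values are borrowed from the outputs in this notebook, but `amount`, `as_text`, and `repeated` are illustrative names, not part of the original code), written for a current Python and pandas. It also shows the string-repetition behavior that would make an `np.where()` version balloon.

```python
import pandas as pd

# Stand-in for debt['amount']: real floats mixed with strings,
# one of which uses a thousands separator.
amount = pd.Series([0.48, '1,100.000', '5.175', 0.05], dtype=object)

# Why multiplying the raw object column is risky: str * int repeats the string.
repeated = '0.48' * 3  # '0.480.480.48', not 1.44

# Vectorized alternative: turn everything into text, drop commas, parse once.
as_text = amount.astype(str).str.replace(',', '', regex=False)

# Amounts are reported in $ millions, so scale to dollars.
amt_f = pd.to_numeric(as_text) * 1000000.
```

Because every element is parsed from text, no branch ever multiplies a string, so memory use should stay proportional to the size of the column.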


We also need to get rid of all issues that do not come from local governments.

In [60]:
debt['issuer_type'].value_counts(),debt['issuer_type'].value_counts().sum()

(City, Town Vlg     154777
 District           147972
 Local Authority     59096
 State Authority     48656
 County/Parish       38925
 State/Province       7975
 College or Univ      6253
 Direct Issuer        1511
 Indian Tribe          113
 Co-op Utility          76
 14                      3
 dtype: int64, 465357)

Ok, we only want to keep the local subset.

In [61]:
#Build list of desired issuers
desired_issuers=['City, Town Vlg','District','Local Authority','County/Parish']
print np.array([iss in set(debt['issuer_type']) for iss in desired_issuers]).all()

#Subset to only those issues from the desired issuers
debt=debt[debt['issuer_type'].isin(desired_issuers)]

#Set index
if np.array([var in debt.columns for var in ['state','year']]).all():
    debt.set_index(['state','year'],inplace=True)

#Sort index
debt.sortlevel(0,inplace=True)

debt.info()

True
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 400770 entries, (AK, 1984) to (WY, 2015)
Data columns (total 6 columns):
purpose        400767 non-null object
amount         400770 non-null object
issuer_type    400770 non-null object
debt_type      400770 non-null object
amt_str        297179 non-null object
amt_f          400770 non-null float64
dtypes: float64(1), object(5)
memory usage: 19.1+ MB


Here are a couple of cells to test whether or not our conversion distorted things.  We will display the sum of all values in the `amount` variable that needed no comma removal, as well as that sum plus the comma-formatted strings converted to float.  Along the way, we will list the comma-formatted string values and their comma-free counterparts to visually inspect the conversion process.

In [86]:
tst=[]
tst_str=[]
for val in debt['amount'].values:
    if isinstance(val,float):
        tst.append(val)
    elif ',' in val:
        tst_str.append(float(val.replace(',','')))
        print val,'|',val.replace(',','')
    else:
        tst.append(float(val))
        
print 'Sum of all unconverted bond volumes:',np.array(tst).sum()*1000000.
print 'Sum of all bond volumes (including those converted from string):',(np.array(tst).sum()+np.array(tst_str).sum())*1000000.

1,100.000 | 1100.000
1,300.000 | 1300.000
1,000.000 | 1000.000
1,500.000 | 1500.000
1,150.000 | 1150.000
1,300.000 | 1300.000
1,050.000 | 1050.000
1,300.000 | 1300.000
1,470.655 | 1470.655
1,300.000 | 1300.000
1,300.000 | 1300.000
1,250.585 | 1250.585
1,300.000 | 1300.000
1,204.665 | 1204.665
1,369.200 | 1369.200
1,495.575 | 1495.575
1,386.235 | 1386.235
1,224.265 | 1224.265
1,012.235 | 1012.235
1,191.540 | 1191.540
1,259.000 | 1259.000
1,200.000 | 1200.000
1,000.000 | 1000.000
1,000.000 | 1000.000
1,000.000 | 1000.000
1,250.000 | 1250.000
1,269.100 | 1269.100
1,000.000 | 1000.000
1,325.000 | 1325.000
2,579.955 | 2579.955
2,573.035 | 2573.035
1,218.495 | 1218.495
2,035.330 | 2035.330
1,080.000 | 1080.000
1,292.170 | 1292.170
3,487.245 | 3487.245
1,159.860 | 1159.860
Sum of all unconverted bond volumes: 3.933934248e+12
Sum of all bond volumes (including those converted from string): 3.984868393e+12


The conversion looks on the up and up.  Do our totals match?  What was the bond issue volume in 1984?

In [88]:
print 'Sum of new composite bond volumes:',debt['amt_f'].sum()
print 'Sum of new composite bond volumes in 1984:',debt.xs(1984,level='year')['amt_f'].sum()/1000000000

Sum of new composite bond volumes: 3.984868393e+12
Sum of new composite bond volumes in 1984: 58.714146


The total definitely matches, and we have our total for 1984 ($58.7 B).

### Population by State

So, the Census has made it comically inconvenient to get a time series of population by state.  Each decade is separated out, and each apparently has its own format.  (Honestly, who chooses these formats?)  In a nutshell, F that noise.  The good people at the [Federal Reserve Bank of St. Louis](https://research.stlouisfed.org/fred2/) have done the work for us.  Why not leverage the fruits of their labor?

First, let's grab the states.

In [68]:
# #Read in state names and abbreviations
# states=pd.read_csv('https://raw.githubusercontent.com/chris-taylor/USElection/master/data/state-abbreviations.csv',
#                   names=['state','st'])

# states.to_csv('../data/state_abbr.csv')

In [None]:
states=pd.read_csv('../data/state_abbr.csv')

Our approach will be to roll through the states and pull all of the population series together.  Note that they are all in thousands of people.

In [69]:
# #Construct strings
# fred_calls=[st+'POP' for st in states['st']]

# #Create container for state populations
# st_pop_dfs=[]

# #For each state...
# for st in fred_calls:
#     #...capture the state population series
#     st_pop_dfs.append(DataFrame(web.DataReader(st,'fred','1/1/1980','1/1/2014')))
    
# #Join them together
# st_pops=st_pop_dfs[0].join(st_pop_dfs[1:])

# #Rename them by dropping POP from the variable name
# new_st_vars=[var[:2] for var in st_pops.columns]
# st_pops.columns=new_st_vars

# st_pops.to_csv('../data/state_pops1980_2014.csv')

In [None]:
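#Restore the DATE index (with parsed dates) that to_csv wrote out
st_pops=pd.read_csv('../data/state_pops1980_2014.csv',index_col='DATE',parse_dates=True)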

We will need to join on `year` and `state` eventually, so let's get the pop data in that shape.

In [70]:
#Reset index
st_pop=st_pops.stack().reset_index()

#Define year
st_pop['year']=st_pop['DATE'].apply(lambda x: x.year)

#Rename columns 
st_pop.columns=['date','state','pop','year']

#Drop date
st_pop.pop('date')

#Set the index
st_pop.set_index(['state','year'],inplace=True)

#Sort the index
st_pop.sortlevel(0,inplace=True)

#Convert to individual counts
st_pop=st_pop*1000

st_pop

state|year|pop
-----|----|---
AK|1980|405315
AK|1981|418491
AK|1982|449606
AK|1983|488417
AK|1984|513702
AK|1985|532495
AK|1986|544268
AK|1987|539309
AK|1988|541983
AK|1989|547159
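The wide-to-long reshape is easier to follow on a toy example. The sketch below assumes the saved CSV has a `DATE` index column and one column per state, in thousands of people. The `AK` values are taken from the output in this notebook, while the `ZZ` state and its values are made up purely for illustration. It round-trips the frame through an in-memory CSV to show that the date index has to be restored on read before stacking.

```python
import io
import pandas as pd

# Toy wide table: one column per state, population in thousands.
# 'ZZ' is a hypothetical state with made-up values.
wide = pd.DataFrame(
    {'AK': [405.315, 418.491], 'ZZ': [100.0, 101.0]},
    index=pd.DatetimeIndex(['1980-01-01', '1981-01-01'], name='DATE'),
)

# Round-trip through CSV, restoring the date index when reading back.
buf = io.StringIO()
wide.to_csv(buf)
buf.seek(0)
st_pops = pd.read_csv(buf, index_col='DATE', parse_dates=True)

# Stack states into rows, then key the result by (state, year).
st_pop = st_pops.stack().reset_index()
st_pop.columns = ['date', 'state', 'pop']
st_pop['year'] = st_pop['date'].dt.year
st_pop = st_pop.drop(columns='date').set_index(['state', 'year']).sort_index()

# Convert thousands of people to individual counts.
st_pop = st_pop * 1000
```

Without `index_col` and `parse_dates`, the dates come back as an ordinary text column and the stack step would mix them in with the population values.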


## Total Debt Issues Per Capita

Let's perform our first aggregation: total, GO, and RV debt by state and year.

In [89]:
#Capture subsets by GO and RV debt
go_debt=debt[debt['debt_type']=='GO']
rv_debt=debt[debt['debt_type']=='RV']

#Aggregate amt_f by state and year (selecting the column first avoids summing the string columns)
tot_agg=debt.groupby(level=['state','year'])['amt_f'].sum().to_frame('TOTAL')
go_agg=go_debt.groupby(level=['state','year'])['amt_f'].sum().to_frame('GO')
rv_agg=rv_debt.groupby(level=['state','year'])['amt_f'].sum().to_frame('RV')

#Join sets together
st_yr=tot_agg.join([go_agg,rv_agg])

#Join in population
st_yr=st_yr.join(st_pop)

#Generate per capita measures
for var in ['TOTAL','GO','RV']:
    st_yr[var+'_PC']=st_yr[var]/st_yr['pop']
    
#Calculate GO % of total debt
st_yr['GO_PROP']=st_yr['GO']/st_yr['TOTAL']

st_yr

state|year|TOTAL|GO|RV|pop|TOTAL_PC|GO_PC|RV_PC|GO_PROP
-----|----|-----|--|--|---|--------|-----|-----|-------
AK|1984|747710000|724525000|23185000|513702|1455.532585|1410.399414|45.133171|0.968992
AK|1985|958200000|466990000|491210000|532495|1799.453516|876.984760|922.468756|0.487362
AK|1986|537924000|340829000|197095000|544268|988.343978|626.215394|362.128584|0.633601
AK|1987|93205000|69135000|24070000|539309|172.823001|128.191816|44.631185|0.741752
AK|1988|229018000|214203000|14815000|541983|422.555689|395.220883|27.334806|0.935311
AK|1989|177466000|90248000|87218000|547159|324.340822|164.939259|159.401563|0.508537
AK|1990|242430000|227105000|15325000|553120|438.295487|410.589022|27.706465|0.936786
AK|1991|223443000|161455000|61988000|569273|392.505880|283.616121|108.889759|0.722578
AK|1992|245378000|219363000|26015000|587073|417.968464|373.655406|44.313058|0.893980
AK|1993|1094960000|374585000|720375000|596993|1834.125358|627.452918|1206.672440|0.342099
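The first national view on our list (total issues per capita by year, with GO %) could be obtained by rolling the state-year table up over states. Here is a minimal sketch of that step, not the notebook's own implementation. The `AK` rows reuse figures from the table, the `ZZ` rows are hypothetical filler so there is something to add up, and `nation` is an illustrative name. The point is to sum dollars and people first and divide afterwards, since averaging the state-level per capita columns would weight Alaska the same as a far larger state.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('AK', 1984), ('AK', 1985), ('ZZ', 1984), ('ZZ', 1985)],
    names=['state', 'year'],
)

# AK values come from the table above; ZZ values are made up.
st_yr = pd.DataFrame(
    {
        'TOTAL': [747710000., 958200000., 1000000000., 2000000000.],
        'GO': [724525000., 466990000., 250000000., 1000000000.],
        'pop': [513702., 532495., 1000000., 1000000.],
    },
    index=idx,
)

# Sum dollars and people across states within each year...
nation = st_yr.groupby(level='year')[['TOTAL', 'GO', 'pop']].sum()

# ...then form the ratios from the national sums.
nation['TOTAL_PC'] = nation['TOTAL'] / nation['pop']
nation['GO_PROP'] = nation['GO'] / nation['TOTAL']
```

The same pattern should carry over to the regional views by grouping on a region label and year instead of year alone.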


Time for another check.  If the aggregation is correct, we should match our total from the disaggregated data above.

In [90]:
'{:,}'.format(st_yr.xs(1984,level='year').sum()['TOTAL'])

'58,714,146,000.0'

Indeed we do.  How do we stack up against an external source?  The following figures, in $B, were taken from [SIFMA](http://www.sifma.org/research/statistics.aspx).

Issuance in the U.S. Bond Markets (USD Billions)

Year|Municipal
----|---------
1996|185.2 
1997|220.7 
1998|286.8 
1999|224.4 
2000|198.3 
2001|286.2 
2002|355.8 
2003|380.2 
2004|358.1 
2005|407.2 
2006|386.0 
2007|429.2 
2008|389.3 
2009|409.6
2010|433.1
2011|295.2
2012|382.4
2013|334.9
2014|337.5

How does this stack up against the volume of bond issues by year in our Reuters set?

In [85]:
for yr in range(1996,2015):
    print yr,'|',st_yr.xs(yr,level='year').sum()['TOTAL']/1000000000.

1996 | 96.789144
1997 | 113.341457
1998 | 133.0422
1999 | 114.783274
2000 | 106.480824
2001 | 153.282527
2002 | 163.052001
2003 | 178.932868
2004 | 160.768258
2005 | 172.326391
2006 | 147.763784
2007 | 165.772779
2008 | 155.860923
2009 | 158.389028
2010 | 171.915652
2011 | 135.738801
2012 | 176.32159
2013 | 153.929374
2014 | 177.944166
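To put a number on the gap, the two series can be divided year by year. The sketch below simply re-keys the SIFMA figures and the Reuters local totals printed above (both in $B) into dictionaries; `sifma`, `reuters`, and `share` are illustrative names.

```python
# SIFMA municipal issuance, $B (from the table above)
sifma = {
    1996: 185.2, 1997: 220.7, 1998: 286.8, 1999: 224.4, 2000: 198.3,
    2001: 286.2, 2002: 355.8, 2003: 380.2, 2004: 358.1, 2005: 407.2,
    2006: 386.0, 2007: 429.2, 2008: 389.3, 2009: 409.6, 2010: 433.1,
    2011: 295.2, 2012: 382.4, 2013: 334.9, 2014: 337.5,
}

# Reuters local-government totals, $B (from the output above)
reuters = {
    1996: 96.789144, 1997: 113.341457, 1998: 133.0422, 1999: 114.783274,
    2000: 106.480824, 2001: 153.282527, 2002: 163.052001, 2003: 178.932868,
    2004: 160.768258, 2005: 172.326391, 2006: 147.763784, 2007: 165.772779,
    2008: 155.860923, 2009: 158.389028, 2010: 171.915652, 2011: 135.738801,
    2012: 176.32159, 2013: 153.929374, 2014: 177.944166,
}

# Share of the SIFMA total captured by our local-issuer subset
share = {yr: reuters[yr] / sifma[yr] for yr in sorted(sifma)}

for yr in sorted(share):
    print('{} | {:.1%}'.format(yr, share[yr]))
```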


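We are in fact well below the SIFMA figures, at somewhere between roughly 38% and 54% of their annual totals.  That may speak to definitional issues (our set keeps only local issuers, having dropped state-level and other issuers above), but it certainly precludes the idea that we are capturing figures that are too high.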