# Medicaid Data by State from CMS (1997-2014)

In an attempt to develop a proxy for Medicaid program generosity (see **`mcaid_by_state`**), it became clear that NIPA data conflicted with other sources.  Specifically, for some states the NIPA figures differed wildly from those presented by [KFF](http://kff.org/medicaid/state-indicator/total-medicaid-spending/#), which are based on Form CMS-64 data reported by states to CMS.  To some extent, we have limited recourse.  Before 1997, NIPA is the only consolidated source of Medicaid spending by state.  However, CMS has made data available from 1997 on.

In this Notebook, we will liberate the data between 1997 and 2014 from the sadistic shackles in which CMS has chosen to store it, otherwise known as Excel.

In [2]:
#Data Management
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from openpyxl import load_workbook

#Visualization
import seaborn as sb

%pylab inline

plt.rcParams['axes.edgecolor']='k'
plt.rcParams['axes.linewidth']=1
plt.rcParams['axes.facecolor']=(1,1,1,0)
plt.rcParams['grid.color']='k'
plt.rcParams['grid.linestyle']=':'
plt.rcParams['grid.linewidth']=0.3

Populating the interactive namespace from numpy and matplotlib




The [data in question](https://www.medicaid.gov/medicaid-chip-program-information/by-topics/financing-and-reimbursement/expenditure-reports-mbes-cbes.html) are housed in four workbooks with irregular temporal coverage and formatting.  The formatting in particular means we will not be able to rely on a general function to handle all of it.  However, the formatting should be fairly consistent within each workbook, so there is some room for programmatic efficiency.

## 1997-2001

We can start by loading the first workbook and grabbing the sheet names.

In [14]:
#Establish data location
data_dir='O:/Analyst/Marvin/Medicaid/'

#Read in data
fmr97=load_workbook(data_dir+'FMR1997through2001.xlsx')

#Create container to hold sheet objects
fmr97_dict={}

#Capture sheet objects
for sheet in fmr97.get_sheet_names():
    #Key each sheet by the year at the end of its name (e.g. 'FMR1997' -> 1997)
    fmr97_dict[int(sheet[-4:])]=fmr97.get_sheet_by_name(sheet)
    
fmr97_dict[1997]

<Worksheet "FMR1997">

Once we have a sheet, we can query its contents in a couple ways...

In [32]:
print fmr97_dict[1998]['A1'].value
print fmr97_dict[1998].cell(row=1,column=1).value

Medicaid Finanacial Management Report
Medicaid Finanacial Management Report


Visual inspection suggests that there are a few pieces of information we want to collect.

1. The state name;
2. Total Medicaid expenditure;
3. The federal share of Medicaid expenditure; and,
4. The state share of Medicaid expenditure.

Each observation in our final set will carry these pieces of information, in addition to the year (which is constant within a sheet).  Total expenditure (and the shares) must be aggregated over two within-state data points:

1. `Total Net Expenditures`
2. `C-Total Net`
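
As a rough illustration of that aggregation, the sketch below sums the two components for a couple of states.  The figures are copied from the 1998 output that appears later in this notebook; the `total_expenditure` helper is a hypothetical name, and treating a missing component as zero is an assumption of the sketch rather than something the workbook guarantees.

```python
# Values copied from the 1998 output shown later in this notebook
components = {
    'Alabama': {'Total Net Expenditures': 2324140936, 'C-Total Net': 2788912},
    'Alaska': {'Total Net Expenditures': 362921319, 'C-Total Net': 0},
}

# The two within-state data points that make up total expenditure
LABELS = ('Total Net Expenditures', 'C-Total Net')


def total_expenditure(parts):
    """Sum the two components, treating a missing or empty one as zero (assumption)."""
    return sum(parts.get(label) or 0 for label in LABELS)


totals = {state: total_expenditure(parts) for state, parts in components.items()}
print(totals['Alabama'])
```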

To allocate these expenditures, we will rely on position.  The net expenditure values that follow Alaska but precede Alabama (the sheet is presumably ordered by postal code) will be allocated to Alaska.  Therefore, we need a function that will identify a state name as the start of an observation, capture the cumulative expenditures as we move down through the rows, and then close out that observation and open a new one upon encountering the next state name.
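
A minimal sketch of such a function, assuming the sheet has already been flattened into a list of `(label, value)` pairs read from columns A and B.  The name `allocate_by_position` and the miniature `rows` list are hypothetical stand-ins, not the real sheet.

```python
def allocate_by_position(rows, state_names, labels):
    """Walk (label, value) rows top to bottom and sum labelled values per state.

    A state name opens a new observation (implicitly closing the previous one);
    any row whose label is in `labels` is added to the observation that is open.
    """
    totals = {}
    current = None
    for label, value in rows:
        if label in state_names:
            # New state header: start a fresh observation
            current = label
            totals[current] = 0
        elif label in labels and current is not None:
            # Expenditure row: accumulate into the open observation
            totals[current] += value or 0
    return totals


# Hypothetical miniature of columns A and B
rows = [
    ('Alaska', None),
    ('Inpatient Hospital - Reg. Payments', 10),
    ('Total Net Expenditures', 100),
    ('C-Total Net', 5),
    ('Alabama', None),
    ('Total Net Expenditures', 200),
    ('C-Total Net', 0),
]

result = allocate_by_position(rows,
                              {'Alaska', 'Alabama'},
                              {'Total Net Expenditures', 'C-Total Net'})
print(result)
```

Rows that precede the first state header are ignored, and service-category rows that are not in `labels` never enter the sum.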

The `cell()` method enables us to iteratively run across the sheet via simple loops.  We can capture the start position of each observation, and then the positions of the relevant values for that observation.  Capturing the rows with the data is easy, because the tags don't change.  The states, on the other hand, are not marked by some common concept ID.  We need to test membership in a list of state names to identify them.  Let's grab that state info first.

In [44]:
#Read in mapping
st_map=pd.read_csv('https://raw.githubusercontent.com/jasonong/List-of-US-States/master/states.csv')

#Convert to dict
st_dict=dict(zip(st_map['State'],st_map['Abbreviation']))

#Include DC, territories, and other non-state entries
st_dict.update({'District of Columbia':'DC',
                'Dist. Of Col.':'DC',
                'DC':'DC',
                'Amer. Samoa':'AS',
                'American Samoa':'AS',
                'N. Mariana Islands':'NMI',
                'Guam':'GU',
                'Mass. Blind':'MB'})

...and now the positions.

In [46]:
#Create a container for state positions
state_pos=[]

#For each row...
for i in range(int(fmr97_dict[1998].max_row)):
    #...if the cell value is in the list of states...
    if fmr97_dict[1998].cell(row=i+1,column=1).value in st_dict.keys():
        #...capture the position and the cell value
        state_pos.append((i+1,fmr97_dict[1998].cell(row=i+1,column=1).value))
        
#Create a container for data positions
data_pos=[]

#For each row...
for i in range(int(fmr97_dict[1998].max_row)):
    #...if the cell value is one of the expenditure labels...
    if fmr97_dict[1998].cell(row=i+1,column=1).value in ['Total Net Expenditures','C-Total Net']:
        #...capture the position and the cell value
        data_pos.append((i+1,fmr97_dict[1998].cell(row=i+1,column=1).value))
        

#Check that consecutive state headers sit exactly 86 rows apart; print any pair that does not
for i in range(len(state_pos))[1:]:
    if state_pos[i][0]-state_pos[i-1][0]!=86:
        print state_pos[i],state_pos[i-1]

(1209, u'Hawaii') (1037, u'Georgia')
(2069, u'Maryland') (1897, u'Massachusetts')
(3789, u'Rhode Island') (3617, u'Pennsylvania')
(4477, u'Vermont') (4305, u'Virginia')
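
Each printed pair (later match first) sits 172 rows apart, which is exactly two 86-row blocks.  That suggests a block header between the two was not matched by `st_dict`, for instance a territory or a spelling variant that is not one of its keys.  The sketch below, with the hypothetical helper `missed_blocks`, computes the row where each missed header would be expected so that its value can be inspected; it assumes every block is 86 rows long.

```python
BLOCK_LENGTH = 86  # rows per state block, as used in the check above


def missed_blocks(state_pos, block_length=BLOCK_LENGTH):
    """Return (expected_row, previous_state, next_state) for each skipped block."""
    missed = []
    for (prev_row, prev_name), (next_row, next_name) in zip(state_pos, state_pos[1:]):
        gap = next_row - prev_row
        # A gap of k * block_length implies k - 1 unmatched headers in between
        for k in range(1, gap // block_length):
            missed.append((prev_row + k * block_length, prev_name, next_name))
    return missed


# A contiguous slice of the positions found above
positions = [(951, 'Florida'), (1037, 'Georgia'), (1209, 'Hawaii'), (1295, 'Iowa')]

missing = missed_blocks(positions)
print(missing)
```

Reading column 1 at each expected row would show which label needs to be added to `st_dict`.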


In [62]:
fmr97_dict[1998].cell(row=pos_i,column=[2,3]).value

TypeError: unhashable type: 'list'

In [59]:
exp_components=['Total Net Expenditures','C-Total Net']

#For each state, scan down from its header row and capture each component in turn
state_tmp={}
for pos in state_pos:
    print '***',pos,'***'
    pos_tmp={}
    pos_i=pos[0]
    for component in exp_components:
        #Advance until the component label is found (20000 caps the scan)
        while fmr97_dict[1998].cell(row=pos_i,column=1).value!=component and pos_i<20000:
            pos_i+=1
        #...and capture the value in column 2
        if fmr97_dict[1998].cell(row=pos_i,column=1).value==component:
            pos_tmp[component]=fmr97_dict[1998].cell(row=pos_i,column=2).value
    state_tmp[pos[1]]=pos_tmp
        
state_tmp

*** (5, u'Alaska') ***
*** (91, u'Alabama') ***
*** (177, u'Arkansas') ***
*** (263, u'Amer. Samoa') ***
*** (349, u'Arizona') ***
*** (435, u'California') ***
*** (521, u'N. Mariana Islands') ***
*** (607, u'Colorado') ***
*** (693, u'Connecticut') ***
*** (779, u'Dist. Of Col.') ***
*** (865, u'Delaware') ***
*** (951, u'Florida') ***
*** (1037, u'Georgia') ***
*** (1209, u'Hawaii') ***
*** (1295, u'Iowa') ***
*** (1381, u'Idaho') ***
*** (1467, u'Illinois') ***
*** (1553, u'Indiana') ***
*** (1639, u'Kansas') ***
*** (1725, u'Kentucky') ***
*** (1811, u'Louisiana') ***
*** (1897, u'Massachusetts') ***
*** (2069, u'Maryland') ***
*** (2155, u'Maine') ***
*** (2241, u'Michigan') ***
*** (2327, u'Minnesota') ***
*** (2413, u'Missouri') ***
*** (2499, u'Mississippi') ***
*** (2585, u'Montana') ***
*** (2671, u'North Carolina') ***
*** (2757, u'North Dakota') ***
*** (2843, u'Nebraska') ***
*** (2929, u'New Hampshire') ***
*** (3015, u'New Jersey') ***
*** (3101, u'New Mexico') ***
*** (

{u'Alabama': {'C-Total Net': 2788912, 'Total Net Expenditures': 2324140936L},
 u'Alaska': {'C-Total Net': 0, 'Total Net Expenditures': 362921319},
 u'Amer. Samoa': {'C-Total Net': 0, 'Total Net Expenditures': 11602064},
 u'Arizona': {'C-Total Net': 0, 'Total Net Expenditures': 1861049085},
 u'Arkansas': {'C-Total Net': 0, 'Total Net Expenditures': 1406248828},
 u'California': {'C-Total Net': 1186963,
  'Total Net Expenditures': 18332289798L},
 u'Colorado': {'C-Total Net': 0, 'Total Net Expenditures': 1604443241},
 u'Connecticut': {'C-Total Net': 0, 'Total Net Expenditures': 2831673760L},
 u'Delaware': {'C-Total Net': 0, 'Total Net Expenditures': 414170166},
 u'Dist. Of Col.': {'C-Total Net': 0, 'Total Net Expenditures': 865522374},
 u'Florida': {'C-Total Net': 6454111, 'Total Net Expenditures': 6364304715L},
 u'Georgia': {'C-Total Net': 0, 'Total Net Expenditures': 3487596382L},
 u'Hawaii': {'C-Total Net': 0, 'Total Net Expenditures': 586017229},
 u'Idaho': {'C-Total Net': 1562098, 'To

In [50]:
data_pos

[(54, u'Total Net Expenditures'),
 (89, u'C-Total Net'),
 (140, u'Total Net Expenditures'),
 (175, u'C-Total Net'),
 (226, u'Total Net Expenditures'),
 (261, u'C-Total Net'),
 (312, u'Total Net Expenditures'),
 (347, u'C-Total Net'),
 (398, u'Total Net Expenditures'),
 (433, u'C-Total Net'),
 (484, u'Total Net Expenditures'),
 (519, u'C-Total Net'),
 (570, u'Total Net Expenditures'),
 (605, u'C-Total Net'),
 (656, u'Total Net Expenditures'),
 (691, u'C-Total Net'),
 (742, u'Total Net Expenditures'),
 (777, u'C-Total Net'),
 (828, u'Total Net Expenditures'),
 (863, u'C-Total Net'),
 (914, u'Total Net Expenditures'),
 (949, u'C-Total Net'),
 (1000, u'Total Net Expenditures'),
 (1035, u'C-Total Net'),
 (1086, u'Total Net Expenditures'),
 (1121, u'C-Total Net'),
 (1172, u'Total Net Expenditures'),
 (1207, u'C-Total Net'),
 (1258, u'Total Net Expenditures'),
 (1293, u'C-Total Net'),
 (1344, u'Total Net Expenditures'),
 (1379, u'C-Total Net'),
 (1430, u'Total Net Expenditures'),
 (1465, u'C-
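
The state positions and data positions collected above can also be joined directly, without rescanning the sheet.  The sketch below uses `bisect` to attach each data row to the nearest matched state header above it, then flags any state that ends up with something other than two data rows.  The helper name `assign_to_states` is hypothetical, the positions are a slice of the outputs above, and the join assumes the state positions are sorted by row.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_to_states(state_pos, data_pos):
    """Map each state to the data rows between its header and the next matched header."""
    starts = [row for row, _ in state_pos]
    assigned = defaultdict(list)
    for row, label in data_pos:
        idx = bisect_right(starts, row) - 1  # last header at or above this row
        if idx >= 0:
            assigned[state_pos[idx][1]].append((row, label))
    return dict(assigned)


# Slices of the positions printed earlier in this notebook
state_rows = [(951, 'Florida'), (1037, 'Georgia'), (1209, 'Hawaii')]
data_rows = [(1000, 'Total Net Expenditures'), (1035, 'C-Total Net'),
             (1086, 'Total Net Expenditures'), (1121, 'C-Total Net'),
             (1172, 'Total Net Expenditures'), (1207, 'C-Total Net'),
             (1258, 'Total Net Expenditures'), (1293, 'C-Total Net')]

assigned = assign_to_states(state_rows, data_rows)

# States with anything other than two data rows deserve a closer look
suspect = sorted(s for s, found in assigned.items() if len(found) != 2)
print(suspect)
```

Georgia picks up four data rows here because the block between Georgia and Hawaii was never matched as a state, so a join like this doubles as a check on the state lookup.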

In [30]:
print fmr97_dict[1998].cell(row=1,column=1).value
print fmr97_dict[1998].max_row
print type(fmr97_dict[1998].max_row)
print type(int(fmr97_dict[1998].max_row))
for i in range(int(fmr97_dict[1998].max_row)):
    print (i+1,fmr97_dict[1998].cell(row=i+1,column=1).value)

Medicaid Finanacial Management Report
4991
<type 'long'>
<type 'int'>
(1, u'Medicaid Finanacial Management Report')
(2, u'FY 1998')
(3, u'Net Services')
(4, u'National')
(5, u'Alaska')
(6, None)
(7, u'Medical Assistance Program')
(8, u'Service Category')
(9, None)
(10, u'Inpatient Hospital - Reg. Payments')
(11, u'Inpatient Hospital - DSH')
(12, u'Mental Health Facility Services - Reg. Payments')
(13, u'Mental Health Facility - DSH')
(14, u'Nursing Facility Services')
(15, u'Intermediate Care Facility - Public')
(16, u'Intermediate Care - Private')
(17, u"Physicians' Services")
(18, u'Outpatient Hospital Services')
(19, u'Prescribed Drugs')
(20, u'Drug Rebate Offset - National')
(21, u'Drug Rebate Offset - State Sidebar Agreement')
(22, u'Dental Services')
(23, u'Other Practicioners')
(24, u'Clinic Services')
(25, u'Laboratory/Radiological')
(26, u'Home Health Services')
(27, u'Sterilizations')
(28, u'Abortions')
(29, u'EPSDT Screening')
(30, u'Rural Health')
(31, u'Medicare - Part A')

In [16]:
dir(fmr97_dict[1998])

['BREAK_COLUMN',
 'BREAK_NONE',
 'BREAK_ROW',
 'ORIENTATION_LANDSCAPE',
 'ORIENTATION_PORTRAIT',
 'PAPERSIZE_A3',
 'PAPERSIZE_A4',
 'PAPERSIZE_A4_SMALL',
 'PAPERSIZE_A5',
 'PAPERSIZE_EXECUTIVE',
 'PAPERSIZE_LEDGER',
 'PAPERSIZE_LEGAL',
 'PAPERSIZE_LETTER',
 'PAPERSIZE_LETTER_SMALL',
 'PAPERSIZE_STATEMENT',
 'PAPERSIZE_TABLOID',
 'SHEETSTATE_HIDDEN',
 'SHEETSTATE_VERYHIDDEN',
 'SHEETSTATE_VISIBLE',
 '__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__getitem__',
 '__hash__',
 '__init__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_add_cell',
 '_auto_filter',
 '_cells',
 '_charts',
 '_comment_count',
 '_create_relationship',
 '_data_validations',
 '_freeze_panes',
 '_garbage_collect',
 '_get_cell',
 '_images',
 '_invalid_row',
 '_merged_cells',
 '_new_cell',
 '_parent',
 '_styles',
 '_title',
 '_unique_sheet_name',
 'act

In [6]:
dir(fmr97)

['_Workbook__read_only',
 '_Workbook__thread_local_data',
 '_Workbook__write_only',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__getitem__',
 '__hash__',
 '__init__',
 '__iter__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_active_sheet_index',
 '_add_sheet',
 '_alignments',
 '_borders',
 '_cell_styles',
 '_colors',
 '_differential_styles',
 '_external_links',
 '_fills',
 '_fonts',
 '_guess_types',
 '_local_data',
 '_named_ranges',
 '_number_formats',
 '_optimized_worksheet_class',
 '_protections',
 '_read_workbook_settings',
 '_setup_styles',
 '_worksheet_class',
 'active',
 'add_named_range',
 'add_sheet',
 'code_name',
 'cond_styles',
 'create_named_range',
 'create_sheet',
 'data_only',
 'drawings',
 'encoding',
 'excel_base_date',
 'get_active_sheet',
 'get_index',
 'get_named_range',
 'ge