### Followups 2020-Apr-5
> **Data part**
- [x] State level for Connecticut
- [x] County level, for Indiana/Connecticut

> **Build Github/Upload data files & Jupyter notebook (2020-Apr-12)**
- [x] Start drafting the report now
    - Start from an outline first (key points only): data source (how to collect, how to process, how to output)
    - Write up the details in each contents/chapter
        - population data source<br>
        - price fee schedule structure (Medicaid (CMS + state, mainly state), Medicare (CMS), private insurance (assumption: same as Medicare, since there is not enough time to contact the private providers))<br>
        - Take a weighted sum of the above three fee schedules. What is the weight? How to choose the weight? Why choose the population rate as the weight?<br>
    - Understandable to readers(hard)
    - Conclusion & Further Research direction
- [x] Build Github
- [ ] Upload Final Jupyter notebook script as well as data files
- [ ] Add a new state like N.Y.C. if possible rather than visualizing
- [x] Make a duplicate medicare column in the price table as private insurance amount

> *Visualization (less important) {Optional}*
- [ ] State level, nationwide %Medicaid, %Medicare, % Private visualize with COVID data from CDC. Heatmap %Medicaid. COVID map side by side; Heatmap %Medicaid, COVID map overlay circles, the size of each circle is Number of COVID cases by state
- [ ] Try the same heatmap at the county level (California, LA, CT). COVID data from Johns Hopkins or CDC

> *Challenge*{optional}
- [ ] Can your visualization compete with [data usa](https://datausa.io/profile/geo/connecticut#health)?

https://towardsdatascience.com/a-complete-guide-to-an-interactive-geographical-map-using-python-f4c5197e23e0

- [ ] Updates
    - [x] Title -> Estimation of physician fees adjusted for demographic data
    - [ ] Upload data files onto UIC Box.com (optional)
    - [ ] 1.1 Specify the goals/objectives instead of what data there is:
        - such as "get total population per state/county census data", "get the medicaid enrollment data - our denominator", 
        - calculate the weighting coefficients based on the fraction of each population in state/county
     

# Estimation of Physician Fees Adjusted for Demographic Data

#### Background
One day, a new patient visits your clinic, either from another state with unknown insurance coverage or with no insurance at all. The patient wants to see a physician and order several medical services. If you were the physician, what would you do? You could turn the patient away for lack of valid insurance, but that could cause trouble for you or your clinic if the patient does not get timely care. If you accept the case, how much should you charge for each medical service? You don't want to be accused of overcharging, but you also need to cover your own costs, down to the monthly utility bill. This is where our **Physician Fee Estimation Project** comes in.

# 1 The Goal
We simplify the complex CMS fee formula and offer an estimate of the service fee based on the proportions and prices of the current insurance types. The idea is that what a physician is paid for a given procedure depends on the mix of insurance types among their patients. Each insurance type (Medicaid, Medicare, and private insurance) sets a different price for each procedure, and these prices are published as fee schedules. So to determine the price a physician would be willing to accept from a new insurer or an uninsured patient, we calculate a weighted fee from the fee schedules of the known insurance types. The weights are the proportions of the population covered by each type, such as Medicare or Medicaid recipients, in each state or county. Our analysis therefore involves collecting the fee schedules and the proportion of each population within different states and counties.
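
The weighting described above can be written compactly. The notation is introduced here for illustration only and is not taken from CMS: $F^{k}_{p,s}$ is the fee schedule amount for procedure $p$ under insurance type $k$ in state (or county) $s$, and $w^{k}_{s}$ is the share of the population of $s$ covered by type $k$.

$$
\hat{F}_{p,s} = w^{\text{Medicaid}}_{s} F^{\text{Medicaid}}_{p,s} + w^{\text{Medicare}}_{s} F^{\text{Medicare}}_{p,s} + w^{\text{Private}}_{s} F^{\text{Private}}_{p,s},
\qquad
w^{\text{Medicaid}}_{s} + w^{\text{Medicare}}_{s} + w^{\text{Private}}_{s} = 1
$$

Because the three weights sum to one, the estimate $\hat{F}_{p,s}$ always lies between the smallest and the largest of the three fee schedule amounts.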

#### Assumptions
1. The gap between the actual cost of a medical procedure and its fee schedule amount can be ignored.
2. The fee schedules of private providers are confidential, so for now we assume they are the same as Medicare's.
3. The share of private providers is taken to be the population that is not enrolled in Medicaid or Medicare, because private enrollment figures are also confidential.
4. Given both non-facility and facility fees from Medicaid and Medicare, we assume physicians would choose the larger one, i.e. the non-facility fee amount.

##### Additional References (optional-can remove later)
- [Markdown Github helper for writing/formatting syntax](https://help.github.com/en/github/writing-on-github/basic-writing-and-formatting-syntax)
- [LaTeX for scientific formulas](https://www.math.ubc.ca/~pwalls/math-python/jupyter/latex/)
- [How to handle SettingWithCopyWarning](https://www.dataquest.io/blog/settingwithcopywarning/)
- [CPT Code](https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Part-B-National-Summary-Data-File/Overview)
The data sets are summarized by meaningful ***Healthcare Common Procedure Coding/Current Procedural Terminology, (HCPC/CPT)***, code ranges. Brief descriptions for the code ranges and modifiers are provided in the readme file. The data set name contains the year followed by a five character sequence that is the HCPC/CPT code. This HCPC/CPT code corresponds to the first HCPC/CPT in the selected code range of disciplines.
- [HCPCS Code Set](https://www.cms.gov/Medicare/Coding/HCPCSReleaseCodeSets/Alpha-Numeric-HCPCS): These files contain the Level II alphanumeric HCPCS procedure and modifier codes, their long and short descriptions, and applicable Medicare administrative, coverage and pricing data.

In [1]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import os
import re

import string
import matplotlib.pyplot as plotter

from collections import Counter
from time import time

# 2 Input
1. **All states population and Medicaid/Medicare Enrollment**<br>
    - The all states population data is obtained from [***United States Census Bureau***](https://www2.census.gov/programs-surveys/popest/datasets/), the latest one is [***2018 all state population estimation***](https://www2.census.gov/programs-surveys/popest/datasets/2010-2018/counties/totals/co-est2018-alldata.csv)
    - The Medicaid enrollment data by state is obtained from [***Medicaid and CHIP Enrollment data***](https://data.medicaid.gov/Enrollment/2018-12-Updated-applications-eligibility-determina/gy72-q4z9/data)
    - The Medicare enrollment data by state and county is obtained from [***CMS Public Use File***](https://www.cms.gov/files/zip/statecounty-table-all-beneficiaries.zip)
2. **Medicaid Physician Fee Schedule For Connecticut/Indiana/New York**<br>
    - Connecticut Medicaid Physician Fee Schedule is obtained from [***Provider Fee Schedule Portal***](https://www.ctdssmap.com/CTPortal/Provider/ProviderFeeScheduleDownload/tabid/54/Default.aspx)
    - Indiana Medicaid Physician Fee Schedule is from [***IHCP Fee Schedule***](http://provider.indianamedicaid.com/ihcp/Publications/MaxFee/fee_home.asp)
    - New York Medicaid Physician Fee Schedule is from [***MED Comply***](https://med-comply.com/NY-Medicaid-Fee-Schedule)
3. **Medicare Physician Fee Schedule** <br>
    - Because Medicare is a federal program, its fee schedule is published through a single search tool, [***CMS Physician Fee Schedule Search***](https://www.cms.gov/apps/physician-fee-schedule/search/search-criteria.aspx). The tool is documented in this [***specification***](https://www.cms.gov/apps/physician-fee-schedule/help/How_to_MPFS_Booklet_ICN901344.pdf). For example, to get the Medicare Physician Fee Schedule data we need, search with:
        - Year: the fiscal year 2018
        - HCPCS Code Range: the codes that appear in the Medicaid PFS
        - Locality: depends on the state we choose
<img src="images/IN_medicare_pfs.png">

In [215]:
%%time
# Population Data from U.S. Census Bureau
df_pop_alldata_2018 = pd.read_csv("data/FY2018_pop_est_alldata.csv", usecols=[x for x in range(3,18)], index_col = None, encoding="ISO-8859-1")
#*** State Medicare / Medicaid Enrollment ***#
# Y2018 Medicaid Enrollment Data By State
df_mdcaid_2018Bs = pd.read_csv("data/FY2018_Medicaid_enrollment_data_By_State.csv", encoding="ISO-8859-1")
# Y2018 Medicare Enrollment Data By State and County
df_mdcare_2018Bsc = pd.read_excel("data/FY2018_Medicare_By_State_County.xlsx", sheet_name="State_county 2018", header=1, index_col=None)
#*** County Medicare / Medicaid Enrollment ***#
#*** Medicaid physician fee schedule ***#
# Connecticut #
df_CT_mdcaid_ASCPFS_2018 = pd.read_csv("data/Connecticut/FY2018_CT_Medicaid_PFS_casc_24.csv", header=2, index_col=None)
# Indiana #
df_IN_mdcaid_OPFS_2018 = pd.read_excel("data/Indiana/FY2018_IN_Medicaid_Outpatient_Fee_Schedule.xlsx", sheet_name="Tab 3 - Fee Schedule", header=16, index_col=None, usecols=None)
# New York #
df_NY_mdcaid_MPFS_2018 = pd.read_excel("data/NewYork/FY2018_NY_Physician_Manual_Fee_Schedule_Sect5.xls", sheet_name="PHY SURG FS JAN 2020", header=2, index_col=None, usecols=None)
#*** Medicare physician fee schedule ***#
# Indiana #
df_IN_mdcare_pfs_2018 = pd.read_csv("data/Indiana/FY2018_IN_Medicare_PFSExport.csv", header=0, index_col=None, usecols=[0, 2, 5])
# Connecticut #
df_CT_mdcare_pfs_2018 = pd.read_csv("data/Connecticut/FY2018_CT_Medicare_PFSExport.csv", header=0, index_col=None, usecols=[0, 2, 5])
# New York #
df_NY_mdcare_pfs_2018 = pd.read_csv("data/NewYork/FY2018_NY_medicare_PFSExport.csv", header=0, index_col=None, usecols=[0, 2, 5])

CPU times: user 10.5 s, sys: 285 ms, total: 10.8 s
Wall time: 11.4 s


### Get to know our population data
Let's run an exploratory data analysis (EDA) on the datasets above, looking at structure, granularity, scope, temporality, and faithfulness.

In [157]:
#Population
df_pop_alldata_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3193 entries, 0 to 3192
Data columns (total 15 columns):
STATE                3193 non-null int64
COUNTY               3193 non-null int64
STNAME               3193 non-null object
CTYNAME              3193 non-null object
CENSUS2010POP        3193 non-null int64
ESTIMATESBASE2010    3193 non-null int64
POPESTIMATE2010      3193 non-null int64
POPESTIMATE2011      3193 non-null int64
POPESTIMATE2012      3193 non-null int64
POPESTIMATE2013      3193 non-null int64
POPESTIMATE2014      3193 non-null int64
POPESTIMATE2015      3193 non-null int64
POPESTIMATE2016      3193 non-null int64
POPESTIMATE2017      3193 non-null int64
POPESTIMATE2018      3193 non-null int64
dtypes: int64(13), object(2)
memory usage: 374.3+ KB


In [142]:
df_pop_alldata_2018.head()

Unnamed: 0,STATE,COUNTY,STNAME,CTYNAME,CENSUS2010POP,ESTIMATESBASE2010,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,POPESTIMATE2013,POPESTIMATE2014,POPESTIMATE2015,POPESTIMATE2016,POPESTIMATE2017,POPESTIMATE2018
0,1,0,Alabama,Alabama,4779736,4780138,4785448,4798834,4815564,4830460,4842481,4853160,4864745,4875120,4887871
1,1,1,Alabama,Autauga County,54571,54574,54754,55208,54936,54713,54876,54838,55242,55443,55601
2,1,3,Alabama,Baldwin County,182265,182264,183111,186540,190143,194886,199189,202995,207712,212619,218022
3,1,5,Alabama,Barbour County,27457,27457,27330,27350,27174,26944,26758,26294,25819,25158,24881
4,1,7,Alabama,Bibb County,22915,22920,22872,22747,22664,22516,22541,22562,22576,22555,22400


**Granularity**

When `COUNTY` is 0, the row holds the whole state's population counts for 2010-2018. Rows with a non-zero `COUNTY` hold one county's population counts for the same years.
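
Once state and county rows are separated, county names usually need cleaning before they can be matched against other sources. The sketch below compares two ways of shortening `CTYNAME` on a few sample names chosen for illustration; it is not the notebook's own data, and other suffixes such as "Parish" or "Borough" are deliberately left out.

```python
import pandas as pd

# Sample names for illustration; the last one is a state-level row
names = pd.Series(["Autauga County", "New Haven County", "St. Clair County", "Alabama"])

# Keeping only the first word truncates multi-word names
first_word = names.str.split(" ").str[0]

# Removing only a trailing " County" keeps multi-word names intact
suffix_stripped = names.str.replace(r"\s+County$", "", regex=True)

print(pd.DataFrame({"original": names, "first_word": first_word, "suffix_stripped": suffix_stripped}))
```

The first-word approach turns "New Haven County" into "New", which would no longer match a "New Haven" row in another table.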

In [151]:
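# Drop the trailing " County" but keep multi-word names such as "New Haven" intact
# (other suffixes, e.g. "Parish" or "Borough", are left as they are)
df_pop_alldata_2018["CTYNAME"] = df_pop_alldata_2018["CTYNAME"].str.replace(r"\s+County$", "", regex=True)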
df_POP_alldata_2018Bs = df_pop_alldata_2018[df_pop_alldata_2018["COUNTY"] == 0][["STNAME", "CTYNAME", "POPESTIMATE2018"]]
df_POP_alldata_2018Bc = df_pop_alldata_2018[df_pop_alldata_2018["COUNTY"] != 0][["STNAME", "CTYNAME", "POPESTIMATE2018"]]

In [195]:
df_POP_alldata_2018Bs.head()

Unnamed: 0,STNAME,CTYNAME,POPESTIMATE2018
0,Alabama,Alabama,4887871
68,Alaska,Alaska,737438
98,Arizona,Arizona,7171646
114,Arkansas,Arkansas,3013825
190,California,California,39557045


In [152]:
df_POP_alldata_2018Bc.head()

Unnamed: 0,STNAME,CTYNAME,POPESTIMATE2018
1,Alabama,Autauga,55601
2,Alabama,Baldwin,218022
3,Alabama,Barbour,24881
4,Alabama,Bibb,22400
5,Alabama,Blount,57840


In [158]:
#Medicaid State Enrollment
df_mdcaid_2018Bs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 29 columns):
State Abbreviation                                                                              51 non-null object
State Name                                                                                      51 non-null object
Report Date                                                                                     51 non-null object
State Expanded Medicaid                                                                         51 non-null object
Preliminary_Updated                                                                             51 non-null object
Final_Report                                                                                    51 non-null object
New Applications Submitted to Medicaid and CHIP Agencies                                        48 non-null float64
New Applications Submitted to Medicaid and CHIP Agencies - footnotes                     

In [144]:
df_mdcaid_2018Bs.head()

Unnamed: 0,State Abbreviation,State Name,Report Date,State Expanded Medicaid,Preliminary_Updated,Final_Report,New Applications Submitted to Medicaid and CHIP Agencies,New Applications Submitted to Medicaid and CHIP Agencies - footnotes,Applications for Financial Assistance Submitted to the State Based Marketplace,Applications for Financial Assistance Submitted to the State Based Marketplace - footnotes,...,Medicaid and CHIP Child Enrollment - footnotes,Total Medicaid and CHIP Enrollment,Total Medicaid and CHIP Enrollment - footnotes,Latitude,Longitude,New Georeferenced Column,Total Medicaid Enrollment,Total Medicaid Enrollment - footnotes,Total CHIP Enrollment,Total CHIP Enrollment - footnotes
0,TN,Tennessee,12/01/2018,N,U,Y,,,,,...,,1396302,,41.6772,-71.5101,"(41.6772, -71.5101)",1342027,,54275,
1,ID,Idaho,12/01/2018,N,U,Y,14639.0,,,,...,,280570,,42.0046,-93.214,"(42.0046, -93.214)",256565,,24005,
2,MA,Massachusetts,12/01/2018,Y,U,Y,16180.0,,4462.0,,...,,1598878,,31.1801,-91.8749,"(31.1801, -91.8749)",1407486,,191392,
3,NM,New Mexico,12/01/2018,Y,U,Y,10579.0,,,,...,,728327,,43.4108,-71.5653,"(43.4108, -71.5653)",691223,,37104,
4,HI,Hawaii,12/01/2018,Y,U,Y,4822.0,,,,...,,331075,,32.9866,-83.6487,"(32.9866, -83.6487)",305872,,25203,


Each row holds the statistics for one state. We select the columns we need and rename them so the data is easier to organize and merge.

In [184]:
df_MDcaid_2018Bs = df_mdcaid_2018Bs[["State Abbreviation", "State Name", "Total Medicaid and CHIP Enrollment"]].copy()
df_MDcaid_2018Bs.rename(columns={"State Name":"STNAME", "State Abbreviation":"STATE", "Total Medicaid and CHIP Enrollment":"MDCAID_CNT_2018BS"}, inplace=True)
df_MDcaid_2018Bs.head()

Unnamed: 0,STATE,STNAME,MDCAID_CNT_2018BS
0,TN,Tennessee,1396302
1,ID,Idaho,280570
2,MA,Massachusetts,1598878
3,NM,New Mexico,728327
4,HI,Hawaii,331075


In [146]:
df_mdcare_2018Bsc.head()

Unnamed: 0,State,County,State and County FIPS Code,Beneficiaries with Part A and Part B,FFS Beneficiaries,MA Beneficiaries,MA Participation Rate,Average Age,Percent Female,Percent Male,...,PQI11 Bacterial Pneumonia Admission Rate (age < 65),PQI11 Bacterial Pneumonia Admission Rate (age 65-74),PQI11 Bacterial Pneumonia Admission Rate (age 75+),PQI12 UTI Admission Rate (age < 65),PQI12 UTI Admission Rate (age 65-74),PQI12 UTI Admission Rate (age 75+),PQI15 Asthma in Younger Adults Admission Rate (age < 40),PQI16 Lower Extremity Amputation Admission Rate (age < 65),PQI16 Lower Extremity Amputation Admission Rate (age 65-74),PQI16 Lower Extremity Amputation Admission Rate (age 75+)
0,Na,NATIONAL TOTAL,,56031636,33499472,22532164,40.21 %,72,54.67 %,45.33 %,...,497.0,344.0,1005.0,292.0,219.0,943.0,159,238.0,69.0,58.0
1,AK,STATE TOTAL,,86462,84714,1748,2.02 %,71,50.41 %,49.59 %,...,211.0,203.0,728.0,140.0,99.0,499.0,*,193.0,35.0,70.0
2,AK,Aleutians East,2013.0,*,117,*,*,72,47.01 %,52.99 %,...,,,,,,,,,,
3,AK,Aleutians West,2016.0,*,135,*,*,71,46.67 %,53.33 %,...,,,,,,,,,,
4,AK,Anchorage,2020.0,32227,31503,724,2.25 %,71,52.76 %,47.24 %,...,,,,,,,,,,


In [185]:
df_mdcare_2018Bsc.iloc[:,0:6].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3250 entries, 0 to 3249
Data columns (total 6 columns):
State                                   3250 non-null object
County                                  3250 non-null object
State and County FIPS Code              3194 non-null float64
Beneficiaries with Part A and Part B    3250 non-null object
FFS Beneficiaries                       3250 non-null object
MA Beneficiaries                        3250 non-null object
dtypes: float64(1), object(5)
memory usage: 152.4+ KB


**Granularity**
1. The first row holds the national total count of Medicare beneficiaries.
2. For each state, the rows start with the state total count of Medicare beneficiaries, followed by one row per county.
3. Some counts of Medicare beneficiaries are suppressed or missing and appear as **"*"**.
4. The column **"Beneficiaries with Part A and Part B" is the sum of "FFS Beneficiaries" and "MA Beneficiaries"**.
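
The relationship in point 4 and the suppressed values in point 3 can be checked directly. The sketch below rebuilds four rows from the table shown above by hand (it does not read the Excel file) and uses `pd.to_numeric(..., errors="coerce")`, which is one possible way to handle "*", not necessarily the one used in the rest of this notebook.

```python
import pandas as pd

# Rows copied from the Medicare table shown above; "*" marks a suppressed count
medicare = pd.DataFrame({
    "County": ["NATIONAL TOTAL", "STATE TOTAL", "Aleutians East", "Anchorage"],
    "Beneficiaries with Part A and Part B": ["56031636", "86462", "*", "32227"],
    "FFS Beneficiaries": ["33499472", "84714", "117", "31503"],
    "MA Beneficiaries": ["22532164", "1748", "*", "724"],
})

count_cols = ["Beneficiaries with Part A and Part B", "FFS Beneficiaries", "MA Beneficiaries"]
# errors="coerce" turns "*" into NaN instead of raising an error
counts = medicare[count_cols].apply(pd.to_numeric, errors="coerce")

# Rows where at least one count is suppressed
suppressed = counts.isna().any(axis=1)
complete = counts[~suppressed]

# On complete rows, the total should equal FFS + MA
identity_holds = (complete[count_cols[0]] == complete[count_cols[1]] + complete[count_cols[2]]).all()
print("Total equals FFS + MA on complete rows:", identity_holds)
print("Rows with suppressed counts:", medicare.loc[suppressed, "County"].tolist())
```

Keeping suppressed values as NaN makes it visible which counties have incomplete counts, whereas replacing them with 0 hides that information.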

In [189]:
# DataFrame Column Labels Formatter
df_MDcare_2018Bsc = df_mdcare_2018Bsc[["State", "County", "Beneficiaries with Part A and Part B", "FFS Beneficiaries", "MA Beneficiaries"]].copy()
df_MDcare_2018Bsc.rename(columns = {"State":"STATE", "County":"CTYNAME",
                                   "Beneficiaries with Part A and Part B":"MDCARE_CNT_2018",
                                   "FFS Beneficiaries":"FFS_CNT",
                                   "MA Beneficiaries":"MA_CNT"}, inplace=True)
# "*" marks a suppressed count; treating it as 0 undercounts small counties
for col in ["FFS_CNT", "MA_CNT"]:
    df_MDcare_2018Bsc[col] = df_MDcare_2018Bsc[col].apply(lambda x: int(x) if x != "*" else 0)
# Rebuild the total from its two parts so a suppressed total is filled in where possible
df_MDcare_2018Bsc["MDCARE_CNT_2018"] = df_MDcare_2018Bsc["FFS_CNT"] + df_MDcare_2018Bsc["MA_CNT"]
df_MDcare_2018Bsc.head()

Unnamed: 0,STATE,CTYNAME,MDCARE_CNT_2018,FFS_CNT,MA_CNT
0,Na,NATIONAL TOTAL,56031636,33499472,22532164
1,AK,STATE TOTAL,86462,84714,1748
2,AK,Aleutians East,117,117,0
3,AK,Aleutians West,135,135,0
4,AK,Anchorage,32227,31503,724


In [188]:
df_MDcare_2018Bsc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3250 entries, 0 to 3249
Data columns (total 5 columns):
STATE              3250 non-null object
CTYNAME            3250 non-null object
MDCARE_CNT_2018    3250 non-null int64
FFS_CNT            3250 non-null int64
MA_CNT             3250 non-null int64
dtypes: int64(3), object(2)
memory usage: 127.0+ KB


5. We then split the Medicare enrollment into a state-level table and a county-level table, depending on whether `CTYNAME` is "STATE TOTAL" or not.

In [196]:
# PICK up state level statistics
df_mdcare_2018Bs = df_MDcare_2018Bsc[df_MDcare_2018Bsc["CTYNAME"] == "STATE TOTAL"].copy()
df_MDcare_2018Bs = df_mdcare_2018Bs[["STATE", "MDCARE_CNT_2018"]].copy()
df_mdcare_2018Bc = df_MDcare_2018Bsc[~df_MDcare_2018Bsc["CTYNAME"].isin(["STATE TOTAL", "NATIONAL TOTAL"])].copy()
df_MDcare_2018Bc = df_mdcare_2018Bc[["STATE", "CTYNAME", "MDCARE_CNT_2018"]]

In [197]:
df_MDcare_2018Bs.head()

Unnamed: 0,STATE,MDCARE_CNT_2018
1,AK,86462
32,AL,985296
101,AR,602253
178,AZ,1199206
195,CA,5608325


In [198]:
df_MDcare_2018Bc.head()

Unnamed: 0,STATE,CTYNAME,MDCARE_CNT_2018
2,AK,Aleutians East,117
3,AK,Aleutians West,135
4,AK,Anchorage,32227
5,AK,Bethel,1224
6,AK,Bristol Bay,116


In [206]:
df_POPM1_2018Bs = pd.merge(df_POP_alldata_2018Bs, df_MDcaid_2018Bs, on="STNAME")
df_POPM2_2018Bs = pd.merge(df_POPM1_2018Bs, df_MDcare_2018Bs, on="STATE")
df_POPMM_2018Bs = df_POPM2_2018Bs[["STATE", "STNAME", "POPESTIMATE2018", "MDCAID_CNT_2018BS", "MDCARE_CNT_2018"]].copy()
# Choose "TOTAL MEDICAID AND CHIP ENROLLMENT" as medicaid total
df_POPMM_2018Bs["MDCAID_RATE_2018BS"] = df_POPMM_2018Bs["MDCAID_CNT_2018BS"] / df_POPMM_2018Bs["POPESTIMATE2018"]
df_POPMM_2018Bs["MDCARE_RATE_2018BS"] = df_POPMM_2018Bs["MDCARE_CNT_2018"] / df_POPMM_2018Bs["POPESTIMATE2018"]
df_POPMM_2018Bs["PRIVATE_RATE_2018BS"] = 1 - (df_POPMM_2018Bs["MDCAID_RATE_2018BS"] + df_POPMM_2018Bs["MDCARE_RATE_2018BS"])
df_POPMM_2018Bs = df_POPMM_2018Bs[["STATE", "STNAME", "MDCAID_RATE_2018BS", "MDCARE_RATE_2018BS", "PRIVATE_RATE_2018BS"]]
df_POPMM_2018Bs.head()

Unnamed: 0,STATE,STNAME,MDCAID_RATE_2018BS,MDCARE_RATE_2018BS,PRIVATE_RATE_2018BS
0,AL,Alabama,0.186581,0.20158,0.611839
1,AK,Alaska,0.287362,0.117246,0.595391
2,AZ,Arizona,0.23711,0.167215,0.595675
3,AR,Arkansas,0.282264,0.19983,0.517906
4,CA,California,0.301531,0.141778,0.556691
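
`pd.merge` uses an inner join by default, so a state whose name or code is spelled differently in two sources would silently drop out of the result. A minimal sketch of how to check for that, using toy frames with made-up counts and a deliberately mismatched name (none of these values come from the real files):

```python
import pandas as pd

# Toy data for illustration only; the counts are made up
population = pd.DataFrame({
    "STNAME": ["Alabama", "Alaska", "District of Columbia"],
    "POP": [1000, 2000, 3000],
})
medicaid = pd.DataFrame({
    "STNAME": ["Alabama", "Alaska", "Dist. of Columbia"],  # mismatched spelling
    "MDCAID_CNT": [100, 200, 300],
})

# An outer join with indicator=True labels each row as both / left_only / right_only;
# validate="one_to_one" raises if a state name is duplicated on either side
check = pd.merge(population, medicaid, on="STNAME", how="outer",
                 indicator=True, validate="one_to_one")

unmatched = check.loc[check["_merge"] != "both", ["STNAME", "_merge"]]
print(unmatched)
```

If `unmatched` is empty, the inner join loses no states; otherwise the listed names need to be harmonized before merging.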


In [210]:
df_POPMM_2018Bs.sort_values(by="MDCAID_RATE_2018BS")

Unnamed: 0,STATE,STNAME,MDCAID_RATE_2018BS,MDCARE_RATE_2018BS,PRIVATE_RATE_2018BS
44,UT,Utah,0.091235,0.112805,0.79596
50,WY,Wyoming,0.100596,0.174673,0.724731
34,ND,North Dakota,0.119819,0.159185,0.720995
46,VA,Virginia,0.123661,0.159901,0.716437
41,SD,South Dakota,0.125532,0.184201,0.690266
27,NE,Nebraska,0.128292,0.165617,0.706091
16,KS,Kansas,0.133792,0.170612,0.695596
29,NH,New Hampshire,0.135998,0.196414,0.667588
25,MO,Missouri,0.148749,0.187639,0.663612
43,TX,Texas,0.150117,0.131886,0.717996


### Get to know our PFS data

In [219]:
# Indiana's Medicaid Physician Fee Schedule
df_IN_mdcaid_OPFS_2018.head()

Unnamed: 0,Proc Code,Description,IPO CODE,PA,Cov,Pricing,HAF Exempt?,Fee Sched Amt,Manual Method,Price Effective,ASC
0,10004,Fna bx w/o img gdn ea addl,No,No,Yes,NONE,No,,,2019-01-01 00:00:00,
1,10005,Fna bx w/us gdn 1st les,No,No,Yes,PC,No,579.34,,2019-01-01 00:00:00,
2,10006,Fna bx w/us gdn ea addl,No,No,Yes,NONE,No,,,2019-01-01 00:00:00,
3,10007,Fna bx w/fluor gdn 1st les,No,No,Yes,PC,No,579.34,,2019-01-01 00:00:00,
4,10008,Fna bx w/fluor gdn ea addl,No,No,Yes,NONE,No,,,2019-01-01 00:00:00,


In [218]:
df_IN_mdcaid_OPFS_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16534 entries, 0 to 16533
Data columns (total 11 columns):
Proc Code          16534 non-null object
Description        16534 non-null object
IPO CODE           16534 non-null object
PA                 16530 non-null object
Cov                16534 non-null object
Pricing            12615 non-null object
HAF Exempt?        12821 non-null object
Fee Sched Amt      3535 non-null object
Manual Method      397 non-null object
Price Effective    12822 non-null object
ASC                3988 non-null object
dtypes: object(11)
memory usage: 1.4+ MB


**Granularity**
1. Each row is the fee schedule entry for one procedure, identified by its procedure (HCPCS) code.
2. To match the time frame of our study, we keep only procedure codes whose price was already effective in 2018, i.e. with an effective date before January 1, 2019.
3. The column "Fee Sched Amt" has missing values; we drop those rows and convert the fee amount from object to float.

In [221]:
df_IN_m1OPFS_2018 = df_IN_mdcaid_OPFS_2018[pd.to_numeric(df_IN_mdcaid_OPFS_2018["Fee Sched Amt"], errors='coerce').notnull()].copy()
df_IN_m1OPFS_2018.rename(columns={"Proc Code":"PROC_CODE", "Fee Sched Amt":"MDCAID_PFS_AMT"}, inplace=True)
df_IN_m1OPFS_2018["Price Effective"] = df_IN_m1OPFS_2018["Price Effective"].apply(pd.to_datetime)
df_IN_m1OPFS_2018 = df_IN_m1OPFS_2018[df_IN_m1OPFS_2018["Price Effective"] < pd.to_datetime("1/1/2019")]
df_IN_m1OPFS_2018["MDCAID_PFS_AMT"] = df_IN_m1OPFS_2018["MDCAID_PFS_AMT"].astype(float).round(2)
df_IN_m1OPFS_2018 = df_IN_m1OPFS_2018[["PROC_CODE", "Description", "MDCAID_PFS_AMT"]]
df_IN_m1OPFS_2018.head()

Unnamed: 0,PROC_CODE,Description,MDCAID_PFS_AMT
11,10030,GUIDE CATHET FLUID DRAINAGE,539.11
12,10035,PERQ DEV SOFT TISS 1ST IMAG,480.64
370,19081,BX BREAST 1ST LESION STRTCTC,702.08
372,19083,BX BREAST 1ST LESION US IMAG,702.08
374,19085,BX BREAST 1ST LESION MR IMAG,702.08


In [222]:
df_IN_mdcare_pfs_2018.head()

Unnamed: 0,HCPCS CODE,SHORT DESCRIPTION,NON-FACILITY PRICE
0,A4890,Repair/maint cont hemo equip,$0.00
1,D0150,Comprehensve oral evaluation,$0.00
2,D0240,Intraoral occlusal film,$0.00
3,D0250,Extraoral 2d project image,$0.00
4,D0251,Extraoral posterior image,$0.00


**Granularity**
1. Each row is the fee schedule entry for one procedure, identified by its procedure (HCPCS) code.
2. The prices are strings that contain non-numeric characters such as "$" and ",". We strip those characters and convert the fee amount from object to float.
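
The string-to-float conversion in point 2 can be tried on a few sample values first. This is a minimal sketch on made-up price strings in the same format as the export. Note that since pandas 2.0, `Series.str.replace` treats the pattern as a literal string unless `regex=True` is passed.

```python
import pandas as pd

# Sample price strings in the same "$1,234.56" format; values are illustrative
prices = pd.Series(["$0.00", "$62.62", "$12,234.56"])

# Remove everything except digits, sign characters and the decimal point, then cast
amounts = prices.str.replace(r"[^-+\d.]", "", regex=True).astype(float)
print(amounts.tolist())
```

A single character class handles both the dollar sign and the thousands separator, so no separate comma replacement is needed.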

In [224]:
#Medicare Searchable CPT Pricing info for certain CPT
df_IN_m2pfs_2018 = df_IN_mdcare_pfs_2018.rename(columns={"HCPCS CODE":"PROC_CODE", "NON-FACILITY PRICE":"MDCARE_PFS_AMT"})
#Handle $12,234.56 char formatted cash amount to float type
# regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching
df_IN_m2pfs_2018["MDCARE_PFS_AMT"] = df_IN_m2pfs_2018["MDCARE_PFS_AMT"].str.replace(r'[^-+\d.]', '', regex=True).astype(float)
df_IN_m2pfs_2018.tail()

Unnamed: 0,PROC_CODE,SHORT DESCRIPTION,MDCARE_PFS_AMT
9189,99494,1st/sbsq psyc collab care,62.62
9190,99495,Trans care mgmt 14 day disch,157.13
9191,99496,Trans care mgmt 7 day disch,222.37
9192,99497,Advncd care plan 30 min,81.5
9193,99498,Advncd care plan addl 30 min,72.14


In [116]:
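df_IN_MMPFS_2018 = pd.merge(df_IN_m1OPFS_2018, df_IN_m2pfs_2018, on="PROC_CODE")
# Rename to the column labels used in the rest of the notebook
df_IN_PFS = df_IN_MMPFS_2018.rename(columns={"PROC_CODE":"PROC CODE", "Description":"DESCRIPTION",
                                             "MDCAID_PFS_AMT":"MEDICAID_PFS_AMT", "MDCARE_PFS_AMT":"MEDICARE_PFS_AMT"})
df_IN_PFS = df_IN_PFS[["PROC CODE", "DESCRIPTION", "MEDICAID_PFS_AMT", "MEDICARE_PFS_AMT"]].copy()
# Assumption 2: private insurance pays the same as Medicare
df_IN_PFS["ESTIMATED_PRIVATE_PFS_AMT"] = df_IN_PFS["MEDICARE_PFS_AMT"]
df_IN_PFS.head()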

Unnamed: 0,PROC CODE,DESCRIPTION,MEDICAID_PFS_AMT,MEDICARE_PFS_AMT,ESTIMATED_PRIVATE_PFS_AMT
0,10030,GUIDE CATHET FLUID DRAINAGE,539.11,533.82,533.82
1,10035,PERQ DEV SOFT TISS 1ST IMAG,480.64,488.31,488.31
2,19081,BX BREAST 1ST LESION STRTCTC,702.08,652.03,652.03
3,19083,BX BREAST 1ST LESION US IMAG,702.08,634.19,634.19
4,19085,BX BREAST 1ST LESION MR IMAG,702.08,948.13,948.13


In [118]:
# Look up Indiana's weights by state code rather than by row position
IN_rates = df_POPMM_2018Bs.set_index("STATE").loc["IN"]
IN_medicaidPrcent = IN_rates["MDCAID_RATE_2018BS"]
IN_medicarePrcent = IN_rates["MDCARE_RATE_2018BS"]
IN_privatePrcent = IN_rates["PRIVATE_RATE_2018BS"]

df_IN_PFS["PFS_AMT_BY_STATE"] = df_IN_PFS["MEDICAID_PFS_AMT"]*IN_medicaidPrcent + \
    df_IN_PFS["MEDICARE_PFS_AMT"]*IN_medicarePrcent + \
    df_IN_PFS["ESTIMATED_PRIVATE_PFS_AMT"]*IN_privatePrcent
df_IN_PFS["PFS_AMT_BY_STATE"] = df_IN_PFS["PFS_AMT_BY_STATE"].round(2)
df_IN_PFS.head()

Unnamed: 0,PROC CODE,DESCRIPTION,MEDICAID_PFS_AMT,MEDICARE_PFS_AMT,ESTIMATED_PRIVATE_PFS_AMT,PFS_AMT_BY_STATE
0,10030,GUIDE CATHET FLUID DRAINAGE,539.11,533.82,533.82,534.97
1,10035,PERQ DEV SOFT TISS 1ST IMAG,480.64,488.31,488.31,486.65
2,19081,BX BREAST 1ST LESION STRTCTC,702.08,652.03,652.03,662.88
3,19083,BX BREAST 1ST LESION US IMAG,702.08,634.19,634.19,648.91
4,19085,BX BREAST 1ST LESION MR IMAG,702.08,948.13,948.13,894.78


## 4 Conclusion
We now have a result dataframe with the state-level population, the Medicaid enrollment, and the Medicare beneficiary counts. From these statistics we compute "STATE_MEDICAID_PRCENT" = Medicaid enrollment / state-level population and "STATE_MEDICARE_PRCENT" = Medicare beneficiaries / state-level population.

We then assume that the share of private provider customers = 1 - "STATE_MEDICAID_PRCENT" - "STATE_MEDICARE_PRCENT".
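
As a worked example of these formulas, the sketch below computes the three weights from counts and then blends the fees. The function names and all numbers are hypothetical and chosen so the arithmetic is easy to follow; they are not results from the data in this notebook.

```python
def insurance_weights(population, medicaid_count, medicare_count):
    """Return (Medicaid, Medicare, private) population shares.

    The private share is whatever remains after Medicaid and Medicare,
    following the assumption stated in this notebook.
    """
    w_medicaid = medicaid_count / population
    w_medicare = medicare_count / population
    w_private = 1 - w_medicaid - w_medicare
    return w_medicaid, w_medicare, w_private


def blended_fee(medicaid_fee, medicare_fee, weights, private_fee=None):
    """Weighted sum of the three fee schedule amounts, rounded to cents.

    If no private fee is given, the Medicare fee is used in its place.
    """
    if private_fee is None:
        private_fee = medicare_fee
    w_medicaid, w_medicare, w_private = weights
    total = w_medicaid * medicaid_fee + w_medicare * medicare_fee + w_private * private_fee
    return round(total, 2)


# Hypothetical state: 1,000,000 residents, 200,000 on Medicaid, 150,000 on Medicare
weights = insurance_weights(1_000_000, 200_000, 150_000)
# Hypothetical procedure: Medicaid pays 100.00, Medicare pays 80.00
fee = blended_fee(100.0, 80.0, weights)
print(weights, fee)
```

With weights of 0.20, 0.15 and 0.65, the estimate is 0.20 × 100 + 0.15 × 80 + 0.65 × 80 = 84.00, which lies between the Medicare and Medicaid amounts.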

[Potential Provider list](https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Provider-of-Services)

### Combination of both Medicaid and medicare price info

### Compute the anchor price

### Connecticut medicaid physician fee schedule
1. Clinic - Ambulatory Surgical Center CSV file: the trailing specification rows were removed by hand after download.
2. Physician Office and Outpatient Services CSV file

In [28]:
# Start from the Connecticut Medicaid ASC fee schedule loaded earlier
df_CT_mdcaid_ASCPFS = df_CT_mdcaid_ASCPFS_2018.rename(columns = lambda x:x.strip().upper())
# Keep only rows with a numeric fee; .copy() avoids SettingWithCopyWarning below
df_CT_mdcaid_ASCPFS = df_CT_mdcaid_ASCPFS[pd.to_numeric(df_CT_mdcaid_ASCPFS["MAX FEE"], errors='coerce').notnull()].copy()
df_CT_mdcaid_ASCPFS["EFFECTIVE DATE"] = df_CT_mdcaid_ASCPFS["EFFECTIVE DATE"].apply(pd.to_datetime)
df_CT_mdcaid_ASCPFS["PROCEDURE CODE"] = df_CT_mdcaid_ASCPFS["PROCEDURE CODE"].apply(str)
# Keep only fees that were already effective before 2019
df_CT_mdcaid_ASCPFS = df_CT_mdcaid_ASCPFS[df_CT_mdcaid_ASCPFS["EFFECTIVE DATE"] < pd.to_datetime("1/1/2019")].copy()
df_CT_mdcaid_ASCPFS["MAX FEE"] = df_CT_mdcaid_ASCPFS["MAX FEE"].astype(float)

In [29]:
df_CT_mdcaid_ASCPFS.rename(columns={"PROCEDURE CODE":"PROC CODE"}, inplace=True)

In [30]:
df_CT_mdcaid_ASCPFS.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2602 entries, 0 to 2625
Data columns (total 9 columns):
PROC CODE           2602 non-null object
PROC DESCRIPTION    2602 non-null object
MOD1                2602 non-null object
MOD1 DESC           2602 non-null object
RATE TYPE           2602 non-null object
MAX FEE             2602 non-null float64
EFFECTIVE DATE      2602 non-null datetime64[ns]
END DATE            2602 non-null object
PA                  2602 non-null object
dtypes: datetime64[ns](1), float64(1), object(7)
memory usage: 203.3+ KB


In [31]:
df_CT_mdcaid_ASCPFS.head()

Unnamed: 0,PROC CODE,PROC DESCRIPTION,MOD1,MOD1 DESC,RATE TYPE,MAX FEE,EFFECTIVE DATE,END DATE,PA
0,10121,Remove foreign body,,,ASC,446.0,2008-10-01,12/31/99,
1,10180,Complex drainage wound,,,ASC,446.0,2008-10-01,12/31/99,
2,11010,Debride skin at fx site,,,ASC,251.52,2008-10-01,12/31/99,
3,11011,Debride skin musc at fx site,,,ASC,251.52,2008-10-01,12/31/99,
4,11012,Deb skin bone at fx site,,,ASC,251.52,2008-10-01,12/31/99,


In [32]:
df_CT_mdcaid_ASCPFS.tail()

Unnamed: 0,PROC CODE,PROC DESCRIPTION,MOD1,MOD1 DESC,RATE TYPE,MAX FEE,EFFECTIVE DATE,END DATE,PA
2621,69915,Incise inner ear nerve,,,ASC,995.0,2008-10-01,12/31/99,
2622,69930,Implant cochlear device,,,ASC,995.0,2008-10-01,12/31/99,
2623,G0105,Colorectal scrn; hi risk ind,,,ASC,415.75,2008-10-01,12/31/99,
2624,G0121,Colon ca scrn not hi rsk ind,,,ASC,415.75,2008-10-01,12/31/99,
2625,G0260,Inj for sacroiliac jt anesth,,,ASC,333.0,2008-10-01,12/31/99,


#### Search codes 10121 to G0260 in the Medicare fee search tool for the Connecticut locality
<img src="images/CT_medicare_pfs.png">

In [33]:
#Medicare Searchable CPT Pricing info for certain CPT
df_CT_mdcare_pfs = pd.read_csv("data/Connecticut/FY2018_CT_Medicare_PFSExport.csv", header=0, 
                                       index_col=None, usecols=[0, 2, 5, 6, 7, 8])
df_CT_mdcare_pfs.rename(columns = lambda x:x.strip().upper(), inplace=True)
df_CT_mdcare_pfs.rename(columns={"HCPCS CODE":"PROC CODE"}, inplace=True)
df_CT_mdcare_pfs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9048 entries, 0 to 9047
Data columns (total 6 columns):
HCPCS CODE                      9048 non-null object
SHORT DESCRIPTION               9048 non-null object
NON-FACILITY PRICE              9048 non-null object
FACILITY PRICE                  9048 non-null object
NON-FACILITY LIMITING CHARGE    5048 non-null object
FACILITY LIMITING CHARGE        6977 non-null object
dtypes: object(6)
memory usage: 424.2+ KB


In [34]:
# Convert currency strings such as "$12,234.56" to float
for col in ("NON-FACILITY PRICE", "FACILITY PRICE"):
    # drop everything except digits, sign, and decimal point (removes "$" and ",")
    df_CT_mdcare_pfs[col] = (df_CT_mdcare_pfs[col]
                             .str.replace(r'[^-+\d.]', '', regex=True).astype(float))
df_CT_mdcare_pfs.head() # NON-FACILITY PRICE VS. FACILITY PRICE
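
The currency clean-up can also be wrapped in a small helper. This is a minimal sketch, not part of the original notebook: the function name `currency_to_float` and the sample values are assumptions for illustration. It passes `regex=True` explicitly, because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

def currency_to_float(series):
    """Convert strings such as '$12,234.56' to floats (hypothetical helper)."""
    # Drop every character that is not a digit, sign, or decimal point.
    cleaned = series.str.replace(r"[^-+\d.]", "", regex=True)
    # errors="coerce" turns entries that cannot be parsed into NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")

# Illustrative values only; "NA" stands in for a missing price.
prices = pd.Series(["$12,234.56", "$306.48", "NA"])
parsed = currency_to_float(prices)
print(parsed.tolist())
```

Using `pd.to_numeric(..., errors="coerce")` instead of `astype(float)` is a design choice: missing prices become NaN and can be inspected later rather than stopping the cell.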

In [37]:
df_CT_PFS = pd.merge(df_CT_mdcaid_ASCPFS, df_CT_mdcare_pfs, on="PROC CODE")
df_CT_PFS = df_CT_PFS[["PROC CODE", "PROC DESCRIPTION", "MAX FEE", "NON-FACILITY PRICE"]]
df_CT_PFS.rename(columns={"MAX FEE":"MEDICAID_PFS_AMT", "NON-FACILITY PRICE":"MEDICARE_PFS_AMT"}, inplace=True)
df_CT_PFS["ESTIMATED_PRIVATE_PFS_AMT"] = df_CT_PFS["MEDICARE_PFS_AMT"]
df_CT_PFS.head()

Unnamed: 0,PROC CODE,PROC DESCRIPTION,MEDICAID_PFS_AMT,MEDICARE_PFS_AMT,ESTIMATED_PRIVATE_PFS_AMT
0,10121,Remove foreign body,446.0,306.48,306.48
1,10180,Complex drainage wound,446.0,277.66,277.66
2,11010,Debride skin at fx site,251.52,568.1,568.1
3,11011,Debride skin musc at fx site,251.52,608.45,608.45
4,11012,Deb skin bone at fx site,251.52,795.03,795.03


In [38]:
#compute the state level PFS price
CT_medicaidPrcent = df_popMM_2018Bs.iloc[6, 2]
CT_medicarePrcent = df_popMM_2018Bs.iloc[6, 3]
CT_privatePrcent = df_popMM_2018Bs.iloc[6, 4]
#
df_CT_PFS["PFS_AMT_BY_STATE"] = df_CT_PFS["MEDICAID_PFS_AMT"]*CT_medicaidPrcent + \
    df_CT_PFS["MEDICARE_PFS_AMT"]*CT_medicarePrcent + \
    df_CT_PFS["ESTIMATED_PRIVATE_PFS_AMT"]*CT_privatePrcent
df_CT_PFS["PFS_AMT_BY_STATE"] = df_CT_PFS["PFS_AMT_BY_STATE"].round(2)
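
The state-level price is a population-weighted sum of the three fee schedules. The sketch below restates that arithmetic in plain Python; the function name `blended_fee` and the shares used are hypothetical, chosen only to make the arithmetic easy to follow, and are not the Connecticut values.

```python
def blended_fee(medicaid_fee, medicare_fee, private_fee,
                medicaid_share, medicare_share, private_share):
    """Population-weighted fee: each payer's fee times its enrollment share."""
    total = (medicaid_fee * medicaid_share
             + medicare_fee * medicare_share
             + private_fee * private_share)
    return round(total, 2)

# Hypothetical fees and shares; the private fee is set equal to the Medicare fee,
# mirroring the assumption made in this notebook.
fee = blended_fee(100.0, 200.0, 200.0, 0.25, 0.25, 0.50)
print(fee)  # 25 + 50 + 100 = 175.0

# With private fee == Medicare fee and shares summing to 1, the same number
# follows from the Medicaid share alone.
shortcut = round(200.0 + 0.25 * (100.0 - 200.0), 2)
print(shortcut)  # 175.0
```

Because the private fee is assumed equal to the Medicare fee, the blended price differs from the Medicare price only through the Medicaid share.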

In [39]:
df_CT_PFS.head()

Unnamed: 0,PROC CODE,PROC DESCRIPTION,MEDICAID_PFS_AMT,MEDICARE_PFS_AMT,ESTIMATED_PRIVATE_PFS_AMT,PFS_AMT_BY_STATE
0,10121,Remove foreign body,446.0,306.48,306.48,336.73
1,10180,Complex drainage wound,446.0,277.66,277.66,314.16
2,11010,Debride skin at fx site,251.52,568.1,568.1,499.46
3,11011,Debride skin musc at fx site,251.52,608.45,608.45,531.06
4,11012,Deb skin bone at fx site,251.52,795.03,795.03,677.19


# 3 - County Level
We have already collected the census population and the Medicare enrollment at the county level. County-level Medicaid enrollment, however, is not published by CMS (Centers for Medicare & Medicaid Services), so we have to look it up on each state's health administration website. As a demonstration, we work through two states (Indiana and Connecticut) and begin a third (New York).
>- Indiana
- Connecticut
- New York

## Indiana County Level
To get the remaining county-level input, Medicaid enrollment, we used the state's [Indiana medicaid monthly enrollment reports](https://www.in.gov/fssa/ompp/4881.htm).
- Population Data, We collect from Inda
    - Census.org population county level: df_IN_popBc2018
    - [Medicaid county level](https://www.in.gov/fssa/ompp/4881.htm)
    - Medicare county level: df_mdcare_2018
- Price Data

In [40]:
# Indiana Medicaid enrollment by county
df_IN_mdcaid_2018Bc = pd.read_excel("data/Indiana/FY2018_12_IN_Medicaid_Enrollment_By_County.xlsx", 
                                       sheet_name="County", header=8, nrows=95, index_col=None)
df_IN_mdcaid_2018Bc.rename(columns = lambda x:x.strip().upper(), inplace=True)
df_IN_mdcaid_2018Bc.rename(columns = {"COUNTY TOTAL":"MDCAID_CTY_TOTAL"}, inplace=True)
df_IN_mdcaid_2018Bc.head()

Unnamed: 0,ANTHEM,CARESOURCE,MDWISE,MHS,TOTAL,ANTHEM.1,MHS.1,TOTAL.1,ANTHEM.2,CARESOURCE.1,MDWISE.1,MHS.2,UNASSIGNED1,TOTAL.2,TOTAL.3,MDCAID_CTY_TOTAL
01-ADAMS,555.0,237.0,545.0,578.0,1915.0,180.0,102.0,282.0,491.0,132.0,290.0,295.0,18.0,1226.0,1127.0,4550
02-ALLEN,13119.0,3622.0,14646.0,5465.0,36852.0,3471.0,2020.0,5491.0,8157.0,2441.0,6782.0,3161.0,1239.0,21780.0,16594.0,80717
03-BARTHOLOMEW,894.0,394.0,1801.0,2663.0,5752.0,350.0,404.0,754.0,1218.0,314.0,989.0,1116.0,309.0,3946.0,3256.0,13708
04-BENTON,155.0,108.0,501.0,188.0,952.0,52.0,53.0,105.0,161.0,64.0,246.0,114.0,15.0,600.0,447.0,2104
05-BLACKFORD,315.0,123.0,563.0,255.0,1256.0,96.0,86.0,182.0,335.0,80.0,392.0,128.0,3.0,938.0,733.0,3109


In [41]:
df_IN_mdcaid_2018Bc.reset_index(inplace=True)
df_IN_mdcaid_2018Bc.rename(columns={"index":"COUNTY_ID_NAME"}, inplace=True)
df_IN_mdcaid_2018Bc.head()

Unnamed: 0,COUNTY_ID_NAME,ANTHEM,CARESOURCE,MDWISE,MHS,TOTAL,ANTHEM.1,MHS.1,TOTAL.1,ANTHEM.2,CARESOURCE.1,MDWISE.1,MHS.2,UNASSIGNED1,TOTAL.2,TOTAL.3,MDCAID_CTY_TOTAL
0,01-ADAMS,555.0,237.0,545.0,578.0,1915.0,180.0,102.0,282.0,491.0,132.0,290.0,295.0,18.0,1226.0,1127.0,4550
1,02-ALLEN,13119.0,3622.0,14646.0,5465.0,36852.0,3471.0,2020.0,5491.0,8157.0,2441.0,6782.0,3161.0,1239.0,21780.0,16594.0,80717
2,03-BARTHOLOMEW,894.0,394.0,1801.0,2663.0,5752.0,350.0,404.0,754.0,1218.0,314.0,989.0,1116.0,309.0,3946.0,3256.0,13708
3,04-BENTON,155.0,108.0,501.0,188.0,952.0,52.0,53.0,105.0,161.0,64.0,246.0,114.0,15.0,600.0,447.0,2104
4,05-BLACKFORD,315.0,123.0,563.0,255.0,1256.0,96.0,86.0,182.0,335.0,80.0,392.0,128.0,3.0,938.0,733.0,3109


In [42]:
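# .copy() gives an independent frame, so the column assignments below do not trigger SettingWithCopyWarning
df_IN_mdcaid_2018Bc = df_IN_mdcaid_2018Bc[["COUNTY_ID_NAME", "MDCAID_CTY_TOTAL"]].copy()
# split "01-ADAMS" at the first hyphen only into the county id and the county name
df_IN_mdcaid_2018Bc[["COUNTY_ID", "CTYNAME"]] = df_IN_mdcaid_2018Bc.COUNTY_ID_NAME.str.split("-", n=1, expand=True)
df_IN_mdcaid_2018Bc.CTYNAME = df_IN_mdcaid_2018Bc.CTYNAME.apply(lambda x: x.strip() + " COUNTY")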
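
A small sketch of the id/name split on toy data. The first two values come from the enrollment table above; the third, `99-EXAMPLE-NAME`, is a hypothetical entry added only to show why limiting the split to the first hyphen (`n=1`) can matter.

```python
import pandas as pd

raw = pd.DataFrame({"COUNTY_ID_NAME": ["01-ADAMS", "03-BARTHOLOMEW", "99-EXAMPLE-NAME"]})
# n=1 splits only at the first hyphen, so a hyphen inside a name is kept.
parts = raw["COUNTY_ID_NAME"].str.split("-", n=1, expand=True)
raw["COUNTY_ID"] = parts[0]
raw["CTYNAME"] = parts[1].str.strip() + " COUNTY"
print(raw)
```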

In [43]:
df_IN_mdcaid_2018Bc.head()

Unnamed: 0,COUNTY_ID_NAME,MDCAID_CTY_TOTAL,COUNTY_ID,CTYNAME
0,01-ADAMS,4550,1,ADAMS COUNTY
1,02-ALLEN,80717,2,ALLEN COUNTY
2,03-BARTHOLOMEW,13708,3,BARTHOLOMEW COUNTY
3,04-BENTON,2104,4,BENTON COUNTY
4,05-BLACKFORD,3109,5,BLACKFORD COUNTY


In [44]:
df_IN_mdcaid_2018Bc.count()

COUNTY_ID_NAME      95
MDCAID_CTY_TOTAL    95
COUNTY_ID           95
CTYNAME             95
dtype: int64

#### Indiana's county level population from census statistics

In [45]:
#df_pop_by_county_2018
df_IN_popBc2018 = df_pop_total_2018Bc[df_pop_total_2018Bc["STNAME"] == "Indiana"]
df_IN_popBc2018 = df_IN_popBc2018[["STNAME", "CTYNAME", "TOT_POP"]]
df_IN_popBc2018.head()

Unnamed: 0,STNAME,CTYNAME,TOT_POP
145863,Indiana,ADAMS COUNTY,35636
146072,Indiana,ALLEN COUNTY,375351
146281,Indiana,BARTHOLOMEW COUNTY,82753
146490,Indiana,BENTON COUNTY,8653
146699,Indiana,BLACKFORD COUNTY,11930


In [46]:
df_IN_popBc2018.count()

STNAME     92
CTYNAME    92
TOT_POP    92
dtype: int64

In [47]:
df_mdcare_2018.head()

Unnamed: 0,STABBR,CTYNAME,FIPS_CODE,MDCARE_TOTAL_2018,FFS BENEFICIARIES,MA BENEFICIARIES
0,Na,NATIONAL TOTAL COUNTY,,56031636,33499472,22532164
1,AK,STATE TOTAL COUNTY,,86462,84714,1748
2,AK,ALEUTIANS EAST COUNTY,2013.0,117,117,0
3,AK,ALEUTIANS WEST COUNTY,2016.0,135,135,0
4,AK,ANCHORAGE COUNTY,2020.0,32227,31503,724


In [48]:
df_IN_mdcare_2018Bc = df_mdcare_2018[df_mdcare_2018["STABBR"] == "IN"]
df_IN_mdcare_2018Bc = df_IN_mdcare_2018Bc[df_IN_mdcare_2018Bc["CTYNAME"] != "STATE TOTAL COUNTY"]
#STATE TOTAL COUNTY
df_IN_mdcare_2018Bc = df_IN_mdcare_2018Bc[["STABBR", "CTYNAME", "FIPS_CODE", "MDCARE_TOTAL_2018"]]
df_IN_mdcare_2018Bc.head()

Unnamed: 0,STABBR,CTYNAME,FIPS_CODE,MDCARE_TOTAL_2018
826,IN,ADAMS COUNTY,18001,5785
827,IN,ALLEN COUNTY,18003,62136
828,IN,BARTHOLOMEW COUNTY,18005,14761
829,IN,BENTON COUNTY,18007,1777
830,IN,BLACKFORD COUNTY,18009,3042


In [49]:
df_IN_mdcare_2018Bc.count()

STABBR               92
CTYNAME              92
FIPS_CODE            92
MDCARE_TOTAL_2018    92
dtype: int64

In [50]:
#df_pop_mdcaid_2018Bs = pd.merge(df_pop_total_2018Bs, df_mdcaid_2018Bs, on="STNAME")
df_IN_temp_2018Bc = pd.merge(df_IN_popBc2018, df_IN_mdcaid_2018Bc, on="CTYNAME")
df_IN_popMM_2018Bc = pd.merge(df_IN_temp_2018Bc, df_IN_mdcare_2018Bc, on="CTYNAME")
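
The Medicaid table above has 95 rows while the census and Medicare tables have 92, so the default inner merge silently drops the rows that do not match on `CTYNAME`. One way to see which rows are affected is an outer merge with `indicator=True`. The sketch below uses toy frames; the `OUT OF STATE COUNTY` row is a hypothetical stand-in for whatever the extra rows actually contain.

```python
import pandas as pd

# Toy stand-ins for the census and Medicaid tables.
census = pd.DataFrame({"CTYNAME": ["ADAMS COUNTY", "ALLEN COUNTY"],
                       "TOT_POP": [35636, 375351]})
medicaid = pd.DataFrame({"CTYNAME": ["ADAMS COUNTY", "ALLEN COUNTY", "OUT OF STATE COUNTY"],
                         "MDCAID_CTY_TOTAL": [4550, 80717, 10]})

# how="outer" keeps every row; the _merge column records where each row came from.
check = pd.merge(census, medicaid, on="CTYNAME", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["CTYNAME", "_merge"]])
```

Rows marked `right_only` exist only in the Medicaid table, and rows marked `left_only` exist only in the census table; either kind would disappear in the inner merge.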

In [51]:
df_IN_popMM_2018Bc = df_IN_popMM_2018Bc[["STABBR", "STNAME", "CTYNAME", "TOT_POP", 
                                         "MDCAID_CTY_TOTAL", "MDCARE_TOTAL_2018"]]

In [52]:
# Insert new column "COUNTY_MDCAID_PT", "COUNTY_MDCARE_PT", "COUNTY_PRIVATE_PT"
df_IN_popMM_2018Bc["COUNTY_MDCAID_PT"] = df_IN_popMM_2018Bc["MDCAID_CTY_TOTAL"] / df_IN_popMM_2018Bc["TOT_POP"]
df_IN_popMM_2018Bc["COUNTY_MDCARE_PT"] = df_IN_popMM_2018Bc["MDCARE_TOTAL_2018"] / df_IN_popMM_2018Bc["TOT_POP"]
# Assume everyone not enrolled in Medicaid or Medicare is privately insured
df_IN_popMM_2018Bc["COUNTY_PRIVATE_PT"] = 1 - (df_IN_popMM_2018Bc["COUNTY_MDCAID_PT"] 
                                               + df_IN_popMM_2018Bc["COUNTY_MDCARE_PT"])
#
df_IN_popMM_2018Bc.head()

Unnamed: 0,STABBR,STNAME,CTYNAME,TOT_POP,MDCAID_CTY_TOTAL,MDCARE_TOTAL_2018,COUNTY_MDCAID_PT,COUNTY_MDCARE_PT,COUNTY_PRIVATE_PT
0,IN,Indiana,ADAMS COUNTY,35636,4550,5785,0.12768,0.162336,0.709984
1,IN,Indiana,ALLEN COUNTY,375351,80717,62136,0.215044,0.165541,0.619415
2,IN,Indiana,BARTHOLOMEW COUNTY,82753,13708,14761,0.16565,0.178374,0.655976
3,IN,Indiana,BENTON COUNTY,8653,2104,1777,0.243153,0.205362,0.551485
4,IN,Indiana,BLACKFORD COUNTY,11930,3109,3042,0.260604,0.254987,0.484409
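
Because the private share is computed as a residual, it is worth checking that every share lies between 0 and 1. The sketch below is one possible sanity check, using two rows copied from the table above; the variable names are illustrative. Note that the residual also absorbs anyone who is uninsured, and anyone enrolled in both programs is counted in both shares, so the private share is an approximation.

```python
import pandas as pd

# Two rows copied from the Indiana table above.
counties = pd.DataFrame(
    {"TOT_POP": [35636, 375351],
     "MDCAID_CTY_TOTAL": [4550, 80717],
     "MDCARE_TOTAL_2018": [5785, 62136]},
    index=["ADAMS COUNTY", "ALLEN COUNTY"],
)
counties["COUNTY_MDCAID_PT"] = counties["MDCAID_CTY_TOTAL"] / counties["TOT_POP"]
counties["COUNTY_MDCARE_PT"] = counties["MDCARE_TOTAL_2018"] / counties["TOT_POP"]
counties["COUNTY_PRIVATE_PT"] = 1 - (counties["COUNTY_MDCAID_PT"] + counties["COUNTY_MDCARE_PT"])

share_cols = ["COUNTY_MDCAID_PT", "COUNTY_MDCARE_PT", "COUNTY_PRIVATE_PT"]
# Every share should lie in [0, 1]; a negative private share would signal bad inputs.
in_range = counties[share_cols].apply(lambda col: col.between(0, 1)).all(axis=1)
# The three shares should add up to 1 (up to floating-point error).
sums_to_one = (counties[share_cols].sum(axis=1) - 1).abs() < 1e-9
print(in_range.all(), sums_to_one.all())
```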


In [53]:
df_IN_popMM_2018Bc.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 92 entries, 0 to 91
Data columns (total 9 columns):
STABBR               92 non-null object
STNAME               92 non-null object
CTYNAME              92 non-null object
TOT_POP              92 non-null int64
MDCAID_CTY_TOTAL     92 non-null int64
MDCARE_TOTAL_2018    92 non-null int64
COUNTY_MDCAID_PT     92 non-null float64
COUNTY_MDCARE_PT     92 non-null float64
COUNTY_PRIVATE_PT    92 non-null float64
dtypes: float64(3), int64(3), object(3)
memory usage: 7.2+ KB


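### Physician Fee Schedule Amount for county level
The state-level price is already stored in "PFS_AMT_BY_STATE". We now add one price column per county to each state's fee table:

- df_IN_PFS
- df_CT_PFS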

In [54]:
# df_IN_PFS: add one price column per county (a wide table, one row per procedure code)
IN_counties = ["PFS_AMT_BY_" + "_".join(county.split(" ")) for county in df_IN_popMM_2018Bc["CTYNAME"]]
# initialize all values as 0.0
for county in IN_counties:
    df_IN_PFS[county] = 0.0
df_IN_PFS.head()

Unnamed: 0,PROC CODE,DESCRIPTION,MEDICAID_PFS_AMT,MEDICARE_PFS_AMT,ESTIMATED_PRIVATE_PFS_AMT,PFS_AMT_BY_STATE,PFS_AMT_BY_ADAMS_COUNTY,PFS_AMT_BY_ALLEN_COUNTY,PFS_AMT_BY_BARTHOLOMEW_COUNTY,PFS_AMT_BY_BENTON_COUNTY,...,PFS_AMT_BY_VERMILLION_COUNTY,PFS_AMT_BY_VIGO_COUNTY,PFS_AMT_BY_WABASH_COUNTY,PFS_AMT_BY_WARREN_COUNTY,PFS_AMT_BY_WARRICK_COUNTY,PFS_AMT_BY_WASHINGTON_COUNTY,PFS_AMT_BY_WAYNE_COUNTY,PFS_AMT_BY_WELLS_COUNTY,PFS_AMT_BY_WHITE_COUNTY,PFS_AMT_BY_WHITLEY_COUNTY
0,10030,GUIDE CATHET FLUID DRAINAGE,539.11,533.82,533.82,534.97,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,10035,PERQ DEV SOFT TISS 1ST IMAG,480.64,488.31,488.31,486.65,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,19081,BX BREAST 1ST LESION STRTCTC,702.08,652.03,652.03,662.88,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,19083,BX BREAST 1ST LESION US IMAG,702.08,634.19,634.19,648.91,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,19085,BX BREAST 1ST LESION MR IMAG,702.08,948.13,948.13,894.78,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [56]:
# compute the county-level price, weighting each fee schedule by the county's enrollment share
for i, county in enumerate(IN_counties):
    # row i of df_IN_popMM_2018Bc is the county that column `county` was built from
    # get the Medicaid, Medicare, and private insurance population shares
    IN_mdcaid_rate = df_IN_popMM_2018Bc["COUNTY_MDCAID_PT"].iloc[i]
    IN_mdcare_rate = df_IN_popMM_2018Bc["COUNTY_MDCARE_PT"].iloc[i]
    IN_private_rate = df_IN_popMM_2018Bc["COUNTY_PRIVATE_PT"].iloc[i]
    # each assignment is vectorized over all procedure codes (no inner loop over rows)
    df_IN_PFS[county] = (df_IN_PFS["MEDICAID_PFS_AMT"]*IN_mdcaid_rate
                         + df_IN_PFS["MEDICARE_PFS_AMT"]*IN_mdcare_rate
                         + df_IN_PFS["ESTIMATED_PRIVATE_PFS_AMT"]*IN_private_rate).round(2)
df_IN_PFS.head()

Unnamed: 0,PROC CODE,DESCRIPTION,MEDICAID_PFS_AMT,MEDICARE_PFS_AMT,ESTIMATED_PRIVATE_PFS_AMT,PFS_AMT_BY_STATE,PFS_AMT_BY_ADAMS_COUNTY,PFS_AMT_BY_ALLEN_COUNTY,PFS_AMT_BY_BARTHOLOMEW_COUNTY,PFS_AMT_BY_BENTON_COUNTY,...,PFS_AMT_BY_VERMILLION_COUNTY,PFS_AMT_BY_VIGO_COUNTY,PFS_AMT_BY_WABASH_COUNTY,PFS_AMT_BY_WARREN_COUNTY,PFS_AMT_BY_WARRICK_COUNTY,PFS_AMT_BY_WASHINGTON_COUNTY,PFS_AMT_BY_WAYNE_COUNTY,PFS_AMT_BY_WELLS_COUNTY,PFS_AMT_BY_WHITE_COUNTY,PFS_AMT_BY_WHITLEY_COUNTY
0,10030,GUIDE CATHET FLUID DRAINAGE,539.11,533.82,533.82,534.97,534.5,534.96,534.7,535.11,...,535.15,535.17,534.91,534.74,534.52,535.03,535.23,534.61,534.87,534.47
1,10035,PERQ DEV SOFT TISS 1ST IMAG,480.64,488.31,488.31,486.65,487.33,486.66,487.04,486.45,...,486.38,486.35,486.72,486.97,487.3,486.55,486.27,487.17,486.79,487.37
2,19081,BX BREAST 1ST LESION STRTCTC,702.08,652.03,652.03,662.88,658.42,662.79,660.32,664.2,...,664.61,664.81,662.39,660.77,658.63,663.49,665.37,659.49,661.96,658.16
3,19083,BX BREAST 1ST LESION US IMAG,702.08,634.19,634.19,648.91,642.86,648.79,645.44,650.7,...,651.26,651.52,648.24,646.05,643.14,649.73,652.28,644.31,647.67,642.5
4,19085,BX BREAST 1ST LESION MR IMAG,702.08,948.13,948.13,894.78,916.71,895.22,907.37,888.3,...,886.28,885.31,897.22,905.16,915.7,891.81,882.56,911.47,899.29,918.01
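
The per-county loop can also be written as a single matrix product: a (codes x 3) fee matrix times a (3 x counties) share matrix. The sketch below is an alternative under that framing, not the notebook's own method. It uses two procedure codes and two counties copied from the tables above, and the names `fees`, `shares`, and `county_prices` are illustrative.

```python
import numpy as np
import pandas as pd

# Two fee rows (codes 10030 and 10035) copied from the Indiana fee table.
fees = pd.DataFrame({"MEDICAID_PFS_AMT": [539.11, 480.64],
                     "MEDICARE_PFS_AMT": [533.82, 488.31]})
fees["ESTIMATED_PRIVATE_PFS_AMT"] = fees["MEDICARE_PFS_AMT"]

# Two county rows copied from the Indiana share table.
shares = pd.DataFrame({"CTYNAME": ["ADAMS COUNTY", "ALLEN COUNTY"],
                       "COUNTY_MDCAID_PT": [0.12768, 0.215044],
                       "COUNTY_MDCARE_PT": [0.162336, 0.165541],
                       "COUNTY_PRIVATE_PT": [0.709984, 0.619415]})

fee_cols = ["MEDICAID_PFS_AMT", "MEDICARE_PFS_AMT", "ESTIMATED_PRIVATE_PFS_AMT"]
share_cols = ["COUNTY_MDCAID_PT", "COUNTY_MDCARE_PT", "COUNTY_PRIVATE_PT"]

# (codes x 3) @ (3 x counties) gives one blended price per code and county.
blended = fees[fee_cols].to_numpy() @ shares[share_cols].to_numpy().T
county_cols = ["PFS_AMT_BY_" + name.replace(" ", "_") for name in shares["CTYNAME"]]
county_prices = pd.DataFrame(np.round(blended, 2), columns=county_cols, index=fees.index)
print(county_prices)
```

For code 10030 this gives 534.5 for Adams County and 534.96 for Allen County, matching the first row of the table above. The column order of `fee_cols` and `share_cols` must correspond payer by payer for the product to be meaningful.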


In [77]:
# write the Indiana table to ./output/Indiana_PFS.csv
IN_filename = 'Indiana_PFS.csv'
filepath = './output'
# create the output folder if it does not exist yet
os.makedirs(filepath, exist_ok=True)
full_filename = os.path.join(filepath, IN_filename)
df_IN_PFS.to_csv(full_filename)

## Connecticut County Level
To get the remaining county-level input, Medicaid enrollment, we used the [Connecticut DSS Data and Program Reports](https://portal.ct.gov/DSS/ITS/DSS-HealthIT/Business-Intelligence-and-DSS-HealthIT/Data-and-Program-Reports).
- Population Data, collected from data.ct.gov
    - Census.org population county level: df_CT_popBc2018
    - [DSS Township Counts - by Program - CY 2018](https://data.ct.gov/Health-and-Human-Services/DSS-Township-Counts-by-Program-CY-2018/n5xw-nk45)
    Connecticut’s Medicaid program is called HUSKY Health, and it is broken into several categories (enrollment and cost data based on a report published in 2018).
    - [Towns and Counties List in Connecticut](https://ctstatelibrary.org/cttowns/counties)
    Connecticut publishes Medicaid enrollment only by town, so we join the town-to-county lookup table with the township enrollment counts to get county-level enrollment. Steps: save the HTML page to your local PC, then open it with Excel, which rebuilds the table as it appears on the page 😀🤟
    - Medicare county level: df_mdcare_2018
- Physician Fee Schedule Data
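
The town-to-county roll-up described above can be sketched on toy data as follows. The enrollment counts are hypothetical; the town-to-county pairs are real Connecticut geography. The frame names are illustrative and differ from the notebook's own variables.

```python
import pandas as pd

# Hypothetical enrollment counts per town.
town_counts = pd.DataFrame({"TOWNSHIP": ["Hartford", "West Hartford", "New Haven"],
                            "COUNT": [1000, 200, 900]})
# Lookup table mapping each town to its county.
town_to_county = pd.DataFrame({"TOWNSHIP": ["Hartford", "West Hartford", "New Haven"],
                               "CTYNAME": ["Hartford", "Hartford", "New Haven"]})

# A left merge keeps every town, so a town missing from the lookup shows up as NaN.
merged = town_counts.merge(town_to_county, on="TOWNSHIP", how="left")
county_counts = merged.groupby("CTYNAME", as_index=False)["COUNT"].sum()
print(county_counts)
```

A left merge is used here so that towns whose names do not match the lookup table remain visible instead of being dropped.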

In [57]:
#df_pop_by_county_2018
df_CT_popBc2018 = df_pop_total_2018Bc[df_pop_total_2018Bc["STNAME"] == "Connecticut"]
df_CT_popBc2018 = df_CT_popBc2018[["STNAME", "CTYNAME", "TOT_POP"]]
df_CT_popBc2018.head(10)

Unnamed: 0,STNAME,CTYNAME,TOT_POP
64562,Connecticut,FAIRFIELD COUNTY,943823
64771,Connecticut,HARTFORD COUNTY,892697
64980,Connecticut,LITCHFIELD COUNTY,181111
65189,Connecticut,MIDDLESEX COUNTY,162682
65398,Connecticut,NEW HAVEN COUNTY,857620
65607,Connecticut,NEW LONDON COUNTY,266784
65816,Connecticut,TOLLAND COUNTY,150921
66025,Connecticut,WINDHAM COUNTY,117027


In [58]:
# medicare enrollment by county
df_CT_mdcare_2018Bc = df_mdcare_2018[df_mdcare_2018["STABBR"] == "CT"]
df_CT_mdcare_2018Bc = df_CT_mdcare_2018Bc[df_CT_mdcare_2018Bc["CTYNAME"] != "STATE TOTAL COUNTY"]
#STATE TOTAL COUNTY
df_CT_mdcare_2018Bc = df_CT_mdcare_2018Bc[["STABBR", "CTYNAME", "FIPS_CODE", "MDCARE_TOTAL_2018"]]
df_CT_mdcare_2018Bc.head(10)

Unnamed: 0,STABBR,CTYNAME,FIPS_CODE,MDCARE_TOTAL_2018
322,CT,FAIRFIELD COUNTY,9001,140312
323,CT,HARTFORD COUNTY,9003,157942
324,CT,LITCHFIELD COUNTY,9005,38770
325,CT,MIDDLESEX COUNTY,9007,33051
326,CT,NEW HAVEN COUNTY,9009,149556
327,CT,NEW LONDON COUNTY,9011,50739
328,CT,TOLLAND COUNTY,9013,24445
330,CT,WINDHAM COUNTY,9015,21434


In [59]:
# Connecticut Towns and Counties List
# medicaid_DSS_Township_Counts_-_by_Program_-_CY_2018.csv, including program "medicaid" + "CHIP"
# Connecticut_Towns_and_Counties.xlsx
df_CT_DSS_2018Bt = pd.read_csv("data/Connecticut/FY2018_CT_Medicaid_DSS_Township_Counts.csv", 
                                   header=0, index_col=None, usecols=None)
df_CT_DSS_2018Bt.rename(columns = lambda x:x.strip().upper(), inplace=True)
df_CT_Town_County = pd.read_excel("data/Connecticut/FY2020_CT_Towns_and_Counties.xlsx",
                                 sheet_name="townslist",
                                 header=0, 
                                 index_col=None)
df_CT_Town_County.rename(columns = {"Town name":"TOWNSHIP", "County":"CTYNAME"}, inplace=True)

In [60]:
# .copy() gives an independent frame, which avoids pandas' SettingWithCopyWarning
df_CT_TOT_2018Bt = df_CT_DSS_2018Bt[df_CT_DSS_2018Bt["PROGRAM"].isin(["Medicaid", "CHIP"])].copy()
# strip thousands separators and any other non-numeric characters, then convert to int
df_CT_TOT_2018Bt["COUNT"] = (df_CT_TOT_2018Bt["COUNT"]
                             .str.replace(r'[^-+\d.]', '', regex=True).astype(int))
df_CT_mdcaid_2018Bt = df_CT_TOT_2018Bt.groupby(["TOWNSHIP"])["COUNT"].sum().reset_index()


In [61]:
# MERGE: attach each town's county to the township enrollment counts
df_CT_mix = pd.merge(df_CT_mdcaid_2018Bt, df_CT_Town_County, on="TOWNSHIP")

In [62]:
df_CT_mdcaid_2018Bc = df_CT_mix.groupby(["CTYNAME"])["COUNT"].sum().reset_index()
df_CT_mdcaid_2018Bc.rename(columns={"COUNT":"MDCAID_CTY_TOTAL"}, inplace=True)
df_CT_mdcaid_2018Bc.CTYNAME = df_CT_mdcaid_2018Bc.CTYNAME.apply(lambda x:x.strip().upper() + " COUNTY")
df_CT_mdcaid_2018Bc.head(10)

Unnamed: 0,CTYNAME,MDCAID_CTY_TOTAL
0,FAIRFIELD COUNTY,239342
1,HARTFORD COUNTY,295583
2,LITCHFIELD COUNTY,44492
3,MIDDLESEX COUNTY,34245
4,NEW HAVEN COUNTY,297593
5,NEW LONDON COUNTY,78547
6,TOLLAND COUNTY,25401
7,WINDHAM COUNTY,39548


In [63]:
#df_pop_mdcaid_2018Bs = pd.merge(df_pop_total_2018Bs, df_mdcaid_2018Bs, on="STNAME")
df_CT_temp_2018Bc = pd.merge(df_CT_popBc2018, df_CT_mdcaid_2018Bc, on="CTYNAME")
df_CT_popMM_2018Bc = pd.merge(df_CT_temp_2018Bc, df_CT_mdcare_2018Bc, on="CTYNAME")
df_CT_popMM_2018Bc = df_CT_popMM_2018Bc[["STABBR", "STNAME", "CTYNAME", "TOT_POP", 
                                         "MDCAID_CTY_TOTAL", "MDCARE_TOTAL_2018"]]
# Insert new column "COUNTY_MDCAID_PT", "COUNTY_MDCARE_PT", "COUNTY_PRIVATE_PT"
df_CT_popMM_2018Bc["COUNTY_MDCAID_PT"] = df_CT_popMM_2018Bc["MDCAID_CTY_TOTAL"] / df_CT_popMM_2018Bc["TOT_POP"]
df_CT_popMM_2018Bc["COUNTY_MDCARE_PT"] = df_CT_popMM_2018Bc["MDCARE_TOTAL_2018"] / df_CT_popMM_2018Bc["TOT_POP"]
# Assume everyone not enrolled in Medicaid or Medicare is privately insured
df_CT_popMM_2018Bc["COUNTY_PRIVATE_PT"] = 1 - (df_CT_popMM_2018Bc["COUNTY_MDCAID_PT"] 
                                               + df_CT_popMM_2018Bc["COUNTY_MDCARE_PT"])
#
df_CT_popMM_2018Bc.head(10)

Unnamed: 0,STABBR,STNAME,CTYNAME,TOT_POP,MDCAID_CTY_TOTAL,MDCARE_TOTAL_2018,COUNTY_MDCAID_PT,COUNTY_MDCARE_PT,COUNTY_PRIVATE_PT
0,CT,Connecticut,FAIRFIELD COUNTY,943823,239342,140312,0.253588,0.148663,0.597749
1,CT,Connecticut,HARTFORD COUNTY,892697,295583,157942,0.331112,0.176927,0.491961
2,CT,Connecticut,LITCHFIELD COUNTY,181111,44492,38770,0.245662,0.214068,0.540271
3,CT,Connecticut,MIDDLESEX COUNTY,162682,34245,33051,0.210503,0.203163,0.586334
4,CT,Connecticut,NEW HAVEN COUNTY,857620,297593,149556,0.346999,0.174385,0.478616
5,CT,Connecticut,NEW LONDON COUNTY,266784,78547,50739,0.294422,0.190188,0.515391
6,CT,Connecticut,TOLLAND COUNTY,150921,25401,24445,0.168307,0.161972,0.669721
7,CT,Connecticut,WINDHAM COUNTY,117027,39548,21434,0.337939,0.183154,0.478907


#### Physician Fee Schedule for Connecticut

In [64]:
# df_CT_PFS: add one price column per county (a wide table, one row per procedure code)
CT_counties = ["PFS_AMT_BY_" + "_".join(county.split(" ")) for county in df_CT_popMM_2018Bc["CTYNAME"]]
# initialize all values as 0.0
for county in CT_counties:
    df_CT_PFS[county] = 0.0
df_CT_PFS.head(10)

Unnamed: 0,PROC CODE,PROC DESCRIPTION,MEDICAID_PFS_AMT,MEDICARE_PFS_AMT,ESTIMATED_PRIVATE_PFS_AMT,PFS_AMT_BY_STATE,PFS_AMT_BY_FAIRFIELD_COUNTY,PFS_AMT_BY_HARTFORD_COUNTY,PFS_AMT_BY_LITCHFIELD_COUNTY,PFS_AMT_BY_MIDDLESEX_COUNTY,PFS_AMT_BY_NEW_HAVEN_COUNTY,PFS_AMT_BY_NEW_LONDON_COUNTY,PFS_AMT_BY_TOLLAND_COUNTY,PFS_AMT_BY_WINDHAM_COUNTY
0,10121,Remove foreign body,446.0,306.48,306.48,336.73,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,10180,Complex drainage wound,446.0,277.66,277.66,314.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,11010,Debride skin at fx site,251.52,568.1,568.1,499.46,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,11011,Debride skin musc at fx site,251.52,608.45,608.45,531.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,11012,Deb skin bone at fx site,251.52,795.03,795.03,677.19,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,11042,Deb subq tissue 20 sq cm/<,164.42,131.41,131.41,138.57,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,11043,Deb musc/fascia 20 sq cm/<,164.42,254.52,254.52,234.98,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,11044,Deb bone 20 sq cm/<,423.1,347.75,347.75,364.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,11404,Exc tr-ext b9+marg 3.1-4 cm,333.0,244.44,244.44,263.64,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,11406,Exc tr-ext b9+marg >4.0 cm,446.0,351.96,351.96,372.35,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [65]:
# compute the county-level price, weighting each fee schedule by the county's enrollment share
for i, county in enumerate(CT_counties):
    # row i of df_CT_popMM_2018Bc is the county that column `county` was built from
    # get the Medicaid, Medicare, and private insurance population shares
    CT_mdcaid_rate = df_CT_popMM_2018Bc["COUNTY_MDCAID_PT"].iloc[i]
    CT_mdcare_rate = df_CT_popMM_2018Bc["COUNTY_MDCARE_PT"].iloc[i]
    CT_private_rate = df_CT_popMM_2018Bc["COUNTY_PRIVATE_PT"].iloc[i]
    # each assignment is vectorized over all procedure codes (no inner loop over rows)
    df_CT_PFS[county] = (df_CT_PFS["MEDICAID_PFS_AMT"]*CT_mdcaid_rate
                         + df_CT_PFS["MEDICARE_PFS_AMT"]*CT_mdcare_rate
                         + df_CT_PFS["ESTIMATED_PRIVATE_PFS_AMT"]*CT_private_rate).round(2)
df_CT_PFS.head(10)

Unnamed: 0,PROC CODE,PROC DESCRIPTION,MEDICAID_PFS_AMT,MEDICARE_PFS_AMT,ESTIMATED_PRIVATE_PFS_AMT,PFS_AMT_BY_STATE,PFS_AMT_BY_FAIRFIELD_COUNTY,PFS_AMT_BY_HARTFORD_COUNTY,PFS_AMT_BY_LITCHFIELD_COUNTY,PFS_AMT_BY_MIDDLESEX_COUNTY,PFS_AMT_BY_NEW_HAVEN_COUNTY,PFS_AMT_BY_NEW_LONDON_COUNTY,PFS_AMT_BY_TOLLAND_COUNTY,PFS_AMT_BY_WINDHAM_COUNTY
0,10121,Remove foreign body,446.0,306.48,306.48,336.73,341.86,352.68,340.75,335.85,354.89,347.56,329.96,353.63
1,10180,Complex drainage wound,446.0,277.66,277.66,314.16,320.35,333.4,319.01,313.1,336.07,327.22,305.99,334.55
2,11010,Debride skin at fx site,251.52,568.1,568.1,499.46,487.82,463.28,490.33,501.46,458.25,474.89,514.82,461.12
3,11011,Debride skin musc at fx site,251.52,608.45,608.45,531.06,517.94,490.27,520.77,533.32,484.6,503.36,548.38,487.83
4,11012,Deb skin bone at fx site,251.52,795.03,795.03,677.19,657.2,615.07,661.51,680.62,606.43,635.01,703.55,611.36
5,11042,Deb subq tissue 20 sq cm/<,164.42,131.41,131.41,138.57,139.78,142.34,139.52,138.36,142.86,141.13,136.97,142.57
6,11043,Deb musc/fascia 20 sq cm/<,164.42,254.52,254.52,234.98,231.67,224.69,232.39,235.55,223.26,227.99,239.36,224.07
7,11044,Deb bone 20 sq cm/<,423.1,347.75,347.75,364.09,366.86,372.7,366.26,363.61,373.9,369.93,360.43,373.21
8,11404,Exc tr-ext b9+marg 3.1-4 cm,333.0,244.44,244.44,263.64,266.9,273.76,266.2,263.08,275.17,270.51,259.35,274.37
9,11406,Exc tr-ext b9+marg >4.0 cm,446.0,351.96,351.96,372.35,375.81,383.1,375.06,371.76,384.59,379.65,367.79,383.74


In [78]:
# write the Connecticut table to ./output/Connecticut_PFS.csv
CT_filename = 'Connecticut_PFS.csv'
filepath = './output'
# create the output folder if it does not exist yet
os.makedirs(filepath, exist_ok=True)
full_filename = os.path.join(filepath, CT_filename)
df_CT_PFS.to_csv(full_filename)

## New York State
- [Physician Fee Schedule](https://med-comply.com/NY-Medicaid-Fee-Schedule)
- [County Level Medicaid Enrollment](https://www.health.ny.gov/health_care/managed_care/reports/enrollment/monthly/)

#### Medicaid Physician Fee Schedule

In [92]:
# medicaid physician fee schedule
df_medicaid_PFS_NY = pd.read_excel("data/NewYork/FY2020_NY_Physician_Manual_Fee_Schedule_Sect5.xls", 
                                    header=2, index_col=None, usecols=[0, 1, 2])
df_medicaid_PFS_NY.rename(columns = lambda x:x.strip().upper(), inplace=True)
df_medicaid_PFS_NY.rename(columns = {"NON-FACILITY GLOBAL FEE":"MEDICAID_PFS_AMT",
                                    "CODE":"PROC CODE"}, inplace=True)
df_medicaid_PFS_NY = df_medicaid_PFS_NY[pd.to_numeric(df_medicaid_PFS_NY["MEDICAID_PFS_AMT"], errors='coerce').notnull()]
df_medicaid_PFS_NY["PROC CODE"] = df_medicaid_PFS_NY["PROC CODE"].apply(str)
df_medicaid_PFS_NY["MEDICAID_PFS_AMT"] = df_medicaid_PFS_NY["MEDICAID_PFS_AMT"].astype(float)
df_medicaid_PFS_NY.head()

Unnamed: 0,PROC CODE,DESCRIPTION,MEDICAID_PFS_AMT
0,10004,"FINE NEEDLE ASPIRATION BIOPSY, WITHOUT IMAGING",34.12
1,10005,"FINE NEEDLE ASPIRATION BIOPSY, INCLUDING",82.15
2,10006,"FINE NEEDLE ASPIRATION BIOPSY, INCLUDING",38.92
3,10007,"FINE NEEDLE ASPIRATION BIOPSY, INCLUDING",180.07
4,10008,"FINE NEEDLE ASPIRATION BIOPSY, INCLUDING",104.76


In [93]:
df_medicaid_PFS_NY.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5378 entries, 0 to 5655
Data columns (total 3 columns):
PROC CODE           5378 non-null object
DESCRIPTION         5378 non-null object
MEDICAID_PFS_AMT    5378 non-null float64
dtypes: float64(1), object(2)
memory usage: 168.1+ KB


In [94]:
#Medicare Searchable CPT Pricing info for certain CPT
df_NY_mdcare_pfs = pd.read_csv("data/NewYork/FY2018_NY_medicare_PFSExport.csv", header=0, 
                                       index_col=None, usecols=[0, 2, 5])
df_NY_mdcare_pfs.rename(columns = lambda x:x.strip().upper(), inplace=True)
df_NY_mdcare_pfs.rename(columns={"HCPCS CODE":"PROC CODE",
                                    "NON-FACILITY PRICE":"MEDICARE_PFS_AMT"}, inplace=True)
df_NY_mdcare_pfs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8933 entries, 0 to 8932
Data columns (total 6 columns):
PROC CODE                       8933 non-null int64
SHORT DESCRIPTION               8933 non-null object
NON-FACILITY PRICE              8933 non-null object
FACILITY PRICE                  8933 non-null object
NON-FACILITY LIMITING CHARGE    4933 non-null object
FACILITY LIMITING CHARGE        6884 non-null object
dtypes: int64(1), object(5)
memory usage: 418.8+ KB


In [95]:
# Convert currency strings such as "$12,234.56" to float
df_NY_mdcare_pfs["MEDICARE_PFS_AMT"] = (df_NY_mdcare_pfs["MEDICARE_PFS_AMT"]
                                        .str.replace(r'[^-+\d.]', '', regex=True).astype(float))
df_NY_mdcare_pfs.head() # NON-FACILITY PRICE VS. FACILITY PRICE

Unnamed: 0,PROC CODE,SHORT DESCRIPTION,NON-FACILITY PRICE,FACILITY PRICE,NON-FACILITY LIMITING CHARGE,FACILITY LIMITING CHARGE
0,10021,Fna w/o image,118.79,68.17,$129.77,$74.48
1,10022,Fna w/image,137.42,64.92,$150.13,$70.92
2,10030,Guide cathet fluid drainage,550.32,137.87,$601.22,$150.62
3,10035,Perq dev soft tiss 1st imag,503.96,86.04,$550.58,$94.00
4,10036,Perq dev soft tiss add imag,441.81,42.7,$482.68,$46.65


In [97]:
# PROC CODE is int64 in the Medicare table but object (str) in the Medicaid table;
# cast it to str so the two key columns can be merged
df_NY_mdcare_pfs["PROC CODE"] = df_NY_mdcare_pfs["PROC CODE"].astype(str)
df_NY_PFS = pd.merge(df_medicaid_PFS_NY, df_NY_mdcare_pfs, on="PROC CODE")
# both amount columns were already renamed when the two tables were loaded
df_NY_PFS = df_NY_PFS[["PROC CODE", "DESCRIPTION", "MEDICAID_PFS_AMT", "MEDICARE_PFS_AMT"]]
df_NY_PFS["ESTIMATED_PRIVATE_PFS_AMT"] = df_NY_PFS["MEDICARE_PFS_AMT"]
df_NY_PFS.head()
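
pandas refuses to merge when the key column is `int64` in one frame and `object` in the other. A minimal sketch of one fix is to cast both keys to strings before merging. The frames below are toy data: the Medicaid rows and the 118.79 Medicare price come from the tables above, while the Medicare price for code 10005 is a hypothetical value.

```python
import pandas as pd

# Medicaid codes are stored as strings, Medicare codes as integers.
medicaid = pd.DataFrame({"PROC CODE": ["10004", "10005"],
                         "MEDICAID_PFS_AMT": [34.12, 82.15]})
medicare = pd.DataFrame({"PROC CODE": [10005, 10021],
                         "MEDICARE_PFS_AMT": [100.0, 118.79]})  # 100.0 is hypothetical

# Cast the integer key to str so both key columns hold the same type.
medicare["PROC CODE"] = medicare["PROC CODE"].astype(str)
merged = pd.merge(medicaid, medicare, on="PROC CODE")
print(merged)
```

Casting to strings rather than integers is the safer direction, since HCPCS codes such as `G0105` contain letters and cannot be converted to integers.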

In [None]:
#compute the state level PFS price
NY_medicaidPrcent = df_popMM_2018Bs.iloc[32, 2]
NY_medicarePrcent = df_popMM_2018Bs.iloc[32, 3]
NY_privatePrcent = df_popMM_2018Bs.iloc[32, 4]
#
df_NY_PFS["PFS_AMT_BY_STATE"] = df_NY_PFS["MEDICAID_PFS_AMT"]*NY_medicaidPrcent + \
    df_NY_PFS["MEDICARE_PFS_AMT"]*NY_medicarePrcent + \
    df_NY_PFS["ESTIMATED_PRIVATE_PFS_AMT"]*NY_privatePrcent
df_NY_PFS["PFS_AMT_BY_STATE"] = df_NY_PFS["PFS_AMT_BY_STATE"].round(2)