### Table of Contents

* [1: Importing, Merging, and Cleaning the Data](#import_merge_clean_1)
    * [1.1: Import](#import_1_1)
    * [1.2: Merge](#merge_1_2)
        * [1.2.1: Merge HCPCS Codes](#merge_hcpcs_1_2_1)
        * [1.2.2: Merge Carrier Codes](#merge_carrier_1_2_2)
    * [1.3: Clean](#clean_1_3)
* [2: Visualization](#vis_2)
    * [2.1: Nation-Wide](#vis_nation_2_1)
    * [2.2: Statewide](#vis_state_2_2)
* [3: Business Application](#business_app_3)
    * [3.1: Revenue Forecasting](#business_app_rev_3_1)
    * [3.2: Statewide](#business_app_rev_3_2)

### 1 - Importing, Merging, and Cleaning the Data<a class="anchor" id="import_merge_clean_1"></a>

#### 1.1 - Import<a class="anchor" id="import_1_1"></a>
* Download the Ambulance Fee Schedule files from the [Centers for Medicare & Medicaid Services](https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/) (CMS)

In [1]:
# List of urls to Ambulance Fee Schedule data
lst_url = ['https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/afs_2005.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2006_afs.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2007_afs.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2008_Ambulance_Fee_Schedule_PUF_update.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2009_AFS_PUF.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2010_AFS_PUF.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2011_AFS_PUF.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2012_AFS_PUF.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2013-AFS-PUF.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2014-AFS-PUF.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2015-AFS-PUF.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2016-AFS-PUF.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/CY-2017-File.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2018-AFS-PUF.zip',
          'https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AmbulanceFeeSchedule/Downloads/2019-AFS-PUF.zip',
          'https://www.cms.gov/files/zip/cy-2020-file.zip']

# Import the data using the wget package
# !pip install wget
import wget
for url in lst_url:
    wget.download(url)
    file_name = url.split('/')[-1]
    print(f'Downloaded: {file_name}')
    
# Get list of zip files in current directory (name changes from website to here)
import os
zip_files = [f for f in os.listdir('.') if os.path.isfile(f) and f.endswith('.zip')]

# Extract zip files to data folder
import zipfile
for file in zip_files:
    with zipfile.ZipFile(file, 'r') as zip_ref:
        zip_ref.extractall('data')
        
# Delete zip files
for file in zip_files:
    os.remove(file)

# Rename relevant spreadsheets
os.rename('./data/' + 'ambfspuf_2013_2ndrevisedfinal.xlsx', './data/' + '2013.xlsx')
os.rename('./data/' + 'ambfspuf 1Q2014_ext_finalv2.xlsx', './data/' + '2014.xlsx')
os.rename('./data/' + 'ambfspuf_2016_.xlsx', './data/' + '2016.xlsx')

# Delete non txt files
non_txt_files = [f for f in os.listdir('./data/') if f[-3:]!='txt']
txt_files = [f for f in os.listdir('./data/') if f[-3:]=='txt']
keep_files = ['2013.xlsx', '2014.xlsx', '2016.xlsx']
for file in non_txt_files:
    if file not in keep_files:
        os.remove('./data/' + file)

# Rename txt files for readability
txt_files_rename = []
for file in txt_files:
    s = ''.join([i for i in file if i.isnumeric()])
    idx_yr = s.find('2')
    s = s[idx_yr:idx_yr + 4] + '.txt'
    txt_files_rename.append(s)
for old_name, new_name in zip(txt_files, txt_files_rename):
    os.rename('./data/' + old_name, './data/' + new_name)
os.remove('./data/8.txt')
os.remove('./data/2013.txt')
os.remove('./data/2014.txt')
os.remove('./data/2016.txt')

Downloaded: afs_2005.zip
Downloaded: 2006_afs.zip
Downloaded: 2007_afs.zip
Downloaded: 2008_Ambulance_Fee_Schedule_PUF_update.zip
Downloaded: 2009_AFS_PUF.zip
Downloaded: 2010_AFS_PUF.zip
Downloaded: 2011_AFS_PUF.zip
Downloaded: 2012_AFS_PUF.zip
Downloaded: 2013-AFS-PUF.zip
Downloaded: 2014-AFS-PUF.zip
Downloaded: 2015-AFS-PUF.zip
Downloaded: 2016-AFS-PUF.zip
Downloaded: CY-2017-File.zip
Downloaded: 2018-AFS-PUF.zip
Downloaded: 2019-AFS-PUF.zip
Downloaded: cy-2020-file.zip
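
The renaming step above recovers a four-digit year from each file name by joining its digits and locating the first `2`. A minimal alternative sketch, assuming every year of interest falls between 2000 and 2099, uses a regular expression instead. The helper name `extract_year` is hypothetical, and `readme.txt` is a made-up file name included only to show the no-match case.

```python
import re
from typing import Optional

# Hypothetical helper: pull the first 20xx year out of a file name.
_YEAR_PATTERN = re.compile(r'20\d{2}')

def extract_year(file_name: str) -> Optional[str]:
    """Return the first four-digit year starting with '20', or None if absent."""
    match = _YEAR_PATTERN.search(file_name)
    return match.group(0) if match else None

# The first two names appear in the rename step above; the third is illustrative.
for name in ['ambfspuf_2013_2ndrevisedfinal.xlsx',
             'ambfspuf 1Q2014_ext_finalv2.xlsx',
             'readme.txt']:
    print(name, '->', extract_year(name))
```

Returning `None` for names without a year makes it possible to skip or delete such files explicitly rather than renaming them to an unexpected name.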


#### Create dataframe for each year

In [2]:
# Clean data and create dataframe for each year
import pandas as pd

# Create list of years with associated txt files
lst_txt = [2009, 2010, 2011, 2012, 2015, 2017, 2018, 2019, 2020]

# Create list of years with associated xlsx files
lst_xlsx = [2013, 2014, 2016]

# Create pandas df for each txt file / year
for yr in lst_txt:
    df_name = 'df_' + str(yr)
    globals()[df_name] = pd.read_csv('./data/' + str(yr) + '.txt', sep="\t", dtype={'CONTRACTOR/CARRIER': str,'contractor/carrier': str})
    globals()[df_name] = globals()[df_name].iloc[:, : 10]
    
# Create pandas df for each xlsx file / year
for yr in lst_xlsx:
    df_name = 'df_' + str(yr)
    globals()[df_name] = pd.read_excel('./data/' + str(yr) + '.xlsx')
    globals()[df_name] = globals()[df_name].iloc[:, : 10]

# Create list of all dataframes
lst_df = [df_2009,df_2010,df_2011,
          df_2012,df_2013,df_2014,
          df_2015,df_2016,df_2017,
          df_2018,df_2019,df_2020]

# Create list of column names to superimpose on each df for consistency
col_consistent = ['contractor/carrier', 'locality', 'hcpcs', 'rvu', 'gpci', 'base rate', 'urban base rate / urban mileage', 'rural base rate / rural mileage', 'rural base rate / lowest quartile', 'rural ground miles']

# Assign column names to each dataframe and normalize values
currency_cols = ['base rate', 'urban base rate / urban mileage', 'rural base rate / rural mileage',
                 'rural base rate / lowest quartile', 'rural ground miles']
id_fixes = {'01112': '01102', '10112': '10102', '10212': '10202', '10312': '10302'}
for i in range(len(lst_df)):
    lst_df[i].columns = col_consistent
    # Correct known contractor/carrier ID errors found in the data
    lst_df[i]['contractor/carrier'] = lst_df[i]['contractor/carrier'].astype(str)
    for wrong_id, right_id in id_fixes.items():
        lst_df[i]['contractor/carrier'] = lst_df[i]['contractor/carrier'].str.replace(wrong_id, right_id, regex=False)
    # Remove $ signs and thousands separators, then convert to float.
    # regex=False makes '$' a literal character instead of the end-of-string anchor.
    for col in currency_cols:
        lst_df[i][col] = lst_df[i][col].astype(str).str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
    
# Some dataframes have labels for various states and NA values following. Delete these rows
for i in range(len(lst_df)):
    lst_df[i] = lst_df[i].loc[~lst_df[i]['locality'].isna()]
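
Currency cleaning is easy to get subtly wrong because `$` is also a regular-expression anchor. The sketch below, which assumes pandas is installed, shows one way to strip dollar signs and thousands separators as literal characters. `clean_currency` is a hypothetical helper name, and the sample values are made up for illustration.

```python
import pandas as pd

def clean_currency(series: pd.Series) -> pd.Series:
    """Strip '$' and ',' as literal characters, then convert to float."""
    cleaned = (series.astype(str)
                     .str.replace('$', '', regex=False)
                     .str.replace(',', '', regex=False)
                     .str.strip())
    # errors='coerce' turns anything unparseable (e.g. 'None', '') into NaN
    return pd.to_numeric(cleaned, errors='coerce')

# Illustrative values only
sample = pd.Series(['$3,016.51', '23.09', None])
result = clean_currency(sample)
print(result)
```

Using `pd.to_numeric(..., errors='coerce')` instead of `astype(float)` is a design choice: malformed cells become NaN rather than raising, which may or may not be desirable depending on how strictly the input should be validated.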

#### 1.2 - Merge<a class="anchor" id="merge_1_2"></a>

##### 1.2.1 - Merge HCPCS Codes<a class="anchor" id="merge_hcpcs_1_2_1"></a>
* "HCPCS is a collection of standardized codes that represent medical procedures, supplies, products and services." [source](https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/HCPCS/index.html#:~:text=HCPCS%20is%20a%20collection%20of,by%20Medicare%20and%20other%20insurers.&text=HCPCS%20is%20divided%20into%20two%20subsystems%2C%20Level%20I%20and%20Level%20II.)
* Join the Alpha-Numeric HCPCS code descriptions (i.e., the billable services each code represents) to the fee schedule data [download](https://www.cms.gov/Medicare/Coding/HCPCSReleaseCodeSets/Alpha-Numeric-HCPCS)

In [3]:
# Import HCPCS Code dataframe
df_code = pd.read_excel('./_hcpcs/hcpcs2020_anweb/HCPCS_2020.xlsx')

# Lowercase column names
df_code.columns = df_code.columns.str.lower()

# Merge HCPCS dataframe with main dataframe
lst_yrs = [yr for yr in range(2009,2020+1)]
for i in range(len(lst_yrs)):
    lst_df[i] = pd.merge(lst_df[i], df_code[['hcpc/mod', 'short description', 'long description']], how='left', left_on='hcpcs', right_on='hcpc/mod')
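
A left join keeps every fee schedule row, but a lookup table with duplicate codes would silently multiply rows. A minimal sketch, using small made-up frames in place of the real data, passes `validate='many_to_one'` so that pandas raises an error if the right-hand keys are not unique. Column names mirror those used above; the values are illustrative.

```python
import pandas as pd

# Toy stand-ins for one year's fee schedule and the HCPCS lookup (illustrative values)
fees = pd.DataFrame({'hcpcs': ['A0427', 'A0428', 'A0427'],
                     'base rate': [231.98, 221.63, 214.47]})
codes = pd.DataFrame({'hcpc/mod': ['A0427', 'A0428'],
                      'short description': ['Als1-emergency', 'Bls']})

merged = pd.merge(fees, codes, how='left',
                  left_on='hcpcs', right_on='hcpc/mod',
                  validate='many_to_one')  # raises MergeError if codes repeat on the right
merged = merged.drop(columns='hcpc/mod')   # the lookup key duplicates 'hcpcs'
print(merged)
```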

##### 1.2.2 - Merge Carrier Codes<a class="anchor" id="merge_carrier_1_2_2"></a>
* Import contractor/carrier code data and perform string manipulation

In [4]:
# Loop through each year in lst_yrs (2009-2020), load the carrier codes as a dataframe, and perform string manipulation
for yr in lst_yrs:
    df_name = 'df_carrier_code_' + str(yr)
    try:
        globals()[df_name] = pd.read_csv('./_index/' + str(yr) + '.csv', header=None)
    except FileNotFoundError:
        # Fall back to an Excel workbook when no csv exists for the year
        globals()[df_name] = pd.read_excel('./_index/' + str(yr) + '.xlsx', header=None)
    
    # new data frame with split value columns 
    split_columns = globals()[df_name][0].str.split(" ", n = 1, expand = True)
    
    # making separate contractor/carrier column from new data frame 
    globals()[df_name]["contractor/carrier"] = split_columns[0]
    globals()[df_name]["contractor/carrier"] = globals()[df_name]["contractor/carrier"].astype(str).str.replace('U', '').str.strip()
    # Found ID errors in data
    globals()[df_name]['contractor/carrier'] = globals()[df_name]['contractor/carrier'].str.replace('0632', '06302').str.replace('01112', '01102')
    
    # making separate contractor/carrier name column from new data frame 
    globals()[df_name]["contractor/carrier name"] = split_columns[1].str.replace(r'\d+', '', regex=True).str.replace('-', '', regex=False).str.strip().str.lower()
    
    # reducing number of columns in data frame
    globals()[df_name] = globals()[df_name][["contractor/carrier", "contractor/carrier name"]]

# Create list of dataframes for convenient merging
lst_carrier_df = [df_carrier_code_2009,df_carrier_code_2010,df_carrier_code_2011,
                  df_carrier_code_2012,df_carrier_code_2013,df_carrier_code_2014,
                  df_carrier_code_2015,df_carrier_code_2016,df_carrier_code_2017,
                  df_carrier_code_2018,df_carrier_code_2019,df_carrier_code_2020]

#  Merge the datasets
for i in range(len(lst_yrs)):
    lst_df[i] = pd.merge(lst_df[i], lst_carrier_df[i], how='left', left_on='contractor/carrier', right_on='contractor/carrier')
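
Because the join key is a string, a code that has lost its leading zero (for example `1102` instead of `01102`) will not match. The sketch below assumes carrier codes are five characters wide, which is an inference from the IDs corrected above rather than something the source states, and uses `indicator=True` to list rows left unmatched. The frames are toy data, and `example carrier` is a placeholder name.

```python
import pandas as pd

# Toy data: one code missing its leading zero, one valid, one unknown
fees = pd.DataFrame({'contractor/carrier': ['1102', '10102', '99999']})
carriers = pd.DataFrame({'contractor/carrier': ['01102', '10102'],
                         'contractor/carrier name': ['northern california', 'example carrier']})

# Assumption: codes are zero-padded to five characters
fees['contractor/carrier'] = fees['contractor/carrier'].str.zfill(5)

merged = pd.merge(fees, carriers, how='left', on='contractor/carrier', indicator=True)

# Rows present only on the left side found no carrier name
unmatched = merged.loc[merged['_merge'] == 'left_only', 'contractor/carrier'].tolist()
print(unmatched)
```

Inspecting `unmatched` for each year is one way to spot further ID errors like those handled by the string replacements above.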

#### 1.3 - Clean<a class="anchor" id="clean_1_3"></a>

In [5]:
# Create list of column names for an intuitive reordering
col_reorder = ['year','contractor/carrier','contractor/carrier name',
               'locality','hcpcs','short description','long description',
               'rvu','gpci','base rate','urban base rate / urban mileage',
               'rural base rate / rural mileage','rural base rate / lowest quartile',
               'rural ground miles']

# Reorder columns
for i in range(len(lst_yrs)):
    lst_df[i]['year'] = lst_yrs[i]
    lst_df[i] = lst_df[i][col_reorder]
    
# Compile all dataframes
df = pd.concat(lst_df)
df.sample(5)

| | year | contractor/carrier | contractor/carrier name | locality | hcpcs | short description | long description | rvu | gpci | base rate | urban base rate / urban mileage | rural base rate / rural mileage | rural base rate / lowest quartile | rural ground miles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 293 | 2017 | 1102 | northern california | 75.0 | A0430 | Fixed wing air transport | Ambulance service, conventional air services, ... | 1.0 | 1.079 | 3016.51 | 3135.66 | 4703.49 | | 4703.49 |
| 86 | 2012 | 835 | oregon | 99.0 | A0427 | Als1-emergency | Ambulance service, advanced life support, emer... | 214.47 | 1.9 | 0.962 | 404.59 | 408.55 | 500.89 | |
| 323 | 2018 | 1182 | southern california | 18.0 | A0436 | Rotary wing air mileage | Rotary wing air mileage, per statute mile | 1.0 | 1.177 | 23.09 | 23.09 | 34.64 | | 34.64 |
| 1226 | 2020 | 13292 | ghi/new york | 4.0 | A0427 | Als1-emergency | Ambulance service, advanced life support, emer... | 1.9 | 1.214 | 231.98 | 516.92 | 521.99 | 639.96 | |
| 507 | 2015 | 6102 | illinois | 15.0 | A0428 | Bls | Ambulance service, basic life support, non-eme... | 221.63 | 1.0 | 1.057 | 235.08 | 237.39 | 291.04 | |
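
`pd.concat` keeps each frame's original row labels, so the combined index repeats across years. A small sketch with toy frames shows `ignore_index=True` producing a fresh index, followed by a per-year row count as a quick sanity check. The frames below are placeholders, not the real data.

```python
import pandas as pd

# Placeholder frames standing in for two yearly dataframes
frames = [pd.DataFrame({'year': 2019, 'hcpcs': ['A0427', 'A0428']}),
          pd.DataFrame({'year': 2020, 'hcpcs': ['A0427', 'A0428', 'A0430']})]

combined = pd.concat(frames, ignore_index=True)  # index runs 0..n-1 with no repeats
rows_per_year = combined.groupby('year').size()  # number of rows contributed by each year
print(rows_per_year)
```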


### 2 - Visualization<a class="anchor" id="vis_2"></a>

#### 2.1 - Nation-Wide<a class="anchor" id="vis_nation_2_1"></a>

#### 2.2 - Statewide<a class="anchor" id="vis_state_2_2"></a>

### 3 - Business Application<a class="anchor" id="business_app_3"></a>

#### 3.1 - Revenue Forecasting<a class="anchor" id="business_app_rev_3_1"></a>

#### 3.2 - Statewide<a class="anchor" id="business_app_rev_3_2"></a>