## Analyzing Dialysis Facilities in the United States  
This project will analyze dialysis facilities in the United States and attempt to answer the following question:
 - What factors lead to better care in a dialysis facility?

https://github.com/carl-schick-ds/meteorite-landings

Data source: https://catalog.data.gov/dataset/medicare-dialysis-facilities

***
### Setup
Import the required libraries. Unless otherwise noted, all libraries are available in the baseline conda environment.

```python
# Import libraries
import pandas as pd
import numpy as np
from IPython.display import display
```

```python
# Automatically reload external modules (such as dialysis_facilities_dc) when they change
%load_ext autoreload
%autoreload 2
```

```python
# Set to True to re-download the raw CMS data and regenerate the cached CSV files
REFRESH_DATA = False
```
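The flag above has to be flipped by hand. One possible alternative, sketched here assuming the three cached CSV files live in the working directory, is to derive the flag from whether any cached file is missing, so a fresh clone rebuilds the cache automatically. `needs_refresh` and `CACHE_FILES` are hypothetical names, not part of this repository.

```python
from pathlib import Path

# Cached outputs written by the refresh cell (assumed to be in the working directory)
CACHE_FILES = ['facilities.csv', 'measures.csv', 'scores.csv']

def needs_refresh(paths, force=False):
    """Return True if a refresh is forced or any cached CSV file is missing.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    return force or not all(Path(p).is_file() for p in paths)

# Could replace the hard-coded REFRESH_DATA = False
REFRESH_DATA = needs_refresh(CACHE_FILES)
```

Passing `force=True` keeps the option of rebuilding the cache on demand, for example when CMS publishes a new file.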

```python
# Refresh data
# See dialysis_facilities_dc.py in this repository for details on the data collection routines

if REFRESH_DATA:
    import dialysis_facilities_dc as dc

    # Download the FY 2021 facility-level Dialysis Facility Reports from CMS
    raw_url = 'https://data.cms.gov/sites/default/files/2021-01/FY_2021_Facility_Level_Dialysis_Facility_Reports.csv'
    # Read NPI and Alternate CCN(s) as strings so they are not parsed as numbers
    raw_facilities_df = pd.read_csv(raw_url, dtype={'NPI': 'str', 'Alternate CCN(s)': 'str'})
    # dc.raw_analysis(raw_facilities_df)
    print()
    facilities_data = dc.get_facilities(raw_facilities_df)
    print()
    measures_data = dc.get_measures(raw_facilities_df)
    print()
    scores_data = dc.get_scores(raw_facilities_df)

    # Cache the results as CSV files
    facilities_data.to_csv('facilities.csv')
    measures_data.to_csv('measures.csv')
    scores_data.to_csv('scores.csv')
```
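The `dc` module that performs the split is not reproduced here. As an illustration only, the sketch below shows one way a wide, one-row-per-facility table could be reshaped into the long layout that `scores.csv` uses (one row per CCN, Measure ID and Year, as previewed in the next section) with `DataFrame.melt`. The `to_long_scores` helper, the assumption that all measures share a single reporting year, and the toy values are hypothetical; this is not the implementation of `dc.get_scores`.

```python
import pandas as pd

def to_long_scores(wide, measure_cols, year):
    """Reshape one-row-per-facility data into one row per (CCN, Measure ID).

    Hypothetical helper: assumes `wide` has a 'CCN' column plus one column
    per measure, all reported for the same `year`.
    """
    long_df = wide.melt(
        id_vars='CCN',
        value_vars=measure_cols,
        var_name='Measure ID',
        value_name='Measure Score',
    )
    long_df['Year'] = year
    # Stable sort keeps the measure order within each facility
    long_df = long_df.sort_values('CCN', kind='mergesort').reset_index(drop=True)
    # Match the column order of the scores table
    return long_df[['CCN', 'Measure ID', 'Year', 'Measure Score']]

# Toy wide frame; the scores are placeholders, not real CMS values
wide = pd.DataFrame({
    'CCN': ['012306', '012500'],
    'pahy1_f': [1.0, 2.0],
    'agey1_f': [3.0, 4.0],
})
scores = to_long_scores(wide, ['pahy1_f', 'agey1_f'], 2016)
print(scores)
```

The long layout makes it easy to filter or pivot on a single measure across facilities, at the cost of one row per facility and measure.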

***
### Data Loading and Cleaning

#### Load CSVs

Load the data from the CSV files and preview the first few rows of each DataFrame as a quick validity check.

```python
# Read in the cached CSV files
facilities_df = pd.read_csv('facilities.csv', index_col=0)
measures_df = pd.read_csv('measures.csv', index_col=0)
# The first two columns of scores.csv (row number and CCN) become a MultiIndex
fac_scores_df = pd.read_csv('scores.csv', index_col=[0,1])

# Display the first five rows of each DataFrame
display(facilities_df.head(5))
display(measures_df.head(5))
display(fac_scores_df.head(5))
```

| CCN | State | Provider Name | City | Ownership Type | ESRD Network | NPI | Chain Name | Modality | Alternate CCN(s) |
|---|---|---|---|---|---|---|---|---|---|
| 12306 | AL | CHILDRENS HOSPITAL OF ALABAMA ESRD | BIRMINGHAM | Non-profit | 8 | 1720166085 | INDEPENDENT | Hemodialysis and Peritoneal Dialysis | 12306013300 |
| 12500 | AL | FMC CAPITOL CITY | MONTGOMERY | For Profit | 8 | 1780796532 | FRESENIUS MEDICAL CARE | Hemodialysis and Peritoneal Dialysis | 12500 |
| 12501 | AL | GADSDEN DIALYSIS | GADSDEN | For Profit | 8 | 1215900444 | DAVITA | Hemodialysis | 12501 |
| 12502 | AL | TUSCALOOSA UNIVERSITY DIALYSIS | TUSCALOOSA | For Profit | 8 | 1003889171 | DAVITA | Hemodialysis and Peritoneal Dialysis | 12502 |
| 12505 | AL | PHYSICIANS CHOICE DIALYSIS-MONTGOMERY | MONTGOMERY | For Profit | 8 | 1760446199 | DAVITA | Hemodialysis and Peritoneal Dialysis | 12505 |

| Measure ID | Measure |
|---|---|
| pahy1_f | F: Prevalent Patients - End of Year Status: Nu... |
| agey1_f | F: Prevalent Patients - Age: Average patient a... |
| viny1_f | F: Prevalent Patients - Vintage: Average Years... |
| age1y1_f | F: Prevalent Patients - Age: % Less than 18 ye... |
| age2y1_f | F: Prevalent Patients - Age: % Between 18-64 y... |

|   | CCN | Measure ID | Year | Measure Score |
|---|---|---|---|---|
| 0 | 12306 | pahy1_f | 2016 | 20.0 |
| 1 | 12306 | agey1_f | 2016 | 8.4 |
| 2 | 12306 | viny1_f | 2016 | 3.58 |
| 3 | 12306 | age1y1_f | 2016 | 100.0 |
| 4 | 12306 | age2y1_f | 2016 | 0.0 |
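The preview shows CCNs such as `12306`. CMS Certification Numbers are six-character codes whose first two digits identify the state (`01` for Alabama), which suggests the leading zero was lost when pandas parsed the column as an integer. A minimal sketch of two validity checks follows, assuming integer-parsed CCNs like those above: it restores the zero padding and lists CCNs that appear in the scores but have no matching facility row. `normalize_ccn`, `orphan_ccns` and the toy inputs are hypothetical.

```python
import pandas as pd

def normalize_ccn(values):
    """Left-pad CCNs to six characters, e.g. 12306 -> '012306'.

    Hypothetical helper: assumes every CCN is an integer code of at most
    six digits (float-parsed values such as 12306.0 would not pad correctly).
    """
    return pd.Series(values).astype(str).str.zfill(6)

def orphan_ccns(scores_ccns, facility_ccns):
    """Return the sorted CCNs present in the scores but not in the facilities."""
    return sorted(set(normalize_ccn(scores_ccns)) - set(normalize_ccn(facility_ccns)))

# Toy inputs mirroring the integer-parsed CCNs in the preview
facility_ccns = [12306, 12500, 12501]
scores_ccns = [12306, 12306, 12500, 99999]

print(normalize_ccn(facility_ccns).tolist())
print(orphan_ccns(scores_ccns, facility_ccns))
```

Applied to the notebook's frames, the same checks would be `orphan_ccns(fac_scores_df.index.get_level_values('CCN'), facilities_df.index)`, together with `facilities_df.index.is_unique` to confirm there is one row per facility.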
