## Analyzing Dialysis Facilities in the United States  
This project will analyze dialysis facilities in the United States and attempt to answer the following question:
 - What factors lead to better care in a dialysis facility?

https://github.com/carl-schick-ds/meteorite-landings

https://catalog.data.gov/dataset/medicare-dialysis-facilities

***
### Setup
Import needed libraries.  Unless otherwise noted, all libraries are available in the baseline conda environment.

In [54]:
# Import Libraries
import pandas as pd
import numpy as np
from IPython.display import display

In [55]:
# Auto Re-load External Modules
%load_ext autoreload
%autoreload 2

In [56]:
# Toggle REFRESH_DATA literal

REFRESH_DATA = True

In [57]:
# Refresh Data
# See the dialysis_facilities_dc.py file in this repository for details on the data collection routines

if REFRESH_DATA:
    import dialysis_facilities_dc as dc

    # Get the data
    raw_url = 'https://data.cms.gov/sites/default/files/2021-01/FY_2021_Facility_Level_Dialysis_Facility_Reports.csv'
    raw_facilities_df = pd.read_csv(raw_url, dtype={'NPI': 'str', 'Alternate CCN(s)': 'str'})
    # dc.raw_analysis(raw_facilities_df)
    print()
    facilities_data = dc.get_facilities(raw_facilities_df)
    print()
    measures_data = dc.get_measures(raw_facilities_df)
    print()
    scores_data = dc.get_scores(raw_facilities_df)

    # Write the results to CSV files
    facilities_data.to_csv('facilities.csv')
    measures_data.to_csv('measures.csv')
    scores_data.to_csv('scores.csv')

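
The refresh cell above downloads and reprocesses the full report whenever `REFRESH_DATA` is `True`, and an interrupted run can leave some of the derived CSV files missing. One possible guard, shown here only as a minimal sketch, is to also trigger a refresh when any output file is absent. The helper `needs_refresh` and the `OUTPUT_FILES` tuple are hypothetical names introduced for illustration; they are not part of `dialysis_facilities_dc`.

```python
from pathlib import Path

# Hypothetical names for illustration; not part of dialysis_facilities_dc.
OUTPUT_FILES = ('facilities.csv', 'measures.csv', 'scores.csv')


def needs_refresh(refresh_flag, data_dir='.', files=OUTPUT_FILES):
    """Return True if a refresh was requested or any output file is missing."""
    missing = [name for name in files if not (Path(data_dir) / name).exists()]
    return bool(refresh_flag) or bool(missing)
```

With a helper like this, the refresh cell could test `if needs_refresh(REFRESH_DATA):` instead of `if REFRESH_DATA:`, so the load step that follows always finds its input files.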

***
### Data Loading and Cleaning

#### Load CSVs

Load the data from the CSV files and display the first few rows of each dataframe as a quick validity check.

In [None]:
# Read in CSV files
facilities_df = pd.read_csv('facilities.csv', index_col=0)
measures_df = pd.read_csv('measures.csv', index_col=0)
fac_scores_df = pd.read_csv('scores.csv', index_col=[0,1])

# Display the head of each dataframe
display(facilities_df.head(5))
display(measures_df.head(5))
display(fac_scores_df.head(5))
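
Displaying the head of each dataframe only shows the first few rows. A slightly broader check could also count rows, duplicate index values, and nulls per column. The sketch below is one way to do that; `validity_summary` is a hypothetical helper, and the sample dataframe uses made-up columns and values rather than the actual facility data.

```python
import pandas as pd


def validity_summary(df):
    """Summarize basic validity indicators for a dataframe."""
    return {
        'rows': len(df),
        'columns': df.shape[1],
        # Number of index entries that repeat an earlier index value
        'duplicate_index': int(df.index.duplicated().sum()),
        # Null count per column
        'null_counts': df.isna().sum().to_dict(),
    }


# Made-up sample data for illustration only (not from the CMS report)
sample_df = pd.DataFrame(
    {'name': ['A', 'B', None], 'stations': [12, 20, 8]},
    index=[100, 101, 101],
)
summary = validity_summary(sample_df)
print(summary)
```

The same function could be applied to `facilities_df`, `measures_df`, and `fac_scores_df` after loading; `Index.duplicated` also works on a MultiIndex.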