# EDA for Medicare chronic conditions data

Data source: the [CMS Medicare chronic conditions webpage](https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Chronic-Conditions/CC_Main).

In [1]:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook

In [2]:
# load one age-group sheet of the Excel file and give its columns short names
def loadChronicSheet(sheet_name):
    orig_data = pd.read_excel("../data/County_Table_Chronic_Conditions_Prevalence_by_Age_2017.xlsx",
                              sheet_name=sheet_name,
                              skiprows=5,  # skip the first 5 rows of the sheet; row 6 is read as the header
                              na_values=["* ", "*", "  "])  # read "*" and blank-looking cells as NaN
    # rename the 24 columns: 3 identifiers followed by 21 chronic conditions
    orig_data.columns = ["State", "County", "countyFIPS", "Alcohol Abuse", "Alzheimers",
                         "Arthritis", "Asthma", "Atrial Fibrillation", "Autism", "Cancer",
                         "Kidney Disease", "COPD", "Depression", "Diabetes", "Drug Abuse",
                         "HIV/AIDS", "Heart Failure", "Hepatitis", "Hyperlipidemia",
                         "Hypertension", "Ischemic Heart Disease", "Osteoporosis",
                         "Psychotic Disorders", "Stroke"]
    # prefix every condition column (everything after the 3 identifiers) with "condition_"
    orig_data.columns = list(orig_data.columns[:3]) + ["condition_" + name for name in orig_data.columns[3:]]
    # drop rows that have no county name
    orig_data = orig_data.dropna(subset=["County"])
    return orig_data

chronic_young_orig = loadChronicSheet("Beneficiaries Less than 65 Year")
chronic_old_orig = loadChronicSheet("Beneficiaries 65 Years and Over")
chronic_all_orig = loadChronicSheet("All Beneficiaries")
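The cleaning in `loadChronicSheet` hinges on two arguments: `na_values`, which turns the listed strings into NaN, and `dropna(subset=["County"])`, which discards rows without a county name. A minimal sketch of that behavior on a hypothetical three-row miniature of a sheet; it uses `pd.read_csv` on in-memory text only because it accepts the same `na_values` argument as `pd.read_excel` and needs no file.

```python
import io

import pandas as pd

# Hypothetical miniature of one sheet (not real data). The first row has no
# county name; the "Autism" cells use "*" and "* " (with a trailing space).
raw = io.StringIO(
    "State,County,countyFIPS,Autism,Cancer\n"
    "Alabama,,,0.5,9.1\n"
    "Alabama,Autauga,1001,*,9.8742\n"
    "Alabama,Baldwin,1003,* ,9.3159\n"
)

# Same na_values list as loadChronicSheet: both "*" variants become NaN
df = pd.read_csv(raw, na_values=["* ", "*", "  "])

# Same row filter as loadChronicSheet: keep only rows with a county name
df = df.dropna(subset=["County"])
print(df)
```

Because the "*" markers become NaN rather than strings, the condition columns stay numeric (float64) and can be summarized directly.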

In [3]:
# show the first 5 rows of the 65-and-over sheet
chronic_old_orig.head(5)

| | State | County | countyFIPS | condition_Alcohol Abuse | condition_Alzheimers | condition_Arthritis | condition_Asthma | condition_Atrial Fibrillation | condition_Autism | condition_Cancer | ... | condition_Drug Abuse | condition_HIV/AIDS | condition_Heart Failure | condition_Hepatitis | condition_Hyperlipidemia | condition_Hypertension | condition_Ischemic Heart Disease | condition_Osteoporosis | condition_Psychotic Disorders | condition_Stroke |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Alabama | Autauga | 1001.0 | 1.5731 | 12.3911 | 36.6167 | 4.453 | 10.3098 | NaN | 9.8742 | ... | 1.7425 | NaN | 17.062 | 0.2904 | 53.1462 | 67.788 | 34.9952 | 7.2604 | 1.8393 | 4.453 |
| 3 | Alabama | Baldwin | 1003.0 | 1.727 | 12.0132 | 38.1982 | 4.7225 | 10.5444 | NaN | 9.3159 | ... | 2.4213 | 0.0623 | 13.3262 | 0.316 | 46.8198 | 62.7676 | 33.6004 | 6.8589 | 1.5489 | 4.1617 |
| 4 | Alabama | Barbour | 1005.0 | 4.0377 | 14.603 | 40.4441 | 4.6097 | 8.8156 | NaN | 10.2961 | ... | 3.6676 | NaN | 15.3432 | NaN | 51.245 | 71.7362 | 30.6864 | 5.821 | 2.8264 | 5.2826 |
| 5 | Alabama | Bibb | 1007.0 | 1.8979 | 14.9215 | 42.212 | 4.2539 | 12.2382 | 0.0 | 9.1623 | ... | 1.8979 | NaN | 19.6335 | NaN | 54.6466 | 74.8037 | 33.7042 | 8.1152 | 2.6832 | 6.6099 |
| 6 | Alabama | Blount | 1009.0 | 1.3167 | 13.5597 | 36.3132 | 4.9665 | 10.0485 | NaN | 7.9695 | ... | 2.772 | NaN | 18.018 | 0.2541 | 50.8894 | 68.7919 | 33.9339 | 6.6297 | 1.7787 | 5.0127 |

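Several condition columns in these first rows are mostly NaN (Autism, HIV/AIDS, Hepatitis). A quick way to see how much of each column is missing is the share of NaN values per column. A minimal sketch on a toy frame that copies only those three columns from the five rows above; on the real data the same expression would be applied to the full frame.

```python
import numpy as np
import pandas as pd

# Toy frame: Autism, HIV/AIDS and Hepatitis values copied from the 5 rows shown above
toy = pd.DataFrame({
    "condition_Autism": [np.nan, np.nan, np.nan, 0.0, np.nan],
    "condition_HIV/AIDS": [np.nan, 0.0623, np.nan, np.nan, np.nan],
    "condition_Hepatitis": [0.2904, 0.316, np.nan, np.nan, 0.2541],
})

# Fraction of missing values per condition column, highest first
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Sorting the shares makes it easy to spot conditions whose prevalence is missing for most counties before comparing them across the three sheets.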

In [4]:
# show the number of rows and columns
chronic_all_orig.shape

(3197, 24)
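The 24 columns are the 3 identifiers plus the 21 condition names listed in `loadChronicSheet`. Because every condition column carries the `condition_` prefix, the condition block can be selected by that prefix with `DataFrame.filter(like=...)`. A minimal sketch on an empty toy frame with the same column names, standing in for `chronic_all_orig`:

```python
import pandas as pd

# Condition names copied from loadChronicSheet
conditions = ["Alcohol Abuse", "Alzheimers", "Arthritis", "Asthma", "Atrial Fibrillation",
              "Autism", "Cancer", "Kidney Disease", "COPD", "Depression", "Diabetes",
              "Drug Abuse", "HIV/AIDS", "Heart Failure", "Hepatitis", "Hyperlipidemia",
              "Hypertension", "Ischemic Heart Disease", "Osteoporosis",
              "Psychotic Disorders", "Stroke"]
id_cols = ["State", "County", "countyFIPS"]
columns = id_cols + ["condition_" + name for name in conditions]

# Empty toy frame with the same columns (no rows)
toy = pd.DataFrame(columns=columns)

# Keep only the columns whose name contains "condition_"
condition_block = toy.filter(like="condition_")
print(len(columns), len(id_cols), condition_block.shape[1])  # 24 3 21
```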

In [5]:
# countyFIPS is a unique id: its number of distinct values equals the number of rows
chronic_all_orig["countyFIPS"].nunique()

3197
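The check above compares `nunique()` with the 3197 rows. Since `nunique()` drops NaN by default, equality also rules out missing FIPS codes, which `Series.is_unique` on its own would not. A small sketch with a hypothetical helper `is_unique_key` (not part of pandas):

```python
import numpy as np
import pandas as pd


def is_unique_key(s: pd.Series) -> bool:
    """Hypothetical helper: True if every value is present and distinct."""
    # nunique() ignores NaN by default, so a missing value makes the count fall short
    return s.nunique() == len(s)


print(is_unique_key(pd.Series([1001.0, 1003.0, 1005.0])))  # True
print(is_unique_key(pd.Series([1001.0, 1001.0])))          # False: duplicate
print(is_unique_key(pd.Series([1001.0, np.nan])))          # False: missing value
```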

Here is a brief summary of the table:
- `countyFIPS`: a unique identifier for each county in the US (stored as a float, e.g. `1001.0`). FIPS stands for Federal Information Processing Standards. In this table every one of the 3197 rows has a distinct `countyFIPS`.
- `County`: the name of the county.
- `State`: the state the county belongs to.
- `condition_*`: the remaining 21 columns give the prevalence (in %) of each chronic condition among the beneficiaries covered by that sheet, in 2017.
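County FIPS codes are five digits: two for the state followed by three for the county, so Autauga County, Alabama is `01001`. Since `countyFIPS` is read as a float, a zero-padded string form can help when joining with other county-level data. A minimal sketch, assuming no missing codes (consistent with the uniqueness check above); `fips_to_str` is a hypothetical helper name.

```python
import pandas as pd


def fips_to_str(fips: pd.Series) -> pd.Series:
    """Hypothetical helper: convert float FIPS codes (e.g. 1001.0) to 5-character strings."""
    # astype(int) raises on NaN, so this assumes every code is present
    return fips.astype(int).astype(str).str.zfill(5)


codes = fips_to_str(pd.Series([1001.0, 1003.0, 1005.0]))
print(codes.tolist())          # ['01001', '01003', '01005']
print(codes.str[:2].tolist())  # ['01', '01', '01'], the state part (01 = Alabama)
```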