# About the data  
##### The dataset loaded into this notebook is a .csv file obtained from the CDC's preventative measures site.  
##### The subjects of this dataset are U.S. citizens aged 65 or older, subdivided by gender, state, and city.
##### The target (signal) of this dataset is the percentage of these citizens, broken down by gender and location, who are reported to have taken preventative measures against common illnesses, such as immunizations and selected cancer screenings.
##### [Bethlehem insert commentary here about what trends we want to see]  

# Purpose of this notebook:
##### Create a clean .csv file to use for further analysis
##### Perform preliminary analysis and diagnostics of the data
##### Create visualizations in this notebook to upload to our blog
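Writing the clean .csv file mostly comes down to `DataFrame.to_csv`. The sketch below uses a two-row stand-in for the cleaned frame (values copied from the sample output later in this notebook) and an in-memory buffer in place of a real file. Any output path, such as `../data/prevent_clean.csv`, is hypothetical. Passing `index=False` keeps the row index out of the file, which is what produces an extra `Unnamed: 0` column when such a file is read back.

```python
import io

import pandas as pd

# Two-row stand-in for the cleaned prevention data; the real frame is prevent_df.
clean_df = pd.DataFrame({
    "Year": [2016, 2016],
    "StateAbbr": ["HI", "HI"],
    "CityName": ["Honolulu", "Honolulu"],
    "MeasureId": ["COREM", "COREW"],
    "DataValue": [31.4, 30.4],
})

# index=False: do not write the 0..n-1 index as an unnamed first column.
# Replace the buffer with a real path (e.g. a hypothetical '../data/prevent_clean.csv').
buffer = io.StringIO()
clean_df.to_csv(buffer, index=False)

# Round-trip check: reading the file back should give the same frame.
buffer.seek(0)
reloaded = pd.read_csv(buffer)
pd.testing.assert_frame_equal(clean_df, reloaded)
print(list(reloaded.columns))
```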

# Reading and cleaning the dataset

```python
# Import pandas, the data-manipulation library used throughout this notebook
import pandas as pd
```

```python
# Read in the prevention data
prevent_df = pd.read_csv('../data/preventativedata.csv')

# Drop columns that carry little to no information
prevent_df = prevent_df.drop(columns=['Data_Value_Unit', 'Data_Value_Footnote_Symbol', 'Data_Value_Footnote',
                                      'TractFIPS', 'CategoryID', 'StateDesc', 'Data_Value_Type', 'DataSource',
                                      'DataValueTypeID', 'Category'])
```
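The drop list above appears to have been chosen by inspection. One way to flag candidates automatically is to look for columns that are entirely empty or hold a single distinct value. This is a sketch on a toy frame: only the column names mirror the CDC file, and the values (including the `%` unit) are hypothetical. The helper `low_information_columns` is also hypothetical. It will not catch columns that are merely redundant, such as `StateDesc` alongside `StateAbbr`, so judgment is still needed.

```python
import pandas as pd


def low_information_columns(df: pd.DataFrame) -> list[str]:
    """Return columns that are entirely null or hold a single distinct value."""
    # nunique(dropna=False) counts NaN/None as a value of its own, so an
    # all-null column scores 1, and so does a constant column.
    counts = df.nunique(dropna=False)
    return counts[counts <= 1].index.tolist()


# Toy frame with hypothetical values; column names mirror the CDC file.
toy = pd.DataFrame({
    "Data_Value_Unit": ["%", "%", "%"],
    "Data_Value_Footnote": [None, None, None],
    "CityName": ["Honolulu", "Birmingham", "Hoover"],
    "Data_Value": [31.4, 26.0, 38.6],
})

candidates = low_information_columns(toy)
print(candidates)  # ['Data_Value_Unit', 'Data_Value_Footnote']
trimmed = toy.drop(columns=candidates)
```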

```python
# Remove underscores from the column names
prevent_df.columns = prevent_df.columns.str.replace('_', '', regex=False)
prevent_df.columns
```

```
Index(['Year', 'StateAbbr', 'CityName', 'GeographicLevel', 'UniqueID',
       'Measure', 'DataValue', 'LowConfidenceLimit', 'HighConfidenceLimit',
       'PopulationCount', 'GeoLocation', 'MeasureId', 'CityFIPS',
       'ShortQuestionText'],
      dtype='object')
```
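Because the file ships `LowConfidenceLimit` and `HighConfidenceLimit` alongside `DataValue`, two quick diagnostics are possible: check that every estimate sits inside its own interval, and look at interval width as a rough measure of precision. The sketch below runs them on five rows copied from the sample output in this notebook; applied to `prevent_df`, the same expressions should work unchanged. Note that `between` returns `False` for missing values, so rows with a blank estimate would also be counted as outside.

```python
import pandas as pd

# Five rows copied from the sorted sample output in this notebook.
sample = pd.DataFrame({
    "CityName": ["Honolulu", "Honolulu", "Birmingham", "Birmingham", "Hoover"],
    "MeasureId": ["COREM", "COREW", "COREM", "COREW", "COREW"],
    "DataValue": [31.4, 30.4, 32.9, 26.0, 38.6],
    "LowConfidenceLimit": [31.1, 30.1, 32.2, 25.4, 37.1],
    "HighConfidenceLimit": [31.7, 30.7, 33.7, 26.6, 40.0],
})

# Diagnostic 1: each estimate should lie within its own confidence interval
# (NaN estimates or limits also evaluate to False here).
inside = sample["DataValue"].between(
    sample["LowConfidenceLimit"], sample["HighConfidenceLimit"]
)
print("rows outside their CI:", int((~inside).sum()))

# Diagnostic 2: interval width as a rough signal of how precise each estimate is.
sample["CIWidth"] = sample["HighConfidenceLimit"] - sample["LowConfidenceLimit"]
print(sample[["CityName", "MeasureId", "CIWidth"]])
```

In these five rows, Hoover, the smallest city by `PopulationCount`, has the widest interval.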

```python
# Sort by UniqueID, replace the old index with a fresh 0..n-1 index, and preview the first five rows
prevent_df = prevent_df.sort_values(by='UniqueID').reset_index(drop=True)
prevent_df.head(5)
```

| | Year | StateAbbr | CityName | GeographicLevel | UniqueID | Measure | DataValue | LowConfidenceLimit | HighConfidenceLimit | PopulationCount | GeoLocation | MeasureId | CityFIPS | ShortQuestionText |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016 | HI | Honolulu | City | 15003 | Older adult men aged >=65 Years who are up to ... | 31.4 | 31.1 | 31.7 | 953207 | (21.4588039305, -157.973296737) | COREM | 15003 | Core preventive services for older men |
| 1 | 2016 | HI | Honolulu | City | 15003 | Older adult women aged >=65 Years who are up t... | 30.4 | 30.1 | 30.7 | 953207 | (21.4588039305, -157.973296737) | COREW | 15003 | Core preventive services for older women |
| 2 | 2016 | AL | Birmingham | City | 107000 | Older adult men aged >=65 Years who are up to ... | 32.9 | 32.2 | 33.7 | 212237 | (33.5275663773, -86.7988174678) | COREM | 107000 | Core preventive services for older men |
| 3 | 2016 | AL | Birmingham | City | 107000 | Older adult women aged >=65 Years who are up t... | 26.0 | 25.4 | 26.6 | 212237 | (33.5275663773, -86.7988174678) | COREW | 107000 | Core preventive services for older women |
| 4 | 2016 | AL | Hoover | City | 135896 | Older adult women aged >=65 Years who are up t... | 38.6 | 37.1 | 40.0 | 81619 | (33.3767602729, -86.8051937568) | COREW | 135896 | Core preventive services for older women |
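The data is in long format: each city has one row per measure, with `COREM` and `COREW` marking the men's and women's measures (per `ShortQuestionText`). To compare genders within a city, a pivot puts the two measures side by side. This sketch uses the five sample rows above; the `Gap` column name is an illustrative choice, not something in the CDC file.

```python
import pandas as pd

# The five sample rows above, reduced to the columns the pivot needs.
sample = pd.DataFrame({
    "StateAbbr": ["HI", "HI", "AL", "AL", "AL"],
    "CityName": ["Honolulu", "Honolulu", "Birmingham", "Birmingham", "Hoover"],
    "MeasureId": ["COREM", "COREW", "COREM", "COREW", "COREW"],
    "DataValue": [31.4, 30.4, 32.9, 26.0, 38.6],
})

# One row per (state, city), one column per measure: COREM = men, COREW = women.
wide = sample.pivot_table(
    index=["StateAbbr", "CityName"], columns="MeasureId", values="DataValue"
)

# Percentage-point gap; positive means older men in that city are more often
# up to date than older women. Cities missing either measure get NaN.
wide["Gap"] = wide["COREM"] - wide["COREW"]
print(wide)
```

In this sample Birmingham shows a gap of about 6.9 points and Honolulu about 1.0, while Hoover has no men's row and therefore no gap.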


# Data Visualization and Analysis

```python
# Import Plotly for visualization
import plotly
```