# D210 Data Dashboard and Storytelling Assessment — Task 1
### NAM Task 1: Data Dashboard And Storytelling
#### Representation and Reporting — D210
#### PRFA — NAM2
> André Davis
> StudentID: 010630641
> MSDA
>
> Competencies
> 4033.2.1 : Storytelling with Data
>   The graduate communicates data insights to technical and nontechnical audiences.
>
> 4033.2.2 : Data Visualizations and Representations
>   The graduate creates data representations to offer insight into an organizational problem.
>
> 4033.2.3 : Dashboards
>   The graduate designs interactive dashboards to support executive decision-making.

#### Table of Contents
<ul>
    <li><a href="#data-cleaning">Pre-work: Data Cleaning</a></li>
    <li><a href="#interactice-data-dashboard">A1: Interactive Data Dashboard</a></li>
    <li><a href="#data-sets">A2: Data Sets</a></li>
    <li><a href="#installation-instructions">A3: Installation Instructions</a></li>
    <li><a href="#panopto-storying-telling-with-data">B: Panopto Storytelling With Data</a></li>
    <li><a href="#dashboard-alignment">C1: Dashboard Alignment</a></li>
    <li><a href="#additional-data-set-insights">C2: Additional Data Set Insights</a></li>
    <li><a href="#decision-making-support">C3: Decision-Making Support</a></li>
    <li><a href="#interactice-controls">C4: Interactive Controls</a></li>
    <li><a href="#colorblindness">C5: Colorblindness</a></li>
    <li><a href="#data-representation">C6: Data Representation</a></li>
    <li><a href="#audience-analysis">C7: Audience Analysis</a></li>
    <li><a href="#universal-access">C8: Universal Access</a></li>
    <li><a href="#effective-storytelling">C9: Effective Storytelling</a></li>
    <li><a href="#sources">D: Sources</a></li>
    <li><a href="#professional-communication">E: Professional Communication</a></li>
</ul>

<a id="data-cleaning"></a>
# Data Cleaning

 * Apply basic cleaning to the WGU-supplied medical data so that it stays consistent with the data used in `D208` and `D209`.
 * Clean an additional data set related to readmission from [`Kaggle`](https://www.kaggle.com/), called [`U.S. Hospital Overall Star Ratings 2016-2020`](https://www.kaggle.com/datasets/abrambeyer/us-hospital-overall-star-ratings-20162020).

In [11]:
import warnings
import pandas as pd
from pandas.api.types import CategoricalDtype
import numpy as np

warnings.filterwarnings('ignore')

medical_data = pd.read_csv('./Data/medical-data/medical_clean.csv', index_col=0)
any_missing_values = medical_data.isna().values.any()
if not any_missing_values:
    print('Medical data does NOT contain any missing values\n')
else:
    print('Medical data CONTAINS missing values.\n')

medical_data['Zip'] = medical_data['Zip'].astype('str').str.zfill(5)

column_renames = {
     'Item1': 'Timely_Admission'
    ,'Item2': 'Timely_Treatment'
    ,'Item3': 'Timely_Visits'
    ,'Item4': 'Reliability'
    ,'Item5': 'Options'
    ,'Item6': 'Hours_Of_Treatment'
    ,'Item7': 'Courteous_Staff'
    ,'Item8': 'Listening' #Evidence of active listening from Doctor
}
medical_data.rename(columns=column_renames, inplace=True)

category_dtype = 'category'
convert_to_category = {
    'Gender': category_dtype,
    'ReAdmis': category_dtype,
    'Soft_drink': category_dtype,
    'Initial_admin': category_dtype,
    'HighBlood': category_dtype,
    'Stroke': category_dtype,
    'Complication_risk': category_dtype,
    'Overweight': category_dtype,
    'Arthritis': category_dtype,
    'Diabetes': category_dtype,
    'Hyperlipidemia': category_dtype,
    'BackPain': category_dtype,
    'Anxiety': category_dtype,
    'Allergic_rhinitis': category_dtype,
    'Reflux_esophagitis': category_dtype,
    'Asthma': category_dtype,
    'Services': category_dtype,
    'Timely_Admission': category_dtype,
    'Timely_Treatment': category_dtype,
    'Timely_Visits': category_dtype,
    'Reliability': category_dtype,
    'Options': category_dtype,
    'Hours_Of_Treatment': category_dtype,
    'Courteous_Staff': category_dtype,
    'Listening': category_dtype
}

medical_data = medical_data.astype(convert_to_category)

#Convert Yes/No's to True and False for charting in Tableau
columns_to_reexpress = ['ReAdmis', 'Soft_drink', 'HighBlood', 'Stroke',
                        'Overweight', 'Arthritis', 'Diabetes', 'Hyperlipidemia',
                        'BackPain', 'Anxiety', 'Allergic_rhinitis', 'Reflux_esophagitis',
                        'Asthma']
for column in columns_to_reexpress:
    medical_data[column] = medical_data[column].map({'Yes': True, 'No': False }).astype(np.bool_)

tableau_visualizations = ['Zip', 'Children', 'Age', 'VitD_levels', 'HighBlood', 'Overweight', 'Arthritis', 'Diabetes', 'BackPain', 'Asthma', 'Initial_days', 'ReAdmis', 'Complication_risk', 'Initial_admin', 'Gender']

prepared_medical_data = medical_data[tableau_visualizations]

prepared_medical_data.to_csv('./tableau-wgu-dataset.csv')
prepared_medical_data.info()

Medical data does NOT contain any missing values

<class 'pandas.core.frame.DataFrame'>
Index: 10000 entries, 1 to 10000
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Zip                10000 non-null  object  
 1   Children           10000 non-null  int64   
 2   Age                10000 non-null  int64   
 3   VitD_levels        10000 non-null  float64 
 4   HighBlood          10000 non-null  bool    
 5   Overweight         10000 non-null  bool    
 6   Arthritis          10000 non-null  bool    
 7   Diabetes           10000 non-null  bool    
 8   BackPain           10000 non-null  bool    
 9   Asthma             10000 non-null  bool    
 10  Initial_days       10000 non-null  float64 
 11  ReAdmis            10000 non-null  bool    
 12  Complication_risk  10000 non-null  category
 13  Initial_admin      10000 non-null  category
 14  Gender             10000 non-null  category
dtypes: bool(
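
The two re-expression steps in the cell above (zero-padding `Zip` and mapping Yes/No to booleans) can be checked on a small frame. This is a minimal sketch on hypothetical toy rows, not rows from `medical_clean.csv`; the round trip through `io.StringIO` is also an assumption added for illustration, showing that the padded ZIP codes survive a CSV reload only when the column is read back as a string.

```python
import io

import pandas as pd

# Hypothetical toy rows for illustration only
toy = pd.DataFrame({
    'Zip': [2134, 35621, 601],
    'ReAdmis': ['Yes', 'No', 'No'],
})

# Integer ZIP codes lose their leading zeros, so cast to string and pad to five digits
toy['Zip'] = toy['Zip'].astype('str').str.zfill(5)

# Re-express Yes/No as True/False
toy['ReAdmis'] = toy['ReAdmis'].map({'Yes': True, 'No': False}).astype(bool)

print(toy['Zip'].tolist())  # ['02134', '35621', '00601']

# Round trip through CSV text: without dtype the padded ZIP codes are parsed as integers again
csv_text = toy.to_csv(index=False)
reloaded_default = pd.read_csv(io.StringIO(csv_text))
reloaded_as_text = pd.read_csv(io.StringIO(csv_text), dtype={'Zip': str})
```

Reading the file back with `dtype={'Zip': str}` keeps values such as `02134` intact, while the default parse turns them back into integers.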

In [12]:
#Additional Data source
wgu_dataset_zip_codes = prepared_medical_data['Zip'].unique()
overall_hospital_ratings = pd.read_csv('./Data/Additional/Us Hospital Overall Rating/Hospital_General_Information_2016_2020.csv', index_col=0)

overall_hospital_ratings['ZIP Code'] = overall_hospital_ratings['ZIP Code'].astype('str').str.zfill(5)
match_overall_hospital_ratings = overall_hospital_ratings[overall_hospital_ratings['ZIP Code'].isin(wgu_dataset_zip_codes)]

#print(match_overall_hospital_ratings)

#TODO: Clean matched data

#Remove footnote columns that are not needed for the dashboard
additional_to_remove = ['Hospital overall rating footnote', 'Patient experience national comparison footnote']
match_overall_hospital_ratings = match_overall_hospital_ratings.drop(additional_to_remove, axis=1)

#Treat a missing EHR flag as 'N'; fill only the EHR column, not every column of the row
ehr_column = 'Meets criteria for promoting interoperability of EHRs'
match_overall_hospital_ratings[ehr_column] = match_overall_hospital_ratings[ehr_column].fillna('N')

#Convert Y/N and Yes/No flags to True and False for charting in Tableau
match_overall_hospital_ratings[ehr_column] = match_overall_hospital_ratings[ehr_column].map({'Y': True, 'N': False }).astype(np.bool_)

match_overall_hospital_ratings['Emergency Services'] = match_overall_hospital_ratings['Emergency Services'].map({'Yes': True, 'No': False}).astype(np.bool_)

match_overall_hospital_ratings.to_csv('./tableau-additional-dataset.csv')
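
A minimal sketch, on a hypothetical toy frame, of filling a missing EHR flag in one column only. With a boolean mask, `df[mask] = 'N'` assigns the value to every column of the matching rows, whereas `.loc[mask, column]` restricts the fill to the named column. The toy values below are assumptions for illustration, not rows from the Kaggle file.

```python
import numpy as np
import pandas as pd

ehr_column = 'Meets criteria for promoting interoperability of EHRs'

# Hypothetical toy rows; the second hospital has no EHR flag
toy = pd.DataFrame({
    'ZIP Code': ['02134', '35621', '00601'],
    ehr_column: ['Y', np.nan, 'N'],
    'Emergency Services': ['Yes', 'No', 'Yes'],
})

# Fill the missing flag in the EHR column only, leaving the other columns untouched
missing = toy[ehr_column].isna()
toy.loc[missing, ehr_column] = 'N'

# Re-express the flags as booleans
toy[ehr_column] = toy[ehr_column].map({'Y': True, 'N': False}).astype(bool)
toy['Emergency Services'] = toy['Emergency Services'].map({'Yes': True, 'No': False}).astype(bool)

print(toy)
```

Filling before mapping matters: a value missing from the mapping becomes `NaN`, and `NaN` converts to `True` under `astype(bool)`.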

<a id="interactice-data-dashboard"></a>
# A1: Interactive Data Dashboard

<a id="data-sets"></a>
# A2: Data Sets

<a id="installation-instructions"></a>
# A3: Installation Instructions

<a id="panopto-storying-telling-with-data"></a>
# B: Panopto Storytelling With Data

<a id="dashboard-alignment"></a>
# C1: Dashboard Alignment

<a id="additional-data-set-insights"></a>
# C2: Additional Data Set Insights

<a id="decision-making-support"></a>
# C3: Decision-Making Support

<a id="interactice-controls"></a>
# C4: Interactive Controls

<a id="colorblindness"></a>
# C5: Colorblindness

<a id="data-representation"></a>
# C6: Data Representation

<a id="audience-analysis"></a>
# C7: Audience Analysis

<a id="universal-access"></a>
# C8: Universal Access

<a id="effective-storytelling"></a>
# C9: Effective Storytelling

<a id="sources"></a>
# D: Sources

<a id="professional-communication"></a>
# E: Professional Communication