# Graded Lab: Comprehensive EDA Challenge 

## Overview 
Welcome to your comprehensive data analysis challenge! In this lab, you'll apply your Python data science skills to solve a complex healthcare analytics problem through both required and extended analysis tasks.<br>
As a data scientist on MediTrack's analytics team, you're tasked with transforming their vast healthcare data into actionable insights. MediTrack's platform integrates electronic health records, billing systems, and patient visit histories to revolutionize healthcare delivery. However, they need your expertise to uncover patterns and trends that will optimize their operations.


### Your Challenge 
The Chief Medical Officer and Financial Director have requested a comprehensive analysis that will:
- Identify patterns in patient demographics and health conditions
- Analyze the effectiveness of insurance coverage
- Track billing efficiency and payment patterns
- Create interactive dashboards for real-time decision support

This lab consists of two crucial parts:

<b>Part 1 (Graded):</b>

You'll conduct a comprehensive EDA focusing on:
- Patient demographic analysis
- Financial pattern identification
- Correlation studies between health conditions and costs
- Static visualization development for key metrics

<b>Part 2 (Ungraded):</b>

You'll extend your analysis by:
- Creating interactive dashboards for dynamic data exploration
- Developing real-time filtering mechanisms
- Building user-friendly interfaces for medical staff

### Learning Outcomes 
By completing this analysis, you will:
- Generate actionable healthcare insights through comprehensive EDA
- Create dynamic visualizations for stakeholder communication
- Identify patterns in patient care and financial data
- Develop interactive tools for healthcare decision-makers

## Getting Started 


In [None]:
# Import essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = 'notebook'
from plotly.subplots import make_subplots
#from plotchecker import PlotChecker


# Set visualization preferences
sns.set_style('whitegrid') 

# Part 1: Graded Challenges - Comprehensive EDA  

## Graded Challenge 1: Data Integration and Initial Analysis

As MediTrack's data scientist, establish a foundation for analysis by:
- Loading and validating all data sources
- Generating comprehensive statistical summaries
- Identifying data quality issues
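
The three steps above can be rehearsed on a small in-memory example before touching the real files. The sketch below is illustrative only: the CSV contents, the column names (`patient_id`, `age`, `amount`, and so on), and the helper `quality_report` are all hypothetical and are not taken from the lab's datasets.

```python
import io

import pandas as pd

# Hypothetical stand-ins for patients.csv and billing.csv; the real column
# names and values may differ.
patients_csv = io.StringIO(
    "patient_id,age,gender,insurance_type\n"
    "1,34,F,Private\n"
    "2,,M,Medicare\n"
    "3,58,F,Private\n"
)
billing_csv = io.StringIO(
    "bill_id,patient_id,amount,payment_status\n"
    "10,1,250.0,Paid\n"
    "11,2,900.5,Pending\n"
    "11,2,900.5,Pending\n"
)

patients_demo = pd.read_csv(patients_csv)
billing_demo = pd.read_csv(billing_csv)


def quality_report(df):
    """Summarize shape, dtypes, missing values, and duplicate rows."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


patients_report = quality_report(patients_demo)
billing_report = quality_report(billing_demo)

# describe(include="all") covers numeric and categorical columns together.
patients_summary = patients_demo.describe(include="all")

print(patients_report)
print(billing_report)
print(patients_summary)
```

With the real data, the same calls would apply to `billing_df` and `patients_df` after loading them with `pd.read_csv`.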

In [None]:
# Load billing.csv and patients.csv into DataFrames named billing_df and patients_df, then generate basic statistics for each

# YOUR CODE HERE

In [None]:
# This cell contains test cases. Do not edit this cell.
# NBGRADER TEST - 1: Dataframes and Appropriate statistic calculations

## Graded Challenge 2: Patient Population Analysis

The Medical Director needs insights about patient demographics to optimize care delivery. Create demographic visualizations that show:
- Age distribution across insurance types
- Gender distribution by pre-existing conditions
- Insurance coverage distribution 
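
As a rough illustration of the three chart types listed above, the following sketch draws a box plot, a stacked bar chart, and a pie chart from toy data. The DataFrame `demo` and its column names are assumptions made for the example; the lab's actual columns may be named differently, and the graded functions may need to follow a different structure.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data with hypothetical column names; adapt them to the real patients table.
demo = pd.DataFrame({
    "age": [25, 40, 63, 71, 35, 52, 48, 29],
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "insurance_type": ["Private", "Private", "Medicare", "Medicare",
                       "Medicaid", "Medicaid", "Private", "Medicaid"],
    "condition": ["Asthma", "Diabetes", "Diabetes", "Asthma",
                  "Asthma", "Diabetes", "Asthma", "Diabetes"],
})

fig, (ax_box, ax_bar, ax_pie) = plt.subplots(1, 3, figsize=(15, 4))

# Box plot: one box per insurance type, showing the spread of ages.
sns.boxplot(data=demo, x="insurance_type", y="age", ax=ax_box)
ax_box.set_title("Age by insurance type")

# Stacked bars: crosstab counts rows per (condition, gender) pair,
# so each condition becomes one bar segmented by gender.
counts = pd.crosstab(demo["condition"], demo["gender"])
counts.plot(kind="bar", stacked=True, ax=ax_bar)
ax_bar.set_ylabel("Patients")
ax_bar.set_title("Gender by condition")

# Pie chart: share of patients per insurance type.
shares = demo["insurance_type"].value_counts()
ax_pie.pie(shares.values, labels=shares.index, autopct="%1.1f%%")
ax_pie.set_title("Insurance coverage")

fig.tight_layout()
```

`pd.crosstab` is used here because it produces the condition-by-gender count table that a stacked bar chart needs in one step; a `groupby(...).size().unstack()` chain would give the same table.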

In [None]:
# Create a box plot that shows age distribution across insurance types.
# Do not change the function name

def age_distribution(): 
    
    # YOUR CODE HERE
    
age_distribution()

In [None]:
# This cell contains test cases. Do not edit this cell.
# NBGRADER TEST - 2A: Age distribution across Insurance Types Box plot

In [None]:
# Create a stacked bar chart showing gender distribution by pre-existing conditions.
# Each bar should represent a condition, segmented by gender.
# Do not change the function name.

def gender_distribution():
    
    # YOUR CODE HERE
    
gender_distribution()

In [None]:
# This cell contains test cases. Do not edit this cell.
# NBGRADER TEST - 2B: Gender by Condition Bar Chart

In [None]:
# Create a pie chart that shows the insurance coverage distribution.
# Do not change the function name

def insurance_distribution(): 
    
    # YOUR CODE HERE
    
insurance_distribution()

In [None]:
# This cell contains test cases. Do not edit this cell.
# NBGRADER TEST - 2C: Insurance Coverage Pie Chart

## Graded Challenge 3: Financial Pattern Analysis

The Finance Team needs to understand billing efficiency. Analyze:
- Average charges for different conditions
- Payment status distributions
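
One possible way to approach both bullets is to aggregate first and plot second. The sketch below uses a toy table `bills` whose column names (`condition`, `charge`, `payment_status`) are hypothetical; they are placeholders rather than the lab's actual schema.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy billing rows with hypothetical column names; the lab's real columns
# may be named differently.
bills = pd.DataFrame({
    "condition": ["Asthma", "Asthma", "Diabetes", "Diabetes", "Hypertension"],
    "charge": [100.0, 300.0, 500.0, 700.0, 400.0],
    "payment_status": ["Paid", "Pending", "Paid", "Paid", "Overdue"],
})

# Step 1: average charge per condition, sorted so the bars read high to low.
avg_charge = (
    bills.groupby("condition", as_index=False)["charge"]
    .mean()
    .sort_values("charge", ascending=False)
)

# Step 2: bar plot of the pre-aggregated means.
fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=avg_charge, x="condition", y="charge", ax=ax)
ax.set_xlabel("Condition")
ax.set_ylabel("Average charge")
ax.set_title("Average charge by condition (toy data)")

# Payment status distribution expressed as fractions of all bills.
status_share = bills["payment_status"].value_counts(normalize=True)

print(avg_charge)
print(status_share)
```

Aggregating before plotting keeps the plotted numbers inspectable: `avg_charge` can be printed and checked independently of the chart.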

In [None]:
# Create a bar plot that shows average charges by condition.
# 1. Calculate the average charge for each condition.
# 2. Then create a bar plot using seaborn.

def average_charges_distribution():
    
    # YOUR CODE HERE
    
average_charges_distribution()

In [None]:
# This cell contains test cases. Do not edit this cell.
# NBGRADER TEST - 3A: Average Charges by Pre-existing Condition

In [None]:
# Create a pie chart that shows payment status distribution
# Do not change the function name

def payment_status_distribution():
    
    # YOUR CODE HERE
    
payment_status_distribution()

In [None]:
# This cell contains test cases. Do not edit this cell.
# NBGRADER TEST - 3B: Payment Status Pie Chart

## Graded Challenge 4: Healthcare Metrics Correlation

Analyze how patient characteristics, insurance coverage, and pre-existing conditions correlate with healthcare charges and billing outcomes.
- Merge the patient and billing datasets on a common key into a DataFrame called <b>merged_df</b>.
- Focus on numeric columns for correlation analysis.
- Create and visualize a correlation matrix to identify significant relationships.
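
The merge, numeric selection, and heatmap steps could look roughly like the sketch below. It runs on miniature made-up tables; using `patient_id` as the shared key, and the columns `charge` and `days_to_pay`, are assumptions for illustration. The result is stored in `merged_demo` so it does not collide with the `merged_df` the lab expects.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical miniature tables; "patient_id" as the shared key is an assumption.
patients_demo = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "age": [30, 45, 60, 75],
    "gender": ["F", "M", "F", "M"],
})
billing_demo = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 4],
    "charge": [100.0, 200.0, 300.0, 400.0, 450.0],
    "days_to_pay": [10, 12, 30, 45, 50],
})

# validate="one_to_many" raises if patient_id is duplicated on the patients side.
merged_demo = patients_demo.merge(
    billing_demo, on="patient_id", how="inner", validate="one_to_many"
)

# Keep numeric columns only, and drop the key: correlating an ID is not meaningful.
numeric = merged_demo.select_dtypes(include="number").drop(columns="patient_id")
corr = numeric.corr()

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix (toy data)")
```

Fixing `vmin=-1` and `vmax=1` anchors the color scale to the full correlation range, so colors stay comparable across different datasets.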

In [None]:
# Merge datasets for correlation analysis into a dataframe called "merged_df"

# YOUR CODE HERE

# Then, create and visualize a heatmap for correlation matrix
def correlation_analysis():
    
    # YOUR CODE HERE
    
correlation_analysis()

In [None]:
# This cell contains test cases. Do not edit this cell.
# NBGRADER TEST - 4A: Correlation Matrix Heatmap

# Part 2: Ungraded Interactive Visualization Extension 

## Activities

### Activity 1: Interactive Patient Analytics

In [None]:
# Create an interactive scatter plot to explore patient demographics and billing data.
# Include hover information on:
# - Insurance types

pio.renderers.default = 'iframe_connected'

# YOUR CODE HERE 

### Activity 2: Financial Insights 

In [None]:
# Develop an interactive billing analysis 
# Feature requirements:
# - Age
# - Gender

# YOUR CODE HERE 


## Verify Your Results

Before submission, verify your analysis meets these requirements:

Data Integration & Initial Analysis:
- Both datasets successfully loaded
- All data types correctly identified
- Complete summary statistics generated
- Missing values properly documented

Patient Demographics Analysis:
- Age distribution visualization present
- Gender breakdown complete
- Insurance type distribution shown
- Pre-existing conditions analyzed

Financial Pattern Analysis:
- Average charges calculated correctly
- Payment status distributions visualized
- Insurance coverage patterns identified
- Temporal trends documented

Correlation Analysis:
- Correlation matrix properly calculated
- Key relationships visualized
- Statistical significance noted
- Insights documented

## Troubleshooting
If you encounter issues, check these common problems and solutions:

Data Loading Issues:
- Verify file paths and names
- Check for proper encoding
- Confirm data types match expected format

Visualization Problems:
- Ensure all required libraries are imported
- Check for proper figure sizes
- Verify color schemes are appropriate
- Confirm axes labels are readable

Analysis Errors:
- Review calculations for mathematical accuracy
- Check for proper handling of missing values
- Verify grouping operations
- Confirm correlation calculations

Documentation Issues:
- Ensure all insights are clearly written
- Check that visualizations have proper titles and labels
- Verify that findings are supported by data
- Confirm all required components are included