# 📊 Emergency Room Wait Time Data Analysis

## Data Science Support
- **GetDataHelp**: [go.ncsu.edu/getdatahelp](https://go.ncsu.edu/getdatahelp)
- **Email**: [getdatahelp@ncsu.edu](mailto:getdatahelp@ncsu.edu)
- **Consultations**: [Schedule a consultation](https://www.lib.ncsu.edu/services/data-visualization/get-help/appointments)

## Instructor and Session Information

### 👨‍💻 Instructors
- **Name**: Alp Tezbasaran, Shannon Ricci
- **Institution**: NC State Libraries, Data Science Services

### 📅 Session Information
- **Session Date**: Sept 18, 2025
- **Session Duration**: ~45-60 mins
- **Learning Level**: Beginner to Intermediate
- **Dependencies**: Basic Python, pandas, matplotlib, seaborn
- **Environment**: Google Colab (recommended)

### 🆘 Resources

#### **Data Source Details**
- **Repository**: NCSU Libraries AI in Data Science
- **URL**: https://github.com/NCSU-Libraries/ai_in_data_science
- **Dataset**: ER Wait Time Dataset
- **File Format**: CSV, Excel

#### **Introduction to ER dataset**
1. **Data Documentation**: Check the repository README for detailed information
2. **Data Overview**: Review the "ER Wait Time Data Overview.txt" file
3. **Column Descriptions**: Each column has descriptive names indicating its purpose
4. **Data Quality**: The dataset is pre-cleaned and ready for analysis

#### **Common Data Issues and Solutions**
- **Missing Values**: Count null values per column with `df.isnull().sum()`
- **Data Types**: Verify column types with `df.dtypes`
- **Encoding Issues**: Pass `encoding='utf-8'` to `pd.read_csv()` if characters load incorrectly
- **Memory Issues**: Pass the `chunksize` parameter to `pd.read_csv()` to read large files in pieces
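
The checks listed above can be sketched in a few lines. This is a minimal illustration on a made-up four-row CSV, not the real ER file; the column names and values are assumptions chosen only to produce a couple of missing entries.

```python
import io
import pandas as pd

# Made-up stand-in for the ER CSV; column names and values are assumptions.
csv_text = (
    "Visit ID,Total Wait Time (min),Patient Satisfaction\n"
    "V1,35,4\n"
    "V2,,3\n"
    "V3,80,\n"
    "V4,50,5\n"
)

# encoding= matters when reading bytes or a file path.
df = pd.read_csv(io.BytesIO(csv_text.encode("utf-8")), encoding="utf-8")

missing_per_column = df.isnull().sum()  # number of nulls in each column
column_types = df.dtypes                # inferred dtype of each column

# chunksize= yields smaller DataFrames one at a time instead of one large one.
row_total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    row_total += len(chunk)

print(missing_per_column)
print(column_types)
print("rows read in chunks:", row_total)
```

For a file of this workshop's size, reading in chunks is unlikely to be necessary; it is shown only because the list mentions it.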


### 📋 Notebook Information
- **Version**: 1.0
- **Python Version**: 3.8+
- **Required Packages**: pandas, numpy, matplotlib, seaborn, scipy, statsmodels
- **Estimated Runtime**: 30-45 minutes
- **Output Files**: None (analysis only)

### 🎯 Learning Objectives
By the end of this session, you will be able to:
1. Load and inspect healthcare datasets
2. Perform exploratory data analysis
3. Apply statistical tests to healthcare data
4. Create meaningful visualizations
5. Draw actionable conclusions from data

### 📚 Additional Resources
- **Python Documentation**: https://docs.python.org/
- **Pandas Documentation**: https://pandas.pydata.org/docs/
- **Matplotlib Documentation**: https://matplotlib.org/stable/
- **Seaborn Documentation**: https://seaborn.pydata.org/
- **Statistics Resources**: https://www.khanacademy.org/math/statistics-probability

---



# 🎯 Google Colab Accessibility Guide

## Making This Notebook Accessible for Everyone

### 🌙 Dark/Light Theme Support
Google Colab supports both dark and light themes to accommodate different visual preferences and accessibility needs:

#### **How to Change Themes:**
1. **Via Settings Menu:**
   - Click the gear icon (⚙️) in the top-right corner
   - Select "Settings" from the dropdown
   - Choose "Dark" or "Light" theme under "Appearance"

2. **Via Command Palette:**
   - Press `Ctrl+Shift+P` (Windows/Linux) or `Cmd+Shift+P` (Mac)
   - Type "theme" and select your preferred option

#### **Theme Benefits:**
- **Dark Theme**: Reduces eye strain in low-light environments
- **Light Theme**: Better for users with certain visual impairments
- **High Contrast**: Both themes provide good contrast for readability

### 🔍 Font Size and Zoom Controls

#### **Zoom In/Out:**
- **Zoom In**: `Ctrl + +` (Windows/Linux) or `Cmd + +` (Mac)
- **Zoom Out**: `Ctrl + -` (Windows/Linux) or `Cmd + -` (Mac)
- **Reset Zoom**: `Ctrl + 0` (Windows/Linux) or `Cmd + 0` (Mac)

#### **Browser Zoom:**
- **Chrome/Edge**: `Ctrl + +` or `Ctrl + -` (`Cmd` instead of `Ctrl` on Mac)
- **Firefox**: `Ctrl + +` or `Ctrl + -` (`Cmd` instead of `Ctrl` on Mac)
- **Safari**: `Cmd + +` or `Cmd + -`

### ⌨️ Keyboard Navigation

#### **Essential Shortcuts:**
- **Run Cell & Move Down**: `Shift + Enter`
- **Run Cell in Place**: `Ctrl + Enter`
- **Add Cell Above**: `Ctrl + M A`
- **Add Cell Below**: `Ctrl + M B`
- **Delete Cell**: `Ctrl + M D`
- **Toggle Cell Type**: `Ctrl + M Y` (Code) or `Ctrl + M M` (Markdown)

#### **Navigation:**
- **Move Between Cells**: `Up/Down Arrow` keys
- **Edit Mode**: `Enter` (when cell is selected)
- **Command Mode**: `Escape` (when cell is selected)

### 🎨 Visual Accessibility Features

#### **Code Cell Features:**
- **Syntax Highlighting**: Automatic color coding for better readability
- **Line Numbers**: Toggle with `Ctrl + M L`
- **Code Folding**: Collapse/expand code blocks
- **Indentation Guides**: Visual guides for code structure

#### **Markdown Cell Features:**
- **Header Hierarchy**: Clear visual hierarchy with different font sizes
- **Bold/Italic Text**: Enhanced readability with formatting
- **Code Blocks**: Monospace font for code snippets
- **Lists and Tables**: Structured information presentation

### 🔧 Additional Accessibility Tools

#### **Screen Reader Support:**
- **Alt Text**: Images and plots include descriptive alt text
- **Semantic Structure**: Proper heading hierarchy for navigation
- **Descriptive Links**: All links include meaningful descriptions

#### **Color and Contrast:**
- **High Contrast**: All text meets WCAG contrast requirements
- **Color Independence**: Information is not conveyed by color alone
- **Pattern Alternatives**: Charts use patterns in addition to colors

### 📱 Mobile and Tablet Accessibility

#### **Touch-Friendly Interface:**
- **Large Touch Targets**: All interactive elements are appropriately sized
- **Swipe Navigation**: Natural touch gestures for navigation
- **Responsive Design**: Adapts to different screen sizes

#### **Mobile Shortcuts:**
- **Run Cell**: Tap the play button (▶️)
- **Edit Cell**: Double-tap to enter edit mode
- **Add Cell**: Use the + button in the toolbar

### 🎯 Customization Options

#### **Personal Preferences:**
- **Font Family**: Change in browser settings
- **Font Size**: Use browser zoom or Colab zoom controls
- **Line Spacing**: Adjust in browser accessibility settings
- **Cursor Size**: Modify in operating system settings

#### **Browser Extensions:**
- **High Contrast**: Browser extensions for enhanced contrast
- **Text-to-Speech**: Screen readers and text-to-speech tools
- **Magnification**: Browser zoom and magnification tools

### 🆘 Getting Help

#### **If You Need Assistance:**
1. **Colab Help**: Click the "?" icon in the top-right corner
2. **Keyboard Shortcuts**: Press `Ctrl + M H` to see all shortcuts
3. **Accessibility Support**: Contact Google Colab support for specific needs
4. **Community Forums**: Google Colab community for peer support

#### **Accessibility Resources:**
- **WCAG Guidelines**: Web Content Accessibility Guidelines
- **Google Accessibility**: Google's accessibility resources
- **Screen Reader Documentation**: Specific guides for your screen reader

---

**💡 Pro Tip**: Bookmark this section for quick reference during your analysis!


# Emergency Room Wait Time Data Analysis

## Overview
This notebook demonstrates a comprehensive analysis of Emergency Room (ER) wait time data. We'll explore patient satisfaction, wait times, and various factors that influence the ER experience.

## Learning Objectives
By the end of this analysis, you will be able to:
1. **Data Acquisition**: Download and load data from external sources
2. **Data Exploration**: Perform initial data inspection and quality assessment
3. **Statistical Analysis**: Apply appropriate statistical tests to answer research questions
4. **Data Visualization**: Create meaningful visualizations to communicate insights
5. **Interpretation**: Draw actionable conclusions from data analysis

## Dataset Description
The ER Wait Time Dataset contains information about:
- Patient demographics and visit characteristics
- Wait times at different stages of the ER process
- Patient satisfaction ratings
- Hospital and regional information
- Temporal factors (time of day, season, day of week)

## Analysis Workflow
1. **Data Loading & Inspection**: Understanding our dataset structure
2. **Exploratory Data Analysis**: Initial insights and patterns
3. **Statistical Testing**: Rigorous analysis of relationships
4. **Visualization**: Communicating findings effectively
5. **Conclusions**: Actionable insights for healthcare improvement



## Step 1: Data Acquisition

### Data Source
We'll download our dataset from the NCSU Libraries AI in Data Science repository. This dataset contains real-world ER wait time data that will allow us to practice data analysis techniques.

**Data Repository**: https://github.com/NCSU-Libraries/ai_in_data_science

### Why This Dataset?
- **Real-world relevance**: Healthcare data is critical for improving patient care
- **Rich features**: Multiple variables allow for comprehensive analysis
- **Appropriate size**: Large enough for meaningful analysis, small enough for learning
- **Clean structure**: Well-organized data suitable for educational purposes

In [1]:
# Clone the GitHub repository
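
One way to fill in this cell is sketched below. In Colab, a single `!git clone <URL>` line does the same job; the helper `clone_if_missing` is a hypothetical name introduced here so that re-running the cell does not fail when the folder already exists.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/NCSU-Libraries/ai_in_data_science"


def clone_if_missing(url: str, dest: str) -> Path:
    """Clone `url` into `dest` unless that folder already exists (hypothetical helper)."""
    target = Path(dest)
    if not target.exists():
        # check=True raises CalledProcessError if git exits with an error
        subprocess.run(["git", "clone", url, str(target)], check=True)
    return target
```

Calling `clone_if_missing(REPO_URL, "ai_in_data_science")` would then download the repository into a folder of that name; the folder name is an assumption and can be anything you like.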


## Step 2: Data Loading and Initial Inspection

### Understanding Our Data Structure
Before diving into analysis, we need to understand what data we're working with. This step involves:
- **Loading the data**: Reading the CSV file into a pandas DataFrame
- **Initial inspection**: Understanding the shape, columns, and data types
- **Data quality assessment**: Checking for missing values and data integrity

### Key Questions to Answer:
- How many records and variables do we have?
- What types of data are we working with?
- Are there any obvious data quality issues?
- What does a typical record look like?

In [2]:
# List the contents of the data folder


### Loading the Dataset
Now we'll load the CSV data into a pandas DataFrame. This is the foundation of our analysis - we need to get our data into a format that's easy to work with.

**Key concepts**:
- **pandas DataFrame**: A 2-dimensional labeled data structure
- **CSV format**: Comma-separated values, a common data exchange format
- **Data loading**: The process of reading external data into our analysis environment

In [3]:
# Load the CSV data into a pandas DataFrame and display the head
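
A minimal sketch of the loading step follows. It reads a made-up three-row CSV from memory so that it runs anywhere; the column names are assumptions based on the variables this notebook mentions, and the real file may differ.

```python
import io
import pandas as pd

# Made-up rows standing in for the real file; column names are assumptions.
sample_csv = io.StringIO(
    "Hospital ID,Season,Time of Day,Total Wait Time (min),Patient Satisfaction\n"
    "H1,Winter,Morning,62,3\n"
    "H2,Summer,Evening,48,4\n"
    "H1,Fall,Night,30,5\n"
)

df = pd.read_csv(sample_csv)

print(df.shape)   # (rows, columns)
print(df.dtypes)  # one dtype per column
print(df.head())  # first five rows (here, all three)
```

With the cloned repository, replace `sample_csv` with the path to the CSV file inside it; check the repository for the exact file name.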


## Step 3: Exploratory Data Analysis (EDA)

### What is Exploratory Data Analysis?
EDA is the process of investigating datasets to summarize their main characteristics, often using visual methods. It helps us:
- **Understand patterns** in the data
- **Identify relationships** between variables
- **Detect anomalies** or outliers
- **Form hypotheses** for further testing

### Our Analysis Strategy
We'll start with **patient satisfaction** as our primary outcome variable, then explore how various factors influence it. This approach helps us understand what drives patient experience in the ER.

### Key Analysis Areas:
1. **Patient Satisfaction Distribution**: Understanding overall satisfaction levels
2. **Temporal Patterns**: How time affects patient experience
3. **Wait Time Analysis**: The relationship between wait times and satisfaction
4. **Statistical Testing**: Rigorous analysis of relationships

### 3.1 Patient Satisfaction Analysis

#### Why Patient Satisfaction?
Patient satisfaction is a crucial healthcare metric because it:
- **Reflects quality of care** from the patient's perspective
- **Impacts healthcare outcomes** and patient compliance
- **Influences hospital reputation** and patient retention
- **Provides actionable insights** for healthcare improvement

#### Our Approach:
1. **Distribution Analysis**: Understanding how satisfaction is distributed
2. **Visualization**: Creating clear, informative charts
3. **Statistical Summary**: Key statistics and insights
4. **Pattern Recognition**: Identifying trends and outliers

#### Key Questions:
- What is the overall satisfaction level?
- Are there clear patterns in satisfaction ratings?
- What factors might influence satisfaction?

In [4]:
# Check the value counts for patient satisfaction
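
As a sketch of this step, `value_counts()` gives the number of visits at each rating, and `normalize=True` turns the counts into shares. The ratings below are made up, and the 1-5 scale is an assumption.

```python
import pandas as pd

# Made-up ratings on an assumed 1-5 scale.
df = pd.DataFrame({"Patient Satisfaction": [5, 4, 4, 3, 5, 2, 4, 1, 3, 4]})

# sort_index() orders the result by rating instead of by frequency.
counts = df["Patient Satisfaction"].value_counts().sort_index()
percentages = (
    df["Patient Satisfaction"].value_counts(normalize=True).sort_index() * 100
)

print(counts)
print(percentages.round(1))
```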


In [5]:
# Create a bar plot of patient satisfaction counts

In [6]:
# Create a bar plot of patient satisfaction percentages
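
The two bar plots differ only in what is on the y-axis. Here is one possible sketch of the percentage version using matplotlib directly, again on made-up ratings with an assumed 1-5 scale. Labeling each bar with its value keeps the chart readable without relying on color.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ratings on an assumed 1-5 scale.
df = pd.DataFrame({"Patient Satisfaction": [5, 4, 4, 3, 5, 2, 4, 1, 3, 4]})
percentages = (
    df["Patient Satisfaction"].value_counts(normalize=True).sort_index() * 100
)

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(percentages.index.astype(str), percentages.values)
ax.bar_label(bars, fmt="%.0f%%")  # print each percentage above its bar
ax.set_xlabel("Patient Satisfaction")
ax.set_ylabel("Share of visits (%)")
ax.set_title("Patient satisfaction distribution")
fig.tight_layout()
```

For the counts version, drop `normalize=True` and the `* 100`, and change the y-axis label and `fmt` accordingly.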


In [7]:
# Create a pie chart of patient satisfaction distribution
# Calculate the counts for each satisfaction level


In [8]:
# Display descriptive statistics for patient satisfaction

# print mode and median
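
A short sketch of the summary statistics, on made-up ratings (the 1-5 scale is an assumption). Note that `mode()` returns a Series because several values can tie for most frequent.

```python
import pandas as pd

# Made-up ratings on an assumed 1-5 scale.
ratings = pd.Series([5, 4, 4, 3, 5, 2, 4, 1, 3, 4], name="Patient Satisfaction")

summary = ratings.describe()     # count, mean, std, min, quartiles, max
median = ratings.median()
modes = ratings.mode().tolist()  # every value tied for most frequent

print(summary)
print("median:", median)
print("mode(s):", modes)
```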


### 3.2 Temporal Analysis: Time Patterns in ER Visits

#### Why Analyze Time Patterns?
Understanding temporal patterns in ER visits helps us:
- **Identify peak hours** and resource allocation needs
- **Understand patient flow** throughout the day/week
- **Optimize staffing** based on demand patterns
- **Improve patient experience** by managing expectations

#### Key Temporal Factors:
1. **Day of Week**: Weekend vs. weekday patterns
2. **Time of Day**: Morning, afternoon, evening, night patterns
3. **Seasonal Variations**: How seasons affect ER usage
4. **Wait Time Patterns**: How wait times vary by time

#### Analysis Approach:
- **Heatmaps**: Visualizing patterns across time dimensions
- **Statistical Analysis**: Quantifying differences between time periods
- **Correlation Analysis**: Understanding relationships between time and outcomes

In [9]:
# Display the first few rows of the DataFrame as a reminder


In [10]:
# Create a heatmap of the number of visits by Day of Week and Time of Day
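
Before plotting, the visits need to be counted for every day/time combination. A minimal sketch with `pd.crosstab` is shown below; the rows are made up, and the column names `Day of Week` and `Time of Day` are assumed to match the dataset.

```python
import pandas as pd

# Made-up visits; column names are assumptions.
df = pd.DataFrame({
    "Day of Week": ["Monday", "Monday", "Saturday", "Saturday", "Saturday", "Monday"],
    "Time of Day": ["Morning", "Evening", "Morning", "Evening", "Evening", "Morning"],
})

# Rows are days, columns are times of day, cells are visit counts.
visit_counts = pd.crosstab(df["Day of Week"], df["Time of Day"])
print(visit_counts)
```

The resulting table can be passed straight to seaborn, for example `sns.heatmap(visit_counts, annot=True, fmt="d")`, to draw the annotated heatmap.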


In [11]:
# Create a heatmap showing visit count, average wait time, and average specialists by Day of Week and Time of Day


In [12]:
# Create a heatmap colored by average wait time with annotations
# Same layout as the visit-count heatmap, but color shows the average Total Wait Time
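
To color cells by wait time rather than by visit count, the table needs a mean per cell instead of a count. One way to build it is `pivot_table`, sketched here on made-up rows with assumed column names.

```python
import pandas as pd

# Made-up visits; column names are assumptions.
df = pd.DataFrame({
    "Day of Week": ["Monday", "Monday", "Saturday", "Saturday"],
    "Time of Day": ["Morning", "Morning", "Morning", "Evening"],
    "Total Wait Time (min)": [30, 50, 70, 90],
})

# Mean wait per (day, time) cell; combinations with no visits become NaN.
avg_wait = df.pivot_table(
    index="Day of Week",
    columns="Time of Day",
    values="Total Wait Time (min)",
    aggfunc="mean",
)
print(avg_wait)
```

Passing `avg_wait` to `sns.heatmap(avg_wait, annot=True, fmt=".0f")` draws the heatmap; seaborn leaves cells with missing values blank.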


In [13]:
# Create side-by-side heatmaps for visit count and average wait time


### 3.3 Statistical Analysis: Patient Satisfaction vs. Time of Day

#### Research Question
**Is there a statistically significant relationship between patient satisfaction and the time of day when patients visit the ER?**

#### Why This Analysis Matters:
- **Healthcare Planning**: Understanding when patients are most/least satisfied
- **Resource Allocation**: Optimizing staff and resources for better outcomes
- **Patient Experience**: Identifying times when satisfaction might be lower
- **Quality Improvement**: Targeting specific time periods for improvement

#### Statistical Approach:
1. **Descriptive Analysis**: Visualizing satisfaction by time of day
2. **Statistical Testing**: Using appropriate tests to determine significance
3. **Effect Size**: Understanding the practical importance of differences
4. **Interpretation**: Drawing actionable conclusions

#### Key Concepts:
- **Box Plots**: Showing distribution and outliers
- **Statistical Tests**: Mann-Whitney U test, a non-parametric test for comparing two independent groups
- **Significance Levels**: Understanding p-values and confidence
- **Effect Size**: Practical vs. statistical significance

In [14]:
# Create a box plot showing patient satisfaction by Time of Day


In [15]:
# Calculate average patient satisfaction by Time of Day


In [16]:
# Perform Mann-Whitney U test for patient satisfaction in Early Morning vs. rest of the day
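
A sketch of this comparison using `scipy.stats.mannwhitneyu` is shown below. The data are made up and deliberately exaggerated so the two groups differ; the label `Early Morning` comes from the cell comment above, while the other labels and the 1-5 scale are assumptions.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Made-up ratings; "Afternoon" and "Evening" labels are assumptions.
df = pd.DataFrame({
    "Time of Day": ["Early Morning"] * 5 + ["Afternoon"] * 3 + ["Evening"] * 3,
    "Patient Satisfaction": [5, 4, 5, 4, 5, 3, 2, 3, 2, 3, 1],
})

# Split the ratings into Early Morning visits and all other visits.
is_early = df["Time of Day"] == "Early Morning"
early = df.loc[is_early, "Patient Satisfaction"]
rest = df.loc[~is_early, "Patient Satisfaction"]

# Two-sided test: is either group's ratings shifted relative to the other?
result = mannwhitneyu(early, rest, alternative="two-sided")
print(f"U = {result.statistic:.1f}, p = {result.pvalue:.4f}")
```

A small p-value says the two rating distributions are unlikely to be the same; it does not say how large or how practically important the difference is.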


### 3.4 Wait Time Consistency Analysis

#### Research Question
**Are wait times consistent across different factors, and what drives variability in ER wait times?**

#### Why Analyze Wait Time Consistency?
- **Quality Assurance**: Ensuring consistent patient experience
- **Resource Planning**: Understanding factors that affect wait times
- **Process Improvement**: Identifying bottlenecks and optimization opportunities
- **Patient Expectations**: Managing patient expectations about wait times

#### Key Analysis Areas:
1. **Distribution Analysis**: Understanding wait time patterns
2. **Factor Analysis**: How different variables affect wait times
3. **Correlation Analysis**: Relationships between wait time components
4. **Statistical Testing**: Rigorous analysis of wait time differences

#### Wait Time Components:
- **Time to Registration**: Initial check-in process
- **Time to Triage**: Assessment and prioritization
- **Time to Medical Professional**: Seeing a healthcare provider
- **Total Wait Time**: Overall time from arrival to treatment

#### Statistical Concepts:
- **Histograms**: Understanding data distribution
- **Box Plots**: Comparing distributions across groups
- **Correlation Matrix**: Understanding relationships between variables
- **ANOVA**: Testing for significant differences between groups

In [17]:
# Display histograms of wait time components


In [18]:
# Display stacked histogram of component wait times


In [19]:
# Create a box plot showing time to medical professional by Time of Day and Urgency Level


In [20]:
# Calculate the correlation matrix for numerical columns

# Display the correlation of 'Total Wait Time (min)' with other numerical columns
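
One possible version of this cell is sketched below on made-up rows. Only `Total Wait Time (min)` and `Patient Satisfaction` are names taken from this notebook; the other column names are assumptions.

```python
import pandas as pd

# Made-up visits; most column names are assumptions.
df = pd.DataFrame({
    "Region": ["Urban", "Rural", "Urban", "Rural", "Urban"],
    "Time to Registration (min)": [5, 10, 15, 20, 25],
    "Time to Triage (min)": [12, 18, 31, 39, 52],
    "Total Wait Time (min)": [40, 65, 95, 120, 150],
    "Patient Satisfaction": [5, 4, 3, 2, 1],
})

# Keep numeric columns only; text columns such as Region cannot be correlated.
corr_matrix = df.select_dtypes(include="number").corr()

# Correlation of every other numeric column with the total wait time.
wait_corr = (
    corr_matrix["Total Wait Time (min)"]
    .drop("Total Wait Time (min)")
    .sort_values(ascending=False)
)
print(wait_corr)
```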


In [21]:
# Calculate average Total Wait Time by Hospital ID, Region, and Urgency Level
# Calculate average Total Wait Time by Hospital ID


In [22]:
# Calculate average Total Wait Time and count by Patient Outcome


# Calculate the percentage of patients for each outcome
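
Both parts of this cell can come from one grouped table. The sketch below uses named aggregation on made-up rows; the outcome labels are invented for illustration and the column name `Patient Outcome` is an assumption.

```python
import pandas as pd

# Made-up visits; outcome labels are invented for illustration.
df = pd.DataFrame({
    "Patient Outcome": [
        "Discharged", "Discharged", "Admitted", "Discharged", "Left Without Being Seen",
    ],
    "Total Wait Time (min)": [40, 60, 90, 50, 200],
})

# Mean wait and number of visits for each outcome.
outcome_stats = df.groupby("Patient Outcome").agg(
    avg_wait=("Total Wait Time (min)", "mean"),
    visits=("Total Wait Time (min)", "count"),
)

# Share of all visits that ended in each outcome.
outcome_stats["percent"] = outcome_stats["visits"] / outcome_stats["visits"].sum() * 100
print(outcome_stats)
```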


In [23]:
# Display the head of the DataFrame (first 20 rows)
# df.head(20)

In [24]:
# Calculate average Total Wait Time by Season


In [25]:
# Perform one-way ANOVA test
# The formula specifies that 'Total Wait Time (min)' is dependent on 'Season'
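
The cell comment points to a formula-based ANOVA. As a dependency-light sketch, the example below swaps in `scipy.stats.f_oneway`, which runs the same one-way F test on one array per group. The wait times are made up and exaggerated so the seasons clearly differ.

```python
import pandas as pd
from scipy.stats import f_oneway

# Made-up wait times, three visits per season.
df = pd.DataFrame({
    "Season": ["Winter"] * 3 + ["Spring"] * 3 + ["Summer"] * 3 + ["Fall"] * 3,
    "Total Wait Time (min)": [90, 95, 100, 50, 55, 60, 85, 90, 95, 45, 50, 55],
})

# One array of wait times per season.
groups = [g["Total Wait Time (min)"].to_numpy() for _, g in df.groupby("Season")]

f_stat, p_value = f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

With statsmodels, the formula version would look like `model = ols('Q("Total Wait Time (min)") ~ C(Season)', data=df).fit()` followed by `sm.stats.anova_lm(model, typ=2)`; `Q()` is needed because the column name contains spaces and parentheses.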

## Step 4: Advanced Analysis - Seasonal Patterns

### 4.1 Seasonal Analysis Overview

#### Why Analyze Seasonal Patterns?
Seasonal analysis helps us understand:
- **Resource Planning**: Preparing for seasonal variations in demand
- **Staffing Optimization**: Adjusting resources based on predictable patterns
- **Patient Experience**: Understanding how seasons affect patient satisfaction
- **Healthcare Outcomes**: Identifying seasonal health trends

#### Key Research Questions:
1. **Do wait times vary significantly by season?**
2. **Is there a relationship between seasonal wait times and patient satisfaction?**
3. **What factors drive seasonal variations in ER performance?**

#### Statistical Approach:
- **Descriptive Statistics**: Average wait times and satisfaction by season
- **ANOVA Testing**: Statistical significance of seasonal differences
- **Correlation Analysis**: Relationship between wait times and satisfaction
- **Visualization**: Clear presentation of seasonal patterns

#### Expected Insights:
- **Winter**: Typically higher demand due to flu season and weather-related injuries
- **Summer**: May have different patterns due to vacation schedules and outdoor activities
- **Spring/Fall**: Transition periods with potentially different patient demographics

### 4.2 Calculating Seasonal Averages

#### Methodology
We'll calculate the average total wait time and average patient satisfaction for each season by:
1. **Grouping the data** by 'Season' column
2. **Computing the mean** of 'Total Wait Time (min)' and 'Patient Satisfaction'
3. **Comparing results** across seasons to identify patterns

#### Why This Approach?
- **Simple and Clear**: Easy to understand and interpret
- **Comparative Analysis**: Allows direct comparison between seasons
- **Foundation for Testing**: Provides data for statistical significance testing
- **Actionable Insights**: Clear metrics for healthcare planning

#### Key Metrics:
- **Average Wait Time**: Mean total wait time per season
- **Average Satisfaction**: Mean patient satisfaction rating per season
- **Sample Sizes**: Number of observations per season
- **Standard Deviations**: Variability within each season



In [26]:
# Calculate the average total wait time and average patient satisfaction by Season
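
The grouping described in section 4.2 can be sketched as follows. The rows are made up; the variable name `average_stats_by_season` matches the one used later in this notebook, while the output column names (`avg_wait` and so on) are chosen here for illustration.

```python
import pandas as pd

# Made-up visits, two per season.
df = pd.DataFrame({
    "Season": ["Winter", "Winter", "Spring", "Spring", "Summer", "Summer", "Fall", "Fall"],
    "Total Wait Time (min)": [90, 100, 50, 60, 85, 95, 45, 55],
    "Patient Satisfaction": [2, 3, 4, 5, 3, 2, 5, 4],
})

# One row per season: means, within-season spread, and sample size.
average_stats_by_season = df.groupby("Season").agg(
    avg_wait=("Total Wait Time (min)", "mean"),
    avg_satisfaction=("Patient Satisfaction", "mean"),
    std_wait=("Total Wait Time (min)", "std"),
    visits=("Total Wait Time (min)", "size"),
)
print(average_stats_by_season)
```

Including the standard deviation and sample size alongside the means makes it easier to judge whether a difference between seasons is large relative to the variation within a season.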


### 4.3 Statistical Testing: Seasonal Relationships

#### Research Question
**Is there a statistically significant relationship between average wait times and patient satisfaction across seasons?**

#### Statistical Approach
We'll perform correlation analysis to quantify the relationship between:
- **Average Total Wait Time** by season
- **Average Patient Satisfaction** by season

#### Why Correlation Analysis?
- **Quantifies Relationships**: Provides a numerical measure of association
- **Direction and Strength**: Shows whether the relationship is positive/negative and how strong
- **Statistical Significance**: Determines if the relationship is meaningful
- **Actionable Insights**: Helps understand if reducing wait times improves satisfaction

#### Key Concepts:
- **Correlation Coefficient**: Measures strength and direction of linear relationship
- **Significance Testing**: Determines if the relationship is statistically meaningful
- **Effect Size**: Understanding the practical importance of the relationship
- **Interpretation**: Drawing conclusions about association; correlation alone does not establish cause and effect
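
A sketch of the correlation step using `scipy.stats.pearsonr` is shown below. The four seasonal averages are made up, and the column names `avg_wait` and `avg_satisfaction` are assumptions about how the seasonal table is laid out.

```python
import pandas as pd
from scipy.stats import pearsonr

# Made-up seasonal averages, one row per season.
average_stats_by_season = pd.DataFrame(
    {
        "avg_wait": [50.0, 55.0, 90.0, 95.0],
        "avg_satisfaction": [4.5, 4.4, 2.6, 2.5],
    },
    index=["Fall", "Spring", "Summer", "Winter"],
)

# r measures the strength and direction of the linear relationship.
r, p_value = pearsonr(
    average_stats_by_season["avg_wait"],
    average_stats_by_season["avg_satisfaction"],
)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

With only four seasonal averages, the p-value rests on very little data, so treat the result as descriptive. Correlating wait time and satisfaction across individual visits would use far more observations.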

### 4.4 Interpretation and Analysis

#### Data Interpretation Framework
We'll analyze the relationship between average wait times and patient satisfaction by season to understand:
- **Pattern Recognition**: Identifying trends and relationships
- **Statistical Significance**: Determining if differences are meaningful
- **Practical Implications**: Understanding what the data means for healthcare
- **Actionable Insights**: Drawing conclusions for improvement

#### Key Analysis Steps:
1. **Descriptive Analysis**: Examining the raw numbers and patterns
2. **Statistical Testing**: Determining significance of relationships
3. **Effect Size**: Understanding practical importance
4. **Interpretation**: Drawing meaningful conclusions

#### Expected Insights:
- **Inverse Relationship**: Longer wait times typically correlate with lower satisfaction
- **Seasonal Variations**: Different seasons may show different patterns
- **Statistical Significance**: Whether observed differences are meaningful
- **Practical Implications**: What this means for healthcare operations

In [27]:
print("Analysis of the relationship between average wait times and patient satisfaction by Season:")
print("-----------------------------------------------------------------------------------------")

# Display the calculated averages again for easy reference
# display(average_stats_by_season)

# Interpret the relationship
print("\nInterpretation:")
print("- Seasons with lower average total wait times (Fall and Spring) tend to have higher average patient satisfaction.")
print("- Seasons with higher average total wait times (Winter and Summer) tend to have lower average patient satisfaction.")
print("- This suggests an inverse relationship: as average wait times increase, average patient satisfaction tends to decrease.")

Analysis of the relationship between average wait times and patient satisfaction by Season:
-----------------------------------------------------------------------------------------

Interpretation:
- Seasons with lower average total wait times (Fall and Spring) tend to have higher average patient satisfaction.
- Seasons with higher average total wait times (Winter and Summer) tend to have lower average patient satisfaction.
- This suggests an inverse relationship: as average wait times increase, average patient satisfaction tends to decrease.


## Step 5: Conclusions and Next Steps

### 5.1 Key Findings Summary

#### What We've Learned:
1. **Patient Satisfaction Patterns**: Understanding overall satisfaction levels and distributions
2. **Temporal Relationships**: How time of day and season affect patient experience
3. **Wait Time Analysis**: Factors that influence wait times and their consistency
4. **Statistical Relationships**: Significant associations between variables

#### Key Insights:
- **Patient satisfaction** varies by time of day and season
- **Wait times** show patterns that can inform resource allocation
- **Statistical tests** indicate which observed differences are unlikely to be due to chance alone
- **Together**, these findings can be turned into actionable recommendations for healthcare improvement

### 5.2 Practical Applications

#### For Healthcare Administrators:
- **Resource Planning**: Optimizing staff and resources based on demand patterns
- **Quality Improvement**: Targeting specific areas for patient experience enhancement
- **Performance Monitoring**: Establishing benchmarks and tracking progress

#### For Healthcare Providers:
- **Patient Communication**: Setting appropriate expectations about wait times
- **Process Optimization**: Identifying bottlenecks and improvement opportunities
- **Quality Assurance**: Monitoring and improving patient satisfaction

### 5.3 Next Steps for Further Analysis

#### Advanced Analytics:
1. **Predictive Modeling**: Using machine learning to predict wait times and satisfaction
2. **Causal Analysis**: Understanding cause-and-effect relationships
3. **Optimization**: Mathematical models for resource allocation
4. **Real-time Monitoring**: Dashboard development for ongoing analysis

#### Data Collection Improvements:
1. **Additional Variables**: Patient demographics, medical conditions, staff levels
2. **Temporal Granularity**: Hourly data for more detailed analysis
3. **Patient Feedback**: Qualitative data to complement quantitative metrics
4. **Operational Data**: Staff schedules, equipment availability, room capacity

### 5.4 Learning Outcomes Achieved

#### Technical Skills:
- **Data Loading**: Importing and managing datasets
- **Exploratory Analysis**: Understanding data structure and patterns
- **Statistical Testing**: Applying appropriate statistical methods
- **Visualization**: Creating meaningful charts and graphs
- **Interpretation**: Drawing actionable conclusions from data

#### Analytical Thinking:
- **Question Formulation**: Developing research questions
- **Methodology Selection**: Choosing appropriate analytical approaches
- **Critical Evaluation**: Assessing results and limitations
- **Communication**: Presenting findings clearly and effectively

### 5.5 Resources for Continued Learning

#### Recommended Next Steps:
1. **Advanced Statistics**: Regression analysis, time series analysis
2. **Machine Learning**: Predictive modeling, clustering, classification
3. **Healthcare Analytics**: Specialized courses in healthcare data analysis
4. **Data Visualization**: Advanced visualization techniques and tools
5. **Programming**: Advanced Python, R, or other analytical languages

#### Additional Datasets:
- **HealthData.gov**: U.S. Department of Health and Human Services open health datasets
- **CDC Data**: Centers for Disease Control datasets
- **Care Compare (formerly Hospital Compare)**: Medicare hospital quality data
- **WHO Data**: World Health Organization health statistics

---

## Congratulations! 🎉

You've completed a comprehensive analysis of ER wait time data. You've learned to:
- Load and explore datasets
- Perform statistical analysis
- Create meaningful visualizations
- Draw actionable conclusions
- Apply analytical thinking to real-world problems

**Keep practicing with different datasets and analytical techniques to continue developing your data analysis skills!**
