# Hospital Quality Analysis

This project explores hospital performance across the United States using data from the Centers for Medicare & Medicaid Services (CMS). The goal is to analyze how hospital ownership, type, and location relate to overall ratings and quality of care.


In [92]:
# Import pandas for working with the dataset
import pandas as pd

# Load the hospital dataset
# This file contains U.S. hospital information and performance ratings
# The encoding 'latin1' is used to handle special characters in the dataset
df = pd.read_csv('../Data/Hospital General Information.csv', encoding='latin1')

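# Preview the first few rows
# (a notebook renders only the last expression in a cell, so this preview is not shown below)
df.head()

# Summarize column names, non-null counts, and dtypes
df.info()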



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4812 entries, 0 to 4811
Data columns (total 28 columns):
 #   Column                                                         Non-Null Count  Dtype 
---  ------                                                         --------------  ----- 
 0   Provider ID                                                    4812 non-null   int64 
 1   Hospital Name                                                  4812 non-null   object
 2   Address                                                        4812 non-null   object
 3   City                                                           4812 non-null   object
 4   State                                                          4812 non-null   object
 5   ZIP Code                                                       4812 non-null   int64 
 6   County Name                                                    4797 non-null   object
 7   Phone Number                                                   4812 n

### Select Relevant Columns

In [93]:
# Trim the dataset to only relevant columns
df = df[[
    'Hospital Name',
    'State',
    'Hospital Type',
    'Hospital Ownership',
    'Emergency Services',
    'Hospital overall rating',
    'Readmission national comparison',
    'Effectiveness of care national comparison',
    'Timeliness of care national comparison',
    'Patient experience national comparison'
]]


### Remove Incomplete Ratings

Rows with missing overall hospital ratings are removed since they cannot be used for comparison in the analysis.


In [94]:
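# Remove rows with missing hospital overall ratings
# (.copy() returns an independent DataFrame, which avoids a possible
# SettingWithCopyWarning when the rating column is reassigned in the next cell)
df = df[df['Hospital overall rating'].notna()].copy()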
 

In [95]:
# Convert overall rating to numeric in case it's stored as text
df['Hospital overall rating'] = pd.to_numeric(df['Hospital overall rating'], errors='coerce')
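
One thing worth checking at this step: the ownership counts further down still sum to 4,812, the full row count reported by `df.info()`, while the count/mean summary by ownership covers only 3,567 rated hospitals. That pattern suggests missing ratings are stored as a text placeholder rather than as true nulls (the preview shows `Not Available` in another column), in which case `notna()` removes nothing and the placeholders only become `NaN` during the numeric conversion. A minimal sketch of converting first and filtering afterwards, on a made-up three-row frame; the hospital names and the placeholder value in the rating column are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample; 'Not Available' in the rating column is an assumption
sample = pd.DataFrame({
    'Hospital Name': ['A', 'B', 'C'],
    'Hospital overall rating': ['3', 'Not Available', '4'],
})

# notna() alone keeps the placeholder row because it is an ordinary string
kept_by_notna = sample[sample['Hospital overall rating'].notna()]

# Convert first so placeholders become NaN, then drop those rows
converted = sample.copy()
converted['Hospital overall rating'] = pd.to_numeric(
    converted['Hospital overall rating'], errors='coerce'
)
rated = converted.dropna(subset=['Hospital overall rating'])

print(len(kept_by_notna), len(rated))
```

If the real column behaves the same way, filtering after conversion would leave only rated hospitals in the exported file, and the ownership counts would then describe the rated subset rather than all hospitals.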


In [96]:
# Save the cleaned dataset
df.to_csv('../Data/hospital_quality_dashboard.csv', index=False)

In [97]:
# View the first few rows of the cleaned dataset
df.head()


| | Hospital Name | State | Hospital Type | Hospital Ownership | Emergency Services | Hospital overall rating | Readmission national comparison | Effectiveness of care national comparison | Timeliness of care national comparison | Patient experience national comparison |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SOUTHEAST ALABAMA MEDICAL CENTER | AL | Acute Care Hospitals | Government - Hospital District or Authority | Yes | 3.0 | Same as the national average | Same as the national average | Same as the national average | Below the national average |
| 1 | MARSHALL MEDICAL CENTER SOUTH | AL | Acute Care Hospitals | Government - Hospital District or Authority | Yes | 3.0 | Above the national average | Same as the national average | Above the national average | Same as the national average |
| 2 | ELIZA COFFEE MEMORIAL HOSPITAL | AL | Acute Care Hospitals | Government - Hospital District or Authority | Yes | 2.0 | Same as the national average | Same as the national average | Above the national average | Below the national average |
| 3 | MIZELL MEMORIAL HOSPITAL | AL | Acute Care Hospitals | Voluntary non-profit - Private | Yes | 2.0 | Below the national average | Below the national average | Above the national average | Same as the national average |
| 4 | CRENSHAW COMMUNITY HOSPITAL | AL | Acute Care Hospitals | Proprietary | Yes | 3.0 | Same as the national average | Same as the national average | Above the national average | Not Available |


## Analyze Hospital Ownership

We count hospitals by ownership type to see how each ownership model is represented in the dataset.
The counts also show which ownership types are rare in the data, which matters when comparing their average ratings.

In [98]:
# Count the number of hospitals by ownership type
df['Hospital Ownership'].value_counts()

Hospital Ownership
Voluntary non-profit - Private                 2052
Proprietary                                     800
Government - Hospital District or Authority     561
Voluntary non-profit - Other                    462
Government - Local                              407
Voluntary non-profit - Church                   343
Physician                                        68
Government - State                               65
Government - Federal                             45
Tribal                                            9
Name: count, dtype: int64
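
These counts cover every hospital in the file, while the `count` column of the ownership summary later in the notebook covers only hospitals with a numeric rating. Dividing one by the other gives a rough sense of how much of each ownership group actually contributes to the average ratings. The sketch below re-enters both sets of counts by hand from the notebook outputs; the variable names are illustrative, not part of the original analysis:

```python
import pandas as pd

# All hospitals per ownership type (from the value_counts output above)
all_hospitals = pd.Series({
    'Voluntary non-profit - Private': 2052,
    'Proprietary': 800,
    'Government - Hospital District or Authority': 561,
    'Voluntary non-profit - Other': 462,
    'Government - Local': 407,
    'Voluntary non-profit - Church': 343,
    'Physician': 68,
    'Government - State': 65,
    'Government - Federal': 45,
    'Tribal': 9,
})

# Hospitals with a numeric rating (from the count column of the summary table)
rated_hospitals = pd.Series({
    'Voluntary non-profit - Private': 1605,
    'Proprietary': 617,
    'Government - Hospital District or Authority': 337,
    'Voluntary non-profit - Other': 373,
    'Government - Local': 243,
    'Voluntary non-profit - Church': 309,
    'Physician': 20,
    'Government - State': 45,
    'Government - Federal': 16,
    'Tribal': 2,
})

# Share of each ownership group that has a rating, lowest first
coverage = (rated_hospitals / all_hospitals).sort_values()
print(coverage.round(3))
```

On these numbers, Tribal and Physician hospitals have the lowest rated share (2 of 9 and 20 of 68), so their averages describe a small part of each group.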

In [99]:
# Re-apply the numeric conversion (a no-op if the earlier conversion cell has already run)
df['Hospital overall rating'] = pd.to_numeric(df['Hospital overall rating'], errors='coerce')

# Average overall rating by ownership type, highest first
df.groupby('Hospital Ownership')['Hospital overall rating'].mean().sort_values(ascending=False)


Hospital Ownership
Physician                                      3.750000
Voluntary non-profit - Church                  3.177994
Voluntary non-profit - Other                   3.150134
Voluntary non-profit - Private                 3.133956
Government - Hospital District or Authority    3.017804
Government - Local                             2.950617
Government - Federal                           2.937500
Proprietary                                    2.831442
Government - State                             2.600000
Tribal                                         2.500000
Name: Hospital overall rating, dtype: float64

In [100]:
# Summary table of hospital counts and average ratings by ownership
df.groupby('Hospital Ownership')['Hospital overall rating'].agg(['count', 'mean']).sort_values(by='mean', ascending=False)


| Hospital Ownership | count | mean |
|---|---|---|
| Physician | 20 | 3.75 |
| Voluntary non-profit - Church | 309 | 3.177994 |
| Voluntary non-profit - Other | 373 | 3.150134 |
| Voluntary non-profit - Private | 1605 | 3.133956 |
| Government - Hospital District or Authority | 337 | 3.017804 |
| Government - Local | 243 | 2.950617 |
| Government - Federal | 16 | 2.9375 |
| Proprietary | 617 | 2.831442 |
| Government - State | 45 | 2.6 |
| Tribal | 2 | 2.5 |
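
Group sizes differ a lot in this summary: Physician (20), Government - Federal (16), and Tribal (2) rest on few rated hospitals, so their means are less stable than those of the larger groups. One possible way to make that visible is to report the standard deviation next to the mean and flag small groups. The sketch below uses invented ratings and an arbitrary cutoff, `MIN_HOSPITALS`; neither comes from the notebook:

```python
import pandas as pd

# Invented ratings for illustration only
toy = pd.DataFrame({
    'Hospital Ownership': ['Proprietary'] * 4 + ['Tribal'] * 2,
    'Hospital overall rating': [3, 2, 3, 4, 2, 3],
})

MIN_HOSPITALS = 3  # arbitrary illustrative cutoff (assumption)

# Count, mean, and spread of ratings per ownership type
summary = (
    toy.groupby('Hospital Ownership')['Hospital overall rating']
       .agg(['count', 'mean', 'std'])
)

# Flag groups whose mean rests on fewer hospitals than the cutoff
summary['small_sample'] = summary['count'] < MIN_HOSPITALS
print(summary)
```

A sensible cutoff for the real data depends on the analysis; the point is only that the flag travels with the table into any downstream chart.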


In [101]:
# Re-save the cleaned hospital data to CSV (overwrites the file written by the earlier save cell)
df.to_csv('../Data/hospital_quality_dashboard.csv', index=False)

## Summary and Next Steps

This notebook prepared and cleaned a dataset of U.S. hospital quality data from CMS.
The processed file was exported to CSV and used to create a Tableau dashboard that visualizes average hospital ratings by ownership, type, and state.
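
The notebook itself only aggregates by ownership; the type and state views live in the Tableau dashboard. As a rough pandas equivalent, the same grouping can be pointed at `Hospital Type` or `State`. The helper name `mean_rating_by` and the rows below are invented for illustration:

```python
import pandas as pd

# Invented rows for illustration only
toy = pd.DataFrame({
    'State': ['AL', 'AL', 'TX', 'TX'],
    'Hospital Type': ['Acute Care Hospitals', 'Critical Access Hospitals',
                      'Acute Care Hospitals', 'Acute Care Hospitals'],
    'Hospital overall rating': [3.0, 4.0, 2.0, 4.0],
})

def mean_rating_by(frame, column):
    """Average overall rating per category of `column`, highest first."""
    return (
        frame.groupby(column)['Hospital overall rating']
             .mean()
             .sort_values(ascending=False)
    )

by_state = mean_rating_by(toy, 'State')
by_type = mean_rating_by(toy, 'Hospital Type')
print(by_state)
print(by_type)
```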

Future expansions could include joining patient satisfaction data or analyzing cost-efficiency across hospital types.
