## Hospital Performance and Efficiency Analysis
#### This project analyzes hospital performance using multiple healthcare datasets. The goal is to clean, prepare, and analyze hospital data to understand the relationship between hospital ratings, Medicare spending, and patient outcomes.

In [None]:
# Importing necessary libraries to load, process, analyze and visualize the dataset.
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Loading the Datasets 
#### The three datasets are loaded into Python to begin data cleaning and preparation for analysis.

In [None]:
# Loading hospital general information dataset
hospital_info = pd.read_csv("../data/Hospital_General_Information.csv")
hospital_info.head(5)

In [None]:
# Loading Medicare hospital spending per patient dataset
medicare_data = pd.read_csv("../data/Medicare_Hospital_spending_Per_Patient-Hospital.csv")
medicare_data.head(5)

In [None]:
# Loading unplanned hospital visits dataset
unplanned_visit_data = pd.read_csv("../data/Unplanned_Hospital_visits-Hospital.csv")
unplanned_visit_data.head(5)
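
One thing worth checking at load time is how the facility identifier is parsed. If every value in an ID column happens to be all digits, `pd.read_csv` infers an integer dtype and any leading zeros are lost, which can cause mismatches when the datasets are combined later. The sketch below uses a small made-up CSV (not the real files) and assumes the identifier column is named `Facility ID`, as in the renaming step; whether the real files are affected depends on their contents.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for one of the CSV files under ../data/.
sample_csv = io.StringIO(
    "Facility ID,Facility Name\n"
    "010001,Example Hospital A\n"
    "010005,Example Hospital B\n"
)

# Default parsing infers an integer column, so "010001" is read as 10001.
as_inferred = pd.read_csv(sample_csv)

# Rewind the buffer and read the ID column as text to keep the leading zeros.
sample_csv.seek(0)
as_text = pd.read_csv(sample_csv, dtype={"Facility ID": str})

print(as_inferred["Facility ID"].tolist())
print(as_text["Facility ID"].tolist())
```

If the IDs in the real files turn out to be affected, the same `dtype` argument could be passed to each of the three `pd.read_csv` calls above.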

## Exploring the Hospital Info Dataset
The structure and contents of the dataset are examined to understand the data types, check for missing values, and get an overview of the dataset.

In [None]:
# checking datatypes and missing values
hospital_info.info()

In [None]:
# checking number of rows and columns
hospital_info.shape

## Data Cleaning: Hospital Info Data
#### In this section, the hospital general information dataset is cleaned to prepare it for analysis. The cleaning process includes reviewing column names, removing unnecessary columns, renaming columns, and checking for missing values and duplicates. The goal is to create a clean dataset that can be easily used for analysis and combined with the other datasets.


### Step 1: Inspecting Columns
#### In this step, I reviewed all the column names in the hospital dataset to understand what information is available. This helps identify which columns are useful for the analysis and which ones may not be needed.

In [None]:
# columns for hospital data
hospital_info.columns

### Step 2: Removing Unnecessary Columns
#### Some columns in the hospital dataset are not needed for this project. Those columns are removed to make the dataset cleaner and easier to work with during analysis.

In [None]:
# Removing the unnecessary columns from hospital data
hospital_info = hospital_info.drop(columns=['Address', 'City/Town',
       'ZIP Code', 'County/Parish', 'Telephone Number',
       'Meets criteria for birthing friendly designation',
        'Hospital overall rating footnote',
       'MORT Group Measure Count', 'Count of Facility MORT Measures',
       'Count of MORT Measures Better', 'Count of MORT Measures No Different',
       'Count of MORT Measures Worse', 'MORT Group Footnote',
       'Safety Group Measure Count', 'Count of Facility Safety Measures',
       'Count of Safety Measures Better',
       'Count of Safety Measures No Different',
       'Count of Safety Measures Worse', 'Safety Group Footnote',
       'READM Group Measure Count', 'Count of Facility READM Measures',
       'Count of READM Measures Better',
       'Count of READM Measures No Different', 'Count of READM Measures Worse',
       'READM Group Footnote', 'Pt Exp Group Measure Count',
       'Count of Facility Pt Exp Measures', 'Pt Exp Group Footnote',
       'TE Group Measure Count', 'Count of Facility TE Measures',
       'TE Group Footnote'])
hospital_info.head(5)
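
When far more columns are dropped than kept, an alternative is to list only the columns to keep. The sketch below illustrates the idea on a made-up one-row frame; the `demo` frame and the `keep` list are hypothetical and only a few of the real column names are shown.

```python
import pandas as pd

# Hypothetical miniature frame standing in for hospital_info.
demo = pd.DataFrame({
    "Facility ID": ["010001"],
    "Facility Name": ["Example Hospital A"],
    "Address": ["1 Example St"],
    "Hospital overall rating": ["3"],
})

# Columns to keep; every column not listed here is left out.
keep = ["Facility ID", "Facility Name", "Hospital overall rating"]

# .copy() gives an independent frame rather than a view of the original.
trimmed = demo[keep].copy()

print(trimmed.columns.tolist())
```

Selecting by a keep list fails loudly with a `KeyError` if an expected column is missing, which can be a useful check when the source files change between releases.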

### Step 3: Renaming Columns
#### In this step, columns are renamed to improve readability and ensure compatibility with Python and SQL queries.
#### The original column names contain spaces and special characters, which can cause errors during analysis. Columns are renamed using snake_case formatting.

In [None]:
# Renaming Columns
hospital_info = hospital_info.rename(columns={
    'Facility ID' : 'facility_id',
    'Facility Name' : 'facility_name',
    'State': 'state',
    'Hospital Type' : 'hospital_type',
    'Hospital Ownership' : 'hospital_ownership',
    'Emergency Services' : 'emergency_services',
    'Hospital overall rating' : 'hospital_overall_rating'
})
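
The explicit mapping above keeps full control over each name. As a rough alternative, the same snake_case convention could be applied programmatically. In the sketch below, `to_snake_case` is a hypothetical helper and `demo` is a made-up empty frame that only carries a few column names.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lowercase a name and collapse runs of non-alphanumeric characters into '_'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


# Hypothetical frame with a few column names in the original style.
demo = pd.DataFrame(
    columns=["Facility ID", "Hospital Type", "Hospital overall rating", "City/Town"]
)

# rename() accepts a function and applies it to every column label.
demo = demo.rename(columns=to_snake_case)

print(demo.columns.tolist())
```

A function-based rename handles any number of columns, while the explicit dictionary makes it easier to choose shorter or more descriptive names by hand.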


### Step 4: Checking for Missing Values
#### The dataset is examined for missing (null) values. Identifying missing data is important because it can affect the accuracy of the analysis.

In [None]:
# checking null values
hospital_info.isnull().sum()

In [None]:
# Information about the dataset
hospital_info.info()
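
Note that `isnull()` only counts values pandas already treats as missing. If a column stores a text placeholder instead (the placeholder `"Not Available"` used below is an assumption, not something confirmed by the outputs above), those rows are counted as non-null. The sketch uses a made-up frame to show how converting the column to numeric exposes such values.

```python
import pandas as pd

# Hypothetical rows; "Not Available" is an assumed placeholder for a missing rating.
demo = pd.DataFrame({
    "facility_id": ["010001", "010005", "010006"],
    "hospital_overall_rating": ["3", "Not Available", "4"],
})

# The placeholder is an ordinary string, so it is not counted as null.
nulls_before = int(demo["hospital_overall_rating"].isnull().sum())

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
demo["hospital_overall_rating"] = pd.to_numeric(
    demo["hospital_overall_rating"], errors="coerce"
)
nulls_after = int(demo["hospital_overall_rating"].isnull().sum())

print(nulls_before, nulls_after)
```

Checking `value_counts()` on a text column before converting it is a quick way to see which placeholder strings, if any, are actually present.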

### Step 5: Checking for Duplicates
In this step, the dataset is checked for duplicate rows. Duplicate rows can cause errors and may lead to double counting of hospital data, so any that are found should be removed.

In [None]:
# Check duplicates based on facility_id
duplicates_id = hospital_info.duplicated(subset=['facility_id']).sum()
duplicates_id

#### Each hospital is uniquely represented in the dataset.
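
For reference, if the duplicate count had been greater than zero, the repeated rows could be inspected and removed roughly as follows. This is a sketch on a made-up frame, and keeping the first occurrence is an assumption about which copy to retain.

```python
import pandas as pd

# Hypothetical frame in which one facility_id appears twice.
demo = pd.DataFrame({
    "facility_id": ["010001", "010001", "010005"],
    "facility_name": ["Example Hospital A", "Example Hospital A", "Example Hospital B"],
})

# keep=False flags every row that shares a facility_id, not just the repeats.
dupe_mask = demo.duplicated(subset=["facility_id"], keep=False)
print(demo[dupe_mask])

# Keep the first row for each facility_id and rebuild a clean 0..n-1 index.
deduped = demo.drop_duplicates(subset=["facility_id"], keep="first").reset_index(drop=True)

print(len(demo), len(deduped))
```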

In [None]:
# columns in medicare dataset
medicare_data.columns

In [None]:
# Removing unnecessary columns from medicare data
medicare_data = medicare_data.drop(columns=['Address', 'City/Town',
       'ZIP Code', 'Facility Name', 'State', 'County/Parish', 'Measure ID', 'Telephone Number', 'Footnote'])
medicare_data.head(5)

In [None]:
# columns in unplanned visit data
unplanned_visit_data.columns

In [None]:
# Removing unnecessary columns from unplanned visit data
unplanned_visit_data = unplanned_visit_data.drop(columns=['Address', 'City/Town',
       'ZIP Code', 'County/Parish', 'Telephone Number', 'Measure ID',
       'Denominator', 'Facility Name', 'State',
       'Lower Estimate', 'Higher Estimate',
       'Footnote'])
unplanned_visit_data.head(5)