# Hospital Data ETL Demonstration

This notebook demonstrates **Python-based ETL (Extract, Transform, Load)** steps on a hospital dataset.  
It is intended to showcase **data engineering skills** using Python, while the final **visualizations and reporting** will be built in **Power BI**.

---

## 1. Setup & Libraries

In [85]:
import pandas as pd
import os

## 2. Load Hospital Data Tables

In [88]:
# Define folder path where CSV files are located
folder_path = r'C:\Users\nawaf\Downloads\HospetalDB'  # change if needed

# List CSV files in folder
csv_files = [f for f in os.listdir(folder_path) if f.endswith('.csv')]

# Load CSV files into a dictionary
dataframes = {f.split('.csv')[0].strip().lower(): pd.read_csv(os.path.join(folder_path, f))
              for f in csv_files}

# Assign each table to its own variable (None if the CSV was not found)
appointments = dataframes.get('appointments')
patients = dataframes.get('patients')
doctors = dataframes.get('doctors')
treatments = dataframes.get('treatments')
billing = dataframes.get('billing')

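The loader above matches `.csv` case-sensitively and leaves a variable as `None` when a file is missing, which only surfaces later as an `AttributeError` (for example on `.shape`). A slightly more defensive variant might look like the sketch below. `load_csv_tables`, `check_tables`, and `EXPECTED_TABLES` are hypothetical names, and the sketch assumes the same one-CSV-per-table folder layout.

```python
from pathlib import Path

import pandas as pd

# Hypothetical list of the tables the later cells rely on.
EXPECTED_TABLES = ["patients", "appointments", "doctors", "treatments", "billing"]


def load_csv_tables(folder):
    """Load every CSV in `folder` into a dict keyed by its lower-cased file name stem."""
    tables = {}
    for path in sorted(Path(folder).iterdir()):
        # Compare the extension case-insensitively so "Billing.CSV" is also picked up.
        if path.is_file() and path.suffix.lower() == ".csv":
            tables[path.stem.strip().lower()] = pd.read_csv(path)
    return tables


def check_tables(tables, expected=EXPECTED_TABLES):
    """Raise immediately if an expected table is absent, instead of failing later on None."""
    missing = [name for name in expected if name not in tables]
    if missing:
        raise FileNotFoundError(f"Missing CSV tables: {missing}")
```

Usage: `dataframes = load_csv_tables(folder_path)` followed by `check_tables(dataframes)` stops the notebook at the loading step if, say, `billing.csv` is absent.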

## 3. Data Overview

In [101]:

# Display shapes of all tables
print("Patients:", patients.shape)
print("Appointments:", appointments.shape)
print("Doctors:", doctors.shape)
print("Treatments:", treatments.shape)
print("Billing:", billing.shape)

# Quick info and missing values check
patients.info()
patients.isnull().sum()
    

Patients: (50, 11)
Appointments: (200, 7)
Doctors: (10, 8)
Treatments: (200, 6)
Billing: (200, 7)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   patient_id          50 non-null     object
 1   first_name          50 non-null     object
 2   last_name           50 non-null     object
 3   gender              50 non-null     object
 4   date_of_birth       50 non-null     object
 5   contact_number      50 non-null     int64 
 6   address             50 non-null     object
 7   registration_date   50 non-null     object
 8   insurance_provider  50 non-null     object
 9   insurance_number    50 non-null     object
 10  email               50 non-null     object
dtypes: int64(1), object(10)
memory usage: 4.4+ KB


patient_id            0
first_name            0
last_name             0
gender                0
date_of_birth         0
contact_number        0
address               0
registration_date     0
insurance_provider    0
insurance_number      0
email                 0
dtype: int64
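
The cell above checks missing values for `patients` only. The same check can be run across every loaded table at once. The helper name `missing_value_report` is hypothetical; it expects a dict of DataFrames such as `dataframes` from the loading step and lists only the columns that actually contain gaps.

```python
import pandas as pd


def missing_value_report(tables):
    """Return one row per (table, column) that has at least one missing value."""
    rows = []
    for name, df in tables.items():
        counts = df.isnull().sum()
        # Keep only columns with gaps; an all-complete table contributes no rows.
        for column, n_missing in counts[counts > 0].items():
            rows.append({
                "table": name,
                "column": column,
                "missing": int(n_missing),
                "missing_pct": round(100 * n_missing / len(df), 1),
            })
    return pd.DataFrame(rows, columns=["table", "column", "missing", "missing_pct"])
```

Usage: `missing_value_report(dataframes)` returns an empty DataFrame when every table is complete, as the `patients` output above suggests for that table.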

## 4. ETL Transformations

In [103]:

# Example: Convert appointment_date to datetime
appointments['appointment_date'] = pd.to_datetime(appointments['appointment_date'], errors='coerce')

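# Example: Rename columns for clarity (a no-op here, since the source already uses date_of_birth)
patients.rename(columns={"dob": "date_of_birth"}, inplace=True)

# Example: Compute an approximate age in whole years (days // 365 ignores leap days),
# then keep patients older than 60
patients['age'] = (pd.Timestamp("today") - pd.to_datetime(patients['date_of_birth'], errors='coerce')).dt.days // 365
senior_patients = patients[patients['age'] > 60]

# Example: Merge appointments with patients and doctors.
# Columns present in both patients and doctors (first_name, last_name, email)
# receive pandas' default _x (patient) and _y (doctor) suffixes.
appt_details = appointments.merge(patients, on="patient_id", how="left") \
                          .merge(doctors, on="doctor_id", how="left")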

appt_details.head()
    

|   | appointment_id | patient_id | doctor_id | appointment_date | appointment_time | reason_for_visit | status | first_name_x | last_name_x | gender | ... | insurance_number | email_x | age | first_name_y | last_name_y | specialization | phone_number | years_experience | hospital_branch | email_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A001 | P034 | D009 | 2023-08-09 | 15:15:00 | Therapy | Scheduled | Alex | Smith | F | ... | INS653880 | alex.smith@mail.com | 75 | Sarah | Smith | Pediatrics | 7387087517 | 26 | Central Hospital | dr.sarah.smith@hospital.com |
| 1 | A002 | P032 | D004 | 2023-06-09 | 14:30:00 | Therapy | No-show | Alex | Moore | M | ... | INS335362 | alex.moore@mail.com | 44 | David | Jones | Pediatrics | 6594221991 | 28 | Central Hospital | dr.david.jones@hospital.com |
| 2 | A003 | P048 | D004 | 2023-06-28 | 8:00:00 | Consultation | Cancelled | Emily | Miller | M | ... | INS694319 | emily.miller@mail.com | 42 | David | Jones | Pediatrics | 6594221991 | 28 | Central Hospital | dr.david.jones@hospital.com |
| 3 | A004 | P025 | D006 | 2023-09-01 | 9:15:00 | Consultation | Cancelled | Robert | Wilson | M | ... | INS833429 | robert.wilson@mail.com | 59 | Alex | Davis | Pediatrics | 6570137231 | 23 | Central Hospital | dr.alex.davis@hospital.com |
| 4 | A005 | P040 | D003 | 2023-07-06 | 12:45:00 | Emergency | No-show | Emily | Williams | M | ... | INS320984 | emily.williams@mail.com | 53 | Jane | Smith | Pediatrics | 8737740598 | 19 | Eastside Clinic | dr.jane.smith@hospital.com |

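The `age` column above uses `days // 365`, which gains about one day every four years because of leap days, so it can report a patient as one year older in the days (up to a couple of weeks for older patients) before their birthday. A birthday-aware calculation is sketched below. `age_in_years` is a hypothetical helper; it returns pandas' nullable `Int64` so that unparseable dates stay missing instead of raising.

```python
import pandas as pd


def age_in_years(dob: pd.Series, as_of: pd.Timestamp) -> pd.Series:
    """Whole years from each date of birth to `as_of`, counting a year only once the birthday has passed."""
    dob = pd.to_datetime(dob, errors="coerce")  # unparseable dates become NaT
    years = as_of.year - dob.dt.year
    # True where the birthday has not yet occurred in as_of's calendar year.
    before_birthday = (dob.dt.month > as_of.month) | (
        (dob.dt.month == as_of.month) & (dob.dt.day > as_of.day)
    )
    return (years - before_birthday.astype(int)).astype("Int64")
```

Usage: `patients['age'] = age_in_years(patients['date_of_birth'], pd.Timestamp("today").normalize())`. For example, someone born on 1960-06-16 is 63 on 2024-06-15, whereas `days // 365` gives 64 for the same pair of dates.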

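In the output above, the columns shared by `patients` and `doctors` came out as `first_name_x` / `first_name_y` and `email_x` / `email_y`, which are easy to confuse once loaded into Power BI. `DataFrame.merge` accepts `suffixes`, `validate`, and `indicator` parameters that make the join more explicit. The sketch below uses them; `build_appointment_details` and the suffix and indicator column names are chosen for illustration.

```python
def build_appointment_details(appointments, patients, doctors):
    """Left-join appointments to patients and doctors with explicit suffixes and key checks."""
    details = appointments.merge(
        patients,
        on="patient_id",
        how="left",
        validate="many_to_one",     # raises MergeError if patient_id repeats in patients
        indicator="patient_match",  # 'both' or 'left_only' for each appointment
    )
    details = details.merge(
        doctors,
        on="doctor_id",
        how="left",
        suffixes=("_patient", "_doctor"),  # replaces pandas' default _x / _y
        validate="many_to_one",
        indicator="doctor_match",
    )
    return details
```

Usage: after `appt_details = build_appointment_details(appointments, patients, doctors)`, the filter `appt_details[appt_details["patient_match"] == "left_only"]` lists appointments whose `patient_id` has no matching patient record.
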
## 5. Export Cleaned Data for Power BI

In [107]:

# Save cleaned tables for Power BI (CSV format)
export_path = os.path.join(folder_path, "cleaned")

os.makedirs(export_path, exist_ok=True)

patients.to_csv(os.path.join(export_path, "patients_clean.csv"), index=False)
appointments.to_csv(os.path.join(export_path, "appointments_clean.csv"), index=False)
doctors.to_csv(os.path.join(export_path, "doctors_clean.csv"), index=False)
treatments.to_csv(os.path.join(export_path, "treatments_clean.csv"), index=False)
billing.to_csv(os.path.join(export_path, "billing_clean.csv"), index=False)

print("✅ Cleaned data exported successfully for Power BI.")
    

✅ Cleaned data exported successfully for Power BI.
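
The export cell repeats one `to_csv` call per table. Looping over a dict of tables keeps the file naming consistent and skips any table that was never loaded. `export_for_power_bi` is a hypothetical helper, and its `<name>_clean.csv` pattern mirrors the file names used above.

```python
from pathlib import Path


def export_for_power_bi(tables, export_dir):
    """Write each DataFrame to <export_dir>/<name>_clean.csv and return the written paths."""
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in tables.items():
        if df is None:  # e.g. dict.get() returned None because the CSV was missing
            continue
        path = export_dir / f"{name}_clean.csv"
        df.to_csv(path, index=False)
        written.append(path)
    return written
```

Usage: `export_for_power_bi({"patients": patients, "appointments": appointments, "doctors": doctors, "treatments": treatments, "billing": billing}, export_path)` writes the same five files as the cell above.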



## 6. Closing Note

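The dataset is **small and already structured** (50 patients, 10 doctors, and 200 rows each of appointments, treatments, and billing), so only light transformation is needed in Python and **Power BI is well suited to the interactive dashboards and reporting**.  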

This notebook demonstrates how Python can be used for **ETL and data preparation**, complementing Power BI for **business intelligence delivery**.  
Together, they provide an end-to-end workflow: **Python for ETL + Power BI for visualization**.
    