## **Automated Analysis and Visualization of Global Hospital Data 🩺**

**🏥Overview:**

This project analyzes a global health dataset of 1,000,000 records and 22 columns to measure disease prevalence, identify epidemiological trends, and present the findings through automated analysis and interactive visualization. The records cover multiple countries and the years 2000 to 2024. They combine disease metrics (prevalence, incidence, mortality, recovery), treatment details, healthcare capacity, and socioeconomic indicators.

The analysis supports health policy planning, epidemiological research, and predictive modeling, enabling data-driven decisions that can improve healthcare delivery and resource allocation.

**Libraries used:**

In [3]:
import pandas as pd
import numpy as np
from datetime import datetime

In [4]:
df = pd.read_csv("global_health_stats.csv")
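With 1,000,000 rows, text columns that repeat a small set of values can be stored more compactly as pandas categoricals. The sketch below assumes that `Country`, `Disease Name`, `Disease Category`, `Age Group`, `Gender`, and `Treatment Type` are low-cardinality. This has not been verified here. The loader `load_health_stats` is hypothetical, and the inline CSV holds two rows copied from the `df.head()` output.

```python
import io

import pandas as pd

# Assumption: these text columns hold a small set of repeated values.
CATEGORY_COLS = ["Country", "Disease Name", "Disease Category",
                 "Age Group", "Gender", "Treatment Type"]

# Two rows copied from the df.head() output; only 8 of the 22 columns are included.
SAMPLE_CSV = """Country,Year,Disease Name,Disease Category,Age Group,Gender,Treatment Type,Prevalence Rate (%)
Italy,2013,Malaria,Respiratory,0-18,Male,Medication,0.95
France,2002,Ebola,Parasitic,61+,Male,Surgery,12.46
"""


def load_health_stats(source) -> pd.DataFrame:
    """Hypothetical loader: read the CSV, storing CATEGORY_COLS as pandas categoricals."""
    dtypes = {col: "category" for col in CATEGORY_COLS}
    return pd.read_csv(source, dtype=dtypes)


sample = load_health_stats(io.StringIO(SAMPLE_CSV))
print(sample.dtypes)
```

On the real file, the call would be `df = load_health_stats("global_health_stats.csv")`. Columns not listed in the dtype mapping keep the types pandas infers.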

**Understand the Data Structure:**

In [6]:
df.head()

| | Country | Year | Disease Name | Disease Category | Prevalence Rate (%) | Incidence Rate (%) | Mortality Rate (%) | Age Group | Gender | Population Affected | ... | Hospital Beds per 1000 | Treatment Type | Average Treatment Cost (USD) | Availability of Vaccines/Treatment | Recovery Rate (%) | DALYs | Improvement in 5 Years (%) | Per Capita Income (USD) | Education Index | Urbanization Rate (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | 2013 | Malaria | Respiratory | 0.95 | 1.55 | 8.42 | 0-18 | Male | 471007 | ... | 7.58 | Medication | 21064 | No | 91.82 | 4493 | 2.16 | 16886 | 0.79 | 86.02 |
| 1 | France | 2002 | Ebola | Parasitic | 12.46 | 8.63 | 8.75 | 61+ | Male | 634318 | ... | 5.11 | Surgery | 47851 | Yes | 76.65 | 2366 | 4.82 | 80639 | 0.74 | 45.52 |
| 2 | Turkey | 2015 | COVID-19 | Genetic | 0.91 | 2.35 | 6.22 | 36-60 | Male | 154878 | ... | 3.49 | Vaccination | 27834 | Yes | 98.55 | 41 | 5.81 | 12245 | 0.41 | 40.2 |
| 3 | Indonesia | 2011 | Parkinson's Disease | Autoimmune | 4.68 | 6.29 | 3.99 | 0-18 | Other | 446224 | ... | 8.44 | Surgery | 144 | Yes | 67.35 | 3201 | 2.22 | 49336 | 0.49 | 58.47 |
| 4 | Italy | 2013 | Tuberculosis | Genetic | 0.83 | 13.59 | 7.01 | 61+ | Male | 472908 | ... | 5.9 | Medication | 8908 | Yes | 50.06 | 2832 | 6.93 | 47701 | 0.5 | 48.14 |


In [7]:
df.describe()

| | Year | Prevalence Rate (%) | Incidence Rate (%) | Mortality Rate (%) | Population Affected | Healthcare Access (%) | Doctors per 1000 | Hospital Beds per 1000 | Average Treatment Cost (USD) | Recovery Rate (%) | DALYs | Improvement in 5 Years (%) | Per Capita Income (USD) | Education Index | Urbanization Rate (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 |
| mean | 2011.996999 | 10.047992 | 7.555005 | 5.049919 | 500735.427363 | 74.987835 | 2.747929 | 5.245931 | 25010.313665 | 74.496934 | 2499.144809 | 5.002593 | 50311.099835 | 0.650069 | 54.985212 |
| std | 7.217287 | 5.740189 | 4.298947 | 2.859427 | 288660.116648 | 14.436345 | 1.299067 | 2.742865 | 14402.279227 | 14.155168 | 1443.923798 | 2.888298 | 28726.959359 | 0.144472 | 20.214042 |
| min | 2000.0 | 0.1 | 0.1 | 0.1 | 1000.0 | 50.0 | 0.5 | 0.5 | 100.0 | 50.0 | 1.0 | 0.0 | 500.0 | 0.4 | 20.0 |
| 25% | 2006.0 | 5.09 | 3.84 | 2.58 | 250491.25 | 62.47 | 1.62 | 2.87 | 12538.0 | 62.22 | 1245.0 | 2.5 | 25457.0 | 0.53 | 37.47 |
| 50% | 2012.0 | 10.04 | 7.55 | 5.05 | 501041.0 | 75.0 | 2.75 | 5.24 | 24980.0 | 74.47 | 2499.0 | 5.0 | 50372.0 | 0.65 | 54.98 |
| 75% | 2018.0 | 15.01 | 11.28 | 7.53 | 750782.0 | 87.49 | 3.87 | 7.62 | 37493.0 | 86.78 | 3750.0 | 7.51 | 75195.0 | 0.78 | 72.51 |
| max | 2024.0 | 20.0 | 15.0 | 10.0 | 1000000.0 | 100.0 | 5.0 | 10.0 | 50000.0 | 99.0 | 5000.0 | 10.0 | 100000.0 | 0.9 | 90.0 |
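By default, `df.describe()` summarizes only the numeric columns. Its counterpart for text columns is `describe(include="object")`, which reports the count, number of distinct values, most frequent value (`top`), and that value's frequency (`freq`). The sketch below runs it on the categorical values from the five `df.head()` rows. On the full data, the call would be `df.describe(include="object")`.

```python
import pandas as pd

# Text columns from the five rows shown in the df.head() output.
sample = pd.DataFrame({
    "Country": ["Italy", "France", "Turkey", "Indonesia", "Italy"],
    "Disease Category": ["Respiratory", "Parasitic", "Genetic", "Autoimmune", "Genetic"],
    "Gender": ["Male", "Male", "Male", "Other", "Male"],
})

# Rows of the result: count, unique, top, freq.
summary = sample.describe(include="object")
print(summary)
```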


In [8]:
df.shape

(1000000, 22)

In [9]:
df.columns

Index(['Country', 'Year', 'Disease Name', 'Disease Category',
       'Prevalence Rate (%)', 'Incidence Rate (%)', 'Mortality Rate (%)',
       'Age Group', 'Gender', 'Population Affected', 'Healthcare Access (%)',
       'Doctors per 1000', 'Hospital Beds per 1000', 'Treatment Type',
       'Average Treatment Cost (USD)', 'Availability of Vaccines/Treatment',
       'Recovery Rate (%)', 'DALYs', 'Improvement in 5 Years (%)',
       'Per Capita Income (USD)', 'Education Index', 'Urbanization Rate (%)'],
      dtype='object')
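Column names such as `Prevalence Rate (%)` contain spaces and symbols, so they can only be accessed with bracket notation. One optional convention converts them to snake_case. The sketch below uses a hypothetical naming rule: `%` becomes `pct`, and any other run of spaces or punctuation becomes an underscore. The function `to_snake_case` is illustrative only. Later cells in this notebook use the original names, so renaming would require updating them as well.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Hypothetical rule: lowercase, '%' -> 'pct', any other run of non-alphanumerics -> '_'."""
    name = name.lower().replace("%", "pct")
    name = re.sub(r"[^0-9a-z]+", "_", name)
    return name.strip("_")


# A few of the column names listed by df.columns.
frame = pd.DataFrame(columns=["Prevalence Rate (%)", "Doctors per 1000",
                              "Average Treatment Cost (USD)",
                              "Availability of Vaccines/Treatment", "DALYs"])
renamed = frame.rename(columns=to_snake_case)  # rename() accepts a callable for columns
print(renamed.columns.tolist())
```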

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 22 columns):
 #   Column                              Non-Null Count    Dtype  
---  ------                              --------------    -----  
 0   Country                             1000000 non-null  object 
 1   Year                                1000000 non-null  int64  
 2   Disease Name                        1000000 non-null  object 
 3   Disease Category                    1000000 non-null  object 
 4   Prevalence Rate (%)                 1000000 non-null  float64
 5   Incidence Rate (%)                  1000000 non-null  float64
 6   Mortality Rate (%)                  1000000 non-null  float64
 7   Age Group                           1000000 non-null  object 
 8   Gender                              1000000 non-null  object 
 9   Population Affected                 1000000 non-null  int64  
 10  Healthcare Access (%)               1000000 non-null  float64
 11  Doctors per 

**Check for Missing and Duplicate Data:**

In [12]:
df.isna().sum()

Country                               0
Year                                  0
Disease Name                          0
Disease Category                      0
Prevalence Rate (%)                   0
Incidence Rate (%)                    0
Mortality Rate (%)                    0
Age Group                             0
Gender                                0
Population Affected                   0
Healthcare Access (%)                 0
Doctors per 1000                      0
Hospital Beds per 1000                0
Treatment Type                        0
Average Treatment Cost (USD)          0
Availability of Vaccines/Treatment    0
Recovery Rate (%)                     0
DALYs                                 0
Improvement in 5 Years (%)            0
Per Capita Income (USD)               0
Education Index                       0
Urbanization Rate (%)                 0
dtype: int64

In [13]:
df.duplicated().sum()

0
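Both checks above return zero for this dataset. The sketch below is a reusable check that reports missing values as counts and percentages and also counts duplicates on a subset of key columns. It runs on a small toy frame, not on dataset records. The helper `data_quality_report` and the `["Country", "Year"]` key are assumptions. For the real data, a fuller key such as country, year, disease, age group, and gender may be more appropriate.

```python
import pandas as pd


def data_quality_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing-value count and percentage per column, worst first."""
    missing = frame.isna().sum()
    report = pd.DataFrame({
        "missing": missing,
        "missing_pct": (missing / len(frame) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)


# Toy frame (not dataset records) with one gap and one fully repeated row.
toy = pd.DataFrame({
    "Country": ["Italy", "France", "Italy", "Italy"],
    "Year": [2013, 2002, 2013, 2013],
    "Recovery Rate (%)": [90.0, None, 75.0, 75.0],
})

report = data_quality_report(toy)
print(report)

full_dupes = toy.duplicated().sum()                           # identical across every column
key_dupes = toy.duplicated(subset=["Country", "Year"]).sum()  # assumed key columns only
print(full_dupes, key_dupes)
```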

**Standardize Data Formats:**

In [15]:
# Parse integer years (e.g. 2013) as datetimes; each becomes January 1 of that year
df['Year'] = pd.to_datetime(df['Year'].astype(str), format="%Y")
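`pd.to_datetime(..., format="%Y")` maps each year to a timestamp on January 1 of that year. The sketch below uses three years from `df.head()`. It converts through `str` to make the parse explicit, and shows that the original year is still available through the `.dt.year` accessor. Whether a datetime or a plain integer suits annual data better depends on the later analysis. This is one option, not a requirement.

```python
import pandas as pd

# Three Year values from the df.head() output.
years = pd.Series([2013, 2002, 2015], name="Year")

# Casting to str first makes the "%Y" parse explicit; each value becomes January 1 of its year.
as_dates = pd.to_datetime(years.astype(str), format="%Y")
print(as_dates)

# The integer year is still available for grouping or axis labels.
print(as_dates.dt.year.tolist())
```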

In [16]:
df["Age Group"] = df["Age Group"].astype("category")
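`astype("category")` creates an unordered categorical whose categories are sorted as strings. Declaring the order explicitly makes comparisons and `min()`/`max()` meaningful, and keeps age groups in numeric order on plot axes. The sketch assumes every label has the form `low-high` or `low+`, as in the `df.head()` output. The helper `lower_bound` is hypothetical. Sorting by the parsed lower bound matters when string order differs from numeric order. For example, a hypothetical `5-17` label would sort after `36-60` as a string.

```python
import pandas as pd

# Age-group labels as they appear in df.head(); the full column may contain others.
age = pd.Series(["0-18", "61+", "36-60", "0-18", "61+"], name="Age Group")


def lower_bound(label: str) -> int:
    """Hypothetical parser: lower bound of a label such as '36-60' or '61+'."""
    return int(label.rstrip("+").split("-")[0])


# Order the categories numerically by lower bound and mark them as ordered.
levels = sorted(age.unique(), key=lower_bound)
age_cat = pd.Categorical(age, categories=levels, ordered=True)

print(list(age_cat.categories))
print(age_cat.min(), age_cat.max())  # only valid because ordered=True
print((age_cat >= "36-60").sum())    # rows in the 36-60 group or older
```

Applied to the full column, this would be `df["Age Group"] = pd.Categorical(df["Age Group"], categories=sorted(df["Age Group"].unique(), key=lower_bound), ordered=True)`.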

In [17]:
# Strip whitespace only: str.title() would turn "Parkinson's Disease" into "Parkinson'S Disease" and "COVID-19" into "Covid-19"
text_cols = ['Disease Name', 'Disease Category', 'Treatment Type']
df[text_cols] = df[text_cols].apply(lambda col: col.str.strip())
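`str.title()` uppercases the first letter after every non-letter character and lowercases the rest. As a result, it can corrupt disease names that contain apostrophes or acronyms. The sketch below compares title-casing with plain whitespace cleanup on padded copies of values from the `df.head()` output. The helper `clean_text` is hypothetical.

```python
import pandas as pd

# Values from df.head(), padded with stray whitespace for the demonstration.
names = pd.Series([" Parkinson's Disease", "COVID-19 ", "Malaria"])

# Title-casing changes the apostrophe suffix and the acronym.
titled = names.str.strip().str.title().tolist()
print(titled)  # ["Parkinson'S Disease", 'Covid-19', 'Malaria']


def clean_text(col: pd.Series) -> pd.Series:
    """Hypothetical helper: trim the ends and collapse inner whitespace, keeping the original casing."""
    return col.str.strip().str.replace(r"\s+", " ", regex=True)


cleaned = clean_text(names).tolist()
print(cleaned)  # ["Parkinson's Disease", 'COVID-19', 'Malaria']
```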