# ANA430 Final Project: Urban Air Quality & Healthcare Dashboard
**Data Scientist:** Climineda

## Problem Statement
Air pollution, population growth, and strained healthcare systems are critical global concerns. This project visualizes the relationships between PM2.5 air pollution, population size, and healthcare infrastructure (hospital beds per 1,000 people) across five countries (the United States, China, India, Brazil, and South Africa) for 2010 through 2021, using data pulled from the World Bank API each time the notebook runs. Intended users include public health officials and environmental policy makers.

**Data Source:** World Bank API

## Hypothesis Formulation
**Null Hypothesis (H₀):** There is no significant relationship between population size and PM2.5 levels or hospital bed availability.

**Alternative Hypothesis (H₁):** Countries with larger populations tend to have higher PM2.5 levels and lower hospital bed availability.
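
One way to make these hypotheses testable is to state them in terms of a correlation coefficient. The notation below is only a sketch: it assumes the relationship is measured by a correlation $\rho$ between population and each indicator, which the notebook itself does not specify.

```latex
\begin{aligned}
H_0 &: \rho_{\text{pop},\,\text{PM2.5}} = 0 \quad \text{and} \quad \rho_{\text{pop},\,\text{beds}} = 0 \\
H_1 &: \rho_{\text{pop},\,\text{PM2.5}} > 0 \quad \text{and} \quad \rho_{\text{pop},\,\text{beds}} < 0
\end{aligned}
```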

## Data Acquisition
The `wbdata` library retrieves the three indicators from the World Bank API for 2010 through 2021.

In [None]:
import wbdata
import pandas as pd
import datetime

# Define time frame and countries
data_date = (datetime.datetime(2010, 1, 1), datetime.datetime(2021, 12, 31))
countries = ["USA", "CHN", "IND", "BRA", "ZAF"]

# Define indicators
indicators = {
    "EN.ATM.PM25.MC.M3": "PM2.5",
    "SP.POP.TOTL": "Population",
    "SH.MED.BEDS.ZS": "Hospital Beds per 1,000"
}

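# Pull all indicators into one DataFrame indexed by (country, date).
# Note: the `data_date` and `convert_date` keywords match wbdata releases before 1.0;
# later releases are believed to use `date` and `parse_dates` instead.
df = wbdata.get_dataframe(indicators, country=countries, data_date=data_date, convert_date=True)
# Move country and date from the index into regular columns
df = df.reset_index()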
df.head()

## Data Preparation
Rows with missing values are dropped, and a derived metric, PM2.5 per 1,000 people, is added.

In [None]:
# Drop rows where any of the three indicators is missing
df.dropna(inplace=True)

# Derived metric: PM2.5 concentration (µg/m³) divided by population in thousands
df['PM2.5 per 1000 people'] = df['PM2.5'] / (df['Population'] / 1000)

df.info()
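
A blanket `dropna` removes every row in which any indicator is missing, so a sparsely reported indicator can quietly remove most country-years. The sketch below shows one way to check this before dropping. It runs on a small made-up frame (`toy`, with hypothetical countries `A` and `B` and illustrative values, not World Bank data) and counts missing values and surviving rows per country.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the World Bank frame (values are made up).
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2010, 2011, 2012] * 2,
    "PM2.5": [10.0, 11.0, 12.0, 50.0, 52.0, np.nan],
    "Population": [1.0e6, 1.1e6, 1.2e6, 5.0e7, 5.1e7, 5.2e7],
    "Hospital Beds per 1,000": [3.0, np.nan, 2.9, np.nan, np.nan, 0.7],
})

value_cols = ["PM2.5", "Population", "Hospital Beds per 1,000"]

# Number of missing values per country and indicator.
missing = toy[value_cols].isna().groupby(toy["country"]).sum()

# Number of rows per country that survive a blanket dropna.
# reindex keeps countries that lose every row, showing them with a count of 0.
kept = (
    toy.dropna(subset=value_cols)
    .groupby("country")
    .size()
    .reindex(toy["country"].unique(), fill_value=0)
)

print(missing)
print(kept)
```

If a country ends up with very few rows, it may be worth reporting that alongside any correlation rather than treating the cleaned frame as complete.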

## Data Analysis
Correlation matrix (seaborn heatmap) and time-series plot (Plotly).

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Correlation matrix
corr = df[['PM2.5', 'Population', 'Hospital Beds per 1,000']].corr()
sns.heatmap(corr, annot=True)
plt.title("Correlation Matrix")
plt.show()
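
The heatmap reports correlation coefficients but not whether they are distinguishable from zero, which is what H₀ asks. As a minimal sketch, the hypothetical helper `permutation_pvalue` below estimates a two-sided p-value for a Pearson correlation with a permutation test. This is an illustrative technique, not one used in the notebook, and the input values are made up. Repeated yearly observations from the same country are not independent, so any p-value from the pooled data should be read cautiously.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Return (Pearson r, two-sided permutation p-value) for paired x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    # Shuffling y breaks its pairing with x, simulating "no relationship".
    permuted = np.array([
        np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    p = (np.sum(np.abs(permuted) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p

# Made-up values for illustration only (not World Bank data).
population = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
pm25 = [12.0, 15.0, 14.0, 20.0, 22.0, 21.0, 30.0, 33.0]

r, p = permutation_pvalue(population, pm25, n_perm=2000)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

With the real frame, the same call could take `df['Population']` and `df['PM2.5']`. `scipy.stats.pearsonr` provides an analytical p-value if SciPy is available.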

In [None]:
import plotly.express as px

# Line plot of PM2.5 by country
fig = px.line(df, x='date', y='PM2.5', color='country', title='PM2.5 Levels Over Time')
fig.show()
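
A line plot shows direction visually. A per-country slope puts a number on it. The sketch below fits a least-squares line to PM2.5 against year for each country, using made-up data (`toy`) and a hypothetical helper `yearly_slope`. It assumes a numeric `year` column, which the notebook's `date` column would need to be converted to (for example with `df['date'].dt.year`).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (made-up values, not World Bank figures).
toy = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "year": [2010, 2011, 2012, 2013] * 2,
    "PM2.5": [10.0, 9.0, 8.0, 7.0, 40.0, 42.0, 44.0, 46.0],
})

def yearly_slope(group):
    """Least-squares slope of PM2.5 against year, in µg/m³ per year."""
    # Centering the years keeps the fit well conditioned; the slope is unchanged.
    years = group["year"] - group["year"].mean()
    slope, _intercept = np.polyfit(years, group["PM2.5"], deg=1)
    return slope

# One slope per country: negative means PM2.5 is falling over the period.
slopes = {name: yearly_slope(g) for name, g in toy.groupby("country")}

for name, slope in slopes.items():
    print(f"{name}: {slope:+.2f} µg/m³ per year")
```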

## Summary & Actionable Insights
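This dashboard is intended to help decision-makers examine how population size relates to PM2.5 exposure and hospital bed availability. Because the World Bank indicators used here are reported annually, it supports year-over-year tracking of PM2.5 and healthcare capacity rather than real-time monitoring. That tracking can help policy makers identify high-risk countries and allocate resources accordingly.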