# Global Air Quality Analysis Dashboard (Enhanced)

### 📦 How to upload the whole project to Colab in one go:
1. **On your computer**: Zip the `src`, `data`, and `results` folders together with this notebook (`.ipynb`) into `project.zip`.
2. **On Colab**: Upload `project.zip` through the Files tab (the folder icon in the left sidebar).
3. **Run Cells 1 and 2**: Unzip the archive and install the dependencies.
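
If you would rather build the archive from a script than from your file manager, the following is a minimal sketch using only the Python standard library. The helper name `zip_project` is hypothetical, not part of the project. It stores paths relative to the project root, so unzipping in Colab recreates `src/`, `data/`, and `results/` directly in the working directory.

```python
import zipfile
from pathlib import Path


def zip_project(root, members, archive_name="project.zip"):
    """Bundle the listed files and folders under `root` into one zip archive."""
    root = Path(root)
    archive_path = root / archive_name
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for member in members:
            path = root / member
            if path.is_dir():
                # Add every file below the folder, keeping paths relative to root
                for file in sorted(path.rglob("*")):
                    if file.is_file():
                        zf.write(file, file.relative_to(root).as_posix())
            elif path.is_file():
                zf.write(path, path.relative_to(root).as_posix())
            # Members that do not exist are skipped silently
    return archive_path
```

A call such as `zip_project(".", ["src", "data", "results", "dashboard.ipynb"])` would then produce `project.zip` in the current folder; `dashboard.ipynb` is a placeholder for whatever this notebook is actually called.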

In [None]:
# Cell 1: Unzip (-o overwrites existing files so the cell can be re-run, -q keeps the output short)
!unzip -o -q project.zip

In [None]:
# Cell 2: Dependencies
!pip install pandas numpy matplotlib seaborn scikit-learn joblib plotly statsmodels nbformat

In [None]:
import os
import sys
import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.model_selection import train_test_split

# Make the unzipped `src` package importable from the working directory
sys.path.append(os.getcwd())
from src.infrastructure.data_loader import load_air_quality_data
from src.use_cases.data_cleaning import preprocess_pipeline
from src.use_cases.feature_engineering import calculate_aqi_index, encode_target
from src.use_cases.temporal_analysis import extract_temporal_features
from src.shared.config import POLLUTANTS, METEOROLOGICAL
from src.infrastructure.model_factory import run_all_models
from src.presentation.visualizer import plot_temporal_trends, plot_country_comparison

## 1. Data Preparation

In [None]:
df = load_air_quality_data()
df = calculate_aqi_index(df, POLLUTANTS)
df = preprocess_pipeline(df, POLLUTANTS)
df = extract_temporal_features(df)
df, _ = encode_target(df)
df.head()

## 2. Comparative Analysis (By Country)
Which countries have the highest average AQI?
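
The ranking comes down to a group-wise mean followed by a top-N selection. The sketch below shows the same idea on a tiny synthetic table (the values are made up for illustration and are not from the project's dataset). It also keeps the number of readings per country, since a plain mean gives a country with two readings the same weight as one with thousands. The function name `top_countries_by_aqi` is hypothetical.

```python
import pandas as pd

# Synthetic readings for illustration only; not taken from the project's dataset
readings = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "AQI": [40.0, 60.0, 150.0, 170.0, 90.0, 110.0],
})


def top_countries_by_aqi(df, n=15):
    """Return the n countries with the highest mean AQI, highest first."""
    return (
        df.groupby("Country")["AQI"]
        .agg(mean_aqi="mean", n_readings="count")  # mean plus sample size per country
        .nlargest(n, "mean_aqi")                   # keep the n largest means, sorted descending
        .reset_index()
    )


ranking = top_countries_by_aqi(readings, n=2)
print(ranking)
```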

In [None]:
# Mean AQI per country, highest first, limited to the top 15
country_avg = (
    df.groupby('Country')['AQI']
    .mean()
    .sort_values(ascending=False)
    .head(15)
    .reset_index()
)
fig = px.bar(
    country_avg,
    x='AQI',
    y='Country',
    orientation='h',
    color='AQI',
    title="Top 15 Most Polluted Countries (Comparative Analysis)",
)
fig.show()

## 3. Cycle Identification (Temporal Analysis)
Identifying monthly and seasonal cycles in air quality.
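
The cell below relies on `Month` and `Season` columns already being present in `df`. How `extract_temporal_features` derives them is not shown in this notebook, so the following is only a sketch of one common approach: it assumes a `Date` column and the meteorological seasons of the Northern Hemisphere, and the helper name `add_month_and_season` is hypothetical.

```python
import pandas as pd

# Assumed mapping: meteorological seasons for the Northern Hemisphere
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def add_month_and_season(df, date_col="Date"):
    """Return a copy of df with integer Month and string Season columns."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["Month"] = dates.dt.month
    out["Season"] = out["Month"].map(SEASON_BY_MONTH)
    return out


# Synthetic sample for illustration only
sample = pd.DataFrame({
    "Date": ["2023-01-15", "2023-01-20", "2023-07-04"],
    "AQI": [120.0, 100.0, 45.0],
})
monthly = (
    add_month_and_season(sample)
    .groupby(["Month", "Season"])["AQI"]
    .mean()
    .reset_index()
)
print(monthly)
```

A single global mapping like this treats every country as if it were in the Northern Hemisphere, which is worth keeping in mind when reading seasonal patterns across a worldwide dataset.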

In [None]:
monthly_avg = df.groupby(['Month', 'Season'])['AQI'].mean().reset_index()
fig = px.line(monthly_avg, x='Month', y='AQI', color='Season', markers=True, title="Monthly AQI Cycles")
fig.show()

## 4. Machine Learning Comparison

In [None]:
features = POLLUTANTS + METEOROLOGICAL
X = df[features]
y = df['AQI']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = run_all_models(X_train, X_test, y_train, y_test)
# Collect each model's R² into a table, best model first
perf = pd.DataFrame(
    [{"Model": n, "R2": d['metrics']['R2']} for n, d in results.items()]
).sort_values("R2", ascending=False)
px.bar(perf, x="Model", y="R2", color="R2", title="Model R² Comparison").show()
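
The comparison cell reads `d['metrics']['R2']` for every entry of `results`, so `run_all_models` presumably returns a dict keyed by model name. The project's actual model list is not shown in this notebook. The sketch below only illustrates a function with that return shape, using two scikit-learn regressors picked for illustration and synthetic data; `compare_models` and `demo_results` are hypothetical names.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_models(X_train, X_test, y_train, y_test):
    """Fit each candidate and report test-set R2 as {name: {"metrics": {"R2": value}}}."""
    candidates = {
        "Baseline (mean)": DummyRegressor(strategy="mean"),
        "Linear Regression": LinearRegression(),
    }
    results = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        r2 = r2_score(y_test, model.predict(X_test))
        results[name] = {"metrics": {"R2": r2}}
    return results


# Synthetic data: the target is a noisy linear function of two features
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)
splits = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
demo_results = compare_models(*splits)
for name, entry in demo_results.items():
    print(name, round(entry["metrics"]["R2"], 3))
```

Including a mean-predicting baseline gives the bar chart a reference point: a model is only useful if its R² is clearly above the baseline, whose test-set R² is at most zero.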