# Global Air Quality Analysis Dashboard (Colab Friendly)

### 📦 How to upload the whole project to Colab in one go:
1. **On your computer**: Right-click your project folder and select "Compress" or "Send to ZIP". Name it `project.zip`.
2. **On Colab**: Click the folder icon on the left, then the 'Upload' button, and select your `project.zip`.
3. **Run Cell 1**: Run the "Unzip Helper" below.
4. **Run Cell 2**: Install dependencies.
5. **Run the rest**: See the interactive dashboard!

## Cell 1: Unzip Project (Run this after uploading `project.zip`)

In [None]:
# -o overwrites existing files, so re-running this cell does not stop at a prompt
!unzip -o project.zip
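If the `!unzip` shell command is not available (for example, when the notebook runs outside Colab), the same step can be sketched with Python's standard `zipfile` module. The helper name `unzip_project` and its default arguments are assumptions for illustration and are not part of the project.

```python
import zipfile
from pathlib import Path


def unzip_project(archive="project.zip", target_dir="."):
    """Extract `archive` into `target_dir` and return the list of member names."""
    archive_path = Path(archive)
    if not archive_path.is_file():
        # Fail early with a readable message instead of a later import error
        raise FileNotFoundError(f"{archive_path} not found; upload it first.")
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

Calling `unzip_project()` in a cell extracts `project.zip` into the current working directory; the returned names make it easy to confirm that the `src` folder landed where the later imports expect it.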

## Cell 2: Install Dependencies

In [None]:
!pip install pandas numpy matplotlib seaborn scikit-learn joblib plotly statsmodels nbformat

## 1. Setup & Imports

In [None]:
import os
import sys
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
import joblib

# Ensure the current directory is in the path so we can import 'src'
sys.path.append(os.getcwd())

# Import the project's Clean Architecture modules
from src.infrastructure.data_loader import load_air_quality_data
from src.use_cases.data_cleaning import preprocess_pipeline
from src.use_cases.feature_engineering import calculate_aqi_index, encode_target
from src.shared.config import POLLUTANTS, METEOROLOGICAL
from src.infrastructure.model_factory import run_all_models

## 2. Load & Preprocess Data

In [None]:
df = load_air_quality_data()
if df is not None:
    df = calculate_aqi_index(df, POLLUTANTS)
    df = preprocess_pipeline(df, POLLUTANTS)
    df, _ = encode_target(df)
    print(f"Data loaded successfully! Shape: {df.shape}")
    display(df.head())
else:
    print("ERROR: Data file not found. Please ensure 'project.zip' was unzipped correctly.")

## 3. Interactive Data Exploration

In [None]:
fig_aqi = px.histogram(df, x="AQI", color="AQI_Category", 
                       title="AQI Distribution Across Records", 
                       template="plotly_dark")
fig_aqi.show()

fig_temp = px.scatter(df, x="Temperature", y="AQI", color="AQI_Category",
                      trendline="ols", hover_data=['City', 'Country'],
                      title="Relationship Between Temperature and AQI",
                      template="plotly_white")
fig_temp.show()

## 4. Model Training & Comparison

In [None]:
features = POLLUTANTS + METEOROLOGICAL
X = df[features]
y = df['AQI']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training models (this may take a few moments)...")
results = run_all_models(X_train, X_test, y_train, y_test)

# Display Model Performance Table
metrics = []
for name, data in results.items():
    metrics.append({"Model": name, "MSE": data['metrics']['MSE'], "R2 Score": data['metrics']['R2']})

df_results = pd.DataFrame(metrics).sort_values(by="R2 Score", ascending=False)
display(df_results)
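For readers unfamiliar with the two columns in the table, here is a minimal sketch of the standard definitions of MSE and R² in plain Python. It assumes `run_all_models` reports the usual regression metrics; the function names `mse` and `r2` and the sample values are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals (lower is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (1.0 is a perfect fit, 0.0 equals predicting the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical AQI values, not taken from the dataset
y_true = [50.0, 80.0, 120.0, 150.0]
y_pred = [55.0, 75.0, 125.0, 145.0]
print("MSE:", mse(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
```

MSE is expressed in squared AQI units, so it is only comparable between models trained on the same target, while R² is unitless and can be negative when a model performs worse than predicting the mean.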

## 5. Visualizing Model Performance

In [None]:
fig_models = px.bar(df_results, x="Model", y="R2 Score", color="R2 Score",
                    title="Machine Learning Model Comparison",
                    labels={'R2 Score': 'R² Score'})
fig_models.show()