# Air Quality Health Risk Prediction Model

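This notebook trains three classifiers (random forest, gradient boosting, and logistic regression) to predict the `health_risk_level` category from air quality data, then compares them on a held-out test split.

## Objectives:
1. Load the air quality data
2. Train multiple model types
3. Evaluate and compare the models
4. Save the best model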


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import sys
import warnings

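# Make the project's src/ directory importable; this assumes the notebook
# is run from a directory that sits next to src/.
sys.path.append(str(Path().absolute().parent / "src"))
from models.air_quality_model import AirQualityHealthRiskModel

# Suppress all warnings to keep cell output short; note that this can also hide useful diagnostics.
warnings.filterwarnings('ignore')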
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
%matplotlib inline

print("Libraries imported successfully!")


Libraries imported successfully!


In [2]:
# Load data
data_dir = Path().absolute().parent / "data" / "raw"
air_quality_file = data_dir / "air_quality_data.csv"

df = pd.read_csv(air_quality_file, parse_dates=['timestamp'])
print(f"✓ Loaded {len(df)} records")
print(f"Dataset shape: {df.shape}")
print(f"\nHealth risk levels: {df['health_risk_level'].value_counts().to_dict()}")


✓ Loaded 150 records
Dataset shape: (150, 11)

Health risk levels: {'moderate': 46, 'good': 44, 'very_unhealthy': 31, 'unhealthy': 27, 'unhealthy_sensitive': 2}
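
The counts above show that `unhealthy_sensitive` has only 2 records. As a rough check, the sketch below estimates how many test samples each class would receive under a 20% hold-out split (the `test_size=0.2` used for training later in this notebook), assuming the split is roughly proportional to class frequency. The helper names `expected_test_support` and `underrepresented`, and the `min_support` threshold, are illustrative and not part of the project code.

```python
# Class counts copied from the output above.
class_counts = {
    "moderate": 46,
    "good": 44,
    "very_unhealthy": 31,
    "unhealthy": 27,
    "unhealthy_sensitive": 2,
}


def expected_test_support(counts, test_size=0.2):
    """Expected number of test samples per class under a proportional split."""
    return {label: n * test_size for label, n in counts.items()}


def underrepresented(counts, test_size=0.2, min_support=1.0):
    """Labels whose expected test support is below min_support (hypothetical threshold)."""
    support = expected_test_support(counts, test_size)
    return [label for label, s in support.items() if s < min_support]


support = expected_test_support(class_counts)
rare = underrepresented(class_counts)
print(f"Classes likely missing from the test set: {rare}")
```

A class flagged this way may be entirely absent from the test set, in which case the reported metrics say nothing about how well the model handles it.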


In [3]:
# Train multiple model types
model_types = ['random_forest', 'gradient_boosting', 'logistic_regression']
results = {}

for model_type in model_types:
    print(f"\n{'='*60}")
    print(f"Training {model_type.upper()}")
    print(f"{'='*60}")
    
    model = AirQualityHealthRiskModel(model_type=model_type)
    metrics = model.train(df, target_col='health_risk_level', test_size=0.2)
    
    results[model_type] = {
        'model': model,
        'metrics': metrics
    }



Training RANDOM_FOREST
Training random_forest model...

Model Performance:
Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1-Score: 1.0000

Classification Report:
                precision    recall  f1-score   support

          good       1.00      1.00      1.00         9
      moderate       1.00      1.00      1.00         9
     unhealthy       1.00      1.00      1.00         6
very_unhealthy       1.00      1.00      1.00         6

      accuracy                           1.00        30
     macro avg       1.00      1.00      1.00        30
  weighted avg       1.00      1.00      1.00        30


Training GRADIENT_BOOSTING
Training gradient_boosting model...

Model Performance:
Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1-Score: 1.0000

Classification Report:
                precision    recall  f1-score   support

          good       1.00      1.00      1.00         9
      moderate       1.00      1.00      1.00         9
     unhealthy       1.00      1.00   
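
The reports above are computed on a test set of only 30 records, so even a perfect score leaves real uncertainty about the true accuracy. As a rough illustration, the sketch below computes a 95% Wilson score interval for an observed accuracy. The function name `wilson_interval` is illustrative and not part of the project code, and the interval treats test records as independent draws, which is an assumption.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = correct / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# 30 of 30 test records classified correctly, as in the reports above.
low, high = wilson_interval(correct=30, total=30)
print(f"95% interval for accuracy: [{low:.3f}, {high:.3f}]")
# prints: 95% interval for accuracy: [0.886, 1.000]
```

Under these assumptions, 30 correct out of 30 is still compatible with a true accuracy near 0.89, so the perfect scores are better read as "high on this split" than as proof of a flawless model. Cross-validation or a larger test set would narrow the interval.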

In [4]:
# Compare models and save best
comparison_df = pd.DataFrame({
    model_type: {
        'Accuracy': results[model_type]['metrics']['accuracy'],
        'Precision': results[model_type]['metrics']['precision'],
        'Recall': results[model_type]['metrics']['recall'],
        'F1-Score': results[model_type]['metrics']['f1_score'],
        'ROC-AUC': results[model_type]['metrics'].get('roc_auc')  # None (shown as NaN) if not reported
    }
    for model_type in model_types
})

print("Model Comparison:")
print(comparison_df.round(4))

# Select the best model by F1-score.
# On a tie, max() keeps the earliest entry in model_types.
best_model_type = max(model_types, key=lambda x: results[x]['metrics']['f1_score'])
best_model = results[best_model_type]['model']

models_dir = Path().absolute().parent / "models"
models_dir.mkdir(parents=True, exist_ok=True)
model_path = models_dir / f"air_quality_model_{best_model_type}.pkl"
best_model.save(str(model_path))
print(f"\n✓ Best model ({best_model_type}) saved to: {model_path}")


Model Comparison:
           random_forest  gradient_boosting  logistic_regression
Accuracy             1.0                1.0               0.7667
Precision            1.0                1.0               0.8514
Recall               1.0                1.0               0.7667
F1-Score             1.0                1.0               0.7414
ROC-AUC              NaN                NaN                  NaN
Model saved to /Users/faiqahmed/Desktop/Semesters/Semester7/MLOPS/PROJECT/models/air_quality_model_random_forest.pkl

✓ Best model (random_forest) saved to: /Users/faiqahmed/Desktop/Semesters/Semester7/MLOPS/PROJECT/models/air_quality_model_random_forest.pkl
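
In the comparison above, `random_forest` and `gradient_boosting` have identical F1-scores, and Python's `max()` returns the first maximal item it encounters, so the order of `model_types` decides the winner. A minimal sketch of making such ties explicit follows. The helper `models_tied_for_best` and the tolerance `tol` are illustrative names, and the scores are copied from the table above.

```python
import math

# F1-scores copied from the comparison table above.
f1_scores = {
    "random_forest": 1.0,
    "gradient_boosting": 1.0,
    "logistic_regression": 0.7414,
}


def models_tied_for_best(scores, tol=1e-9):
    """Return every model whose score is within tol of the best score."""
    best = max(scores.values())
    return [
        name
        for name, score in scores.items()
        if math.isclose(score, best, rel_tol=0.0, abs_tol=tol)
    ]


tied = models_tied_for_best(f1_scores)
if len(tied) > 1:
    print(f"Tie on F1-score between: {', '.join(tied)}")
```

When a tie is reported, a secondary criterion (for example training time or model size) could be used to choose between the candidates instead of relying on list order.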
