# 05 – Image-based Weather Severity Classifier

## Objectives

- Build a **lightweight image-based weather severity classifier** to support the
  winter tour demand & cancellation project.
- Use **hand-crafted features** extracted from mountain weather images (brightness,
  colourfulness, proportion of bright pixels, etc.) rather than heavy CNN frameworks.
- Train a small scikit-learn model to classify images into **mild / moderate / severe**
  weather categories.
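
To make the idea of hand-crafted features concrete, here is a minimal sketch of how the three features named above could be computed with NumPy alone. It is not the code in `src/image_features.py`, which is not shown in this notebook: the function name `sketch_weather_features`, the output keys, and the brightness threshold of 200 are all assumptions made for illustration. Colourfulness uses the Hasler-Süsstrunk metric as one plausible choice.

```python
import numpy as np


def sketch_weather_features(rgb: np.ndarray, bright_threshold: float = 200.0) -> dict:
    """Compute three simple features from an H x W x 3 RGB array (values 0..255).

    Hypothetical helper for illustration only; the threshold is an assumption.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # Per-pixel brightness as the plain mean of the three channels
    gray = rgb.mean(axis=2)

    # Hasler-Süsstrunk colourfulness: spread and offset of two opponent channels
    rg = r - g
    yb = 0.5 * (r + g) - b
    colourfulness = np.sqrt(rg.std() ** 2 + yb.std() ** 2) + 0.3 * np.sqrt(
        rg.mean() ** 2 + yb.mean() ** 2
    )

    return {
        "mean_brightness": float(gray.mean()),
        "colourfulness": float(colourfulness),
        "bright_pixel_ratio": float((gray > bright_threshold).mean()),
    }


# A uniform light-grey patch, loosely standing in for a white-out scene
snow = np.full((4, 4, 3), 240, dtype=np.uint8)
feats = sketch_weather_features(snow)
print(feats)
```

Each image collapses to one small dictionary of numbers, which is what lets a tabular model such as a random forest be used in place of a CNN.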

## Inputs

- Labelled images stored under:
  - `data/raw/weather_images/mild/`
  - `data/raw/weather_images/moderate/`
  - `data/raw/weather_images/severe/`
- Feature extraction logic from `src/image_features.py`.

## Outputs

- A trained scikit-learn classifier predicting `weather_severity` from image features.
- Saved model file: `models/weather_severity_model.pkl`.
- Basic evaluation metrics (accuracy, confusion matrix) to validate that the model
  is good enough as a **prototype** to support weather-based decision-making in the app.
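
As a small illustration of the two metrics named above, the sketch below computes accuracy and a labelled confusion matrix with scikit-learn. The `y_true` and `y_pred` lists are made-up toy values, not results from this project. Passing `labels=` fixes the row and column order so the matrix always reads mild, moderate, severe.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix

labels = ["mild", "moderate", "severe"]

# Toy values for illustration only (not outputs of the model in this notebook)
y_true = ["mild", "mild", "moderate", "severe", "severe", "moderate"]
y_pred = ["mild", "moderate", "moderate", "severe", "moderate", "moderate"]

acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, both in `labels` order
cm = confusion_matrix(y_true, y_pred, labels=labels)
cm_df = pd.DataFrame(
    cm,
    index=[f"true_{name}" for name in labels],
    columns=[f"pred_{name}" for name in labels],
)

print(f"Accuracy: {acc:.3f}")
print(cm_df)
```

Wrapping the matrix in a DataFrame is optional, but it makes it easier to see which severity classes get confused with each other.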


In [None]:
import os
from pathlib import Path
from typing import List, Dict

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.ensemble import RandomForestClassifier
import joblib

from src.image_features import extract_weather_features_from_bytes


In [None]:
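# Base project directory. `__file__` is not defined inside a notebook, so walk up
# from the working directory until a folder containing `src/` is found
# (the import above relies on `src/` being in the project root). Adjust if needed.
_cwd = Path.cwd().resolve()
BASE_DIR = next(p for p in (_cwd, *_cwd.parents) if (p / "src").is_dir())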

DATA_RAW = BASE_DIR / "data" / "raw" / "weather_images"
MODEL_PATH = BASE_DIR / "models" / "weather_severity_model.pkl"

# Simple label mapping: folder name -> class label
LABELS = ["mild", "moderate", "severe"]

DATA_RAW, MODEL_PATH, LABELS


In [None]:
def load_features_for_label(label: str) -> pd.DataFrame:
    """
    Load all images for a given label (e.g. 'mild') and extract weather features.

    Returns a DataFrame where each row corresponds to one image and includes:
    - the numeric features from `extract_weather_features_from_bytes`
    - a 'weather_label' column with the class name
    """
    label_dir = DATA_RAW / label
    rows: List[Dict] = []

    if not label_dir.exists():
        print(f"[WARN] Directory not found for label '{label}': {label_dir}")
        return pd.DataFrame()

    # Sort so the row order (and therefore the train/test split) is reproducible
    for fname in sorted(os.listdir(label_dir)):
        if not fname.lower().endswith((".jpg", ".jpeg", ".png", ".webp")):
            continue

        path = label_dir / fname
        with open(path, "rb") as f:
            image_bytes = f.read()

        feats = extract_weather_features_from_bytes(image_bytes)
        feats["weather_label"] = label
        feats["filename"] = fname
        rows.append(feats)

    df = pd.DataFrame(rows)
    print(f"Loaded {len(df)} images for label '{label}'")
    return df


In [None]:
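dfs = [load_features_for_label(label) for label in LABELS]

df = pd.concat(dfs, ignore_index=True)
if df.empty:
    # Fail early with a clear message instead of a KeyError in the next cell
    raise RuntimeError(f"No images found under {DATA_RAW}; check the folder layout.")

print(f"Total images: {len(df)}")
df.head()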


In [None]:
X = df.drop(columns=["weather_label", "filename"])
y = df["weather_label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

X_train.shape, X_test.shape


In [None]:
rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=None,
    random_state=42,
    n_jobs=-1,
)

rf.fit(X_train, y_train)


In [None]:
y_pred = rf.predict(X_test)

print("Classification report:\n")
print(classification_report(y_test, y_pred))

print("\nConfusion matrix:")
print(confusion_matrix(y_test, y_pred))


In [None]:
MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
joblib.dump(rf, MODEL_PATH)

MODEL_PATH


### Does this image-based model meet the project requirement?

This RandomForest classifier achieves reasonable performance on the small,
hand-labelled image dataset (see the classification report above), although the
scores come from a single stratified 80/20 split and are indicative only. The
goal of this model is **not** to be a production-grade weather recognition
system, but to:

- demonstrate how **image data can be turned into tabular features** and modelled,
- provide a **supportive severity label** (mild / moderate / severe) that aligns
  with the weather severity bins used elsewhere in the project,
- and show how an uploaded image in the Streamlit app can be converted into a
  simple, interpretable ML prediction.

For the purposes of this portfolio project, the model **successfully answers its
predictive task** as a prototype classifier for mountain weather severity.
