# 05 – Image-based Weather Severity Classifier

## Objectives

- Build a **lightweight image-based weather severity classifier** to support the
  winter tour demand & cancellation project.
- Use **hand-crafted features** extracted from mountain weather images (brightness,
  colourfulness, proportion of bright pixels, etc.) rather than heavy CNN frameworks.
- Train a small scikit-learn model to classify images into **mild / moderate / severe**
  weather categories.

## Inputs

- Labelled images stored under:
  - `data/raw/weather_images/mild/`
  - `data/raw/weather_images/moderate/`
  - `data/raw/weather_images/severe/`
- Feature extraction logic from `src/image_features.py`.
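
The real extraction code lives in `src/image_features.py` and is not reproduced in this notebook. As a rough illustration of what hand-crafted features of this kind can look like, here is a minimal NumPy sketch. The function name `sketch_weather_features`, the 0.9 brightness threshold, and the max-minus-min colourfulness measure are assumptions made for illustration, not the project's actual definitions.

```python
import numpy as np


def sketch_weather_features(rgb: np.ndarray, bright_threshold: float = 0.9) -> dict:
    """Illustrative features from an RGB array of shape (H, W, 3) scaled to [0, 1].

    Hypothetical stand-in for the project's extractor; all definitions are assumptions.
    """
    rgb = np.asarray(rgb, dtype=float)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")

    # Per-pixel brightness: plain average of the three channels
    brightness = rgb.mean(axis=2)
    # Per-pixel colourfulness proxy: gap between the strongest and weakest channel
    colourfulness = rgb.max(axis=2) - rgb.min(axis=2)

    feats = {
        "brightness_mean": float(brightness.mean()),
        "brightness_std": float(brightness.std()),
        # Share of pixels brighter than the (assumed) threshold
        "bright_ratio": float((brightness > bright_threshold).mean()),
        "colourfulness_mean": float(colourfulness.mean()),
    }
    # Per-channel mean and standard deviation
    for i, channel in enumerate("rgb"):
        feats[f"{channel}_mean"] = float(rgb[..., i].mean())
        feats[f"{channel}_std"] = float(rgb[..., i].std())
    return feats


# A flat grey image: mid brightness, no bright pixels, no colour
grey = np.full((4, 4, 3), 0.5)
print(sketch_weather_features(grey))
```

Each image becomes one flat dictionary of numbers, so a folder of images turns into an ordinary table that a scikit-learn model can consume. An encoded image file would first need decoding into such an array and scaling to the [0, 1] range.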

## Outputs

- A trained scikit-learn classifier predicting `weather_severity` from image features.
- Saved model file: `models/weather_severity_model.pkl`.
- Basic evaluation metrics (accuracy, confusion matrix) to validate that the model
  is good enough as a **prototype** to support weather-based decision-making in the app.


In [None]:
import os
import sys
from pathlib import Path
from typing import List, Dict

# Make sure the project root is on sys.path 
CURRENT_DIR = Path.cwd()
PROJECT_ROOT = None

for parent in [CURRENT_DIR] + list(CURRENT_DIR.parents):
    if (parent / "src").exists():
        sys.path.append(str(parent))
        PROJECT_ROOT = parent
        print(f"Added project root to sys.path: {parent}")
        break
else:
    print("⚠️ Could not find 'src' directory in any parent folders.")

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.ensemble import RandomForestClassifier
import joblib

from src.image_features import extract_weather_features_from_bytes




Added project root to sys.path: c:\Users\tomgo\OneDrive\Documents\vscode-projects\winter-mountain-tours-demand-predictor


In [None]:
from pathlib import Path

# Use the project root discovered in the first cell if available
if "PROJECT_ROOT" in globals() and PROJECT_ROOT is not None:
    BASE_DIR = PROJECT_ROOT
else:
    # Fallback
    BASE_DIR = Path.cwd().parent

DATA_RAW = BASE_DIR / "data" / "raw" / "weather_images"
MODEL_PATH = BASE_DIR / "models" / "weather_severity_model.pkl"

# Simple label mapping
LABELS = ["mild", "moderate", "severe"]

DATA_RAW, MODEL_PATH, LABELS



(WindowsPath('c:/Users/tomgo/OneDrive/Documents/vscode-projects/winter-mountain-tours-demand-predictor/data/raw/weather_images'),
 WindowsPath('c:/Users/tomgo/OneDrive/Documents/vscode-projects/winter-mountain-tours-demand-predictor/models/weather_severity_model.pkl'),
 ['mild', 'moderate', 'severe'])

In [15]:
def load_features_for_label(label: str) -> pd.DataFrame:
    """
    Load all images for a given label (e.g. 'mild') and extract weather features.

    Returns a DataFrame where each row corresponds to one image and includes:
    - the numeric features from `extract_weather_features_from_bytes`
    - a 'weather_label' column with the class name
    """
    label_dir = DATA_RAW / label
    rows: List[Dict] = []

    if not label_dir.exists():
        print(f"[WARN] Directory not found for label '{label}': {label_dir}")
        return pd.DataFrame()

    # Sort so that the row order is deterministic across operating systems
    for fname in sorted(os.listdir(label_dir)):
        if not fname.lower().endswith((".jpg", ".jpeg", ".png", ".webp")):
            continue

        path = label_dir / fname
        image_bytes = path.read_bytes()

        feats = extract_weather_features_from_bytes(image_bytes)
        feats["weather_label"] = label
        feats["filename"] = fname
        rows.append(feats)

    df = pd.DataFrame(rows)
    print(f"Loaded {len(df)} images for label '{label}'")
    return df


In [16]:
dfs = []
for label in LABELS:
    df_label = load_features_for_label(label)
    dfs.append(df_label)

df = pd.concat(dfs, ignore_index=True)
print(f"Total images: {len(df)}")
df.head()


Loaded 22 images for label 'mild'
Loaded 21 images for label 'moderate'
Loaded 21 images for label 'severe'
Total images: 64


| | brightness_mean | brightness_std | r_mean | g_mean | b_mean | r_std | g_std | b_std | bright_ratio | colourfulness_mean | weather_label | filename |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.543765 | 0.198116 | 0.559966 | 0.549165 | 0.522166 | 0.088159 | 0.150838 | 0.294055 | 0.029735 | 0.112519 | mild | mild_1.jpg |
| 1 | 0.512514 | 0.212463 | 0.431594 | 0.520957 | 0.584991 | 0.161169 | 0.160586 | 0.267929 | 0.026606 | 0.109025 | mild | mild_10.jpg |
| 2 | 0.437123 | 0.246063 | 0.389538 | 0.437973 | 0.483859 | 0.224 | 0.230664 | 0.27168 | 0.029337 | 0.065773 | mild | mild_11.jpg |
| 3 | 0.45863 | 0.208089 | 0.404684 | 0.457032 | 0.514174 | 0.18142 | 0.189051 | 0.235057 | 0.01997 | 0.064714 | mild | mild_12.jpg |
| 4 | 0.451532 | 0.234703 | 0.37377 | 0.445224 | 0.535603 | 0.205739 | 0.215127 | 0.25198 | 0.029058 | 0.077314 | mild | mild_13.jpg |


In [17]:
X = df.drop(columns=["weather_label", "filename"])
y = df["weather_label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

X_train.shape, X_test.shape


((51, 10), (13, 10))

In [18]:
rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=None,
    random_state=42,
    n_jobs=-1,
)

rf.fit(X_train, y_train)


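RandomForestClassifier parameters:

| Parameter | Value |
|---|---|
| n_estimators | 200 |
| criterion | 'gini' |
| max_depth | None |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_features | 'sqrt' |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |
| bootstrap | True |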


In [19]:
y_pred = rf.predict(X_test)

print("Classification report:\n")
print(classification_report(y_test, y_pred))

print("\nConfusion matrix:")
print(confusion_matrix(y_test, y_pred))


Classification report:

              precision    recall  f1-score   support

        mild       0.80      0.80      0.80         5
    moderate       0.60      0.75      0.67         4
      severe       1.00      0.75      0.86         4

    accuracy                           0.77        13
   macro avg       0.80      0.77      0.77        13
weighted avg       0.80      0.77      0.78        13


Confusion matrix:
[[4 1 0]
 [1 3 0]
 [0 1 3]]
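
As a cross-check, the headline numbers in the report can be recomputed directly from the confusion matrix printed above (rows are true labels, columns are predicted labels, in the order mild, moderate, severe). This is a small NumPy sketch; the variable names are illustrative.

```python
import numpy as np

labels = ["mild", "moderate", "severe"]
cm = np.array([[4, 1, 0],
               [1, 3, 0],
               [0, 1, 3]])

# Correct predictions sit on the diagonal
correct = np.diag(cm)
accuracy = correct.sum() / cm.sum()

# Recall divides by the row totals (true counts per class),
# precision by the column totals (predicted counts per class)
recall = correct / cm.sum(axis=1)
precision = correct / cm.sum(axis=0)

print(f"accuracy: {accuracy:.2f}")  # 10 of 13 correct -> 0.77
for name, p, r in zip(labels, precision, recall):
    print(f"{name:>8}: precision={p:.2f} recall={r:.2f}")
```

With only 13 test images, a single misclassification moves accuracy by about 8 percentage points.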


In [20]:
MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
joblib.dump(rf, MODEL_PATH)

MODEL_PATH


WindowsPath('c:/Users/tomgo/OneDrive/Documents/vscode-projects/winter-mountain-tours-demand-predictor/models/weather_severity_model.pkl')

### Does this image-based model meet the project requirement?

This RandomForest classifier labels 10 of the 13 held-out test images correctly
(77% accuracy, macro F1 0.77; see the classification report above). All three
errors involve the moderate class, and no mild image is confused with a severe
one. The goal of this model is **not** to be a production-grade weather
recognition system, but to:

- demonstrate how **image data can be turned into tabular features** and modelled,
- provide a **supportive severity label** (mild / moderate / severe) that aligns
  with the weather severity bins used elsewhere in the project,
- and show how an uploaded image in the Streamlit app can be converted into a
  simple, interpretable ML prediction.

For the purposes of this portfolio project, the model **successfully answers its
predictive task** as a prototype classifier for mountain weather severity.
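
The last bullet above, turning an uploaded image into a prediction, could look roughly like the sketch below. It assumes the model was fitted on a DataFrame (so scikit-learn recorded `feature_names_in_`) and that the features arrive as a dict like the one returned by `extract_weather_features_from_bytes`. The helper name `predict_severity` is hypothetical and not part of the project code.

```python
import pandas as pd


def predict_severity(model, feats: dict) -> dict:
    """Return the predicted label and class probabilities for one image.

    Hypothetical helper: `model` is any fitted scikit-learn classifier that
    exposes `feature_names_in_`, `classes_` and `predict_proba`.
    """
    # Use exactly the columns seen during training, in the same order;
    # extra keys such as a filename are ignored
    columns = list(model.feature_names_in_)
    missing = [c for c in columns if c not in feats]
    if missing:
        raise KeyError(f"missing features: {missing}")
    row = pd.DataFrame([{c: feats[c] for c in columns}], columns=columns)

    # One row in, one probability vector out
    proba = model.predict_proba(row)[0]
    best = int(proba.argmax())
    return {
        "label": str(model.classes_[best]),
        "probabilities": {str(c): float(p) for c, p in zip(model.classes_, proba)},
    }
```

In the app, this would be called with `joblib.load(MODEL_PATH)` as the model and the features extracted from the raw bytes of the uploaded file. Returning the probabilities alongside the label lets the interface show how confident the prototype is rather than a bare class name.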
