# Campus Waste Intelligence – Model Testing

This notebook loads pre‑trained forecasting models for different canteen sections and validates their predictions on recent data. The models (XGBoost, Random Forest, SVM, Prophet, SARIMA, LSTM, and simple baselines) were saved during the training phase and are now tested to ensure they load correctly and produce plausible outputs.

In [2]:
!pip install pmdarima

Collecting pmdarima
  Downloading pmdarima-2.1.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (8.5 kB)
Downloading pmdarima-2.1.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (689 kB)
Installing collected packages: pmdarima
Successfully installed pmdarima-2.1.1


In [3]:
import os
import joblib
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from prophet import Prophet
import pmdarima as pm

# Configuration
LOOKBACK = 7
LSTM_HIDDEN = 50
LSTM_LAYERS = 2

In [4]:
class LSTMModel(nn.Module):
    """Simple LSTM for univariate time series forecasting."""
    def __init__(self, input_size=1, hidden_size=LSTM_HIDDEN,
                 num_layers=LSTM_LAYERS, output_size=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x shape: (batch, seq_len, input_size)
        out, _ = self.lstm(x)
        # Use output of last time step
        out = self.fc(out[:, -1, :])
        return out.squeeze()

## 1. Mount Google Drive and Set Working Directory

All data and model files are stored on Google Drive. We mount the drive and navigate to the project folder.

In [5]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
os.chdir("/content/drive/MyDrive/UAB/FDS/campus-waste-intelligence")
print("Current working directory:", os.getcwd())

Mounted at /content/drive
Current working directory: /content/drive/MyDrive/UAB/FDS/campus-waste-intelligence


## 2. Load and Prepare Data

We read the cleaned food waste records and aggregate them by day and canteen section. The result is a wide time series where each column represents a section.

In [6]:
DATA_PATH = 'data/food_waste_cleaned.csv'
df = pd.read_csv(DATA_PATH, parse_dates=['Date'])

# Aggregate daily waste per section
daily_section = (
    df.groupby(['Date', 'Canteen_Section'])['Waste_Weight_kg']
      .sum()
      .reset_index()
      .rename(columns={'Waste_Weight_kg': 'Total_Waste_kg'})
)

# Pivot to wide format (one column per section)
daily_wide = (
    daily_section
    .pivot(index='Date', columns='Canteen_Section', values='Total_Waste_kg')
    .fillna(0)
    .sort_index()
    .asfreq('D')
    .fillna(0)
)

print("Daily data shape:", daily_wide.shape)
daily_wide.head()

Daily data shape: (61, 4)


| Date | A | B | C | D |
|---|---|---|---|---|
| 2025-06-11 | 29.9 | 26.85 | 27.72 | 37.59 |
| 2025-06-12 | 32.99 | 34.36 | 33.39 | 31.45 |
| 2025-06-13 | 32.99 | 28.56 | 30.4 | 31.51 |
| 2025-06-14 | 21.74 | 26.84 | 35.79 | 32.74 |
| 2025-06-15 | 30.39 | 32.22 | 29.09 | 32.4 |
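
To make the two `fillna(0)` calls in the pivot easier to follow, here is a minimal sketch on made-up records (the dates and weights are hypothetical, not taken from `food_waste_cleaned.csv`). The first fill covers a section with no entry on a day that other sections did record. The second covers calendar days that `asfreq('D')` inserts because nothing was logged at all.

```python
import pandas as pd

# Hypothetical toy records: section B has no entry on 2025-01-02,
# and nothing at all was logged on 2025-01-03.
records = pd.DataFrame({
    "Date": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-01-02", "2025-01-04"]),
    "Canteen_Section": ["A", "B", "A", "B"],
    "Waste_Weight_kg": [1.5, 2.0, 3.0, 4.0],
})

daily = (
    records.groupby(["Date", "Canteen_Section"])["Waste_Weight_kg"]
    .sum()
    .reset_index()
)

wide = (
    daily.pivot(index="Date", columns="Canteen_Section", values="Waste_Weight_kg")
    .fillna(0)      # section missing on a day that has other records
    .sort_index()
    .asfreq("D")    # inserts 2025-01-03 as an all-NaN row
    .fillna(0)      # fills the inserted calendar day
)
print(wide)
```

One consequence worth keeping in mind: after this step a zero can mean either "no waste" or "no record", and the lag and rolling features treat both the same way.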


## 3. Feature Engineering Function

The models were trained on a set of features including calendar variables, lagged values, and rolling statistics. We recreate exactly the same features for each section so that the test input matches the training format.

In [7]:
def create_features_for_series(series):
    """
    Build a DataFrame with features used during training.
    The input series is a daily time series (pandas Series with DatetimeIndex).
    """
    df_ml = pd.DataFrame(index=series.index)
    df_ml['y'] = series.values
    df_ml['dayofweek'] = df_ml.index.dayofweek
    df_ml['day']       = df_ml.index.day
    df_ml['month']     = df_ml.index.month
    df_ml['quarter']   = df_ml.index.quarter
    df_ml['weekend']   = (df_ml.index.dayofweek >= 5).astype(int)

    # Lags
    for lag in [1, 2, 3, 7, 14]:
        df_ml[f'lag_{lag}'] = df_ml['y'].shift(lag)

    # Rolling statistics (using shifted values to avoid leakage)
    shifted = df_ml['y'].shift(1)
    df_ml['rolling_mean_7'] = shifted.rolling(7).mean()
    df_ml['rolling_std_7']  = shifted.rolling(7).std()
    df_ml['rolling_min_7']  = shifted.rolling(7).min()
    df_ml['rolling_max_7']  = shifted.rolling(7).max()
    df_ml['ewm_mean_7']     = shifted.ewm(span=7).mean()

    # Drop rows with NaN (created by shifts and rolling windows)
    df_ml.dropna(inplace=True)
    return df_ml
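
As a quick sanity check of the function above, the sketch below rebuilds a subset of the same features on a synthetic 61-day series (the values are hypothetical; only the date range mirrors `daily_wide`). It illustrates two properties: the longest lag (14) determines how many warm-up rows `dropna` removes, and the shifted rolling mean at a given day uses only the seven preceding days.

```python
import numpy as np
import pandas as pd

# Hypothetical 61-day series covering the same dates as daily_wide.
idx = pd.date_range("2025-06-11", periods=61, freq="D")
y = pd.Series(np.arange(61, dtype=float), index=idx)

feats = pd.DataFrame({"y": y})
for lag in [1, 2, 3, 7, 14]:
    feats[f"lag_{lag}"] = feats["y"].shift(lag)

shifted = feats["y"].shift(1)
feats["rolling_mean_7"] = shifted.rolling(7).mean()
feats = feats.dropna()

# lag_14 is NaN for the first 14 days, so 61 - 14 = 47 rows remain.
print(len(feats), feats.index.min().date())  # 47 2025-06-25

# The rolling mean on the first kept day equals the mean of the 7 days before it,
# so the current day's target never enters its own feature.
first = feats.index.min()
window = y.loc[first - pd.Timedelta(days=7): first - pd.Timedelta(days=1)]
assert len(window) == 7
assert np.isclose(feats.loc[first, "rolling_mean_7"], window.mean())
```

The first kept date, 2025-06-25, follows directly from the 14-day lag applied to data starting on 2025-06-11.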

## 4. Generate Feature DataFrames for All Sections

We apply the feature function to each section’s time series and store the results in a dictionary.

In [8]:
feature_dfs = {}
for sec in daily_wide.columns:
    feature_dfs[sec] = create_features_for_series(daily_wide[sec])

## 5. Align All Sections to a Common Date Range

Different sections may have slightly different start/end dates due to missing data. We clip all DataFrames to the longest common period so that we can compare fairly.

In [9]:
common_start = max(df_sec.index.min() for df_sec in feature_dfs.values())
common_end   = min(df_sec.index.max() for df_sec in feature_dfs.values())
print(f"Common date range: {common_start.date()} to {common_end.date()}")

for sec in feature_dfs:
    feature_dfs[sec] = feature_dfs[sec].loc[common_start:common_end]

Common date range: 2025-06-25 to 2025-08-10
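
The clipping logic can be illustrated on two hypothetical frames with different coverage. The sketch also adds a guard for the case where the sections share no dates at all; that guard is an assumption about what would be useful here, not something the notebook does.

```python
import pandas as pd

# Hypothetical frames: A covers 2025-01-01..2025-01-10, B covers 2025-01-03..2025-01-08.
frames = {
    "A": pd.DataFrame({"y": range(10)}, index=pd.date_range("2025-01-01", periods=10)),
    "B": pd.DataFrame({"y": range(6)}, index=pd.date_range("2025-01-03", periods=6)),
}

start = max(f.index.min() for f in frames.values())  # latest first date
end = min(f.index.max() for f in frames.values())    # earliest last date
if start > end:
    raise ValueError("Sections share no common dates.")

# Label-based slicing on a DatetimeIndex includes both endpoints.
aligned = {name: f.loc[start:end] for name, f in frames.items()}
print({name: len(f) for name, f in aligned.items()})
```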


## 6. Load and Test Deployed Models

The saved models are stored in the `deployment_models/` folder. We loop over every `.joblib` file (excluding baseline models, which are handled separately), load the corresponding artifacts, and run a quick prediction on the last five available data points. This verifies that the model loads correctly and that the prediction pipeline works.

For each model type we use the appropriate method:
- **XGBoost / Random Forest**: direct `.predict()` on the feature matrix.
- **SVM**: features are scaled with the saved scaler before prediction.
- **Prophet**: requires a DataFrame with a `ds` column containing the future dates.
- **SARIMA**: uses `.predict(n_periods=...)`.
- **LSTM**: rebuilds the network from the saved state dict and forecasts one step ahead using the last `lookback` days.
- **Baselines** (Naive, Seasonal Naive, MA(7)): the artifact stores the last 14 actual values; we print the last five as a sample (the baseline forecasting logic can be added if needed).
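
For the baselines, a minimal sketch of what the forecasting logic could look like is given below. The helper name `baseline_forecast`, its parameters, and the weekly season length of 7 are assumptions for illustration. The definitions used in the training notebook are not shown here and may differ, in particular for MA(7), which is treated as a flat forecast equal to the mean of the last seven values.

```python
import numpy as np

def baseline_forecast(last_values, model_name, horizon=5, season=7):
    """Forecast `horizon` steps from recent history (hypothetical helper)."""
    hist = np.asarray(last_values, dtype=float)
    if model_name == "Naive":
        # Repeat the most recent observation.
        return np.full(horizon, hist[-1])
    if model_name == "Seasonal Naive":
        if len(hist) < season:
            raise ValueError("Need at least one full season of history.")
        # Step h repeats the value observed one season earlier.
        last_season = hist[-season:]
        return np.array([last_season[h % season] for h in range(horizon)])
    if model_name == "MA(7)":
        # Flat forecast at the mean of the last 7 observations (assumed convention).
        return np.full(horizon, hist[-7:].mean())
    raise ValueError(f"Unknown baseline: {model_name}")

# Made-up history of 14 values, matching the stored length.
history = list(range(1, 15))
print(baseline_forecast(history, "Naive"))
print(baseline_forecast(history, "Seasonal Naive"))
print(baseline_forecast(history, "MA(7)"))
```

In the notebook, `last_values` would come from `artifacts['last_values']`.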

In [11]:
MODEL_DIR = 'deployment_models'

# Get all section model files (excluding baseline models, which have 'baseline' in their filename)
model_files = [f for f in os.listdir(MODEL_DIR) if f.endswith('.joblib') and 'baseline' not in f]
print(f"Found {len(model_files)} model files.\n")

for mf in model_files:
    path = os.path.join(MODEL_DIR, mf)
    artifacts = joblib.load(path)

    sec = artifacts['section']
    model_name = artifacts['model_name']
    print(f"\n--- Testing {model_name} for Location {sec} ---")

    # Get the corresponding feature DataFrame for this section
    df_sec = feature_dfs[sec]
    feature_cols = artifacts['feature_columns']
    # Take the last 5 rows as a sample test set
    X_sample = df_sec[feature_cols].iloc[-5:]

    # Prediction depends on model type
    if model_name in ('XGBoost', 'Random Forest'):
        model = artifacts['model']
        preds = model.predict(X_sample)
        print("Predictions:", preds)

    elif model_name == 'SVM':
        model = artifacts['model']
        scaler = artifacts['scaler']
        X_scaled = scaler.transform(X_sample)
        preds = model.predict(X_scaled)
        print("Predictions:", preds)

    elif model_name == 'Prophet':
        model = artifacts['model']
        # Prophet expects a DataFrame with column 'ds' (dates)
        future = pd.DataFrame({'ds': X_sample.index})
        forecast = model.predict(future)
        preds = forecast['yhat'].values
        print("Predictions:", preds)

    elif model_name == 'SARIMA':
        model = artifacts['model']
        # SARIMA's predict method needs the number of steps ahead.
        # Note: the forecast starts right after the end of the data the model
        # was fitted on, which is not necessarily the dates in X_sample.
        preds = model.predict(n_periods=len(X_sample))
        print("Predictions:", preds)

    elif model_name == 'LSTM':
        # LSTM requires special handling: load state dict and rebuild model
        state_dict_path = os.path.join(MODEL_DIR, artifacts['state_dict_path'])
        lstm_model = LSTMModel()
        lstm_model.load_state_dict(torch.load(state_dict_path, map_location='cpu'))
        lstm_model.eval()
        lookback = artifacts.get('lookback', 7)

        # Use the original target series to get the last 'lookback' values
        y_series = df_sec['y']
        if len(y_series) >= lookback:
            last_values = y_series.iloc[-lookback:].values.reshape(1, lookback, 1)
            inp = torch.tensor(last_values, dtype=torch.float32)
            with torch.no_grad():
                pred = lstm_model(inp).item()
            print("Prediction for next day:", pred)
        else:
            print("Not enough history for LSTM test.")

    elif model_name in ('Naive', 'Seasonal Naive', 'MA(7)'):
        # Baseline models: the artifact stores the last 14 actual values
        last_vals = artifacts['last_values']
        print("Baseline model - last 5 stored values:", last_vals[-5:])  # show last 5 as a sample
        # (You could implement the actual baseline forecasting logic here)

    else:
        print(f"Unknown model type: {model_name}")


Found 4 model files.


--- Testing XGBoost for Location D ---
Predictions: [30.095644 28.456524 29.044254 29.65153  28.70296 ]

--- Testing XGBoost for Location B ---
Predictions: [29.5882   30.7317   31.589533 32.723454 35.911526]

--- Testing Random Forest for Location C ---
Predictions: [25.988775 26.48315  27.21354  25.9371   29.5486  ]

--- Testing XGBoost for Location A ---
Predictions: [32.837654 32.955177 32.745777 32.64806  25.466908]
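
The loop above only prints predictions, so whether they are plausible is judged by eye. One way to make that check explicit is sketched below. The helper name `plausibility_report`, the `margin` parameter, and the choice of checks (finite, non-negative, within a widened historical range, mean absolute error against the actual values) are all assumptions for illustration rather than part of the original pipeline.

```python
import numpy as np

def plausibility_report(preds, actuals, history, margin=0.5):
    """Summarise simple sanity checks for a batch of predictions (hypothetical helper)."""
    preds = np.asarray(preds, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    history = np.asarray(history, dtype=float)

    # Allow predictions to exceed the observed range by `margin` times its width.
    span = history.max() - history.min()
    lower = history.min() - margin * span
    upper = history.max() + margin * span

    return {
        "finite": bool(np.isfinite(preds).all()),
        "non_negative": bool((preds >= 0).all()),
        "within_range": bool(((preds >= lower) & (preds <= upper)).all()),
        "mae": float(np.mean(np.abs(preds - actuals))),
    }

# Made-up numbers, not results from the notebook.
report = plausibility_report(preds=[25.0, 35.0], actuals=[27.0, 31.0],
                             history=[20.0, 30.0, 40.0])
print(report)
```

Inside the loop this could be called as `plausibility_report(preds, df_sec['y'].iloc[-5:], df_sec['y'])` for the feature-based models. If the models were trained on data that includes these last five days, the resulting MAE is an in-sample figure and should not be read as an estimate of forecasting accuracy.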
