# Modern Traffic Volume Forecasting (Metro Interstate)

This notebook builds a **modern traffic forecasting project** on top of the
widely used **Metro Interstate Traffic Volume** dataset (Kaggle).

We focus on short-term traffic volume prediction with:

- Solid time-series EDA.
- Feature engineering with calendar, weather and lags.
- Baseline and tree-based models (sklearn).
- A **sequence deep learning model (LSTM)** for traffic volume.
- An optional section sketching a **Temporal Fusion Transformer** setup.


## 0. How to run this notebook

1. Download the **Metro Interstate Traffic Volume** dataset from Kaggle.
2. Save it as:

   ```text
   data/traffic_volume.csv
   ```

3. Install required packages:

   ```bash
   pip install numpy pandas matplotlib scikit-learn torch
   ```

   For the optional Temporal Fusion Transformer section:

   ```bash
   pip install pytorch-lightning pytorch-forecasting
   ```

4. Run this notebook top-to-bottom in Jupyter / VS Code.


## 1. Imports and basic config


In [ ]:
from __future__ import annotations

from pathlib import Path
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import HistGradientBoostingRegressor

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

plt.rcParams["figure.figsize"] = (11, 5)
RANDOM_STATE: int = 42
np.random.seed(RANDOM_STATE)
torch.manual_seed(RANDOM_STATE)

DATA_PATH: Path = Path("data") / "traffic_volume.csv"
if not DATA_PATH.exists():
    raise FileNotFoundError(
        f"Expected dataset at {DATA_PATH.resolve()}\n"
        "Download 'Metro Interstate Traffic Volume' and save as 'data/traffic_volume.csv'."
    )

raw_df: pd.DataFrame = pd.read_csv(DATA_PATH)
raw_df.head()

## 2. Basic cleaning and timestamp handling

We parse the `date_time` column, sort by time, move the rows onto an hourly
index and convert the numeric fields (traffic volume and weather).


In [ ]:
def clean_traffic_df(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the raw Metro traffic dataframe.

    Parameters
    ----------
    df : pd.DataFrame
        Raw dataframe as loaded from the Kaggle CSV.

    Returns
    -------
    pd.DataFrame
        Dataframe on a regular hourly ``DatetimeIndex`` with the original
        columns. Hours missing from the raw data are kept as all-NaN rows so
        that row shifts equal fixed time offsets; they are dropped later,
        after the lag features are built.
    """
    df = df.copy()
    if "date_time" not in df.columns:
        raise ValueError("Expected a 'date_time' column in traffic dataset.")

    df["timestamp"] = pd.to_datetime(df["date_time"], errors="coerce")
    # Stable sort keeps the file order of rows that share a timestamp
    df = df.dropna(subset=["timestamp"]).sort_values("timestamp", kind="stable")

    # The raw CSV can hold several rows for the same hour (e.g. one per weather
    # condition); asfreq needs a unique index, so keep the first row per hour.
    df = df.drop_duplicates(subset="timestamp", keep="first")

    # Set index and reindex onto a regular hourly grid ("h" = hourly alias)
    df = df.set_index("timestamp").asfreq("h")

    # Basic numeric conversions
    num_cols: List[str] = [
        "traffic_volume",
        "temp",
        "rain_1h",
        "snow_1h",
        "clouds_all",
    ]
    for col in num_cols:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")

    return df


df: pd.DataFrame = clean_traffic_df(raw_df)
df.head()
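
The raw CSV can contain more than one row for the same hour (for example one row per reported weather condition), and `asfreq` refuses to reindex a non-unique index. A small synthetic example, with made-up values, of deduplicating before moving onto a regular hourly grid:

```python
import pandas as pd

# Synthetic rows (hypothetical values): 01:00 appears twice, 03:00 is missing
raw = pd.DataFrame(
    {
        "date_time": [
            "2016-01-01 00:00:00",
            "2016-01-01 01:00:00",
            "2016-01-01 01:00:00",
            "2016-01-01 02:00:00",
            "2016-01-01 04:00:00",
        ],
        "weather_main": ["Clear", "Rain", "Mist", "Rain", "Clear"],
        "traffic_volume": [500, 400, 400, 300, 900],
    }
)
raw["timestamp"] = pd.to_datetime(raw["date_time"])

# Reindexing onto an hourly grid fails while timestamps are duplicated
try:
    raw.set_index("timestamp").asfreq("h")
    asfreq_failed = False
except ValueError:
    asfreq_failed = True

# Keep the first row per hour, then reindex: 03:00 becomes an explicit NaN row
hourly = (
    raw.drop_duplicates(subset="timestamp", keep="first")
    .set_index("timestamp")
    .asfreq("h")
)

print(asfreq_failed)  # True
print(hourly["traffic_volume"].tolist())  # [500.0, 400.0, 300.0, nan, 900.0]
```

On a regular grid, a row shift of `k` is exactly a `k`-hour lag, which is what the lag features in section 3 rely on.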

### 2.1 Quick EDA: traffic volume over time


In [ ]:
df["traffic_volume"].plot(alpha=0.7)
plt.title("Traffic volume over time")
plt.ylabel("Vehicles / hour")
plt.show()

# Daily pattern example
sample_start: pd.Timestamp = df.index.min() + pd.Timedelta(days=7)
sample_end: pd.Timestamp = sample_start + pd.Timedelta(days=7)
sample = df.loc[sample_start:sample_end]
sample["traffic_volume"].plot()
plt.title("Example week – hourly traffic volume")
plt.ylabel("Vehicles / hour")
plt.show()
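
Beyond a single example week, a common summary of the daily cycle is the mean volume per hour of day, split into weekdays and weekends. The sketch below runs that `groupby` on a synthetic series with made-up commute peaks; on the real data the same grouping keys can be built from `df.index`.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series (hypothetical): commute peaks on weekdays only
idx = pd.date_range("2016-01-04", periods=24 * 14, freq="h")  # two weeks from a Monday
hour = np.asarray(idx.hour)
is_weekend = np.asarray(idx.dayofweek) >= 5
base = 1000 + 300 * np.sin(2 * np.pi * (hour - 6) / 24)
peaks = np.where(np.isin(hour, [7, 8, 16, 17]) & ~is_weekend, 2500, 0)
toy = pd.DataFrame({"traffic_volume": base + peaks}, index=idx)

# Mean volume by hour of day, one column for weekdays and one for weekends
profile = (
    toy.groupby([is_weekend, hour])["traffic_volume"]
    .mean()
    .unstack(level=0)
    .rename(columns={False: "weekday", True: "weekend"})
)
print(profile.loc[[7, 8, 12]])
```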

## 3. Feature engineering

We build a supervised learning table with:

- Calendar features: hour of day, day of week, month, weekend/holiday flags.
- Weather features: temperature, rain, snow, clouds.
- Lag features: previous 1, 2, 3, 24, 24*7 hours of traffic volume.
- Target: next hour's traffic volume.


In [ ]:
def add_calendar_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add basic calendar features to the traffic dataframe.

    Parameters
    ----------
    df : pd.DataFrame
        Dataframe indexed by datetime.

    Returns
    -------
    pd.DataFrame
        Same dataframe with added calendar columns.
    """
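    df = df.copy()
    df["hour"] = df.index.hour
    df["dayofweek"] = df.index.dayofweek
    df["is_weekend"] = df["dayofweek"].isin([5, 6]).astype(int)
    df["month"] = df.index.month

    if "holiday" in df.columns:
        # Non-holidays are the string "None" in the CSV (recent pandas versions
        # read it as NaN), so treat both as "no holiday"
        named = df["holiday"].notna() & (df["holiday"] != "None")
        # Spread the flag to every hour of that calendar day, in case only
        # some hours of a holiday carry the label
        df["is_holiday"] = (
            named.groupby(df.index.normalize()).transform("max").astype(int)
        )
    return df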


def add_lagged_target(df: pd.DataFrame, target_col: str, lags: List[int]) -> pd.DataFrame:
    """Add lagged versions of a target column.

    Each lag `k` creates a column `f"{target_col}_lag_{k}"`.
    """
    df = df.copy()
    for lag in lags:
        df[f"{target_col}_lag_{lag}"] = df[target_col].shift(lag)
    return df


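def build_supervised_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Build a supervised learning table for next-hour traffic forecasting.

    The target is the traffic volume one step ahead. Only numeric columns are
    kept, so text fields such as `date_time`, `holiday` and `weather_main`
    are dropped before rows with missing values are removed.
    """
    df_feat = add_calendar_features(df)
    df_feat = add_lagged_target(df_feat, "traffic_volume", [1, 2, 3, 24, 24 * 7])

    # Target: next hour's traffic volume
    df_feat["target"] = df_feat["traffic_volume"].shift(-1)

    # Drop text columns first: otherwise a mostly-NaN text column (e.g.
    # `holiday`) would make dropna() discard almost every row
    df_feat = df_feat.select_dtypes(include="number").dropna()
    return df_feat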


sup_df: pd.DataFrame = build_supervised_frame(df)
sup_df.head()
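
`shift(k)` is a `k`-hour lag only when consecutive rows are one hour apart. A toy check on a synthetic ramp (value = hour index, not the real data) makes the alignment of lags and target explicit:

```python
import numpy as np
import pandas as pd

# Synthetic ramp: the value at each hour equals its hour index (0, 1, 2, ...)
idx = pd.date_range("2016-01-01", periods=200, freq="h")
s = pd.Series(np.arange(200, dtype=float), index=idx)

toy = pd.DataFrame({"traffic_volume": s})
for lag in [1, 2, 3, 24, 24 * 7]:
    toy[f"traffic_volume_lag_{lag}"] = s.shift(lag)
toy["target"] = s.shift(-1)  # next hour's value
toy = toy.dropna()

# Row t: lag_k holds t - k and target holds t + 1
print(toy.iloc[0][["traffic_volume", "traffic_volume_lag_24", "target"]].tolist())
# [168.0, 144.0, 169.0]
```

The first usable row is 168 hours in, because the weekly lag needs a full week of history.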

### 3.1 Train / validation / test split

We keep the last 20% of observations as test, the previous 20% as validation,
and the rest as training data (time-ordered).


In [ ]:
def time_series_train_val_test_split(
    df: pd.DataFrame,
    val_frac: float = 0.2,
    test_frac: float = 0.2,
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Time-ordered train/val/test split.

    Parameters
    ----------
    df : pd.DataFrame
        Supervised dataframe sorted by time.
    val_frac : float
        Fraction of data to allocate to validation.
    test_frac : float
        Fraction of data to allocate to test.

    Returns
    -------
    (train_df, val_df, test_df)
    """
    n: int = len(df)
    n_test: int = int(n * test_frac)
    n_val: int = int(n * val_frac)
    n_train: int = n - n_val - n_test
    if min(n_train, n_val, n_test) <= 0:
        raise ValueError("Each split must contain at least one row.")

    # Explicit positions: a negative slice like df.iloc[-0:] would return
    # the whole frame if a split size rounded down to zero
    train_df = df.iloc[:n_train]
    val_df = df.iloc[n_train : n_train + n_val]
    test_df = df.iloc[n_train + n_val :]
    return train_df, val_df, test_df


train_df, val_df, test_df = time_series_train_val_test_split(sup_df)
len(train_df), len(val_df), len(test_df)
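
A cheap guard against temporal leakage is to check that each split ends strictly before the next one starts. A sketch with a hypothetical helper, `assert_chronological`, shown on toy frames; on the real splits it would be called as `assert_chronological(train_df, val_df, test_df)`.

```python
import pandas as pd


def assert_chronological(*frames: pd.DataFrame) -> None:
    """Raise if any frame does not end strictly before the next one starts.

    Hypothetical sanity-check helper, not part of any library.
    """
    for earlier, later in zip(frames, frames[1:]):
        if not earlier.index.max() < later.index.min():
            raise AssertionError("Splits overlap in time or are out of order.")


# Toy example: 10 hourly rows split 6 / 2 / 2, as a 20% / 20% split would do
idx = pd.date_range("2016-01-01", periods=10, freq="h")
toy = pd.DataFrame({"x": range(10)}, index=idx)
train, val, test = toy.iloc[:6], toy.iloc[6:8], toy.iloc[8:]
assert_chronological(train, val, test)  # passes silently

# Splitting after shuffling (here a fixed permutation) leaks future rows
shuffled = toy.iloc[[3, 7, 0, 9, 5, 1, 8, 2, 6, 4]]
try:
    assert_chronological(shuffled.iloc[:6], shuffled.iloc[6:8], shuffled.iloc[8:])
    leak_detected = False
except AssertionError:
    leak_detected = True
print(leak_detected)  # True
```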

### 3.2 Feature/target matrices and scaling


In [ ]:
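# Use every numeric column except the target. The current hour's
# `traffic_volume` is known at forecast time, so it stays in as a feature
# (it is exactly what the naive baseline below uses).
numeric_cols = sup_df.select_dtypes(include="number").columns
feature_cols: List[str] = [c for c in numeric_cols if c != "target"]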

X_train = train_df[feature_cols].to_numpy()
y_train = train_df["target"].to_numpy()
X_val = val_df[feature_cols].to_numpy()
y_val = val_df["target"].to_numpy()
X_test = test_df[feature_cols].to_numpy()
y_test = test_df["target"].to_numpy()

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

X_train_scaled.shape, X_val_scaled.shape, X_test_scaled.shape
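
The scaler is fitted on the training block only and then reused unchanged, so no statistics from later periods leak into the features. The numpy equivalent, on synthetic data with a made-up upward drift in the later period (StandardScaler uses the population standard deviation, `ddof=0`):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic features (hypothetical): the later "validation" period drifts upwards
X_tr = rng.normal(loc=10.0, scale=2.0, size=(500, 3))
X_va = rng.normal(loc=12.0, scale=2.0, size=(100, 3))

# Statistics come from the training block only, as with scaler.fit_transform(X_train)
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)  # population std (ddof=0), matching StandardScaler

X_tr_s = (X_tr - mu) / sigma
X_va_s = (X_va - mu) / sigma  # same transform, no refitting

print(X_tr_s.mean(axis=0))  # close to 0 by construction
print(X_va_s.mean(axis=0))  # clearly positive: the drift stays visible
```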

## 4. Metrics helper


In [ ]:
def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def evaluate_regression(y_true: np.ndarray, y_pred: np.ndarray) -> Dict[str, float]:
    """Compute common regression metrics."""
    return {
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "rmse": rmse(y_true, y_pred),
    }


def print_metrics(name: str, y_true: np.ndarray, y_pred: np.ndarray) -> None:
    """Print metrics for a model with a short label."""
    metrics = evaluate_regression(y_true, y_pred)
    print(f"{name}: MAE={metrics['mae']:.2f}, RMSE={metrics['rmse']:.2f}")
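
A worked example with made-up values shows how the two metrics differ: RMSE squares the errors before averaging, so the single 30-vehicle miss weighs more heavily than it does in MAE.

```python
import numpy as np

# Hypothetical values: three hourly actuals and predictions
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])

errors = y_pred - y_true  # [10, -10, 30]
mae = float(np.mean(np.abs(errors)))  # (10 + 10 + 30) / 3
rmse_val = float(np.sqrt(np.mean(errors**2)))  # sqrt((100 + 100 + 900) / 3)

print(round(mae, 2), round(rmse_val, 2))  # 16.67 19.15
```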

## 5. Baseline models

We use simple baselines based on the target itself:

- **Naïve**: predict that next hour's volume equals the current hour's volume.
- **Daily seasonal naïve**: predict the volume observed at the same hour on
  the previous day, i.e. 24 hours before the target hour (23 hours before
  the current hour).


In [ ]:
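# Naive: the current hour's volume is the forecast for the next hour
naive_pred = test_df["traffic_volume"].to_numpy()

# Seasonal naive: the target is the volume at t + 1, so "same hour yesterday"
# is t + 1 - 24 = t - 23. Look it up by timestamp in the hourly frame `df`.
daily_naive_pred = (
    df["traffic_volume"].reindex(test_df.index - pd.Timedelta(hours=23)).to_numpy()
)
# That hour may be missing in the raw data; score only where it exists
has_daily = ~np.isnan(daily_naive_pred)

print_metrics("Naive", y_test, naive_pred)
print_metrics("DailyNaive", y_test[has_daily], daily_naive_pred[has_daily])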
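
Why 23 and not 24: on a synthetic, perfectly daily-periodic series (made-up values), predicting the next hour from the value 23 hours before the forecast origin is exact, while the lag-24 value is one hour out of phase.

```python
import numpy as np
import pandas as pd

# Synthetic, perfectly daily-periodic series (hypothetical values)
idx = pd.date_range("2016-01-01", periods=24 * 5, freq="h")
hour = np.asarray(idx.hour)
s = pd.Series(1000 + 500 * np.sin(2 * np.pi * hour / 24), index=idx)

target = s.shift(-1)  # value at t + 1, as in the supervised table
pred_lag24 = s.shift(24)  # same hour yesterday relative to the origin t
pred_lag23 = s.shift(23)  # same hour yesterday relative to the target t + 1

mask = target.notna() & pred_lag24.notna()
mae_24 = (target - pred_lag24)[mask].abs().mean()
mae_23 = (target - pred_lag23)[mask].abs().mean()
print(mae_23, mae_24)  # lag 23 is exact here; lag 24 is not
```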

## 6. HistGradientBoostingRegressor (tree-based, modern sklearn)

We now train a **HistGradientBoostingRegressor**, which is a strong baseline
for tabular time-series forecasting. Tree models do not need scaled inputs;
the scaled matrices are reused here only for convenience.


In [ ]:
hgb = HistGradientBoostingRegressor(
    max_depth=8,
    learning_rate=0.05,
    max_iter=500,
    random_state=RANDOM_STATE,
)
hgb.fit(X_train_scaled, y_train)

y_val_pred_hgb = hgb.predict(X_val_scaled)
y_test_pred_hgb = hgb.predict(X_test_scaled)

print_metrics("HGB (val)", y_val, y_val_pred_hgb)
print_metrics("HGB (test)", y_test, y_test_pred_hgb)

### 6.1 Visual check – predictions vs actuals (test set)


In [ ]:
n_plot: int = 7 * 24  # one week
plt.plot(test_df.index[:n_plot], y_test[:n_plot], label="actual")
plt.plot(test_df.index[:n_plot], y_test_pred_hgb[:n_plot], label="HGB", linestyle="--")
plt.title("Test set – actual vs HGB predictions (first week)")
plt.ylabel("Traffic volume")
plt.legend()
plt.show()

## 7. Sequence deep learning model (LSTM)

Next we build a simple **LSTM-based sequence model**:

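- Input: sliding windows of the last `seq_len` rows of scaled features.
- Output: next-hour traffic volume.
- Trained with MSE loss and Adam.

Windows are cut from consecutive rows of the supervised table, so a window can
straddle a gap in the raw data; this simple setup does not correct for that.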


In [ ]:
class TrafficSequenceDataset(Dataset):
    """PyTorch dataset for sliding window sequences.

    Each item is a pair (X_seq, y), where X_seq has shape
    (seq_len, n_features) and y is a scalar target.
    """

    def __init__(
        self,
        features: np.ndarray,
        targets: np.ndarray,
        seq_len: int,
    ) -> None:
        assert len(features) == len(targets), "Features and targets must align."
        self.features = features.astype(np.float32)
        self.targets = targets.astype(np.float32)
        self.seq_len = seq_len

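    def __len__(self) -> int:
        # One window per possible end row
        return max(0, len(self.features) - self.seq_len + 1)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor]:
        x_seq = self.features[idx : idx + self.seq_len]
        # `targets` already holds the next-hour volume for each row, so the
        # label of a window is the target on its last row
        y = self.targets[idx + self.seq_len - 1]
        return torch.from_numpy(x_seq), torch.tensor(y)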


class LSTMRegressor(nn.Module):
    """Simple many-to-one LSTM regressor for time series."""

    def __init__(
        self,
        n_features: int,
        hidden_size: int = 64,
        num_layers: int = 2,
        dropout: float = 0.1,
    ) -> None:
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout,
        )
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        output, _ = self.lstm(x)
        last_hidden = output[:, -1, :]
        return self.fc(last_hidden).squeeze(-1)
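
Each LSTM sample is a window of `seq_len` consecutive rows. Because the `target` column already holds the next hour's volume, the label belonging to a window is the target on its last row, which gives `n - seq_len + 1` windows. A vectorised numpy sketch of the same indexing on toy arrays (assumes numpy >= 1.20 for `sliding_window_view`):

```python
import numpy as np

# Hypothetical toy table: 6 rows, 2 features; the target column is already
# shifted, so targets[i] is the value to forecast from row i
features = np.arange(12, dtype=np.float32).reshape(6, 2)
targets = np.array([10, 11, 12, 13, 14, 15], dtype=np.float32)
seq_len = 3

# sliding_window_view appends the window axis last: (n_windows, n_features, seq_len)
windows = np.lib.stride_tricks.sliding_window_view(features, seq_len, axis=0)
windows = windows.transpose(0, 2, 1)  # -> (n_windows, seq_len, n_features)
window_targets = targets[seq_len - 1 :]  # target of each window's last row

print(windows.shape)  # (4, 3, 2)
print(window_targets.tolist())  # [12.0, 13.0, 14.0, 15.0]
```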


In [ ]:
seq_len: int = 24  # use past 24 hours
batch_size: int = 64
n_features: int = X_train_scaled.shape[1]

train_dataset = TrafficSequenceDataset(X_train_scaled, y_train, seq_len)
val_dataset = TrafficSequenceDataset(X_val_scaled, y_val, seq_len)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMRegressor(n_features=n_features, hidden_size=64, num_layers=2, dropout=0.2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

def train_lstm(
    model: nn.Module,
    train_loader: DataLoader,
    val_loader: DataLoader,
    n_epochs: int = 10,
) -> None:
    """Train the LSTM model with basic logging."""
    for epoch in range(1, n_epochs + 1):
        model.train()
        train_losses: List[float] = []
        for X_batch, y_batch in train_loader:
            X_batch = X_batch.to(device)
            y_batch = y_batch.to(device)
            optimizer.zero_grad()
            preds = model(X_batch)
            loss = criterion(preds, y_batch)
            loss.backward()
            optimizer.step()
            train_losses.append(float(loss.item()))

        model.eval()
        val_losses: List[float] = []
        with torch.no_grad():
            for X_batch, y_batch in val_loader:
                X_batch = X_batch.to(device)
                y_batch = y_batch.to(device)
                preds = model(X_batch)
                loss = criterion(preds, y_batch)
                val_losses.append(float(loss.item()))

        print(
            f"Epoch {epoch:02d} | "
            f"train_loss={np.mean(train_losses):.4f} | val_loss={np.mean(val_losses):.4f}"
        )


# Training the LSTM may take a few minutes depending on hardware.
train_lstm(model, train_loader, val_loader, n_epochs=10)
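
The LSTM above is trained on raw volumes (thousands of vehicles per hour), so the logged MSE is in squared vehicles per hour and the output layer has to learn that scale from its initialisation, which can slow convergence. A common alternative, not used in the cells above, is to standardise the target with training statistics and undo the transform before computing metrics. The helpers below (hypothetical names `to_scaled` / `from_scaled`) sketch the numpy side; the datasets would then be built from `to_scaled(y_train)` and `to_scaled(y_val)`, and model outputs passed through `from_scaled` before `print_metrics`.

```python
import numpy as np

# Hypothetical training targets in vehicles/hour
y_tr = np.array([500.0, 1500.0, 3000.0, 4500.0, 6000.0])

# Fit on training targets only
y_mean = float(y_tr.mean())
y_std = float(y_tr.std())


def to_scaled(y: np.ndarray) -> np.ndarray:
    """Map vehicles/hour to zero-mean, unit-variance units."""
    return (y - y_mean) / y_std


def from_scaled(z: np.ndarray) -> np.ndarray:
    """Map model outputs back to vehicles/hour before computing metrics."""
    return z * y_std + y_mean


scaled = to_scaled(y_tr)
restored = from_scaled(scaled)
print(scaled.round(3), restored)
```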

### 7.1 Evaluate the LSTM on the test set

We construct test sequences and compare against the HGB baseline.


In [ ]:
test_dataset = TrafficSequenceDataset(X_test_scaled, y_test, seq_len)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

model.eval()
y_test_lstm: List[np.ndarray] = []
y_true_lstm: List[np.ndarray] = []

with torch.no_grad():
    for X_batch, y_batch in test_loader:
        X_batch = X_batch.to(device)
        preds = model(X_batch).cpu().numpy()
        y_test_lstm.append(preds)
        y_true_lstm.append(y_batch.numpy())

y_test_lstm_arr = np.concatenate(y_test_lstm)
y_true_lstm_arr = np.concatenate(y_true_lstm)

print_metrics("LSTM (test)", y_true_lstm_arr, y_test_lstm_arr)

### 7.2 Visual comparison: HGB vs LSTM


In [ ]:
n_plot_seq: int = 7 * 24
# Position in the test set of the first LSTM window's target
offset: int = len(y_test) - len(y_true_lstm_arr)
plot_idx = test_df.index[offset : offset + n_plot_seq]
plt.plot(plot_idx, y_true_lstm_arr[:n_plot_seq], label="actual")
plt.plot(plot_idx, y_test_pred_hgb[offset : offset + n_plot_seq], label="HGB")
plt.plot(plot_idx, y_test_lstm_arr[:n_plot_seq], label="LSTM", linestyle="--")
plt.title("Test set – actual vs HGB vs LSTM (first week with sequences)")
plt.ylabel("Traffic volume")
plt.legend()
plt.show()

## 8. Optional: Temporal Fusion Transformer (TFT) sketch

Below is a **sketch** of how you could set up a Temporal Fusion Transformer
using `pytorch-forecasting`. It needs `pytorch-forecasting` installed and may
need adapting to your library version and data pipeline, but it shows the core
ideas and configuration.


In [ ]:
# This cell is illustrative and may need adapting to your library versions.
try:
    # pytorch-forecasting >= 1.0 builds on `lightning.pytorch`; older releases
    # use `pytorch_lightning`. The Trainer must match the installed version.
    try:
        import lightning.pytorch as pl
    except ImportError:
        import pytorch_lightning as pl
    from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer
    from pytorch_forecasting.metrics import QuantileLoss

    has_tft = True
except ImportError:
    has_tft = False
    print("pytorch-forecasting not installed; skip TFT example or install it first.")

if has_tft:
    # Encoder: one week of history; decoder: the next 24 hours
    max_encoder_length = 24 * 7
    max_prediction_length = 24

    df_tft = sup_df.copy()
    # Hours since the first timestamp, so gaps in the raw data appear as
    # missing time steps instead of being silently closed up
    df_tft["time_idx"] = (
        (df_tft.index - df_tft.index[0]) / pd.Timedelta(hours=1)
    ).astype(int)
    df_tft["series"] = "0"  # single series id, stored as a string categorical

    training_cutoff = df_tft["time_idx"].max() - max_prediction_length * 2

    # Training set: only rows up to the cutoff
    train_tft = TimeSeriesDataSet(
        df_tft[df_tft["time_idx"] <= training_cutoff],
        time_idx="time_idx",
        target="target",
        group_ids=["series"],
        max_encoder_length=max_encoder_length,
        max_prediction_length=max_prediction_length,
        time_varying_unknown_reals=["target"],
        time_varying_known_reals=["time_idx", "hour", "dayofweek", "is_weekend", "month"],
        allow_missing_timesteps=True,
    )
    # Validation set reuses the training encoders/normalizers and predicts
    # the last window of the full series
    val_tft = TimeSeriesDataSet.from_dataset(
        train_tft, df_tft, predict=True, stop_randomization=True
    )
    train_loader_tft = train_tft.to_dataloader(train=True, batch_size=64)
    val_loader_tft = val_tft.to_dataloader(train=False, batch_size=64)

    tft = TemporalFusionTransformer.from_dataset(
        train_tft,
        learning_rate=1e-3,
        hidden_size=32,
        attention_head_size=4,
        dropout=0.1,
        loss=QuantileLoss(),
    )

    trainer = pl.Trainer(max_epochs=10, accelerator="auto")
    trainer.fit(tft, train_dataloaders=train_loader_tft, val_dataloaders=val_loader_tft)
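
`QuantileLoss` trains the TFT with the pinball (quantile) loss at each of its quantile levels, which is what makes the forecast probabilistic. For one level `q` and error `e = y - y_hat` the loss is `max(q * e, (q - 1) * e)`. A numpy sketch with made-up numbers:

```python
import numpy as np


def pinball_loss(y: np.ndarray, y_hat: np.ndarray, q: float) -> np.ndarray:
    """Elementwise quantile (pinball) loss for quantile level q."""
    err = y - y_hat
    return np.maximum(q * err, (q - 1) * err)


y = np.array([100.0])
for y_hat in (90.0, 110.0):  # under- and over-prediction by 10 vehicles/hour
    for q in (0.1, 0.5, 0.9):
        print(y_hat, q, float(pinball_loss(y, np.array([y_hat]), q)[0]))
```

With `q = 0.9`, under-predicting by 10 costs 9 while over-predicting by 10 costs 1, which pushes that quantile's forecast toward the upper part of the distribution.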


## 9. Summary

In this notebook we:

- Built a **modern traffic forecasting** pipeline on the Metro dataset.
- Engineered calendar and lag features and used a **time-aware split**.
- Compared simple baselines with a strong tree-based model
  (**HistGradientBoostingRegressor**).
- Implemented a **sequence LSTM model** and compared it against HGB.
- Sketched how to plug in a **Temporal Fusion Transformer** using
  `pytorch-forecasting` for probabilistic, attention-based forecasting.

From here you can:

- Extend to multi-step (24h) forecasting.
- Add richer weather sources or incident flags.
- Move the LSTM / TFT into a proper training loop with GPU and logging.
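
For the multi-step extension, one simple starting point is the "direct" strategy: build one shifted target per horizon instead of a single next-hour target, then fit one model per horizon or a single multi-output model. A sketch under the same regular hourly grid assumption; `add_multi_horizon_targets` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def add_multi_horizon_targets(series: pd.Series, horizon: int = 24) -> pd.DataFrame:
    """Build targets y(t+1), ..., y(t+horizon) for direct multi-step forecasting.

    Hypothetical helper; assumes `series` sits on a regular hourly index.
    """
    return pd.DataFrame(
        {f"target_h{h}": series.shift(-h) for h in range(1, horizon + 1)}
    )


# Synthetic ramp: value = hour index, so target_hK - value == K on every row
idx = pd.date_range("2016-01-01", periods=48, freq="h")
s = pd.Series(np.arange(48, dtype=float), index=idx)
targets = add_multi_horizon_targets(s, horizon=24).dropna()
print(targets.shape)  # (24, 24)
```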
