<a href="https://colab.research.google.com/github/basugautam/Reproducibility-Challenge-Project/blob/Architecture-Files/1implenetation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
### 📦 Step 1: Installing Required Libraries

#In this step, we install all necessary Python packages to perform deep learning, data handling, and visualization. These include:
#- `torch`, `torchvision`: for building and training neural networks.
#- `pandas`, `numpy`: for handling and manipulating time series data.
#- `scikit-learn`: for data splitting and preprocessing utilities.
#- `matplotlib`: for plotting graphs to visualize error and loss.
#- `einops`: for tensor manipulation, commonly used in transformer models.

#These packages form the basic toolkit for our forecasting project using loss shaping constraints.


In [2]:
# STEP 1: Install Required Libraries
!pip install einops
!pip install torch torchvision
!pip install pandas numpy matplotlib scikit-learn

# STEP 2: Download the Dataset
!wget -nc https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTm1.csv  # -nc: skip if already downloaded

import pandas as pd
import numpy as np
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
import matplotlib.pyplot as plt


--2025-04-10 00:05:08--  https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTm1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10360719 (9.9M) [text/plain]
Saving to: ‘ETTm1.csv.3’


2025-04-10 00:05:08 (95.8 MB/s) - ‘ETTm1.csv.3’ saved [10360719/10360719]



In [3]:
###  Step 2: Downloading the Time Series Dataset

#We use the **ETTm1** dataset (15-minute resolution Electricity Transformer Temperature) from the ETT benchmark, which is widely used for long-term time series forecasting.

#The file is downloaded directly from GitHub and contains readings taken every 15 minutes from an electricity transformer: six load features and the oil temperature (`OT`). We'll preprocess it in the next step to prepare inputs for our model.
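#Before windowing, a quick sanity check can confirm the row spacing and the feature count. The sketch assumes the CSV has a `date` column followed by seven numeric columns (`HUFL`, `HULL`, `MUFL`, `MULL`, `LUFL`, `LULL`, `OT`). The frame below is a hypothetical synthetic stand-in, so the cell runs without the download. With the real file, build `df` from `pd.read_csv("ETTm1.csv")` instead.

```python
import pandas as pd

# Hypothetical stand-in for ETTm1.csv: same column layout, synthetic values
columns = ["date", "HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]
df = pd.DataFrame({
    "date": pd.date_range("2016-07-01 00:00", periods=8, freq="15min").astype(str),
    **{c: range(8) for c in columns[1:]},
})

timestamps = pd.to_datetime(df["date"])
interval = timestamps.diff().dropna().mode().iloc[0]  # most common spacing between rows
n_features = df.shape[1] - 1                           # every column except 'date'
print(interval, n_features)
```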


In [4]:
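def load_and_preprocess(path, input_len=96, pred_len=96):
    df = pd.read_csv(path)
    df = df.drop(columns=['date'])
    data = df.values.astype(np.float32)
    # Per-feature z-score (the seven columns have very different scales).
    # Note: statistics come from the full series, so they include the validation period.
    mean, std = data.mean(axis=0), data.std(axis=0)
    data = (data - mean) / std

    X, Y = [], []
    # +1 so the last complete (input, target) window is included
    for i in range(len(data) - input_len - pred_len + 1):
        X.append(data[i:i+input_len])
        Y.append(data[i+input_len:i+input_len+pred_len])
    # Stack into one array first; torch.tensor() on a list of ndarrays is very slow
    return torch.from_numpy(np.stack(X)), torch.from_numpy(np.stack(Y))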

X, Y = load_and_preprocess("ETTm1.csv")
dataset = TensorDataset(X, Y)
loader = DataLoader(dataset, batch_size=32, shuffle=True)




In [5]:
###  Step 3: Preprocessing and Windowing the Time Series Data

#Here, we:
#- Remove the date column and normalize the feature values using z-score normalization.
#- Create rolling windows to prepare the data for supervised learning:
#  - `input_len`: number of historical steps used for prediction.
#  - `pred_len`: number of future steps the model should forecast.

#Each sample becomes a pair `(input_window, prediction_window)`. These are then converted into PyTorch tensors for use in training and evaluation.
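#The Python loop in `load_and_preprocess` copies every window one at a time. A vectorized alternative, sketched here with NumPy's `sliding_window_view`, builds the same `(input, target)` pairs without a loop. It also computes per-feature statistics on the first 80% of the series only (an assumed training span), so validation data does not leak into the normalization. The series is synthetic, and `make_windows` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(data, input_len=96, pred_len=96):
    """Return (X, Y) arrays of shape (N, input_len, F) and (N, pred_len, F)."""
    total = input_len + pred_len
    # Windowing along the time axis gives shape (N, F, total) with N = T - total + 1
    windows = sliding_window_view(data, window_shape=total, axis=0)
    windows = windows.transpose(0, 2, 1)  # -> (N, total, F)
    return windows[:, :input_len], windows[:, input_len:]

# Synthetic series: 1,000 steps, 7 features (stand-in for ETTm1)
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 7)).astype(np.float32)

# Per-feature z-score using statistics from the assumed training span only
train_end = int(0.8 * len(data))
mean = data[:train_end].mean(axis=0)
std = data[:train_end].std(axis=0)
data = (data - mean) / std

X, Y = make_windows(data)
print(X.shape, Y.shape)
```

#The returned windows are read-only views. Copy them (for example with `np.array(X)`) before passing them to `torch.from_numpy`.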


In [6]:
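class SimpleTransformer(nn.Module):
    def __init__(self, input_dim, hidden_dim=64, input_len=96, pred_len=96):
        super().__init__()
        self.input_dim = input_dim
        self.pred_len = pred_len
        self.encoder = nn.Linear(input_dim, hidden_dim)
        # Learned positional embedding: without it, attention + mean pooling ignores step order
        self.pos_emb = nn.Parameter(torch.randn(1, input_len, hidden_dim) * 0.02)
        self.transformer = nn.TransformerEncoder(
            # batch_first=True because inputs are (batch, time, features)
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4, batch_first=True),
            num_layers=2
        )
        # Forecast every feature at every future step
        self.decoder = nn.Linear(hidden_dim, pred_len * input_dim)

    def forward(self, x):                      # x: (B, input_len, input_dim)
        x = self.encoder(x) + self.pos_emb     # (B, input_len, hidden_dim)
        x = self.transformer(x)
        x = x.mean(dim=1)                      # pool over time -> (B, hidden_dim)
        x = self.decoder(x)                    # (B, pred_len * input_dim)
        return x.view(-1, self.pred_len, self.input_dim)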


In [7]:
### 🔧 Step 4: Building a Simple Transformer Model

#We define a minimal transformer encoder-based model:
#- First, we embed input features into a higher-dimensional space.
#- Then, we pass the input through a transformer encoder (multi-head self-attention).
#- Finally, the output is averaged across time steps and passed through a linear decoder to predict future values.

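#This is a deliberately small baseline. Self-attention lets each input step attend to every other step, and the pooled summary is mapped to the whole forecast window in one shot. It is meant to demonstrate loss shaping, not to compete with tuned long-horizon forecasters.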
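#The decoder's output shape has to match the targets. A decoder with only `pred_len` outputs produces `(batch, pred_len)` predictions, while the targets are `(batch, pred_len, n_features)`. The two shapes cannot broadcast, so the loss raises a `RuntimeError`. The sketch reproduces this with random tensors and shows the reshape that fixes it, assuming we forecast all seven features. Separately, `nn.TransformerEncoderLayer` defaults to `batch_first=False`, so `(batch, time, hidden)` inputs need `batch_first=True`.

```python
import torch
from torch import nn

batch, pred_len, n_features, hidden_dim = 4, 96, 7, 64
pooled = torch.randn(batch, hidden_dim)            # stands in for x.mean(dim=1)
target = torch.randn(batch, pred_len, n_features)

# pred_len outputs -> (B, 96), which cannot broadcast against (B, 96, 7)
narrow = nn.Linear(hidden_dim, pred_len)(pooled)
try:
    (narrow - target).pow(2)
    broadcast_ok = True
except RuntimeError:
    broadcast_ok = False

# pred_len * n_features outputs, reshaped to the target layout
wide = nn.Linear(hidden_dim, pred_len * n_features)(pooled)
preds = wide.view(batch, pred_len, n_features)
per_step_loss = (preds - target).pow(2).mean(dim=(0, 2))  # one value per forecast step
print(broadcast_ok, preds.shape, per_step_loss.shape)
```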


In [8]:
model = SimpleTransformer(input_dim=X.shape[2])
optimizer = optim.Adam(model.parameters(), lr=1e-3)

Tp = Y.shape[1]
lambda_ = torch.zeros(Tp)           # dual variables, one per forecast step
zeta = torch.zeros(Tp)              # slack variables, updated manually below
eps = torch.ones(Tp) * 0.5          # baseline constraint
h = lambda z: torch.norm(z, p=2)**2  # relaxation cost h(zeta); its gradient is 2 * zeta
dual_lr = 0.01

for epoch in range(5):
    for xb, yb in loader:
        preds = model(xb)                                       # (B, Tp, n_features)
        per_step_loss = (preds - yb).pow(2).mean(dim=(0, 2))    # (Tp,)

        # Terms of the Lagrangian that depend on theta: mean loss + lambda-weighted per-step losses
        loss_main = per_step_loss.mean() + (lambda_ * per_step_loss).sum()

        optimizer.zero_grad()
        loss_main.backward()
        optimizer.step()

        with torch.no_grad():
            violation = per_step_loss - (eps + zeta)                     # l_t - (eps_t + zeta_t)
            zeta = (zeta - dual_lr * (2 * zeta - lambda_)).clamp(min=0)  # projected descent
            lambda_ = (lambda_ + dual_lr * violation).clamp(min=0)       # projected ascent

    print(f"Epoch {epoch+1}, Loss: {loss_main.item():.4f}")





In [None]:
###  Step 5: Training with Loss Shaping Constraints (Primal-Dual)

# We now implement the **core idea** of the paper:
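#- Instead of only minimizing the average loss (empirical risk minimization, ERM), we constrain each per-step loss `ℓ_t` to stay below `ε_t + ζ_t`. The slack `ζ_t ≥ 0` lets hard-to-meet constraints be relaxed at a cost `h(ζ) = ‖ζ‖²`.
#- We alternate updates for:
#  - `θ`: model parameters, by gradient descent on the average loss plus the `λ`-weighted per-step losses.
#  - `ζ`: slack variables (relaxation), by projected gradient descent.
#  - `λ`: dual variables (multipliers), by projected gradient ascent on the violations `ℓ_t − ε_t − ζ_t`.

#Steps whose loss exceeds its bound accumulate a larger multiplier `λ_t`, so they receive more weight in later parameter updates. This is how the constraints reshape the distribution of errors across the forecast horizon.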
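#The updates above can be sketched in isolation. This is our reading of the method, not the paper's reference code. With the Lagrangian `L(θ, ζ, λ) = mean_t ℓ_t(θ) + h(ζ) + Σ_t λ_t (ℓ_t(θ) − ε_t − ζ_t)` and `h(ζ) = ‖ζ‖²`, the updates work as follows:
#- The parameters only see `mean_t ℓ_t + Σ_t λ_t ℓ_t`.
#- The slack moves along `−(2ζ − λ)`.
#- The multipliers move along the violation.
#Both `ζ` and `λ` are projected onto `≥ 0`. The helper names and toy numbers are hypothetical.

```python
import torch

def primal_dual_update(per_step_loss, lam, zeta, eps, lr_dual=0.01):
    """Slack descent and dual ascent for l_t <= eps_t + zeta_t, with h(zeta) = ||zeta||^2."""
    per_step_loss = per_step_loss.detach()
    violation = per_step_loss - (eps + zeta)                 # > 0 where a constraint is violated
    zeta = (zeta - lr_dual * (2 * zeta - lam)).clamp(min=0)  # gradient of h(zeta) - lam . zeta
    lam = (lam + lr_dual * violation).clamp(min=0)           # projected gradient ascent
    return lam, zeta

def theta_objective(per_step_loss, lam):
    """The part of the Lagrangian that depends on the model parameters."""
    return per_step_loss.mean() + (lam * per_step_loss).sum()

# Toy example: 3 forecast steps with a constant bound eps = 0.5
per_step_loss = torch.tensor([0.2, 0.6, 1.0], requires_grad=True)
eps = torch.full((3,), 0.5)
lam, zeta = torch.zeros(3), torch.zeros(3)

lam, zeta = primal_dual_update(per_step_loss, lam, zeta, eps)
theta_objective(per_step_loss, lam).backward()
print(lam)                 # only the violated steps (1 and 2) get positive multipliers
print(per_step_loss.grad)  # effective weight of each step: 1/3 + lam_t
```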


In [None]:
model.eval()
with torch.no_grad():
    preds = model(X[:32])
    mse = ((preds - Y[:32])**2).mean(dim=(0, 2)).cpu().numpy()  # per-step MSE, averaged over samples and features

plt.plot(mse)
plt.title("Per-Step MSE Across Prediction Window")
plt.xlabel("Step")
plt.ylabel("MSE")
plt.grid()
plt.show()


In [None]:
###  Step 6: Visualizing the Per-Step Forecasting Error

#We plot the Mean Squared Error (MSE) for each forecast step (1 to `Tp`), showing how the error varies across the prediction horizon.

#This helps us observe:
#- Whether the error is spread evenly across the horizon or grows with it.
#- Whether there are spikes at particular future steps.

#This matters for long-term forecasting because a low average MSE can still hide large errors concentrated at a few steps.
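#A small hypothetical helper, `error_profile`, can complement the plot with numbers. It reports three things about a per-step MSE curve:
#- The worst step.
#- How far the peak sits above the mean (1.0 means a flat profile).
#- How much the error grows from the first to the last step.
#In the notebook you would pass it the `mse` array computed above. The sample values here are made up.

```python
import numpy as np

def error_profile(mse):
    """Summarize a per-step MSE curve of shape (Tp,)."""
    mse = np.asarray(mse, dtype=float)
    return {
        "worst_step": int(mse.argmax()),                  # step with the largest error
        "max_over_mean": float(mse.max() / mse.mean()),   # 1.0 means a perfectly flat profile
        "last_over_first": float(mse[-1] / mse[0]),       # growth across the horizon
    }

profile = error_profile([0.2, 0.3, 0.9, 0.4])
print(profile)
```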


In [None]:
# Exponentially increasing bound: eps_t = 0.2 + 0.3 * exp(0.01 * t), for t = 0, ..., Tp - 1
Tp = Y.shape[1]
step_range = torch.arange(Tp)
eps = 0.2 + 0.3 * torch.exp(0.01 * step_range)  # 0.5 at t = 0, about 0.98 at t = 95


In [None]:
### 📈 Step 7: Using Exponential Constraints for Forecast Steps

#Instead of using a constant error limit across all steps, we apply an **exponentially increasing constraint**:
#- Near-term steps get a tighter bound, so their errors must stay small.
#- Later steps are allowed larger errors.

#This reflects the fact that predictions further into the future are usually more uncertain. A single constant bound tends to be either too loose for early steps or too tight for late ones.
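#The endpoints of the bound follow directly from the formula. The first step gets `ε_0 = 0.2 + 0.3·e^0 = 0.5` and the last gets `ε_95 = 0.2 + 0.3·e^0.95 ≈ 0.98`, so the last step may carry roughly twice the error of the first. The sketch also builds a hypothetical linear profile with the same endpoints, in case you want to compare shapes. The notebook does not use it elsewhere.

```python
import torch

Tp = 96
steps = torch.arange(Tp, dtype=torch.float32)
eps_exp = 0.2 + 0.3 * torch.exp(0.01 * steps)          # profile used in this notebook
eps_lin = torch.linspace(0.5, eps_exp[-1].item(), Tp)  # hypothetical linear alternative

print(round(eps_exp[0].item(), 3), round(eps_exp[-1].item(), 3))
```

#Over this short range `exp(0.01 t)` is close to linear, so the two profiles differ only slightly. A larger growth rate would make the exponential shape matter more.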


In [None]:
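from sklearn.model_selection import train_test_split

# shuffle=False keeps the split chronological: the last 20% of windows form the validation set
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, shuffle=False)

train_loader = DataLoader(TensorDataset(X_train, Y_train), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X_val, Y_val), batch_size=32, shuffle=False)

# The Step 5 run trained on all windows, including the validation period.
# Re-initialize the model, optimizer, and primal-dual variables so validation data is unseen.
model = SimpleTransformer(input_dim=X.shape[2])
optimizer = optim.Adam(model.parameters(), lr=1e-3)
lambda_ = torch.zeros(Tp)
zeta = torch.zeros(Tp)

train_logs, val_logs = [], []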


In [None]:
### 🧪 Step 8: Splitting Data into Train and Validation Sets

#We split the dataset chronologically into:
#- Training data: for learning model weights.
#- Validation data: for evaluating generalization.

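#Keeping the split chronological (no shuffling) means the model is validated on a later period than it was trained on. This avoids the look-ahead leakage a random split would cause, and it lets us monitor overfitting.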
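#Consecutive windows overlap, so even a chronological split lets the last training windows share time steps with the first validation windows. One common remedy is to drop the last `input_len + pred_len - 1` training windows, leaving a gap between the two sets. The sketch assumes windows are indexed by their start position, as in Step 3. The helper name `chronological_split_with_gap` is hypothetical.

```python
import numpy as np

def chronological_split_with_gap(n_windows, input_len=96, pred_len=96, val_frac=0.2):
    """Split window start indices in time order so that no time step appears
    in both a training window and a validation window."""
    gap = input_len + pred_len - 1          # window i covers steps i .. i + gap
    n_val = int(n_windows * val_frac)
    val_start = n_windows - n_val
    train_idx = np.arange(0, val_start - gap)
    val_idx = np.arange(val_start, n_windows)
    return train_idx, val_idx

train_idx, val_idx = chronological_split_with_gap(1000)
# Last time step touched by training vs. first time step touched by validation
print(train_idx[-1] + 96 + 96 - 1, val_idx[0])
```

#To use it, index `X` and `Y` with the returned arrays (for example `X[torch.from_numpy(train_idx)]`) instead of calling `train_test_split`.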


In [None]:
for epoch in range(10):
    model.train()
    epoch_loss = 0
    for xb, yb in train_loader:
        preds = model(xb)
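        per_step_loss = (preds - yb).pow(2).mean(dim=(0, 2))  # (Tp,)

        # Parameter objective: mean loss + lambda-weighted per-step losses
        loss_main = per_step_loss.mean() + (lambda_ * per_step_loss).sum()

        optimizer.zero_grad()
        loss_main.backward()
        optimizer.step()

        with torch.no_grad():
            violation = per_step_loss - (eps + zeta)                  # l_t - (eps_t + zeta_t)
            zeta = (zeta - 0.01 * (2 * zeta - lambda_)).clamp(min=0)  # projected descent
            lambda_ = (lambda_ + 0.01 * violation).clamp(min=0)       # projected ascent

        # Log the plain MSE so it is comparable with the validation loss
        epoch_loss += per_step_loss.mean().item()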

    # Validation
    model.eval()
    with torch.no_grad():
        val_loss = 0
        for xb, yb in val_loader:
            preds = model(xb)
            val_loss += ((preds - yb) ** 2).mean().item()
        val_loss /= len(val_loader)

    train_logs.append(epoch_loss / len(train_loader))
    val_logs.append(val_loss)

    print(f"Epoch {epoch+1}, Train: {train_logs[-1]:.4f}, Val: {val_logs[-1]:.4f}")


In [None]:
### 🔁 Step 9: Enhanced Training Loop with Train/Validation Logging

#We improve the training loop by:
#- Logging both training and validation loss after every epoch.
#- Ensuring evaluation is done in `eval` mode.
#- Storing losses for plotting and TensorBoard logging.

#This allows us to monitor learning progress and compare performance across epochs.


In [None]:
plt.plot(train_logs, label="Train Loss")
plt.plot(val_logs, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.title("Training and Validation Loss")
plt.legend()
plt.grid()
plt.show()


In [None]:
### 📉 Step 10: Plotting Train vs Validation Loss

#We visualize:
#- How the model improves (or deteriorates) over time.
#- When the training starts overfitting (if val loss diverges).

#This diagnostic plot is essential for early stopping and hyperparameter tuning.
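#A minimal early-stopping rule can be read off `val_logs`. The hypothetical helper below stops once the validation loss has failed to improve by more than `min_delta` for `patience` consecutive epochs. The default patience of 3 is an arbitrary illustrative choice.

```python
def early_stop_epoch(val_logs, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if it never triggers."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_logs):
        if loss < best - min_delta:
            best = loss                      # new best validation loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

print(early_stop_epoch([0.9, 0.7, 0.6, 0.62, 0.61, 0.65]))  # 5
```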


In [None]:
model.eval()
with torch.no_grad():
    preds = model(X_val[:32])
    mse = ((preds - Y_val[:32])**2).mean(dim=(0, 2)).cpu().numpy()  # average over samples and features

plt.plot(eps.numpy(), label="Constraint (ε)")
plt.plot(mse, label="Validation MSE per step")
plt.xlabel("Forecast Step")
plt.ylabel("MSE")
plt.title("Loss Shaping Effect (with Exponential Constraint)")
plt.legend()
plt.grid()
plt.show()


In [None]:
### 🧾 Step 11: Visualizing Constraints vs Actual MSE

#Here, we compare:
#- The actual forecast errors at each step (validation MSE).
#- The constraint `ε` applied at each step.

#This lets us see how well the model adhered to the desired constraint structure across the prediction horizon.
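#The plot can also be backed by numbers. A hypothetical `constraint_report` helper compares the validation MSE with `ε` step by step and reports how often and by how much the bound is exceeded. In the notebook it could be called as `constraint_report(mse, eps.numpy())`. The numbers below are made up.

```python
import numpy as np

def constraint_report(step_mse, eps):
    """Compare a per-step MSE curve with the per-step bound eps (both shape (Tp,))."""
    step_mse = np.asarray(step_mse, dtype=float)
    eps = np.asarray(eps, dtype=float)
    violation = step_mse - eps                       # > 0 means the bound is exceeded
    return {
        "fraction_satisfied": float((violation <= 0).mean()),
        "violated_steps": np.flatnonzero(violation > 0).tolist(),
        "max_violation": float(max(violation.max(), 0.0)),
    }

report = constraint_report([0.4, 0.55, 0.7, 0.6], [0.5, 0.5, 0.8, 0.8])
print(report)
```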


In [None]:
# Load the TensorBoard notebook extension and set up the logging directory
%load_ext tensorboard
import os
from torch.utils.tensorboard import SummaryWriter

log_dir = "./runs/loss_shaping"
os.makedirs(log_dir, exist_ok=True)
writer = SummaryWriter(log_dir)


In [None]:
### 📊 Step 12: TensorBoard Logging Setup

#We integrate **TensorBoard** to:
#- Track training and validation loss.
#- Monitor how the MSE at each forecast step evolves.
#- Compare per-step errors with the constraint values `ε`.

#Unlike the static matplotlib plots above, TensorBoard lets runs written under the same log directory be compared side by side.


In [None]:
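def log_epoch_to_tensorboard(epoch, log_steps=False):
    # Meant to be called at the end of each epoch of the Step 9 loop, e.g.
    # log_epoch_to_tensorboard(epoch, log_steps=(epoch % 2 == 0))
    writer.add_scalar("Loss/Train", train_logs[epoch], epoch)
    writer.add_scalar("Loss/Validation", val_logs[epoch], epoch)
    if log_steps:
        # Per-step validation MSE vs. constraint eps_t
        model.eval()
        with torch.no_grad():
            preds = model(X_val[:32])
            step_mse = ((preds - Y_val[:32]) ** 2).mean(dim=(0, 2))  # (Tp,)
        for i in range(Tp):
            writer.add_scalar(f"MSE_Step/Step_{i}", step_mse[i].item(), epoch)
            writer.add_scalar(f"Constraint/Step_{i}", eps[i].item(), epoch)

# Training has already run: replay the recorded losses and log the final model's per-step profile
for epoch in range(len(train_logs)):
    log_epoch_to_tensorboard(epoch, log_steps=(epoch == len(train_logs) - 1))
writer.flush()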


In [None]:
### 🧠 Step 13: Logging Scalars and Per-Step Errors to TensorBoard

#We log:
#- Training and validation loss for each epoch.
#- Per-step validation MSE and constraint values `ε_t` at selected epochs.

#Together these show how each forecast step behaves over training and where the constraints are violated.


In [None]:
# Launch TensorBoard
%tensorboard --logdir runs/loss_shaping


In [None]:
### 🚀 Step 14: Launching TensorBoard for Visualization

#This launches the live TensorBoard interface in Colab, where you can:
#- Explore time series plots of loss, constraints, and MSE.
#- Compare training behavior across configurations.


In [None]:
# Save after training
torch.save(model.state_dict(), "loss_shaping_model.pt")


In [None]:
### 💾 Step 15: Saving and Reloading the Model

#After training, we:
#- Save the model weights (the `state_dict`) to a `.pt` file.
#- Reload them later for evaluation, further training, or deployment.

#Saving the weights avoids retraining from scratch after a Colab session ends.
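#Saving only `model.state_dict()` is enough for inference. Resuming primal-dual training, however, also needs the optimizer state and the current `λ`, `ζ`, and `ε`. The sketch below saves a fuller checkpoint. The helper names and dictionary keys are our own choices, and `nn.Linear` stands in for `SimpleTransformer` so the cell is self-contained.

```python
import os
import tempfile
import torch
from torch import nn, optim

def save_checkpoint(path, model, optimizer, lambda_, zeta, eps):
    """Save everything needed to resume primal-dual training, not just the weights."""
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "lambda": lambda_.detach().cpu(),
        "zeta": zeta.detach().cpu(),
        "eps": eps.detach().cpu(),
    }, path)

def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place; return (lambda_, zeta, eps)."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["lambda"], ckpt["zeta"], ckpt["eps"]

# Round trip with a placeholder model
model = nn.Linear(4, 2)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
lambda_, zeta, eps = torch.rand(3), torch.rand(3), torch.full((3,), 0.5)

path = os.path.join(tempfile.mkdtemp(), "loss_shaping_checkpoint.pt")
save_checkpoint(path, model, optimizer, lambda_, zeta, eps)

new_model = nn.Linear(4, 2)
new_optimizer = optim.Adam(new_model.parameters(), lr=1e-3)
restored_lambda, restored_zeta, restored_eps = load_checkpoint(path, new_model, new_optimizer)
```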


In [None]:
# Load before evaluation or continuing training
model.load_state_dict(torch.load("loss_shaping_model.pt", map_location="cpu"))
model.eval()
