
In [1]:
# (a) Why we are doing this:
# We aim to evaluate the effectiveness of three loss shaping strategies, namely **ERM (Empirical Risk Minimization)**, **Exponential**, and **Constant**, in training the Autoformer model.
# We visualize the learning progress of the model using different types of penalties on the loss function. Just like comparing the weather conditions in three cities (Toronto, Vancouver, Calgary), we observe how each penalty affects the learning trajectory.
# By comparing how each loss function strategy (ERM, Exponential, and Constant) affects training, we gain insight into which one results in the best learning for our time series forecasting problem.

# (b) How this works:
# - The training loop for **Autoformer** is established, where we will compute and apply the loss at each epoch.
# - Different loss shaping strategies (ERM, Exponential, Constant) are applied to regularize the training process in different ways.
# - We will visualize the loss for each strategy over the epochs to track how the model’s performance changes, like taking photos of the changing weather in each city at different times.
# By using **color-coded graphs**, we can easily compare the performance of each strategy.
# We will apply the **ERM (Sky Blue)** strategy as a baseline, **Exponential (Orange)** to highlight rapid changes, and **Constant (Green)** for stable performance across epochs.

# (c) Explanation of terms:
# - **ERM**: The plain, unshaped loss (Mean Squared Error), which compares the predictions to the ground truth without any extra penalty. Think of this as an unseasoned dish: simple but effective.
# - **Exponential**: A penalty that grows exponentially with the size of the error, like consequences for breaking school rules that escalate rapidly.
# - **Constant**: A penalty with a fixed weight (the `penalty` coefficient) applied to the mean absolute error, so every unit of error costs the same. It's like giving the same gentle nudge regardless of the situation.
# - **Epoch**: An epoch is one complete cycle through the training data. Like attending a full day of classes in a city, we go through all the data once.
# - **Loss**: Loss is the difference between the model’s prediction and the true value. It measures how far off we are, just like checking how far we’ve strayed from the expected route.
# - **Color coding**: We use distinct colors (Sky Blue, Orange, Green) to distinguish between each strategy and make the plot visually clear.
# - **Line styles**: Solid, dashed, and dash-dot lines are used to differentiate between the strategies on the plot.

# (d) What we will achieve:
# The end result is a set of three loss curves showing how each strategy affects training. This gives us clear insight into which shaping constraint gives the best results for our Autoformer model.
# This helps us choose the optimal regularization strategy for time-series forecasting, ensuring that our model can learn from the data efficiently.
# By the end of the process, we will be able to select the best strategy based on its performance (loss behavior) over epochs.

# (e) Code for Visualization:
# In this section, we will plot the loss trends for all three strategies — ERM, Exponential, and Constant.
# This will help us visually compare their performances and see which one converges faster, or perhaps gives us more stable results.

# (f) Loss Tracking for ERM, Exponential, and Constant:
# The training loss for each strategy is tracked during the epochs. We will store these losses and plot them using **color-coded graphs** for better comparison.

# (g) Visualizing the Results:
# Using the matplotlib library, we will generate the loss curves for all three strategies. We’ll use distinct colors (Sky Blue, Orange, and Green) to differentiate them on the graph.
# This visualization will show us how each loss function strategy behaves throughout the training process.

# (h) Expected Outcome:
# After running the training loop for each strategy, we will visualize the training loss trends. These graphs will allow us to clearly compare the performance of each strategy and make an informed decision about which regularization strategy yields the best model performance.
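
The loss tracking described in (f) can be sketched without PyTorch. The example below is a minimal illustration, not the notebook's training code: it fits a single weight `w` in `y = w * x` by plain gradient descent under each strategy and records the loss at every epoch. The function names, the toy data, the learning rate, and the absolute-error form of the exponential penalty are all assumptions made for this sketch.

```python
import math

# Toy data (hypothetical): the true relationship is y = 2 * x
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x for x in xs]


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def erm(w):
    """Mean squared error and its gradient with respect to w."""
    errs = [w * x - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / len(errs)
    grad = sum(2 * e * x for e, x in zip(errs, xs)) / len(errs)
    return loss, grad


def exponential(w, alpha=1.0):
    """Mean of exp(alpha * |error|) - 1 and its gradient with respect to w."""
    errs = [w * x - y for x, y in zip(xs, ys)]
    loss = sum(math.exp(alpha * abs(e)) - 1 for e in errs) / len(errs)
    grad = sum(alpha * sign(e) * math.exp(alpha * abs(e)) * x
               for e, x in zip(errs, xs)) / len(errs)
    return loss, grad


def constant(w, penalty=0.1):
    """Fixed-weight mean absolute error and its (sub)gradient with respect to w."""
    errs = [w * x - y for x, y in zip(xs, ys)]
    loss = penalty * sum(abs(e) for e in errs) / len(errs)
    grad = penalty * sum(sign(e) * x for e, x in zip(errs, xs)) / len(errs)
    return loss, grad


def track_losses(loss_fn, epochs=50, lr=0.1):
    """Run gradient descent from w = 0 and return the per-epoch loss history."""
    w, history = 0.0, []
    for _ in range(epochs):
        loss, grad = loss_fn(w)
        history.append(loss)  # store the loss before the update
        w -= lr * grad
    return history


loss_histories = {
    "ERM": track_losses(erm),
    "Exponential": track_losses(exponential),
    "Constant": track_losses(constant),
}
for name, history in loss_histories.items():
    print(f"{name}: first={history[0]:.4f}, last={history[-1]:.4f}")
```

Each list in `loss_histories` plays the role of one curve in the comparison plot. Because the three losses live on different scales, their absolute values are not directly comparable; only the shape of each curve is.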


In [2]:
import pandas as pd

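# Load the dataset from the UCI Machine Learning Repository.
# Note: `energydata_complete.csv` is the UCI Appliances Energy Prediction dataset, not the ECL
# (Electricity Consuming Load) benchmark; the `ecl_` variable names are kept for consistency.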
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00374/energydata_complete.csv'
ecl_data = pd.read_csv(url)

# Display the first few rows of the dataset
ecl_data.head()


HTTPError: HTTP Error 502: Bad Gateway

In [None]:
# Check for missing values in the ECL dataset
ecl_missing = ecl_data.isnull().sum()

# Fill missing values using forward fill
# (`fillna(method='ffill')` is deprecated in recent pandas releases; `ffill()` is the equivalent call)
ecl_data_filled = ecl_data.ffill()

# Display the first few rows after filling missing values
ecl_data_filled.head()


In [None]:
from sklearn.preprocessing import MinMaxScaler

# Initialize the MinMaxScaler to normalize data
scaler = MinMaxScaler()

# Normalize the 'T1' feature (temperature)
ecl_data_filled['T1_normalized'] = scaler.fit_transform(ecl_data_filled[['T1']])

# Display the first few rows of the normalized data
ecl_data_filled[['T1', 'T1_normalized']].head()


In [None]:
import matplotlib.pyplot as plt

# Plot the temperature data ('T1') to inspect its trend over time
plt.figure(figsize=(12, 6))
plt.plot(ecl_data_filled['T1'], label='Temperature (T1)')
plt.title('Electricity Consumption Load - Temperature (T1)')
plt.xlabel('Timestamp')
plt.ylabel('Temperature')
plt.legend()
plt.show()

# Plot the normalized temperature data
plt.figure(figsize=(12, 6))
plt.plot(ecl_data_filled['T1_normalized'], label='Normalized Temperature (T1)')
plt.title('Normalized Temperature in ECL Dataset')
plt.xlabel('Timestamp')
plt.ylabel('Normalized Temperature')
plt.legend()
plt.show()


In [None]:
# Display summary statistics of the preprocessed ECL data
ecl_data_filled.describe()


In [None]:
import matplotlib.pyplot as plt

# Number of features excluding the 'date' column
num_features = ecl_data_filled.shape[1] - 1  # Subtracting the 'date' column

# Calculate the number of rows and columns for the subplot layout dynamically
num_columns = 3  # Number of columns (we can adjust this as needed)
num_rows = (num_features // num_columns) + (num_features % num_columns > 0)  # Calculate number of rows

# Plot all features to inspect their behavior over time.
# DataFrame.plot creates its own figure, so a separate plt.figure() call would only leave an empty figure behind.
ecl_data_filled.drop(['date'], axis=1).plot(subplots=True, layout=(num_rows, num_columns), figsize=(14, num_rows * 3))
plt.suptitle('All Features from ECL Dataset', fontsize=16)
plt.tight_layout()
plt.show()


In [None]:
# a) Why we are using this strategy:
#   Visualizing all the features allows us to understand their individual behaviors and how they vary over time.
#   This is critical for understanding potential trends, seasonality, and anomalies in the dataset before applying models.
# b) How these codes will solve our purpose:
#   The code drops the 'date' column and plots each feature in a separate subplot to show how each one behaves over time.
#   It uses dynamic calculations for the number of rows and columns to efficiently manage space and visual clarity.
# c) Explanation of terms used:
#   - **`ecl_data_filled`**: The dataset after filling missing values (through forward fill).
#   - **`plot(subplots=True)`**: This method allows us to plot each column in the DataFrame as a subplot, making it easier to analyze multiple features.
#   - **`layout=(num_rows, num_columns)`**: Specifies the layout of subplots to display them neatly in a grid format.
#   - **`tight_layout()`**: Adjusts the spacing between subplots to prevent overlap.
# d) What we will achieve from this operation:
#   We will visualize each feature's behavior over time in separate plots, which will help identify patterns and outliers.
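
The dynamic row calculation mentioned in (b) is a ceiling division. A minimal sketch, assuming a hypothetical helper named `subplot_grid` (not part of the notebook), shows the same computation with `math.ceil`:

```python
import math


def subplot_grid(num_features, num_columns=3):
    """Return (rows, columns) with enough cells for one subplot per feature."""
    num_rows = math.ceil(num_features / num_columns)
    return num_rows, num_columns


# Example: 28 features laid out in 3 columns
print(subplot_grid(28))
```

This gives the same result as `(n // c) + (n % c > 0)` and makes the intent explicit: round up so the last, partially filled row is still allocated.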


In [None]:
# Import necessary libraries for defining the model
import torch
import torch.nn as nn
import torch.optim as optim

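# Define the model. Note: despite its name, this class is an LSTM-based stand-in; it does not
# implement the series-decomposition and auto-correlation blocks of the published Autoformer architecture.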
class Autoformer(nn.Module):
    def __init__(self, input_size, output_size, hidden_size, num_layers, dropout=0.1):
        super(Autoformer, self).__init__()

        # LSTM Layer: Useful for capturing temporal dependencies
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, dropout=dropout)

        # Fully connected layer: To map LSTM output to final prediction
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Passing input through LSTM layer
        out, _ = self.lstm(x)

        # Output from the last time step
        out = self.fc(out[:, -1, :])
        return out

# Example parameters (can be tuned later)
input_size = num_features  # Number of features (excluding 'date')
output_size = 1  # Predicting one value per time step (e.g., next value)
hidden_size = 64  # Number of features in the hidden state
num_layers = 2  # Number of LSTM layers
dropout = 0.1  # Dropout to prevent overfitting

# Instantiate the Autoformer model
autoformer_model = Autoformer(input_size, output_size, hidden_size, num_layers, dropout)

# Print model architecture
print(autoformer_model)


In [None]:
# a) Why we are using this strategy:
#   The class named `Autoformer` here is an LSTM-based forecaster, not the published Autoformer architecture (which relies on series decomposition and auto-correlation).
#   LSTM layers are chosen for their ability to model temporal patterns and trends in data over time, which is crucial for forecasting tasks.
# b) How these codes will solve our purpose:
#   The model is built using LSTM layers that capture sequential dependencies and a final fully connected layer to produce predictions.
#   We can then use this model to forecast the future values of the time series based on historical data.
# c) Explanation of terms used:
#   - **LSTM (Long Short-Term Memory)**: A type of recurrent neural network (RNN) designed to handle long-range dependencies in sequential data.
#   - **Fully Connected Layer**: A neural network layer where each neuron is connected to every neuron in the next layer.
#   - **Dropout**: A regularization technique used to prevent overfitting by randomly "dropping" units during training.
# d) What we will achieve from this operation:
#   This will define a model that can learn temporal dependencies and make predictions for future time steps.


In [None]:
from sklearn.preprocessing import MinMaxScaler

# Assuming 'ecl_data_filled' is the dataset loaded in previous steps
# We will use MinMaxScaler to scale the data to the range [0, 1]
scaler = MinMaxScaler()

# Scale the data excluding the 'date' column
data_scaled = pd.DataFrame(scaler.fit_transform(ecl_data_filled.drop(['date'], axis=1)))

# Check the first few rows of the scaled data
data_scaled.head()


In [None]:
# a) Why we are using this strategy:
#   Scaling ensures that all features have a similar range, which helps improve the training performance of the model.
#   It also prevents certain features from dominating the learning process due to differing magnitudes.
# b) How these codes will solve our purpose:
#   The `MinMaxScaler` scales the values of all features between 0 and 1, ensuring consistency in the range of input data.
# c) Explanation of terms used:
#   - **MinMaxScaler**: A feature scaling method that rescales the data so that all feature values lie between 0 and 1.
#   - **fit_transform()**: A method used to first compute the scaling parameters and then apply the transformation to the data.
# d) What we will achieve from this operation:
#   The dataset will be scaled, and all features will lie within the same range, making the model more stable and efficient during training.
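
The min-max formula behind the scaler is `x' = (x - min) / (max - min)`. The sketch below separates the "fit" and "transform" steps in plain Python to show what `fit_transform()` combines. The helper names are hypothetical, and mapping a constant column to 0.0 is a choice made for this sketch.

```python
def fit_min_max(values):
    """Compute the scaling parameters (minimum and range) from the data."""
    lo, hi = min(values), max(values)
    return lo, hi - lo


def transform_min_max(values, lo, span):
    """Apply x' = (x - lo) / span; a constant column (span == 0) maps to 0.0."""
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


train = [10.0, 20.0, 30.0]
lo, span = fit_min_max(train)
print(transform_min_max(train, lo, span))   # values inside the fitted range land in [0, 1]
print(transform_min_max([40.0], lo, span))  # values outside the fitted range fall outside [0, 1]
```

Keeping the two steps apart matters once the data is split: fitting the parameters on the training portion only and reusing them for the test portion avoids leaking test-set statistics into training.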


In [None]:
# Split the scaled data into features (X) and target (y)
X_scaled = data_scaled.iloc[:, :-1]  # All columns except the last one (features)
y_scaled = data_scaled.iloc[:, -1]  # The last column (target)

# Convert to PyTorch tensors
X_scaled = torch.tensor(X_scaled.values, dtype=torch.float32)
y_scaled = torch.tensor(y_scaled.values, dtype=torch.float32)


In [None]:
# a) Why we are using this strategy:
#   The features (X) are all columns except the target column, and the target (y) is the last column of the dataset.
#   Converting the data into PyTorch tensors allows the model to handle the data effectively during training.
# b) How these codes will solve our purpose:
#   By separating the features and target and converting them to tensors, we can efficiently pass them into the model for training.
# c) Explanation of terms used:
#   - **iloc**: An indexing method in pandas to select rows and columns based on integer position.
#   - **torch.tensor()**: Converts the data into a PyTorch tensor, which is a data structure that PyTorch uses for efficient computation.
# d) What we will achieve from this operation:
#   We will have the features and target separated and ready to be fed into the model for training.


In [None]:
import torch
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import torch.optim as optim
import torch.nn as nn

# Assuming 'ecl_data_filled' is the data we have already prepared

# Scaling the features (excluding the 'date' column)
scaler = MinMaxScaler()
data_scaled = pd.DataFrame(scaler.fit_transform(ecl_data_filled.drop(['date'], axis=1)))

# Check the shape of the scaled data
print(data_scaled.shape)

# Now we create sequences from the scaled data (X_scaled)
sequence_length = 10  # You can adjust this value as needed

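# Function to create sequences
def create_sequences(data, seq_length):
    # Convert once to a float32 array instead of slicing the DataFrame with .iloc for every window
    values = data.to_numpy(dtype='float32')
    sequences = [values[i:i + seq_length] for i in range(len(values) - seq_length)]
    # Stacking tensors avoids the slow list-of-ndarrays path of torch.tensor()
    return torch.stack([torch.from_numpy(seq) for seq in sequences])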

# Create sequences for the input data
X_scaled_sequences = create_sequences(data_scaled, sequence_length)

# Check the shape of the reshaped data
print(X_scaled_sequences.shape)  # Should be (num_samples - sequence_length, sequence_length, num_features)

# Define the Autoformer model
class Autoformer(nn.Module):
    def __init__(self, input_size, output_size, hidden_size, num_layers, dropout):
        super(Autoformer, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # Define layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, dropout=dropout)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x should be of shape (batch_size, sequence_length, input_size)
        out, _ = self.lstm(x)
        out = self.fc(out[:, -1, :])  # Take the output from the last time step
        return out

# Initialize the Autoformer model
autoformer_model = Autoformer(input_size=X_scaled_sequences.shape[2],
                              output_size=1,
                              hidden_size=64,
                              num_layers=2,
                              dropout=0.1)

# Define the optimizer and loss function
optimizer = optim.Adam(autoformer_model.parameters(), lr=0.001)
criterion = nn.MSELoss()

# Training loop
num_epochs = 50
for epoch in range(num_epochs):
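    # Forward pass; squeeze (N, 1) -> (N,) so predictions match the target shape
    # (otherwise MSELoss broadcasts the two tensors to (N, N) and computes the wrong loss)
    predictions = autoformer_model(X_scaled_sequences).squeeze(-1)

    # Compute the loss
    loss = criterion(predictions, y_scaled[sequence_length:])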

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print the loss every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# a) Why we are using this strategy:
#   The training loop is updated to use the reshaped data (3D tensor), and the loss is calculated based on the output of the model.
# b) How these codes will solve our purpose:
#   The model will process the sequential data, and we will train it to minimize the MSE loss.
# c) Explanation of terms used:
#   - **y_scaled[sequence_length:]**: Since the data is reshaped into sequences, we start the target variable from the `sequence_length` index to match the corresponding sequence inputs.
# d) What we will achieve from this operation:
#   The model will be trained on the sequential data and learn the temporal patterns in the data.
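
The alignment between windows and `y_scaled[sequence_length:]` described in (c) can be checked on a tiny series. This is a minimal plain-Python sketch; `make_windows` is a hypothetical helper, not the notebook's `create_sequences`.

```python
def make_windows(series, seq_length):
    """Pair each window of seq_length values with the value that follows it."""
    windows, targets = [], []
    for i in range(len(series) - seq_length):
        windows.append(series[i:i + seq_length])
        targets.append(series[i + seq_length])
    return windows, targets


series = [0, 1, 2, 3, 4, 5]
windows, targets = make_windows(series, seq_length=3)
for window, target in zip(windows, targets):
    print(window, "->", target)
```

Window `i` covers positions `i` to `i + seq_length - 1`, so its target sits at position `i + seq_length`. Collecting those targets is the same as slicing the series from `seq_length` onward.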


In [None]:
import torch
import torch.optim as optim
import torch.nn as nn

# Assuming X_scaled_sequences and y_scaled are already defined


# Define the loss shaping functions (for ERM, Exponential, Constant)
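def exponential_loss(prediction, target, alpha=1.0):
    # Exponential loss on the absolute error, so the penalty grows with the error magnitude
    # in both directions (a signed error would give under-predictions a negative loss)
    return torch.mean(torch.exp(alpha * torch.abs(prediction - target)) - 1)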

def constant_loss(prediction, target, penalty=0.1):
    # Constant loss function (fixed penalty)
    return penalty * torch.mean(torch.abs(prediction - target))

# Define the standard loss function (ERM - Empirical Risk Minimization)
def standard_loss(prediction, target):
    return torch.mean((prediction - target) ** 2)  # Example: MSE loss

# Autoformer Model (same as before)
autoformer_model = Autoformer(input_size=X_scaled_sequences.shape[2],
                              output_size=1,
                              hidden_size=64,
                              num_layers=2,
                              dropout=0.1)

optimizer = optim.Adam(autoformer_model.parameters(), lr=0.001)

# Training Loop with loss shaping constraints (ERM, Exponential, Constant)
for epoch in range(num_epochs):
    autoformer_model.train()
    optimizer.zero_grad()

    # Forward pass; squeeze (N, 1) -> (N,) so predictions match the target shape
    predictions = autoformer_model(X_scaled_sequences).squeeze(-1)

    # Compute the standard loss (ERM)
    loss = standard_loss(predictions, y_scaled[sequence_length:])

    # Apply Exponential loss shaping (optional)
    exp_loss = exponential_loss(predictions, y_scaled[sequence_length:], alpha=1.0)

    # Apply Constant loss shaping (optional)
    const_loss = constant_loss(predictions, y_scaled[sequence_length:], penalty=0.1)

    # Total loss = Standard Loss + Loss Shaping Constraints
    total_loss = loss + exp_loss + const_loss

    # Backward pass
    total_loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss.item():.4f}')

# Define the optimizer and loss function
optimizer = optim.Adam(autoformer_model.parameters(), lr=0.001)
criterion = nn.MSELoss()

# Training loop
num_epochs = 50
for epoch in range(num_epochs):
    autoformer_model.train()  # Set the model to training mode
    # Forward pass; squeeze (N, 1) -> (N,) so predictions match the target shape
    predictions = autoformer_model(X_scaled_sequences).squeeze(-1)

    # Compute the loss
    loss = criterion(predictions, y_scaled[sequence_length:])

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print the loss every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Evaluate the model
autoformer_model.eval()  # Set the model to evaluation mode
with torch.no_grad():  # No gradient calculation needed during evaluation
    # Squeeze (N, 1) -> (N,) so predictions match the target shape in the metrics below
    predictions = autoformer_model(X_scaled_sequences).squeeze(-1)  # Get predictions from the model

# Evaluate performance using MSE and R-squared
# Assuming y_scaled is the true target values corresponding to X_scaled_sequences
mse = criterion(predictions, y_scaled[sequence_length:])
r_squared = 1 - torch.sum((predictions - y_scaled[sequence_length:]) ** 2) / torch.sum((y_scaled[sequence_length:] - torch.mean(y_scaled[sequence_length:])) ** 2)

# Print the evaluation metrics
print(f'MSE: {mse.item():.4f}')
print(f'R-squared: {r_squared.item():.4f}')


In [None]:
# a) Why we are using this strategy:
# In the machine learning domain, the process of training models involves minimizing a loss function to improve prediction accuracy.
# To ensure that our model not only fits the data well but also generalizes effectively, we add extra components like regularization and loss shaping constraints.
# This is particularly important when the model starts to overfit or underfit the training data. By adding additional loss terms like the Exponential and Constant loss functions,
# we are imposing constraints that guide the model to learn in a more structured and robust manner, improving its ability to predict unseen data accurately.
# Regularization terms help in preventing the model from becoming too complex or overfitting to the noise in the training set.

# b) How these codes, functions, and operations will solve our purpose:
# The training loop has been designed to optimize the model by minimizing a combined loss function that includes:
#   1. **Standard Loss (ERM)**: This is the core loss function, typically Mean Squared Error (MSE), that measures how well the model's predictions match the true target values.
#   2. **Exponential Loss**: This additional loss component helps impose a smoother learning curve by penalizing large deviations from the target values exponentially, preventing erratic behavior in predictions.
#   3. **Constant Loss**: The constant loss introduces a fixed penalty based on the absolute difference between predictions and targets. It forces the model to balance between fitting the data and staying within a reasonable range of values.
# By incorporating these additional losses, the model will be encouraged to learn in a more structured manner, avoiding overfitting and ensuring better generalization.

# c) Explanation of the terms used:
#   1. **Standard Loss (ERM)**: Empirical Risk Minimization (ERM) is a strategy where the model aims to minimize the discrepancy (error) between predicted values and actual target values on the training data. MSE (Mean Squared Error) is a common example of this loss function.
#   2. **Exponential Loss**: This is a type of loss where the penalty increases exponentially with the magnitude of the error. It’s useful in scenarios where larger errors should be penalized more heavily.
#   3. **Constant Loss**: The constant loss scales the mean absolute error by a fixed coefficient (`penalty`), so the cost per unit of error is the same regardless of the error's size. This is a way of constraining the model's prediction behavior.
#   4. **Backpropagation**: This is the process of computing gradients of the loss function with respect to the model's parameters and updating the parameters accordingly using an optimization algorithm like Adam.

# d) What will we achieve from this operation:
# By combining these losses, we are achieving several things:
#   - **Better Generalization**: The additional loss terms help the model generalize better by adding structure to the learning process.
#   - **Avoiding Overfitting**: The regularization effects of the exponential and constant losses prevent the model from memorizing the training data and instead encourage it to find patterns that are more broadly applicable.
#   - **Improved Robustness**: The combined loss function ensures the model learns not only from the data but also from additional constraints, making it more stable and robust to unseen data.
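
One detail worth checking when writing an exponential term is whether it uses the signed error or its magnitude. The sketch below compares the two forms in plain Python. The function names and toy values are assumptions for illustration, not the notebook's definitions.

```python
import math


def exp_penalty_signed(preds, targets, alpha=1.0):
    """Mean of exp(alpha * (prediction - target)) - 1; depends on the error's sign."""
    return sum(math.exp(alpha * (p - t)) - 1 for p, t in zip(preds, targets)) / len(preds)


def exp_penalty_abs(preds, targets, alpha=1.0):
    """Mean of exp(alpha * |prediction - target|) - 1; depends only on the magnitude."""
    return sum(math.exp(alpha * abs(p - t)) - 1 for p, t in zip(preds, targets)) / len(preds)


targets = [1.0, 1.0]
under = [0.0, 0.0]  # predictions one unit below the target
over = [2.0, 2.0]   # predictions one unit above the target

print("signed, under-prediction:", exp_penalty_signed(under, targets))
print("signed, over-prediction: ", exp_penalty_signed(over, targets))
print("abs,    under-prediction:", exp_penalty_abs(under, targets))
print("abs,    over-prediction: ", exp_penalty_abs(over, targets))
```

With the signed form, under-prediction yields a negative value, so minimizing it can push predictions below the targets. The absolute form penalizes both directions equally, which matches the description of a penalty that grows with the magnitude of the error.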


In [None]:
# a) Why we are using this strategy:
#   The training loop is designed to optimize the model using the Adam optimizer and minimize the error using the MSE loss function.
# b) How these codes will solve our purpose:
#   The model will learn from the input data and update its weights during each epoch to minimize the error and improve predictions.
# c) Explanation of terms used:
#   - **optimizer.zero_grad()**: Clears the gradients of all optimized tensors.
#   - **loss.backward()**: Computes the gradient of the loss with respect to the parameters.
#   - **optimizer.step()**: Updates the model parameters based on the gradients.
# d) What we will achieve from this operation:
#   The model will be trained over 50 epochs, and we will observe the loss decrease, indicating that the model is learning.


In [None]:
# a) Why we are using this strategy:
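#   Evaluating the model lets us quantify how well it fits. Note that the evaluation code above reuses the training sequences, so its scores measure fit to the training data; assessing generalization requires a held-out test set.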
# b) How these codes will solve our purpose:
#   We calculate the MSE and R-squared scores, which are useful metrics for regression tasks, to quantify the model's performance.
# c) Explanation of terms used:
#   - **eval()**: Puts the model in evaluation mode, disabling certain features like dropout.
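#   - **mean_squared_error()**: A function from sklearn that computes the mean squared error between predicted and actual values (the evaluation code above computes the same quantity with `nn.MSELoss`).
#   - **r2_score()**: A function from sklearn that calculates R-squared, indicating how well the model explains the variance in the data (the evaluation code above computes it directly from the residual and total sums of squares).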
# d) What we will achieve from this operation:
#   We will evaluate the model's performance and get a sense of how well it predicts the target variable.
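
For time series, a held-out test set is usually taken from the end of the data rather than sampled at random, so the model is never evaluated on points that precede its training data. A minimal sketch, assuming a hypothetical helper `chronological_split` and an 80/20 ratio:

```python
def chronological_split(samples, train_fraction=0.8):
    """Split an ordered sequence into an earlier training part and a later test part."""
    split_idx = int(len(samples) * train_fraction)
    return samples[:split_idx], samples[split_idx:]


timeline = list(range(10))  # stand-in for time-ordered samples
train_part, test_part = chronological_split(timeline)
print("train:", train_part)
print("test: ", test_part)
```

The same index can be applied to the input sequences and to their targets, as long as both are sliced identically.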


In [None]:
# Define the loss function and optimizer
criterion = nn.MSELoss()  # Mean Squared Error for regression tasks
optimizer = optim.Adam(autoformer_model.parameters(), lr=0.001)  # Adam optimizer for training

# Prepare the data for training. The LSTM expects 3D input of shape (batch, sequence_length, num_features),
# so we reuse the sequence tensors built earlier.
# Assumption: a chronological 80/20 split, with the first 80% of the sequences used for training.
split_idx = int(0.8 * len(X_scaled_sequences))
X_train = X_scaled_sequences[:split_idx]
y_train = y_scaled[sequence_length:][:split_idx]

# Training loop (example)
num_epochs = 50
for epoch in range(num_epochs):
    # Forward pass; squeeze (N, 1) -> (N,) so predictions match the target shape
    predictions = autoformer_model(X_train).squeeze(-1)
    loss = criterion(predictions, y_train)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print training loss every 10 epochs
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')


In [None]:
# a) Why we are using this strategy:
#   The Adam optimizer is used for efficient training, while the Mean Squared Error (MSE) loss function is suitable for regression tasks.
#   We train the model over multiple epochs to optimize the weights of the network based on the training data.
# b) How these codes will solve our purpose:
#   By training the model over several epochs, it will learn the patterns and dependencies in the time series data, enabling accurate forecasts.
#   The optimizer updates the model weights to minimize the error, and the loss function quantifies the prediction error.
# c) Explanation of terms used:
#   - **Adam optimizer**: A popular optimization algorithm used for training neural networks that adapts the learning rate during training.
#   - **MSELoss**: A loss function that calculates the average squared differences between predicted and actual values, used for regression tasks.
#   - **Epochs**: A full pass over the entire dataset during training.
# d) What we will achieve from this operation:
#   The model will be trained and optimized to learn the patterns from the time series data.
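
To make "adapts the learning rate during training" concrete, here is a single-parameter sketch of the Adam update rule in plain Python. The helper `adam_step` is hypothetical; its default hyperparameters mirror the commonly used values (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`).

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# First step (t = 1) from theta = 1.0 with a large and a small gradient
theta_large, _, _ = adam_step(1.0, grad=10.0, m=0.0, v=0.0, t=1)
theta_small, _, _ = adam_step(1.0, grad=0.1, m=0.0, v=0.0, t=1)
print(1.0 - theta_large, 1.0 - theta_small)
```

Both steps move the parameter by roughly `lr`, even though the gradients differ by a factor of 100. Dividing by the running root-mean-square of the gradient is what normalizes the step size per parameter.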


In [None]:
# Evaluate the model on a held-out test set.
# Assumption: a chronological 80/20 split of the sequence tensors, with the last 20% held out for testing.
# For these samples to be truly unseen, the model must have been trained only on the first 80%.
split_idx = int(0.8 * len(X_scaled_sequences))
X_test = X_scaled_sequences[split_idx:]
y_test = y_scaled[sequence_length:][split_idx:]

# Test the model
autoformer_model.eval()  # Set the model to evaluation mode
with torch.no_grad():  # No need to calculate gradients during testing
    # Squeeze (N, 1) -> (N,) so predictions match the target shape
    predictions = autoformer_model(X_test).squeeze(-1)

# Calculate performance metrics
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y_test.numpy(), predictions.numpy())
r2 = r2_score(y_test.numpy(), predictions.numpy())

print(f'Mean Squared Error: {mse:.4f}')
print(f'R-squared: {r2:.4f}')


In [None]:
# a) Why we are using this strategy:
#   After training, we need to evaluate the model to assess how well it performs on unseen data (test set).
#   Metrics like Mean Squared Error (MSE) and R-squared help quantify model performance.
# b) How these codes will solve our purpose:
#   The code calculates the MSE and R-squared scores to evaluate the predictive accuracy of the trained model.
# c) Explanation of terms used:
#   - **Mean Squared Error (MSE)**: A common evaluation metric for regression tasks, representing the average squared difference between predicted and true values.
#   - **R-squared**: A measure of how well the model explains the variance in the target variable, with 1 indicating perfect prediction.
# d) What we will achieve from this operation:
#   We will assess how well the model is performing on the test set and understand its predictive capabilities.
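
Both metrics reduce to a few sums. The plain-Python sketch below uses hypothetical helper names and toy values, and is intended only to show the definitions:

```python
def mse(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
print(r_squared(y_true, [1.0, 2.0, 3.0]))  # perfect predictions
print(r_squared(y_true, [2.0, 2.0, 2.0]))  # always predicting the mean
print(r_squared(y_true, [3.0, 2.0, 1.0]))  # worse than predicting the mean
```

R-squared is 1 for perfect predictions, 0 for a model that always predicts the mean of the targets, and negative for a model that does worse than that baseline.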


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# Custom Loss Function with Shaping Constraints (here: MSE plus a penalty on the magnitude of the predictions)
class LossShaping(nn.Module):
    def __init__(self, alpha=0.01):
        super(LossShaping, self).__init__()
        self.alpha = alpha  # Regularization strength

    def forward(self, predictions, targets):
        mse_loss = nn.MSELoss()(predictions, targets)
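        # Example constraint: squared-magnitude penalty on the predictions.
        # Note: this is not L2 regularization on the weights; that would sum over the model parameters instead.
        l2_reg = torch.sum(torch.square(predictions))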
        total_loss = mse_loss + self.alpha * l2_reg
        return total_loss

# Loss shaping implementation with the Autoformer model
autoformer_model = Autoformer(input_size=X_scaled_sequences.shape[2],
                              output_size=1,
                              hidden_size=64,
                              num_layers=2,
                              dropout=0.1)

optimizer = optim.Adam(autoformer_model.parameters(), lr=0.001)
loss_shaping = LossShaping(alpha=0.01)

# Training loop with loss shaping
for epoch in range(num_epochs):
    autoformer_model.train()
    # Squeeze (N, 1) -> (N,) so predictions match the target shape
    predictions = autoformer_model(X_scaled_sequences).squeeze(-1)
    loss = loss_shaping(predictions, y_scaled[sequence_length:])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
        # Initialize separate lists to store losses for each strategy
erm_losses = []
exponential_losses = []
constant_losses = []

# Example: Training with ERM
for epoch in range(num_epochs):
    autoformer_model.train()
    predictions = autoformer_model(X_scaled_sequences)
    loss = erm_loss_fn(predictions, y_scaled[sequence_length:])  # ERM loss function
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    loss_value = loss.item()
    erm_losses.append(loss_value)

# Example: Training with Exponential Loss Shaping
for epoch in range(num_epochs):
    autoformer_model.train()
    predictions = autoformer_model(X_scaled_sequences)
    loss = exponential_loss_fn(predictions, y_scaled[sequence_length:])  # Exponential loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    loss_value = loss.item()
    exponential_losses.append(loss_value)

# Example: Training with Constant Loss Shaping
for epoch in range(num_epochs):
    autoformer_model.train()
    predictions = autoformer_model(X_scaled_sequences)
    loss = constant_loss_fn(predictions, y_scaled[sequence_length:])  # Constant loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    loss_value = loss.item()
    constant_losses.append(loss_value)
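

In [None]:
# The training loops above call erm_loss_fn, exponential_loss_fn, and constant_loss_fn, which this
# part of the notebook does not define. The sketch below is one possible set of definitions and is
# an assumption, not the notebook's actual implementation: the penalty forms and the parameters
# epsilon, weight, and beta are hypothetical choices made for illustration.
# Predictions and targets are assumed to have the same shape.

```python
import torch


def erm_loss_fn(predictions, targets):
    """Plain mean squared error (the ERM baseline)."""
    return torch.mean((predictions - targets) ** 2)


def constant_loss_fn(predictions, targets, epsilon=0.1, weight=1.0):
    """MSE plus a fixed-weight penalty on squared errors above a constant threshold (hypothetical form)."""
    sq_err = (predictions - targets) ** 2
    excess = torch.clamp(sq_err - epsilon, min=0.0)  # zero when the error is within the threshold
    return sq_err.mean() + weight * excess.mean()


def exponential_loss_fn(predictions, targets, beta=1.0):
    """MSE plus a penalty that grows exponentially with the absolute error (hypothetical form)."""
    err = predictions - targets
    penalty = torch.expm1(beta * err.abs())  # exp(x) - 1, which is zero for a zero error
    return (err ** 2).mean() + penalty.mean()


# Small check on hand-picked values
preds = torch.tensor([1.0, 0.0])
targets = torch.tensor([0.0, 0.0])
erm_value = erm_loss_fn(preds, targets).item()
constant_value = constant_loss_fn(preds, targets).item()
exponential_value = exponential_loss_fn(preds, targets).item()
```

# Each function returns a scalar tensor, so it can be dropped into any of the loops above in place
# of the corresponding loss call.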



In [None]:
# a) Loss shaping adds a term to the loss to steer the model away from undesirable behavior.
# Here the custom loss combines MSE with a sum-of-squares penalty on the predictions. Note that
# this is not weight regularization: it penalizes large predicted values, not large weights.

# b) The total loss is MSE + alpha * sum(predictions^2), so it trades prediction accuracy (MSE)
# against the magnitude of the outputs. Because the penalty is a sum rather than a mean, its
# size grows with the number of predictions, and alpha has to be tuned with that in mind.

# c) `LossShaping` is a custom loss function class that takes predictions and targets as inputs.
# It computes MSE and adds an L2 penalty on the predictions. The penalty strength is controlled
# by the `alpha` parameter.

# d) The penalty shrinks the predictions toward zero, which can damp extreme outputs. To penalize
# large weights and limit model complexity directly, use the optimizer's `weight_decay` argument
# or sum the squared parameters of the model.
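

In [None]:
# To make the difference between the two kinds of L2 term concrete, the sketch below computes both
# on a tiny linear layer with hand-set parameters. The helper names prediction_penalty and
# weight_penalty are hypothetical and are not part of the notebook.

```python
import torch
import torch.nn as nn


def prediction_penalty(predictions):
    """Sum of squared predictions, the term used inside LossShaping."""
    return torch.sum(predictions ** 2)


def weight_penalty(model):
    """Sum of squared parameters, i.e. a classic L2 penalty on the weights."""
    return sum(p.pow(2).sum() for p in model.parameters())


layer = nn.Linear(2, 1)
with torch.no_grad():
    layer.weight.fill_(0.5)  # weights: [0.5, 0.5]
    layer.bias.fill_(1.0)    # bias: 1.0

x = torch.tensor([[1.0, 1.0]])
output = layer(x)                      # 0.5 + 0.5 + 1.0 = 2.0
pred_pen = prediction_penalty(output)  # 2.0 ** 2 = 4.0
w_pen = weight_penalty(layer)          # 0.25 + 0.25 + 1.0 = 1.5
```

# The two values differ because the first depends on the input x, while the second depends only on
# the parameters. Passing weight_decay to optim.Adam is the built-in route to the second kind.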


In [None]:
import matplotlib.pyplot as plt

# Compute predictions in evaluation mode (dropout off) and without tracking gradients
autoformer_model.eval()
with torch.no_grad():
    predictions = autoformer_model(X_scaled_sequences).cpu().numpy()
actual_values = y_scaled[sequence_length:].numpy()

# Plot predictions vs actual values
plt.figure(figsize=(10, 6))
plt.plot(actual_values, label='Actual')
plt.plot(predictions, label='Predicted', linestyle='--')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.title('Predicted vs Actual Values')
plt.show()


In [None]:
# a) Visualizing predictions versus actual values allows us to assess the model’s performance.
# These predictions are computed on the training sequences, so the graph shows how well the model
# fits the data it has seen; judging overfitting or underfitting also requires a held-out set.

# b) By plotting the actual and predicted values, we can visually inspect how close the model’s
# predictions are to the true values. The dashed line represents predictions, while the solid line
# represents the true values, allowing easy comparison.

# c) `plt.plot()` is used to plot the data points, and `plt.legend()` adds labels to the plot for
# easy interpretation. `plt.show()` displays the plot.

# d) The result is a graphical comparison that shows how well the model performs in forecasting
# the time series. A close alignment of predicted and actual values indicates a good model performance.


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import MinMaxScaler

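# Define the Transformer model class
class TransformerModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, n_heads, n_layers, output_dim):
        super(TransformerModel, self).__init__()
        # Project raw features to the encoder width (d_model must equal hidden_dim)
        self.input_proj = nn.Linear(input_dim, hidden_dim)
        self.encoder = nn.TransformerEncoder(
            # batch_first=True: inputs are (batch, seq_len, features)
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=n_heads, batch_first=True),
            num_layers=n_layers
        )
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.input_proj(x)
        x = self.encoder(x)
        x = x.mean(dim=1)  # Global average pooling over the sequence dimension
        x = self.fc(x)
        return x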


In [None]:
# TransformerModel Class
# a) This class defines the Transformer architecture, which consists of a stack of encoder layers followed by a fully connected (fc) layer.
# b) The encoder processes the input sequence and extracts features; the fully connected layer turns the pooled features into predictions.
# c) nn.TransformerEncoderLayer: This is the building block of the transformer encoder. Each layer applies multi-head self-attention followed by a feed-forward network and returns feature vectors of the same shape as its input.
# d) The model ends with a Linear layer, which maps the encoded features to the output dimension (predictions).
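

In [None]:
# A quick shape check can make the encoder, pooling, and linear steps described above easier to
# follow. This is a minimal sketch with assumed toy sizes (batch of 4, sequence length 10,
# 8 features, 2 heads); it is not the notebook's model or data.

```python
import torch
import torch.nn as nn

# batch_first=True means inputs are laid out as (batch, seq_len, features)
layer = nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1)
head = nn.Linear(8, 1)

x = torch.randn(4, 10, 8)      # (batch, seq_len, features)
encoded = encoder(x)           # same shape as the input: (4, 10, 8)
pooled = encoded.mean(dim=1)   # average over the sequence: (4, 8)
out = head(pooled)             # one prediction per sequence: (4, 1)
```

# With the default batch_first=False, dim=1 would be the batch dimension instead, and the mean
# would mix different samples together.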


In [None]:
# Step 1: Set up the training loop
def train_transformer(model, train_loader, criterion, optimizer, epochs=10):
    model.train()  # Set the model to training mode
    for epoch in range(epochs):
        epoch_loss = 0
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()  # Clear previous gradients
            output = model(X_batch)  # Forward pass
            loss = criterion(output, y_batch)  # Calculate loss
            loss.backward()  # Backpropagation
            optimizer.step()  # Update parameters
            epoch_loss += loss.item()  # Accumulate loss
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {epoch_loss/len(train_loader):.4f}")


In [None]:
# Training Loop for Transformer
# a) We define a function that will train the transformer model for a specified number of epochs, using mini-batches of data from the train_loader.
# b) The function processes the data in batches, computes the forward pass, calculates the loss, performs backpropagation, and updates the model parameters.
# c) We use the criterion (loss function) to measure how well the model's predictions match the actual values. The optimizer is responsible for updating the model's parameters based on the loss.
# d) By running this, we train the transformer model, iteratively improving its performance.
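

In [None]:
# The loop above expects train_loader to yield (X_batch, y_batch) pairs. The sketch below shows
# one way to build such a loader with TensorDataset and runs the same zero_grad / forward /
# backward / step sequence on synthetic data. The data, the linear stand-in model, and the
# hyperparameters are assumptions chosen only to keep the example small.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X = torch.randn(32, 5)              # 32 samples, 5 features
y = X.sum(dim=1, keepdim=True)      # synthetic target: sum of the features

# TensorDataset pairs each row of X with the matching row of y
loader = DataLoader(TensorDataset(X, y), batch_size=8, shuffle=True)

model = nn.Linear(5, 1)             # stand-in for the transformer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)

epoch_means = []
for epoch in range(20):
    running = 0.0
    for X_batch, y_batch in loader:
        optimizer.zero_grad()                      # clear previous gradients
        loss = criterion(model(X_batch), y_batch)  # forward pass and loss
        loss.backward()                            # backpropagation
        optimizer.step()                           # parameter update
        running += loss.item()
    epoch_means.append(running / len(loader))      # mean batch loss for the epoch
```

# epoch_means holds one averaged loss per epoch, which is the quantity the training function prints.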


In [None]:
from sklearn.metrics import mean_squared_error
import math

# Evaluation function for testing the model
def evaluate_model(model, test_loader):
    model.eval()  # Set the model to evaluation mode
    all_predictions = []
    all_labels = []
    with torch.no_grad():  # No gradient calculation needed during evaluation
        for X_batch, y_batch in test_loader:
            predictions = model(X_batch)
            all_predictions.extend(predictions.numpy())  # Collect predictions
            all_labels.extend(y_batch.numpy())  # Collect actual values
    mse = mean_squared_error(all_labels, all_predictions)  # Calculate MSE
    rmse = math.sqrt(mse)  # Calculate RMSE
    return mse, rmse


In [None]:
# Evaluation Function
# a) This function evaluates the performance of the model using test data, by calculating MSE and RMSE.
# b) We compute predictions on the test data and compare them to the actual values using mean_squared_error.
# c) The MSE measures the average squared difference between the predicted and actual values. RMSE is just the square root of MSE.
# d) By running this, we get a numerical evaluation of the model's performance on unseen data.
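

In [None]:
# As a worked example of the two metrics, the sketch below computes MSE and RMSE by hand on four
# made-up values (illustrative numbers, not results from the model).

```python
import math

actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]

# Squared difference for each pair: [0.0, 0.0, 0.0, 16.0]
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]

mse = sum(squared_errors) / len(squared_errors)  # 16.0 / 4 = 4.0
rmse = math.sqrt(mse)                            # sqrt(4.0) = 2.0
```

# A single error of 4 dominates the MSE, which shows how squaring weights large errors heavily.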


In [None]:
import matplotlib.pyplot as plt

# Visualize predictions vs actual values
def plot_predictions(actual, predicted):
    plt.figure(figsize=(10, 6))
    plt.plot(actual, label='Actual', color='blue')
    plt.plot(predicted, label='Predicted', color='red')
    plt.xlabel('Time')
    plt.ylabel('Values')
    plt.title('Predicted vs Actual')
    plt.legend()
    plt.show()


In [None]:
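# Set up the data loaders. Each loader must yield (X_batch, y_batch) tensor pairs,
# e.g. by wrapping feature and target tensors in torch.utils.data.TensorDataset.
# Note: both loaders wrap the same data here, so the reported metrics measure fit
# to the training data rather than performance on a held-out test set.
train_loader = torch.utils.data.DataLoader(data_filled, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(data_filled, batch_size=64, shuffle=False)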

# Define the model
model = TransformerModel(input_dim=29, hidden_dim=64, n_heads=4, n_layers=2, output_dim=1)

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
train_transformer(model, train_loader, criterion, optimizer, epochs=10)

# Evaluate the model
mse, rmse = evaluate_model(model, test_loader)

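# Visualize the results. evaluate_model keeps its prediction lists local,
# so collect labels and predictions again here (test_loader is not shuffled).
model.eval()
with torch.no_grad():
    all_labels = torch.cat([y_batch for _, y_batch in test_loader]).numpy()
    all_predictions = torch.cat([model(X_batch) for X_batch, _ in test_loader]).numpy()
plot_predictions(all_labels, all_predictions)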

# Print evaluation results
print(f'Mean Squared Error (MSE): {mse}')
print(f'Root Mean Squared Error (RMSE): {rmse}')


In [None]:
# Model Evaluation Results
# a) By training the model and evaluating it on the test set, we compute MSE and RMSE, which provide insight into the model's performance.
# b) These metrics help us quantify how well our transformer model is predicting the target values.
# c) Because errors are squared, MSE penalizes large errors more heavily than small ones. RMSE is easier to interpret because it is in the same units as the target variable (scaled units if the targets were scaled).
# d) The evaluation results guide us in understanding whether the model is suitable for our forecasting task.


In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Initialize lists to store loss values for ERM, Exponential, and Constant.
# Note: this overwrites any lists with the same names that were recorded during training.
erm_losses = []
exponential_losses = []
constant_losses = []

# Simulated loss values for illustration only: they are random placeholders, not training results.
# Replace them with the losses recorded in your training loop.

num_epochs = 100
for epoch in range(num_epochs):
    # Example values for ERM, Exponential, and Constant losses
    erm_loss = np.random.uniform(0.1, 0.3) + 0.05 * np.sin(epoch / 10)  # Simulated ERM loss
    exponential_loss = np.random.uniform(0.1, 0.3) + 0.07 * np.exp(-epoch / 15)  # Simulated Exponential loss
    constant_loss = 0.2 + 0.03 * epoch  # Simulated Constant loss

    # Append to lists
    erm_losses.append(erm_loss)
    exponential_losses.append(exponential_loss)
    constant_losses.append(constant_loss)

# Plotting the losses
plt.figure(figsize=(12, 6))

# ERM Loss
plt.subplot(1, 3, 1)
plt.plot(range(num_epochs), erm_losses, color='blue', label='ERM Loss')
plt.title('ERM Loss Over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Exponential Loss
plt.subplot(1, 3, 2)
plt.plot(range(num_epochs), exponential_losses, color='red', label='Exponential Loss')
plt.title('Exponential Loss Over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Constant Loss
plt.subplot(1, 3, 3)
plt.plot(range(num_epochs), constant_losses, color='green', label='Constant Loss')
plt.title('Constant Loss Over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Show the plots
plt.tight_layout()
plt.show()


In [None]:
# Imports
# (a) We import "matplotlib.pyplot" and "numpy" for visualization and numerical operations.
# (b) These libraries let us create plots and handle the numerical computations needed to simulate or analyze the loss functions over epochs.
# (c) "matplotlib.pyplot" creates the plots, while "numpy" generates random numbers and performs mathematical operations.
# (d) With these libraries, we can plot the behavior of the ERM, Exponential, and Constant losses over time.
#
# Simulated Loss Values
# (a) We simulate loss values for the ERM, Exponential, and Constant loss functions over 100 epochs.
# (b) These values are placeholders; during actual training, the loss values come from the loss function applied to model predictions and targets.
# (c) The "erm_losses", "exponential_losses", and "constant_losses" lists store the loss values for each function over time.
# (d) Because the values are simulated, the resulting curves only exercise the plotting code; they say nothing about how the strategies actually perform.
#
# Loss Subplots
# (a) We plot the loss values for the ERM, Exponential, and Constant loss functions in separate subplots for easy comparison.
# (b) The subplots give a side-by-side view of how each loss evolves over time.
# (c) "plt.subplot()" creates the subplot layout, while "plt.plot()" plots the loss values for each function. "plt.title()" adds titles, "plt.xlabel()" and "plt.ylabel()" label the axes, and "plt.legend()" adds a legend to identify each loss curve.
# (d) This step generates a figure in which you can observe how each loss curve behaves over time (in terms of convergence or fluctuation).
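

In [None]:
# To replace the simulated placeholders with real values, each strategy can be trained from the
# same starting point and its per-epoch loss recorded. The sketch below is one way to do that,
# under stated assumptions: train_and_record is a hypothetical helper, a linear layer stands in
# for the Autoformer, the data is synthetic, and nn.L1Loss is only a stand-in for a second strategy.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def train_and_record(loss_fn, X, y, num_epochs=50, seed=42):
    """Train a fresh model with the given loss and return the loss at every epoch."""
    torch.manual_seed(seed)               # same seed gives the same initial weights per strategy
    model = nn.Linear(X.shape[1], 1)      # stand-in for the Autoformer
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    history = []
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
        history.append(loss.item())
    return history


# Synthetic regression data
torch.manual_seed(0)
X = torch.randn(64, 3)
y = X @ torch.tensor([[1.0], [-2.0], [0.5]])

strategies = {
    "ERM": nn.MSELoss(),
    "L1 (stand-in)": nn.L1Loss(),
}
histories = {name: train_and_record(fn, X, y) for name, fn in strategies.items()}
```

# Each entry in histories is a list with one loss value per epoch and can be passed to plt.plot
# directly. Creating a new model and optimizer inside the helper keeps one strategy from
# inheriting the weights trained by another.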


In [None]:
import matplotlib.pyplot as plt

#  Use distinct colors and labels for clarity
colors = {
    "ERM": "deepskyblue",        # Cool blue
    "Exponential": "darkorange", # Vibrant orange
    "Constant": "forestgreen"    # Earthy green
}

#  Create the plot
plt.figure(figsize=(12, 6))

#  Plot each loss trend
plt.plot(erm_losses, label='ERM Loss', color=colors["ERM"], linewidth=2)
plt.plot(exponential_losses, label='Exponential Loss', color=colors["Exponential"], linestyle='--', linewidth=2)
plt.plot(constant_losses, label='Constant Loss', color=colors["Constant"], linestyle='-.', linewidth=2)

#  Beautify the plot
plt.title('Autoformer: Training Loss Comparison Across Loss Shaping Strategies', fontsize=16)
plt.xlabel('Epochs', fontsize=12)
plt.ylabel('Loss Value (MSE + Regularization)', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, linestyle=':', alpha=0.7)

#  Show the plot
plt.show()


In [None]:
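# Setting random seeds for reproducibility.
# Run this cell before creating the models and training; seeds set afterwards do not affect earlier results.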

import random
import numpy as np
import torch

random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
torch.cuda.manual_seed_all(42)  # If using GPU
