**Question:**

Using Torch and PyTorch, prepare a single Jupyter Notebook which shows the performance of different optimizers on different neural networks for regression and classification. In your Jupyter Notebook, you may use different numbers of neural networks with different topologies (different numbers of layers and neurons), different activation functions, different loss functions, different optimizers, and different training modes (batch, mini-batch, stochastic).

**In this notebook, I will be using Torch and PyTorch to show the performance of different optimizers on different neural networks for regression and classification.**

Before going into the code, let's clarify the terms optimizer, classification, and regression.

**Optimizer:** In machine learning, optimization algorithms or optimizers are used to find the best values of the parameters of a model in order to minimize the loss function. Optimizers help in adjusting the weights and biases of a neural network during the training process, such that the network can learn to predict the target variable accurately.
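
To make the idea concrete, here is a minimal sketch of an optimizer at work. It is not part of the experiments below: the one-parameter "model" `w` and the loss `(w - 3)^2` are made up for illustration, and plain SGD with a learning rate of 0.1 is an arbitrary choice.

```python
import torch

# Hypothetical one-parameter "model": minimize loss(w) = (w - 3)^2.
w = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for step in range(100):
    optimizer.zero_grad()      # clear the gradient from the previous step
    loss = (w - 3.0) ** 2      # forward pass: compute the loss
    loss.backward()            # backward pass: d(loss)/dw = 2 * (w - 3)
    optimizer.step()           # update: w <- w - lr * gradient

final_w = w.item()
print(f"w after 100 steps: {final_w:.4f}")
```

The same four calls (`zero_grad`, forward pass, `backward`, `step`) appear in every training loop in this notebook; only the model, the loss function, and the optimizer change.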

**Classification:** Classification is a type of machine learning task where the goal is to predict which category or class a given input belongs to. In classification tasks, the input is usually a set of features and the output is a discrete label. For example, a classification algorithm may be used to classify emails as spam or not spam, or to classify images of animals into different species.

**Regression:** Regression is a type of machine learning task where the goal is to predict a continuous numerical value based on one or more input features. Regression algorithms are used for tasks such as predicting the price of a house based on its features, or predicting the age of a person based on their demographic information.


**I will use the MNIST dataset for classification and the California Housing dataset for regression.**


# Classification

First of all, we need to import the necessary libraries.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, TensorDataset, random_split
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

In [None]:
# Setting the device to GPU (CUDA) if available, otherwise to CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
print(device.type)

cpu


In [None]:
# Creating the transform attribute.
T = transforms.Compose([ # Combining multiple image transformations
    transforms.ToTensor(), # Converting images to tensors.
    transforms.Normalize((0.5,), (0.5,)) # Normalize the tensors.
])

In [None]:
# Loading the MNIST dataset as train, validation and test sets.
# Note: validation_set and test_set both load the official test split (train=False), so they contain the same 10,000 images.
train_set = torchvision.datasets.MNIST('./data', train=True, transform=T, download=True)
validation_set = torchvision.datasets.MNIST('./data', train=False, transform=T, download=True)
test_set = torchvision.datasets.MNIST('./data', train=False, transform=T, download=True)
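
If a validation set that is separate from the test set is wanted, one option is to hold out part of the training data with `random_split` (already imported above). The sketch below is only an illustration under assumptions: it uses a small random `TensorDataset` as a stand-in for MNIST so that it runs without downloading anything, and the 10% hold-out size and the seed are arbitrary choices.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the 60,000-image MNIST training set (random data, illustration only).
full_train = TensorDataset(torch.randn(600, 1, 28, 28), torch.randint(0, 10, (600,)))

# Hold out 10% of the training data for validation; the seed is an arbitrary choice.
val_size = len(full_train) // 10
train_size = len(full_train) - val_size
generator = torch.Generator().manual_seed(0)
train_part, val_part = random_split(full_train, [train_size, val_size], generator=generator)

print(len(train_part), len(val_part))
```

With the real data, `full_train` would be the `train_set` loaded above, and the test set would then be used only once, after the best combination has been chosen on the validation part.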

In [None]:
# Creating data loaders.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=64, shuffle=False)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=False)

In [None]:
# Creating different models to measure the performance of the optimizers.

# This model is a very simple model that only has one linear layer and a softmax activation function.
class MNISTModel1(nn.Module):
    def __init__(self):
        super(MNISTModel1, self).__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 10),
            nn.Softmax(dim=1))

    def forward(self, x):
        return self.model(x)

# This model is slightly more complex: it has two linear layers with a sigmoid activation function between them.
# The output is passed through a softmax activation function.
class MNISTModel2(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 16)
        self.fc2 = nn.Linear(16, 10)

    def forward(self, x):
        x = x.reshape(-1, 28*28)
        x = self.fc1(x)
        x = torch.sigmoid(x)
        out = self.fc2(x)
        return torch.softmax(out, dim=1)

# This model is similar to the previous one, but it has three linear layers with ReLU activation functions between them.
# The output is passed through a softmax activation function.
class MNISTModel3(nn.Module):
    def __init__(self):
        super(MNISTModel3, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.softmax(self.fc3(x))
        return x

# Creating a list of models.
models = [MNISTModel1(),
    MNISTModel2(),
    MNISTModel3()]
# My goal was to train all the models in one loop like this, but, as explained further in the notebook, it did not work correctly.
# So I ended up running them separately.
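
One detail about these models is worth a sketch. `nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally, so a model that already ends in softmax has the normalization applied twice when it is trained with that loss. The variant below is hypothetical and was not used for the results in this notebook; `logit_model`, the random images, and the labels are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical variant of the simplest model that returns raw scores (logits).
logit_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

images = torch.randn(4, 1, 28, 28)       # random stand-in for a batch of MNIST images
labels = torch.tensor([3, 1, 4, 1])      # arbitrary example labels

logits = logit_model(images)
loss_from_logits = nn.CrossEntropyLoss()(logits, labels)

# CrossEntropyLoss on logits equals NLLLoss on log-softmax outputs.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

# Probabilities are still available when needed, e.g. for reporting.
probs = torch.softmax(logits, dim=1)
```

Because softmax preserves the ordering of the scores, the predicted class (`argmax`) is the same whether it is taken from the logits or from the probabilities.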

In [None]:
# Creating a list of optimizers.
optimizers = [[torch.optim.SGD(models[i].parameters(), lr=0.1, momentum=0.9),
        torch.optim.Adam(models[i].parameters(), lr=0.01),
        torch.optim.RMSprop(models[i].parameters(), lr=0.01, alpha=0.9)
    ]
    for i in range(len(models))] # One list of optimizers per model; each optimizer is bound to that model's parameters.
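
An alternative to building every optimizer up front is to store small factory functions and create a fresh optimizer for whichever model is about to be trained. This is only a sketch of that idea, not the code that produced the results below; `optimizer_factories`, `demo_model`, and `built` are hypothetical names, and the hyperparameters are copied from the list above.

```python
import torch
import torch.nn as nn

# Hypothetical factories: each takes a parameter iterable and returns a new optimizer.
optimizer_factories = {
    "SGD": lambda params: torch.optim.SGD(params, lr=0.1, momentum=0.9),
    "Adam": lambda params: torch.optim.Adam(params, lr=0.01),
    "RMSprop": lambda params: torch.optim.RMSprop(params, lr=0.01, alpha=0.9),
}

# Small stand-in model; any nn.Module works the same way.
demo_model = nn.Linear(4, 2)

# Build one fresh optimizer of each kind for this particular model.
built = {name: make(demo_model.parameters()) for name, make in optimizer_factories.items()}
```

Creating the optimizer right before training also resets internal state such as momentum buffers, so one combination does not influence the next.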

In [None]:
# Creating a list of loss functions to use during training.
loss_functions = [nn.CrossEntropyLoss(), nn.MultiMarginLoss()]

In [None]:
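# Creating a list with different batch sizes: 1 for stochastic mode, 64 for mini-batch mode, and a large value for batch mode.
# Note: len(train_loader) is the number of mini-batches (938 at batch_size=64), not the number of training images (len(train_set) = 60000).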
batch_sizes = [1, 64, len(train_loader)]
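
For a batch size to change the training mode, the `DataLoader` has to be created with that batch size. A minimal sketch of one loader per mode follows, assuming a small random dataset in place of MNIST; `demo_set`, `batch_modes`, and `loaders` are hypothetical names.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in for the training set (illustration only).
demo_set = TensorDataset(torch.randn(256, 28 * 28), torch.randint(0, 10, (256,)))

# Hypothetical mapping from training mode to batch size.
batch_modes = {"stochastic": 1, "mini-batch": 64, "batch": len(demo_set)}

loaders = {
    mode: DataLoader(demo_set, batch_size=size, shuffle=True)
    for mode, size in batch_modes.items()
}

for mode, loader in loaders.items():
    print(mode, "->", len(loader), "parameter updates per epoch")
```

Here full-batch mode uses `len(demo_set)`, the number of samples, so each epoch consists of a single update, while stochastic mode performs one update per sample.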

In [None]:
# Training the model

# To train our model, we are going to create a nested loop.
# We will try different combinations of optimizers, loss functions and batch modes on each different model we have created.

# We will use this part to find the best combination at the end of our training.
best_accuracy = 0.0
best_model = None
best_optimizer = None
best_loss_fn = None
best_batch_size = None

# Setting the number of epochs.
epochs_number = 2      # I used only 2 epochs because training took very long and the runtime stopped.

# Creating a nested loop to try different combinations.
for model in models:
  model_name = model.__class__.__name__
  epoch_loss = 0
  epoch_acc = 0

  # Creating empty lists.
  train_losses, train_accuracies, val_losses, val_accuracies = [], [], [], []

  for optimizer in optimizers[0]: # optimizers holds one list per model; optimizers[0] contains the optimizers bound to models[0].parameters().
      optimizer_name = optimizer.__class__.__name__

      for loss_fn in loss_functions:
        lossfn_name = loss_fn.__class__.__name__

        for batch_size in batch_sizes:

          # Printing the combination before training it. Note: train_loader was created with batch_size=64 above, so batch_size is only reported here.
          print("Training on", model_name, "using", optimizer_name, "as the optimizer and", lossfn_name, "as the loss function.", "Batch mode:", batch_size)

          # Creating a loop that repeats the training for the number of epochs we set.
          for e in range(epochs_number):
              # Initializing the running loss to zero for the current epoch.
              running_loss = 0.0
              # Initializing the running number of correct predictions to zero for the current epoch.
              running_corrects = 0
              # Setting the model to training mode, which makes layers such as dropout and batch normalization behave as they should during training.
              model.train()

              # Looping over the training set and loading a batch of input images and their corresponding target labels.
              for inputs, targets in train_loader:
                  # This clears the gradients of all optimized variables to zero before the forward pass.
                  optimizer.zero_grad()
                  # This computes the forward pass of the model on the input batch.
                  outputs = model(inputs)
                  #  This computes the loss between the model's predictions and the target labels.
                  loss = loss_fn(outputs, targets)
                  # This computes the gradients of the loss with respect to the model parameters.
                  loss.backward()
                  # This updates the model parameters by taking a step in the direction of the negative gradient using the optimizer.
                  optimizer.step()

                  # This gets the index of the highest predicted probability for each input image in the batch, i.e. the predicted class.
                  _, preds = torch.max(outputs, 1)
                  # This adds the product of the loss and the batch size to the running loss for the current epoch.
                  running_loss += loss.item() * inputs.size(0)
                  #  This adds the number of correct predictions to the running number of correct predictions for the current epoch.
                  running_corrects += torch.sum(preds == targets.data)

              # This calculates the average loss for the current epoch.
              epoch_loss = running_loss / len(train_loader.dataset)
              # This calculates the accuracy for the current epoch.
              epoch_acc = running_corrects.double() / len(train_loader.dataset)
              # This appends the average loss for the current epoch to a list of training losses for all epochs.
              train_losses.append(epoch_loss)
              # This appends the accuracy for the current epoch to a list of training accuracies for all epochs.
              train_accuracies.append(epoch_acc)

              # This initializes the running loss to zero for the current epoch.
              running_loss = 0.0
              # This initializes the running number of correct predictions to zero for the current epoch.
              running_corrects = 0
              # This sets the model to evaluation mode, which makes layers such as dropout and batch normalization behave as they should during inference.
              model.eval()
              # This temporarily disables gradient computation to speed up model inference and conserve memory.
              with torch.no_grad():
                  # This loops over the validation set and loads a batch of input images and their corresponding target labels.
                  for inputs, targets in validation_loader:
                      # This computes the forward pass of the model on the input batch.
                      outputs = model(inputs)
                      # This computes the loss between the model's predictions and the target labels.
                      loss = loss_fn(outputs, targets)

                      # This gets the index of the highest predicted probability for each input image in the batch, i.e. the predicted class.
                      _, preds = torch.max(outputs, 1)
                      # This adds the product of the loss and the batch size to the running loss for the current epoch.
                      running_loss += loss.item() * inputs.size(0)
                      # This adds the number of correct predictions to the running number of correct predictions.
                      running_corrects += torch.sum(preds == targets.data)

                  # Compute the average loss per image in the validation set for the current epoch.
                  epoch_loss = running_loss / len(validation_loader.dataset)
                  # Compute the accuracy for the current epoch by dividing the number of correctly classified images by the total number of images in the validation set.
                  epoch_acc = running_corrects.double() / len(validation_loader.dataset)
                  # Append the current epoch's loss to the list of losses for all epochs.
                  val_losses.append(epoch_loss)
                  # Append the current epoch's accuracy to the list of accuracies for all epochs.
                  val_accuracies.append(epoch_acc)
              # Print the current epoch number, validation loss, and validation accuracy in a formatted string.
              print(f"Epoch {e+1}/{epochs_number}, Validation Loss: {epoch_loss:.4f}, Validation Accuracy: {epoch_acc:.4f}")

          # Checking if the current combination's accuracy (the last entry of the val_accuracies list) is better than the previous best accuracy.
          if val_accuracies[-1] > best_accuracy:
              best_accuracy = val_accuracies[-1]
              best_model = model_name
              best_optimizer = optimizer_name
              best_loss_fn = lossfn_name
              best_batch_size = batch_size

Training on MNISTModel1 using SGD as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 1
Epoch 1/2, Validation Loss: 1.8055, Validation Accuracy: 0.6547
Epoch 2/2, Validation Loss: 1.8054, Validation Accuracy: 0.6538
Training on MNISTModel1 using SGD as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 64
Epoch 1/2, Validation Loss: 1.7898, Validation Accuracy: 0.6695
Epoch 2/2, Validation Loss: 1.7883, Validation Accuracy: 0.6714
Training on MNISTModel1 using SGD as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 938
Epoch 1/2, Validation Loss: 1.7883, Validation Accuracy: 0.6715
Epoch 2/2, Validation Loss: 1.7900, Validation Accuracy: 0.6701
Training on MNISTModel1 using SGD as the optimizer and MultiMarginLoss as the loss function. Batch mode: 1
Epoch 1/2, Validation Loss: 0.3306, Validation Accuracy: 0.6715
Epoch 2/2, Validation Loss: 0.3307, Validation Accuracy: 0.6714
Training on MNISTModel1 using SGD as the optimizer and

In [None]:
print(f"The best combination is: Model: {best_model}, Optimizer: {best_optimizer}, Loss Function: {best_loss_fn}, Batch Size: {best_batch_size}, with Validation Accuracy: {best_accuracy:.4f}")

The best combination is: Model: MNISTModel1, Optimizer: SGD, Loss Function: MultiMarginLoss, Batch Size: 938, with Validation Accuracy: 0.6755


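When I ran the training loop for all three models together, I noticed a bug in my code. The code worked without a problem for the first model, but when it moved on to the other models, the accuracy dropped to around 0.05-0.15. The cause appears to be that the loop always iterates over `optimizers[0]`, whose optimizers were created from `models[0].parameters()`, so the weights of the second and third models were never updated.

That is why I ran the code for each model separately and added the results below.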
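
The effect can be reproduced in a few lines. This is a minimal sketch with two tiny stand-in models (`model_a`, `model_b`) and random data, not the MNIST models themselves: an optimizer only updates the parameters it was created with, so stepping it after a backward pass through a different model leaves that model untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model_a = nn.Linear(3, 1)
model_b = nn.Linear(3, 1)

# The optimizer is tied to model_a's parameters only.
optimizer_a = torch.optim.SGD(model_a.parameters(), lr=0.1)

before_a = model_a.weight.detach().clone()
before_b = model_b.weight.detach().clone()

# Compute a loss on model_b, but step with model_a's optimizer.
x, y = torch.randn(8, 3), torch.randn(8, 1)
optimizer_a.zero_grad()
loss = nn.MSELoss()(model_b(x), y)
loss.backward()
optimizer_a.step()

b_unchanged = torch.equal(model_b.weight, before_b)
a_unchanged = torch.equal(model_a.weight, before_a)
print("model_b changed:", not b_unchanged)
```

`model_b` receives gradients but no update, which matches an accuracy that stays near chance level for the later models.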

**Output for second model:**

Training on MNISTModel2 using SGD as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 1.6620, Validation Accuracy: 0.8356

Epoch 2/2, Validation Loss: 1.6238, Validation Accuracy: 0.8482

Training on MNISTModel2 using SGD as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 1.5540, Validation Accuracy: 0.9218

Epoch 2/2, Validation Loss: 1.5529, Validation Accuracy: 0.9198

Training on MNISTModel2 using SGD as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 1.5469, Validation Accuracy: 0.9243

Epoch 2/2, Validation Loss: 1.5443, Validation Accuracy: 0.9259

Training on MNISTModel2 using SGD as the optimizer and MultiMarginLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 0.0826, Validation Accuracy: 0.9263

Epoch 2/2, Validation Loss: 0.0800, Validation Accuracy: 0.9282

Training on MNISTModel2 using SGD as the optimizer and MultiMarginLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 0.0857, Validation Accuracy: 0.9235

Epoch 2/2, Validation Loss: 0.0759, Validation Accuracy: 0.9343

Training on MNISTModel2 using SGD as the optimizer and MultiMarginLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 0.0829, Validation Accuracy: 0.9253

Epoch 2/2, Validation Loss: 0.0735, Validation Accuracy: 0.9329

Training on MNISTModel2 using Adam as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 1.5732, Validation Accuracy: 0.8929

Epoch 2/2, Validation Loss: 1.5516, Validation Accuracy: 0.9111

Training on MNISTModel2 using Adam as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 1.5633, Validation Accuracy: 0.9004

Epoch 2/2, Validation Loss: 1.5625, Validation Accuracy: 0.9045

Training on MNISTModel2 using Adam as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 1.5594, Validation Accuracy: 0.9043

Epoch 2/2, Validation Loss: 1.5526, Validation Accuracy: 0.9102

Training on MNISTModel2 using Adam as the optimizer and MultiMarginLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 0.1070, Validation Accuracy: 0.8979

Epoch 2/2, Validation Loss: 0.0891, Validation Accuracy: 0.9144

Training on MNISTModel2 using Adam as the optimizer and MultiMarginLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 0.1078, Validation Accuracy: 0.8988

Epoch 2/2, Validation Loss: 0.1089, Validation Accuracy: 0.8956

Training on MNISTModel2 using Adam as the optimizer and MultiMarginLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 0.1143, Validation Accuracy: 0.8894

Epoch 2/2, Validation Loss: 0.1053, Validation Accuracy: 0.8975

Training on MNISTModel2 using RMSprop as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 1.5494, Validation Accuracy: 0.9136

Epoch 2/2, Validation Loss: 1.5513, Validation Accuracy: 0.9113

Training on MNISTModel2 using RMSprop as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 1.5461, Validation Accuracy: 0.9151

Epoch 2/2, Validation Loss: 1.5399, Validation Accuracy: 0.9221

Training on MNISTModel2 using RMSprop as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 1.5407, Validation Accuracy: 0.9220

Epoch 2/2, Validation Loss: 1.5371, Validation Accuracy: 0.9247

Training on MNISTModel2 using RMSprop as the optimizer and MultiMarginLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 0.0830, Validation Accuracy: 0.9185

Epoch 2/2, Validation Loss: 0.0745, Validation Accuracy: 0.9265

Training on MNISTModel2 using RMSprop as the optimizer and MultiMarginLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 0.0784, Validation Accuracy: 0.9226

Epoch 2/2, Validation Loss: 0.0776, Validation Accuracy: 0.9236

Training on MNISTModel2 using RMSprop as the optimizer and MultiMarginLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 0.0781, Validation Accuracy: 0.9229

Epoch 2/2, Validation Loss: 0.0742, Validation Accuracy: 0.9271


**The best combination is: Model: MNISTModel2, Optimizer: SGD, Loss Function: MultiMarginLoss, Batch Size: 64, with Validation Accuracy: 0.9343**


**Output for third model:**

Training on MNISTModel3 using SGD as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 1.7895, Validation Accuracy: 0.6700

Epoch 2/2, Validation Loss: 1.7852, Validation Accuracy: 0.6748

Training on MNISTModel3 using SGD as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 1.7869, Validation Accuracy: 0.6737

Epoch 2/2, Validation Loss: 1.7812, Validation Accuracy: 0.6792

Training on MNISTModel3 using SGD as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 1.8000, Validation Accuracy: 0.6606

Epoch 2/2, Validation Loss: 1.8317, Validation Accuracy: 0.6291

Training on MNISTModel3 using SGD as the optimizer and MultiMarginLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 0.3725, Validation Accuracy: 0.6274

Epoch 2/2, Validation Loss: 0.3226, Validation Accuracy: 0.6773

Training on MNISTModel3 using SGD as the optimizer and MultiMarginLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 0.3470, Validation Accuracy: 0.6530

Epoch 2/2, Validation Loss: 0.3748, Validation Accuracy: 0.6251

Training on MNISTModel3 using SGD as the optimizer and MultiMarginLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 0.3412, Validation Accuracy: 0.6589

Epoch 2/2, Validation Loss: 0.3861, Validation Accuracy: 0.6139

Training on MNISTModel3 using Adam as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 1.6982, Validation Accuracy: 0.7622

Epoch 2/2, Validation Loss: 1.6962, Validation Accuracy: 0.7647

Training on MNISTModel3 using Adam as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 1.7651, Validation Accuracy: 0.6961

Epoch 2/2, Validation Loss: 1.6898, Validation Accuracy: 0.7713

Training on MNISTModel3 using Adam as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 1.7111, Validation Accuracy: 0.7497

Epoch 2/2, Validation Loss: 1.6244, Validation Accuracy: 0.8366

Training on MNISTModel3 using Adam as the optimizer and MultiMarginLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 0.1906, Validation Accuracy: 0.8097

Epoch 2/2, Validation Loss: 0.1456, Validation Accuracy: 0.8543

Training on MNISTModel3 using Adam as the optimizer and MultiMarginLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 0.1632, Validation Accuracy: 0.8369

Epoch 2/2, Validation Loss: 0.1610, Validation Accuracy: 0.8391

Training on MNISTModel3 using Adam as the optimizer and MultiMarginLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 0.2258, Validation Accuracy: 0.7743

Epoch 2/2, Validation Loss: 0.1929, Validation Accuracy: 0.8070

Training on MNISTModel3 using RMSprop as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 2.3632, Validation Accuracy: 0.0980

Epoch 2/2, Validation Loss: 2.3632, Validation Accuracy: 0.0980

Training on MNISTModel3 using RMSprop as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 2.3632, Validation Accuracy: 0.0980

Epoch 2/2, Validation Loss: 2.3632, Validation Accuracy: 0.0980

Training on MNISTModel3 using RMSprop as the optimizer and CrossEntropyLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 2.3632, Validation Accuracy: 0.0980

Epoch 2/2, Validation Loss: 2.3632, Validation Accuracy: 0.0980

Training on MNISTModel3 using RMSprop as the optimizer and MultiMarginLoss as the loss function. Batch mode: 1

Epoch 1/2, Validation Loss: 0.9020, Validation Accuracy: 0.0980

Epoch 2/2, Validation Loss: 0.9020, Validation Accuracy: 0.0980

Training on MNISTModel3 using RMSprop as the optimizer and MultiMarginLoss as the loss function. Batch mode: 64

Epoch 1/2, Validation Loss: 0.9020, Validation Accuracy: 0.0980

Epoch 2/2, Validation Loss: 0.9020, Validation Accuracy: 0.0980

Training on MNISTModel3 using RMSprop as the optimizer and MultiMarginLoss as the loss function. Batch mode: 938

Epoch 1/2, Validation Loss: 0.9020, Validation Accuracy: 0.0980

Epoch 2/2, Validation Loss: 0.9020, Validation Accuracy: 0.0980

**The best combination is: Model: MNISTModel3, Optimizer: Adam, Loss Function: MultiMarginLoss, Batch Size: 1, with Validation Accuracy: 0.8543**

Overall, we can say that the best combination was:

**Model: MNISTModel2, Optimizer: SGD, Loss Function: MultiMarginLoss, Batch Size: 64, with Validation Accuracy: 0.9343**

# Regression

For regression, I will use the California Housing dataset.
This dataset contains information about the median house values for various districts in California.

The goal is to use the available features to predict the median house value of each district.

In [None]:
# Loading the California Housing dataset using sklearn
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

In [None]:
# Fetching the California Housing dataset
dataset = fetch_california_housing()

# Splitting the dataset into training and testing sets using train_test_split
X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2, random_state=42)

# Standardizing the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert the data to PyTorch tensors
X_train = torch.from_numpy(X_train).float()
y_train = torch.from_numpy(y_train).float().view(-1, 1)
X_test = torch.from_numpy(X_test).float()
y_test = torch.from_numpy(y_test).float().view(-1, 1)

In [None]:
# Importing the torch.nn.functional that we are going to use while creating our models.
import torch.nn.functional as F

In [None]:
# Defining the different models

# The first model has three layers with 16, 8, and output_dim neurons, respectively.
# It uses the ReLU activation function for the first two layers and the sigmoid activation function for the last layer.
# Note: the final sigmoid bounds every prediction to the range (0, 1).
class RegressionModel1(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.layer1 = nn.Linear(input_dim, 16)
        self.layer2 = nn.Linear(16, 8)
        self.layer3 = nn.Linear(8, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.sigmoid(self.layer3(x))
        return x

# The second model is similar to the first model, but with more neurons in each layer.
# It has three layers with 32, 16, and output_dim neurons, respectively.
# It also uses the ReLU activation function for the first two layers and the sigmoid activation function for the last layer.
class RegressionModel2(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.layer1 = nn.Linear(input_dim, 32)
        self.layer2 = nn.Linear(32, 16)
        self.layer3 = nn.Linear(16, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.sigmoid(self.layer3(x))
        return x
# The third model is the most complex of the three. This model has four layers.
# It has 64, 32, 16, and output_dim neurons in each layer, respectively.
# It also uses the ReLU activation function for the first three layers and the sigmoid activation function for the last layer.
class RegressionModel3(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.layer1 = nn.Linear(input_dim, 64)
        self.layer2 = nn.Linear(64, 32)
        self.layer3 = nn.Linear(32, 16)
        self.layer4 = nn.Linear(16, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = F.relu(self.layer3(x))
        x = self.sigmoid(self.layer4(x))
        return x
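
A note on the output layer: the California Housing target is expressed in units of $100,000 and reaches values of about 5, while a sigmoid output can never exceed 1. A common alternative for regression is to leave the last layer linear. The sketch below shows that variant under assumptions; `LinearOutputRegressor` is a hypothetical class that was not used for the results in this notebook, and the input is random data with the dataset's 8 features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical variant of the first regression model with a linear output layer.
class LinearOutputRegressor(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 16), nn.ReLU(),
            nn.Linear(16, 8), nn.ReLU(),
            nn.Linear(8, output_dim),   # no sigmoid: predictions are unbounded
        )

    def forward(self, x):
        return self.net(x)

features = torch.randn(5, 8)          # California Housing has 8 input features
unbounded = LinearOutputRegressor(8, 1)(features)
bounded = torch.sigmoid(unbounded)    # what a final sigmoid would return
```

With a bounded output, the error on any district whose value is above 1 cannot fall below the gap between 1 and the true value, no matter which optimizer is used.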


In [None]:
# Defining the dimensions of the input and output data.
input_dim = X_train.shape[1]
output_dim = y_train.shape[1]

#I created a list named "models" to use three models with different optimizers and loss functions during model training.
models = [
    RegressionModel1(input_dim, output_dim),
    RegressionModel2(input_dim, output_dim),
    RegressionModel3(input_dim, output_dim)]

In [None]:
# Creating a list of suitable loss functions to use during training.
loss_functions = [nn.MSELoss(), nn.L1Loss(), nn.SmoothL1Loss()]
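
To see how these three loss functions differ, the sketch below evaluates them on the same two made-up predictions (the values are chosen for illustration, with absolute errors of 0.5 and 2.0). `nn.SmoothL1Loss` is used with its default `beta=1.0`.

```python
import torch
import torch.nn as nn

# Two made-up predictions and targets with absolute errors 0.5 and 2.0.
pred = torch.tensor([[0.0], [0.0]])
target = torch.tensor([[0.5], [2.0]])

mse = nn.MSELoss()(pred, target).item()          # mean of squared errors
mae = nn.L1Loss()(pred, target).item()           # mean of absolute errors
smooth = nn.SmoothL1Loss()(pred, target).item()  # quadratic below beta=1, linear above

print(f"MSE={mse:.4f}  L1={mae:.4f}  SmoothL1={smooth:.4f}")
# prints: MSE=2.1250  L1=1.2500  SmoothL1=0.8125
```

The same predictions give three different numbers, so test losses computed with different loss functions are not directly comparable; ranking combinations fairly would need one shared metric for all of them.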

In [None]:
# Creating a list of optimizers.
optimizers = [[torch.optim.SGD(models[i].parameters(), lr=0.1, momentum=0.9),
        torch.optim.Adam(models[i].parameters(), lr=0.01),
        torch.optim.RMSprop(models[i].parameters(), lr=0.01, alpha=0.9),
        torch.optim.Adagrad(models[i].parameters(), lr=0.1),
        torch.optim.Adadelta(models[i].parameters(), lr=0.1)
    ]
    for i in range(len(models))] # One list of optimizers per model; each optimizer is bound to that model's parameters.

In [None]:
# Creating a list with different batch sizes. 1, 64 and len(X_train) are for stochastic, mini-batch and batch modes, respectively.
batch_sizes = [1, 64, len(X_train)]

In [None]:
# Defining the data loaders
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Training the model

# To train our model, we are going to create a nested loop.
# We will try different combinations of optimizers, loss functions and batch modes on each different model we have created.

# We will use this part to find the best combination at the end of our training.
best_loss = float('inf')
best_model = None
best_optimizer = None
best_loss_fn = None
best_batch_size = None

# Setting the number of epochs.
epochs = 10

# Creating a nested loop to try different combinations.
for model in models:
    model_name = model.__class__.__name__
    for optimizer in optimizers[0]: # optimizers[0] contains the optimizers bound to models[0].parameters().
        optimizer_name = optimizer.__class__.__name__
        for loss_fn in loss_functions:
            lossfn_name = loss_fn.__class__.__name__
            for batch_size in batch_sizes:
                # Printing the combination before training it. Note: train_loader was created with batch_size=64 above, so batch_size is only reported here.
                print("Training on", model_name, "using", optimizer_name, "as the optimizer and", lossfn_name, "as the loss function.", "Batch mode:", batch_size)

                # Creating a loop to go over the training "epochs" times we set.
                for epoch in range(epochs):
                    # Training our model
                    for i, (inputs, labels) in enumerate(train_loader):
                        # Clearing the gradients of all optimized tensors before computing the gradients for the current batch.
                        optimizer.zero_grad()
                        # Passing the input data through the model and generating predictions for the current batch.
                        outputs = model(inputs)
                        # Computing the loss between the predicted outputs and the actual output labels for the current batch, using the specified loss function.
                        loss = loss_fn(outputs, labels)
                        # Computing the gradients of the loss with respect to the parameters of the model.
                        loss.backward()
                        # Updating the parameters of the model using the computed gradients, based on the optimizer's update rule.
                        optimizer.step()

                    # Printing the training loss of the last batch every 10 epochs.
                    if epoch % 10 == 9:
                        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

                # Evaluating the model on the test set.
                # The torch.no_grad() block is used to prevent the computation of gradients during the evaluation, as they are not needed and would slow down the computation.
                with torch.no_grad():
                    outputs = model(X_test)
                    test_loss = loss_fn(outputs, y_test)
                    print(f"Test Loss: {test_loss.item():.4f}")

                # Checking if this is the best model so far
                if test_loss.item() < best_loss:
                    best_loss = test_loss.item()
                    best_model = model_name
                    best_optimizer = optimizer_name
                    best_loss_fn = lossfn_name
                    best_batch_size = batch_size

Training on RegressionModel1 using SGD as the optimizer and MSELoss as the loss function. Batch mode: 1
Epoch [10/10], Loss: 1.8501
Test Loss: 2.4234
Training on RegressionModel1 using SGD as the optimizer and MSELoss as the loss function. Batch mode: 64
Epoch [10/10], Loss: 2.5862
Test Loss: 2.4234
Training on RegressionModel1 using SGD as the optimizer and MSELoss as the loss function. Batch mode: 16512
Epoch [10/10], Loss: 2.2678
Test Loss: 2.4234
Training on RegressionModel1 using SGD as the optimizer and L1Loss as the loss function. Batch mode: 1
Epoch [10/10], Loss: 1.0683
Test Loss: 1.1408
Training on RegressionModel1 using SGD as the optimizer and L1Loss as the loss function. Batch mode: 64
Epoch [10/10], Loss: 1.4226
Test Loss: 1.1408
Training on RegressionModel1 using SGD as the optimizer and L1Loss as the loss function. Batch mode: 16512
Epoch [10/10], Loss: 1.2944
Test Loss: 1.1408
Training on RegressionModel1 using SGD as the optimizer and SmoothL1Loss as the loss function

In [None]:
# Printing the best combination
print("Best combination:")
print(f"Model: {best_model}")
print(f"Optimizer: {best_optimizer}")
print(f"Loss function: {best_loss_fn}")
print(f"Batch size: {best_batch_size}")
print(f"Test loss: {best_loss:.4f}")

Best combination:
Model: RegressionModel1
Optimizer: Adam
Loss function: SmoothL1Loss
Batch size: 1
Test loss: 0.7584
