<a href="https://colab.research.google.com/github/Priyanka-Police-Reddy-Gari/TensorsAutoDiffRegression/blob/main/Auto_Diff_regression_priyanka.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <font color = 'indianred'>**Task 1 - Autodiff**

In [None]:
import torch
import torch.nn as nn

## <font color = 'indianred'>**Normalize Function**</font>

We write a function that normalizes the columns of a matrix. It computes the mean and standard deviation of each column, then subtracts the column mean from each element of that column and divides by the column standard deviation.

In [None]:
# Given Data
x = [[ 3,  60,  100, -100],
     [ 2,  20,  600, -600],
     [-5,  50,  900, -900]]

In [None]:
# Convert to PyTorch Tensor and set to float
X = torch.tensor(x)
X = X.float()

In [None]:
# Print shape and data type for verification
print(X.shape)
print(X.dtype)

torch.Size([3, 4])
torch.float32


In [None]:
# Compute the mean and standard deviation of each column for reference
# (Jupyter echoes only the last expression in a cell, so only the standard deviations are shown below)
X.mean(axis = 0)
X.std(axis = 0)

tensor([  4.3589,  20.8167, 404.1452, 404.1452])

In [None]:
X.std(axis = 0)

tensor([  4.3589,  20.8167, 404.1452, 404.1452])

- The task starts here.
- The `normalize_matrix` function takes a PyTorch tensor `x` as input.
- It returns a tensor whose columns are normalized.
- After implementing the function, we used the code below to verify that the mean of each column in `Z` is close to zero and its standard deviation is 1.

In [None]:
def normalize_matrix(x):
  # Mean of each column: reducing over axis=0 (the rows) leaves one value per column
  mean = x.mean(axis=0)
  # Standard deviation of each column (torch.std uses the unbiased n - 1 estimator by default)
  std = x.std(axis=0)

  # Broadcasting subtracts each column's mean from every element in that column
  # and divides by that column's standard deviation
  y = (x - mean) / std

  return y  # The column-normalized matrix



In [None]:
Z = normalize_matrix(X)
Z

tensor([[ 0.6882,  0.8006, -1.0722,  1.0722],
        [ 0.4588, -1.1209,  0.1650, -0.1650],
        [-1.1471,  0.3203,  0.9073, -0.9073]])

In [None]:
Z.mean(axis = 0)

tensor([ 0.0000e+00,  4.9671e-08,  3.9736e-08, -3.9736e-08])

In [None]:
Z.std(axis = 0)

tensor([1., 1., 1., 1.])
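Because of floating-point rounding, the column means above are tiny residues (e.g. `4.9671e-08`) rather than exact zeros, so a tolerance-based check is more robust than reading the printout. A minimal sketch, assuming the same data and the same `normalize_matrix` logic; the extra line shows that the result depends on the estimator, since the divide-by-$n$ standard deviation of these columns is $\sqrt{(n-1)/n} = \sqrt{2/3}$ rather than 1:

```python
import torch

def normalize_matrix(x: torch.Tensor) -> torch.Tensor:
    # Column-wise z-score; x.std uses the unbiased (n - 1) estimator by default
    return (x - x.mean(dim=0)) / x.std(dim=0)

X = torch.tensor([[ 3., 60., 100., -100.],
                  [ 2., 20., 600., -600.],
                  [-5., 50., 900., -900.]])
Z = normalize_matrix(X)

# Compare with a tolerance instead of exact equality
mean_ok = torch.allclose(Z.mean(dim=0), torch.zeros(4), atol=1e-5)
std_ok = torch.allclose(Z.std(dim=0), torch.ones(4), atol=1e-5)
print(mean_ok, std_ok)  # True True

# Divide-by-n (biased) estimator: sqrt(2/3) for every column when n = 3
print(Z.std(dim=0, unbiased=False))
```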

##  <font color = 'indianred'>**Calculate Gradients**

We compute the gradients of the following function with PyTorch autograd:
## $f(x,y) = \frac{x + \exp(y)}{\log(x) + (x-y)^3}$
and evaluate $\partial f/\partial x$ and $\partial f/\partial y$ at $x = 3$ and $y = 4$.

In [None]:
def fxy(x, y):
  # Calculated the numerator: Add x to the exponential of y
  num = x + torch.exp(y)

  # Calculated the denominator: Sum of the logarithm of x and cube of the difference between x and y
  den = torch.log(x) + (x - y)**3

  # Performed element-wise division of the numerator by the denominator
  return (num/den)


In [None]:
# Created a single-element tensor 'x' containing the value 3.0
# 'requires_grad=True' tells autograd to compute gradients with respect to this tensor during backpropagation
x = torch.tensor(3.0, requires_grad=True)

# Created a single-element tensor 'y' containing the value 4.0
# As with 'x', we want gradients for 'y' during backpropagation, so 'requires_grad=True' is set here too
y = torch.tensor(4.0, requires_grad=True)


In [None]:
# Called the function 'fxy' with the tensors 'x' and 'y' as arguments
# The result 'f' is also a tensor; because 'x' and 'y' have 'requires_grad=True', it records the computation graph (grad_fn) needed for backpropagation
f = fxy(x, y)
f

tensor(584.0868, grad_fn=<DivBackward0>)

In [None]:
# Performed backpropagation to compute the gradients of 'f' with respect to 'x' and 'y'
# used backward() function on f

f.backward()


In [None]:
# Displayed the computed gradients of 'f' with respect to 'x' and 'y'
# These gradients are stored as attributes of x and y after the backward operation
# Printed the gradients for x and y
print('x.grad =', x.grad)
print('y.grad =', y.grad)



x.grad = tensor(-19733.3965)
y.grad = tensor(18322.8477)
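To double-check the autograd result, the quotient rule gives, with $N = x + e^y$ and $D = \log x + (x - y)^3$:
$\frac{\partial f}{\partial x} = \frac{D - N\left(\frac{1}{x} + 3(x-y)^2\right)}{D^2}$ and $\frac{\partial f}{\partial y} = \frac{e^y D + 3N(x-y)^2}{D^2}$.
The sketch below compares these hand-derived formulas with `torch.autograd.grad`. Using float64 is a choice made here, not in the original cells: at this point $D = \log 3 - 1 \approx 0.0986$ is small, and dividing by $D^2$ amplifies rounding error.

```python
import math
import torch

def fxy(x, y):
    return (x + torch.exp(y)) / (torch.log(x) + (x - y) ** 3)

# float64 keeps rounding error small because the denominator is close to 0 here
x = torch.tensor(3.0, dtype=torch.float64, requires_grad=True)
y = torch.tensor(4.0, dtype=torch.float64, requires_grad=True)
dfdx, dfdy = torch.autograd.grad(fxy(x, y), (x, y))

# Quotient rule by hand, with N = x + e^y and D = log(x) + (x - y)^3
xv, yv = 3.0, 4.0
N = xv + math.exp(yv)
D = math.log(xv) + (xv - yv) ** 3
dfdx_manual = (D - N * (1 / xv + 3 * (xv - yv) ** 2)) / D ** 2
dfdy_manual = (math.exp(yv) * D + 3 * N * (xv - yv) ** 2) / D ** 2

print(dfdx.item(), dfdx_manual)  # both about -19733.4
print(dfdy.item(), dfdy_manual)  # both about 18322.8
```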


## <font color = 'indianred'>**Numerical Precision**

Given scalars `x` and `y`, we implement the following `log_exp` function so that it returns
$$-\log\left(\frac{e^x}{e^x+e^y}\right)$$.

In [None]:

def log_exp(x, y):
    # Direct (naive) translation of the formula: -log(e^x / (e^x + e^y))
    return -torch.log(torch.exp(x) / (torch.exp(x) + torch.exp(y)))




We tested the code with ordinary inputs:

In [None]:
# Created tensors x and y with initial values 2.0 and 3.0, respectively
x, y = torch.tensor([2.0]), torch.tensor([3.0])

# Evaluated the function log_exp() for the given x and y, and store the output in z
z = log_exp(x, y)

# Displayed the computed value of z
z


tensor([1.3133])

Next, we implement a function that computes $\partial z/\partial x$ and $\partial z/\partial y$ with `autograd`:

In [None]:
def grad(forward_func, x, y):
  # Enable gradient tracking for x and y (in place, so the caller's tensors are tracked)
  x.requires_grad_(True)
  y.requires_grad_(True)

  # Evaluate the forward function to get the output 'z'
  z = forward_func(x, y)

  # Perform the backward pass to compute dz/dx and dz/dy
  z.backward()

  # Print the gradients for x and y
  print('x.grad =', x.grad)
  print('y.grad =', y.grad)

  # Reset the gradients for x and y to zero so the next call does not accumulate them
  x.grad.zero_()
  y.grad.zero_()




Testing it on these inputs prints the gradients:

In [None]:
grad(log_exp, x, y)

x.grad = tensor([-0.7311])
y.grad = tensor([0.7311])
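These numbers can be checked by hand. Simplifying, $z = -x + \log(e^x + e^y) = \log\left(1 + e^{y-x}\right)$, so $\partial z/\partial x = -\sigma(y - x)$ and $\partial z/\partial y = \sigma(y - x)$, where $\sigma$ is the logistic sigmoid. With $y - x = 1$, $\sigma(1) \approx 0.7311$. A small sketch comparing autograd with this closed form (it re-creates the tensors locally rather than reusing the notebook's `grad` helper):

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([3.0], requires_grad=True)

# Autograd on the naive formula, as in log_exp
z = -torch.log(torch.exp(x) / (torch.exp(x) + torch.exp(y)))
z.backward()

# Closed form: z = log(1 + e^(y - x)), so dz/dx = -sigmoid(y - x) and dz/dy = sigmoid(y - x)
s = torch.sigmoid(y - x).detach()
print(x.grad, -s)  # both approximately -0.7311
print(y.grad, s)   # both approximately 0.7311
```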


Now let's try some "hard" inputs:

In [None]:
x, y = torch.tensor([50.0]), torch.tensor([100.0])

In [None]:
# nan values are expected in this output; they are not an error in grad()
grad(log_exp, x, y)

x.grad = tensor([nan])
y.grad = tensor([nan])


In [None]:
# exp(100) overflows float32, so inf is expected in this output
torch.exp(torch.tensor([100.0]))

tensor([inf])

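Does the code return correct results? No. Evaluating `exp(100)` shows why: $e^{100} \approx 2.7 \times 10^{43}$ exceeds the largest float32 value (about $3.4 \times 10^{38}$), so `torch.exp` returns `inf`. The ratio $e^{50}/(e^{50} + \infty)$ then becomes 0, the forward pass returns $-\log(0) = \infty$, and the backward pass produces `nan`. We therefore develop a new function `stable_log_exp` that is mathematically identical to `log_exp` but numerically stable.
<br> Hint: (1) $\log\left(\frac{x}{y}\right) = \log(x) - \log(y)$
<br> Hint: (2) See the log-sum-exp trick - https://www.xarg.org/2016/06/the-log-sum-exp-trick-in-machine-learning/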
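The overflow threshold can be located directly: `torch.finfo(torch.float32).max` is the largest finite float32 value, and its natural log is the largest exponent `torch.exp` can handle in float32. A quick sketch:

```python
import math
import torch

# Largest finite float32 value (about 3.4e38) and the exponent where exp() reaches it
max32 = torch.finfo(torch.float32).max
log_max = math.log(max32)
print(max32, log_max)  # log_max is about 88.72

e88 = torch.exp(torch.tensor([88.0]))    # below the limit: finite
e89 = torch.exp(torch.tensor([89.0]))    # above the limit: overflows to inf
e100_64 = torch.exp(torch.tensor([100.0], dtype=torch.float64))  # float64 goes up to about 1.8e308
print(e88, e89, e100_64)
```

Switching to float64 only moves the threshold (to about 709); it does not remove it, which is why a reformulation is the more robust fix.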

In [None]:
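def stable_log_exp(x, y):
    # Shift by the larger input so the largest exponent is exp(0) = 1 (no overflow)
    m = torch.maximum(x, y)
    # log(e^x + e^y) = m + log(e^(x-m) + e^(y-m))  (log-sum-exp trick)
    log_sum = m + torch.log(torch.exp(x - m) + torch.exp(y - m))
    # -log(e^x / (e^x + e^y)) = -(x - log(e^x + e^y))
    return -(x - log_sum)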
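
The stable version rests on the log-sum-exp identity. Writing $m = \max(x, y)$ and factoring $e^m$ out of the sum:

$$
-\log\left(\frac{e^x}{e^x+e^y}\right) = -x + \log\left(e^x + e^y\right) = -x + \log\left(e^m\left(e^{x-m} + e^{y-m}\right)\right) = -x + m + \log\left(e^{x-m} + e^{y-m}\right)
$$

One of $x - m$ and $y - m$ is 0 and the other is at most 0, so the sum inside the logarithm lies in $(1, 2]$: nothing overflows and the logarithm never sees 0. For the hard inputs $x = 50$, $y = 100$ we get $m = 100$ and $-50 + 100 + \log\left(e^{-50} + 1\right) \approx 50$.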



In [None]:
log_exp(x, y)

tensor([inf], grad_fn=<NegBackward0>)

In [None]:
stable_log_exp(x, y)

tensor([50.], grad_fn=<NegBackward0>)

In [None]:
grad(stable_log_exp, x, y)

x.grad = tensor([-1.])
y.grad = tensor([1.])
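PyTorch also ships building blocks that encode the same idea. The sketch below shows two alternatives (helper names are hypothetical, not part of the task): `torch.logsumexp`, which is numerically stabilized internally, and `torch.nn.functional.softplus`, since $-\log\left(\frac{e^x}{e^x+e^y}\right) = \log\left(1 + e^{y-x}\right) = \operatorname{softplus}(y - x)$.

```python
import torch
import torch.nn.functional as F

def logsumexp_log_exp(x, y):
    # -log(e^x / (e^x + e^y)) = log(e^x + e^y) - x, computed with the stabilized torch.logsumexp
    return torch.logsumexp(torch.stack([x, y]), dim=0) - x

def softplus_log_exp(x, y):
    # Equivalent closed form: log(1 + e^(y - x)) = softplus(y - x)
    return F.softplus(y - x)

x = torch.tensor([50.0], requires_grad=True)
y = torch.tensor([100.0], requires_grad=True)

z = logsumexp_log_exp(x, y)
z.backward()
sp = softplus_log_exp(x.detach(), y.detach())
print(z.detach(), sp)   # both tensor([50.])
print(x.grad, y.grad)   # tensor([-1.]) tensor([1.])
```

Both reproduce the value and gradients of `stable_log_exp` on the hard inputs; the built-ins are usually preferable in practice because their stability is handled inside the library.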


# <font color = 'indianred'>**Task 2 - Linear Regression using Batch Gradient Descent with PyTorch**

# <font color = 'indianred'>**Regression using PyTorch**</font>

Imagine that you're trying to figure out the relationship between two variables x and y. You have some idea, but you aren't yet sure whether the dependence is linear or quadratic.

Your goal is to use least-squares regression (minimizing the mean squared error) to identify the coefficients of the following three models:

1. Quadratic model where $\mathrm{y} = b + w_1 \cdot \mathrm{x} + w_2 \cdot \mathrm{x}^2$.
1. Linear model where $\mathrm{y} = b + w_1 \cdot \mathrm{x}$.
1. Linear model with no bias  where $\mathrm{y} = w_1 \cdot \mathrm{x}$.

- You will use <font color = 'indianred'>**batch gradient descent to estimate the model coefficients. Batch gradient descent uses the complete training data at each iteration.**</font>
- You will implement only the training loop (no splitting of the data into training/validation sets).
- The training loop has only one ```for loop```: each epoch processes the whole dataset at once, so there is no need to create batches.
- You may have to try different numbers of epochs and learning rates to get good results.
- You should use PyTorch's `nn.Module` classes and functions.

## <font color = 'indianred'> **Data**

In [None]:
x = torch.tensor([1.5420291, 1.8935232, 2.1603365, 2.5381863, 2.893443, \
                    3.838855, 3.925425, 4.2233696, 4.235571, 4.273397, \
                    4.9332876, 6.4704757, 6.517571, 6.87826, 7.0009003, \
                    7.035741, 7.278681, 7.7561755, 9.121138, 9.728281])
y = torch.tensor([63.802246, 80.036026, 91.4903, 108.28776, 122.781975, \
                    161.36314, 166.50816, 176.16772, 180.29395, 179.09758, \
                    206.21027, 272.71857, 272.24033, 289.54745, 293.8488, \
                    295.2281, 306.62274, 327.93243, 383.16296, 408.65967])

In [None]:
# Reshaped the y tensor to shape (n, 1), where n is the number of samples.
# nn.MSELoss expects predictions and targets of the same shape, and the model outputs (n, 1).
y = y.view(-1,1)

# Reshaped the x tensor to shape (n, 1) as well, since nn.Linear expects a 2-D (samples, features) input.
x = x.view(-1,1)

# Computed the square of each element in x.
# This is the extra polynomial feature for the quadratic model.
x2 = x * x


In [None]:
# Concatenated the original x tensor and its squared values (x2) along dimension 1 (columns).
# This creates a tensor with two features, x and x^2, for the quadratic model.
x_combined = torch.cat((x, x2), dim=1)


In [None]:
print(x_combined.shape, x.shape)

torch.Size([20, 2]) torch.Size([20, 1])
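All three models are linear in their coefficients, so their least-squares optimum can also be computed in closed form. This is not part of the task (which asks for gradient descent), but a sketch using `torch.linalg.lstsq` in float64 gives reference coefficients and minimal MSE values to compare the trained weights against. A column of ones stands in for the bias $b$:

```python
import torch

x = torch.tensor([1.5420291, 1.8935232, 2.1603365, 2.5381863, 2.893443,
                  3.838855, 3.925425, 4.2233696, 4.235571, 4.273397,
                  4.9332876, 6.4704757, 6.517571, 6.87826, 7.0009003,
                  7.035741, 7.278681, 7.7561755, 9.121138, 9.728281],
                 dtype=torch.float64).view(-1, 1)
y = torch.tensor([63.802246, 80.036026, 91.4903, 108.28776, 122.781975,
                  161.36314, 166.50816, 176.16772, 180.29395, 179.09758,
                  206.21027, 272.71857, 272.24033, 289.54745, 293.8488,
                  295.2281, 306.62274, 327.93243, 383.16296, 408.65967],
                 dtype=torch.float64).view(-1, 1)
ones = torch.ones_like(x)

# Design matrices: the column of ones plays the role of the bias b
designs = {
    "quadratic": torch.cat((ones, x, x * x), dim=1),  # y = b + w1*x + w2*x^2
    "linear": torch.cat((ones, x), dim=1),            # y = b + w1*x
    "no bias": x,                                     # y = w1*x
}

results = {}
for name, A in designs.items():
    # Solve min ||A @ coef - y||^2 directly (no gradient descent)
    coef = torch.linalg.lstsq(A, y).solution
    mse = ((A @ coef - y) ** 2).mean().item()
    results[name] = (coef.flatten().tolist(), mse)
    print(f"{name:9s} coef={coef.flatten().tolist()} mse={mse:.4f}")
```

Because the models are nested (quadratic contains linear, which contains no-bias), the optimal MSE can only decrease as terms are added; the gradient-descent runs below should approach these values when they converge.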


## <font color = 'indianred'>**Loss Function**</font>

In [None]:
# Initialized Mean Squared Error (MSE) loss function with mean reduction
# 'reduction="mean"' averages the squared differences between predicted and target values
loss_function = nn.MSELoss(reduction='mean')


## <font color = 'indianred'> **Train Function**

In [None]:
def train(epochs, x, y, loss_function, log_interval, model, optimizer):
    """
    Train a PyTorch model using gradient descent.

    Parameters:
    epochs (int): The number of training epochs.
    x (torch.Tensor): The input features.
    y (torch.Tensor): The ground truth labels.
    loss_function (torch.nn.Module): The loss function to be minimized.
    log_interval (int): The interval at which training information is logged.
    model (torch.nn.Module): The PyTorch model to be trained.
    optimizer (torch.optim.Optimizer): The optimizer for updating model parameters.

    Side Effects:
    - Modifies the input model's internal parameters during training.
    - Outputs training log information at specified intervals.
    """


    for epoch in range(epochs):

        # Step 1: Forward pass - Compute predictions based on the input features
        y_hat = model(x)

        # Step 2: Compute Loss
        loss = loss_function(y_hat, y)

        # Step 3: Zero Gradients - Clear previous gradient information held by the optimizer's parameters
        optimizer.zero_grad()

        # Step 4: Calculate Gradients - Backpropagate the error to compute gradients for each parameter
        loss.backward()

        # Step 5: Update Model Parameters - Adjust weights based on computed gradients
        optimizer.step()

        # Log training information at specified intervals
        if epoch % log_interval == 0:
            print(f'epoch: {epoch + 1} --> loss {loss.item()}')



## <font color = 'indianred'> **Part 1**

-  <font color = 'indianred'>**For Part 1, use x_combined (we need both $x$ and $x^2$) as input to the model, which means there are two inputs.**</font>
- Use `nn.Linear` to specify the model; <font color = 'indianred'>**think carefully about what values its three arguments ```(in_features, out_features, bias)``` will take**.</font>
- By default, PyTorch's `nn.Linear` initializes its weights with `kaiming_uniform_` (using `a=sqrt(5)`), which amounts to a uniform distribution on $[-1/\sqrt{n_{\text{in}}}, 1/\sqrt{n_{\text{in}}}]$, where $n_{\text{in}}$ is the number of input features. The bias is drawn from the same uniform range; it is not set to zero.
- In this task we use `nn.init` functions such as `nn.init.normal_` and `nn.init.zeros_` to explicitly override these default initializations.
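A quick sketch to inspect `nn.Linear`'s default initialization before overriding it. In current PyTorch versions, the layer's `reset_parameters` draws both the weight and the bias from a uniform range bounded by $1/\sqrt{n_{\text{in}}}$, so the default bias is generally not zero; the explicit `nn.init` calls then replace those values:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 1)  # in_features=2 (x and x^2), out_features=1, bias=True

# Default init: weight and bias are both drawn from U(-1/sqrt(fan_in), 1/sqrt(fan_in))
bound = 1 / math.sqrt(layer.in_features)
w_ok = layer.weight.abs().max().item() <= bound + 1e-6
b_ok = layer.bias.abs().max().item() <= bound + 1e-6
print(bound, w_ok, b_ok)

# Explicit override, as done in the model cells
nn.init.normal_(layer.weight, mean=0.0, std=0.01)
nn.init.zeros_(layer.bias)
print(layer.weight, layer.bias)
```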


**Run the cell below twice**

**In the first attempt**
- Use LEARNING_RATE = 0.05
What do we observe?

Write your observations HERE:

**In the second attempt**
- Now use a LEARNING_RATE  = 0.0005,
What do we observe?

Write your observations HERE:
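One way to anticipate the two runs: for MSE the loss is quadratic in the parameters, with a constant Hessian $H = \frac{2}{n}A^\top A$, where $A$ holds a ones column for the bias plus the $x$ and $x^2$ features. Plain gradient descent is stable only if the learning rate is below $2/\lambda_{\max}(H)$. The sketch below computes that bound; for this data it lands just above 0.0005 and far below 0.05, so one would expect the 0.05 run to diverge (loss growing toward inf/nan) and the 0.0005 run to converge, slowly.

```python
import torch

x = torch.tensor([1.5420291, 1.8935232, 2.1603365, 2.5381863, 2.893443,
                  3.838855, 3.925425, 4.2233696, 4.235571, 4.273397,
                  4.9332876, 6.4704757, 6.517571, 6.87826, 7.0009003,
                  7.035741, 7.278681, 7.7561755, 9.121138, 9.728281],
                 dtype=torch.float64).view(-1, 1)

# Features seen by nn.Linear(2, 1), plus a ones column for the bias
A = torch.cat((torch.ones_like(x), x, x * x), dim=1)

# For L = mean((A @ theta - y)^2), the Hessian is constant: H = (2/n) A^T A
H = 2.0 / A.shape[0] * A.T @ A
lam_max = torch.linalg.eigvalsh(H).max().item()

# Gradient descent on a quadratic is stable only if lr < 2 / lam_max
limit = 2.0 / lam_max
print(f"largest curvature = {lam_max:.1f}, lr must be below {limit:.2e}")
for lr in (0.05, 0.0005):
    print(lr, "stable" if lr < limit else "diverges")
```

The large curvature comes almost entirely from the $x^2$ feature (values up to about 95); rescaling the features, for example with the `normalize_matrix` idea from Task 1, would allow a much larger learning rate.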


In [None]:
# model 1
LEARNING_RATE = 0.0005
EPOCHS = 100000
LOG_INTERVAL= 10000

# Two input features (x and x^2), one output, and a bias term b
model = nn.Linear(2, 1, bias=True)

# Initialize the weights from a normal distribution with mean = 0 and std = 0.001
# (a smaller std than the 0.01 used in Parts 2 and 3)
nn.init.normal_(model.weight, mean=0, std=0.001)


# Initialize the model's bias term to zero
nn.init.zeros_(model.bias)

# Create an SGD (Stochastic Gradient Descent) optimizer over the model's parameters with the predefined learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)


# Start the training process for the model with specified parameters and settings
train(EPOCHS, x_combined, y, loss_function, LOG_INTERVAL, model, optimizer)


epoch: 1 --> loss 57958.375
epoch: 10001 --> loss 5.003321647644043
epoch: 20001 --> loss 3.095216989517212
epoch: 30001 --> loss 2.1377272605895996
epoch: 40001 --> loss 1.657240867614746
epoch: 50001 --> loss 1.4161328077316284
epoch: 60001 --> loss 1.2949767112731934
epoch: 70001 --> loss 1.2341285943984985
epoch: 80001 --> loss 1.2036001682281494
epoch: 90001 --> loss 1.1881986856460571


In [None]:
print(f' Weights {model.weight.data}, \nBias: {model.bias.data}')

 Weights tensor([[4.1796e+01, 1.4826e-02]]), 
Bias: tensor([0.9774])


## <font color = 'indianred'> **Part 2**

-  <font color = 'indianred'>**For Part 2, use $x$ as input to the model, which means there is only one input.**</font>
- Use `nn.Linear` to specify the model; <font color = 'indianred'>**think carefully about what values its three arguments ```(in_features, out_features, bias)``` will take**.</font>


In [None]:
# model 2
LEARNING_RATE = 0.01
EPOCHS = 1000
LOG_INTERVAL= 10

# One input feature (x), one output, and a bias term b
model = nn.Linear(1, 1, bias=True)

# Initialize the weights from a normal distribution with mean = 0 and std = 0.01
nn.init.normal_(model.weight, mean=0, std=0.01)


# Initialize the model's bias term to zero
nn.init.zeros_(model.bias)


# Create an SGD (Stochastic Gradient Descent) optimizer over the model's parameters with the predefined learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)


# Start the training process with the specified settings
# Note that x (not x_combined) is the input for this part
train(EPOCHS, x, y, loss_function, LOG_INTERVAL, model, optimizer)

epoch: 1 --> loss 57976.9453125
epoch: 11 --> loss 6.975368499755859
epoch: 21 --> loss 6.601193428039551
epoch: 31 --> loss 6.251128196716309
epoch: 41 --> loss 5.923625469207764
epoch: 51 --> loss 5.617262840270996
epoch: 61 --> loss 5.330683708190918
epoch: 71 --> loss 5.062532901763916
epoch: 81 --> loss 4.811709403991699
epoch: 91 --> loss 4.577028751373291
epoch: 101 --> loss 4.35750675201416
epoch: 111 --> loss 4.152127265930176
epoch: 121 --> loss 3.9599738121032715
epoch: 131 --> loss 3.780254364013672
epoch: 141 --> loss 3.6120963096618652
epoch: 151 --> loss 3.4547812938690186
epoch: 161 --> loss 3.3076419830322266
epoch: 171 --> loss 3.1699490547180176
epoch: 181 --> loss 3.041151523590088
epoch: 191 --> loss 2.9206676483154297
epoch: 201 --> loss 2.807941436767578
epoch: 211 --> loss 2.7024998664855957
epoch: 221 --> loss 2.603853940963745
epoch: 231 --> loss 2.511547327041626
epoch: 241 --> loss 2.4252114295959473
epoch: 251 --> loss 2.3444480895996094
epoch: 261 --> loss

In [None]:
print(f' Weights {model.weight.data}, \nBias: {model.bias.data}')

 Weights tensor([[41.9377]]), 
Bias: tensor([0.7468])


## <font color = 'indianred'> **Part 3**
-  <font color = 'indianred'>**Part 3 is similar to Part 2; the only difference is that the model now has no bias term.**</font>
- **Note that this model is trained for only 10 epochs, yet it reaches a comparable fit.**

In [None]:
# model 3
LEARNING_RATE = 0.01
EPOCHS = 10
LOG_INTERVAL= 1

# One input feature (x), one output, and no bias term
model = nn.Linear(1, 1, bias=False)


# Initialize the weights from a normal distribution with mean = 0 and std = 0.01
nn.init.normal_(model.weight, mean=0, std=0.01)

# No bias initialization is needed because this model has no bias term

# Create an SGD (Stochastic Gradient Descent) optimizer over the model's parameters with the predefined learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)


# Start the training process with the specified settings
# Note that x is the input for this part
train(EPOCHS, x, y, loss_function, LOG_INTERVAL, model, optimizer)


epoch: 1 --> loss 57964.03125
epoch: 2 --> loss 6895.87646484375
epoch: 3 --> loss 821.3375244140625
epoch: 4 --> loss 98.77323150634766
epoch: 5 --> loss 12.824605941772461
epoch: 6 --> loss 2.6011123657226562
epoch: 7 --> loss 1.3850191831588745
epoch: 8 --> loss 1.2403713464736938
epoch: 9 --> loss 1.223166584968567
epoch: 10 --> loss 1.2211220264434814


In [None]:
print(f' Weights {model.weight.data}')

 Weights tensor([[42.0557]])
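A hedged explanation of why Part 3 converges in 10 epochs while Part 2 is still improving after hundreds: for MSE, each gradient-descent step multiplies the error along each Hessian eigenvector by $|1 - \text{lr} \cdot \lambda|$. With a bias term, $x$ is far from zero-mean (its mean is about 5.2), so the slope and bias are strongly coupled and one eigenvalue is tiny; without a bias there is a single, well-scaled eigenvalue. A sketch that computes these factors for `lr = 0.01`:

```python
import torch

x = torch.tensor([1.5420291, 1.8935232, 2.1603365, 2.5381863, 2.893443,
                  3.838855, 3.925425, 4.2233696, 4.235571, 4.273397,
                  4.9332876, 6.4704757, 6.517571, 6.87826, 7.0009003,
                  7.035741, 7.278681, 7.7561755, 9.121138, 9.728281],
                 dtype=torch.float64).view(-1, 1)
lr = 0.01

def contraction_factors(A, lr):
    # For MSE, GD shrinks the error along each Hessian eigenvector by |1 - lr * eigenvalue| per step
    H = 2.0 / A.shape[0] * A.T @ A
    return (1 - lr * torch.linalg.eigvalsh(H)).abs()

with_bias = contraction_factors(torch.cat((torch.ones_like(x), x), dim=1), lr)
no_bias = contraction_factors(x, lr)
print("with bias:", with_bias.tolist())  # one factor very close to 1 -> slow convergence
print("no bias:  ", no_bias.tolist())    # a single factor well below 1 -> fast convergence
```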


# <font color = 'indianred'>**Task 3 - MultiClass Classification using Mini Batch Gradient Descent with PyTorch**

- You will implement only the training loop (no splitting of the data into training/validation sets).
- We will use minibatch gradient descent, so there are two nested for loops in this case (over epochs and over batches).
- You should use PyTorch's `nn.Module` classes and functions.

## <font color = 'indianred'>**Data**

In [None]:
# Imported the make_classification function from the sklearn.datasets module
# This function is used to generate a synthetic dataset for classification tasks.
from sklearn.datasets import make_classification

# Imported the StandardScaler class from the sklearn.preprocessing module
# StandardScaler is used to standardize the features by removing the mean and scaling to unit variance.
from sklearn.preprocessing import StandardScaler


In [None]:
# Imported the main PyTorch library, which provides the essential building blocks for constructing neural networks.
import torch

# Imported the 'optim' module from PyTorch for various optimization algorithms like SGD, Adam, etc.
import torch.optim as optim

# Imported the 'nn' module from PyTorch, which contains pre-defined layers, loss functions, etc., for neural networks.
import torch.nn as nn

# Imported the 'functional' module from PyTorch (torch.nn.functional, conventionally aliased as F)
# This module contains functional forms of layers, loss functions, and other operations.
import torch.nn.functional as F

# Imported DataLoader and Dataset classes from PyTorch's utility library.
# DataLoader helps with batching, shuffling, and loading data in parallel.
# Dataset provides an abstract interface for easier data manipulation.
from torch.utils.data import DataLoader, Dataset


In [None]:
# Generated a synthetic dataset for classification using make_classification function.
# Parameters:
# - n_samples=1000: The total number of samples in the generated dataset.
# - n_features=5: The total number of features for each sample.
# - n_classes=3: The number of classes for the classification task.
# - n_informative=4: The number of informative features, i.e., features that are actually useful for classification.
# - n_redundant=1: The number of redundant features, i.e., features that can be linearly derived from informative features.
# - random_state=0: The seed for the random number generator to ensure reproducibility.

X, y = make_classification(n_samples=1000, n_features=5, n_classes=3, n_informative=4, n_redundant=1, random_state=0)



In this example, we use `make_classification` to <font color = 'indianred'>**generate a dataset with 1,000 samples, 5 features per sample, and 3 classes for the classification problem**.</font> Of the 5 features, 4 are informative (useful for classification), and 1 is redundant (a linear combination of the informative features). The `random_state` parameter makes the data generation reproducible.

In [None]:
# Initialized the StandardScaler object from the sklearn.preprocessing module.
# This will be used to standardize the features of the dataset.
preprocessor = StandardScaler()

# Fit the StandardScaler on the dataset (X) and then transform it.
# The fit_transform() method computes the mean and standard deviation of each feature,
# and then standardizes the features by subtracting the mean and dividing by the standard deviation.
X = preprocessor.fit_transform(X)


In [None]:
print(X.shape, y.shape)

(1000, 5) (1000,)


In [None]:
X[0:5]

array([[-0.39443436, -0.78033571, -0.25005511,  0.09118536, -0.5690698 ],
       [ 0.64284479, -0.95837057,  0.83598996, -0.08438568,  0.50539358],
       [ 0.99102498,  0.8580679 ,  0.78786062, -0.9114329 ,  1.62615938],
       [-0.96923966,  0.86168226, -1.31837608, -1.22844863, -0.07591589],
       [ 0.96021518,  0.99206623,  1.0026402 , -0.25339161,  1.18831784]])

In [None]:
print(y[0:10])

[2 0 1 2 1 1 0 2 0 0]


## <font color = 'indianred'>**Dataset and Data Loaders**

In [None]:
# Convert the numpy arrays X and y to PyTorch tensors.
# X becomes a float64 tensor (numpy's default float type); the training loop casts each batch to float32.
# This is a multiclass classification problem.

# ================================
# IMPORTANT: nn.CrossEntropyLoss expects the labels as integer class indices of type torch.long (not float, not one-hot).
# ================================

x_tensor = torch.tensor(X)
y_tensor = torch.tensor(y, dtype=torch.long)


In [None]:
# Define a custom PyTorch Dataset class for handling our data
class MyDataset(Dataset):
    # Constructor: Initialize the dataset with features and labels
    def __init__(self, X, y):
        self.features = X
        self.labels = y

    # Method to return the length of the dataset
    def __len__(self):
        return self.labels.shape[0]

    # Method to get a data point by index
    def __getitem__(self, index):
        x = self.features[index]
        y = self.labels[index]
        return x, y



In [None]:
# Create an instance of the custom MyDataset class, passing in the feature and label tensors.
# This will allow the data to be used with PyTorch's DataLoader for efficient batch processing.
train_dataset = MyDataset(x_tensor, y_tensor)


In [None]:
# Access the first element (feature-label pair) from the train_dataset using indexing.
# The __getitem__ method of MyDataset class will be called to return this element.
train_dataset[0]


(tensor([-0.3944, -0.7803, -0.2501,  0.0912, -0.5691], dtype=torch.float64),
 tensor(2))

In [None]:
# Create Data loader from Dataset
# Use a batch size of 16
# Use shuffle = True
train_loader = DataLoader(train_dataset,batch_size = 16,shuffle=True)
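With 1,000 samples and `batch_size=16`, each epoch performs $\lceil 1000/16 \rceil = 63$ parameter updates: 62 full batches plus a final batch of 8 leftover samples (passing `drop_last=True` would discard it). Because the training loop averages the loss over `len(train_loader)` batches, the reported epoch loss is a mean of per-batch means, which slightly overweights that last batch. A sketch using a random stand-in dataset of the same size (an assumption for illustration, in place of `MyDataset`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for train_dataset: 1000 samples, 5 features, 3 classes
dataset = TensorDataset(torch.randn(1000, 5), torch.randint(0, 3, (1000,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# ceil(1000 / 16) = 63 batches per epoch; the last one holds the 8 leftover samples
sizes = [xb.shape[0] for xb, _ in loader]
print(len(loader), sizes.count(16), sizes[-1])  # 63 62 8
```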

## <font color = 'indianred'>**Model**

In [None]:
# Define the model for multi-class classification.
# The model has no hidden layers: a single linear layer maps the 5 features to 3 logits (one per class).
# No explicit Softmax layer is added, because nn.CrossEntropyLoss applies log-softmax internally.
# nn.Linear alone (or wrapped in nn.Sequential) is sufficient for this task.
model = nn.Linear(5, 3, bias=True)



## <font color = 'indianred'>**Loss Function**

In [None]:
# Specify the loss function for the model.
# The last layer outputs raw logits (no softmax), so nn.CrossEntropyLoss is the appropriate choice:
# it combines log-softmax and the negative log-likelihood loss for multi-class classification.

loss_function = nn.CrossEntropyLoss()
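The reason no softmax layer is needed can be verified numerically: `nn.CrossEntropyLoss` on raw logits equals `F.nll_loss` applied to `F.log_softmax` of those logits. A sketch with random logits (hypothetical values, not from the dataset), which also shows that taking `argmax` over logits, as the training loop does for accuracy, gives the same predictions as taking it over softmax probabilities:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)              # hypothetical raw outputs for 4 samples and 3 classes
targets = torch.tensor([2, 0, 1, 2])    # class indices (torch.long), as in y_tensor

# CrossEntropyLoss applies log-softmax to the logits, then takes the negative log-likelihood
ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(ce.item(), manual.item())

# Softmax is monotonic, so argmax over logits equals argmax over probabilities
same_pred = torch.equal(logits.argmax(dim=1), F.softmax(logits, dim=1).argmax(dim=1))
print(same_pred)  # True
```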

## <font color = 'indianred'>**Initialization**

We create a function to initialize the weights:
- Weights are initialized from a normal distribution with mean = 0 and std = 0.05.
- Bias terms are initialized to zero.

In [None]:
# Function to initialize the weights and biases of a neural network layer.
# Only nn.Linear layers are modified; other module types are left unchanged.
def init_weights(layer):
  if isinstance(layer, nn.Linear):
    # Weights drawn from a normal distribution with mean 0 and standard deviation 0.05
    torch.nn.init.normal_(layer.weight, mean=0, std=0.05)
    # Bias terms set to zero (skipped if the layer was created with bias=False)
    if layer.bias is not None:
      torch.nn.init.zeros_(layer.bias)


## <font color = 'indianred'>**Training Loop**

**Model Training** involves the following steps:

- Step 0: Randomly initialize parameters / weights
- Step 1: Compute model's predictions - forward pass
- Step 2: Compute loss
- Step 3: Compute the gradients
- Step 4: Update the parameters
- Step 5: Repeat steps 1 - 4

Model training is repeating this process over and over, for many **epochs**.

We will specify number of ***epochs*** and during each epoch we will iterate over the complete dataset and will keep on updating the parameters.

***Learning rate*** and ***epochs*** are hyperparameters. In practice their values are tuned on a validation dataset; this notebook trains on the full data without a validation split.

We now implement steps 1 to 4 inside a training function.

In [None]:
# Function to train a neural network model.
# Arguments include the number of epochs, loss function, learning rate, model architecture, and optimizer.
# Note: learning_rate is not used inside the function; the optimizer already holds it.
# Batches are read from the global train_loader defined above.

def train(epochs, loss_function, learning_rate, model, optimizer):

  # Select the device once (GPU if available, otherwise CPU)
  device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

  # Loop through each epoch
  for epoch in range(epochs):

    # Running totals of the training loss and correct predictions for this epoch
    running_train_loss = 0
    running_train_correct = 0

    # Loop through each minibatch in the training dataset
    for x, y in train_loader:

      # Move inputs and targets to the device:
      # the linear layer expects float32 inputs, and CrossEntropyLoss expects long class indices
      x = x.to(device, dtype=torch.float32)
      targets = y.to(device, dtype=torch.long)

      # Step 1: Forward pass - compute the logits
      output = model(x)

      # Step 2: Compute loss
      loss = loss_function(output, targets)

      # Step 3: Backward pass - zero the gradients from the previous iteration, then compute new ones
      optimizer.zero_grad()
      loss.backward()

      # Step 4: Update the parameters
      optimizer.step()

      # Accumulate the loss for the batch
      running_train_loss += loss.item()

      # Count correct predictions without building an autograd graph
      with torch.no_grad():
          y_pred = output.argmax(dim=1)  # Class index with the largest logit (same as the largest softmax probability)
          correct = (y_pred == targets).sum().item()  # Number of correct predictions in the batch
          running_train_correct += correct  # Running count of correct predictions for the current epoch


    # Average loss per batch and accuracy over all samples for the epoch
    train_loss = running_train_loss / len(train_loader)
    train_acc = running_train_correct / len(train_loader.dataset)

    # Display training loss and accuracy for the current epoch
    print(f'Epoch : {epoch + 1} / {epochs}')
    print(f'Train Loss: {train_loss:.4f} | Train Accuracy: {train_acc * 100:.4f}%')



In [None]:
# Fixed the random seed to ensure reproducibility across runs
torch.manual_seed(100)

# Defined the total number of epochs for which the model will be trained
epochs = 5

# Detected if a GPU is available and use it; otherwise, use CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)  # Output the device being used

# Defined the learning rate for optimization; consider its impact on model performance
learning_rate = 1

# Move the model to the selected compute device (GPU or CPU) before building the optimizer
model.to(device)

# Configure Stochastic Gradient Descent (SGD) over the model's parameters,
# using the learning rate defined above
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# Apply custom weight initialization; this can affect the model's learning trajectory
# The `apply` function recursively applies a function to each submodule in a PyTorch model.
# In the given context, it's used to apply the `init_weights` function to initialize the weights of all layers in the model.
# The benefit is that it provides a convenient way to systematically apply custom weight initialization across complex models,
# potentially improving model convergence and performance.
model.apply(init_weights)

# Kick off the training process using the specified settings
train(epochs, loss_function, learning_rate, model, optimizer)



cpu
Epoch : 1 / 5
Train Loss: 0.8130 | Train Accuracy: 65.4000%
Epoch : 2 / 5
Train Loss: 0.8005 | Train Accuracy: 66.7000%
Epoch : 3 / 5
Train Loss: 0.7915 | Train Accuracy: 67.0000%
Epoch : 4 / 5
Train Loss: 0.7978 | Train Accuracy: 67.9000%
Epoch : 5 / 5
Train Loss: 0.8008 | Train Accuracy: 67.0000%


In [None]:
# Output the learned parameters (weights and biases) of the model after training
for name, param in model.named_parameters():
  # Print the name and the values of each parameter
  print(name, param.data)


weight tensor([[ 0.4345, -0.9407, -0.5891, -0.4591,  0.7364],
        [ 0.0557,  1.0332,  0.1920,  0.4732,  0.0554],
        [-0.6367, -0.0987,  0.2159,  0.0453, -0.8418]])
bias tensor([-0.2508, -0.0254,  0.2762])


# <font color = 'indianred'>**Task 4 - MultiLabel Classification using Mini Batch Gradient Descent with PyTorch**

- We will implement only the training loop (no splitting of the data into training/validation sets).
- We will use minibatch gradient descent, so there are two nested for loops in this case (over epochs and over batches).
- We use PyTorch's `nn.Module` classes and functions.
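Unlike Task 3, a sample here can carry several labels at once, so the labels are not mutually exclusive. A common setup, sketched here as an assumption rather than as this notebook's final model, is one logit per label, an independent sigmoid per label, and `nn.BCEWithLogitsLoss`, which expects float 0/1 targets of the same shape as the logits. The targets below are the first four label rows printed further down; the inputs are random placeholders:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 3)                      # one logit per label; no softmax across labels
x = torch.randn(4, 5)                        # 4 hypothetical samples with 5 features
y = torch.tensor([[0, 0, 1], [1, 0, 0], [1, 1, 1], [0, 1, 1]])  # multi-hot targets

logits = model(x)
# BCEWithLogitsLoss applies a sigmoid per label internally and expects float targets
loss = nn.BCEWithLogitsLoss()(logits, y.float())

# Each label is decided independently: probability > 0.5 means "label present"
preds = (torch.sigmoid(logits) > 0.5).long()
accuracy = (preds == y).float().mean()       # fraction of correct per-label decisions
print(loss.item(), preds, accuracy.item())
```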

## <font color = 'indianred'>**Data**

In [None]:
# Imported the function to generate a synthetic multilabel classification dataset
from sklearn.datasets import make_multilabel_classification

# Imported the StandardScaler for feature normalization
from sklearn.preprocessing import StandardScaler


In [None]:
# Imported PyTorch library for tensor computation and neural network modules
import torch

# Imported PyTorch's optimization algorithms package
import torch.optim as optim

# Imported PyTorch's neural network module for defining layers and models
import torch.nn as nn

# Imported PyTorch's functional API for stateless operations
import torch.nn.functional as F

# Imported DataLoader, TensorDataset, and Dataset for data loading and manipulation
from torch.utils.data import DataLoader, TensorDataset, Dataset


In [None]:
# Generated a synthetic multilabel classification dataset
# n_samples: Number of samples in the dataset
# n_features: Number of feature variables
# n_classes: Number of distinct labels (or classes)
# n_labels: Average number of labels per instance
# random_state: Seed for reproducibility
X, y = make_multilabel_classification(n_samples=1000, n_features=5, n_classes=3, n_labels=2, random_state=0)


In [None]:
# Initialized the StandardScaler for feature normalization
preprocessor = StandardScaler()

# Fit the preprocessor to the data and transform the features for zero mean and unit variance
X = preprocessor.fit_transform(X)
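`StandardScaler` performs the same column-wise standardization as `normalize_matrix` from Task 1, with one difference. It divides by the population standard deviation (ddof = 0), whereas `torch.Tensor.std` defaults to the unbiased n-1 estimator. The following pure-Python sketch of the scaler's transform reuses the Task 1 matrix. The helper name `standardize_columns` is hypothetical.

```python
import statistics


def standardize_columns(rows):
    """Standardize each column to zero mean and unit population variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation (ddof = 0), as StandardScaler uses
    stds = [statistics.pstdev(col) for col in columns]
    # A zero-variance column is left unscaled (divide by 1), as StandardScaler does
    stds = [s if s > 0 else 1.0 for s in stds]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


task1_matrix = [[3, 60, 100, -100],
                [2, 20, 600, -600],
                [-5, 50, 900, -900]]
standardized = standardize_columns(task1_matrix)
for row in standardized:
    print([round(v, 4) for v in row])
```

With n = 3 rows, every entry here is sqrt(3/2) ≈ 1.22 times the corresponding entry of `Z` from Task 1, because only the denominator of the standard deviation differs.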


In [None]:
# Printed the shape of the feature matrix X and the label matrix y
# Pay attention to these shapes as they will guide you in defining your neural network model
print(X.shape, y.shape)


(1000, 5) (1000, 3)


In [None]:
X[0:5]

array([[ 1.65506353,  0.2101857 ,  0.51570947, -2.00177184,  0.40001786],
       [-0.02349989, -0.51376047,  2.34771468,  0.78787635, -1.04334554],
       [ 1.09554239,  0.93413188, -0.09495894, -0.00916599, -0.01237169],
       [-0.58302103,  1.17544727,  0.21037527, -0.80620833,  0.8124074 ],
       [ 1.09554239,  0.69281649, -1.92696415,  1.18639752, -1.24954031]])

In [None]:
# ================================
# IMPORTANT: The y in this case is multi-hot encoded (a binary indicator matrix):
# each row can contain several 1s, one for every label that applies to the sample.
# This is different from multiclass classification, where each sample has a single class index
# and the loss function (nn.CrossEntropyLoss) handles the encoding internally.
# For the multilabel case we have to provide y in this format.
# ================================

print(y[0:10])

[[0 0 1]
 [1 0 0]
 [1 1 1]
 [0 1 1]
 [1 1 0]
 [0 1 0]
 [1 1 1]
 [1 0 1]
 [1 1 1]
 [1 1 0]]
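Each row of `y` encodes which labels apply to a sample. The following sketch converts between the multi-hot rows printed above and per-sample lists of active label indices. The helper names are hypothetical. In the multiclass setting, by contrast, each sample would carry a single class index, which is the usual target format for `nn.CrossEntropyLoss`.

```python
def multi_hot_to_label_sets(rows):
    """Return, for each sample, the indices of the labels that are switched on."""
    return [[j for j, flag in enumerate(row) if flag == 1] for row in rows]


def label_sets_to_multi_hot(label_sets, num_labels):
    """Inverse: build a 0/1 indicator row with a 1 at every active label index."""
    return [[1 if j in labels else 0 for j in range(num_labels)] for labels in label_sets]


first_rows = [[0, 0, 1], [1, 0, 0], [1, 1, 1], [0, 1, 1]]   # first rows of y above
label_sets = multi_hot_to_label_sets(first_rows)
print(label_sets)  # [[2], [0], [0, 1, 2], [1, 2]]
```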


## <font color = 'indianred'>**Dataset and Data Loaders**

In [None]:
# Created Tensors from the numpy arrays.
# Earlier, we focused on multiclass classification; now, we are dealing with multilabel classification.

# ================================
# IMPORTANT: Consider which cost function you will use for multilabel classification and whether it expects the label tensor (y) to be float or long type.
# ================================

x_tensor = torch.tensor(X)
y_tensor = torch.tensor(y)


In [None]:
# Defined a custom PyTorch Dataset class for handling our data
class MyDataset(Dataset):
    # Constructor: Initialize the dataset with features and labels
    def __init__(self, X, y):
        self.features = X
        self.labels = y

    # Method to return the length of the dataset
    def __len__(self):
        return self.labels.shape[0]

    # Method to get a data point by index
    def __getitem__(self, index):
        x = self.features[index]
        y = self.labels[index]
        return x, y


In [None]:
# Initialized an instance of the custom MyDataset class
# This will be our training dataset, holding our features and labels as PyTorch tensors
train_dataset = MyDataset(x_tensor, y_tensor)


In [None]:
# Accessed the first element (feature-label pair) from the train_dataset using indexing.
# The __getitem__ method of MyDataset class will be called to return this element.
# This is useful for debugging and understanding the data structure
train_dataset[0]


(tensor([ 1.6551,  0.2102,  0.5157, -2.0018,  0.4000], dtype=torch.float64),
 tensor([0, 0, 1]))

In [None]:
# Created a DataLoader from the Dataset
# Used a batch size of 16
# Used shuffle=True so the samples are reshuffled at every epoch
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
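`DataLoader(train_dataset, batch_size=16, shuffle=True)` draws batches from a map-style dataset like `MyDataset`. The following simplified pure-Python sketch shows the process. The generator `iterate_batches` is hypothetical. It shuffles the indices once per pass, fetches items through `__getitem__`, and groups them. The real `DataLoader` also collates each group into stacked tensors. With 1000 samples and the default `drop_last=False`, one epoch has 63 batches, and the last batch holds 8 samples.

```python
import math
import random


def iterate_batches(dataset, batch_size, shuffle=True, seed=0):
    """Yield lists of dataset items, roughly the way a map-style DataLoader groups them."""
    indices = list(range(len(dataset)))      # relies on __len__
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        # relies on __getitem__ for each index in the batch
        yield [dataset[i] for i in indices[start:start + batch_size]]


toy_dataset = list(range(1000))   # any object with __len__ and __getitem__ works
batches = list(iterate_batches(toy_dataset, batch_size=16))
print(len(batches), math.ceil(1000 / 16), len(batches[-1]))  # 63 63 8
```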


## <font color = 'indianred'>**Model**

In [None]:
# Task: Specify your model architecture here.
# This is a multilabel problem. Think through what layers you should add to handle this.
# Remember, the architecture of your last layer will also depend on your choice of loss function.
# Additional Note: No hidden layers should be added for this exercise.
# You can use nn.Linear or nn.Sequential for this task

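# 5 input features -> 3 output logits (one per label); no sigmoid here because
# BCEWithLogitsLoss applies it internally
model = nn.Linear(5, 3, bias=True)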
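`nn.Linear(5, 3)` stores a weight of shape (3, 5) and a bias of shape (3,). It maps each 5-feature sample to 3 raw scores (logits), one per label: logits = x Wᵀ + b. The following pure-Python sketch of that forward pass uses made-up weights. The helper `linear_forward` is hypothetical.

```python
def linear_forward(x, weight, bias):
    """Compute logits = x @ weight.T + bias for a single sample (plain lists)."""
    # weight has one row per output label and one column per input feature
    return [sum(w_j * x_j for w_j, x_j in zip(row, x)) + b for row, b in zip(weight, bias)]


toy_weight = [[1, 0, 0, 0, 0],     # shape (3, 5): 3 labels x 5 features
              [0, 1, 0, 0, 0],
              [0, 0, 1, 1, 1]]
toy_bias = [0.5, -0.5, 0.0]
print(linear_forward([1, 2, 3, 4, 5], toy_weight, toy_bias))  # [1.5, 1.5, 12.0]
```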




## <font color = 'indianred'>**Loss Function**

In [None]:
# Task: Specify the loss function for your model.
# Considered the architecture of your model, especially the last layer, when choosing the loss function.
# This is a multilabel problem, so make sure your choice reflects that.


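# BCEWithLogitsLoss applies a sigmoid to each logit and computes binary cross-entropy per label,
# so the model's last layer should output raw logits
loss_function = torch.nn.BCEWithLogitsLoss()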
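`BCEWithLogitsLoss` treats each of the 3 labels as an independent yes/no problem. It applies a sigmoid to each logit and averages the binary cross-entropy over every sample-label entry (default `reduction='mean'`). Combining the sigmoid and the log in one step lets it use a numerically stable formula. The following pure-Python sketch shows that computation. The helper `bce_with_logits` is hypothetical.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over every (sample, label) entry, computed from raw logits."""
    total, count = 0.0, 0
    for z_row, y_row in zip(logits, targets):
        for z, y in zip(z_row, y_row):
            # Stable form of -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))]
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


print(bce_with_logits([[0.0, 0.0, 0.0]], [[0, 1, 1]]))  # log(2) ≈ 0.6931
```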


## <font color = 'indianred'>**Initialization**

Created a function to initialize the weights:
- Initialized the weights from a normal distribution with mean = 0 and std = 0.05
- Initialized the bias terms with zeros

In [None]:
# Function to initialize the weights and biases of the model's layers
# This is provided to you and is not a student task
def init_weights(layer):
  # Check if the layer is a Linear layer
  if type(layer) == nn.Linear:
    # Initialize the weights with a normal distribution, mean=0, std=0.05
    torch.nn.init.normal_(layer.weight, mean = 0, std = 0.05)
    # Initialize the bias terms to zero
    torch.nn.init.zeros_(layer.bias)
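With std = 0.05 and a zero bias, every initial logit is close to 0. The untrained model therefore starts by predicting a probability near 0.5 for every label. The following pure-Python sketch mimics `init_weights` for a 5-input, 3-output layer. The helper `init_linear_params` and the sample input are hypothetical.

```python
import math
import random


def init_linear_params(in_features, out_features, std=0.05, seed=0):
    """Mimic init_weights: weight entries ~ N(0, std^2), bias all zeros."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, std) for _ in range(in_features)] for _ in range(out_features)]
    bias = [0.0] * out_features
    return weight, bias


init_w, init_b = init_linear_params(in_features=5, out_features=3)
sample = [1.0, -1.0, 0.5, 2.0, -0.5]      # hypothetical standardized input
logits = [sum(w * x for w, x in zip(row, sample)) + b for row, b in zip(init_w, init_b)]
probs = [1 / (1 + math.exp(-z)) for z in logits]
print(probs)  # all close to 0.5 because the weights are tiny
```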


## <font color = 'indianred'>**Training Loop**

**Model Training** involves the following steps:

- Step 0: Randomly initialize parameters / weights
- Step 1: Compute model's predictions - forward pass
- Step 2: Compute loss
- Step 3: Compute the gradients
- Step 4: Update the parameters
- Step 5: Repeat steps 1 - 4

Model training is repeating this process over and over, for many **epochs**.

We will specify the number of ***epochs***. During each epoch we will iterate over the complete dataset, one mini-batch at a time, and update the parameters after every mini-batch.

***Learning rate*** and ***epochs*** are hyperparameters. In practice, their values are tuned based on performance on a validation dataset (this exercise trains on the full dataset without a validation split).

We will now implement steps 1 to 4 inside a training function.
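In PyTorch, `model(x)`, `loss_function`, `loss.backward()` and `optimizer.step()` automate steps 1 to 4. As a hedged illustration, the following pure-Python sketch writes them out by hand for a linear multilabel model with mean binary cross-entropy on logits. Step 3 relies on one fact: the derivative of that loss with respect to each logit z is (sigmoid(z) - y), divided by the number of averaged entries. The toy data and all helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, b, X):
    """Step 1: logits[n][k] = sum_j W[k][j] * X[n][j] + b[k]."""
    return [[sum(w * x for w, x in zip(row, xs)) + bk for row, bk in zip(W, b)] for xs in X]


def compute_loss(Z, Y):
    """Step 2: mean binary cross-entropy over all (sample, label) entries, from logits."""
    terms = [max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
             for z_row, y_row in zip(Z, Y) for z, y in zip(z_row, y_row)]
    return sum(terms) / len(terms)


def compute_gradients(W, X, Z, Y):
    """Step 3: d(loss)/d(logit) = (sigmoid(z) - y) / (N * K), then the chain rule."""
    n_samples, n_labels, n_features = len(X), len(W), len(X[0])
    scale = 1.0 / (n_samples * n_labels)
    dZ = [[(sigmoid(z) - y) * scale for z, y in zip(z_row, y_row)] for z_row, y_row in zip(Z, Y)]
    dW = [[sum(dZ[n][k] * X[n][j] for n in range(n_samples)) for j in range(n_features)]
          for k in range(n_labels)]
    db = [sum(dZ[n][k] for n in range(n_samples)) for k in range(n_labels)]
    return dW, db


def update(W, b, dW, db, lr):
    """Step 4: plain gradient-descent step."""
    new_W = [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(W, dW)]
    new_b = [bk - lr * g for bk, g in zip(b, db)]
    return new_W, new_b


# Hypothetical toy data: 4 samples, 2 features, 2 labels
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
Y_toy = [[1, 0], [0, 1], [1, 1], [0, 0]]
W, b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]     # Step 0: initialize (zeros for simplicity)
losses = []
for _ in range(100):                              # Step 5: repeat steps 1-4
    Z = forward(W, b, X_toy)
    losses.append(compute_loss(Z, Y_toy))
    dW, db = compute_gradients(W, X_toy, Z, Y_toy)
    W, b = update(W, b, dW, db, lr=1.0)
print(round(losses[0], 4), round(losses[-1], 4))  # starts at log(2) ≈ 0.6931, then decreases
```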

In [None]:
# Installed the torchmetrics package, a PyTorch library for various machine learning metrics,
# to facilitate model evaluation during and after training.
!pip install torchmetrics


Collecting torchmetrics
  Downloading torchmetrics-1.3.1-py3-none-any.whl (840 kB)
Collecting lightning-utilities>=0.8.0 (from torchmetrics)
  Downloading lightning_utilities-0.10.1-py3-none-any.whl (24 kB)
Installing collected packages: lightning-utilities, torchmetrics
Successfully installed lightning-utilities-0.10.1 torchmetrics-1.3.1


In [None]:
# Imported HammingDistance from torchmetrics
# HammingDistance is useful for evaluating multi-label classification problems.
from torchmetrics import HammingDistance

<font color = 'indianred'>**Hamming Distance**</font> is often used in multi-label classification problems to quantify the dissimilarity between the predicted and true labels. It counts the label positions where the predicted and true labels differ and divides by the total number of label positions. The result ranges from 0 (every label correct) to 1 (every label wrong). It is a useful metric because it shows the discrepancies between the predicted and actual labels at a granular level, taking every label in a multi-label setting into account.

<font color = 'indianred'>**Unlike exact-match (subset) accuracy, which is all-or-nothing per sample, Hamming Distance gives partial credit for the labels that were classified correctly** </font>, thereby providing a more granular insight into the model's performance.

Let us understand this with an example:

In [None]:
target = torch.tensor([[0, 1], [1, 1]])
preds = torch.tensor([[0, 1], [0, 1]])
hamming_distance = HammingDistance(task="multilabel", num_labels=2)
hamming_distance(preds, target)

tensor(0.2500)

In the given example, the Hamming Distance is calculated for multi-label classification with two labels per sample (`num_labels=2`), each of which is either 0 or 1.

1. The target tensor has shape (2, 2): `[[0, 1], [1, 1]]`
2. The prediction tensor also has shape (2, 2): `[[0, 1], [0, 1]]`

Let's examine the individual sample pairs to understand the distance:

- For the first sample pair (target = `[0, 1]`, prediction = `[0, 1]`), there are no mismatches because the prediction is correct.
- For the second sample pair (target = `[1, 1]`, prediction = `[0, 1]`), there is one mismatch, at the first label (predicted 0, true label 1).

To calculate the overall Hamming Distance, we can take the number of label mismatches and divide by the total number of labels:

- Total Mismatches = 1 (from the second sample pair)
- Total Number of Labels = 2 samples * 2 labels per sample = 4

Therefore, the overall Hamming Distance is \(1 / 4 = 0.25\), which matches the output `tensor(0.2500)`.
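The same arithmetic can be reproduced without torchmetrics. The following sketch computes the fraction of mismatched label positions. For contrast, it also computes the all-or-nothing exact-match accuracy, which gives the second sample no credit even though one of its two labels is right. Both helper names are hypothetical.

```python
def hamming_distance(preds, target):
    """Fraction of (sample, label) positions where the prediction disagrees with the target."""
    mismatches = sum(p != t for p_row, t_row in zip(preds, target) for p, t in zip(p_row, t_row))
    total = sum(len(row) for row in target)
    return mismatches / total


def exact_match_accuracy(preds, target):
    """All-or-nothing accuracy: a sample counts only if every one of its labels is correct."""
    return sum(p_row == t_row for p_row, t_row in zip(preds, target)) / len(target)


target_list = [[0, 1], [1, 1]]
preds_list = [[0, 1], [0, 1]]
print(hamming_distance(preds_list, target_list))      # 0.25, matching tensor(0.2500)
print(exact_match_accuracy(preds_list, target_list))  # 0.5
```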

Hamming Distance is a good metric for multi-label classification as it can capture the difference between sets of labels per sample, thereby providing a more granular measure of the model's performance.

In [None]:
def train(epochs, loss_function, learning_rate, model, optimizer, train_loader, device):

    train_hamming_distance = HammingDistance(task="multilabel", num_labels=3).to(device)

    for epoch in range(epochs):
        # Initialize the running training loss at the start of the epoch
        running_train_loss = 0.0

        # Iterate over mini-batches from the dataset using train_loader
        for x, y in train_loader:
            # Move inputs and targets to the selected device; BCEWithLogitsLoss expects float targets
            x = x.to(device, dtype=torch.float32)
            y = y.to(device, dtype=torch.float32)

            # Step 1: Forward pass: compute the model's predictions (raw logits)
            output = model(x)

            # Step 2: Compute loss
            loss = loss_function(output, y)

            # Step 3: Backward pass - Compute the gradients
            # Zero out gradients from the previous iteration
            optimizer.zero_grad()

            # Backward pass: Compute gradients based on the loss
            loss.backward()

            # Step 4: Update the parameters
            optimizer.step()

            # Update running loss
            running_train_loss += loss.item()

            with torch.no_grad():
                # The model outputs logits, so apply a sigmoid before thresholding at 0.5
                y_pred = torch.sigmoid(output) > 0.5

                # Update Hamming Distance metric
                train_hamming_distance.update(y_pred, y)

        # Compute mean train loss for the epoch
        train_loss = running_train_loss / len(train_loader)

        # Compute Hamming Distance for the epoch
        epoch_hamming_distance = train_hamming_distance.compute()

        # Print the train loss and Hamming Distance for the epoch
        print(f'Epoch: {epoch + 1} / {epochs}')
        print(f'Train Loss: {train_loss:.4f} | Train Hamming Distance: {epoch_hamming_distance:.4f}')

        # Reset metric states for the next epoch
        train_hamming_distance.reset()
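The model's last layer outputs raw logits, and the sigmoid lives inside `BCEWithLogitsLoss`. Predicted labels should therefore come from thresholding the sigmoid probability at 0.5, which is the same as thresholding the logit at 0. Comparing a logit directly with 0.5 instead corresponds to a probability cutoff of about 0.62. The following pure-Python check uses hypothetical logit values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


logits = [-2.0, -0.1, 0.0, 0.3, 0.49, 0.51, 3.0]      # hypothetical logits
by_probability = [sigmoid(z) > 0.5 for z in logits]  # threshold the probability at 0.5
by_logit = [z > 0.0 for z in logits]                  # equivalent: threshold the logit at 0
print(by_probability == by_logit)                     # True
print(round(sigmoid(0.5), 4))                         # 0.6225: logit 0.5 is not probability 0.5
```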


In [None]:
# Set a manual seed for reproducibility across runs
torch.manual_seed(100)

# Define hyperparameters: learning rate and the number of epochs
learning_rate = 1
epochs = 20

# Determine the computing device (GPU if available, otherwise CPU)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Student Task: Configure the optimizer for model training.
# Here, we're using Stochastic Gradient Descent (SGD). Think through what parameters are needed.
# Reminder: Utilize the learning rate defined above when setting up your optimizer.
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Transfer the model to the selected device (CPU or GPU)
model.to(device)

# Apply custom weight initialization function to the model layers
# Note: Weight initialization can significantly affect training dynamics
model.apply(init_weights)

# Call the training function to start the training process
# Note: All elements like epochs, loss function, learning rate, etc., are passed as arguments
train(epochs, loss_function, learning_rate, model, optimizer, train_loader, device)


Using device: cpu
Epoch: 1 / 20
Train Loss: 0.5126 | Train Hamming Distance: 0.2947
Epoch: 2 / 20
Train Loss: 0.4856 | Train Hamming Distance: 0.2523
Epoch: 3 / 20
Train Loss: 0.4825 | Train Hamming Distance: 0.2547
Epoch: 4 / 20
Train Loss: 0.4833 | Train Hamming Distance: 0.2540
Epoch: 5 / 20
Train Loss: 0.4866 | Train Hamming Distance: 0.2530
Epoch: 6 / 20
Train Loss: 0.4829 | Train Hamming Distance: 0.2563
Epoch: 7 / 20
Train Loss: 0.4856 | Train Hamming Distance: 0.2520
Epoch: 8 / 20
Train Loss: 0.4843 | Train Hamming Distance: 0.2553
Epoch: 9 / 20
Train Loss: 0.4858 | Train Hamming Distance: 0.2533
Epoch: 10 / 20
Train Loss: 0.4860 | Train Hamming Distance: 0.2540
Epoch: 11 / 20
Train Loss: 0.4846 | Train Hamming Distance: 0.2620
Epoch: 12 / 20
Train Loss: 0.4842 | Train Hamming Distance: 0.2510
Epoch: 13 / 20
Train Loss: 0.4845 | Train Hamming Distance: 0.2563
Epoch: 14 / 20
Train Loss: 0.4848 | Train Hamming Distance: 0.2523
Epoch: 15 / 20
Train Loss: 0.4833 | Train Hamming Dis

In [None]:
# Loop through the model's parameters to display them
# This is helpful for debugging and understanding how well the model has learned
for name, param in model.named_parameters():
    # 'name' will contain the name of the parameter (e.g., 'layer1.weight')
    # 'param.data' will contain the parameter values
    print(name, param.data)


weight tensor([[ 0.9856, -0.1141, -0.2715,  0.0869, -0.9284],
        [-0.9591,  0.7973,  0.5119,  0.1237, -1.4677],
        [ 0.1281,  0.7948, -0.0565, -1.6264,  0.5559]])
bias tensor([-0.2102,  0.4204,  0.0613])
