<a href="https://colab.research.google.com/github/Rohanrathod7/my-ml-labs/blob/main/20_Introduction_to_Deep_Learning_with_PyTorch/02_Neural_Network_Architecture_and_Hyperparameters.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 2. Neural Network Architecture and Hyperparameters


To train a neural network in PyTorch, you first need to understand additional components such as activation and loss functions. Training a network means minimizing the loss function, which is done by calculating its gradients with respect to the model's parameters. You will then learn how to use these gradients to update the parameters.

In [1]:
%matplotlib inline
import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.linear_model import (Lasso, LinearRegression, LogisticRegression,
                                  Ridge, SGDClassifier)
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import (GridSearchCV, KFold, cross_val_score,
                                     train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# TED talk transcripts dataset from an earlier lab in the same repository
url = "https://raw.githubusercontent.com/Rohanrathod7/my-ml-labs/main/19_Feature_Engineering_for_NLP_in_Python/Dataset/ted.csv"

# Read the CSV file into a DataFrame and preview the first rows
ted = pd.read_csv(url)
display(ted.head())

| Unnamed: 0 | transcript | url |
|---|---|---|
| 0 | We're going to talk — my — a new lecture, just... | https://www.ted.com/talks/al_seckel_says_our_b... |
| 1 | This is a representation of your brain, and yo... | https://www.ted.com/talks/aaron_o_connell_maki... |
| 2 | It's a great honor today to share with you The... | https://www.ted.com/talks/carter_emmart_demos_... |
| 3 | My passions are music, technology and making t... | https://www.ted.com/talks/jared_ficklin_new_wa... |
| 4 | It used to be that if you wanted to get a comp... | https://www.ted.com/talks/jeremy_howard_the_wo... |


### Discovering activation functions

**Activate your understanding!**   
Neural networks are a core component of deep learning models. They power many applications you use every day, from language translation apps to the cameras on your smartphone.

Which of the following statements about neural networks is true?

-> A neural network with a single linear layer followed by a sigmoid activation is similar to a logistic regression model.

A logistic regression model is essentially a single-layer neural network with a sigmoid activation: it computes a weighted sum of the inputs plus a bias and passes the result through a sigmoid.
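To see the equivalence concretely, the sketch below builds a one-layer sigmoid network and recomputes its output with the logistic regression formula $\sigma(\mathbf{w}^\top \mathbf{x} + b)$, using the layer's own weight and bias. The layer size (3 inputs) and the input values are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random initial weights are reproducible

# One linear layer (3 features -> 1 output) followed by a sigmoid
model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
x = torch.tensor([[0.5, -1.0, 2.0]])

# Output of the network
network_output = model(x)

# Logistic regression formula using the same weights and bias
w = model[0].weight  # shape (1, 3)
b = model[0].bias    # shape (1,)
logistic_output = torch.sigmoid(x @ w.T + b)

print(network_output, logistic_output)
print(torch.allclose(network_output, logistic_output))  # True
```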

**The sigmoid and softmax functions**  
The sigmoid and softmax functions are key activation functions in deep learning, often used as the final step in a neural network.

- Sigmoid is for binary classification
- Softmax is for multi-class classification

Given a pre-activation output tensor from a network (the raw values before any activation), apply the appropriate activation function to obtain the final output.
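Both functions can be written out directly: $\sigma(x) = \frac{1}{1 + e^{-x}}$ and $\text{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$. The sketch below computes them by hand on the same inputs used in the cells below and compares the results with PyTorch's built-in `torch.sigmoid` and `torch.softmax`.

```python
import torch

# Sigmoid by hand: 1 / (1 + exp(-x))
x = torch.tensor([[2.4]])
manual_sigmoid = 1 / (1 + torch.exp(-x))

# Softmax by hand: exponentiate, then divide by the sum over the class dimension
z = torch.tensor([[1.0, -6.0, 2.5, -0.3, 1.2, 0.8]])
exp_z = torch.exp(z)
manual_softmax = exp_z / exp_z.sum(dim=-1, keepdim=True)

print(manual_sigmoid)  # tensor([[0.9168]])
print(torch.allclose(manual_sigmoid, torch.sigmoid(x)))          # True
print(torch.allclose(manual_softmax, torch.softmax(z, dim=-1)))  # True

# The softmax probabilities should sum to 1 (up to floating-point rounding)
print(manual_softmax.sum())
```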

In [2]:
import torch
import torch.nn as nn

input_tensor = torch.tensor([[2.4]])

# Create a sigmoid function and apply it on input_tensor
sigmoid = nn.Sigmoid()
probability = sigmoid(input_tensor)
print(probability)

tensor([[0.9168]])


In [3]:
input_tensor = torch.tensor([[1.0, -6.0, 2.5, -0.3, 1.2, 0.8]])

# Create a softmax function and apply it on input_tensor
softmax = nn.Softmax(dim=-1)  # normalize across the last (class) dimension
probabilities = softmax(input_tensor)
print(probabilities)

tensor([[1.2828e-01, 1.1698e-04, 5.7492e-01, 3.4961e-02, 1.5669e-01, 1.0503e-01]])




### Running a forward pass

**Building a binary classifier in PyTorch**  
Recall that a small neural network with a single linear layer followed by a sigmoid function is a binary classifier. It behaves just like a logistic regression model.

Practice building this small network and interpreting the output of the classifier.

In [4]:
import torch
import torch.nn as nn

input_tensor = torch.Tensor([[3, 4, 6, 2, 3, 6, 8, 9]])

# Implement a small neural network for binary classification
model = nn.Sequential(
  nn.Linear(8, 1),
  nn.Sigmoid()
)

output = model(input_tensor)
print(output)

tensor([[0.0013]], grad_fn=<SigmoidBackward0>)



**Question**  
Which of the following is **false** about the output returned by your binary classifier?

- We can use a threshold of 0.5 to decide which of the two classes the output belongs to.
- It can return any float value.
- It is produced by an untrained model, so it is not yet meaningful.
- The sigmoid function transforms the values of its input without changing its shape.

-> False: "It can return any float value." The sigmoid output is always between 0 and 1.
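Because a sigmoid output lies between 0 and 1, a threshold turns it into a hard class label. A minimal sketch, assuming the common (but not mandatory) threshold of 0.5 and a batch of made-up probabilities:

```python
import torch

# Hypothetical sigmoid outputs for a batch of four samples
probabilities = torch.tensor([[0.0013], [0.62], [0.5], [0.91]])

# Samples with probability >= 0.5 go to class 1, the rest to class 0
predicted_classes = (probabilities >= 0.5).long()
print(predicted_classes.flatten())  # tensor([0, 1, 1, 1])
```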

**From regression to multi-class classification**   
The models you have seen for binary classification, multi-class classification and regression are all very similar; they differ mainly in the output size of the last linear layer and in the final activation function, if any.

Start building a model for regression, and then tweak the model to perform a multi-class classification.

In [5]:
import torch
import torch.nn as nn

input_tensor = torch.Tensor([[3, 4, 6, 7, 10, 12, 2, 3, 6, 8, 9]])

# Implement a neural network with exactly four linear layers
model = nn.Sequential(
  nn.Linear(11, 20),  # First layer: input size should match the number of input features
  nn.Linear(20, 12),  # Second layer
  nn.Linear(12, 6),   # Third layer
  nn.Linear(6, 1)     # Fourth layer: output size should be 1 for regression
)

output = model(input_tensor)
print(output)

tensor([[-0.4389]], grad_fn=<AddmmBackward0>)
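Each `nn.Linear(in_features, out_features)` layer holds `in_features * out_features` weights plus `out_features` biases, so the size of this architecture can be worked out without training it: (11·20 + 20) + (20·12 + 12) + (12·6 + 6) + (6·1 + 1) = 577. The sketch below rebuilds the same four-layer regression model and lets PyTorch do the count.

```python
import torch.nn as nn

# Same four-layer regression architecture as above (weights are random)
model = nn.Sequential(
    nn.Linear(11, 20),
    nn.Linear(20, 12),
    nn.Linear(12, 6),
    nn.Linear(6, 1),
)

# Count every weight and bias element across all layers
total_params = sum(p.numel() for p in model.parameters())
print(total_params)  # 577
```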


In [6]:
import torch
import torch.nn as nn

input_tensor = torch.Tensor([[3, 4, 6, 7, 10, 12, 2, 3, 6, 8, 9]])

# Update network below to perform a multi-class classification with four labels
model = nn.Sequential(
  nn.Linear(11, 20),
  nn.Linear(20, 12),
  nn.Linear(12, 6),
  nn.Linear(6, 4),
  nn.Softmax(dim=-1)  # probabilities over the 4 classes
)

output = model(input_tensor)
print(output)

# You transformed regression outputs into probabilities between 0 and 1 by updating the last layer
# and applying softmax. Next, we’ll explore how to access predictions!

tensor([[0.3786, 0.1639, 0.1629, 0.2946]], grad_fn=<SoftmaxBackward0>)
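To turn these probabilities into a predicted label, a common approach is to take the index of the largest value with `torch.argmax`. The sketch below copies the probabilities printed above into a fixed tensor, since the real model output changes with each random initialization.

```python
import torch

# Probabilities copied from the output above (one sample, four classes)
probabilities = torch.tensor([[0.3786, 0.1639, 0.1629, 0.2946]])

# Index of the most probable class for each sample in the batch
predicted_class = torch.argmax(probabilities, dim=-1)
print(predicted_class)  # tensor([0])
```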


### Using loss functions to assess model predictions


**Creating one-hot encoded labels**  
One-hot encoding converts a single integer label into a vector with N elements, where N is the number of classes. Every element is 0 except for a 1 at the index of the label; for example, with three classes, label 1 becomes [0, 1, 0].

In this exercise, you'll manually create a one-hot encoded vector for y, and then use PyTorch to simplify the process. Your dataset has three classes (0, 1, 2).

numpy (np), torch.nn.functional (F), and torch are imported at the top of the cell below.

In [7]:
import numpy as np
import torch
import torch.nn.functional as F

y = 1
num_classes = 3

# Create the one-hot encoded vector manually using NumPy
one_hot_numpy = np.array([0, 1, 0])

# Create the one-hot encoded vector using PyTorch
one_hot_pytorch = F.one_hot(torch.tensor(y), num_classes=num_classes)

print("One-hot vector using NumPy:", one_hot_numpy)
print("One-hot vector using PyTorch:", one_hot_pytorch)

# You created one-hot encoded vectors manually and with PyTorch

One-hot vector using NumPy: [0 1 0]
One-hot vector using PyTorch: tensor([0, 1, 0])
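`F.one_hot` also accepts a whole tensor of labels, which is the usual case when working with a batch. A sketch with three made-up labels for the same three classes:

```python
import torch
import torch.nn.functional as F

# Hypothetical batch of integer class labels (classes 0, 1, 2)
labels = torch.tensor([0, 2, 1])

# Each label becomes one row of the result: shape (batch_size, num_classes)
one_hot_batch = F.one_hot(labels, num_classes=3)
print(one_hot_batch)
# tensor([[1, 0, 0],
#         [0, 0, 1],
#         [0, 1, 0]])
```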

**Calculating cross entropy loss**  
Cross-entropy loss is a widely used method to measure classification loss. In this exercise, you’ll calculate cross-entropy loss in PyTorch using:

- y: the ground-truth label.
- scores: a vector of raw predictions (logits) before softmax.

Loss functions help neural networks learn by measuring prediction errors. Create a one-hot encoded vector for y, define the cross-entropy loss function, and compute the loss using scores and the encoded label. The result is a single float representing the sample's loss.

torch, CrossEntropyLoss, and torch.nn.functional as F have already been imported for you.

In [8]:
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

y = [2]
scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])

# Create a one-hot encoded vector of the label y
one_hot_label = F.one_hot(torch.tensor(y), num_classes=4)

# Create the cross entropy loss function
criterion = CrossEntropyLoss()

# Calculate the cross entropy loss
loss = criterion(scores.double(), one_hot_label.double())
print(loss)

tensor(8.0619, dtype=torch.float64)
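For a single sample with a one-hot target, cross-entropy is the negative log of the softmax probability assigned to the true class, $-\log \text{softmax}(\mathbf{s})_y$. The sketch below recomputes the value above by hand with `F.log_softmax`, and also shows that `CrossEntropyLoss` gives the same result when given the integer class index directly instead of a one-hot vector.

```python
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])
y = torch.tensor([2])

# Manual cross-entropy: negative log-probability of the true class
log_probs = F.log_softmax(scores, dim=-1)
manual_loss = -log_probs[0, y[0]]

# CrossEntropyLoss also accepts integer class indices as targets
index_loss = CrossEntropyLoss()(scores, y)

print(manual_loss, index_loss)  # both approximately 8.0619
```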


### Using derivatives to update model parameters

**Accessing the model parameters**  
A PyTorch model created with `nn.Sequential()` is a module that contains the layers of your network. Each layer can be accessed by indexing the model directly (`model[0]`, `model[1]`, ...), and its parameters through the layer's `.weight` and `.bias` attributes. In this exercise, you will practice accessing the parameters of different linear layers of a neural network.

In [9]:
model = nn.Sequential(nn.Linear(16, 8),
                      nn.Linear(8, 2)
                     )

# Access the weight of the first linear layer
weight_0 = model[0].weight
print("Weight of the first layer:", weight_0)

# Access the bias of the second linear layer
bias_1 = model[1].bias
print("Bias of the second layer:", bias_1)

# You’ve successfully accessed the model parameters.

Weight of the first layer: Parameter containing:
tensor([[-1.9255e-01, -8.0615e-02,  1.0408e-01,  4.0837e-02, -1.6673e-01,
          1.8673e-01, -1.8312e-01, -3.6285e-02,  4.6827e-02, -2.1024e-01,
         -1.5861e-01, -1.2675e-04,  1.2661e-01, -1.3056e-01,  2.7707e-02,
         -1.2274e-01],
        [ 1.8291e-01,  1.2115e-01, -5.1471e-02, -1.5578e-01,  8.8836e-02,
          1.9420e-01, -6.8894e-02,  1.7208e-01,  1.9331e-01, -1.4071e-02,
          1.7341e-01, -1.9473e-01, -2.0906e-02,  1.3160e-01,  7.7527e-03,
         -2.4793e-01],
        [ 9.4462e-02,  1.5797e-01, -6.3461e-02, -1.7784e-01, -2.0228e-01,
          5.9339e-02, -1.7728e-02,  2.3113e-02,  7.1032e-02, -6.0721e-03,
         -1.7263e-02,  3.5402e-02,  8.4981e-02,  6.7004e-02,  1.7864e-01,
         -1.7335e-01],
        [ 1.8673e-01, -8.9617e-02, -1.6191e-01, -3.1191e-02,  2.0274e-01,
         -1.7988e-01, -1.7093e-01,  1.6315e-01,  9.2390e-02,  1.4609e-01,
         -1.4829e-01, -7.7734e-02,  1.1535e-01, -2.1903e-01, ...
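Instead of indexing layer by layer, `named_parameters()` iterates over every weight and bias in the model together with its name. For a model with the same two-layer architecture, each weight has shape `(out_features, in_features)` and each bias has shape `(out_features,)`:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.Linear(8, 2))

# Names follow the pattern "<layer index>.<parameter name>"
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# 0.weight (8, 16)
# 0.bias (8,)
# 1.weight (2, 8)
# 1.bias (2,)
```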

**Updating the weights manually**   
Now that you know how to access weights and biases, you will manually perform the job of the PyTorch optimizer. While PyTorch automates this, practicing it manually helps you build intuition for how models learn and adjust. This understanding will be valuable when debugging or fine-tuning neural networks.

The `model` defined above has two linear layers. Before its weights can be updated, a forward pass must be run, the loss computed, and `loss.backward()` called so that each parameter's `.grad` attribute is populated. A learning rate, `lr`, scales the gradients when performing the update.
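The update rule is $w \leftarrow w - \text{lr} \cdot \frac{\partial L}{\partial w}$. To check it on numbers small enough to verify by hand, the sketch below uses a single weight $w = 2$, one input $x = 3$, a target $y = 5$ and the squared error $L = (wx - y)^2$ (all chosen for illustration). Then $\partial L / \partial w = 2(wx - y)x = 2 \cdot 1 \cdot 3 = 6$, and with a learning rate of 0.1 the updated weight is $2 - 0.1 \cdot 6 = 1.4$.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # single trainable weight
x, y = torch.tensor(3.0), torch.tensor(5.0)
lr = 0.1

# Forward pass and squared-error loss: (2 * 3 - 5)^2 = 1
loss = (w * x - y) ** 2
loss.backward()  # fills w.grad with dL/dw = 2 * (w*x - y) * x = 6
print(w.grad)  # tensor(6.)

# Gradient descent step, done in place without recording it in the graph
with torch.no_grad():
    w -= lr * w.grad
print(w)  # tensor(1.4000, requires_grad=True)
```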

In [19]:
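import torch

weight0 = model[0].weight
weight1 = model[1].weight

# Access the gradients of the weight of each linear layer
# (these are None until loss.backward() has been called, as in the next cell)
grads0 = weight0.grad
grads1 = weight1.grad

# Define a learning rate
lr = 0.001

# Update the weights in place; writing `weight0 = model[0].weight - lr * grads0`
# would only create a new tensor and leave the model unchanged
if grads0 is not None and grads1 is not None:
    with torch.no_grad():
        weight0 -= lr * grads0
        weight1 -= lr * grads1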

In [18]:
import torch
import torch.nn as nn

# `model` is the two-layer network defined earlier:
# model = nn.Sequential(nn.Linear(16, 8), nn.Linear(8, 2))

# Define a dummy input tensor and target tensor for demonstration
input_tensor_dummy = torch.randn(1, 16) # Batch size 1, 16 input features
target_tensor_dummy = torch.randn(1, 2) # Batch size 1, 2 output features (for regression example)

# Perform a forward pass
output_dummy = model(input_tensor_dummy)

# Define a loss function (e.g., Mean Squared Error for a regression task)
criterion_dummy = nn.MSELoss()

# Calculate the loss
loss_dummy = criterion_dummy(output_dummy, target_tensor_dummy)

# Perform the backward pass to compute gradients
loss_dummy.backward()

# Now the gradients are available for updating the weights
weight0 = model[0].weight
weight1 = model[1].weight

# Access the gradients of the weight of each linear layer
grads0 = weight0.grad
grads1 = weight1.grad

# Define a learning rate
lr = 0.01

# Update the weights using the learning rate and the gradients
# Note: In a real training loop, you would typically use an optimizer for this step
with torch.no_grad(): # Disable gradient calculation for weight updates
    weight0 -= lr * grads0
    weight1 -= lr * grads1

# Gradients are accumulated, so zero them out after updating
model[0].weight.grad.zero_()
model[1].weight.grad.zero_()

print("Gradients are now computed and weights are updated.")
# You can print the updated weights to verify
print("Updated weight of the first layer:", model[0].weight)
print("Updated weight of the second layer:", model[1].weight)

# Imagine doing this for a hundred layers—it would be exhausting!
# Thankfully, PyTorch optimizers handle it all in just one line of code.

Gradients are now computed and weights are updated.
Updated weight of the first layer: Parameter containing:
tensor([[-0.1921, -0.0811,  0.1062,  0.0430, -0.1649,  0.1850, -0.1835, -0.0364,
          0.0456, -0.2136, -0.1562, -0.0003,  0.1253, -0.1299,  0.0278, -0.1215],
        [ 0.1812,  0.1262, -0.0560, -0.1589,  0.0913,  0.1947, -0.0718,  0.1622,
          0.1989, -0.0122,  0.1637, -0.1951, -0.0251,  0.1229,  0.0063, -0.2520],
        [ 0.0940,  0.1596, -0.0644, -0.1782, -0.2009,  0.0590, -0.0188,  0.0197,
          0.0726, -0.0064, -0.0198,  0.0352,  0.0831,  0.0642,  0.1782, -0.1743],
        [ 0.1862, -0.0880, -0.1633, -0.0322,  0.2035, -0.1797, -0.1719,  0.1600,
          0.0942,  0.1467, -0.1513, -0.0779,  0.1140, -0.2218, -0.0688,  0.1444],
        [-0.0429,  0.1310, -0.1321, -0.1798,  0.1838, -0.1628,  0.0893,  0.1654,
          0.1234,  0.0602, -0.1011, -0.1757,  0.1223, -0.2415, -0.2114,  0.1939],
        [ 0.0415,  0.1712, -0.0891,  0.1670,  0.1215, -0.1439,  0.2485, ...

**Using the PyTorch optimizer**  
Earlier, you manually updated the weights of a network, gaining insight into how training works behind the scenes. However, this approach does not scale to deep networks with many layers.

Thankfully, PyTorch provides the SGD optimizer, which automates this process efficiently in just a few lines of code. Now, you’ll complete the training loop by updating the weights using a PyTorch optimizer.

A neural network has been created and provided as the model variable. This model was used to run a forward pass and create the tensor of predictions pred. The one-hot encoded tensor is named target and the cross entropy loss function is stored as criterion.

In [22]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Create the optimizer over the parameters of `model` (16 inputs -> 2 class scores)
optimizer = optim.SGD(model.parameters(), lr=0.001)

# Dummy data for demonstration: one sample with 16 features and a one-hot target
# for class 1 (replace with real data in practice)
input_data = torch.randn(1, 16)
target = F.one_hot(torch.tensor([1]), num_classes=2).float()

# Cross-entropy loss expects raw scores (logits), so no softmax is applied
criterion = nn.CrossEntropyLoss()

# Clear gradients left over from earlier cells, then run a forward pass
optimizer.zero_grad()
pred = model(input_data)

# Compute the loss and backpropagate so every model parameter gets a .grad
loss = criterion(pred, target)
loss.backward()

# Update the model's parameters using the optimizer
optimizer.step()

print("Optimizer step completed. Model parameters updated.")

Optimizer step completed. Model parameters updated.
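For plain SGD without momentum or weight decay, `optimizer.step()` applies the same rule as the manual update earlier, `param -= lr * param.grad`, to every parameter. The sketch below checks this on a small, arbitrary `nn.Linear(4, 2)` model with a made-up batch: one copy is updated by hand and an identical copy by `optim.SGD`.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
lr = 0.01

# Two identical models: one updated manually, one by the optimizer
manual_model = nn.Linear(4, 2)
sgd_model = copy.deepcopy(manual_model)
optimizer = optim.SGD(sgd_model.parameters(), lr=lr)

# Same dummy batch and loss for both
x = torch.randn(3, 4)
target = torch.tensor([0, 1, 1])
criterion = nn.CrossEntropyLoss()

# Manual update: p <- p - lr * dL/dp for every parameter
criterion(manual_model(x), target).backward()
with torch.no_grad():
    for p in manual_model.parameters():
        p -= lr * p.grad

# Optimizer update
optimizer.zero_grad()
criterion(sgd_model(x), target).backward()
optimizer.step()

# The two models end up with the same parameters
same = all(
    torch.allclose(p1, p2)
    for p1, p2 in zip(manual_model.parameters(), sgd_model.parameters())
)
print(same)  # True
```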
