# 1. Neural Network Architecture and Hyperparameters

To train a neural network in PyTorch, you will first need to understand additional components, such as activation and loss functions. You will then realize that training a network requires minimizing that loss function, which is done by calculating gradients. You will learn how to use these gradients to update your model's parameters.

## 1.1 Import Libraries

In [29]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss
import torch.optim as optim
import numpy as np

## 1.2 User Variables

In [3]:
# No user variables here

# 2. Exercises

## 2.1 Quiz: Activate your understanding!

### Description 

Neural networks are a core component of deep learning models. They can power so much in your daily life, from language translation apps to the cameras on your smartphone.

### Instructions

Which of the following statements about neural networks is True?

### Answers

* A neural network with a single linear layer followed by a sigmoid activation is similar to a logistic regression model.

An activation function in a neural network is a mathematical function applied to the output of a neuron or a layer. Its main role is to introduce non-linearity into the model, enabling the network to learn and represent complex patterns in data. Without an activation function, all layers would essentially behave as linear transformations, and no matter how many layers the network has, it would reduce to a single linear function, limiting the model's ability to solve complex problems.
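
The claim that stacked linear layers collapse into a single linear function can be checked numerically. The following is a minimal sketch, assuming arbitrary layer sizes (4, 3, 2), an arbitrary seed, and illustrative names such as `stacked` and `combined` that are not part of the exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Two stacked linear layers with no activation in between
layer1 = nn.Linear(4, 3)
layer2 = nn.Linear(3, 2)
stacked = nn.Sequential(layer1, layer2)

# Fold both layers into one: W = W2 @ W1, b = W2 @ b1 + b2
combined = nn.Linear(4, 2)
with torch.no_grad():
    combined.weight.copy_(layer2.weight @ layer1.weight)
    combined.bias.copy_(layer2.weight @ layer1.bias + layer2.bias)

x = torch.randn(5, 4)
same_without_activation = torch.allclose(stacked(x), combined(x), atol=1e-5)
print("Stacked linear layers equal one linear layer:", same_without_activation)

# With a ReLU between the layers, the folded layer generally no longer matches
nonlinear = nn.Sequential(layer1, nn.ReLU(), layer2)
same_with_activation = torch.allclose(nonlinear(x), combined(x), atol=1e-5)
print("Still equal after inserting ReLU:", same_with_activation)
```

The first check holds for any weights; the second is expected to fail for almost all random weights, which is the non-linearity the paragraph above describes.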

## 2.2 The sigmoid and softmax functions

### Description

The sigmoid and softmax functions are key activation functions in deep learning, often used as the final step in a neural network.

* Sigmoid is for binary classification
    ![Sigmoid function](../images/sigmoid_function.png)

* Softmax is for multi-class classification
    ![Softmax function](../images/softmax_function.png)

![Classifications for Activation](../images/sigmoid_and_softmax.png)

Given a pre-activation output tensor from a network, apply the appropriate activation function to obtain the final output.

``torch.nn`` has already been imported as ``nn``.

### Notes

* Sigmoid is the two-class special case of softmax
    * ![Softmax as sigmoid function](../images/softmax_as_sigmoid.png)

### Instructions

* Create a sigmoid function and apply it on ``input_tensor`` to generate a probability for a binary classification task.
* Create a softmax function and apply it on ``input_tensor`` to generate a probability for a multi-class classification task.
* Note: for two classes, softmax over the scores ``[x, 0]`` gives the first class the same probability as ``sigmoid(x)``.



In [4]:
input_tensor = torch.tensor([[2.4]])

# Create a sigmoid function and apply it on input_tensor
sigmoid = nn.Sigmoid()
probability = sigmoid(input_tensor)
print(probability)

tensor([[0.9168]])


In [5]:
input_tensor = torch.tensor([[1.0, -6.0, 2.5, -0.3, 1.2, 0.8]])

# Create a softmax function (dim=-1 normalizes across the class scores) and apply it on input_tensor
softmax = nn.Softmax(dim=-1)
probabilities = softmax(input_tensor)
print(probabilities)

tensor([[1.2828e-01, 1.1698e-04, 5.7492e-01, 3.4961e-02, 1.5669e-01, 1.0503e-01]])




### Practice: Activation Function

In [6]:
# Binary Classification
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

In [7]:
# Multi-class classification
def softmax(x):
    e_x = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e_x / e_x.sum(axis=-1, keepdims=True)
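
A quick way to sanity-check the two NumPy functions is to call them on the inputs used earlier. The sketch below repeats the definitions so that it runs on its own; the variable names `logit`, `p_sigmoid`, `p_softmax`, and `probs` are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    e_x = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e_x / e_x.sum(axis=-1, keepdims=True)

# Sigmoid of a single score
logit = 2.4
p_sigmoid = sigmoid(logit)

# Two-class softmax over [logit, 0]: the first entry matches the sigmoid
p_softmax = softmax(np.array([logit, 0.0]))
print("sigmoid:", p_sigmoid)
print("two-class softmax:", p_softmax)

# Multi-class softmax: the probabilities sum to 1
scores = np.array([1.0, -6.0, 2.5, -0.3, 1.2, 0.8])
probs = softmax(scores)
print("probabilities:", probs, "sum:", probs.sum())
```

The sigmoid value is about 0.9168, in line with the ``nn.Sigmoid`` output shown earlier.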

## 2.3 Building a binary classifier in PyTorch

### Description

Recall that a small neural network with a single linear layer followed by a sigmoid function is a binary classifier. It acts just like a logistic regression.

Practice building this small network and interpreting the output of the classifier.

### Notes

![Binary_Classifier_8_1](../images/binary_classifier_8_1.png)

### Instructions

* Create a neural network that takes a 1x8 tensor as input and outputs a single value for binary classification.
* Pass the output of the linear layer to a sigmoid to produce a probability.

In [8]:
import torch
import torch.nn as nn

input_tensor = torch.Tensor([[3, 4, 6, 2, 3, 6, 8, 9]])

# Implement a small neural network for binary classification
model = nn.Sequential(
  nn.Linear(8,1),
  nn.Sigmoid()
)

output = model(input_tensor)
print(output)  # the value changes between runs because the weights are randomly initialized

tensor([[0.8050]], grad_fn=<SigmoidBackward0>)


### Practice: Manual Code of `torch.nn`

In [9]:
# Step 1: Input data
inputs = [3, 4, 6, 2, 3, 6, 8, 9]  # same as input_tensor row

# Step 2: Example weights and bias (from nn.Linear). These would be learned; here we pick arbitrary ones.
weights = [0.1, -0.2, 0.05, 0.3, -0.15, 0.4, -0.25, 0.1]  # 8 weights
bias = -0.5

# Step 3: Compute the linear combination (dot product + bias)
linear_output = sum(w * x for w, x in zip(weights, inputs)) + bias
print("Linear output:", linear_output)

# Step 4: Apply sigmoid manually
sigmoid_output = 1 / (1 + np.exp(-linear_output))
print("Sigmoid output:", sigmoid_output)

Linear output: 0.7500000000000004
Sigmoid output: 0.6791786991753931
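
To confirm that the manual calculation matches what ``nn.Linear`` followed by ``nn.Sigmoid`` computes, the same arbitrary weights and bias can be copied into a PyTorch model. This is a sketch under that assumption; a freshly created layer would hold random values instead.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[3.0, 4.0, 6.0, 2.0, 3.0, 6.0, 8.0, 9.0]])
weights = torch.tensor([[0.1, -0.2, 0.05, 0.3, -0.15, 0.4, -0.25, 0.1]])
bias = torch.tensor([-0.5])

model = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())

# Overwrite the random initialization with the hand-picked values
with torch.no_grad():
    model[0].weight.copy_(weights)
    model[0].bias.copy_(bias)
    output = model(inputs)

print(output)  # approximately 0.6792, matching the manual result
```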


### Quiz:

Which of the following is false about the output returned by your binary classifier?

* We can use a threshold of 0.5 to determine if the output belongs to one class or the other.
* It can return any float value. [x]
* It is produced from an untrained model so it is not yet meaningful.
* The sigmoid function transforms the values of the input without changing its shape.
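
The 0.5 threshold mentioned in the first option can be applied with a simple comparison. The probabilities below are hypothetical values chosen for illustration, not outputs of a trained model.

```python
import torch

# Hypothetical sigmoid outputs for three samples
probabilities = torch.tensor([[0.8050], [0.0127], [0.5000]])
threshold = 0.5

# Values at or above the threshold are assigned to class 1, the rest to class 0
predicted_classes = (probabilities >= threshold).int()
print(predicted_classes)
```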

## 2.4 From regression to multi-class classification

### Description

The models you have seen for binary classification, multi-class classification and regression have all been similar, barring a few tweaks to the model.

Start building a model for regression, and then tweak the model to perform a multi-class classification.

### Instructions

* Create a 4-layer linear network that takes 11 input features from ``input_tensor`` and produces a single regression output.
* Update the network provided to perform a multi-class classification with four outputs.

In [10]:
import torch
import torch.nn as nn

input_tensor = torch.Tensor([[3, 4, 6, 7, 10, 12, 2, 3, 6, 8, 9]])

# Update network below to perform a multi-class classification with four labels
model = nn.Sequential(
  nn.Linear(11, 20),
  nn.Linear(20, 12),
  nn.Linear(12, 6),
  nn.Linear(6, 4),
  nn.Softmax(dim=-1)
)

output = model(input_tensor)
print(output)

tensor([[0.2031, 0.3338, 0.0875, 0.3755]], grad_fn=<SoftmaxBackward0>)
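
The cell above shows only the classification variant. A sketch of both versions side by side is given below, assuming the same hidden layer sizes; the names `regression_model` and `classification_model` are illustrative. The only differences are the size of the last linear layer and the final softmax.

```python
import torch
import torch.nn as nn

input_tensor = torch.tensor([[3., 4., 6., 7., 10., 12., 2., 3., 6., 8., 9.]])

# Regression: one output value and no final activation
regression_model = nn.Sequential(
    nn.Linear(11, 20),
    nn.Linear(20, 12),
    nn.Linear(12, 6),
    nn.Linear(6, 1),
)
regression_output = regression_model(input_tensor)
print(regression_output.shape)

# Multi-class classification: four outputs turned into probabilities
classification_model = nn.Sequential(
    nn.Linear(11, 20),
    nn.Linear(20, 12),
    nn.Linear(12, 6),
    nn.Linear(6, 4),
    nn.Softmax(dim=-1),
)
class_probs = classification_model(input_tensor)
print(class_probs.shape, class_probs.sum().item())
```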


## 2.5 Creating one-hot encoded labels

### Description

One-hot encoding converts a single integer label into a vector with N elements, where N is the number of classes. This vector contains zeros and a one at the correct position.

In this exercise, you'll manually create a one-hot encoded vector for y, and then use PyTorch to simplify the process. Your dataset has three classes (0, 1, 2).

``numpy`` (as ``np``), ``torch.nn.functional`` (as ``F``), and ``torch`` are already imported for you.

### Instructions

* Manually one-hot encode the ground truth label ``y`` using the provided NumPy array and save it as ``one_hot_numpy``.
* Use PyTorch to one-hot encode ``y`` and save it as ``one_hot_pytorch``.

In [11]:
y = 1
num_classes = 3

# Create the one-hot encoded vector using NumPy
one_hot_numpy = np.array([0, 1, 0])

# Create the one-hot encoded vector using PyTorch
one_hot_pytorch = F.one_hot(torch.tensor(y), num_classes=3)

print("One-hot vector using NumPy:", one_hot_numpy)
print("One-hot vector using PyTorch:", one_hot_pytorch)

One-hot vector using NumPy: [0 1 0]
One-hot vector using PyTorch: tensor([0, 1, 0])


In [15]:
y = 2
num_classes = 3

# Create the one-hot encoded vector using PyTorch
one_hot_pytorch = F.one_hot(torch.tensor(y), num_classes=3)

print("One-hot vector using PyTorch:", one_hot_pytorch)

One-hot vector using PyTorch: tensor([0, 0, 1])


* Class `0`: [1,0,0]
* Class `1`: [0,1,0]
* Class `2`: [0,0,1]
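
The mapping above can also be generated rather than typed by hand. The sketch below uses a hypothetical batch of labels; indexing rows of an identity matrix is one common NumPy approach, and ``F.one_hot`` accepts a whole tensor of integer labels.

```python
import numpy as np
import torch
import torch.nn.functional as F

num_classes = 3
labels = [0, 2, 1]  # hypothetical batch of ground truth labels

# NumPy: row i of the identity matrix is the one-hot vector for class i
one_hot_numpy = np.eye(num_classes, dtype=int)[labels]

# PyTorch: F.one_hot expects an integer tensor of class indices
one_hot_pytorch = F.one_hot(torch.tensor(labels), num_classes=num_classes)

print(one_hot_numpy)
print(one_hot_pytorch)
```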

## 2.6 Calculating cross entropy loss

### Description

Cross-entropy loss is a widely used method to measure classification loss. In this exercise, you’ll calculate cross-entropy loss in PyTorch using:

* ``y``: the ground truth label.
* ``scores``: a vector of predictions before softmax.

Loss functions help neural networks learn by measuring prediction errors. Create a one-hot encoded vector for ``y``, define the cross-entropy loss function, and compute the loss using ``scores`` and the encoded label. The result will be a single float representing the sample's loss.

``torch``, ``CrossEntropyLoss``, and ``torch.nn.functional`` as ``F`` have already been imported for you.

### Notes

* Cross-entropy loss measures how well the predicted probability distribution of a model matches the true distribution of the labels.
* It is often used for binary classification (called Binary Cross-Entropy Loss) and multi-class classification (Categorical Cross-Entropy Loss).
* It uses a logarithmic function, which is why it’s sometimes called log loss.
* Formula for Binary Cross Entropy:

![BCE](../images/bce.png)
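
For reference, binary cross-entropy over $N$ samples is commonly written as below. The notation is an assumption and may differ from the image: $y_i \in \{0, 1\}$ is the true label and $p_i$ is the predicted probability of class 1 for sample $i$.

$$
\mathrm{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
$$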

### Instructions

* Create the one-hot encoded vector of the ground truth label ``y``, with 4 features (one for each class), and assign it to ``one_hot_label``.
* Create the cross entropy loss function and store it as ``criterion``.
* Calculate the cross entropy loss using the ``one_hot_label`` vector and the ``scores`` vector, by calling the ``criterion`` you created.

In [12]:
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

y = [2]
scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])

# Create a one-hot encoded vector of the label y
one_hot_label = F.one_hot(torch.tensor(y), num_classes=4)

# Create the cross entropy loss function
criterion = CrossEntropyLoss()

# Calculate the cross entropy loss
loss = criterion(scores.double(), one_hot_label.double())
print(loss)

tensor(8.0619, dtype=torch.float64)
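
The value 8.0619 can be reproduced by hand: cross-entropy for a single sample is the negative log of the softmax probability assigned to the true class. The sketch below also passes the label as a class index instead of a one-hot vector, which ``CrossEntropyLoss`` accepts as well; the names `manual_loss` and `builtin_loss` are illustrative.

```python
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])
y = torch.tensor([2])  # class index rather than a one-hot vector

# Manual: negative log-probability of the true class
log_probs = F.log_softmax(scores, dim=-1)
manual_loss = -log_probs[0, y[0]]

# Built-in loss with a class-index target
builtin_loss = CrossEntropyLoss()(scores, y)

print(manual_loss.item(), builtin_loss.item())  # both approximately 8.0619
```

The loss is large because the model gives its lowest score (-2.0) to the true class.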


## 2.7 Accessing the model parameters

### Description

A PyTorch model created with ``nn.Sequential()`` is a module that contains the different layers of your network. Recall that each layer's parameters can be accessed by indexing the model directly. In this exercise, you will practice accessing the parameters of different linear layers of a neural network.

### Instructions

* Access the ``weight`` parameter of the first linear layer.
* Access the ``bias`` parameter of the second linear layer.

In [16]:
model = nn.Sequential(nn.Linear(16, 8),
                      nn.Linear(8, 2)
                     )

# Access the weight of the first linear layer
weight_0 = model[0].weight
print("Weight of the first layer:", weight_0)

# Access the bias of the second linear layer
bias_1 = model[1].bias
print("Bias of the second layer:", bias_1)

Weight of the first layer: Parameter containing:
tensor([[ 0.1599,  0.1900,  0.1030, -0.1427, -0.1252, -0.2235,  0.0390,  0.1285,
          0.0842, -0.0871,  0.0149,  0.1775,  0.2247,  0.0153, -0.2219,  0.0035],
        [ 0.2420, -0.1418, -0.1945, -0.1317, -0.0870,  0.1383,  0.2400,  0.1870,
          0.2184,  0.1664,  0.0349,  0.1237, -0.2163,  0.1853,  0.1931, -0.1957],
        [ 0.1005, -0.2228,  0.2209, -0.0236,  0.0581,  0.0340, -0.0561,  0.1102,
         -0.1637, -0.0369,  0.1283,  0.1468,  0.0893,  0.2418,  0.1864, -0.2202],
        [-0.0409, -0.0627, -0.1636, -0.1712,  0.2033, -0.0048, -0.2290,  0.0282,
          0.1778,  0.2003, -0.1220, -0.0683, -0.0909, -0.1600,  0.0121,  0.1201],
        [ 0.0054, -0.1840, -0.0604,  0.2043, -0.1347, -0.1650,  0.0729, -0.1691,
         -0.0646,  0.1013,  0.2091, -0.0743,  0.0278,  0.1342,  0.2025,  0.0538],
        [ 0.1109, -0.1858, -0.1443,  0.2490, -0.2001,  0.1719,  0.1588,  0.0625,
         -0.1041,  0.2239,  0.0731,  0.1850, -0.1722, -
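
To inspect every parameter without printing full tensors, one option is to iterate over ``named_parameters()`` and look only at the shapes. This is a sketch assuming the same two-layer model; `shapes` is an illustrative name.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.Linear(8, 2))

# Names are built from the layer index and the parameter name, e.g. "0.weight"
shapes = {name: tuple(param.shape) for name, param in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```

Note that a linear layer stores its weight with shape (out_features, in_features), so the first layer's weight is 8 x 16.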

## 2.8 Updating the weights manually

### Description

Now that you know how to access weights and biases, you will manually perform the job of the PyTorch optimizer. While PyTorch automates this, practicing it manually helps you build intuition for how models learn and adjust. This understanding will be valuable when debugging or fine-tuning neural networks.

A neural network of three layers has been created and stored as the ``model`` variable. This network has been used for a forward pass and the loss and its derivatives have been calculated. A default learning rate, ``lr``, has been chosen to scale the gradients when performing the update.

### Notes

![Gradients](../images/grads.png)

### Instructions

* Create the gradient variables by accessing the local gradients of each weight tensor.
* Update the weights using the gradients scaled by the learning rate.

In [None]:
lr = 0.001

model = nn.Sequential(
  nn.Linear(16,8),
  nn.Linear(8,4),
  nn.Linear(4,2)
)

input_tensor = torch.randn(1, 16)  # Sample input; without a forward and backward pass, .grad stays None
output_tensor = torch.tensor([1])  # Sample target class index used to compute the loss
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

# Training step
model.train()
for epoch in range(5):  # train for 5 epochs for example
    optimizer.zero_grad()             # Reset gradients
    output = model(input_tensor)      # Forward pass
    loss = criterion(output, output_tensor)  # Compute loss
    loss.backward()                   # Backward pass
    optimizer.step()                  # Update weights
    
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")
    print("Output probabilities:", nn.functional.softmax(output, dim=1).detach().numpy())

Epoch 1, Loss: 0.47962456941604614
Output probabilities: [[0.3809843  0.61901575]]
Epoch 2, Loss: 0.4686005413532257
Output probabilities: [[0.37412247 0.62587756]]
Epoch 3, Loss: 0.4576696753501892
Output probabilities: [[0.36724356 0.6327565 ]]
Epoch 4, Loss: 0.44683101773262024
Output probabilities: [[0.36034802 0.639652  ]]
Epoch 5, Loss: 0.4360750615596771
Output probabilities: [[0.3534308 0.6465692]]


In [32]:
weight0 = model[0].weight
weight1 = model[1].weight
weight2 = model[2].weight

# Access the gradients of the weight of each linear layer
grads0 = weight0.grad
grads1 = weight1.grad
grads2 = weight2.grad

# Compute the updated weights from the learning rate and the gradients
# (this creates new tensors; the parameters stored in the model are not modified)
weight0 = weight0 - lr * grads0
weight1 = weight1 - lr * grads1
weight2 = weight2 - lr * grads2
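
If the goal is to change the parameters held by the model itself, one common pattern is an in-place update inside ``torch.no_grad()``. The sketch below assumes a freshly built model, a random input, and an arbitrary seed and target, and it updates the biases as well as the weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
lr = 0.001

model = nn.Sequential(nn.Linear(16, 8), nn.Linear(8, 4), nn.Linear(4, 2))
input_tensor = torch.randn(1, 16)
target = torch.tensor([1])

# Forward and backward pass to populate the .grad attributes
loss = nn.CrossEntropyLoss()(model(input_tensor), target)
loss.backward()

old_weight = model[0].weight.detach().clone()
grad = model[0].weight.grad.clone()

# In-place gradient descent step; no_grad keeps the update out of the autograd graph
with torch.no_grad():
    for layer in model:
        layer.weight -= lr * layer.weight.grad
        layer.bias -= lr * layer.bias.grad

updated = torch.allclose(model[0].weight, old_weight - lr * grad)
print("Model weights updated in place:", updated)
```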

## 2.9 Using the PyTorch optimizer

### Description

Earlier, you manually updated the weights of a network, gaining insight into how training works behind the scenes. However, this method isn’t scalable for deep networks with many layers.

Thankfully, PyTorch provides the SGD optimizer, which automates this process efficiently in just a few lines of code. Now, you’ll complete the training loop by updating the weights using a PyTorch optimizer.

A neural network has been created and provided as the ``model`` variable. This model was used to run a forward pass and create the tensor of predictions ``pred``. The one-hot encoded tensor is named ``target`` and the cross entropy loss function is stored as ``criterion``.

``torch.optim`` as ``optim``, and ``torch.nn`` as ``nn`` have already been loaded for you.

### Instructions

* Use ``optim`` to create an ``SGD`` optimizer with a learning rate of your choice (must be less than one) for the model provided.
* Update the model's parameters using the optimizer.

In [36]:
pred = model(input_tensor)
pred

tensor([[-0.5398,  0.0948]], grad_fn=<AddmmBackward0>)

In [42]:
target = torch.tensor([[1.0, 0.0]])
target

tensor([[1., 0.]])

In [43]:
# Create the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)

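# Clear gradients left over from earlier backward passes, then compute new ones
optimizer.zero_grad()
loss = criterion(pred, target)
loss.backward()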

# Update the model's parameters using the optimizer
optimizer.step()
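
To connect this with the manual update, the sketch below runs one complete SGD step on a freshly built model and checks that the result equals `weight - lr * grad`. It assumes plain SGD (no momentum or weight decay), an arbitrary seed, and a PyTorch version in which ``CrossEntropyLoss`` accepts class-probability targets such as the one-hot ``target`` used above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
lr = 0.001

model = nn.Sequential(nn.Linear(16, 8), nn.Linear(8, 4), nn.Linear(4, 2))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=lr)

input_tensor = torch.randn(1, 16)
target = torch.tensor([[1.0, 0.0]])  # one-hot (class-probability) target

optimizer.zero_grad()            # reset gradients
pred = model(input_tensor)       # forward pass
loss = criterion(pred, target)   # compute loss
loss.backward()                  # backward pass

# What the manual rule predicts for the first layer's weight
weight_before = model[0].weight.detach().clone()
expected = weight_before - lr * model[0].weight.grad

optimizer.step()                 # update all parameters

matches_manual = torch.allclose(model[0].weight, expected)
print("SGD step matches manual update:", matches_manual)
```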