

#### **2.1. Introduction to Neural Networks**
- Neural networks consist of **layers** of connected neurons. Each neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function.
- A simple neural network includes:
  - **Input layer**: Takes in the input data.
  - **Hidden layer(s)**: Transforms the input using weights, biases, and activation functions.
  - **Output layer**: Produces the final prediction or classification.
- Each neuron has a set of **weights** and a **bias**. These are learned during training: a **loss function** measures the prediction error, and an **optimizer** (e.g., gradient descent) adjusts the parameters to reduce it.
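
To make the "weighted sum plus activation" idea concrete, here is a minimal plain-Python sketch of a single neuron. The function name `neuron_forward`, the example weights, and the choice of ReLU as the activation are illustrative assumptions, not something PyTorch defines.

```python
def relu(z):
    """ReLU activation: negative values become 0, positive values pass through."""
    return max(0.0, z)


def neuron_forward(inputs, weights, bias):
    """Compute one neuron's output: activation(sum(w_i * x_i) + b)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(weighted_sum)


# Hypothetical values chosen only for illustration.
inputs = [0.5, -1.2, 3.0]
weights = [0.4, 0.1, -0.2]
bias = 0.7

activation = neuron_forward(inputs, weights, bias)
print(activation)
```

A layer is many such neurons sharing the same inputs, each with its own weights and bias.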


#### Question 1: **Do neurons exist in the input and output layers?**

##### Answer:
Yes, neurons exist in both the **input** and **output layers**, but they serve different purposes:

- **Input Layer**:
  - The neurons in the input layer represent the features of the input data.
  - These neurons do not perform any computation: they have no weights, biases, or activation functions.
  - They simply pass the input data to the next layer (typically a hidden layer).
  
  For example, if you have 10 features in your input data, you have 10 neurons in the input layer.

- **Output Layer**:
  - The output layer consists of neurons that generate the final prediction or classification.
  - The number of neurons depends on the type of task:
    - **Regression**: One neuron representing the predicted value.
    - **Binary classification**: One neuron representing the probability of the positive class.
    - **Multi-class classification**: One neuron per class, representing the probability of each class.

In summary:
- The **input layer** holds raw input features.
- The **output layer** produces final predictions or classifications.



#### Question 2: **Do weights, biases, and activation functions exist for the input and output layers?**

##### Answer:
- **Input Layer**:
  - **Weights and biases**: The input layer does not have weights or biases. It simply passes the input data to the next layer.
  - **Activation function**: No activation function is applied in the input layer.
  
- **Output Layer**:
  - **Weights and biases**: The output layer does have weights and biases, which are learned during training.
  - **Activation function**: The activation function used in the output layer depends on the task:
    - **Regression**: Typically no activation function (equivalently, a linear or identity activation).
    - **Binary classification**: A **sigmoid** activation function is often used.
    - **Multi-class classification**: A **softmax** activation function is commonly used to produce class probabilities.

In summary:
- **Input layer**: No weights, biases, or activation functions.
- **Output layer**: Has weights, biases, and an activation function suited to the task.
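
The output activations listed above can be written out in a few lines. This is a plain-Python sketch of the math only; in PyTorch you would normally use `torch.sigmoid` and `torch.softmax` (or `nn.Sigmoid` and `nn.Softmax`). The helper names and the example scores are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Squash a raw score into (0, 1); used for binary classification."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1; used for multi-class outputs."""
    shifted = [s - max(scores) for s in scores]  # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


binary_prob = sigmoid(0.0)              # a raw score of 0 maps to probability 0.5
class_probs = softmax([2.0, 1.0, 0.1])  # hypothetical raw scores for three classes

print(binary_prob)
print(class_probs, sum(class_probs))
```

The largest raw score always receives the largest probability, and the softmax outputs sum to 1.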



---


#### **2.2. PyTorch `torch.nn.Module`: Building Blocks for Neural Networks**
- `torch.nn.Module` is the base class for all neural network modules in PyTorch.
- Neural networks in PyTorch are built by subclassing `torch.nn.Module` and defining the network architecture in the `__init__` and `forward()` methods.

---

**2.2.1. Building a Simple Neural Network from Scratch**

  **Demonstration: Basic Neural Network Implementation**

In [5]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple neural network class with one hidden layer.
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()  # Call the base class constructor

        # Step 1: Define the layers.
        # Hidden layer: Input features (3), Output features (5).
        self.hidden = nn.Linear(3, 5)  # Fully connected layer: input size=3, output size=5

        # Output layer: Input features (5 from the hidden layer), Output features (1).
        self.output = nn.Linear(5, 1)  # Fully connected layer: input size=5, output size=1

    # Define the forward pass, which describes how the data flows through the network.
    def forward(self, x):
        # Apply the hidden layer, followed by ReLU activation
        x = self.hidden(x)
        print(f"After hidden layer (raw output): {x}")  # Print the raw output from the hidden layer

        x = torch.relu(x)  # Apply ReLU non-linearity
        print(f"After ReLU activation: {x}")  # Print the output after ReLU activation

        # Pass to the output layer (no activation for regression)
        x = self.output(x)
        print(f"Final output (before returning): {x}")  # Print the final output before returning
        return x  # Return the final output value

# Step 4: Initialize the model by creating an instance of the SimpleNN class.
model = SimpleNN()
print("Model initialized. Here's the model structure:")
print(model)  # Print the model architecture for better understanding

# Step 5: Define a sample input tensor (2 samples, 3 features each).
input_data = torch.tensor([[0.5, -1.2, 3.0], [0.8, 0.3, -0.7]])
print(f"Input data:\n{input_data}")  # Print the input data for clarity

# Step 6: Perform a forward pass through the model and store the result in 'output'.
output = model(input_data)

# Step 7: Print the output, which contains the model's predictions for the input batch.
print(f"Predicted output:\n{output}")  # Output: predicted values for the input batch


Model initialized. Here's the model structure:
SimpleNN(
  (hidden): Linear(in_features=3, out_features=5, bias=True)
  (output): Linear(in_features=5, out_features=1, bias=True)
)
Input data:
tensor([[ 0.5000, -1.2000,  3.0000],
        [ 0.8000,  0.3000, -0.7000]])
After hidden layer (raw output): tensor([[ 0.7752, -1.6480, -0.8435, -0.1251, -0.3348],
        [-0.1911,  1.1290,  0.0403, -0.6626,  0.1418]],
       grad_fn=<AddmmBackward0>)
After ReLU activation: tensor([[0.7752, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 1.1290, 0.0403, 0.0000, 0.1418]], grad_fn=<ReluBackward0>)
Final output (before returning): tensor([[-0.0005],
        [-0.2281]], grad_fn=<AddmmBackward0>)
Predicted output:
tensor([[-0.0005],
        [-0.2281]], grad_fn=<AddmmBackward0>)



  **Explanation:**
  - **Model Definition**: The neural network is defined as a class inheriting from `nn.Module`.
  - **Layers**:
    - `nn.Linear(3, 5)` creates a fully connected layer with 3 input features and 5 output features (hidden layer).
    - `nn.Linear(5, 1)` creates the output layer with 1 output feature.
  - **Activation Function**: ReLU (Rectified Linear Unit) is used for non-linearity after the hidden layer.
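
A quick way to check the layer sizes is to count learnable parameters: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_out * n_in` weights plus `n_out` biases. The helper `linear_param_count` below is a hypothetical name used for this sketch; in PyTorch the same total can be obtained with `sum(p.numel() for p in model.parameters())`.

```python
def linear_param_count(n_in, n_out):
    """Number of learnable values in a fully connected layer: weights + biases."""
    return n_in * n_out + n_out


hidden_params = linear_param_count(3, 5)  # 15 weights + 5 biases
output_params = linear_param_count(5, 1)  # 5 weights + 1 bias
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```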

---



#### **2.3. Forward and Backward Propagation**
- **Forward Propagation**: The process of passing the input data through the network layers to obtain the output prediction.
- **Backward Propagation**: After calculating the loss, backpropagation computes the gradients of the loss with respect to the weights and biases, which the optimizer then uses to update them.

**2.3.1. Performing Forward and Backward Propagation**
- PyTorch automatically tracks all operations on tensors with `requires_grad=True`, and gradients are computed with `.backward()`.

  **Demonstration: Forward and Backward Propagation**

In [7]:

# Step 2: Initialize the network, loss function, and optimizer
model = SimpleNN()
criterion = nn.MSELoss()  # Mean Squared Error Loss for regression
optimizer = optim.SGD(model.parameters(), lr=0.01)  # Stochastic Gradient Descent

# Step 3: Define input data and target (for supervised learning)
input_data = torch.tensor([[0.5, -1.2, 3.0], [0.8, 0.3, -0.7]])  # Input features (no requires_grad needed: only the model parameters are trained)
target = torch.tensor([[1.0], [2.0]])  # Actual target values

# Step 4: Perform Forward Propagation
output = model(input_data)
print("Predicted output before training:")
print(output)  # Predicted output before training (random weights, so the output won't match the target)

# Step 5: Calculate the loss (difference between predicted and target)
loss = criterion(output, target)
print("\nLoss before backpropagation:")
print(loss)

# Step 6: Perform Backward Propagation
optimizer.zero_grad()  # Zero out previous gradients (important step)
loss.backward()  # Compute gradients using backpropagation

# Step 7: Print gradients before updating
print("\nGradients before optimization step:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"{name}.grad: {param.grad}")

# Step 8: Perform optimization step to update model parameters
optimizer.step()  # Update model weights using the computed gradients

# Step 9: Perform another forward pass after the parameters have been updated
output_after = model(input_data)
print("\nPredicted output after one optimization step:")
print(output_after)

# Step 10: Recalculate the loss after the update
loss_after = criterion(output_after, target)
print("\nLoss after one optimization step:")
print(loss_after)

After hidden layer (raw output): tensor([[ 0.2619, -0.4934, -0.6943, -1.1839, -0.5413],
        [-0.6369,  0.6041, -0.6934,  0.2186,  0.5022]],
       grad_fn=<AddmmBackward0>)
After ReLU activation: tensor([[0.2619, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.6041, 0.0000, 0.2186, 0.5022]], grad_fn=<ReluBackward0>)
Final output (before returning): tensor([[-0.3560],
        [-0.5148]], grad_fn=<AddmmBackward0>)
Predicted output before training:
tensor([[-0.3560],
        [-0.5148]], grad_fn=<AddmmBackward0>)

Loss before backpropagation:
tensor(4.0815, grad_fn=<MseLossBackward0>)

Gradients before optimization step:
hidden.weight.grad: tensor([[ 0.1218, -0.2923,  0.7307],
        [ 0.7740,  0.2903, -0.6773],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0548,  0.0206, -0.0480],
        [-0.1306, -0.0490,  0.1142]])
hidden.bias.grad: tensor([ 0.2436,  0.9676,  0.0000,  0.0685, -0.1632])
output.weight.grad: tensor([[-0.3551, -1.5191,  0.0000, -0.5498, -1.2630]])
output.bias.g


  **Explanation:**
  - `loss = criterion(output, target)`: The loss function compares the predicted output with the true target values.
  - `loss.backward()`: Backward propagation computes the gradients of the loss with respect to the model parameters.
  - `optimizer.step()`: The optimizer updates the model's weights based on the computed gradients.
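
The printed numbers above can be reproduced by hand, which helps connect `criterion` and `loss.backward()` to ordinary arithmetic. The sketch below is plain Python, not PyTorch's implementation; it copies the rounded values from the output above, so results agree only to about three decimal places. The variable names are illustrative.

```python
# Values copied (rounded) from the printed output above.
predictions = [-0.3560, -0.5148]
targets = [1.0, 2.0]
hidden_activations = [
    [0.2619, 0.0, 0.0, 0.0, 0.0],        # sample 1, after ReLU
    [0.0, 0.6041, 0.0, 0.2186, 0.5022],  # sample 2, after ReLU
]

n = len(predictions)
errors = [p - t for p, t in zip(predictions, targets)]

# Mean squared error: average of the squared errors.
mse = sum(e ** 2 for e in errors) / n

# Gradient of the MSE w.r.t. each output-layer weight:
# dL/dw_j = (2 / n) * sum_i error_i * hidden_activation_ij
output_weight_grad = [
    (2.0 / n) * sum(errors[i] * hidden_activations[i][j] for i in range(n))
    for j in range(5)
]

print(round(mse, 4))
print([round(g, 4) for g in output_weight_grad])
```

Up to rounding, the results match the `tensor(4.0815, ...)` loss and the `output.weight.grad` row printed by the notebook.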

---



#### **2.4. Loss Functions and Optimizers**
- **Loss Function**: Measures how well the model's predictions match the true target values.
  - In regression, **Mean Squared Error** (`MSELoss`) is commonly used.
  - In classification tasks, **Cross-Entropy Loss** (`nn.CrossEntropyLoss()`) is often used.
- **Optimizer**: Updates the model’s parameters to minimize the loss using optimization algorithms like Stochastic Gradient Descent (SGD) or Adam.

The lists below summarize common **loss functions** and **optimizers** in PyTorch:


- **Common Loss Functions in PyTorch**:
  - `nn.MSELoss()`: For regression tasks, computes the mean squared error between predicted and target values.
  - `nn.CrossEntropyLoss()`: For multi-class classification tasks. Expects raw logits and combines `nn.LogSoftmax()` and `nn.NLLLoss()` in one step.
  - `nn.BCELoss()`: For binary classification tasks, computes the binary cross-entropy loss between predicted probabilities and binary labels.
  - `nn.BCEWithLogitsLoss()`: Combines a sigmoid layer with binary cross-entropy loss in one step, which is more numerically stable than applying `nn.Sigmoid()` followed by `nn.BCELoss()`.
  - `nn.L1Loss()`: Computes the mean absolute error between predicted and target values. Useful when you want the loss to be less sensitive to outliers.
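  - `nn.HingeEmbeddingLoss()`: Takes an input (typically a distance between two samples) and labels of 1 or -1, and measures whether the pair is similar or dissimilar. Typically used for learning embeddings or for semi-supervised learning.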
  - `nn.MarginRankingLoss()`: Used for ranking tasks, especially for learning to rank problems.
  - `nn.KLDivLoss()`: Kullback-Leibler divergence loss, often used for measuring the difference between two probability distributions.
  - `nn.SmoothL1Loss()`: Quadratic for small errors and linear for large ones, so it is more robust to outliers than MSE in regression tasks. It is closely related to Huber loss (`nn.HuberLoss()`) and matches it at the default `beta=1.0`.
  - `nn.NLLLoss()`: Negative Log-Likelihood Loss. Expects log-probabilities as input, so it is usually paired with `nn.LogSoftmax()` for classification tasks.

- **Common Optimizers in PyTorch**:
  - `optim.SGD()`: Stochastic Gradient Descent, a simple and widely used optimizer.
  - `optim.Adam()`: Adam optimizer, combines momentum and adaptive learning rates, widely used for many tasks due to its fast convergence.
  - `optim.AdamW()`: A variant of Adam that decouples weight decay from the gradient update, which often improves generalization in deep learning tasks.
  - `optim.RMSprop()`: An adaptive learning rate method that adjusts the learning rate based on recent gradients, often used in recurrent neural networks (RNNs).
  - `optim.Adagrad()`: Adaptive Gradient Algorithm, adjusts learning rates dynamically for each parameter based on the sum of squared gradients.
  - `optim.Adadelta()`: An extension of Adagrad that seeks to reduce its aggressive, monotonically decreasing learning rates over time.
  - `optim.ASGD()`: Averaged Stochastic Gradient Descent, averages the weights over time to achieve better generalization.
  - `optim.LBFGS()`: Limited-memory Broyden–Fletcher–Goldfarb–Shanno, a quasi-Newton optimization method, typically used for smaller problems where full-batch updates are feasible. Its `step()` method requires a closure that re-evaluates the loss.
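
To show what "momentum plus adaptive learning rates" means for Adam, here is a sketch of the Adam update for a single scalar parameter. It follows the commonly published update rule with the usual default hyperparameters (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`); it illustrates the idea and is not a copy of `optim.Adam`'s source. The function name `adam_step` is hypothetical.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (new_w, new_m, new_v)."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Two hypothetical parameters with very different gradient magnitudes.
w_small, _, _ = adam_step(w=1.0, grad=0.01, m=0.0, v=0.0, t=1, lr=0.1)
w_large, _, _ = adam_step(w=1.0, grad=100.0, m=0.0, v=0.0, t=1, lr=0.1)

print(w_small, w_large)
```

On the first step both parameters move by roughly `lr`, regardless of gradient size. That per-parameter scaling is what distinguishes Adam from plain SGD, where the step is proportional to the gradient.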

---



#### **2.5. Training a Neural Network**
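- Training a neural network repeats forward and backward propagation many times. One full pass over the training data is called an **epoch**.
- At each training step (in the example below, one step per epoch, because the whole dataset is a single batch):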
  - The model makes predictions (forward pass).
  - The loss is calculated based on the predictions and actual targets.
  - Gradients are computed using backpropagation.
  - The optimizer updates the weights based on the gradients.
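
The four steps above map directly onto a loop. The sketch below fits a linear model `y = w * x + b` with hand-written gradients in plain Python, so each step is visible without any library. The data, learning rate, and epoch count are assumptions chosen for illustration; in PyTorch, steps 3 and 4 are handled by `loss.backward()` and `optimizer.step()`.

```python
# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0  # parameters to learn
lr = 0.05        # learning rate
losses = []

for epoch in range(500):
    # 1. Forward pass: predictions from the current parameters.
    preds = [w * x + b for x in xs]

    # 2. Loss: mean squared error against the targets.
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e ** 2 for e in errors) / len(xs)
    losses.append(loss)

    # 3. Backward pass: gradients of the loss w.r.t. w and b.
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(2 * e for e in errors) / len(xs)

    # 4. Update: move each parameter against its gradient.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w={w:.3f}, b={b:.3f}, first loss={losses[0]:.3f}, last loss={losses[-1]:.6f}")
```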

**2.5.1. Full Training Loop**

  **Demonstration: Training a Simple Neural Network**

In [12]:
import torch
import torch.nn as nn
import torch.optim as optim

# This cell reuses `model`, `criterion`, and `optimizer` from Section 2.3.1.
# `nn.MSELoss()` computes the mean squared error for regression tasks.
# `optim.SGD()` updates the model weights using stochastic gradient descent.
# Uncomment the lines below to start from freshly initialized weights instead.

# model = SimpleNN()  # Already defined model
# criterion = nn.MSELoss()  # Mean Squared Error Loss for regression
# optimizer = optim.SGD(model.parameters(), lr=0.01)  # SGD optimizer with learning rate of 0.01

# Training data: a batch of inputs (2 samples, each with 3 features).
# `requires_grad=True` is not needed here: training only requires gradients for the
# model parameters, which `nn.Module` enables by default.
input_data = torch.tensor([[0.5, -1.2, 3.0], [0.8, 0.3, -0.7]])

# Corresponding target values for regression.
# These are the expected (target) output values that the model should predict during training.
target = torch.tensor([[1.0], [0.5]])

# Set the number of epochs (iterations).
# This specifies how many times the training loop will run, updating the model weights after each iteration.
epochs = 10

# Training loop: This loop will train the model for a specified number of epochs (iterations).
for epoch in range(epochs):
    print(f"Epoch {epoch + 1}/{epochs}")  # Print the current epoch number for tracking progress

    # Forward pass: Make predictions based on the current state of the model.
    # Here, the input_data is passed through the model to compute the predictions.
    prediction = model(input_data)
    print(f"Predicted output:\n{prediction}")  # Display the predictions for this epoch

    # Calculate the loss: This measures how far off the model's predictions are from the target values.
    # The loss function compares the predicted values with the actual target values.
    loss = criterion(prediction, target)
    print(f"Loss: {loss.item():.4f}")  # Print the current loss value (error)

    # Zero the gradients: Before the backward pass, we need to clear the gradients from the previous iteration.
    # Gradients accumulate by default, so we must zero them out before the next iteration.
    optimizer.zero_grad()

    # Backward pass: Calculate the gradients of the loss with respect to the model's parameters.
    # This step computes how each parameter (weights and biases) should change to minimize the loss.
    loss.backward()

    # Print gradients after the backward pass.
    # We loop through each parameter and print its gradient values. These gradients show the direction
    # and magnitude of change needed for each weight to reduce the loss.
    print("Gradients after backward pass:")
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f"{name}.grad:\n{param.grad}")  # Print gradients of weights and biases

    # Update model weights: The optimizer updates the model's weights using the computed gradients.
    # This step modifies the parameters based on the gradients and the learning rate (0.01 in this case).
    optimizer.step()

    # Print updated model parameters (weights and biases).
    # After each optimization step, we print the new parameter values to see how they change after every epoch.
    print("Updated model parameters:")
    for name, param in model.named_parameters():
        print(f"{name}:\n{param}")

    # Print a separator line for better readability between epochs.
    print("-" * 50)


Epoch 1/10
After hidden layer (raw output): tensor([[-0.0201, -0.6019, -0.6943, -1.1961, -0.5384],
        [-0.6113,  0.8312, -0.6934,  0.2441,  0.4963]],
       grad_fn=<AddmmBackward0>)
After ReLU activation: tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.8312, 0.0000, 0.2441, 0.4963]], grad_fn=<ReluBackward0>)
Final output (before returning): tensor([[0.9956],
        [0.5021]], grad_fn=<AddmmBackward0>)
Predicted output:
tensor([[0.9956],
        [0.5021]], grad_fn=<AddmmBackward0>)
Loss: 0.0000
Gradients after backward pass:
hidden.weight.grad:
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [-8.9988e-04, -3.3746e-04,  7.8740e-04],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [-1.3014e-04, -4.8803e-05,  1.1387e-04],
        [-7.6445e-05, -2.8667e-05,  6.6890e-05]])
hidden.bias.grad:
tensor([ 0.0000e+00, -1.1249e-03,  0.0000e+00, -1.6268e-04, -9.5557e-05])
output.weight.grad:
tensor([[0.0000, 0.0017, 0.0000, 0.0005, 0.0010]])
output.bias.g


  **Explanation:**
  - The model is trained for 10 epochs.
  - At each epoch, a forward pass is performed to predict the output based on the current weights.
  - The loss is calculated, gradients are computed via backpropagation, and weights are updated using the optimizer.
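  - The loss, gradients, and updated parameters are printed at every epoch to monitor the training process.
  - In the output shown here, the loss is already close to zero at epoch 1, which suggests the cell was run on a model that had already been fitted to these targets (the model is reused rather than re-created).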

---



#### **2.6. Observations on State-of-the-Art Research**:
  - **Gradient-based Optimization**: Current research is focused on improving optimization techniques to deal with issues like vanishing/exploding gradients and slow convergence.
    - **AdamW Optimizer**: A variant of the Adam optimizer, `AdamW`, decouples weight decay from the gradient update and has become a standard choice for many models.
    - **Gradient Clipping**: Clipping gradients to prevent them from becoming too large has become common practice in deep learning to avoid instability.
  
  - **Activation Functions**:
    - While ReLU remains widely used due to its simplicity, newer activation functions like **Leaky ReLU**, **PReLU**, and **Swish** have been shown to provide better performance in some architectures by mitigating issues like the "dying ReLU" problem.

  - **Batch Normalization**:
    - Batch normalization is widely used when training deep networks. It stabilizes training by normalizing the inputs to each layer over the mini-batch, which often allows higher learning rates and faster convergence.
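
Gradient clipping by norm is simple enough to sketch directly. The function below rescales a list of gradients so that their combined L2 norm does not exceed `max_norm`, which is the idea behind PyTorch's `torch.nn.utils.clip_grad_norm_`. It is a plain-Python illustration of that idea, not the library's implementation, and `clip_by_global_norm` is a hypothetical name.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients down together if their L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g ** 2 for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm  # already small enough: leave unchanged
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# Hypothetical gradients with norm 5.0, clipped to norm 1.0.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g ** 2 for g in clipped))

print(norm_before, clipped, norm_after)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length is limited.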

---



#### **2.7. Adding Continuity to the Next Section**
- In the next section, we will explore how to efficiently manage datasets and dataloaders using PyTorch's `torch.utils.data` module.
- We will load and preprocess datasets, including commonly used datasets like MNIST and CIFAR-10, and handle custom datasets.
  
This section introduced you to building simple neural networks, defining layers, performing forward and backward propagation, and training the model with different loss functions and optimizers. You can now move on to handling datasets for larger and more complex neural network training tasks.

### Observations


#### Question 3: **How are layers connected in a PyTorch neural network?**

##### Answer:
In PyTorch, layers are typically connected in the `forward()` method of the neural network class, which defines how data flows through the network:

- Each layer is created in the `__init__()` method and saved as part of the class. These layers are connected by passing the output of one layer as input to the next.
- The final layer output gives the prediction of the network.

##### Example:


In [13]:
import torch
import torch.nn as nn

# Define a simple neural network class by inheriting from nn.Module
class SimpleNeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        # Call the constructor of nn.Module to initialize the base class
        super(SimpleNeuralNet, self).__init__()

        # Define the hidden layer
        # nn.Linear is a fully connected layer that applies a linear transformation
        # This layer takes 'input_size' features and maps them to 'hidden_size' outputs (neurons)
        self.hidden = nn.Linear(input_size, hidden_size)

        # Define the output layer
        # This layer takes 'hidden_size' inputs from the hidden layer and maps them to 'output_size' outputs
        # Typically, 'output_size' is the number of predictions you want to make (e.g., for regression or classification)
        self.output = nn.Linear(hidden_size, output_size)

    # Define the forward pass, which describes how data moves through the network
    def forward(self, x):
        # Pass the input 'x' through the hidden layer
        # This step applies a linear transformation: y = xW^T + b, where W is the weight matrix and b is the bias
        x = self.hidden(x)

        # Apply the ReLU activation function to introduce non-linearity
        # ReLU sets any negative values to zero, helping the model learn more complex patterns
        x = torch.relu(x)

        # Pass the result through the output layer to produce the final output
        # No activation function is applied here, as this is a simple regression model or an unactivated output
        x = self.output(x)

        # Return the final output of the network
        return x



#### Question 4: **What is the role of activation functions in PyTorch?**

##### Answer:
Activation functions introduce non-linearity into the neural network, allowing it to learn complex patterns. Without activation functions, any stack of linear layers would collapse into a single linear transformation of the inputs.

Common activation functions:
- **ReLU (Rectified Linear Unit)**: Used in hidden layers, it outputs 0 for negative inputs and the input itself for positive inputs.
- **Sigmoid**: Used in binary classification to output probabilities between 0 and 1.
- **Softmax**: Used in multi-class classification to output probabilities for each class.

##### Example:


In [14]:
import torch
import torch.nn as nn

# Define a simple neural network class that inherits from nn.Module
class SimpleNeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNeuralNet, self).__init__()

        # Step 1: Define the hidden layer
        # The hidden layer is a fully connected (dense) layer that takes 'input_size' features
        # and outputs 'hidden_size' neurons.
        self.hidden = nn.Linear(input_size, hidden_size)

        # Step 2: Define the output layer
        # The output layer is a fully connected layer that takes 'hidden_size' outputs from the hidden layer
        # and produces 'output_size' outputs. These could be the class probabilities (for classification) or
        # raw values (for regression).
        self.output = nn.Linear(hidden_size, output_size)

        # Step 3: Define the activation functions
        # ReLU (Rectified Linear Unit) is used for the hidden layer to introduce non-linearity.
        # It replaces negative values with zero and keeps positive values unchanged.
        self.relu = nn.ReLU()

        # Softmax is used for the output layer, typically in classification tasks.
        # Softmax converts the output scores into probabilities that sum to 1 across the given dimension.
        # 'dim=1' applies softmax along the second dimension (the classes), so each row (one sample) sums to 1.
        self.softmax = nn.Softmax(dim=1)

    # Define the forward pass, which describes how the input data flows through the layers
    def forward(self, x):
        # Step 4: Pass the input through the hidden layer
        x = self.hidden(x)  # Linear transformation: x @ W.T + b

        # Step 5: Apply the ReLU activation function to the hidden layer output
        x = self.relu(x)  # Apply ReLU non-linearity

        # Step 6: Pass the result through the output layer
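        x = self.output(x)  # Linear transformation with the output layer's own weights and bias

        # Step 7: Apply the Softmax activation function to the output layer's result
        # Softmax is often used in classification to output probabilities for each class.
        # Note: if you train with nn.CrossEntropyLoss, return the raw logits instead,
        # because that loss applies log-softmax internally.
        x = self.softmax(x)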

        # Step 8: Return the final output of the network
        return x



#### Question 5: **How are weights initialized in PyTorch layers?**

##### Answer:
In PyTorch, weights and biases for each layer are initialized automatically when you create a layer. However, you can also manually initialize weights using custom methods (e.g., `torch.nn.init` methods).

- **Default Initialization**: Each layer type has its own default scheme. For example, `nn.Linear` draws both weights and biases from a uniform distribution over `[-1/sqrt(in_features), 1/sqrt(in_features)]`.
  
- **Custom Initialization**: You can manually initialize weights using functions like `nn.init.xavier_uniform_()` or `nn.init.kaiming_uniform_()`.
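
The two custom schemes named above differ mainly in how they pick the sampling range. The sketch below computes the uniform bounds from the published formulas (Xavier/Glorot: `sqrt(6 / (fan_in + fan_out))`; Kaiming/He for ReLU: `sqrt(6 / fan_in)`) and draws weights with Python's `random` module. The helper names are hypothetical, and non-default gain factors are not covered.

```python
import math
import random


def xavier_uniform_bound(fan_in, fan_out):
    """Bound a such that weights ~ U(-a, a) under Xavier/Glorot initialization (gain 1)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def kaiming_uniform_bound(fan_in):
    """Bound a such that weights ~ U(-a, a) under Kaiming/He initialization for ReLU."""
    return math.sqrt(6.0 / fan_in)


def sample_weights(fan_in, fan_out, bound, seed=0):
    """Draw a fan_out x fan_in weight matrix uniformly from [-bound, bound]."""
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


# Layer sizes borrowed from the 3 -> 5 hidden layer used earlier in this section.
xavier_bound = xavier_uniform_bound(fan_in=3, fan_out=5)
kaiming_bound = kaiming_uniform_bound(fan_in=3)
weights = sample_weights(3, 5, xavier_bound)

print(round(xavier_bound, 4), round(kaiming_bound, 4))
```

Kaiming initialization uses a wider range for the same layer because it compensates for ReLU zeroing roughly half of the activations.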

##### Example:


In [15]:
import torch
import torch.nn as nn
import torch.nn.init as init  # Import initialization methods

# Define a simple neural network class
class SimpleNeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize the nn.Module parent class
        super(SimpleNeuralNet, self).__init__()

        # Define the hidden and output layers
        # Hidden layer: Maps 'input_size' features to 'hidden_size' neurons
        self.hidden = nn.Linear(input_size, hidden_size)

        # Output layer: Maps 'hidden_size' neurons to 'output_size' outputs
        self.output = nn.Linear(hidden_size, output_size)

        # Define ReLU activation function for the hidden layer
        self.relu = nn.ReLU()

        # Step 1: Custom weight initialization for the hidden layer
        # Xavier (Glorot) initialization keeps activation variance roughly constant across layers.
        # It was designed with tanh/sigmoid in mind; for ReLU layers, init.kaiming_uniform_ is a common alternative.
        init.xavier_uniform_(self.hidden.weight)

        # Step 2: Initialize the biases of the output layer to zero
        # This ensures that the initial bias term starts at zero
        init.zeros_(self.output.bias)

    # Define the forward pass through the network
    def forward(self, x):
        # Step 3: Pass input through the hidden layer
        x = self.hidden(x)

        # Step 4: Apply ReLU activation to the hidden layer output
        x = self.relu(x)

        # Step 5: Pass the result through the output layer
        x = self.output(x)

        # Step 6: Return the final output
        return x



#### Question 6: **What are loss functions, and how are they used in PyTorch?**

##### Answer:
Loss functions measure the difference between the predicted output and the true labels. During training, the network tries to minimize this loss using optimization algorithms.

Common loss functions:
- **Mean Squared Error (MSE)**: Used for regression tasks.
- **Cross-Entropy Loss**: Used for classification tasks (binary or multi-class).

##### Example:


In [17]:
import torch
import torch.nn as nn

# Define a simple neural network with one hidden layer
class SimpleNeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNeuralNet, self).__init__()

        # Define the hidden layer (fully connected layer)
        self.hidden = nn.Linear(input_size, hidden_size)

        # Define the output layer (fully connected layer)
        self.output = nn.Linear(hidden_size, output_size)

    # Define the forward pass through the network
    def forward(self, x):
        # Pass the input through the hidden layer
        print(f"Input to hidden layer:\n{x}")
        x = self.hidden(x)
        print(f"Output of hidden layer (before activation):\n{x}")

        # Apply the ReLU activation function to introduce non-linearity
        x = torch.relu(x)
        print(f"Output of hidden layer (after ReLU activation):\n{x}")

        # Pass through the output layer (no activation function at this stage because
        # we'll use CrossEntropyLoss, which includes Softmax internally)
        x = self.output(x)
        print(f"Output of final layer (logits, before softmax):\n{x}")

        # Return the final output (logits)
        return x

# Instantiate the model with:
# - 10 input features
# - 5 neurons in the hidden layer
# - 3 output neurons (corresponding to 3 classes for classification)
model = SimpleNeuralNet(input_size=10, hidden_size=5, output_size=3)
print("Initialized model:\n", model)

# Define the loss function (Cross-Entropy Loss) for classification
# CrossEntropyLoss combines LogSoftmax and NLLLoss, which is suitable for multi-class classification
criterion = nn.CrossEntropyLoss()

# Generate a random input tensor representing one sample with 10 features
input_data = torch.randn(1, 10)  # Shape: (1 sample, 10 features)
print(f"\nRandom input data (1 sample, 10 features):\n{input_data}")

# Forward pass: Get the model's output (logits) for this random input
output = model(input_data)

# Define the target as class 1 (the ground truth for the classification task)
# Note that target should contain class indices, not one-hot encoded vectors
target = torch.tensor([1])  # This means the correct class for the input is class 1
print(f"\nTarget class index:\n{target}")

# Calculate the loss between the model's prediction (logits) and the true target
loss = criterion(output, target)
print(f"\nLoss value (Cross-Entropy Loss):\n{loss.item()}")


Initialized model:
 SimpleNeuralNet(
  (hidden): Linear(in_features=10, out_features=5, bias=True)
  (output): Linear(in_features=5, out_features=3, bias=True)
)

Random input data (1 sample, 10 features):
tensor([[-0.2935,  0.4157, -1.0423,  0.3393, -1.1876, -0.3527, -1.6055, -0.4963,
          0.6685,  0.7962]])
Input to hidden layer:
tensor([[-0.2935,  0.4157, -1.0423,  0.3393, -1.1876, -0.3527, -1.6055, -0.4963,
          0.6685,  0.7962]])
Output of hidden layer (before activation):
tensor([[-0.6597,  0.1025, -0.1949, -0.9492, -0.4829]],
       grad_fn=<AddmmBackward0>)
Output of hidden layer (after ReLU activation):
tensor([[0.0000, 0.1025, 0.0000, 0.0000, 0.0000]], grad_fn=<ReluBackward0>)
Output of final layer (logits, before softmax):
tensor([[-0.3365, -0.0719, -0.3390]], grad_fn=<AddmmBackward0>)

Target class index:
tensor([1])

Loss value (Cross-Entropy Loss):
0.9294456839561462
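
The loss value above can be checked by hand, which shows what `nn.CrossEntropyLoss` does with logits and a class index: it applies log-softmax to the logits and takes the negative log-probability of the target class. The sketch below is plain Python using the rounded logits printed above, so it matches only to about four decimal places; the function name is illustrative.

```python
import math


def cross_entropy_from_logits(logits, target_index):
    """Negative log-softmax probability of the target class for one sample."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_prob_target = logits[target_index] - log_sum_exp
    return -log_prob_target


logits = [-0.3365, -0.0719, -0.3390]  # rounded logits from the output above
target_index = 1                       # the target class from the output above

loss_value = cross_entropy_from_logits(logits, target_index)
print(round(loss_value, 4))
```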


Steps after Calculating the Loss:

- Zero the Gradients:
  - Gradients accumulate by default in PyTorch, so you must clear the old gradients before computing new ones. This is done using `optimizer.zero_grad()`.

- Backpropagation:
  - Perform backward propagation to compute the gradients of the loss with respect to the model’s parameters (weights and biases). This is done using `loss.backward()`, which calculates the gradient for each parameter.

- Optimizer Step:
  - Once the gradients are computed, the optimizer updates the model parameters, adjusting the weights and biases of the network to reduce the loss. This is done using `optimizer.step()`.
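
To see why the gradients need clearing, here is a small sketch using a single hypothetical parameter `w` (not part of the model above). Calling `backward()` twice without a reset adds the second gradient to the first; resetting the gradient restores the expected value. `optimizer.zero_grad()` serves the same purpose for every parameter the optimizer manages.

```python
import torch

# One trainable parameter; loss = 3 * w, so d(loss)/dw is always 3
w = torch.tensor([1.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()        # gradient after one backward pass

(3 * w).sum().backward()      # no reset in between
accumulated = w.grad.clone()  # second gradient was added to the first

w.grad.zero_()                # manual reset of the stored gradient
(3 * w).sum().backward()
after_reset = w.grad.clone()  # back to a single gradient

print(first, accumulated, after_reset)
```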

In [18]:
# Step 1: Zero the gradients from the previous backward pass
optimizer.zero_grad()

# Step 2: Perform backpropagation to compute the gradients
loss.backward()

# Step 3: Update the model parameters using the optimizer
optimizer.step()

# Print the updated model parameters after optimization
print("Updated model parameters:")
for name, param in model.named_parameters():
    print(f"{name}:\n{param}")


Updated model parameters:
hidden.weight:
Parameter containing:
tensor([[ 0.2768,  0.0388, -0.2583,  0.2788,  0.1130,  0.1815,  0.0303,  0.1883,
         -0.2732, -0.1995],
        [ 0.0697,  0.1735, -0.1501,  0.0195, -0.2969, -0.1825,  0.1080,  0.2503,
         -0.2624,  0.1561],
        [ 0.2598, -0.0559,  0.1124, -0.2343, -0.1168,  0.2740, -0.2217, -0.1452,
         -0.0128, -0.1145],
        [ 0.3030,  0.2274,  0.1841, -0.2281,  0.0573,  0.1298,  0.1362,  0.2095,
         -0.0574, -0.2441],
        [-0.0162,  0.2711, -0.0591, -0.2672, -0.1973, -0.0737,  0.2943, -0.3159,
         -0.2851, -0.1880]], requires_grad=True)
hidden.bias:
Parameter containing:
tensor([-0.2767, -0.1804, -0.2691, -0.0163, -0.1756], requires_grad=True)
output.weight:
Parameter containing:
tensor([[-0.3330, -0.0419, -0.2449,  0.2733,  0.3738],
        [-0.3713,  0.2972, -0.0493,  0.3442,  0.4066],
        [ 0.3321,  0.3652, -0.2726, -0.3417, -0.1864]], requires_grad=True)
output.bias:
Parameter containing:
tens



Note: Repeat these steps (forward pass, loss calculation, backward pass, optimizer step) until the loss stops improving.


#### Question 7: **How are optimizers used in PyTorch?**

##### Answer:
Optimizers update the model's weights during training using the gradients of the loss function. **Stochastic Gradient Descent (SGD)** is the classic choice, and **Adam** is also popular because it adapts the step size for each parameter.
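
As a rough illustration of the difference, the sketch below applies one update step of `optim.SGD` and one of `optim.Adam` to two copies of the same one-element parameter. The parameter values, the toy loss `w ** 2`, and the learning rate of 0.1 are assumptions chosen to keep the arithmetic easy to follow.

```python
import torch
import torch.optim as optim

# Two copies of the same one-element parameter (illustrative values)
w_sgd = torch.tensor([1.0], requires_grad=True)
w_adam = torch.tensor([1.0], requires_grad=True)

sgd = optim.SGD([w_sgd], lr=0.1)
adam = optim.Adam([w_adam], lr=0.1)

for w, opt in ((w_sgd, sgd), (w_adam, adam)):
    opt.zero_grad()
    loss = (w ** 2).sum()  # gradient is 2 * w = 2.0 at w = 1.0
    loss.backward()
    opt.step()

# Plain SGD: w - lr * grad = 1.0 - 0.1 * 2.0 = 0.8
# Adam normalizes by the gradient magnitude, so its first step is about lr: 0.9
print(f"After SGD step:  {w_sgd.item():.4f}")
print(f"After Adam step: {w_adam.item():.4f}")
```

SGD's step scales directly with the gradient, while Adam's first step has a size close to the learning rate regardless of the gradient's scale.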

##### Example:


In [19]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple neural network
class SimpleNeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNeuralNet, self).__init__()

        # Define the hidden layer (fully connected layer)
        self.hidden = nn.Linear(input_size, hidden_size)

        # Define the output layer (fully connected layer)
        self.output = nn.Linear(hidden_size, output_size)

    # Define the forward pass through the network
    def forward(self, x):
        # Pass the input through the hidden layer
        print(f"Input to hidden layer:\n{x}")
        x = self.hidden(x)
        print(f"Output of hidden layer (before activation):\n{x}")

        # Apply the ReLU activation function to introduce non-linearity
        x = torch.relu(x)
        print(f"Output of hidden layer (after ReLU activation):\n{x}")

        # Pass through the output layer (no activation here, raw logits)
        x = self.output(x)
        print(f"Output of final layer (logits, before softmax):\n{x}")

        # Return the final output (logits)
        return x

# Instantiate the model with:
# - 10 input features
# - 5 neurons in the hidden layer
# - 3 output neurons (for classification into 3 classes)
model = SimpleNeuralNet(input_size=10, hidden_size=5, output_size=3)
print("Initialized model:\n", model)

# Define an optimizer (Adam optimizer) to update the weights
# The learning rate is set to 0.001
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Define the loss function (Cross-Entropy Loss) for classification
criterion = nn.CrossEntropyLoss()

# Generate a random input tensor (1 sample, 10 features)
input_data = torch.randn(1, 10)
print(f"\nRandom input data (1 sample, 10 features):\n{input_data}")

# Forward pass: Get the model's output (logits) for this random input
output = model(input_data)

# Define the target as class 1 (ground truth)
target = torch.tensor([1])  # The correct class is 1
print(f"\nTarget class index:\n{target}")

# Calculate the loss between the model's output (logits) and the true target
loss = criterion(output, target)
print(f"\nLoss value (Cross-Entropy Loss):\n{loss.item()}")

# Clear any old gradients before the backward pass
optimizer.zero_grad()  # Clear the gradients before backpropagation
print("\nGradients cleared.")

# Backpropagate the loss to compute gradients of the weights
loss.backward()
print("Gradients after backward pass:")
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name}.grad:\n{param.grad}")

# Perform the optimizer step to update model weights
optimizer.step()
print("\nModel parameters updated after optimizer step:")
for name, param in model.named_parameters():
    print(f"{name}:\n{param}")



Initialized model:
 SimpleNeuralNet(
  (hidden): Linear(in_features=10, out_features=5, bias=True)
  (output): Linear(in_features=5, out_features=3, bias=True)
)

Random input data (1 sample, 10 features):
tensor([[-0.2007, -0.4387,  1.2775, -0.0624, -0.6971,  0.3294, -0.6417,  0.1808,
         -1.4538,  0.2542]])
Input to hidden layer:
tensor([[-0.2007, -0.4387,  1.2775, -0.0624, -0.6971,  0.3294, -0.6417,  0.1808,
         -1.4538,  0.2542]])
Output of hidden layer (before activation):
tensor([[-1.1955,  0.5762,  0.3802,  0.0842,  0.3115]],
       grad_fn=<AddmmBackward0>)
Output of hidden layer (after ReLU activation):
tensor([[0.0000, 0.5762, 0.3802, 0.0842, 0.3115]], grad_fn=<ReluBackward0>)
Output of final layer (logits, before softmax):
tensor([[-0.7532, -0.1636, -0.1574]], grad_fn=<AddmmBackward0>)

Target class index:
tensor([1])

Loss value (Cross-Entropy Loss):
0.9403031468391418

Gradients cleared.
Gradients after backward pass:
hidden.weight.grad:
tensor([[-0.0000e+00, -0.


#### Question 8: **How do you train a neural network in PyTorch?**

##### Answer:
Training a neural network in PyTorch involves several steps:
1. Forward pass: Compute the output.
2. Compute loss: Use a loss function to compare predictions and targets.
3. Backward pass: Compute gradients of the loss with respect to the weights.
4. Update weights: Use an optimizer to update the weights based on the gradients.

##### Example:


In [21]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple neural network with one hidden layer and one output layer
class SimpleNeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNeuralNet, self).__init__()

        # Hidden layer (fully connected)
        self.hidden = nn.Linear(input_size, hidden_size)

        # Output layer (fully connected)
        self.output = nn.Linear(hidden_size, output_size)

    # Define the forward pass through the network
    def forward(self, x):
        # Pass the input through the hidden layer
        print(f"Input to hidden layer:\n{x}")
        x = self.hidden(x)
        print(f"Output of hidden layer (before activation):\n{x}")

        # Apply ReLU activation to the hidden layer output
        x = torch.relu(x)
        print(f"Output of hidden layer (after ReLU activation):\n{x}")

        # Pass through the output layer (logits, no activation here)
        x = self.output(x)
        print(f"Output of final layer (logits, before softmax):\n{x}")

        return x

# Instantiate the model with:
# - 10 input features
# - 5 neurons in the hidden layer
# - 3 output neurons (corresponding to 3 classes for classification)
model = SimpleNeuralNet(input_size=10, hidden_size=5, output_size=3)
print("\nInitialized model:\n", model)

# Define the Adam optimizer and CrossEntropyLoss (for classification)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Training loop over 10 epochs
for epoch in range(10):
    print(f"\nEpoch {epoch + 1}/10")  # Print current epoch

    # Generate a random input tensor (1 sample, 10 features)
    input_data = torch.randn(1, 10)
    print(f"\nRandom input data (1 sample, 10 features):\n{input_data}")

    # Define the target as class 1 (ground truth)
    target = torch.tensor([1])
    print(f"\nTarget class index (ground truth):\n{target}")

    # Forward pass: Get model's output (logits) for this random input
    output = model(input_data)

    # Compute the loss between the model's output and the true target
    loss = criterion(output, target)
    print(f"\nLoss value (Cross-Entropy Loss): {loss.item()}")

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients from previous epoch
    print("\nGradients cleared.")

    loss.backward()  # Backpropagate the loss to compute gradients
    print("Gradients after backward pass:")
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f"{name}.grad:\n{param.grad}")

    optimizer.step()  # Update the model weights based on gradients
    print("\nUpdated model parameters after optimizer step:")
    for name, param in model.named_parameters():
        print(f"{name}:\n{param}")

    # Print loss every 10 epochs for monitoring
    if epoch % 10 == 0:
        print(f"\n[Epoch {epoch}] Loss: {loss.item()}\n" + "="*50)

    print(f"END OF EPOCH No: {epoch}\n\n\n")


Initialized model:
 SimpleNeuralNet(
  (hidden): Linear(in_features=10, out_features=5, bias=True)
  (output): Linear(in_features=5, out_features=3, bias=True)
)

Epoch 1/100

Random input data (1 sample, 10 features):
tensor([[-0.8018, -2.1324, -0.9448, -1.1086, -0.4982,  1.6505,  0.2798,  0.1208,
          0.6454,  0.7081]])

Target class index (ground truth):
tensor([1])
Input to hidden layer:
tensor([[-0.8018, -2.1324, -0.9448, -1.1086, -0.4982,  1.6505,  0.2798,  0.1208,
          0.6454,  0.7081]])
Output of hidden layer (before activation):
tensor([[-0.2597,  0.4406,  0.3717,  0.4491, -0.1756]],
       grad_fn=<AddmmBackward0>)
Output of hidden layer (after ReLU activation):
tensor([[0.0000, 0.4406, 0.3717, 0.4491, 0.0000]], grad_fn=<ReluBackward0>)
Output of final layer (logits, before softmax):
tensor([[-0.1617,  0.0173, -0.1608]], grad_fn=<AddmmBackward0>)

Loss value (Cross-Entropy Loss): 0.983154296875

Gradients cleared.
Gradients after backward pass:
hidden.weight.grad:



###### Explanation of Output:

1. **Model Architecture**: Displays the layers (hidden and output) with input/output dimensions.
2. **Random Input Data**: Shows the generated random input tensor with 10 features.
3. **Forward Pass**:
   - The input passes through the hidden layer, ReLU activation, and the output layer, with intermediate results printed.
4. **Loss Calculation**: Shows the loss value (Cross-Entropy) after comparing the output logits with the target.
5. **Backward Pass**:
   - The gradients of the weights and biases are computed and displayed after backpropagation.
6. **Weight Updates**: The updated weights and biases are printed after the Adam optimizer updates the model.

By adding these print statements, the code provides a clear view of what happens during the forward pass, how the loss is calculated, how the gradients are computed during backpropagation, and how the model parameters are updated with the Adam optimizer. This helps make the entire training loop more transparent and easier to follow.
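
Once the individual steps are clear, the print statements can be dropped. The sketch below is one possible compact version of the same loop. It assumes a hypothetical fixed batch of 32 random samples with random labels (instead of a new random sample each epoch), a learning rate of 0.01, and 200 epochs, and it builds the same 10-5-3 architecture with `nn.Sequential` for brevity. Because the data stays fixed, the recorded loss should decrease as the model fits the batch.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so runs are repeatable

# Same layer sizes as SimpleNeuralNet, written with nn.Sequential
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 3))
optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Hypothetical fixed batch: 32 samples, 10 features, labels in {0, 1, 2}
inputs = torch.randn(32, 10)
targets = torch.randint(0, 3, (32,))

losses = []
for epoch in range(200):
    output = model(inputs)             # 1. forward pass
    loss = criterion(output, targets)  # 2. compute loss
    optimizer.zero_grad()              # clear old gradients
    loss.backward()                    # 3. backward pass
    optimizer.step()                   # 4. update weights
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```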