<a href="https://colab.research.google.com/github/babupallam/Msc_AI_Module2_Natural_Language_Processing/blob/main/L06-Feed%20Forward%20Networks%20for%20Natural%20Language%20Processing/03_MLP_Forward_Pass_and_Output.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



### 1. **Introduction**

- **The Forward Pass in Neural Networks**:
  - **Forward pass**: The process of passing input data through the network's layers (from input to output).
  - Each layer transforms the input by applying **weights**, **biases**, and **activation functions**.
  - The forward pass computes the final predictions of the network.

- **Activation Functions**:
  - **Activation functions** add non-linearity to the network.
  - Without them, a stack of linear layers collapses into a single linear (affine) map, so the network behaves like a linear model no matter how many layers it has.
  - Common activation functions:
    - **ReLU (Rectified Linear Unit)**: ReLU(z) = max(0, z). It passes positive inputs through unchanged and outputs 0 otherwise. Placed between layers, it supplies the non-linearity the network needs to learn complex patterns.
    - **Softmax**: Converts the raw scores (logits) produced by the network into a probability distribution over classes. It is typically applied at the final layer of a classifier.

  **Observation**:
  - Non-linear activations are what allow a neural network to model non-linear relationships in the data. Without them, adding more layers adds no expressive power.
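The claim that a network without activations stays linear can be checked directly. The sketch below (an illustration with arbitrary layer sizes, not part of the original notebook) composes two `nn.Linear` layers with nothing in between and shows that the result equals a single affine map with weight `fc2.weight @ fc1.weight` and bias `fc2.weight @ fc1.bias + fc2.bias`. Inserting `F.relu` between the layers breaks that equivalence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Two stacked linear layers (sizes chosen arbitrarily for illustration)
fc1 = nn.Linear(3, 5)
fc2 = nn.Linear(5, 4)
x = torch.randn(8, 3)

# No activation between the layers
stacked = fc2(fc1(x))

# Collapse both layers into one affine map: y = x @ W.T + b
W = fc2.weight @ fc1.weight            # shape (4, 3)
b = fc2.weight @ fc1.bias + fc2.bias   # shape (4,)
single = x @ W.T + b

print(torch.allclose(stacked, single, atol=1e-6))  # True: still a single linear map

# With ReLU in between, the collapsed map no longer reproduces the output
with_relu = fc2(F.relu(fc1(x)))
print(torch.allclose(with_relu, single, atol=1e-6))
```

Because a composition of affine maps is itself affine, depth only pays off once a non-linearity such as ReLU sits between the layers.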




---



### 2. **Forward Pass**

- **Defining the Forward Pass in PyTorch**:
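  - In PyTorch, the forward pass is implemented in the `forward()` method of the model class (here, `MultilayerPerceptron`). You call the model like a function, `model(x)`, and `nn.Module` dispatches that call to `forward()`.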
  - During the forward pass, input data flows through each layer, transformations are applied, and the final output is produced.

- **Code Breakdown**:
  - Here’s how the forward pass works in the `MultilayerPerceptron` class:


```python
import torch.nn as nn
import torch.nn.functional as F

class MultilayerPerceptron(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(MultilayerPerceptron, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)  # First layer
        self.fc2 = nn.Linear(hidden_dim, output_dim)  # Output layer

    def forward(self, x, apply_softmax=False):
        # Apply ReLU activation after the first layer
        x = F.relu(self.fc1(x))

        # Apply the second layer
        output = self.fc2(x)

        # Optionally apply Softmax if needed for classification
        if apply_softmax:
            output = F.softmax(output, dim=1)
        return output
```



- **Explanation**:
  - **First Layer (`fc1`)**: Applies a linear transformation that maps each `input_dim`-dimensional input to `hidden_dim` values, then applies ReLU.
  - **Second Layer (`fc2`)**: Maps the hidden representation to `output_dim` raw scores (logits), one per class.
  - **Softmax**: Applied only when `apply_softmax=True`. It normalizes the logits along `dim=1` (the class dimension), so each row becomes a probability distribution.

  **Observation**:
  - The forward pass defines how data moves through the network, applying transformations in each layer. Adding activation functions like ReLU introduces non-linearity between layers.
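To make the data flow inside `forward()` concrete, the sketch below recomputes the same two-layer computation by hand, $h = \mathrm{ReLU}(x W_1^\top + b_1)$ followed by $\text{logits} = h W_2^\top + b_2$, using the `y = x @ weight.T + bias` convention of `nn.Linear`, and prints the tensor shape at each stage. The sizes mirror the demonstration that follows (3 → 100 → 4); the fixed seed and the standalone layers `fc1`/`fc2` are assumptions made so the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, for reproducibility only

input_dim, hidden_dim, output_dim = 3, 100, 4
fc1 = nn.Linear(input_dim, hidden_dim)
fc2 = nn.Linear(hidden_dim, output_dim)
x = torch.rand(2, input_dim)  # batch of 2 samples

# Module calls: the same steps forward() performs
module_out = fc2(F.relu(fc1(x)))

# The same steps written out by hand
z1 = x @ fc1.weight.T + fc1.bias      # pre-activation, shape (2, 100)
h = torch.clamp(z1, min=0)            # ReLU: max(0, z), shape (2, 100)
logits = h @ fc2.weight.T + fc2.bias  # output logits, shape (2, 4)

print("x:", tuple(x.shape), "z1:", tuple(z1.shape),
      "h:", tuple(h.shape), "logits:", tuple(logits.shape))
# prints: x: (2, 3) z1: (2, 100) h: (2, 100) logits: (2, 4)
print(torch.allclose(module_out, logits, atol=1e-6))  # True
```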

  **Demonstration**:


```python
import torch

# Create an instance of the model and define the input data
input_dim = 3
hidden_dim = 100
output_dim = 4
mlp = MultilayerPerceptron(input_dim, hidden_dim, output_dim)

# Create a random input tensor: a batch of 2 samples with input_dim features each
x_input = torch.rand(2, input_dim)
```


```python
# Pass the input tensor through the model and print the output
output = mlp(x_input)
print("Output of the network (without Softmax):\n", output)
```

```
Output of the network (without Softmax):
 tensor([[ 0.0118,  0.0963,  0.0022,  0.0604],
        [ 0.1121, -0.0793,  0.0499,  0.0396]], grad_fn=<AddmmBackward0>)
```



---



### 3. **Understanding Outputs**

- **Raw Output (Logits)**:
  - Without Softmax, the network outputs **logits**: unnormalized scores, one per class, that have not been converted into a probability distribution.
  - Logits can be positive or negative and are unbounded. They are the expected input to loss functions such as PyTorch's `nn.CrossEntropyLoss`, which applies log-softmax internally, so the model should return raw logits during training.

- **Applying Softmax**:
  - **Softmax** normalizes the logits into probabilities. This is useful when the network is performing **classification tasks** and we want to interpret the output as the likelihood of each class.
  - Softmax converts logits into values between 0 and 1, where the sum of the probabilities is 1.

  **Observation**:
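  - Apply Softmax when you need to interpret the outputs as class probabilities (for example, at inference time). For training with `nn.CrossEntropyLoss`, pass the raw logits instead. The predicted class (the highest score) is the same with or without Softmax.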
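Softmax is defined as $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$. The sketch below is a from-scratch version for illustration; `stable_softmax` is a hypothetical helper, not a PyTorch function. It subtracts each row's maximum before exponentiating, which leaves the result unchanged but avoids overflow for large logits, and checks the result against `F.softmax`. It also confirms that the `argmax` (the predicted class) is identical for logits and probabilities, because softmax preserves ordering.

```python
import torch
import torch.nn.functional as F

def stable_softmax(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Hypothetical helper: softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(dim=dim, keepdim=True).values
    exps = torch.exp(shifted)
    return exps / exps.sum(dim=dim, keepdim=True)

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [1000.0, 1001.0, 999.0]])  # a naive exp(1000) would overflow

probs = stable_softmax(logits, dim=1)
print(probs)
print(probs.sum(dim=1))                                  # each row sums to 1 (up to rounding)
print(torch.allclose(probs, F.softmax(logits, dim=1)))   # True

# Softmax preserves ordering, so predictions can be read straight from the logits
print(torch.equal(logits.argmax(dim=1), probs.argmax(dim=1)))  # True
```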

  **Demonstration**:


```python
# Run the forward pass with Softmax and print the resulting probabilities
output_with_softmax = mlp(x_input, apply_softmax=True)
print("Output of the network (with Softmax):\n", output_with_softmax)

# Show that the probabilities for each sample sum to 1
print("Sum of probabilities for each sample:\n", output_with_softmax.sum(dim=1))
```

```
Output of the network (with Softmax):
 tensor([[0.2422, 0.2636, 0.2399, 0.2543],
        [0.2706, 0.2235, 0.2543, 0.2517]], grad_fn=<SoftmaxBackward0>)
Sum of probabilities for each sample:
 tensor([1., 1.], grad_fn=<SumBackward1>)
```
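The remark that Cross-Entropy Loss applies Softmax internally matters in practice. In PyTorch, `nn.CrossEntropyLoss` expects raw logits and combines log-softmax with negative log-likelihood. The sketch below uses made-up logits and targets to confirm that equivalence, and shows the common mistake of feeding Softmax outputs to the loss, which applies softmax a second time. This is consistent with `apply_softmax` defaulting to `False` in `MultilayerPerceptron`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)                      # arbitrary seed
logits = torch.randn(5, 4)                # 5 samples, 4 classes (illustrative values)
targets = torch.tensor([0, 3, 1, 2, 3])   # correct class index per sample

loss_fn = nn.CrossEntropyLoss()
ce = loss_fn(logits, targets)

# Same value computed as log-softmax followed by negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(ce.item(), manual.item())
print(torch.allclose(ce, manual))  # True

# Mistake: passing probabilities makes the loss apply softmax a second time
wrong = loss_fn(F.softmax(logits, dim=1), targets)
print(wrong.item())
```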



---



### 4. **Exercise**

- **Turn Softmax On/Off**:
  - Modify the `apply_softmax` flag and observe how the output changes when Softmax is applied vs. when it isn’t.

  **Task**:
  - Compare the output with and without Softmax:


```python
output_without_softmax = mlp(x_input, apply_softmax=False)
output_with_softmax = mlp(x_input, apply_softmax=True)

print("Without Softmax:\n", output_without_softmax)
print("With Softmax:\n", output_with_softmax)
```

```
Without Softmax:
 tensor([[ 0.0118,  0.0963,  0.0022,  0.0604],
        [ 0.1121, -0.0793,  0.0499,  0.0396]], grad_fn=<AddmmBackward0>)
With Softmax:
 tensor([[0.2422, 0.2636, 0.2399, 0.2543],
        [0.2706, 0.2235, 0.2543, 0.2517]], grad_fn=<SoftmaxBackward0>)
```



  - Explain the differences between the logits (raw output) and the probabilities (Softmax output).
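One angle on this comparison, sketched with made-up logits rather than output from `mlp`: logits can take any sign and magnitude, while Softmax outputs lie strictly between 0 and 1 and sum to 1. Softmax also depends only on the differences between logits, so adding the same constant to every class score changes the logits but leaves the probabilities unchanged.

```python
import torch
import torch.nn.functional as F

# Illustrative logits for one sample and four classes (values made up)
logits = torch.tensor([[0.5, -1.2, 2.0, 0.0]])
shifted = logits + 10.0  # add the same constant to every class score

p = F.softmax(logits, dim=1)
p_shifted = F.softmax(shifted, dim=1)

print("logits: ", logits)
print("shifted:", shifted)
print("probabilities match:", torch.allclose(p, p_shifted))  # True
print("all in (0, 1):", bool(((p > 0) & (p < 1)).all()))     # True
```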

- **Print Intermediate Activations**:
  - Print the activations from the hidden layer to visualize how the input is transformed as it moves through the network.

  **Task**:
  - Modify the `forward()` method to print intermediate activations after the ReLU function:


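```python
# A standalone `def forward(...)` is never called by the model, so override the
# method in a subclass instead (VerboseMLP is an illustrative name).
class VerboseMLP(MultilayerPerceptron):
    def forward(self, x, apply_softmax=False):
        x = F.relu(self.fc1(x))
        print("Intermediate activations after ReLU:\n", x)
        output = self.fc2(x)
        if apply_softmax:
            output = F.softmax(output, dim=1)
        return output
verbose_mlp = VerboseMLP(input_dim, hidden_dim, output_dim)
verbose_mlp.load_state_dict(mlp.state_dict())  # reuse mlp's weights
_ = verbose_mlp(x_input)
```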



  **Observation**:
  - The intermediate activations are the hidden representation that the second layer receives. Because of ReLU, every entry is non-negative, and any unit whose pre-activation was negative is exactly 0.
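Editing `forward()` is one way to inspect hidden activations. Another option, sketched below, is a forward hook (`nn.Module.register_forward_hook`), which observes a layer's output without modifying the model code. Because this model applies ReLU functionally (`F.relu`) rather than as a module, the hook is attached to `fc1` and applies ReLU itself. The class is redefined only so the cell runs on its own; `save_hidden`, `captured`, and `demo_mlp` are illustrative names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same architecture as the MultilayerPerceptron class defined earlier
class MultilayerPerceptron(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, apply_softmax=False):
        x = F.relu(self.fc1(x))
        output = self.fc2(x)
        if apply_softmax:
            output = F.softmax(output, dim=1)
        return output

demo_mlp = MultilayerPerceptron(3, 100, 4)
captured = {}  # illustrative storage for the hooked activations

def save_hidden(module, inputs, output):
    # fc1 returns pre-activations; apply ReLU to get what forward() passes to fc2.
    # Returning None leaves the layer's real output untouched.
    captured["hidden"] = F.relu(output).detach()

handle = demo_mlp.fc1.register_forward_hook(save_hidden)
x_demo = torch.rand(2, 3)
logits_demo = demo_mlp(x_demo)
handle.remove()  # stop capturing once the activations are stored

hidden = captured["hidden"]
print(hidden.shape)  # torch.Size([2, 100])
print("fraction of hidden units at exactly 0:", (hidden == 0).float().mean().item())
```

The hook approach leaves `forward()` untouched, which is convenient when the same model is also used for training or evaluation.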

---

### 5. **Conclusion**

- **Recap of the Forward Pass**:
  - The forward pass is the process of feeding input through the network to obtain the output.
  - The **ReLU** activation function introduces non-linearity, enabling the model to learn more complex patterns.
  - **Softmax** is used when working with classification tasks, converting logits into interpretable probabilities.

- **Key Takeaways**:
  - The forward pass defines how data flows through the layers, from input features to output scores.
  - ReLU supplies the non-linearity in the hidden layer, while Softmax turns the final logits into a probability distribution over classes.

  **Demonstration**:
  - Run the model on a new batch of three inputs and print the final output with Softmax applied:


```python
x_input = torch.rand(3, input_dim)  # New input data: a batch of 3 samples
output = mlp(x_input, apply_softmax=True)
print("Final output with Softmax:\n", output)
```

```
Final output with Softmax:
 tensor([[0.2705, 0.2742, 0.2289, 0.2264],
        [0.2550, 0.2493, 0.2630, 0.2328],
        [0.2558, 0.2477, 0.2470, 0.2494]], grad_fn=<SoftmaxBackward0>)
```
