

### 1. **Introduction**

- **What are 1D Convolutions?**
  - **1D Convolutions** are applied to **sequential data** where the input is a sequence of vectors (like time series, text, or audio data).
  - Each convolutional layer applies filters that slide across the sequence, capturing local dependencies between time steps or tokens.
  
- **Applications**:
  - **Time-series data** (e.g., stock prices, sensor readings).
  - **Natural Language Processing (NLP)** tasks (e.g., text classification, sentiment analysis).
  - **Audio signal processing** (e.g., speech recognition).

- **Why Batch Normalization?**
  - **Batch Normalization** helps **stabilize learning** by normalizing each channel's activations over the mini-batch and then rescaling them with a learnable scale and shift.
  - It often permits **higher learning rates** and faster training. The original motivation was reducing **internal covariate shift**, although later work attributes much of the benefit to a smoother optimization landscape.
  - It also adds a slight regularization effect, which can reduce the model's reliance on dropout.

**Observation**:
- Without normalization, neural networks may converge slowly, or the training may become unstable due to exploding or vanishing gradients.
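
As a rough illustration of what the normalization step computes, the sketch below compares `nn.BatchNorm1d` in training mode against a manual per-channel calculation. The tensor sizes are arbitrary choices for this example, and the comparison assumes the layer's default initialization (scale 1, shift 0).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3, 5)  # (batch, channels, length); sizes chosen only for illustration

bn = nn.BatchNorm1d(num_features=3)
bn.train()  # training mode: normalize with statistics of the current mini-batch
out = bn(x)

# Manual version: one mean and one (biased) variance per channel,
# computed over the batch and length dimensions.
mean = x.mean(dim=(0, 2), keepdim=True)
var = x.var(dim=(0, 2), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)

print("Matches manual computation:", torch.allclose(out, manual, atol=1e-5))
print("Per-channel mean after BN:", out.mean(dim=(0, 2)))
```

With the default scale and shift, each channel of `out` should have a mean close to zero and a variance close to one.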



---

### 2. **Imports and Data Setup**

- **Importing Required Libraries**:
  - Import essential libraries like `torch` and `torch.nn` for building the convolutional neural network.
  - PyTorch provides several modules, including `nn.Conv1d` for 1D convolutions and `nn.BatchNorm1d` for batch normalization.

- **Creating Sample Input Data**:
  - Create a random input tensor to simulate sequential data:
    - **Batch size**: Number of independent sequences processed in parallel.
    - **One-hot size**: Dimensionality of each input vector (e.g., size of one-hot encoded categories).
    - **Sequence width**: Number of time steps in the sequence (e.g., words in a sentence, time steps in time series).

**Code**:


In [1]:
import torch
import torch.nn as nn

# Define parameters
batch_size = 2
one_hot_size = 10  # Represents the dimensionality of the one-hot encoding
sequence_width = 7  # Represents the length of the sequence (e.g., time steps)

# Create a random input tensor (random values stand in for real one-hot vectors)
data = torch.randn(batch_size, one_hot_size, sequence_width)
print("Data shape:", data.shape)  # Shape: (batch_size, one_hot_size, sequence_width)


Data shape: torch.Size([2, 10, 7])



**Explanation**:
- **Batch size**: 2 sequences will be processed simultaneously.
- **One-hot size**: The data has 10 features per time step.
- **Sequence width**: The sequence is 7 time steps long.

**Observation**:
- Sequential data (e.g., text, time-series) can be modeled by using 1D convolutions, which slide across the time dimension.
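
The tensor above is filled with random values. If the input were actual token indices, it could be brought into the same `(batch_size, one_hot_size, sequence_width)` layout roughly as follows. The token ids here are made up for illustration; only the shapes mirror the example above.

```python
import torch
import torch.nn.functional as F

vocab_size = 10  # plays the role of one_hot_size

# Hypothetical token ids for 2 sequences of 7 tokens each
token_ids = torch.tensor([[1, 4, 2, 9, 0, 3, 7],
                          [5, 5, 8, 1, 6, 2, 0]])

# F.one_hot returns shape (batch, sequence, vocab) with integer dtype
one_hot = F.one_hot(token_ids, num_classes=vocab_size)

# nn.Conv1d expects (batch, channels, length), so move the vocab axis to position 1
conv_input = one_hot.permute(0, 2, 1).float()
print("Conv1d-ready shape:", conv_input.shape)
```

The `permute` call matters because `nn.Conv1d` treats the second dimension as channels and slides its kernel along the last one.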

---

### 3. **Convolution Layers and Batch Normalization**

- **Defining Convolutional Layers**:
  - Use `nn.Conv1d` to define the convolutional layers. We define three layers in this case:
    - **conv1**: Takes in `one_hot_size` input channels and produces 16 output channels (filters) with a kernel size of 3.
    - **conv2**: Takes 16 input channels from `conv1` and produces 32 output channels.
    - **conv3**: Takes 32 input channels from `conv2` and produces 64 output channels.

**Code**:


In [2]:
conv1 = nn.Conv1d(in_channels=one_hot_size, out_channels=16, kernel_size=3)
conv2 = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3)
conv3 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3)



- **Applying Batch Normalization**:
  - After each convolutional layer (except the last one), apply batch normalization to normalize the activations.
  - ReLU activation is applied to add non-linearity between the layers.
  - **conv1_bn** and **conv2_bn** normalize the activations of `conv1` and `conv2` respectively.

**Code**:


In [3]:

conv1_bn = nn.BatchNorm1d(num_features=16)
conv2_bn = nn.BatchNorm1d(num_features=32)

# Forward pass: convolution -> batch normalization -> ReLU
intermediate1 = torch.relu(conv1_bn(conv1(data)))
intermediate2 = torch.relu(conv2_bn(conv2(intermediate1)))
intermediate3 = conv3(intermediate2)  # No batch normalization or ReLU on the last layer



- **Printing Intermediate Tensor Sizes**:
  - Print the shapes of tensors after each layer to observe how the sequence width shrinks and the number of output channels increases.

**Demonstration**:


In [4]:
print("Shape after conv1 + batch norm + ReLU:", intermediate1.shape)
print("Shape after conv2 + batch norm + ReLU:", intermediate2.shape)
print("Shape after conv3:", intermediate3.shape)


Shape after conv1 + batch norm + ReLU: torch.Size([2, 16, 5])
Shape after conv2 + batch norm + ReLU: torch.Size([2, 32, 3])
Shape after conv3: torch.Size([2, 64, 1])



**Observation**:
- Because no padding is used, each layer shrinks the sequence width by `kernel_size - 1 = 2`: 7 → 5 → 3 → 1.
- Batch normalization keeps the values passed to the next layer in a consistent range, which tends to stabilize and speed up training.
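
The widths printed above follow the standard output-length formula for a 1D convolution. The helper below is a small sketch of that formula; `conv1d_out_len` is a name introduced here for illustration, not a PyTorch function.

```python
import torch
import torch.nn as nn

def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (hypothetical helper)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Three stacked layers with kernel_size=3 and no padding, starting from width 7
width = 7
widths = [width]
for _ in range(3):
    width = conv1d_out_len(width, kernel_size=3)
    widths.append(width)
print("Predicted widths:", widths)

# Cross-check the first step against an actual layer
conv = nn.Conv1d(in_channels=10, out_channels=16, kernel_size=3)
actual_width = conv(torch.randn(2, 10, 7)).shape[-1]
print("Actual width after one layer:", actual_width)
```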

---

### 4. **Exercise**

- **Modifying the Kernel Size**:
  - The **kernel size** defines the window size for each convolution operation.
  - Change the kernel size in the convolutional layers and observe how it affects the output tensor’s size.
  
**Task**:
  - Modify the kernel size and print the resulting output shapes:


In [5]:
conv1 = nn.Conv1d(in_channels=one_hot_size, out_channels=16, kernel_size=2)  # Kernel size changed from 3 to 2
conv2 = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=2)

# The batch norm layers are reused because the channel counts (16 and 32) are unchanged
intermediate1 = torch.relu(conv1_bn(conv1(data)))
intermediate2 = torch.relu(conv2_bn(conv2(intermediate1)))
print("Shape with kernel size 2:", intermediate2.shape)


Shape with kernel size 2: torch.Size([2, 32, 5])



  **Observation**:
  - Smaller kernels may capture finer details in the sequence, but they may also lose long-range dependencies.
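
One way to make the trade-off concrete is the receptive field: how many input time steps can influence a single output position. The sketch below assumes stride 1 and no dilation, as in the layers above; `receptive_field` is a helper name introduced only for this example.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, dilation-1 convolutions (hypothetical helper)."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each layer widens the view by kernel_size - 1 steps
    return rf

rf_k3 = receptive_field([3, 3, 3])  # three layers with kernel size 3
rf_k2 = receptive_field([2, 2])     # two layers with kernel size 2
print("Receptive field, three k=3 layers:", rf_k3)
print("Receptive field, two k=2 layers:", rf_k2)
```

Under these assumptions, three kernel-size-3 layers cover all 7 input steps, which is why the width collapses to 1 there, while two kernel-size-2 layers see only 3 steps.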

- **Adding More Layers**:
  - Add additional convolutional layers to make the architecture deeper.
  - Observe how adding more layers increases the complexity of the model and can help it learn more abstract patterns.

**Task**:
  - Add more layers and apply ReLU activation:


In [10]:
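# New layer named conv4 so the original conv3 is not overwritten.
# It is applied to intermediate2 (32 channels, width 5); kernel_size=1 leaves the width unchanged.
conv4 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=1)
intermediate4 = torch.relu(conv4(intermediate2))
print("Shape after adding conv4:", intermediate4.shape)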


Shape after adding conv4: torch.Size([2, 64, 5])



  **Observation**:
  - Deeper architectures can model more complex dependencies in sequential data, but they require more computational power.
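
With a sequence width of 7 and unpadded kernels, the width runs out after a few layers. One common workaround, sketched below, is `padding=1` with `kernel_size=3` so the width is preserved and layers can be stacked freely. The `make_block` helper and the channel sizes are assumptions for this example, not part of the notebook's code.

```python
import torch
import torch.nn as nn

def make_block(in_channels, out_channels):
    """Conv -> BatchNorm -> ReLU block that keeps the sequence width (hypothetical helper)."""
    return nn.Sequential(
        nn.Conv1d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm1d(out_channels),
        nn.ReLU(),
    )

# Four stacked blocks; channel sizes chosen only for illustration
deep_stack = nn.Sequential(
    make_block(10, 16),
    make_block(16, 32),
    make_block(32, 64),
    make_block(64, 64),
)

x = torch.randn(2, 10, 7)  # same layout as `data`: (batch, channels, width)
y = deep_stack(x)
print("Output shape:", y.shape)
```

Because every block preserves the width, the depth is limited by compute and data rather than by the sequence length.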

---


### 5. **Demonstration (Using Code from the Previous File)**

In [11]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define a more complex 1D CNN architecture with two convolutional layers, batch normalization, dropout, and two fully connected layers
class Complex1DCNN(nn.Module):
    def __init__(self):
        super(Complex1DCNN, self).__init__()

        # First 1D convolutional layer
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=3, padding=1)

        # Batch normalization after Conv1
        self.bn1 = nn.BatchNorm1d(num_features=16)  # Batch normalization for 16 channels (from Conv1)

        # Second 1D convolutional layer
        self.conv2 = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3, padding=1)

        # Batch normalization after Conv2
        self.bn2 = nn.BatchNorm1d(num_features=32)  # Batch normalization for 32 channels (from Conv2)

        # Max pooling layer
        self.pool = nn.MaxPool1d(kernel_size=2, stride=2)

        # Dropout layer to reduce overfitting (p=0.5 means 50% chance of dropping a neuron)
        self.dropout = nn.Dropout(p=0.5)

        # Fully connected layers
        self.fc1 = nn.Linear(32 * 5, 100)  # 32 channels with length reduced to 5 after pooling twice
        self.fc2 = nn.Linear(100, 10)  # 10 output classes

    def forward(self, x):
        # Initial input dimensions
        print(f"Input Data:\n{x}")
        print(f" - Shape: {x.shape} (batch_size, channels, length)")

        # First convolution -> Batch Normalization -> ReLU -> Max Pooling
        x = self.conv1(x)
        print(f"Values after Conv1:\n{x}")

        # Apply batch normalization after Conv1
        x = self.bn1(x)
        print(f"Values after BatchNorm1 (Conv1):\n{x}")

        x = F.relu(x)
        x = self.pool(x)
        print(f"Values after Max Pooling (Conv1):\n{x}")
        print(f" - Output shape after Max Pooling (Conv1): {x.shape} (batch_size, out_channels=16, length=10)")

        # Second convolution -> Batch Normalization -> ReLU -> Max Pooling
        x = self.conv2(x)
        print(f"Values after Conv2:\n{x}")

        # Apply batch normalization after Conv2
        x = self.bn2(x)
        print(f"Values after BatchNorm2 (Conv2):\n{x}")

        x = F.relu(x)
        x = self.pool(x)
        print(f"Values after Max Pooling (Conv2):\n{x}")
        print(f" - Output shape after Max Pooling (Conv2): {x.shape} (batch_size, out_channels=32, length=5)")

        # Flatten the tensor before passing into fully connected layers
        print(f"\nFlattening the tensor for Fully Connected layers")
        x = x.view(-1, 32 * 5)
        print(f"Values after Flattening:\n{x}")
        print(f" - Shape after flattening: {x.shape} (batch_size, flattened size)")

        # Fully connected layer 1
        x = self.fc1(x)
        print(f"Values after FC1:\n{x}")
        x = F.relu(x)

        # Apply dropout after the first fully connected layer
        x = self.dropout(x)
        print(f"Values after Dropout:\n{x} (note: some neurons will have values set to zero)")

        # Fully connected layer 2 (Output layer)
        x = self.fc2(x)
        print(f"Values after FC2 (Output):\n{x}")
        print(f" - Output shape after FC2: {x.shape} (batch_size, neurons=10)")

        return x

# Create an instance of the complex CNN model with batch normalization and dropout
complex_model = Complex1DCNN()

# Generate a random 1D input tensor
input_data = torch.randn(1, 1, 20)
print(f"Input Data:\n{input_data}\n")

# Forward pass through the model
output = complex_model(input_data)

# Print the final output shape and values
print(f"\nFinal Output Shape: {output.shape}")
print(f"Final Output Values:\n{output}")


Input Data:
tensor([[[-1.2090, -0.5226,  0.6870,  0.2028,  1.3623, -1.5955,  1.1614,
           1.6579, -0.0861, -2.3708,  0.3673, -0.0206,  0.1043,  1.5678,
           1.3681, -1.1600,  0.5823, -0.3004,  0.2747,  1.0564]]])

Input Data:
tensor([[[-1.2090, -0.5226,  0.6870,  0.2028,  1.3623, -1.5955,  1.1614,
           1.6579, -0.0861, -2.3708,  0.3673, -0.0206,  0.1043,  1.5678,
           1.3681, -1.1600,  0.5823, -0.3004,  0.2747,  1.0564]]])
 - Shape: torch.Size([1, 1, 20]) (batch_size, channels, length)
Values after Conv1:
tensor([[[ 6.7277e-01,  1.4928e-01,  2.4794e-01,  2.7395e-01,  6.7104e-01,
           6.1018e-01, -2.8030e-01,  5.0563e-01,  1.2253e+00,  5.9284e-01,
           5.1939e-04,  4.9356e-01,  1.1820e-01,  3.6025e-02,  8.2193e-01,
           6.8499e-01,  2.4799e-01,  5.2450e-01,  1.5545e-01,  3.9215e-01],
         [-1.6731e-01, -4.8241e-04, -3.8558e-01, -3.2023e-01, -8.2194e-01,
          -8.1485e-02, -1.6730e-01, -8.3561e-01, -8.2258e-01,  1.7423e-01,
          ... (output truncated)

The `Complex1DCNN` model above adds a **batch normalization** layer after each convolutional layer. Batch normalization standardizes each channel's activations using mini-batch statistics and then applies a learnable scale and shift, which tends to improve the stability and convergence of training, especially for deeper networks.


##### Key Modifications:

1. **Batch Normalization Layers**:
   - **`self.bn1 = nn.BatchNorm1d(num_features=16)`**: This layer normalizes the output of `conv1` per channel, using statistics computed over the batch and length dimensions. Before the learnable scale and shift are applied, each channel has approximately zero mean and unit variance, which helps stabilize learning.
   - **`self.bn2 = nn.BatchNorm1d(num_features=32)`**: Similarly, this normalizes the output of `conv2` before the ReLU activation.
   
2. **Where Batch Normalization is Applied**:
   - Batch normalization is applied **after the convolutional layers** but **before the ReLU activation**, so the non-linearity receives the normalized, rescaled activations.
   - For instance, after the first convolution (`conv1`), batch normalization is applied to normalize the feature maps before passing them through ReLU:
     ```python
     x = self.conv1(x)
     x = self.bn1(x)
     x = F.relu(x)
     ```

3. **Impact of Batch Normalization**:
   - **Faster convergence**: Batch normalization often helps the network converge faster. The original explanation is that it reduces internal covariate shift (the change in the distribution of layer inputs during training), though the exact mechanism is still debated.
   - **Regularization**: Because each example is normalized with statistics from its mini-batch, batch normalization adds a small amount of noise that acts as a mild regularizer, sometimes reducing the need for dropout (though we keep both in this model).

4. **Forward Pass**:
   - The model performs a forward pass with batch normalization added after each convolutional layer.
   - The values after each step, including convolution, batch normalization, ReLU activation, max pooling, and dropout, are printed for detailed inspection.
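
Both batch normalization and dropout behave differently in training and evaluation mode, which affects the values printed during a forward pass. The sketch below uses a small stand-in module rather than `Complex1DCNN` itself; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small stand-in for the BatchNorm -> ReLU -> Dropout pattern
layer = nn.Sequential(nn.BatchNorm1d(4), nn.ReLU(), nn.Dropout(p=0.5))
x = torch.randn(8, 4, 6)  # (batch, channels, length)

# Training mode: batch statistics are used and dropout zeroes random elements,
# so two passes over the same input usually differ.
layer.train()
train_a = layer(x)
train_b = layer(x)
print("Training passes identical:", torch.equal(train_a, train_b))

# Evaluation mode: running statistics are used and dropout is disabled,
# so repeated passes give the same result.
layer.eval()
with torch.no_grad():
    eval_a = layer(x)
    eval_b = layer(x)
print("Evaluation passes identical:", torch.equal(eval_a, eval_b))
```

A freshly constructed `nn.Module` starts in training mode, so the demonstration above prints values with dropout active unless `complex_model.eval()` is called first.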

##### Example Output (Simplified):
```plaintext
Input Data:
tensor([[ ... ]])
 - Shape: torch.Size([1, 1, 20]) (batch_size, channels, length)

Values after Conv1:
tensor([[ ... ]])
Values after BatchNorm1 (Conv1):
tensor([[ ... ]])
Values after Max Pooling (Conv1):
tensor([[ ... ]])
 - Output shape after Max Pooling (Conv1): torch.Size([1, 16, 10])

Values after Conv2:
tensor([[ ... ]])
Values after BatchNorm2 (Conv2):
tensor([[ ... ]])
Values after Max Pooling (Conv2):
tensor([[ ... ]])
 - Output shape after Max Pooling (Conv2): torch.Size([1, 32, 5])

Flattening the tensor for Fully Connected layers
Values after Flattening:
tensor([[ ... ]])
 - Shape after flattening: torch.Size([1, 160])

Values after FC1:
tensor([[ ... ]])
Values after Dropout:
tensor([[ ... ]]) (note: some neurons will have values set to zero)

Values after FC2 (Output):
tensor([[ ... ]])
 - Output shape after FC2: torch.Size([1, 10])

Final Output Shape: torch.Size([1, 10])
Final Output Values:
tensor([[ ... ]])
```
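
The flattened size of 160 is hard-coded as `32 * 5` in the model, which only holds for inputs of length 20. A possible way to derive it instead is to run a dummy tensor through the convolutional part, as sketched below. The `features` stack is a re-creation of the model's conv/pool layers for this example, not a refactoring of the class itself.

```python
import torch
import torch.nn as nn

# Convolutional part only, mirroring the layer settings used in Complex1DCNN
features = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=3, padding=1),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.Conv1d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
)

input_length = 20
features.eval()  # avoid updating batch norm running statistics with the dummy input
with torch.no_grad():
    dummy = torch.zeros(1, 1, input_length)
    flat_size = features(dummy).flatten(start_dim=1).shape[1]

print("Flattened feature size:", flat_size)
fc1 = nn.Linear(flat_size, 100)  # sized from the measured value instead of a constant
```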



### 6. **Conclusion**

- **Recap**:
  - **1D Convolutions** are essential for sequential data, allowing the network to capture relationships between neighboring time steps or tokens.
  - **Batch Normalization** improves the training process by normalizing activations, allowing higher learning rates, and stabilizing gradients.

- **Importance of Batch Normalization**:
  - Batch normalization often speeds up convergence, allows deeper networks to be trained more effectively, and provides a mild regularizing effect.

**Quiz**:
  - Why is batch normalization typically applied before the activation function in many architectures?
  - What is the impact of changing the kernel size in a 1D convolution layer?
  
  **Observation**:
  - Batch normalization normalizes the activations, which typically lets the model converge faster and often generalize better on new data.

---
