In [None]:
Q1. Explain the concept of batch normalization in the context of Artificial Neural Networks:

Batch normalization is a technique for improving the training stability and performance of artificial neural networks. It 
normalizes the activations of a layer across each mini-batch so that every feature has approximately zero mean and unit 
variance, then rescales and shifts the result with learnable parameters. It was originally proposed to reduce internal 
covariate shift, the change in the distribution of a layer's inputs caused by parameter updates in earlier layers during 
training. By keeping the distribution of inputs to each layer more stable, batch normalization typically allows faster and 
more stable training of deep neural networks.

Q2. Describe the benefits of using batch normalization during training:

The benefits of using batch normalization during training include:

Improved convergence: Batch normalization helps in stabilizing the training process by maintaining a consistent distribution of 
    inputs, which leads to faster convergence.
Regularization: Because the mean and variance are estimated from each mini-batch, batch normalization injects a small amount 
    of noise into the activations during training, which acts as a mild regularizer and can reduce overfitting.
Reduced sensitivity to initialization: Batch normalization reduces the dependency of the network on the initial weights, making 
    it less sensitive to weight initialization schemes.
Gradient flow: Batch normalization helps mitigate the vanishing gradient problem by keeping activations in a range where 
    saturating nonlinearities still have useful gradients, so gradients propagate more reliably during backpropagation.

Q3. Discuss the working principle of batch normalization, including the normalization step and the learnable parameters:

The working principle of batch normalization involves two main steps:

Normalization: For each mini-batch of training data, batch normalization subtracts the batch mean from each feature's 
    activations and divides by the batch standard deviation (with a small constant ε added to the variance for numerical 
    stability), so the normalized activations have approximately zero mean and unit variance. At inference time, running 
    averages of the mean and variance collected during training are used instead of batch statistics.
Learnable parameters: Batch normalization introduces two learnable parameters per feature (or channel), usually denoted 
    gamma (γ) and beta (β). These scale and shift the normalized activations, y = γ · x̂ + β, so the network can recover 
    the original activations if that is what training favors.
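
The two steps above can be sketched by hand and checked against PyTorch's own layer. This is a minimal illustration, not a replacement for `nn.BatchNorm1d`: the function name `batch_norm_forward` and the synthetic input are assumptions made for the example, and running statistics (used at inference) are left out.

```python
import torch
import torch.nn as nn

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for a (batch, features) tensor (hypothetical helper)."""
    mean = x.mean(dim=0)                        # per-feature mean over the batch
    var = x.var(dim=0, unbiased=False)          # per-feature biased variance over the batch
    x_hat = (x - mean) / torch.sqrt(var + eps)  # normalization step
    return gamma * x_hat + beta                 # learnable scale (gamma) and shift (beta)

torch.manual_seed(0)
x = torch.randn(32, 4) * 3.0 + 5.0   # synthetic batch: 32 samples, 4 features
gamma = torch.ones(4)                # same initial values nn.BatchNorm1d uses
beta = torch.zeros(4)
y = batch_norm_forward(x, gamma, beta)

# Reference: PyTorch's layer in training mode with its default initialization
bn = nn.BatchNorm1d(4)
bn.train()
reference = bn(x)
print(torch.allclose(y, reference, atol=1e-4))
```

With gamma at 1 and beta at 0 the output is just the normalized activations; during training both are updated by the optimizer like any other weight.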

Q4. Implementation:

To implement batch normalization, you can use any deep learning framework/library like TensorFlow or PyTorch. Here's a basic 
outline of the implementation process:

Choose a dataset (e.g., MNIST, CIFAR-10) and preprocess it.
Implement a simple feedforward neural network without using batch normalization.
Train the neural network on the chosen dataset.
Implement batch normalization layers in the neural network and train the model again.
Compare the training and validation performance (e.g., accuracy, loss) between the models with and without batch normalization.
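
For the comparison step, one option is a small evaluation helper that can be called on either model. The sketch below is an assumption about how that might look: `evaluate` is a hypothetical name, and the random tensors only stand in for MNIST so the example runs without a download. It switches the model to evaluation mode so that batch normalization uses its running statistics.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    """Return (mean loss, accuracy in percent) over a data loader (hypothetical helper)."""
    model.eval()  # BatchNorm layers use running statistics in this mode
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs = inputs.reshape(inputs.size(0), -1)   # flatten images to vectors
            outputs = model(inputs)
            # criterion returns the batch mean, so weight it by the batch size
            total_loss += criterion(outputs, labels).item() * labels.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return total_loss / total, 100.0 * correct / total

# Synthetic stand-in for a validation set (random data, not MNIST)
torch.manual_seed(0)
fake_images = torch.rand(200, 1, 28, 28)
fake_labels = torch.randint(0, 10, (200,))
fake_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=50)

model = nn.Sequential(
    nn.Linear(784, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Linear(128, 10)
)
val_loss, val_acc = evaluate(model, fake_loader, nn.CrossEntropyLoss())
print(f"loss={val_loss:.4f}, accuracy={val_acc:.2f} %")
```

Calling the same helper on both models after each epoch gives loss and accuracy curves that can be compared directly.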

Q5. Experimentation and Analysis:

Experiment with different batch sizes and observe the effect on the training dynamics and model performance. Discuss the 
advantages and potential limitations of batch normalization in improving the training of neural networks.
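
One reason batch size matters is that batch normalization estimates the mean and variance from each mini-batch, and small batches give noisier estimates. The sketch below illustrates this on synthetic data only; `population` and `batch_mean_spread` are names invented for the example, and it does not train a network.

```python
import torch

torch.manual_seed(0)
# Synthetic "activations" for one feature: true mean 1.0, true std 2.0
population = torch.randn(100_000) * 2.0 + 1.0

def batch_mean_spread(batch_size, num_batches=500):
    """Std of the batch-mean estimate across many random mini-batches (hypothetical helper)."""
    idx = torch.randint(0, population.numel(), (num_batches, batch_size))
    batch_means = population[idx].mean(dim=1)   # one mean estimate per mini-batch
    return batch_means.std().item()

spreads = {bs: batch_mean_spread(bs) for bs in (4, 32, 256)}
for bs, spread in spreads.items():
    print(f"batch_size={bs:>3}: std of batch means = {spread:.3f}")
```

The spread shrinks roughly like 1/sqrt(batch_size), so very small batches make the normalization statistics unreliable, which is a commonly cited limitation of batch normalization.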

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim

# Hyperparameters
input_size = 784  # 28x28
hidden_size = 128
num_classes = 10
num_epochs = 10
batch_size = 100
learning_rate = 0.001

# MNIST dataset (images and labels)
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor())

# Data loader (input pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

# Feedforward neural network without batch normalization
class FeedForwardNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Feedforward neural network with batch normalization
class FeedForwardNN_BN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.bn = nn.BatchNorm1d(hidden_size)  # Batch normalization layer
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.bn(out)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Initialize models (fixed seed so the comparison is reproducible)
torch.manual_seed(0)
model_without_bn = FeedForwardNN(input_size, hidden_size, num_classes)
model_with_bn = FeedForwardNN_BN(input_size, hidden_size, num_classes)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer_without_bn = optim.Adam(model_without_bn.parameters(), lr=learning_rate)
optimizer_with_bn = optim.Adam(model_with_bn.parameters(), lr=learning_rate)

# Training the model without batch normalization
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, 28*28)

        # Forward pass
        outputs = model_without_bn(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer_without_bn.zero_grad()
        loss.backward()
        optimizer_without_bn.step()

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Testing the model without batch normalization
model_without_bn.eval()  # switch to inference mode before evaluation
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28)
        outputs = model_without_bn(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network without batch normalization on the 10000 test images: {} %'.format(100 * correct / total))

# Training the model with batch normalization
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, 28*28)

        # Forward pass
        outputs = model_with_bn(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer_with_bn.zero_grad()
        loss.backward()
        optimizer_with_bn.step()

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Testing the model with batch normalization
model_with_bn.eval()  # use running mean/variance instead of per-batch statistics
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28)
        outputs = model_with_bn(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network with batch normalization on the 10000 test images: {} %'.format(100 * correct / total))
