
Batch Normalization in Artificial Neural Networks (ANN)

1. Explain the concept of batch normalization in the context of Artificial Neural Networks:

- Batch normalization (BN) is a technique to improve the training speed and performance of neural networks. It normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation.

2. Describe the benefits of using batch normalization during training:

- Faster Training: BN allows for higher learning rates by reducing internal covariate shift.
- Stability: It stabilizes the training process, reducing the sensitivity to initialization.
- Regularization: BN acts as a form of regularization, potentially reducing the need for other regularization techniques like dropout.

3. Discuss the working principle of batch normalization, including the normalization step and the learnable parameters:
- Normalization Step: For each mini-batch, calculate the mean and variance of the activations. Normalize each activation by subtracting the mean and dividing by the standard deviation.
- Learnable Parameters: After normalization, the output is scaled and shifted using learnable parameters (gamma and beta), allowing the network to maintain the representational power.

Q2. Implementation:

1. Choose a dataset of your choice (e.g., MNIST, CIFAR-10) and preprocess it:

In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data preprocessing
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)


2. Implement a simple feedforward neural network using any deep learning framework/library (e.g., TensorFlow, PyTorch):

In [4]:
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)


3. Train the neural network on the chosen dataset without using batch normalization:

In [5]:
def train(model, train_loader, criterion, optimizer, epochs=10):
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        for data, target in train_loader:
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}')

train(model, train_loader, criterion, optimizer)

Epoch 1, Loss: 0.40141680640484223
Epoch 2, Loss: 0.1661402302514166
Epoch 3, Loss: 0.11830560415625763
Epoch 4, Loss: 0.09260945883529909
Epoch 5, Loss: 0.07818306103618795
Epoch 6, Loss: 0.06611174414964961
Epoch 7, Loss: 0.05378639259894909
Epoch 8, Loss: 0.04733584208231765
Epoch 9, Loss: 0.04049042546815092
Epoch 10, Loss: 0.03641706440918722


4. Implement batch normalization layers in the neural network and train the model again:

In [6]:
class SimpleNNWithBN(nn.Module):
    def __init__(self):
        super(SimpleNNWithBN, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.fc2 = nn.Linear(256, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.relu(self.bn1(self.fc1(x)))
        x = torch.relu(self.bn2(self.fc2(x)))
        x = self.fc3(x)
        return x

model_bn = SimpleNNWithBN()
optimizer_bn = optim.SGD(model_bn.parameters(), lr=0.01, momentum=0.9)

In [7]:
# Train the modified model:
train(model_bn, train_loader, criterion, optimizer_bn)

Epoch 1, Loss: 0.21599571231399167
Epoch 2, Loss: 0.08668426302537671
Epoch 3, Loss: 0.06035005617519416
Epoch 4, Loss: 0.04373052048579808
Epoch 5, Loss: 0.03365582184814441
Epoch 6, Loss: 0.026780540008159447
Epoch 7, Loss: 0.022409169710174735
Epoch 8, Loss: 0.016622860357897586
Epoch 9, Loss: 0.016902302078712707
Epoch 10, Loss: 0.012140795331523279


5. Compare the training and validation performance (e.g., accuracy, loss) between the models with and without batch normalization:

In [8]:
def evaluate(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data, target in test_loader:
            outputs = model(data)
            _, predicted = torch.max(outputs.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
    return 100 * correct / total

accuracy = evaluate(model, test_loader)
accuracy_bn = evaluate(model_bn, test_loader)

print(f'Accuracy without BN: {accuracy}%')
print(f'Accuracy with BN: {accuracy_bn}%')


Accuracy without BN: 97.76%
Accuracy with BN: 98.19%


6. Discuss the impact of batch normalization on the training process and the performance of the neural network:

- Training Speed: Batch normalization can help the model converge faster by allowing higher learning rates.
- Stability: It reduces the internal covariate shift, stabilizing the learning process.
- Performance: Batch normalization generally improves the model's performance, resulting in better accuracy on both training and validation sets.

In [9]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data preprocessing
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# Neural network without batch normalization
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train(model, train_loader, criterion, optimizer, epochs=10):
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        for data, target in train_loader:
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}')

train(model, train_loader, criterion, optimizer)

# Neural network with batch normalization
class SimpleNNWithBN(nn.Module):
    def __init__(self):
        super(SimpleNNWithBN, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.fc2 = nn.Linear(256, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.relu(self.bn1(self.fc1(x)))
        x = torch.relu(self.bn2(self.fc2(x)))
        x = self.fc3(x)
        return x

model_bn = SimpleNNWithBN()
optimizer_bn = optim.SGD(model_bn.parameters(), lr=0.01, momentum=0.9)

train(model_bn, train_loader, criterion, optimizer_bn)

def evaluate(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data, target in test_loader:
            outputs = model(data)
            _, predicted = torch.max(outputs.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
    return 100 * correct / total

accuracy = evaluate(model, test_loader)
accuracy_bn = evaluate(model_bn, test_loader)

print(f'Accuracy without BN: {accuracy}%')
print(f'Accuracy with BN: {accuracy_bn}%')


Epoch 1, Loss: 0.4043550709608013
Epoch 2, Loss: 0.16365424480074758
Epoch 3, Loss: 0.11631351864056737
Epoch 4, Loss: 0.08867974621680642
Epoch 5, Loss: 0.07432134291862072
Epoch 6, Loss: 0.0644305779780847
Epoch 7, Loss: 0.05525515688536589
Epoch 8, Loss: 0.04605809748974512
Epoch 9, Loss: 0.03988742701913307
Epoch 10, Loss: 0.035847287947140226
Epoch 1, Loss: 0.21517041752905225
Epoch 2, Loss: 0.08624984715074333
Epoch 3, Loss: 0.0592294706400039
Epoch 4, Loss: 0.04570024191532959
Epoch 5, Loss: 0.03304272288367696
Epoch 6, Loss: 0.027324564059077305
Epoch 7, Loss: 0.0239504706890219
Epoch 8, Loss: 0.018043148601438845
Epoch 9, Loss: 0.015576080385179046
Epoch 10, Loss: 0.013250092071776928
Accuracy without BN: 97.61%
Accuracy with BN: 98.34%


Q3. Experimentation and Analysis:
1. Experiment with different batch size and observe the effect on the training dynamics and model performance.


To observe the effect of different batch sizes on training dynamics and model performance, we can modify the batch size in our data loaders and retrain the models. Let's choose three different batch sizes: 32, 64, and 128.

In [11]:
# Batch Size 32:
train_loader_32 = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader_32 = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# Train model without batch normalization
model_32 = SimpleNN()
optimizer_32 = optim.SGD(model_32.parameters(), lr=0.01, momentum=0.9)
train(model_32, train_loader_32, criterion, optimizer_32)

# Train model with batch normalization
model_bn_32 = SimpleNNWithBN()
optimizer_bn_32 = optim.SGD(model_bn_32.parameters(), lr=0.01, momentum=0.9)
train(model_bn_32, train_loader_32, criterion, optimizer_bn_32)

# Evaluate both models
accuracy_32 = evaluate(model_32, test_loader_32)
accuracy_bn_32 = evaluate(model_bn_32, test_loader_32)

Epoch 1, Loss: 0.3310694493830204
Epoch 2, Loss: 0.14309563914984463
Epoch 3, Loss: 0.10695876042315115
Epoch 4, Loss: 0.08459196136618653
Epoch 5, Loss: 0.07003962687837581
Epoch 6, Loss: 0.0571495078012192
Epoch 7, Loss: 0.04966393201490088
Epoch 8, Loss: 0.04304751966406475
Epoch 9, Loss: 0.04027762156855703
Epoch 10, Loss: 0.03384864194946131
Epoch 1, Loss: 0.20819080470651388
Epoch 2, Loss: 0.09568213318772614
Epoch 3, Loss: 0.07200215510924657
Epoch 4, Loss: 0.05621409612915789
Epoch 5, Loss: 0.04549729872699827
Epoch 6, Loss: 0.035941357943741606
Epoch 7, Loss: 0.03143149889875203
Epoch 8, Loss: 0.02614082324236321
Epoch 9, Loss: 0.021808457904080085
Epoch 10, Loss: 0.01920710031868269


In [12]:
# Batch Size 64:
train_loader_64 = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader_64 = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# Train model without batch normalization
model_64 = SimpleNN()
optimizer_64 = optim.SGD(model_64.parameters(), lr=0.01, momentum=0.9)
train(model_64, train_loader_64, criterion, optimizer_64)

# Train model with batch normalization
model_bn_64 = SimpleNNWithBN()
optimizer_bn_64 = optim.SGD(model_bn_64.parameters(), lr=0.01, momentum=0.9)
train(model_bn_64, train_loader_64, criterion, optimizer_bn_64)

# Evaluate both models
accuracy_64 = evaluate(model_64, test_loader_64)
accuracy_bn_64 = evaluate(model_bn_64, test_loader_64)


Epoch 1, Loss: 0.4077172031812767
Epoch 2, Loss: 0.16736758740614854
Epoch 3, Loss: 0.11737099300915085
Epoch 4, Loss: 0.09479700793116999
Epoch 5, Loss: 0.07708533286555871
Epoch 6, Loss: 0.06422849480675331
Epoch 7, Loss: 0.05640166123081912
Epoch 8, Loss: 0.04968486795053462
Epoch 9, Loss: 0.04060087869283128
Epoch 10, Loss: 0.034686552515791565
Epoch 1, Loss: 0.21563254157776263
Epoch 2, Loss: 0.0870849877717033
Epoch 3, Loss: 0.059885570633489246
Epoch 4, Loss: 0.044925752793625394
Epoch 5, Loss: 0.03389470920419452
Epoch 6, Loss: 0.025387144549243422
Epoch 7, Loss: 0.022432971158254045
Epoch 8, Loss: 0.01845445458672748
Epoch 9, Loss: 0.016028627575252766
Epoch 10, Loss: 0.013997254320012002


In [None]:
# Batch Size 128:
train_loader_128 = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader_128 = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# Train model without batch normalization
model_128 = SimpleNN()
optimizer_128 = optim.SGD(model_128.parameters(), lr=0.01, momentum=0.9)
train(model_128, train_loader_128, criterion, optimizer_128)

# Train model with batch normalization
model_bn_128 = SimpleNNWithBN()
optimizer_bn_128 = optim.SGD(model_bn_128.parameters(), lr=0.01, momentum=0.9)
train(model_bn_128, train_loader_128, criterion, optimizer_bn_128)

# Evaluate both models
accuracy_128 = evaluate(model_128, test_loader_128)
accuracy_bn_128 = evaluate(model_bn_128, test_loader_128)

Epoch 1, Loss: 0.5423439370988529
Epoch 2, Loss: 0.23217710630217595
Epoch 3, Loss: 0.16505739236596043
Epoch 4, Loss: 0.12499488233280842
Epoch 5, Loss: 0.10433103207713251
Epoch 6, Loss: 0.08848691769817998
Epoch 7, Loss: 0.07627236963843485
Epoch 8, Loss: 0.06651358935894615
Epoch 9, Loss: 0.058390087614864555
Epoch 10, Loss: 0.05276924473787549
Epoch 1, Loss: 0.26144389499193316
Epoch 2, Loss: 0.0904197796686753
Epoch 3, Loss: 0.05856420552886244
Epoch 4, Loss: 0.041727523343252346
Epoch 5, Loss: 0.031766765220249606


Let's print out the results for easy comparison:

In [None]:
print(f'Batch Size 32 - Accuracy without BN: {accuracy_32}%, with BN: {accuracy_bn_32}%')
print(f'Batch Size 64 - Accuracy without BN: {accuracy_64}%, with BN: {accuracy_bn_64}%')
print(f'Batch Size 128 - Accuracy without BN: {accuracy_128}%, with BN: {accuracy_bn_128}%')
