# **NN Assignment 2 [ Batch Normalization ]**

## **Theory and Concepts**



Q_1_ANS:-

Batch normalization is a technique used in training artificial neural networks to improve convergence and generalization. It involves normalizing the outputs of intermediate layers within a neural network by adjusting and scaling them during each training iteration. The basic idea is to ensure that the inputs to each layer have a consistent distribution, which helps the optimization process.

Q_2_ANS:-

The benefits of using batch normalization during training include:

**Faster Convergence:** Batch normalization helps neural networks converge more quickly. It stabilizes the learning process by reducing internal covariate shift (the change in the distribution of a layer's inputs as earlier layers are updated), which can lead to faster and more consistent weight updates.

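**Improved Generalization:** Batch normalization acts as a mild form of regularization. Because the mean and variance are estimated from each mini-batch, the normalized inputs of each layer carry a small amount of noise that differs from batch to batch. This can reduce overfitting and lead to better generalization on unseen data.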

**Higher Learning Rates:** With batch normalization, higher learning rates can be used without the risk of diverging during training. This accelerates the learning process and allows for more aggressive updates to the model's parameters.

**Reduced Sensitivity to Initialization:** Batch normalization makes neural networks less sensitive to the initial weights' values, which can lead to more stable training dynamics.

**Invariance to Scaling and Shifting:** Batch normalization normalizes the inputs of each layer, making the network more robust to changes in input scale and shift. This can be particularly useful when dealing with various data distributions.
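The scale and shift invariance mentioned above can be checked numerically. The following is a minimal NumPy sketch, not the Keras implementation: the `normalize` helper, the random batch, and the factors 10 and 3 are all assumptions chosen for illustration.

```python
import numpy as np

def normalize(x, eps=1e-5):
    """Standardize each feature (column) using the statistics of the batch."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))   # hypothetical batch: 32 samples, 4 features

# Rescale and shift every feature, as a change in an earlier layer might.
x_shifted = 10.0 * x + 3.0

out = normalize(x)
out_shifted = normalize(x_shifted)

# Largest element-wise difference between the two normalized batches.
max_gap = np.abs(out - out_shifted).max()
print(f"max difference after normalization: {max_gap:.2e}")
```

The two normalized batches agree up to the small effect of the stabilizing constant `eps`, which is why the layer that follows sees nearly the same inputs in both cases.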

Q_3_ANS:-

Batch normalization involves two main steps: normalization and learnable parameter adjustment.

**Normalization Step:** In the normalization step, the inputs to a layer are centered and scaled per feature. This is done by subtracting the mini-batch mean and dividing by the mini-batch standard deviation, with a small constant added to the variance for numerical stability.

**Learnable Parameters:** Batch normalization introduces two learnable parameters for each feature dimension: a scale parameter (gamma) and a shift parameter (beta). These parameters allow the network to learn how to scale and shift the normalized inputs, giving it the flexibility to adapt to the optimal distribution.

During training, batch normalization calculates the mean and standard deviation of the inputs within each mini-batch. During inference, a running mean and standard deviation are used instead to normalize the inputs consistently.
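The two steps and the training/inference distinction described above can be sketched in a few lines of NumPy. This is an illustrative forward pass only, not the Keras layer: the class name `BatchNorm1D` is hypothetical, gamma and beta are left at their initial values because no gradient step is implemented, and the `momentum=0.99` and `eps=1e-3` defaults are chosen to mirror the Keras `BatchNormalization` defaults.

```python
import numpy as np

class BatchNorm1D:
    """Illustrative batch normalization over the batch axis (forward pass only)."""

    def __init__(self, num_features, momentum=0.99, eps=1e-3):
        self.gamma = np.ones(num_features)         # learnable scale
        self.beta = np.zeros(num_features)         # learnable shift
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Statistics of the current mini-batch, one value per feature.
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # Exponential moving averages, kept for use at inference time.
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mean
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)   # normalization step
        return self.gamma * x_hat + self.beta          # learnable scale and shift

rng = np.random.default_rng(0)
bn = BatchNorm1D(num_features=3)
batch = rng.normal(loc=5.0, scale=2.0, size=(64, 3))  # synthetic activations

train_out = bn(batch, training=True)       # uses batch statistics
infer_out = bn(batch[:1], training=False)  # uses running statistics, works for one sample

print(train_out.mean(axis=0), train_out.std(axis=0))
```

In training mode the output has roughly zero mean and unit standard deviation per feature. In inference mode the result no longer depends on which other samples share the batch, which is why a single sample can be normalized.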

# Implementation

### Importing necessary library

In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.layers import BatchNormalization

### Preprocessing:

In [5]:
(x_train,y_train),(x_test,y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

### Implement Feedforward Neural Network:

In [6]:
# Build a simple feedforward neural network (no batch normalization)
def build_model():
  model = Sequential([
      Flatten(input_shape=(28,28)),   # 28x28 image -> 784-dimensional vector
      Dense(128,activation='relu'),
      Dense(64,activation='relu'),
      Dense(10,activation='softmax')  # one output per digit class
  ])
  return model

# The batch-normalized model is built separately in a later cell.
model_without_bn = build_model()

### Train Without Batch Normalization:

In [7]:
model_without_bn.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])

model_without_bn.fit(x_train,y_train,epochs=10,validation_data=(x_test,y_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fb76b823d90>

### Performance:

In [15]:
_, acc_without_bn = model_without_bn.evaluate(x_test, y_test)



### Implement Batch Normalization:

In [10]:
# Same architecture as build_model(), with BatchNormalization after each hidden layer
def build_model_with_bn():
  model = Sequential([
      Flatten(input_shape=(28, 28)),
      Dense(128, activation='relu'),
      BatchNormalization(),
      Dense(64, activation='relu'),
      BatchNormalization(),
      Dense(10, activation='softmax')
  ])
  return model

model_with_bn = build_model_with_bn()
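As a sanity check on what the two `BatchNormalization` layers add, the parameter counts can be worked out by hand. The sketch below uses plain arithmetic rather than `model.summary()`; the helper names are hypothetical, and it assumes the standard layout of a Keras batch normalization layer (trainable gamma and beta, non-trainable moving mean and moving variance, one value each per feature).

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    """Return (trainable, non_trainable) parameter counts for one BN layer."""
    trainable = 2 * n_features       # gamma and beta
    non_trainable = 2 * n_features   # moving mean and moving variance
    return trainable, non_trainable

# Dense layers shared by both models: 784 -> 128 -> 64 -> 10
dense_total = dense_params(784, 128) + dense_params(128, 64) + dense_params(64, 10)

# BatchNormalization after the 128-unit and the 64-unit hidden layers
bn_trainable = sum(batchnorm_params(n)[0] for n in (128, 64))
bn_non_trainable = sum(batchnorm_params(n)[1] for n in (128, 64))

print("Dense parameters:            ", dense_total)
print("BN trainable parameters:     ", bn_trainable)
print("BN non-trainable parameters: ", bn_non_trainable)
print("Total with batch norm:       ", dense_total + bn_trainable + bn_non_trainable)
```

The batch normalization layers add well under one percent to the model size, so any difference between the two models is not explained by extra capacity.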


### Train With Batch Normalization:

In [11]:
model_with_bn.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy']
                      )
model_with_bn.fit(x_train,y_train,epochs=10,validation_data=(x_test,y_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fb768149420>

### Performance:

In [12]:
_, acc_with_bn = model_with_bn.evaluate(x_test, y_test)



### Compare Performance:

In [13]:
_, acc_without_bn = model_without_bn.evaluate(x_test, y_test)
_, acc_with_bn = model_with_bn.evaluate(x_test, y_test)

print(f"Accuracy without Batch Normalization: {acc_without_bn}")
print(f"Accuracy with Batch Normalization: {acc_with_bn}")

Accuracy without Batch Normalization: 0.9769999980926514
Accuracy with Batch Normalization: 0.9781000018119812


# Discuss Impact of Batch Normalization:

In this run, the model with batch normalization reached a test accuracy of 0.9781, compared with 0.9770 without it, a gain of about 0.11 percentage points. A difference this small from a single run could plausibly be due to random initialization, so it should be read as suggestive rather than conclusive. In principle, batch normalization stabilizes training by keeping the distribution of each layer's inputs more consistent (often described as reducing internal covariate shift), which allows the network to learn faster. It also reduces sensitivity to initialization and permits higher learning rates, which can contribute to quicker convergence.

The impact of batch normalization may vary depending on the specific dataset and architecture, but in general, it tends to enhance the training process and lead to improved model performance. It can help mitigate overfitting, accelerate convergence, and improve the overall robustness of the neural network.
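To put the measured gap in perspective, it helps to convert it into a number of test images. The short calculation below uses the two accuracies printed in the comparison cell and the size of the standard MNIST test set (10,000 images).

```python
# Accuracies copied from the comparison cell's output
acc_without_bn = 0.9769999980926514
acc_with_bn = 0.9781000018119812

n_test = 10_000  # size of the MNIST test set

diff = acc_with_bn - acc_without_bn
extra_correct = round(diff * n_test)

print(f"Difference: {diff * 100:.2f} percentage points")
print(f"Additional test images classified correctly: {extra_correct}")
```

A gap of this size corresponds to roughly a dozen images, which is why repeating the experiment over several random seeds would be needed before drawing a firm conclusion.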






# **Experimentation and Analysis**

### Experimenting with Different Batch Sizes:

In [17]:
batch_sizes = [16,32,64,128]

for batch_size in batch_sizes:
  model = build_model_with_bn()
  model.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

  history = model.fit(x_train,y_train,batch_size=batch_size,epochs=10,validation_data=(x_test,y_test))

  print(f"Batch Size: {batch_size}")
  print(f"Final Accuracy: {history.history['val_accuracy'][-1]}")
  print()

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Batch Size: 16
Final Accuracy: 0.9786999821662903

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Batch Size: 32
Final Accuracy: 0.9785000085830688

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Batch Size: 64
Final Accuracy: 0.9787999987602234

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Batch Size: 128
Final Accuracy: 0.977400004863739



# **Advantages of Batch Normalization:**

**Faster Convergence:** Batch normalization helps accelerate the convergence of neural networks. It ensures that the inputs to each layer have a consistent distribution, reducing the time taken for the model to learn meaningful features.

**Stability and Higher Learning Rates:** Batch normalization stabilizes the training process by mitigating the internal covariate shift. This allows for the use of higher learning rates without causing divergence, leading to faster training.

**Regularization and Generalization:** Batch normalization acts as a mild form of regularization because the mini-batch statistics used for normalization are noisy estimates that vary from batch to batch. This can reduce overfitting and improve the model's ability to generalize to new data.

**Reduced Sensitivity to Initialization:** Batch normalization reduces the dependency on careful weight initialization. The network becomes less sensitive to the choice of initial weights, making it easier to train.

**Invariance to Scaling and Shifting:** Batch normalization makes neural networks less sensitive to changes in input scale and shift. This is particularly helpful when dealing with different data distributions.

# **Potential Limitations of Batch Normalization:**

**Batch Size Dependency:** Batch normalization's performance can be sensitive to the batch size used during training. Small batch sizes might lead to noisy estimates of mean and variance, affecting normalization effectiveness.

**Computation Overhead:** Batch normalization introduces additional computations for mean and variance estimation, which can slightly increase training time.

**Not Always Beneficial:** While batch normalization is generally beneficial, it might not always improve performance, especially on smaller datasets or shallow networks. In some cases, it might even lead to worse results.

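**Test-Time Performance:** During inference, batch normalization relies on the running mean and variance accumulated during training. Storing and applying these statistics slightly increases the model's memory usage and inference time, and predictions can degrade if the running estimates do not match the data seen at inference.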

**Internal Covariate Shift Elimination:** While internal covariate shift reduction is generally beneficial, in some cases, a controlled amount of shift might help the network learn different features at different stages.
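The batch size dependency listed above can be illustrated with a small simulation. This sketch uses synthetic standard-normal values in place of real activations (an assumption for illustration only) and measures how much the per-batch mean fluctuates for each batch size used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one feature's activations (true mean 0, true std 1).
population = rng.normal(loc=0.0, scale=1.0, size=100_000)

spread = {}
for batch_size in [16, 32, 64, 128]:
    # Draw 2,000 random mini-batches and compute the mean of each one.
    batches = rng.choice(population, size=(2_000, batch_size))
    batch_means = batches.mean(axis=1)
    # Standard deviation of the batch means: how noisy the estimate is.
    spread[batch_size] = batch_means.std()
    print(f"batch size {batch_size:>3}: std of batch mean = {spread[batch_size]:.3f}")
```

The spread shrinks roughly in proportion to one over the square root of the batch size, so the statistics used to normalize a batch of 16 are noticeably noisier than those for a batch of 128.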

In conclusion, batch normalization is a widely used technique that can improve the training process of neural networks. It helps with faster convergence, better generalization, and reduced sensitivity to initialization, although the gain measured on this small MNIST network was modest. The dependency on batch size and the additional computational overhead should be considered when applying batch normalization to a specific problem.


# **1. Training Curves:**

Create line plots showing the training and validation accuracy/loss curves for each batch size over the training epochs. Use different colors or styles for better differentiation.
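One possible way to produce these plots is sketched below. It assumes the training loop is changed to keep each run's metrics in a dictionary, for example `histories[batch_size] = history.history`; that dictionary and both helper names are hypothetical and do not appear in the cells above.

```python
def collect_curves(histories, metric="accuracy"):
    """Map each batch size to its (training, validation) curves for one metric."""
    return {
        batch_size: (history[metric], history[f"val_{metric}"])
        for batch_size, history in histories.items()
    }

def plot_curves(histories, metric="accuracy"):
    """Draw one training (dashed) and one validation (solid) line per batch size."""
    # Imported here so collect_curves can be used without a plotting backend.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for batch_size, (train, val) in collect_curves(histories, metric).items():
        epochs = range(1, len(train) + 1)
        line, = ax.plot(epochs, train, linestyle="--",
                        label=f"train, batch {batch_size}")
        # Reuse the colour so both curves of a batch size are easy to pair up.
        ax.plot(epochs, val, color=line.get_color(),
                label=f"val, batch {batch_size}")
    ax.set_xlabel("Epoch")
    ax.set_ylabel(metric)
    ax.legend()
    return fig

# Usage, once `histories` has been filled by the training loop:
#   plot_curves(histories, metric="accuracy")
#   plot_curves(histories, metric="loss")
```

Pairing the training and validation curves by colour keeps the figure readable with four batch sizes on one axis.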

# **2. Comparison Table:**

Create a table summarizing the final validation accuracy for each batch size.

| Batch Size | Validation Accuracy |
|------------|---------------------|
| 16         | 0.9787              |
| 32         | 0.9785              |
| 64         | 0.9788              |
| 128        | 0.9774              |
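A table like this can also be generated directly from the logged values, which avoids transcription and rounding slips. The sketch below copies the four final accuracies from the training output above and rounds them to four decimals; the variable names are illustrative.

```python
# Final validation accuracies copied from the training log
final_val_accuracy = {
    16: 0.9786999821662903,
    32: 0.9785000085830688,
    64: 0.9787999987602234,
    128: 0.977400004863739,
}

rows = [
    "| Batch Size | Validation Accuracy |",
    "|------------|---------------------|",
]
for batch_size, acc in final_val_accuracy.items():
    # Left-align both columns to the header widths; round to four decimals.
    rows.append(f"| {batch_size:<10} | {acc:<19.4f} |")

table = "\n".join(rows)
best = max(final_val_accuracy, key=final_val_accuracy.get)

print(table)
print(f"Highest final validation accuracy: batch size {best}")
```

Selecting the best entry programmatically also makes it explicit which batch size scored highest in this particular run.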

# **3. Explanations and Findings:**

**Training Curves Interpretation:**
- The training curves for smaller batch sizes (16 and 32) might exhibit more oscillations and noise due to the noisy gradient estimates from small batches.
- As the batch size increases, training curves become smoother and more stable. Larger batch sizes (64 and 128) tend to produce smoother curves, indicating a more stable convergence process.

**Impact on Convergence:**
- Smaller batch sizes lead to faster convergence in terms of the number of epochs, as the model receives more frequent updates.
- However, larger batch sizes may require fewer updates to achieve similar accuracy levels due to more stable gradient estimates.
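The phrase "more frequent updates" can be made concrete by counting optimizer steps. The calculation below assumes the 60,000-image MNIST training set and the 10 epochs used in this notebook, with one weight update per mini-batch and a final partial batch when the size does not divide evenly.

```python
import math

n_train = 60_000   # MNIST training images
epochs = 10

updates = {}
for batch_size in [16, 32, 64, 128]:
    steps_per_epoch = math.ceil(n_train / batch_size)
    updates[batch_size] = steps_per_epoch * epochs
    print(f"batch size {batch_size:>3}: {steps_per_epoch:>4} updates per epoch, "
          f"{updates[batch_size]:>5} in total")
```

With a batch size of 16 the model receives eight times as many updates as with 128 over the same number of epochs, at the price of each epoch taking longer to run.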

**Final Validation Accuracy:**
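- Final validation accuracy is nearly identical across batch sizes: 0.9787 (16), 0.9785 (32), 0.9788 (64), and 0.9774 (128). Batch size 64 scored marginally highest and 128 lowest, but a spread of about 0.14 percentage points from a single run is too small to conclude that any batch size generalizes better.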

**Optimal Batch Size:**
- In this run, a batch size of 64 yielded the highest final validation accuracy (0.9788), with 16 (0.9787) and 32 (0.9785) close behind.
- Because these differences are within what run-to-run variation could plausibly produce, this result neither confirms nor contradicts the general expectation that smaller batch sizes often generalize better.

**Trade-offs and Considerations:**
- Smaller batch sizes (e.g., 16) perform more weight updates per epoch, which can speed up convergence per epoch, but each epoch takes longer in wall-clock time and the gradient and batch-statistic estimates are noisier.
- Larger batch sizes (e.g., 128) give smoother gradient estimates and faster epochs on parallel hardware, but they perform fewer updates per epoch and are often reported to converge to sharper minima that generalize less well.

**Batch Normalization's Impact:**
- Batch normalization helps stabilize training across different batch sizes.
- For smaller batch sizes, it mitigates the negative impact of noisy gradients, allowing the model to learn more effectively.
- For larger batch sizes, it still provides benefits by enhancing the stability of the training process and speeding up convergence.

In conclusion, the choice of batch size affects training dynamics: smaller batches give more, noisier updates per epoch, while larger batches give fewer, smoother ones. In this experiment, however, final validation accuracy with batch normalization stayed between 0.9774 and 0.9788 for every batch size tested, so the network was largely insensitive to batch size in the range 16 to 128. Batch normalization complements the training process by enhancing stability and convergence for both small and large batch sizes.