In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
from tensorflow.keras.optimizers import Adam

# Step 1: Preprocess the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0

# Step 2: Implement a simple feedforward neural network
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(784,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Step 3: Train the neural network without batch normalization
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train.reshape(-1, 784), y_train, epochs=10, batch_size=32, validation_data=(x_test.reshape(-1, 784), y_test))

# Step 4: Implement batch normalization layers and train the model again
model_with_bn = Sequential()
model_with_bn.add(Dense(64, activation='relu', input_shape=(784,)))
model_with_bn.add(BatchNormalization())
model_with_bn.add(Dense(64, activation='relu'))
model_with_bn.add(BatchNormalization())
model_with_bn.add(Dense(10, activation='softmax'))

model_with_bn.compile(optimizer=Adam(learning_rate=0.001),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
model_with_bn.fit(x_train.reshape(-1, 784), y_train, epochs=10, batch_size=32, validation_data=(x_test.reshape(-1, 784), y_test))

# Step 5: Compare the training and validation performance
_, accuracy = model.evaluate(x_test.reshape(-1, 784), y_test)
print("Model without batch normalization - Accuracy:", accuracy)

_, accuracy_bn = model_with_bn.evaluate(x_test.reshape(-1, 784), y_test)
print("Model with batch normalization - Accuracy:", accuracy_bn)

# Step 6: Discuss the impact of batch normalization
print("Batch normalization improves model performance by:", accuracy_bn - accuracy)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz




Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10




Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model without batch normalization - Accuracy: 0.9750999808311462
Model with batch normalization - Accuracy: 0.9764000177383423
Batch normalization improves model performance by: 0.001300036907196045
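
To put the 0.0013 gap in context, it helps to estimate how much test accuracy would vary from sampling alone. The sketch below is a rough check, not a formal significance test: it treats each of the 10,000 MNIST test images as an independent trial and the two accuracy estimates as independent of each other, both simplifying assumptions. The helper name `accuracy_standard_error` is introduced here for illustration.

```python
import math


def accuracy_standard_error(accuracy: float, n_samples: int) -> float:
    """Binomial standard error of an accuracy measured on n_samples examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


n_test = 10_000                 # size of the MNIST test set
acc_plain = 0.9750999808311462  # printed above, without batch normalization
acc_bn = 0.9764000177383423     # printed above, with batch normalization

se_plain = accuracy_standard_error(acc_plain, n_test)
se_bn = accuracy_standard_error(acc_bn, n_test)

gap = acc_bn - acc_plain
# Standard error of the difference, assuming the two estimates are independent
se_gap = math.sqrt(se_plain ** 2 + se_bn ** 2)

print(f"gap = {gap:.4f}, standard error of gap = {se_gap:.4f}")
print(f"gap in standard errors = {gap / se_gap:.2f}")
```

Under these assumptions the gap is smaller than one standard error, so a single run cannot separate the two models. Repeating both trainings with several random seeds would give a firmer comparison.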


# Experiment with different batch sizes and observe the effect on the training dynamics and model performance:

In [2]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
from tensorflow.keras.optimizers import Adam

# Step 1: Preprocess the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0

# Experiment with different batch sizes
batch_sizes = [16, 32, 64, 128]
histories = {}  # batch size -> per-epoch metrics recorded by fit()

for batch_size in batch_sizes:
    # Step 2: Build a fresh feedforward network so every run starts from new weights
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(784,)))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(10, activation='softmax'))

    # Step 3: Train without batch normalization and keep the history for comparison
    model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(x_train.reshape(-1, 784), y_train, epochs=10, batch_size=batch_size, validation_data=(x_test.reshape(-1, 784), y_test))
    histories[batch_size] = history.history




Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10




Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10




Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10




Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


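In the code above, we iterate over batch sizes 16, 32, 64, and 128 and train the same network for 10 epochs at each size. Note that this loop uses the plain network without batch normalization. Smaller batches mean more weight updates per epoch: on MNIST's 60,000 training images, batch size 16 gives 3,750 updates per epoch, while batch size 128 gives 469. Each of those updates is noisier, and an epoch usually takes longer in wall-clock time. To compare the runs, look at the per-epoch training and validation loss and accuracy that `fit` reports: how quickly they improve (convergence speed), how much they fluctuate between epochs (stability), and where validation accuracy ends up (overall accuracy).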
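
One way to make that comparison concrete is to reduce each run to a few numbers. The sketch below assumes the `History.history` dict returned by each `fit` call has been stored in a dict keyed by batch size (called `histories` here). The function name `summarize_histories` is hypothetical. It relies only on the `val_accuracy` key, which Keras records when `metrics=['accuracy']` and `validation_data` are both set.

```python
def summarize_histories(histories):
    """Return (batch_size, final_val_acc, best_val_acc, best_epoch) per run.

    `histories` maps a batch size to a Keras-style history dict, for example
    histories[batch_size] = model.fit(...).history
    """
    rows = []
    for batch_size, metrics in sorted(histories.items()):
        val_acc = metrics["val_accuracy"]
        # Index of the epoch with the highest validation accuracy (1-based)
        best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
        rows.append((batch_size, val_acc[-1], max(val_acc), best_epoch))
    return rows


def print_summary(histories):
    """Print one line per batch size for a quick side-by-side comparison."""
    for batch_size, final_acc, best_acc, best_epoch in summarize_histories(histories):
        print(f"batch_size={batch_size:>4}  final={final_acc:.4f}  "
              f"best={best_acc:.4f} (epoch {best_epoch})")
```

A run whose best epoch comes early and whose final accuracy sits below its best is a hint that training became unstable or started to overfit at that batch size.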

#### Advantages of batch normalization:

1) Improved convergence: Batch normalization stabilizes and speeds up training. It normalizes each layer's inputs over the current mini-batch to zero mean and unit variance, then applies a learnable scale and shift, so the distribution a layer sees changes less as earlier layers update. The original motivation was to reduce this "internal covariate shift". Whatever the exact mechanism, the practical effect is better-conditioned weight updates during backpropagation and faster convergence.

2) Reduced sensitivity to weight initialization: Because activations are rescaled at every normalized layer, the model's performance depends less on the initial values of the weights. Batch normalization also tolerates higher learning rates that would otherwise cause the network to diverge or stall.

3) Regularization effect: During training, each example is normalized with the statistics of whichever mini-batch it lands in, which injects a small amount of noise into the activations. This noise acts as a mild regularizer that can reduce overfitting and improve generalization.
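
The normalization step described in the list above can be sketched in a few lines of NumPy. This is a minimal illustration of the training-mode forward pass only, not the Keras implementation: it omits the moving averages used at inference time, and the function name `batch_norm_train` and the synthetic input are assumptions made for the example. The `eps` default of `1e-3` mirrors the default `epsilon` of the Keras `BatchNormalization` layer.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)                    # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature variance over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, roughly unit variance
    return gamma * x_hat + beta              # learnable scale (gamma) and shift (beta)


rng = np.random.default_rng(0)
# Synthetic activations: a mini-batch of 32 examples with 4 features each
activations = rng.normal(loc=5.0, scale=3.0, size=(32, 4))
gamma = np.ones(4)   # initial scale
beta = np.zeros(4)   # initial shift

normalized = batch_norm_train(activations, gamma, beta)
print("per-feature mean:", normalized.mean(axis=0))
print("per-feature variance:", normalized.var(axis=0))
```

With `gamma` at one and `beta` at zero, the output has per-feature mean near zero and variance just under one. Once training changes `gamma` and `beta`, the layer can recover any other scale and offset it needs.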



#### Limitations and considerations:

1) Increased computational cost: Batch normalization adds computation to both the forward and backward passes, plus two trainable parameters (scale and shift) per normalized feature, which slightly increases training time.

2) Dependency on batch size: Batch normalization estimates the mean and variance from each mini-batch, so its effectiveness varies with batch size. Very small batches (e.g., 1 or 2) give noisy statistics and may lead to unstable or inaccurate results. Reasonably sized mini-batches (e.g., 32 or 64) are generally recommended.

3) Influence on model performance: Batch normalization often improves model performance, but it is not always beneficial or necessary. For small networks or datasets it may provide no significant advantage and can even hinder performance. The first experiment is a case in point: on this two-hidden-layer MNIST network, test accuracy rose only from 0.9751 to 0.9764, a gap of about 0.13 percentage points from a single run.
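
The batch-size dependency in the list above comes from how noisy the per-batch statistics are. The sketch below illustrates this on synthetic data, not on MNIST activations: it draws many random batches from a fixed population and measures how much the batch mean varies at each batch size. The function name `spread_of_batch_means` and the population parameters are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for one feature's activations across a dataset
population = rng.normal(loc=2.0, scale=1.5, size=100_000)


def spread_of_batch_means(batch_size, n_batches=1000):
    """Standard deviation of the per-batch mean across many random batches."""
    batches = rng.choice(population, size=(n_batches, batch_size))
    return batches.mean(axis=1).std()


for batch_size in [2, 16, 32, 64, 128]:
    print(f"batch_size={batch_size:>3}  "
          f"spread of batch mean={spread_of_batch_means(batch_size):.3f}")

# With a single example the batch variance is exactly zero, so the
# normalized activation collapses to zero whatever the input value was.
single = np.array([[3.7]])
normalized = (single - single.mean(axis=0)) / np.sqrt(single.var(axis=0) + 1e-3)
print("batch of one normalizes to:", normalized)
```

The spread shrinks roughly with the square root of the batch size, which is why batches of 32 or 64 give much steadier normalization statistics than batches of 1 or 2.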