
In [7]:
import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))
print(tf.config.list_physical_devices('CPU'))

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]


In [8]:
mnist = tf.keras.datasets.mnist

(x_train,y_train), (x_test,y_test) = mnist.load_data()

In [9]:
class MNISTmodel(tf.keras.Model):
  def __init__(self, num_classes=10):
    super(MNISTmodel, self).__init__()
    self.flatten = tf.keras.layers.Flatten()
    self.dense = tf.keras.layers.Dense(512, activation="relu", name="d1")
    self.drop = tf.keras.layers.Dropout(0.2)
    self.prediction = tf.keras.layers.Dense(num_classes, activation="softmax", name="d2")

  def call(self, inputs):
    x = self.flatten(inputs)
    x = self.dense(x)
    x = self.drop(x)
    return self.prediction(x)

mnist_model = MNISTmodel()

batch_size = 32
steps_per_epoch = len(x_train) // batch_size  # number of full batches per epoch
print(steps_per_epoch)
mnist_model.compile(optimizer=tf.keras.optimizers.Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
mnist_model.fit(x_train, y_train, batch_size=batch_size, epochs=10)

mnist_model.evaluate(x_test, y_test)

1875
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[0.3098161518573761, 0.9469000101089478]

In [12]:
class MNISTmodel(tf.keras.Model):
  def __init__(self, num_classes=10):
    super(MNISTmodel, self).__init__()
    self.flatten = tf.keras.layers.Flatten()
    self.dense = tf.keras.layers.Dense(512, activation="relu", name="d1")
    self.drop = tf.keras.layers.Dropout(0.2)
    # Batch normalization layer, stored as an attribute so call() can apply it.
    self.batch_norm = tf.keras.layers.BatchNormalization()
    self.prediction = tf.keras.layers.Dense(num_classes, activation="softmax", name="d2")

  def call(self, inputs, training=None):
    x = self.flatten(inputs)
    x = self.dense(x)
    x = self.drop(x, training=training)
    # Normalize the hidden activations before the output layer.
    x = self.batch_norm(x, training=training)
    return self.prediction(x)

batch_sizes = [32, 64, 128, 256]

for batch in batch_sizes:
  # Build and compile a fresh model for each batch size so that every run
  # starts from newly initialized weights and the runs are comparable.
  mnist_model = MNISTmodel()
  steps_per_epoch = len(x_train) // batch
  print(steps_per_epoch)
  mnist_model.compile(optimizer=tf.keras.optimizers.Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  mnist_model.fit(x_train, y_train, batch_size=batch, epochs=5)

  mnist_model.evaluate(x_test, y_test)

1875
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
937
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
468
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
234
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


### **Batch normalization helps stabilize and accelerate the training of neural networks. It normalizes the activations of a layer using the mean and variance of each mini-batch, then rescales them with learned parameters. It was originally motivated as a way to reduce internal covariate shift, and in practice it often allows higher learning rates and faster convergence. This can lead to faster training, better generalization, and potentially improved performance on validation and test datasets.**
### **The model with batch normalization may show better validation accuracy and faster convergence than the model without batch normalization.**
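
The normalization step described above can be illustrated with a minimal NumPy sketch. It is not the Keras implementation; the function name `batch_norm_forward` and the synthetic activations are assumptions made for illustration, and `gamma` and `beta` are fixed at 1 and 0 instead of being learned.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                        # per-feature (biased) batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero mean, unit variance per feature
    return gamma * x_hat + beta                # learned rescaling (fixed here)

rng = np.random.default_rng(0)
# Hypothetical activations: 32 samples, 4 features, far from zero mean and unit variance.
activations = rng.normal(loc=5.0, scale=3.0, size=(32, 4))
normalized = batch_norm_forward(activations, gamma=np.ones(4), beta=np.zeros(4))

print(normalized.mean(axis=0))  # close to 0 for every feature
print(normalized.std(axis=0))   # close to 1 for every feature
```

Whatever the scale of the incoming activations, the next layer sees inputs with roughly zero mean and unit variance, which is the property the text credits for more stable training.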

Effect of Batch Size on Training Dynamics and Performance:
When using batch normalization, the choice of batch size can influence the training process:

Smaller Batch Sizes:

With smaller batch sizes, the noise in the gradient estimates is higher, which can lead to more erratic updates to model parameters.
Batch normalization does not remove this noise: the per-batch mean and variance are themselves estimated from fewer samples, so the normalization becomes noisier as the batch shrinks.
Training with smaller batch sizes requires more iterations per epoch (1875 at batch size 32 versus 234 at batch size 256 in the runs above), and each update is based on fewer examples.

Larger Batch Sizes:

Larger batch sizes provide more stable gradient estimates and batch statistics, and might lead to smoother convergence.
However, very large batch sizes mean fewer updates per epoch and have been observed to generalize worse in some settings, in part because the noise that acts as a mild regularizer is reduced.
Batch normalization can still help stabilize training by ensuring that activations remain centered and scaled.

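
To make the batch-size effect concrete, the following sketch measures how much the per-batch mean fluctuates for the four batch sizes used in the notebook. The data is a synthetic standard-normal stand-in for a single activation, not MNIST, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for one activation across 60,000 training samples.
population = rng.normal(loc=0.0, scale=1.0, size=60_000)

spread = {}
for batch in [32, 64, 128, 256]:
    n_batches = len(population) // batch
    batches = population[: n_batches * batch].reshape(n_batches, batch)
    # Standard deviation of the per-batch means: how much the statistic that
    # batch normalization subtracts varies from one batch to the next.
    spread[batch] = float(batches.mean(axis=1).std())
    print(batch, n_batches, round(spread[batch], 4))
```

The spread shrinks roughly in proportion to one over the square root of the batch size, so the statistics used at batch size 32 are noticeably noisier than those at 256.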
Advantages of Batch Normalization:

Faster Convergence: Batch normalization often speeds up convergence. It keeps the distribution of each layer's inputs stable during training (the original motivation, described as reducing internal covariate shift), which allows higher learning rates to be used without divergence.

Stabilized Gradients: Batch normalization keeps activations in a range where gradients are well-behaved, which helps mitigate vanishing and exploding gradients. This is especially important in deep networks.

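Regularization Effect: Batch normalization introduces a slight regularization effect, because each sample is normalized with statistics that depend on the other samples in its batch. This noise can help prevent overfitting to the training data.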

Reduced Dependency on Initialization: Batch normalization reduces the sensitivity of the model to the initial parameter values, making it less dependent on careful weight initialization.
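
The reduced sensitivity to initialization can be sketched as follows: normalizing the output of a linear layer cancels any constant rescaling of that layer's weights. The helper `normalize`, the layer shapes, and the factor of 10 are assumptions chosen for illustration.

```python
import numpy as np

def normalize(x, eps=1e-5):
    """Batch-normalize without the learned scale and shift."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
inputs = rng.normal(size=(64, 8))     # hypothetical batch of 64 samples
weights = rng.normal(size=(8, 16))    # hypothetical initial weights

pre_activations = inputs @ weights
scaled_pre_activations = inputs @ (10.0 * weights)  # same layer, 10x larger weights

max_diff = np.abs(normalize(pre_activations) - normalize(scaled_pre_activations)).max()
print(max_diff)  # very small: the normalized outputs nearly coincide
```

The two results differ only through the small `eps` term, so the scale of the initial weights has little influence on what the following layer receives.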

Potential Limitations of Batch Normalization:

Batch Size Dependency: Batch normalization relies on the mean and variance of each batch. With extremely small batch sizes these estimates are noisy, which can result in unstable training dynamics.

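Inference Complexity: During training, batch normalization computes the mean and variance of each batch. At inference it instead uses running averages of those statistics accumulated during training, so predictions do not depend on the composition of the batch and the added cost is small. The layer must be run in the correct mode, however, and the running averages can be inaccurate if they were estimated from very small or unrepresentative batches.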

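Normalization Conflicts: Batch normalization can interact poorly with other techniques. For example, combining it with dropout can change the variance of activations between training and inference, and stacking it with another normalization layer on the same activations is usually redundant. Normalizing the input data does not replace it for hidden layers, though the benefit may be smaller in shallow networks.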

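Mode Collapse in Generative Models: In some generative models, such as GANs, batch normalization makes the output for each sample depend on the other samples in the batch. This can introduce correlated artifacts and has been associated with training instabilities such as mode collapse, where the generator focuses on a subset of data modes.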

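Hyperparameter Tuning: The scale and shift parameters (gamma and beta) are learned during training and do not need manual tuning. The hyperparameters that batch normalization does introduce are the momentum of the running averages and the epsilon added to the variance, and their default values often work well in practice.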
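
The difference between training and inference behavior, and between learned parameters and hyperparameters, can be sketched with a toy layer. The class `SimpleBatchNorm` is hypothetical and written for illustration only. Its default `momentum` and `eps` mirror the defaults of `tf.keras.layers.BatchNormalization`, while `gamma` and `beta` are kept fixed here, since no optimizer is involved.

```python
import numpy as np

class SimpleBatchNorm:
    """Toy batch normalization for inputs of shape (batch, features)."""

    def __init__(self, num_features, momentum=0.99, eps=1e-3):
        # Hyperparameters chosen by the user.
        self.momentum = momentum
        self.eps = eps
        # Parameters an optimizer would learn; fixed in this sketch.
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        # Running statistics, updated in training and reused at inference.
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)

    def __call__(self, x, training):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta

rng = np.random.default_rng(0)
bn = SimpleBatchNorm(num_features=3, momentum=0.9)
for _ in range(200):  # simulated training steps on synthetic activations
    bn(rng.normal(loc=2.0, scale=0.5, size=(32, 3)), training=True)

single_example = np.full((1, 3), 2.0)
print(bn.moving_mean)                      # close to 2.0 for every feature
print(bn(single_example, training=False))  # close to 0, no batch statistics needed
```

At inference the layer normalizes a single example using the stored running averages, which is why no batch is required and why the extra cost is limited to one subtraction, division, scale, and shift per feature.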