## Objective: Assess students' understanding of batch normalization in artificial neural networks (ANNs) and its impact on training performance.

In [None]:
Q1. Theory and Concepts
1. Explain the concept of batch normalization in the context of Artificial Neural Networks
2. Describe the benefits of using batch normalization during training
3. Discuss the working principle of batch normalization, including the normalization step and the learnable 
parameters.

1. Batch normalization (Batch Norm, BN) is a normalization technique applied between the layers of a neural network rather than to the raw input data. Its statistics are computed over each mini-batch instead of over the full dataset.
It makes the training of deep neural networks (DNNs) faster and more stable, and it allows higher learning rates.

2. Batch normalization allows much higher learning rates, which speeds up training. It also makes the network less sensitive to weight initialization, which is difficult to get right and becomes harder as networks get deeper.

3. It consists of normalizing activation vectors from hidden layers using the first and the second statistical moments (mean and variance) of the current batch. This normalization step is applied right before (or right after) the nonlinear function.
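
The normalization step can be written out for a single unit as follows. This is a sketch under assumed notation that does not appear in the original text: $m$ is the mini-batch size, $x_i$ is the unit's activation for example $i$, and $\epsilon$ is a small constant added for numerical stability. $\gamma$ and $\beta$ are the layer's learnable scale and shift parameters.

```latex
\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i
\qquad
\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2

\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
\qquad
y_i = \gamma \, \hat{x}_i + \beta
```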


Batch Norm is just another network layer that gets inserted between a hidden layer and the next hidden layer. Its job is to take the outputs from the first hidden layer and normalize them before passing them on as the input of the next hidden layer.

Just like the parameters (e.g., weights, bias) of any network layer, a Batch Norm layer also has parameters of its own:

Two learnable parameters called beta and gamma.
Two non-learnable parameters (Mean Moving Average and Variance Moving Average), which are saved as part of the ‘state’ of the Batch Norm layer.


These parameters are per Batch Norm layer. So if we have, say, three hidden layers and three Batch Norm layers in the network, we would have three sets of learnable beta and gamma parameters, one set per Batch Norm layer. The same holds for the Moving Average parameters.
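
As a rough illustration of the per-layer parameters, the sketch below counts them for the two Batch Norm layers used later in this notebook, which follow Dense layers of width 300 and 100. It assumes each of the four parameters is a vector with one entry per feature, as in the default Keras `BatchNormalization` layer. The helper name `batch_norm_param_counts` is made up for this example.

```python
def batch_norm_param_counts(num_features):
    """Return (trainable, non_trainable) parameter counts for one Batch Norm layer.

    Assumes one gamma, beta, moving mean and moving variance value per feature.
    """
    trainable = 2 * num_features      # gamma and beta
    non_trainable = 2 * num_features  # moving mean and moving variance
    return trainable, non_trainable


# Widths of the Dense layers that feed the two Batch Norm layers in this notebook.
bn_layer_widths = [300, 100]

counts = [batch_norm_param_counts(n) for n in bn_layer_widths]
total_trainable = sum(t for t, _ in counts)
total_non_trainable = sum(n for _, n in counts)

print(counts)                                # [(600, 600), (200, 200)]
print(total_trainable, total_non_trainable)  # 800 800
```

These totals should match the Batch Norm rows reported by `model_clf.summary()` for the network defined below.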

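During training, we feed the network one mini-batch of data at a time. During the forward pass, each layer of the network processes that mini-batch of data. The Batch Norm layer processes its data as follows: it computes the mean and variance of each feature over the mini-batch, normalizes the activations with those statistics, scales and shifts the result with gamma and beta, and updates the moving averages of the mean and variance for use at inference time.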
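
A minimal NumPy sketch of this forward pass is given below. It is an illustration, not the Keras implementation. The function name `batch_norm_forward` is hypothetical, and the `momentum=0.99` and `eps=1e-3` values are assumptions chosen to mirror the Keras defaults.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, moving_mean, moving_var,
                       training=True, momentum=0.99, eps=1e-3):
    """Batch Norm over a (batch, features) array.

    Returns the output together with the (possibly updated) moving statistics.
    """
    if training:
        # Statistics of the current mini-batch, one value per feature.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Update the non-learnable moving averages kept as layer state.
        moving_mean = momentum * moving_mean + (1 - momentum) * mean
        moving_var = momentum * moving_var + (1 - momentum) * var
    else:
        # At inference time the stored moving averages are used instead.
        mean, var = moving_mean, moving_var

    x_hat = (x - mean) / np.sqrt(var + eps)  # normalization step
    out = gamma * x_hat + beta               # learnable scale and shift
    return out, moving_mean, moving_var


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # one mini-batch of 32 examples

gamma, beta = np.ones(4), np.zeros(4)             # usual initial values
moving_mean, moving_var = np.zeros(4), np.ones(4)

out, moving_mean, moving_var = batch_norm_forward(
    x, gamma, beta, moving_mean, moving_var, training=True)

print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```

With `training=False` the same function normalizes with the stored moving averages, which is why those two parameters are saved as part of the layer state even though they are not learned by gradient descent.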

In [None]:
Q2. Implementation
1. Choose a dataset of your choice (e.g., MNIST, CIFAR-10) and preprocess it
2. Implement a simple feedforward neural network using any deep learning framework/library (e.g., 
TensorFlow, PyTorch)
3. Train the neural network on the chosen dataset without using batch normalization
4. Implement batch normalization layers in the neural network and train the model again
5. Compare the training and validation performance (e.g., accuracy, loss) between the models with and 
without batch normalization
6. Discuss the impact of batch normalization on the training process and the performance of the neural 
network.


## With batch normalization (batch size 32)

In [7]:
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

mnist=tf.keras.datasets.mnist
(X_train_full, y_train_full), (X_test, y_test) = mnist.load_data()

X_valid,X_train=X_train_full[:5000]/255., X_train_full[5000:]/255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# scale the test set as well
X_test = X_test / 255.

# Creating layers of ANN
LAYERS=[tf.keras.layers.Flatten(input_shape=[28, 28], name="inputLayer"),
        tf.keras.layers.Dense(300, activation="relu", name="hiddenLayer1"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(100, activation="relu", name="hiddenLayer2"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(10, activation="softmax", name="outputLayer")]

model_clf=tf.keras.models.Sequential(LAYERS)

LOSS_FUNCTION = "sparse_categorical_crossentropy" # use => tf.losses.sparse_categorical_crossentropy
OPTIMIZER = "SGD" # or use with custom learning rate=> tf.keras.optimizers.SGD(0.02)
METRICS = ["accuracy"]


model_clf.compile(loss=LOSS_FUNCTION,
              optimizer=OPTIMIZER,
              metrics=METRICS)



EPOCHS = 10
VALIDATION_SET = (X_valid, y_valid)

history=model_clf.fit(X_train,y_train,epochs=EPOCHS,validation_data=VALIDATION_SET,batch_size=32)

pd.DataFrame(history.history)



| epoch | loss | accuracy | val_loss | val_accuracy |
|---|---|---|---|---|
| 0 | 0.339461 | 0.901236 | 0.152341 | 0.957 |
| 1 | 0.168787 | 0.9514 | 0.112844 | 0.9658 |
| 2 | 0.124734 | 0.963855 | 0.099177 | 0.9704 |
| 3 | 0.098705 | 0.971364 | 0.092308 | 0.9702 |
| 4 | 0.082098 | 0.976055 | 0.085137 | 0.974 |
| 5 | 0.067097 | 0.9806 | 0.08181 | 0.9738 |
| 6 | 0.058196 | 0.983182 | 0.081604 | 0.9768 |
| 7 | 0.04994 | 0.985727 | 0.077088 | 0.9768 |
| 8 | 0.042007 | 0.988509 | 0.078856 | 0.975 |
| 9 | 0.038037 | 0.989345 | 0.077891 | 0.9772 |


## Without batch normalization

import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

mnist=tf.keras.datasets.mnist
(X_train_full, y_train_full), (X_test, y_test) = mnist.load_data()

X_valid,X_train=X_train_full[:5000]/255., X_train_full[5000:]/255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# scale the test set as well
X_test = X_test / 255.

# Creating layers of ANN
LAYERS=[tf.keras.layers.Flatten(input_shape=[28, 28], name="inputLayer"),
        tf.keras.layers.Dense(300, activation="relu", name="hiddenLayer1"),
        tf.keras.layers.Dense(100, activation="relu", name="hiddenLayer2"),
        tf.keras.layers.Dense(10, activation="softmax", name="outputLayer")]

model_clf=tf.keras.models.Sequential(LAYERS)

LOSS_FUNCTION = "sparse_categorical_crossentropy" # use => tf.losses.sparse_categorical_crossentropy
OPTIMIZER = "SGD" # or use with custom learning rate=> tf.keras.optimizers.SGD(0.02)
METRICS = ["accuracy"]


model_clf.compile(loss=LOSS_FUNCTION,
              optimizer=OPTIMIZER,
              metrics=METRICS)



EPOCHS = 10
VALIDATION_SET = (X_valid, y_valid)

history=model_clf.fit(X_train,y_train,epochs=EPOCHS,validation_data=VALIDATION_SET,batch_size=32)

pd.DataFrame(history.history)

## With batch normalization, accuracy increased from 0.969 to 0.989 (training data) and from 0.967 to 0.977 (validation data).

## With batch normalization, loss decreased from 0.108 to 0.038 (training data) and from 0.111 to 0.078 (validation data).
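
The size of these changes can be worked out with a few lines of arithmetic. In the sketch below, the with-batch-normalization values are copied from the final row of the batch-size-32 table above. The without-batch-normalization values are the rounded figures quoted in the text, since that run's output table is not shown. The dictionary names are made up for this example.

```python
# Final metrics without batch normalization (rounded values quoted in the text).
without_bn = {"accuracy": 0.969, "val_accuracy": 0.967,
              "loss": 0.108, "val_loss": 0.111}

# Final metrics with batch normalization (last row of the batch-size-32 table).
with_bn = {"accuracy": 0.989345, "val_accuracy": 0.9772,
           "loss": 0.038037, "val_loss": 0.077891}

# Absolute change per metric: positive is better for accuracy, negative for loss.
change = {name: with_bn[name] - without_bn[name] for name in without_bn}
for name, delta in change.items():
    print(f"{name:>12}: {without_bn[name]:.3f} -> {with_bn[name]:.3f} ({delta:+.3f})")

# Relative reduction of the validation error rate (1 - accuracy).
val_error_without = 1 - without_bn["val_accuracy"]
val_error_with = 1 - with_bn["val_accuracy"]
relative_reduction = (val_error_without - val_error_with) / val_error_without
print(f"validation error reduced by {relative_reduction:.1%}")
```

Both runs are single training runs, so differences of this size should be read as indicative rather than conclusive.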

In [None]:
Q3. Experimentation and Analysis
1. Experiment with different batch sizes and observe the effect on the training dynamics and model 
performance
2. Discuss the advantages and potential limitations of batch normalization in improving the training of 
neural networks.
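
One thing that changes with the batch size is the number of weight updates per epoch, which is worth keeping in mind when comparing runs trained for the same number of epochs. The sketch below computes it for the split used in this notebook (60,000 MNIST training images minus the 5,000 held out for validation). The variable names are made up for this example.

```python
import math

# Training examples left after holding out the first 5,000 images for validation.
num_train = 60000 - 5000

# Each epoch performs one gradient update per mini-batch; the last batch may be smaller.
steps_per_epoch = {batch_size: math.ceil(num_train / batch_size)
                   for batch_size in (16, 32, 64)}

print(steps_per_epoch)  # {16: 3438, 32: 1719, 64: 860}
```

With 10 epochs in every run, batch size 16 therefore performs about four times as many updates as batch size 64, and each of its batch statistics is estimated from fewer examples.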

## With batch normalization (batch size 64)

In [9]:
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

mnist=tf.keras.datasets.mnist
(X_train_full, y_train_full), (X_test, y_test) = mnist.load_data()

X_valid,X_train=X_train_full[:5000]/255., X_train_full[5000:]/255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# scale the test set as well
X_test = X_test / 255.

# Creating layers of ANN
LAYERS=[tf.keras.layers.Flatten(input_shape=[28, 28], name="inputLayer"),
        tf.keras.layers.Dense(300, activation="relu", name="hiddenLayer1"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(100, activation="relu", name="hiddenLayer2"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(10, activation="softmax", name="outputLayer")]

model_clf=tf.keras.models.Sequential(LAYERS)

LOSS_FUNCTION = "sparse_categorical_crossentropy" # use => tf.losses.sparse_categorical_crossentropy
OPTIMIZER = "SGD" # or use with custom learning rate=> tf.keras.optimizers.SGD(0.02)
METRICS = ["accuracy"]


model_clf.compile(loss=LOSS_FUNCTION,
              optimizer=OPTIMIZER,
              metrics=METRICS)



EPOCHS = 10
VALIDATION_SET = (X_valid, y_valid)

history=model_clf.fit(X_train,y_train,epochs=EPOCHS,validation_data=VALIDATION_SET,batch_size=64)

pd.DataFrame(history.history)



| epoch | loss | accuracy | val_loss | val_accuracy |
|---|---|---|---|---|
| 0 | 0.400043 | 0.882927 | 0.205926 | 0.9436 |
| 1 | 0.194065 | 0.946345 | 0.155528 | 0.9548 |
| 2 | 0.147613 | 0.9578 | 0.130193 | 0.9616 |
| 3 | 0.118389 | 0.966382 | 0.116655 | 0.9644 |
| 4 | 0.099855 | 0.971891 | 0.106387 | 0.968 |
| 5 | 0.084659 | 0.976491 | 0.099388 | 0.9686 |
| 6 | 0.07319 | 0.980182 | 0.094548 | 0.971 |
| 7 | 0.064344 | 0.983345 | 0.091589 | 0.971 |
| 8 | 0.056262 | 0.985327 | 0.08727 | 0.9734 |
| 9 | 0.049713 | 0.986945 | 0.086281 | 0.974 |


## With batch normalization (batch size 16)

In [10]:
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

mnist=tf.keras.datasets.mnist
(X_train_full, y_train_full), (X_test, y_test) = mnist.load_data()

X_valid,X_train=X_train_full[:5000]/255., X_train_full[5000:]/255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# scale the test set as well
X_test = X_test / 255.

# Creating layers of ANN
LAYERS=[tf.keras.layers.Flatten(input_shape=[28, 28], name="inputLayer"),
        tf.keras.layers.Dense(300, activation="relu", name="hiddenLayer1"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(100, activation="relu", name="hiddenLayer2"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(10, activation="softmax", name="outputLayer")]

model_clf=tf.keras.models.Sequential(LAYERS)

LOSS_FUNCTION = "sparse_categorical_crossentropy" # use => tf.losses.sparse_categorical_crossentropy
OPTIMIZER = "SGD" # or use with custom learning rate=> tf.keras.optimizers.SGD(0.02)
METRICS = ["accuracy"]


model_clf.compile(loss=LOSS_FUNCTION,
              optimizer=OPTIMIZER,
              metrics=METRICS)



EPOCHS = 10
VALIDATION_SET = (X_valid, y_valid)

history=model_clf.fit(X_train,y_train,epochs=EPOCHS,validation_data=VALIDATION_SET,batch_size=16)

pd.DataFrame(history.history)



| epoch | loss | accuracy | val_loss | val_accuracy |
|---|---|---|---|---|
| 0 | 0.311245 | 0.906182 | 0.133507 | 0.964 |
| 1 | 0.163277 | 0.951382 | 0.106187 | 0.9696 |
| 2 | 0.124562 | 0.962727 | 0.085798 | 0.975 |
| 3 | 0.100842 | 0.969 | 0.077413 | 0.978 |
| 4 | 0.08601 | 0.973964 | 0.07463 | 0.9794 |
| 5 | 0.076302 | 0.976636 | 0.069615 | 0.9814 |
| 6 | 0.066287 | 0.979545 | 0.064046 | 0.9812 |
| 7 | 0.06052 | 0.981127 | 0.06672 | 0.982 |
| 8 | 0.051045 | 0.983764 | 0.069149 | 0.9816 |
| 9 | 0.047922 | 0.984927 | 0.0724 | 0.9794 |


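## Conclusion:
    1. Batch size 32 gave the best training results (final training accuracy 0.989, versus 0.987 for batch size 64 and 0.985 for batch size 16), and it also beat batch size 64 on validation. Batch size 16 reached a slightly higher final validation accuracy (0.979 versus 0.977).
    2. Batch size should therefore be treated as a hyperparameter and tuned to find the optimum value.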
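
Treating batch size as a hyperparameter amounts to picking the value that scores best on a chosen metric. The sketch below does this for the three runs above, using the final-epoch values copied from their result tables. The dictionary and variable names are made up for this example, and because each batch size was trained only once, the ranking is tentative.

```python
# Final-epoch metrics for each batch size, copied from the result tables above.
final_epoch = {
    16: {"accuracy": 0.984927, "val_accuracy": 0.9794, "val_loss": 0.0724},
    32: {"accuracy": 0.989345, "val_accuracy": 0.9772, "val_loss": 0.077891},
    64: {"accuracy": 0.986945, "val_accuracy": 0.974, "val_loss": 0.086281},
}

# The preferred batch size depends on which metric is used for selection.
best_by_train_acc = max(final_epoch, key=lambda bs: final_epoch[bs]["accuracy"])
best_by_val_acc = max(final_epoch, key=lambda bs: final_epoch[bs]["val_accuracy"])
best_by_val_loss = min(final_epoch, key=lambda bs: final_epoch[bs]["val_loss"])

print("best by training accuracy:  ", best_by_train_acc)  # 32
print("best by validation accuracy:", best_by_val_acc)    # 16
print("best by validation loss:    ", best_by_val_loss)   # 16
```

Selecting on validation metrics rather than training metrics is the usual choice for hyperparameter tuning, since the goal is performance on data the model has not been fitted to.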