## Q1 Theory and Concepts



<b>q1 Explain the concept of batch normalization in the context of artificial neural networks.</b>

Batch Normalization (BN) is a technique used in artificial neural networks to improve the training stability and convergence speed of deep networks. It was proposed to address internal covariate shift, the change in the distribution of a layer's inputs as the parameters of earlier layers are updated during training. Batch normalization works by normalizing each layer's inputs using statistics computed over the current mini-batch of training examples.


<b>q2 Describe the benefits of using batch normalization during training.</b>

Faster Convergence: Batch normalization helps mitigate the vanishing gradient problem and allows for faster convergence. It stabilizes training by keeping activations within a reasonable range and ensuring that gradients don't become too small, leading to more effective weight updates.

Higher Learning Rates: Batch normalization allows for the use of higher learning rates in training, which can speed up convergence. Higher learning rates might otherwise cause training to become unstable, but batch normalization helps mitigate this issue.

Reduced Dependency on Initialization: Batch normalization reduces the sensitivity of the network to the choice of weight initialization, making it easier to train deep networks.

Regularization Effect: Batch normalization adds a slight regularization effect by introducing randomness through mini-batch statistics. This can help prevent overfitting to the training data.

Independence from Input Scaling: Batch normalization helps make the network less dependent on the input scaling, which can be especially useful when dealing with data of different scales.
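The input-scaling point can be checked numerically. The sketch below is a NumPy illustration rather than Keras code: the helper `normalize` and the small constant `eps` are assumptions made for the example. It normalizes the same mini-batch at two different input scales and compares the results.

```python
import numpy as np

def normalize(x, eps=1e-5):
    """Per-feature normalization over the batch axis (no gamma/beta)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))   # 64 examples, 3 features

# The same data at its original scale and multiplied by 100
same = np.allclose(normalize(x), normalize(100.0 * x), atol=1e-3)
print(same)   # the factor of 100 is absorbed by the batch statistics
```

The two outputs agree up to the tiny effect of `eps`, which is why a layer that follows batch normalization sees roughly the same values regardless of how its raw inputs were scaled.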

<b>q3 Discuss the working principle of batch normalization, including the normalization step and the learnable parameters.</b>

Batch Statistics: During the forward pass, for each mini-batch of training examples, the mean and standard deviation of the activations (output values) are calculated separately for each feature (neuron) in a layer.

Normalization Step: The activations are normalized using the calculated mean and standard deviation. This ensures that the activations have a mean close to 0 and a standard deviation close to 1. This step helps mitigate the internal covariate shift problem, leading to more stable and faster training.

Scaling and Shifting: The normalized activations are then scaled by a learnable parameter (γ) and shifted by another learnable parameter (β). These parameters allow the network to learn the optimal scale and shift for the normalized activations, giving the model the flexibility to undo the normalization if necessary.

Learnable Parameters: In addition to the weights and biases of the neural network, batch normalization introduces two new learnable parameters (γ and β) per feature for each layer. These parameters are learned during training through backpropagation and gradient descent.
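The steps above can be sketched in NumPy for a single layer. This is a minimal illustration of the training-time forward pass only, not the Keras implementation; the function name `batch_norm_forward` and the constant `eps` (added to the variance for numerical safety) are assumptions made for the example.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization for a (batch, features) array."""
    mean = x.mean(axis=0)                     # per-feature mean over the mini-batch
    var = x.var(axis=0)                       # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalization step
    return gamma * x_hat + beta               # scaling (gamma) and shifting (beta)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))   # mini-batch: 32 examples, 4 features

# gamma = 1 and beta = 0 leave the normalized activations unchanged
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3))   # close to 0 for every feature
print(y.std(axis=0).round(3))    # close to 1 for every feature

# gamma = batch std and beta = batch mean approximately undo the normalization
y_undo = batch_norm_forward(x, gamma=x.std(axis=0), beta=x.mean(axis=0))
print(np.allclose(y_undo, x, atol=1e-3))
```

The last check illustrates the "undo" property: with suitable γ and β the layer can reproduce its un-normalized input, so adding batch normalization does not reduce what the network can represent.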

## Q2 Implementation

In [17]:
import tensorflow
import time
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten, BatchNormalization

In [18]:
(X_train,y_train),(X_test,y_test) = keras.datasets.mnist.load_data()

In [19]:
# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train / 255
X_test = X_test / 255



## Model without BatchNormalization

In [20]:
model_Without = Sequential()
model_Without.add(Flatten(input_shape=(28, 28)))    # 28x28 image -> 784-vector
model_Without.add(Dense(128, activation='relu'))
model_Without.add(Dense(32, activation='relu'))
model_Without.add(Dense(10, activation='softmax'))  # one output per digit class

In [21]:
model_Without.compile(loss='sparse_categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])

In [22]:
start = time.time()

model_Without.fit(X_train,y_train,epochs=10,validation_split=0.2)

end = time.time()

print(f"run time of program is {end - start}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
run time of program is 74.04018688201904


In [23]:
y_prob = model_Without.predict(X_test)
y_pred = y_prob.argmax(axis=1)



In [24]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)

0.9751

## Model with BatchNormalization

In [32]:
model = Sequential()

# Same architecture as above, with a BatchNormalization layer before each Dense layer
model.add(Flatten(input_shape=(28, 28)))
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))

In [26]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_3 (Flatten)         (None, 784)               0         
                                                                 
 batch_normalization_3 (Bat  (None, 784)               3136      
 chNormalization)                                                
                                                                 
 dense_9 (Dense)             (None, 128)               100480    
                                                                 
 batch_normalization_4 (Bat  (None, 128)               512       
 chNormalization)                                                
                                                                 
 dense_10 (Dense)            (None, 32)                4128      
                                                                 
 batch_normalization_5 (Bat  (None, 32)               
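The parameter counts in the summary can be reproduced by hand. The arithmetic below assumes that a Keras `BatchNormalization` layer stores four values per feature (γ, β, and a moving mean and moving variance that are tracked but not trained) and that a `Dense` layer has one weight per input-output pair plus one bias per unit. The helper names are made up for this sketch.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def batch_norm_params(n_features):
    """gamma, beta, moving mean, moving variance: four values per feature."""
    return 4 * n_features

print(batch_norm_params(784))    # 3136   (batch_normalization_3)
print(dense_params(784, 128))    # 100480 (dense_9)
print(batch_norm_params(128))    # 512    (batch_normalization_4)
print(dense_params(128, 32))     # 4128   (dense_10)
```

Only γ and β are updated by gradient descent, so half of each batch normalization layer's parameters are trainable.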

In [27]:
model.compile(loss='sparse_categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])

In [28]:
start = time.time()

history = model.fit(X_train,y_train,epochs=10,validation_split=0.2)


end = time.time()

print(f"run time of program is {end - start}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
run time of program is 91.82681035995483


In [29]:
y_prob = model.predict(X_test)
y_pred = y_prob.argmax(axis=1)



In [30]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)

0.9698

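Batch Normalization is a technique commonly used in training neural networks to improve convergence speed, mitigate the vanishing/exploding gradient problem, and enhance generalization. It operates by normalizing the input of each layer over a mini-batch of training data, which helps stabilize and accelerate training by keeping the distribution of inputs consistent throughout the network. In this particular 10-epoch MNIST run, however, the model without batch normalization scored slightly higher on the test set (0.9751 vs 0.9698) and trained faster (about 74 s vs about 92 s), so the expected benefits did not show up on this small, shallow network.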

## Q3 Experimentation and Analysis

<b>1. Experiment with different batch sizes and observe the effect on the training dynamics and model performance.</b>

In [45]:
# Experiment label: batch size 3

In [37]:
# BatchNormalization takes no batch-size setting; the mini-batch size used in
# training is the batch_size argument of model.fit (32 when omitted).
model = Sequential()

model.add(Flatten(input_shape=(28, 28)))
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))

In [38]:
model.compile(loss='sparse_categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
start = time.time()

history = model.fit(X_train,y_train,epochs=10,validation_split=0.2)


end = time.time()

print(f"run time of program is {end - start}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
run time of program is 90.17563509941101


In [39]:
y_prob = model.predict(X_test)
y_pred = y_prob.argmax(axis=1)



In [40]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)

0.9725

In [41]:
# BatchNormalization takes no batch-size setting; the mini-batch size used in
# training is the batch_size argument of model.fit (32 when omitted).
model = Sequential()

model.add(Flatten(input_shape=(28, 28)))
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))

In [None]:
# Experiment label: batch size 8

In [42]:
model.compile(loss='sparse_categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
start = time.time()

history = model.fit(X_train,y_train,epochs=10,validation_split=0.2)


end = time.time()

print(f"run time of program is {end - start}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
run time of program is 90.07444047927856


In [43]:
y_prob = model.predict(X_test)
y_pred = y_prob.argmax(axis=1)



In [44]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)

0.9719

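<b>Observation: the two runs reach test accuracies of 0.9725 and 0.9719, a difference too small to attribute to batch size. Neither `fit` call passes `batch_size`, so both runs trained with the Keras default mini-batch size of 32. A real comparison needs the batch size to be set in `model.fit(..., batch_size=...)`.</b>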
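In Keras the mini-batch size is an argument of `model.fit` (`batch_size`, 32 when omitted), not of the `BatchNormalization` layer. One way the comparison could be set up is sketched below. `steps_per_epoch` and `fit_with_batch_size` are hypothetical helpers, and `n_train = 48_000` assumes the 60,000 MNIST training images minus the 20% validation split used in this notebook.

```python
import math

def steps_per_epoch(n_train, batch_size):
    """Number of gradient updates in one pass over the training data."""
    return math.ceil(n_train / batch_size)

def fit_with_batch_size(model, X, y, batch_size, epochs=10, validation_split=0.2):
    """Hypothetical helper: train a compiled Keras model with an explicit mini-batch size."""
    return model.fit(X, y, epochs=epochs,
                     validation_split=validation_split,
                     batch_size=batch_size)

n_train = 48_000   # assumption: 60,000 MNIST images minus the 20% validation split
for bs in (3, 8, 32):
    print(bs, steps_per_epoch(n_train, bs))   # smaller batches mean more updates per epoch
```

With a freshly built and compiled model for each setting, `fit_with_batch_size(model, X_train, y_train, batch_size=3)` would run the batch-size-3 experiment. Small batches should be expected to take noticeably longer per epoch, since the number of updates grows as the batch shrinks.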

<b>q2 Discuss the advantages and potential limitations of batch normalization in improving the training of neural networks.</b>

Advantages:

Accelerated Convergence: Batch Normalization helps networks converge faster by reducing internal covariate shifts. This enables the use of higher learning rates, leading to quicker convergence to a lower training loss.

Stable Gradients: By normalizing the inputs, Batch Normalization mitigates the vanishing and exploding gradient problems. It maintains gradients at a moderate range, making optimization more stable and facilitating better weight updates.

Regularization Effect: Batch Normalization acts as a form of regularization by adding noise to the inputs of each layer due to the normalization process. This can help prevent overfitting and enhance the model's generalization capabilities.

Reduced Sensitivity to Weight Initialization: With Batch Normalization, the network becomes less sensitive to the initial weights. This allows for faster training and makes it easier to train very deep networks.

Effective for Various Architectures: Batch Normalization is compatible with various network architectures and activation functions, making it a versatile tool for improving the training process.

Improved Gradient Flow: The normalization process helps maintain a more consistent gradient flow throughout the network, allowing for more efficient training in deep architectures.
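The regularization effect listed above comes from the fact that each mini-batch gives a slightly different estimate of the mean and variance. The NumPy sketch below uses synthetic standard-normal activations rather than a real network (an assumption made for illustration) and measures how much the batch mean fluctuates for several batch sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_batches = 10_000
spread = {}

for batch_size in (3, 8, 32, 256):
    # One synthetic feature, standard-normal "activations", many independent mini-batches
    batches = rng.normal(size=(n_batches, batch_size))
    batch_means = batches.mean(axis=1)
    # Spread of the batch mean across mini-batches; theory predicts about 1 / sqrt(batch_size)
    spread[batch_size] = float(batch_means.std())
    print(batch_size, round(spread[batch_size], 3), round(1 / np.sqrt(batch_size), 3))
```

Smaller batches give noisier statistics. That noise is the source of the regularization effect, and it is also a reason batch normalization can become unreliable when the batch size is very small.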