## Ques 1:

### Ans: Batch normalization is a technique used in artificial neural networks to improve the training speed and stability of deep learning models. It normalizes the inputs of a layer by centering and scaling them with the mean and variance computed over each mini-batch of training samples.
### In a typical neural network, the distribution of each layer's inputs changes during training as the parameters of the preceding layers are updated. This phenomenon is known as internal covariate shift. It slows down learning and requires careful initialization and tuning of the network's parameters.

## Ques 2:

### Ans: Faster convergence: By normalizing layer inputs and reducing internal covariate shift, batch normalization helps accelerate training. The model typically tolerates higher learning rates and often reaches a given accuracy in fewer training iterations.
### Improved gradient flow: Keeping layer inputs in a well-scaled range reduces the impact of vanishing or exploding gradients. This stabilization helps in training deeper networks by improving the flow of gradients through the layers.
### Regularization effect: Batch normalization acts as a mild regularizer. Because the mean and variance are estimated from each mini-batch, the normalized value of a sample depends on the other samples in its batch, which adds a small amount of noise to the activations. This noise can help prevent overfitting and improve the model's generalization ability.
### Reduced sensitivity to initialization: Batch normalization makes the network less sensitive to the choice of initial weights. It reduces the need for careful weight initialization techniques, making the training process more robust.
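
The reduced sensitivity to initialization can be illustrated with a small NumPy experiment. This is only a sketch under stated assumptions: the `batch_norm` helper, the batch shape, the weight matrix, and the value of `eps` are all hypothetical and are not taken from the notebook. It shows that rescaling the weights of a linear layer by a constant barely changes the normalized output.

```python
import numpy as np

def batch_norm(z, eps=1e-5):
    """Normalize each feature (column) of z over the batch axis."""
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    return (z - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 20))  # hypothetical mini-batch: 64 samples, 20 features
w = rng.normal(size=(20, 8))   # hypothetical weight matrix of a linear layer

# The same layer with weights initialized 10x smaller and 10x larger.
small_init = batch_norm(x @ (0.1 * w))
large_init = batch_norm(x @ (10.0 * w))

# The two normalized outputs differ only through the small eps term.
max_gap = float(np.abs(small_init - large_init).max())
print(f"largest difference after normalization: {max_gap:.2e}")
```

The pre-normalization activations differ by a factor of 100, yet the normalized outputs are nearly identical, which is one way to see why the scale of the initial weights matters less once batch normalization is in place.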

## Ques 3:

### Ans: Normalization Step:
### The normalization step in batch normalization involves adjusting and scaling the inputs of a layer using the mean and variance computed over a mini-batch of training samples. Here's a breakdown of the normalization process:
### a. Mini-Batch Statistics: Given a mini-batch of input data for a specific layer, the mean (μ) and variance (σ^2) of the batch are computed. These statistics are calculated independently for each feature or channel of the input data.
### b. Mean and Variance Calculation: The mean is calculated by taking the average of the values for each feature in the mini-batch. The variance is computed by taking the average of the squared differences between each value and the mean.
### c. Normalization: The inputs in the mini-batch are then normalized by subtracting the mean and dividing by the square root of the variance plus a small constant ε, which is added for numerical stability. This step gives each feature approximately zero mean and unit variance over the mini-batch.
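
Steps a to c can be written compactly for a single feature over a mini-batch of $m$ values $x_1, \dots, x_m$. The subscript $\mathcal{B}$ for the mini-batch and the small stability constant $\epsilon$ are notation introduced here for illustration:

$$
\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i,
\qquad
\sigma_{\mathcal{B}}^{2} = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_{\mathcal{B}}\right)^{2},
\qquad
\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}
$$
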
### Learnable Parameters:
### Batch normalization also applies learnable parameters to the normalized inputs. These parameters are trained together with the network's weights and let each layer choose the scale and offset that suit it, instead of being forced to zero mean and unit variance. The learnable parameters are:
### a. Scale (γ): The scale parameter multiplies the normalized values, allowing the network to amplify or diminish them. It gives the layer the flexibility to rescale its outputs as needed.
### b. Shift (β): The shift parameter adds an offset (bias) to the scaled values. It lets the layer adjust the mean of its outputs as needed.
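
The normalization step and the learnable parameters can be combined into a short NumPy sketch of the training-time forward pass. The function name `batch_norm_forward`, the toy input, and the values chosen for `gamma`, `beta`, and `eps` are assumptions made for illustration; this is not how Keras implements the layer internally.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Training-time batch normalization for a (batch, features) array."""
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, roughly unit variance
    return gamma * x_hat + beta            # learnable scale and shift

# Toy mini-batch: 4 samples, 2 features on very different scales.
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
gamma = np.array([1.0, 2.0])  # hypothetical learned scale per feature
beta = np.array([0.0, 5.0])   # hypothetical learned shift per feature

out = batch_norm_forward(x, gamma, beta)
print("per-feature mean:", out.mean(axis=0))
print("per-feature std: ", out.std(axis=0))
```

After the transformation, the per-feature mean is set by β and the per-feature standard deviation is set (up to the small ε term) by γ, regardless of the scale of the original features.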

## Observing results before and after applying Batch Normalization

## Before applying Batch Normalization

In [None]:
import os
import tensorflow as tf
import numpy as np
import pandas as pd
import keras
import matplotlib.pyplot as plt
import seaborn as sns
import time
plt.style.use("fivethirtyeight")
%load_ext tensorboard

In [None]:
# Load the Fashion MNIST dataset
(X_train_full, y_train_full), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1]
X_train_full = X_train_full / 255.0
X_test = X_test / 255.0
# Hold out the first 5000 training samples as a validation set
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

In [None]:
# Defining the layers of the model
tf.random.set_seed(42)  # Fix the TensorFlow seed for reproducibility (optional)
np.random.seed(42)      # Fix the NumPy seed for reproducibility (optional)
LAYERS = [
    tf.keras.layers.Flatten(input_shape=[28,28]),
    tf.keras.layers.Dense(300, kernel_initializer="he_normal"),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Dense(100, kernel_initializer="he_normal"),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Dense(10,activation='softmax')
]

model = tf.keras.models.Sequential(LAYERS)

In [None]:
# Compiling the model
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),  # `lr` is a deprecated alias
              metrics=['accuracy'])



In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_3 (Flatten)         (None, 784)               0         
                                                                 
 dense_3 (Dense)             (None, 300)               235500    
                                                                 
 leaky_re_lu_2 (LeakyReLU)   (None, 300)               0         
                                                                 
 dense_4 (Dense)             (None, 100)               30100     
                                                                 
 leaky_re_lu_3 (LeakyReLU)   (None, 100)               0         
                                                                 
 dense_5 (Dense)             (None, 10)                1010      
                                                                 
Total params: 266,610
Trainable params: 266,610
Non-trai
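
The parameter counts in the summary above can be checked by hand: a Dense layer with `n_inputs` inputs and `n_units` units has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus biases (n_units) of a Dense layer."""
    return n_inputs * n_units + n_units

# (inputs, units) for the three Dense layers of the model above
layer_sizes = [(784, 300), (300, 100), (100, 10)]

counts = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]
total = sum(counts)

print(counts)  # [235500, 30100, 1010]
print(total)   # 266610
```

The Flatten and LeakyReLU layers have no parameters, so the three Dense layers account for the full total.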

In [None]:
# Now training and calculating the training time

# starting time
start = time.time()

history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_valid, y_valid), verbose=2)

# ending time
end = time.time()

# Total time taken
print(f"Runtime of the program is {end - start}")

Epoch 1/10
1719/1719 - 10s - loss: 0.6904 - accuracy: 0.7692 - val_loss: 0.5013 - val_accuracy: 0.8304 - 10s/epoch - 6ms/step
Epoch 2/10
1719/1719 - 9s - loss: 0.4829 - accuracy: 0.8314 - val_loss: 0.4335 - val_accuracy: 0.8530 - 9s/epoch - 5ms/step
Epoch 3/10
1719/1719 - 8s - loss: 0.4414 - accuracy: 0.8444 - val_loss: 0.5291 - val_accuracy: 0.8022 - 8s/epoch - 5ms/step
Epoch 4/10
1719/1719 - 9s - loss: 0.4177 - accuracy: 0.8540 - val_loss: 0.4014 - val_accuracy: 0.8662 - 9s/epoch - 5ms/step
Epoch 5/10
1719/1719 - 9s - loss: 0.4023 - accuracy: 0.8590 - val_loss: 0.3872 - val_accuracy: 0.8652 - 9s/epoch - 5ms/step
Epoch 6/10
1719/1719 - 8s - loss: 0.3862 - accuracy: 0.8648 - val_loss: 0.3791 - val_accuracy: 0.8716 - 8s/epoch - 5ms/step
Epoch 7/10
1719/1719 - 8s - loss: 0.3754 - accuracy: 0.8678 - val_loss: 0.3742 - val_accuracy: 0.8694 - 8s/epoch - 5ms/step
Epoch 8/10
1719/1719 - 8s - loss: 0.3657 - accuracy: 0.8704 - val_loss: 0.3912 - val_accuracy: 0.8604 - 8s/epoch - 4ms/step
Epoch 

## Conclusion
- Runtime of the program is 142.88s
- Accuracy of the model is 0.8710


## After applying Batch Normalization

In [None]:
# Delete the previous model
del model

In [None]:
# Defining the model with batch normalization
LAYERS_BN = [
    tf.keras.layers.Flatten(input_shape=[28,28]),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10,activation='softmax')
]
model = tf.keras.models.Sequential(LAYERS_BN)

In [None]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_6 (Flatten)         (None, 784)               0         
                                                                 
 batch_normalization (BatchN  (None, 784)              3136      
 ormalization)                                                   
                                                                 
 dense_6 (Dense)             (None, 300)               235500    
                                                                 
 batch_normalization_1 (Batc  (None, 300)              1200      
 hNormalization)                                                 
                                                                 
 dense_7 (Dense)             (None, 100)               30100     
                                                                 
 batch_normalization_2 (Batc  (None, 100)             

In [None]:
bn1 = model.layers[1]

In [None]:
# List the variables of the first batch normalization layer and whether they are trainable
for variable in bn1.variables:
    print(variable.name, variable.trainable)

batch_normalization/gamma:0 True
batch_normalization/beta:0 True
batch_normalization/moving_mean:0 False
batch_normalization/moving_variance:0 False
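
Each batch normalization layer therefore holds four values per feature, which is why the first layer reports 3136 = 4 × 784 parameters. Only `gamma` and `beta` are updated by backpropagation; the moving statistics are updated as running averages during training and used in place of the batch statistics at inference time. The NumPy sketch below illustrates that update rule under stated assumptions: `momentum = 0.99` and `eps = 1e-3` are the Keras defaults for `BatchNormalization`, while the synthetic data, batch size, and number of steps are made up for illustration.

```python
import numpy as np

momentum = 0.99  # Keras default for BatchNormalization
eps = 1e-3       # Keras default epsilon

n_features = 784
moving_mean = np.zeros(n_features)  # initialized to zeros, as in Keras
moving_var = np.ones(n_features)    # initialized to ones, as in Keras

rng = np.random.default_rng(0)
for _ in range(500):
    # Synthetic mini-batch whose features have mean 2 and standard deviation 3.
    batch = rng.normal(loc=2.0, scale=3.0, size=(32, n_features))
    # Exponential moving averages of the per-feature batch statistics.
    moving_mean = momentum * moving_mean + (1 - momentum) * batch.mean(axis=0)
    moving_var = momentum * moving_var + (1 - momentum) * batch.var(axis=0)

# At inference time a single sample is normalized with the moving statistics,
# so the result does not depend on which other samples share its batch.
sample = rng.normal(loc=2.0, scale=3.0, size=(1, n_features))
normalized = (sample - moving_mean) / np.sqrt(moving_var + eps)

n_params = 4 * n_features  # gamma, beta, moving_mean, moving_variance
print(n_params)            # 3136
```

After enough steps the moving mean and variance settle close to the statistics of the data, here roughly 2 and 9.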


In [None]:
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),  # `lr` is a deprecated alias
              metrics=['accuracy'])



In [None]:
# Now training and calculating the training time

# starting time
start = time.time()

history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_valid, y_valid), verbose=2)

# ending time
end = time.time()

# Total time taken
print(f"Runtime of the program is {end - start}")

Epoch 1/10
1719/1719 - 13s - loss: 0.2264 - accuracy: 0.9178 - val_loss: 0.3086 - val_accuracy: 0.8934 - 13s/epoch - 8ms/step
Epoch 2/10
1719/1719 - 13s - loss: 0.2176 - accuracy: 0.9209 - val_loss: 0.3108 - val_accuracy: 0.8912 - 13s/epoch - 7ms/step
Epoch 3/10
1719/1719 - 13s - loss: 0.2103 - accuracy: 0.9235 - val_loss: 0.3117 - val_accuracy: 0.8868 - 13s/epoch - 8ms/step
Epoch 4/10
1719/1719 - 13s - loss: 0.1995 - accuracy: 0.9273 - val_loss: 0.3166 - val_accuracy: 0.8908 - 13s/epoch - 8ms/step
Epoch 5/10
1719/1719 - 13s - loss: 0.1945 - accuracy: 0.9286 - val_loss: 0.2987 - val_accuracy: 0.8986 - 13s/epoch - 7ms/step
Epoch 6/10
1719/1719 - 13s - loss: 0.1837 - accuracy: 0.9338 - val_loss: 0.3120 - val_accuracy: 0.8914 - 13s/epoch - 8ms/step
Epoch 7/10
1719/1719 - 14s - loss: 0.1787 - accuracy: 0.9356 - val_loss: 0.3153 - val_accuracy: 0.8964 - 14s/epoch - 8ms/step
Epoch 8/10
1719/1719 - 13s - loss: 0.1720 - accuracy: 0.9375 - val_loss: 0.3216 - val_accuracy: 0.8924 - 13s/epoch - 8

## Conclusion
- Runtime of the program is 142.19s
- Accuracy of the model is 0.9430