<a href="https://colab.research.google.com/github/MithunSR/Gradient_Descent_Tutorial/blob/main/Gradient_Calculation_Neural_Network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction:

This notebook demonstrates how to compute the gradients of a multi-layer neural network's loss with respect to its weights using backpropagation. It uses the Fashion MNIST dataset, a popular benchmark for image classification. The model consists of an input layer that flattens each image, a hidden layer with ReLU activation, and an output layer with softmax activation. It is trained with the Adam optimizer and sparse categorical cross-entropy loss, and the gradients are then computed with `tf.GradientTape`.

## Importing the necessary libraries:

- `numpy` is imported as `np` for numerical computations.
- `tensorflow` is imported as `tf` for building and training neural networks.
- The `fashion_mnist` module from `tensorflow.keras.datasets` is imported to access the Fashion MNIST dataset.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist
```

## Loading and preprocessing the data:

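- The Fashion MNIST dataset is loaded with `fashion_mnist.load_data()`, which returns it already split into 60,000 training images and 10,000 test images (28×28 grayscale pixels, 10 clothing classes), with integer labels 0-9.
- The pixel values are converted to `float32` and divided by 255.0 to scale them from the range 0-255 into the range [0, 1].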

```python
# Load the Fashion MNIST dataset (already split into training and test sets)
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
```

```python
# Preprocess the data: scale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
```
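The effect of the scaling step can be checked on a tiny synthetic array. This is a sketch: `fake_images` is a hypothetical stand-in for a slice of `X_train`, used so the example runs without downloading the dataset.

```python
import numpy as np

# Hypothetical stand-in for raw Fashion MNIST pixels: uint8 values in [0, 255]
fake_images = np.array([[[0, 51], [204, 255]]], dtype=np.uint8)  # shape (1, 2, 2)

# Same preprocessing as the notebook: cast to float32, then divide by 255.0
scaled = fake_images.astype('float32') / 255.0

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
print(scaled[0, 0, 1])             # 0.2 (= 51 / 255)
```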

## Defining the neural network architecture:

- The model is defined with the `Sequential` API from `tensorflow.keras`.
- The `Flatten` layer reshapes each 28×28 input image into a 784-element vector (28 × 28 = 784).
- A `Dense` layer with 128 units and ReLU activation serves as the hidden layer.
- A final `Dense` layer with 10 units and softmax activation serves as the output layer, producing one probability per class.

```python
# Define the architecture of the neural network
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28x28 image -> 784-vector
    tf.keras.layers.Dense(128, activation='relu'),    # hidden layer
    tf.keras.layers.Dense(10, activation='softmax')   # one probability per class
])
```
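To make the architecture concrete, here is a NumPy sketch of the same forward pass (Flatten, then Dense(128, ReLU), then Dense(10, softmax)) with randomly initialised weights of the shapes Keras would create. The weights, seed, and the `forward` helper are hypothetical and for illustration only; Keras uses its own initialisers (Glorot uniform kernels and zero biases by default).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical random weights with the shapes Keras creates for this model
W1 = rng.normal(0.0, 0.05, size=(784, 128)).astype(np.float32)  # hidden kernel
b1 = np.zeros(128, dtype=np.float32)                             # hidden bias
W2 = rng.normal(0.0, 0.05, size=(128, 10)).astype(np.float32)   # output kernel
b2 = np.zeros(10, dtype=np.float32)                              # output bias

def forward(images):
    """Flatten -> Dense(128, ReLU) -> Dense(10, softmax) for a batch of 28x28 images."""
    x = images.reshape(len(images), -1)                   # (N, 28, 28) -> (N, 784)
    hidden = np.maximum(0.0, x @ W1 + b1)                 # ReLU zeroes negative values
    logits = hidden @ W2 + b2                             # (N, 10) unnormalised scores
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilise exp()
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)           # each row sums to 1

batch = rng.random((5, 28, 28), dtype=np.float32)  # stand-in for 5 scaled images
probs = forward(batch)
print(probs.shape)  # (5, 10)

# Trainable parameters: kernel entries plus biases for each Dense layer
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)     # 101770
```

The count 784 × 128 + 128 + 128 × 10 + 10 = 101,770 matches the number of trainable parameters `model.summary()` reports for this model; `Flatten` has no parameters.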

## Compiling the model:

- The model is compiled with the Adam optimizer, a popular adaptive gradient-based optimizer.
- The sparse categorical cross-entropy loss is used because the labels are integer class indices (0-9) rather than one-hot vectors; it is the standard loss for multi-class classification with labels in this form.
- The accuracy metric is also specified to monitor the model's performance during training.

```python
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```
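The "sparse" in the loss name refers to the label format: it takes integer class indices, like those in `y_train`, instead of one-hot vectors, but computes the same mean negative log-probability of the true class. A small NumPy sketch with made-up probabilities (illustrative values, not model output) shows that the two forms agree:

```python
import numpy as np

# Made-up softmax outputs for 3 examples over 4 classes
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.25, 0.25, 0.25, 0.25]])
labels = np.array([0, 1, 3])  # integer class indices, as in y_train

# Sparse form: index the probability of each true class directly
sparse_ce = -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Equivalent one-hot form used by plain categorical cross-entropy
one_hot = np.eye(probs.shape[1])[labels]
dense_ce = -np.mean(np.sum(one_hot * np.log(probs), axis=1))

print(sparse_ce)                        # ≈ 0.812
print(np.isclose(sparse_ce, dense_ce))  # True
```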

## Training the model:

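- The model is trained with `fit`, which takes the training images and labels, the number of epochs, the batch size, and validation data.
- With `batch_size=32`, each epoch passes over the 60,000 training images in 1,875 mini-batches, and the optimizer updates the weights once per mini-batch to reduce the loss.
- The test set is passed as `validation_data`, so the validation metrics reported during training are measured on the test set rather than on a separate held-out split.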

```python
# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))
```

```
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

<keras.callbacks.History at 0x7f32c635bfd0>
```
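Each mini-batch step applies the Adam rule to every weight. Below is a sketch of the textbook Adam update from the original Adam paper, using Keras's default hyperparameters (`learning_rate=0.001`, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`). Keras's own implementation folds the bias correction into the learning rate, so epsilon enters slightly differently; the toy objective and the `adam_step` helper are hypothetical.

```python
import math
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for parameters w at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # correct the bias from zero initialisation
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Hypothetical toy objective f(w) = sum(w**2), whose gradient is 2 * w
w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
f_start = np.sum(w ** 2)

# The first step moves each weight by about lr against the sign of its gradient
w, m, v = adam_step(w, 2 * w, m, v, t=1)
first_step = w.copy()
print(first_step)

for t in range(2, 101):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(np.sum(w ** 2) < f_start)  # True: the objective went down

# Mini-batches (and therefore optimizer steps) per epoch in the run above
print(math.ceil(60000 / 32))  # 1875
```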

## Computing the gradient using backpropagation:

- `tf.GradientTape` records the operations executed inside its context so that TensorFlow can differentiate them automatically.
- Inside the tape context, the model's predictions are computed for the entire training set (all 60,000 images in a single forward pass).
- The loss configured in `compile` (sparse categorical cross-entropy) is evaluated on the true labels and the predictions through `model.compiled_loss`.
- `tape.gradient` then computes the gradients of the loss with respect to the trainable variables (the weights and biases) by backpropagation.

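```python
# Compute the gradient using backpropagation
with tf.GradientTape() as tape:
    # Forward pass over the whole training set; the tape records every operation
    predictions = model(X_train)
    # Apply the loss set in compile(); y_train is already a 1-D array of integer labels.
    # model.compiled_loss is a Keras 2 API and may be deprecated in newer Keras versions.
    loss = model.compiled_loss(y_train, predictions)

# Backward pass: one gradient tensor per trainable variable, in the same order
gradients = tape.gradient(loss, model.trainable_variables)
```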
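`tape.gradient` applies the chain rule automatically. To see what it computes for this kind of network, here is a hand-written NumPy sketch of backpropagation through Dense, ReLU, Dense, and softmax with mean sparse cross-entropy loss. It uses small hypothetical sizes and random data rather than the notebook's trained model, and checks a few entries against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical small sizes; the notebook's model uses 784 inputs, 128 hidden units, 10 classes
n, d, h, k = 8, 20, 16, 10
X = rng.random((n, d))                 # stand-in for flattened, scaled images
y = rng.integers(0, k, size=n)         # integer labels, as in y_train
params = {
    "W1": rng.normal(0.0, 0.1, (d, h)), "b1": np.zeros(h),
    "W2": rng.normal(0.0, 0.1, (h, k)), "b2": np.zeros(k),
}

def loss_and_grads(p):
    # Forward pass
    z1 = X @ p["W1"] + p["b1"]
    a1 = np.maximum(0.0, z1)                                  # ReLU
    logits = a1 @ p["W2"] + p["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
    loss = -np.mean(np.log(probs[np.arange(n), y]))           # mean sparse cross-entropy

    # Backward pass: chain rule from the loss back to each parameter
    d_logits = probs.copy()
    d_logits[np.arange(n), y] -= 1.0           # softmax + cross-entropy: probs - one_hot
    d_logits /= n                              # the loss averages over n examples
    d_z1 = (d_logits @ p["W2"].T) * (z1 > 0)   # ReLU passes gradient only where z1 > 0
    grads = {
        "W1": X.T @ d_z1, "b1": d_z1.sum(axis=0),
        "W2": a1.T @ d_logits, "b2": d_logits.sum(axis=0),
    }
    return loss, grads

loss, grads = loss_and_grads(params)
for name, g in grads.items():
    print(name, g.shape)  # each gradient has the same shape as its parameter

def numeric_grad(name, idx, eps=1e-6):
    """Central finite difference of the loss for one parameter entry."""
    plus = {key: val.copy() for key, val in params.items()}
    minus = {key: val.copy() for key, val in params.items()}
    plus[name][idx] += eps
    minus[name][idx] -= eps
    return (loss_and_grads(plus)[0] - loss_and_grads(minus)[0]) / (2 * eps)

for name, idx in [("W1", (3, 5)), ("W2", (2, 7))]:
    agree = np.isclose(numeric_grad(name, idx), grads[name][idx], rtol=1e-4, atol=1e-8)
    print(name, idx, agree)  # True when backprop matches the finite difference
```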

## Printing the gradients:

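Finally, the code loops over the model's trainable variables together with their gradients and prints each variable's name and the shape of its gradient. Each gradient has the same shape as the variable it belongs to: a kernel and a bias for each `Dense` layer.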

```python
# Print the gradient shape for each trainable variable (kernel and bias of each Dense layer)
for variable, gradient in zip(model.trainable_variables, gradients):
    print(variable.name, gradient.shape)
```


```
dense_4/kernel:0 (784, 128)
dense_4/bias:0 (128,)
dense_5/kernel:0 (128, 10)
dense_5/bias:0 (10,)
```
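The printed shapes follow from the chain rule. Using notation introduced here for illustration (not from the original notebook), let $X \in \mathbb{R}^{N \times 784}$ be a batch of $N$ flattened images, $y_n$ the integer label of image $n$, and $Y \in \{0,1\}^{N \times 10}$ the one-hot encoding of the labels. The forward pass and the gradients of the mean cross-entropy loss are:

$$
\begin{aligned}
Z_1 &= X W_1 + \mathbf{1} b_1^\top, & A_1 &= \operatorname{ReLU}(Z_1), \\
Z_2 &= A_1 W_2 + \mathbf{1} b_2^\top, & P &= \operatorname{softmax}(Z_2), \\
L &= -\frac{1}{N} \sum_{n=1}^{N} \log P_{n, y_n}, & \delta_2 &= \frac{1}{N}\left(P - Y\right), \\
\delta_1 &= \left(\delta_2 W_2^\top\right) \odot \operatorname{ReLU}'(Z_1), & & \\
\frac{\partial L}{\partial W_2} &= A_1^\top \delta_2 \in \mathbb{R}^{128 \times 10}, & \frac{\partial L}{\partial b_2} &= \delta_2^\top \mathbf{1} \in \mathbb{R}^{10}, \\
\frac{\partial L}{\partial W_1} &= X^\top \delta_1 \in \mathbb{R}^{784 \times 128}, & \frac{\partial L}{\partial b_1} &= \delta_1^\top \mathbf{1} \in \mathbb{R}^{128},
\end{aligned}
$$

where $\mathbf{1}$ is a length-$N$ vector of ones, softmax is applied row by row, $\operatorname{ReLU}'(z)$ is 1 for $z > 0$ and 0 otherwise, $\delta_2 = \partial L / \partial Z_2$, and $\delta_1 = \partial L / \partial Z_1$. These gradients have exactly the shapes `(784, 128)`, `(128,)`, `(128, 10)`, and `(10,)` printed above for the `dense_4` and `dense_5` kernels and biases.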
