# The Image Cartoonifier SoC’23
# Assignment 3

In [70]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

The Digit Recognizer dataset consists of images of the digits 0-9. Each image is represented
as a 28x28 matrix with a single value per pixel, so each input image has 784 pixels
(28 x 28 = 784).
a) Split the data set into dev_set and train_set: X_dev, Y_dev from the first 1000 samples (rows 0 to 999) and X_train, Y_train from row 1000 to m.
b) Extract the pixel values (every column except the label) and normalize them by dividing
by 255 to bring them into the range 0 to 1. Store the normalized pixel values in
variables called X_dev and X_train respectively.

In [71]:
df = pd.read_csv("train.csv")

dev_set = df.iloc[:1000, :]
train_set = df.iloc[1000: , :]

X_dev = dev_set.iloc[:, 1:].values
X_train = train_set.iloc[:, 1:].values

Y_dev = dev_set.iloc[:, 0].values
Y_train = train_set.iloc[:, 0].values

X_dev = X_dev/255
X_train = X_train/255

## Part 1: Neural Network Implementation:

i) Write a function named init_params that initializes the parameters for each layer of the
neural network. The input layer (a[0]) should have 784 units, the second layer (a[1]) should have
120 units, the third layer (a[2]) should have 45 units, and the output layer (a[3]) should have 10
units. Initialize both the weight matrices (W) and the bias vectors (b) with random
values between 0 and 1.

In [210]:
def init_params(): 
    layer_sizes = [784, 120, 45, 10]
    
    params = {}
    
    for i in range(1, 4):
        input_size = layer_sizes[i - 1]
        output_size = layer_sizes[i]
        
        params['W' + str(i)] = np.random.rand(output_size, input_size)
        params['b' + str(i)] = np.random.rand(output_size, 1)
        
    return params
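
As a quick check of the layout described above, the sketch below rebuilds the same parameters in a standalone helper (the name init_params_demo is hypothetical) and prints each shape. Each weight matrix has shape (units in this layer, units in the previous layer), and each bias is a column vector.

```python
import numpy as np

def init_params_demo():
    # Same layout as init_params: W_i is (units in layer i, units in layer i-1)
    layer_sizes = [784, 120, 45, 10]
    params = {}
    for i in range(1, len(layer_sizes)):
        params['W' + str(i)] = np.random.rand(layer_sizes[i], layer_sizes[i - 1])
        params['b' + str(i)] = np.random.rand(layer_sizes[i], 1)
    return params

demo_params = init_params_demo()
for name, value in demo_params.items():
    print(name, value.shape)

# 784*120 + 120 + 120*45 + 45 + 45*10 + 10
total = sum(v.size for v in demo_params.values())
print("Total parameters:", total)  # 100105
```

The total is a useful number to keep in mind when judging how long each training iteration should take.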

ii) Activation Functions:
a) Implement the ReLU activation function in a function called ReLU.
b) Implement the Softmax activation function in a function called Softmax.

In [202]:
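def ReLU(Z):
    # Element-wise max(0, z)
    A = np.maximum(Z, 0)
    return A

def Softmax(Z):
    # Subtract each column's maximum before exponentiating so np.exp cannot overflow;
    # this shift leaves the softmax output unchanged.
    Z_shifted = Z - np.max(Z, axis=0, keepdims=True)
    exp_Z = np.exp(Z_shifted)
    # Normalize per column (one column per sample) so each column sums to 1
    A = exp_Z / np.sum(exp_Z, axis=0, keepdims=True)
    return A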
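
With weights and inputs that are all non-negative, the values reaching the output layer can be large, and np.exp overflows for float64 inputs above roughly 709. One common remedy, sketched here under the assumption that each column of Z is one sample, is to subtract the column maximum before exponentiating. The helper name stable_softmax is hypothetical.

```python
import numpy as np

def stable_softmax(Z):
    # Hypothetical helper: column-wise softmax with the max-subtraction trick
    Z_shifted = Z - np.max(Z, axis=0, keepdims=True)
    exp_Z = np.exp(Z_shifted)
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)

# Two samples (columns), three classes (rows); the first column has very large logits
Z_demo = np.array([[1000.0, 1.0],
                   [1001.0, 2.0],
                   [1002.0, 3.0]])
A_demo = stable_softmax(Z_demo)

print(A_demo.sum(axis=0))                        # each column sums to 1
print(np.allclose(A_demo[:, 0], A_demo[:, 1]))   # same offsets within a column give the same probabilities
```

Because softmax depends only on the differences between the entries of a column, the shift changes nothing in the result while keeping every exponent at or below zero.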

iii) Forward Propagation:
a) Create a function named forward_propagation that takes the weights (W) and
biases (b) as input and performs forward propagation through the neural network.
b) Implement the necessary calculations to compute the intermediate values (Z) and
activations (A) for each layer.
c) Apply the ReLU activation function to the intermediate values for the hidden
layers (a[1] and a[2]).
d) Apply the Softmax activation function to the intermediate values for the output
layer (a[3]).
e) Return the intermediate values (Z) and activations (A) for each layer.

In [282]:
def forward_propagation(params, X):
    W1 = params['W1']
    b1 = params['b1']
    W2 = params['W2']
    b2 = params['b2']
    W3 = params['W3']
    b3 = params['b3']
    
    Z1 = np.dot(W1, X.T) + b1
    A1 = ReLU(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = ReLU(Z2)
    Z3 = np.dot(W3, A2) + b3
    A3 = Softmax(Z3)
    
    return Z1, A1, Z2, A2, Z3, A3
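
The shapes in the forward pass are easy to get wrong, so here is a small sketch that traces them for a hypothetical batch of 5 random samples. It assumes, as in the cell above, that X stores one sample per row and is transposed so that each column is one sample. ReLU is applied at every layer purely for brevity, since only the shapes are of interest.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5                                   # hypothetical batch of 5 samples
X_demo = rng.random((m, 784))           # one sample per row, as in X_train

layer_sizes = [784, 120, 45, 10]
A = X_demo.T                            # (784, m): one sample per column
shapes = []
for i in range(1, len(layer_sizes)):
    W = rng.random((layer_sizes[i], layer_sizes[i - 1]))
    b = rng.random((layer_sizes[i], 1))
    Z = np.dot(W, A) + b                # b broadcasts across the m columns
    A = np.maximum(Z, 0)                # ReLU at every layer here; only shapes matter
    shapes.append(Z.shape)

print(shapes)   # [(120, 5), (45, 5), (10, 5)]
```

Each Z (and the matching A) has one row per unit in the layer and one column per sample, so A3 ends up as (10, m).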

iv) One-Hot Encoding:
a) Implement a function called one_hot that converts an input array Y into its one-hot
encoded representation.
b) Use numpy arrays to assign a binary vector to each element in Y, setting the
index corresponding to the element's value to 1 and all others to 0.
c) Transpose the resulting array, where columns represent elements in Y and rows
represent different classes.

In [283]:
def one_hot(Y):
    num_classes = 10
    num_elements = Y.shape[0]
    # np.zeros takes the shape as a single tuple
    encoding = np.zeros((num_elements, num_classes))
    for i in range(num_elements):
        encoding[i, Y[i]] = 1
    encoding = encoding.T
    return encoding
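
The same encoding can also be produced without a Python loop by indexing an identity matrix with the labels. This is a sketch, assuming Y holds integer labels in the range 0 to 9; the name one_hot_vectorized is hypothetical.

```python
import numpy as np

def one_hot_vectorized(Y, num_classes=10):
    # Row Y[i] of the identity matrix is the one-hot vector for label Y[i];
    # transposing gives shape (num_classes, number of samples)
    return np.eye(num_classes)[Y].T

Y_demo = np.array([3, 0, 9])
encoded = one_hot_vectorized(Y_demo)
print(encoded.shape)      # (10, 3)
print(encoded[:, 0])      # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```

Each column contains exactly one 1, at the row given by that sample's label.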

## Part 2: Backward Propagation and Model Training:

i) Backward Propagation:
a) Write a function named backward_propagation that performs backward
propagation to calculate the gradients of the parameters.
b) Use the provided arguments Z1, A1, Z2, A2, Z3, A3, W1, W2, W3, X, and Y
to calculate the gradients.
c) Apply the appropriate activation functions' derivatives in the backpropagation
process.
d) Return the gradients dW1, db1, dW2, db2, dW3, and db3.

In [321]:
def ReLU_derivative(Z):
    return np.where(Z > 0, 1, 0)

def backward_propagation(Z1, A1, Z2, A2, Z3, A3, W1, W2, W3, X, Y):
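    m = X.shape[0]
    
    # Y holds integer labels, so one-hot encode it to match A3's (10, m) shape
    Y_one_hot = one_hot(Y)
    # Assuming a cross-entropy loss, the gradient of softmax + loss w.r.t. Z3 is A3 - Y
    dZ3 = A3 - Y_one_hot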
    dW3 = (1/m) * (np.dot(dZ3, A2.T))
    db3 = (1/m) * (np.sum(dZ3, axis=1, keepdims=True))
    
    dA2 = np.dot(W3.T, dZ3)
    dZ2 = dA2 * ReLU_derivative(Z2)
    dW2 = (1/m) * (np.dot(dZ2, A1.T))
    db2 = (1/m) * (np.sum(dZ2, axis=1, keepdims=True))
    
    dA1 = np.dot(W2.T, dZ2)
    dZ1 = dA1 * ReLU_derivative(Z1)
    dW1 = (1/m) * (np.dot(dZ1, X))
    db1 = (1/m) * (np.sum(dZ1, axis=1, keepdims=True))
    
    return dW1, db1, dW2, db2, dW3, db3
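
The assignment does not name the loss function. The output-layer gradient A3 - Y is what results when softmax is paired with a cross-entropy loss, so the sketch below assumes that pairing and compares the formula against central finite differences on a small random example. The helpers softmax and cross_entropy are defined here only for illustration.

```python
import numpy as np

def softmax(Z):
    Z_shifted = Z - np.max(Z, axis=0, keepdims=True)
    exp_Z = np.exp(Z_shifted)
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)

def cross_entropy(Z, Y_one_hot):
    # Sum over samples of -log(probability assigned to the true class)
    return -np.sum(Y_one_hot * np.log(softmax(Z)))

rng = np.random.default_rng(0)
Z = rng.normal(size=(10, 4))                      # 10 classes, 4 samples
Y_one_hot = np.eye(10)[np.array([1, 7, 3, 3])].T  # one-hot labels, shape (10, 4)

analytic = softmax(Z) - Y_one_hot                 # the A3 - Y formula

# Central finite differences, one entry of Z at a time
numeric = np.zeros_like(Z)
eps = 1e-6
for idx in np.ndindex(*Z.shape):
    Z_plus, Z_minus = Z.copy(), Z.copy()
    Z_plus[idx] += eps
    Z_minus[idx] -= eps
    numeric[idx] = (cross_entropy(Z_plus, Y_one_hot)
                    - cross_entropy(Z_minus, Y_one_hot)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))         # should be very small
```

The loss here is summed over samples; the 1/m factor in dW and db corresponds to averaging it instead.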

ii) Update Parameters:
a) Implement a function named update_params that updates the parameters
of the neural network using gradient descent.
b) Use the provided arguments W1, b1, W2, b2, W3, b3, dW1, db1, dW2, db2, dW3, db3,
and alpha (learning rate).
c) Update the parameters using the gradients and the learning rate.
d) Return the updated parameters W1, b1, W2, b2, W3, and b3.

In [322]:
def update_params(W1, b1, W2, b2, W3, b3, dW1, db1, dW2, db2, dW3, db3, alpha):
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    
    W3 = W3 - alpha * dW3
    b3 = b3 - alpha * db3
    
    return W1, b1, W2, b2, W3, b3

iii) Get Prediction and Accuracy:
a) Create a function named get_prediction that takes the output activations
(A3) as input and returns the predicted labels.
b) Use numpy's argmax function to find the index of the highest value in each
column of A3.
c) Implement a function named get_accuracy that takes the predicted labels and
true labels (Y) as input and calculates the accuracy.
d) Print the predicted labels and true labels in the required format

In [323]:
def get_prediction(A3):
    prediction = np.argmax(A3, axis=0)
    return prediction

def get_accuracy(prediction, Y):
    print(prediction, Y)
    condition = prediction==Y
    accuracy = np.mean(condition)
    return accuracy
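
To make the argmax and accuracy steps concrete, here is a small worked example on made-up activations for 4 samples and 3 classes (the real A3 has 10 rows). All values are hypothetical.

```python
import numpy as np

# Hypothetical output activations: 3 classes (rows), 4 samples (columns)
A_demo = np.array([[0.7, 0.1, 0.2, 0.3],
                   [0.2, 0.8, 0.3, 0.3],
                   [0.1, 0.1, 0.5, 0.4]])
Y_demo = np.array([0, 1, 2, 0])

pred_demo = np.argmax(A_demo, axis=0)     # row index of the largest value in each column
acc_demo = np.mean(pred_demo == Y_demo)   # fraction of predictions that match the labels

print(pred_demo)   # [0 1 2 2]
print(acc_demo)    # 0.75
```

The last sample is predicted as class 2 but labelled 0, so three of four predictions are correct.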

iv) Gradient Descent:
a) Write a function named gradient_descent that performs the training of
the neural network using gradient descent.
b) Use the provided arguments X_train, Y_train, (alpha=0.1), and
(num_iterations=1000).
c) Perform the iterations of gradient descent, updating the parameters and tracking
the accuracy every 10th iteration.
d) Print the output layer prediction and accuracy during training.

In [325]:
def gradient_descent(X_train, Y_train, alpha=0.1, num_iterations=1000):

    params = init_params()
    W1 = params['W1']
    b1 = params['b1']
    W2 = params['W2']
    b2 = params['b2']
    W3 = params['W3']
    b3 = params['b3']

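    for iteration in range(num_iterations):
        Z1, A1, Z2, A2, Z3, A3 = forward_propagation(params, X_train)

        dW1, db1, dW2, db2, dW3, db3 = backward_propagation(Z1, A1, Z2, A2, Z3, A3, W1, W2, W3, X_train, Y_train)

        W1, b1, W2, b2, W3, b3 = update_params(W1, b1, W2, b2, W3, b3, dW1, db1, dW2, db2, dW3, db3, alpha)
        # Write the updated values back so the next forward pass uses them
        params = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2, 'W3': W3, 'b3': b3}
        
        if iteration % 10 == 0:
            predictions = get_prediction(A3)
            accuracy = get_accuracy(predictions, Y_train)
            print("Iteration:", iteration)
            print("Accuracy:", accuracy)
            print("")

    # Re-evaluate with the final parameters so the reported numbers are not stale
    Z1, A1, Z2, A2, Z3, A3 = forward_propagation(params, X_train)
    predictions = get_prediction(A3)
    accuracy = get_accuracy(predictions, Y_train)
    print("Final Output Layer Prediction:")
    print(predictions)
    print("Final Accuracy:", accuracy)

    # Return the trained parameters so they can be used for evaluation
    return W1, b1, W2, b2, W3, b3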

## Part 3: Model Evaluation:

i) Making Predictions:
a) Implement a function named make_predictions that takes the inputs (X)
and trained parameters (W1, b1, W2, b2, W3, b3) as input.
b) Use the forward propagation function to obtain the predictions from the trained
model.
c) Return the predictions.

In [328]:
def make_predictions(X, W1, b1, W2, b2, W3, b3):
    # forward_propagation expects the parameters packed in a dict, followed by X
    params = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2, 'W3': W3, 'b3': b3}
    Z1, A1, Z2, A2, Z3, A3 = forward_propagation(params, X)
    prediction = np.argmax(A3, axis=0)
    return prediction

ii) Testing Predictions:
a) Write a function named test_prediction that tests the model's predictions
on a specific index of the training data.
b) Use the provided arguments index, W1, b1, W2, b2, W3, and b3.
c) Obtain the prediction and true label for the specified index.
d) Visualize the image data using Matplotlib and print the prediction
and true label.

In [330]:
def test_prediction(index, W1, b1, W2, b2, W3, b3):
    # Keep a 2-D (1, 784) row so that X.T inside forward_propagation is a (784, 1) column
    x = X_train[index].reshape(1, -1)
    y_true = Y_train[index]
    # forward_propagation expects the parameters packed in a dict, followed by X
    params = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2, 'W3': W3, 'b3': b3}
    Z1, A1, Z2, A2, Z3, A3 = forward_propagation(params, x)
    prediction = get_prediction(A3)

    image = x.reshape(28, 28)
    plt.imshow(image, cmap='gray')
    plt.axis('off')

    print("Prediction:", prediction)
    print("True Label:", y_true)

    plt.show()