# L-layer Deep Neural Network with NumPy (No TensorFlow)

This code implements a deep neural network from scratch using NumPy. It trains a multi-layer neural network on a given dataset to perform classification tasks.



## Import Libraries and Load Data

Our dataset is stored row-wise: each row is one image, the first column is its label, and the remaining columns are its pixel values. The network code expects the opposite layout, with one image per column and one pixel per row. During preprocessing we therefore convert the data to a NumPy array and transpose it.
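
To make the layout change concrete, here is a minimal sketch on made-up data. The array `rows` and its values are hypothetical (three "images" of four pixels each, with the label in the first column); only the transpose-and-slice pattern mirrors the notebook.

```python
import numpy as np

# Hypothetical toy data: one image per row, label first, then 4 pixel values.
rows = np.array([
    [1, 0, 10, 20, 30],
    [0, 5, 15, 25, 35],
    [4, 1, 11, 21, 31],
])

cols = rows.T        # after the transpose, each column is one image
labels = cols[0]     # the first row now holds all labels
pixels = cols[1:]    # the remaining rows hold the pixel values

print(rows.shape, cols.shape)   # (3, 5) (5, 3)
print(labels.tolist())          # [1, 0, 4]
print(pixels[:, 0].tolist())    # [0, 10, 20, 30]
```

Column `j` of `pixels` is the full pixel vector of image `j`, which is the shape the forward pass expects.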


In [2]:
import numpy as np
import pandas as pd
df = pd.read_csv("train.csv")
df.head()



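|   | label | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |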


## Preprocess Data

In [3]:
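train = df.to_numpy() # one image per row: [label, pixel0, ..., pixel783]
np.random.shuffle(train) # shuffle the rows (images) in place
train = train.T # transpose so that each column is one image
y = train[0] # first row: labels
x = train[1:] # remaining rows: pixel values
x = x / np.max(x) # scale pixel values by the dataset maximum, giving values in [0, 1]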

## Split Data into Training and Test Sets

In [7]:
train_len = int(len(y)*0.8)
y_train = y[:train_len]
y_test = y[train_len:]
x_train = x[:,:train_len]
x_test = x[:,train_len:]

## Initialize Network Parameters

In [46]:
input_size = x_train.shape[0]
output_size = np.unique(y).shape[0]
layer_dims = [input_size, 10, 5, output_size]
n_samples = x_train.shape[1]
activations = ["relu","relu","softmax"]

## Define Functions

The functions below initialize the parameters, apply the activation functions, run forward propagation, compute the loss, backpropagate the error, update the parameters, and measure accuracy.

For each layer l, forward propagation computes a linear combination of the previous layer's activations with a matrix product, then applies the layer's activation function g:

Z[l] = W[l] · A[l-1] + b[l]

A[l] = g(Z[l])

- Z[l] is the linear combination of the inputs to layer l, before the activation function is applied. Its shape is (layer_dims[l], number_of_examples).
- W[l] is the weight matrix of layer l.
- A[l-1] is the activation of the previous layer (layer l-1), with shape (layer_dims[l-1], number_of_examples). A[0] is the input X.
- b[l] is the bias vector of layer l, with shape (layer_dims[l], 1). It is added to every column of W[l] · A[l-1].

For the matrix product to be defined and to produce Z[l] in the right shape, W[l] must have shape (layer_dims[l], layer_dims[l-1]).
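
The shape rules above can be checked with a short sketch. The layer sizes (784 inputs, hidden layers of 10 and 5 units, 10 outputs) and the batch size `m = 4` are assumptions for illustration, and ReLU is applied on every layer only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_dims = [784, 10, 5, 10]  # assumed sizes: 784 pixels, two hidden layers, 10 classes
m = 4                          # hypothetical batch of 4 examples

A = rng.random((layer_dims[0], m))   # stand-in for A[0] = X
z_shapes = []
for l in range(1, len(layer_dims)):
    W = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    b = np.zeros((layer_dims[l], 1))
    Z = W @ A + b                    # b is broadcast across all m columns
    z_shapes.append(Z.shape)
    A = np.maximum(Z, 0)             # ReLU on every layer, only to keep the sketch short

print(z_shapes)  # [(10, 4), (5, 4), (10, 4)]
```

Each `Z` has one row per unit in the layer and one column per example, so the number of examples never appears in any parameter shape.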



In [10]:
def initialize_parameters(layer_dims):
    parameters = {}
    L = len(layer_dims)
    for l in range(1, L):
        parameters[f'W{l}'] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters[f'b{l}'] = np.zeros((layer_dims[l], 1))
    return parameters

In [194]:
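def softmax(Z):
    # Subtract each column's maximum before exponentiating so that np.exp cannot overflow;
    # the shift cancels in the ratio, so the result is unchanged.
    exp = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return exp / np.sum(exp, axis=0, keepdims=True)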

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(z):
    return np.maximum(z, 0)

In [206]:
def loss_function(A,Y):
    m = Y.shape[1]
    loss = np.sum(Y * np.log(A)) / -m
    loss = np.squeeze(loss)
    return loss

def binary_cross_entropy_loss(A,Y):
    m = Y.shape[1]
    # -(1/m) * sum(y*log(a) + (1-y)*log(1-a)): both terms carry the same sign
    loss = np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / -m
    loss = np.squeeze(loss)
    return loss
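
As a worked example of the multiclass loss, the sketch below evaluates the same cross-entropy formula on uniform predictions. With 10 classes and every probability equal to 0.1, the loss is ln(10) ≈ 2.3026, which is close to the cost printed for the first training epoch later in this notebook. The labels used here are made up for illustration.

```python
import numpy as np

num_classes, m = 10, 5
labels = np.array([3, 1, 4, 1, 5])                 # hypothetical labels
Y = np.eye(num_classes)[labels].T                  # one-hot targets, shape (10, 5)
A = np.full((num_classes, m), 1.0 / num_classes)   # uniform predictions

# Same formula as loss_function: average negative log-probability of the true class
loss = np.sum(Y * np.log(A)) / -m

print(round(float(loss), 4))          # 2.3026
print(round(float(np.log(10)), 4))    # 2.3026
```

A cost that stays near this value means the network is still predicting roughly equal probabilities for all classes.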

In [195]:
def forward_propagation(X, parameters, activations):
    cache = {}
    A = X
    cache['A0'] = A
    L = len(parameters) // 2  # Number of layers

    for l in range(1, L + 1):
        A_prev = A
        W = parameters['W' + str(l)]
        b = parameters['b' + str(l)]
        Z = np.dot(W, A_prev) + b
        if activations[l-1] == 'relu':
            A = relu(Z)
        elif activations[l-1] == 'softmax':
            A = softmax(Z)
        elif activations[l-1] == 'sigmoid':
            A = sigmoid(Z)

        cache['A' + str(l)] = A
        cache['Z' + str(l)] = Z
        cache['W' + str(l)] = W
        cache['b' + str(l)] = b

    return A, cache

In [188]:
def relu_derivative(z):
    return np.array(z > 0)

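def softmax_derivative(softmax_output):
    # Jacobian of softmax for a single example: softmax_output holds one probability vector
    s = softmax_output.reshape(-1, 1)
    jacobian_matrix = np.diagflat(s) - np.dot(s, s.T)
    return jacobian_matrix

def loss_function_derivative(y,s):
    # Derivative of the cross-entropy -sum(y * log(s)) with respect to s
    return -y/s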


## Back Propagation

The back_propagation function uses the chain rule to compute the gradient of the loss with respect to every weight matrix and bias vector in the network. These gradients are what the update step needs to adjust the parameters during training. The function takes the predicted output AL, the true labels y (one-hot encoded in the multiclass case), the cache of intermediate values from forward propagation, and the list of activation functions used in each layer. It returns a gradients dictionary.

The function works backward from the output layer to the first hidden layer. At the output layer, combining softmax with the cross-entropy loss gives the simple result dZ = AL - y. For each earlier layer, the error is propagated back through the next layer's weights and multiplied elementwise by the derivative of that layer's activation function (ReLU in this case). From each dZ, the function computes the gradient with respect to the weights (dW) and the biases (db), averaged over the m examples.

Note that dZ is the gradient of the loss with respect to the pre-activation values Z, not with respect to the activations A.

In [86]:
def back_propagation(AL, y, cache, activations):
    gradients = {}
    L = len(cache) // 4 
    m = y.shape[1]

    # gradients["dZ"+ str(L)] = np.dot(loss_function_derivative(y,AL),softmax_derivative(AL))
    gradients["dZ"+ str(L)] = AL - y # derivative of loss function wrt zL
    gradients["dW"+ str(L)] = np.dot(gradients["dZ"+ str(L)],cache["A" + str(L-1)].T) / m
    gradients["db" + str(L)] = np.sum(gradients["dZ"+ str(L)], axis=1, keepdims=True) / m
    for l in reversed(range(1, L)):
        # Error propagated back from layer l+1 to the activations of layer l
        dA = np.dot(cache["W" + str(l+1)].T, gradients["dZ"+ str(l+1)])
        if activations[l-1] == 'relu':
            gradients["dZ"+ str(l)] = dA * relu_derivative(cache["Z"+ str(l)])
        elif activations[l-1] == 'sigmoid':
            A_l = cache["A" + str(l)]
            gradients["dZ"+ str(l)] = dA * A_l * (1 - A_l) # sigmoid'(Z) = A * (1 - A)
        else:
            # A hidden softmax layer needs a per-example Jacobian product, which is not implemented here
            raise NotImplementedError(f"unsupported hidden activation: {activations[l-1]}")

        gradients["dW"+ str(l)] = np.dot(gradients["dZ"+ str(l)],cache["A" + str(l-1)].T) / m
        gradients["db" + str(l)] = np.sum(gradients["dZ"+ str(l)], axis=1, keepdims=True) / m

    return gradients
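
The output-layer shortcut dZ = AL - y can be verified numerically. The sketch below is a standalone finite-difference check, not part of the notebook's training code: it defines its own small `softmax` and a hypothetical helper `summed_loss`, and uses random values for Z. The loss is summed rather than averaged because back_propagation applies the 1/m factor later, in dW and db.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=0, keepdims=True)   # shift for numerical stability
    e = np.exp(Z)
    return e / e.sum(axis=0, keepdims=True)

def summed_loss(Z, Y):
    # Cross-entropy summed over examples (no 1/m factor)
    return -np.sum(Y * np.log(softmax(Z)))

rng = np.random.default_rng(0)
Z = rng.standard_normal((3, 2))            # 3 classes, 2 examples
Y = np.eye(3)[np.array([0, 2])].T          # one-hot targets, shape (3, 2)

analytic = softmax(Z) - Y                  # the closed form used in back_propagation

# Central finite differences, one entry of Z at a time
eps = 1e-6
numeric = np.zeros_like(Z)
for i in range(Z.shape[0]):
    for j in range(Z.shape[1]):
        Z_plus, Z_minus = Z.copy(), Z.copy()
        Z_plus[i, j] += eps
        Z_minus[i, j] -= eps
        numeric[i, j] = (summed_loss(Z_plus, Y) - summed_loss(Z_minus, Y)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff < 1e-6)  # True
```

The same pattern, perturbing one parameter and comparing against the analytic gradient, can be used to check dW and db on a small network.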


In [68]:
def update_parameters(parameters, gradients, learning_rate):
    L = len(parameters) // 2
    for l in range(1, L + 1):
        parameters['W' + str(l)] -= learning_rate * gradients['dW' + str(l)]
        parameters['b' + str(l)] -= learning_rate * gradients['db' + str(l)]
    return parameters

## Utility Functions

In [18]:
def get_accuracy(pred, y):
    return np.sum(pred == y) / y.size

In [216]:
def binary_prediction(a2):
    return np.where(a2 > 0.5, 1, 0)

In [270]:
def get_predictions(a2):
    return np.argmax(a2, 0)

In [20]:
def test_preds(x_train, parameters,y):
    A, cache = forward_propagation(x_train,parameters,activations)
    preds = get_predictions(A)
    accuracy = get_accuracy(preds, y)
    return accuracy

In [21]:
def one_hot(y, num_classes):
    return np.eye(num_classes)[y].T
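
A quick round trip shows how one_hot and the argmax in get_predictions relate. The label vector below is made up for illustration; `one_hot` is repeated so the snippet runs on its own.

```python
import numpy as np

def one_hot(y, num_classes):
    # Row y[i] of the identity matrix is the one-hot vector for label y[i];
    # the transpose puts one example per column.
    return np.eye(num_classes)[y].T

y = np.array([2, 0, 1, 2])      # hypothetical integer labels
Y = one_hot(y, 3)

print(Y.shape)                  # (3, 4)
print(np.argmax(Y, 0).tolist()) # [2, 0, 1, 2]
```

Note that the labels must have an integer dtype, because they are used as indices into the identity matrix.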

In [263]:
layer_dims = [input_size, 10, output_size]
activations = ["relu", "softmax"]

In [286]:
def L_layer_model(x_train,y_train,x_test,y_test, learning_rate, num_iterations,classification_type="multiclass"):
    parameters = initialize_parameters(layer_dims)
    if classification_type == "multiclass":
        # One-hot targets of shape (num_classes, m); the output layer size gives the class count
        targets = one_hot(y_train, layer_dims[-1])
        loss_fn, predict = loss_function, get_predictions
    elif classification_type == "binaryclass":
        # Row vector of 0/1 targets with shape (1, m)
        targets = y_train.reshape(1, -1)
        loss_fn, predict = binary_cross_entropy_loss, binary_prediction
    else:
        print("classification_type must be 'multiclass' or 'binaryclass'")
        return parameters
    for epoch in range(num_iterations):
        AL, cache = forward_propagation(x_train, parameters, activations)
        loss = loss_fn(AL, targets)
        gradients = back_propagation(AL, targets, cache, activations)
        parameters = update_parameters(parameters, gradients, learning_rate)
        accuracy_train = get_accuracy(predict(AL), y_train)
        # Evaluate on the held-out set with the updated parameters
        A_test, _ = forward_propagation(x_test, parameters, activations)
        accuracy_test = get_accuracy(predict(A_test), y_test)
        print(f"Epoch {epoch+1} - Train Accuracy: {accuracy_train:.4f} - Cost: {loss:.4f}")
        print(f"Test Accuracy: {accuracy_test:.4f}")
    return parameters


## Train the Model

In [287]:
params = L_layer_model(x_train,y_train,x_test,y_test, 0.1, 500)

Epoch 1 - Train Accuracy: 0.0697 - Cost: 2.3027
Test Accuracy: 0.1450
Epoch 2 - Train Accuracy: 0.1409 - Cost: 2.3026
Test Accuracy: 0.1270
Epoch 3 - Train Accuracy: 0.1296 - Cost: 2.3025
Test Accuracy: 0.1156
Epoch 4 - Train Accuracy: 0.1175 - Cost: 2.3024
Test Accuracy: 0.1117
Epoch 5 - Train Accuracy: 0.1136 - Cost: 2.3024
Test Accuracy: 0.1101
Epoch 6 - Train Accuracy: 0.1123 - Cost: 2.3023
Test Accuracy: 0.1098
Epoch 7 - Train Accuracy: 0.1120 - Cost: 2.3022
Test Accuracy: 0.1098
Epoch 8 - Train Accuracy: 0.1120 - Cost: 2.3022
Test Accuracy: 0.1098
Epoch 9 - Train Accuracy: 0.1120 - Cost: 2.3021
Test Accuracy: 0.1098
Epoch 10 - Train Accuracy: 0.1120 - Cost: 2.3020
Test Accuracy: 0.1098
Epoch 11 - Train Accuracy: 0.1120 - Cost: 2.3019
Test Accuracy: 0.1098
Epoch 12 - Train Accuracy: 0.1120 - Cost: 2.3018
Test Accuracy: 0.1098
Epoch 13 - Train Accuracy: 0.1120 - Cost: 2.3018
Test Accuracy: 0.1098
Epoch 14 - Train Accuracy: 0.1120 - Cost: 2.3017
Test Accuracy: 0.1098
Epoch 15 - Trai

In [275]:
import matplotlib.pyplot as plt

def make_preds(i,parameters,activations,image):
    AL,_ = forward_propagation(x_test[:,i].reshape(-1, 1),parameters,activations)
    preds_t = get_predictions(AL)
    print(preds_t) # predicted label
    print(y_test[i]) # true label
    if image:
        # The 784 pixel values of one column form a 28x28 image
        image = x_test[:, i].reshape(28, 28)
        plt.gray()
        plt.imshow(image, interpolation='nearest')
        plt.show()

In [283]:
make_preds(800,params,activations,False)

[0]
0
