# Matrix operations for a multi-class Softmax classifier with a hidden ReLU layer

This notebook illustrates the matrix operations in the forward and backward passes of a multinomial classifier that uses the softmax activation function, the multi-class cross-entropy loss function, and a hidden layer of ReLU units.
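
As a minimal sketch of the softmax step on its own, the cell below applies a column-wise softmax to a small, made-up score matrix. The function name `softmax_columns` and the array `Z_softmax_demo` are illustrative assumptions and do not appear in the notebook code. The layout follows the notebook's convention of one row per class and one column per sample.

```python
import numpy as np

def softmax_columns(Z):
    """Column-wise softmax for Z of shape (no_classes, no_samples)."""
    # Subtracting each column's maximum does not change the result,
    # but it keeps np.exp from overflowing on large scores.
    Z_shifted = Z - np.max(Z, axis=0, keepdims=True)
    eZ = np.exp(Z_shifted)
    return eZ / np.sum(eZ, axis=0, keepdims=True)

# Hypothetical scores: 3 classes (rows) by 2 samples (columns).
# The second column is the first column shifted by 999.
Z_softmax_demo = np.array([[1.0, 1000.0],
                           [2.0, 1001.0],
                           [3.0, 1002.0]])
F_softmax_demo = softmax_columns(Z_softmax_demo)
print(F_softmax_demo)
print(F_softmax_demo.sum(axis=0))  # each column sums to 1
```

Because softmax is unchanged when a constant is added to every score in a column, the two columns of the output should be identical, even though a direct `np.exp(1000.0)` would overflow.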

There is a unit in the output layer for each possible class. The class labels are one-hot encoded, so the target for each output unit is 1 if its class is the correct one and 0 otherwise.
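
One-hot encoding can be sketched with plain NumPy, as below. This is only an illustration under assumed data: the names `classes_demo`, `y_demo`, `y_ints_demo` and `y_hot_demo` are hypothetical, and the notebook itself uses scikit-learn encoders instead.

```python
import numpy as np

classes_demo = ["a", "b", "c", "d"]       # hypothetical class labels (sorted)
y_demo = np.array(["c", "a", "d", "a"])   # hypothetical labels for 4 samples

# Map each label to its integer index in the class list.
# np.searchsorted needs the class list to be sorted, which it is here.
y_ints_demo = np.searchsorted(classes_demo, y_demo)

# Row i of the identity matrix is the one-hot vector for class i.
y_hot_demo = np.eye(len(classes_demo))[y_ints_demo]
print(y_ints_demo)
print(y_hot_demo)
```

Each row of `y_hot_demo` corresponds to one sample and contains a single 1 in the column of its class.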

There is also a quiz question for you to work out.

In [None]:
#
# This cell imports functions that are used later on
#
!wget -nv https://github.com/IS-pillar-3/A_AI_anc/raw/main/A_AI_softmax_relu_loss_01_v01.py
import A_AI_softmax_relu_loss_01_v01 as sr
#

In [None]:
#
# Simulate forward and backward pass for Softmax multi-class classifier
#
import numpy as np
import random
import copy
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, accuracy_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
#
#
# Properties of generated training set
#
no_classes  = 4
no_hidden   = 2
no_features = 5
no_samples  = 10
#
# Set up X training set, including a bias constant (1) as the first row (row 0)
#
X = np.concatenate((np.ones((1, no_samples)), np.random.randn(no_features, no_samples)), axis=0)
#
#
# Set up one-hotted Y's in Y_hot training set
#
# Use lower case letters as classes
#
letters = "abcdefghijklmnopqrstuvwxyz"
classes = list(letters[0:no_classes])
#
Y = np.random.choice(classes, no_samples)
#
# One-hot encode Y (from machinelearningmastery.com)
#
# Fit a LabelEncoder model to encode classes as integers
#
label_encoder = LabelEncoder()
integer_codes = label_encoder.fit_transform(classes)
#
# Fit a OneHotEncoder model to encode integers as one-hots
#
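# Note: scikit-learn 1.2 renamed the `sparse` argument to `sparse_output`;
# on older versions use OneHotEncoder(sparse=False) instead
onehot_encoder = OneHotEncoder(sparse_output=False)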
integer_codes  = integer_codes.reshape(no_classes, 1)
onehot_encoded = onehot_encoder.fit_transform(integer_codes)
#
# Use encoder models to one-hot encode Y
#
Y_ints = label_encoder.transform(Y)
print(Y)
print(Y_ints)
#
Y_ints = Y_ints.reshape(no_samples, 1)
Y_hot  = np.array(onehot_encoder.transform(Y_ints))
print(Y_hot)
#
# Set up weights
#
W1 = np.random.randn(no_hidden, no_features + 1)
W2 = np.random.randn(no_classes, no_hidden + 1)
#
# ReLU activation (could try leaky ReLU later)
#
Z1 = np.matmul(W1, X)
F1 = copy.deepcopy(Z1)
F1[F1 < 0] = 0
#
# Add bias to F1
#
F1_with_bias = np.concatenate((np.ones((1, no_samples)), F1), axis=0)
#
# Softmax activation with stabilising adjustment
#
Z2 = np.matmul(W2, F1_with_bias)
#
C     = np.max(Z2, axis=0)
C_adj = np.atleast_2d(C).repeat(repeats=(no_classes), axis=0)
Z2_adj = Z2 - C_adj
#
eZ2      = np.exp(Z2_adj)
eZ2_sums = np.sum(eZ2, axis=0)
#
print("\n++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
F2 = eZ2 / eZ2_sums
print(F2)
#
# Confusion matrix and accuracy
#
P_ints = np.argmax(F2, axis=0)
P      = label_encoder.inverse_transform(P_ints)
#
print("\n++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
print(P)
print(Y)
#
cm   = confusion_matrix(Y, P)
disp = ConfusionMatrixDisplay(confusion_matrix=cm,  display_labels=classes)
disp.plot()
plt.show()
#
print("Accuracy:", accuracy_score(Y, P))
#
# Gradient from Softmax
#
G2 = np.matmul((F2 - np.transpose(Y_hot)), np.transpose(F1_with_bias)) / no_samples
#
print("\n++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
print(G2)
print(G2.shape)
#
# Gradient from hidden (don't need the bias gradients, hence the [:,1:])
#
# G1a = del_L / del_F2 * del_F2 / del_Z2 * del_Z2 / del_F1
#
G1a = np.matmul(np.transpose(F2 - np.transpose(Y_hot)), W2)[:,1:]
G1a = np.transpose(G1a)
#
del_ReLU         = np.zeros(np.shape(G1a))
del_ReLU[F1 > 0] = 1
#
# G1 = G1a * del_F1 / del_Z1 * del_Z1 / del_W1
#
# Backpropagation of ReLU is element-wise
#
# Divide by no_samples so that G1, like G2, is the gradient of the mean loss
#
G1 = np.matmul(np.multiply(G1a, del_ReLU), np.transpose(X)) / no_samples
#
print("\n++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
print(G1)
print(G1.shape)
#
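
The element-wise nature of the ReLU backward step used above can be seen in isolation in the following sketch. All arrays here are small, made-up examples, and the names (`Z_relu_demo`, `upstream_demo`, `relu_mask_demo`, `downstream_demo`) are assumptions for illustration only.

```python
import numpy as np

# Hypothetical pre-activations: 2 hidden units (rows) by 2 samples (columns)
Z_relu_demo = np.array([[-1.5,  0.3],
                        [ 2.0, -0.7]])

# Hypothetical gradient arriving from the layer above, same shape
upstream_demo = np.array([[0.1, 0.2],
                          [0.3, 0.4]])

# Forward pass: ReLU clips negative values to zero
F_relu_demo = np.maximum(Z_relu_demo, 0)

# Derivative of ReLU: 1 where the pre-activation is positive, else 0
relu_mask_demo = (Z_relu_demo > 0).astype(float)

# Backward pass: element-wise product, not a matrix product
downstream_demo = upstream_demo * relu_mask_demo
print(F_relu_demo)
print(downstream_demo)
```

The gradient passes through unchanged where the unit was active and is set to zero where it was not, which is what `np.multiply(G1a, del_ReLU)` does in the cell above.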

In [None]:
#
# Show shapes of principal data structures
#
print("X shape:", X.shape)
print("Y shape:", Y.shape)
print("Y_hot shape:", Y_hot.shape)
print("W1 shape:", W1.shape)
print("F1 shape:", F1.shape)
print("F1_with_bias shape:", F1_with_bias.shape)
print("Z1 shape:", Z1.shape)
print("Z2 shape:", Z2.shape)
print("W2 shape:", W2.shape)
print("F2 shape:", F2.shape)
print("G1 shape:", G1.shape)
print("G2 shape:", G2.shape)
#
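
A useful sanity check on the shapes printed above is that each gradient should have the same shape as the weight matrix it belongs to. The sketch below reproduces only the shape algebra of the backward pass, using random placeholder arrays. The `_d` names are hypothetical, and `D2_d` merely stands in for the difference `F2 - Y_hot.T`.

```python
import numpy as np

# Same sizes as the generated training set in the notebook
n_cls, n_hid, n_feat, n_smp = 4, 2, 5, 10
rng = np.random.default_rng(0)

# Random placeholders with the shapes of the notebook's arrays
X_d   = rng.standard_normal((n_feat + 1, n_smp))   # inputs with bias row
W1_d  = rng.standard_normal((n_hid, n_feat + 1))   # hidden-layer weights
W2_d  = rng.standard_normal((n_cls, n_hid + 1))    # output-layer weights
F1b_d = rng.standard_normal((n_hid + 1, n_smp))    # hidden outputs with bias row
D2_d  = rng.standard_normal((n_cls, n_smp))        # stands in for F2 - Y_hot.T

# Output-layer gradient: (n_cls, n_smp) x (n_smp, n_hid + 1)
G2_d = D2_d @ F1b_d.T

# Back through W2, dropping the bias column, then transposing
G1a_d = (D2_d.T @ W2_d)[:, 1:].T                   # (n_hid, n_smp)

# Hidden-layer gradient: (n_hid, n_smp) x (n_smp, n_feat + 1)
G1_d = G1a_d @ X_d.T

print(G2_d.shape, W2_d.shape)
print(G1_d.shape, W1_d.shape)
```

The ReLU mask and the division by the number of samples are left out here because neither changes any shape.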

## Quiz

*Can you use the data in the numpy arrays computed above to compute the value of the loss function for each sample?*

*You can use the next cell to figure out your answer, and the cell below that to get it checked*

In [None]:
#
# Use this cell to work out your answer
#

*Change `myL` in the function call to the variable you have created, which should contain the loss value for each sample. It is expected to be a numpy array: either a vector of length `no_samples` or a matrix with shape `(1, no_samples)`. The loss values must be in the same sample order as the other sample-related data objects.*
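
The two accepted layouts can be illustrated as follows. The values are placeholders only, not losses, and the names `no_samples_demo`, `L_vector_demo` and `L_matrix_demo` are hypothetical.

```python
import numpy as np

no_samples_demo = 10

# Placeholder values standing in for per-sample results
L_vector_demo = np.arange(no_samples_demo, dtype=float)      # shape (no_samples,)
L_matrix_demo = L_vector_demo.reshape(1, no_samples_demo)    # shape (1, no_samples)

print(L_vector_demo.shape, L_matrix_demo.shape)

# Both layouts hold the same values in the same sample order
print(np.array_equal(L_matrix_demo.ravel(), L_vector_demo))
```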

In [None]:
#
# Change myL to your variable, and then run the function to check your result
#
sr.check_softmax_relu_loss(Y_hot, F2, L=myL)
#