# Matrix operations for a multi-class Softmax classifier

This notebook illustrates the matrix operations in the forward and backward passes of a multinomial classifier that uses the softmax activation function and the multi-class cross-entropy loss function.
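
For reference, the forward pass and the gradient computed in the code cells below can be summarised as follows. This is a sketch in the notebook's own variable names; $N$ (for `no_samples`) and $K$ (for `no_classes`) are shorthand introduced here.

$$
Z = W X, \qquad
F_{kn} = \frac{e^{Z_{kn}}}{\sum_{j=1}^{K} e^{Z_{jn}}}, \qquad
G = \frac{1}{N}\,\bigl(F - Y_{\text{hot}}^{\top}\bigr)\,X^{\top}
$$

Here $X$ has shape `(no_features + 1, N)`, $W$ has shape `(K, no_features + 1)`, $Z$ and $F$ have shape `(K, N)`, $Y_{\text{hot}}$ has shape `(N, K)`, and $G$ has the same shape as $W$.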

The output layer has one unit per class. The target labels are one-hot encoded, so each target vector holds 1 for the correct class and 0 for every other class.
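
As a small illustration of one-hot encoding, the sketch below uses plain NumPy rather than the scikit-learn encoders used later in this notebook. The names `demo_classes` and `demo_labels` and their values are made up for the example.

```python
import numpy as np

# Hypothetical classes and labels, for illustration only
demo_classes = ["a", "b", "c", "d"]
demo_labels = np.array(["c", "a", "d", "a"])

# Map each label to its integer code (its position in demo_classes)
demo_codes = np.array([demo_classes.index(label) for label in demo_labels])

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix by the codes gives one row per sample
demo_one_hot = np.eye(len(demo_classes))[demo_codes]

print(demo_codes)
print(demo_one_hot)
```

Each row of `demo_one_hot` contains a single 1, in the column of that sample's class.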

There is also a quiz question at the end for you to work out.

In [None]:
#
# This cell imports functions that are used later on
#
!wget -nv https://github.com/IS-pillar-3/A_AI_anc/raw/main/A_AI_softmax_loss_01_v01.py
import A_AI_softmax_loss_01_v01 as ml
#

In [None]:
#
# Simulate forward and backward pass for Softmax multi-class classifier
#
import numpy as np
import random
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, accuracy_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
#
#
# Properties of generated training set
#
no_classes  = 4
no_features = 5
no_samples  = 10
#
# Set up X training set, including a bias constant (1) as the first row (index 0)
#
X = np.concatenate((np.ones((1, no_samples)), np.random.randn(no_features, no_samples)), axis=0)
#
# Set up one-hotted Y's in Y_hot training set
#
# Use lower case letters as classes
#
letters = "abcdefghijklmnopqrstuvwxyz"
classes = list(letters[0:no_classes])
#
Y = np.random.choice(classes, no_samples)
#
# One-hot encode Y (from machinelearningmastery.com)
#
# Fit a LabelEncoder model to encode classes as integers
#
label_encoder = LabelEncoder()
integer_codes = label_encoder.fit_transform(classes)
#
# Fit a OneHotEncoder model to encode integers as one-hots
#
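# Note: scikit-learn >= 1.2 names this argument sparse_output; older versions use sparse=False
onehot_encoder = OneHotEncoder(sparse_output=False)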
integer_codes  = integer_codes.reshape(no_classes, 1)
onehot_encoded = onehot_encoder.fit_transform(integer_codes)
#
# Use encoder models to one-hot encode Y
#
Y_ints = label_encoder.transform(Y)
print(Y)
print(Y_ints)
#
Y_ints = Y_ints.reshape(no_samples, 1)
Y_hot  = np.array(onehot_encoder.transform(Y_ints))
print(Y_hot)
#
# Set up weights
#
W = np.random.randn(no_classes, no_features + 1)
#
# Activation without stabilising terms
#
Z       = np.matmul(W, X)
eZ      = np.exp(Z)
eZ_sums = np.sum(eZ, axis=0)
#
print("\n++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
F = eZ / eZ_sums
print(F)
#
# Activation with stabilising adjustment (as per medium.com)
#
C     = np.max(Z, axis=0)
C_adj = np.atleast_2d(C).repeat(repeats=(no_classes), axis=0)
Z_adj = Z - C_adj
#
eZ      = np.exp(Z_adj)
eZ_sums = np.sum(eZ, axis=0)
#
print("\n++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
F = eZ / eZ_sums
print(F)
#
# Predictions, confusion matrix and accuracy
#
P_ints = np.argmax(F, axis=0)
P      = label_encoder.inverse_transform(P_ints)
#
print("\n++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
print(P)
print(Y)
#
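# Pass labels explicitly so the matrix has a row and column for every class,
# even when a class is missing from this small random sample
cm   = confusion_matrix(Y, P, labels=classes)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classes)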
disp.plot()
plt.show()
#
print("Accuracy:", accuracy_score(Y, P))
#
# Gradients
#
G = np.matmul((F - np.transpose(Y_hot)), np.transpose(X)) / no_samples
#
print("\n++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
print(G)
#
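
The cell above computes the softmax twice, with and without the stabilising adjustment. The sketch below illustrates why the adjustment matters: subtracting each column's maximum leaves the result unchanged, but avoids overflow when the entries of `Z` are large. The function names and the example matrices are assumptions made for this illustration, and `keepdims=True` broadcasting is used in place of the explicit `repeat` above.

```python
import numpy as np

def softmax_naive(Z):
    # Column-wise softmax with no protection against overflow
    eZ = np.exp(Z)
    return eZ / np.sum(eZ, axis=0)

def softmax_stable(Z):
    # Subtract each column's maximum before exponentiating
    Z_adj = Z - np.max(Z, axis=0, keepdims=True)
    eZ = np.exp(Z_adj)
    return eZ / np.sum(eZ, axis=0, keepdims=True)

# Hypothetical scores: 3 classes (rows) by 2 samples (columns)
Z_small = np.array([[1.0, 2.0], [3.0, 0.5], [0.0, -1.0]])
Z_large = Z_small + 1000.0

# On moderate values both versions agree
same_on_small = np.allclose(softmax_naive(Z_small), softmax_stable(Z_small))

# On large values np.exp overflows to inf, and inf / inf gives nan
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(Z_large)

# The stable version still returns valid probabilities
stable_large = softmax_stable(Z_large)

print(same_on_small)
print(naive_large)
print(stable_large)
```

Adding the same constant to every entry of a column does not change that column's softmax, so `stable_large` matches the result for `Z_small`.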

In [None]:
#
# Show shapes of principal data structures
#
print("X shape:", X.shape)
print("Y shape:", Y.shape)
print("Y_hot shape:", Y_hot.shape)
print("W shape:", W.shape)
print("Z shape:", Z.shape)
print("eZ shape:", eZ.shape)
print("eZ_sums shape:", eZ_sums.shape)
print("F shape:", F.shape)
print("C shape:", C.shape)
print("C_adj shape:", C_adj.shape)
print("G shape:", G.shape)
#

## Quiz

*Can you use the data in the numpy arrays computed above to compute the value of the loss function for each sample?*

*You can use the next cell to work out your answer, and the cell below that to check it.*

In [None]:
#
# Use this cell to work out your answer
#

*In the function call, change `myL` to the variable you created to hold the loss value for each sample. It should be a numpy array: either a vector of length `no_samples` or a matrix of shape `(1, no_samples)`. The loss values must be in the same sample order as the other per-sample arrays.*
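
If you want to confirm the shape of your array before running the check, a minimal sketch is shown below. It is based only on the shapes described above, not on how the checking function is implemented. The helper `has_accepted_shape` and the `demo_` names are hypothetical, and the zero arrays are placeholders, not loss values.

```python
import numpy as np

demo_no_samples = 10  # hypothetical; use no_samples in the notebook

# Placeholder arrays in the two shapes described above (NOT the loss)
demo_vector = np.zeros(demo_no_samples)
demo_row = np.zeros((1, demo_no_samples))

def has_accepted_shape(L, n):
    """Return True if L is a length-n vector or a (1, n) matrix."""
    return L.shape in [(n,), (1, n)]

print(has_accepted_shape(demo_vector, demo_no_samples))
print(has_accepted_shape(demo_row, demo_no_samples))
```

A column matrix of shape `(no_samples, 1)` fails this test; `np.ravel` flattens it to a vector.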

In [None]:
#
# Change myL to your variable, and then run the function to check your result
#
ml.check_softmax_loss(Y_hot, F, L=myL)
#