<a href="https://colab.research.google.com/github/cagBRT/IntroToDNNwKeras/blob/master/Choosing_Loss_Functions_Multi_Classif.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Multi-Class Classification Loss Functions**

This notebook looks at loss functions for problems where each example is assigned to one of three or more classes. <br>
For example, image classification on the MNIST dataset, which has 10 classes.

**Install and import required libraries**

In [None]:
#!pip install tensorflow  # not needed on Colab: Keras is included with TensorFlow

In [None]:
# MLP for the blobs multi-class classification problem with cross-entropy loss
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD
from numpy import where
from matplotlib import pyplot
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Create a synthetic dataset<br>
2,000 examples, 3 classes, 2 features

In [None]:
# generate dataset
dataX, y = make_blobs(n_samples=2000, centers=3, n_features=2, cluster_std=2, random_state=2)

Plot the dataset

In [None]:
# select indices of points with each class label
for i in range(3):
	samples_ix = where(y == i)
	pyplot.scatter(dataX[samples_ix, 0], dataX[samples_ix, 1])
pyplot.show()

**Standardize the data**

In [None]:
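# Rescale each feature to zero mean and unit variance
# (fitted on all the data before the split, for simplicity)
scaler = StandardScaler()
X = scaler.fit_transform(dataX)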
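
What `StandardScaler` does can be sketched with NumPy: each feature column is shifted to zero mean and divided by its standard deviation. This is a minimal illustration on a small hypothetical array, not the scikit-learn implementation; it uses the population standard deviation (`ddof=0`), which is what the scikit-learn documentation describes.

```python
import numpy as np

# Hypothetical stand-in for dataX: four points with two features (not the make_blobs data).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0],
                 [4.0, 40.0]])

# z-score each column: subtract the column mean, then divide by the
# column standard deviation (population std, ddof=0).
mean = data.mean(axis=0)
std = data.std(axis=0)
standardized = (data - mean) / std

print(standardized.mean(axis=0))  # approximately [0. 0.]
print(standardized.std(axis=0))   # approximately [1. 1.]
```

Standardizing puts both features on the same scale, so neither dominates the initial weighted sums in the first dense layer just because of its units.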

## **Cross-Entropy Loss**

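**One-hot encode the labels, then split the data into training and test sets**

Categorical cross-entropy expects one-hot encoded labels: each integer label becomes a vector with a 1 in the position of its class and 0 everywhere else.

For the three classes:<br>

(1 0 0) = 0<br>
(0 1 0) = 1<br>
(0 0 1) = 2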
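
The conversion that `to_categorical` performs can be reproduced with NumPy for a handful of integer labels. This sketch only illustrates the encoding; the labels are made up.

```python
import numpy as np

# Integer class labels for five hypothetical examples (3 classes: 0, 1, 2).
labels = np.array([0, 2, 1, 2, 0])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels picks one row per example.
one_hot = np.eye(num_classes)[labels]
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]

# argmax along each row recovers the original integer labels.
print(one_hot.argmax(axis=1))  # [0 2 1 2 0]
```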

In [None]:
y = tf.keras.utils.to_categorical(y)
y

In [None]:
# split into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10)

In [None]:
# patience: number of epochs with no improvement in the monitored value
# (here the training loss) after which training stops
early_stop = keras.callbacks.EarlyStopping(monitor='loss', patience=10)
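
The patience rule can be sketched in plain Python. This is a simplified version of the behaviour with default settings (`min_delta=0`, lower loss is better), not the Keras source; the function name `stopping_epoch` and the loss values are hypothetical.

```python
def stopping_epoch(losses, patience):
    """Return the 0-based epoch after which training would stop, or None.

    Simplified patience rule: an epoch counts as an improvement only if
    its loss is strictly lower than the best loss seen so far.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical per-epoch training losses: the best value is reached at epoch 2.
losses = [1.0, 0.6, 0.5, 0.55, 0.52, 0.51]
print(stopping_epoch(losses, patience=3))   # 5
print(stopping_epoch(losses, patience=10))  # None
```

Because `monitor='loss'` watches the training loss, training stops when the model stops improving on the data it is fitting, not when the validation loss starts to rise.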

**Define the model**

The model has two inputs (one per feature), one hidden dense layer of 50 ReLU units, and a softmax output layer with three units (one per class).
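
What this network computes for a batch of inputs can be sketched with NumPy. The weights below are random placeholders (Keras would initialize the hidden layer with `he_uniform` and then train all weights), so only the shapes and the softmax property are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes match the model: 2 inputs -> 50 hidden (ReLU) -> 3 outputs (softmax).
# Random placeholder weights, not trained values.
W1, b1 = rng.normal(size=(2, 50)), np.zeros(50)
W2, b2 = rng.normal(size=(50, 3)), np.zeros(3)


def softmax(z):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def forward(x):
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return softmax(hidden @ W2 + b2)       # one probability per class


x = rng.normal(size=(4, 2))  # four hypothetical 2-feature inputs
probs = forward(x)
print(probs.shape)        # (4, 3)
print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)

# Trainable parameters: weights plus biases in each Dense layer.
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 303 = (2*50 + 50) + (50*3 + 3)
```

The 303 trainable parameters should match what `model.summary()` reports for this architecture.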

In [None]:
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='softmax'))

Categorical cross-entropy is the default choice of loss function for multi-class classification when the labels are one-hot encoded.
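
For one example with one-hot label $y$ and predicted probabilities $\hat{y}$, categorical cross-entropy is $L = -\sum_{c} y_c \log \hat{y}_c$, which reduces to $-\log$ of the probability assigned to the true class; the batch loss is the mean over examples. A minimal NumPy sketch with made-up predictions (the small-epsilon clip is an assumption in the spirit of what Keras does internally, added only to avoid `log(0)`):

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch of one-hot labels."""
    # Clip predictions away from 0 and 1 so log() stays finite.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


y_true = np.array([[0, 1, 0],
                   [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.7, 0.2],   # 0.7 on the correct class
                   [0.5, 0.3, 0.2]])  # 0.5 on the correct class

loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # (-ln 0.7 - ln 0.5) / 2 ≈ 0.5249
```

The loss falls toward 0 as the probability on the correct class approaches 1, and grows without bound as it approaches 0.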

**Train the model**

In [None]:
# compile model
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, verbose=0, callbacks=[early_stop])

**Evaluate the model**

In [None]:
# evaluate the model
_, train_acc = model.evaluate(X_train, y_train, verbose=0)
_, test_acc = model.evaluate(X_test, y_test, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
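
The accuracy reported by `model.evaluate` counts a prediction as correct when the most probable class in the softmax output matches the label; with one-hot labels and `categorical_crossentropy`, Keras should resolve `'accuracy'` to categorical accuracy. A small NumPy sketch of that comparison, on hypothetical outputs and labels:

```python
import numpy as np

# Hypothetical softmax outputs and one-hot labels for four examples.
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

# A prediction is correct when the most probable class matches the label.
correct = y_pred.argmax(axis=1) == y_true.argmax(axis=1)
accuracy = correct.mean()
print(accuracy)  # 0.5
```

Accuracy ignores how confident the prediction is, which is why the loss and accuracy curves can move somewhat independently during training.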

In [None]:
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

**Assignment**:<br>
Run the same code, but this time don't standardize the input data (train on `dataX` instead of `X`).
Does this change the accuracy?

## **Sparse Multiclass Cross-Entropy Loss**

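Categorical cross-entropy requires the labels to be one-hot encoded.<br>
When there are many classes (for example, one class per word in a large vocabulary), the one-hot label matrix is mostly zeros and can use a lot of memory.
The model itself is the same either way: the output layer still has one softmax unit per class.

Sparse categorical cross-entropy computes the same loss directly from integer class labels, so no one-hot encoding is needed. <br>
Notice that we skip the `to_categorical` conversion for this dataset.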
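
The claim that the sparse and one-hot forms give the same loss can be checked with NumPy on a few hypothetical predictions: the sparse version simply looks up the predicted probability of the true class instead of multiplying by a one-hot vector.

```python
import numpy as np

# Softmax outputs for three hypothetical examples (3 classes).
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.3, 0.5]])
labels = np.array([0, 1, 2])   # integer labels (sparse form)
one_hot = np.eye(3)[labels]    # the same labels, one-hot encoded

# Categorical cross-entropy: sum over classes of -y * log(p).
dense_loss = -np.sum(one_hot * np.log(y_pred), axis=1)

# Sparse version: pick the predicted probability of the true class directly.
sparse_loss = -np.log(y_pred[np.arange(len(labels)), labels])

print(np.allclose(dense_loss, sparse_loss))  # True
```

Only the storage of the targets differs: one integer per example instead of one value per class per example.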

**Create the dataset and split into training and test sets**<br>
This is the same dataset as in the example above, but this time the inputs are not standardized.

In [None]:
# generate a classification dataset
X, y = make_blobs(n_samples=2000, centers=3, n_features=2, cluster_std=2, random_state=2)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10)

In [None]:
# select indices of points with each class label
for i in range(3):
	samples_ix = where(y == i)
	pyplot.scatter(X[samples_ix, 0], X[samples_ix, 1])
pyplot.show()

In [None]:
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='softmax'))

In [None]:
# compile model
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

In [None]:
# fit model
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, verbose=0, callbacks=[early_stop])

In [None]:
# evaluate the model
_, train_acc = model.evaluate(X_train, y_train, verbose=0)
_, test_acc = model.evaluate(X_test, y_test, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

In [None]:
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

## **Kullback-Leibler Divergence Loss**

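KL divergence measures how one probability distribution differs from a baseline distribution.<br>
It is more often used in more complex models, such as variational autoencoders.<br>But it can also be used for multi-class classification: with one-hot targets it is functionally equivalent to categorical cross-entropy, because KL divergence equals cross-entropy minus the entropy of the target, and a one-hot target has zero entropy.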
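
The relationship $D_{KL}(p\,\|\,q) = \sum_i p_i \log(p_i / q_i) = H(p, q) - H(p)$ can be checked numerically with NumPy. This sketch skips terms where $p_i = 0$ (they contribute 0); Keras instead clips both arguments with a small epsilon, so treat the masking here as a simplification. The distributions are hypothetical.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i); terms with p_i = 0 contribute 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(q[mask])))


q = [0.1, 0.7, 0.2]  # hypothetical model prediction

# One-hot target: its entropy is 0, so KL equals cross-entropy.
one_hot = [0.0, 1.0, 0.0]
print(kl_divergence(one_hot, q), cross_entropy(one_hot, q))  # both ≈ 0.3567

# Soft target: KL = cross-entropy - entropy(p), so the two values differ.
soft = [0.2, 0.6, 0.2]
entropy = cross_entropy(soft, soft)
print(np.isclose(kl_divergence(soft, q), cross_entropy(soft, q) - entropy))  # True
```

With one-hot labels the two losses have the same value and the same gradients, so training with either should behave the same way.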


In [None]:
# generate 2d classification dataset
X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
# one hot encode output variable
y = tf.keras.utils.to_categorical(y)

In [None]:
# split into train and test
n_train = 500
X_train, X_test = X[:n_train, :], X[n_train:, :]
y_train, y_test = y[:n_train], y[n_train:]

In [None]:
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='softmax'))

In [None]:
# compile model
opt = SGD(learning_rate=0.01, momentum=0.9)
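# The KLDivergence class is used because the 'kullback_leibler_divergence'
# string alias may not be available in newer Keras versions
model.compile(loss=keras.losses.KLDivergence(), optimizer=opt, metrics=['accuracy'])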

In [None]:
# fit model
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, verbose=0)

In [None]:
# evaluate the model
_, train_acc = model.evaluate(X_train, y_train, verbose=0)
_, test_acc = model.evaluate(X_test, y_test, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

In [None]:
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()