# Multiclass Classification with Keras

## Session Objectives
- Learn how to build a classifier to recognize handwritten digits
  - Class encoding (one-hot encoding)
  - Softmax activation function
- Learn how to use dropout layers to improve generalization

## Handwritten Digits Recognition

We will build a network that can recognize handwritten digits. To achieve this
goal, we use MNIST (for more information, see http://yann.lecun.com/exdb/mnist/), a database of
handwritten digits made up of a training set of 60,000 examples and a test set of 10,000 examples.
The training examples are annotated by humans with the correct answer. For instance, if the
handwritten digit is the number three, then three is simply the label associated with that example.

Each MNIST image is grayscale and consists of 28 x 28 pixels. A subset of these digits is
shown in the following figure:


![dataset](subset.png)

### Softmax activation function

How does a softmax function work?

The softmax function squashes the output of each unit to lie between 0 and 1, just like a sigmoid function. Unlike the sigmoid, it also normalizes the outputs so that they sum to 1: each exponentiated input is divided by the sum of all the exponentiated inputs (check this in the figure below).

![softmax_exp](softmax_exp.png)

Because the outputs are positive and sum to 1, they can be read as a categorical probability distribution: output j is the probability that the input belongs to class j.
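A minimal NumPy sketch of this computation, using made-up scores for three classes. The helper name `softmax` is our own (not a Keras API), and subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result. The predicted class is simply the index of the largest probability.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis: exponentiate, then normalize to sum to 1."""
    z = np.asarray(z, dtype=float)
    # Subtracting the max leaves the result unchanged but avoids overflow in exp
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])   # hypothetical raw outputs for 3 classes
probs = softmax(scores)
print(probs.round(3))                # each value lies strictly between 0 and 1
print(probs.sum())                   # the values add up to 1 (up to rounding)
print(probs.argmax())                # index of the most probable class
```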

Mathematically, the softmax function is shown below, where z is the vector of inputs to the output layer (if you have 10 output units, z has 10 elements), K is the number of output units, and j indexes the output units, so j = 1, 2, ..., K.

The softmax function
![softmax](softmax.png)
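Written out with the symbols z, j and K from the text (and σ as our own name for the softmax), the standard formula reads:

$$
\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K
$$

Because the denominator is the sum of all K numerators, every output is positive and the K outputs add up to 1.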

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Activation, Dropout
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.utils import to_categorical  # replaces the old keras np_utils
from sklearn.model_selection import train_test_split
from tensorflow.keras.datasets import mnist
np.random.seed(1671)      # for reproducibility (NumPy)
tf.random.set_seed(1671)  # for reproducibility (TensorFlow weight init, dropout)
%matplotlib inline
```


```python
# Load the MNIST arrays saved locally as .npy files
X_train = np.load('DATASET/X_train.npy')
y_train = np.load('DATASET/y_train.npy')
X_test = np.load('DATASET/X_test.npy')
y_test = np.load('DATASET/y_test.npy')
print(X_train.shape)
print(X_test.shape)

# Alternatively, download MNIST directly with Keras:
# (X_train, y_train), (X_test, y_test) = mnist.load_data()
```

```
(60000, 28, 28)
(10000, 28, 28)
```


### Data split and preprocessing

```python
# ?
# reshape
# normalize
# convert class vectors to binary class matrices
```
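One possible way to complete the preprocessing cell, as a minimal NumPy sketch. The helper `preprocess` and the synthetic `X_demo`/`y_demo` arrays are hypothetical stand-ins for the real MNIST arrays, and flattening to 784 inputs assumes the fully connected (Dense) networks used below. The one-hot step picks rows of an identity matrix; `tensorflow.keras.utils.to_categorical(y, 10)` produces the same 0/1 matrix.

```python
import numpy as np

NUM_CLASSES = 10  # digits 0-9

def preprocess(X, y, num_classes=NUM_CLASSES):
    # Flatten each 28 x 28 image into a 784-element vector
    X = X.reshape(X.shape[0], -1).astype("float32")
    # Scale pixel intensities from [0, 255] to [0, 1]
    X /= 255.0
    # One-hot encode labels: row i of the identity matrix encodes class i
    Y = np.eye(num_classes, dtype="float32")[y]
    return X, Y

# Tiny synthetic batch (hypothetical) in place of the real MNIST arrays
X_demo = np.random.randint(0, 256, size=(3, 28, 28), dtype=np.uint8)
y_demo = np.array([3, 0, 9])
X_flat, Y_onehot = preprocess(X_demo, y_demo)
print(X_flat.shape, Y_onehot.shape)
print(Y_onehot[0])  # the label 3 as a one-hot row
```

On the real data this would be `X_train, Y_train = preprocess(X_train, y_train)` and likewise for the test set, which yields the `Y_test` used by the evaluation cells.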

### Create Simple net (Single Layer NN)

```python
# Create a simple network (without hidden layers) and name it model
# Train it with the SGD optimizer for 200 epochs with a batch size of 128
# Use the validation_split parameter of fit() with the value 0.2
```
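A minimal sketch of one way to build this network with `tf.keras`, assuming the flattened 784-feature inputs and one-hot targets from the preprocessing step. The helper name `build_simple_net` is hypothetical, and the categorical cross-entropy loss is our assumption (the exercise does not name a loss); it is the usual pairing with a softmax output and one-hot labels. Passing `metrics=["accuracy"]` is what makes `evaluate()` return `[loss, accuracy]`.

```python
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD

def build_simple_net(input_dim=784, num_classes=10):
    # Hypothetical helper: no hidden layers, inputs feed a 10-unit softmax output
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(num_classes, activation="softmax"),
    ])
    # Assumed loss: categorical cross-entropy, matching one-hot targets
    model.compile(optimizer=SGD(), loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_simple_net()
model.summary()

# Training with the settings from the exercise (needs preprocessed X_train, Y_train):
# history = model.fit(X_train, Y_train, batch_size=128, epochs=200,
#                     validation_split=0.2, verbose=1)
```

With 784 inputs and 10 outputs the model has 784 x 10 weights plus 10 biases, i.e. 7,850 trainable parameters.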

```python
# evaluate() returns [loss, accuracy] when the model is compiled with metrics=['accuracy']
score = model.evaluate(X_test, Y_test, verbose=1)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
```

### Improving the simple net with hidden layers

```python
# Create a network with two hidden layers (128 neurons each) and the 'relu' activation function; name it model_2
# Train it with the SGD optimizer for 20 epochs with a batch size of 128
# Use the validation_split parameter of fit() with the value 0.2
```
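A minimal sketch of `model_2` under the same assumptions as the simple net (784 flattened inputs, one-hot targets, categorical cross-entropy chosen by us). The helper `build_model_2` is hypothetical; the only structural change is two 128-unit ReLU hidden layers in front of the softmax output.

```python
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD

def build_model_2(input_dim=784, hidden_units=128, num_classes=10):
    # Hypothetical helper: two ReLU hidden layers, then a softmax output
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(hidden_units, activation="relu"),
        Dense(hidden_units, activation="relu"),
        Dense(num_classes, activation="softmax"),
    ])
    # Assumed loss: categorical cross-entropy, matching one-hot targets
    model.compile(optimizer=SGD(), loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model_2 = build_model_2()
model_2.summary()

# Training with the settings from the exercise:
# model_2.fit(X_train, Y_train, batch_size=128, epochs=20,
#             validation_split=0.2, verbose=1)
```

The hidden layers raise the parameter count from 7,850 to 118,282 (100,480 + 16,512 + 1,290), which gives the network far more capacity than the single-layer version.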

```python
# evaluate() returns [loss, accuracy] when the model is compiled with metrics=['accuracy']
score_2 = model_2.evaluate(X_test, Y_test, verbose=1)
print("Test loss:", score_2[0])
print("Test accuracy:", score_2[1])
```

### Further improving with dropout

```python
# Create a network with two hidden layers (128 neurons each) and the 'relu' activation function; name it model_3
# Add Dropout with a rate of 0.3 after each hidden layer
# Train it with the SGD optimizer for 20 epochs with a batch size of 128
# Use the validation_split parameter of fit() with the value 0.2
```
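A minimal sketch of `model_3`, under the same assumptions as the sketch for `model_2` (the helper `build_model_3` is hypothetical, and the loss is our choice). It also shows why dropout helps generalization without hurting evaluation: a `Dropout` layer is only active when called with `training=True`, as `fit()` does, and passes its input through unchanged at inference time.

```python
import numpy as np
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD

def build_model_3(input_dim=784, hidden_units=128, num_classes=10, rate=0.3):
    # Hypothetical helper: model_2's architecture plus Dropout after each hidden layer
    model = Sequential([
        Input(shape=(input_dim,)),
        Dense(hidden_units, activation="relu"),
        Dropout(rate),
        Dense(hidden_units, activation="relu"),
        Dropout(rate),
        Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=SGD(), loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model_3 = build_model_3()

# Dropout behavior: with training=True, each value is zeroed with probability 0.3
# and survivors are scaled by 1 / (1 - 0.3); with training=False it is the identity.
x = np.ones((1, 10), dtype="float32")
drop = Dropout(0.3)
print(np.asarray(drop(x, training=True)))
print(np.asarray(drop(x, training=False)))

# Training with the settings from the exercise:
# model_3.fit(X_train, Y_train, batch_size=128, epochs=20,
#             validation_split=0.2, verbose=1)
```

Dropout layers add no weights, so `model_3` has the same parameter count as `model_2`; only the training behavior changes.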

```python
# Recreate the previous network but train it for 250 epochs
```

```python
# evaluate() returns [loss, accuracy] when the model is compiled with metrics=['accuracy']
score_3 = model_3.evaluate(X_test, Y_test, verbose=1)
print("Test loss:", score_3[0])
print("Test accuracy:", score_3[1])
```