# Neural Network models using the MNIST dataset
The Modified National Institute of Standards and Technology (MNIST) dataset is a set of 28x28 grayscale images of handwritten digits, where each image is labeled with the digit (0 to 9) that it shows.
To keep the focus on the basic algorithms behind the NN models, I will not use data augmentation.

In [1]:
import numpy as np
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Flatten
from keras import backend as K
import tensorflow as tf
from keras.initializers import RandomUniform
from keras.optimizers import SGD

import matplotlib.pyplot as plt

Using TensorFlow backend.


In [2]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Below are the dimensions of the training and testing images and their labels.

In [3]:
print('Initial example dimensions')
print("x_train shape", x_train.shape)
print("y_train shape", y_train.shape)
print("x_test shape", x_test.shape)
print("y_test shape", y_test.shape)

Initial example dimensions
x_train shape (60000, 28, 28)
y_train shape (60000,)
x_test shape (10000, 28, 28)
y_test shape (10000,)


## Preparing Data
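Each 28x28 image is flattened into a 784-element vector, and the pixel values are scaled from the range [0, 255] to [0, 1]. Normalizing the inputs in this way generally makes the gradient-based computations better behaved.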

In [4]:
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
new_y_train = y_train.reshape(60000)
new_y_test = y_test.reshape(10000)
normalized_x_train = x_train / 255
normalized_x_test = x_test / 255

In [5]:
print("New dimensions")
print("flat_X_train shape:", normalized_x_train.shape)
print("flat_X_test shape:", normalized_x_test.shape)
print("new_Y_train shape:", new_y_train.shape)
print("new_Y_test shape:", new_y_test.shape)

New dimensions
flat_X_train shape: (60000, 784)
flat_X_test shape: (10000, 784)
new_Y_train shape: (60000,)
new_Y_test shape: (10000,)


## Binary Classification using Linear Regression
In this model, I classify whether an image shows a particular digit or not. Here, the target digit is 0: images of a 0 are labeled +1 and all other images are labeled -1.

In [6]:
training = normalized_x_train
testing = normalized_x_test
labels = np.where(y_train==0,1,-1)
testlabel = np.where(y_test==0,1,-1)

In [7]:
import keras.backend as K
from keras.optimizers import SGD
import numpy.random as npr

model = Sequential()
model.add(Dense(1, input_shape=(784,), activation='linear'))
model.summary()

opt = keras.optimizers.SGD(lr=.01)

model.compile(optimizer=opt, loss='mse', metrics=['accuracy'])
history = model.fit(training, labels, epochs=70, batch_size=2200,verbose=0)
w = model.layers[0].get_weights()
loss_values = history.history['loss']
epochs = range(1, len(loss_values)+1)

model.evaluate(x=testing,y=testlabel)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 1)                 785       
Total params: 785
Trainable params: 785
Non-trainable params: 0
_________________________________________________________________


[0.10705387117266656, 0.8924999833106995]
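
Because the labels are +1 and -1, one way to turn the regression output into a class decision is to take its sign. The sketch below is a minimal illustration of that rule on toy values. The helper name `sign_accuracy` and the toy arrays are hypothetical, and this is not necessarily how the Keras `accuracy` metric above computes its value.

```python
import numpy as np

def sign_accuracy(scores, labels):
    """Fraction of examples whose score has the same sign as its +1/-1 label."""
    predictions = np.where(np.asarray(scores) >= 0, 1, -1)
    return float(np.mean(predictions == np.asarray(labels)))

# Toy scores standing in for the raw outputs of a linear model (hypothetical values)
toy_scores = np.array([0.8, -0.3, 0.1, -1.2])
toy_labels = np.array([1, -1, -1, -1])

toy_accuracy = sign_accuracy(toy_scores, toy_labels)
print(toy_accuracy)
```

With the trained model, the same helper could be applied to `model.predict(testing).ravel()` and `testlabel`.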

The plot below shows the training loss over the epochs.

In [None]:
plt.plot(epochs, loss_values, label='Training Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

Let's compare these results with the ordinary Linear Regression algorithm from scikit-learn.

In [None]:
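from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

reg = LinearRegression(fit_intercept=True).fit(training, labels)
y_pred = reg.predict(testing)

# Labels are +1/-1, so classify by the sign of the regression output
# (rounding can produce values such as 0 that match neither label).
y_pred_labels = np.where(y_pred >= 0, 1, -1)
acc_LR = accuracy_score(testlabel, y_pred_labels, normalize=True)
print('Accuracy:', acc_LR)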

The ordinary Linear Regression method finds the best-fitting linear separating boundary in closed form. The coefficient vector $\hat{\textbf{w}}$ that minimizes the squared error is given by the following equation.

$$ \hat{\textbf{w}} = (\textbf{X}^T \textbf{X})^{-1} \textbf{X}^T \textbf{y}$$

where $\textbf{X}$ is the training data matrix and $\textbf{y}$ is the target label vector. This equation gives the global minimum of the squared-error loss function directly.
The NN method, in contrast, has to optimize its weights (the slope and y-intercept of the line) iteratively in order to approach the best separating line.
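
The closed-form solution can be sketched in NumPy. The example below is a minimal sketch on small synthetic data (the data, seed, and variable names are assumptions made for illustration, not part of the notebook above). It appends a column of ones so that the intercept is learned as an extra weight, then solves the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data generated from known weights and intercept
X_raw = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 0.3
y = X_raw @ true_w + true_b

# Append a column of ones so the intercept becomes the last weight
X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])

# Solve (X^T X) w = X^T y instead of forming the inverse explicitly
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver that also copes with a singular X^T X
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_hat)
```

On image data such as MNIST, pixels that are zero in every image would make $\textbf{X}^T \textbf{X}$ singular, so a least-squares solver such as `np.linalg.lstsq` is likely the safer choice there.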

## Binary Classification using Support Vector Machine (SVM)
A linear SVM can be imitated with a single-layer network by combining a hinge loss function with an l2 regularizer on the weights (the same penalty that Ridge Regression uses).
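
As a rough illustration of this objective, the sketch below computes the mean hinge loss plus an l2 penalty for a linear model in NumPy. The function name `svm_objective` and the toy data are hypothetical. The penalty form (`l2` times the sum of squared weights) is an assumption chosen to mirror the Keras regularizer used in the next cell.

```python
import numpy as np

def svm_objective(w, b, X, y, l2=0.001):
    """Mean hinge loss of a linear model plus an l2 penalty on the weights.

    Labels y are expected to be +1 or -1.
    """
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return float(hinge.mean() + l2 * np.sum(w ** 2))

# Toy data: three 2-D points with +1/-1 labels (hypothetical values)
X_toy = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]])
y_toy = np.array([1, 1, -1])
w_toy = np.array([1.0, 0.25])
b_toy = 0.0

# Only the second point lies inside the margin (margin 0.5), so it alone adds loss
toy_loss = svm_objective(w_toy, b_toy, X_toy, y_toy, l2=0.0)
toy_loss_reg = svm_objective(w_toy, b_toy, X_toy, y_toy, l2=0.001)
print(toy_loss, toy_loss_reg)
```

Points classified correctly with a margin of at least 1 contribute nothing to the hinge term, which is why only examples near the boundary influence the solution.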

In [None]:
training = normalized_x_train
labels = np.where(y_train==0,1,-1)
testlabel = np.where(y_test==0,1,-1)

In [None]:
model_linsvm = Sequential()
model_linsvm.add(Dense(1, input_shape=(784,), activation='softsign',
                       kernel_initializer=RandomUniform(minval=-.2, maxval=.2),
                       bias_initializer=RandomUniform(minval=-.1, maxval=.1),
                       kernel_regularizer=keras.regularizers.l2(l=.001)))
# model_linsvm.add(Dense(1,kernel_regularizer = keras.regularizers.l2(l=0.5/4), activation='softsign'))

learning_rate = 0.05
batch_size = 3000
epochs = 100
model_linsvm.compile(optimizer=keras.optimizers.SGD(lr=learning_rate,momentum=.01), loss='hinge', metrics=['accuracy'])
# Hold out the last 10% of the training data for validation so that 'val_loss'
# is recorded in the history (the split size is an assumption).
history = model_linsvm.fit(training, labels, epochs=epochs, batch_size=batch_size,
                           verbose=1, shuffle=False, validation_split=0.1)
# Read the weights from the SVM model, not from the earlier regression model
w = model_linsvm.layers[0].get_weights()
loss_values = history.history['loss']
valloss_values = history.history['val_loss']
epochs = range(1, len(loss_values)+1)

# Evaluate on the normalized test images, matching the training inputs
model_linsvm.evaluate(x=testing, y=testlabel)

The plot below shows the training and validation loss over the epochs.

In [None]:
plt.plot(epochs, loss_values, label='Training Loss')
plt.plot(epochs, valloss_values, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

Let's compare these results with the regular SVM algorithm from scikit-learn.

In [None]:
from sklearn import svm
from sklearn.metrics import accuracy_score
clf = svm.SVC(kernel='poly',verbose=0)
clf.fit(training, labels)
y_pred = clf.predict(testing)
print('Acc:', accuracy_score(testlabel, y_pred))

## Multi-class Classification problem using SVM
In the multi-class case, the SVM-NN algorithm can generate a nonlinear separating curve. We can extend our current understanding of the perceptron model by adding additional layers. The first layer consists of the input nodes. A hidden layer then introduces nonlinearity through a Rectified Linear Unit (ReLU) activation function, while an l2 regularizer (the penalty used in Ridge Regression) keeps its weights small. The hidden activations are passed to an output layer with one node per class; in the code below this is a softmax layer trained with categorical cross-entropy rather than a hinge loss. We classify an example by selecting the class with the highest value in the output layer.
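
The forward pass described above can be sketched in NumPy. This is a minimal sketch with random, untrained weights; the helper names `relu` and `predict_classes` and the toy inputs are hypothetical, and the layer sizes (784, 50, 10) are taken from the model defined below. Since softmax preserves the ordering of the scores, taking the argmax of the raw output scores selects the same class as taking the argmax of the softmax probabilities.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: negative values become zero."""
    return np.maximum(z, 0.0)

def predict_classes(X, W1, b1, W2, b2):
    """Hidden ReLU layer followed by a linear output layer and an argmax."""
    hidden = relu(X @ W1 + b1)
    scores = hidden @ W2 + b2
    return np.argmax(scores, axis=1)

rng = np.random.default_rng(0)

# Five random stand-ins for flattened, normalized images
X_toy = rng.random((5, 784))

# Untrained weights with the same shapes as a 784-50-10 network
W1 = rng.uniform(-0.2, 0.2, size=(784, 50))
b1 = rng.uniform(-0.1, 0.1, size=50)
W2 = rng.uniform(-0.2, 0.2, size=(50, 10))
b2 = np.zeros(10)

toy_classes = predict_classes(X_toy, W1, b1, W2, b2)
print(toy_classes)
```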

Multi-class classification requires an additional step. A process called *One-Hot encoding* converts each integer label into a vector that has a 1 at the index of its class and 0 everywhere else.
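
To make the encoding concrete, here is a minimal NumPy sketch. The helper name `one_hot` and the toy labels are hypothetical, and the helper is only meant to illustrate the idea rather than reproduce the Keras utility exactly.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (n_samples, num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

toy_digits = np.array([3, 0, 9])
toy_encoded = one_hot(toy_digits, 10)

# argmax along each row recovers the original integer labels
toy_decoded = toy_encoded.argmax(axis=1)
print(toy_encoded)
print(toy_decoded)
```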

In [47]:
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

In [None]:
model = Sequential([
    Dense(50, input_shape=(784,), activation='relu',
          kernel_initializer=RandomUniform(minval=-.2, maxval=.2),
          bias_initializer=RandomUniform(minval=-.1, maxval=.1),
          kernel_regularizer=keras.regularizers.l2(l=0.125/4))])
model.add(Dense(10, input_shape=(50,), activation='softmax'))
learning_rate = 0.07
epochs = 100
opt = SGD(lr=learning_rate)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
# batch_size = np.int32(5/(learning_rate**2))
batch_size = 500
print(batch_size)
# Train on the one-hot labels (not the binary +1/-1 labels) and hold out 10% of
# the training data so that 'val_loss' is recorded (the split size is an assumption).
history = model.fit(training, y_train, epochs=epochs, batch_size=batch_size,
                    verbose=0, validation_split=0.1)
w = model.layers[0].get_weights()
loss_values = history.history['loss']
valloss_values = history.history['val_loss']
epochs = range(1, len(loss_values)+1)

# Evaluate on the normalized test images with the one-hot test labels
model.evaluate(x=testing, y=y_test)

The plot below shows the training and validation loss over the epochs.

In [None]:
plt.plot(epochs, loss_values, label='Training Loss')
plt.plot(epochs, valloss_values, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

For comparison, the cell below uses the regular SVM algorithm from scikit-learn.

In [37]:
from sklearn import svm
from sklearn.metrics import accuracy_score
clf = svm.SVC(kernel='poly',verbose=0)
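# SVC expects integer class labels, so use the original labels kept in
# new_y_train / new_y_test rather than the one-hot encoded y_train / y_test.
clf.fit(x_train, new_y_train)
y_pred = clf.predict(x_test)
print('Acc:', accuracy_score(new_y_test, y_pred))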

Acc: 0.9771


The general SVM algorithm classifies the examples very well. However, this method is computationally heavy for large datasets. A Neural Network that imitates an SVM can reduce this computational cost while still applying the same technique.