## ML Algorithms, Part 9, Neural Networks

Artificial Neural Networks were initially inspired by our understanding of biological Neural Networks within the human brain, and since their (re-)discovery they have revolutionized Machine Learning (for the differences between AI, Machine Learning, and Deep Learning, take a look at [our previous article](https://www.theaispace.com/blog/difference-between-ai-machine-learning-and-deep-learning)). We will not talk about how optimization works (e.g. gradient descent), the types of layers, weights, or matrix multiplications; these assume some basic yet solid math knowledge, so we skip them.

There are many libraries and frameworks that help you write your own Neural Networks, and we already [discussed some of them](https://www.theaispace.com/blog/the-most-important-python-libraries-used-for-ai-and-machine-learning) before. Here we will use sklearn's MLP (Multi-Layer Perceptron, a very basic Neural Network) and leave a proper look at Tensorflow (and its high-level interface, Keras, which we will briefly demo here) for a future article. We will once again focus on handwritten digits, this time from the real MNIST dataset.

#### Imports

In [1]:
from sklearn.neural_network import MLPClassifier # Multi-Layer Perceptron classifier (a feed-forward neural network)
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import pandas as pd
from sklearn.model_selection import train_test_split

# we will work with the real MNIST dataset, and get it from keras
from keras.datasets import mnist

Using TensorFlow backend.


In [2]:
# Keras took care of splitting the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Since we're using sklearn here, we need to flatten the X data, and we'll just use a 6th of our training data and a 5th of our test data (out of speed concerns). Feel free to use all the data.

In [3]:
x_train = x_train.reshape((-1, 28*28))[:10000]
x_test = x_test.reshape((-1, 28*28))[:2000]

y_train = y_train[:10000]
y_test = y_test[:2000]
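
To make the flattening step concrete, here is a minimal sketch on a small synthetic array (the random `images` array is only a stand-in for the MNIST data). Each 28x28 image becomes one row of 784 features, which is the 2D layout that sklearn estimators expect.

```python
import numpy as np

# Synthetic stand-in for MNIST: 12 grayscale images of 28x28 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(12, 28, 28), dtype=np.uint8)

# -1 lets NumPy infer the number of rows; each image becomes 784 features
flat = images.reshape((-1, 28 * 28))

# Slicing keeps only the first few samples, as we did with [:10000]
subset = flat[:5]

print(images.shape, flat.shape, subset.shape)
```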

#### Model definition and fitting

In [4]:
NNet = MLPClassifier()

NNet.fit(x_train, y_train)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [5]:
print("Training set accuracy: {:.2f}%".format(NNet.score(x_train, y_train)*100))

Training set accuracy: 99.32%


In [6]:
print("Test set accuracy: {:.2f}%".format(NNet.score(x_test, y_test)*100))

Test set accuracy: 88.20%


Let's compare with a LinearSVC (chosen over SVC due to performance) from before:

In [7]:
clf = LinearSVC()

clf.fit(x_train, y_train)

print("Training set accuracy: {:.2f}%".format(clf.score(x_train, y_train)*100))
print("Test set accuracy: {:.2f}%".format(clf.score(x_test, y_test)*100))

Training set accuracy: 96.01%
Test set accuracy: 81.65%


The MLP did noticeably better (by ~7 percentage points on the test set in this particular run), even without us playing around with the parameters. Let's try just that. For convenience, we wrap the four steps (create the model, fit it, print the training accuracy, print the test accuracy) in a function with optional keyword arguments (`**params`). This is just a shortcut so we can pass whatever parameters we want (if you wonder which parameters you can specify, you can always [check the documentation on MLP](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier), or on any other algorithm for that matter).

In [8]:
def fit_neural_net(**params):
    NNet = MLPClassifier(**params)
    NNet.fit(x_train, y_train)
    print("Training set accuracy: {:.2f}%".format(NNet.score(x_train, y_train)*100))
    print("Test set accuracy: {:.2f}%".format(NNet.score(x_test, y_test)*100))
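
If the `**params` syntax is new to you, the following sketch shows what happens under the hood: the keyword arguments are collected into a dictionary and then unpacked into the `MLPClassifier` constructor. The helper name `build_neural_net` is hypothetical and only used for this illustration.

```python
from sklearn.neural_network import MLPClassifier

def build_neural_net(**params):
    # params is a plain dict holding whatever keyword arguments were passed
    print("Received:", params)
    # ** unpacks the dict back into keyword arguments
    return MLPClassifier(**params)

net = build_neural_net(activation="tanh", alpha=0.05)
default_net = build_neural_net()

# Parameters we did not specify keep their default values
print(net.activation, net.alpha, default_net.activation)
```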

In [9]:
fit_neural_net(activation="logistic", solver='adam')

Training set accuracy: 95.72%
Test set accuracy: 90.85%


In [10]:
fit_neural_net(activation="tanh", solver='adam', alpha=0.05)

Training set accuracy: 91.19%
Test set accuracy: 86.30%


In [11]:
fit_neural_net(activation="relu", solver='adam', learning_rate="adaptive")

Training set accuracy: 99.80%
Test set accuracy: 88.15%


In [12]:
# Note that this will take some time, be prepared
fit_neural_net(hidden_layer_sizes=400, alpha=0.2, max_iter=300)

Training set accuracy: 99.05%
Test set accuracy: 91.55%


In [13]:
# You can safely ignore any warnings about convergence,
# all they tell you is that maybe you should train for more iterations
fit_neural_net(hidden_layer_sizes=(50,), max_iter=10, alpha=1e-4,
               solver='sgd', verbose=10, tol=1e-5, 
               random_state=1, learning_rate_init=.1)

Iteration 1, loss = 190556.97147976
Iteration 2, loss = 292002.47158714
Iteration 3, loss = 292794.39801545
Iteration 4, loss = 292783.92279442
Training loss did not improve more than tol=0.000010 for two consecutive epochs. Stopping.
Training set accuracy: 10.70%
Test set accuracy: 10.25%
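
The loss above grows instead of shrinking, and the accuracy is roughly what random guessing over ten classes would give. A likely cause (this is our assumption, we did not verify it on this run) is that the raw pixel values range from 0 to 255, which, combined with a learning rate of 0.1, can make plain SGD unstable. A common remedy is to rescale the inputs to [0, 1] first. The sketch below uses sklearn's small bundled `load_digits` data (pixel values 0 to 16) so that it runs quickly; for MNIST you would divide by 255 instead.

```python
import warnings
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
# Rescale pixel intensities to the [0, 1] range
X_scaled = X / X.max()

X_tr, X_te, y_tr, y_te = train_test_split(
    X_scaled, y, test_size=0.2, random_state=1)

sgd_net = MLPClassifier(hidden_layer_sizes=(50,), max_iter=10, solver="sgd",
                        random_state=1, learning_rate_init=0.1)

with warnings.catch_warnings():
    # Ten iterations are rarely enough to converge; silence that warning
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    sgd_net.fit(X_tr, y_tr)

print("Test set accuracy: {:.2f}%".format(sgd_net.score(X_te, y_te) * 100))
```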


Phew. Quite a few parameters to play around with! In your Machine Learning endeavors you will probably need to search for good parameter values via cross-validation, grid searches, or random searches (*which is not the same as what we did here*), but these are more advanced topics.
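
As a pointer in that direction, here is a minimal sketch of a grid search with cross-validation using sklearn's `GridSearchCV`. The grid values are arbitrary picks for illustration, and we use the small bundled `load_digits` data instead of MNIST to keep the runtime short.

```python
import warnings
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

# Every combination of these values is tried (2 x 2 = 4 candidates)
param_grid = {
    "activation": ["relu", "tanh"],
    "alpha": [1e-4, 1e-2],
}

search = GridSearchCV(
    MLPClassifier(max_iter=50, random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation for each candidate
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Mean cross-validated accuracy: {:.2f}%".format(search.best_score_ * 100))
```

`RandomizedSearchCV` has a similar interface and samples parameter combinations instead of trying all of them, which helps when the grid gets large.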

Anyhow, we managed to get close to 100% training accuracy and a test set accuracy of around 91%, which is good, and clearly better than the ~82% the LinearSVC reached above. Multi-Layer Perceptrons are cool, but very basic. Also, this is not the `digits` toy dataset we've used before; this is the real deal.

Now, let's see how one of the more modern algorithms, a Convolutional Neural Network (CNN, or ConvNet) model, performs. These are built precisely to work well with images. We'll use (just for the sake of demonstration) one of the more advanced ConvNets, `MobileNet`, which was designed for use on devices with lower computational power (it still consumes significant resources, but it is much lighter than other Deep Neural Networks). Before that, though, we need to reshape our data so that each image has 3 dimensions (height, width, channels), since that is what a CNN usually expects. We also need to change our Y variables to categorical vectors (similar to what we used for text data; we use [keras' to_categorical](https://keras.io/utils/#to_categorical)). That said, we won't explain the parameters used here or how CNNs work; that deserves a whole *course* of its own.

In [14]:
from keras.utils import to_categorical

`to_categorical` creates a vector of length `num_classes` in which the element at the index of the class label is equal to 1 and all the others are 0. Here are a few examples so you can see how this works.

In [15]:
# starting from 0
to_categorical(3)

array([0., 0., 0., 1.], dtype=float32)

In [16]:
to_categorical(3, num_classes=5)

array([0., 0., 0., 1., 0.], dtype=float32)

In [17]:
to_categorical(3, num_classes=10)

array([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.], dtype=float32)
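
For intuition, the same encoding can be sketched in plain NumPy. The `one_hot` helper below is a hypothetical stand-in written for this article, not the Keras implementation; it always returns one row per label, even for a single label.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    labels = np.asarray(labels, dtype=int).ravel()
    if num_classes is None:
        # Infer the vector length from the largest label (labels start at 0)
        num_classes = labels.max() + 1
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

print(one_hot([3], num_classes=5))
print(one_hot([0, 2, 1]))
```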

#### Reformat the data

In [18]:
x_train = x_train.reshape((-1,28,28,1))
x_test = x_test.reshape((-1,28,28,1))

In [19]:
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

In [20]:
import keras
from keras.applications.mobilenet import MobileNet
from keras.models import Sequential
from keras.layers import UpSampling2D

In [21]:
model = Sequential()

# we need to upsample the images to use this model
model.add(UpSampling2D(size=(2,2), input_shape=(28,28,1)))

mobilenet = MobileNet(weights=None, classes=10, input_shape=(56,56,1))
model.add(mobilenet)

model.compile(loss="categorical_crossentropy",
              optimizer="Adam",
              metrics=['accuracy'])

# This will take a LONG time to train if you don't have a GPU
model.fit(x_train, y_train, epochs=10, verbose=1, validation_split=0.1)

Train on 9000 samples, validate on 1000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x20dee067f60>
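
A note on the `UpSampling2D` layer: to our knowledge, the Keras MobileNet implementation requires inputs of at least 32x32 pixels, which is why the 28x28 images are doubled to 56x56. With its default settings the layer simply repeats every pixel, which can be sketched in NumPy as follows (the tiny 2x2 `image` is made up for illustration).

```python
import numpy as np

image = np.array([[1, 2],
                  [3, 4]])

# Repeat every row twice, then every column twice (nearest-neighbor upsampling)
upsampled = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)

print(upsampled)
```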

We trained for only 10 epochs and with virtually no parameter tuning, yet the results are quite good (~96% accuracy on both the validation set and the test set, see below)! One thing we would like to note is that we used the `validation_split` parameter. It sets aside part of the training data as a validation set, which plays a role similar to a test set, except that we monitor it during training. Tuning a model based on how it performs on the test data causes "test set overfitting". We instead tune the model based on the validation set performance and use the actual test data only at the end:

In [22]:
loss, acc = model.evaluate(x_test, y_test)

print("Test set accuracy: {:.2f}%".format(acc*100))

Test set accuracy: 96.20%
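
To make the role of `validation_split` concrete: Keras holds out the last fraction of the samples passed to `fit` (before any shuffling) and reports metrics on them after every epoch. Here is a minimal NumPy sketch of that hold-out, using placeholder arrays in place of the real data:

```python
import numpy as np

# Placeholder data with the same number of samples as our training subset
x = np.arange(10000).reshape((-1, 1))
y = np.arange(10000)

validation_split = 0.1
split_at = int(len(x) * (1 - validation_split))

# The last 10% are held out for validation; the rest is used for training
x_fit, x_val = x[:split_at], x[split_at:]
y_fit, y_val = y[:split_at], y[split_at:]

print(len(x_fit), "training samples,", len(x_val), "validation samples")
```

The test set stays completely outside of this split, so it can serve as a final, unbiased check.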


In this article we covered just a very rough outline and used off-the-shelf models without getting into any specifics. It is worth noting that the best models on MNIST achieve over 99.7% accuracy (with SVMs achieving ~99.5%)! Feel free to play around, and take a look at [this Wikipedia table](https://en.wikipedia.org/wiki/MNIST_database#Classifiers).

The field of Neural Networks is quite broad and covering it in one short article is impossible, but we saw them in practice. We hope that this fueled your interest to dig deeper into the fields of Machine Learning and A.I.!