This notebook translates example code from *Hands-on Machine Learning* by A. Geron into the Keras deep learning library.
This code uses the *sequential* Keras API.

In [1]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD
from sklearn.preprocessing import StandardScaler
import numpy as np

Using TensorFlow backend.


Conveniently, `keras` provides a loader for the MNIST handwritten-digit dataset (downloaded on first use), which makes startup easy.

In [2]:
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print("Data shapes:\n Training input = {0}\n Testing input = {1}".format(X_train.shape, X_test.shape))

Data shapes:
 Training input = (60000, 28, 28)
 Testing input = (10000, 28, 28)


Let's grab an image at random to see what is inside the data set.

In [3]:
import matplotlib
import matplotlib.pyplot as plt
from random import randint
idx = randint(0, X_train.shape[0] - 1)  # randint includes both endpoints
print("Training instance {0}".format(idx))
plt.imshow(X_train[idx], cmap = matplotlib.cm.binary)
plt.show()

Training instance 5355


<Figure size 640x480 with 1 Axes>
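Python's `random.randint(a, b)` includes *both* endpoints, so drawing a row index as `randint(0, n)` for an array with `n` rows can occasionally land one past the last valid index. The seeded sketch below uses a hypothetical tiny `n` in place of the dataset size and contrasts `randint` with `random.randrange`, which excludes the upper bound just as array indexing does.

```python
import random

rng = random.Random(0)  # seeded so the demonstration is repeatable

n = 3  # hypothetical tiny stand-in for X_train.shape[0]

# randint(a, b) samples the closed interval [a, b], so n itself can appear
randint_draws = {rng.randint(0, n) for _ in range(1000)}

# randrange(n) samples the half-open interval [0, n), matching valid array indices
randrange_draws = {rng.randrange(n) for _ in range(1000)}

print(sorted(randint_draws))    # can include n, which is out of range
print(sorted(randrange_draws))  # only 0 .. n - 1
```

Either `randint(0, n - 1)` or `randrange(n)` yields a safe index.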

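Working with Keras falls into three broad phases: 1) model structure specification, 2) model optimization (fitting) specification, and 3) model training.
Before the first phase, **model structure specification**, the data needs some preprocessing: each 28x28 image is flattened into a 784-element vector and standardized.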

In [4]:
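# Standardize each pixel to zero mean and unit variance for training.
# The scaled features are float64 values.
scaler = StandardScaler()
X_train = X_train.reshape(X_train.shape[0], -1)  # (60000, 28, 28) -> (60000, 784)
X_test = X_test.reshape(X_test.shape[0], -1)     # (10000, 28, 28) -> (10000, 784)
X_train_scale = scaler.fit_transform(X_train)    # fit statistics on training data only
X_test_scale = scaler.transform(X_test)          # reuse the training statistics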
print("Feature data type = {0}".format(type(X_train_scale[0][0])))



Feature data type = <class 'numpy.float64'>
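`StandardScaler` standardizes each feature (here, each of the 784 pixel columns) to zero mean and unit variance, using a mean and standard deviation computed on the training set only; `transform` then applies those same training statistics to the test set. The NumPy sketch below reproduces that computation on a tiny made-up array. Like scikit-learn, it leaves constant columns (for example, pixels that are 0 in every training image) at a scale of 1 instead of dividing by zero.

```python
import numpy as np

# Tiny made-up "images": 4 samples x 3 pixel features; the last column is constant
X_train_toy = np.array([[0., 10., 0.],
                        [2., 20., 0.],
                        [4., 30., 0.],
                        [6., 40., 0.]])
X_test_toy = np.array([[3., 25., 0.]])

# Fit: per-column mean and population standard deviation from the training data only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)
std[std == 0.0] = 1.0  # constant columns keep scale 1, as scikit-learn does

# Transform both sets with the *training* statistics
X_train_std = (X_train_toy - mean) / std
X_test_std = (X_test_toy - mean) / std

print(X_train_std.mean(axis=0))  # approximately [0, 0, 0]
print(X_train_std.std(axis=0))   # approximately [1, 1, 0]
print(X_test_std)
```

Fitting on the test set as well would leak information about it into preprocessing, which is why the notebook calls `fit_transform` on training data and only `transform` on test data.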


We also need to convert the labels to one-hot vectors. For example, with 10 classes (digits 0 to 9), the label 3 becomes:
```
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

In [5]:
num_classes = len(np.unique(y_train))
y_train_enc = keras.utils.to_categorical(y_train, num_classes)
y_test_enc = keras.utils.to_categorical(y_test, num_classes)
print("Label as number: {0}".format(y_train[0]))
print("Label as one-hot vector: {0}".format(y_train_enc[0]))

Label as number: 5
Label as one-hot vector: [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
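`keras.utils.to_categorical` maps each integer label to the matching row of an identity matrix. The same encoding can be sketched in NumPy by indexing `np.eye`, and `argmax` recovers the integer labels again; the labels below are made up for illustration.

```python
import numpy as np

labels = np.array([5, 0, 3, 9])  # made-up integer class labels
num_classes = 10

# Row k of the 10x10 identity matrix is the one-hot vector for class k
one_hot = np.eye(num_classes)[labels]

print(one_hot[2])              # one-hot vector for label 3
print(one_hot.argmax(axis=1))  # recovers the original labels
```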


In [6]:
# A simple multi-layer perceptron with a single hidden layer
model = Sequential()
model.add(Dense(196, activation='relu', input_shape=(784,)))
model.add(Dense(num_classes, activation='softmax'))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 196)               153860    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1970      
=================================================================
Total params: 155,830
Trainable params: 155,830
Non-trainable params: 0
_________________________________________________________________
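Each `Dense` layer holds a weight matrix of shape `(inputs, units)` plus one bias per unit, which accounts for the parameter counts in the summary: 784 x 196 + 196 = 153,860 and 196 x 10 + 10 = 1,970. The NumPy sketch below performs the same forward pass (ReLU hidden layer, softmax output) to make the shapes concrete. Its weights are randomly initialized stand-ins, not the model's actual parameters, and the input batch is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 784, 196, 10

# Random parameters with the same shapes as the two Dense layers
W1 = rng.normal(scale=0.01, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

def forward(X):
    """Dense(196, relu) followed by Dense(10, softmax)."""
    hidden = np.maximum(0.0, X @ W1 + b1)         # ReLU activation
    logits = hidden @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)   # softmax: each row sums to 1

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 155830, the same as "Total params" in the summary

X_batch = rng.normal(size=(5, n_inputs))  # a made-up batch of 5 standardized images
probs = forward(X_batch)
print(probs.shape)  # (5, 10)
```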


Next comes the **model optimization specification** phase: `compile` sets the loss function, the [optimizer](https://keras.io/optimizers/), and the [metric](https://keras.io/metrics/#available-metrics) to track during training.

In [7]:
model.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])
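For one-hot targets, categorical cross-entropy is the batch mean of -sum_k y_k * log(p_k), which reduces to the negative log-probability the model assigns to the true class. `SGD()` with no arguments uses Keras's defaults (in Keras 2.x, a learning rate of 0.01 and no momentum), so each step moves every weight against its gradient scaled by that rate. The NumPy sketch below shows both pieces on made-up numbers. The clipping value `eps` is an assumed safeguard against `log(0)`, and the gradient is a placeholder rather than one computed from the network.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # guard against log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two made-up samples with true classes 0 and 2, and their predicted probabilities
y_true = np.array([[1., 0., 0.],
                   [0., 0., 1.]])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.2, 0.6]])
loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # equals (-log(0.8) - log(0.6)) / 2

# One plain SGD step (no momentum): w <- w - learning_rate * gradient
learning_rate = 0.01  # Keras 2.x default for SGD()
w = np.array([0.5, -0.3])
grad = np.array([2.0, -1.0])  # placeholder gradient, not computed from the model
w_new = w - learning_rate * grad
print(w_new)
```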

Now the model will be trained. The `batch_size` parameter sets how many training samples are used for each gradient update, and `epochs` sets how many full passes are made over `X_train_scale` and `y_train_enc`. Passing `validation_data` reports the loss and accuracy on the test set after each epoch without training on it.
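With 60,000 training samples and `batch_size=256`, one epoch is 60000 / 256 = 234.375 batches: 234 full batches plus a final batch of 96 samples, i.e. 235 weight updates per epoch and 2,350 over 10 epochs. The sketch below reproduces that arithmetic and a shuffled mini-batch split (`fit` shuffles the training data each epoch by default). It illustrates the idea and is not Keras's internal training loop.

```python
import math
import numpy as np

n_samples, batch_size, epochs = 60000, 256, 10

steps_per_epoch = math.ceil(n_samples / batch_size)  # the last batch may be smaller
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)

def minibatch_indices(n, batch_size, rng):
    """Yield index arrays for one epoch: shuffle once, then slice into batches."""
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batch_sizes = [len(b) for b in minibatch_indices(n_samples, batch_size, rng)]
print(len(batch_sizes), batch_sizes[-1])
```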

In [8]:
train_hist = model.fit(X_train_scale, y_train_enc, batch_size=256, epochs=10, 
                       verbose=1, validation_data=(X_test_scale, y_test_enc))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [9]:
score = model.evaluate(X_test_scale, y_test_enc, verbose=0)

In [10]:
print("Loss: {0}\nAccuracy: {1}".format(score[0], score[1]))

Loss: 0.22000497583001852
Accuracy: 0.9382
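`model.evaluate` returns the loss followed by each compiled metric, which is why `score[0]` is the loss and `score[1]` the accuracy. With one-hot labels, accuracy is the fraction of samples whose highest predicted probability falls on the true class. A NumPy sketch with made-up predictions (in the notebook, these would come from `model.predict(X_test_scale)`):

```python
import numpy as np

# Made-up softmax outputs for 4 samples over 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
# One-hot ground truth: classes 0, 1, 2, 1
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

predicted = probs.argmax(axis=1)  # most probable class per sample
actual = y_true.argmax(axis=1)    # recover integer labels from one-hot rows
accuracy = float(np.mean(predicted == actual))
print(accuracy)  # 3 of 4 correct -> 0.75
```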


That is solid performance (about 93.8% test accuracy) for such a simple network architecture.