# MNIST demo using Keras CNN (GPU)

**Purpose**: Train a simple ConvNet on the MNIST dataset using Keras on [Databricks Runtime for Machine Learning](https://databricks.com/blog/2018/06/05/distributed-deep-learning-made-simple.html).

**Source**: [`keras/examples/mnist_cnn.py`](https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py)

In [0]:
import warnings
warnings.filterwarnings("ignore")

In [0]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K


# Use TensorFlow Backend
import tensorflow as tf
tf.random.set_seed(42)

## Source Data: MNIST
This set of cells is based on TensorFlow's [MNIST for ML Beginners](https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html) tutorial.

In reference to `from keras.datasets import mnist` in the previous cell:

The purpose of this notebook is to use Keras (with the TensorFlow backend) to **automate the identification of handwritten digits** from the [MNIST Database of Handwritten Digits](http://yann.lecun.com/exdb/mnist/). These handwritten digits come from the National Institute of Standards and Technology (NIST) Special Database 3 (Census Bureau employees) and Special Database 1 (high-school students).

In [0]:
# -----------------------------------------------------------
# Hyperparameters
batch_size = 128
num_classes = 10
epochs = 12


# -----------------------------------------------------------
# Image Datasets

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
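
The preprocessing in the cell above can be illustrated without Keras. The following is a minimal sketch in plain Python; `scale_pixels` and `input_shape_for` are hypothetical helper names introduced here for illustration and are not part of Keras.

```python
def scale_pixels(pixels):
    """Map 8-bit intensities (0..255) to floats in [0.0, 1.0]."""
    return [p / 255 for p in pixels]


def input_shape_for(data_format, rows=28, cols=28, channels=1):
    """Return the per-image shape for a given image data format string."""
    if data_format == "channels_first":
        return (channels, rows, cols)
    if data_format == "channels_last":
        return (rows, cols, channels)
    raise ValueError("unknown data format: %r" % data_format)


print(scale_pixels([0, 51, 255]))         # [0.0, 0.2, 1.0]
print(input_shape_for("channels_last"))   # (28, 28, 1)
print(input_shape_for("channels_first"))  # (1, 28, 28)
```

Either layout holds the same 28 * 28 * 1 = 784 values per image; only the position of the channel axis differs.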


## What is the image?
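Within this dataset, each image is a 28px x 28px grayscale picture, that is, 784 pixel values. In this notebook each image is stored as a `28 x 28 x 1` array (see the `x_train` shape printed above) rather than as a flat array of size 784.

* `x_` contains the handwritten digit images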
* `y_` contains the labels
* `_train` contains the 60,000 training samples
* `_test` contains the 10,000 test samples


For example, for the element at index `25168`, the label `y_train[25168,:]` indicates that it is the digit `9`.


In [0]:
# One-hot vector for y_train[25168], representing the digit 9
#  The digit n is represented as a vector that is 1 at index n and 0 elsewhere.
y_train[25168,:]

Out[4]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 1.], dtype=float32)
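
To make the one-hot representation concrete, here is a minimal sketch in plain Python of encoding a label and decoding it again. `to_one_hot` and `from_one_hot` are hypothetical helper names for illustration; the notebook itself relies on `keras.utils.to_categorical` for the encoding step.

```python
def to_one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, %d)" % num_classes)
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def from_one_hot(vector):
    """Recover the label as the index of the largest entry (argmax)."""
    return max(range(len(vector)), key=lambda i: vector[i])


encoded = to_one_hot(9)
print(encoded)                # 1.0 only in the last position
print(from_one_hot(encoded))  # 9
```

With the NumPy array from the notebook, `y_train[25168].argmax()` performs the same decoding.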

`x_train[25168,:]` is the array of 784 pixel values numerically representing the handwritten digit `9`.


In [0]:
from __future__ import print_function

# This is the extracted array for x_train = 25168 from the training matrix
xt_25168 = x_train[25168,:]

print(xt_25168)

[[[0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]]

 [[0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]]

 [[0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0. 

Let's print it as 28 x 28


In [0]:
# As this is a 28 x 28 image, print it one row per line
txt = ""
for i in range(img_rows):                 # iterate over all 28 rows
    for j in range(img_cols):             # iterate over all 28 columns
        # index 0 selects the single grayscale channel (channels_last layout)
        val = "%.3f" % xt_25168[i, j, 0]
        txt += val + ", "

    print(txt)
    txt = ""

0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0

You can sort of see the number **9** in there. Adding a color scale (the higher the value, the darker the cell) gives the following matrix:

<img src="https://dennyglee.files.wordpress.com/2018/09/nine.png" width=500/>

Here, you can access the [full-size version](https://dennyglee.files.wordpress.com/2018/09/nine.png) of this image.

## Oh where art thou GPU?

Or **[How can I run Keras on GPU?](https://keras.io/getting-started/faq/#how-can-i-run-keras-on-gpu)**: If you are running on the TensorFlow backend, your code will automatically run on the GPU if any available GPU is detected.

In [0]:
import tensorflow.keras.backend as K

In [0]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 14214727065236012914
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 15335948288
locality {
  bus_id: 1
  links {
  }
}
incarnation: 2058958696486690441
physical_device_desc: "device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0001:00:00.0, compute capability: 7.0"
xla_global_id: 416903419
, name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 15335948288
locality {
  bus_id: 1
  links {
  }
}
incarnation: 13306949156392183180
physical_device_desc: "device: 1, name: Tesla V100-PCIE-16GB, pci bus id: 0002:00:00.0, compute capability: 7.0"
xla_global_id: 2144165316
]
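
For a shorter check than the full device dump above, TensorFlow 2.x also exposes `tf.config.list_physical_devices`. The sketch below wraps it in a hypothetical helper, `list_gpu_names`, which falls back to an empty list when TensorFlow is not installed; that fallback is an assumption made here so the snippet runs anywhere.

```python
def list_gpu_names():
    """Return the names of GPUs visible to TensorFlow, or [] if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        # Assumption for this sketch: treat "no TensorFlow" as "no GPUs visible".
        return []
    return [gpu.name for gpu in tf.config.list_physical_devices("GPU")]


gpu_names = list_gpu_names()
print(len(gpu_names), "GPU(s) visible:", gpu_names)
```

On a cluster like the one shown above, this would be expected to report two GPUs, though the exact device names depend on the TensorFlow version.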


## How fast did you say?

| Processor | Duration  |
| --------- | --------- |
| GPU       | 1.87 min  |
| CPU       | 23.08 min |
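
As a quick piece of arithmetic on the table above, the ratio of the two durations gives the speedup:

```python
gpu_minutes = 1.87
cpu_minutes = 23.08

# Speedup is how many times longer the CPU run took than the GPU run.
speedup = cpu_minutes / gpu_minutes
print("GPU speedup: %.1fx" % speedup)  # GPU speedup: 12.3x
```

In other words, the GPU run in this comparison was roughly 12 times faster.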

## Convolutional Neural Networks
![](https://dennyglee.files.wordpress.com/2018/09/keras-cnn-activate.png)

1. The input layer is a grayscale image of 28x28 pixels.
2. The first convolution layer maps the grayscale image to 32 feature maps using the chosen activation function.
3. The second convolution layer maps those to 64 feature maps using the same activation function.
4. The pooling layer downsamples each feature map by 2x, so the 24x24 maps left after two unpadded 3x3 convolutions become 12x12.
5. The first dropout layer randomly drops units during training (a regularization technique to avoid overfitting).
6. The fully connected feed-forward layer maps the features to 128 neurons in the hidden layer.
7. The second dropout layer randomly drops units during training (a regularization technique to avoid overfitting).
8. Apply `softmax` over 10 output units (one per digit) to identify the digit.
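
The shapes and parameter counts implied by the steps above can be traced by hand. The following sketch assumes the Keras defaults used in this notebook's model (3x3 kernels, stride 1, `'valid'` padding, 2x2 max pooling); the helper names are hypothetical and only for illustration.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square 2D convolution."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


size = 28
size = conv_output_size(size, kernel=3)            # 26 after the first convolution
size = conv_output_size(size, kernel=3)            # 24 after the second convolution
size = conv_output_size(size, kernel=2, stride=2)  # 12 after 2x2 max pooling
flat = size * size * 64                            # 9216 values after Flatten

total_params = (
    conv_params(3, 1, 32)
    + conv_params(3, 32, 64)
    + dense_params(flat, 128)
    + dense_params(128, 10)
)
print(size, flat, total_params)  # 12 9216 1199882
```

Most of the parameters sit in the first fully connected layer, which connects all 9,216 flattened features to 128 units.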

In [0]:
def runCNN(activation, verbose):
  # Building up our CNN
  model = Sequential()
  
  # Convolution Layer
  model.add(Conv2D(32, kernel_size=(3, 3),
                 activation=activation,
                 input_shape=input_shape)) 
  
  # Convolution layer
  model.add(Conv2D(64, (3, 3), activation=activation))
  
  # Max pooling over 2x2 windows (the stride defaults to the pool size)
  model.add(MaxPooling2D(pool_size=(2, 2)))
  
  # Randomly drop 25% of the units during training (keep 75%)
  #   Regularization technique to avoid overfitting
  model.add(Dropout(0.25))
  
  # Flatten layer 
  model.add(Flatten())
  
  # Fully connected Layer
  model.add(Dense(128, activation=activation))
  
  # Randomly drop 50% of the units during training (keep 50%)
  #   Regularization technique to avoid overfitting
  model.add(Dropout(0.5))
  
  # Apply Softmax
  model.add(Dense(num_classes, activation='softmax'))

  # Loss function (crossentropy) and Optimizer (Adadelta)
  model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

  # Fit our model
  model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=verbose,
          validation_data=(x_test, y_test))

  # Evaluate our model
  score = model.evaluate(x_test, y_test, verbose=0)
  
  # Return
  return score
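
To show what the `Dropout` layers in `runCNN` do during training, here is a minimal sketch of inverted dropout in plain Python. It is an illustration of the technique, not the Keras implementation: the `dropout` function and its `rng` argument are hypothetical. Like the Keras layer, it rescales the surviving values by `1 / (1 - rate)` so that the expected sum is unchanged.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
activations = [1.0] * 8
dropped = dropout(activations, rate=0.25, rng=rng)
print(dropped)  # each entry is either 0.0 or 1 / 0.75
```

Dropout is only active while fitting; at evaluation time (`model.evaluate`) the layer passes its input through unchanged.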

### Using sigmoid

In [0]:
score_sigmoid = runCNN('sigmoid', 0)
print('Test loss:', score_sigmoid[0])
print('Test accuracy:', score_sigmoid[1])

Test loss: 2.3008954524993896
Test accuracy: 0.11349999904632568
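
As a rough sanity check on these numbers, it helps to know what an uninformed model would score. The sketch below, using textbook definitions of softmax and categorical cross-entropy rather than the Keras internals, shows that assigning equal probability to all 10 classes gives a loss of ln(10), about 2.3026.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


uniform = softmax([0.0] * 10)  # identical scores give 0.1 per class
target = [0.0] * 9 + [1.0]     # one-hot label for the digit 9
loss = categorical_crossentropy(target, uniform)
print(round(loss, 4))  # 2.3026
```

The reported test loss of about 2.3009 and accuracy of about 0.11 are both close to this chance level, which suggests the sigmoid network learned very little in 12 epochs under these settings.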


### Using tanh

In [0]:
score_tanh = runCNN('tanh', 0)
print('Test loss:', score_tanh[0])
print('Test accuracy:', score_tanh[1])

Test loss: 0.5477973222732544
Test accuracy: 0.868399977684021


### Using ReLU

In [0]:
score_relu = runCNN('relu', 1)

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12


Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


In [0]:
print('Test loss:', score_relu[0])
print('Test accuracy:', score_relu[1])

Test loss: 0.7120862007141113
Test accuracy: 0.8497999906539917
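
For reference, the three activation functions compared above can be written out in a few lines. These are the standard textbook definitions and derivatives in plain Python, not the Keras implementations, and the function names are chosen here for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2  # reaches 1.0 at x = 0


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # constant 1.0 for positive inputs


for x in (-2.0, 0.0, 2.0):
    print(
        "x=%+.1f  sigmoid'=%.3f  tanh'=%.3f  relu'=%.1f"
        % (x, sigmoid_grad(x), tanh_grad(x), relu_grad(x))
    )
```

The small sigmoid gradient is one commonly cited reason that sigmoid networks train more slowly, although this sketch alone does not account for the exact accuracies reported in this notebook.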
