In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

## Load data

In [2]:
from tensorflow.keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

Normalize the data to bring the pixel values into the range between 0 and 1.

The raw MNIST pixels are 8-bit integers from 0 to 255. Scaling them to a small, consistent range keeps all inputs on the same footing and usually helps gradient-based training converge faster, which can improve model performance.

In [5]:
X_train = X_train / 255.0
X_test = X_test / 255.0
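
As a quick sanity check on the scaling step, the sketch below applies the same division to a small made-up array. `fake_images` is a hypothetical stand-in for `X_train`, not the real dataset; the only assumption is that the pixels are 8-bit integers.

```python
import numpy as np

# Hypothetical stand-in for a few MNIST images: 8-bit pixels in [0, 255]
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_images[0, 0, 0] = 0      # make sure both extremes are present
fake_images[0, 0, 1] = 255

# Same operation as in the notebook: divide by the largest possible pixel value
scaled = fake_images / 255.0

print(fake_images.dtype, fake_images.min(), fake_images.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

Dividing by a float also converts the integer array to floating point, which is what the network expects as input.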

Flatten data

Most supervised learning algorithms for classification and regression, as well as deep learning models built from dense layers, expect two-dimensional input of shape (samples, features). Our data is three-dimensional, with shape (samples, 28, 28), so we need to flatten each 28x28 image into a vector of 784 values.

In [6]:
X_train_flat = X_train.reshape(len(X_train), X_train.shape[1] * X_train.shape[2])
X_test_flat = X_test.reshape(len(X_test), 28*28) # since we already know the shape
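
To see what the reshape does to an individual pixel, here is a minimal sketch on a made-up array with the same layout. `images` is a hypothetical stand-in for `X_train`, and passing `-1` is simply an alternative to writing out `28*28`.

```python
import numpy as np

# Hypothetical stand-in with the MNIST layout: (samples, height, width)
images = np.arange(3 * 28 * 28).reshape(3, 28, 28)

# -1 lets NumPy infer the second dimension (28 * 28 = 784)
flat = images.reshape(len(images), -1)
print(flat.shape)

# Flattening is row-major: pixel (r, c) of an image lands at index r * 28 + c
r, c = 5, 7
print(flat[1, r * 28 + c] == images[1, r, c])

# The operation is lossless: reshaping back recovers the original images
restored = flat.reshape(-1, 28, 28)
print(np.array_equal(restored, images))
```

Flattening discards the explicit 2-D neighbourhood structure, but no pixel values are lost.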

## Building the model

In [7]:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=(X_train_flat.shape[1],), activation='sigmoid')
])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 10)                7850      
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
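
The parameter count in the summary can be checked by hand: a dense layer has one weight per (input, unit) pair plus one bias per unit. The helper `dense_param_count` below is a hypothetical name used only for this illustration.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 784 flattened pixels feeding 10 output units
single_layer = dense_param_count(784, 10)
print(single_layer)
```

For this model that gives 784 * 10 + 10 = 7850, matching the `Param #` column.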


In [8]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

- `adam` is an adaptive optimization algorithm that often converges faster than plain Stochastic Gradient Descent (SGD). SGD is itself a variant of gradient descent that updates the weights using small batches of samples instead of the whole dataset.

- `sparse_categorical_crossentropy` is a loss function closely related to `binary_crossentropy` (discussed in the Binary Classification Notebook). Use `binary_crossentropy` when the target variable is binary; use `sparse_categorical_crossentropy` when there are more than two classes and the targets are integer class labels (here, the digits 0 to 9) rather than one-hot vectors.

- The metric used to evaluate the model is `accuracy`, the fraction of predictions that match the true labels.
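
The loss and metric described above can be written out in a few lines of NumPy. This is a sketch of the textbook definitions on made-up numbers, not the Keras implementation (which adds numerical safeguards); `probs` and `labels` are hypothetical values.

```python
import numpy as np

# Hypothetical predicted class probabilities: 3 samples, 4 classes
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
])
labels = np.array([0, 1, 2])  # integer class indices, not one-hot vectors

# Sparse categorical cross-entropy: average of -log(probability of the true class)
picked = probs[np.arange(len(labels)), labels]
loss = -np.log(picked).mean()

# The same loss computed from one-hot targets gives the same number;
# "sparse" only refers to how the labels are encoded
one_hot = np.eye(probs.shape[1])[labels]
loss_one_hot = -(one_hot * np.log(probs)).sum(axis=1).mean()

# Accuracy: fraction of samples whose highest-scoring class is the true class
accuracy = (probs.argmax(axis=1) == labels).mean()

print(loss, loss_one_hot, accuracy)
```

In this example the third sample is misclassified (class 3 scores highest, the label is 2), so the accuracy is 2/3 even though the loss is computed from all three samples.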

In [9]:
model.fit(X_train_flat, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x22449cd4b08>

In [10]:
model.evaluate(X_test_flat, y_test)



[0.26348546147346497, 0.9269000291824341]

In [11]:
y_predicted = model.predict(X_test_flat)
y_predicted[0]

array([1.6935246e-05, 1.1680738e-10, 5.0150542e-05, 1.0143608e-02,
       1.1119520e-06, 9.0983143e-05, 9.9387643e-10, 7.3930335e-01,
       5.9540536e-05, 8.0263615e-04], dtype=float32)

In [12]:
np.argmax(y_predicted[0])

7
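
The sketch below reuses the ten scores printed above for the first test image. It shows that they do not sum to 1 (each `sigmoid` output is an independent score, not part of a probability distribution) and that `np.argmax` with `axis=1` turns a whole batch of score rows into class labels at once. The second row of `batch` is just the first row reversed, a made-up example.

```python
import numpy as np

# Scores copied from the output of y_predicted[0] above
row = np.array([1.6935246e-05, 1.1680738e-10, 5.0150542e-05, 1.0143608e-02,
                1.1119520e-06, 9.0983143e-05, 9.9387643e-10, 7.3930335e-01,
                5.9540536e-05, 8.0263615e-04], dtype=np.float32)

print(row.sum())       # roughly 0.75, so not a normalized distribution
print(np.argmax(row))  # index of the largest score

# Hypothetical batch: the real row plus its reverse, to show axis=1
batch = np.stack([row, row[::-1]])
predicted_labels = np.argmax(batch, axis=1)
print(predicted_labels)
```

Because only the position of the largest score matters for the predicted label, `argmax` works the same way whether the final activation is `sigmoid` or `softmax`.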

### An actual deep network

In [13]:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, input_shape=(784,), activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10, activation='sigmoid')
])
model.summary()

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train_flat, y_train, batch_size=128, epochs=5)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 100)               78500     
_________________________________________________________________
dense_2 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1010      
Total params: 89,610
Trainable params: 89,610
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x2244a249c08>

In [15]:
model.evaluate(X_test_flat, y_test)



[0.09775564074516296, 0.9710000157356262]

## Save and load the model

In [16]:
save_dir = "/results/"
model_name = 'keras_mnist.h5'
model_path = save_dir + model_name
# Save to the full path so the file ends up where the message below says it is
model.save(model_path)
print('Saved trained model at %s ' % model_path)

Saved trained model at /results/keras_mnist.h5 
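
Loading the model back is the other half of this section. The following is a minimal round-trip sketch, assuming a TensorFlow installation with HDF5 support. It uses a small untrained stand-in model and a temporary directory (both hypothetical) instead of the trained network and `/results/`, so it can run on its own.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Small untrained stand-in for the trained model from the notebook
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation='sigmoid'),
])

# Hypothetical temporary directory used here instead of /results/
save_dir = tempfile.mkdtemp()
model_path = os.path.join(save_dir, 'keras_mnist.h5')
model.save(model_path)

# Restore the architecture and weights from the file
restored = tf.keras.models.load_model(model_path)

# The restored model should produce the same outputs as the original
x = np.random.rand(4, 784).astype('float32')
same = np.allclose(model.predict(x, verbose=0),
                   restored.predict(x, verbose=0))
print(same)
```

With the notebook's own model, the equivalent call would be `tf.keras.models.load_model(model_path)` using the path printed above.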
