# Deep Learning with Python 
## Chapter 2, Example 1 - MNIST Digit Recognition

In [3]:
import tensorflow as tf
import keras

In [15]:
# Keras ships as a TensorFlow submodule (tensorflow.keras)
from tensorflow.keras.datasets import mnist

In [16]:
# Load the predefined train/test split
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


### Training and Test Data

In [17]:
train_images.shape

(60000, 28, 28)

`train_images` is a 3D NumPy array of 60,000 samples, each of which is a 28x28 array of pixel values (integers from 0 to 255).

In [20]:
# Confirm that the number of training labels matches the number of training samples
len(train_labels)

60000

In [21]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [24]:
# Same for test data
print("Test Images shape = " + str(test_images.shape))
print("Length of test labels = " + str(len(test_labels)))
print("Test Labels are " + str(test_labels))

Test Images shape = (10000, 28, 28)
Length of test labels = 10000
Test Labels are [7 2 1 ... 4 5 6]


### Building the Network

In [25]:
from tensorflow.keras import models
from tensorflow.keras import layers

In [28]:
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax')) # returns 10 probability scores

We're creating a `Sequential` neural network model, in which each `layer` object is added to the model in order with the `add` method. Here the first `Dense` layer maps each flattened 784-pixel image to 512 hidden units, and the second maps those 512 units to 10 probability scores, one per digit.

A `layer` is the fundamental building block of a neural network: a data processing module that extracts representations from the data fed into it, ideally representations that are more useful for the task at hand. Since each layer can be thought of as a filter (a data distillation module), chaining layers together produces increasingly refined representations.
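To make the idea of chained layers concrete, here is a minimal NumPy sketch of the forward pass that two `Dense` layers of these sizes compute. The weights are random placeholders, and the helper names (`relu`, `softmax`, `W1`, `b1`, `W2`, `b2`) are illustrative assumptions rather than anything Keras exposes under those names.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row-wise max before exponentiating for numerical stability
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical, randomly initialised parameters with the same shapes
# as the two Dense layers in the network above
W1 = rng.normal(scale=0.01, size=(28 * 28, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.01, size=(512, 10))
b2 = np.zeros(10)

batch = rng.random((4, 28 * 28))    # 4 fake flattened images with values in [0, 1)
hidden = relu(batch @ W1 + b1)      # first layer output, shape (4, 512)
probs = softmax(hidden @ W2 + b2)   # second layer output, shape (4, 10)

print(hidden.shape, probs.shape)
print(probs.sum(axis=1))            # each row of probabilities sums to 1
```

Each layer is just a matrix product, a bias, and a non-linearity; training consists of adjusting `W1`, `b1`, `W2`, and `b2` so that the output probabilities match the labels.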

### Compile the Network
- The `loss` function defines how the network measures its performance on the training data, i.e. how it computes the difference between its predicted output and the actual output.
- The `optimizer` is the algorithm that uses the output of the loss function to update the parameters of the neural network so as to minimise the loss.
- The `metrics` define the quantitative measures we use to assess network performance. In this case, `accuracy` is the fraction of images classified correctly. In the binary case this is $\frac{TP + TN}{TP + TN + FP + FN}$.

In [29]:
network.compile(optimizer='rmsprop', 
               loss='categorical_crossentropy', 
               metrics=['accuracy'])
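As a rough illustration of what the `loss` and `metrics` arguments measure, the sketch below computes categorical cross-entropy and accuracy by hand in NumPy on three made-up predictions. The function names and the `eps` clipping value are assumptions for this example; Keras's own implementation differs in its details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over samples of -sum(y_true * log(y_pred)); clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # Fraction of samples whose highest-scoring class matches the true class
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Made-up one-hot labels and predicted probabilities for 3 samples, 3 classes
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.5, 0.3, 0.2]])

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)   # the third sample is misclassified, so acc is 2/3
```

The loss penalises the third sample heavily because the true class received a probability of only 0.2, whereas accuracy simply counts it as wrong.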

### Preprocessing Data

In [39]:
# Flatten each 28x28 training image into a 784-element vector,
# cast to float32, and scale pixel values from [0, 255] to [0, 1]
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

# Apply the same flattening, cast, and scaling to the test images
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
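The effect of this preprocessing can be checked on a small stand-in array. The sketch below uses randomly generated `uint8` data (`fake_images` is a hypothetical name, not part of MNIST) and lets `reshape` infer the flattened size with `-1` rather than hard-coding the sample count.

```python
import numpy as np

# 5 fake 28x28 "images" with uint8 pixel values, mimicking the raw MNIST format
fake_images = np.random.default_rng(0).integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image to a 784-vector; -1 lets NumPy infer 28 * 28
flat = fake_images.reshape((fake_images.shape[0], -1))

# Cast to float32 and scale pixel values into [0, 1]
scaled = flat.astype('float32') / 255

print(flat.shape, scaled.dtype, scaled.min(), scaled.max())
```

Note that the scaling step is not idempotent: running it twice divides by 255 twice, so in a notebook the cell should be executed once per fresh load of the data.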

### Encoding Labels

In [40]:
from tensorflow.keras.utils import to_categorical

In [41]:
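# One-hot encode both training and test labels.
# Run this cell only once: re-running it encodes the already-encoded
# arrays again and adds an extra axis to their shape.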
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
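One-hot encoding turns each integer label into a vector with a single 1 at the index of the class. The sketch below is a simplified NumPy stand-in for that behaviour; `one_hot` is a hypothetical helper written for this example, not the Keras function itself.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    # Hypothetical helper: a simplified stand-in for one-hot encoding
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Infer the number of classes from the largest label
        num_classes = int(labels.max()) + 1
    flat = labels.reshape(-1)
    encoded = np.zeros((flat.size, num_classes), dtype='float32')
    encoded[np.arange(flat.size), flat] = 1.0
    # Append the class axis to the original label shape
    return encoded.reshape(labels.shape + (num_classes,))

labels = np.array([5, 0, 4, 9])
encoded = one_hot(labels)
print(encoded.shape)   # (4, 10)
print(encoded[0])      # a single 1.0 at index 5
```

With the 10 digit classes, each encoded label lines up with the 10 softmax outputs of the network, which is the format `categorical_crossentropy` expects.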

### Train the Model

In [43]:
train_labels.shape

(60000, 10, 2, 2)

In [42]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5


InvalidArgumentError: Incompatible shapes: [128,10,2] vs. [128]
	 [[{{node metrics/acc/Equal}}]]
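The label shape `(60000, 10, 2, 2)` and the out-of-order cell numbers suggest a likely cause for this error: the encoding cell appears to have been executed more than once. Encoding an array that is already one-hot treats its 0/1 entries as two classes and appends a new axis of size 2 each time. The sketch below reproduces that shape growth with a hypothetical NumPy helper (`one_hot`, a simplified stand-in for the Keras function) and shows one possible guard, `encode_once`, which is also an assumption of this example.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    # Hypothetical helper: a simplified stand-in for one-hot encoding
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = int(labels.max()) + 1
    flat = labels.reshape(-1)
    encoded = np.zeros((flat.size, num_classes), dtype='float32')
    encoded[np.arange(flat.size), flat] = 1.0
    return encoded.reshape(labels.shape + (num_classes,))

raw = np.array([5, 0, 4, 9], dtype=np.uint8)
once = one_hot(raw)
twice = one_hot(once)      # 0/1 values are read as 2 classes
thrice = one_hot(twice)
print(once.shape, twice.shape, thrice.shape)   # (4, 10) (4, 10, 2) (4, 10, 2, 2)

def encode_once(labels, num_classes=10):
    # Guard: only encode raw 1D integer labels; leave encoded arrays untouched
    labels = np.asarray(labels)
    if labels.ndim == 1:
        return one_hot(labels, num_classes)
    return labels

safe = encode_once(encode_once(raw))
print(safe.shape)          # (4, 10)
```

In the notebook itself, a plausible remedy is to reload the data with `mnist.load_data()`, re-run the preprocessing and encoding cells exactly once, confirm that `train_labels.shape` is `(60000, 10)`, and then call `network.fit` again.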