# Deep Learning with Python

# Chapter 2 

## 2.1  MNIST Example

In [2]:
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


### We first look at the training data

In [3]:
# Training images: 60,000 samples, each 28 x 28 pixels

train_images.shape

(60000, 28, 28)

In [4]:
len(train_labels)

60000

In [5]:
# train_labels is an array of digits ranging from 0 to 9
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

### Then the test data

In [6]:
test_images.shape

(10000, 28, 28)

In [7]:
len(test_labels)

10000

In [8]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

### Now we walk through the workflow for building the neural network

In [10]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation = 'relu', input_shape = (28 * 28,)))
network.add(layers.Dense(10, activation = 'softmax'))

# Our network consists of two Dense layers (densely connected, also called fully connected)
# The second layer is a 10-way softmax layer, which returns an array of 10 probability scores (summing to 1)
# Each score is the probability that the current digit image belongs to one of the 10 digit classes

The core building block of a neural network is the ***layer***, a data-processing module that works like a filter for data.
Layers extract more useful ***representations*** out of the data fed into them.
Most of deep learning consists of chaining together simple layers that implement a form of progressive ***data distillation***.
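To make the idea of chained layers concrete, here is a minimal NumPy sketch of what a forward pass through two layers of these shapes could look like. It is not how Keras implements `Dense`; the weight names (`W1`, `b1`, `W2`, `b2`), the random initialization, and the fake input batch are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical stability
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical, untrained weights with the same shapes as the two Dense layers
W1 = rng.normal(scale=0.05, size=(28 * 28, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.05, size=(512, 10))
b2 = np.zeros(10)

x = rng.random((3, 28 * 28))        # a fake batch of 3 flattened "images"
hidden = relu(x @ W1 + b1)          # first layer output, shape (3, 512)
probs = softmax(hidden @ W2 + b2)   # second layer output, shape (3, 10)

print(probs.shape)
print(probs.sum(axis=1))  # each row sums to 1, up to floating-point error
```

Each row of `probs` plays the role of the 10 probability scores: non-negative values that sum to 1.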

### Next comes the *compilation* step, which prepares the network for training. Compilation specifies three things:
### 1) *A loss function*: how the network measures its performance on the training data, and thus how it steers itself in the right direction
### 2) *An optimizer*: the mechanism through which the network updates itself based on the data it sees and its loss function
### 3) *Metrics to monitor during training and testing*: here we only care about accuracy (the fraction of images correctly classified)

In [11]:
network.compile(optimizer = 'rmsprop',
                loss = 'categorical_crossentropy',
                metrics = ['accuracy'])
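To illustrate what the loss and the metric named in this call measure, here is a small NumPy sketch of categorical crossentropy and accuracy on a toy 3-class problem. The function names and the toy arrays are assumptions for illustration, and the targets are assumed to be one-hot (categorically encoded); Keras's own implementations differ in detail.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot targets, y_pred: predicted probabilities,
    # both of shape (batch, classes)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_sample))


def accuracy(y_true, y_pred):
    # Fraction of samples whose highest-probability class is the true class
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))


# Toy data: three samples, three classes (hypothetical values)
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.5, 0.3, 0.2]])

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The third sample is misclassified, so the accuracy is 2/3, while the loss also penalizes the low confidence of the second prediction even though it is correct.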

In [12]:
# Before training, we preprocess the data by reshaping it into the shape the network expects
# and scaling all values to the [0, 1] interval

train_images = train_images.reshape((60000, 28*28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

Specifically, train_images was stored in an array of shape ***(60000, 28, 28)*** of type ***uint8*** with values in the ***[0, 255]*** interval.
We transform it into a ***float32*** array of shape ***(60000, 28 × 28)*** with values in ***[0, 1]***.
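The same transformation can be checked on a small stand-in array without downloading MNIST. In this sketch, `fake_images` is a hypothetical batch of four random images, not real data.

```python
import numpy as np

# Hypothetical stand-in for a few MNIST images: uint8 values in [0, 255]
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

flat = fake_images.reshape((4, 28 * 28))   # (4, 28, 28) -> (4, 784)
scaled = flat.astype('float32') / 255      # uint8 [0, 255] -> float32 [0, 1]

print(fake_images.dtype, fake_images.shape)
print(scaled.dtype, scaled.shape)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

Reshaping only changes how the pixels are indexed; the values change only in the cast and division.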

### We also need to categorically encode the labels

In [13]:
from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
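Categorical (one-hot) encoding turns each integer label into a vector of zeros with a single 1 at the index of the label. The following is a minimal NumPy sketch of that idea; the helper `one_hot` is a hypothetical name and is not the Keras implementation of `to_categorical`.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    # Row i is all zeros except for a 1 at column labels[i]
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


labels = np.array([5, 0, 4], dtype='uint8')  # the first three training labels
encoded = one_hot(labels)

print(encoded.shape)
print(encoded[0])
```

Taking `encoded.argmax(axis=1)` recovers the original integer labels, which is why this encoding loses no information.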

### We are now ready to train the network, which in Keras is done via a call to the network's *fit* method

In [14]:
# We fit the model to its training data

network.fit(train_images, train_labels, epochs = 5, batch_size = 128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7fcb94f64390>
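As a rough guide to what `epochs = 5` and `batch_size = 128` imply, the arithmetic below counts the mini-batches involved. It assumes one weight update per mini-batch and that the final, smaller batch is kept rather than dropped.

```python
import math

num_samples = 60000  # training images
batch_size = 128
epochs = 5

# The last batch of each epoch is smaller when batch_size does not divide num_samples
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 469 96 2345
```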

### On the training data, we quickly reach an accuracy of 0.9887 (98.9%). Now we check the model on the test data too

In [15]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test accuracy is:', test_acc)

test accuracy is: 0.9811999797821045


### The test-set accuracy is 98.1%, a bit lower than the training-set accuracy. The gap between the two is an example of *overfitting*: machine-learning models tend to perform worse on new data than on their training data.
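The size of the gap can be quantified directly from the two numbers reported in this notebook. This is only simple arithmetic on those values; the variable names are chosen for illustration.

```python
train_acc = 0.9887             # training accuracy reported after the last epoch
test_acc = 0.9811999797821045  # test accuracy printed by the evaluation cell

gap = train_acc - test_acc
print(f"accuracy gap: {gap:.4f} ({gap * 100:.2f} percentage points)")

# Error rates can make the difference easier to see
train_err = 1 - train_acc
test_err = 1 - test_acc
ratio = test_err / train_err
print(f"test error is {ratio:.2f}x the training error")
```

A gap of under one percentage point looks small in terms of accuracy, but expressed as an error rate the model makes noticeably more mistakes on unseen images than on the images it was trained on.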