# 02 Introduction to Neural Networks
**Adapted from Deep Learning with Python by François Chollet**

https://github.com/fchollet/deep-learning-with-python-notebooks

The task: classify grayscale images of handwritten digits (28 by 28 pixels) into 10 categories (0 to 9) using the MNIST dataset.

In [1]:
import keras
keras.__version__

Using TensorFlow backend.


'2.2.4'

In [2]:
# load dataset
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [3]:
train_images.shape

(60000, 28, 28)

In [4]:
test_images.shape

(10000, 28, 28)

First, we will present our neural network with the training data, `train_images` and `train_labels`. The network will then learn to associate images with labels. Finally, we will ask the network to produce predictions for `test_images`, and we will check whether these predictions match the labels in `test_labels`.
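
As a small illustration of that last step, the sketch below compares predicted classes with true labels. It assumes the network returns one row of 10 class scores per image; the scores and labels here are made up for the example, and the `accuracy` helper is hypothetical rather than a Keras function.

```python
import numpy as np

def accuracy(predicted_probs, true_labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    predicted_labels = np.argmax(predicted_probs, axis=1)
    return float(np.mean(predicted_labels == true_labels))

# Made-up scores for 4 images and 10 classes (not real model output).
toy_probs = np.full((4, 10), 0.05)
toy_probs[0, 7] = 0.55  # highest score at class 7
toy_probs[1, 2] = 0.55  # highest score at class 2
toy_probs[2, 1] = 0.55  # highest score at class 1
toy_probs[3, 0] = 0.55  # highest score at class 0

# Made-up true labels: the last image is a 4, so the last prediction is wrong.
toy_labels = np.array([7, 2, 1, 4])

toy_accuracy = accuracy(toy_probs, toy_labels)
print("accuracy:", toy_accuracy)
```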

In [5]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

This network consists of a sequence of two `Dense` layers, which are densely connected (also called "fully connected") layers. The first has 512 units with a ReLU activation. The second (and last) layer is a 10-way "softmax" layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
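
To make the shapes concrete, here is a minimal NumPy sketch of what a forward pass through two such layers computes. It is not how Keras implements `Dense`; the weights are random placeholders (an untrained network), and the `relu` and `softmax` helpers are written out by hand for illustration.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row maximum before exponentiating, for numerical stability
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Random placeholder weights with the same shapes as the two Dense layers
w1 = rng.normal(scale=0.05, size=(28 * 28, 512))
b1 = np.zeros(512)
w2 = rng.normal(scale=0.05, size=(512, 10))
b2 = np.zeros(10)

# 3 fake flattened images with values in [0, 1)
batch = rng.random((3, 28 * 28))

hidden = relu(batch @ w1 + b1)      # shape (3, 512)
scores = softmax(hidden @ w2 + b2)  # shape (3, 10), each row sums to 1

# Total number of trainable parameters (weights plus biases)
n_params = w1.size + b1.size + w2.size + b2.size
print(scores.shape, scores.sum(axis=1), n_params)
```

Each row of `scores` is a probability distribution over the 10 digit classes; with random weights these probabilities carry no meaning until the network is trained.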

To make the network ready for training, we need to pick three more things as part of the "compilation" step:

* **loss function**: this is how the network measures how good a job it is doing on its training data.
* **optimizer**: this is the mechanism through which the network updates itself based on the data it sees and its loss function.
* **metrics**: the quantities to monitor during training and testing. Here we only care about accuracy (the fraction of the images that were correctly classified).
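
As a rough illustration of what a loss function measures, the sketch below computes categorical crossentropy (the loss used in this notebook's `compile` call) with NumPy. It is a simplified hand-written version, not the Keras implementation; the `eps` clipping value and the toy arrays are assumptions made for the example. Labels are written as one-hot vectors, with a 1 at the true class and 0 elsewhere.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    # Clip to avoid log(0); eps is an arbitrary small constant
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_sample))

# Toy problem with 3 classes and 2 samples (made-up values)
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])

# Predictions that put most probability on the correct class
good_pred = np.array([[0.1, 0.8, 0.1],
                      [0.7, 0.2, 0.1]])

# Predictions that put most probability on a wrong class
poor_pred = np.array([[0.6, 0.2, 0.2],
                      [0.2, 0.5, 0.3]])

loss_good = categorical_crossentropy(y_true, good_pred)
loss_poor = categorical_crossentropy(y_true, poor_pred)
print(loss_good, loss_poor)
```

The loss is lower when the predicted probability of the true class is higher, which is the signal the optimizer uses to update the weights.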

In [6]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

Before training, we preprocess the data by reshaping it into the shape that the network expects and scaling it so that all values lie in the `[0, 1]` interval. The training images, originally an integer array of shape `(60000, 28, 28)` with values in `[0, 255]`, become a `float32` array of shape `(60000, 28 * 28)` with values between 0 and 1.

In [7]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

We also need to categorically (one-hot) encode the labels, turning each integer label into a vector of length 10:

In [8]:
from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
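
To show what this encoding produces, here is a small NumPy sketch of one-hot encoding. The `one_hot` helper is hypothetical and only mimics the effect of `to_categorical` on integer labels; the three labels are example values.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an array with one row per label: 1.0 at the label index, 0.0 elsewhere."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

toy_labels = np.array([5, 0, 4])  # example integer labels
toy_encoded = one_hot(toy_labels, num_classes=10)
print(toy_encoded)
```

Taking `np.argmax` along each row recovers the original integer labels, which is why the 10-way softmax output can be compared directly against these vectors.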

In [9]:
# train the network for 5 epochs, in mini-batches of 128 samples
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fbb97b2a7f0>

We quickly reach an accuracy of 0.989 on the training data. Let's check that our model performs well on the test set too:

In [10]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

test_acc: 0.9812


The test accuracy (0.9812) is slightly lower than the training accuracy (0.989). This gap between training accuracy and test accuracy is an example of "overfitting": machine learning models tend to perform worse on new data than on their training data.

### Exercise:
Train the network with different numbers of epochs and/or different batch sizes.
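
As a starting point for the exercise, the sketch below estimates how many weight updates a training run performs for a given batch size and number of epochs. It assumes one update per mini-batch and a smaller final batch when the sample count is not divisible by the batch size; the `updates_per_run` helper is hypothetical and does not call Keras.

```python
import math

def updates_per_run(num_samples, batch_size, epochs):
    """Return (steps per epoch, total steps), assuming one update per mini-batch."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

# 60000 is the number of MNIST training images used above
for batch_size in (32, 128, 512):
    steps, total = updates_per_run(60000, batch_size, epochs=5)
    print(f"batch_size={batch_size}: {steps} steps/epoch, {total} updates in 5 epochs")
```

Smaller batches mean more (and noisier) updates per epoch, so changing `batch_size` and `epochs` together changes both training time and how far the weights move.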