# Mathematical building blocks of neural networks

Going through the book "**Deep Learning with Python** by François Chollet".

## Loading the MNIST dataset from Keras

Build and train a neural network to classify handwritten digits.



```python
# Load the MNIST dataset (images of handwritten digits) from Keras
import tensorflow as tf

# The data is returned as a (training, test) pair of (images, labels) tuples
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# Check the shape of the data
print(train_images.shape)
print(len(train_labels))
print(test_images.shape)
print(len(test_labels))
print(test_labels)
```

```
(60000, 28, 28)
60000
(10000, 28, 28)
10000
[7 2 1 ... 4 5 6]
```
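The printed shape `(60000, 28, 28)` means the training images form a rank-3 tensor: 60,000 matrices of 28 × 28 integers. A minimal NumPy sketch of the same tensor attributes follows; it uses a hypothetical zero-filled stand-in (`fake_images`) with MNIST's shape and `uint8` dtype instead of the real data, so it runs without downloading anything.

```python
import numpy as np

# Hypothetical stand-in with the same shape and dtype as MNIST's train_images
fake_images = np.zeros((60000, 28, 28), dtype=np.uint8)

print(fake_images.ndim)   # number of axes, i.e. the tensor's rank: 3
print(fake_images.shape)  # size along each axis: (60000, 28, 28)
print(fake_images.dtype)  # element type: uint8

# Slicing along the first (sample) axis selects a batch of images
batch = fake_images[:128]
print(batch.shape)        # (128, 28, 28)
```

The first axis is the sample axis; the training loop later cuts the data into batches along exactly this axis.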


## Neural network architecture

### Workflow 
- Feed the neural network the training data, `train_images` and `train_labels`.
- The network then learns to associate images with labels.
- Ask the network to produce predictions for `test_images`, and verify whether these predictions match the labels in `test_labels`.
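The last step of the workflow, checking predictions against the test labels, comes down to comparing the most probable class for each image with the true label. A minimal NumPy sketch, where `probs` and `labels` are hypothetical made-up values rather than real model output:

```python
import numpy as np

# Hypothetical softmax outputs for 4 images over 3 classes (each row sums to 1)
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
labels = np.array([0, 1, 2, 1])  # hypothetical true classes

predicted = np.argmax(probs, axis=1)     # most probable class per image
accuracy = np.mean(predicted == labels)  # fraction of images classified correctly
print(predicted, accuracy)               # [0 1 2 0] 0.75
```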

### Building blocks
The core building block of neural networks is the layer, a data-processing module that you can think of as a filter for data: some data goes in, and it comes out in a more useful form. Specifically, layers extract representations out of the data fed into them, hopefully representations that are more meaningful for the problem at hand. Most of deep learning consists of chaining together simple layers that implement a form of progressive data distillation. A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters: the layers.

Here, our network consists of a sequence of two `Dense` layers, which are densely connected (also called fully connected) neural layers. The first has 512 units with a ReLU activation. The second (and last) layer is a 10-way softmax layer, which means it returns an array of 10 probability scores (summing to 1). Each score is the probability that the current digit image belongs to one of our 10 digit classes.
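Each `Dense` layer computes `activation(dot(input, W) + b)`. A minimal NumPy sketch of this two-layer forward pass for a single flattened image, assuming randomly initialized weights; the names `W1`, `b1`, `W2`, `b2` are illustrative and are not Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row max for numerical stability; each row then sums to 1
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

# Illustrative random weights with the same shapes as the model's two Dense layers
W1 = rng.normal(scale=0.05, size=(28 * 28, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.05, size=(512, 10))
b2 = np.zeros(10)

x = rng.random((1, 28 * 28))        # one flattened image with values in [0, 1)
hidden = relu(x @ W1 + b1)          # first Dense layer: shape (1, 512)
scores = softmax(hidden @ W2 + b2)  # second Dense layer: 10 probabilities

print(hidden.shape, scores.shape, scores.sum())
```

With untrained random weights the 10 scores are meaningless, but they already form a valid probability distribution; training only changes the weights, not this computation.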

### Compilation step
To make the network ready for training, we need to pick three more things as part of the compilation step:
- A loss function: how the network measures its performance on the training data, and thus how it steers itself in the right direction.
- An optimizer: the mechanism through which the network updates itself based on the data it sees and its loss function.
- Metrics to monitor during training and testing: here, we only care about accuracy (the fraction of images that were correctly classified).
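To make the loss and the optimizer less abstract, here is a minimal NumPy sketch of categorical crossentropy and of one simplified RMSprop update. The hyperparameter values (learning rate 0.001, rho 0.9, epsilon 1e-7) are assumed to mirror common Keras defaults; this is the basic RMSprop rule, not Keras's exact implementation (which also offers momentum and a centered variant). All weights and gradients are made-up numbers.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over samples of -sum(y_true * log(y_pred)); clipping avoids log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

y_true = np.array([[0.0, 1.0, 0.0]])  # one-hot label: class 1
y_pred = np.array([[0.1, 0.8, 0.1]])  # hypothetical softmax output
loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # -log(0.8), about 0.223

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    # Keep a moving average of squared gradients...
    cache = rho * cache + (1.0 - rho) * grad ** 2
    # ...and divide the step by its square root, giving each weight its own step size
    w = w - lr * grad / np.sqrt(cache + eps)
    return w, cache

w = np.array([0.5, -0.3])     # hypothetical weights
grad = np.array([0.2, -0.1])  # hypothetical gradient of the loss w.r.t. w
cache = np.zeros_like(w)
w_new, cache = rmsprop_step(w, grad, cache)
print(w_new)
```

Only the probability assigned to the true class contributes to the loss, so pushing that probability toward 1 drives the loss toward 0. Both weights move against their gradient by roughly the same amount on this first step, because RMSprop normalizes each gradient by its own running magnitude.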


```python
# Architecture: stack the building blocks
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(28 * 28,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Compilation step
model.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
```

## Preparing the data
Before training, we’ll preprocess the data by reshaping it into the shape the network expects and scaling it so that all values are in the [0, 1] interval. Previously, our training images, for instance, were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.
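The reshape and rescaling can be checked on a small stand-in array. A minimal NumPy sketch, assuming a hypothetical batch of three random `uint8` images in place of the real data:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-in: 3 images of 28x28 pixels, integer values in [0, 255]
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

flat = images.reshape((images.shape[0], 28 * 28))  # (3, 784): one row per image
scaled = flat.astype('float32') / 255               # float32 values in [0, 1]

print(flat.shape, scaled.dtype, scaled.min(), scaled.max())
```

Reshaping only changes how the same 784 pixel values are indexed: each row of `flat` is its image read row by row.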

We also need to one-hot encode the labels, converting them from a vector of integers into a binary class matrix, which is the format `categorical_crossentropy` expects.
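One-hot encoding turns each integer label into a row containing a single 1. A minimal NumPy sketch of what `to_categorical(labels, num_classes=10)` produces; indexing an identity matrix is an equivalent illustration, not how Keras implements the function.

```python
import numpy as np

labels = np.array([7, 2, 1])  # the first three test labels printed earlier
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype='float32')[labels]
print(one_hot)
```

Taking `argmax` along each row recovers the original integer labels, which is how predictions are compared with the truth.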

The data is now ready, and we train the network in Keras by calling the model's `fit` method.
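With `batch_size=128`, `fit` updates the weights once per mini-batch rather than once per epoch. A short sketch of how the 60,000 training samples split into batches; it only counts batches, while the actual gradient computation and weight update happen inside Keras.

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

steps_per_epoch = math.ceil(num_samples / batch_size)  # the last batch is smaller
last_batch = num_samples - (steps_per_epoch - 1) * batch_size

# Sizes of the consecutive slices a mini-batch loop would take
batch_sizes = [min(batch_size, num_samples - start)
               for start in range(0, num_samples, batch_size)]

print(steps_per_epoch, last_batch, steps_per_epoch * epochs)  # 469 96 2345
```

So each of the five epochs performs 469 weight updates, 2,345 in total.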

**Note:** the gap between training accuracy and test accuracy is an example of overfitting: machine-learning models tend to perform worse on new data than on their training data.

```python
# Flatten each 28x28 image into a vector of 784 values and scale them to [0, 1]
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
# One-hot encode the labels for categorical_crossentropy
train_labels = tf.keras.utils.to_categorical(train_labels)

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
test_labels = tf.keras.utils.to_categorical(test_labels)

model.summary()

# Now fit the model to its training data
model.fit(train_images, train_labels, epochs=5, batch_size=128)

# Now let's check that the model performs well on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('test_acc:', test_acc)
```

```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_2 (Dense)              (None, 512)               401920
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
test_acc: 0.9763000011444092
```
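The parameter counts in the summary follow directly from the layer shapes: a `Dense` layer with `n_in` inputs and `n_out` units has an `n_in × n_out` weight matrix plus `n_out` biases. A quick check in plain Python (`dense_params` is a hypothetical helper written for this illustration):

```python
def dense_params(n_in, n_out):
    # Weight-matrix entries plus one bias per unit
    return n_in * n_out + n_out

first = dense_params(28 * 28, 512)  # 784 inputs -> 512 units
second = dense_params(512, 10)      # 512 inputs -> 10 units
total = first + second
print(first, second, total)         # 401920 5130 407050
```

Almost all of the parameters sit in the first layer, because every one of the 784 pixels is connected to every one of the 512 hidden units.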
