# Exercises

### 6. Adjust the batch size. Try a batch size of 10000. How does the required time change? What about the accuracy?

**Solution**

Find the line that declares the batch size and change BATCH_SIZE from 100 to 10000.

    BATCH_SIZE = 10000
    
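A bigger batch size results in slower training. Each epoch takes about as long as before (see the timings listed at the end of this solution), but the model improves far less per epoch, so it needs many more epochs to reach the same accuracy. That's what we expected from the theory. We take advantage of batching because of the amazing speed increase, but a batch this large updates the weights too rarely.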

Notice that the validation accuracy starts from a low number and, after 5 epochs, actually **finishes** at a lower number as well. That's because there are **fewer** updates in a single epoch, so the weights are adjusted far less often.
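
The number of updates per epoch can be checked with a little arithmetic. The sketch below assumes the 54,000 training samples used in this solution and that the last, partially filled batch is kept; `updates_per_epoch` is a helper name introduced here for illustration.

```python
import math

TRAIN_SAMPLES = 54000  # training samples left after the validation split


def updates_per_epoch(batch_size, num_samples=TRAIN_SAMPLES):
    """Number of weight updates in one epoch (one update per batch)."""
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(num_samples / batch_size)


for batch_size in (1, 100, 10000):
    print(batch_size, updates_per_epoch(batch_size))
```

With a batch size of 10000 the weights are updated only a handful of times per epoch, compared with hundreds of times for a batch size of 100.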

*Try a batch size of 30,000 or 50,000. That's very close to single-batch GD for this problem. You will need to change the max epochs to 100 (for instance), as 5 epochs won't be enough to train the model. What do you think about the speed of optimization?*
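
To get a feel for why near-full-batch training needs more epochs, here is a toy sketch that is not part of the MNIST notebook: plain-Python gradient descent fitting a single weight w in y = 3x. The data, the learning rate, and the `train` helper are all assumptions made for illustration.

```python
import random


def train(batch_size, epochs, lr=0.1, n=1000, seed=0):
    """Fit y = 3x with mini-batch gradient descent; return the final MSE."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x for x in xs]
    w = 0.0
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            xb = xs[start:start + batch_size]
            yb = ys[start:start + batch_size]
            # Gradient of the mean squared error over this batch w.r.t. w.
            grad = sum(2.0 * (w * x - y) * x for x, y in zip(xb, yb)) / len(xb)
            w -= lr * grad  # one weight update per batch
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n


print('mini-batch, 5 epochs  :', train(batch_size=10, epochs=5))
print('full batch, 5 epochs  :', train(batch_size=1000, epochs=5))
print('full batch, 100 epochs:', train(batch_size=1000, epochs=100))
```

In this toy setting the full batch makes only 5 updates in 5 epochs and ends far from the solution, while 100 epochs bring it close. The data is noise-free, so the sketch does not reproduce the noisier behaviour of SGD on real data.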

Change BATCH_SIZE from 100 to 1.

    BATCH_SIZE = 1

**A batch size of 1 results in SGD (stochastic gradient descent).** It takes the algorithm very little time to process a single batch (as it is one data point), but there are thousands of batches (**54000 to be precise**), thus the algorithm is actually slow. Remember that this depends on the number of cores that you train on. If you are using a CPU with 4 or 8 cores, you can only train 4 or 8 batches at once. The middle ground (mini-batching, such as 100 samples per batch) gives the best trade-off.

Notice that the validation accuracy starts from a high number. That's because there are lots of updates in a single epoch. Once the training is over, the validation accuracy is lower than with a batch size of 100 (SGD is only an approximation), although it is still well above the result for a batch size of 10000 after 5 epochs.



1.   2 hidden layers, hidden layer size 50, activation functions 'relu', 'relu', batch size 100  
    Epoch 1/5 13s loss: 0.4045 accuracy: 0.8868 val_accuracy: 0.9408  
    Epoch 5/5 3s loss: 0.0910 accuracy: 0.9732 val_accuracy: 0.9717  
2.   2 hidden layers, hidden layer size 50, activation functions 'relu', 'relu', batch size 10000  
    Epoch 1/5 10s loss: 2.2362 accuracy: 0.1756 val_accuracy: 0.3723  
    Epoch 5/5 3s loss: 1.0924 accuracy: 0.7246 val_accuracy: 0.7882  
3.   2 hidden layers, hidden layer size 50, activation functions 'relu', 'relu', batch size 1  
    Epoch 1/5 168s loss: 0.2508 accuracy: 0.9252 val_accuracy: 0.9522  
    Epoch 5/5 139s loss: 0.1329 accuracy: 0.9672 val_accuracy: 0.9630  
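
The timings in this list can be turned into time per update and time per sample. The sketch below uses the epoch-5 durations from the list and assumes 54,000 training samples; `step_stats` is a helper introduced for illustration, and the numbers come from one machine, so yours will differ.

```python
import math

TRAIN_SAMPLES = 54000

# batch size -> duration of epoch 5 in seconds, copied from the list above
epoch_seconds = {100: 3, 10000: 3, 1: 139}


def step_stats(batch_size, seconds, num_samples=TRAIN_SAMPLES):
    """Return (updates per epoch, ms per update, microseconds per sample)."""
    steps = math.ceil(num_samples / batch_size)
    per_step_ms = 1000 * seconds / steps
    per_sample_us = 1_000_000 * seconds / num_samples
    return steps, per_step_ms, per_sample_us


for batch_size, seconds in epoch_seconds.items():
    steps, per_step_ms, per_sample_us = step_stats(batch_size, seconds)
    print(f'batch size {batch_size:>5}: {steps:>5} steps, '
          f'{per_step_ms:8.2f} ms/step, {per_sample_us:8.1f} us/sample')
```

A single step with a batch size of 1 is the cheapest of the three, yet the epoch is by far the slowest because there are 54,000 such steps.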





# Deep Neural Network for MNIST Classification

## Import the relevant packages

    import numpy as np
    import tensorflow as tf
    import tensorflow_datasets as tfds


## Data

This is where we load and preprocess our data.

    # Load MNIST with its metadata; as_supervised=True yields (image, label) pairs
    mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

    mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

    # Reserve 10% of the training set for validation
    num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
    num_validation_samples = tf.cast(num_validation_samples, tf.int64)

    num_test_samples = mnist_info.splits['test'].num_examples
    num_test_samples = tf.cast(num_test_samples, tf.int64)


    def scale(image, label):
        # Scale pixel values from [0, 255] to [0, 1]
        image = tf.cast(image, tf.float32)
        image /= 255.

        return image, label

    scaled_train_and_validation_data = mnist_train.map(scale)
    test_data = mnist_test.map(scale)


    # Shuffle using a buffer of BUFFER_SIZE samples
    BUFFER_SIZE = 10000

    shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)
    validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
    train_data = shuffled_train_and_validation_data.skip(num_validation_samples)


    # Batch size for training; this run uses 1 (SGD)
    BATCH_SIZE = 1

    train_data = train_data.batch(BATCH_SIZE)
    # Validation and test data are each fed as a single batch
    validation_data = validation_data.batch(num_validation_samples)
    test_data = test_data.batch(num_test_samples)

    # Extract the inputs and targets from the single validation batch
    validation_inputs, validation_targets = next(iter(validation_data))
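
The take, skip, and batch steps can be pictured with a plain-Python analogue. This is only a sketch on a list of 20 integers using `itertools.islice`, not how tf.data is implemented; the helper names are made up for illustration.

```python
from itertools import islice


def take(items, n):
    """First n items, like Dataset.take(n)."""
    return list(islice(items, n))


def skip(items, n):
    """Everything after the first n items, like Dataset.skip(n)."""
    return list(islice(items, n, None))


def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


data = list(range(20))                 # stand-in for the shuffled samples
num_validation = int(0.1 * len(data))  # 10% for validation, as in the notebook

validation = take(data, num_validation)
train = skip(data, num_validation)

print(validation)            # the first 2 items
print(len(train))            # the remaining 18 items
print(len(batch(train, 4)))  # 5 batches: four of size 4 and one of size 2
```

Because `take` and `skip` use the same count, the two parts do not overlap as long as the underlying order is the same for both.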

## Model

### Outline the model
When thinking about a deep learning algorithm, we mostly imagine building the model. So, let's do it :)

    input_size = 784
    output_size = 10

    hidden_layer_size = 50

    model = tf.keras.Sequential([

        tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer

        # tf.keras.layers.Dense is basically implementing: output = activation(dot(input, weight) + bias)
        # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
        tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
        tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
        # tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 3rd hidden layer
        # tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 4th hidden layer
        # tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 5th hidden layer

        # the final layer is no different, we just make sure to activate it with softmax
        tf.keras.layers.Dense(output_size, activation='softmax') # output layer
    ])
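
Since each Dense layer computes activation(dot(input, weight) + bias), its number of trainable parameters is inputs × units weights plus one bias per unit. The sketch below tallies this for the layer sizes used above (the Flatten layer has no parameters); `dense_params` is a helper name introduced here, and the total should match what `model.summary()` reports.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


layer_sizes = [784, 50, 50, 10]  # input, two hidden layers, output

total = 0
for inputs, units in zip(layer_sizes, layer_sizes[1:]):
    params = dense_params(inputs, units)
    total += params
    print(f'{inputs:>3} -> {units:>2}: {params} parameters')

print('total:', total)
```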

### Choose the optimizer and the loss function

    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
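
The 'sparse_categorical_crossentropy' loss expects integer class labels (such as the digit 7) rather than one-hot vectors. For a single sample, the loss is the negative log of the probability the model assigns to the correct class. Here is a minimal plain-Python sketch with made-up probabilities; it ignores details such as averaging over the batch and numerical clipping.

```python
import math


def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample: -log of the probability of the true class."""
    return -math.log(probabilities[label])


# Hypothetical softmax output over 3 classes (MNIST has 10).
probs = [0.1, 0.7, 0.2]

print(sparse_categorical_crossentropy(1, probs))  # true class got 0.7
print(sparse_categorical_crossentropy(0, probs))  # true class got only 0.1
```

The more probability the model puts on the correct class, the smaller the loss.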

### Training
This is where we train the model we have built.

    NUM_EPOCHS = 5

    model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Output:

    Epoch 1/5
    54000/54000 - 168s - loss: 0.2508 - accuracy: 0.9252 - val_loss: 0.1722 - val_accuracy: 0.9522 - 168s/epoch - 3ms/step
    Epoch 2/5
    54000/54000 - 141s - loss: 0.1589 - accuracy: 0.9543 - val_loss: 0.1421 - val_accuracy: 0.9593 - 141s/epoch - 3ms/step
    Epoch 3/5
    54000/54000 - 139s - loss: 0.1435 - accuracy: 0.9618 - val_loss: 0.1520 - val_accuracy: 0.9673 - 139s/epoch - 3ms/step
    Epoch 4/5
    54000/54000 - 140s - loss: 0.1387 - accuracy: 0.9648 - val_loss: 0.1498 - val_accuracy: 0.9622 - 140s/epoch - 3ms/step
    Epoch 5/5
    54000/54000 - 139s - loss: 0.1329 - accuracy: 0.9672 - val_loss: 0.1685 - val_accuracy: 0.9630 - 139s/epoch - 3ms/step

    <keras.src.callbacks.History at 0x7d2476acb010>
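
To summarise a log like this one, the epoch durations and validation accuracies can be pulled out with a regular expression. The sketch below works on the log lines copied from this run (with the trailing per-epoch and per-step timings dropped for brevity); the pattern is an assumption about this particular log format. The History object returned by `model.fit` holds the same metrics in its `history` dictionary, which is usually the more convenient route.

```python
import re

log = """\
54000/54000 - 168s - loss: 0.2508 - accuracy: 0.9252 - val_loss: 0.1722 - val_accuracy: 0.9522
54000/54000 - 141s - loss: 0.1589 - accuracy: 0.9543 - val_loss: 0.1421 - val_accuracy: 0.9593
54000/54000 - 139s - loss: 0.1435 - accuracy: 0.9618 - val_loss: 0.1520 - val_accuracy: 0.9673
54000/54000 - 140s - loss: 0.1387 - accuracy: 0.9648 - val_loss: 0.1498 - val_accuracy: 0.9622
54000/54000 - 139s - loss: 0.1329 - accuracy: 0.9672 - val_loss: 0.1685 - val_accuracy: 0.9630
"""

# Capture the epoch duration in seconds and the validation accuracy.
pattern = re.compile(r'- (\d+)s - .*val_accuracy: ([\d.]+)')

seconds, val_accuracy = [], []
for line in log.splitlines():
    match = pattern.search(line)
    if match:
        seconds.append(int(match.group(1)))
        val_accuracy.append(float(match.group(2)))

best_epoch = val_accuracy.index(max(val_accuracy)) + 1
print('total training time:', sum(seconds), 's')
print('best validation accuracy:', max(val_accuracy), 'in epoch', best_epoch)
```

In this run the validation accuracy peaks in epoch 3 and does not improve afterwards, while every epoch costs more than two minutes.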

## Test the model

    test_loss, test_accuracy = model.evaluate(test_data)
    print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Output:

    Test loss: 0.20. Test accuracy: 95.97%







