from __future__ import print_function
import numpy as np
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD
from keras.utils import np_utils
np.random.seed(1671)  # for reproducibility

# Network and training
NB_EPOCH = 30  # best value from Experiment 1; kept for the later experiments
BATCH_SIZE = 64  # best value from Experiment 2; kept for the later experiments
VERBOSE = 1
NB_CLASSES = 10  # number of outputs = number of digits
OPTIMIZER = SGD()  # optimizer, explained later in this chapter
N_HIDDEN = 256
VALIDATION_SPLIT = 0.2  # how much TRAIN is reserved for VALIDATION
RESHAPED = 784

# Data: shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, RESHAPED)
X_test = X_test.reshape(10000, RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# Normalize
X_train /= 255
X_test /= 255

print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, NB_CLASSES)
Y_test = np_utils.to_categorical(y_test, NB_CLASSES)

# Build the model
model = Sequential()
model.add(Dense(N_HIDDEN, input_shape=(RESHAPED,)))
model.add(Activation('relu'))
model.add(Dense(N_HIDDEN))
model.add(Activation('relu'))
model.add(Dense(NB_CLASSES))
model.add(Activation('softmax'))

model.summary()

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=OPTIMIZER,
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, Y_train,
                    batch_size=BATCH_SIZE, epochs=NB_EPOCH,
                    verbose=VERBOSE, validation_split=VALIDATION_SPLIT)

# Evaluate the model
score = model.evaluate(X_test, Y_test, verbose=VERBOSE)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])


60000 train samples
10000 test samples
Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_19 (Dense)             (None, 256)               200960    
_________________________________________________________________
activation_19 (Activation)   (None, 256)               0         
_________________________________________________________________
dense_20 (Dense)             (None, 256)               65792     
_________________________________________________________________
activation_20 (Activation)   (None, 256)               0         
_________________________________________________________________
dense_21 (Dense)             (None, 10)                2570      
_________________________________________________________________
activation_21 (Activation)   (None, 10)                0         
=================================================================
Total params: 269,322
Trainable params: 269,322
Non-trainable params: 0
_________________________________________________________________

# Experiment Results: Modifying Epochs, Batch Size, and Hidden Neurons

## Experiment 1: Modifying Epochs (Default Batch Size: 128)

I first experimented with different epoch values to see how the number of epochs impacts the training, validation, and test accuracy of the model. I tested 10, 20, and 30 epochs with the default batch size of 128.

### Results:

- **Epochs: 10**
  - Training Accuracy: 92.44%
  - Validation Accuracy: 92.81%
  - Test Accuracy: 92.61%

- **Epochs: 20**
  - Training Accuracy: 94.56%
  - Validation Accuracy: 94.98%
  - Test Accuracy: 94.63%

- **Epochs: 30**
  - Training Accuracy: 95.92%
  - Validation Accuracy: 95.74%
  - Test Accuracy: 95.47%

### Observations:
- Training, validation, and test accuracy all rose with more epochs, but the gains shrank: test accuracy improved by 2.02 points from 10 to 20 epochs and by only 0.84 points from 20 to 30.
- **30 epochs** gave the highest test accuracy (95.47%), with training accuracy (95.92%) only slightly above it. I therefore used **30 epochs** for the remaining experiments.
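
The slowdown can be read directly off the reported numbers. The short sketch below only does the subtraction on the test accuracies listed above; the variable names are illustrative and not part of the training script.

```python
# Test accuracies (%) reported above for the default batch size of 128.
test_acc = {10: 92.61, 20: 94.63, 30: 95.47}

epochs = sorted(test_acc)
gains = {}
for prev, curr in zip(epochs, epochs[1:]):
    # Improvement in percentage points between consecutive epoch settings.
    gains[(prev, curr)] = round(test_acc[curr] - test_acc[prev], 2)
    print(f"{prev} -> {curr} epochs: +{gains[(prev, curr)]:.2f} points")
```

The second step of 10 epochs buys less than half the improvement of the first, which is the diminishing return described in the observations.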

---

## Experiment 2: Modifying Batch Size (Fixed Epochs: 30)

After determining that 30 epochs gave the best performance, I experimented with different batch sizes (64, 128, and 256) to see how they impacted the accuracy.

### Results:

- **Batch Size: 64**
  - Training Accuracy: 97.52%
  - Validation Accuracy: 96.58%
  - Test Accuracy: 96.53%

- **Batch Size: 128 (Default)**
  - Training Accuracy: 95.92%
  - Validation Accuracy: 95.74%
  - Test Accuracy: 95.47%

- **Batch Size: 256**
  - Training Accuracy: 93.84%
  - Validation Accuracy: 93.89%
  - Test Accuracy: 93.89%

### Observations:
- **Batch Size 64**: Gave the best accuracy on all three splits (training, validation, and test). A smaller batch means more weight updates per epoch, which likely explains the better result for the same number of epochs, at the cost of longer training time.
- **Batch Size 128**: The default configuration gave good results but trailed batch size 64 by about one point on the test set (95.47% vs. 96.53%).
- **Batch Size 256**: The largest batch size gave the lowest accuracy. Larger batches process more data per step but perform fewer weight updates per epoch, which seems to have left the model less well trained after 30 epochs.
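
The link between batch size and update frequency can be made concrete. This is a minimal sketch, assuming the 20% validation split leaves 48,000 of the 60,000 training images for gradient updates and that a final partial batch still counts as one update. It does not train anything; it only counts steps.

```python
import math

N_SAMPLES = 60000        # MNIST training images, as printed by the script
VALIDATION_SPLIT = 0.2   # same value as in the training script
NB_EPOCH = 30

# Assumption: samples left for gradient updates after the validation hold-out.
n_train = round(N_SAMPLES * (1 - VALIDATION_SPLIT))

updates = {}
for batch_size in (64, 128, 256):
    per_epoch = math.ceil(n_train / batch_size)  # a partial last batch counts
    updates[batch_size] = (per_epoch, per_epoch * NB_EPOCH)
    print(f"batch {batch_size:>3}: {per_epoch} updates/epoch, "
          f"{per_epoch * NB_EPOCH} over {NB_EPOCH} epochs")
```

Under these assumptions batch size 64 performs 750 updates per epoch, against 375 and 188 for batch sizes 128 and 256, so over 30 epochs it makes roughly four times as many updates as batch size 256.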

---

## Experiment 3: Modifying Hidden Neurons (Fixed Epochs: 30, Batch Size: 64)

In this experiment, I varied the number of neurons in each hidden layer (64, 128, 256) to see how this parameter affects the model’s performance. The 128-neuron results match the batch-size-64 results from Experiment 2.

### Results:

- **Hidden Neurons: 64**
  - Training Accuracy: 97.00%
  - Validation Accuracy: 96.18%
  - Test Accuracy: 96.26%

- **Hidden Neurons: 128 (Default)**
  - Training Accuracy: 97.52%
  - Validation Accuracy: 96.58%
  - Test Accuracy: 96.53%

- **Hidden Neurons: 256**
  - Training Accuracy: 97.93%
  - Validation Accuracy: 96.95%
  - Test Accuracy: 96.97%

### Observations:
- **Hidden Neurons 64**: Test accuracy dropped slightly compared to the default, from 96.53% to 96.26%. The model had fewer parameters to train, which likely reduced its ability to capture complex patterns.
- **Hidden Neurons 128 (Default)**: This sat between the other two settings on every split, with a test accuracy of 96.53%.
- **Hidden Neurons 256**: Increasing the number of hidden neurons improved accuracy further and gave the best test accuracy, 96.97%. However, this came at the cost of longer training times.
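
The difference in model size behind these observations can be quantified. The sketch below counts weights and biases for the architecture used here (784 inputs, two hidden `Dense` layers of equal width, 10 outputs); the helper name `mlp_param_count` is hypothetical and not part of Keras.

```python
def mlp_param_count(n_inputs, n_hidden, n_classes):
    """Weights plus biases for inputs -> hidden -> hidden -> classes."""
    layer_sizes = [n_inputs, n_hidden, n_hidden, n_classes]
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))

param_counts = {n: mlp_param_count(784, n, 10) for n in (64, 128, 256)}
for n_hidden, total in param_counts.items():
    print(f"{n_hidden:>3} hidden neurons: {total:,} parameters")
```

The count for 256 hidden neurons is 269,322, which agrees with the `model.summary()` output. The 64-neuron model has 55,050 parameters, about a fifth as many, yet loses only 0.71 points of test accuracy.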

---

## Conclusion:
Among the configurations tested, the model performs best with **30 epochs**, a **batch size of 64**, and **256 hidden neurons**. This combination yielded the highest test accuracy, **96.97%**. More hidden neurons increased the model's capacity to learn complex patterns, and a smaller batch size gave more weight updates per epoch; both improved accuracy at the cost of longer training time.
