# 2-2 Assignment: Identifying Hand-written Digits
### Marc Anthony Aradillas

In [1]:
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD
from keras.utils import to_categorical
np.random.seed(1671)  # seeds NumPy only; Keras/TensorFlow weight initialization is seeded separately

# Network and training
NB_EPOCH = 20
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 10  # number of outputs = number of digits
OPTIMIZER = SGD()  # stochastic gradient descent with default settings
N_HIDDEN = 128
VALIDATION_SPLIT = 0.2  # fraction of the training set held out for validation

# Data: shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
RESHAPED = 784  # each 28x28 image flattened into a 784-element vector

X_train = X_train.reshape(X_train.shape[0], RESHAPED)
X_test = X_test.reshape(X_test.shape[0], RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# Normalize pixel values from [0, 255] to [0, 1]
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices (one-hot encoding)
y_train = to_categorical(y_train, NB_CLASSES)
y_test = to_categorical(y_test, NB_CLASSES)

# Build the model
model = Sequential()
model.add(Dense(N_HIDDEN, input_shape=(RESHAPED,)))
model.add(Activation('relu'))
model.add(Dense(N_HIDDEN))
model.add(Activation('relu'))
model.add(Dense(NB_CLASSES))
model.add(Activation('softmax'))
model.summary()

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=OPTIMIZER,
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train,
                    batch_size=BATCH_SIZE, epochs=NB_EPOCH,
                    verbose=VERBOSE, validation_split=VALIDATION_SPLIT)

# Evaluate the model
score = model.evaluate(X_test, y_test, verbose=VERBOSE)
print("Test loss:", score[0])
print("Test accuracy:", score[1])


Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
60000 train samples
10000 test samples
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 128)               100480    
_________________________________________________________________
activation_1 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_2 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
_________________________________________________________________
activation_3 (Activation)    (None, 10)                0         
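Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________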
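The preprocessing steps in the cell above (flattening, scaling to [0, 1], and one-hot encoding) can be reproduced in plain NumPy. This is an illustrative sketch only: `preprocess` is a hypothetical helper, and random arrays stand in for the real MNIST images so that no download is needed. For integer labels it should produce the same one-hot layout as `to_categorical`.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Flatten 28x28 images, scale them to [0, 1], and one-hot encode the labels."""
    n = images.shape[0]
    # Flatten each 28x28 image into a 784-element row (same as reshape(n, 784))
    x = images.reshape(n, -1).astype("float32")
    # Scale 8-bit pixel intensities from [0, 255] to [0, 1]
    x /= 255.0
    # One-hot encode: row i gets 1.0 in column labels[i] and 0.0 elsewhere
    y = np.zeros((n, num_classes), dtype="float32")
    y[np.arange(n), labels] = 1.0
    return x, y

# Hypothetical stand-in for mnist.load_data(): 5 random 8-bit "images"
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
fake_labels = np.array([3, 0, 9, 1, 3])

x, y = preprocess(fake_images, fake_labels)
print(x.shape, y.shape)  # (5, 784) (5, 10)
```

Scaling to [0, 1] keeps the inputs on a small, consistent range, which generally makes plain SGD with its default learning rate better behaved than raw 0-255 intensities would.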

### Training, validation, and test accuracy with two hidden layers

In [2]:
# Extract training accuracy from the history object
training_accuracy = history.history['accuracy'][-1]  # Last epoch training accuracy
print(f"Training Accuracy: {training_accuracy * 100:.2f}%")

# Extract validation accuracy from the history object
validation_accuracy = history.history['val_accuracy'][-1]  # Last epoch validation accuracy
print(f"Validation Accuracy: {validation_accuracy * 100:.2f}%")

# Evaluate the model on the test dataset
test_score = model.evaluate(X_test, y_test, verbose=VERBOSE)
test_accuracy = test_score[1]
print(f"Test Accuracy: {test_accuracy * 100:.2f}%")

Training Accuracy: 94.56%
Validation Accuracy: 94.98%
Test Accuracy: 94.63%
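Keras records one value per epoch for each compiled metric in `history.history`, which is why the cell above reads index `[-1]` to get the final epoch. A minimal sketch of summarizing such a dict, assuming the `accuracy`/`val_accuracy` keys used above; `summarize_history` is a hypothetical helper, and the four-epoch history below is made-up data, not this notebook's run:

```python
def summarize_history(hist):
    """Summarize a Keras-style history dict (the history.history attribute)."""
    train_acc = hist["accuracy"]
    val_acc = hist["val_accuracy"]
    # 1-based epoch with the highest validation accuracy (first one on ties)
    best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1
    return {
        "final_train": train_acc[-1],
        "final_val": val_acc[-1],
        # Positive gap: the model fits the training data better than held-out data
        "gap": train_acc[-1] - val_acc[-1],
        "best_epoch": best_epoch,
        "best_val": max(val_acc),
    }

# Hypothetical 4-epoch history, NOT the run recorded above
hist = {
    "accuracy":     [0.80, 0.88, 0.91, 0.93],
    "val_accuracy": [0.86, 0.90, 0.92, 0.92],
}
summary = summarize_history(hist)
print(summary)
```

With a real run, the same call would be `summarize_history(history.history)`.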


## Experiment 1: Reduced Hidden Neurons (N_HIDDEN = 64)

In [3]:
N_HIDDEN = 64  # Reduced number of neurons in each hidden layer

# Define the model
model = Sequential()
model.add(Dense(N_HIDDEN, input_shape=(RESHAPED,)))  # First hidden layer
model.add(Activation('relu'))  # Activation function for the first hidden layer
model.add(Dense(N_HIDDEN))  # Second hidden layer
model.add(Activation('relu'))  # Activation function for the second hidden layer
model.add(Dense(NB_CLASSES))  # Output layer
model.add(Activation('softmax'))  # Softmax activation for multi-class classification
model.summary()  # Print the model summary

# Compile the model with loss function and optimizer
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(),  # fresh optimizer instance for this model
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train,  # Training data
                    batch_size=BATCH_SIZE,  # Batch size for training
                    epochs=NB_EPOCH,  # Number of epochs
                    verbose=VERBOSE,  # Verbosity of training output
                    validation_split=VALIDATION_SPLIT)  # Validation split

# Evaluate the model on the test data
score = model.evaluate(X_test, y_test, verbose=VERBOSE)
print("Test Accuracy (N_HIDDEN=64):", score[1])  # Print the test accuracy


Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 64)                50240     
_________________________________________________________________
activation_4 (Activation)    (None, 64)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 64)                4160      
_________________________________________________________________
activation_5 (Activation)    (None, 64)                0         
_________________________________________________________________
dense_6 (Dense)              (None, 10)                650       
_________________________________________________________________
activation_6 (Activation)    (None, 10)                0         
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________
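The parameter counts in the model summaries follow from a simple rule: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, while `Activation` layers add no parameters (hence the 0 entries). A small sketch, with `dense_param_counts` as a hypothetical helper, reproduces the totals for all three configurations (64, 128, and 256 neurons):

```python
def dense_param_counts(layer_sizes):
    """Parameters per Dense layer: weights (n_in * n_out) plus biases (n_out)."""
    return [n_in * n_out + n_out
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for n_hidden in (64, 128, 256):
    counts = dense_param_counts([784, n_hidden, n_hidden, 10])
    print(n_hidden, counts, sum(counts))
# 64 [50240, 4160, 650] 55050
# 128 [100480, 16512, 1290] 118282
# 256 [200960, 65792, 2570] 269322
```

Because the 784-input first layer dominates, halving or doubling `N_HIDDEN` roughly halves or doubles the total, while the hidden-to-hidden layer grows quadratically with the width.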

## Reduced Hidden Neurons
Reducing the number of neurons in each hidden layer to 64 limits the model's capacity to learn complex patterns: the summary above shows 55,050 trainable parameters, less than half of the baseline's 118,282. This leads to lower accuracy on the training, validation, and test datasets. The smaller model cannot capture the intricacies of the MNIST dataset, which consists of handwritten digits with subtle variations, as effectively. This result aligns with the concepts discussed in Deep Learning with Keras (Chapter 1, Pages 22–24), where the authors emphasize the importance of having enough neurons in the hidden layers to balance learning capacity and generalization. With fewer neurons, the model underfits the data because it cannot extract and learn all the necessary patterns. As a result, both training and test accuracies drop compared to the baseline experiment.
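To make the capacity argument concrete, the sketch below writes out what the `Sequential` model computes: each `Dense` layer is an affine map `x @ W + b`, followed by ReLU in the hidden layers and softmax at the output, and `N_HIDDEN` sets the width of the two hidden weight matrices. It is a NumPy illustration, not Keras's implementation: the weights are random normal draws (Keras `Dense` defaults to Glorot-uniform initialization), the inputs and labels are hypothetical, and `forward`, `init_params`, and `categorical_crossentropy` are illustrative helper names.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract each row's max for numerical stability; every row sums to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def init_params(n_hidden, rng, n_in=784, n_out=10):
    # Random normal weights and zero biases (an assumption for illustration)
    sizes = [n_in, n_hidden, n_hidden, n_out]
    return [(rng.normal(0.0, 0.05, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    # Dense -> ReLU -> Dense -> ReLU -> Dense -> softmax, mirroring the model above
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(x @ w1 + b1)        # first hidden layer: n_hidden features
    h2 = relu(h1 @ w2 + b2)       # second hidden layer: n_hidden features
    return softmax(h2 @ w3 + b3)  # 10 class probabilities per sample

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over samples of -sum_k y_true[k] * log(y_pred[k]); clipping avoids log(0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

rng = np.random.default_rng(0)
x = rng.random((4, 784))           # 4 hypothetical flattened, scaled images
y_true = np.eye(10)[[3, 0, 9, 1]]  # hypothetical one-hot labels
for n_hidden in (64, 256):
    probs = forward(x, init_params(n_hidden, rng))
    loss = categorical_crossentropy(y_true, probs)
    print(n_hidden, probs.shape, round(loss, 3))
```

Only the hidden-layer widths change between the experiments; the input size (784) and the 10-way softmax output stay fixed.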

## Experiment 2: Default Hidden Neurons (N_HIDDEN = 128)

In [4]:
N_HIDDEN = 128  # Default number of neurons in each hidden layer

# Define the model
model = Sequential()
model.add(Dense(N_HIDDEN, input_shape=(RESHAPED,)))  # First hidden layer
model.add(Activation('relu'))  # Activation function for the first hidden layer
model.add(Dense(N_HIDDEN))  # Second hidden layer
model.add(Activation('relu'))  # Activation function for the second hidden layer
model.add(Dense(NB_CLASSES))  # Output layer
model.add(Activation('softmax'))  # Softmax activation for multi-class classification
model.summary()  # Print the model summary

# Compile the model with loss function and optimizer
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(),  # fresh optimizer instance for this model
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train,  # Training data
                    batch_size=BATCH_SIZE,  # Batch size for training
                    epochs=NB_EPOCH,  # Number of epochs
                    verbose=VERBOSE,  # Verbosity of training output
                    validation_split=VALIDATION_SPLIT)  # Validation split

# Evaluate the model on the test data
score = model.evaluate(X_test, y_test, verbose=VERBOSE)
print("Test Accuracy (N_HIDDEN=128):", score[1])  # Print the test accuracy


Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 128)               100480    
_________________________________________________________________
activation_7 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_8 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 10)                1290      
_________________________________________________________________
activation_9 (Activation)    (None, 10)                0         
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________

## Default Hidden Neurons

The configuration with 128 hidden neurons is the baseline setup discussed in the book (Deep Learning with Keras, Pages 22–23). It gives the network enough capacity to learn complex patterns without overfitting. In the baseline run (In [2]), the final training, validation, and test accuracies were 94.56%, 94.98%, and 94.63%, respectively; all three lie within half a percentage point of each other, indicating that the model generalizes well to unseen data. A test accuracy of about 94.6% shows that the model recognizes the large majority of handwritten digits correctly. As noted in the book, this choice strikes a balance between model complexity and the risk of overfitting, making it a sensible default for a dataset of moderate complexity like MNIST.
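The alignment between the three accuracies can be checked with simple arithmetic on the values printed in cell [2]:

```python
# Final-epoch accuracies (in %) printed by cell [2] for the baseline (N_HIDDEN = 128)
reported = {"train": 94.56, "validation": 94.98, "test": 94.63}

train_val_gap = reported["train"] - reported["validation"]
val_test_gap = reported["validation"] - reported["test"]
print(f"train - validation: {train_val_gap:+.2f} points")  # -0.42
print(f"validation - test:  {val_test_gap:+.2f} points")   # +0.35
```

The negative train-minus-validation gap means validation accuracy was slightly higher. One common explanation is that Keras reports training accuracy as an average over the batches of the epoch while the weights are still being updated, whereas validation accuracy is computed with the weights at the end of the epoch.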

## Experiment 3: Increased Hidden Neurons (N_HIDDEN = 256)

In [5]:
N_HIDDEN = 256  # Increased number of neurons in each hidden layer

# Define the model
model = Sequential()
model.add(Dense(N_HIDDEN, input_shape=(RESHAPED,)))  # First hidden layer
model.add(Activation('relu'))  # Activation function for the first hidden layer
model.add(Dense(N_HIDDEN))  # Second hidden layer
model.add(Activation('relu'))  # Activation function for the second hidden layer
model.add(Dense(NB_CLASSES))  # Output layer
model.add(Activation('softmax'))  # Softmax activation for multi-class classification
model.summary()  # Print the model summary

# Compile the model with loss function and optimizer
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(),  # fresh optimizer instance for this model
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train,  # Training data
                    batch_size=BATCH_SIZE,  # Batch size for training
                    epochs=NB_EPOCH,  # Number of epochs
                    verbose=VERBOSE,  # Verbosity of training output
                    validation_split=VALIDATION_SPLIT)  # Validation split

# Evaluate the model on the test data
score = model.evaluate(X_test, y_test, verbose=VERBOSE)
print("Test Accuracy (N_HIDDEN=256):", score[1])  # Print the test accuracy


Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 256)               200960    
_________________________________________________________________
activation_10 (Activation)   (None, 256)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 256)               65792     
_________________________________________________________________
activation_11 (Activation)   (None, 256)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 10)                2570      
_________________________________________________________________
activation_12 (Activation)   (None, 10)                0         
Total params: 269,322
Trainable params: 269,322
Non-trainable params: 0
_________________________________________________________________

## Increased Hidden Neurons

Increasing the number of hidden neurons to 256 raises the parameter count to 269,322, roughly 2.3 times the baseline, which boosts the model's capacity to fit the training data and typically yields higher training accuracy. However, it also raises the risk of overfitting, because the model may begin to memorize the training data rather than generalize to unseen examples. Overfitting would show up as a widening gap between training and validation accuracy over the epochs. The book (Deep Learning with Keras, Page 24) discusses how larger models can improve performance but may require careful tuning to avoid overfitting. The test accuracy may improve slightly, but it can plateau or even decline if the model starts to overfit. This experiment illustrates the trade-off between increasing model complexity and maintaining generalization.
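One way to watch for the widening gap described above is to track it per epoch and to stop training once validation accuracy stops improving. The sketch below illustrates patience-based early stopping in plain Python; `early_stopping_epoch` is a hypothetical helper, and the accuracy lists are made-up values, not results from this notebook. In Keras itself, the built-in `keras.callbacks.EarlyStopping` callback (with arguments such as `monitor='val_accuracy'`, `patience`, and `restore_best_weights=True`), passed to `model.fit(..., callbacks=[...])`, serves this purpose.

```python
def early_stopping_epoch(val_acc, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which patience-based early stopping would halt,
    or None if validation accuracy never stalls for `patience` epochs in a row."""
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best + min_delta:
            best = acc  # new best: reset the patience counter
            wait = 0
        else:
            wait += 1   # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical per-epoch accuracies for an over-parameterized run (not measured data)
train_acc = [0.91, 0.94, 0.96, 0.975, 0.985, 0.99]
val_acc = [0.90, 0.92, 0.93, 0.93, 0.925, 0.92]

# Train-minus-validation gap per epoch; a steadily growing gap signals overfitting
gaps = [round(t - v, 3) for t, v in zip(train_acc, val_acc)]
print(gaps)
print(early_stopping_epoch(val_acc, patience=3))  # 6
```

In this made-up run, training accuracy keeps climbing while validation accuracy peaks at epoch 3, so the gap widens and the stopping rule fires three epochs after the last improvement.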

---

## Conclusion

Across the three experiments, the number of neurons in the hidden layers affects both model capacity and performance: the trainable parameter count grows from 55,050 (64 neurons) to 118,282 (128 neurons) to 269,322 (256 neurons). A smaller network with 64 neurons **underfits** the data, leading to **lower accuracy** across all datasets. The baseline setup with 128 neurons strikes a good balance, achieving **high accuracy** (94.63% on the test set) while avoiding overfitting. Increasing the number of neurons to 256 provides a **slight improvement** in training accuracy but risks overfitting, as discussed in Deep Learning with Keras. This analysis underscores the importance of tuning hyperparameters such as the number of neurons to match the dataset's complexity and size.