# Practice 1.1: Classical Neural Networks
MQIST 2025/26: Quantum Computing and Machine Learning
Alfredo Chavert Sancho
Pedro Herrero Maldonado

In [1]:
# Package imports
import numpy as np
import keras
from keras import layers
from keras.datasets import fashion_mnist
from keras.utils import to_categorical

### Auxiliary functions

In [None]:
import matplotlib.pyplot as plt

def plot(axis, train, validation, title):
    # Epoch numbers from 1 to the number of recorded epochs
    epochs = range(1, len(train) + 1)
    # Training curve: solid blue line with circle markers
    axis.plot(epochs, train, 'b-o', label='Training ' + title)
    # Validation curve: dashed red line with circle markers
    axis.plot(epochs, validation, 'r--o', label='Validation ' + title)
    # Set the title of the graph and the X and Y axis labels
    axis.set_title('Training and validation ' + title)
    axis.set_xlabel('Epochs')
    axis.set_ylabel(title)
    # Show the legend of the graph
    axis.legend()

def multiplot(history):
    fig, axes = plt.subplots(1, 2)
    fig.set_figwidth(11)
    plot(axes[0], history.history['loss'], history.history['val_loss'], 'loss')
    plot(axes[1], history.history['accuracy'], history.history['val_accuracy'], 'accuracy')

    # We show the graphs on screen
    plt.show()

# Data loading and preprocessing

In [None]:
# Load the Fashion-MNIST dataset
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

### Flatten the data

In [None]:
# Flatten each 28x28 image into a 784-element vector; -1 infers the number of images
train_images = train_images.reshape((-1, 28 * 28))
test_images = test_images.reshape((-1, 28 * 28))

### Converting [0,255] integers to [0,1] floats

In [None]:
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255

### One hot encoding

In [None]:
train_labels = to_categorical(train_labels)
# test_labels are kept as integer class indices: classification_report
# compares them directly with the argmax of the predictions.
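To make the encoding concrete, the following is a minimal NumPy sketch of what one-hot encoding produces. The helper `one_hot` is a hypothetical name used only for illustration; it is not the Keras implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Put a 1 in the column given by each integer label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Three example labels out of the 10 Fashion-MNIST classes
example = one_hot([0, 9, 3], num_classes=10)
print(example.shape)
print(example[1])
```

Each row contains a single 1 at the position of the class index, which is the target format expected by the categorical crossentropy loss.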

### Divide the train data into train and validation sets

In [None]:
val_images = train_images[:10000]
val_labels = train_labels[:10000]
train_images = train_images[10000:]
train_labels = train_labels[10000:]

# Classical Neural Network Setup

In this section, we define the architecture of our classical neural network model. The model consists of an input layer, several hidden layers with activation functions, and an output layer with softmax activation for multi-class classification. In terms of performance, the main goal of this section is to achieve the highest possible accuracy on the validation dataset, even if the model overfits the training data.

Overfitting is addressed in the next part of the practice, where we implement regularization techniques to improve the model's generalization capabilities.

### Structural hyperparameters
We will use the following structural hyperparameters for our classical neural network:
- Number of hidden layers: 
- Number of neurons per hidden layer: 
- Activation function: 

The justification for these choices is as follows:
- **Number of hidden layers**: 
- **Number of neurons per hidden layer**: 
- **Activation function**: 


### Learning hyperparameters
We will use the following learning hyperparameters for our classical neural network:
- Learning rate: 0.001
- Batch size: 512
- Number of epochs: 25
- Optimizer: Adam
- Loss function: Categorical crossentropy

The justification for these choices is as follows:
- **Learning rate**: A learning rate of 0.001 allows for gradual convergence towards good weights without overshooting.
- **Batch size**: A batch size of 512 keeps each epoch short (98 weight updates over the 50,000 training images) while still giving a reasonably accurate gradient estimate.
- **Number of epochs**: Training for 25 epochs gives the model enough iterations to learn from the data; any overfitting that appears is tolerated at this stage.
- **Optimizer**: The Adam optimizer is chosen for its adaptive per-parameter learning rates, which help in faster convergence.
- **Loss function**: Categorical crossentropy is appropriate for multi-class classification tasks with one-hot encoded labels, like Fashion-MNIST.
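As an illustration of the loss function, the sketch below computes softmax probabilities and the categorical crossentropy for a single sample with NumPy. The function names, the three-class example and the clipping constant `eps` are assumptions made for this illustration; Keras computes the loss internally.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: minus the sum over classes of y_true * log(y_pred)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# One sample with three classes (illustrative values); the true class is 0
logits = np.array([[2.0, 1.0, 0.1]])
y_true = np.array([[1.0, 0.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs)
print(loss)
```

With one-hot labels the sum reduces to minus the logarithm of the probability assigned to the true class, so the loss shrinks as the network becomes more confident in the correct class.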

In [None]:
model = keras.Sequential(name='fashion_mnist')
# Input layer
model.add(layers.Input(shape=(28*28, )))
# Hidden layers
model.add(layers.Dense(50, name='hidden_1', activation='sigmoid')) # To change
model.add(layers.Dense(50, name='hidden_2', activation='sigmoid')) # To change
# Output layer
model.add(layers.Dense(10, name='output', activation='softmax'))

model.summary()
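The parameter counts reported by `model.summary()` can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below does this for the placeholder architecture in the cell above (784 inputs, two hidden layers of 50 units, 10 outputs); `dense_params` is a hypothetical helper, and the numbers change if the layers marked "To change" are modified.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a dense layer: weights plus biases."""
    return n_inputs * n_units + n_units

# Layer widths: flattened input, hidden_1, hidden_2, output
layer_sizes = [28 * 28, 50, 50, 10]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [39250, 2550, 510]
print(total)      # 42310
```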

The learning hyperparameters are the following:

In [None]:
learning_rate = 0.001
opt = keras.optimizers.Adam(learning_rate = learning_rate)
loss_function = "categorical_crossentropy"
model.compile(optimizer = opt, loss = loss_function, metrics = ["accuracy"])
epochs = 25
batch_size = 512

The neural network is now ready to be trained with the specified learning hyperparameters and evaluated on the validation data after each epoch.

In [None]:
history = model.fit(train_images,
                    train_labels,
                    epochs = epochs,
                    batch_size = batch_size,
                    validation_data = (val_images, val_labels))

We now plot the training and validation accuracy and loss curves to evaluate the model's performance over epochs.

In [None]:
history_dict = history.history
multiplot(history)
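To support the observations with numbers, a sketch like the following locates the epoch with the lowest validation loss and the gap between validation and training loss. The dictionary below is a hypothetical stand-in with made-up values, used only so the example runs on its own; in the notebook it would be the `history_dict` obtained above.

```python
# Hypothetical stand-in for history.history; the values are made up for illustration
history_dict = {
    "loss": [0.80, 0.55, 0.45, 0.38, 0.33],
    "val_loss": [0.70, 0.52, 0.47, 0.48, 0.50],
}

loss = history_dict["loss"]
val_loss = history_dict["val_loss"]

# Epoch (1-based) at which the validation loss is lowest
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# A growing positive gap (validation minus training loss) suggests overfitting
gap = [v - t for t, v in zip(loss, val_loss)]

print("Best epoch:", best_epoch)
print("Final gap:", round(gap[-1], 2))
```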

*Put your observations here*

Finally, we evaluate the model on the test dataset to determine its generalization performance.

In [None]:
from sklearn.metrics import classification_report

# Fashion-MNIST class names, indexed by integer label
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Predict class probabilities and take the most likely class per image
predicted_values = model.predict(test_images)
predicted_classes = np.argmax(predicted_values, axis=1)
# Classification report with precision, recall and F1-score per class
report = classification_report(test_labels, predicted_classes, target_names=class_names)
# Print the evaluation metrics
print("Classification Report:")
print(report)
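For reference, the per-class figures in the report follow from counts of true positives, false positives and false negatives. The sketch below computes them for one class on a small made-up example; `per_class_metrics` and the toy labels are assumptions for illustration and are not taken from the model's results.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, cls):
    """Precision, recall and F1-score for a single class index."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))  # correctly predicted as cls
    fp = np.sum((y_pred == cls) & (y_true != cls))  # wrongly predicted as cls
    fn = np.sum((y_pred != cls) & (y_true == cls))  # cls samples that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Toy labels for three classes (illustrative only)
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

precision, recall, f1 = per_class_metrics(y_true, y_pred, cls=1)
print(precision, recall, f1)
```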

# Regularization techniques

In this part of the practice, we will implement regularization techniques to improve the generalization capabilities of our classical neural network model. Regularization helps prevent overfitting by adding constraints to the model during training. Some common regularization techniques include dropout, L1/L2 regularization, and early stopping. We will experiment with these techniques to find the best combination that enhances the model's performance on unseen data.

### Early Stopping Implementation

Early stopping monitors the validation loss during training. If the validation loss does not improve for a specified number of consecutive epochs (the patience), training is halted. Stopping close to the epoch with the lowest validation loss keeps the model from continuing to fit the training data after it has stopped improving on unseen data.
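The patience rule described above can be sketched in plain Python as follows. The function name and the list of validation losses are hypothetical and serve only to show the logic; this is not the code used by Keras.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_loss) under a simple patience rule.

    Training stops at the first epoch where the validation loss has not
    improved for `patience` consecutive epochs. Epochs are 1-based.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_loss
    # Patience was never exhausted: training runs for all epochs
    return len(val_losses), best_loss

# Made-up validation losses: the best value is reached at epoch 3
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
stop_epoch, best_loss = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_loss)
```

In Keras, this behaviour is available through the `keras.callbacks.EarlyStopping` callback, with arguments such as `monitor='val_loss'`, `patience` and `restore_best_weights=True`, passed to `model.fit` through its `callbacks` argument.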

### Dropout, batch normalization, weight regularization and initialization
Other techniques that can be added to the model are dropout, batch normalization, weight regularization and weight initialization.
- **Dropout**: Randomly drops a fraction of neurons during training to prevent co-adaptation of neurons.
- **Batch Normalization**: Normalizes the inputs of each layer over the current mini-batch to stabilize learning and improve convergence.
- **Weight Regularization**: Adds a penalty to the loss function based on the magnitude of the weights (L1 or L2 regularization).
- **Weight Initialization**: Initializes the weights with a scheme suited to the activation function, which can help in faster convergence and better performance.
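A minimal sketch of how these four techniques could be combined in a Keras model is shown below. The layer width, dropout rate, L2 factor, ReLU activation and He initialization are illustrative assumptions, not tuned or recommended values for this practice, and `build_regularized_model` is a hypothetical helper name.

```python
import keras
from keras import layers, regularizers

def build_regularized_model(n_units=50, dropout_rate=0.3, l2_factor=1e-4):
    """Two hidden blocks: Dense (L2 + He init), BatchNorm, ReLU, Dropout."""
    model = keras.Sequential(name="fashion_mnist_regularized")
    model.add(layers.Input(shape=(28 * 28,)))
    for i in (1, 2):
        # Weight initialization and L2 weight regularization on the dense kernel
        model.add(layers.Dense(n_units,
                               name=f"hidden_{i}",
                               kernel_initializer="he_normal",
                               kernel_regularizer=regularizers.l2(l2_factor)))
        # Batch normalization applied before the activation
        model.add(layers.BatchNormalization(name=f"batchnorm_{i}"))
        model.add(layers.Activation("relu", name=f"relu_{i}"))
        # Dropout is only active during training
        model.add(layers.Dropout(dropout_rate, name=f"dropout_{i}"))
    model.add(layers.Dense(10, name="output", activation="softmax"))
    return model

regularized_model = build_regularized_model()
regularized_model.summary()
```

Placing batch normalization before the activation is one common ordering; placing it after the activation is also used, so the order is worth treating as another choice to experiment with.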

# Conclusions

Comment on the results obtained by each model.
- Make a reasoned comparison of the results obtained: where they have improved, where they have worsened, etc.
- Comment on the advantages and disadvantages of the different methods, the conclusions obtained and other aspects of interest.
- It is advisable to include a final graph or table summarizing all the results obtained by the different models.