# Project 1

This notebook describes the first larger programming project in the course, which is
devoted to artificial neural networks.

### Application: Recognizing handwritten numbers

Most people effortlessly recognise these digits as 504192. This ease is deceptive. The
difficulty of visual pattern recognition becomes obvious when trying to write a computer
program to recognise digits like the above. What seems easy when we do it ourselves
suddenly becomes extremely difficult. Simple notions about how we recognise shapes,
such as “a 9 has a loop at the top and a vertical line at the bottom right”, turn out to be
hard to express algorithmically. If you try to specify such rules, you quickly get lost
in a quagmire of exceptions, restrictions and special cases. It seems hopeless.

Neural networks approach the problem differently. The idea is to take a large number of
handwritten digits, called training examples, and then develop a system that can learn
from these training examples. In other words, the neural network uses the examples to
automatically derive rules for recognising handwritten digits. In addition, by increasing
the number of training examples, the network can learn more about handwriting and
thus improve its accuracy.

In [9]:
# Imports
import numpy as np
import pickle

### Task 1

Implement a feedforward neural network (as a class) consisting of 3 layers (input,
hidden, output layer), where each layer can contain any number of neurons. Use
the sigmoid function as the activation function.
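
The plain formula `1 / (1 + np.exp(-X))` can overflow inside `np.exp` for strongly negative inputs, which NumPy reports as a `RuntimeWarning`. The following is a minimal sketch of a numerically safer variant; the name `stable_sigmoid` is hypothetical and not part of the assignment, and it assumes array inputs with at least one dimension.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that never exponentiates a large positive number."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) is at most 1, so no overflow is possible.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)).
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
```

Both branches compute the same function; only the algebraic form differs, so it could be swapped in for the `sigmoid` method without changing results.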

### Task 3

Implement stochastic gradient descent (SGD) to train the network. The implementation
of SGD should allow for different mini-batch sizes and different numbers of epochs.
An epoch is one complete pass of the training data through the learning algorithm.
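
As an illustration of how mini-batches and epochs relate, here is a small sketch that shuffles the data once and yields consecutive slices of the shuffled indices. The helper name `iterate_minibatches` and the toy arrays are assumptions made for this example only.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data exactly once."""
    indices = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X_demo = np.arange(20).reshape(10, 2)   # 10 toy examples with 2 features each
y_demo = np.arange(10)

# One epoch = one full pass over all mini-batches.
batches = list(iterate_minibatches(X_demo, y_demo, batch_size=4, rng=rng))
batch_sizes = [len(xb) for xb, _ in batches]
```

With 10 examples and a batch size of 4, the last batch is smaller than the others; every example still appears exactly once per epoch.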

### Task 4

Implement the backpropagation algorithm (used in SGD to efficiently calculate
the gradient of the loss with respect to the weights and biases).
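
A common way to gain confidence in a backpropagation implementation is to compare its gradients with finite-difference estimates. The sketch below does this for a tiny network with the same structure (sigmoid hidden and output layers, quadratic loss summed over the batch). The function names, layer sizes, and random data are assumptions for illustration, not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(W1, b1, W2, b2, X, Y):
    """Quadratic loss summed over the batch, plus backpropagated gradients."""
    H = sigmoid(X @ W1 + b1)
    O = sigmoid(H @ W2 + b2)
    loss = 0.5 * np.sum((O - Y) ** 2)
    d_out = (O - Y) * O * (1 - O)          # gradient w.r.t. output pre-activation
    d_hid = (d_out @ W2.T) * H * (1 - H)   # gradient w.r.t. hidden pre-activation
    grads = (X.T @ d_hid, d_hid.sum(axis=0), H.T @ d_out, d_out.sum(axis=0))
    return loss, grads

def numerical_grad(f, param, eps=1e-6):
    """Central finite differences of the scalar function f with respect to param."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        plus = f()
        param[idx] = old - eps
        minus = f()
        param[idx] = old                   # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X, Y = rng.random((5, 4)), rng.random((5, 3))
params = [rng.standard_normal((4, 6)), rng.standard_normal(6),
          rng.standard_normal((6, 3)), rng.standard_normal(3)]

_, analytic = loss_and_grads(*params, X, Y)
max_diff = max(
    np.max(np.abs(numerical_grad(lambda: loss_and_grads(*params, X, Y)[0], p) - g))
    for p, g in zip(params, analytic)
)
```

If the backpropagated gradients are correct, `max_diff` should be very small compared with the gradient values themselves. The check is slow, so it is only practical on small networks like this one.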

In [81]:
# Feedforward NN Class
class FeedForwardNeuralNetwork():
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
            
        self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size)
        self.bias_hidden = np.random.randn(self.hidden_size)
        self.weights_output_hidden = np.random.randn(self.hidden_size, self.output_size)
        self.bias_output = np.random.randn(self.output_size)

    def sigmoid(self, X):
        return 1 / (1 + np.exp(-X))

    def forward(self, input_data):
        hidden_input = np.dot(input_data, self.weights_input_hidden) + self.bias_hidden
        hidden_output = self.sigmoid(hidden_input)

        output = np.dot(hidden_output, self.weights_output_hidden) + self.bias_output
        network_output = self.sigmoid(output)

        return network_output
    
    def train_sgd(self, X_train, y_train, X_val, Y_val_onehot, learning_rate, batch_size, epochs):
        best_val_loss = np.inf  # any finite validation loss counts as an improvement
        for epoch in range(epochs):
            # Shuffle the dataset
            indices = np.arange(len(X_train))
            np.random.shuffle(indices)
            total_loss = []

            print(f'----------epoch: {epoch}------------')

            for i in range(0, len(X_train), batch_size):
                # Create mini-batches
                batch_indices = indices[i:i+batch_size]
                X_batch = X_train[batch_indices]
                y_batch = y_train[batch_indices]

                # Forward pass
                hidden_input = np.dot(X_batch, self.weights_input_hidden) + self.bias_hidden
                hidden_output = self.sigmoid(hidden_input)
                output = np.dot(hidden_output, self.weights_output_hidden) + self.bias_output
                network_output = self.sigmoid(output)

                # Backpropagation
                # Calculate loss and gradients
                # quadratic function (square loss)
                loss = np.mean(0.5 * (network_output - y_batch) ** 2, axis=0)
                total_loss.append(np.mean(loss))  # Average over the mini-batch

                error = network_output - y_batch
                # Gradient of the loss with respect to the output layer's pre-activation
                # (error times the derivative of the sigmoid)
                d_output = error * network_output * (1 - network_output)
                # Gradient of the loss with respect to the hidden layer's pre-activation
                d_hidden = np.dot(d_output, self.weights_output_hidden.T) * hidden_output * (1 - hidden_output)

                # Update weights and biases
                self.weights_output_hidden -= learning_rate * np.dot(hidden_output.T, d_output)
                self.bias_output -= learning_rate * np.sum(d_output, axis=0)
                self.weights_input_hidden -= learning_rate * np.dot(X_batch.T, d_hidden)
                self.bias_hidden -= learning_rate * np.sum(d_hidden, axis=0)
                
            print(f'mean total loss: {np.mean(total_loss)}')
            # Evaluate the model on the validation set
            val_loss, val_accuracy = self.evaluate(X_val, Y_val_onehot)
            print(f"Epoch {epoch+1}: Validation Loss = {val_loss:.4f}, Validation Accuracy = {val_accuracy:.4f}")

            # Keep a copy of the parameters with the lowest validation loss so far.
            # The updates above are in place (-=), so a plain reference would
            # silently follow the current weights instead of preserving the best ones.
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                woh = self.weights_output_hidden.copy()
                bo = self.bias_output.copy()
                wih = self.weights_input_hidden.copy()
                bh = self.bias_hidden.copy()

        # Restore the best parameters found during training
        self.weights_output_hidden = woh
        self.bias_output = bo
        self.weights_input_hidden = wih
        self.bias_hidden = bh

    def evaluate(self, X_val, Y_val):
        num_examples = len(X_val)
        # Forward pass on the validation data
        val_predictions = self.predict(X_val)
        # Calculate the categorical cross-entropy loss
        val_loss = -np.sum(Y_val * np.log(val_predictions + 1e-15)) / num_examples

        # Convert predicted probabilities to predicted class labels (0-9)
        val_predictions_class = np.argmax(val_predictions, axis=1)
        
        # Convert true labels (one-hot encoded) to true class labels (0-9)
        Y_val_class = np.argmax(Y_val, axis=1)

        # Calculate accuracy
        val_accuracy = np.sum(val_predictions_class == Y_val_class) / num_examples

        return val_loss, val_accuracy

        
    def predict(self, input_data):
        # Perform a forward pass to generate predictions
        return self.forward(input_data)



### Task 2

Read in the MNIST data (provided in Canvas). The data is separated into training
data (50 000), validation data (10 000), and test data (10 000).

In [82]:
# Load MNIST data from mnist.pkl
with open('mnist.pkl', 'rb') as f:
    mnist_data = pickle.load(f, encoding='latin1')

# Extract training, validation, and test sets
train_data, valid_data, test_data = mnist_data

# Unpack the data into inputs and labels
X_train, Y_train = train_data
X_val, Y_val = valid_data
X_test, Y_test = test_data

# Convert inputs to numpy arrays
X_train = np.array(X_train)
X_val = np.array(X_val)
X_test = np.array(X_test)

# Convert labels to numpy arrays
Y_train = np.array(Y_train)
Y_val = np.array(Y_val)
Y_test = np.array(Y_test)


### Normalize 

In [None]:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_normalized = scaler.fit_transform(X_train)
X_val_normalized = scaler.transform(X_val)
X_test_normalized = scaler.transform(X_test)

### One-hot encode the labels

In [83]:
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)
Y_train_onehot = encoder.fit_transform(Y_train.reshape(-1, 1))
# Reuse the encoder fitted on the training labels instead of refitting it
Y_val_onehot = encoder.transform(Y_val.reshape(-1, 1))

np.shape(Y_train_onehot), np.shape(X_train)

((50000, 10), (50000, 784))
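
For integer class labels in the range 0 to 9, the same encoding can also be produced with NumPy alone by indexing the rows of an identity matrix. This is a sketch of an alternative, not the method used in the notebook; the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return an array of shape (len(labels), num_classes) with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes)[labels]

demo = one_hot([3, 0, 9])
```

Taking `np.argmax(demo, axis=1)` recovers the original labels, which is the same conversion the `evaluate` method applies to the one-hot validation labels.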

### Task 5

Train and test the accuracy of the network for the following parameters:

- Input layer with 784 + 1 neurons
- Hidden layer with 30 + 1 neurons
- Output layer with 10 neurons

As the loss function, use the quadratic function (square loss),
where (x, y) is a pair of training data, n is the number of training examples used, and h_w
represents the neural network.
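
Written out with these symbols, the quadratic loss usually takes the form below. This is a sketch in the conventional notation, assuming the customary factor 1/2 (which matches the `0.5 *` in `train_sgd`) and a sum over the n training pairs; the exact normalisation intended by the assignment is an assumption.

```latex
L(w) = \frac{1}{2n} \sum_{(x, y)} \lVert y - h_w(x) \rVert^2
```

The loss value reported by `train_sgd` is additionally averaged over the 10 output neurons, which only rescales this expression by a constant factor.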

In [84]:
nn = FeedForwardNeuralNetwork(784, 30, 10)

nn.train_sgd(X_train, Y_train_onehot, X_val, Y_val_onehot, 0.1, 1, 100)


----------epoch: 0------------
mean total loss: 0.02643615672831189
Epoch 1: Validation Loss = 3.2324, Validation Accuracy = 0.6656
----------epoch: 1------------
mean total loss: 0.018135396144793193
Epoch 2: Validation Loss = 2.6348, Validation Accuracy = 0.7541
----------epoch: 2------------
mean total loss: 0.015555680781183162
Epoch 3: Validation Loss = 2.5913, Validation Accuracy = 0.7609
----------epoch: 3------------
mean total loss: 0.014774703651921604
Epoch 4: Validation Loss = 2.5630, Validation Accuracy = 0.7649
----------epoch: 4------------
mean total loss: 0.014290993677797273
Epoch 5: Validation Loss = 2.5704, Validation Accuracy = 0.7669
----------epoch: 5------------
mean total loss: 0.013937079170021054
Epoch 6: Validation Loss = 2.5896, Validation Accuracy = 0.7671
----------epoch: 6------------
mean total loss: 0.013657118741604244
Epoch 7: Validation Loss = 2.5102, Validation Accuracy = 0.7701
----------epoch: 7------------
mean total loss: 0.013448852778359895
E

### Prediction

In [88]:
y_pred = nn.predict(X_test)
# Convert to labels (0-9)
y_pred = y_pred.argmax(axis=1)

In [90]:
y_pred, Y_test

(array([7, 2, 1, ..., 4, 5, 6]), array([7, 2, 1, ..., 4, 5, 6]))

### Accuracy

In [92]:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(Y_test, y_pred)
accuracy

0.946
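
A single accuracy number hides which digits are confused with one another. The sketch below builds a confusion matrix with NumPy and derives a per-class accuracy from it. The helper name `confusion_matrix_np` and the toy labels are assumptions for illustration; on the real data one would pass `Y_test` and `y_pred` with `num_classes=10`.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes=10):
    """cm[i, j] counts the examples of true class i that were predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true_demo = np.array([0, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 2, 2, 2, 0])

cm = confusion_matrix_np(y_true_demo, y_pred_demo, num_classes=3)
# Fraction of each true class that was predicted correctly
per_class_acc = cm.diagonal() / cm.sum(axis=1)
```

Rows with a low per-class accuracy point to the digits the network struggles with most.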

### Task 6
Print an output of the learning success per epoch.
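
One way to present the learning progress per epoch is to collect the metrics in a list and print them as a table after training. The sketch below assumes a hypothetical `history` list of dictionaries, which `train_sgd` does not currently return; the example values are the first two epochs of the run shown earlier, rounded to four decimals.

```python
def format_history(history):
    """Format per-epoch metrics as an aligned plain-text table."""
    lines = [f"{'epoch':>5}  {'train_loss':>10}  {'val_loss':>8}  {'val_acc':>7}"]
    for h in history:
        lines.append(
            f"{h['epoch']:>5d}  {h['train_loss']:>10.4f}  "
            f"{h['val_loss']:>8.4f}  {h['val_acc']:>7.4f}"
        )
    return "\n".join(lines)

# Hypothetical record of per-epoch metrics
history = [
    {"epoch": 1, "train_loss": 0.0264, "val_loss": 3.2324, "val_acc": 0.6656},
    {"epoch": 2, "train_loss": 0.0181, "val_loss": 2.6348, "val_acc": 0.7541},
]

table = format_history(history)
print(table)
```

To use it, `train_sgd` could append one dictionary per epoch right after calling `evaluate` and return the list when training finishes.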