### Overfitting revisited ###
We have previously shown that by using a middle layer our network was able to create "intermediate" correlation. This actually made learning (or more mundanely error minimisation) possible. <br>
__Although adding more layers (or neurons at each layer) increases the networks expressiveness, it also increases the risk of overfitting__.
The next step is to talk about regularisation, one of the main tools to reduce overfitting.

In [1]:
import MNIST_Attempt.image_utils as iu
import numpy as np

In [43]:
# import the mnist data:

test_data_file= "t10k-images.idx3-ubyte"
test_labels_file = "t10k-labels.idx1-ubyte"

training_data_file = "train-images.idx3-ubyte"
training_labels_file = "train-labels.idx1-ubyte"

test_data = iu.load_image_file("MNIST_Attempt/data", test_data_file)
test_labels = iu.load_labels_file("MNIST_Attempt/data", test_labels_file)

training_data = iu.load_image_file("MNIST_Attempt/data", training_data_file)
training_labels = iu.load_labels_file("MNIST_Attempt/data", training_labels_file)

Total images in file: 10000
Total labels in file: 10000
Total images in file: 60000
Total labels in file: 60000


In [44]:
# Define the activation function and its derivative:
def relu(x):
    return (x > 0) * x

def relu_deriv(x):
    return x > 0

def one_hot_encode(np_arr):
    return np.eye(np.max(np_arr) + 1)[np_arr.flatten()]

In [45]:
training_labels = one_hot_encode(training_labels)
test_labels = one_hot_encode(test_labels)

In [49]:
# Define the network's hyperparameters:
alpha = 0.005
epochs = 100

input_layer_size = training_data.shape[1] # 784 
hidden_layer_size = 100 
output_layer_size = 10

# Normalise the image data:
training_data_norm = (training_data - 128) / 128
test_data_norm = (test_data - 128) / 128

In [50]:
# Initialise the weights:
weights_0_1 = 0.2 * np.random.random((input_layer_size, hidden_layer_size)) - 0.1 # (784 x 100)
weights_1_2 = 0.2 * np.random.random((hidden_layer_size, output_layer_size)) - 0.1 # (100 x 10)

# Train the network:
for epoch in range(epochs):
    error, correct_cnt = (0,0)
    for i in range(1000):
        layer_0 = training_data_norm[i: i+1]
        label = training_labels[i: i+1]
        
        layer_1 = relu(np.dot(layer_0, weights_0_1)) # (1x100)
        layer_2 = np.dot(layer_1, weights_1_2) # (1 x 10)
        
        correct_cnt += int(np.argmax(layer_2) == np.argmax(label))
        error = np.sum((label - layer_2)**2)
        
        delta_2 = layer_2 - label # (1 x 10)
        delta_1 = delta_2.dot(weights_1_2.T) * relu_deriv(layer_1) #(1x100)
        
        weights_1_2 -= alpha * layer_1.T.dot(delta_2)
        weights_0_1 -= alpha * layer_0.T.dot(delta_1)
        
    print("Accuracy: {}, Error:{}".format(correct_cnt/1000, error))
        
    

Accuracy: 0.625, Error:0.4157463074591412
Accuracy: 0.785, Error:0.3540538022001188
Accuracy: 0.838, Error:0.2830472224002628
Accuracy: 0.866, Error:0.25530557290898764
Accuracy: 0.889, Error:0.22286178560863248
Accuracy: 0.908, Error:0.1938857846655671
Accuracy: 0.922, Error:0.16916906972638163
Accuracy: 0.938, Error:0.14857874423530387
Accuracy: 0.943, Error:0.13000050315385672
Accuracy: 0.951, Error:0.10274531621452632
Accuracy: 0.954, Error:0.09315652271520697
Accuracy: 0.961, Error:0.07324297150136856
Accuracy: 0.962, Error:0.06059672089677537
Accuracy: 0.966, Error:0.04911458373123536
Accuracy: 0.968, Error:0.03830243792759229
Accuracy: 0.972, Error:0.03200202765188133
Accuracy: 0.976, Error:0.03156054995661453
Accuracy: 0.98, Error:0.027624832602849456
Accuracy: 0.982, Error:0.023467741419800802
Accuracy: 0.983, Error:0.021765418769072523
Accuracy: 0.983, Error:0.02137737968980616
Accuracy: 0.985, Error:0.018821317296111473
Accuracy: 0.987, Error:0.01959784756702774
Accuracy: 0.

Notice that just after 60 iterations, our network has trained so well that it correctly classified 99.9% of the training data! But how does it perform on unseen data? Lets use our trained weights in the test set:

In [57]:
correct_test_cnt = 0
for i in range(10000):
    layer_0 = test_data_norm[i: i + 1]
    label = test_labels[i: i + 1]
    
    layer_1 = relu(np.dot(layer_0, weights_0_1))
    layer_2 = np.dot(layer_1, weights_1_2)
    
    correct_test_cnt += int(np.argmax(layer_2) == np.argmax(label))

print("Test Accuracy: {}".format(correct_test_cnt/10000))
    

Test Accuracy: 0.845
