### Overfitting and Reusing Weights:
In general, the more parameters (weights) a neural network has, the greater its tendency to overfit the training dataset. What matters is the <em>ratio</em> between the <em>number of weights</em> in the model and the <em>number of datapoints</em> it trains on to learn those weights.

Regularisation is just one class of techniques used to combat overfitting. Convolutional layers take a different approach: they reuse a small set of weights at many positions in the input, which lowers the ratio of weights to datapoints.

### Convolutional Layers:
The most well-known and widespread <em>structure</em> used in neural networks is called a <em>convolution</em>, and when used in a hidden layer, it's called a <em>convolutional layer</em>.

- From Wikipedia: 'CNNs take advantage of the hierarchical pattern in data to assemble more complex patterns from smaller, simpler patterns'
- Convolutional neural networks are great for image processing
- Just like regular layers, convolutional layers take input from the previous layer's nodes, operate on it, then pass the output on to the next layer. The operation they apply to the input is a convolution
- We define <em>filters</em> on the convolutional layer, which are just matrices of values
- <em>Convolving</em> is the process of 'sliding' a filter across the input image, sampling a different subsection of the image at each position
- As the name suggests, convolutional layers apply a linear convolution rather than multiplying the whole input by one large weight matrix
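
The sliding-and-sampling process can be sketched with plain NumPy slicing. This is a minimal sketch assuming a stride of 1 and no padding; `extract_patches` is a hypothetical helper name, not part of any library. It visits positions row by row, left to right, and shows that a $k \times k$ window fits $n - k + 1$ times along an axis of length $n$.

```python
import numpy as np

def extract_patches(image, kernel_rows, kernel_cols):
    """Return every kernel-sized subsection of a 2D image and its top-left corner,
    visiting positions row by row and left to right (stride 1, no padding)."""
    patches, corners = [], []
    for row in range(image.shape[0] - kernel_rows + 1):
        for col in range(image.shape[1] - kernel_cols + 1):
            patches.append(image[row:row + kernel_rows, col:col + kernel_cols])
            corners.append((row, col))
    return np.array(patches), corners

image = np.arange(25).reshape(5, 5)       # a tiny 5x5 "image"
patches, corners = extract_patches(image, 3, 3)
print(patches.shape)   # (9, 3, 3): 5 - 3 + 1 = 3 positions along each axis
print(corners[:4])     # [(0, 0), (0, 1), (0, 2), (1, 0)]
print(patches[0])      # the top-left 3x3 subsection
```

Each of these subsections is what a filter gets multiplied against at that position.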


<em>Linear layers</em> connect every node in the previous layer to every node in the next layer. A convolutional layer is instead made up of lots of very small linear layers (usually with fewer than 25 inputs and a single output), each of which is reused at every input position. Each of these small linear layers is called a <em>convolutional kernel</em>, and a convolutional layer usually consists of multiple convolutional kernels.
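
A quick check of the 'small linear layer' view, and of how much weight reuse saves. This is a sketch with hypothetical random values; the parameter comparison assumes a $28 \times 28$ input (as in the MNIST example below), 16 kernels of size $3 \times 3$, and a hypothetical fully connected layer producing the same number of outputs as those kernels applied at every valid position.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.random((3, 3))    # one 3x3 subsection of the input (hypothetical values)
kernel = rng.random((3, 3))   # the kernel's 9 weights (hypothetical values)

# Viewed as a tiny linear layer: 9 inputs -> 1 output
as_linear = patch.reshape(9).dot(kernel.reshape(9))
# Viewed as an element-wise multiply-and-sum over the patch
as_weighted_sum = np.sum(patch * kernel)
print(np.isclose(as_linear, as_weighted_sum))   # True

# Parameter counts for a 28x28 input and 16 kernels of size 3x3
positions = (28 - 3 + 1) * (28 - 3 + 1)          # 676 positions per image
num_outputs = positions * 16                      # 10816 output values
dense_weights = 28 * 28 * num_outputs             # fully connected: one weight per input/output pair
conv_weights = 3 * 3 * 16                         # the same 144 weights are reused at every position
print(dense_weights, conv_weights)                # 8479744 144
```

Both layers produce the same number of outputs, but the convolutional one learns each of its 144 weights from every position in every image, which is the ratio argument from the start of this section.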

Below is a $3 \times 3$ convolutional kernel, which 'sweeps' across the first row pixel by pixel, then returns to the left edge, moves down one pixel and sweeps along the next row, repeating until it has covered the entire input.
<img src="img/convolutional_kernel.png" style="width: 30%;">



### Convolutional Kernels:
The following is an example of how a convolutional layer applies its convolutional kernels to the input, collects the matrix each kernel produces, then combines them into a <em>single</em> matrix that gets passed on to the next layer.

<br>
<div>
    Suppose this convolutional layer has 4 kernels. Each of them begins at the top-left corner, takes a $3 \times 3$ sample of the input at that point, then multiplies it element-wise by the kernel's own $3 \times 3$ weights and sums the products to obtain a single value. This single value gets stored in a result matrix.
    <img src="img/convolutional_kernel_step1.png" style="width: 25%;">
</div>
<div>
    After sweeping through all possible $3 \times 3$ sample matrices, each kernel's result matrix will be $6 \times 6$ (a $3 \times 3$ kernel fits in $n - 2$ positions along each side of an $n \times n$ input). The matrices produced by the kernels are then combined into a final matrix that gets passed along to the next layer. We can do this by summing them element-wise (sum pooling), averaging them (mean pooling) or taking the element-wise maximum (max pooling).
    <img src="img/convolutional_kernel_step2.png" style="width: 25%;">
</div>
<div>
    This is the final matrix produced by this convolutional layer when max pooling is used to combine the $6 \times 6$ matrices produced by the kernels. 
    <img src="img/convolutional_kernel_step3.png" style="width: 20%;">
</div>
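
The three steps above can be sketched end to end. This sketch assumes an $8 \times 8$ input (so a $3 \times 3$ kernel fits in $6 \times 6$ positions), 4 kernels with hypothetical random weights, a stride of 1 and no padding; `apply_kernel` is a hypothetical helper, and the pooling is element-wise across the kernels' result matrices, as described above.

```python
import numpy as np

rng = np.random.default_rng(42)
image = rng.random((8, 8))        # hypothetical 8x8 input
kernels = rng.random((4, 3, 3))   # 4 kernels, each with 3x3 weights

def apply_kernel(image, kernel):
    """Weighted sum of each kernel-sized sample with the kernel's weights.
    Assumes a square image and a square kernel (stride 1, no padding)."""
    k = kernel.shape[0]
    size = image.shape[0] - k + 1
    out = np.zeros((size, size))
    for row in range(size):
        for col in range(size):
            out[row, col] = np.sum(image[row:row + k, col:col + k] * kernel)
    return out

# One 6x6 result matrix per kernel, stacked into shape (4, 6, 6)
feature_maps = np.stack([apply_kernel(image, kern) for kern in kernels])

# Combine the 4 matrices element-wise into a single 6x6 matrix
sum_pooled = feature_maps.sum(axis=0)
mean_pooled = feature_maps.mean(axis=0)
max_pooled = feature_maps.max(axis=0)
print(feature_maps.shape, max_pooled.shape)   # (4, 6, 6) (6, 6)
```

Swapping `max` for `sum` or `mean` is the only difference between the three pooling options; the kernels' result matrices are the same in each case.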



### An Implementation of Convolutional Layers in the MNIST Digit Classifier:

```python
import sys
import numpy as np
from keras.datasets import mnist

# Train on the first 1000 MNIST images, flattened to 784 pixels and scaled to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
images = x_train[0 : 1000].reshape(1000, 28 * 28) / 255
labels = y_train[0 : 1000]
test_images = x_test.reshape(len(x_test), 28 * 28) / 255

# One-hot encode the training and test labels
one_hot_labels = np.zeros((len(labels), 10))
for i, eachLabel in enumerate(labels):
    one_hot_labels[i][eachLabel] = 1
labels = one_hot_labels

test_labels = np.zeros((len(y_test), 10))
for i, eachLabel in enumerate(y_test):
    test_labels[i][eachLabel] = 1

alpha = 2
iterations = 300
pixels_per_image, num_labels = (784, 10)
batch_size = 128
# 1000 // 128 = 7 full batches, so only the first 896 images are used for training
num_batches = len(images) // batch_size

input_rows, input_cols = (28, 28)
kernel_rows, kernel_cols = (3, 3)
num_kernels = 16

# A 3x3 kernel fits 28 - 3 + 1 = 26 times along each axis: 26 * 26 positions per kernel
hidden_size = ((input_rows - kernel_rows + 1) * (input_cols - kernel_cols + 1)) * num_kernels

np.random.seed(1)
# Each column of `kernels` holds one kernel's 9 weights
kernels = 0.02 * np.random.random((kernel_rows * kernel_cols, num_kernels)) - 0.01
weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

def tanh(x):
    return np.tanh(x)

def tanhDerivative(output):
    # Takes the tanh *output*: d/dx tanh(x) = 1 - tanh(x)^2
    return 1 - output ** 2

def softmax(x):
    tmp = np.exp(x)
    return tmp / np.sum(tmp, axis=1, keepdims=True)

# Selects the same subsection from every image in a batch; returns shape (batch, 1, rows, cols)
def get_image_section(layer, row_from, row_to, col_from, col_to):
    section = layer[:, row_from:row_to, col_from:col_to]
    return section.reshape(-1, 1, row_to - row_from, col_to - col_from)

for iteration in range(iterations):
    correct_count = 0
    for i in range(num_batches):
        batch_start = i * batch_size
        batch_end = (i + 1) * batch_size
        curr_batch = images[batch_start : batch_end]

        # layer_0 is a 2D array with dimensions 128 x 784
        layer_0 = curr_batch
        # layer_0 is now a 3D array with dimensions 128 x 28 x 28,
        # so each input image is a 28 x 28 grid rather than a 1 x 784 sequence
        layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)

        # Getting ALL the 3 x 3 subsections of the 128 images in the batch
        sects = []
        for row_start in range(layer_0.shape[1] - kernel_rows + 1):      # 0 to 25 inclusive
            for col_start in range(layer_0.shape[2] - kernel_cols + 1):  # 0 to 25 inclusive
                # Each 28 x 28 image has 26 * 26 subsections of size 3 x 3
                curr_section = get_image_section(layer_0,
                                                 row_start,
                                                 row_start + kernel_rows,
                                                 col_start,
                                                 col_start + kernel_cols)
                sects.append(curr_section)

        # Joining all subsections into one array with dimensions 128 x 676 x 3 x 3
        expanded_input = np.concatenate(sects, axis=1)
        es = expanded_input.shape
        # flattened_input has dimensions 86528 x 9 (128 * 676 rows): one row per
        # 9-pixel subsection extracted from the 128 images
        flattened_input = expanded_input.reshape(es[0] * es[1], -1)

        # Every kernel is applied to every subsection: dimensions 86528 x 16
        kernel_output = flattened_input.dot(kernels)
        # layer_1 has dimensions 128 x 10816 (676 positions * 16 kernels per image)
        layer_1 = tanh(kernel_output.reshape(es[0], -1))
        # Dropout: keep a random half of layer_1 and double it. The undropped layer_1
        # is kept so that tanhDerivative below receives genuine tanh outputs
        dropout_mask = np.random.randint(2, size=layer_1.shape)
        layer_1_dropped = layer_1 * dropout_mask * 2
        layer_2 = softmax(np.dot(layer_1_dropped, weights_1_2))

        for j in range(batch_size):
            label_set = labels[batch_start + j : batch_start + j + 1]
            correct_count += int(np.argmax(layer_2[j : j + 1]) == np.argmax(label_set))

        # The deltas are (target - prediction), i.e. the negative gradient direction,
        # so BOTH weight matrices are updated with +=
        layer_2_delta = (labels[batch_start : batch_end] - layer_2) / (batch_size * layer_2.shape[0])
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * tanhDerivative(layer_1)
        layer_1_delta *= dropout_mask * 2
        weights_1_2 += alpha * layer_1_dropped.T.dot(layer_2_delta)

        # Reshape the delta back to one row per subsection (86528 x 16) to update the kernels
        layer_1_delta_per_section = layer_1_delta.reshape(kernel_output.shape)
        k_update = flattened_input.T.dot(layer_1_delta_per_section)
        kernels += alpha * k_update

    test_correct_count = 0
    for i in range(len(test_images)):
        layer_0 = test_images[i:i+1]
        layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)

        sects = list()
        for row_start in range(layer_0.shape[1] - kernel_rows + 1):
            for col_start in range(layer_0.shape[2] - kernel_cols + 1):
                sect = get_image_section(layer_0,
                                         row_start,
                                         row_start + kernel_rows,
                                         col_start,
                                         col_start + kernel_cols)
                sects.append(sect)

        expanded_input = np.concatenate(sects, axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0] * es[1], -1)
        kernel_output = flattened_input.dot(kernels)
        # No dropout at test time
        layer_1 = tanh(kernel_output.reshape(es[0], -1))
        # Softmax is skipped here: it does not change which output is largest
        layer_2 = np.dot(layer_1, weights_1_2)
        test_correct_count += int(np.argmax(layer_2) == np.argmax(test_labels[i:i+1]))

    sys.stdout.write("\n" +
                     "I:" + str(iteration) +
                     " Test-Acc:" + str(test_correct_count / float(len(test_images))) +
                     " Train-Acc:" + str(correct_count / float(num_batches * batch_size)))
```

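
The nested loop over `get_image_section` builds the matrix of flattened subsections one slice at a time. As an alternative (not what the code above uses), NumPy 1.20 and later provide `np.lib.stride_tricks.sliding_window_view`, which produces every window at once. This sketch assumes that NumPy version and uses a random stand-in batch instead of real MNIST images; it yields one row of 9 pixel values per (image, position) pair for all 26 × 26 valid positions, in the same image-by-image, row-by-row, left-to-right order as the loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

kernel_rows, kernel_cols = 3, 3
batch = np.random.random((128, 28, 28))   # stand-in for a batch of 28x28 images

# windows[b, r, c] is the 3x3 section of image b whose top-left corner is (r, c)
windows = sliding_window_view(batch, (kernel_rows, kernel_cols), axis=(1, 2))
print(windows.shape)           # (128, 26, 26, 3, 3)

# One row per (image, position), each row a flattened 3x3 section
flattened_input = windows.reshape(-1, kernel_rows * kernel_cols)
print(flattened_input.shape)   # (86528, 9)

# Sanity check against explicit slicing for one image and position
assert np.array_equal(windows[5, 10, 7], batch[5, 10:13, 7:10])
```

`sliding_window_view` returns a read-only view, so no pixels are copied until the `reshape`, which has to copy because the overlapping windows are not contiguous in memory.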

#### Some Numpy Array Examples:

```python
import numpy as np

# A 3 x 3 x 3 array holding the numbers 1 to 27
myArr = np.array([
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[10, 11, 12], [13, 14, 15], [16, 17, 18]],
    [[19, 20, 21], [22, 23, 24], [25, 26, 27]],
])

print("===== Original 3D Array (3 x 3 x 3) =====")
print(myArr)
print("Shape: {}\n".format(myArr.shape))

print("===== Transposed 3D Array (3 x 3 x 3) =====")
# For an n-D array, .T reverses the order of the axes: transpose[i, j, k] == myArr[k, j, i]
transpose = myArr.T
print(transpose)
print("Shape: {}\n".format(transpose.shape))

print("===== Reshaped to a single 1D row (1 x 27) =====")
# The -1 means 'unknown length', which lets numpy work out the valid number of columns
reshapedMatrix = myArr.reshape(1, -1)
print(reshapedMatrix)
print("Shape: {}\n".format(reshapedMatrix.shape))

print("===== Reshaped to a 2D array (3 x 9) =====")
reshapedMatrix = myArr.reshape(3, -1)
print(reshapedMatrix)
print("Shape: {}\n".format(reshapedMatrix.shape))

print("===== Concatenation =====")
# By default np.concatenate joins along axis 0, so two (3, 3, 3) arrays give (6, 3, 3)
concatenated = np.concatenate([myArr, myArr])
print(concatenated)
print("Shape: {}\n".format(concatenated.shape))
```



```
===== Original 3D Array (3 x 3 x 3) =====
[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]

 [[19 20 21]
  [22 23 24]
  [25 26 27]]]
Shape: (3, 3, 3)

===== Transposed 3D Array (3 x 3 x 3) =====
[[[ 1 10 19]
  [ 4 13 22]
  [ 7 16 25]]

 [[ 2 11 20]
  [ 5 14 23]
  [ 8 17 26]]

 [[ 3 12 21]
  [ 6 15 24]
  [ 9 18 27]]]
Shape: (3, 3, 3)

===== Reshaped to a single 1D row (1 x 27) =====
[[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
  25 26 27]]
Shape: (1, 27)

===== Reshaped to a 2D array (3 x 9) =====
[[ 1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18]
 [19 20 21 22 23 24 25 26 27]]
Shape: (3, 9)

===== Concatenation =====
[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]

 [[19 20 21]
  [22 23 24]
  [25 26 27]]

 [[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]

 [[19 20 21]
  [22 23 24]
  [25 26 27]]]
Shape: (6, 3, 3)
```
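
The convolutional classifier relies on the same two operations, but concatenates along `axis=1` rather than the default `axis=0`. The sketch below uses a hypothetical batch of two $4 \times 4$ 'images' to mirror how `get_image_section` returns arrays of shape `(batch, 1, 3, 3)` and how those are joined and flattened into one row per subsection.

```python
import numpy as np

batch = np.arange(32).reshape(2, 4, 4)   # 2 tiny 4x4 "images"

# Each section keeps the batch axis and adds a length-1 axis, like get_image_section
sections = []
for row in range(4 - 3 + 1):
    for col in range(4 - 3 + 1):
        sections.append(batch[:, row:row + 3, col:col + 3].reshape(-1, 1, 3, 3))

expanded = np.concatenate(sections, axis=1)   # join the 4 sections along axis 1
print(expanded.shape)                         # (2, 4, 3, 3)

# One row per (image, section): image 0's sections first, then image 1's
flattened = expanded.reshape(expanded.shape[0] * expanded.shape[1], -1)
print(flattened.shape)                        # (8, 9)
print(flattened[0].tolist())                  # [0, 1, 2, 4, 5, 6, 8, 9, 10]
```

Concatenating along `axis=1` keeps each image's subsections together, which is what lets the kernel outputs be reshaped back to one row per image afterwards.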

