#### CNN 
A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm that processes input images, assigns importance to various image elements, and learns to distinguish them. Unlike traditional methods, ConvNets require minimal preprocessing and can automatically learn important filters or features.

### How to do it 
Following the paper at https://arxiv.org/pdf/1511.08458.pdf, we define three layer types: Convolutional, Pooling, and FullyConnected.

A CNN's core functions can be summarized as follows:

- The input layer stores the image's pixel values.

- Convolutional layers compute the outputs of neurons connected to local regions of the image as weighted sums, then apply a ReLU activation.

- The pooling layer downsamples along the spatial dimensions, reducing the number of values (and therefore parameters) that later layers must handle.

- Fully-connected layers generate class scores for classification, often with ReLU activation.
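
To make the data flow concrete, here is a minimal sketch that traces the feature-map size through one convolution and one pooling step. The 28x28 input size and the helper name `conv_output_size` are assumptions made for illustration; the window sizes, strides, and filter count are the defaults of the layers defined later in this notebook.

```python
def conv_output_size(input_size, window, stride):
    """Output length along one axis when a window slides without padding."""
    return (input_size - window) // stride + 1


image_size = 28    # assumed input: a single 28x28 image
num_filters = 16   # default num_filters of the Convolutional layer

conv_size = conv_output_size(image_size, window=3, stride=1)  # after a 3x3 convolution
pool_size = conv_output_size(conv_size, window=2, stride=2)   # after 2x2 max pooling
flat_size = num_filters * pool_size * pool_size               # input length of a fully-connected layer

print(conv_size, pool_size, flat_size)
```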

In [1]:
import numpy as np

# First, the basic functions: softmax, cross-entropy, and (leaky) ReLU

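def softmax(x):
    # Subtract the maximum before exponentiating to avoid overflow;
    # the result is unchanged because softmax is shift-invariant.
    exps = np.exp(x - np.max(x, axis=0))
    return exps / np.sum(exps, axis=0)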


def cross_entropy(x):
    return -np.log(x)


def regularized_cross_entropy(layers, lam, x):
    loss = cross_entropy(x)
    for layer in layers:
        loss += lam * (np.linalg.norm(layer.get_weights()) ** 2)
    return loss


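# Leaky ReLU: negative inputs are scaled by a small slope alpha instead of being zeroed.
# These are scalar functions; the layers wrap them with np.vectorize.
def leakyReLU(x, alpha=0.001):
    return x * alpha if x < 0 else x


def leakyReLU_derivative(x, alpha=0.001):
    # alpha must match the slope used in leakyReLU
    return alpha if x < 0 else 1.0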


def lr_schedule(learning_rate, iteration):
    # Step decay: keep the base rate for the first 10,000 iterations,
    # then use one tenth of it.
    if iteration <= 10000:
        return learning_rate
    return learning_rate * 0.1
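
As a quick illustration of how softmax and cross-entropy fit together, the sketch below turns a vector of class scores into probabilities and computes the loss for one label. The scores and the label are hypothetical values chosen for the example, and the two functions are redefined here only so the snippet runs on its own.

```python
import numpy as np


def softmax(scores):
    # shift by the maximum for numerical stability; the probabilities are unchanged
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)


def cross_entropy(prob_true_class):
    # loss is small when the true class receives a probability close to 1
    return -np.log(prob_true_class)


scores = np.array([2.0, 1.0, 0.1])  # hypothetical class scores from the last layer
true_class = 0                       # hypothetical label

probs = softmax(scores)
loss = cross_entropy(probs[true_class])
print(probs.round(3), round(float(loss), 3))
```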

Convolution layer:
- A good way to understand the convolution operator: https://computationalthinking.mit.edu/Fall20/
- How it's used in a CNN: a CNN is a feed-forward network. In the forward pass, the input data undergoes convolution, activation, pooling, flattening, and fully connected layer processing. In the backward pass, the gradients of the loss with respect to the kernel's parameters are calculated. These gradients guide the adjustments made to the kernel's values during the weight update. The goal is to learn the kernel through backpropagation.
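
The forward and backward passes described above can be sketched on a single-channel image. This is an illustrative sketch, not the layer implementation: the helper names `conv2d_valid` and `kernel_gradient`, the 4x4 input, and the loss (the plain sum of the outputs) are all assumptions. As is common in CNN code, the kernel is slid over the image without being flipped, which is strictly a cross-correlation.

```python
import numpy as np


def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum the element-wise products."""
    k = kernel.shape[0]
    out_dim = (image.shape[0] - k) // stride + 1
    out = np.zeros((out_dim, out_dim))
    for oy in range(out_dim):
        for ox in range(out_dim):
            y, x = oy * stride, ox * stride
            out[oy, ox] = np.sum(image[y:y + k, x:x + k] * kernel)
    return out


def kernel_gradient(image, dout, k, stride=1):
    """Gradient of the loss w.r.t. the kernel, given the gradient w.r.t. the output."""
    dk = np.zeros((k, k))
    for oy in range(dout.shape[0]):
        for ox in range(dout.shape[1]):
            y, x = oy * stride, ox * stride
            # each output value saw this patch, so the patch contributes to the gradient
            dk += dout[oy, ox] * image[y:y + k, x:x + k]
    return dk


image = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical 4x4 input
kernel = np.ones((3, 3))                          # hypothetical 3x3 kernel

out = conv2d_valid(image, kernel)                 # forward pass, shape (2, 2)
dout = np.ones_like(out)                          # gradient of the assumed loss out.sum()
dk = kernel_gradient(image, dout, k=3)            # backward pass
kernel_updated = kernel - 0.005 * dk              # one gradient-descent step

print(out)
print(dk)
```

With an all-ones kernel each output is simply the sum of a 3x3 patch, which makes the result easy to check by hand.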

In [2]:
class Convolutional:                                        
    def __init__(self, name, num_filters=16, stride=1, size=3, activation=None):
        self.name = name
        self.filters = np.random.randn(num_filters, size, size) * 0.1  # one size x size kernel per filter
        self.stride = stride
        self.size = size
        self.activation = activation
        self.last_input = None
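        # otypes=[float] keeps np.vectorize from inferring an integer output type
        self.leakyReLU = np.vectorize(leakyReLU, otypes=[float])
        self.leakyReLU_derivative = np.vectorize(leakyReLU_derivative, otypes=[float])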

    def forward(self, image):
        # keep track of last input for later backward propagation
        self.last_input = image                             

        input_dimension = image.shape[1]                                                
        output_dimension = int((input_dimension - self.size) / self.stride) + 1         
        # matrix to hold the values of the convolution
        out = np.zeros((self.filters.shape[0], output_dimension, output_dimension))     
        
        #Apply convolution
        for f in range(self.filters.shape[0]):              
            tmp_y = out_y = 0                               
            while tmp_y + self.size <= input_dimension:
                tmp_x = out_x = 0
                while tmp_x + self.size <= input_dimension:
                    patch = image[:, tmp_y:tmp_y + self.size, tmp_x:tmp_x + self.size]
                    out[f, out_y, out_x] += np.sum(self.filters[f] * patch)
                    tmp_x += self.stride
                    out_x += 1
                tmp_y += self.stride
                out_y += 1
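
        # keep the pre-activation output for the backward pass, then apply the activation
        self.last_output = out
        out = self.leakyReLU(out)

        return out

    def backward(self, din, learn_rate=0.005):
        input_dimension = self.last_input.shape[1]

        # chain rule through the activation: scale the incoming gradient by its derivative
        din = din * self.leakyReLU_derivative(self.last_output)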

        dout = np.zeros(self.last_input.shape)             
        dfilt = np.zeros(self.filters.shape)                

        for f in range(self.filters.shape[0]):              
            tmp_y = out_y = 0
            while tmp_y + self.size <= input_dimension:
                tmp_x = out_x = 0
                while tmp_x + self.size <= input_dimension:
                    patch = self.last_input[:, tmp_y:tmp_y + self.size, tmp_x:tmp_x + self.size]
                    dfilt[f] += np.sum(din[f, out_y, out_x] * patch, axis=0)
                    dout[:, tmp_y:tmp_y + self.size, tmp_x:tmp_x + self.size] += din[f, out_y, out_x] * self.filters[f]
                    tmp_x += self.stride
                    out_x += 1
                tmp_y += self.stride
                out_y += 1
        self.filters -= learn_rate * dfilt                 
        return dout                                        

    def get_weights(self):
        return np.reshape(self.filters, -1)

Pooling layer:
- This layer sub-samples the image as it passes through the network. Its aim is to reduce the size of the image (feature maps) while preserving its important features.
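
A minimal sketch of max pooling on one feature map, assuming a 2x2 window with stride 2. The helper name `max_pool2d` and the input values are hypothetical, chosen only to show which value survives from each window.

```python
import numpy as np


def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value of each size x size window."""
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((h, w))
    for oy in range(h):
        for ox in range(w):
            y, x = oy * stride, ox * stride
            pooled[oy, ox] = np.max(feature_map[y:y + size, x:x + size])
    return pooled


feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 0],
                        [1, 8, 3, 4]], dtype=float)  # hypothetical 4x4 feature map

pooled = max_pool2d(feature_map)
print(pooled)  # the 4x4 map shrinks to 2x2: [[6, 4], [8, 9]]
```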

In [None]:
class Pooling:                                             
    def __init__(self, name, stride=2, size=2):
        self.name = name
        self.last_input = None
        self.stride = stride
        self.size = size

    def forward(self, image):
        self.last_input = image                             
        num_channels, h_prev, w_prev = image.shape
        h = int((h_prev - self.size) / self.stride) + 1     
        w = int((w_prev - self.size) / self.stride) + 1

        downsampled = np.zeros((num_channels, h, w))        

        for i in range(num_channels):                       
            curr_y = out_y = 0                              
            while curr_y + self.size <= h_prev:             
                curr_x = out_x = 0
                while curr_x + self.size <= w_prev:         
                    patch = image[i, curr_y:curr_y + self.size, curr_x:curr_x + self.size]
                    downsampled[i, out_y, out_x] = np.max(patch)       
                    curr_x += self.stride                              
                    out_x += 1
                curr_y += self.stride
                out_y += 1

        return downsampled

    def backward(self, din, learning_rate):
        num_channels, h_prev, w_prev = self.last_input.shape

        dout = np.zeros(self.last_input.shape)

        for c in range(num_channels):
            tmp_y = out_y = 0
            while tmp_y + self.size <= h_prev:
                tmp_x = out_x = 0
                while tmp_x + self.size <= w_prev:
                    patch = self.last_input[c, tmp_y:tmp_y + self.size, tmp_x:tmp_x + self.size]
                    # route the gradient to the position of the largest value in the patch
                    (row, col) = np.unravel_index(np.nanargmax(patch), patch.shape)
                    dout[c, tmp_y + row, tmp_x + col] += din[c, out_y, out_x]
                    tmp_x += self.stride
                    out_x += 1
                tmp_y += self.stride
                out_y += 1

        return dout

    def get_weights(self):                          
        return 0
