## <center>Building a Convolutional Neural Network from Scratch using NumPy</center>

### Theory

#### A CNN has two main components:
- Convolutional layer
- Pooling layer

#### Convolutional Layers

- A convolutional layer consists of a set of filters (also called kernels). Each filter slides over the layer’s input and produces a transformed version of the original image, called a feature map.

[<img src="./images/CNN-from-Scratch1.png" width="500" />](./images/CNN-from-Scratch1.png)

- In the above image, a 3x3 kernel is used.
- The values of the kernel elements are not chosen manually; they are parameters that the network learns during training.
- The role of convolutions is to isolate different features present in the image.
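
To make the sliding-window idea concrete, here is a minimal sketch of a single 3x3 kernel applied to a small grayscale image. The function name `convolve2d_valid` and the hand-picked edge kernel are illustrative assumptions, not part of the implementation further down. Like most deep-learning code, it computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    output = np.zeros((out_h, out_w))
    for h in range(out_h):
        for w in range(out_w):
            patch = image[h:h + k_h, w:w + k_w]
            output[h, w] = np.sum(patch * kernel)
    return output

# A 5x5 image: two dark columns (0) on the left, three bright columns (1) on the right
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-picked vertical-edge kernel, for illustration only; a CNN would learn these values
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)        # every row is [3. 3. 0.]: strong response where dark meets bright
```

The output is large where the patch straddles the dark-to-bright transition and zero where the patch is uniform, which is what "isolating a feature" means in practice.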

#### Pooling Layers

- The task of a pooling layer is to shrink the input images to reduce the computational load and memory consumption of the network.

[<img src="./images/CNN-from-Scratch2.gif" width="500" />](./images/CNN-from-Scratch2.gif)

- In Max Pooling, the 2x2 pooling kernel takes 4 pixels of the input image and returns only the maximum value.
- In Average Pooling, the 2x2 pooling kernel takes 4 pixels of the input image and returns their average.
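
The difference between the two pooling modes can be sketched on a single 4x4 feature map. The helper `pool2d` is a hypothetical name introduced only for this example; it uses a reshape trick rather than the explicit loops of the class defined later.

```python
import numpy as np

def pool2d(image, size, reduce_fn):
    """Apply `reduce_fn` (e.g. np.max or np.mean) to non-overlapping size x size windows."""
    out_h, out_w = image.shape[0] // size, image.shape[1] // size
    # Crop any leftover border, then split each axis into (window index, offset inside window)
    windows = image[:out_h * size, :out_w * size].reshape(out_h, size, out_w, size)
    return reduce_fn(windows, axis=(1, 3))

image = np.array([[1, 3, 2, 0],
                  [5, 7, 4, 6],
                  [9, 8, 1, 1],
                  [2, 4, 3, 5]], dtype=float)

max_pooled = pool2d(image, 2, np.max)   # [[7, 6], [9, 5]]
avg_pooled = pool2d(image, 2, np.mean)  # [[4, 3], [5.75, 2.5]]
print(max_pooled)
print(avg_pooled)
```

Both variants halve the height and width; they differ only in how the four values of each window are summarized.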

### Theory of Some Other Processes in a CNN

#### Padding

[<img src="./images/CNN-from-Scratch3.png" width="500" />](./images/CNN-from-Scratch3.png)

- without padding, every convolution shrinks its input: a kernel of size k (with stride 1) reduces each spatial dimension by k - 1 pixels.
- padding adds extra border pixels (usually zeros) around the input so that the input size is preserved.
- in the above example, the input size was (36x36x3), but after the convolution the size decreased to (32x32x3); padding, as shown above, is what preserves the size.
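
A short sketch of zero padding with `np.pad`, assuming a 5x5 kernel and stride 1 (the kernel size is an assumption chosen so that the numbers match the 36 to 32 shrinkage described above):

```python
import numpy as np

image = np.ones((32, 32, 3))   # stand-in for a 32x32 RGB image
kernel_size = 5                # assumed kernel size
pad = (kernel_size - 1) // 2   # 2 pixels on each side for a 5x5 kernel

# Zero-pad height and width only; the channel axis is left untouched
padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)),
                mode="constant", constant_values=0)

# Spatial size after a stride-1 convolution over the padded image
output_size = padded.shape[0] - kernel_size + 1

print(padded.shape)  # (36, 36, 3)
print(output_size)   # 32, the same as the unpadded input
```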

#### Flattening

[<img src="./images/CNN-from-Scratch4.png" width="500" />](./images/CNN-from-Scratch4.png)

- flattening converts the output matrix of the convolutional and pooling layers into a one-dimensional array.
- we do this because, after flattening, the data is passed to an ANN (Artificial Neural Network), which expects a one-dimensional array as input.
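
In NumPy, flattening is a single call. The shape below (13x13 positions, 16 feature maps) is a hypothetical pooling output chosen only for illustration:

```python
import numpy as np

# Hypothetical pooling output: 13 x 13 spatial positions, 16 feature maps
pooled = np.arange(13 * 13 * 16, dtype=float).reshape(13, 13, 16)

flat = pooled.flatten()
print(flat.shape)  # (2704,)

# The original shape has to be remembered so that gradients can be
# reshaped back during back propagation
restored = flat.reshape(pooled.shape)
print(np.array_equal(restored, pooled))  # True
```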

#### Dropout

[<img src="./images/CNN-from-Scratch5.png" width="500" />](./images/CNN-from-Scratch5.png)

- in dropout, we randomly drop nodes of the visible (input) or hidden layers of a neural network during training.
- dropout is basically a regularization technique for reducing overfitting.
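
One common way to implement this is "inverted" dropout, sketched below under the assumption that each node is dropped independently with probability `drop_prob`. The function name `dropout_forward` is hypothetical, and this variant is a choice made for the example rather than something the text above prescribes.

```python
import numpy as np

def dropout_forward(activations, drop_prob, rng, training=True):
    """Zero each activation with probability `drop_prob` and rescale the survivors."""
    if not training or drop_prob == 0.0:
        # At inference time nothing is dropped
        return activations, np.ones_like(activations)
    keep_prob = 1.0 - drop_prob
    # Dividing by keep_prob keeps the expected activation unchanged
    mask = (rng.random(activations.shape) < keep_prob) / keep_prob
    return activations * mask, mask

rng = np.random.default_rng(0)
activations = np.ones((4, 5))

dropped, mask = dropout_forward(activations, drop_prob=0.5, rng=rng)
print(dropped)  # entries are either 0.0 (dropped) or 2.0 (kept and rescaled)

unchanged, _ = dropout_forward(activations, drop_prob=0.5, rng=rng, training=False)
```

The returned mask would be reused in back propagation so that gradients flow only through the nodes that were kept.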

#### Stride

[<img src="./images/CNN-from-Scratch6.png" width="500" />](./images/CNN-from-Scratch6.png)

- stride is the number of pixels by which the filter shifts over the input matrix at each step.
- when stride is 1, the filter moves by 1 pixel every time.
- similarly, when stride is 2, the filter moves by 2 pixels every time.
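
The effect of the stride on the output size follows the standard formula `(n + 2p - k) // s + 1`. The helper below, with the hypothetical name `conv_output_size`, is a small sketch of that arithmetic along one spatial dimension:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of positions a kernel can take along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(28, 3, stride=1))  # 26
print(conv_output_size(28, 3, stride=2))  # 13
print(conv_output_size(7, 3, stride=2))   # 3

# The same count, seen as the top-left positions the filter visits on a 7-pixel row
positions = list(range(0, 7 - 3 + 1, 2))
print(positions)  # [0, 2, 4]
```

A larger stride therefore shrinks the output faster, at the cost of skipping some positions.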

#### Fully Connected Layer

[<img src="./images/CNN-from-Scratch7.png" width="500" />](./images/CNN-from-Scratch7.png)

- in a fully connected layer, every node of the input layer is connected to every node of the next layer.
- these fully connected layers are usually used at the end of the CNN.
- activation functions and dropout layers are typically placed between two fully connected layers.
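
A fully connected layer is a matrix multiplication plus a bias. The sketch below chains two such layers with a ReLU in between; the layer sizes (2704, 64, 10), the ReLU activation, and the function names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_forward(inputs, weights, biases):
    """Every input node contributes to every output node through `weights`."""
    return inputs @ weights + biases

def relu(x):
    return np.maximum(0.0, x)

flat_input = rng.random(2704)                    # e.g. a flattened 13x13x16 feature volume
w1 = rng.standard_normal((2704, 64)) * 0.01      # small random initial weights
b1 = np.zeros(64)
w2 = rng.standard_normal((64, 10)) * 0.01
b2 = np.zeros(10)

hidden = relu(dense_forward(flat_input, w1, b1))  # activation between the two layers
scores = dense_forward(hidden, w2, b2)            # one raw score per class
print(scores.shape)  # (10,)
```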

### Loss Function

#### Cross Entropy

[<img src="./images/CNN-from-Scratch8.png" width="750" />](./images/CNN-from-Scratch8.png)
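
As a rough numerical illustration, the sketch below computes the cross-entropy loss for a single sample, assuming the network's raw scores are first turned into probabilities with a softmax (a common pairing, but an assumption here) and that the label is a class index.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

def cross_entropy(probabilities, true_label):
    """Negative log of the probability assigned to the correct class."""
    return -np.log(probabilities[true_label])

scores = np.array([2.0, 1.0, 0.1])
probabilities = softmax(scores)

loss_correct = cross_entropy(probabilities, true_label=0)  # true class has the highest score
loss_wrong = cross_entropy(probabilities, true_label=2)    # true class has the lowest score
print(loss_correct, loss_wrong)
```

The loss is small when the network puts high probability on the correct class and grows as that probability approaches zero.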

### Code Implementation

In [1]:
import numpy as np

In [2]:
class ConvolutionLayer:
    
    # constructor to initialize the convolutional layer with the number of kernels, the kernel size and random filters of shape (kernel_num, kernel_shape, kernel_shape)
    def __init__(self, kernel_num, kernel_shape):
        self.kernel_num = kernel_num # number of kernels of convolutional layer
        self.kernel_shape = kernel_shape # size of the kernel
        self.kernels = np.random.rand(kernel_num, kernel_shape, kernel_shape)/(kernel_shape**2) # we have divided by the squared of kernel size for normalization
       
    # generate patches of input image whose shape depends on kernel size
    def patches_generator(self, input_image):
        height_image, width_image = input_image.shape
        self.input_image = input_image
        for h in range(height_image - self.kernel_shape + 1):
            for w in range(width_image - self.kernel_shape + 1):
                patch = input_image[h:(h+self.kernel_shape), w:(w+self.kernel_shape)]
                yield patch, h, w
                
    # forward propagation: convolves every kernel with each patch generated from the input image
    def forward_propogation(self, input_image):
        height_image, width_image = input_image.shape
        convolution_output = np.zeros((height_image - self.kernel_shape + 1, width_image - self.kernel_shape + 1, self.kernel_num))
        for patch, h, w in self.patches_generator(input_image=input_image):
            convolution_output[h, w] = np.sum(patch*self.kernels, axis=(1, 2))
        return convolution_output
    
    # back propagation: computes the gradient of the loss function w.r.t. each weight of the layer and updates the kernels
    def back_propogation(self, dE_dY, alpha):
        dE_dk = np.zeros(self.kernels.shape) # initializing the array of gradients of the loss function w.r.t. each weight
        for patch, h, w in self.patches_generator(self.input_image): # reuse the image stored during the forward pass
            for f in range(self.kernel_num):
                dE_dk[f] += patch*dE_dY[h, w, f] 
        self.kernels -= alpha*dE_dk # updating the weight's value
        return dE_dk
    
    

In [3]:
class MaxPoolingLayer:
    
    # constructor to initialize Max Pooling layer with kernel size
    def __init__(self, kernel_size):
        self.kernel_size = kernel_size
        
    # generate non-overlapping patches of the input image depending on kernel size
    def patches_generator(self, input_image):
        # number of complete patches that fit along each dimension
        output_height = input_image.shape[0] // self.kernel_size
        output_width = input_image.shape[1] // self.kernel_size
        self.input_image = input_image
        for h in range(output_height):
            for w in range(output_width):
                patch = input_image[(h*self.kernel_size):(h*self.kernel_size + self.kernel_size), (w*self.kernel_size):(w*self.kernel_size+self.kernel_size)]
                yield patch, h, w
        
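    # forward propagation: keeps only the maximum value of each patch, separately for every feature map
    def forward_propogation(self, input_image):
        height_image, width_image, num_of_kernel = input_image.shape
        max_pooling_output = np.zeros((height_image//self.kernel_size, width_image//self.kernel_size, num_of_kernel))
        for patch, h, w in self.patches_generator(input_image): # use the argument; self.input_image is not set before the first call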
            max_pooling_output[h, w] = np.amax(patch, axis=(0, 1))
        return max_pooling_output
    
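    # back propagation: routes each output gradient back to the position(s) that held the maximum in the forward pass
    def back_propogation(self, dE_dY):
        dE_dk = np.zeros(self.input_image.shape) # gradient of the loss w.r.t. the layer's input (max pooling has no weights to update)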
        for patch, h, w in self.patches_generator(self.input_image):
            height_image, width_image, num_of_kernel = patch.shape
            max_value = np.amax(patch, axis=(0, 1))
            for ih in range(height_image):
                for iw in range(width_image):
                    for k in range(num_of_kernel):
                        if patch[ih, iw, k] == max_value[k]:
                            dE_dk[h*self.kernel_size+ih, w*self.kernel_size+iw, k] = dE_dY[h, w, k]
        return dE_dk
    