Import the Dependencies.

In [3]:
import numpy as np
import matplotlib.pyplot as plt

import torch
from torch.utils.data import DataLoader
from torchvision import transforms,datasets

Inside the transforms.Compose "pipeline" we convert the CIFAR10 images, which come in PIL format, to PyTorch tensors using transforms.ToTensor.

In [4]:
transform = transforms.Compose([
    transforms.ToTensor()
])
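
As a rough illustration of what transforms.ToTensor does to a single image, the NumPy sketch below mimics it on a made-up 32x32 RGB array (`fake_image` is an assumption for the example, not real CIFAR10 data). The layout changes from (height, width, channels) to (channels, height, width) and the uint8 values are scaled to the range [0, 1].

```python
import numpy as np

# Hypothetical stand-in for one CIFAR10 image: 32x32 pixels, 3 channels, uint8.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Move channels first and scale to [0, 1], as ToTensor does for uint8 images.
as_tensor_like = fake_image.transpose(2, 0, 1).astype(np.float32) / 255.0

print(as_tensor_like.shape)   # (3, 32, 32)
print(as_tensor_like.dtype)   # float32
```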

Download of the CIFAR10 training set and creation of shuffled batches of 64 images.

In [5]:
dataset = datasets.CIFAR10(root="./data",train=True,download=True,transform=transform)
train_batch = DataLoader(dataset,batch_size=64,shuffle=True)


Creation of the function that initializes the parameters W and B of each kernel.\
Here "filters" is the number of kernels we are going to have in each conv2d layer.\
The "channels" parameter is needed because we're using kernels that are 3D tensors with height, width and depth. The number of channels changes from layer to layer: for the first conv2d layer it is 3 (RGB), and after that each channel is a feature map (one feature map per kernel of the previous conv2d layer).\
And kernel_size is how big the kernel is in height and width; in this model I'll use a 3x3 kernel.

In [6]:
def init_conv2_params(filters,channels,kernel_size):
    
    fan_in = channels*kernel_size*kernel_size

    Wconv = np.random.randn(filters,channels,kernel_size,kernel_size) * np.sqrt(2/fan_in)

    Bconv = np.zeros((filters))

    return Wconv,Bconv
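
The scaling by sqrt(2/fan_in) used above is commonly known as He initialization. As a small sanity check (a sketch with a fixed seed chosen only for this example), we can build the weights of the first layer by hand and compare their spread with the target value:

```python
import numpy as np

np.random.seed(0)  # fixed seed, only so the example is repeatable

filters, channels, kernel_size = 16, 3, 3
fan_in = channels * kernel_size * kernel_size   # 3 * 3 * 3 = 27 inputs per kernel

W = np.random.randn(filters, channels, kernel_size, kernel_size) * np.sqrt(2 / fan_in)
B = np.zeros(filters)

print(W.shape, B.shape)        # (16, 3, 3, 3) (16,)
print(np.sqrt(2 / fan_in))     # target standard deviation
print(W.std())                 # empirical standard deviation, close to the target
```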

We initialize the parameters by passing the respective arguments.\
I am going to use 4 conv2d layers, giving my CNN the following architecture:

Conv2d->ReLU\
Conv2d->ReLU\
Maxpooling

Conv2d->ReLU\
Conv2d->ReLU\
Maxpooling

Flatten

In [7]:
W1,B1 = init_conv2_params(16,3,3)
W2,B2 = init_conv2_params(32,16,3)
W3,B3 = init_conv2_params(64,32,3)
W4,B4 = init_conv2_params(128,64,3)

This is the function used for each convolutional layer in this neural network.\
Breaking it down:

We declare the function with its parameters: the batch of images (images), the kernels with their weights (kernels), the bias of each kernel (Bias), and two parameters with default values: the padding added around each image (Padding=1) and the step the kernel takes each time it moves across the image (Stride=1).

The first part of the code is the if statement. It checks whether the batch of images is a PyTorch tensor and, if it is, detaches it from the autograd graph (its "mathematical obligations" as a PyTorch tensor) and moves it to the CPU so that we can convert it to a NumPy array.

Next we unpack the shapes of the batch and the kernels so we can work with each dimension individually.

Then we compute the height and width of each image that leaves the function, which here is 32x32 (without padding it would be 30x30, because a 3x3 kernel cannot be centered on the border pixels, so the resolution shrinks).\
Comment: the "+1" at the end of the line is there because the division only counts the moves the kernel makes, while the kernel already produces one output at its starting position BEFORE it moves. The +1 accounts for that initial position.
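
To make the output-size formula concrete, here is a minimal sketch that evaluates it for the cases mentioned in this notebook. The helper name `conv_output_size` is made up for this example and is not part of the model code.

```python
def conv_output_size(size_in, kernel, padding, stride):
    # Number of moves the kernel can make, plus 1 for its starting position.
    return (size_in - kernel + 2 * padding) // stride + 1

print(conv_output_size(32, 3, padding=1, stride=1))  # 32: padding keeps the size
print(conv_output_size(32, 3, padding=0, stride=1))  # 30: no padding shrinks it
print(conv_output_size(32, 2, padding=0, stride=2))  # 16: a 2x2 window with stride 2 halves it
```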

The variable "out" will be the "new batch" that leaves the conv2d function. It can be interpreted as an empty box in which we put the results of each image after it goes through every kernel in the layer.

After all this preparation, here is the nested loop:

1-This for loop goes through every image of the batch and pads each one.\
2-This for loop makes the image pass through every filter (kernel).\
3-This for loop travels the full height of the image.\
4-This for loop has the same goal as the height loop but for the width. The important point here is the order in which we travel through the image: first left to right, then top to bottom (after every full pass from left to right we move down one step and make another full pass).

Now the explanation of what we do in every cycle:

In CNNs we usually say that every kernel scans every "little part" of the image, and we call that "little part" a "window" or "patch".\
In reality we take the patch of the image and "swipe it" through the weights of the kernel (like a credit card). Each kernel has its own weights with the same dimensions as the patch, so the patch is multiplied element-wise by the weights, the products are summed, and the kernel's bias is added, which gives a single output value. When we have gone through the entire image with the same kernel we get a feature map, which represents the previous image as modified by that kernel. In this case the feature map has the same height and width as the input because we used padding.

The code does exactly that: it computes the start and end of the height and width of every patch for the current position of the loop.\
Then it takes the patch, a portion of the padded image with the kernel's size.\
Finally it stores the patch "modified" by the kernel in the empty box we declared previously.
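
As a tiny worked example of a single step of the loop, the sketch below applies one kernel to one patch. The numbers are made up for illustration: a one-channel 3x3 patch holding 0 to 8, a kernel of all ones and a bias of 0.5.

```python
import numpy as np

patch = np.arange(9, dtype=float).reshape(1, 3, 3)  # (channels, height, width)
kernel = np.ones((1, 3, 3))                         # same shape as the patch
bias = 0.5

# Element-wise product, sum over everything, then add the bias: one output value.
value = np.sum(patch * kernel) + bias
print(value)  # 36.5, since 0 + 1 + ... + 8 = 36
```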

In [8]:
def foward_conv2d(images,kernels,Bias,Padding=1,Stride=1):

    if isinstance(images, torch.Tensor):
        images = images.detach().cpu().numpy()

    batch_size,channel,h_in,w_in = images.shape
    filters,channels,kh,kw = kernels.shape

    h_out = (h_in - kh + 2*Padding) // Stride + 1
    w_out = (w_in - kw + 2*Padding) // Stride + 1

    out= np.zeros((batch_size,filters,h_out,w_out))

    for x in range(batch_size):
        img = images[x]
        img_padded = np.pad(img,((0,0),(Padding,Padding),(Padding,Padding)),mode="constant")
        for f in range(filters):
            for h in range(h_out):
                for w in range(w_out):
                        
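                    # Horizontal limits of the current patch
                    w_start = w * Stride
                    w_end = w_start + kw

                    # Vertical limits of the current patch
                    h_start = h * Stride
                    h_end = h_start + kh

                    # All channels of the padded image inside the window
                    patch = img_padded[:,h_start:h_end,w_start:w_end]

                    # Weighted sum of the patch plus the bias of kernel f
                    out[x,f,h,w]= np.sum(patch * kernels[f]) + Bias[f]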

    return out
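
The nested loops are easy to follow but slow in pure Python. As a possible alternative (not the method used in this notebook), the sketch below computes the same convolution for Stride=1 with NumPy's `sliding_window_view` and `einsum`, then spot-checks one output value against the patch formula from the loop. The function name `conv2d_vectorized` and the random test data are assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_vectorized(images, kernels, bias, padding=1):
    # Pad only the height and width axes of the whole batch.
    padded = np.pad(images, ((0, 0), (0, 0), (padding, padding), (padding, padding)),
                    mode="constant")
    # windows has shape (batch, channels, h_out, w_out, kh, kw) for stride 1.
    windows = sliding_window_view(padded, kernels.shape[2:], axis=(2, 3))
    # Sum over channels and the kernel window for every filter f.
    out = np.einsum("nchwij,fcij->nfhw", windows, kernels)
    return out + bias[None, :, None, None]

rng = np.random.default_rng(0)
images = rng.standard_normal((2, 3, 5, 5))
kernels = rng.standard_normal((4, 3, 3, 3))
bias = rng.standard_normal(4)

out = conv2d_vectorized(images, kernels, bias)
print(out.shape)  # (2, 4, 5, 5)

# Spot check: image 0, filter 3, output position h=2, w=1.
padded0 = np.pad(images[0], ((0, 0), (1, 1), (1, 1)), mode="constant")
expected = np.sum(padded0[:, 2:5, 1:4] * kernels[3]) + bias[3]
print(np.isclose(out[0, 3, 2, 1], expected))  # True
```

This version only covers Stride=1; for other strides the windows would need to be subsampled.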
        

Classic ReLU to make the negative values "turn off" so we only work with the positive values.

In [9]:
def ReLU(Output):
    return np.maximum(0,Output)
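
A quick check on a few hand-picked values shows the effect: negatives become 0 and positives pass through unchanged.

```python
import numpy as np

z = np.array([[-2.0, -0.5, 0.0, 1.5, 3.0]])
activated = np.maximum(0, z)
print(activated)  # [[0.  0.  0.  1.5 3. ]]
```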

Here I apply a max pooling function. It will help us use less computational power in the long run by halving the height and width of each feature map (so each one keeps a quarter of its values).\
In terms of code it is basically the same as the conv2d layer, but with some functional changes:

Kernel: Compared with the kernel of the convolutional layer, this kernel doesn't have any weights or bias. Its only function is to find the highest value in the window we select (in this case 2x2 pixels) and return that single number, discarding the smaller values in the window.


Stride: Previously, in the conv2d layer, we used a Stride of 1. In this case the Stride is 2, which combined with the new size and purpose of the kernel gives us the desired output size after max pooling.


In [17]:
def maxpooling(Input,pool_h=2,pool_w=2,Stride=2):

    batch_size,channels,h_in,w_in = Input.shape

    h_out = (h_in - pool_h) // Stride + 1
    w_out = (w_in - pool_w) // Stride + 1

    out = np.zeros((batch_size,channels,h_out,w_out))

    for x in range(batch_size):
        for c in range (channels):
            for h in range (h_out):
                for w in range(w_out):

                    h_start = h * Stride
                    h_end = h_start + pool_h

                    w_start = w * Stride
                    w_end = w_start + pool_w

                    patch = Input[x,c,h_start:h_end,w_start:w_end]

                    out[x,c,h,w] = np.max(patch)

    return out
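
To see what the pooling does, here is a small example on a made-up 4x4 feature map. It uses a reshape trick instead of the loops above, which only works when the window size equals the stride and divides the height and width exactly (true for the 2x2, stride 2 case used here).

```python
import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]], dtype=float)

# Split rows and columns into 2x2 blocks, then take the max inside each block.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[4. 2.]
#  [2. 8.]]
```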

    

Normally inside the forward pass of a CNN you can see 2 separate "regions".\
The first is the "region" where the convolution happens, the convolutional layers, and the other is a "region" found in almost every classification model, the FC (fully-connected) layers.

Why does this matter for the explanation of this function?

Because we can see this function as a gate or portal that transforms one tensor into another.\
In the convolutional part we work with 4D tensors (batch_size,channels,height,width). After the flatten we have a 2D tensor of shape (64,128 * 8 * 8), because the flatten collapses the channel, height and width dimensions into one while keeping the batch dimension, which is exactly what the FC layers need to work.



In [18]:
def flatten(tensor_in):
    flatten_tensor = tensor_in.reshape(tensor_in.shape[0],-1)
    return flatten_tensor
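
A shape check on a dummy batch (zeros, with the sizes this notebook reaches after the second max pooling) shows the collapse:

```python
import numpy as np

batch = np.zeros((64, 128, 8, 8))            # (batch_size, channels, height, width)
flat = batch.reshape(batch.shape[0], -1)     # keep the batch axis, merge the rest

print(flat.shape)  # (64, 8192), because 128 * 8 * 8 = 8192
```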

Initialization of the FC layer parameters. The change with respect to the initialization of the conv2d parameters is in the dimensions.\
This function creates the weights as a 2D tensor of shape (inputs, neurons), and every neuron gets its own bias.

In [19]:
def init_fc_params(neurons,flatten):

    Wfc = np.random.randn(flatten,neurons) * np.sqrt(2/flatten)

    Bfc = np.zeros((neurons))

    return Wfc,Bfc

To initialize the parameters of the FC layers we need to know the size of the flattened output, so we can pass it to the init_fc_params function.\
For that reason we use a "dummy batch" (a tensor with the shape of one CIFAR10 image) and pass it through the convolutional part of our CNN. At the end we just need to read the shape of the final result.\
We do this to make the CNN more modular, because with a different dataset or a different architecture the dimensions may change.\
In that case we can simply rerun this cell with the matching "dummy batch" size and always know the dimension we get when we flatten.

In [22]:
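# Dummy batch with the shape of one CIFAR10 image
flatten_dim = np.zeros((1,3,32,32))

# First block: Conv2d->ReLU, Conv2d->ReLU, Maxpooling
flatten_dim = foward_conv2d(flatten_dim,W1,B1)
flatten_dim = ReLU(flatten_dim)
flatten_dim = foward_conv2d(flatten_dim,W2,B2)
flatten_dim = ReLU(flatten_dim)
flatten_dim = maxpooling(flatten_dim)

# Second block: Conv2d->ReLU, Conv2d->ReLU, Maxpooling
flatten_dim = foward_conv2d(flatten_dim,W3,B3)
flatten_dim = ReLU(flatten_dim)
flatten_dim = foward_conv2d(flatten_dim,W4,B4)
flatten_dim = ReLU(flatten_dim)
flatten_dim = maxpooling(flatten_dim)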

print(flatten_dim.shape)

flatten_size = np.prod(flatten_dim.shape[1:])
print(flatten_size)

(1, 128, 8, 8)
8192


Now we can call the function to initialize the parameters of our FC layers.

In [21]:
W5,B5 = init_fc_params(128,flatten_size) 
W6,B6 = init_fc_params(10,128)

This is the Softmax function, which gives the final prediction of our model. We pass it the Z6 obtained at the end of our forward pass (the parameter is named Z2 inside the function). For a more formal explanation I encourage you to check the NN from scratch using the MNIST dataset on my GitHub.

In [26]:
def Softmax(Z2):

    shifted = Z2 - np.max(Z2,axis=1,keepdims=True)

    scores = np.exp(shifted)

    preds = scores / np.sum(scores,axis=1,keepdims=True)

    return preds
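
Subtracting the row maximum before the exponential does not change the result, but it keeps np.exp from overflowing on large values. The sketch below repeats the same steps in a lowercase `softmax` helper (redefined here only so the example is self-contained) and feeds it two made-up rows that differ by a constant:

```python
import numpy as np

def softmax(z):
    # Same steps as above: shift by the row max, exponentiate, normalize.
    shifted = z - np.max(z, axis=1, keepdims=True)
    scores = np.exp(shifted)
    return scores / np.sum(scores, axis=1, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0],   # would overflow without the shift
                   [1.0, 2.0, 3.0]])           # same values minus 999
probs = softmax(logits)

print(np.isfinite(probs).all())          # True: no inf or nan
print(probs.sum(axis=1))                 # each row sums to 1
print(np.allclose(probs[0], probs[1]))   # True: a constant shift changes nothing
```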

This is the other "region" that I talked about earlier, the most common part of neural networks.\
We take the flattened version of our batch, which has dimensions (64,8192), and multiply it by the weights of dimensions (8192,128). Then we add the bias of every neuron to get a Z5 of dimensions (64,128), which we pass to the ReLU function to get the activations (A5).\
Now we repeat the process, but instead of the flattened batch we use A5 (64,128): we multiply it by the weights (128,10) and add the bias of each neuron to get Z6 (64,10).\
Then we just pass it to the Softmax function to get the prediction of the model as A6.\
After that we return Z5,A5,Z6,A6 because we will need them to compute backpropagation and update the parameters.

In [29]:
def foward_fc(W5,B5,W6,B6,flatten):

    Z5 = flatten @ W5 + B5
    
    A5 = ReLU(Z5)

    Z6 = A5 @ W6 + B6

    A6 = Softmax(Z6)

    return Z5,A5,Z6,A6
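
To follow the shapes through the two FC layers, here is a self-contained sketch with random stand-in data (the input `flat` and the seed are assumptions for the example; the steps mirror the function above):

```python
import numpy as np

rng = np.random.default_rng(0)

flat = rng.standard_normal((64, 8192))                        # stand-in flattened batch
W5 = rng.standard_normal((8192, 128)) * np.sqrt(2 / 8192)     # same scaling as init_fc_params
B5 = np.zeros(128)
W6 = rng.standard_normal((128, 10)) * np.sqrt(2 / 128)
B6 = np.zeros(10)

Z5 = flat @ W5 + B5                    # (64, 8192) @ (8192, 128) -> (64, 128)
A5 = np.maximum(0, Z5)                 # ReLU keeps the shape
Z6 = A5 @ W6 + B6                      # (64, 128) @ (128, 10) -> (64, 10)

# Softmax over the 10 classes of each image
shifted = Z6 - np.max(Z6, axis=1, keepdims=True)
A6 = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

print(Z5.shape, A5.shape, Z6.shape, A6.shape)  # (64, 128) (64, 128) (64, 10) (64, 10)
print(np.allclose(A6.sum(axis=1), 1.0))        # True: one probability distribution per image
```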