# CONVOLUTIONAL NN

The difference between a plain **NN** and a **CNN** is that the CNN uses convolution to locate and discern features in an image. It does this with multiple convolutional layers, which let it condense its input data (for example, an image).

**Convolution** is done by sliding a small window over the image and *evaluating* each piece with a small set of learned weights (a kernel) to determine the value of the next neuron in the layer. A pooling function (in this case **max pooling**) then condenses the result by keeping only the largest value in each small region. That value is relayed forward to the next layer, which repeats the process a few more times until each piece, usually described as a **tensor**, is very simple. The goal of this process is for the neural network to be able to recognize what we would call "features" in the image. This process, in conjunction with a lot of training, is what is usually referred to as **Deep Learning**.
So in *summary*, CNNs drastically simplify the image, look for features in it, and then try to learn what each of the features indicates.

**Simple:** A CNN reduces your image to simple building blocks and then finds patterns in these blocks; the more layers you have, the more complex the patterns it can find.

## Training data tricks ## 

An important note here is that, for image classification, many tricks can be used to make our dataset more rounded and raise the accuracy of generalization. These tricks include cropping images, resizing them and adding white space, or even rotating them and using the modified versions as new images. Rotation alone can increase the number of training samples by 4x (4 ways of rotating an image by multiples of 90 degrees) or more.
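The rotation trick can be sketched as follows. This is only an illustration and is not part of the pipeline used in this notebook: the helper name `rotations` and the tiny `sample` array are made up for the example, and `np.rot90` is used to turn the image in 90 degree steps.

```python
import numpy as np

def rotations(img):
    """Return the image rotated by 0, 90, 180 and 270 degrees."""
    return [np.rot90(img, k) for k in range(4)]

# Hypothetical 2x3 "image" standing in for a real grayscale picture
sample = np.array([[1, 2, 3],
                   [4, 5, 6]])

augmented = rotations(sample)
print(len(augmented))      # 4 variants from one original
print(augmented[1].shape)  # (3, 2): a quarter turn swaps height and width
```

Every rotated copy keeps the label of the original image, so each one can be appended to the training data as an extra sample.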

In [5]:
import os
import cv2
import numpy as np
from tqdm import tqdm

# This data set is big so we don't want to rebuild it unless we change something.
REBUILD_DATA = False

# We generally don't need a class for this, but in our case (image processing) there are quite a few steps
class DogsVSCats():
    IMG_SIZE = 50 # 50x50 pixels
    CATS = "Kaggle/PetImages/Cat"
    DOGS = "Kaggle/PetImages/Dog"
    LABELS = {CATS: 0, DOGS: 1}
    training_data = []
    # It is important to have a balanced amount of data for each class we are trying
    # to discern. Therefore we create counters here for each class so we can check for an imbalance.
    # If there is an imbalance we will correct it.
    catcount = 0
    dogcount = 0
    
    def make_training_data(self):
        # iterate through our dictionary of classes
        for label in self.LABELS:
            print(label)
            # iterate through images in directory
            for f in tqdm(os.listdir(label)):
                try:
                    # We use os.path.join to build the full path to each image file
                    path = os.path.join(label, f)
                    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE) # converting to GRAYSCALE is not a necessity
                    img = cv2.resize(img, (self.IMG_SIZE, self.IMG_SIZE))
                    self.training_data.append([np.array(img), np.eye(2)[self.LABELS[label]]])

                    if label == self.CATS:
                        self.catcount += 1
                    elif label == self.DOGS:
                        self.dogcount += 1
                    
                except Exception as e:
                    # For some images there is an exception, maybe it's because they are corrupted or 
                    # maybe it's the resize...
                    pass
                    #print(str(e))

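        np.random.shuffle(self.training_data)
        # Each sample pairs a 50x50 image with a 2-element label, so we save an object array;
        # recent NumPy versions refuse to build such a ragged array implicitly.
        np.save("training_data.npy", np.array(self.training_data, dtype=object))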
        print("Cats:", self.catcount)
        print("Dogs:", self.dogcount)

# If we want to rebuild everything (takes a long time)
if REBUILD_DATA:
    dogsvcats = DogsVSCats()
    dogsvcats.make_training_data()
    
# We load the saved training data so we don't have to create it again, for speed's sake.
# allow_pickle=True is needed because the file stores Python objects (image/label pairs).
training_data = np.load("training_data.npy", allow_pickle=True)
print(len(training_data))


24946


## Building our Neural Network ##

In the next cell we build the convolutional network; for this we have to import PyTorch's libraries. We need the general `torch` library for tensors and the NN module, `torch.nn`, for convolutional functionality. We also import the functional part of the NN module, `torch.nn.functional`, as `F` so we can access it through that name.

We start by defining a `Net` class, which inherits from `nn.Module`. We define its initialization method `__init__(self)`, which first calls the parent's init through `super().__init__()`.
Let's start the init by defining 3 convolutional layers (`conv1, conv2, conv3`), for which we use `nn.Conv2d`.
We create each 2D convolutional layer with 3 parameters, `in_channels, out_channels, kernel_size`, where:
- `in_channels`: number of channels in the input image
- `out_channels`: number of channels produced by the convolution
- `kernel_size`: size of the convolving kernel; an int such as `5` gives a 5x5 kernel, a tuple such as `(5, 3)` gives a 5x3 kernel

**2D convolution example:**

<img src="https://miro.medium.com/max/1800/1*7S266Kq-UCExS25iX_I_AQ.png"> 
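To make the sliding-window idea concrete, here is a minimal pure-Python sketch of a single-channel 2D convolution with no padding and a stride of 1. The function name `conv2d_valid` and the `image` and `kernel` values are made up for illustration. Like `nn.Conv2d`, it does not flip the kernel, but it leaves out the bias term and multiple channels.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

# Hypothetical 4x4 image and 2x2 kernel
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
kernel = [[1, 0],
          [0, -1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]
```

A 2x2 kernel on a 4x4 image gives a 3x3 result, because the window fits in 3 positions along each axis.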


Next we use a **max pooling** function, a *sample-based discretization process* whose objective is to down-sample an input (in our case an image). This reduces its dimensionality and allows assumptions to be made about the features contained in the pooled sub-regions.

**Max pooling example:**

<img src="https://computersciencewiki.org/images/8/8a/MaxpoolSample2.png">

The max pool function we are going to use is `nn.MaxPool2d`, which takes in:
- `kernel_size`: the size of the window to take the max over

With this we define 3 pools, one for each of the convolutional layers (`conv1, conv2, conv3`), for later use. In our case we run our results through a 2x2 window (just like the example above shows).
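A 2x2 max pool can be sketched in plain Python as below. The helper `max_pool_2x2` and the `grid` values are invented for this example and are not taken from the figure. The sketch assumes non-overlapping windows, which is the default behaviour of `nn.MaxPool2d` when no stride is given.

```python
def max_pool_2x2(grid):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for r in range(0, len(grid) - 1, 2):
        row = []
        for c in range(0, len(grid[0]) - 1, 2):
            window = (grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 input
grid = [[1, 3, 2, 0],
        [5, 4, 1, 1],
        [0, 2, 9, 6],
        [7, 1, 3, 8]]

print(max_pool_2x2(grid))  # [[5, 2], [7, 9]]
```

Each axis is halved, so a 4x4 input becomes 2x2. If a side has odd length, the last row or column is dropped.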

Next and last in line is a **linear transformation**. PyTorch has a predefined module that applies a linear transform to the incoming data, of the form:

$$ y = xA^{T} + b $$
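As a small worked instance of this formula, the sketch below computes $y = xA^{T} + b$ for one input vector in plain Python. The function `linear` and the numbers are made up for illustration. `A` has one row per output feature, which is the same layout `nn.Linear` uses for its weight matrix.

```python
def linear(x, A, b):
    """Compute y = x A^T + b for a single input vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

x = [1.0, 2.0, 3.0]        # 3 input features
A = [[1.0, 0.0, -1.0],     # 2 output features, so A is 2x3
     [0.5, 0.5, 0.5]]
b = [0.0, 1.0]

y = linear(x, A, b)
print(y)  # [-2.0, 4.0]
```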

We apply `torch.nn.Linear` to the output of our convolutional and pooling layers, but we have to determine the size of that output first. To do that we feed a random example forward through the NN, flatten the result, and print its shape with `print(x.shape)`.

In order to feed an example forward through the `Net`, we first have to define the **forward** function.

So we create a method `forward` that takes in `self` and a parameter `x`, which represents our data. Inside `forward` we run **x** through our first convolutional layer, then apply the rectified linear unit function to the result with `torch.nn.functional.relu`, which returns a **tensor** that we finally pass to the `torch.nn.MaxPool2d` layers we defined in `__init__` as `pool1`, `pool2` and `pool3`. We repeat the same steps for the second and third convolutional layers.

After our input **x** has passed through all 3 layers, we flatten the output with `flatten(start_dim=1)`. The first time we run this function we print the shape of the result after all the layers, so we know what input size our first `torch.nn.Linear` layer needs. In our case the output size was 512.
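The value 512 can also be checked by hand. The sketch below traces the side length of a 50x50 image through three rounds of a 5x5 convolution followed by a 2x2 max pool. It assumes no padding, a convolution stride of 1, and pooling that rounds down, which are the PyTorch defaults. The helper names `conv_out` and `pool_out` are made up for this example.

```python
def conv_out(size, kernel):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_out(size, window):
    """Side length after non-overlapping pooling (rounded down)."""
    return size // window

size = 50  # input images are 50x50
for _ in range(3):
    # 50 -> 46 -> 23, then 23 -> 19 -> 9, then 9 -> 5 -> 2
    size = pool_out(conv_out(size, 5), 2)

flat = 128 * size * size  # the last convolution produces 128 channels
print(size, flat)  # 2 512
```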

## Graph of the ReLU function: ## 

<img src="https://pytorch.org/docs/stable/_images/ReLU.png">
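In code, the ReLU function is just a clamp at zero. The short sketch below applies it elementwise to a made-up list of values. The name `relu` here is a plain Python stand-in, not the PyTorch function.

```python
def relu(values):
    """Replace negative values with zero and keep the rest unchanged."""
    return [max(0.0, v) for v in values]

print(relu([-2.0, -0.5, 0.0, 1.5]))  # [0.0, 0.0, 0.0, 1.5]
```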


The last 2 steps of the forward function are to run our data through both `torch.nn.Linear` layers. After the first we again use the `torch.nn.functional.relu` activation function, which sets negative values to zero and passes positive values through unchanged. We do not use the same activation function on the last layer; instead we use a softmax along one dimension, `torch.nn.functional.softmax`, before we return our output. The **softmax** function is defined as:

$$ Softmax(x_{i}) = \frac{exp(x_{i})}{\sum_{j} exp(x_{j})} $$

In our case it is applied along dimension 1 (`dim=1`), that is, across the two class scores of each sample. This rescales the values to lie in the range [0, 1] and makes them sum to 1 for every sample.
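The softmax of a single sample can be sketched in plain Python as follows. The function `softmax` and the two example scores are made up for illustration. Subtracting the largest score before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(scores):
    """Turn raw scores into values in [0, 1] that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the last layer for one image: [cat score, dog score]
probs = softmax([2.0, 0.0])
print(probs)       # the higher score gets the larger share
print(sum(probs))  # the shares add up to 1 (up to rounding)
```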


In [7]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 5)
        self.conv2 = nn.Conv2d(32, 64, 5)
        self.conv3 = nn.Conv2d(64, 128, 5)

        self.pool1 = nn.MaxPool2d((2, 2))
        self.pool2 = nn.MaxPool2d((2, 2))
        self.pool3 = nn.MaxPool2d((2, 2))
        
        self.fc1 = nn.Linear(512, 512) # we got the first 512 value from our x.shape below when we ran a test fit.
        self.fc2 = nn.Linear(512, 2) 
    
    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.pool3(F.relu(self.conv3(x)))
        x = x.flatten(start_dim=1) # flattening out
        #print(x.shape) # We print the shape for fc1 Linear

        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x, dim=1)
        
net = Net()
#net.forward(torch.randn(1, 1, 50, 50)) # passing random sample data, to determine size of the fc1 layer input
print("Finished running")

Finished running


In [20]:
import torch.optim as optim

optimizer = optim.Adam(net.parameters(), lr=0.001)
loss_function = nn.MSELoss()

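# Images: one 50x50 tensor per sample
X = torch.Tensor([i[0] for i in training_data]).view(-1, 50, 50)
# Scale pixel values from [0, 255] to [0, 1]
X = X/255.0
# Labels: one-hot vectors, [1, 0] for cat and [0, 1] for dog
y = torch.Tensor([i[1] for i in training_data])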

VAL_PCT = 0.1
val_size = int(len(X)*VAL_PCT)
print(val_size)

2494


In [21]:
train_X = X[:-val_size]
train_y = y[:-val_size]

test_X = X[-val_size:]
test_y = y[-val_size:]

print(len(train_X))
print(len(test_X))

22452
2494


In [22]:
BATCH_SIZE = 100
EPOCHS = 1

for epoch in range(EPOCHS):
    for i in tqdm(range(0, len(train_X), BATCH_SIZE)):
        #print(i, i+BATCH_SIZE)
        batch_X = train_X[i:i+BATCH_SIZE].view(-1, 1, 50, 50)
        batch_y = train_y[i:i+BATCH_SIZE]
        
        net.zero_grad()
        outputs = net(batch_X)
        loss = loss_function(outputs, batch_y)
        loss.backward()
        optimizer.step()
        
print(loss)

100%|██████████| 225/225 [01:26<00:00,  2.59it/s]
tensor(0.2171, grad_fn=<MseLossBackward>)


In [23]:
correct = 0
total = 0
with torch.no_grad():
    for i in tqdm(range(len(test_X))):
        real_class = torch.argmax(test_y[i])
        net_out = net(test_X[i].view(-1, 1, 50, 50))[0]
        predicted_class = torch.argmax(net_out)
        if predicted_class == real_class:
            correct += 1
        total += 1
        
print("Accuracy:", round(correct/total,3))

100%|██████████| 2494/2494 [00:13<00:00, 182.10it/s]
Accuracy: 0.626
