# Convolutional Neural Networks

<img src="CNNArchitecture.jpg" />

The whole idea of convolutions is to focus on local dependencies and relations between neighboring pixels (like edges): each layer looks at small regions, compresses the information, and the next layer looks again. This gives importance to the spatial structure that images have.

<img src="CNNLocality.png" />

A filter scans the whole image to produce an activation map. We typically use more than one filter to capture the different dependencies between pixels. The weights are stored inside the filters, and the filters produce the activations.

The stride indicates how many pixels the filter moves at each step.

The kernel of weights doesn't perform a matrix multiplication with the patch. The activation is computed as the dot product of the flattened patch and the flattened kernel weights (an element-wise multiplication followed by a sum). An example:

<img src="CNNActivation.png" />
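
The computation described above can be sketched in NumPy. This is a minimal illustration, not an efficient implementation: the 4x4 image, the 3x3 kernel values, and the helper name `conv2d_valid` are assumptions made up for this example, and no padding is applied.

```python
import numpy as np

# Hypothetical 4x4 single-channel image and 3x3 kernel, chosen only for illustration.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and return the activation map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            # Flatten the patch and the kernel, then take their dot product.
            out[i, j] = np.dot(patch.ravel(), kernel.ravel())
    return out

activation = conv2d_valid(image, kernel)
print(activation.shape)  # (2, 2)
print(activation)        # every entry is -6.0 for this image and kernel
```

Each output entry is a single number per patch, which is why a 3x3 kernel sliding over a 4x4 image with stride one yields a 2x2 activation map.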

Padding adds zeros around the boundaries of the input so that the activation has the same shape as the input (when the stride is equal to one). Padding examples:
<img src="PaddingTypes.png" />
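
To see how kernel size, stride, and padding interact, the standard output-size relation `(n + 2 * padding - kernel_size) // stride + 1` can be written as a small helper. The function name `conv_output_size` is an assumption for this sketch; the 28-pixel input and 5-pixel kernel match the MNIST model used later in this notebook.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (n + 2 * padding - kernel_size) // stride + 1

# No padding: the activation shrinks.
print(conv_output_size(28, kernel_size=5))                       # 24
# Padding of 2 with a 5x5 kernel and stride 1 keeps the input size.
print(conv_output_size(28, kernel_size=5, padding=2))            # 28
# A stride of 2 roughly halves the size.
print(conv_output_size(28, kernel_size=5, stride=2, padding=2))  # 14
```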

Filters always extend the full depth of the input volume (3 for an RGB image, for example).

The number of filters determines the depth of the generated activation maps, which then serve as the input to the next convolution layer.

<img src="ActivationDepth.png" />
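
A quick way to check both statements is to look at the shapes of a `torch.nn.Conv2d` layer. The batch of 4 random RGB images of size 32x32 and the choice of 8 filters are assumptions made only for this sketch.

```python
import torch

# Hypothetical batch of 4 RGB images: (batch, channels, height, width).
x = torch.randn(4, 3, 32, 32)

# 8 filters of size 5x5; each filter spans all 3 input channels.
conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)

# Weight shape is (number of filters, input depth, kernel height, kernel width).
print(conv.weight.shape)  # torch.Size([8, 3, 5, 5])

# The output depth equals the number of filters.
print(conv(x).shape)      # torch.Size([4, 8, 28, 28])
```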

We also use max-pooling to reduce the dimensionality of the activation map, which also helps generalization. Other methods such as average pooling are available, but max-pooling remains the most widely used:

<img src="MaxPooling.png" />
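
As a small sketch of what pooling does, the snippet below applies 2x2 max-pooling and average pooling to a 4x4 map. The input values are made up for illustration.

```python
import torch

# Hypothetical 4x4 activation map, reshaped to (batch, channels, height, width).
x = torch.tensor([[1., 3., 2., 4.],
                  [5., 6., 1., 2.],
                  [7., 2., 9., 0.],
                  [1., 8., 3., 4.]]).reshape(1, 1, 4, 4)

max_pool = torch.nn.MaxPool2d(kernel_size=2)
avg_pool = torch.nn.AvgPool2d(kernel_size=2)

# Each non-overlapping 2x2 block is reduced to a single value.
print(max_pool(x).squeeze())  # [[6., 4.], [8., 9.]]
print(avg_pool(x).squeeze())  # [[3.75, 2.25], [4.50, 4.00]]
```

Both layers halve the height and width here; they differ only in how each block is summarized.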

Let's implement a Convolutional Neural Network for MNIST:

In [88]:
# imports.
import numpy as np
import torch
import torchvision
from torchvision import transforms
from torch.autograd import Variable
from torch.utils.data import DataLoader

### DataLoader

In [89]:
# first we download the MNIST Dataset.
mnist_train = torchvision.datasets.MNIST(root='./data', 
                                         train=True, 
                                         download=True, 
                                         transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize((.1307,), (.3081,))]))
mnist_test  = torchvision.datasets.MNIST(root='./data', 
                                         train=False, 
                                         transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize((.1307,), (.3081,))]))

In [90]:
# then we load it.
train_loader = DataLoader(dataset=mnist_train, batch_size=128, shuffle=True)
test_loader  = DataLoader(dataset=mnist_test, batch_size=128, shuffle=False)

In [98]:
class MNISTClassifier(torch.nn.Module):
    '''
    We're going to use CNN + Dense layers to classify MNIST handwritten digits.
    Input: shape=(28,28) images of 1 channel.
    Output: Unnormalized scores (logits) over the 10 labels (0,1,2,3,4,5,6,7,8,9);
            CrossEntropyLoss applies the softmax internally.
    '''
    
    def __init__(self):
        super(MNISTClassifier, self).__init__()
        self.c1 = torch.nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5)
        self.c2 = torch.nn.Conv2d(in_channels=10, out_channels=20, kernel_size=5)
        
        # 320 = 20 channels * 4 * 4: 28 -> 24 (conv) -> 12 (pool) -> 8 (conv) -> 4 (pool).
        self.d1 = torch.nn.Linear(in_features=320, out_features=10)
        
        self.max_pooling = torch.nn.MaxPool2d(kernel_size=2) 
        self.relu        = torch.nn.ReLU()
    
    def forward(self, x):
        x = self.relu(self.max_pooling(self.c1(x)))
        x = self.relu(self.max_pooling(self.c2(x)))
        
        # flatten the tensor.
        x = x.view(x.size(0), -1)
        
        x = self.d1(x)
        
        return x

If you want to know the size of a tensor at any point in the network, inspect it outside the class (for example by running a dummy input through the layers), then fill in the class accordingly.
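
One way to do this, sketched under the assumption that the layers match the ones in `MNISTClassifier`, is to build the convolution and pooling layers on their own and push a dummy 28x28 input through them. The variable names `dummy` and `in_features` are introduced only for this example.

```python
import torch

# Same layer configuration as in MNISTClassifier, built outside the class.
c1 = torch.nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5)
c2 = torch.nn.Conv2d(in_channels=10, out_channels=20, kernel_size=5)
pool = torch.nn.MaxPool2d(kernel_size=2)

# A dummy batch with one single-channel 28x28 image.
dummy = torch.zeros(1, 1, 28, 28)

with torch.no_grad():
    out = pool(c2(pool(c1(dummy))))

print(out.shape)  # torch.Size([1, 20, 4, 4])

# Flatten everything except the batch dimension to get in_features.
in_features = out.view(out.size(0), -1).size(1)
print(in_features)  # 320
```

The resulting value is what the first `torch.nn.Linear` layer expects as `in_features`. ReLU is left out because it does not change the shape.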

In [99]:
model = MNISTClassifier()

In [100]:
# let's take a look at the model's components.
model

MNISTClassifier(
  (c1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (c2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (d1): Linear(in_features=320, out_features=10, bias=True)
  (max_pooling): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (relu): ReLU()
)

### Loss & Optimizer Definition

In [101]:
criterion = torch.nn.CrossEntropyLoss()

In [102]:
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01, momentum=0.5)

### Training

In [103]:
def train(epochs):
    # set the model in training mode.
    model.train()
    
    for epoch in range(epochs):
        # let's loop over the train_loader batches.
        for batch_idx, (data, target) in enumerate(train_loader):
            # data and target are already tensors; no Variable wrapper is needed.
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            if batch_idx % 10 == 0:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(epoch, 
                                                                               batch_idx * len(data), 
                                                                               len(train_loader.dataset), 
                                                                               100. * batch_idx / len(train_loader), 
                                                                               loss.item()))

In [105]:
train(1)



In [106]:
def validate():
    
    # sets the model in evaluation mode.
    model.eval()
    
    test_loss = 0
    correct = 0
    
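    # gradients are not needed for evaluation.
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)

            # sum up the batch losses (criterion returns the mean over each batch).
            test_loss += criterion(output, target).item()
            # get the index of the highest score, i.e. the predicted label.
            pred = output.argmax(dim=1)
            correct += pred.eq(target.view_as(pred)).sum().item()

    # note: this divides a sum of per-batch mean losses by the number of samples,
    # so the reported value is roughly the mean loss divided by the batch size.
    test_loss /= len(test_loader.dataset)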
    print('\nValidation set Loss: {:.4f}, accuracy: {}/{} ({:.0f}%)\n'.format(test_loss, 
                                                                              correct, 
                                                                              len(test_loader.dataset), 
                                                                              100. * correct / len(test_loader.dataset)))

In [107]:
validate()


Validation set Loss: 0.0007, accuracy: 9714/10000 (97%)



## Exercise: Implement More CNN Layers
<img src="MoreCNNLayers.png" />