
### Chapter 3 Convolutional Neural Networks (CNNs)

In [17]:
import torch
import torch.nn as nn

##### Convolution operator - OOP way
- in_channels (int) - Number of channels in input
- out_channels (int) - Number of channels produced by the convolution
- kernel_size (int or tuple) - Size of the convolving kernel
- stride (int or tuple, optional) - Stride of the convolution. Default: 1
- padding (int or tuple, optional) - Zero-padding added to both sides of the input. Default: 0

In [4]:
# create 10 images with shape (1, 28, 28)
images = torch.rand(10, 1, 28, 28)

# build 6 conv filters of size (3, 3) with stride set to 1 and padding set to 1
conv_filters = torch.nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=1, padding=1)

# Convolve the image with filters
output_feature = conv_filters(images)
print(output_feature.shape)

torch.Size([10, 6, 28, 28])
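
The output shape above follows from the kernel size, stride, and padding. As a minimal sketch (the helper `conv_output_size` is a hypothetical name, not part of PyTorch, and it assumes a dilation of 1), the spatial size can be computed by hand and compared with what `nn.Conv2d` returns:

```python
import torch
import torch.nn as nn


def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Hypothetical helper: spatial output size of a convolution (dilation assumed 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


images = torch.rand(10, 1, 28, 28)

# Same settings as above: padding=1 with a 3x3 kernel keeps the 28x28 size
same_conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=1, padding=1)
same_shape = same_conv(images).shape

# No padding: the 3x3 kernel trims one pixel from every border
valid_conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=1, padding=0)
valid_shape = valid_conv(images).shape

# Stride 2 with padding 1 halves the spatial size
strided_conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=2, padding=1)
strided_shape = strided_conv(images).shape

print(same_shape, conv_output_size(28, 3, stride=1, padding=1))     # torch.Size([10, 6, 28, 28]) 28
print(valid_shape, conv_output_size(28, 3, stride=1, padding=0))    # torch.Size([10, 6, 26, 26]) 26
print(strided_shape, conv_output_size(28, 3, stride=2, padding=1))  # torch.Size([10, 6, 14, 14]) 14
```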


##### Convolution operator - Functional way
- input - input tensor of shape (minibatch × in_channels × iH × iW)
- weight - filters of shape (out_channels × in_channels × kH × kW)
- stride - the stride of the convolving kernel. Can be a single number or a tuple (sH, sW). Default: 1
- padding - implicit zero padding on both sides of the input. Can be a single number or a tuple (padH, padW). Default: 0

In [5]:
import torch.nn.functional as F

# Create 10 random images with shape (1, 28, 28)
image = torch.rand(10, 1, 28, 28)

# Create 6 filters with shape (1, 3, 3)
filters = torch.rand(6, 1, 3, 3)

# Convolve the image with the filters
output_feature = F.conv2d(image, filters, stride=1, padding=1)
print(output_feature.shape)

torch.Size([10, 6, 28, 28])
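
The two styles compute the same operation. The difference is that `nn.Conv2d` creates and stores its own weights, while `F.conv2d` expects you to pass them in. A small sketch, assuming we reuse the weight and bias that the module created, shows both paths giving the same result:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
images = torch.rand(10, 1, 28, 28)

# OOP way: the module owns a weight of shape (out_channels, in_channels, kH, kW) and a bias
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=1, padding=1)
out_module = conv(images)

# Functional way: pass the very same weight and bias explicitly
out_functional = F.conv2d(images, conv.weight, conv.bias, stride=1, padding=1)

same = torch.allclose(out_module, out_functional)
print(conv.weight.shape)  # torch.Size([6, 1, 3, 3])
print(same)               # True
```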


##### Max-pooling operator

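Convolutions extract features from the image, while pooling acts as a form of feature selection: max pooling keeps the most dominant (largest) activation in each window, and average pooling combines the activations in each window by averaging them. Either way, pooling reduces the spatial size of the feature map.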
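
To make this concrete, here is a small hand-checkable sketch on a fixed 4 × 4 input (the values 0 to 15 are chosen only for illustration), so each 2 × 2 window can be verified by eye:

```python
import torch
import torch.nn.functional as F

# A 4x4 "image" holding 0..15, laid out row by row
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Max pooling keeps the largest value in each non-overlapping 2x2 window
max_out = F.max_pool2d(x, 2)

# Average pooling replaces each 2x2 window with its mean
avg_out = F.avg_pool2d(x, 2)

print(max_out)  # values: [[5, 7], [13, 15]]
print(avg_out)  # values: [[2.5, 4.5], [10.5, 12.5]]
```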

In [11]:
# Create a random single-channel 6x6 image
im = torch.rand(1, 1, 6, 6)

# Build a max-pooling operator with window size 2
max_pooling = torch.nn.MaxPool2d(2)

# Apply the pooling operator (OOP way)
output_feature = max_pooling(im)

# Apply the same pooling with the functional API
output_feature_F = F.max_pool2d(im, 2)

# print the results of both cases
print(output_feature)
print(output_feature_F)

tensor([[[[0.6571, 0.9306, 0.5576],
          [0.5193, 0.7947, 0.8003],
          [0.7877, 0.9205, 0.7234]]]])
tensor([[[[0.6571, 0.9306, 0.5576],
          [0.5193, 0.7947, 0.8003],
          [0.7877, 0.9205, 0.7234]]]])


In [12]:
# Build an average-pooling operator with window size 2
avg_pooling = torch.nn.AvgPool2d(2)

# Apply the pooling operator (OOP way)
output_feature = avg_pooling(im)

# Apply the same pooling with the functional API
output_feature_F = F.avg_pool2d(im, 2)

# print the results of both cases
print(output_feature)
print(output_feature_F)

tensor([[[[0.3073, 0.5481, 0.4819],
          [0.3767, 0.4234, 0.5605],
          [0.6687, 0.5435, 0.6187]]]])
tensor([[[[0.3073, 0.5481, 0.4819],
          [0.3767, 0.4234, 0.5605],
          [0.6687, 0.5435, 0.6187]]]])


##### Your first CNN - __init__ method
You're going to use the MNIST dataset, which consists of 28 × 28 grayscale images of handwritten digits from 0 to 9. The convolutional neural network has 2 convolutional layers, each followed by a ReLU nonlinearity and a max-pooling layer, and a final fully connected layer. Remember that each pooling layer halves both the height and the width of the image, so after 2 pooling layers the height and width are 1/4 of the original size (28 → 14 → 7).
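
The halving can be checked directly. This is a minimal sketch that assumes a 2 × 2 max pooling with stride 2 applied to a dummy MNIST-sized tensor:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, 2)

# One dummy grayscale image of MNIST size
x = torch.rand(1, 1, 28, 28)

after_one = pool(x)          # 28 -> 14
after_two = pool(after_one)  # 14 -> 7

print(x.shape[-2:], after_one.shape[-2:], after_two.shape[-2:])
# torch.Size([28, 28]) torch.Size([14, 14]) torch.Size([7, 7])
```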

For the moment, you are going to implement the __init__ method of the net. In the next exercise, you will implement the .forward() method.

*Note: We need 2 pooling layers, but we only need to instantiate a pooling layer once, because each pooling layer will have the same configuration.*
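
Reusing one pooling module is safe because pooling has no learnable parameters, unlike a convolution. A quick sketch that counts parameters illustrates the difference (the layer sizes here are just examples):

```python
import torch.nn as nn

pool = nn.MaxPool2d(2, 2)
conv = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1)

# Pooling is stateless, so one instance can be applied at several places in the network
num_pool_params = sum(p.numel() for p in pool.parameters())

# A convolution has weights and biases, so each conv layer needs its own instance
num_conv_params = sum(p.numel() for p in conv.parameters())

print(num_pool_params)  # 0
print(num_conv_params)  # 50 = 5 * 1 * 3 * 3 weights + 5 biases
```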

In [15]:
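import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms

# `transform` must be a callable (or None); ToTensor converts PIL images to float tensors in [0, 1]
mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
mnist_testset = datasets.MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())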

##### Your first CNN - forward() method

In [19]:
class Net(nn.Module):
    def __init__(self, num_classes):
        super(Net, self).__init__()

        # Instantiate the ReLU nonlinearity
        self.relu = nn.ReLU()
        
        # Instantiate two convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1)
        
        # Instantiate a max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        
        # Instantiate a fully connected layer (10 channels of size 7x7 after two poolings)
        self.fc = nn.Linear(7 * 7 * 10, num_classes)

    def forward(self, x):

        # Apply conv followed by relu, then pool on the next line
        x = self.relu(self.conv1(x))
        x = self.pool(x)

        # Apply conv followed by relu, then pool on the next line
        x = self.relu(self.conv2(x))
        x = self.pool(x)

        # Prepare the image for the fully connected layer
        x = x.view(-1, 7 * 7 * 10)

        # Apply the fully connected layer and return the result
        return self.fc(x)
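
As a quick sanity check of the shapes, the same stack of layers can be sketched with `nn.Sequential`. This is not the `Net` class itself, only a compact stand-in that assumes the same layer sizes and 10 output classes, with `nn.Flatten` playing the role of `x.view(-1, 7 * 7 * 10)`:

```python
import torch
import torch.nn as nn

# Stand-in for Net: conv -> relu -> pool, twice, then flatten and classify
sketch = nn.Sequential(
    nn.Conv2d(1, 5, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(5, 10, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Flatten(),               # (N, 10, 7, 7) -> (N, 490)
    nn.Linear(7 * 7 * 10, 10),  # one score per digit class
)

# A dummy batch of 4 MNIST-sized images
batch = torch.rand(4, 1, 28, 28)
logits = sketch(batch)

num_params = sum(p.numel() for p in sketch.parameters())

print(logits.shape)  # torch.Size([4, 10])
print(num_params)    # 5420 = 50 (conv1) + 460 (conv2) + 4910 (fc)
```

If the input were not 28 × 28, the `7 * 7 * 10` in the linear layer would no longer match the flattened size and the forward pass would raise a shape error.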