### **Convolutional Neural Network**
> Also known as ...
- ConvNet
- CNN

**Unlike perceptrons, which treat each input feature (e.g., each pixel) independently, CNNs take locality into account by processing neighboring pixels together.**
- CNNs also learn to "ignore" parts of an image which are not relevant to the classification task at hand

![](../images/typical_cnn_arch.png)

**Another typical CNN architecture**
![](../images/another_typical_cnn_arch.png)

### **A typical CNN in PyTorch**

In [1]:
import torch
import torch.nn as nn

In [None]:
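class CNN(nn.Module):
  def __init__(self, num_classes):
    super().__init__()

    # Convolutional layers (constructor arguments are elided here)
    self.conv_layers = nn.Sequential(
      nn.Conv2d(...),
      nn.MaxPool2d(...),
      nn.Conv2d(...),
      nn.MaxPool2d(...)
    )

    # Fully connected layers; the first Linear layer expects the conv
    # output of each sample to flatten to 24 * 16 * 16 features
    self.fc_layers = nn.Sequential(
      nn.Linear(24 * 16 * 16, 256),
      nn.ReLU(),
      nn.Linear(256, 128),
      nn.ReLU(),
      nn.Linear(128, num_classes)
    )

  def forward(self, x):
    # Extract feature maps: (batch, channels, height, width)
    features = self.conv_layers(x)
    # Flatten everything except the batch dimension
    features = torch.flatten(features, start_dim=1)
    # Return raw class scores (no softmax applied here)
    logits = self.fc_layers(features)
    return logits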
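The skeleton above leaves the layer arguments elided. As a minimal sketch of one way to fill them in, the class below assumes a 3-channel 64 by 64 input, 3 by 3 kernels with padding 1, and 2 by 2 max pooling. These values, and the name `SketchCNN`, are hypothetical and were chosen only so that the flattened size comes out to `24 * 16 * 16`.

```python
import torch
import torch.nn as nn


class SketchCNN(nn.Module):
    """Hypothetical instance of the skeleton; all layer sizes are assumptions."""

    def __init__(self, num_classes):
        super().__init__()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(3, 12, kernel_size=3, padding=1),   # 3x64x64  -> 12x64x64
            nn.MaxPool2d(kernel_size=2),                  # 12x64x64 -> 12x32x32
            nn.Conv2d(12, 24, kernel_size=3, padding=1),  # 12x32x32 -> 24x32x32
            nn.MaxPool2d(kernel_size=2),                  # 24x32x32 -> 24x16x16
        )
        self.fc_layers = nn.Sequential(
            nn.Linear(24 * 16 * 16, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        features = self.conv_layers(x)
        features = torch.flatten(features, start_dim=1)
        return self.fc_layers(features)


model = SketchCNN(num_classes=10)
batch = torch.randn(4, 3, 64, 64)  # 4 random "images" stand in for real data
logits = model(batch)
print(logits.shape)  # torch.Size([4, 10])
```

If the input resolution or the pooling changes, the `24 * 16 * 16` in the first `nn.Linear` has to change with it.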

**When we apply a convolutional layer to an input image, we create a feature map**
- The layer slides a kernel (filter) over the input image and computes a weighted sum of the pixels under the kernel at each position
  - This process is called convolution

![](../images/input_feature_map.png)

![](../images/slide_kernel.png)

- The inputs $x$ differ as we slide over the image
- The weights $w$ stay the same $\rightarrow$ **weight sharing**
  - A feature detector that works well in one region of the image may also work well in another region of the image
  - It reduces model complexity because there are fewer parameters to fit
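The sliding-window idea can be sketched by hand. The example below assumes a random 12 by 12 single-channel image and a random 3 by 3 kernel (both illustrative, not taken from the figures) and compares the result with `torch.nn.functional.conv2d`. Note that PyTorch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.randn(12, 12)   # assumed input size
kernel = torch.randn(3, 3)    # the same 9 weights are reused at every position

k = kernel.shape[0]
out_h = image.shape[0] - k + 1
out_w = image.shape[1] - k + 1

feature_map = torch.empty(out_h, out_w)
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + k, j:j + k]             # the inputs x change ...
        feature_map[i, j] = (patch * kernel).sum()  # ... the weights w do not

# Reference result: conv2d expects (batch, channels, height, width)
reference = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(feature_map.shape)  # torch.Size([10, 10])
print(torch.allclose(feature_map, reference, atol=1e-5))
```

With weight sharing this feature map needs only 9 weights. A fully connected layer mapping the same 144 inputs to 100 outputs would need 144 * 100 = 14,400 weights.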

### **Convolutional Layer**

In [2]:
layer = torch.nn.Conv2d(1, 1, kernel_size = 3)

In [3]:
layer.weight

Parameter containing:
tensor([[[[ 0.2099, -0.1161, -0.2832],
          [ 0.2440, -0.2846,  0.1624],
          [-0.1812,  0.1874,  0.1140]]]], requires_grad=True)

In [4]:
layer.bias

Parameter containing:
tensor([0.0253], requires_grad=True)
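The weight and bias values printed above come from random initialization, so they will differ from run to run. The shapes and the parameter count do not. A small check, assuming the same `Conv2d(1, 1, kernel_size=3)` layer:

```python
import torch

layer = torch.nn.Conv2d(1, 1, kernel_size=3)

# Weight layout is (out_channels, in_channels, kernel_height, kernel_width)
print(layer.weight.shape)  # torch.Size([1, 1, 3, 3])
print(layer.bias.shape)    # torch.Size([1])

# 3 * 3 kernel weights plus 1 bias
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)            # 10
```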

**$1$ input channel, $3$ output channels**
- On the left
  - 1 channel
  - 12 by 12 input size
- On the right
  - 3 channels
  - 10 by 10 size
- For each output channel, we use a different set of weights
  - In other words, we use a different "feature detector" (kernel) for each output channel, which gives us multiple feature maps

![](../images/1_input_channel_3_output_channel.png)

![](../images/first_channel_convolution.png)

![](../images/second_channel_convolution.png)

![](../images/third_channel_convolution.png)
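The shapes in this case can be checked with a short sketch. It assumes a 3 by 3 kernel with no padding and stride 1, which is one setting that turns a 12 by 12 input into a 10 by 10 output (12 - 3 + 1 = 10). The random input is illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3)
x = torch.randn(1, 1, 12, 12)  # (batch, channels, height, width)

with torch.no_grad():
    out = layer(x)
    # Recompute each output channel separately with its own kernel and bias
    per_kernel = [
        F.conv2d(x, layer.weight[k:k + 1], layer.bias[k:k + 1])
        for k in range(3)
    ]
    rebuilt = torch.cat(per_kernel, dim=1)

print(layer.weight.shape)  # torch.Size([3, 1, 3, 3]): one 3x3 kernel per output channel
print(out.shape)           # torch.Size([1, 3, 10, 10])
print(torch.allclose(out, rebuilt, atol=1e-5))
```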

**Multiple input channels, single output channel**
- At each kernel position, compute one intermediate value per input channel, each with its own slice of the kernel
- Sum these values to get the feature map value for the output channel

![](../images/3_input_1_output_channels.png)
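The "convolve each channel, then sum" description can be verified numerically. The sketch below assumes a 3-channel 12 by 12 input and a 3 by 3 kernel (illustrative sizes). It also adds the layer's bias once after summing, a detail of `nn.Conv2d` that the bullets above leave out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3)
x = torch.randn(1, 3, 12, 12)

with torch.no_grad():
    out = layer(x)

    # Convolve each input channel with its own 3x3 slice of the kernel
    partial = [
        F.conv2d(x[:, c:c + 1], layer.weight[:, c:c + 1])
        for c in range(3)
    ]
    # Sum over input channels, then add the single bias term
    summed = torch.stack(partial).sum(dim=0) + layer.bias.view(1, -1, 1, 1)

print(out.shape)  # torch.Size([1, 1, 10, 10])
print(torch.allclose(out, summed, atol=1e-5))
```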

**Multiple input channels, multiple output channels**

![](../images/multiple_input_multiple_output_channels.png)

In [5]:
layer = torch.nn.Conv2d(in_channels = 3, out_channels = 5, kernel_size = 2)

In [6]:
layer.weight.shape

torch.Size([5, 3, 2, 2])
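The weight shape reads as `(out_channels, in_channels, kernel_height, kernel_width)`: each of the 5 output channels has its own 3 x 2 x 2 kernel. A brief sketch of the resulting parameter count and output shape, assuming a 12 by 12 input purely for illustration:

```python
import torch
import torch.nn as nn

layer = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=2)

out_c, in_c, k_h, k_w = layer.weight.shape
n_weights = out_c * in_c * k_h * k_w  # 5 * 3 * 2 * 2 = 60
n_params = sum(p.numel() for p in layer.parameters())  # 60 weights + 5 biases

print(n_weights)  # 60
print(n_params)   # 65

x = torch.randn(1, 3, 12, 12)  # assumed input size
print(layer(x).shape)  # torch.Size([1, 5, 11, 11])
```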