## Convolutional Neural Networks

### **Components of a CNN**

1. ***nn.Conv2d - Convolutional Layer*** 

 The Conv2d layer is the "pattern finder." It slides a small window (a kernel/filter) across the image to detect features like edges, textures, or shapes.


args:

- in_channels (int): The number of channels in the input image. (e.g., 3 for RGB color images, 1 for Grayscale)

- out_channels (int): The number of filters you want the layer to learn. This determines the "depth" of the output feature map

- kernel_size (int or tuple): The size of the sliding window. A common choice is 3 (for a $3 \times 3$ filter)

- stride (int or tuple, optional): The number of pixels the filter moves at each step as it slides. Default is 1. A higher stride reduces the output size.

- padding (int or tuple, optional): Adds "fake" pixels (usually zeros) around the border. This allows the filter to cover the edges of the image and can keep the output size the same as the input. Default is 0.
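To make the effect of these arguments concrete, here is a minimal sketch that runs a random batch through two `nn.Conv2d` layers. The tensor sizes, the layer names `conv_same` and `conv_shrink`, and the helper `conv_out_size` are illustrative assumptions, not part of PyTorch. The helper uses the usual output-size rule for a convolution without dilation: `(size + 2 * padding - kernel_size) // stride + 1`.

```python
import torch
from torch import nn

# A fake batch: 1 RGB image of 64x64 pixels in (batch, channels, height, width) layout.
images = torch.randn(1, 3, 64, 64)

# 3 input channels -> 10 learned filters, each 3x3.
# padding=1 with stride=1 keeps height and width unchanged.
conv_same = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, stride=1, padding=1)
# No padding and stride=2 roughly halves height and width.
conv_shrink = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, stride=2, padding=0)


def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Hypothetical helper: output height/width of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


same_shape = tuple(conv_same(images).shape)
shrink_shape = tuple(conv_shrink(images).shape)

print(same_shape)    # (1, 10, 64, 64)
print(shrink_shape)  # (1, 10, 31, 31)
print(conv_out_size(64, kernel_size=3, stride=2, padding=0))  # 31
```

Note that `out_channels` sets the depth of the output (10 here), while `kernel_size`, `stride`, and `padding` together determine its height and width.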


2. ***nn.MaxPool2d - Pooling Layer***

The MaxPool2d layer is the "condenser." Its job is to reduce the spatial size (height and width) of the image while keeping the most important information

args:

- kernel_size (int or tuple): The size of the window to take the maximum over. Usually 2 (for a $2 \times 2$ window)

- stride (int or tuple, optional): How far the window moves. Default is the same as the kernel_size

- padding (int or tuple, optional): Zero-padding added to both sides. Default is 0.

**Why use MaxPool?**

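It reduces the spatial size of the feature maps, which cuts the computation (and the number of parameters) needed by the layers that follow; the pooling layer itself has no learnable parameters. It also makes the network more tolerant of small translations, meaning it can still recognize a feature if its position shifts slightly.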
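As a small illustration of the pooling behavior described above, the sketch below applies `nn.MaxPool2d` to a hand-made 4x4 input so the result can be checked by eye. The input values and variable names are assumptions chosen for readability.

```python
import torch
from torch import nn

# One single-channel 4x4 "image" holding the values 0..15, shaped (N, C, H, W):
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# 2x2 window; stride defaults to kernel_size, so the windows do not overlap.
pool = nn.MaxPool2d(kernel_size=2)
pooled = pool(x)

print(pooled.shape)     # torch.Size([1, 1, 2, 2])
print(pooled[0, 0])     # the maximum of each 2x2 block: [[5, 7], [13, 15]]

# Pooling has nothing to learn, so it contributes no parameters.
num_params = sum(p.numel() for p in pool.parameters())
print(num_params)       # 0
```

Height and width are halved while the channel count stays the same, and each output value is simply the largest value in its window.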

### **Architecture of a CNN**

| Hyperparameter/Layer type | What does it do? | Typical values |
| :--- | :--- | :--- |
| **Input image(s)** | Target images you'd like to discover patterns in | Whatever you can take a photo (or video) of |
| **Input layer** | Takes in target images and preprocesses them for further layers | `input_shape = [batch_size, image_height, image_width, color_channels]` (channels last) or `input_shape = [batch_size, color_channels, image_height, image_width]` (channels first, the layout PyTorch layers such as `nn.Conv2d` expect) |
| **Convolution layer** | Extracts/learns the most important features from target images | Multiple, can create with `torch.nn.ConvXd()` (X can be multiple values) |
| **Hidden activation/non-linear activation** | Adds non-linearity to learned features (non-straight lines) | Usually ReLU (`torch.nn.ReLU()`), though can be many more |
| **Pooling layer** | Reduces the dimensionality of learned image features | Max (`torch.nn.MaxPool2d()`) or Average (`torch.nn.AvgPool2d()`) |
| **Output layer/linear layer** | Takes learned features and outputs them in shape of target labels | `torch.nn.Linear(out_features=[number_of_classes])` (e.g. 3 for pizza, steak or sushi) |
| **Output activation** | Converts output logits to prediction probabilities | `torch.sigmoid()` (binary classification) or `torch.softmax()` (multi-class classification, applied over the class dimension) |
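The rows of the table can be stacked into a small model. The following is only a sketch under stated assumptions: the class name `TinyCNN`, the 64x64 input size, the 10 hidden units, and the 3 output classes are illustrative choices, not a prescribed architecture.

```python
import torch
from torch import nn


class TinyCNN(nn.Module):
    """Hypothetical minimal CNN: two conv/ReLU/pool blocks and a linear output layer."""

    def __init__(self, in_channels=3, hidden_units=10, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, hidden_units, kernel_size=3, padding=1),   # convolution layer
            nn.ReLU(),                                                        # hidden activation
            nn.MaxPool2d(kernel_size=2),                                      # pooling: 64x64 -> 32x32
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                                      # pooling: 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # in_features assumes 64x64 inputs, i.e. 16x16 feature maps after two pools.
            nn.Linear(in_features=hidden_units * 16 * 16, out_features=num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyCNN()
batch = torch.randn(4, 3, 64, 64)      # channels-first batch of 4 fake RGB images
logits = model(batch)                  # raw scores, one per class
probs = torch.softmax(logits, dim=1)   # output activation: probabilities over classes

print(logits.shape)  # torch.Size([4, 3])
```

The softmax is applied outside the model here because losses such as `nn.CrossEntropyLoss` expect raw logits; the probabilities are only needed when interpreting predictions.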