# Convolutional Neural Networks (CNNs)


## What are CNNs?

Convolutional Neural Networks (CNNs) are a deep learning architecture designed primarily for visual data. They learn spatial hierarchies of features (edges, then textures, then larger patterns) by stacking convolutions, non-linear activations, and pooling operations.

Key applications of CNNs include:
- Image classification
- Object detection
- Semantic segmentation


## Core Components of CNNs

### Convolutional Layers
The convolutional layer applies filters (also known as kernels) to the input, extracting features such as edges, textures, and patterns. Each filter slides over the input image and, at every position, multiplies its weights element-wise with the patch it covers and sums the result into a single output value.

**Key Concepts**:
- Stride: The step size for moving the filter across the input.
- Padding: Extra border values (usually zeros) added around the input, typically so the output keeps the input's height and width.
- Number of Filters: Determines how many feature maps (output channels) are created, one per filter.
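
The sliding-window operation described above can be sketched in plain Python. This is a minimal illustration for one single-channel image and one square kernel, not how `nn.Conv2d` is implemented; the function name `conv2d_single` and the sample values are made up for this example. Like `nn.Conv2d`, it does not flip the kernel (strictly speaking, a cross-correlation).

```python
def conv2d_single(image, kernel, stride=1, padding=0):
    """Slide one square 2D kernel over one 2D image (lists of lists)."""
    k = len(kernel)  # assumption: the kernel is k x k
    # Zero-pad the image on all four sides.
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    padded += [[0] * padding + list(row) + [0] * padding for row in image]
    padded += [[0] * width for _ in range(padding)]

    out_h = (len(padded) - k) // stride + 1
    out_w = (width - k) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the window by the kernel, then sum.
            row.append(sum(
                padded[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            ))
        output.append(row)
    return output


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
    [0, 0, 0, 0],
]
# Hypothetical kernel: right neighbour minus left neighbour.
kernel = [
    [0, 0, 0],
    [-1, 0, 1],
    [0, 0, 0],
]

no_pad = conv2d_single(image, kernel)                        # 2 x 2: [[2, -5], [2, -8]]
same = conv2d_single(image, kernel, padding=1)               # 4 x 4, same size as the input
strided = conv2d_single(image, kernel, stride=2, padding=1)  # 2 x 2: [[2, -2], [8, -8]]
```

With `padding=1` and a 3x3 kernel the output keeps the input's 4x4 size, and raising the stride to 2 roughly halves each dimension.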

### Activation Functions
Non-linear activation functions like ReLU (Rectified Linear Unit) are applied to introduce non-linearity, allowing the model to learn complex patterns.
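
As a small sketch of what ReLU does to a list of values (the helper name `relu` and the inputs are illustrative only): negative values are clamped to zero and positive values pass through unchanged.

```python
def relu(values):
    """Rectified Linear Unit applied element-wise: max(0, v)."""
    return [max(0.0, v) for v in values]


activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```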

### Pooling Layers
Pooling reduces the spatial dimensions of feature maps, retaining essential information while reducing computational complexity. Common types include:
- Max Pooling: Retains the maximum value in a region.
- Average Pooling: Retains the average value in a region.
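
Both pooling types can be illustrated on a small feature map. The sketch below assumes a single-channel map and no padding; `pool2d` and the sample numbers are hypothetical names and values chosen for this example.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce each size x size window to its max or its average."""
    reduce = max if mode == "max" else (lambda window: sum(window) / len(window))
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            reduce([
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ])
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
max_pooled = pool2d(feature_map, mode="max")  # [[6, 4], [7, 9]]
avg_pooled = pool2d(feature_map, mode="avg")  # [[3.75, 1.75], [2.5, 6.0]]
```

A 2x2 window with stride 2 turns the 4x4 map into a 2x2 map, which is the same halving that `nn.MaxPool2d(2, 2)` applies to height and width.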

### Fully Connected Layers
These layers connect every neuron in the previous layer to every neuron in the next. In a CNN they take the flattened feature maps and map them to class scores used to classify the image.
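
A fully connected layer computes, for each output, a weighted sum of all inputs plus a bias. The following is a minimal sketch with made-up weights; `linear`, `features`, `weights`, and `bias` are illustrative names, not part of the notebook.

```python
def linear(x, weights, bias):
    """Each output is a weighted sum of ALL inputs plus a bias term."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


features = [1.0, 2.0, 3.0]      # flattened feature vector
weights = [
    [0.5, 0.0, -0.5],           # weights for class 0
    [1.0, 1.0, 1.0],            # weights for class 1
]
bias = [0.0, -1.0]

scores = linear(features, weights, bias)         # [-1.0, 5.0]
predicted_class = scores.index(max(scores))      # 1
```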

### Dropout
Dropout is a regularization technique that randomly disables neurons during training, reducing overfitting.
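
The sketch below shows the "inverted dropout" scheme that `nn.Dropout` uses: during training each value is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, while at evaluation time the input passes through unchanged. The `dropout` helper and its `rng` parameter are hypothetical and assume `0 <= p < 1`.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of floats (assumes 0 <= p < 1)."""
    if not training or p == 0.0:
        return list(values)  # evaluation mode: identity
    scale = 1.0 / (1.0 - p)  # keeps the expected value unchanged
    return [v * scale if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)  # seeded only to make the example repeatable
activations = [1.0] * 8
train_out = dropout(activations, p=0.5, training=True, rng=rng)  # mix of 0.0 and 2.0
eval_out = dropout(activations, p=0.5, training=False)           # unchanged
```

This is why the training and evaluation loops call `model.train()` and `model.eval()`: the two modes switch dropout on and off.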


In [17]:
import torch
import torchvision
from torchvision import datasets, transforms
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [8]:
nn.Conv2d?


## Loading and Preprocessing Data

In this example, we use the CIFAR-10 dataset, which contains 60,000 32x32 color images in 10 classes (50,000 for training and 10,000 for testing).

### Transformations
- **ToTensor**: Converts images to tensors.
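- **Normalization**: Subtracts a per-channel mean and divides by a per-channel std. With mean=0 and std=1, as set below, the values stay in the [0, 1] range produced by ToTensor; other values can be substituted to center and rescale the inputs.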
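
`transforms.Normalize` computes `(x - mean) / std` for each channel. The arithmetic can be checked by hand. In this sketch, `normalize` is a hypothetical helper, and the `0.5` values are only a commonly seen illustrative choice, not something this notebook uses.

```python
def normalize(pixel, mean, std):
    """Per-value version of the Normalize arithmetic: (x - mean) / std."""
    return (pixel - mean) / std


pixels = [0.0, 0.25, 1.0]  # values in [0, 1], as produced by ToTensor

identity = [normalize(p, mean=0, std=1) for p in pixels]      # [0.0, 0.25, 1.0]
centered = [normalize(p, mean=0.5, std=0.5) for p in pixels]  # [-1.0, -0.5, 1.0]
```

With mean 0 and std 1 the transform is an identity; with mean 0.5 and std 0.5 the range [0, 1] is mapped to [-1, 1].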


In [9]:
mean = 0
std = 1

In [42]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((mean, mean, mean), (std, std, std))
])

train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)


Files already downloaded and verified
Files already downloaded and verified



## Defining a CNN Model

This model consists of:
1. Three convolutional layers to extract features.
2. Two max-pooling layers to halve the spatial dimensions of the feature maps.
3. Two fully connected layers for classification.
4. ReLU activation for non-linearity.
5. Dropout for regularization.


In [43]:
for image, label in train_loader:
    print(image.shape)
    print(label)
    break

torch.Size([64, 3, 32, 32])
tensor([4, 6, 0, 4, 6, 2, 1, 4, 8, 6, 1, 5, 4, 5, 3, 1, 1, 9, 7, 3, 3, 9, 8, 6,
        1, 8, 2, 8, 1, 5, 8, 1, 1, 6, 6, 4, 3, 9, 7, 2, 8, 8, 0, 4, 0, 9, 1, 3,
        9, 2, 5, 0, 3, 3, 0, 8, 2, 5, 1, 6, 9, 3, 7, 3])


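The output height of a convolutional layer (the same formula applies to the width) is:

Output height = floor((Input height + 2 * padding - kernel size) / stride) + 1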

In [46]:
((32 + 2 * 1 - 3) / 1) +1

32.0
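
Applying the same formula layer by layer shows where the `128 * 8 * 8` input size of `fc1` in the `CNN` class comes from. This is a sketch: `conv_output_size` is a hypothetical helper, and it assumes square inputs and that pooling follows the same formula with no padding.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """floor((size + 2 * padding - kernel_size) / stride) + 1"""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32                                                 # CIFAR-10 images are 32x32
size = conv_output_size(size, kernel_size=3, padding=1)   # conv1 -> 32
size = conv_output_size(size, kernel_size=3, padding=1)   # conv2 -> 32
size = conv_output_size(size, kernel_size=2, stride=2)    # pool  -> 16
size = conv_output_size(size, kernel_size=3, padding=1)   # conv3 -> 16
size = conv_output_size(size, kernel_size=2, stride=2)    # pool  -> 8

flattened = 128 * size * size                             # 128 channels * 8 * 8 = 8192
```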

In [53]:
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        # 3x3 convolutions; padding=1 keeps height and width unchanged.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        # 2x2 max pooling with stride 2 halves height and width.
        self.pool = nn.MaxPool2d(2, 2)
        # Two poolings turn 32x32 inputs into 8x8 maps with 128 channels.
        self.fc1 = nn.Linear(128 * 8 * 8, 256)
        self.fc2 = nn.Linear(256, 10)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.conv1(x)          # (N, 32, 32, 32)
        x = self.relu(x)
        x = self.conv2(x)          # (N, 64, 32, 32)
        x = self.relu(x)
        x = self.pool(x)           # (N, 64, 16, 16)
        x = self.conv3(x)          # (N, 128, 16, 16)
        x = self.relu(x)
        x = self.pool(x)           # (N, 128, 8, 8)
        x = x.view(x.size(0), -1)  # flatten to (N, 8192)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)            # raw class scores (logits), shape (N, 10)
        return x

In [54]:
device

device(type='cpu')

In [55]:
model = CNN().to(device)
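
To get a feel for the size of this model, the number of learnable parameters can be worked out by hand from the layer shapes in the `CNN` class. The helpers `conv_params` and `linear_params` below are hypothetical and assume each layer has a bias, which is the default for `nn.Conv2d` and `nn.Linear`.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """One kernel_size x kernel_size filter per (in, out) channel pair, plus one bias per output."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    """One weight per (in, out) pair, plus one bias per output."""
    return in_features * out_features + out_features


counts = {
    "conv1": conv_params(3, 32, 3),              # 896
    "conv2": conv_params(32, 64, 3),             # 18,496
    "conv3": conv_params(64, 128, 3),            # 73,856
    "fc1": linear_params(128 * 8 * 8, 256),      # 2,097,408
    "fc2": linear_params(256, 10),               # 2,570
}
total = sum(counts.values())                     # 2,193,226
```

Most of the parameters sit in `fc1`. The total can be cross-checked against the instantiated model with `sum(p.numel() for p in model.parameters())`.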


## Training the Model

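The model is trained using the Adam optimizer and cross-entropy loss. `nn.CrossEntropyLoss` expects raw, unnormalized class scores (logits), which is why the model's `forward` method does not apply a softmax to its output.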


In [56]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

In [57]:
for img, label in train_loader:
    print(img.shape)
    break

torch.Size([64, 3, 32, 32])


In [None]:
%%time
epochs = 4
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
    
    print(f"Epoch {epoch+1}/{epochs}, Loss: {running_loss/len(train_loader):.4f}")


Epoch 1/4, Loss: 2.3036
Epoch 2/4, Loss: 2.3038
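
The logged losses of about 2.30 are close to ln(10) ≈ 2.3026, which is the cross-entropy of a 10-class model that assigns equal probability to every class. The sketch below checks that value in plain Python; `cross_entropy` is a hypothetical helper written for this example, not the PyTorch implementation.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: log-sum-exp of the logits minus the target logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Equal scores for all 10 classes: the model is effectively guessing.
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A clearly higher score for the correct class gives a much lower loss.
confident_loss = cross_entropy([0.0] * 3 + [5.0] + [0.0] * 6, target=3)

print(f"{uniform_loss:.4f}")  # 2.3026
```

A loss that stays at this level across epochs suggests the network may not be learning yet. One thing that might be worth trying, as an untested assumption here, is a smaller learning rate than 0.01, such as Adam's default of 0.001.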



## Evaluating the Model

The trained model is evaluated on the test dataset to measure accuracy.


In [None]:
%%time

model.eval()  # switch off dropout for evaluation
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")
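
The accuracy computation above picks, for each image, the class with the highest score (`torch.max(outputs, 1)` returns both the maximum values and their indices) and counts how often that matches the label. The same logic on a tiny batch, using made-up scores and labels and a hypothetical `argmax` helper:

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical class scores for 4 samples and 3 classes.
outputs = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 0.9],
    [2.2, 2.1, 0.0],
]
labels = [1, 0, 1, 0]

predicted = [argmax(row) for row in outputs]              # [1, 0, 2, 0]
correct = sum(p == y for p, y in zip(predicted, labels))  # 3
accuracy = 100 * correct / len(labels)                    # 75.0
```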
