# QUESTION 6
## Parsa Daghigh
## Std. Num : 810101419

### Part A: CNNs and Translation Invariance
Convolutional Neural Networks (CNNs) are a class of deep learning models particularly well-suited for image processing tasks.

#### **Convolutional Neural Networks (CNNs)**
- Convolutional Layers: These layers apply convolution operations to the input, extracting local patterns such as edges, textures, and shapes.

- Pooling Layers: These layers perform down-sampling operations, reducing the spatial dimensions of the data and helping to achieve translation invariance.

- Fully Connected Layers: These layers connect every neuron in one layer to every neuron in the next layer, similar to traditional neural networks.
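
To make the down-sampling concrete, the sketch below traces the spatial size of a 28x28 input through a small LeNet-style stack, using the standard output-size formula `floor((n + 2p - k) / s) + 1`. The helper name `out_size` and the layer list are assumptions chosen for illustration.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


# Illustrative LeNet-style stack: (name, kernel, stride, padding)
layers = [
    ("conv 5x5, pad 2", 5, 1, 2),
    ("avg pool 2x2", 2, 2, 0),
    ("conv 5x5", 5, 1, 0),
    ("avg pool 2x2", 2, 2, 0),
]

size = 28
sizes = [size]
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    sizes.append(size)
    print(f"{name:<16} -> {size} x {size}")
```

Each pooling layer halves the width and height, so with 16 channels after the second convolution the flattened feature vector has 16 * 5 * 5 = 400 entries.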

#### **Translation Invariance**
Translation invariance means that the model's ability to recognize objects or features is robust to shifts in the input image. For example, a CNN should be able to recognize a cat in an image regardless of where it appears. <br>

This property allows CNNs to be more robust and effective in tasks where the position of objects can vary, such as image classification and object detection.
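
A minimal one-dimensional sketch of the idea, using only the standard library. The helper names `correlate1d` and `shift_right` and the toy signal are assumptions for illustration. It shows that the convolution output moves together with the input, and that taking a global maximum over positions then removes the dependence on position.

```python
def correlate1d(signal, kernel):
    """Valid 1-D cross-correlation, the operation a convolutional layer computes."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def shift_right(signal, n):
    """Shift a zero-padded signal n steps to the right."""
    return [0] * n + signal[: len(signal) - n]


kernel = [1, 2, 1]
signal = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
shifted = shift_right(signal, 3)

features = correlate1d(signal, kernel)
features_shifted = correlate1d(shifted, kernel)

# The feature map moves by the same 3 steps as the input.
print(features)
print(features_shifted)

# A global max over positions ignores where the pattern occurred.
print(max(features), max(features_shifted))
```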

### Part B: Components Contributing to Translation Invariance

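- Convolutional Layers: Because the same filter weights are applied at every location, a feature is detected wherever it appears. Strictly speaking, this makes the feature maps translation-equivariant (they shift along with the input); invariance emerges once later layers pool over positions.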

- Pooling Layers: Max pooling or average pooling layers reduce the spatial dimensions of the data, making the model less sensitive to the exact position of features.

- ReLU Activation: ReLU is applied element-wise, so it preserves the equivariance of the preceding convolution. It does not add invariance by itself, but its non-linearity lets the network learn more complex patterns.

- Strided Convolutions: Convolutions with a stride greater than one down-sample the feature map in the same way pooling does, which makes later layers less sensitive to small shifts. Because positions are skipped, the result is only approximately invariant.

- Data Augmentation: Techniques like random cropping, rotation, and translation of images during training can further enhance translation invariance.
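
The invariance from pooling is local, which the following sketch illustrates under simplifying assumptions: a 1-D signal with a single active feature and non-overlapping max pooling. The helper names `max_pool1d` and `one_hot` are hypothetical.

```python
def max_pool1d(signal, window):
    """Non-overlapping max pooling (stride equal to the window size)."""
    return [
        max(signal[i:i + window])
        for i in range(0, len(signal) - window + 1, window)
    ]


def one_hot(position, length=8):
    """A signal with a single active feature at the given position."""
    return [1 if i == position else 0 for i in range(length)]


pooled_at_0 = max_pool1d(one_hot(0), window=4)
pooled_at_3 = max_pool1d(one_hot(3), window=4)  # moved 3 steps, same window
pooled_at_4 = max_pool1d(one_hot(4), window=4)  # moved 1 more step, next window

print(pooled_at_0, pooled_at_3, pooled_at_4)
```

Moving the feature inside one pooling window leaves the pooled output unchanged, while crossing a window boundary changes it. This is why a single pooling or strided layer gives only approximate invariance to small shifts.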

### Part C: MLP for the MNIST dataset

### Imports

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import torchvision.transforms.functional as TF

In [2]:

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(28*28, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

mlp_model = MLP()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(mlp_model.parameters(), lr=0.001)

epochs = 10
for epoch in range(epochs):
    mlp_model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = mlp_model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    mlp_model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            outputs = mlp_model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f'Epoch [{epoch+1}/{epochs}], Accuracy: {accuracy:.2f}%')


Epoch [1/10], Accuracy: 95.12%
Epoch [2/10], Accuracy: 96.64%
Epoch [3/10], Accuracy: 96.49%
Epoch [4/10], Accuracy: 97.56%
Epoch [5/10], Accuracy: 97.31%
Epoch [6/10], Accuracy: 96.53%
Epoch [7/10], Accuracy: 97.09%
Epoch [8/10], Accuracy: 97.57%
Epoch [9/10], Accuracy: 97.38%
Epoch [10/10], Accuracy: 97.89%


### Part D: LeNet-5 for MNIST dataset

In [3]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, padding=2)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.conv1(x))
        x = self.pool(x)
        x = self.sigmoid(self.conv2(x))
        x = self.pool(x)
        x = x.view(-1, 16*5*5)
        x = self.sigmoid(self.fc1(x))
        x = self.sigmoid(self.fc2(x))
        x = self.fc3(x)
        return x

lenet_model = LeNet5()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(lenet_model.parameters(), lr=0.001)

epochs = 20  # the logged output below was produced with 20 epochs
for epoch in range(epochs):
    lenet_model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = lenet_model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

lenet_model.eval()
correct = 0
total = 0
with torch.no_grad():
    for data, target in test_loader:
        outputs = lenet_model(data)
        _, predicted = torch.max(outputs.data, 1)
        total += target.size(0)
        correct += (predicted == target).sum().item()

accuracy = 100 * correct / total
print(f'Test Accuracy: {accuracy:.2f}%')


Epoch [1/20], Loss: 0.4986
Epoch [2/20], Loss: 0.2091
Epoch [3/20], Loss: 0.0469
Epoch [4/20], Loss: 0.1186
Epoch [5/20], Loss: 0.1849
Epoch [6/20], Loss: 0.0101
Epoch [7/20], Loss: 0.0068
Epoch [8/20], Loss: 0.0203
Epoch [9/20], Loss: 0.0669
Epoch [10/20], Loss: 0.2684
Epoch [11/20], Loss: 0.0571
Epoch [12/20], Loss: 0.0137
Epoch [13/20], Loss: 0.0179
Epoch [14/20], Loss: 0.0556
Epoch [15/20], Loss: 0.0048
Epoch [16/20], Loss: 0.0060
Epoch [17/20], Loss: 0.1018
Epoch [18/20], Loss: 0.0076
Epoch [19/20], Loss: 0.1067
Epoch [20/20], Loss: 0.0169
Test Accuracy: 98.70%


### Part E and F: Residual Neural Networks (ResNet)
Advantages:
- Addressing Vanishing/Exploding Gradient Problem: Traditional deep networks suffer from vanishing or exploding gradients as they become deeper. ResNets address this issue by using residual connections, allowing gradients to flow directly through the network.

- Ease of Training: Residual connections make it easier to train very deep networks (e.g., hundreds of layers) by providing shortcut paths for gradients during backpropagation.

- Improved Accuracy: ResNets have achieved state-of-the-art performance on various benchmarks, including ImageNet, by allowing the construction of much deeper networks without degradation in accuracy.

- Effective Feature Learning: The network can learn more abstract and high-level features as depth increases, leading to improved performance on complex tasks.

#### **Unique Components of ResNet Compared to Traditional CNNs:**
Residual Blocks: The key component of ResNet is the residual block, which includes identity connections (shortcut connections) that bypass one or more layers. This allows the network to learn residual functions rather than directly learning the underlying mapping.<br>
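
In symbols, following the usual formulation of a residual block (the notation $x$, $F$, $H$ and $\mathcal{L}$ is introduced here for illustration and does not appear in the assignment): if $H(x)$ is the mapping the block should represent, the stacked layers learn the residual $F$ and the shortcut adds the input back.

$$
y = F(x) + x, \qquad F(x) = H(x) - x
$$

Differentiating a loss $\mathcal{L}$ through the block gives

$$
\frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\left(I + \frac{\partial F}{\partial x}\right)
$$

The identity term $I$ passes the gradient through unchanged even when $\partial F / \partial x$ is small, which is the sense in which the shortcut helps against vanishing gradients.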

Identity Connection: The identity connection (shortcut) allows the input to be directly added to the output of the convolutional layers, ensuring that gradients can flow smoothly through the network.<br>
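
A minimal sketch of such a block in PyTorch, assuming the input and output have the same number of channels and spatial size so the identity can be added directly. The class name `ResidualBlock` is hypothetical. The blocks inside torchvision's `resnet18` are similar in spirit but also handle down-sampling shortcuts, which this sketch omits.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added to the block's input."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))  # first half of F(x)
        residual = self.bn2(self.conv2(residual))      # second half of F(x)
        return self.relu(residual + x)                 # identity shortcut: F(x) + x


block = ResidualBlock(channels=8)
x = torch.randn(2, 8, 14, 14)
y = block(x)
print(y.shape)  # same shape as the input
```

If the learned residual is zero, the block reduces to `relu(x)`, so adding more blocks does not have to make the representation worse.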


In [5]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# Modify ResNet-18 for MNIST
class ModifiedResNet18(nn.Module):
    def __init__(self, num_classes=10):
        super(ModifiedResNet18, self).__init__()
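        self.resnet = models.resnet18(weights=None)  # randomly initialized, no pretrained weights
        # MNIST images have one channel, so replace the 3-channel stem convolution.
        self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Replace the 1000-class ImageNet head with a 10-class output layer.
        self.resnet.fc = nn.Linear(self.resnet.fc.in_features, num_classes)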

    def forward(self, x):
        return self.resnet(x)

resnet_model = ModifiedResNet18()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(resnet_model.parameters(), lr=0.001)

epochs = 10
for epoch in range(epochs):
    resnet_model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = resnet_model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

resnet_model.eval()
correct = 0
total = 0
with torch.no_grad():
    for data, target in test_loader:
        outputs = resnet_model(data)
        _, predicted = torch.max(outputs.data, 1)
        total += target.size(0)
        correct += (predicted == target).sum().item()

accuracy = 100 * correct / total
print(f'Test Accuracy: {accuracy:.2f}%')

Epoch [1/10], Loss: 0.2905
Epoch [2/10], Loss: 0.1376
Epoch [3/10], Loss: 0.0133
Epoch [4/10], Loss: 0.0002
Epoch [5/10], Loss: 0.0213
Epoch [6/10], Loss: 0.0059
Epoch [7/10], Loss: 0.0631
Epoch [8/10], Loss: 0.1254
Epoch [9/10], Loss: 0.0005
Epoch [10/10], Loss: 0.0021
Test Accuracy: 99.04%


### Part G

In [1]:
# Imports are repeated here so the cell runs on its own; without them it
# fails with "NameError: name 'transforms' is not defined".
import torch
import matplotlib.pyplot as plt
import torchvision.transforms.functional as TF
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

# Select an image
sample_idx = 0
sample_image, sample_label = test_dataset[sample_idx]

# original image
plt.imshow(sample_image.squeeze(), cmap='gray')
plt.title(f'Original Image - Label: {sample_label}')
plt.show()

In [None]:
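shift_pixels = 5
# Fill the vacated columns with the normalized background value instead of the
# default 0, which would show up as a gray band after Normalize.
background = (0.0 - 0.1307) / 0.3081
shifted_image = TF.affine(sample_image, angle=0, translate=(shift_pixels, 0), scale=1, shear=0, fill=[background])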

# shifted image
plt.imshow(shifted_image.squeeze(), cmap='gray')
plt.title(f'Shifted Image - {shift_pixels} pixels to the right')
plt.show()

In [None]:
mlp_model.eval()
lenet_model.eval()
resnet_model.eval()

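shifted_image = shifted_image.unsqueeze(0)  # add batch dimension: (1, 1, 28, 28)

# The MLP was trained with Normalize((0.5,), (0.5,)), so undo the
# (0.1307, 0.3081) normalization and re-apply the MLP's statistics.
mlp_input = ((shifted_image * 0.3081 + 0.1307) - 0.5) / 0.5

# Predict using MLP
with torch.no_grad():
    mlp_prediction = mlp_model(mlp_input)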
    mlp_predicted_digit = torch.argmax(mlp_prediction, dim=1).item()

# Predict using LeNet
with torch.no_grad():
    lenet_prediction = lenet_model(shifted_image)
    lenet_predicted_digit = torch.argmax(lenet_prediction, dim=1).item()

# Predict using ResNet
with torch.no_grad():
    resnet_prediction = resnet_model(shifted_image)
    resnet_predicted_digit = torch.argmax(resnet_prediction, dim=1).item()

print(f'MLP Predicted Digit: {mlp_predicted_digit}')
print(f'LeNet Predicted Digit: {lenet_predicted_digit}')
print(f'ResNet Predicted Digit: {resnet_predicted_digit}')
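
A single 5-pixel shift is one data point. A possible extension, sketched below, sweeps over several horizontal shifts and records the predicted class for each. The function name `predictions_under_shifts` and the stand-in model are assumptions, not part of the assignment code. `torch.roll` is used for brevity; it wraps pixels around the border, which is acceptable here only on the assumption that the image border is background.

```python
import torch
import torch.nn as nn


def predictions_under_shifts(model, image, shifts):
    """Return {shift: predicted class} for horizontal shifts of a (1, H, W) image."""
    model.eval()
    results = {}
    with torch.no_grad():
        for s in shifts:
            # Roll along the width axis, then add a batch dimension.
            shifted = torch.roll(image, shifts=s, dims=-1).unsqueeze(0)
            results[s] = model(shifted).argmax(dim=1).item()
    return results


# Stand-in model and image so the sketch runs on its own;
# swap in a trained model and a real test image.
dummy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
dummy_image = torch.zeros(1, 28, 28)
results = predictions_under_shifts(dummy_model, dummy_image, shifts=range(-5, 6))
print(results)
```

Comparing how many shifts each model tolerates before its prediction changes gives a more informative picture than one shifted sample.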
