## Task 1 (10 Points)

Select padding sizes so that each convolution produces the output shape given in the comment above it:

In [None]:
import torch

N = 4
C = 3
C_out = 10
H = 8
W = 16

x = torch.ones((N, C, H, W))

# torch.Size([4, 10, 8, 16])
out1 = torch.nn.Conv2d(C, C_out, kernel_size=(3, 3), padding=(1, 1))(x)
print(out1.shape) # for self-test

# torch.Size([4, 10, 8, 16])
out2 = torch.nn.Conv2d(C, C_out, kernel_size=(5, 5), padding=(2, 2))(x)
print(out2.shape) # for self-test

# torch.Size([4, 10, 8, 16])
out3 = torch.nn.Conv2d(C, C_out, kernel_size=(7, 7), padding=(3, 3))(x)
print(out3.shape) # for self-test

# torch.Size([4, 10, 8, 16])
out4 = torch.nn.Conv2d(C, C_out, kernel_size=(9, 9), padding=(4, 4))(x)
print(out4.shape) # for self-test

# torch.Size([4, 10, 8, 16])
out5 = torch.nn.Conv2d(C, C_out, kernel_size=(3, 5), padding=(1, 2))(x)
print(out5.shape) # for self-test

# torch.Size([4, 10, 22, 30])
out6 = torch.nn.Conv2d(C, C_out, kernel_size=(3, 3), padding=(8, 8))(x)
print(out6.shape) # for self-test

# torch.Size([4, 10, 7, 15])
out7 = torch.nn.Conv2d(C, C_out, kernel_size=(4, 4), padding=(1, 1))(x)
print(out7.shape) # for self-test

# torch.Size([4, 10, 9, 17])
out8 = torch.nn.Conv2d(C, C_out, kernel_size=(2, 2), padding=(1, 1))(x)
print(out8.shape) # for self-test
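Each padding above follows from the output-size formula that `torch.nn.Conv2d` applies along each spatial dimension. With dilation 1 it is out = floor((in + 2·padding - kernel) / stride) + 1. For stride 1 and an odd kernel k, padding = (k - 1) / 2 keeps the size unchanged. The sketch below checks all eight cases without building any layers. The helper name `conv_out_size` is hypothetical.

```python
def conv_out_size(size, kernel, padding, stride=1):
    """Output length of a convolution along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


H, W = 8, 16
# (kernel_size, padding) pairs from the cells above, each as (height, width)
configs = [
    ((3, 3), (1, 1)),
    ((5, 5), (2, 2)),
    ((7, 7), (3, 3)),
    ((9, 9), (4, 4)),
    ((3, 5), (1, 2)),
    ((3, 3), (8, 8)),
    ((4, 4), (1, 1)),
    ((2, 2), (1, 1)),
]

# Apply the formula separately to height and width
shapes = [
    (conv_out_size(H, k[0], p[0]), conv_out_size(W, k[1], p[1]))
    for k, p in configs
]
for (k, p), shape in zip(configs, shapes):
    print(f"kernel={k}, padding={p} -> (H, W) = {shape}")
```

You can also read the formula backwards. For stride 1, padding = (out - in + kernel - 1) / 2. For example, out6 needs (22 - 8 + 3 - 1) / 2 = 8.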

## Task 2 (40 Points)

Implement the architecture described in the article (SqueezeNet, summarised below).
To check that it works, train and evaluate it on any suitable dataset.

### Architectural Design Strategies
**Strategy 1.** Replace 3×3 filters with 1×1 filters

Given a budget of a certain number of convolution filters, we can choose to make the majority of these filters 1×1, since a 1×1 filter has 9× fewer parameters than a 3×3 filter.
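You can check the 9× figure by counting the weights of two `torch.nn.Conv2d` layers that have the same number of input channels and filters. The channel counts below (64 in, 64 out) are arbitrary. Biases are disabled so that only filter weights are counted.

```python
import torch.nn as nn


def num_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


# Same budget of 64 filters over 64 input channels; only the kernel size differs
conv3x3 = nn.Conv2d(64, 64, kernel_size=3, bias=False)
conv1x1 = nn.Conv2d(64, 64, kernel_size=1, bias=False)

p3, p1 = num_params(conv3x3), num_params(conv1x1)
print(p3, p1, p3 // p1)  # 36864 4096 9
```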

**Strategy 2.** Decrease the number of input channels to 3×3 filters

Consider a convolution layer that consists entirely of 3×3 filters. The total number of parameters in this layer (ignoring biases) is:

(number of input channels) × (number of filters) × (3×3)

Using fewer 3×3 filters (Strategy 1) is therefore not enough on its own to keep the parameter count small. We also need to reduce the number of input channels those filters see. We can do this with squeeze layers, described in the next section.

**Strategy 3.** Downsample late in the network so that convolution layers have large activation maps

The intuition is that large activation maps (due to delayed downsampling) can lead to higher classification accuracy.

### Fire Module
![](https://miro.medium.com/v2/resize:fit:930/format:webp/1*ONk0HfLLjDcUhUjuu8iq1w.png)

A Fire module consists of a squeeze convolution layer (which has only 1×1 filters) feeding into an expand layer that has a mix of 1×1 and 3×3 convolution filters.

There are three tunable dimensions (hyperparameters) in a Fire module: s1×1, e1×1, and e3×3.

s1×1: the number of 1×1 filters in the squeeze layer.

e1×1 and e3×3: the number of 1×1 and 3×3 filters in the expand layer.

When we use Fire modules, we set s1×1 to be less than (e1×1 + e3×3). This way the squeeze layer limits the number of input channels to the 3×3 filters, as per Strategy 2 in the previous section.
To me, it is quite similar to the Inception module.
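The sketch below makes Strategies 1 and 2 concrete by counting the parameters (weights plus biases) of a Fire module. It uses s1×1 = 16 and e1×1 = e3×3 = 64 on 96 input channels, which matches the first Fire module of the network in Step 1. It then compares that count with a single plain 3×3 convolution that produces the same 128 output channels. The helper names are hypothetical. The arithmetic is (input channels) × (filters) × (kernel area), plus one bias per filter.

```python
def conv_params(in_channels, filters, k):
    """Weights + biases of a k x k convolution layer."""
    return in_channels * filters * k * k + filters


def fire_params(in_channels, s1x1, e1x1, e3x3):
    """Parameters of a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand."""
    squeeze = conv_params(in_channels, s1x1, 1)
    expand1x1 = conv_params(s1x1, e1x1, 1)
    expand3x3 = conv_params(s1x1, e3x3, 3)  # 3x3 filters only see s1x1 channels
    return squeeze + expand1x1 + expand3x3


fire = fire_params(96, 16, 64, 64)   # output: 64 + 64 = 128 channels
plain = conv_params(96, 128, 3)      # plain 3x3 conv with 128 output channels
print(fire, plain, round(plain / fire, 1))  # 11920 110720 9.3
```

Both layers output 128 channels, but the Fire module needs roughly 9× fewer parameters. Most of the saving comes from its 3×3 filters seeing 16 input channels instead of 96.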

![](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*y87bqk95D-IndWdHM_K9-g.png)
![](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*XQGAKZb8kjoF_1lSXeIQxg.png)

## Step 0. Data preparation.

In [None]:
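from google.colab import drive
drive.mount('/content/drive')  # mount Google Drive so the dataset folder is reachable
!ls "/content/drive/My Drive/Neural Networks/archive/"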

cars_annos.mat	cars_test  cars_train


In [None]:
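import os
import scipy.io

# Path to the annotation file (needed here because this cell runs before the data-loading cell)
anno_path = os.path.join('/content/drive/My Drive/Neural Networks/archive/', 'cars_annos.mat')
mat_data = scipy.io.loadmat(anno_path)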

# Extract class names
class_names = [name[0] for name in mat_data['class_names'][0]]

for idx, name in enumerate(class_names, start=1):
    print(f"{idx}. {name}")


1. AM General Hummer SUV 2000
2. Acura RL Sedan 2012
3. Acura TL Sedan 2012
4. Acura TL Type-S 2008
5. Acura TSX Sedan 2012
6. Acura Integra Type R 2001
7. Acura ZDX Hatchback 2012
8. Aston Martin V8 Vantage Convertible 2012
9. Aston Martin V8 Vantage Coupe 2012
10. Aston Martin Virage Convertible 2012
11. Aston Martin Virage Coupe 2012
12. Audi RS 4 Convertible 2008
13. Audi A5 Coupe 2012
14. Audi TTS Coupe 2012
15. Audi R8 Coupe 2012
16. Audi V8 Sedan 1994
17. Audi 100 Sedan 1994
18. Audi 100 Wagon 1994
19. Audi TT Hatchback 2011
20. Audi S6 Sedan 2011
21. Audi S5 Convertible 2012
22. Audi S5 Coupe 2012
23. Audi S4 Sedan 2012
24. Audi S4 Sedan 2007
25. Audi TT RS Coupe 2012
26. BMW ActiveHybrid 5 Sedan 2012
27. BMW 1 Series Convertible 2012
28. BMW 1 Series Coupe 2012
29. BMW 3 Series Sedan 2012
30. BMW 3 Series Wagon 2012
31. BMW 6 Series Convertible 2007
32. BMW X5 SUV 2007
33. BMW X6 SUV 2012
34. BMW M3 Coupe 2012
35. BMW M5 Sedan 2010
36. BMW M6 Convertible 2010
37. BMW X3 SUV 20

In [None]:
from google.colab import drive
import os
import scipy.io
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, transforms
from PIL import Image
import torch.nn as nn
import torch.nn.functional as F

data_path = '/content/drive/My Drive/Neural Networks/archive/'
train_data_path = os.path.join(data_path, 'cars_train/cars_train')
test_data_path = os.path.join(data_path, 'cars_test/cars_test')
anno_path = os.path.join(data_path, 'cars_annos.mat')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Lambda(lambda image: image.convert("RGB")),
    transforms.ToTensor(),
])

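class CustomImageDataset(Dataset):
    def __init__(self, directory, anno_path, transform=None):
        self.directory = directory
        self.transform = transform

        # Load annotations: image file name -> zero-based class index (class ids in the .mat file start at 1)
        annotations = scipy.io.loadmat(anno_path)['annotations'][0]
        self.labels = {anno[0][0].split('/')[-1]: int(anno[-2][0][0]) - 1 for anno in annotations}

        # Keep only images that have an annotation; defaulting missing labels to class 0
        # would silently corrupt both training and the reported accuracy
        self.image_list = [name for name in sorted(os.listdir(self.directory)) if name in self.labels]
        if not self.image_list:
            raise RuntimeError(f"No annotated images found in {self.directory}")

    def __len__(self):
        return len(self.image_list)

    def __getitem__(self, idx):
        img_name = os.path.join(self.directory, self.image_list[idx])
        image = Image.open(img_name)
        if self.transform:
            image = self.transform(image)
        label = self.labels[self.image_list[idx]]
        return image, label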

# Load datasets
train_dataset = CustomImageDataset(train_data_path, anno_path, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

test_dataset = CustomImageDataset(test_data_path, anno_path, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

## Step 1. Neural network architecture

In [None]:
import torch.nn as nn
import torch.nn.functional as F

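class Fire(nn.Module):
    """Fire module: a 1x1 squeeze layer feeding parallel 1x1 and 3x3 expand layers."""
    def __init__(self, in_channels, s1x1, e1x1, e3x3):
        super(Fire, self).__init__()
        # Squeeze: reduce the number of channels seen by the 3x3 filters (Strategy 2)
        self.squeeze = nn.Conv2d(in_channels, s1x1, kernel_size=1)
        # Expand: mix of 1x1 (Strategy 1) and 3x3 filters; padding=1 keeps H and W unchanged
        self.expand1x1 = nn.Conv2d(s1x1, e1x1, kernel_size=1)
        self.expand3x3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.relu(self.squeeze(x))
        # Concatenate both expand outputs along the channel dimension -> e1x1 + e3x3 channels
        return torch.cat([F.relu(self.expand1x1(x)), F.relu(self.expand3x3(x))], 1)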

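class SqueezeNet(nn.Module):
    def __init__(self, num_classes=196):  # 196 car classes in the dataset above
        super(SqueezeNet, self).__init__()
        self.features = nn.Sequential(
            # conv1
            nn.Conv2d(3, 96, kernel_size=7, stride=2),
            nn.ReLU(inplace=True),
            # Max pooling only after conv1, fire4 and fire8: downsampling happens late (Strategy 3)
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),
            # Fire(in_channels, s1x1, e1x1, e3x3); output channels = e1x1 + e3x3
            Fire(96, 16, 64, 64),     # fire2
            Fire(128, 16, 64, 64),    # fire3
            Fire(128, 32, 128, 128),  # fire4
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),
            Fire(256, 32, 128, 128),  # fire5
            Fire(256, 48, 192, 192),  # fire6
            Fire(384, 48, 192, 192),  # fire7
            Fire(384, 64, 256, 256),  # fire8
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),
            Fire(512, 64, 256, 256),  # fire9
        )
        # No fully connected layers: a 1x1 conv (conv10) produces one map per class,
        # and global average pooling turns each map into a class score
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Conv2d(512, num_classes, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1))
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return torch.flatten(x, 1)  # (N, num_classes, 1, 1) -> (N, num_classes)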

model = SqueezeNet().to(device)
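Strategy 3 (late downsampling) shows up clearly if you trace the spatial size of a 224×224 input through `features`. Only conv1 and the three max-pooling layers change H and W. Every Fire module keeps them, because its 3×3 expand filters use padding=1. The sketch below applies PyTorch's documented output-size rules for `Conv2d` and for `MaxPool2d` with `ceil_mode=True` and no padding. The helper names are hypothetical.

```python
import math


def conv_out(size, kernel, stride=1, padding=0):
    # Conv2d output length (dilation 1)
    return (size + 2 * padding - kernel) // stride + 1


def maxpool_ceil_out(size, kernel, stride):
    # MaxPool2d output length with ceil_mode=True and padding=0
    out = math.ceil((size - kernel) / stride) + 1
    # PyTorch drops a last window that would start outside the input
    if (out - 1) * stride >= size:
        out -= 1
    return out


sizes = {"input": 224}
sizes["conv1"] = conv_out(sizes["input"], kernel=7, stride=2)
sizes["pool_after_conv1"] = maxpool_ceil_out(sizes["conv1"], kernel=3, stride=2)            # fire2-fire4 run here
sizes["pool_after_fire4"] = maxpool_ceil_out(sizes["pool_after_conv1"], kernel=3, stride=2)  # fire5-fire8 run here
sizes["pool_after_fire8"] = maxpool_ceil_out(sizes["pool_after_fire4"], kernel=3, stride=2)  # fire9 and conv10 run here
for name, s in sizes.items():
    print(f"{name:>16}: {s}x{s}")
```

Fire2 to fire4 run at 54×54, fire5 to fire8 at 27×27, and fire9 at 13×13. The global average pooling then collapses the 13×13 class maps. So `model(torch.zeros(1, 3, 224, 224))` should return scores of shape (1, 196).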


## Step 2. Loss Function

In [None]:
loss_func = nn.CrossEntropyLoss()
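`nn.CrossEntropyLoss` expects unnormalised class scores of shape (N, C) and integer class indices of shape (N,). It applies log-softmax internally, which is why the model's `forward` returns the flattened scores without a softmax. The toy scores below are made up for illustration. The check compares the built-in loss with log-softmax followed by the mean negative log-likelihood of the true class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

loss_func = nn.CrossEntropyLoss()

# Hypothetical batch: 2 samples, 3 classes (raw scores, no softmax applied)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])  # class indices, not one-hot vectors

builtin = loss_func(logits, labels)
# Same quantity by hand: mean negative log-probability assigned to the true class
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), labels].mean()
print(builtin.item(), manual.item())
```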


## Step 3. Optimizer

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)


## Step 4. Train Loop

In [None]:
num_epochs = 3
model.train()
for epoch in range(num_epochs):
    epoch_loss = 0.0
    correct = 0
    total = 0

    for batch_idx, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        loss = loss_func(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()

    accuracy = 100 * correct / total
    print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {epoch_loss / len(train_loader):.4f}, Accuracy: {accuracy:.2f}%")

print("Training completed.")

model.eval()
correct_test = 0
total_test = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = outputs.max(1)
        total_test += labels.size(0)
        correct_test += predicted.eq(labels).sum().item()

test_acc = 100. * correct_test / total_test
print(f"Test Accuracy: {test_acc:.2f}%")
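The evaluation code above runs only once, after training. Wrapping it in a function lets you call it after every epoch to watch for overfitting. The `evaluate` name is hypothetical, and its body mirrors the loop above. The usage example runs it on a tiny synthetic `TensorDataset` rather than on the car images.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return top-1 accuracy (in %) of `model` on `loader`."""
    was_training = model.training
    model.eval()  # disable dropout
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    if was_training:
        model.train()  # restore the previous mode so training can continue
    return 100.0 * correct / total


# Synthetic check: a linear "model" whose bias makes it always predict class 0
toy = nn.Linear(4, 2)
with torch.no_grad():
    toy.weight.zero_()
    toy.bias.copy_(torch.tensor([1.0, 0.0]))
data = TensorDataset(torch.randn(8, 4), torch.tensor([0, 0, 0, 0, 1, 1, 1, 1]))
acc = evaluate(toy, DataLoader(data, batch_size=3), torch.device("cpu"))
print(acc)  # 50.0
```

Inside the train loop, call `evaluate(model, test_loader, device)` at the end of each epoch to report test accuracy alongside the training figures.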