## Notes

In [None]:
"""
Notes from the paper:

The AlexNet paper used convolutional neural networks to win the ImageNet competition (ILSVRC) in 2012.

Goal:
Image Classification

Dataset Used:
ImageNet-1000
ImageNet is a dataset of over 15 million labelled high-resolution images in roughly 22,000 categories. ("High-resolution" is relative: MNIST was 28 x 28, while the images here are downsampled to 256 x 256.)
The 1000-category subset was used for this paper, which is still over 1.2 million training images lol. That's A LOT. This might be the biggest dataset I have ever worked with.

Method Used:
Convolution layers, occasionally followed by max-pooling layers. The final layers are fully connected layers, with Dropout layers in between.
Ends with a 1000-way softmax layer.

Convolution dimension calculation:
https://madebyollin.github.io/convnet-calculator/

Architecture:
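The input is originally 256 x 256, but is cropped to 224 x 224, and then fed into the network (data augmentation).
Note: 224 does not quite work with the layer sizes below. With an 11 x 11 kernel, stride 4 and padding 0, a 224 x 224 input gives floor((224 - 11) / 4) + 1 = 54, not 55. The 55 x 55 output needs a 227 x 227 input, which is the crop size the code further down uses.
Also note: the filter and neuron counts listed under GPU1 and GPU2 are the totals for each layer. In the paper each GPU holds half of them (e.g. 48 of the 96 first-layer kernels).
Input (3, 224, 224) in the paper, (3, 227, 227) in this notebook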
DONE
-------------------------------------------------------------------------------------------------------------------------------------------
- Convolutional Layer 1
    GPU1 - (96 filters, 11 x 11, stride 4, padding 0) -> (output dim: (96, 55, 55))
    GPU2 - (96 filters, 11 x 11, stride 4, padding 0) -> (output dim: (96, 55, 55))
- Max Pooling Layer 1
    GPU1 - (3 x 3, stride 2) -> (output dim: (96, 27, 27))
    GPU2 - (3 x 3, stride 2) -> (output dim: (96, 27, 27))
- Convolutional Layer 2
    GPU1 - (256 filters, 5 x 5, stride 1, padding 2) -> (output dim: (256, 27, 27))
    GPU2 - (256 filters, 5 x 5, stride 1, padding 2) -> (output dim: (256, 27, 27))
- Max Pooling Layer 2
    GPU1 - (3 x 3, stride 2) -> (output dim: (256, 13, 13))
    GPU2 - (3 x 3, stride 2) -> (output dim: (256, 13, 13))
- Convolutional Layer 3
    GPU1 - (384 filters, 3 x 3, stride 1, padding 1) -> (output dim: (384, 13, 13))
    GPU2 - (384 filters, 3 x 3, stride 1, padding 1) -> (output dim: (384, 13, 13))
- Convolutional Layer 4
    GPU1 - (384 filters, 3 x 3, stride 1, padding 1) -> (output dim: (384, 13, 13))
    GPU2 - (384 filters, 3 x 3, stride 1, padding 1) -> (output dim: (384, 13, 13))
- Convolutional Layer 5
    GPU1 - (256 filters, 3 x 3, stride 1, padding 1) -> (output dim: (256, 13, 13))
    GPU2 - (256 filters, 3 x 3, stride 1, padding 1) -> (output dim: (256, 13, 13))
- Max Pooling Layer 3
    GPU1 - (3 x 3, stride 2) -> (output dim: (256, 6, 6))
    GPU2 - (3 x 3, stride 2) -> (output dim: (256, 6, 6)) # Flattened this becomes 9216 neurons
- Dropout Layer 1
    Dropout 0.5
- Fully Connected Layer 1
    GPU1 - (9216 neurons) -> (output dim: (4096))
    GPU2 - (9216 neurons) -> (output dim: (4096))
- Fully Connected Layer 2
    GPU1 - (4096 neurons) -> (output dim: (4096))
    GPU2 - (4096 neurons) -> (output dim: (4096))
- Fully Connected Layer 3
    4096 Neurons -> (output dim: (1000))
- Softmax Layer
    1000 Neurons -> (output dim: (1000))
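
The output sizes in the list above can be checked with the usual formula floor((n + 2p - k) / s) + 1. A minimal sketch in plain Python, assuming a 227 x 227 input (the crop size the code later in this notebook uses); `conv_out` is a helper name made up for this example:

```python
def conv_out(n, k, s=1, p=0):
    # Output width of a convolution or pooling layer:
    # n = input width, k = kernel width, s = stride, p = padding
    return (n + 2 * p - k) // s + 1

n = 227
n = conv_out(n, k=11, s=4)      # conv1 -> 55
n = conv_out(n, k=3, s=2)       # pool1 -> 27
n = conv_out(n, k=5, p=2)       # conv2 -> 27
n = conv_out(n, k=3, s=2)       # pool2 -> 13
n = conv_out(n, k=3, p=1)       # conv3 -> 13 (conv4 and conv5 behave the same)
n = conv_out(n, k=3, s=2)       # pool3 -> 6
flat = 256 * n * n              # 9216 inputs to the first fully connected layer
print(n, flat)
```

Calling `conv_out(224, k=11, s=4)` returns 54, which is why a 224 x 224 input does not reproduce the 55 x 55 feature map.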

Keep in mind this:

```
Now we are ready to describe the overall architecture of our CNN. As depicted in Figure 2, the net
contains eight layers with weights; the first five are convolutional and the remaining three are fully-connected. The output of the last fully-connected layer is fed to a 1000-way softmax which produces
a distribution over the 1000 class labels. Our network maximizes the multinomial logistic regression
objective, which is equivalent to maximizing the average across training cases of the log-probability
of the correct label under the prediction distribution.
The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel
maps in the previous layer which reside on the same GPU (see Figure 2). The kernels of the third
convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully-connected layers are connected to all neurons in the previous layer. Response-normalization layers
follow the first and second convolutional layers. Max-pooling layers, of the kind described in Section
3.4, follow both response-normalization layers as well as the fifth convolutional layer. The ReLU
non-linearity is applied to the output of every convolutional and fully-connected layer.
```

Training Parameters / Hyperparameters:
- Data Augmentation: Random 224x224 patches are cropped from the 256x256 images, along with their horizontal reflections.
    - This means that at test time, the image is resized to 256x256, five 224x224 patches (the four corners and the center) are cropped from it, each patch is also mirrored, and the network is run on all ten. The final prediction is the average of the 10 softmax outputs.
- They wrote a CUDA ConvNet implementation from scratch to train the network. BASED
- SGD with momentum 0.9 and weight decay 0.0005. The learning rate starts at 0.01 and is divided by 10 whenever the validation error stops improving; training runs for roughly 90 epochs.
- Batch Size: 128
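
The test-time procedure described above (five crops plus their mirrors) can be sketched by listing the crop boxes. This is only an illustration in plain Python; `ten_crop_boxes`, `average`, and the (left, top, right, bottom, mirrored) tuple layout are made up for this example, not taken from the paper's code:

```python
def ten_crop_boxes(size=256, crop=224):
    # Top-left offsets for the four corners and the center
    far = size - crop            # 32 for 256 -> 224
    mid = far // 2               # 16
    offsets = [(0, 0), (far, 0), (0, far), (far, far), (mid, mid)]
    boxes = []
    for left, top in offsets:
        # Each crop is used twice: as-is and horizontally mirrored
        for mirrored in (False, True):
            boxes.append((left, top, left + crop, top + crop, mirrored))
    return boxes

def average(predictions):
    # Element-wise mean of several equal-length probability vectors
    count = len(predictions)
    return [sum(values) / count for values in zip(*predictions)]

boxes = ten_crop_boxes()
print(len(boxes))     # 10 views per test image
print(boxes[8])       # (16, 16, 240, 240, False), the unmirrored center crop
```

In practice torchvision's `TenCrop` transform produces the same ten views, so the sketch is mainly useful for seeing where the crops land.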

Metrics Defined:
Error Rate
- Number of misclassified test samples / Total number of test samples

Top-1 vs top-5 error rate
- Top-1 error rate is the fraction of test samples for which the correct label is not the model's single most probable prediction
- Top-5 error rate is the fraction of test samples for which the correct label is not among the five most probable predictions
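
A small worked example of the two metrics, written in plain Python so it runs without PyTorch. The scores and targets are toy values invented for illustration, and `top_k_error` is a helper name made up here:

```python
def top_k_error(scores, targets, k):
    # scores: one list of class scores per sample
    # targets: the correct class index for each sample
    wrong = 0
    for row, target in zip(scores, targets):
        # Indices of the k highest-scoring classes for this sample
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if target not in top_k:
            wrong += 1
    return wrong / len(targets)

scores = [
    [0.1, 0.6, 0.3],    # highest score: class 1
    [0.5, 0.2, 0.3],    # highest score: class 0
    [0.2, 0.3, 0.5],    # highest score: class 2
    [0.4, 0.35, 0.25],  # highest score: class 0
]
targets = [1, 2, 2, 2]
print(top_k_error(scores, targets, k=1))  # 0.5
print(top_k_error(scores, targets, k=2))  # 0.25
```

The second sample is wrong at k=1 but correct at k=2, which is exactly the gap between the two error rates.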

Results (ILSVRC-2010 test set):
- Top-1 error rate: 37.5%
- Top-5 error rate: 17.0%
"""

In [8]:
!wget "http://aisdatasets.informatik.uni-freiburg.de/freiburg_groceries_dataset/freiburg_groceries_dataset.tar.gz"

--2025-03-05 10:05:03--  http://aisdatasets.informatik.uni-freiburg.de/freiburg_groceries_dataset/freiburg_groceries_dataset.tar.gz
Resolving aisdatasets.informatik.uni-freiburg.de (aisdatasets.informatik.uni-freiburg.de)... 132.230.105.132
Connecting to aisdatasets.informatik.uni-freiburg.de (aisdatasets.informatik.uni-freiburg.de)|132.230.105.132|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 541562880 (516M) [application/x-gzip]
Saving to: ‘freiburg_groceries_dataset.tar.gz’


2025-03-05 10:05:29 (19.8 MB/s) - ‘freiburg_groceries_dataset.tar.gz’ saved [541562880/541562880]



In [9]:
!tar -xvf "freiburg_groceries_dataset.tar.gz"

images/
images/BEANS/
images/BEANS/BEANS0000.png
images/BEANS/BEANS0001.png
images/BEANS/BEANS0002.png
images/BEANS/BEANS0003.png
images/BEANS/BEANS0004.png
images/BEANS/BEANS0005.png
images/BEANS/BEANS0006.png
images/BEANS/BEANS0007.png
images/BEANS/BEANS0008.png
images/BEANS/BEANS0009.png
images/BEANS/BEANS0010.png
images/BEANS/BEANS0011.png
images/BEANS/BEANS0012.png
images/BEANS/BEANS0013.png
images/BEANS/BEANS0014.png
images/BEANS/BEANS0015.png
images/BEANS/BEANS0016.png
images/BEANS/BEANS0017.png
images/BEANS/BEANS0018.png
images/BEANS/BEANS0019.png
images/BEANS/BEANS0020.png
images/BEANS/BEANS0021.png
images/BEANS/BEANS0022.png
images/BEANS/BEANS0023.png
images/BEANS/BEANS0024.png
images/BEANS/BEANS0025.png
images/BEANS/BEANS0026.png
images/BEANS/BEANS0027.png
images/BEANS/BEANS0028.png
images/BEANS/BEANS0029.png
images/BEANS/BEANS0030.png
images/BEANS/BEANS0031.png
images/BEANS/BEANS0032.png
images/BEANS/BEANS0033.png
images/BEANS/BEANS0034.png
images/BEANS/BEANS0035.png
images

In [10]:
# Import image paths in a way where we have a dataframe with the structure --> Path|Folder_name (aka label)
import glob
import pandas as pd

In [11]:
glob.glob("images/*/*.png")

['images/CHIPS/CHIPS0056.png',
 'images/CHIPS/CHIPS0152.png',
 'images/CHIPS/CHIPS0046.png',
 'images/CHIPS/CHIPS0172.png',
 'images/CHIPS/CHIPS0018.png',
 'images/CHIPS/CHIPS0072.png',
 'images/CHIPS/CHIPS0115.png',
 'images/CHIPS/CHIPS0016.png',
 'images/CHIPS/CHIPS0063.png',
 'images/CHIPS/CHIPS0134.png',
 'images/CHIPS/CHIPS0169.png',
 'images/CHIPS/CHIPS0099.png',
 'images/CHIPS/CHIPS0132.png',
 'images/CHIPS/CHIPS0163.png',
 'images/CHIPS/CHIPS0114.png',
 'images/CHIPS/CHIPS0039.png',
 'images/CHIPS/CHIPS0100.png',
 'images/CHIPS/CHIPS0118.png',
 'images/CHIPS/CHIPS0095.png',
 'images/CHIPS/CHIPS0071.png',
 'images/CHIPS/CHIPS0148.png',
 'images/CHIPS/CHIPS0068.png',
 'images/CHIPS/CHIPS0061.png',
 'images/CHIPS/CHIPS0087.png',
 'images/CHIPS/CHIPS0145.png',
 'images/CHIPS/CHIPS0167.png',
 'images/CHIPS/CHIPS0011.png',
 'images/CHIPS/CHIPS0130.png',
 'images/CHIPS/CHIPS0034.png',
 'images/CHIPS/CHIPS0041.png',
 'images/CHIPS/CHIPS0112.png',
 'images/CHIPS/CHIPS0079.png',
 'images

In [12]:
images = dict()
# Dictionary where structure is {Folder name: [Image1, Image2,...]}

folder_names = glob.glob("images/*")

In [13]:
image_paths = glob.glob("images/*/*")
labels = pd.Series(image_paths)

In [14]:
labels = labels.str.split(pat="/", expand=True)

In [15]:
labels

Unnamed: 0,0,1,2
0,images,CHIPS,CHIPS0056.png
1,images,CHIPS,CHIPS0152.png
2,images,CHIPS,CHIPS0046.png
3,images,CHIPS,CHIPS0172.png
4,images,CHIPS,CHIPS0018.png
...,...,...,...
4942,images,CAKE,CAKE0045.png
4943,images,CAKE,CAKE0082.png
4944,images,CAKE,CAKE0087.png
4945,images,CAKE,CAKE0124.png


In [16]:
image_paths = pd.DataFrame(image_paths)
image_paths

Unnamed: 0,0
0,images/CHIPS/CHIPS0056.png
1,images/CHIPS/CHIPS0152.png
2,images/CHIPS/CHIPS0046.png
3,images/CHIPS/CHIPS0172.png
4,images/CHIPS/CHIPS0018.png
...,...
4942,images/CAKE/CAKE0045.png
4943,images/CAKE/CAKE0082.png
4944,images/CAKE/CAKE0087.png
4945,images/CAKE/CAKE0124.png


In [17]:
images_dict = dict()
images_dict["image_paths"] = image_paths[0].values
images_dict["labels"] = labels[1].values
images_dict

{'image_paths': array(['images/CHIPS/CHIPS0056.png', 'images/CHIPS/CHIPS0152.png',
        'images/CHIPS/CHIPS0046.png', ..., 'images/CAKE/CAKE0087.png',
        'images/CAKE/CAKE0124.png', 'images/CAKE/CAKE0058.png'],
       dtype=object),
 'labels': array(['CHIPS', 'CHIPS', 'CHIPS', ..., 'CAKE', 'CAKE', 'CAKE'],
       dtype=object)}

In [18]:
dataset = pd.DataFrame(images_dict)

In [19]:
dataset

Unnamed: 0,image_paths,labels
0,images/CHIPS/CHIPS0056.png,CHIPS
1,images/CHIPS/CHIPS0152.png,CHIPS
2,images/CHIPS/CHIPS0046.png,CHIPS
3,images/CHIPS/CHIPS0172.png,CHIPS
4,images/CHIPS/CHIPS0018.png,CHIPS
...,...,...
4942,images/CAKE/CAKE0045.png,CAKE
4943,images/CAKE/CAKE0082.png,CAKE
4944,images/CAKE/CAKE0087.png,CAKE
4945,images/CAKE/CAKE0124.png,CAKE


In [20]:
len(dataset)



4947
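
The same path/label table can also be derived in one pass, since the label is just the name of the folder that holds each image. A minimal sketch using only the standard library; `path_label_rows` is a hypothetical helper, and the two example paths are copied from the listing above:

```python
from pathlib import PurePosixPath

def path_label_rows(paths):
    # The class label is the name of the folder that holds the image
    return [
        {"image_paths": p, "labels": PurePosixPath(p).parent.name}
        for p in paths
    ]

example_paths = ["images/CHIPS/CHIPS0056.png", "images/CAKE/CAKE0045.png"]
rows = path_label_rows(example_paths)
print(rows[0])  # {'image_paths': 'images/CHIPS/CHIPS0056.png', 'labels': 'CHIPS'}
```

Passing the result for `glob.glob("images/*/*.png")` to `pd.DataFrame(...)` should give the same two columns as `dataset`.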

In [21]:
# Torch imports
import torch
import torchvision
from torchvision.transforms import ToTensor, Lambda, Compose, v2
from torch.nn.functional import one_hot
from torch.utils.data import DataLoader, Dataset

In [22]:
class groceries_dataset_class(Dataset):
  def __init__(self, dataframe, transform = None):
    self.dataframe = dataframe
    self.transform = transform
    # TODO: Encode labels to numerical values
  def __len__(self):
    return len(self.dataframe)

  def __getitem__(self, idx):
    image = torchvision.io.read_image(self.dataframe["image_paths"][idx])
    label = self.dataframe["labels"][idx]
    if self.transform:
      image = self.transform(image)
    return image, label


In [31]:
batch_size = 256
transforms = Compose([
    v2.RandomResizedCrop(size=(227, 227), antialias=True),
    v2.RandomHorizontalFlip(p=0.5),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
groceries_dataset = groceries_dataset_class(dataset, transform=transforms)
train_loader = DataLoader(groceries_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(groceries_dataset, batch_size=batch_size, shuffle=False)
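
As written, `train_loader` and `test_loader` both wrap the same `groceries_dataset`, so the test numbers would be measured on the training images. One way to hold out data is to split the row indices before building the two datasets. A minimal sketch in plain Python; the 80/20 ratio, the seed, and the helper name `split_indices` are assumptions for illustration:

```python
import random

def split_indices(n_rows, test_fraction=0.2, seed=0):
    # Shuffle the row numbers reproducibly, then take the first slice as the test set
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(4947)
print(len(train_idx), len(test_idx))  # 3958 989
```

The two index lists could then select rows with `dataset.iloc[train_idx].reset_index(drop=True)` (and likewise for the test rows) before building one `groceries_dataset_class` per split. The `reset_index` matters because `__getitem__` looks rows up by index label. The test split would normally also get a deterministic transform (resize plus center crop) rather than the random crop and flip.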

In [32]:
# Test dataset
image, label = groceries_dataset[3]

In [49]:
image.shape

torch.Size([1, 3, 227, 227])

## NN Model, Hyperparameters

In [26]:
import torch.nn as nn
import torch.nn.functional as F

In [40]:
class AlexNet(nn.Module):
  def __init__(self):
    super(AlexNet, self).__init__()
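    # Single-tower variant: the paper's two GPU halves are merged and response normalization is left out.
    # Shapes in the comments assume a 3 x 227 x 227 input.
    self.conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0) # 96, 55, 55 (27 x 27 after pooling)
    self.conv2 = nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2) # 256, 27, 27 (13 x 13 after pooling)
    self.conv3 = nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1) # 384, 13, 13
    self.conv4 = nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1) # 384, 13, 13
    self.conv5 = nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1) # 256, 13, 13 (6 x 6 after pooling)
    self.drop = nn.Dropout(p=0.5)
    self.fc6 = nn.Linear(6*6*256, 4096) # 9216 flattened features -> 4096
    self.fc7 = nn.Linear(4096, 4096)
    self.fc8 = nn.Linear(4096, 1000)
    self.fc9 = nn.Linear(1000, 256) # extra layer, not in the paper: 256 output logits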

  def forward(self, x):
    x = F.relu(self.conv1(x))
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = F.relu(self.conv2(x))
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = F.relu(self.conv3(x))
    x = F.relu(self.conv4(x))
    x = F.relu(self.conv5(x))
    x = F.max_pool2d(x, kernel_size=3, stride=2)
    x = x.view(-1, 256*6*6)
    x = F.relu(self.fc6(x))
    x = self.drop(x)
    x = F.relu(self.fc7(x))
    x = self.drop(x)
    x = F.relu(self.fc8(x))
    x = self.fc9(x)
    return x

In [42]:
import torch.optim

## Hyperparameters

In [45]:
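model = AlexNet().cuda()  # needs a CUDA-capable GPU
epochs = 90
loss = nn.CrossEntropyLoss()  # expects raw logits and integer class indices
# Momentum and weight decay follow the paper; the paper starts from lr=0.01, so 0.1 here is 10x larger
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0005)
# Fixed schedule: divide the learning rate by 10 every 30 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)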
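
With `step_size=30` and `gamma=0.1`, `StepLR` multiplies the learning rate by 0.1 after every 30 calls to `scheduler.step()`, which gives three plateaus over the 90 epochs. The closed form can be sketched in plain Python (`step_lr` is a helper written for this note, not part of PyTorch):

```python
def step_lr(epoch, base_lr=0.1, step_size=30, gamma=0.1):
    # Learning rate used during a given 0-based epoch under a step schedule
    return base_lr * gamma ** (epoch // step_size)

# Print the rate at the start and end of each plateau
for epoch in (0, 29, 30, 59, 60, 89):
    print(epoch, f"{step_lr(epoch):.4f}")
```

This assumes `scheduler.step()` is called once per epoch, as the training loop in this notebook does.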

In [57]:
#TODO: Accuracy function + Training loop. Don't forget to include your optimizer.step and scheduler.step

def accuracy(y_pred, y_true):
  y_pred_classes = y_pred.argmax(dim=1)
  return (y_pred_classes == y_true).float().mean()

data_iter = iter(train_loader)
image, label = next(data_iter)
len(label)

256

In [54]:
for batch in train_loader:
    print(type(batch), len(batch))  # Check if it's a tuple & how many items it has
    print([type(x) for x in batch])  # Print the type of each element
    break  # Just look at one batch

<class 'list'> 2
[<class 'torch.Tensor'>, <class 'tuple'>]


In [47]:
for epoch in range(epochs):
    model.train()
    train_loss_val = 0.0
    train_acc_val = 0.0

    # Training loop
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        outputs = model(images)
        train_loss = loss(outputs, labels)
        train_loss.backward()
        optimizer.step()

        train_loss_val += train_loss.item()
        train_acc_val += accuracy(outputs, labels).item()

    train_loss_val /= len(train_loader)
    train_acc_val /= len(train_loader)

    # Evaluation loop
    model.eval()
    test_loss_val = 0.0
    test_acc_val = 0.0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.cuda(), labels.cuda()
            outputs = model(images)
            test_loss_val += loss(outputs, labels).item()
            test_acc_val += accuracy(outputs, labels).item()

    test_loss_val /= len(test_loader)
    test_acc_val /= len(test_loader)

    print(f"Epoch {epoch + 1}: Test Loss = {test_loss_val:.6f}, Train Loss = {train_loss_val:.6f}")
    print(f"Epoch {epoch + 1}: Test Accuracy = {test_acc_val:.4f}, Train Accuracy = {train_acc_val:.4f}")
    scheduler.step()

AttributeError: 'tuple' object has no attribute 'cuda'
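
The error comes from the labels: `__getitem__` returns the folder name as a string, so the `DataLoader` hands back each batch of labels as a tuple of strings (as the batch inspection above shows), and a tuple has no `.cuda()`. The TODO in `groceries_dataset_class` points at the fix, which is to turn each label into an integer class index. A minimal sketch of that encoding in plain Python; `build_class_index` and `encode` are names made up here:

```python
def build_class_index(labels):
    # Sort the unique names so the mapping is the same on every run
    return {name: i for i, name in enumerate(sorted(set(labels)))}

def encode(labels, class_to_idx):
    # Replace each class name with its integer index
    return [class_to_idx[name] for name in labels]

labels = ["CHIPS", "CHIPS", "CAKE", "BEANS"]
class_to_idx = build_class_index(labels)
print(class_to_idx)                  # {'BEANS': 0, 'CAKE': 1, 'CHIPS': 2}
print(encode(labels, class_to_idx))  # [2, 2, 1, 0]
```

Inside the dataset, the mapping could be built once in `__init__` from `dataframe["labels"]`, with `__getitem__` returning `class_to_idx[label]`. The default collate function then stacks the integer labels into a tensor, which does have `.cuda()`.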