# Image Classification Networks
This notebook contains a few models that work well for image classification. Feel free to use and experiment with them for your project.

In [1]:
import torch
from torch import nn
from torch import optim
import torchvision
from torchvision import transforms

## Retrieving the dataset
I will be loading the CIFAR-10 dataset, a popular dataset for image classification. You can find more information about it at https://www.cs.toronto.edu/~kriz/cifar.html.

In [2]:
# Transform for images in dataset
transform = transforms.Compose([
    transforms.ToTensor(), # Convert image to tensor
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) # Normalize image
])
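
To make the normalization step concrete, here is a minimal sketch of the per-channel arithmetic that `transforms.Normalize` applies, `(x - mean) / std`. The helper names `normalize` and `denormalize` are hypothetical and work on a single pixel value instead of a tensor. With a mean and standard deviation of 0.5, values that `ToTensor()` scaled into [0, 1] end up in [-1, 1].

```python
# Per-channel arithmetic of transforms.Normalize: (x - mean) / std.
# ToTensor() has already scaled pixel values into the range [0.0, 1.0].
MEAN = 0.5
STD = 0.5


def normalize(value, mean=MEAN, std=STD):
    """Hypothetical helper: normalize one pixel value of one channel."""
    return (value - mean) / std


def denormalize(value, mean=MEAN, std=STD):
    """Inverse mapping, handy before displaying a normalized image."""
    return value * std + mean


for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

The inverse mapping is worth keeping around if you want to plot images after they have gone through the transform.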

PyTorch has the CIFAR10 dataset built in, with the functionality to download it automatically. Not all datasets will have this convenience. Refer to https://pytorch.org/tutorials/beginner/data_loading_tutorial.html for creating a custom dataset by extending the standard PyTorch Dataset class.

In [3]:
# Loading the data using PyTorch's built-in CIFAR10 dataset
train_data = torchvision.datasets.CIFAR10(root='../data/CIFAR10', train=True, download=True, transform=transform)
test_data = torchvision.datasets.CIFAR10(root='../data/CIFAR10', train=False, download=True, transform=transform)

# Summary of parameters
#    root: The root directory where the dataset exists or will be downloaded to.
#    train: If True, it will pull from the training set, otherwise it pulls from the test set.
#    download: If True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
#    transform: The default transform for the data. In this case, we will use the transform we defined in the cell above.
#    Not used here, but target_transform: A function that takes in the target and transforms it.

Files already downloaded and verified
Files already downloaded and verified


In [4]:
# Creating dataloaders to pass into the model
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True, num_workers=2)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False, num_workers=2)

# Summary of parameters
#    dataset: The dataset to load the data from.
#    batch_size: The size of each batch of data to load into the model.
#    shuffle: If True, shuffles the data at every epoch.
#    num_workers: The number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.
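
As a rough sketch of what `batch_size=64` means for this dataset, the arithmetic below counts the batches in one epoch. It assumes the standard CIFAR-10 split of 50,000 training and 10,000 test images and the `DataLoader` default of keeping the final partial batch. The helper names are hypothetical.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields in one pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples, batch_size):
    """Size of the final batch when partial batches are kept."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


train_batches = batches_per_epoch(50_000, 64)
test_batches = batches_per_epoch(10_000, 64)
print("train batches:", train_batches, "last batch:", last_batch_size(50_000, 64))
print("test batches:", test_batches, "last batch:", last_batch_size(10_000, 64))
```

Because the final batch is smaller than the rest, any code that assumes every batch holds exactly 64 images would need `drop_last=True` or a size check.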

## Creating the Models

### Standard Convolutional Neural Network
This is the architecture for a standard image processing model using convolutions. You can find more information about it at https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939.

In [5]:
class CNN(nn.Module):
    def __init__(self, input_channels = 3, output_classes = 10): # Define the layers of the network
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(input_channels, 16, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, output_classes)

    def forward(self, x): # Process an image through the network and output a prediction
        x = self.conv1(x) # Convolutional layer
        x = self.relu(x) # Activation function
        x = self.pool(x) # Pooling layer
        x = self.conv2(x) # Second convolutional layer
        x = self.relu(x) # Activation function
        x = self.pool(x) # Second pooling layer
        x = x.view(-1, 32 * 8 * 8) # Flatten the image
        x = self.fc1(x) # Fully connected layer
        x = self.relu(x) # Activation function
        x = self.fc2(x) # Second fully connected layer
        return x
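
The `32 * 8 * 8` in `fc1` comes from how the spatial size changes through the network. Here is a minimal sketch of that arithmetic, assuming 32x32 CIFAR-10 inputs and the usual output-size formula for convolution and pooling layers without dilation. The helper `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a conv or pooling layer (dilation of 1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                              # CIFAR-10 images are 32x32
size = conv_out(size, kernel=3, stride=1, padding=1)   # conv1 keeps the size
size = conv_out(size, kernel=2, stride=2)              # first pool halves it
size = conv_out(size, kernel=3, stride=1, padding=1)   # conv2 keeps the size
size = conv_out(size, kernel=2, stride=2)              # second pool halves it

channels = 32                                          # output channels of conv2
flattened = channels * size * size
print("final feature map:", channels, "x", size, "x", size)
print("inputs to fc1:", flattened)
```

If you feed the model images of a different size, rerun this calculation and update both `fc1` and the `view` call in `forward` to match.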

### Residual Neural Network
This is an improvement to the standard Convolutional Neural Network. You can find more information at https://datagen.tech/guides/computer-vision/resnet/.

In [6]:
# Define the basic building blocks of ResNet
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out

# Define the ResNet architecture
class ResNet(nn.Module):
    def __init__(self, block, layers, input_channels = 3, output_classes = 10):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(input_channels, 64, kernel_size=7, stride=2, padding=3, bias=False) 
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, output_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion)
            )

        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x
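
To see what this architecture does to a small CIFAR-10 image, the sketch below traces the spatial size through each stage. It assumes a 32x32 input and reuses the standard output-size formula. `conv_out` and `trace_resnet` are hypothetical helpers, not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a conv or pooling layer (dilation of 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet(size=32):
    """Return (stage name, channels, spatial size) for each stage."""
    trace = []
    size = conv_out(size, kernel=7, stride=2, padding=3)   # self.conv1
    trace.append(("conv1", 64, size))
    size = conv_out(size, kernel=3, stride=2, padding=1)   # self.maxpool
    trace.append(("maxpool", 64, size))
    stages = [("layer1", 64, 1), ("layer2", 128, 2),
              ("layer3", 256, 2), ("layer4", 512, 2)]
    for name, channels, stride in stages:
        # Only the first 3x3 convolution of a stage uses the stage stride.
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        trace.append((name, channels, size))
    trace.append(("avgpool", 512, 1))                      # AdaptiveAvgPool2d((1, 1))
    return trace


for name, channels, size in trace_resnet():
    print(f"{name:8s} {channels:4d} channels, {size}x{size}")
```

The 7x7 stride-2 convolution and the max pool shrink a 32x32 image to 8x8 before the first residual stage, and by `layer4` the feature map is already 1x1. That layout is typically used with much larger images, so a smaller first convolution is one thing you could experiment with on CIFAR-10.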

!!! Many models, including various versions of ResNet beyond the one implemented above, are also built into PyTorch through the torchvision package. Refer to https://pytorch.org/vision/stable/models.html#classification for a full list of built-in image classification models. !!!

## Training and Testing the Models

### CNN

Hyperparameters
- CNN parameters input_channels and output_classes default to 3 and 10 respectively. This is because in the dataset we are using, the images passed into the model have 3 color channels (RGB), and there are 10 possible output labels.
- Cross-entropy loss is a standard loss function for multi-class classification tasks like this one. Look into loss functions that would work well for your project.
- Learning rate can be experimented with.
- I am using stochastic gradient descent for the optimizer, which updates the weights after each mini-batch instead of after a full pass over the dataset, so each step is much cheaper than in full-batch gradient descent. Alternatively, you can use the Adam optimizer, or look into other optimizers that suit your needs.
- I'm using 20 epochs here, but this can be changed as needed. Remember that it's important not to "overfit" your training data, since new images can vary wildly from the ones the model was trained on.
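
As a rough illustration of what the loss function measures, here is a plain-Python sketch of cross-entropy for a single example. It mirrors the idea behind `nn.CrossEntropyLoss`, which takes raw scores (logits) and combines log-softmax with negative log-likelihood. The function `cross_entropy` is a hypothetical helper, not the PyTorch implementation.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example: -log(softmax(logits)[target])."""
    largest = max(logits)  # subtract the max for numerical stability
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum_exp - logits[target]


# A model that scores all 10 classes equally is just guessing.
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A model that strongly favors the correct class gets a much lower loss.
confident_logits = [0.0] * 10
confident_logits[3] = 5.0
confident_loss = cross_entropy(confident_logits, target=3)

print(f"uniform guess:   {uniform_loss:.4f}")
print(f"confident guess: {confident_loss:.4f}")
```

With 10 classes, a uniform guess gives a loss of ln(10), about 2.30. A training log that starts near 2.30 with accuracy near 10% is therefore consistent with a model that has not yet learned anything.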

In [7]:
cnn_model = CNN().to('cuda' if torch.cuda.is_available() else 'cpu') # Initialize the model (on the GPU when one is available)
loss_fn = nn.CrossEntropyLoss() # Define the loss function
learning_rate = 1e-3 # Define the learning rate
optimizer = optim.SGD(cnn_model.parameters(), learning_rate) # Define the optimizer
epochs = 20 # Define the number of epochs
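
To give a feel for why the learning rate is worth experimenting with, here is a toy sketch of the plain SGD update rule, `w = w - lr * grad`, on a one-dimensional problem. The function `sgd_minimize` and the loss `(w - 3)^2` are made up for illustration and have nothing to do with the CNN itself.

```python
def sgd_minimize(lr, steps=20, start=0.0):
    """Run plain gradient descent on f(w) = (w - 3)^2 and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w -= lr * grad           # the SGD update rule (no momentum)
    return w


for lr in (1e-3, 1e-1, 1.1):
    final_w = sgd_minimize(lr)
    print(f"lr={lr}: w={final_w:.4f}, distance from minimum={abs(final_w - 3.0):.4f}")
```

On this toy problem a very small learning rate barely moves toward the minimum at `w = 3` in 20 steps, a moderate one gets close, and one that is too large overshoots further on every step. Real networks behave less predictably, so treat this only as intuition.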

In [8]:
# Training loop
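def train(dataloader, model, loss_fn, optimizer):
  size = len(dataloader.dataset)
  device = next(model.parameters()).device # Use whichever device the model is on
  model.train()
  for batch, (X, y) in enumerate(dataloader):
    X, y = X.to(device), y.to(device) # Move the batch to the model's device
    prediction = model(X)
    loss = loss_fn(prediction, y)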

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    if batch % 100 == 0:
      loss, current = loss.item(), (batch + 1) * len(X)
      print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

# Testing Loop
def test(dataloader, model, loss_fn):
  model.eval()
  size = len(dataloader.dataset)
  num_batches = len(dataloader)
  device = next(model.parameters()).device # Use whichever device the model is on
  test_loss, correct = 0, 0

  with torch.no_grad():
    for X, y in dataloader:
      X, y = X.to(device), y.to(device) # Move the batch to the model's device
      prediction = model(X)
      test_loss += loss_fn(prediction, y).item()
      correct += (prediction.argmax(1) == y).type(torch.float).sum().item()
  test_loss /= num_batches
  correct /= size
  print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
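
The accuracy bookkeeping in `test` can be sketched without PyTorch: take the index of the largest score for each example, compare it with the label, and divide the number of matches by the number of examples. The helpers `argmax` and `accuracy` and the tiny batches below are hypothetical.

```python
def argmax(values):
    """Index of the largest value (the predicted class)."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(batches):
    """Fraction of examples whose highest-scoring class matches the label."""
    correct = 0
    total = 0
    for logits_batch, labels in batches:
        for logits, label in zip(logits_batch, labels):
            correct += int(argmax(logits) == label)
            total += 1
    return correct / total


# Two made-up batches of 3-class scores with their true labels.
batches = [
    ([[2.0, 0.1, -1.0], [0.3, 0.9, 0.2]], [0, 2]),
    ([[0.0, 0.0, 4.0]], [2]),
]
print(f"accuracy: {accuracy(batches):.3f}")
```

Note that `test` divides the summed loss by the number of batches. Since the last batch is smaller than the others, that is a mean of per-batch means, which can differ slightly from the mean over all samples.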

In [9]:
# Training and Testing the model

for t in range(epochs):
  print(f"Epoch {t+1}\n-------------------------------")
  train(train_loader, cnn_model, loss_fn, optimizer)
  test(test_loader, cnn_model, loss_fn)

Epoch 1
-------------------------------


loss: 2.313912  [   64/50000]
loss: 2.299882  [ 6464/50000]
loss: 2.300845  [12864/50000]
loss: 2.295993  [19264/50000]
loss: 2.311189  [25664/50000]
loss: 2.296562  [32064/50000]
loss: 2.298305  [38464/50000]
loss: 2.293673  [44864/50000]
Test Error: 
 Accuracy: 9.9%, Avg loss: 2.298465 

Epoch 2
-------------------------------
loss: 2.290142  [   64/50000]
loss: 2.298765  [ 6464/50000]
loss: 2.307259  [12864/50000]
loss: 2.303327  [19264/50000]
loss: 2.295066  [25664/50000]
loss: 2.292744  [32064/50000]
loss: 2.284408  [38464/50000]
loss: 2.287450  [44864/50000]
Test Error: 
 Accuracy: 12.7%, Avg loss: 2.290766 

Epoch 3
-------------------------------
loss: 2.295736  [   64/50000]
loss: 2.290703  [ 6464/50000]
loss: 2.280526  [12864/50000]
loss: 2.288830  [19264/50000]
loss: 2.278520  [25664/50000]
loss: 2.294979  [32064/50000]
loss: 2.283677  [38464/50000]
loss: 2.282469  [44864/50000]
Test Error: 
 Accuracy: 16.0%, Avg loss: 2.278873 

Epoch 4
-------------------------------
loss:

After 20 epochs, I got a total accuracy of around 42% using the Convolutional Neural Network.

### ResNet

Hyperparameters:
- The model initialization takes some extra parameters because of how the model class is structured. We pass in the BasicBlock class defined above, and the list [2, 2, 2, 2] sets how many BasicBlock layers are in each of the four stages of the ResNet model. This can be changed.
- All other hyperparameters are the same as what was used for the CNN.
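
As a small sketch of what the block counts mean, the helper below counts the weighted layers the way ResNet variants are conventionally named: the first convolution, two 3x3 convolutions per BasicBlock, and the final fully connected layer. The 1x1 downsample convolutions are left out of the count by convention. The function `weighted_layers` is hypothetical.

```python
def weighted_layers(blocks_per_stage, convs_per_block=2):
    """Count layers as in the ResNet naming convention for BasicBlock networks."""
    stem_conv = 1   # self.conv1
    final_fc = 1    # self.fc
    return stem_conv + convs_per_block * sum(blocks_per_stage) + final_fc


print("[2, 2, 2, 2] ->", weighted_layers([2, 2, 2, 2]), "layers")
print("[3, 4, 6, 3] ->", weighted_layers([3, 4, 6, 3]), "layers")
```

So `[2, 2, 2, 2]` corresponds to the ResNet-18 layout, and `[3, 4, 6, 3]` with the same block type corresponds to ResNet-34.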

In [10]:
resnet_model = ResNet(BasicBlock, [2, 2, 2, 2]) # Initialize the model
loss_fn = nn.CrossEntropyLoss() # Define the loss function
learning_rate = 1e-3 # Define the learning rate
optimizer = optim.SGD(resnet_model.parameters(), learning_rate) # Define the optimizer
epochs = 20 # Define the number of epochs

In [11]:
# We will use the same train and test functions I defined in the CNN section
# This was ridiculously slow on CPU, so I had to use a GPU (the model and every batch must be moved to the GPU for that)
for t in range(epochs):
  print(f"Epoch {t+1}\n-------------------------------")
  train(train_loader, resnet_model, loss_fn, optimizer)
  test(test_loader, resnet_model, loss_fn)

Epoch 1
-------------------------------
loss: 2.648452  [   64/50000]
loss: 1.896520  [ 6464/50000]
loss: 1.729299  [12864/50000]
loss: 1.781915  [19264/50000]
loss: 1.462353  [25664/50000]
loss: 1.658516  [32064/50000]
loss: 1.524162  [38464/50000]
loss: 1.302104  [44864/50000]
Test Error: 
 Accuracy: 48.4%, Avg loss: 1.411380 

Epoch 2
-------------------------------
loss: 1.107959  [   64/50000]
loss: 1.310847  [ 6464/50000]
loss: 1.595200  [12864/50000]
loss: 1.237674  [19264/50000]
loss: 1.510892  [25664/50000]
loss: 1.190642  [32064/50000]
loss: 1.282977  [38464/50000]
loss: 1.081852  [44864/50000]
Test Error: 
 Accuracy: 53.5%, Avg loss: 1.281970 

Epoch 3
-------------------------------
loss: 0.996550  [   64/50000]
loss: 0.842906  [ 6464/50000]
loss: 1.159510  [12864/50000]
loss: 0.872643  [19264/50000]
loss: 1.323365  [25664/50000]
loss: 0.871886  [32064/50000]
loss: 1.059438  [38464/50000]
loss: 0.866416  [44864/50000]
Test Error: 
 Accuracy: 55.8%, Avg loss: 1.241394 

Epoc

TODO: Fix the thing that calculates accuracy because what is going on