# Veggie16

An exploration of how to adapt the VGG-16 CNN architecture by replacing its classification layers. It also shows how to apply transfer learning: the pretrained weights of the convolutional layers are loaded and frozen, and only the new classifier is trained.

The resulting Veggie16 model is evaluated on the PyTorch hymenoptera dataset.

This notebook is inspired by:
- [CS284A CNN example](https://github.com/xhxuciedu/CS284A/blob/master/convolutional_neural_net.ipynb)
- [PyTorch transfer learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)

Make sure the runtime type of this notebook has a GPU.

In [13]:
import os
import time
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils import data
import torchvision
from torchvision import datasets, models, transforms
from torchvision.models import vgg16

### Use GPU if available

In [2]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Device:', device)

Device: cuda:0


### Download, Transform, and Load Dataset

1) Download a sub-sample of the ImageNet dataset

In [3]:
os.chdir('/content/sample_data')

In [4]:
! wget https://download.pytorch.org/tutorial/hymenoptera_data.zip
! unzip hymenoptera_data.zip
! cd hymenoptera_data && echo "Downloaded and extracted dataset to: $(pwd)"

--2020-12-04 15:47:39--  https://download.pytorch.org/tutorial/hymenoptera_data.zip
Resolving download.pytorch.org (download.pytorch.org)... 13.225.97.95, 13.225.97.65, 13.225.97.116, ...
Connecting to download.pytorch.org (download.pytorch.org)|13.225.97.95|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 47286322 (45M) [application/zip]
Saving to: ‘hymenoptera_data.zip’


2020-12-04 15:47:41 (35.3 MB/s) - ‘hymenoptera_data.zip’ saved [47286322/47286322]

Archive:  hymenoptera_data.zip
   creating: hymenoptera_data/
   creating: hymenoptera_data/train/
   creating: hymenoptera_data/train/ants/
  inflating: hymenoptera_data/train/ants/0013035.jpg  
  inflating: hymenoptera_data/train/ants/1030023514_aad5c608f9.jpg  
  inflating: hymenoptera_data/train/ants/1095476100_3906d8afde.jpg  
  inflating: hymenoptera_data/train/ants/1099452230_d1949d3250.jpg  
  inflating: hymenoptera_data/train/ants/116570827_e9c126745d.jpg  
  inflating: hymenoptera_data/train/ants/12

In [5]:
data_dir = '/content/sample_data/hymenoptera_data'

2) Define transforms

Necessary Steps:
- Do a random crop & horizontal flip of the image (training set only; validation images are resized and center-cropped)
- Convert image to `torch.Tensor`
- Normalize image pixel values channel-wise

*Note: The original VGG-16 architecture takes images that are $224 \times 224 \times c$.*  

In [6]:
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
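
As a rough illustration of the channel-wise normalization step, the sketch below applies the same arithmetic to a single pixel in plain Python. It assumes the pixel has already been scaled to `[0, 1]`, which is what `transforms.ToTensor` does for 8-bit images; the helper name `normalize_pixel` and the sample pixel are hypothetical.

```python
# ImageNet channel statistics used by the transforms above (RGB order).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Normalize one RGB pixel whose values are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


# A hypothetical mid-gray pixel, i.e. (128, 128, 128) after scaling by 1/255.
pixel = [128 / 255] * 3
normalized = normalize_pixel(pixel)
print([round(v, 3) for v in normalized])
```

A pixel equal to the channel means maps to zero in every channel, which is the point of the normalization: inputs end up roughly centered, with unit scale per channel.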

3) Create `DataLoader` to read the images in batches.

*Note: For the purposes of this experiment, we're only shuffling the training
dataset.*

In [7]:
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ('train', 'val')}
dataloaders = {x: data.DataLoader(image_datasets[x], batch_size=4,
                                  shuffle=(x =='train'), num_workers=4)
              for x in ('train', 'val')}
dataset_sizes = {x: len(image_datasets[x]) for x in ('train', 'val')}
class_names = image_datasets['train'].classes
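
The number of batches a `DataLoader` yields per epoch follows from the dataset size and the batch size. The sketch below reproduces that arithmetic in plain Python. The figure of 244 training images is an assumption, chosen because it is consistent with the 61 steps per epoch reported in the training log of this notebook; `steps_per_epoch` is a hypothetical helper.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches per epoch, mirroring how a DataLoader counts them."""
    if drop_last:
        # Incomplete final batch is discarded.
        return num_samples // batch_size
    # Incomplete final batch is kept as a smaller batch.
    return math.ceil(num_samples / batch_size)


num_train_images = 244  # assumption, consistent with 61 steps at batch size 4
print(steps_per_epoch(num_train_images, batch_size=4))
```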

### Define training hyperparameters

In [17]:
num_epochs = 25
num_classes = 2
batch_size = 4  # the DataLoaders above use a batch size of 4
learning_rate = 0.001

### Define a CNN based on VGG-16

The original VGG-16 model looks like this:

```
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
```
We will modify only the fully-connected layers (`model.classifier`).
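
The `in_features=25088` of the first classifier layer follows from the shape of the feature map that leaves the convolutional blocks. A minimal sketch of that calculation, assuming a $224 \times 224$ input and the layer settings shown in the printout above (the helper names are hypothetical):

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution (3x3, stride 1, padding 1 keeps size)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a 2x2 max pool with stride 2 (halves the size)."""
    return (size - kernel) // stride + 1


# (number of conv layers, output channels) for each VGG-16 block, per the printout.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def feature_shape(input_size=224):
    """Return (channels, height, width) after the `features` module."""
    size = input_size
    channels = 3
    for num_convs, out_channels in VGG16_BLOCKS:
        for _ in range(num_convs):
            size = conv_out(size)
        channels = out_channels
        size = pool_out(size)
    return channels, size, size


channels, height, width = feature_shape(224)
flattened = channels * height * width
print(channels, height, width, flattened)
```

With a 224-pixel input the five pooling layers reduce the spatial size to 7, so the adaptive average pool to $7 \times 7$ leaves the shape unchanged and the flattened vector has $512 \cdot 7 \cdot 7$ entries.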

Define a model that uses the VGG-16 architecture, reuses the pretrained weights of its convolutional layers, and trains a new block of fully-connected layers.

In [24]:
class Veggie16(nn.Module):
    """A model that adapts the VGG-16 architecture.

    This network applies transfer learning: it loads the pretrained
    parameters of VGG-16 and freezes them. The classification block of
    the architecture is replaced and trained from scratch to predict
    the desired number of output classes.
    """

    def __init__(self, num_classes):
        """Creates a Veggie16 network.

        Args:
            num_classes - The number of output classes to predict
        """
        super(Veggie16, self).__init__()
        # Load a pre-trained VGG-16 model and turn off autograd for its
        # parameters so the pretrained weights won't change.
        architecture = vgg16(pretrained=True)
        for param in architecture.parameters():
            param.requires_grad = False
        # Copy the convolutional layers of the model.
        self.features = architecture.features
        # Copy the average pooling layer of the model.
        self.avgpool = architecture.avgpool
        # Define a new block of fully-connected layers for the model.
        in_ftrs = architecture.classifier[0].in_features
        self.classifier = nn.Sequential(
            nn.Linear(in_features=in_ftrs, out_features=2048, bias=True),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5, inplace=False),
            nn.Linear(in_features=2048, out_features=2048, bias=True),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5, inplace=False),
            nn.Linear(in_features=2048, out_features=num_classes, bias=True)
        )
    
    def forward(self, x):
        """Does a forward pass on an image x."""
        out = self.features(x)
        out = self.avgpool(out)
        out = torch.flatten(out, 1)
        out = self.classifier(out)
        return out

model = Veggie16(num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
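
Because the convolutional layers are frozen, only the new classifier block is trained. The sketch below counts its parameters in plain Python and compares them with the original VGG-16 classifier, using the layer sizes from the code and printout above; `linear_params` and `head_params` are hypothetical helpers.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of one fully-connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def head_params(in_features, hidden, num_classes):
    """Parameter count of a three-layer classifier block."""
    return (linear_params(in_features, hidden)
            + linear_params(hidden, hidden)
            + linear_params(hidden, num_classes))


# Veggie16 classifier: 25088 -> 2048 -> 2048 -> 2
veggie_head = head_params(25088, 2048, 2)
# Original VGG-16 classifier: 25088 -> 4096 -> 4096 -> 1000
original_head = head_params(25088, 4096, 1000)
print(f'{veggie_head:,} trainable parameters vs. {original_head:,} in the original head')
```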

In [25]:
print(model)

Veggie16(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilat

### Train `Veggie16` model on Hymenoptera Dataset

In [28]:
def train(model, criterion, optimizer, scheduler=None, epochs=num_epochs):
    """Trains the model on the training set for the given number of epochs."""
    model.train()
    train_loader = dataloaders['train']
    since = time.time()
    num_steps = len(train_loader)
    for epoch in range(1, epochs+1):
        for i, (images, labels) in enumerate(train_loader, start=1):
            images = images.to(device)
            labels = labels.to(device)
            # Generate predictions and compute the loss
            outputs = model(images)
            loss = criterion(outputs, labels)
            # Backpropagate loss and update weights
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # Print the loss every 10 iterations
            if i % 10 == 0:
                print(f'Epoch [{epoch}/{epochs}], Step [{i}/{num_steps}], Loss: {loss.item():.6f}')
        # Step the learning-rate scheduler once per epoch, if one was given
        if scheduler is not None:
            scheduler.step()
    # Print training time
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))

In [29]:
# No learning-rate scheduler is used in this run.
train(model, criterion, optimizer, scheduler=None, epochs=num_epochs)

Epoch [1/25], Step [10/61], Loss: 0.000000
Epoch [1/25], Step [20/61], Loss: 0.005359
Epoch [1/25], Step [30/61], Loss: 0.000623
Epoch [1/25], Step [40/61], Loss: 0.000057
Epoch [1/25], Step [50/61], Loss: 0.000000
Epoch [1/25], Step [60/61], Loss: 0.000000
Epoch [2/25], Step [10/61], Loss: 1.086780
Epoch [2/25], Step [20/61], Loss: 0.000000
Epoch [2/25], Step [30/61], Loss: 0.000000
Epoch [2/25], Step [40/61], Loss: 3.992504
Epoch [2/25], Step [50/61], Loss: 0.000000
Epoch [2/25], Step [60/61], Loss: 0.000000
Epoch [3/25], Step [10/61], Loss: 0.000000
Epoch [3/25], Step [20/61], Loss: 0.000000
Epoch [3/25], Step [30/61], Loss: 0.002405
Epoch [3/25], Step [40/61], Loss: 0.001372
Epoch [3/25], Step [50/61], Loss: 0.000000
Epoch [3/25], Step [60/61], Loss: 0.000008
Epoch [4/25], Step [10/61], Loss: 0.000000
Epoch [4/25], Step [20/61], Loss: 0.000000
Epoch [4/25], Step [30/61], Loss: 0.000000
Epoch [4/25], Step [40/61], Loss: 0.000000
Epoch [4/25], Step [50/61], Loss: 15.570190
Epoch [4/2
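
The logged values are cross-entropy losses averaged over a batch. As a rough, pure-Python sketch of the per-sample quantity that `nn.CrossEntropyLoss` computes from raw logits (log-softmax followed by negative log-likelihood), with hypothetical logits chosen for illustration:

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample from raw logits: logsumexp(logits) - logits[target]."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


def batch_loss(batch_logits, targets):
    """Mean cross-entropy over a batch, matching the default 'mean' reduction."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


print(cross_entropy([10.0, -10.0], 0))  # confident and correct: close to 0
print(cross_entropy([10.0, -10.0], 1))  # confident and wrong: close to 20
print(cross_entropy([0.0, 0.0], 0))     # undecided between two classes: log(2)
```

This shows how a small batch can log a loss near zero on one step and a large loss on another: a single confidently wrong prediction contributes a loss roughly equal to the gap between the two logits.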

### Evaluate the effectiveness of `Veggie16`

In [40]:
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in dataloaders['val']:
        images = images.to(device)
        labels = labels.to(device)
        
        outputs = model(images)
        predictions = torch.argmax(outputs, dim=1)

        total += labels.size(0)
        correct += (predictions == labels).sum().item()
    print(f'Test Accuracy of Veggie16: {(100*(correct/total)):.6f}')

Test Accuracy of Veggie16: 95.424837
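
The accuracy above is the share of validation images whose highest-scoring class matches the label. A minimal pure-Python sketch of the same argmax-and-compare logic, using hypothetical logits and labels for four images:

```python
def argmax(values):
    """Index of the largest value, like torch.argmax over one row."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(batch_logits, labels):
    """Percentage of samples whose predicted class equals the label."""
    predictions = [argmax(logits) for logits in batch_logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100 * correct / len(labels)


# Hypothetical model outputs for four images and their true classes.
logits = [[2.1, -0.3], [0.2, 1.7], [1.5, 0.4], [-0.8, 0.9]]
labels = [0, 1, 1, 1]
print(f'{accuracy(logits, labels):.6f}')  # prints 75.000000
```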
