Creating a Stanford Cars classifier
In this notebook I create a car classifier using the publicly available Stanford Cars dataset, which contains a total of 196 classes. I use a pre-trained resnet101 with transfer learning to train the model. All layers are fine-tuned; only the last fully connected layer is replaced and trained from scratch.

I used resnet101 because this is the best model I can run with the computer resources I have.

Dataset (196 classes):

Train dataset: 8144 images, an average of roughly 41.6 images per class. This is later split into 90% train and 10% validation.

Test dataset: 8041 images, an average of roughly 41.0 images per class.
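
The per-class averages and the split sizes follow directly from these counts. A quick check, assuming the split is computed with integer truncation as `int(0.9 * 8144)` (the constant names are only illustrative):

```python
# Image counts quoted above for the Stanford Cars dataset.
NUM_CLASSES = 196
NUM_TRAIN_IMAGES = 8144
NUM_TEST_IMAGES = 8041

# Average number of images per class in each set.
train_avg = NUM_TRAIN_IMAGES / NUM_CLASSES
test_avg = NUM_TEST_IMAGES / NUM_CLASSES

# 90/10 split of the training folder; int() truncates, so the
# leftover image goes to the validation split.
train_len = int(0.9 * NUM_TRAIN_IMAGES)
valid_len = NUM_TRAIN_IMAGES - train_len

print(f"train avg per class: {train_avg:.2f}")
print(f"test avg per class: {test_avg:.2f}")
print(f"train split: {train_len}, validation split: {valid_len}")
```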

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms
from torchvision import datasets

from torch.utils.data import Dataset

from torch.utils import data as D

import time
import os

from os import walk

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

'''
The output needs to show each image's filename,
so that an evaluator can tell which image a result belongs to.
The original ImageFolder class does not provide this,
so I override ImageFolder to also return the image file path.
'''
class ImageFolderWithPaths(datasets.ImageFolder):
    """Custom dataset that includes image file paths. Extends
    torchvision.datasets.ImageFolder
    """

    # override the __getitem__ method. this is the method dataloader calls
    def __getitem__(self, index):
        # this is what ImageFolder normally returns 
        original_tuple = super(ImageFolderWithPaths, self).__getitem__(index)
        # the image file path
        path = self.imgs[index][0]
        # make a new tuple that includes original and the path
        tuple_with_path = (original_tuple + (path,))
        return tuple_with_path

cuda:0
['train', 'test']
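
The override pattern used by `ImageFolderWithPaths` can be tried without any images. The sketch below uses `TinyFolder`, a hypothetical stand-in for `ImageFolder` that only keeps the `(path, class_index)` pairs, to show how the returned tuple grows from two elements to three:

```python
class TinyFolder:
    """Hypothetical stand-in for ImageFolder: holds (path, class_index) pairs."""

    def __init__(self, imgs):
        self.imgs = imgs

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, index):
        path, target = self.imgs[index]
        # A real ImageFolder would load and transform the image here;
        # this stand-in returns a placeholder string instead.
        sample = f"<image loaded from {path}>"
        return sample, target


class TinyFolderWithPaths(TinyFolder):
    """Same override as ImageFolderWithPaths: append the path to the tuple."""

    def __getitem__(self, index):
        original_tuple = super().__getitem__(index)
        path = self.imgs[index][0]
        return original_tuple + (path,)


ds = TinyFolderWithPaths([("train/audi/0001.jpg", 0), ("train/bmw/0002.jpg", 1)])
sample, target, path = ds[1]
print(target, path)
```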


**Load the data and transform**

First, let's create some transforms for our data and load the train/validation and test data, with labels, from the folders.

Here I use 300 x 300 images with random horizontal flip, random rotation and normalization.



In [2]:
'''
This is the dataset directory for my Kaggle kernel. 
Please comment this line and uncomment the following line if you run it on your workstation.
'''
dataset_dir = "../input/process-car2/process_car/"

'''
This line is the dataset directory if you run it on your local workstation.
'''
#dataset_dir = ""

train_tfms = transforms.Compose([transforms.Resize((300, 300)),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.RandomRotation(15),
                                 transforms.ToTensor(),
                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
test_tfms = transforms.Compose([transforms.Resize((300, 300)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

dataset = ImageFolderWithPaths(root=dataset_dir+"train", transform = train_tfms)

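'''
Load the images from the folder, then split them 90% for training and 10% for validation.
Each image's label is the sub-folder it sits in.
Note: both splits are views of the same dataset object, so the validation
images also pass through train_tfms.
'''
train_len = int(0.9 * len(dataset))
valid_len = len(dataset) - train_len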
train_dataset, valid_dataset = D.random_split(dataset, lengths=[train_len, valid_len])

trainloader = torch.utils.data.DataLoader(train_dataset, batch_size = 32, shuffle=True, num_workers = 2)

validloader = torch.utils.data.DataLoader(valid_dataset, batch_size = 32, shuffle=True, num_workers = 2)

test_dataset = ImageFolderWithPaths(root=dataset_dir+"test", transform = test_tfms)
testloader = torch.utils.data.DataLoader(test_dataset, batch_size = 32, shuffle=False, num_workers = 2)
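
Because `random_split` returns views onto one `dataset` object, the validation split inherits `train_tfms`, including the random flip and rotation. One possible way around this, sketched below under the assumption that the base dataset is created without a transform, is a small wrapper that applies a per-split transform. `TransformedSubset` is a hypothetical helper, not part of torchvision, and the toy data stands in for real images so the sketch runs without PyTorch.

```python
class TransformedSubset:
    """Hypothetical wrapper: a view on `base` restricted to `indices`,
    with its own transform applied to the sample (first tuple element)."""

    def __init__(self, base, indices, transform=None):
        self.base = base
        self.indices = list(indices)
        self.transform = transform

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        item = self.base[self.indices[i]]
        sample, rest = item[0], item[1:]
        if self.transform is not None:
            sample = self.transform(sample)
        # keep any extra fields (label, path, ...) unchanged
        return (sample,) + tuple(rest)


# Toy base dataset of (sample, label, path) tuples, untransformed.
base = [(n, n % 2, f"img_{n}.jpg") for n in range(10)]

# Stand-ins for train_tfms / test_tfms.
def augment(x):
    return x * 100

def plain(x):
    return x

train_view = TransformedSubset(base, range(0, 9), transform=augment)
valid_view = TransformedSubset(base, range(9, 10), transform=plain)

print(train_view[1])
print(valid_view[0])
```

With real data the indices could come from a shuffled permutation. Since the wrapper implements `__len__` and `__getitem__`, it can be handed to a `DataLoader` like any other map-style dataset.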

**Model training function**

Here we train our model. After each epoch, we evaluate the model on the validation data to see how it's going.

In [3]:
'''
Train the resnet model, for 10 epochs by default.
'''
def train_model(model, criterion, optimizer, scheduler, n_epochs = 10):
    
    losses = []
    accuracies = []
    test_accuracies = []
    # set the model to train mode initially
    model.train()
    for epoch in range(n_epochs):
        since = time.time()
        running_loss = 0.0
        running_correct = 0.0
        for i, data in enumerate(trainloader, 0):

            # get the inputs and assign them to cuda
            inputs, labels, _ = data
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            
            # forward + backward + optimize
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            # calculate the loss/acc later
            running_loss += loss.item()
            running_correct += (labels==predicted).sum().item()

        epoch_duration = time.time()-since
        epoch_loss = running_loss/len(trainloader)
        # approximate: assumes every batch holds exactly 32 images (the last one may be smaller)
        epoch_acc = 100/32*running_correct/len(trainloader)
        print("Epoch %s, duration: %d s, loss: %.4f, acc: %.4f" % (epoch+1, epoch_duration, epoch_loss, epoch_acc))
        
        losses.append(epoch_loss)
        accuracies.append(epoch_acc)
        
        # switch the model to eval mode to evaluate on the validation data
        model.eval()
        test_acc = evaluate_model(model)
        test_accuracies.append(test_acc)
        
        # re-set the model to train mode after validating
        model.train()
        scheduler.step(test_acc)
        since = time.time()
        
    model.eval()
    get_predict(model)
    print('Finished Training')
    model.train()
    return model, losses, accuracies, test_accuracies
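
The training accuracy in `train_model` is scaled by a nominal batch size of 32, which is exact only when every batch is full. A minimal sketch of the difference, using the 7329-image training split from this notebook (the two function names are illustrative, not part of the notebook):

```python
import math

def approx_accuracy(num_correct, num_batches, batch_size=32):
    """Accuracy in percent, assuming every batch holds `batch_size` samples."""
    return 100 / batch_size * num_correct / num_batches

def exact_accuracy(num_correct, num_samples):
    """Accuracy in percent over the true number of samples."""
    return 100.0 * num_correct / num_samples

num_samples = 7329                         # int(0.9 * 8144)
num_batches = math.ceil(num_samples / 32)  # the last batch holds 1 image
num_correct = num_samples                  # pretend every prediction is right

print(num_batches)
print(f"{approx_accuracy(num_correct, num_batches):.2f}")
print(f"{exact_accuracy(num_correct, num_samples):.2f}")
```

Even with every prediction correct, the batch-based formula stays a little below 100%, so the logged training accuracies are slight underestimates. Dividing by `len(trainloader.dataset)` would give the exact figure.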

**Evaluate on validation data**

This function is called after each training epoch. It measures the accuracy of the model on the validation set.

In [4]:
'''
Evaluate the model on the validation dataset after every epoch.
'''
def evaluate_model(model):
    correct = 0.0
    total = 0.0
    
    with torch.no_grad():
        for i, data in enumerate(validloader, 0):
            images, labels, _ = data
            
            images = images.to(device)
            labels = labels.to(device)
            
            # use the model passed in rather than the global model_ft
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    test_acc = 100.0 * correct / total
    print('Accuracy of the network on the validation images: %.2f %%' % (
        test_acc))
    return test_acc

'''
Get the prediction results from the model on the test dataset.
The results include the confidence for every class and the top-1 predicted class.
'''
result = []
actual = []
result2 = []
fname = []

def get_predict(model):
    correct = 0.0
    total = 0.0
    
    with torch.no_grad():
        for i, data in enumerate(testloader, 0):
            images, labels, fnames = data
            images = images.to(device)
            labels = labels.to(device)
            
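            # use the model passed in rather than the global model_ft
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            result.append(predicted.cpu().numpy())
            
            # softmax over the class dimension gives per-class confidences
            tmp_predict = outputs.data
            predicted2 = torch.nn.functional.softmax(tmp_predict, dim=1)
            result2.append(predicted2.data.cpu().numpy())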
            actual.append(labels.cpu().numpy())
            fname.append(fnames)
            
            #total += labels.size(0)
            #correct += (predicted == labels).sum().item()
    
    '''
    test_acc = 100.0 * correct / total
    print('Accuracy of the network on the testing images: %.2f %%' % (
        test_acc))
    print('Done getting result')
    '''
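
`get_predict` stores two things per image: the index of the largest logit and the softmax of the logits along the class dimension. A minimal pure-Python sketch of what those two operations compute for one batch (the logit values are made up, and `softmax_rows` is an illustrative helper, not a PyTorch function):

```python
import math

def softmax_rows(logits):
    """Softmax over each row (the class dimension, dim=1 for a batch)."""
    probs = []
    for row in logits:
        # subtract the row maximum for numerical stability
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

# Made-up logits: 2 images, 3 classes.
batch_logits = [[2.0, 0.5, -1.0],
                [0.1, 0.2, 3.0]]

batch_probs = softmax_rows(batch_logits)
# top-1 class per image, like the index returned by torch.max(outputs, 1)
top1 = [row.index(max(row)) for row in batch_probs]

print(top1)
print([round(sum(row), 6) for row in batch_probs])
```

Softmax keeps the ordering of the logits, so the top-1 class is the same whether it is taken from the logits or from the probabilities.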

In [5]:
'''
I use the pretrained resnet101 from PyTorch.
The pretrained weights are downloaded from https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
If you already have the weights file (.pth), comment out the first line and uncomment the two lines after it.
Please make sure the .pth file is in the same directory as this file.
'''
model_ft = models.resnet101(pretrained=True)
# The .pth file holds a state dict (weights only), so build the network first and then load the weights.
#model_ft = models.resnet101()
#model_ft.load_state_dict(torch.load("resnet101-5d3b4d8f.pth"))

num_ftrs = model_ft.fc.in_features

# replace the last fc layer with an untrained one (requires grad by default)
model_ft.fc = nn.Linear(num_ftrs, 196)
model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9)

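"""
Probably not the best metric to track, but the scheduler is stepped with the validation accuracy
returned by evaluate_model.
ReduceLROnPlateau defaults to threshold_mode='rel', so threshold=0.9 is relative: an epoch only counts
as an improvement if the accuracy exceeds 1.9x the best value so far. After more than 3 epochs without
such an improvement the lr is reduced by 0.1x. To require an absolute gain of 0.9 instead,
pass threshold_mode='abs'.
However in this model it did not benefit me.
"""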
lrscheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', patience=3, threshold = 0.9)

Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /tmp/.torch/models/resnet101-5d3b4d8f.pth
178728960it [00:06, 26866835.12it/s]
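
How `threshold=0.9` behaves depends on `threshold_mode`. The sketch below replays the validation accuracies from the training log through a simplified version of the bookkeeping that the PyTorch documentation describes for `ReduceLROnPlateau` in `'max'` mode with no cooldown. `replay_plateau` is a hypothetical helper written for this illustration, not library code.

```python
def replay_plateau(metrics, threshold, threshold_mode, patience=3):
    """Return the 1-based epochs after which the lr would be reduced.

    Mirrors the 'max'-mode bookkeeping described in the PyTorch docs for
    ReduceLROnPlateau (cooldown=0); it is a sketch, not the library code.
    """
    best = float("-inf")
    num_bad_epochs = 0
    reductions = []
    for epoch, value in enumerate(metrics, start=1):
        if threshold_mode == "rel":
            improved = value > best * (1.0 + threshold)
        else:  # "abs"
            improved = value > best + threshold
        if improved:
            best = value
            num_bad_epochs = 0
        else:
            num_bad_epochs += 1
        if num_bad_epochs > patience:
            reductions.append(epoch)
            num_bad_epochs = 0
    return reductions

# Validation accuracies of the first nine epochs from the training log.
val_acc = [35.71, 60.86, 70.31, 78.65, 83.93, 84.17, 85.89, 91.66, 92.52]

print(replay_plateau(val_acc, threshold=0.9, threshold_mode="rel"))
print(replay_plateau(val_acc, threshold=0.9, threshold_mode="abs"))
```

Under these assumptions the relative mode reports a reduction after epoch 7, which lines up with the sharp drop in training loss at epoch 8 in the log, while the absolute mode reports none over these nine epochs. The absolute-mode replay is only illustrative, since a different schedule would have produced different accuracies.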


Start training with 50 epochs

In [6]:
model_ft, training_losses, training_accs, test_accs = train_model(model_ft, criterion, optimizer, lrscheduler, n_epochs=50)

Epoch 1, duration: 131 s, loss: 3.7730, acc: 19.1576
Accuracy of the network on the validation images: 35.71 %
Epoch 2, duration: 130 s, loss: 1.4789, acc: 60.7609
Accuracy of the network on the validation images: 60.86 %
Epoch 3, duration: 130 s, loss: 0.8395, acc: 76.5217
Accuracy of the network on the validation images: 70.31 %
Epoch 4, duration: 130 s, loss: 0.5460, acc: 84.7690
Accuracy of the network on the validation images: 78.65 %
Epoch 5, duration: 130 s, loss: 0.3956, acc: 89.3478
Accuracy of the network on the validation images: 83.93 %
Epoch 6, duration: 130 s, loss: 0.2904, acc: 92.1467
Accuracy of the network on the validation images: 84.17 %
Epoch 7, duration: 130 s, loss: 0.2206, acc: 94.2391
Accuracy of the network on the validation images: 85.89 %
Epoch 8, duration: 131 s, loss: 0.1041, acc: 97.5951
Accuracy of the network on the validation images: 91.66 %
Epoch 9, duration: 130 s, loss: 0.0725, acc: 98.2473
Accuracy of the network on the validation images: 92.52 %
E



Finished Training


In [7]:
'''
Flatten the per-batch filename lists into one list of image filenames, one per prediction.
'''
flatten = []
for sublist in fname:
    for item in sublist:
        # keep only the file name, dropping the directory part of the path
        flatten.append(os.path.basename(item))

'''
This is the top-1 predicted class for each image, together with the image filename.
'''
final = []
for sublist in result:
    for item in sublist:
        final.append(item)

submission = pd.DataFrame.from_dict({
    'filename': flatten,
    'prediction': final
})

submission.to_csv('best.csv', index=False)
        
'''
This is the confidence for each of the 196 classes, together with the image filename.
'''
final2 = []
for sublist in result2:
    for item in sublist:
        final2.append(item)

submission = pd.DataFrame.from_dict({
    'filename': flatten,
    'prediction': final2
})

submission.to_csv('confidence.csv', index=False)
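
Storing a whole 196-element array in a single DataFrame cell means `to_csv` writes the array's string form, which tends to be awkward to parse back. One alternative layout is one column per class, sketched here with the standard `csv` module. The `class_<n>` column names and the `write_confidences` helper are assumptions made for this example, not something the dataset or the notebook defines.

```python
import csv
import io

def write_confidences(rows, num_classes, out):
    """Write one row per image: filename followed by one column per class."""
    writer = csv.writer(out)
    writer.writerow(["filename"] + [f"class_{c}" for c in range(num_classes)])
    for filename, probs in rows:
        writer.writerow([filename] + [f"{p:.6f}" for p in probs])

# Made-up example: 2 images, 3 classes instead of 196.
rows = [("00001.jpg", [0.7, 0.2, 0.1]),
        ("00002.jpg", [0.05, 0.9, 0.05])]

buffer = io.StringIO()  # stands in for open("confidence.csv", "w", newline="")
write_confidences(rows, num_classes=3, out=buffer)
print(buffer.getvalue())
```

In the notebook, `rows` could be built with `zip(flatten, final2)`, since each entry of `final2` is the per-class confidence vector for one image.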