## Dog Breed Identification (ImageNet Dogs)

This notebook is for Kaggle's [Dog Breed Identification](https://www.kaggle.com/c/dog-breed-identification) competition.

The purpose is to identify 120 different dog breeds using data from the famous ImageNet Dogs dataset.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models
import os
import shutil
import time
import pandas as pd
import random

In [2]:
random.seed(210)
torch.manual_seed(21)
torch.cuda.manual_seed(21)

### Download and load dataset

The downloaded dataset looks like this:
```
| Dog Breed Identification
    | train
    |   | 000bec180eb18c7604dcecc8fe0dba07.jpg
    |   | 00a338a92e4e7bf543340dc849230e75.jpg
    |   | ...
    | test
    |   | 00a3edd22dc7859c487a64777fc8d093.jpg
    |   | 00a6892e5c7f92c1f465e213fd904582.jpg
    |   | ...
    | labels.csv
    | sample_submission.csv
```

The data is divided into a training set and a test set. The training set contains 10,222 images and the test set contains 10,357 images. All images are in JPEG format, have three RGB channels, and vary in height and width. The file name of each image is its unique id. *labels.csv* contains the labels of the training set images: it has 10,222 rows with two columns, the image id and the dog breed. There are 120 breeds of dogs in the training set.
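
To make the layout of *labels.csv* concrete, here is a minimal sketch that parses a few rows and counts images per breed. The header names `id` and `breed` are an assumption, and the sample rows reuse ids and breeds that appear in the folder listings of this notebook rather than being read from the real file.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of labels.csv: header names are assumed, rows reuse
# ids and breeds that appear in the folder listings of this notebook.
sample_csv = """id,breed
00ca18751837cd6a22813f8e221f7819,affenpinscher
56af8255b46eb1fa5722f37729525405,affenpinscher
0a4f1e17d720cdff35814651402b7cf4,afghan_hound
0df400016a7e7ab4abff824bf2743f02,afghan_hound
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
id2label = {row["id"]: row["breed"] for row in rows}  # image id -> breed
breed_counts = Counter(id2label.values())             # breed -> number of images

print(len(id2label), "images,", len(breed_counts), "breeds")
print(breed_counts.most_common())
```

Run against the real file, the same counts should come out as 10,222 ids and 120 breeds, per the description above.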

* Split a validation set off from the training set to tune the hyperparameters. After partitioning, the data should consist of 4 parts: the partitioned training set, the partitioned validation set, the full training set, and the full test set.

* For these 4 parts, create 4 folders: train, valid, train_valid, and test. Inside each folder, create one subfolder per category and store the images belonging to that category in it. The labels of the first three parts are known, so each of them has 120 subfolders. The labels of the test set are unknown, so it has a single subfolder named unknown that holds all test images.
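
The split sizes implied by this scheme can be sketched as follows. The ratio of 0.1 matches the `valid_ratio` used later in this notebook; the function name `split_sizes` is hypothetical.

```python
def split_sizes(num_images, valid_ratio):
    """Return (train, valid, train_valid) sizes for a shuffled hold-out split."""
    num_valid = int(num_images * valid_ratio)  # truncates toward zero
    num_train = num_images - num_valid
    return num_train, num_valid, num_images    # train_valid is the full set


num_train, num_valid, num_train_valid = split_sizes(10222, 0.1)
print(num_train, num_valid, num_train_valid)   # 9200 1022 10222
```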


The reorganized dataset structure then looks like this:

```
| train_valid_test
    | train
    |   | affenpinscher
    |   |   | 00ca18751837cd6a22813f8e221f7819.jpg
    |   |   | ...
    |   | afghan_hound
    |   |   | 0a4f1e17d720cdff35814651402b7cf4.jpg
    |   |   | ...
    |   | ...
    | valid
    |   | affenpinscher
    |   |   | 56af8255b46eb1fa5722f37729525405.jpg
    |   |   | ...
    |   | afghan_hound
    |   |   | 0df400016a7e7ab4abff824bf2743f02.jpg
    |   |   | ...
    |   | ...
    | train_valid
    |   | affenpinscher
    |   |   | 00ca18751837cd6a22813f8e221f7819.jpg
    |   |   | ...
    |   | afghan_hound
    |   |   | 0a4f1e17d720cdff35814651402b7cf4.jpg
    |   |   | ...
    |   | ...
    | test
    |   | unknown
    |   |   | 00a3edd22dc7859c487a64777fc8d093.jpg
    |   |   | ...
```

In [4]:
data_dir = '/home/kesci/input/Kaggle_Dog6357/dog-breed-identification' 
label_file, train_dir, test_dir = 'labels.csv', 'train', 'test'  
new_data_dir = './train_valid_test'  

valid_ratio = 0.1  # fraction of training images held out for validation

In [5]:
def mkdir_if_not_exist(path):
    if not os.path.exists(os.path.join(*path)):
        os.makedirs(os.path.join(*path))
        
def reorg_dog_data(data_dir, label_file, train_dir, test_dir, new_data_dir, valid_ratio):
    # load labels for training data
    labels = pd.read_csv(os.path.join(data_dir, label_file))
    id2label = {Id: label for Id, label in labels.values}  # (key: value): (id: label)

    # shuffle training data
    train_files = os.listdir(os.path.join(data_dir, train_dir))
    random.shuffle(train_files)    

    valid_ds_size = int(len(train_files) * valid_ratio) 
    
    for i, file in enumerate(train_files):
        img_id = file.split('.')[0]  # file name has the form <id>.jpg
        img_label = id2label[img_id]
        if i < valid_ds_size:
            mkdir_if_not_exist([new_data_dir, 'valid', img_label])
            shutil.copy(os.path.join(data_dir, train_dir, file),
                        os.path.join(new_data_dir, 'valid', img_label))
        else:
            mkdir_if_not_exist([new_data_dir, 'train', img_label])
            shutil.copy(os.path.join(data_dir, train_dir, file),
                        os.path.join(new_data_dir, 'train', img_label))
        mkdir_if_not_exist([new_data_dir, 'train_valid', img_label])
        shutil.copy(os.path.join(data_dir, train_dir, file),
                    os.path.join(new_data_dir, 'train_valid', img_label))

    # test set
    mkdir_if_not_exist([new_data_dir, 'test', 'unknown'])
    for test_file in os.listdir(os.path.join(data_dir, test_dir)):
        shutil.copy(os.path.join(data_dir, test_dir, test_file),
                    os.path.join(new_data_dir, 'test', 'unknown'))

In [6]:
reorg_dog_data(data_dir, label_file, train_dir, test_dir, new_data_dir, valid_ratio)

### Image Augmentation

In [7]:
transform_train = transforms.Compose([
    # randomly crop a region covering 0.08~1 of the original image area,
    # with a width/height ratio between 3/4 and 4/3,
    # then resize it to a new 224x224 pixel image
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0),  
                                 ratio=(3.0/4.0, 4.0/3.0)),
    # flip each image horizontally with probability 0.5
    transforms.RandomHorizontalFlip(),
    # randomly jitter brightness, contrast and saturation
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(), # change to tensor
    # using mean(0.485, 0.456, 0.406) and std(0.229, 0.224, 0.225) normalize on channels
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) 
])


# no random noise on test set
transform_test = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224), # crop the center square image with 224*224
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
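
As a rough illustration of what the test-time pipeline does to image geometry, the sketch below computes the size after `Resize(256)` (shorter side scaled to 256, aspect ratio kept) and the box taken by `CenterCrop(224)`. It is a pure-Python approximation written for this note, not torchvision's own code, so rounding may differ by a pixel. The 500x375 input size is a made-up example.

```python
def resize_shorter_side(width, height, size=256):
    """Scale so the shorter side equals `size`, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop x crop square."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


w, h = resize_shorter_side(500, 375)   # hypothetical landscape photo
box = center_crop_box(w, h)
print((w, h), box)                     # (341, 256) (58, 16, 282, 240)
```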

In [8]:
train_ds = torchvision.datasets.ImageFolder(root=os.path.join(new_data_dir, 'train'),
                                            transform=transform_train)
valid_ds = torchvision.datasets.ImageFolder(root=os.path.join(new_data_dir, 'valid'),
                                            transform=transform_test)
train_valid_ds = torchvision.datasets.ImageFolder(root=os.path.join(new_data_dir, 'train_valid'),
                                            transform=transform_train)
test_ds = torchvision.datasets.ImageFolder(root=os.path.join(new_data_dir, 'test'),
                                            transform=transform_test)

In [9]:
batch_size = 128

train_iter = torch.utils.data.DataLoader(train_ds, batch_size=batch_size, shuffle=True)
valid_iter = torch.utils.data.DataLoader(valid_ds, batch_size=batch_size, shuffle=True)
train_valid_iter = torch.utils.data.DataLoader(train_valid_ds, batch_size=batch_size, shuffle=True)
test_iter = torch.utils.data.DataLoader(test_ds, batch_size=batch_size, shuffle=False)  # keep file order so predictions align with ids

## Build Model

Because this dataset is a subset of the ImageNet dataset, we use fine-tuning: a model pre-trained on the complete ImageNet dataset extracts image features, which serve as input to a small custom output network.

Here, we use the pre-trained **ResNet-34 model**. We directly reuse the input to the pre-trained model's output layer, that is, the extracted features, and replace the output layer with a newly defined one. We then train only the params of this new output layer. For the part used for feature extraction, we keep the params of the pre-trained model unchanged.
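
The freeze-then-replace idea can be sketched on a tiny stand-in network, so nothing needs to be downloaded. The `backbone` below is a hypothetical substitute for the pre-trained ResNet-34 feature extractor; only the head sizes (512, 256, 120) follow this notebook.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained feature extractor producing 512 features.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 512), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad = False          # frozen: keep the "pre-trained" weights

# New output network, trained from scratch for 120 breeds.
head = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 120))
net = nn.Sequential(backbone, head)

trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in net.parameters() if not p.requires_grad)
print(trainable, frozen)

logits = net(torch.randn(4, 3, 8, 8))    # batch of 4 fake images
print(logits.shape)                      # torch.Size([4, 120])
```

Only the head's parameters report `requires_grad=True`, so an optimizer built from them updates the new layers alone.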

In [10]:
def get_net(device): # build model using pre-trained resnet-34
    finetune_net = models.resnet34(pretrained=False)  # build ResNet-34; pre-trained weights are loaded from a local file next
    finetune_net.load_state_dict(torch.load('/home/kesci/input/resnet347742/resnet34-333f7ec4.pth'))
    for param in finetune_net.parameters():  # freeze params
        param.requires_grad = False
        
    # the original finetune_net.fc is a fully connected layer with 512 inputs and 1000 outputs
    # replace finetune_net.fc so that its output matches our 120 labels
    finetune_net.fc = nn.Sequential(
        nn.Linear(in_features=512, out_features=256),
        nn.ReLU(),
        nn.Linear(in_features=256, out_features=120)  # 120 is output label class
    )
    return finetune_net

In [11]:
def evaluate_loss_acc(data_iter, net, device): # calculate avg loss and accuracy on data_iter
    loss = nn.CrossEntropyLoss()
    is_training = net.training  # remember whether net is currently in train mode
    net.eval()
    l_sum, acc_sum, n = 0, 0, 0
    
    with torch.no_grad():
        for X, y in data_iter:
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l_sum += l.item() * y.shape[0]
            acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]

    net.train(is_training)  # reset net back to train/eval mode
    
    return l_sum / n, acc_sum / n
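
`nn.CrossEntropyLoss` returns the mean loss over a batch by default, which is why the function multiplies each batch loss by `y.shape[0]` before summing. A small pure-Python sketch with made-up numbers shows how this differs from averaging the batch means directly when the last batch is smaller:

```python
# Hypothetical (mean_loss, batch_size) pairs; the last batch is smaller.
batches = [(1.0, 128), (1.0, 128), (4.0, 16)]

# Weighted by batch size, as done in evaluate_loss_acc.
total_loss = sum(mean_loss * size for mean_loss, size in batches)
total_count = sum(size for _, size in batches)
per_sample_avg = total_loss / total_count

# Naive average of batch means over-weights the small batch.
naive_avg = sum(mean_loss for mean_loss, _ in batches) / len(batches)

print(round(per_sample_avg, 4), round(naive_avg, 4))
```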

In [12]:
def train(net, train_iter, valid_iter, num_epochs, lr, wd, device, lr_period,
          lr_decay):
              
    loss = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.fc.parameters(), lr=lr, momentum=0.9, weight_decay=wd)
    net = net.to(device)
    
    for epoch in range(num_epochs):
        train_l_sum, n, start = 0.0, 0, time.time()
        if epoch > 0 and epoch % lr_period == 0:  # decay lr every (lr_period) epoch
            lr = lr * lr_decay
            for param_group in optimizer.param_groups:
                param_group['lr'] = lr
                
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            train_l_sum += l.item() * y.shape[0]
            n += y.shape[0]
            
        time_s = "time %.2f sec" % (time.time() - start)
        
        if valid_iter is not None:
            valid_loss, valid_acc = evaluate_loss_acc(valid_iter, net, device)
            epoch_s = ("epoch %d, train loss %f, valid loss %f, valid acc %f, "
                       % (epoch + 1, train_l_sum / n, valid_loss, valid_acc))
        else:
            epoch_s = ("epoch %d, train loss %f, "
                       % (epoch + 1, train_l_sum / n))
                       
        print(epoch_s + time_s + ', lr ' + str(lr))

In [13]:
# params setting
num_epochs, lr_period, lr_decay = 20, 10, 0.1
lr, wd = 0.03, 1e-4
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
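
With these settings, the step decay in `train` multiplies the learning rate by `lr_decay` every `lr_period` epochs. The helper below (a hypothetical name, `lr_at_epoch`) reproduces that rule in closed form, as a sketch for checking the schedule before committing to a long run.

```python
def lr_at_epoch(epoch, base_lr=0.03, lr_period=10, lr_decay=0.1):
    """Learning rate used during 0-indexed `epoch` under step decay."""
    return base_lr * lr_decay ** (epoch // lr_period)


schedule = [lr_at_epoch(e) for e in range(20)]
print(schedule[0], schedule[9], schedule[10], schedule[19])
```

Epochs 1 to 10 (as printed, 1-indexed) use 0.03 and epochs 11 to 20 use roughly 0.003, which agrees with the training log further down.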

### Train model

In [None]:
net = get_net(device)
train(net, train_iter, valid_iter, num_epochs, lr, wd, device, lr_period, lr_decay)

In [15]:
net = get_net(device)
train(net, train_valid_iter, None, num_epochs, lr, wd, device, lr_period, lr_decay)

epoch 1, train loss 3.231950, time 872.22 sec, lr 0.03
epoch 2, train loss 1.471955, time 867.12 sec, lr 0.03
epoch 3, train loss 1.274351, time 852.87 sec, lr 0.03
epoch 4, train loss 1.208349, time 861.00 sec, lr 0.03
epoch 5, train loss 1.157397, time 876.11 sec, lr 0.03
epoch 6, train loss 1.116189, time 879.29 sec, lr 0.03
epoch 7, train loss 1.113411, time 882.93 sec, lr 0.03
epoch 8, train loss 1.085275, time 879.36 sec, lr 0.03
epoch 9, train loss 1.070930, time 874.22 sec, lr 0.03
epoch 10, train loss 1.027964, time 858.18 sec, lr 0.03
epoch 11, train loss 0.931459, time 877.53 sec, lr 0.003
epoch 12, train loss 0.891806, time 877.98 sec, lr 0.003
epoch 13, train loss 0.885535, time 879.91 sec, lr 0.003
epoch 14, train loss 0.870792, time 878.42 sec, lr 0.003
epoch 15, train loss 0.878356, time 877.15 sec, lr 0.003
epoch 16, train loss 0.847840, time 856.64 sec, lr 0.003
epoch 17, train loss 0.873435, time 877.29 sec, lr 0.003
epoch 18, train loss 0.844666, time 860.16 sec, lr

### Prediction

In [15]:
preds = []

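net.eval()  # switch BatchNorm layers to their running statistics for inference
with torch.no_grad():  # gradients are not needed for prediction
    for X, _ in test_iter:
        X = X.to(device)
        output = net(X)
        output = torch.softmax(output, dim=1)
        preds += output.tolist()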

ids = sorted(os.listdir(os.path.join(new_data_dir, 'test/unknown')))

In [None]:
# write and save our model output
with open('submission.csv', 'w') as f:
    f.write('id,' + ','.join(train_valid_ds.classes) + '\n')
    for i, output in zip(ids, preds):
        f.write(i.split('.')[0] + ',' + ','.join(
            [str(num) for num in output]) + '\n')
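
Each submission row holds an image id followed by one probability per breed, and the softmax makes the probabilities in a row sum to 1. The sketch below builds one such row in pure Python. The three class names and the logits are made up, whereas the real file has 120 breed columns.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["affenpinscher", "afghan_hound", "beagle"]   # hypothetical subset
file_name = "00a3edd22dc7859c487a64777fc8d093.jpg"      # test file from the listing
probs = softmax([2.0, 1.0, 0.1])                        # made-up logits

header = "id," + ",".join(classes)
row = file_name.split(".")[0] + "," + ",".join(str(p) for p in probs)
print(header)
print(row)
```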
            