__Complete all sub-tasks marked with ## TO DO! ## and submit the filled notebook on OLAT__ \
__Using a GPU is recommended here__

### Transfer Learning ###
The aim of this notebook is to apply transfer learning to a larger dataset. We compete in a well-known Kaggle competition, Dog Breed Identification. Read more about it here:

https://www.kaggle.com/c/dog-breed-identification/overview



The task is to train a model on the Dog Breeds dataset using transfer learning and submit your results to Kaggle.
Note: The notebook below gives some tips for running the code in PyTorch.

In [None]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2

In [None]:
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn
import matplotlib.pyplot as plt
import pandas as pd
import os
import sys
import shutil
import collections
import torchvision
import torchvision.transforms as transforms

In [None]:
use_cuda = torch.cuda.is_available()

In [None]:
from AlexNet import AlexNet
from train_test import start_train_test

In [None]:
####################################################################################################
## TO DO! : Register on Kaggle with your respective group name (for example: WS19_VDL_GROUP_01)  ##
####################################################################################################

In [None]:
####################################################################################################
## TO DO! : Download the Dog-Breeds dataset in folder "data"                                      ##
## from the Kaggle competition link mentioned above                                               ##
####################################################################################################
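import zipfile

if os.path.exists("./data"):
    # extract both archives into the data folder (portable alternative to the `unzip` command)
    for f in ["train.zip", "test.zip"]:
        with zipfile.ZipFile(os.path.join("data", f)) as archive:
            archive.extractall("data")
else:
    print("data folder not found")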
    


In [None]:
# base directory for data
base_dir = "./data"
# train, test directory relative to base directory
train_dir = "train"
test_dir = "test"
# the csv file with filenames and their labels
label_file = "labels.csv"
# dictionary containing labels
label_dict = {}

In [None]:
####################################################################################################
## Function to convert data into imagenet format                                                  ##
####################################################################################################
def convert_dataset(base_dir, label_file, train_dir):
    try:
        with open(os.path.join(base_dir, label_file), "r") as lf:
            for line in lf:
                line = line.strip().split(",")
                # skip blank lines and the header row, so no bogus "breed" folder is created
                if len(line) < 2 or line[0] == "id":
                    continue
                label_dict[line[0]] = line[1]
                path = os.path.join(base_dir, train_dir, line[1])
                # make the directory for each breed
                os.makedirs(path, exist_ok=True)

        # move the files to their respective folders
        for tf in os.listdir(os.path.join(base_dir, train_dir)):
            image_id = tf.split(".")[0]
            if image_id in label_dict:
                shutil.move(os.path.join(base_dir, train_dir, tf),
                            os.path.join(base_dir, train_dir, label_dict[image_id]))
    except Exception as err:
        print("Error : {}".format(err))
            
convert_dataset(base_dir, label_file, train_dir)
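
Before training, it can help to check how many images each breed has. The sketch below counts labels with `collections.Counter`, assuming `labels.csv` has an `id,breed` header row; `count_breeds` and the tiny sample file are illustrative only and not part of the course code.

```python
import collections
import csv
import os
import tempfile


def count_breeds(csv_path):
    """Return a Counter mapping breed name -> number of labelled images."""
    with open(csv_path, newline="") as f:
        # DictReader consumes the header row, so "breed" is never counted as a class
        return collections.Counter(row["breed"] for row in csv.DictReader(f))


# Tiny made-up label file, used only to demonstrate the function.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "labels.csv")
    with open(sample_path, "w", newline="") as f:
        f.write("id,breed\n0a1,beagle\n0b2,pug\n0c3,beagle\n")
    breed_counts = count_breeds(sample_path)

print(breed_counts.most_common())
```

On the real data, the call would be `count_breeds(os.path.join(base_dir, label_file))`.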

In [None]:
####################################################################################################
## TO DO! : Make your dataset and dataloaders for the test data                                  ##
####################################################################################################
from transform import transform_training

trainset = torchvision.datasets.ImageFolder(root = os.path.join(base_dir, train_dir), transform=transform_training())

In [None]:
####################################################################################################
## TO DO! : Split train data into 20% validation set and make dataloaders for train and val split ##
####################################################################################################
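# train/validation split: hold out 20% of the images for validation
val_size = int(0.2 * len(trainset))
train_size = len(trainset) - val_size
val_data, train_data = torch.utils.data.random_split(trainset, [val_size, train_size])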

# create dataloaders
trainloader = torch.utils.data.DataLoader(train_data, batch_size=4,
                                          shuffle=True, num_workers=2)
testloader =  torch.utils.data.DataLoader(val_data, batch_size=4,
                                          shuffle=True, num_workers=2)
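
`random_split` draws a new random partition on every run, so validation losses are not directly comparable between runs. One option is to pass a seeded generator. The sketch below demonstrates this on a stand-in `TensorDataset` with 10222 items (the number of labelled training images used in this notebook); the seed value 42 is arbitrary.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset with as many items as the labelled training set.
dummy_set = TensorDataset(torch.arange(10222))

val_size = int(0.2 * len(dummy_set))      # 20% for validation
train_size = len(dummy_set) - val_size    # remaining items for training

# A seeded generator makes the partition identical on every run.
generator = torch.Generator().manual_seed(42)
val_part, train_part = random_split(dummy_set, [val_size, train_size], generator=generator)

print(len(val_part), len(train_part))
```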

__We train the well-known AlexNet model on the Dog Breeds dataset. Training AlexNet from scratch on the Dog Breeds data alone is not easy, although curious minds can give it a try. Instead, we use transfer learning: we take an AlexNet model pretrained on ImageNet and adapt it to our data to get better results.__

## Transfer Learning

In [None]:
####################################################################################################
## TO DO! :  Freeze the weights of the pretrained AlexNet model and change the last classification layer
## from 1000 classes of ImageNet to 120 classes of Dog Breeds; only the classification layer should be
## unfrozen and trainable                                                                         ##
####################################################################################################
import torchvision.models as models
pretrained_alexnet = models.alexnet(pretrained=True)
criterion = nn.CrossEntropyLoss()

# freeze all pretrained weights so that only the new layer is trained
for param in pretrained_alexnet.parameters():
    param.requires_grad = False

# reinitialize the last layer with 120 outputs to match the number of classes;
# a newly created layer is trainable by default
pretrained_alexnet.classifier[6] = nn.Linear(4096, 120)

# The function below trains your network with the given parameters for 5 epochs
# You are also free to use the function from task 1 to train your model here
train_loss, test_loss = start_train_test(pretrained_alexnet, trainloader, testloader, criterion)
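
Whether only the classification layer is trainable can be checked by counting parameters. The sketch below uses a small stand-in network instead of AlexNet so that it runs without downloading weights; `count_parameters` is an illustrative helper, and only the sizes of the final layer (4096 inputs, 120 outputs) match the notebook.

```python
import torch.nn as nn


def count_parameters(model):
    """Return (trainable, total) parameter counts of a model."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total


# Stand-in for a pretrained network: a small "backbone" plus a final classifier.
net = nn.Sequential(
    nn.Linear(256, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1000),   # plays the role of the 1000-class ImageNet layer
)

# Freeze everything, then replace the last layer; new layers are trainable by default.
for param in net.parameters():
    param.requires_grad = False
net[2] = nn.Linear(4096, 120)

trainable, total = count_parameters(net)
print(trainable, total)
```

For the notebook's model the call would be `count_parameters(pretrained_alexnet)`. If only `classifier[6]` is trainable, the first number should be 4096 * 120 weights plus 120 biases, i.e. 491640.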

## Making Kaggle Submission

In [None]:
from transform import transform_testing
import PIL.Image
import torch.nn.functional as F
import numpy as np

In [None]:
### Not-so-optimal code: this can take up to 2 minutes to run. You are free to write a faster version :) ###
# It iterates over all test images and computes softmax probabilities from the outputs of the last layer
augment_image = transform_testing()
test_data_root = os.path.join(base_dir, test_dir)  # folder with the test images; adjust if yours differs
test_image_list = os.listdir(test_data_root)  # list of test files
device = torch.device("cuda" if use_cuda else "cpu")
pretrained_alexnet.to(device)
pretrained_alexnet.eval()  # switch off dropout for inference
result = []
for img_name in test_image_list:
    img = PIL.Image.open(os.path.join(test_data_root, img_name)).convert("RGB")
    img_tensor = augment_image(img)
    with torch.no_grad():
        output = pretrained_alexnet(img_tensor.unsqueeze(0).to(device))
        probs = F.softmax(output, dim=1)
    result.append(probs.cpu().numpy())
all_predictions = np.concatenate(result)
print(all_predictions.shape)
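
One way to speed this up is to run the model on batches rather than on single images. The sketch below shows only the batching loop, demonstrated on random tensors with a tiny stand-in model; `predict_probabilities` is a hypothetical helper, and for the real test set the loader would have to yield batches of transformed images.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def predict_probabilities(model, loader, device):
    """Run the model over all batches and return softmax probabilities as one array."""
    model.to(device)
    model.eval()                              # inference mode for dropout / batch norm
    chunks = []
    with torch.no_grad():                     # no gradients needed for inference
        for (batch,) in loader:
            logits = model(batch.to(device))
            chunks.append(F.softmax(logits, dim=1).cpu().numpy())
    return np.concatenate(chunks)


# Demonstration with random "images" and a tiny stand-in classifier.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
fake_images = TensorDataset(torch.randn(10, 3, 8, 8))
fake_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 120))
# shuffle=False keeps predictions in the same order as the inputs
loader = DataLoader(fake_images, batch_size=4, shuffle=False)

probabilities = predict_probabilities(fake_model, loader, device)
print(probabilities.shape)
```

Keeping `shuffle=False` matters here, because the rows of the result are later matched to image ids by position.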

In [None]:
df = pd.DataFrame(all_predictions)
# column names must follow the class order used by ImageFolder, i.e. the sorted breed folder names
df.columns = trainset.classes

# insert clean ids - without folder prefix and .jpg suffix - of images as first column;
# test_image_list from the previous cell is reused so that ids and predictions stay in the same order
df.insert(0, "id", [os.path.splitext(e)[0] for e in test_image_list])
df.to_csv("sub_1_alexnet.csv", index=False)

### TO DO!: ###
Submit the created CSV file to Kaggle with a score (cross-entropy loss) of no more than __2.0__.\
Take a snapshot of your rank on the Kaggle Public Leaderboard and include the image here ...
For example:
![title](snp2.png)
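
To get a feeling for the 2.0 target: the cross-entropy score is the mean negative log of the probability assigned to the correct breed. The sketch below implements that formula in NumPy and evaluates it for a model that assigns the same probability to all 120 breeds. `multiclass_log_loss` is an illustrative helper, and the clipping constant is an assumption; Kaggle's exact handling may differ.

```python
import numpy as np


def multiclass_log_loss(probabilities, true_labels, eps=1e-15):
    """Mean negative log-probability of the true class."""
    probabilities = np.clip(probabilities, eps, 1.0)   # avoid log(0)
    picked = probabilities[np.arange(len(true_labels)), true_labels]
    return float(-np.mean(np.log(picked)))


# A uniform guess over 120 breeds gives ln(120), regardless of the true labels.
uniform = np.full((5, 120), 1.0 / 120)
labels = np.array([0, 17, 42, 99, 119])
uniform_score = multiclass_log_loss(uniform, labels)
print(round(uniform_score, 4))
```

A uniform guess therefore scores about 4.79, while a score of 2.0 corresponds to the correct breed receiving a probability of roughly e^(-2) ≈ 0.135 in the geometric mean. The same function can be applied to predictions on the validation split to estimate the score before submitting.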

## CHALLENGE  (optional)
Compete against each other and come up with creative ideas. Try to beat a score of __0.3__. The group with the lowest score gets a small prize when the solutions are discussed.


__Hints:__

1. Instead of AlexNet, use a pretrained ResNet-18 model for better accuracy.
2. Instead of just adding a single classification layer, try adding two layers to get a lower loss.
3. Train some more layers at the end of the network with a very small learning rate.
4. Add batch normalization or dropout to the layers you have added (if not already present).
5. Add more augmentation to your dataset; see the transform.py file and use AutoAugment to apply more rigorous data augmentation techniques.
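
Hints 2 to 4 could be combined roughly as follows. This is a sketch under assumptions: `make_head`, the hidden size of 512 and the learning rates are illustrative choices, and the small `backbone_tail` layer merely stands in for the last pretrained block that hint 3 suggests fine-tuning. In torchvision's ResNet-18 the final layer is `fc` with 512 input features, so such a head would replace `model.fc`.

```python
import torch
import torch.nn as nn


def make_head(in_features, hidden, num_classes, dropout=0.5):
    """Two-layer classification head with batch normalization and dropout."""
    return nn.Sequential(
        nn.Linear(in_features, hidden),
        nn.BatchNorm1d(hidden),
        nn.ReLU(inplace=True),
        nn.Dropout(dropout),
        nn.Linear(hidden, num_classes),
    )


# Stand-in for the last pretrained block of a backbone (not a real ResNet layer).
backbone_tail = nn.Linear(512, 512)
head = make_head(in_features=512, hidden=512, num_classes=120)

# Parameter groups: a very small learning rate for pretrained weights,
# a larger one for the freshly initialized head.
optimizer = torch.optim.SGD(
    [
        {"params": backbone_tail.parameters(), "lr": 1e-5},
        {"params": head.parameters(), "lr": 1e-3},
    ],
    momentum=0.9,
)

features = torch.randn(4, 512)               # a batch of 4 feature vectors
logits = head(backbone_tail(features))
print(logits.shape)
```

Layers that should stay frozen are simply left out of the optimizer and keep `requires_grad = False`.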