# Dog Breed Identification from Kaggle
# Author: jz1g17@soton.ac.uk

# Brief Introduction
Who's a good dog? Who likes ear scratches? Well, it seems those fancy deep neural networks don't have all the answers. However, maybe they can answer that ubiquitous question we all ask when meeting a four-legged stranger: what kind of good pup is that?

In this playground competition, you are provided a strictly canine subset of [ImageNet](https://www.kaggle.com/c/imagenet-object-detection-challenge) in order to practice fine-grained image categorization. How well can you tell your Norfolk Terriers from your Norwich Terriers? With 120 breeds of dogs and a limited number of training images per class, you might find the problem more, err, ruff than you anticipated.
![image](https://raw.githubusercontent.com/Trouble404/Kaggle-Dog-breed-Identification/master/readme_pic_add/border_collies.png)

# Tools used
**Gluon**

Gluon is the high-level interface for [MXNet](https://mxnet.apache.org/). It is more intuitive and easier to use than the lower-level interface. Gluon supports dynamic (define-by-run) graphs and can compile them just in time, which gives it both flexibility and efficiency.

# Step 1: Restructure the data so that Gluon can load the images
![image](https://raw.githubusercontent.com/Trouble404/Kaggle-Dog-breed-Identification/master/readme_pic_add/loading.png)


In [17]:
import shutil # high-level file operations (copy, delete)
import os # operating-system interface for folders and files
import pandas as pd

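**[shutil](https://www.jianshu.com/p/b4c87aa6fd24) is a high-level file-operation utility, comparable to a high-level API; its main strength is its good support for copying and deleting files.**

**[os](http://blog.51cto.com/pmghong/1353340) mainly handles interaction with the operating system.**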

**Give each class its own folder, filled with links to that class's images**

1.1 For the training data (must be run as **admin**, because creating symbolic links on Windows normally requires administrator rights)

In [19]:
%%time
df = pd.read_csv("labels.csv") # df -> 10222 rows × 2 columns
path = 'for_train'

if os.path.exists(path): # the for_train folder already exists
    #os.removedirs(path) # would only remove empty folders
    shutil.rmtree(path) # delete the folder and everything in it
    os.makedirs(path) # re-create an empty for_train folder
    
for i, (idNo, breed) in df.iterrows():
    folderpath = path + '/' + breed
    if not os.path.exists(folderpath):
        os.makedirs(folderpath)
    # create a soft link in the restructured folder that points to the original image in the train folder;
    # the target is made absolute because a relative target is resolved from the link's own folder
    os.symlink(os.path.abspath('train/%s.jpg' % idNo), '%s/%s.jpg' % (folderpath, idNo))

Wall time: 6.86 s
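
The restructuring step can also be wrapped in a small reusable function. The following is a minimal sketch, not the notebook's own code: `link_images_by_class` is a hypothetical helper name, and the fallback to copying files when the operating system refuses to create links is an added assumption for machines without admin rights.

```python
import os
import shutil

def link_images_by_class(labels, src_dir, dst_dir):
    """Create dst_dir/<breed>/<id>.jpg for every (id, breed) pair in labels.

    Hypothetical helper: tries a symbolic link first and falls back to a
    plain copy when the OS refuses to create links.
    """
    # Start from a clean destination folder on every run.
    if os.path.exists(dst_dir):
        shutil.rmtree(dst_dir)
    for image_id, breed in labels:
        class_dir = os.path.join(dst_dir, breed)
        os.makedirs(class_dir, exist_ok=True)
        # Absolute path to the original, so the link works from any folder.
        original = os.path.abspath(os.path.join(src_dir, image_id + '.jpg'))
        link = os.path.join(class_dir, image_id + '.jpg')
        try:
            os.symlink(original, link)
        except (OSError, NotImplementedError):
            # Assumed fallback: copy the file if links are not permitted.
            shutil.copy(original, link)
```

With the DataFrame from the cell above, a call could look like `link_images_by_class(df.itertuples(index=False), 'train', 'for_train')`, which relies only on the two-column layout noted in the comment.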


1.2 For the test data (must be run as **admin**, because creating symbolic links on Windows normally requires administrator rights)

In [23]:
%%time
df = pd.read_csv("sample_submission.csv") # df -> 10357 rows × 121 columns
path = 'for_test'
breed = '0' # single placeholder class for the unlabelled test images

if os.path.exists(path): # the for_test folder already exists
    #os.removedirs(path) # would only remove empty folders
    shutil.rmtree(path) # delete the folder and everything in it
    os.makedirs(path) # re-create an empty for_test folder
    
for idNo in df['id']:
    folderpath = path + '/' + breed
    if not os.path.exists(folderpath):
        os.makedirs(folderpath)
    # create a soft link in the restructured folder that points to the original image in the test folder;
    # the target is made absolute because a relative target is resolved from the link's own folder
    os.symlink(os.path.abspath('test/%s.jpg' % idNo), '%s/%s.jpg' % (folderpath, idNo))

Wall time: 6.01 s
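
After both folders are built, it can be worth checking that every link still resolves to a real file before handing the folders to Gluon. The sketch below uses only the standard library; `summarize_links` is a hypothetical helper name, not part of the notebook.

```python
import os

def summarize_links(root):
    """Return ({class_folder: entry_count}, [paths of dangling links]) under root."""
    counts, dangling = {}, []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        entries = os.listdir(class_dir)
        counts[class_name] = len(entries)
        for entry in entries:
            path = os.path.join(class_dir, entry)
            # os.path.exists follows links, so it is False for a dangling link.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return counts, dangling
```

If the row counts in the comments above hold, `summarize_links('for_test')` would be expected to report 10357 entries under the single folder `0`, and an empty list of dangling links.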


**Restructured dog images**
![image](https://raw.githubusercontent.com/Trouble404/Kaggle-Dog-breed-Identification/master/readme_pic_add/re-s1.PNG)
![image](https://raw.githubusercontent.com/Trouble404/Kaggle-Dog-breed-Identification/master/readme_pic_add/re-s2.PNG)

# Step 2: Load the data into Gluon and extract image features
**Extract features using the [Gluon Model Zoo](https://mxnet.incubator.apache.org/versions/master/api/python/gluon/model_zoo.html)**

In [32]:
import mxnet as mx
from mxnet import autograd
from mxnet import gluon
from mxnet import image
from mxnet import init
from mxnet import nd
from mxnet.gluon.data import vision
from mxnet.gluon.model_zoo import vision as models
import numpy as np
from tqdm import tqdm


import matplotlib.pyplot as plt
# make plot outputs appear inline and be stored within the notebook
%matplotlib inline
# render figures at retina (high-DPI) resolution
%config InlineBackend.figure_format = 'retina'

import warnings
warnings.filterwarnings("ignore")

**[Image API in MXNet](https://mxnet.incubator.apache.org/api/python/image/image.html)**

**[Gluon Data API](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=imagefolderdataset#mxnet.gluon.data.vision.ImageFolderDataset)**

**[Fine-tuning: transfer learning through fine-tuning](http://zh.gluon.ai/chapter_computer-vision/fine-tuning.html)**

2.1 Define the pre-processing function

In [34]:
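ctx = mx.gpu() # run on the GPU; requires a GPU-enabled MXNet build

preprocessing = [
    image.ForceResizeAug((224,224)), # resize to 224×224, ignoring the aspect ratio
    image.ColorNormalizeAug(mean=nd.array([0.485, 0.456, 0.406]), std=nd.array([0.229, 0.224, 0.225])) # ImageNet channel statistics
]

def transform(data, label):
    data = data.astype('float32') / 255 # scale pixel values to [0, 1]
    for pre in preprocessing:
        data = pre(data)
    
    data = nd.transpose(data, (2,0,1)) # height × width × channel -> channel × height × width
    return data, nd.array([label]).asscalar().astype('float32')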
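
To make the arithmetic in `transform` concrete, here is a minimal pure-Python sketch of the same scaling, normalization, and axis reordering on a tiny nested list. It is an illustration only: it skips the resize step, and `normalize_pixel` and `hwc_to_chw` are hypothetical names that do not exist in MXNet.

```python
MEAN = (0.485, 0.456, 0.406)  # per-channel mean, copied from the cell above
STD = (0.229, 0.224, 0.225)   # per-channel std, copied from the cell above

def normalize_pixel(rgb):
    """Scale 8-bit RGB values to [0, 1], then standardize each channel."""
    return [(v / 255 - m) / s for v, m, s in zip(rgb, MEAN, STD)]

def hwc_to_chw(img):
    """Reorder a height x width x channel nested list to channel x height x width."""
    height, width, channels = len(img), len(img[0]), len(img[0][0])
    return [[[img[y][x][c] for x in range(width)] for y in range(height)]
            for c in range(channels)]

# A made-up 1 x 2 image: one black pixel and one white pixel.
tiny = [[normalize_pixel((0, 0, 0)), normalize_pixel((255, 255, 255))]]
chw = hwc_to_chw(tiny)
print(len(chw), len(chw[0]), len(chw[0][0]))  # prints: 3 1 2
```

After the reordering, `chw[0]` holds the red channel of every pixel, which is the channel-first layout that the transpose in `transform` produces.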

2.2 Define the function that extracts the feature vectors

In [35]:
def get_features(net, data):
    features = []
    labels = []

    for X, y in tqdm(data):
        # use only the feature-extraction layers, not the classifier output layer
        feature = net.features(X.as_in_context(ctx))
        features.append(feature.asnumpy())
        labels.append(y.asnumpy())
    
    # join the per-batch arrays along the sample axis
    features = np.concatenate(features, axis=0)
    labels = np.concatenate(labels, axis=0)
    return features, labels
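
The loop in `get_features` processes the data batch by batch and then joins the results. The pure-Python sketch below mirrors that pattern on plain lists. It assumes the 10222 labelled rows noted in step 1.1, the batch size of 64 used in step 2.3, and that the short final batch is kept rather than dropped; `iter_batches` and `collect` are hypothetical names.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items, keeping the short last one."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def collect(extract, items, batch_size):
    """Apply extract to each batch and join the per-batch results in order."""
    results = []
    for batch in iter_batches(items, batch_size):
        results.extend(extract(batch))  # plays the role of np.concatenate(axis=0)
    return results

sizes = [len(b) for b in iter_batches(list(range(10222)), 64)]
print(len(sizes), sizes[-1])  # prints: 160 46
```

Under those assumptions, each pretrained network runs 160 forward passes over the training set, and the joined result has one row per image in the original order.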

2.3 Obtain the feature vectors

In [40]:
%%time
preprocessing[0] = image.ForceResizeAug((224,224))
imgs = vision.ImageFolderDataset('for_train', transform=transform)
data = gluon.data.DataLoader(imgs, 64)

features_vgg, labels = get_features(models.vgg16_bn(pretrained=True, ctx=ctx), data)
features_resnet, _ = get_features(models.resnet152_v1(pretrained=True, ctx=ctx), data)
features_densenet, _ = get_features(models.densenet161(pretrained=True, ctx=ctx), data)

Model file is not found. Downloading.
Downloading C:\Users\MSI\.mxnet\models\vgg16_bn-6b9dbe61.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/vgg16_bn-6b9dbe61.zip...


MXNetError: [01:44:49] C:\projects\mxnet-distro-win\mxnet-build\src\ndarray\ndarray.cc:565: GPU is not enabled