# Project Summary

Tentatively, my project aims to build a web/mobile app that detects dogs in images and classifies their breeds using deep neural networks. To this end, the datasets must consist of images of various dog breeds with corresponding breed labels. As per the requirements of this mini-project, three datasets are described below.

In [32]:
# Utility function for downloading large files

import requests
import os
from tqdm.auto import tqdm
from pathlib import Path
import math

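def download_file(url, filename=None, chunk_size=16384):
    """Stream `url` to `filename` with a progress bar; skip if the file exists."""
    if filename is None:
        # URLs always use '/', so don't split on os.sep ('\\' on Windows).
        # Drop any query string (e.g. '?dl=1') before taking the last path part.
        filename = url.split('?')[0].rstrip('/').split('/')[-1]
    if Path(filename).exists():
        print(f"{filename} already exists. Skipping")
        return
    # Estimate the number of chunks from Content-Length so tqdm can show a total
    headers = requests.head(url, allow_redirects=True).headers
    size = headers.get('content-length')
    if size is not None:
        size = math.ceil(int(size) / chunk_size)
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(filename, 'wb') as f:
            # Use the same chunk size here as in the estimate above
            for chunk in tqdm(r.iter_content(chunk_size=chunk_size), total=size):
                f.write(chunk)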

# Stanford Dogs Dataset
The [Stanford Dogs dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/) consists of **20,580 images of 120 different dog breeds**. It provides both bounding boxes (for object detection) and dog breed labels. Based on discussions and leaderboards at Papers with Code, this dataset has become an important benchmark for dog breed classification.

In [25]:
# Run cell below to download Stanford Dogs data
STANFORD_DOGS_IMAGE_URL = 'http://vision.stanford.edu/aditya86/ImageNetDogs/images.tar'
STANFORD_DOGS_ANNOTATIONS_URL = 'http://vision.stanford.edu/aditya86/ImageNetDogs/annotation.tar'
STANFORD_DOGS_SPLITS_URL = 'http://vision.stanford.edu/aditya86/ImageNetDogs/lists.tar'

In [None]:
download_file(STANFORD_DOGS_IMAGE_URL)

In [None]:
download_file(STANFORD_DOGS_ANNOTATIONS_URL)

In [None]:
download_file(STANFORD_DOGS_SPLITS_URL)
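
The downloaded `.tar` files need to be unpacked before the images and annotations can be used. Below is a minimal sketch of one way to do that with the standard library's `shutil.unpack_archive`. The helper name `unpack_if_needed` and the `stanford_dogs` target directory are assumptions made for illustration, not part of the dataset's own tooling.

```python
import shutil
from pathlib import Path

def unpack_if_needed(archive, dest_dir):
    """Unpack `archive` into `dest_dir` unless `dest_dir` already has content."""
    dest = Path(dest_dir)
    if dest.exists() and any(dest.iterdir()):
        print(f"{dest} is not empty. Skipping")
        return dest
    dest.mkdir(parents=True, exist_ok=True)
    # shutil infers the archive format (.tar, .zip, ...) from the file extension
    shutil.unpack_archive(str(archive), str(dest))
    return dest

# Unpack whichever of the three archives are present in the working directory.
# "stanford_dogs" is a hypothetical output folder.
for name in ("images.tar", "annotation.tar", "lists.tar"):
    if Path(name).exists():
        unpack_if_needed(name, Path("stanford_dogs") / Path(name).stem)
```

Skipping non-empty target directories mirrors the "already exists" check in `download_file`, so re-running the notebook does not repeat the extraction.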

# Tsinghua Dogs Dataset

The [Tsinghua University Dogs Dataset](https://cg.cs.tsinghua.edu.cn/ThuDogs/) is another important benchmark dataset for dog breed classification and detection. This dataset consists of **70,428 images of 130 different dog breeds**. Each breed is represented by anywhere from 200 to 7,449 images, and the sample sizes are roughly representative of the frequencies of dog breeds found in China.

As with the Stanford Dogs dataset, this dataset includes both class labels and bounding boxes.

In [27]:
# Run cell below to download Tsinghua Dogs Dataset
TSINGHUA_DOGS_LOW_RES_IMAGES_URL = 'https://cloud.tsinghua.edu.cn/f/80013ef29c5f42728fc8/?dl=1'
TSINGHUA_DOGS_LOW_RES_ANNOTATIONS_URL = 'https://cg.cs.tsinghua.edu.cn/ThuDogs/low-annotations.zip'

In [None]:
download_file(TSINGHUA_DOGS_LOW_RES_IMAGES_URL)

In [None]:
download_file(TSINGHUA_DOGS_LOW_RES_ANNOTATIONS_URL)

# Kaggle Dog Breeds Classification Dataset

This is yet another dataset, with **20,000 images of 120 different breeds**. A drawback of this dataset is that it provides only breed labels, not bounding boxes. On the plus side, the data is pre-cropped, so each image shows mostly just the dog.

Since this dataset is associated with a Kaggle competition, downloading it programmatically requires a Kaggle account. Follow the instructions in the [Kaggle API documentation](https://www.kaggle.com/docs/api) to set up the Kaggle API.

In [None]:
!pip install kaggle

In [None]:
# Ensure you've downloaded kaggle.json based on instructions above
# The Kaggle API client expects this file to be in ~/.kaggle,
# so move it there.
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/


# This permissions change avoids a warning on Kaggle tool startup.
!chmod 600 ~/.kaggle/kaggle.json

In [None]:
# Download the dataset for the dog-breed identification challenge:
# https://www.kaggle.com/c/dog-breed-identification
!kaggle competitions download -c dog-breed-identification
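
Since this dataset carries only breed labels, a quick sanity check after downloading is to count how many images each breed has. The sketch below assumes the downloaded archive has been unzipped and that it contains a `labels.csv` file with `id` and `breed` columns. Both the file name and the column names are assumptions and should be checked against the actual download. The helper name `count_breeds` is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path

def count_breeds(labels_csv):
    """Return a Counter mapping each breed to its number of labelled images."""
    with open(labels_csv, newline="") as f:
        # Assumes a header row containing a 'breed' column
        return Counter(row["breed"] for row in csv.DictReader(f))

# Only runs if the (assumed) labels file is present in the working directory
if Path("labels.csv").exists():
    counts = count_breeds("labels.csv")
    print(f"{len(counts)} breeds, {sum(counts.values())} labelled images")
    print(counts.most_common(5))
```

The number of distinct breeds and the total image count printed here can be compared against the figures quoted for this dataset.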