# Dog Breed Classifier

Note: This notebook is written with production deployment in mind, so it skips the data exploration and model training steps.

## System Diagram

![](system-diagram.png)

- Dog Detection Module: VGG-16 pretrained model
- Dog Breed Classifier: Big Transfer (BiT) pretrained model with fine-tuning

**Note: Both modules only use CPU for inference.**

## Import Libraries

In [1]:
!pip install torch torchvision



In [2]:
!pip install torchsummary



In [3]:
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from torchvision import datasets
from torch.utils.data import DataLoader
from dog_breed_classification.dog_breed_classifier import ResNetV2
from torchsummary import summary
import numpy as np
from glob import glob
from tqdm import tqdm
from PIL import Image
from PIL import ImageFile

## Import Datasets

In [4]:
!wget https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip
!unzip dogImages.zip
!rm dogImages.zip

--2020-06-08 09:10:45--  https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip
Resolving s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)... 52.219.116.1
Connecting to s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)|52.219.116.1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1132023110 (1.1G) [application/zip]
Saving to: ‘dogImages.zip’


2020-06-08 09:11:42 (19.3 MB/s) - ‘dogImages.zip’ saved [1132023110/1132023110]

Archive:  dogImages.zip
replace dogImages/test/001.Affenpinscher/Affenpinscher_00003.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename: ^C


In [9]:
dog_files = np.array(glob('dogImages/*/*/*'))
print(f'There are {len(dog_files)} total dog images.')

There are 8351 total dog images.
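
The glob pattern `dogImages/*/*/*` matches one directory level for the split, one for the breed folder, and one for the image file. As a rough sketch, the same layout can be tallied per split with the standard library. The helper name `count_images_per_split` is hypothetical, and the function assumes every file three levels below the root is an image.

```python
from collections import Counter
from pathlib import Path


def count_images_per_split(root):
    """Count files laid out as <root>/<split>/<breed>/<file>."""
    counts = Counter()
    for path in Path(root).glob('*/*/*'):
        if path.is_file():
            # path.parts[-3] is the split directory, e.g. 'train' or 'test'
            counts[path.parts[-3]] += 1
    return dict(counts)
```

Calling `count_images_per_split('dogImages')` after unzipping should give one entry per split directory, and the values should add up to the total printed above.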


## Dog Detection

Try different pretrained models for the ImageNet challenge!

![](pretrained_models.jpg)

First, I create a general utility function that works with any of these pretrained models.

In [33]:
num_choice = 500
choices = np.random.choice(len(dog_files), num_choice)

In [34]:
ImageFile.LOAD_TRUNCATED_IMAGES = True

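def imagenet_pretrained_model_predict(model, preprocess, img_path):
    '''
    Use a pre-trained model to obtain the index corresponding to the
    predicted ImageNet class for the image at the specified path.

    Args:
        model: torchvision model pretrained on ImageNet
        preprocess: transform that maps a PIL image to a normalized tensor
        img_path: path to an image

    Returns:
        index corresponding to the pretrained model's prediction
    '''
    model = model.eval()
    # Force 3 channels so grayscale or RGBA files match the expected input
    input_img = Image.open(img_path).convert('RGB')

    input_tensor = preprocess(input_img)
    input_batch = input_tensor.unsqueeze(0)  # add a batch dimension

    with torch.no_grad():
        output = model(input_batch)

    # Index of the highest-scoring class as a plain Python int
    return output.argmax().item()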

In [42]:
def dog_detector(model, preprocess, img_path):
    result = imagenet_pretrained_model_predict(model, preprocess, img_path)
    return 151 <= result <= 268

While looking at the [dictionary](https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a), you will notice that the categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151-268, inclusive, to include all categories from 'Chihuahua' to 'Mexican hairless'. Thus, in order to check to see if an image is predicted to contain a dog by the pre-trained model, we need only check if the pre-trained model predicts an index between 151 and 268 (inclusive).
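
The index check described above can be tried without loading a model. This is a minimal sketch on hand-made scores; `is_dog_index`, `argmax`, and the fake score list are illustrative names, not part of the notebook.

```python
# ImageNet class indices 151-268 (inclusive) are the dog categories
DOG_INDEX_FIRST = 151
DOG_INDEX_LAST = 268


def is_dog_index(class_index):
    """Return True if an ImageNet class index falls in the dog range."""
    return DOG_INDEX_FIRST <= int(class_index) <= DOG_INDEX_LAST


def argmax(scores):
    """Index of the largest score, mirroring the argmax call on model output."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Fake 1000-class scores with the highest value at index 200 (inside the dog range)
scores = [0.0] * 1000
scores[200] = 3.5
predicted_index = argmax(scores)
print(predicted_index, is_dog_index(predicted_index))
```

Both endpoints are included, so an index of 151 or 268 counts as a dog while 150 and 269 do not.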

### VGG16

In [36]:
vgg16 = models.vgg16(pretrained=True)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

In [43]:
dog_files_dog_detector_results = [dog_detector(vgg16, preprocess, dog_files[choice]) for choice in choices]
print(f'Fraction of sampled dog images detected as dogs: {sum(dog_files_dog_detector_results) / num_choice}')

Fraction of sampled dog images detected as dogs: 0.992


### ResNet-50

In [38]:
resnet50 = models.resnet50(pretrained=True)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

In [44]:
dog_files_dog_detector_results = [dog_detector(resnet50, preprocess, dog_files[choice]) for choice in choices]
print(f'Fraction of sampled dog images detected as dogs: {sum(dog_files_dog_detector_results) / num_choice}')

Fraction of sampled dog images detected as dogs: 0.988


### MobileNet v2

In [40]:
mobilenet = models.mobilenet_v2(pretrained=True)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /home/jupyter/.cache/torch/checkpoints/mobilenet_v2-b0353104.pth



In [45]:
dog_files_dog_detector_results = [dog_detector(mobilenet, preprocess, dog_files[choice]) for choice in choices]
print(f'Fraction of sampled dog images detected as dogs: {sum(dog_files_dog_detector_results) / num_choice}')

Fraction of sampled dog images detected as dogs: 0.996


## Dog Breed Classifier

In [10]:
ImageFile.LOAD_TRUNCATED_IMAGES = True

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

train_dataset = datasets.ImageFolder('dogImages/train', transform=preprocess)
class_names = [f[4:].replace('_', ' ') for f in train_dataset.classes]
print(f'Total number of classes: {len(class_names)}')

Total number of classes: 133
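
The slice `f[4:]` relies on every folder name starting with a three-digit number and a dot, as in `001.Affenpinscher`. A slightly more defensive variant, sketched below under that same naming assumption, splits on the first dot instead. `folder_to_class_name` is a hypothetical helper, and the second folder name in the example is assumed from the class list rather than read from disk.

```python
def folder_to_class_name(folder_name):
    """Turn a folder name such as '001.Affenpinscher' into 'Affenpinscher'."""
    # Drop everything up to the first dot, whatever the width of the number
    _, _, breed = folder_name.partition('.')
    # Fall back to the whole name if there is no numeric prefix
    breed = breed or folder_name
    return breed.replace('_', ' ')


print(folder_to_class_name('001.Affenpinscher'))
print(folder_to_class_name('002.Afghan_hound'))  # assumed folder name
```

With this helper, the list comprehension would read `[folder_to_class_name(f) for f in train_dataset.classes]`.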


In [141]:
class_names

['Affenpinscher',
 'Afghan hound',
 'Airedale terrier',
 'Akita',
 'Alaskan malamute',
 'American eskimo dog',
 'American foxhound',
 'American staffordshire terrier',
 'American water spaniel',
 'Anatolian shepherd dog',
 'Australian cattle dog',
 'Australian shepherd',
 'Australian terrier',
 'Basenji',
 'Basset hound',
 'Beagle',
 'Bearded collie',
 'Beauceron',
 'Bedlington terrier',
 'Belgian malinois',
 'Belgian sheepdog',
 'Belgian tervuren',
 'Bernese mountain dog',
 'Bichon frise',
 'Black and tan coonhound',
 'Black russian terrier',
 'Bloodhound',
 'Bluetick coonhound',
 'Border collie',
 'Border terrier',
 'Borzoi',
 'Boston terrier',
 'Bouvier des flandres',
 'Boxer',
 'Boykin spaniel',
 'Briard',
 'Brittany',
 'Brussels griffon',
 'Bull terrier',
 'Bulldog',
 'Bullmastiff',
 'Cairn terrier',
 'Canaan dog',
 'Cane corso',
 'Cardigan welsh corgi',
 'Cavalier king charles spaniel',
 'Chesapeake bay retriever',
 'Chihuahua',
 'Chinese crested',
 'Chinese shar-pei',
 'Chow cho

Retrieve the pretrained model weights from Google Cloud Storage.

In [11]:
!gsutil cp gs://dog-breed-classifier/models/model_transfer.pt .

CommandException: No URLs matched: gs://dog-breed-classifier/models/model_transfer.pt


In [12]:
BiT_M_R101_1 = ResNetV2([3, 4, 23, 3], width_factor=1, head_size=133, zero_head=True)
device = torch.device('cpu')
BiT_M_R101_1.load_state_dict(torch.load('model_transfer.pt', map_location=device))
BiT_M_R101_1 = BiT_M_R101_1.eval()

ResNetV2(
  (root): Sequential(
    (conv): StdConv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (pad): ConstantPad2d(padding=(1, 1, 1, 1), value=0)
    (pool): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (body): Sequential(
    (block1): Sequential(
      (unit01): PreActBottleneck(
        (gn1): GroupNorm(32, 64, eps=1e-05, affine=True)
        (conv1): StdConv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (gn2): GroupNorm(32, 64, eps=1e-05, affine=True)
        (conv2): StdConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (gn3): GroupNorm(32, 64, eps=1e-05, affine=True)
        (conv3): StdConv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (relu): ReLU(inplace=True)
        (downsample): StdConv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
      (unit02): PreActBottleneck(
        (gn1): GroupNorm(32, 256, eps=1e-05, aff

Model summary

In [13]:
summary(BiT_M_R101_1, input_size=(3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
         StdConv2d-1         [-1, 64, 112, 112]           9,408
     ConstantPad2d-2         [-1, 64, 114, 114]               0
         MaxPool2d-3           [-1, 64, 56, 56]               0
         GroupNorm-4           [-1, 64, 56, 56]             128
              ReLU-5           [-1, 64, 56, 56]               0
         StdConv2d-6          [-1, 256, 56, 56]          16,384
         StdConv2d-7           [-1, 64, 56, 56]           4,096
         GroupNorm-8           [-1, 64, 56, 56]             128
              ReLU-9           [-1, 64, 56, 56]               0
        StdConv2d-10           [-1, 64, 56, 56]          36,864
        GroupNorm-11           [-1, 64, 56, 56]             128
             ReLU-12           [-1, 64, 56, 56]               0
        StdConv2d-13          [-1, 256, 56, 56]          16,384
 PreActBottleneck-14          [-1, 256,

In [14]:
def predict_top3_dog_breeds(img_path):
    def get_class_name(classes):
        return map(lambda x: class_names[x], classes)

    ImageFile.LOAD_TRUNCATED_IMAGES = True

    # Force 3 channels so grayscale or RGBA files match the expected input
    img = Image.open(img_path).convert('RGB')
    img_tensor = preprocess(img)
    img_batch = img_tensor.unsqueeze(0)  # add a batch dimension

    with torch.no_grad():
        output = BiT_M_R101_1(img_batch)

    # softmax turns the logits into a probability distribution over all classes
    softmax_output = torch.softmax(output, dim=1)
    top3_pred_values, top3_pred_indices = torch.topk(softmax_output, 3)
    # drop the batch dimension before converting to NumPy
    top3_pred_prob = top3_pred_values.squeeze(0).numpy()
    top3_pred_classes = top3_pred_indices.squeeze(0).numpy()

    return list(zip(get_class_name(top3_pred_classes), top3_pred_prob))

In [15]:
predict_top3_dog_breeds('Brittany_02625.jpg')

[('Brittany', 0.9765796),
 ('Irish red and white setter', 0.008068973),
 ('Pointer', 0.0073480364)]
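
To make the softmax and top-k steps concrete, here is a dependency-free sketch of what a softmax followed by `torch.topk` computes for a single image. The four-class logits and labels are made up for illustration and do not come from the model.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for four hypothetical classes
labels = ['Brittany', 'Pointer', 'Beagle', 'Boxer']
logits = [4.0, 1.0, 0.5, -2.0]
probs = softmax(logits)
print(top_k(probs, labels, 3))
```

Softmax preserves the ordering of the logits, so the top-3 classes are the same with or without it; it only matters when the scores are reported as probabilities, as they are here.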

In [20]:
def test(loaders, model, criterion):
    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.

    model.eval()
    # gradients are not needed for evaluation
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(loaders):
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the loss
            loss = criterion(output, target)
            # update the running average of the per-batch test loss
            test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.item() - test_loss))
            # convert output logits to predicted class
            pred = output.argmax(dim=1)
            # compare predictions to true label
            correct += (pred == target).sum().item()
            total += data.size(0)

    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))

In [21]:
BATCH_SIZE = 16
test_dataset = datasets.ImageFolder('dogImages/test', transform=preprocess)
test_dataset_batch = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
)

criterion_transfer = torch.nn.CrossEntropyLoss()
test(test_dataset_batch, BiT_M_R101_1, criterion_transfer)

Test Loss: 0.412053


Test Accuracy: 86% (727/836)
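
The loss update inside `test` is an incremental mean: after batch `i` (zero-based), `test_loss` equals the plain average of the first `i + 1` batch losses. The sketch below checks that identity on made-up numbers. Note that this averages per-batch losses, so a smaller final batch (836 test images with a batch size of 16 leaves a last batch of 4) counts the same as a full one.

```python
def running_mean(values):
    """Incremental mean using the same update rule as the test loop."""
    mean = 0.0
    for idx, value in enumerate(values):
        mean = mean + (1 / (idx + 1)) * (value - mean)
    return mean


batch_losses = [0.50, 0.25, 0.75, 0.10]  # made-up per-batch losses
incremental = running_mean(batch_losses)
direct = sum(batch_losses) / len(batch_losses)
print(abs(incremental - direct) < 1e-12)
```

The incremental form avoids keeping every batch loss in memory, which is why it is common in evaluation loops.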


## Integration

In [136]:
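def predict(img_path):
    # dog_detector needs a model and a transform; VGG-16 is the detection
    # module in the system diagram
    if dog_detector(vgg16, preprocess, img_path):
        return predict_top3_dog_breeds(img_path)
    else:
        return 'No dog is detected, please try another image again!'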

In [137]:
predict('Brittany_02625.jpg')

[('Brittany', 0.9765796),
 ('Irish red and white setter', 0.008068973),
 ('Pointer', 0.0073480364)]

In [138]:
predict('sky.jpeg')

'No dog is detected, please try another image again!'
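
The gating logic can be exercised without the real models by passing the two stages in as callables. This is only a sketch: `make_predictor` and the stand-in detector and classifier are hypothetical and exist to show the control flow, not to replace the modules above.

```python
NO_DOG_MESSAGE = 'No dog is detected, please try another image again!'


def make_predictor(detect_dog, classify_breed):
    """Build a predict function from a detector and a breed classifier."""
    def predict(img_path):
        # Run the breed classifier only when the detector finds a dog
        if detect_dog(img_path):
            return classify_breed(img_path)
        return NO_DOG_MESSAGE
    return predict


def fake_detector(img_path):
    """Stand-in detector keyed on the file name, for illustration only."""
    return 'sky' not in img_path


def fake_classifier(img_path):
    """Stand-in classifier that returns a fixed, made-up prediction."""
    return [('Brittany', 0.9)]


predict_stub = make_predictor(fake_detector, fake_classifier)
print(predict_stub('Brittany_02625.jpg'))
print(predict_stub('sky.jpeg'))
```

Injecting the two stages this way makes it possible to unit test the integration step on a machine that has neither the model weights nor the dataset.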