# Image Classification - Lab

## Introduction

Now that you have a working knowledge of CNNs and have practiced implementing the associated techniques in Keras, it's time to put all of those skills together. In this lab, you'll work to complete a Kaggle competition on classifying dog breeds.

https://www.kaggle.com/c/dog-breed-identification

## Objectives

You will be able to:
* Independently design and build a CNN for image classification tasks
* Compare and apply multiple techniques for tuning a model including data augmentation and adapting pretrained models

## Download and Load the Data

Start by downloading the data locally and loading it into a Pandas DataFrame. Be forewarned that this dataset is fairly large, so it is advisable to close other memory-intensive applications.

The data can be found here:

https://www.kaggle.com/c/dog-breed-identification/data

We recommend downloading the data into this directory on your local computer. From there, be sure to uncompress the folder and subfolders.

In [None]:
#No code per se, but download and decompress the data.
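
If you would rather decompress the archive from Python than by hand, the standard library's `zipfile` module can do it. The following is a minimal sketch: the helper name `extract_archive` is hypothetical, and the archive filename in the usage note is an assumption, so substitute the name of the file you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

For example, `extract_archive('dog-breed-identification.zip')` (assumed filename) would unpack into the current directory. If the archive contains nested zip files, call the helper again on each of them.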

## Preprocessing

Now that you've downloaded the data, it's time to prepare it for some model building! You'll notice that the data is not organized into the preprocessed folders you have been given in previous labs. Instead, you have one large training folder of images and a CSV file that maps each image id to its breed label.

Use this to create a directory substructure for a train-validation-test split, as we have done previously. Recall from our previous work that you'll also want to use one-hot encoding, since this is now a multi-class problem rather than a simple binary classification.
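
As a reminder of what one-hot encoding produces, here is a small pure-Python sketch. The function name `one_hot_encode` and the three breed names are illustrative assumptions, not part of the lab's data pipeline.

```python
def one_hot_encode(labels):
    """Return (class_names, vectors) with one 0/1 vector per label."""
    class_names = sorted(set(labels))
    index_of = {name: i for i, name in enumerate(class_names)}
    vectors = []
    for label in labels:
        row = [0] * len(class_names)
        row[index_of[label]] = 1  # mark the position of this label's class
        vectors.append(row)
    return class_names, vectors


# Breed names used purely for illustration
breeds = ["beagle", "pug", "beagle", "collie"]
class_names, encoded = one_hot_encode(breeds)
print(class_names)  # ['beagle', 'collie', 'pug']
print(encoded)      # [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

In practice you rarely need to write this yourself: Keras's `flow_from_directory` yields one-hot encoded labels when called with `class_mode='categorical'`, using one class per subdirectory.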

In [None]:
#Your code here; open the labels.csv file stored in the zip file
import pandas as pd
df = pd.read_csv('labels.csv')

In [None]:
df

We wish to create our standard directory structure:
* train
    * category1
    * category2
    * category3
    ...
* val
    * category1
    * category2
    * category3
    ...
* test 
    * category1
    * category2
    * category3
    ...  

In [None]:
import numpy as np
import os, shutil

old_dir = 'train/'

new_root_dir = 'data_org/'
os.mkdir(new_root_dir)

dir_names = ['train', 'val', 'test']
for d in dir_names:
    new_dir = os.path.join(new_root_dir, d)
    os.mkdir(new_dir)
    
for breed in df.breed.unique():
    print('Moving {} pictures.'.format(breed))
    #Create sub_directories
    for d in dir_names:
        new_dir = os.path.join(new_root_dir, d, breed)
        os.mkdir(new_dir)
    #Subset dataframe into train, validation and test sets (80/10/10)
    #The split is performed per breed to maintain class distributions.
    breed_df = df[df.breed == breed].sample(frac=1)  #shuffle the rows
    n = len(breed_df)
    train = breed_df.iloc[:int(.8*n)]
    validate = breed_df.iloc[int(.8*n):int(.9*n)]
    test = breed_df.iloc[int(.9*n):]
    print('Split {} imgs into {} train, {} val, and {} test examples.'.format(
        n, len(train), len(validate), len(test)))
    #Copy each image into the folder for its split and breed
    for split_name, split_df in zip(dir_names, [train, validate, test]):
        for img_id in split_df['id']:
            filename = img_id + '.jpg'
            origin = os.path.join(old_dir, filename)
            destination = os.path.join(new_root_dir, split_name, breed, filename)
            shutil.copy(origin, destination)

In [None]:
#Your code here; create image generators for the training and validation directories
from keras.preprocessing.image import ImageDataGenerator

train_dir = 'data_org/train'
validation_dir = 'data_org/val/'
test_dir = 'data_org/test/'

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        # All images will be resized to 150x150
        target_size=(150, 150),
        batch_size=20,
        class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='categorical')
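
Before training, it can be worth confirming that the copy step populated every split. The sketch below counts the files under each split folder; the helper name `count_images` is hypothetical, and it assumes the `train`/`val`/`test` layout described above.

```python
import os


def count_images(root_dir, splits=("train", "val", "test")):
    """Return {split: number of files found under root_dir/split/<class>/}."""
    counts = {}
    for split in splits:
        split_dir = os.path.join(root_dir, split)
        total = 0
        # os.walk yields nothing for a missing directory, so the count stays 0
        for _, _, files in os.walk(split_dir):
            total += len(files)
        counts[split] = total
    return counts
```

Calling `count_images('data_org')` should report roughly an 80/10/10 division of the images. A count of zero for any split suggests the copy loop did not run to completion.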

## Optional: Build a Baseline CNN

This is an optional step. Adapting a pretrained model will produce better results, but it may be interesting to create a CNN from scratch as a baseline. If you wish to, do so here.

In [None]:
#Create a baseline CNN model
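
When sketching a baseline architecture, it helps to check how quickly the feature maps shrink so that you know how many convolution and pooling blocks the input can support. The helper below is a planning aid rather than a model; it assumes square inputs, unpadded 3x3 convolutions with stride 1, and non-overlapping 2x2 max pooling, and the name `conv_stack_sizes` is hypothetical.

```python
def conv_stack_sizes(input_size, n_blocks, kernel=3, pool=2):
    """Track the side length of a square feature map through n_blocks of
    (unpadded convolution, stride 1) followed by (non-overlapping max pooling)."""
    sizes = [input_size]
    size = input_size
    for _ in range(n_blocks):
        size = size - kernel + 1  # unpadded convolution trims kernel - 1 pixels
        size = size // pool       # pooling divides the side, dropping any remainder
        sizes.append(size)
    return sizes


print(conv_stack_sizes(150, 4))  # [150, 74, 36, 17, 7]
```

With 150x150 inputs, four such blocks leave a 7x7 feature map, which is a reasonable size to flatten and pass to dense layers.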

## Loading a Pretrained CNN

## Feature Engineering with the Pretrained Model

Now that you've loaded a pretrained model, it's time to adapt that convolutional base and add some fully connected layers on top in order to build a classifier from these feature maps.
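
To get a feel for how large such a classifier head is, you can count its parameters by hand. This sketch assumes a VGG-style base (five 2x2 poolings, 512 output channels) fed 240x240 images and a dense head with widths 64, 128, 256, 128 and 120; the helper name `dense_head_params` is hypothetical.

```python
def dense_head_params(input_units, layer_units):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    previous = input_units
    for units in layer_units:
        total += previous * units + units  # weight matrix plus one bias per unit
        previous = units
    return total


# Assumption: each of the five poolings halves the side, rounding down:
# 240 -> 120 -> 60 -> 30 -> 15 -> 7
side = 240
for _ in range(5):
    side //= 2
flattened = side * side * 512  # 7 * 7 * 512 features after flattening

print(flattened)                                               # 25088
print(dense_head_params(flattened, [64, 128, 256, 128, 120]))  # 1695416
```

Almost all of those parameters sit in the first dense layer, because it connects to every flattened feature. That is why the width of the first layer dominates both memory use and the risk of overfitting.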

In [None]:
#Your code here; add fully connected layers on top of the convolutional base

from keras.applications import VGG19
from keras import layers
from keras import models
from keras import optimizers

cnn_base = VGG19(weights = 'imagenet',
                include_top = False,
                input_shape = (240,240,3))

model = models.Sequential()
model.add(cnn_base)
model.add(layers.Flatten())
model.add(layers.Dense(64, activation = 'relu'))
model.add(layers.Dense(128, activation = 'relu'))
model.add(layers.Dense(256, activation = 'relu'))
model.add(layers.Dense(128, activation = 'relu'))
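#Softmax yields a probability distribution over the 120 breeds,
#which is what categorical_crossentropy expects for single-label classification
model.add(layers.Dense(120, activation = 'softmax'))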

cnn_base.trainable = False

for layer in model.layers:
    print(layer.name, layer.trainable)
    
print(model.trainable_weights)

model.summary()


train_dir = 'data_org/train'
val_dir = 'data_org/val'
test_dir = 'data_org/test'

train_datagen = ImageDataGenerator(rescale = 1./255,
                                  rotation_range = 40,
                                  width_shift_range = .2,
                                  height_shift_range = .2,
                                  shear_range = .2,
                                  zoom_range = .2,
                                  horizontal_flip = True,
                                  fill_mode = 'nearest')



train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size = (240,240),
                                                    batch_size = 20,
                                                    class_mode = 'categorical')

validation_generator = ImageDataGenerator(rescale = 1./255).flow_from_directory(val_dir,
                                                    target_size = (240,240),
                                                    batch_size = 20,
                                                    class_mode = 'categorical')


test_generator = ImageDataGenerator(rescale = 1./255).flow_from_directory(
    test_dir,
    target_size = (240,240),
    batch_size = 180,
    class_mode = 'categorical',
    shuffle = False)

test_images, test_labels = next(test_generator)

#Note: recent Keras versions use learning_rate= instead of lr=, and model.fit() instead of fit_generator()
model.compile(loss = 'categorical_crossentropy',
              optimizer = optimizers.RMSprop(lr = 2e-5),
              metrics = ['acc'])

history = model.fit_generator(train_generator,
                              steps_per_epoch = 25,
                              epochs = 12,
                              validation_data = validation_generator,
                              validation_steps = 10)
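
The `steps_per_epoch` and `validation_steps` arguments control how many batches are drawn per epoch, so the number of images seen is steps times batch size. The sketch below shows the arithmetic; the helper name `steps_to_cover` is hypothetical and the sample counts are placeholders, not the real dataset sizes.

```python
import math


def steps_to_cover(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


# Placeholder sample counts, for illustration only
print(steps_to_cover(8000, 20))  # 400
print(steps_to_cover(1010, 20))  # 51

# With steps_per_epoch = 25 and batch_size = 20, each epoch draws 500 images
print(25 * 20)  # 500
```

Using a small `steps_per_epoch` keeps each epoch fast, at the cost of showing the model only a sample of the training set per epoch. Keras directory iterators also support `len()`, which returns the number of batches needed for one full pass.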
                       
            

## Visualize History

Now that the model has been fit, visualize the training and validation accuracy and loss over successive epochs.

In [None]:
#Your code here; visualize the training / validation history associated with fitting the model.

import matplotlib.pyplot as plt
%matplotlib inline 

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
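
Alongside the plots, it can be useful to pull out the epoch at which validation performance peaked. This is a minimal sketch operating on a dictionary shaped like `history.history`; the helper name `best_epoch` is hypothetical and the numbers are made up for illustration.

```python
def best_epoch(history_dict, metric="val_loss", mode="min"):
    """Return (epoch_number, value) for the best value of metric; epochs are 1-based."""
    values = history_dict[metric]
    pick = min if mode == "min" else max
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Made-up numbers shaped like history.history, for illustration only
fake_history = {"val_loss": [4.6, 4.1, 3.9, 4.0],
                "val_acc": [0.02, 0.08, 0.11, 0.10]}
print(best_epoch(fake_history))                    # (3, 3.9)
print(best_epoch(fake_history, "val_acc", "max"))  # (3, 0.11)
```

If validation loss bottoms out well before the final epoch while training loss keeps falling, the model is likely starting to overfit.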


In [None]:
#Save model
model.save('vgg19_FE_AUG_12epochs.h5')

## Final Model Evaluation

In [None]:
#Your code here

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size = (240,240),
    batch_size = 20,
    class_mode = 'categorical',
    shuffle = False
)

test_loss, test_acc = model.evaluate_generator(test_generator, steps = 54)
predictions = model.predict_generator(test_generator, steps = 54)
print('number of predictions: {}'.format(len(predictions)))
print('accuracy: {}'.format(test_acc))
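
The predictions come back as one row of class probabilities per image. To turn them into breed names, take the position of the largest value in each row and look it up in the generator's class mapping. The sketch below uses a three-class stand-in for that mapping; the helper names `decode_predictions` and `accuracy` are hypothetical.

```python
def decode_predictions(probabilities, class_indices):
    """Map each row of class probabilities to the name of its most likely class."""
    index_to_name = {index: name for name, index in class_indices.items()}
    decoded = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # argmax of the row
        decoded.append(index_to_name[best])
    return decoded


def accuracy(predicted, actual):
    """Fraction of positions where predicted and actual labels agree."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Illustrative three-class example; real rows would have 120 entries
class_indices = {"beagle": 0, "collie": 1, "pug": 2}
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
names = decode_predictions(probs, class_indices)
print(names)                                         # ['beagle', 'pug', 'beagle']
print(accuracy(names, ["beagle", "pug", "collie"]))  # 0.6666666666666666
```

With the real generator, the mapping is available as `test_generator.class_indices`, and the true class index of each image as `test_generator.classes`. Those line up with the prediction rows only because the generator was created with `shuffle = False`.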

## Summary

Congratulations! In this lab, you brought together your prior deep learning skills, from preprocessing (including one-hot encoding) to adapting a pretrained model. CNN architectures and best practices continue to advance, but you now have a solid foundation to build on.