# Dog Breed Classifier Project
In this workbook you will build a model to classify the breed of a dog from images. This is based on the Kaggle Dog Breed Identification Challenge (https://www.kaggle.com/c/dog-breed-identification).

The task is to learn a model that classifies an image of a dog into one of many dog breeds (the Kaggle challenge uses 120 breeds, while the Udacity copy of the dataset downloaded below is organised into 133 breed folders). As in the Cat or Dog exercises, the images are real-world photos, so there is a great deal of variety in the data that your model will need to account for.

As this is a Project, the workbook only gives you a skeleton to work with; how you decide to solve it is up to you. No doubt you will want to refer back to previous exercises to find code snippets you can copy and adapt for this purpose.

## Important
This exercise is not meant to be straightforward: you will be given some skeleton code, but you will need to create much of the model yourself and make more decisions about designing and training your model.

In part it is meant to consolidate your knowledge, but it also exposes you to the often iterative nature of building and training models.

We usually allocate about 1 hour to this exercise, so you have plenty of time to experiment with different options as a group and in an iterative manner; you should have time to fully train a few different models.

Don't Panic - this is not a test so if you need assistance or guidance just ask.

# Importing some packages
We are using the Python programming language and a set of Machine Learning packages; importing packages for use is a common task. For this workshop you don't really need to pay much attention to this step (but you do need to execute the cell), since we are focusing on building models. However, the following is a description of what this cell does, which you can read if you are interested.

### Description of imports (Optional)
You don't need to worry about this code, as it is not the focus of the workshop, but if you are interested in what the next cell does, here is an explanation.

|Statement|Meaning|
|---|---|
|__import tensorflow as tf__ |TensorFlow (from Google) is our main machine learning library; it performs all of the various calculations for us and so hides much of the detailed complexity of Machine Learning. This _import_ statement makes the power of TensorFlow available to us, and for convenience we will refer to it as __tf__.|
|__from tensorflow import keras__ |TensorFlow is quite a low-level machine learning library which, while powerful and flexible, can be confusing, so instead we use another, higher-level framework called Keras (here the version bundled with TensorFlow) to make our machine learning models more readable and easier to build and test. This _import_ statement makes the Keras framework available to us.|
|__import numpy as np__ |NumPy is a Python library for scientific computing and is commonly used for machine learning. This _import_ statement makes NumPy available to us, and for convenience we will refer to it as __np__.|
|__import matplotlib.pyplot as plt__ |To visualise what is happening in our network we will use a set of graphs. Matplotlib is the standard Python library for producing graphs, so we __import__ it to enable us to make pretty graphs.|
|__%matplotlib inline__|This is a Jupyter Notebook __magic__ command that tells the workbook to display any graphs inside the workbook rather than in a pop-up window.|

In [5]:
from __future__ import absolute_import, division, print_function, unicode_literals

import os

import tensorflow as tf
from tensorflow import keras
print("TensorFlow version is ", tf.__version__)

import numpy as np

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from google.colab import files
from tensorflow.keras.preprocessing import image  # use the Keras bundled with TensorFlow

%matplotlib inline


## Helper functions
The following cell contains a set of helper functions that make our models a little clearer. We will not be going through these functions (since they require Python knowledge), so just make sure you have run this cell.

In [None]:
# Needed to stop ImageFile load failing for truncated images (known issue)
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True


def get_data():
    # Download and extract the Data Set
    # Original Source is http://vision.stanford.edu/aditya86/ImageNetDogs/
    # However Udacity have created a set with Training and Test images set-up
    source_url = "https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip"
    # Download the data 

    zip_file = tf.keras.utils.get_file(origin = source_url,
                                            fname='dogImages.zip',
                                            extract = True)
    # Grab the location of the unzipped data
    base_dir, _ = os.path.splitext(zip_file)

    # Define the paths to the Training, Validation and Test datasets
    train_dir = os.path.join(base_dir, 'train')
    validation_dir = os.path.join(base_dir, 'valid')
    test_dir = os.path.join(base_dir, 'test')

    return train_dir, validation_dir, test_dir

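def get_dog_breed_classes(data_dir):
    # Sorted list of breed sub-folder names (skips stray files such as .DS_Store)
    return sorted(d for d in os.listdir(data_dir)
                  if os.path.isdir(os.path.join(data_dir, d)))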
    
def showImageGrid(image_dir, num_rows=2, num_cols=4):
  # image_dir holds one sub-folder per breed, so show the first image
  # from each of the first (num_rows * num_cols) breed folders
  num_pix = num_rows * num_cols
  breed_dirs = sorted(d for d in os.listdir(image_dir)
                      if os.path.isdir(os.path.join(image_dir, d)))[:num_pix]

  # Set up the matplotlib figure and size it to fit the grid (4 inches per image)
  fig = plt.gcf()
  fig.set_size_inches(num_cols * 4, num_rows * 4)

  for i, breed in enumerate(breed_dirs):
    breed_path = os.path.join(image_dir, breed)
    img_path = os.path.join(breed_path, sorted(os.listdir(breed_path))[0])
    # Set up subplot; subplot indices start at 1
    sp = plt.subplot(num_rows, num_cols, i + 1)
    sp.axis('off')  # Don't show axes (or gridlines)
    sp.set_title(breed)

    img = mpimg.imread(img_path)
    plt.imshow(img)

  plt.show()
    
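def predictImageContent(model):
  # Uses the global image_size and dog_classes defined later in the notebook
  import numpy as np
  from google.colab import files
  from tensorflow.keras.preprocessing import image

  # Opens a file picker in Colab; returns {filename: file contents}
  uploaded = files.upload()

  for fn in uploaded.keys():

    # Load and resize the image the same way as the training images
    path = '/content/' + fn
    img = image.load_img(path, target_size=(image_size, image_size))
    # Apply the same 1/255 rescaling used by the training generator
    x = image.img_to_array(img) / 255.0
    # Add a batch dimension: (1, image_size, image_size, 3)
    x = np.expand_dims(x, axis=0)

    classes = model.predict(x)
    pred_class = np.argmax(classes[0])
    print("{} is a {}".format(fn, dog_classes[pred_class]))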
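Later in the notebook the training history is passed to a `printLossAndAccuracy` helper that this cell never defines. The version below is a minimal sketch of what it could look like, assuming the argument is the `History` object returned by `model.fit` (whose `.history` attribute is a dict of per-epoch lists) and that validation data was supplied during training. Only the function name comes from the notebook; its behaviour here (print the final-epoch metrics, then plot loss and accuracy curves) is an assumption.

```python
import matplotlib.pyplot as plt


def printLossAndAccuracy(history):
    """Hypothetical implementation: print final metrics and plot the curves.

    `history` is assumed to be the object returned by model.fit, whose
    .history attribute maps metric names to lists of per-epoch values.
    """
    h = history.history
    # TF 2.x logs 'accuracy'/'val_accuracy'; older tf.keras logs 'acc'/'val_acc'
    acc_key = 'accuracy' if 'accuracy' in h else 'acc'
    final = {
        'loss': h['loss'][-1],
        'val_loss': h['val_loss'][-1],
        'acc': h[acc_key][-1],
        'val_acc': h['val_' + acc_key][-1],
    }
    print("Final training loss {:.4f}, validation loss {:.4f}".format(
        final['loss'], final['val_loss']))
    print("Final training accuracy {:.3f}, validation accuracy {:.3f}".format(
        final['acc'], final['val_acc']))

    # Side-by-side plots of loss and accuracy per epoch
    epochs_run = range(1, len(h['loss']) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))
    ax_loss.plot(epochs_run, h['loss'], label='Training loss')
    ax_loss.plot(epochs_run, h['val_loss'], label='Validation loss')
    ax_loss.set_xlabel('Epoch')
    ax_loss.legend()
    ax_acc.plot(epochs_run, h[acc_key], label='Training accuracy')
    ax_acc.plot(epochs_run, h['val_' + acc_key], label='Validation accuracy')
    ax_acc.set_xlabel('Epoch')
    ax_acc.legend()
    plt.show()
    return final
```

Plotting training and validation curves side by side makes overfitting easy to spot: training loss keeps falling while validation loss flattens out or starts to rise.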

# Get Data
We will download the Dog Breed dataset and extract it to your machine.

In [None]:
train_dir, validation_dir, test_dir = get_data()
dog_classes = get_dog_breed_classes(train_dir)
print("Total number of dog breed classes: {}".format(len(dog_classes)))

## Let's look at some of the images

In [None]:
# Display some images from the Training folder
print("Training Images")
showImageGrid(train_dir, num_rows=4, num_cols=4)

# Create Data Generators
In our dataset we have separate sets of data for:
- training
- validation
- testing

The path to these are given in the variables:
- __train_dir__
- __validation_dir__
- __test_dir__

Since we are working with images on the file system, we will need to create an ImageDataGenerator for each of the three datasets.

Below we have created the ImageDataGenerator for the training dataset. You will need to create your own generators for the validation and test data sets.

In [None]:
# We want all our images to be re-sized to 160 x 160 pixels
image_size = 160

# For Training we want to use batches of 32 images at a time
batch_size = 32

## Data Generator for the Training data
The following cell contains the code to create a basic generator for the Training data (stored in the __train_dir__ folder).
### Optional Exercise
This Generator does not include any options for image augmentation. If you want, you can add these to this Generator.
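`ImageDataGenerator` accepts augmentation options such as `rotation_range`, `width_shift_range`, `height_shift_range`, `shear_range`, `zoom_range` and `horizontal_flip`, for example `ImageDataGenerator(rescale=1./255, rotation_range=20, horizontal_flip=True)`. To make the idea concrete, here is a rough NumPy illustration of two of these transformations (rescaling and a random left-right flip) applied to a single image array. It is a sketch of the concept, not the Keras implementation, and the function name `augment` is hypothetical.

```python
import numpy as np


def augment(image, rng, flip_prob=0.5):
    """Rescale an H x W x 3 uint8 image to [0, 1] and randomly mirror it.

    Hypothetical helper: a rough NumPy stand-in for
    ImageDataGenerator(rescale=1./255, horizontal_flip=True).
    """
    # Same scaling as rescale=1./255
    x = image.astype(np.float32) * (1.0 / 255)
    # With probability flip_prob, reverse the width axis (left-right mirror)
    if rng.random() < flip_prob:
        x = x[:, ::-1, :]
    return x


rng = np.random.default_rng(seed=0)
photo = np.random.default_rng(1).integers(0, 256, size=(160, 160, 3), dtype=np.uint8)
augmented = augment(photo, rng)
print(augmented.shape, augmented.min() >= 0.0, augmented.max() <= 1.0)
```

A horizontal flip is a safe choice for dog photos because a mirrored dog is still the same breed, whereas a vertical flip would create upside-down dogs that rarely appear in real test images.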

In [None]:
# Training Generator
train_datagen = keras.preprocessing.image.ImageDataGenerator(
                rescale=1./255)

# Flow training images in batches of 32 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
                train_dir,  # Source directory for the training images
                target_size = (image_size, image_size),
                batch_size = batch_size,
                # We are performing a multi-class Classification
                class_mode = 'categorical',
                classes=dog_classes)
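With `class_mode='categorical'` each batch of labels is a one-hot matrix with one column per breed, and because `classes=dog_classes` is passed, column *i* corresponds to `dog_classes[i]`. If a generator is created without `classes`, Keras infers the classes from the sub-folder names in alphanumeric order, so the validation and test generators should receive the same `classes=dog_classes` (or `dog_classes` should already be sorted) for their label columns to line up with the training labels. The NumPy sketch below shows the encoding itself; `one_hot_labels` is a hypothetical helper and the breed names are just example folder names.

```python
import numpy as np


def one_hot_labels(class_names, labels):
    """Encode breed names as one-hot rows, using the order of class_names."""
    # Column index for each class name, in the order given
    index = {name: i for i, name in enumerate(class_names)}
    encoded = np.zeros((len(labels), len(class_names)), dtype=np.float32)
    for row, name in enumerate(labels):
        encoded[row, index[name]] = 1.0
    return encoded


example_classes = ['001.Affenpinscher', '002.Afghan_hound', '003.Airedale_terrier']
batch = one_hot_labels(example_classes, ['003.Airedale_terrier', '001.Affenpinscher'])
print(batch)
```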

## Data Generator for Validation data
### Exercise
Create a Data Generator for the Validation data - this is stored in the __validation_dir__ folder.

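Use the above step as a guide and make the changes needed to make it specific to the Validation data. Pass the same __classes=dog_classes__ argument so that the label indices match those of the training generator.

Do not add Data Augmentation options here: augmentation is only useful during training, and validation images should simply be rescaled so that the validation scores reflect unmodified images.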

In [None]:
# Validation Data Generator


## Data Generator for Testing data
### Exercise
Create a Data Generator for the Testing data - this is stored in the __test_dir__ folder.

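Use the above steps as a guide and make the changes needed to make it specific to the Testing data. As with the Validation generator, pass the same __classes=dog_classes__ argument; setting __shuffle=False__ also keeps the predictions in the same order as the image files, which makes individual mistakes easier to inspect.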


In [None]:
# Testing Data Generator


# Design your Model
With the data loaded and the Data Generators created, we can now design our network. The design of your network is entirely up to you.

You could:
- Create a CNN from scratch
- Use Transfer Learning

### Exercise
Discuss with your team a set of network designs to try and each of you build a different one. This enables you to compare different options.


Your best sources are the last 2 workbooks, as they cover a similar domain.

In [None]:
# Define the network
model = keras.models.Sequential()

# TODO: YOUR CODE HERE




# Output Layer
model.add(tf.keras.layers.Dense(len(dog_classes), activation=tf.nn.softmax))

# Compile the Model
# The optimizer must be an instance of a Keras optimizer such as tf.keras.optimizers.Adam
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

# Print out a summary of the model
model.summary()
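The output layer and loss chosen in the cell above work as a pair: softmax turns the final layer's raw scores into probabilities that sum to 1, and categorical cross-entropy averages -log of the probability given to the correct breed. A NumPy sketch of the two calculations (a simplified illustration, not TensorFlow's implementation) also gives a handy sanity check: a model that spreads its probability evenly over K breeds has a loss of ln K, so an untrained model should start with a loss close to ln(len(dog_classes)), about 4.79 for 120 breeds or 4.89 for 133.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 along the last axis."""
    # Subtracting the row maximum avoids overflow in exp without changing the result
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(probs)) for one-hot y_true."""
    # Clip so that log(0) never occurs
    probs = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(probs), axis=-1)))


num_breeds = 133
uniform = softmax(np.zeros((1, num_breeds)))   # every breed gets 1/133
target = np.eye(num_breeds)[[0]]               # one-hot label for breed 0
print(categorical_crossentropy(target, uniform))   # roughly 4.89
```

If your first epoch starts far above this value, it is worth checking the labels and the preprocessing before tuning the architecture.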

# Train your Model
### Exercise
You are now ready to train your model. We have provided the code to train the model, but you need to decide the number of epochs to train for and the early stopping criterion (patience level).

__NOTE__: If you have opted to use Transfer Learning, you may need to create additional cells to fine-tune the model (as per the last workbook).
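Choosing a patience value is easier once you see what the rule does. With `monitor='val_loss'` and the default `min_delta=0`, `EarlyStopping` counts consecutive epochs in which the validation loss fails to beat its best value so far, and stops training once that count reaches `patience`. The pure-Python function below is a simplified model of that rule; `early_stop_epoch` is a hypothetical name, and it ignores the callback's other options.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 0-based epoch after which training would stop, or None.

    Simplified model of EarlyStopping(monitor='val_loss', patience=patience)
    with min_delta=0.
    """
    best = float('inf')
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # the rule never fired within these epochs


val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.73]
print(early_stop_epoch(val_losses, patience=3))  # 5
print(early_stop_epoch(val_losses, patience=5))  # None
```

For example, with 10 epochs and `patience=5`, the callback can only fire if the best validation loss is reached within the first five epochs, so it may never trigger; if you raise the number of epochs, the patience value becomes the main guard against overfitting.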

In [1]:
# TODO: Review the number of epochs and the early stopping criteria (patience).
# Change these if you think it necessary
epochs = 10

# Stop early if our Validation Loss stagnates
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)


In [None]:
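# Train the model
# Floor division drops the last partial batch in each epoch
steps_per_epoch = train_generator.n // batch_size
validation_steps = validation_generator.n // batch_size

# model.fit accepts generators in TensorFlow 2.x
# (on older TensorFlow 1.x releases, use model.fit_generator with the same arguments)
history = model.fit(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    callbacks=[early_stop])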
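The step counts in the training cell use floor division, so images that do not fill a final batch are skipped in each epoch. `flow_from_directory` reports the exact image count when it is created ("Found N images belonging to M classes."), and `len(generator)` gives the number of batches including the last partial one. A small sketch of the arithmetic, using 6,680 images purely as an example count and a hypothetical `steps_for` helper:

```python
import math


def steps_for(num_images, batch_size, include_partial=True):
    """Batches needed for one pass over num_images.

    include_partial=True counts the last, smaller batch (what len(generator)
    returns); False matches the notebook's num_images // batch_size.
    """
    if include_partial:
        return math.ceil(num_images / batch_size)
    return num_images // batch_size


full = steps_for(6680, 32)                             # 209 batches
floored = steps_for(6680, 32, include_partial=False)   # 208 batches
print(full, floored, 6680 - floored * 32)              # 209 208 24
```

For training this hardly matters, because the shuffled generator skips a different handful of images each epoch; for validation and testing, counting the partial batch (or simply omitting the steps arguments in TensorFlow 2.x) ensures every image is scored.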

In [None]:
printLossAndAccuracy(history)

## Evaluate your model
The dataset also includes a separate test set that the model never sees during training, so we will use it to evaluate our model.

In [None]:
# Test against our Test Set
# (on older TensorFlow 1.x releases, use model.evaluate_generator instead)
loss, accuracy = model.evaluate(test_generator)
print("Test Loss {:.4f}".format(loss))
print("Test Accuracy {:.3f}".format(accuracy))
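A single accuracy figure does not show which breeds the model mixes up, which is useful when discussing which breeds a human might confuse. One way to see this is to count the most frequent (true breed, predicted breed) mistakes. The sketch below assumes you already have integer arrays of true and predicted class indices; for example, if the test generator was created with `shuffle=False`, `test_generator.classes` holds the true indices in file order and `np.argmax(model.predict(test_generator), axis=1)` gives the predicted ones. `most_confused_pairs` is a hypothetical helper.

```python
from collections import Counter


def most_confused_pairs(true_labels, pred_labels, class_names, top_n=5):
    """Return the top_n most common (true breed, predicted breed) mistakes."""
    mistakes = Counter(
        (class_names[t], class_names[p])
        for t, p in zip(true_labels, pred_labels)
        if t != p  # only count wrong predictions
    )
    return mistakes.most_common(top_n)


names = ['beagle', 'basset', 'husky']
print(most_confused_pairs([0, 0, 1, 2, 0], [1, 1, 1, 2, 2], names))
```

Comparing the top pairs with the breeds your team expects a human to confuse is one way to judge whether the model's mistakes are "reasonable" ones.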

### Exercise
In your teams, consider the task we have been working on (dog breed classification) and consider the following questions:

- What would be the Human Level Performance for this task? And how did our model do compared to that expectation?
 
If your model hasn't performed as well as you think it could have, then consider either continuing to train the model for more epochs or performing some fine-tuning to try to get it closer to Human Level Performance.

# Testing our Model
The accuracy gives us a view of how well the model performed against our data; we now need to consider how we might test this model to look for potential issues.

The following cell will allow you to select a file of your own choosing to test our current model.

### Exercise
Think about the task we are trying to solve (predicting the breed of a dog) and in your teams consider:
 - What dog breeds are easy to identify?
 - What dog breeds might a human confuse?

Use the cell below to try some images out:
- You can download images to your machine from sites such as https://pixabay.com/ and run the cell below to load and classify them. Use small images if you can (this makes the testing a bit faster).
- Find one or two dog images whose breed you think your model should easily identify correctly.
- Find one or two images that you think might be challenging for your model (e.g. a dog that doesn't look like its breed, or a cat that looks like a dog). Note that the model can only ever answer with one of the dog breeds it was trained on.
- NOTE: This only works in a Colab environment.


Were you able to fool the model in a way that a human would not have been fooled?
- If so, what should we do?

In [None]:
predictImageContent(model)
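For the prediction above to be meaningful, an uploaded photo must reach the model in the same form as the training images: resized to `image_size` by `image_size` (which `load_img(..., target_size=...)` handles), scaled by 1/255 to match the training generator's `rescale=1./255`, and given a leading batch axis. For lookalike breeds it can also help to show the few most likely breeds rather than only the top one. A NumPy sketch of both steps, with hypothetical helper names:

```python
import numpy as np


def preprocess_for_prediction(img_array):
    """Turn one H x W x 3 image with 0-255 values into a model-ready batch."""
    # Match the training generator's rescale=1./255
    x = np.asarray(img_array, dtype=np.float32) / 255.0
    # model.predict expects a batch, so add a leading axis: (1, H, W, 3)
    return np.expand_dims(x, axis=0)


def decode_prediction(probs, class_names, top_k=3):
    """Return the top_k (class_name, probability) pairs for one prediction row."""
    # argsort is ascending, so reverse it to put the most likely classes first
    order = np.argsort(probs)[::-1][:top_k]
    return [(class_names[i], float(probs[i])) for i in order]
```

In the notebook these could be combined as `decode_prediction(model.predict(preprocess_for_prediction(image.img_to_array(img)))[0], dog_classes)`, where `img` is the image returned by `load_img`.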

## Exercise
In your teams, prepare a short report (5-8 mins) that outlines:
- The model you found worked best
- Your Testing Strategy
- Your Key Findings

We will then present to each other and debrief