# Lesson 4 - Image Classification Part 2
In the previous lesson we learned how to create multi-layer networks and how they can improve learning.

In this lesson we will see that simply adding more layers (going deeper) is not always that helpful and so we need to approach the problem in a different way.

This lesson will introduce the following new concepts:
- Learning features
- Convolutional Layers
- Max Pooling Layers

We will do this using a different image dataset called __Fashion-MNIST__. This is a set of images of fashion items across 10 different classes (shoes, boots, shirts, dresses etc.). This dataset is harder than the handwritten digits, so we will need to use everything we already know to solve it.

## Importing some packages
We are using the Python programming language and a set of Machine Learning packages, so importing packages is a common task. For this workshop you don't need to pay much attention to this step (but you do need to execute the cell) since we are focusing on building models. However, the following describes what this cell does, if you are interested.

### Description of imports (Optional)
You don't need to worry about this code as it is not the focus of the workshop, but if you are interested in what the next cell does, here is an explanation.
- __import tensorflow as tf__ - TensorFlow (from Google) is our main machine learning library. It performs all of the various calculations for us and so hides much of the detailed complexity in Machine Learning. This _import_ statement makes the power of TensorFlow available to us and for convenience we will refer to it as __tf__.
- __from tensorflow import keras__ - TensorFlow is quite a low level machine learning library which, while powerful and flexible, can be confusing, so instead we use a higher level framework called Keras to make our machine learning models more readable and easier to build and test. This _import_ statement makes the Keras framework available to us.
- __import numpy as np__ - NumPy is a Python library for scientific computing and is commonly used for machine learning. This _import_ statement makes NumPy available to us and for convenience we will refer to it as __np__.
- __import matplotlib.pyplot as plt__ - To visualise what is happening in our network we will use a set of graphs. Matplotlib is the standard Python library for producing graphs, so we __import__ it to enable us to make pretty graphs.
- __%matplotlib inline__ - this is a Jupyter Notebook __magic__ command that tells the workbook to produce any graphs as part of the workbook and not as a pop-up window.

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
import lesson4
%matplotlib inline

## The Fashion MNIST dataset
The Fashion MNIST dataset is another one of the standard datasets for learning Machine Learning. It contains a set of labelled images across 10 different classes. The images are small (28x28) and greyscale but they are noticeably different and most are easy for a human to recognise.

As a standard dataset, it comes with Keras and is already split into a Training and Test Set for us.

It is a harder problem to solve than the handwritten digits since some classes of images share similarities. For example, a Shirt and a Pullover have similar shapes.

Let's load the dataset and look at some of the images.

In [None]:
from tensorflow.keras.datasets import fashion_mnist
((x_train, y_train), (x_test, y_test)) = fashion_mnist.load_data()
# This is the list of labels for the classes
class_names = ["t-shirt", "trousers", "pullover", "dress", "coat",
               "sandal", "shirt", "sneaker", "bag", "boot"]

In [None]:
print("Images from the Training dataset")
lesson4.showSampleImages(x_train, y_train, class_names)

# Data Pre-Processing
As before we will normalise the data so that each pixel has a value between 0 and 1 instead of 0 and 255. 

Again, this makes the images lighter but does not really change the relative difference between the pixels.
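
To make the effect of normalisation concrete, here is a small NumPy sketch on a made-up 2x2 "image" (the pixel values are illustrative and not taken from the dataset). It compares a simple division by 255 with L2 normalisation along one axis, which is, as far as we understand it, what `tf.keras.utils.normalize` does by default. Treat the second option as an approximation of that helper rather than its exact implementation.

```python
import numpy as np

# A made-up 2x2 greyscale "image" with pixel values in the range 0..255
image = np.array([[0.0, 255.0],
                  [51.0, 102.0]])

# Option 1: simple rescaling, every pixel is divided by the same constant
scaled = image / 255.0

# Option 2: L2 normalisation along axis 1, each row is divided by its own length
row_lengths = np.sqrt((image ** 2).sum(axis=1, keepdims=True))
l2_normalised = image / row_lengths

print(scaled)
print(l2_normalised)
```

Both options give values between 0 and 1 for non-negative pixels, but only the first divides every pixel by the same number, so only the first preserves the relative differences across the whole image.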

In [None]:
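# Normalise the data (tf.keras.utils.normalize applies L2 normalisation along the given axis,
# which keeps non-negative pixel values between 0 and 1)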
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)

In [None]:
print("Normalised Images")
lesson4.showSampleImages(x_train, y_train, class_names)

## Exercise
Discuss in your groups what you think _Human Level Performance_ would be for this task.

Which classifications do you think a human might get confused with?

## Exercise: Define your model
Discuss in your groups what models you want to try against this data. Think about the:
- Number of Hidden Layers in your model
- The number of nodes in each of your Hidden Layers
- How many epochs you will train for

How many nodes do you need in your Output Layer?

Come up with enough different models that you can each train a different model.

In [None]:
model = tf.keras.models.Sequential()

# Input layer
model.add(tf.keras.layers.Flatten(input_shape=(28,28)))

# YOUR CHANGES START HERE
# Hidden Layers
# TODO: Define your network architecture. We've included sample layer definitions for you to
# copy and base your layers on. You need to decide how many layers and what size each layer should be
# Options include:
#    - Copy a line to add additional layers
#    - Change the number of nodes (from 256) to some other value such as 32, 64 or 128
#    - Combine additional layers with different numbers of nodes
model.add(tf.keras.layers.Dense(256, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(256, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(256, activation=tf.nn.relu))

# TODO: Check how many output nodes you need to classify the images (one node per class)
# Output layer
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
# YOUR CHANGES END HERE

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy', 
              metrics=['accuracy'])

model.summary()

In [None]:
# Train the model
# YOUR CHANGES START HERE
# TODO: Set the number of epochs to train for
num_epochs = 20
# YOUR CHANGES END HERE
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)

history = model.fit(x_train, y_train, epochs=num_epochs, validation_split = 0.2, 
                    callbacks=[early_stop])

## Evaluate our model
Now that we have trained our model (and hopefully the Accuracy of the model is greater than 90%) we can evaluate it on the _testing dataset_.

In [None]:
# Evaluate on the held-out test set (not the validation split used during training)
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test Loss:", test_loss)
print("Test Accuracy:", test_acc)

In [None]:
# Summarise history for loss and accuracy across epochs
lesson4.displayLossAndAccuracy(history)

In [None]:
# Produce the Confusion Matrix
# predict() returns one probability per class; argmax picks the most likely class
test_predictions = np.argmax(model.predict(x_test), axis=1)
for i, label in enumerate(class_names):
    print("{} = {}".format(i, label))
lesson4.displayConfusionMatrix(y_test, test_predictions)

In [None]:
# Display some of the incorrectly classified images
lesson4.printSampleIncorrectImages(x_test, y_test, class_names, model)

### Exercise
In groups, discuss the following:
- Which model performed best?
- Did your model generalise well to the unseen data?
    - Hint: look at how close your Testing Accuracy was to your Training Accuracy.
- Does your model approach Human Level Performance on this task?

### Exercise
Think about your current work, social or other situation and how Image Classification could be used for __good__.

Work in your teams and:
- Identify possible uses for Image Classification in your context
- Think about what data you might need and where you can obtain it from
- Consider the Ethical and Social implications of doing this 

# Optional Exercise
The following exercise introduces a different type of Network, a Convolutional Neural Network (CNN). This exercise is optional and will be attempted if time allows; if we do not complete it during the workshop you can complete it in your own time if you are interested.

## Introducing Convolution Layers
In the models we have been creating we take a 2-D image and flatten it into a single stream of values.

However, when we flatten the input we "lose" some of the spatial information that is contained in the images (e.g. how one pixel or set of pixels relates to its neighbours). With a single list of numbers, the model does not know how pixels relate to each other, so it is harder to "learn" spatial information such as lines and patterns.
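
As a small illustration of what flattening does to neighbouring pixels, the following NumPy sketch labels each pixel of a 28x28 grid with its position in the flattened list. The chosen pixel at row 10, column 10 is arbitrary and only there for illustration.

```python
import numpy as np

height, width = 28, 28

# Label every pixel with its position in the flattened (1-D) version of the image
flat_index = np.arange(height * width).reshape(height, width)

pixel = flat_index[10, 10]            # an arbitrary pixel
right_neighbour = flat_index[10, 11]  # the pixel immediately to its right
below_neighbour = flat_index[11, 10]  # the pixel immediately below it

print(right_neighbour - pixel)  # distance in the flattened list to the right neighbour
print(below_neighbour - pixel)  # distance in the flattened list to the neighbour below
```

The pixel directly below ends up 28 positions away in the flattened list, even though the two pixels touch in the image. A Dense layer has no built-in notion that these two inputs are adjacent.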

This could be one of the reasons why we seem incapable of doing well on this task. 

To regain this spatial awareness we need different types of layers to capture this information:
- Convolutional Layers
- Pooling Layers

### Understanding Convolution Layers
Convolutions are a technique from Image Processing that is applied to images to perform operations such as:
- Sharpen an image
- Emphasise vertical, horizontal or diagonal lines
- Emphasise transitions from dark to light

In image processing we use specific Convolutions to perform the above operations, but in machine learning we want our model to learn its own Convolutions from the images so that it can learn basic features (such as lines, shades etc.) and complex features (such as textures, head shapes etc.).

A convolution operation works by scanning a filter systematically across the image. During _training_ the model attempts to learn filters that each describe some feature of the images it is presented with. During _prediction_ the filters are used to detect these features in the image.
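
The scanning described above can be sketched in plain NumPy. This is a minimal illustration, not how Keras implements it: the `convolve2d` helper, the 5x5 image and the hand-written edge filter are all made up for this example, and the sketch assumes no padding and a step of 1. In a CNN the filter values would be learned during training instead of written by hand.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, step of 1) and return the feature map.

    Like deep learning libraries, this does not flip the kernel, so strictly
    speaking it is a cross-correlation.
    """
    kernel_h, kernel_w = kernel.shape
    out_h = image.shape[0] - kernel_h + 1
    out_w = image.shape[1] - kernel_w + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kernel_h, col:col + kernel_w]
            # Multiply the patch by the filter cell by cell and add up the result
            output[row, col] = np.sum(patch * kernel)
    return output

# Made-up 5x5 image: dark (0) on the left, light (1) on the right
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-written filter that responds to dark-to-light transitions from left to right
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map has large values where the filter sits over the dark-to-light edge and zeros where the image is uniform, which is what "detecting a feature" means here.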

The following animation (source https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1) shows how Convolutions work.

<img src="https://cdn-images-1.medium.com/max/800/1*Fw-ehcNBR9byHtho-Rxbtw.gif" alt="Animation showing how a convolution works" title="How convolutions work" height="400" width="400" />

Learning these convolutions allows our model to learn localised features such as, perhaps, how the positions of the eyes relate to each other in an image.

This can be quite complex to implement but luckily _Keras_ has pre-implemented the Convolution layer, so we can simply add it to our model like we have done with the __Dense__ layers.

We can create a Convolution layer in Keras using:

`tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu')`

Where:
- __filters__ is the number of convolution units we want such as 32, 64, 128, 256...
    - We can have any number of filters but these are typical values
- __kernel_size__ is the size of the grid we want to use such as 1x1, 2x2, 3x3 or 5x5
    - The grid can be of any size but these are fairly typical values.
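
To get a feel for what these two settings control, here is a rough sketch that works out the output size and the number of learnable values for a single convolution layer. The `conv2d_summary` helper is hypothetical (it is not part of Keras), and it assumes no padding, a step of 1 and one bias per filter, which we believe matches the Keras defaults for `Conv2D`.

```python
def conv2d_summary(input_size, input_channels, filters, kernel_size):
    """Return (output_size, parameter_count) for an unpadded convolution with a step of 1."""
    # The filter can only be placed where it fits entirely inside the image
    output_size = input_size - kernel_size + 1
    # Each filter has one weight per grid cell per input channel, plus one bias
    values_per_filter = kernel_size * kernel_size * input_channels + 1
    return output_size, values_per_filter * filters

# A 28x28 greyscale image (1 channel) passed through 64 filters with a 3x3 grid
output_size, parameters = conv2d_summary(input_size=28, input_channels=1,
                                         filters=64, kernel_size=3)
print(output_size, parameters)
```

More filters or a larger grid both increase the number of values the model has to learn, which is one reason training takes longer as these settings grow.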

### Understanding Pooling layers
Pooling layers also systematically scan the image using a filter (_kernel_) but instead of learning some property, they perform some mathematical operations on the data under the grid. These operations can be:
- Take the _maximum_ value of the data under the grid. This is known as __Max Pooling__
- Take the _minimum_ value of the data under the grid. This is known as __Min Pooling__
- Take the _average_ value of the data under the grid. This is known as __Average Pooling__

This has the effect of reducing variance and reducing the computational complexity while extracting salient features.

A good article on Pooling is at https://medium.com/@bdhuma/which-pooling-method-is-better-maxpooling-vs-minpooling-vs-average-pooling-95fb03f45a9

Again Keras makes adding Pooling layers very easy for us:

`tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)`

`tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)`

Note that Keras does not provide a built-in `MinPooling2D` layer, so Min Pooling would require a custom layer.

Where:
- __pool_size__ is the size of the grid (in the above cases 2x2)
    - We can have non-square pool_sizes such as (1, 3)
- __strides__ is the size of the step taken for each pooling operation (in the above cases we step by 2 places each pooling operation)
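
The three pooling operations can be sketched in NumPy as follows. The `pool2d` helper and the 4x4 input are made up for illustration and this is not how Keras implements pooling; the sketch assumes a square grid and no padding.

```python
import numpy as np

def pool2d(image, pool_size, stride, operation):
    """Apply `operation` (e.g. np.max) to each pool_size x pool_size window of `image`."""
    out_h = (image.shape[0] - pool_size) // stride + 1
    out_w = (image.shape[1] - pool_size) // stride + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            top, left = row * stride, col * stride
            window = image[top:top + pool_size, left:left + pool_size]
            output[row, col] = operation(window)
    return output

# Made-up 4x4 input
image = np.array([[1, 3, 2, 0],
                  [5, 7, 4, 6],
                  [9, 8, 1, 2],
                  [0, 4, 3, 5]], dtype=float)

max_pooled = pool2d(image, pool_size=2, stride=2, operation=np.max)
min_pooled = pool2d(image, pool_size=2, stride=2, operation=np.min)
avg_pooled = pool2d(image, pool_size=2, stride=2, operation=np.mean)

print(max_pooled)
print(min_pooled)
print(avg_pooled)
```

With a 2x2 grid and a step of 2, the 4x4 input shrinks to 2x2, so each pooling layer halves the width and height of the data passed to the next layer. Min Pooling gives the same result as negating the input, applying Max Pooling and negating the output.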

## Let's create a Convolutional Neural Network
We will now attempt to solve our image classification problem using a Convolutional Neural Network (CNN) and see if we can improve on our previous accuracy score.

In this example we will create a small CNN consisting of:
- A Convolutional Input layer 
- A MaxPooling layer
- A Convolutional layer (you will specify the number of filters)
- A Pooling layer (you will specify whether to use Max or Average Pooling)
- A Convolutional layer (you will specify the number of filters)
- A Dense Layer (you will specify the number of nodes)
- A Dense output layer to classify the images.
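
Before building the network it can help to trace how the image size changes from layer to layer. The following sketch does this arithmetic for the layers listed above. It assumes 3x3 convolutions with no padding, 2x2 pooling with a step of 2, and filter counts of 32, 64 and 128; the filter counts are only an example, since in the exercise you choose your own.

```python
# (layer type, number of filters) for the convolutional part of the network
layers = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None), ("conv", 128)]

size, channels = 28, 1  # a 28x28 greyscale image
trace = []
for kind, n_filters in layers:
    if kind == "conv":
        size = size - 3 + 1          # 3x3 grid, no padding, step of 1
        channels = n_filters         # one output channel per filter
    else:
        size = (size - 2) // 2 + 1   # 2x2 grid, step of 2
    trace.append((kind, size, size, channels))
    print(kind, (size, size, channels))

# The Dense layer needs a flat list, so the final feature maps are flattened
flattened = size * size * channels
print("flatten", flattened)
```

Under these assumptions the 28x28 image shrinks to a 3x3 grid with 128 channels before being flattened for the Dense layers. Running `cnn_model.summary()` on your own model should show the corresponding shapes for the values you chose.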

### Exercise
Work in your groups to decide what network architecture you will use and each train a different network to compare the results.

__Notes:__ 
- When using convolutions it is typical that the number of filters increases as you go deeper into the network, so consider patterns such as 32 -> 64 -> 128 rather than decreasing the number of filters.
- The more filters you choose the longer the training will take, so try not to be too extravagant (at least during the workshop!)

In [None]:
cnn_model = tf.keras.models.Sequential()

# Input layer
cnn_model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), 
                                     activation='relu', input_shape=(28, 28, 1)))
cnn_model.add(tf.keras.layers.MaxPooling2D((2, 2)))

# YOUR CHANGES START HERE
# Layer 1 - TODO
#    - specify how many filters you want in the Conv2D layer
#    - specify whether you want 'MaxPooling2D' or 'AveragePooling2D' in your pooling
cnn_model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
cnn_model.add(tf.keras.layers.MaxPooling2D(pool_size=2))
cnn_model.add(tf.keras.layers.Dropout(0.25))
# Layer 2 - TODO
#    - Specify how many filters you want in this layer
cnn_model.add(tf.keras.layers.Conv2D(128, (3, 3), activation='relu'))
cnn_model.add(tf.keras.layers.Dropout(0.25))

# Layer 3 - TODO
#    - Specify how many nodes you want in this dense layer
cnn_model.add(tf.keras.layers.Flatten())
cnn_model.add(tf.keras.layers.Dense(128, activation='relu'))

# Output Layer - we have 10 classes so need 10 nodes
cnn_model.add(tf.keras.layers.Dense(10, activation='softmax'))

# Compile the model
cnn_model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy', 
              metrics=['accuracy'])

cnn_model.summary()

In [None]:
# Just a bit more data prep to work with Convolutions: Conv2D expects a channel dimension
# (-1 lets NumPy work out the number of images for us)
x_train_cnn = x_train.reshape(-1, 28, 28, 1)
x_test_cnn = x_test.reshape(-1, 28, 28, 1)

# Train the model (training with Convolutions will be a bit slower so we don't want to train for too long)
num_epochs = 20
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)

history = cnn_model.fit(x_train_cnn, y_train, epochs=num_epochs, validation_split = 0.2, 
                    callbacks=[early_stop])

In [None]:
# Evaluate on the held-out test set (not the validation split used during training)
test_loss, test_acc = cnn_model.evaluate(x_test_cnn, y_test)
print("Test Loss:", test_loss)
print("Test Accuracy:", test_acc)

In [None]:
# Summarise history for loss and accuracy across epochs
lesson4.displayLossAndAccuracy(history)

In [None]:
# Produce the Confusion Matrix
# predict() returns one probability per class; argmax picks the most likely class
test_predictions = np.argmax(cnn_model.predict(x_test_cnn), axis=1)
for i, label in enumerate(class_names):
    print("{} = {}".format(i, label))
lesson4.displayConfusionMatrix(y_test, test_predictions)

In [None]:
# Display some of the incorrectly classified images
lesson4.printSampleCnnIncorrectImages(x_test_cnn, y_test, class_names, cnn_model)

### Exercise
In groups, discuss the following:
- Did the Convolution Model perform better than the Dense Layer Model? If so in what ways?
- Did using the Convolution Model change what type of _Confusions_ the model had?
- Do you think training the CNN for longer would produce better results?
- How well did the CNN generalise to the unseen Test Data?

# Key Observations
The following are key observations to note before we move on
1. The relationship between local features can be important and so we need ways to capture this information to enable better learning.
    - In this workbook we looked at the use of Convolutional Layers, which can be used to capture relationships between local features.
    - Other types of layers exist that have different properties that are useful for different types of data.
2. Models can have good accuracy during training but not generalise well - this is known as Overfitting.
    - There are techniques that we can use to overcome this (such as adding Dropout layers) but this is more advanced than we want to cover in this workshop.
3. The more complex the task the more involved the network architecture can become.
    - We will specifically address this in the next lesson.