# ASTR3110 Tutorial 10: CNNs
Tutorial 10 of the 'Data Science Techniques in Astrophysics' course at Macquarie University.

## Learning outcomes from this tutorial

 * Learn how to set up data for input into a convolutional neural network.
 * Learn how to build a convolutional neural network in the Keras framework.
 * Create diagnostic graphs and reports for a CNN to understand how well the training went.
 * Use a saved model to make a prediction for an image.


## Setup for Google Drive

Today we will work with a dataset of animal images, so start by linking to your Google Drive. The dataset is available at this link: [animals.tar.gz](https://drive.google.com/file/d/1dEdDPCg9_TkDEvyLgHDrEFYhRVgakt99/view?usp=sharing). Please download it to your Google Drive and then follow the directions below to store the data in the local Colab directory in which you are running your notebook:

```
# Link to Google drive
from google.colab import drive
drive.mount('/content/gdrive')

#Copy animals.tar.gz to local directory (note the last dot means copy into the current directory)
!cp gdrive/'My Drive'/animals.tar.gz .

#make a DATA directory
!mkdir DATA

# Unpack the dataset
!tar -xzf animals.tar.gz

#Move the new folder into your DATA directory
!mv animals DATA/
```

Note that the local Colab directory may not save these datafiles after you close down your session.

## Setup for Colab

Today we will be training CNNs, which can be slow on a CPU; a GPU can speed things up substantially. We recommend running this lab in Colab with GPU acceleration enabled. To do this, open the 'Edit' menu at the top left of the window, choose 'Notebook settings', then select 'GPU' from the 'Hardware accelerator' dropdown menu.
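A quick way to confirm that TensorFlow can see the GPU after changing the runtime is sketched below; it assumes TensorFlow 2.x, which provides `tf.config.list_physical_devices`. An empty list means the notebook is still running on the CPU.

```python
import tensorflow as tf

# List the GPUs visible to TensorFlow; an empty list means CPU only
gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", gpus)
```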

## Quick overview of CNNs

For a quick explanation of how a CNN works see [here](https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac), [here](https://www.cs.ryerson.ca/~aharley/vis/conv/) or [here](https://arxiv.org/abs/1511.08458). For a free detailed course with deep mathematical background, see [http://cs231n.stanford.edu/](http://cs231n.stanford.edu/).

## Accessing the data

In this dataset the train-validation-test split has already been done, so we just need to read in the images in each directory and put them into the format expected by the CNN.

The data are in the "animals" directory, which is split into test, train and valid sub-directories, each containing a sub-directory for each of the classes (cat, dog, panda). That is:

```
animals/
├── test
│   ├── cat
│   ├── dog
│   └── panda
├── train
│   ├── cat
│   ├── dog
│   └── panda
└── valid
    ├── cat
    ├── dog
    └── panda
```


We can use [glob](https://docs.python.org/3/library/glob.html) to return a list of all of the images, which can then be used to read our data in.
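A minimal sketch of collecting the file paths with `glob`, assuming the data were unpacked to `DATA/animals` as above and that every file inside a class sub-directory is an image. The helper name `list_images` is hypothetical, not part of the original notebook.

```python
import glob
import os

def list_images(root, split):
    """Return a sorted list of all file paths in root/split/<class>/.

    Assumes every (non-hidden) file inside a class sub-directory is an image.
    """
    pattern = os.path.join(root, split, "*", "*")
    return sorted(glob.glob(pattern))

# Example usage (paths assume the directory layout shown above):
# train_paths = list_images("DATA/animals", "train")
# valid_paths = list_images("DATA/animals", "valid")
# test_paths = list_images("DATA/animals", "test")
```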

Let's take a look at the images and see what needs to be done to get them into the required format.
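One way to inspect a few images is sketched here with Pillow; the choice of Pillow is an assumption, and the original notebook may use a different image library such as OpenCV. The hypothetical helper `describe_image` reports each image's size, colour mode and pixel range, which are the properties that determine the preprocessing needed.

```python
import numpy as np
from PIL import Image

def describe_image(path):
    """Return (width, height, mode, min pixel value, max pixel value) for one image."""
    with Image.open(path) as img:
        arr = np.asarray(img)
        return img.size[0], img.size[1], img.mode, int(arr.min()), int(arr.max())

# Example usage, assuming train_paths holds a list of image paths:
# for p in train_paths[:5]:
#     print(p, describe_image(p))
```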

The images are RGB colour images with different widths and heights, and pixel values in the range 0 to 255. We need to loop over each of the lists, read in each image, normalise its pixel values to the range 0-1, and resize it to a consistent size. We also need to generate the label for each image from its pathname, which takes some string manipulation.
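The loading loop might look like the following sketch. It assumes Pillow for image I/O and a 64x64 target size; neither is specified in the text, so match whatever the notebook actually uses. The label is taken from the name of the directory containing each image, which relies on the directory layout shown above. `load_images` and `IMAGE_SIZE` are hypothetical names.

```python
import os

import numpy as np
from PIL import Image

IMAGE_SIZE = (64, 64)  # assumed (width, height); not specified in the original text

def load_images(paths, size=IMAGE_SIZE):
    """Read, resize and normalise images; derive labels from the parent directory name."""
    data, labels = [], []
    for path in paths:
        with Image.open(path) as img:
            # Force 3-channel RGB and resize to a common size (Pillow takes (width, height))
            img = img.convert("RGB").resize(size)
            # Scale pixel values from 0-255 to 0-1
            data.append(np.asarray(img).astype("float32") / 255.0)
        # e.g. ".../train/cat/<file>.jpg" -> "cat"
        labels.append(os.path.basename(os.path.dirname(path)))
    # Stack into an array of shape (n_images, height, width, 3)
    return np.array(data), labels
```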

Count the number of images in each dataset.
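A short sketch for counting images per class, again assuming the class is the name of each image's parent directory. `count_per_class` is a hypothetical helper.

```python
import os
from collections import Counter

def count_per_class(paths):
    """Count images per class, using the parent directory name as the class."""
    return Counter(os.path.basename(os.path.dirname(p)) for p in paths)

# Example usage, assuming train_paths, valid_paths and test_paths are lists of image paths:
# for name, paths in [("train", train_paths), ("valid", valid_paths), ("test", test_paths)]:
#     print(name, len(paths), dict(count_per_class(paths)))
```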

We also need to make one-hot binarized labels, as we did for the ANNs last week.
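A sketch of the encoding step, assuming scikit-learn's `LabelBinarizer` (if last week's notebook used a different encoder, use that instead). The encoder is fitted on the training labels only and then applied unchanged to the validation and test labels, so every split uses the same column order. `one_hot_labels` is a hypothetical helper.

```python
from sklearn.preprocessing import LabelBinarizer

def one_hot_labels(train_labels, valid_labels, test_labels):
    """One-hot encode string labels, fitting the encoder on the training set only."""
    lb = LabelBinarizer()
    trainY = lb.fit_transform(train_labels)  # learns the (alphabetical) class order
    validY = lb.transform(valid_labels)
    testY = lb.transform(test_labels)
    return trainY, validY, testY, lb
```

With three or more classes this gives one column per class; note that `LabelBinarizer` returns a single column when there are only two classes.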

## Building the CNN

Now it is time to build the model. Like the ANN, we create a sequential stack of layers. However, these layers accept 3D images with a width, height and depth. CNNs also tend to have repeating blocks of layers that do:

```
Convolution -> Activation -> Binning (downsize images)
```

The convolutional layer processes the image by convolving it with a number of small filters (usually 3x3 pixels). These filters start out as random noise, but the training process adjusts them to detect different 'textures' in the images.

The binning layer we use here is called ```MaxPooling```; with a 2x2 pool it keeps only the maximum value in each 2x2 block of pixels, halving the width and height of the output (see [here](https://computersciencewiki.org/index.php/Max-pooling_/_Pooling)). So as the image passes through the network it shrinks, and each 3x3 filter covers a progressively larger fraction of the original image.
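To make one Convolution -> Activation -> Binning block concrete, here is a plain NumPy sketch on a single-channel image. It is illustrative only: like most CNN libraries it computes a cross-correlation (the filter is not flipped), it uses no padding (so a 3x3 filter trims one pixel from each edge, whereas Keras with `padding="same"` keeps the size), and it assumes ReLU as the activation. All function names are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """ReLU activation: negative values become zero."""
    return np.maximum(x, 0)

def max_pool_2x2(x):
    """Keep the maximum of each non-overlapping 2x2 block (drops an odd last row/column)."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# One block on a 6x6 toy image with a 3x3 averaging filter: 6x6 -> 4x4 -> 2x2
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0
pooled = max_pool_2x2(relu(conv2d_valid(image, kernel)))
print(pooled.shape)
```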

Let's define a simple network. The CNN accepts an array of 3D images (compare this to the ANN, which accepts an array of flattened 1D vectors).
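A minimal Keras sketch of such a network, assuming `tf.keras`, 64x64 RGB inputs and three classes. The number of blocks, filter counts and dense-layer width are illustrative choices, not the tutorial's exact architecture. Each block follows the Convolution -> Activation -> Binning pattern described above, and a small dense classifier sits on top.

```python
from tensorflow.keras import layers, models

def build_cnn(width=64, height=64, depth=3, n_classes=3):
    """A small CNN: two Conv -> Activation -> MaxPooling blocks, then a dense classifier."""
    model = models.Sequential([
        layers.Input(shape=(height, width, depth)),   # channels-last 3D images
        # Block 1: 32 filters of size 3x3
        layers.Conv2D(32, (3, 3), padding="same"),
        layers.Activation("relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),        # halves width and height
        # Block 2: 64 filters of size 3x3
        layers.Conv2D(64, (3, 3), padding="same"),
        layers.Activation("relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Classifier head: flatten the feature maps and map them to class probabilities
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    return model

model = build_cnn()
model.summary()
```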

Now we choose an optimiser (we default to stochastic gradient descent) and compile the model.
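A sketch of the compile step with Keras's `SGD` optimiser. The learning rate of 0.01 is an assumption, and a tiny stand-in model is defined so the snippet runs on its own. Categorical cross-entropy matches one-hot labels, and `metrics=["accuracy"]` is what produces the `accuracy` and `val_accuracy` entries in the training history.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import SGD

# Tiny stand-in model so this snippet is self-contained; compile your own CNN the same way
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(8, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(3, activation="softmax"),
])

opt = SGD(learning_rate=0.01)  # assumed learning rate
model.compile(loss="categorical_crossentropy",  # suits one-hot labels with >2 classes
              optimizer=opt,
              metrics=["accuracy"])             # tracked for train and validation sets
```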

## Training the classifier

Now we set the final training parameters (such as the number of epochs and the batch size) and train the CNN classifier.
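A sketch of the training call, assuming `tf.keras` and a batch size of 32 (an assumption). Passing `validation_data` makes Keras report the validation loss and accuracy after every epoch. The smoke test at the bottom uses random placeholder arrays and a tiny model purely to show that the call runs; it is not the animals data.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import SGD

EPOCHS = 50      # the discussion of the training curves refers to 50 epochs
BATCH_SIZE = 32  # assumed batch size

def train(model, trainX, trainY, validX, validY,
          epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=1):
    """Fit a compiled model, evaluating on the validation set after each epoch."""
    H = model.fit(trainX, trainY,
                  validation_data=(validX, validY),
                  epochs=epochs, batch_size=batch_size, verbose=verbose)
    return H

# Smoke test on random placeholder data (not the animals dataset)
rng = np.random.default_rng(0)
X = rng.random((12, 32, 32, 3)).astype("float32")
Y = np.eye(3)[rng.integers(0, 3, size=12)]
demo = models.Sequential([layers.Input(shape=(32, 32, 3)), layers.Flatten(),
                          layers.Dense(3, activation="softmax")])
demo.compile(loss="categorical_crossentropy", optimizer=SGD(learning_rate=0.01),
             metrics=["accuracy"])
H = train(demo, X[:8], Y[:8], X[8:], Y[8:], epochs=2, verbose=0)
```

`H.history` is a dictionary of per-epoch values, which is the form the `plot_train_curves` function in the next section expects.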

## Evaluating the classifier and training session

Let's start by visualising the training curves. We will make a function to do this and format it nicely.

```python
# We won't step through the plotting code in class; read through it in your own time to
# familiarise yourself with what it does. It is similar to the plotting code from the ANN
# notebook, but the loss and accuracy are now plotted on separate subplots so that it is
# easier to see how each one changes.
import matplotlib.pyplot as plt
import matplotlib as mpl

# Set larger font sizes
mpl.rcParams["font.size"] = 12.0

def plot_train_curves(H):
    """Plot loss and accuracy curves from a Keras history dict (the `.history` of `model.fit`).

    H must contain "loss", "val_loss", "accuracy" and "val_accuracy".
    """

    # Create the figure
    fig = plt.figure(figsize=(14., 6.))
    
    # Sub-plot for the loss curves
    ax1 = fig.add_subplot(1,2,1)    
    epoch = range(1, len(H["loss"])+1)
    ax1.step(epoch, H["loss"], where="mid", label="Train Loss")
    ax1.step(epoch, H["val_loss"], where="mid", label="Valid Loss")
    ax1.legend(loc="best", shadow=False, fontsize="medium")
    ax1.set_title("Model Loss [Epoch {:d}]".format(epoch[-1]))
    ax1.set_ylabel("Loss")
    ax1.set_xlabel("Epoch")
    
    # Sub-plot for the accuracy curves
    ax2 = fig.add_subplot(1,2,2)
    ax2.yaxis.tick_right()
    ax2.yaxis.set_label_position("right")
    ax2.step(epoch, H["accuracy"], where="mid", label="Train Accuracy")
    ax2.step(epoch, H["val_accuracy"], where="mid", label="Valid Accuracy")
    ax2.legend(loc="lower right", shadow=False, fontsize="medium")
    ax2.set_title("Model Accuracy [Epoch {:d}]".format(epoch[-1]))
    ax2.set_ylabel("Accuracy")
    ax2.set_xlabel("Epoch")

    # Apply nice formatting (tick padding and tick line widths on both subplots)
    ax1.tick_params(pad=7)
    for line in ax1.get_xticklines() + ax1.get_yticklines():
        line.set_markeredgewidth(1)
    ax2.tick_params(pad=7)
    for line in ax2.get_xticklines() + ax2.get_yticklines():
        line.set_markeredgewidth(1)
    plt.tight_layout()
```

Now use the function to plot the training curves.

You can see that after 50 epochs the training loss is still trending down and the training accuracy is still climbing; however, the validation curves have all but plateaued. Training for longer would probably raise the training accuracy further, but here the model appears to be starting to overfit: a flattened validation accuracy curve alongside a still-climbing training accuracy curve (i.e., the two curves diverging) is a sign of overfitting. To improve the model, we would probably need to experiment with the number of hidden layers and/or their size. We could also 'augment' the data to increase the number of training images by, e.g., flipping or rotating the images.
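One way to do this augmentation is with Keras preprocessing layers, sketched below. It assumes a recent TensorFlow version in which `RandomFlip` and `RandomRotation` live directly in `tf.keras.layers`; other approaches exist. The flips and rotations are random on every call made with `training=True`, so each epoch sees slightly different versions of the training images, while at inference time these layers pass images through unchanged.

```python
import numpy as np
from tensorflow.keras import layers, models

# Random horizontal flips and small rotations, generated on the fly
augment = models.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),  # up to +/-10% of a full turn, i.e. +/-36 degrees
])

# Apply to a placeholder batch of four 64x64 RGB images
images = np.random.default_rng(0).random((4, 64, 64, 3)).astype("float32")
augmented = augment(images, training=True)
```

In practice, `augment` can be placed right after the `Input` layer of the CNN so that augmentation happens inside the model during `fit`.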

Now evaluate the model's performance in numbers.
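A sketch using scikit-learn's `classification_report`, which gives the precision, recall and F1-score for each class plus the overall accuracy. It assumes the predicted class probabilities come from `model.predict` and that the true labels are one-hot encoded. `report_from_probs` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import classification_report

def report_from_probs(probs, onehot_true, class_names):
    """Per-class precision/recall/F1 from predicted probabilities and one-hot true labels."""
    y_pred = np.argmax(probs, axis=1)        # most probable class for each image
    y_true = np.argmax(onehot_true, axis=1)  # undo the one-hot encoding
    return classification_report(y_true, y_pred, target_names=class_names)

# Usage in the notebook (names assumed):
# probs = model.predict(testX, batch_size=32)
# print(report_from_probs(probs, testY, ["cat", "dog", "panda"]))
```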

An alternative way to visualise predictions is to make a confusion matrix, which has the true labels on one axis and the predicted labels on the other. A perfect classifier has non-zero entries only on the diagonal, meaning that all images were correctly labelled. We can reuse the code that plotted the confusion matrix in the week 8 lectorial notebook.

```python
# Taken from the Random Forests lectorial; go through it in your own time to understand it.
import itertools

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Oranges):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    Source: http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
    """
    if normalize:
        # Divide each row by its total, so entries are fractions of each true class
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.figure(figsize = (10, 10))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title, size = 24)
    plt.colorbar(aspect=4)
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45, size = 14)
    plt.yticks(tick_marks, classes, size = 14)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    
    # Label each cell, using white text on dark cells and black text on light cells
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt), fontsize = 20,
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
        
#    plt.grid(None)
#    plt.tight_layout()
    plt.ylabel('True label', size = 18)
    plt.xlabel('Predicted label', size = 18)
```
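To feed `plot_confusion_matrix`, the matrix itself can be computed from the model's predicted probabilities and the one-hot test labels, as in this sketch. `cm_from_probs` is a hypothetical helper, and the class names in the usage comment assume the cat/dog/panda order produced by an alphabetical label encoder.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def cm_from_probs(probs, onehot_true):
    """Confusion matrix with rows = true class and columns = predicted class."""
    y_pred = np.argmax(probs, axis=1)
    y_true = np.argmax(onehot_true, axis=1)
    n_classes = onehot_true.shape[1]
    # Fix the label set so the matrix stays n_classes x n_classes even if a class is never predicted
    return confusion_matrix(y_true, y_pred, labels=np.arange(n_classes))

# Usage in the notebook (names assumed):
# cm = cm_from_probs(model.predict(testX), testY)
# plot_confusion_matrix(cm, classes=["cat", "dog", "panda"], normalize=True)
```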

## Using the model to make predictions

We can use the trained (or saved) model to make a prediction for a new image. First we need to load and pre-process the image; note that exactly the same pre-processing steps must be applied to the input image as were applied to our training data.
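A sketch of single-image prediction. It assumes the training images were read with Pillow, converted to RGB, resized to 64x64 and scaled to 0-1; if the notebook used a different library or size, mirror those steps instead. The helper names and file names are hypothetical. Recent Keras versions save models with the `.keras` extension, while older versions commonly use `.h5`.

```python
import numpy as np
from PIL import Image

def preprocess_image(path, size=(64, 64)):
    """Apply the training-data steps: RGB, resize to (width, height), scale to 0-1."""
    with Image.open(path) as img:
        arr = np.asarray(img.convert("RGB").resize(size)).astype("float32") / 255.0
    return arr[np.newaxis, ...]  # add a batch axis: (1, height, width, 3)

def predict_image(model, path, class_names, size=(64, 64)):
    """Return the predicted class name and the class probabilities for one image."""
    probs = model.predict(preprocess_image(path, size), verbose=0)[0]
    return class_names[int(np.argmax(probs))], probs

# Usage in the notebook (file names are hypothetical):
# from tensorflow.keras.models import load_model
# model.save("animals_cnn.keras")          # after training
# model = load_model("animals_cnn.keras")  # e.g. in a later session
# label, probs = predict_image(model, "my_pet.jpg", ["cat", "dog", "panda"])
```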