# Convolutional Neural Networks for Image Regression

This third notebook goes a step further by using tools dedicated to computer vision problems: convolutional neural networks.

**Objectives**

1. Import image data
2. Prepare data for feeding a convolutional neural network
3. Build, train and evaluate a convolutional neural network
4. Submit results for ranking

**Note**: Parts of the code and ideas in this notebook were strongly inspired by Chollet's *Deep Learning with Python* and Lakshmanan, Görner and Gillard's *Practical Machine Learning for Computer Vision: End-to-End Machine Learning for Images*. Both are must-reads if you're into machine learning, deep learning, and the beauty and simplicity of engineering science.

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

import os
import PIL

pd.options.mode.chained_assignment = None

## Enable GPU

In [None]:
print('TensorFlow version: {}'.format(tf.__version__))
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    print('GPU device not found - On for CPU time!')
else:
    print('Found GPU at {}'.format(device_name))

## ETL - Load data and prepare it for feeding a convolutional neural network

### Handling image data for convolutional neural networks

Input data are manipulated as tensors, the basic data structures of TensorFlow. An image generally has 3 dimensions: height, width, and number of colour channels. An image dataset is therefore usually represented as a rank-4 tensor (or 4D tensor) of shape `(samples, height, width, channels)`. For example, a batch of 32 colour images of size 150 x 150 pixels can be stored in a rank-4 tensor of shape `(32, 150, 150, 3)`.
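
To make the shape convention concrete, here is a small sketch in plain Python (no TensorFlow needed) that counts the values held by a tensor of a given shape and estimates its memory footprint. The helper name `tensor_footprint` and the choice of 4 bytes per value (`float32`) are assumptions made for this illustration.

```python
import math

def tensor_footprint(shape, bytes_per_value=4):
    """Return (number of values, size in bytes) for a dense tensor.

    `bytes_per_value=4` assumes float32 storage (an assumption for this sketch).
    """
    n_values = math.prod(shape)
    return n_values, n_values * bytes_per_value

# A batch of 32 colour images of 150 x 150 pixels: (samples, height, width, channels)
batch_shape = (32, 150, 150, 3)
n_values, n_bytes = tensor_footprint(batch_shape)

print(len(batch_shape))   # rank of the tensor: 4
print(n_values)           # 2160000 values
print(n_bytes / 1e6)      # 8.64 (megabytes, as float32)
```

Memory grows linearly with each dimension, which is one reason why image size and batch size have to be chosen together.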

Our data consists of colour pet images of different sizes. A convolutional neural network, however, only accepts tensors of a fixed size. A common convention is to resize all images to the size of the smallest one, so we start with a quick analysis of the image dimensions we've got here.

In [None]:
#path = '../input/petfinder-pawpularity-score/train/'
#training_img = os.listdir(path) # list all training images names
#print('There are {} images in the training directory'.format(len(training_img)))

#img_sz = {'width': list(),
#          'height': list()} # store image attributes for further analysis
#width, height = 1000, 1000

#for im in training_img:
#    img = PIL.Image.open(path+im)
#    w, h = img.size
#    if w < width:
#        width = w
#    if h < height:
#        height = h

#IMG_WIDTH = width
#IMG_HEIGHT = height
#IMG_CHANNELS = 3

#print('Min training image width: {} px'.format(IMG_WIDTH))
#print('Min training image height: {} px'.format(IMG_HEIGHT))

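The scan above is commented out in this version; fixed values for `IMG_WIDTH`, `IMG_HEIGHT` and `IMG_CHANNELS` are set further down, before the helpers below are first called. Let's define some data handlers to read, display and decode images.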

In [None]:
def read_and_decode(filename, reshape_dims):
    # Read an image file to a tensor as a sequence of bytes
    image = tf.io.read_file(filename)
    # Convert the tensor to a 3D uint8 tensor
    image = tf.image.decode_jpeg(image, channels=IMG_CHANNELS)
    # Convert the uint8 tensor to float32 with values in [0, 1]
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Resize the image to the desired size
    return tf.image.resize(image, reshape_dims)

def show_image(filename):
    image = read_and_decode(filename, [IMG_HEIGHT, IMG_WIDTH])
    plt.imshow(image.numpy())
    plt.axis('off')

def decode_csv(csv_row):
    # The defaults also set the column dtypes: a string path and a float32 score
    record_defaults = ['path', 0.0]
    filename, pawpularity = tf.io.decode_csv(csv_row, record_defaults)
    image = read_and_decode(filename, [IMG_HEIGHT, IMG_WIDTH])
    return image, pawpularity

### Build the training dataset

Deep learning models are still machine learning models. We therefore have to build the training, evaluation and test sets rigorously to get reliable results. One simple thing we can address is preserving the data distribution across the sets. This can be accomplished by choosing an appropriate way of sampling the data. Instead of random sampling, we use stratified sampling, which ensures that the variable to predict (`Pawpularity`) is similarly distributed in all the sets.
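
The idea behind stratified sampling can be sketched in plain Python: group the samples by label, then take the same fraction from every group. This is only an illustration with made-up labels; the helper `stratified_split` is hypothetical, and the notebook itself relies on scikit-learn's `StratifiedShuffleSplit`.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=0):
    """Split sample indices so each label keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Made-up, imbalanced labels: 80 samples of class 'low', 20 of class 'high'
labels = ['low'] * 80 + ['high'] * 20
train_idx, test_idx = stratified_split(labels)

print(Counter(labels[i] for i in train_idx))  # 64 'low' and 16 'high'
print(Counter(labels[i] for i in test_idx))   # 16 'low' and 4 'high'
```

Both sets keep the original 80/20 class ratio. For a target with many distinct values, a common variant is to bin the target first and stratify on the bins.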

In [None]:
from sklearn.model_selection import StratifiedShuffleSplit

data_path = '../input/petfinder-pawpularity-score/'
data = pd.read_csv(data_path+'train.csv')

# Use stratified sampling
sssplit = StratifiedShuffleSplit(n_splits=1, test_size=0.2)
for train_index, test_index in sssplit.split(data, data['Pawpularity']):
    training_set = data.iloc[train_index]
    eval_set = data.iloc[test_index]
    
# Visually check distribution of pawpularity score in training and eval sets
training_set['Pawpularity'].hist(label='Training set')
eval_set['Pawpularity'].hist(label='Eval set')
plt.title('Pawpularity score distribution in training and eval sets')
plt.xlabel('Pawpularity score')
plt.ylabel('Count')
plt.legend(loc='upper right')
plt.show()

# Export training and eval sets as .csv files
training_set['Id'] = training_set['Id'].apply(lambda x: '../input/petfinder-pawpularity-score/train/'+x+'.jpg')
training_set[['Id', 'Pawpularity']].to_csv('/kaggle/working/training_set.csv', header=False, index=False)
eval_set['Id'] = eval_set['Id'].apply(lambda x: '../input/petfinder-pawpularity-score/train/'+x+'.jpg')
eval_set[['Id', 'Pawpularity']].to_csv('/kaggle/working/eval_set.csv', header=False, index=False)

## Convolutional Neural Network using Keras

### Convolutional Neural Networks (CNN)

Convolutional neural networks are well suited to image-related tasks (image recognition, image classification, image regression, video analysis, etc.). They're named *convolutional* because at least one of their layers uses convolution instead of general matrix multiplication.

A CNN is fed with data in the form of a tensor of shape `(samples, height, width, channels)`. Data generally goes through 2 different kinds of layers within a CNN:

* `Conv2D`: convolution layers learn local patterns by sliding small 2D windows over the input images, instead of learning global patterns from the whole input as dense layers would
* `MaxPooling2D`: max pooling extracts windows from the input feature maps and outputs the maximum value of each channel within each window. Like convolution, it works on local parts of the input rather than on the input as a whole, and it downsamples the feature maps
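
To illustrate the two operations, the sketch below implements a "valid" 2D convolution (strictly speaking a cross-correlation, which is what deep learning frameworks compute) and a 2 x 2 max pooling on a tiny single-channel image, using plain Python lists. The function names and the hand-picked kernel are assumptions for this example; Keras layers additionally handle channels, batches, biases and learned kernels.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping `size` x `size` window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Toy 5 x 5 image with a bright vertical stripe in the middle column
image = [[0, 0, 1, 0, 0] for _ in range(5)]
# Hand-picked 2 x 2 kernel that responds to a dark-to-bright horizontal transition
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(image, kernel)  # 4 x 4 output
pooled = max_pool2d(feature_map)           # 2 x 2 output

print(feature_map[0])  # [0, 2, -2, 0]
print(pooled)          # [[2, 0], [2, 0]]
```

The convolution responds strongly where the stripe begins, and the pooling step keeps that response while halving the height and width of the feature map.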



### Import training and evaluation datasets

In [None]:
IMG_WIDTH = 256
IMG_HEIGHT = 256
IMG_CHANNELS = 3

path = '../input/petfinder-pawpularity-score/train/'
training_img = os.listdir(path)
rand_idx = np.random.randint(0, len(training_img))  # upper bound is exclusive
rand_img = training_img[rand_idx]

show_image(path+rand_img)

In [None]:
BATCH_SIZE = 256

train_dataset = tf.data.TextLineDataset(
    '/kaggle/working/training_set.csv'
).map(decode_csv).batch(BATCH_SIZE)

eval_dataset = tf.data.TextLineDataset(
    '/kaggle/working/eval_set.csv'
).map(decode_csv).batch(BATCH_SIZE)

### Build the CNN

Our first convolutional neural network stacks:
* a convolution layer `tf.keras.layers.Conv2D` with 64 filters of size 3 x 3 and a `ReLU` activation function
* a max pooling layer `tf.keras.layers.MaxPooling2D` with a pooling window of size 2 x 2
* a `tf.keras.layers.Flatten` layer, which turns the feature maps into a vector and serves as the input of the regressor
* an output dense layer `tf.keras.layers.Dense` with 1 unit (since we're doing regression, we output a single value) and no activation function

In [None]:
# Build model
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=1, activation=None)
])

The network is built. Let's look at some of its characteristics and plot it; a visual understanding sometimes helps.

In [None]:
model.summary()

In [None]:
tf.keras.utils.plot_model(model, show_shapes=True, show_layer_names=False)

With 256 x 256 x 3 inputs, the model defined above has 1,034,049 trainable parameters, almost all of them in the final dense layer. We can expect training to take a while!
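
As a cross-check on what `model.summary()` reports, the parameter count of the `Sequential` model defined above can be worked out by hand. The sketch below assumes the default Keras settings (stride 1, `'valid'` padding, biases enabled) and 256 x 256 x 3 inputs; the helper names are made up for this illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights of every filter plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(n_inputs, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * units + units

height, width, channels = 256, 256, 3   # IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS

# Conv2D(64, (3, 3)) with 'valid' padding shrinks each side by 2
conv_h, conv_w, conv_c = height - 2, width - 2, 64
# MaxPooling2D((2, 2)) halves height and width and has no parameters
pool_h, pool_w = conv_h // 2, conv_w // 2
# Flatten turns the feature maps into a single vector (no parameters either)
flat_size = pool_h * pool_w * conv_c

n_conv = conv2d_params(3, 3, channels, 64)
n_dense = dense_params(flat_size, 1)

print(n_conv)            # 1792
print(n_dense)           # 1032257
print(n_conv + n_dense)  # 1034049
```

Almost all parameters sit in the dense layer, because `Flatten` produces a vector of 127 x 127 x 64 values. Adding more convolution and pooling stages before flattening would shrink that vector considerably.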

### Compile the CNN

Compilation makes the network ready for training. We'll specify 3 necessary things that guide the training and the behaviour of the model, and that also serve as conceptual anchor points when thinking about optimising models and improving performance:
* A **loss function** indicates how the model is behaving on the training data. It's a direct measure of the distance between what the model produces and what it should produce. Beware of aiming for perfect behaviour on the training data! You'd fall into the overfitting trap: your model would learn the training data by heart, but would be unable to generalise its predictions to new, previously unseen data
* An **optimiser** is the actual math by which the model updates its parameters according to the loss, in order to get the best training performance. Generally speaking, optimisation is based on gradient methods, which compute derivatives of the loss with respect to the parameters of the model, and different variants exist. We'll select the Adam optimiser, a well-known optimiser that is widely used for computer vision problems
* A **performance metric** gives us, the computer scientists, a real figure of how things are going once the model has finished its work. Selecting a performance metric mostly depends on the problem at hand. Since we're performing regression, we'll look at the distance between the predictions made by the model and the truth. For the sake of interpretability, we'll select Root Mean Squared Error (RMSE) as the performance metric, which is expressed in the same units as the variable we want to predict
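
To show how the loss and the metric relate, here is a minimal sketch of mean squared error (the loss) and its square root, RMSE (the metric), in plain Python. The scores are made up for illustration, and the function names are not the Keras API.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same unit as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

# Made-up scores, for illustration only
y_true = [30.0, 50.0, 70.0, 20.0]
y_pred = [37.0, 49.0, 75.0, 15.0]

print(mean_squared_error(y_true, y_pred))       # 25.0 (squared score units)
print(root_mean_squared_error(y_true, y_pred))  # 5.0 (score units)
```

Squaring penalises large errors more than small ones. Taking the square root brings the figure back to the scale of the target, which is why RMSE is easier to interpret than MSE.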

In [None]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

### Train the CNN and display results

When training models, the dataset is not processed all at once. Instead, models use **mini-batches** of a fixed number of samples (defined by `BATCH_SIZE`) to compute the loss and update the weights. Chaining these elementary training steps makes the model eventually see the whole training data, which completes what is called an **epoch**. Here our model iterates over the training data in mini-batches of 256 samples, 10 times over.
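
The relation between samples, mini-batches and epochs can be sketched as follows. The dataset size of 1,000 is hypothetical and chosen only for illustration; `BATCH_SIZE` and `EPOCHS` mirror the values used in the code cells, and `iterate_minibatches` is a made-up helper.

```python
import math

def iterate_minibatches(n_samples, batch_size):
    """Yield (start, stop) index ranges covering all samples once, i.e. one epoch."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

N_SAMPLES = 1000   # hypothetical dataset size, for illustration only
BATCH_SIZE = 256
EPOCHS = 10

steps_per_epoch = math.ceil(N_SAMPLES / BATCH_SIZE)
batches = list(iterate_minibatches(N_SAMPLES, BATCH_SIZE))

print(steps_per_epoch)                             # 4
print([stop - start for start, stop in batches])   # [256, 256, 256, 232]
print(steps_per_epoch * EPOCHS)                    # 40 weight updates in total
```

The last mini-batch of an epoch is smaller whenever the dataset size is not a multiple of the batch size.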

In [None]:
%%time

# The datasets are already batched, so `batch_size` must not be passed to `fit`
history = model.fit(train_dataset, validation_data=eval_dataset, epochs=10)

In [None]:
def training_plot(metrics, history):
    f, ax = plt.subplots(1, len(metrics), figsize=(5*len(metrics), 5))
    for idx, metric in enumerate(metrics):
        ax[idx].plot(history.history[metric], ls='dashed')
        ax[idx].set_xlabel('Epochs')
        ax[idx].set_ylabel(metric)
        ax[idx].plot(history.history['val_'+metric]);
        ax[idx].legend(['train_'+metric, 'val_'+metric])

In [None]:
training_plot(['loss', 'root_mean_squared_error'], history)

This first attempt wasn't very good. The final RMSE oscillates around 20 and, as stated before, it is expressed in the same unit as the target variable, i.e. Pawpularity. That's quite a large error; we'll see later if we can improve it.

A note on these plots: the first two epochs went pretty well, with a validation error lower than the training error. But from the 3rd epoch onwards, we can observe a phenomenon called overfitting: the validation error is larger than the training error. This characterises a situation where the model has learned the training data so well that it has forgotten what its real task was: generalising!
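
The reading of the curves described above can be turned into a small check on a `history.history`-like dictionary. The loss values below are made up to mimic the shape of such curves and are not results from this notebook; both helper functions are hypothetical.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

def first_overfit_epoch(train_losses, val_losses):
    """Return the first 1-based epoch where validation loss exceeds training loss."""
    for epoch, (train, val) in enumerate(zip(train_losses, val_losses), start=1):
        if val > train:
            return epoch
    return None

# Made-up loss curves shaped like `history.history`; not actual results
history_dict = {
    'loss':     [520.0, 450.0, 410.0, 380.0, 350.0],
    'val_loss': [500.0, 440.0, 430.0, 445.0, 460.0],
}

print(first_overfit_epoch(history_dict['loss'], history_dict['val_loss']))  # 3
print(best_epoch(history_dict['val_loss']))                                  # 3
```

In Keras, the `tf.keras.callbacks.EarlyStopping` callback automates this kind of check by stopping training once a monitored quantity such as `val_loss` stops improving.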

## Compute predictions and build submission process

In [None]:
sample_submission = pd.read_csv('../input/petfinder-pawpularity-score/sample_submission.csv')
sample_submission['Id'] = sample_submission['Id'].apply(lambda x: '../input/petfinder-pawpularity-score/test/'+x+'.jpg')
sample_submission.to_csv('/kaggle/working/sample_submission.csv', index=False, header=False)
# Read the file back from the same path it was written to
sample_submission = tf.data.TextLineDataset(
    '/kaggle/working/sample_submission.csv'
).map(decode_csv).batch(BATCH_SIZE)

# Make predictions with our model
sample_prediction = model.predict(sample_submission)

# Format predictions to output for submission
submission_output = pd.concat(
    [pd.read_csv('../input/petfinder-pawpularity-score/sample_submission.csv').drop('Pawpularity', axis=1),
    pd.DataFrame(sample_prediction)],
    axis=1
)
submission_output.columns = ['Id', 'Pawpularity']  # flat list; a nested list would create a MultiIndex

# Output submission file to csv
submission_output.to_csv('submission.csv', index=False)