# Fuel Library for Data

Notebook #1 explored the data that we were dealing with. This notebook utilizes the [Fuel library](https://github.com/mila-udem/fuel), which wraps data for machine learning pipelines, and the [lfw_fuel library](https://github.com/dribnet/lfw_fuel), which extends the Fuel library to the LFW dataset.

This lets us load the image data and split it into X and Y training and testing arrays with a single call (one call per image variant):

```
(X_train, y_train), (X_test, y_test) = lfw.load_data()
(X_train, y_train), (X_test, y_test) = lfw.load_data("funneled")
(X_train, y_train), (X_test, y_test) = lfw.load_data("deepfunneled")
```

In [4]:
%matplotlib inline

In [30]:
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.utils import np_utils
from scipy.misc import imresize  # removed in SciPy 1.3; this cell needs an older SciPy (plus Pillow)

from lfw_fuel import lfw

In [6]:
batch_size = 128
nb_classes = 2
nb_epoch = 12
feature_width = 32
feature_height = 32

In [7]:
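def cropImage(im):
    # im holds one pair: six 250x250 channels (two RGB images), channels first
    im2 = np.dstack(im).astype(np.uint8)  # -> (250, 250, 6), channels last
    # keep the centered 128x128 region of the original 250x250 (about 26% of the area)
    newim = im2[61:189, 61:189]
    # resize each RGB image separately to feature_width x feature_height
    sized1 = imresize(newim[:,:,0:3], (feature_width, feature_height), interp="bicubic", mode="RGB")
    sized2 = imresize(newim[:,:,3:6], (feature_width, feature_height), interp="bicubic", mode="RGB")
    # reassemble as (6, feature_width, feature_height), channels first again
    return np.asarray([sized1[:,:,0], sized1[:,:,1], sized1[:,:,2], sized2[:,:,0], sized2[:,:,1], sized2[:,:,2]])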
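The array bookkeeping in `cropImage` is easy to get wrong, so here is a NumPy-only sketch that checks it. It skips the `imresize` step and uses a random array as a hypothetical stand-in for one `(6, 250, 250)` LFW pair; it shows what `np.dstack` does to the axes, where the crop bounds 61 and 189 come from, and how much of the image area the crop keeps.

```python
import numpy as np

# Hypothetical stand-in for one LFW pair: 6 channels (two RGB images), 250x250 pixels.
pair = np.random.randint(0, 256, size=(6, 250, 250)).astype(np.uint8)

# np.dstack iterates over the first axis and stacks the six 250x250 slices
# along a new last axis, giving channels-last (250, 250, 6).
stacked = np.dstack(pair)

# Centered 128x128 window: start = (250 - 128) // 2 = 61, stop = 61 + 128 = 189.
start = (250 - 128) // 2
crop = stacked[start:start + 128, start:start + 128]

# Fraction of the original image area that survives the crop.
area_fraction = (128 * 128) / (250 * 250)

# Moving channels back to the front (as cropImage does when it returns) gives (6, H, W).
channels_first = np.transpose(crop, (2, 0, 1))

print(stacked.shape, crop.shape, channels_first.shape)
print(f"crop keeps {area_fraction:.1%} of the area")
```

The crop keeps about 51% of the width and height, which is roughly a quarter of the area.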

In [8]:
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = lfw.load_data("deepfunneled")

In [9]:
# Crop features
X_train = np.asarray(list(map(cropImage, X_train)))
X_test = np.asarray(list(map(cropImage, X_test)))

In [10]:
# print shape of data
print("{1} train samples, {2} channel{0}, {3}x{4}".format("" if X_train.shape[1] == 1 else "s", *X_train.shape))
print("{1}  test samples, {2} channel{0}, {3}x{4}".format("" if X_test.shape[1] == 1 else "s", *X_test.shape))

# convert integer class labels to one-hot (binary class) matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

2200 train samples, 6 channels, 32x32
1000  test samples, 6 channels, 32x32

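`np_utils.to_categorical` turns each integer label into a one-hot row. A minimal NumPy sketch of the same encoding, assuming integer labels in `0..nb_classes-1`; the helper name `one_hot` is hypothetical, not part of Keras:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: one-hot encode integer labels, one row per label."""
    labels = np.asarray(labels, dtype=int).ravel()
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Put a 1 in the column matching each row's label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

y = np.array([0, 1, 1, 0])
Y = one_hot(y, 2)
print(Y)
# The original labels come back as the index of the 1 in each row.
print(Y.argmax(axis=1))
```

This is why the evaluation cells must use the one-hot `Y_test` rather than the integer `y_test`.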

In [156]:
# In general, it is a good idea to use normalized data.
# We compute these for now, but don't use them below. Save for later.
X_train_float = X_train.astype('float32')
X_train_float /= 255

X_test_float = X_test.astype('float32')
X_test_float /= 255

# Convolutional Neural Network

Now that we've used the Fuel library to load our data, we're ready to train our first neural network. We will use a convolutional neural network (CNN), an architecture well suited to image recognition and classification tasks.

(Useful links: Dr. Jason Brownlee's posts on [convolutional neural networks](http://machinelearningmastery.com/crash-course-convolutional-neural-networks/) and [object recognition](http://machinelearningmastery.com/object-recognition-convolutional-neural-networks-keras-deep-learning-library/) were very helpful in understanding these concepts, and Michael Nielsen's [book chapter](http://neuralnetworksanddeeplearning.com/chap6.html) is also useful.)

Convolutional neural networks use three types of layers:
* Convolutional layers
* Pooling layers
* Fully connected layers

Convolutional layers are composed of filters and feature maps.
* Filters are the neurons of the layer: each has inputs, weights, and an output. A filter's input is a fixed-size patch. In the first convolutional layer these patches are pixels straight from the image; deeper in the network they are the outputs of prior layers.
* A feature map is the output of one filter applied across the whole previous layer. The feature maps of one layer become the inputs to the next.
* Padding deals with the edges of the input. A filter cannot be centered on border pixels, so without padding each convolution shrinks its output (a 3×3 filter turns a 32×32 input into 30×30). Zero padding adds a border of zeros so the filter can also be applied at the edges of the image. In Keras, `padding='valid'` means no padding, and `padding='same'` zero-pads so that a stride-1 convolution keeps the input's width and height.
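The effect of padding on layer size can be computed directly. For an input of width n, a filter of size k, p pixels of padding on each side, and stride s, the output width is floor((n + 2p - k) / s) + 1. A small sketch of that formula (the function name `conv_output_size` is ours, not a Keras API):

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output width of a filter or pooling window of size k sliding over an input of width n."""
    return (n + 2 * padding - k) // stride + 1

# A 3x3 filter with no padding (Keras padding='valid') shrinks 32 -> 30.
print(conv_output_size(32, 3))
# One pixel of zero padding per side (what padding='same' does for a 3x3, stride-1 filter) keeps 32.
print(conv_output_size(32, 3, padding=1))
# A 2x2 pooling window with stride 2 halves the width: 30 -> 15.
print(conv_output_size(30, 2, stride=2))
```

Applying the same formula layer by layer is a quick way to predict the shapes Keras will report in `model.summary()`.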

Pooling layers down-sample a given layer.
* Pooling acts as a form of compression or dimensionality reduction, condensing the features learned in prior layers into a smaller representation. The pooling window is small (2×2 is common) and is usually moved with a stride equal to its size, so windows do not overlap and each spatial dimension shrinks by that factor.
* Each pooling layer produces its own feature maps by summarizing each window, usually with the average or the maximum value.
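Max pooling fits in a few lines of NumPy. This sketch assumes a single 2D feature map, a 2×2 window, and stride 2 (the same window as the `MaxPooling2D(pool_size=(2, 2))` layer used in the models); odd trailing rows or columns are dropped, as with Keras's default `padding='valid'`. The helper name is hypothetical.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on one (H, W) feature map; odd edges are dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Give each 2x2 window its own pair of axes, then take the max over them.
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 2, 3, 1]])
print(max_pool_2x2(fm))
```

Each output value is the largest of the four values in its window, so the 4×4 map becomes 2×2.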

Fully connected layers are used to combine the various extracted features.
* Fully connected layers create a non-linear combination of all incoming features.
* The activation function of a hidden fully connected layer is usually a non-linear function such as ReLU; the final layer usually uses softmax.
* With softmax, the final layer's outputs can be read as the predicted probability of each class.
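A sketch of softmax in NumPy shows how raw scores become class probabilities. The two scores below are made-up values for a two-class problem like ours:

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Made-up raw scores for two classes.
probs = softmax(np.array([1.0, 3.0]))
print(probs)           # roughly [0.119, 0.881]
print(probs.argmax())  # predicted class index: 1
```

The predicted class is the one with the highest probability; the probabilities themselves say how confident the network is.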

A typical convolutional neural network architecture looks like this:
* Convolution
* Convolution
* Pool
* Dropout
* Flatten
* Dense
* Dropout
* Dense

This can take other forms, like:
* Convolution
* Dropout
* Convolution
* Pool
* Flatten
* Dense/Fully connected
* Dropout
* Dense/Fully connected (n_classes)

To build a deeper network, more convolutional layers can be added:
* Convolution (32 feature maps)
* Dropout
* Convolution (32 feature maps)
* Pool
* Convolution (64 feature maps)
* Dropout
* Convolution (64 feature maps)
* Pool
* Convolution (128 feature maps)
* Dropout
* Convolution (128 feature maps)
* Pool
* Flatten
* Dropout
* Dense/Fully connected (1024)
* Dropout
* Dense/Fully connected (512)
* Dropout
* Dense/Fully connected (n_classes)

In Keras, it takes some care to get the shapes of all the layers to line up. The comments in the neural network constructed below describe what happens at each step.
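One way to check that the layers line up before building anything in Keras is to trace the shapes by hand. The sketch below walks the deeper stack listed above on a 32×32, 6-channel input, assuming 3×3 convolutions with `padding='same'` (so width and height are unchanged) and 2×2 max pooling with stride 2. Those settings and the helper `trace_shapes` are assumptions for illustration, not taken from the models below. Dropout is left out because it does not change shapes.

```python
def trace_shapes(height, width, channels, layers):
    """Return (kind, height, width, channels) after each layer, plus the flattened size."""
    shapes = []
    for kind, arg in layers:
        if kind == "conv":    # 'same' padding: spatial size unchanged, channels = number of filters
            channels = arg
        elif kind == "pool":  # 2x2 pooling, stride 2: spatial size halved (floored)
            height, width = height // 2, width // 2
        shapes.append((kind, height, width, channels))
    return shapes, height * width * channels

deeper_stack = [("conv", 32), ("conv", 32), ("pool", None),
                ("conv", 64), ("conv", 64), ("pool", None),
                ("conv", 128), ("conv", 128), ("pool", None)]

shapes, flat_size = trace_shapes(32, 32, 6, deeper_stack)
for kind, h, w, c in shapes:
    print(f"{kind:5s} -> {h}x{w}x{c}")
print("Flatten ->", flat_size)  # 4 * 4 * 128 = 2048 inputs to the first Dense layer
```

Three pooling steps take 32×32 down to 4×4, so the first fully connected layer (1024 units in the list above) sees 2,048 inputs.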

In [77]:
print(np.shape(X_train_float))

(2200, 6, 32, 32)


In [122]:
modelA = Sequential()

# Convolutional input layer:
# 32 feature maps with a 3×3 filter and a rectifier (ReLU) activation.
# (maxnorm is imported above but no weight constraint is applied here.)
# The data is (6, 32, 32): 6 color channels (two RGB images), 32 x 32 pixels.
# Caution: with Keras's default channels_last format, this shape is read as
# a 6x32 image with 32 channels (see the summary output below).
modelA.add(Conv2D(32, (3, 3), input_shape=(6, 32, 32),
                  padding='valid', activation='relu'))

# Second convolutional layer:
# 32 feature maps with a 3×3 filter and a rectifier activation.
# (input_shape is only needed on the first layer.)
modelA.add(Conv2D(32, (3, 3), padding='valid', activation='relu'))

# Max Pool layer with size 2×2.
modelA.add(MaxPooling2D(pool_size=(2, 2)))

# Dropout set to 20%
modelA.add(Dropout(0.2))

# Flatten layer.
modelA.add(Flatten())

# Fully connected layer with 128 units and a rectifier activation function.
modelA.add(Dense(128, activation='relu'))

# Dropout set to 50%.
modelA.add(Dropout(0.5))

# Fully connected output layer with 2 units (Y/N) 
# and a softmax activation function.
modelA.add(Dense(2,activation='softmax'))

print(modelA.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_78 (Conv2D)           (None, 4, 30, 32)         9248      
_________________________________________________________________
conv2d_79 (Conv2D)           (None, 2, 28, 32)         9248      
_________________________________________________________________
max_pooling2d_38 (MaxPooling (None, 1, 14, 32)         0         
_________________________________________________________________
dropout_75 (Dropout)         (None, 1, 14, 32)         0         
_________________________________________________________________
flatten_38 (Flatten)         (None, 448)               0         
_________________________________________________________________
dense_74 (Dense)             (None, 128)               57472     
_________________________________________________________________
dropout_76 (Dropout)         (None, 128)               0         
__________

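The summary above deserves a closer look. Keras defaults to the `channels_last` data format (unless `~/.keras/keras.json` says otherwise), so `input_shape=(6, 32, 32)` is read as a 6×32 image with 32 channels rather than a 32×32 image with 6 channels. That matches the first output shape `(None, 4, 30, 32)` and the 9,248 parameters (3·3·32·32 weights plus 32 biases). The sketch below checks that arithmetic and shows one possible fix: transposing the data to `(samples, 32, 32, 6)`. Passing `data_format='channels_first'` to the Conv2D and MaxPooling2D layers is the alternative. This is our reading of the printed summary; we have not rerun the models with this change, so its effect on accuracy is untested. The helper `conv2d_params` is hypothetical.

```python
import numpy as np

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Conv2D parameter count: one weight per kernel cell per input channel per filter, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

# channels_last reading of input_shape=(6, 32, 32): 32 input channels.
print(conv2d_params(3, 3, 32, 32))  # 9248, the count in the summary
# Intended reading: 6 input channels (two stacked RGB images).
print(conv2d_params(3, 3, 6, 32))   # 1760

# Stand-in batch in the notebook's layout: (samples, channels, height, width).
X = np.zeros((4, 6, 32, 32), dtype="float32")
# Reorder to channels_last: (samples, height, width, channels).
X_last = np.transpose(X, (0, 2, 3, 1))
print(X_last.shape)  # (4, 32, 32, 6)
```

With the data in that layout (for example `np.transpose(X_train, (0, 2, 3, 1))`), the first layer would be declared as `Conv2D(32, (3, 3), input_shape=(32, 32, 6), padding='valid', activation='relu')`.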
In [129]:
# Compile model:
# categorical cross-entropy (logarithmic) loss with
# stochastic gradient descent, configured with a large momentum (0.9)
# and a learning-rate decay, starting from a learning rate of 0.1.
epochs = 12
lrate = 0.1
decay = lrate/epochs
batch_size = 32

sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
modelA.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['binary_accuracy'])
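Keras's `decay` argument to `SGD` is a learning-rate decay, not weight decay. In the Keras 2.x versions this notebook targets, the rate used at an update is `lr / (1 + decay * t)`, where t counts batch updates already performed, not epochs. The sketch below is an illustration of that formula (not Keras code), assuming 2,200 training samples and a batch size of 32, to show what `decay = lrate/epochs` works out to over 12 epochs.

```python
import math

def time_based_lr(initial_lr, decay, iteration):
    """Time-based decay: the learning rate after `iteration` batch updates."""
    return initial_lr / (1.0 + decay * iteration)

epochs, lrate, batch_size, n_train = 12, 0.1, 32, 2200
decay = lrate / epochs
steps_per_epoch = math.ceil(n_train / batch_size)  # 69 updates per epoch

for epoch in (0, 1, 6, 12):
    lr = time_based_lr(lrate, decay, epoch * steps_per_epoch)
    print(f"after {epoch:2d} epochs ({epoch * steps_per_epoch:3d} updates): lr = {lr:.4f}")
```

The rate falls to roughly an eighth of its starting value by the end of training. Model B's `decay = 0.1/120` is ten times smaller per update, so after 120 epochs it ends at the same rate.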

In [130]:
modelA.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(X_test, Y_test))

Train on 2200 samples, validate on 1000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x12672eac8>

In [151]:
# lower-case y_test holds integer labels; capital Y_test is one-hot encoded.
# evaluate() needs the one-hot Y_test!
score = modelA.evaluate(X_test, Y_test, verbose=0)

print("-"*40)
print("Model A (12 epochs):")
print('Test accuracy: {0:%}'.format(score[1]))

----------------------------------------
Model A (12 epochs):
Test accuracy: 50.200000%

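To put a test accuracy near 50% in context, compare it with a trivial baseline: always predicting the most common class. A minimal sketch, assuming integer 0/1 labels such as the notebook's `y_test`; the labels below are a made-up balanced example, not the LFW data, and the helper name is hypothetical.

```python
import numpy as np

def majority_baseline(labels):
    """Accuracy achieved by always predicting the most frequent label."""
    labels = np.asarray(labels).ravel()
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / labels.size

# Hypothetical balanced labels: 500 of each class.
example_labels = np.array([0] * 500 + [1] * 500)
print(f"majority-class baseline: {majority_baseline(example_labels):.1%}")
```

Running `majority_baseline(y_test)` on the real labels shows whether 50.2% beats chance at all; if the test pairs are balanced, it does not in any meaningful way.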

# What A Terrible Model

It is a bit of a disappointment to get such an awful result from our neural network, but remember, this was just our first pass. Some other things we can work on:
* Use normalized data
* Add additional convolutional layers to the neural network
* Train for more epochs
* Use larger neural network layers (more neurons per layer)

Let's run Model B, which is the same as Model A, but for 120 epochs:

In [147]:
modelB = Sequential()

# Convolutional input layer:
# 32 feature maps with a 3×3 filter and a rectifier (ReLU) activation.
# (maxnorm is imported above but no weight constraint is applied here.)
# The data is (6, 32, 32): 6 color channels (two RGB images), 32 x 32 pixels.
# Caution: with Keras's default channels_last format, this shape is read as
# a 6x32 image with 32 channels (see the summary output below).
modelB.add(Conv2D(32, (3, 3), input_shape=(6, 32, 32),
                  padding='valid', activation='relu'))

# Second convolutional layer:
# 32 feature maps with a 3×3 filter and a rectifier activation.
# (input_shape is only needed on the first layer.)
modelB.add(Conv2D(32, (3, 3), padding='valid', activation='relu'))

# Max Pool layer with size 2×2.
modelB.add(MaxPooling2D(pool_size=(2, 2)))

# Dropout set to 20%
modelB.add(Dropout(0.2))

# Flatten layer.
modelB.add(Flatten())

# Fully connected layer with 128 units and a rectifier activation function.
modelB.add(Dense(128, activation='relu'))

# Dropout set to 50%.
modelB.add(Dropout(0.5))

# Fully connected output layer with 2 units (Y/N) 
# and a softmax activation function.
modelB.add(Dense(2,activation='softmax'))

# Compile model:
# categorical cross-entropy (logarithmic) loss with
# stochastic gradient descent, configured with a large momentum (0.9)
# and a learning-rate decay, starting from a learning rate of 0.1.
epochs = 120
lrate = 0.1
decay = lrate/epochs
batch_size = 32

sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
modelB.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['binary_accuracy'])

print(modelB.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_82 (Conv2D)           (None, 4, 30, 32)         9248      
_________________________________________________________________
conv2d_83 (Conv2D)           (None, 2, 28, 32)         9248      
_________________________________________________________________
max_pooling2d_40 (MaxPooling (None, 1, 14, 32)         0         
_________________________________________________________________
dropout_79 (Dropout)         (None, 1, 14, 32)         0         
_________________________________________________________________
flatten_40 (Flatten)         (None, 448)               0         
_________________________________________________________________
dense_78 (Dense)             (None, 128)               57472     
_________________________________________________________________
dropout_80 (Dropout)         (None, 128)               0         
__________

In [153]:
modelB.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, verbose=0, validation_data=(X_test, Y_test))

<keras.callbacks.History at 0x12b26bb38>

In [154]:
# lower-case y_test holds integer labels; capital Y_test is one-hot encoded.
# evaluate() needs the one-hot Y_test!
score = modelB.evaluate(X_test, Y_test, verbose=0)

print("-"*40)
print("Model B (120 epochs):")
print('Test accuracy: {0:%}'.format(score[1]))

----------------------------------------
Model B (120 epochs):
Test accuracy: 51.000000%


Yikes! That is still a pretty bad result. In the next notebook we'll work on building out the neural network and try to improve on these results.