# Fuel Library for Data

Notebook #1 explored the data we are working with. This notebook uses the [Fuel library](https://github.com/mila-udem/fuel), which wraps data for machine learning pipelines, and the [lfw_fuel library](https://github.com/dribnet/lfw_fuel), which extends Fuel to the Labeled Faces in the Wild (LFW) dataset.

This lets us load the image data and split it into X and Y training/testing arrays in a single call, using any one of these variants:

```
(X_train, y_train), (X_test, y_test) = lfw.load_data()
(X_train, y_train), (X_test, y_test) = lfw.load_data("funneled")
(X_train, y_train), (X_test, y_test) = lfw.load_data("deepfunneled")
```

In [1]:
%matplotlib inline

In [2]:
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D, MaxPooling3D
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.utils import np_utils
from scipy.misc import imresize 

Using TensorFlow backend.


In [3]:
from lfw_fuel import lfw

In [4]:
batch_size = 128
nb_classes = 1
nb_epoch = 12
feature_width = 32
feature_height = 32

In [5]:
def cropImage(im):
    im2 = np.dstack(im).astype(np.uint8)
    # return the centered 128x128 crop of the original 250x250 image (about 26% of its area)
    newim = im2[61:189, 61:189]
    sized1 = imresize(newim[:,:,0:3], (feature_width, feature_height), interp="bicubic", mode="RGB")
    sized2 = imresize(newim[:,:,3:6], (feature_width, feature_height), interp="bicubic", mode="RGB")
    return np.asarray([sized1[:,:,0], sized1[:,:,1], sized1[:,:,2], sized2[:,:,0], sized2[:,:,1], sized2[:,:,2]])

a=0

In [6]:
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = lfw.load_data("deepfunneled")

In [7]:
# Crop features
X_train = np.asarray(list(map(cropImage, X_train)))
X_test = np.asarray(list(map(cropImage, X_test)))

In [8]:
# print shape of data
print("{1} train samples, {2} channel{0}, {3}x{4}".format("" if X_train.shape[1] == 1 else "s", *X_train.shape))
print("{1}  test samples, {2} channel{0}, {3}x{4}".format("" if X_test.shape[1] == 1 else "s", *X_test.shape))

2200 train samples, 6 channels, 32x32
1000  test samples, 6 channels, 32x32


In [9]:
# The labels form a single binary column, so they are used as-is.
# (to_categorical is only needed if nb_classes > 1)
Y_train = y_train #np_utils.to_categorical(y_train, nb_classes)
Y_test  = y_test  #np_utils.to_categorical(y_test, nb_classes)
print(Y_train.shape)

(2200, 1)


In [10]:
print("{0} train predictions, {1}-dimension".format(*Y_train.shape))
print("{0}  test predictions, {1}-dimension".format(*Y_test.shape))

2200 train predictions, 1-dimension
1000  test predictions, 1-dimension


In [11]:
# In general, it is a good idea to use normalized data.
# We compute scaled copies here: Model A below trains on the raw pixel values,
# and Model B uses the scaled ones.
X_train_float = X_train.astype('float32')
X_train_float /= 255

X_test_float = X_test.astype('float32')
X_test_float /= 255

# Convolutional Neural Network

Now that we've used the Fuel library to load our data, we're ready to train our first neural network. We will use a convolutional neural network, a type of network well suited to image recognition and classification tasks.

(The posts on [convolutional neural networks](http://machinelearningmastery.com/crash-course-convolutional-neural-networks/) and [object recognition](http://machinelearningmastery.com/object-recognition-convolutional-neural-networks-keras-deep-learning-library/) from Dr. Jason Brownlee are very helpful in understanding these concepts. The [book chapter](http://neuralnetworksanddeeplearning.com/chap6.html) from Michael Nielsen is also useful.)

Convolutional neural networks use three types of layers:
* Convolutional layers
* Pooling layers
* Fully connected layers

Convolutional layers are composed of filters and feature maps.
* Filters are the neurons of the layer, which have inputs, weights, and outputs. Each filter looks at a fixed-size patch of its input. For the input convolutional layer, these patches are pixels straight from the image. Deeper in the neural network, the patches are taken from the outputs of prior layers.
* A feature map is the output of one filter applied across the whole previous layer, and it becomes part of the input to the next layer.
* Padding is needed because a patch centered near the border would extend past the edge of its input, and because one layer's output size may not be cleanly divisible by the filter patch at the next layer. Zero padding can be used to keep the neural net from reading off the edge of the image.
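
The sliding patch and the zero padding described above can be sketched in plain Python. This is a minimal illustration and not how Keras implements `Conv2D`: `conv2d_same` is a hypothetical helper, it handles one channel and one filter, and, like most deep learning libraries, it slides the kernel without flipping it.

```python
def conv2d_same(image, kernel):
    """Apply one square, odd-sized kernel to a 2D image with zero padding.

    The output (one feature map) has the same height and width as the input.
    """
    height, width = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    # Positions outside the image count as zeros (zero padding).
                    if 0 <= r < height and 0 <= c < width:
                        total += image[r][c] * kernel[di][dj]
            out[i][j] = total
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
box_filter = [[1, 1, 1],
              [1, 1, 1],
              [1, 1, 1]]  # sums each 3x3 neighbourhood

feature_map = conv2d_same(image, box_filter)
for row in feature_map:
    print(row)
```

The center value sums all nine pixels, while each corner value sums only four real pixels because the rest of its patch falls in the zero padding.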

Pooling layers down-sample a given layer.
* Pooling acts as compression or dimensionality reduction, condensing the features learned in prior layers to the most important ones. Each pooling unit reads a patch that is often much smaller than the feature map of the convolutional layer it is connected to.
* Pooling layers create their own feature maps, often by taking the average or maximum over each patch.
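
As a rough sketch of the maximum variant, the hypothetical helper `max_pool2d` below splits a single feature map into non-overlapping square windows and keeps the largest value in each. It assumes the window size divides the map evenly and is only meant to illustrate the idea.

```python
def max_pool2d(feature_map, size):
    """Down-sample a 2D feature map by taking the max of each size x size window."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c]
                for r in range(i, i + size)
                for c in range(j, j + size))
            for j in range(0, width - size + 1, size)
        ]
        for i in range(0, height - size + 1, size)
    ]


feature_map = [[1, 3, 2, 0],
               [4, 6, 1, 1],
               [0, 2, 9, 5],
               [7, 1, 3, 8]]

pooled = max_pool2d(feature_map, 2)
print(pooled)  # [[6, 2], [7, 9]]
```

A 4x4 map becomes 2x2: each output value summarizes four inputs.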

Fully connected layers are used to combine the various extracted features.
* Fully connected layers create a non-linear combination of all incoming features.
* The activation function of the final fully connected layer is often a softmax or another non-linear function, such as a sigmoid.
* The outputs can be thought of as predicted probabilities for a particular class.
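
To make the "non-linear combination" concrete, here is a small sketch of two fully connected layers in plain Python: a ReLU hidden layer followed by a single sigmoid output. The helper names and all weights are made up for illustration; a real network learns its weights during training.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: every output unit sees every input."""
    return [
        sum(x * w for x, w in zip(inputs, unit_weights)) + b
        for unit_weights, b in zip(weights, biases)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


features = [0.5, -1.0, 2.0]  # made-up flattened features

# Hidden layer with 2 units (one weight list per unit).
hidden = relu(dense(features,
                    weights=[[1.0, 0.0, 0.5], [-1.0, 1.0, 0.0]],
                    biases=[0.0, 0.5]))

# Output layer with 1 unit, squashed to a probability between 0 and 1.
probability = sigmoid(dense(hidden, weights=[[1.0, -1.0]], biases=[0.0])[0])
print(hidden, probability)
```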

A typical architecture of a convolutional neural network is:
* Convolution
* Convolution
* Pool
* Dropout
* Flatten
* Dense
* Dropout
* Dense

This can take other forms, like:
* Convolution
* Dropout
* Convolution
* Pool
* Flatten
* Dense/Fully connected
* Dropout
* Dense/Fully connected (n_classes)

Optionally, with more convolutional layers:
* Convolution (32 feature maps)
* Dropout
* Convolution (32 feature maps)
* Pool
* Convolution (64 feature maps)
* Dropout
* Convolution (64 feature maps)
* Pool
* Convolution (128 feature maps)
* Dropout
* Convolution (128 feature maps)
* Pool
* Flatten
* Dropout
* Dense/Fully connected (1024)
* Dropout
* Dense/Fully connected (512)
* Dropout
* Dense/Fully connected (n_classes)

In Keras, it takes a little bit of effort to understand how to get all of the layers to line up. The comments in the neural network constructed below should explain what's happening at each step.
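
One way to check that layers line up before building anything in Keras is to trace the shapes by hand. The sketch below does that for the deeper stack listed above, using a hypothetical helper `trace_shapes`. It assumes `'same'` padding for every convolution (height and width unchanged), 2x2 non-overlapping pooling, and a 32 x 32 input with 6 channels; none of those choices are fixed by the list itself.

```python
def trace_shapes(height, width, channels, layers):
    """Follow (height, width, channels) through a stack of layers.

    Returns the shape after each layer and the size after flattening.
    """
    trace = []
    for kind, arg in layers:
        if kind == "conv":      # arg = number of filters (feature maps)
            channels = arg
        elif kind == "pool":    # arg = pool size
            height, width = height // arg, width // arg
        # "dropout" leaves the shape unchanged
        trace.append((kind, (height, width, channels)))
    return trace, height * width * channels


deep_stack = [
    ("conv", 32), ("dropout", None), ("conv", 32), ("pool", 2),
    ("conv", 64), ("dropout", None), ("conv", 64), ("pool", 2),
    ("conv", 128), ("dropout", None), ("conv", 128), ("pool", 2),
]

trace, flat_size = trace_shapes(32, 32, 6, deep_stack)
for kind, shape in trace:
    print(kind, shape)
print("flattened size:", flat_size)
```

Under these assumptions the flattened vector feeding the first dense layer has 4 x 4 x 128 = 2048 entries.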

In [12]:
print(np.shape(X_train_float))
#
# 2200 = number of training images
# 6 = 3 color channels x 2 images
# 32 x 32 = image w x h

(2200, 6, 32, 32)


In [13]:
modelA = Sequential()

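# Convolutional input layer:
# - 32 filters (feature maps), each with a 3 x 3 kernel
# - intended input: (6, 32, 32) = 2 images x 3 color channels, 32 x 32 pixels
# - NOTE: the summary below (9,248 parameters, output shape (None, 6, 32, 32))
#   indicates that Keras is using data_format='channels_last' here, so it reads
#   input_shape=(6, 32, 32) as height 6, width 32, and 32 channels:
#   3*3*32*32 weights + 32 biases = 9,248.
#   Passing data_format='channels_first' to Conv2D would treat the 6 stacked
#   color planes as the channels instead.
# - no max-norm weight constraint is applied in this call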
modelA.add(Conv2D(32, 
                  (3, 3), 
                  input_shape=(6, 32, 32),
                  padding='same', 
                  activation='relu'))

modelA.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 6, 32, 32)         9248      
Total params: 9,248
Trainable params: 9,248
Non-trainable params: 0
_________________________________________________________________
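
The parameter count in the summary above can be checked by hand: a `Conv2D` layer has one kernel per (input channel, filter) pair, plus one bias per filter. The sketch below uses a hypothetical helper, `conv2d_params`. Which axis Keras treats as channels is inferred here from the numbers, not stated by the notebook.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights (one kernel per input-channel/filter pair) plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


# Last axis of input_shape=(6, 32, 32) read as 32 channels (channels_last):
print(conv2d_params(3, 3, 32, 32))  # 9248, matching the summary

# First axis read as 6 channels (channels_first):
print(conv2d_params(3, 3, 6, 32))   # 1760
```

Since 9,248 matches the summary, the layer appears to treat the 32-pixel last axis as channels and to convolve over a 6 x 32 grid, which is probably not what was intended for 6 stacked color planes.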


In [14]:
# Second convolutional layer:
# 32 feature maps with a size of 3×3 and a rectifier (ReLU) activation function.
# No max-norm weight constraint is set in this call.
# (input_shape only matters on the first layer of a Sequential model.)
modelA.add(Conv2D(32, (3, 3), 
                  input_shape=(6, 32, 32),
                  padding='same', 
                  activation='relu'))

modelA.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 32, 32)         9248      
Total params: 18,496
Trainable params: 18,496
Non-trainable params: 0
_________________________________________________________________


In [15]:
# Max Pool layer with size 4×4 (channels_first: pool over the last two axes).
modelA.add(MaxPooling2D(pool_size=(4, 4),
                       data_format='channels_first'))

modelA.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 8, 8)           0         
Total params: 18,496
Trainable params: 18,496
Non-trainable params: 0
_________________________________________________________________
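
The pooled shape in the summary above follows from simple arithmetic. The sketch below uses a hypothetical helper, `pool_output_shape`, and assumes non-overlapping windows (stride equal to the pool size) with no padding, which is the Keras default for `MaxPooling2D` when `strides` and `padding` are not given.

```python
def pool_output_shape(shape, pool_size, data_format):
    """Per-sample output shape of non-overlapping pooling with no padding."""
    if data_format == "channels_first":
        channels, height, width = shape
    else:  # "channels_last"
        height, width, channels = shape
    out_h = (height - pool_size[0]) // pool_size[0] + 1
    out_w = (width - pool_size[1]) // pool_size[1] + 1
    if data_format == "channels_first":
        return (channels, out_h, out_w)
    return (out_h, out_w, channels)


# The pooling layer above: 4x4 windows over the last two axes.
print(pool_output_shape((6, 32, 32), (4, 4), "channels_first"))  # (6, 8, 8)

# The same input with 2x2 windows.
print(pool_output_shape((6, 32, 32), (2, 2), "channels_first"))  # (6, 16, 16)
```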


In [16]:
# Dropout set to 20%
modelA.add(Dropout(0.2))

modelA.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 8, 8)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 6, 8, 8)           0         
Total params: 18,496
Trainable params: 18,496
Non-trainable params: 0
_________________________________________________________________


In [17]:
# Flatten layer.
modelA.add(Flatten())

modelA.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 8, 8)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 6, 8, 8)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 384)               0         
Total params: 18,496
Trainable params: 18,496
Non-trainable params: 0
_________________________________________________________________


In [18]:
# Fully connected layer with 128 units and a rectifier activation function.
modelA.add(Dense(128, activation='relu'))

modelA.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 8, 8)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 6, 8, 8)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 384)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               49280     
Total params: 67,776
Trainable params: 67,776
Non-trainable params: 0
_________________________________________________________________
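
The flatten and dense rows of the summary can be verified the same way: flattening multiplies the dimensions together, and a dense layer has one weight per (input, unit) pair plus one bias per unit. The helper names below are hypothetical.

```python
def flatten_size(shape):
    """Number of values in one sample after flattening."""
    size = 1
    for dim in shape:
        size *= dim
    return size


def dense_params(inputs, units):
    """One weight per input/unit pair, plus one bias per unit."""
    return inputs * units + units


flat = flatten_size((6, 8, 8))
print(flat)                     # 384
print(dense_params(flat, 128))  # 49280
```

Most of the model's parameters sit in this one dense layer, so the amount of pooling before the flatten step largely controls the model size.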


In [19]:
# Dropout set to 50%.
modelA.add(Dropout(0.5))

modelA.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 8, 8)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 6, 8, 8)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 384)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               49280     
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
Total para

In [20]:
# Fully connected output layer with 1 unit and a sigmoid activation function:
# a single probability for the binary same/different label.
modelA.add(Dense(1, activation='sigmoid'))

print(modelA.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 8, 8)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 6, 8, 8)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 384)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               49280     
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
__________

In [21]:
# All at once:
modelA = Sequential()

modelA.add(Conv2D(32, 
                  (3, 3), 
                  input_shape=(6, 32, 32),
                  padding='same', 
                  activation='relu'))

modelA.add(Conv2D(32, (3, 3), 
                  input_shape=(6, 32, 32),
                  padding='same', 
                  activation='relu'))

modelA.add(MaxPooling2D(pool_size=(4, 4),
                       data_format='channels_first'))

modelA.add(Dropout(0.2))
modelA.add(Flatten())
modelA.add(Dense(128, activation='relu'))
modelA.add(Dropout(0.5))
modelA.add(Dense(1, activation='sigmoid'))

In [23]:
# Compile model:
# A logarithmic loss function (binary cross-entropy) is used with
# the RMSprop optimizer.
# lrate and decay are defined below but never passed to the optimizer,
# so RMSprop runs with its default settings.
epochs = 12
lrate = 0.1
decay = lrate/epochs
batch_size = 128

modelA.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['binary_accuracy'])
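
For reference, the logarithmic loss named in the compile step can be written out in a few lines. This is a plain-Python sketch of the standard binary cross-entropy formula, not the Keras implementation; the helper name, the clipping constant `eps`, and the example labels are illustrative choices.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean logarithmic loss for 0/1 labels and predicted probabilities."""
    total = 0.0
    for target, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
        total -= target * math.log(prob) + (1 - target) * math.log(1.0 - prob)
    return total / len(y_true)


labels = [1, 0, 1, 0]
confident_loss = binary_crossentropy(labels, [0.9, 0.1, 0.8, 0.2])
coin_flip_loss = binary_crossentropy(labels, [0.5, 0.5, 0.5, 0.5])
print(confident_loss)
print(coin_flip_loss)  # ln(2), about 0.693
```

A model that always outputs 0.5 scores ln(2), roughly 0.693, so a training loss that hovers near that value is a sign that the network has not learned anything useful.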

In [24]:
print(np.shape(X_test))
print(np.shape(Y_test))
print(np.shape(X_train))
print(np.shape(Y_train))

(1000, 6, 32, 32)
(1000, 1)
(2200, 6, 32, 32)
(2200, 1)


Since we are trying to predict "Are these faces the same?", the targets (Y_train and Y_test) should be column vectors with one binary label per image pair. That is, they should have shapes `(2200, 1)` and `(1000, 1)`, which matches the output above.

In [25]:
modelA.fit(X_train, Y_train, 
           batch_size=batch_size, 
           epochs=epochs, 
           verbose=1, 
           validation_data=(X_test, Y_test))

Train on 2200 samples, validate on 1000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x11dfc2358>

In [26]:
score = modelA.evaluate(X_test, Y_test, verbose=0)

print("-"*40)
print("Model A (12 epochs):")
print('Test accuracy: {0:%}'.format(score[1]))

----------------------------------------
Model A (12 epochs):
Test accuracy: 49.600000%
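
To put this number in context, it helps to compare it with the accuracy of always predicting the most common label. The sketch below uses made-up labels and assumes, without checking, that the test set is evenly split between matching and non-matching pairs. In the notebook, the list built from `y_test` would be passed instead.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Hypothetical labels: 500 matching pairs and 500 non-matching pairs.
hypothetical_labels = [1] * 500 + [0] * 500
baseline = majority_baseline_accuracy(hypothetical_labels)
print(baseline)  # 0.5
```

If the real labels are balanced like this, a test accuracy near 50% means the model is doing no better than a constant guess.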


# What A Terrible Model

It is a bit of a disappointment to get such an awful result from our neural network, but remember, this was just our first pass. Some other things we can work on:
* Use normalized data
* Add additional convolutional layers to the neural network
* Train for more epochs
* Use larger neural network layers (more neurons per layer)

Let's run Model B, which implements the following changes:
* Less pooling
* Normalized data
* 40 epochs instead of 12

In [31]:
modelB = Sequential()

# Convolutional input layer
modelB.add(Conv2D(32, 
                  (3, 3), 
                  input_shape=(6, 32, 32),
                  padding='same', 
                  activation='relu'))


# Convolutional layer
modelB.add(Conv2D(32, (3, 3), 
                  input_shape=(6, 32, 32),
                  padding='same', 
                  activation='relu'))

# Max Pool layer with size 2×2.
modelB.add(MaxPooling2D(pool_size=(2,2),
                       data_format='channels_first'))


# Dropout set to 20%
modelB.add(Dropout(0.2))

# Flatten layer.
modelB.add(Flatten())

# Fully connected layer with 128 units and a rectifier activation function.
modelB.add(Dense(128, activation='relu'))

# Dropout set to 50%.
modelB.add(Dropout(0.5))

# Fully connected output layer with 1 unit and a sigmoid activation function:
# a single probability for the binary same/different label.
modelB.add(Dense(1,activation='sigmoid'))

# Compile model:
# A logarithmic loss function (binary cross-entropy) is used with
# the RMSprop optimizer.
# epochs, lrate, and decay below are not passed to compile() or fit();
# the fit() call that follows hard-codes epochs=40.
epochs = 120
lrate = 0.4
decay = lrate/epochs
batch_size = 32

#sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
modelB.compile(loss='binary_crossentropy', 
               optimizer='rmsprop', 
               metrics=['binary_accuracy'])

print(modelB.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_9 (Conv2D)            (None, 6, 32, 32)         9248      
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 6, 32, 32)         9248      
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 6, 16, 16)         0         
_________________________________________________________________
dropout_9 (Dropout)          (None, 6, 16, 16)         0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 1536)              0         
_________________________________________________________________
dense_9 (Dense)              (None, 128)               196736    
_________________________________________________________________
dropout_10 (Dropout)         (None, 128)               0         
__________

In [32]:
modelB.fit(X_train_float, 
           Y_train, 
           batch_size=batch_size, 
           epochs=40, 
           verbose=1, 
           validation_data=(X_test_float, Y_test))

Train on 2200 samples, validate on 1000 samples
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


<keras.callbacks.History at 0x12105cba8>

In [33]:
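# Y_test is the same array as y_test (no one-hot encoding was applied above).
# NOTE: modelB was trained on X_train_float (scaled to [0, 1]), but this call
# evaluates on the unscaled X_test; X_test_float would match the training inputs.
# The label printed below says 120 epochs, although fit() above ran 40.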
score = modelB.evaluate(X_test, Y_test, verbose=0)

print("-"*40)
print("Model B (120 epochs):")
print('Test accuracy: {0:%}'.format(score[1]))

----------------------------------------
Model B (120 epochs):
Test accuracy: 49.100000%


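So our accuracy was 48.6% (bad) and our validation accuracy was 50% (stinks). Both are about what random guessing would score on a yes/no question, so the normalized inputs and extra epochs did not help.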