# Fuel Library for Data

Notebook #1 explored the data. This notebook uses the [Fuel library](https://github.com/mila-udem/fuel), which wraps datasets for machine learning pipelines, and the [lfw_fuel library](https://github.com/dribnet/lfw_fuel), which extends Fuel to the LFW (Labeled Faces in the Wild) dataset.

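This lets us load the image data and split it into training and test arrays (`X`) with their labels (`y`) in a single call. The argument selects which version of the LFW images to load, so each line below is an alternative rather than a sequence of steps: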

```
(X_train, y_train), (X_test, y_test) = lfw.load_data()
(X_train, y_train), (X_test, y_test) = lfw.load_data("funneled")
(X_train, y_train), (X_test, y_test) = lfw.load_data("deepfunneled")
```

In [102]:
%matplotlib inline

In [122]:
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
from keras.utils import np_utils
# NOTE: imresize was deprecated in SciPy 1.0 and removed in SciPy 1.3,
# so this import needs an older SciPy release.
from scipy.misc import imresize

from lfw_fuel import lfw

In [124]:
batch_size = 128
nb_classes = 2
nb_epoch = 12
feature_width = 32
feature_height = 32

In [125]:
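def cropImage(im):
    # im holds 6 channel planes (two RGB images); stack them to height x width x 6
    im2 = np.dstack(im).astype(np.uint8)
    # keep the centered 128x128 window of the original 250x250 (about 26% of the area)
    newim = im2[61:189, 61:189]
    # resize each RGB image separately; imresize expects the size as (height, width)
    sized1 = imresize(newim[:,:,0:3], (feature_height, feature_width), interp="bicubic", mode="RGB")
    sized2 = imresize(newim[:,:,3:6], (feature_height, feature_width), interp="bicubic", mode="RGB")
    # return the six planes channels-first: 6 x height x width
    return np.asarray([sized1[:,:,0], sized1[:,:,1], sized1[:,:,2], sized2[:,:,0], sized2[:,:,1], sized2[:,:,2]])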
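
Because `scipy.misc.imresize` is no longer available in recent SciPy releases, the following is a rough NumPy-only sketch of the same crop-and-shrink step. It is not the notebook's method: it swaps bicubic interpolation for plain block averaging, so pixel values will differ slightly. The function name `crop_and_downsample` and the synthetic input are made up for illustration.

```python
import numpy as np

def crop_and_downsample(pair, crop=128, out=32):
    """Center-crop each channel plane and shrink it by block averaging.

    `pair` is a channels-first array of shape (6, H, W). Block averaging is a
    stand-in for bicubic resizing and requires `crop` to be a multiple of `out`.
    """
    pair = np.asarray(pair)
    channels, height, width = pair.shape
    top = (height - crop) // 2    # 61 for a 250-pixel side, as in cropImage
    left = (width - crop) // 2
    window = pair[:, top:top + crop, left:left + crop].astype(np.float32)
    factor = crop // out          # 128 // 32 = 4: each output pixel averages a 4x4 block
    blocks = window.reshape(channels, out, factor, out, factor)
    return blocks.mean(axis=(2, 4)).round().astype(np.uint8)

# synthetic stand-in for one LFW pair: 6 channel planes of 250x250 pixels
fake_pair = np.random.RandomState(0).randint(0, 256, size=(6, 250, 250))
small = crop_and_downsample(fake_pair)
print(small.shape, small.dtype)
```

The output keeps the channels-first layout that the rest of the notebook expects, so it could stand in for `cropImage` inside the `map` calls, at the cost of slightly different pixel values.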


In [126]:
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = lfw.load_data("deepfunneled")


In [127]:
# Crop features
X_train = np.asarray(list(map(cropImage, X_train)))
X_test = np.asarray(list(map(cropImage, X_test)))


In [128]:
# print the shape of the data before building the model
print("{1} train samples, {2} channel{0}, {3}x{4}".format("" if X_train.shape[1] == 1 else "s", *X_train.shape))
print("{1}  test samples, {2} channel{0}, {3}x{4}".format("" if X_test.shape[1] == 1 else "s", *X_test.shape))

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

2200 train samples, 6 channels, 32x32
1000  test samples, 6 channels, 32x32
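
To make the label conversion concrete, here is a small NumPy sketch of what a one-hot ("binary class matrix") encoding looks like. The helper `one_hot` and the label values are made up for illustration; the notebook itself relies on `np_utils.to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows of a one-hot matrix."""
    labels = np.asarray(labels, dtype=int).ravel()
    # row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

y = np.array([0, 1, 1, 0])   # made-up labels, not LFW data
Y = one_hot(y, 2)
print(Y)
```

Each row has a single 1 in the column of its class, which is the target format that a two-unit softmax output trained with `categorical_crossentropy` expects.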


In [129]:
# Convolutional architecture
# conv, conv, pool, dropout, flatten, dense, dropout, dense

In [130]:
model = Sequential()

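# input_shape=(6,32,32) is channels-first; this assumes Keras's
# image_data_format setting is 'channels_first'
model.add(Conv2D(32, (3,3), input_shape=(6,32,32), padding='valid'))
model.add(Activation('relu'))
# input_shape is only needed on the first layer
model.add(Conv2D(32, (3,3), padding='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))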
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(nb_classes))
model.add(Activation('softmax'))
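
As a sanity check on the architecture, the sketch below traces the spatial size through the layers by hand. It assumes the input really is treated as channels-first (6 channels of 32x32) with stride 1; the helper names are made up for illustration and nothing here calls Keras.

```python
def conv_valid(size, kernel):
    """Output length of a stride-1 'valid' convolution along one axis."""
    return size - kernel + 1

def pool(size, window):
    """Output length of non-overlapping max pooling along one axis."""
    return size // window

side = 32
side = conv_valid(side, 3)   # first Conv2D
side = conv_valid(side, 3)   # second Conv2D
side = pool(side, 2)         # MaxPooling2D
filters = 32
flat = filters * side * side         # length of the vector Flatten produces
dense_params = flat * 128 + 128      # weights plus biases of Dense(128)
print(side, flat, dense_params)
```

Under these assumptions the feature maps shrink from 32 to 30 to 28 and then to 14 per side, so `Flatten` yields 6272 values and the first dense layer holds most of the model's parameters. Comparing these numbers with `model.summary()` is a quick way to confirm which data format Keras is actually using.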

In [131]:
model.compile(loss='categorical_crossentropy', metrics=['binary_accuracy'], optimizer='adadelta')

In [132]:
model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch, verbose=1, validation_data=(X_test, Y_test))

Train on 2200 samples, validate on 1000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x126e93da0>

In [133]:
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.694366986752
Test accuracy: 0.49
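
One way to read these numbers: the test score is the mean categorical cross-entropy, and a two-class model that always outputs probability 0.5 would score ln 2. The short calculation below compares that baseline with the reported score; it is only arithmetic on the value printed above.

```python
import math

# cross-entropy of a two-class model that assigns probability 0.5 to the true class
chance_loss = -math.log(0.5)
print(chance_loss)   # about 0.6931, i.e. ln 2

reported_loss = 0.694366986752   # "Test score" printed above
print(reported_loss - chance_loss)
```

The reported score is within about 0.001 of the baseline, and 49% accuracy is what guessing would give if the two classes are roughly balanced, so this run does not appear to have learned anything useful from the 32x32 crops.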
