In [1]:
%run ../convention.ipynb



# Pretraining on an auxiliary task

In this exercise you will build a DNN that compares two MNIST digit images and predicts whether they represent the same digit or not. Then you will reuse the lower layers of this network to train an MNIST classifier using very little training data.

<p class = 'note'>Exercise: Start by building two DNNs (let's call them DNN A and B), both similar to the one you built earlier but without the output layer: each DNN should have five hidden layers of 100 neurons each, He initialization, and ELU activation. Next, add one more hidden layer with 10 units on top of both DNNs. You should use the keras.layers.concatenate() function to concatenate the outputs of both DNNs, then feed the result to the hidden layer. Finally, add an output layer with a single neuron using the logistic activation function.
</p>
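Before building this in Keras, it can help to trace the tensor shapes through the architecture. The following is a minimal NumPy sketch, not the Keras model itself: the helper names `elu`, `dense`, `branch` and `sigmoid` are made up for illustration, the weights are random and untrained, and a plain normal distribution with standard deviation sqrt(2 / fan_in) stands in for He initialization in every layer.

```python
import numpy as np

rng = np.random.default_rng(42)

def elu(x, alpha=1.0):
    # ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, units, activation):
    # Fully connected layer with freshly drawn He-style weights (assumption:
    # plain normal, std = sqrt(2 / fan_in)) and zero biases
    fan_in = x.shape[1]
    w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, units))
    b = np.zeros(units)
    return activation(x @ w + b)

def branch(images):
    # Flatten 28x28 images, then apply five hidden layers of 100 units
    x = images.reshape(len(images), -1)
    for _ in range(5):
        x = dense(x, 100, elu)
    return x

batch_a = rng.random((4, 28, 28))  # stand-in for the images fed to DNN A
batch_b = rng.random((4, 28, 28))  # stand-in for the images fed to DNN B

merged = np.concatenate([branch(batch_a), branch(batch_b)], axis=1)
hidden = dense(merged, 10, elu)
prob = dense(hidden, 1, sigmoid)

print(merged.shape, hidden.shape, prob.shape)
```

Each branch produces 100 features per image, so the concatenated tensor has 200 columns before it reaches the 10-unit hidden layer and the single logistic output.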

In [32]:
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Flatten, Input, concatenate

In [49]:
def generate_model(input_layer):
    # One branch: flatten the image, then five hidden layers of 100 ELU units
    layer = Flatten()(input_layer)
    for i in range(5):
        layer = Dense(100, activation='elu', kernel_initializer='he_normal')(layer)
    return layer

# Two independent branches (DNN A and DNN B), one per image of the pair
input_1 = Input(shape=(28, 28))
input_2 = Input(shape=(28, 28))
DNN1 = generate_model(input_1)
DNN2 = generate_model(input_2)

# Merge both branches, add the extra 10-unit hidden layer and the logistic output
concat = concatenate([DNN1, DNN2])
hidden = Dense(10, activation='elu', kernel_initializer='he_normal')(concat)
output = Dense(1, activation='sigmoid')(hidden)
model = Model(inputs=[input_1, input_2], outputs=output)
model.summary()

Model: "model_4"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_9 (InputLayer)            (None, 28, 28)       0                                            
__________________________________________________________________________________________________
input_10 (InputLayer)           (None, 28, 28)       0                                            
__________________________________________________________________________________________________
flatten_9 (Flatten)             (None, 784)          0           input_9[0][0]                    
__________________________________________________________________________________________________
flatten_10 (Flatten)            (None, 784)          0           input_10[0][0]                   
____________________________________________________________________________________________
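The summary output above is cut off before the Dense layers, so the parameter counts can be worked out by hand instead. This is simple arithmetic on the architecture described in the exercise (weights plus biases per fully connected layer); `dense_params` is a helper name introduced here for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# One branch: 784 -> 100, followed by four 100 -> 100 layers
branch_params = dense_params(28 * 28, 100) + 4 * dense_params(100, 100)
# Hidden layer on top of the concatenated 200 features
hidden_params = dense_params(2 * 100, 10)
# Single logistic output neuron
output_params = dense_params(10, 1)

total_params = 2 * branch_params + hidden_params + output_params
print(branch_params, hidden_params, output_params, total_params)
# 118900 2010 11 239821
```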

<p class = 'note'>
Exercise: split the MNIST training set into two sets: split #1 should contain 55,000 images, and split #2 should contain 5,000 images. Create a function that generates a training batch where each instance is a pair of MNIST images picked from split #1. Half of the training instances should be pairs of images that belong to the same class, while the other half should be pairs of images from different classes. For each pair, the training label should be 0 if the images are from the same class, or 1 if they are from different classes.
</p>

In [4]:
from keras.datasets.mnist import load_data
(X_train, y_train), (X_test, y_test) = load_data()

In [20]:
# Split #1: the last 55,000 images; split #2: the first 5,000 images
X_split_1, X_split_2 = X_train[5000:], X_train[:5000]
y_split_1, y_split_2 = y_train[5000:], y_train[:5000]
X_split_1.dtype

dtype('uint8')
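The dtype above shows that the images hold raw `uint8` pixel values in the range 0 to 255. The cells in this notebook feed them to the network unchanged. Dense networks often train more stably on inputs scaled to a small range, so one option is to rescale first. A minimal sketch, assuming you choose to do so; `scale_pixels` is a hypothetical helper, not part of the original solution:

```python
import numpy as np

def scale_pixels(images):
    """Convert uint8 pixels in [0, 255] to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0

demo = np.array([[0, 128, 255]], dtype=np.uint8)  # tiny stand-in for an image
scaled = scale_pixels(demo)
print(scaled.dtype, scaled.min(), scaled.max())
```

If you apply this to the training pairs, apply the same scaling to any data used later for validation or testing.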

In [39]:
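import numpy as np  # harmless if convention.ipynb already imported it

data = []
for num in range(10):
    is_num = X_split_1[y_split_1 == num]    # images of digit `num`
    not_num = X_split_1[y_split_1 != num]   # images of every other digit
    for _ in range(500):
        # Three images of the same digit and one image of a different digit
        i, j, p = is_num[np.random.choice(len(is_num), size=3)]
        q = not_num[np.random.choice(len(not_num))]
        data.append([i, j, 0])  # same class -> label 0, as the exercise specifies
        data.append([p, q, 1])  # different classes -> label 1
np.random.shuffle(data)
X1 = np.array([pair[0] for pair in data])
X2 = np.array([pair[1] for pair in data])
y = np.array([pair[2] for pair in data])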
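The exercise asks for a function that generates a training batch, whereas the cell above builds one fixed set of 10,000 pairs. As a rough sketch of the function form, the following uses the label convention stated in the exercise (0 for same class, 1 for different classes). The name `generate_batch`, its parameters, and the synthetic demo data are assumptions made for illustration.

```python
import numpy as np

def generate_batch(images, labels, batch_size, rng=None):
    """Return ((X1, X2), y): half same-class pairs (y=0), half different (y=1)."""
    rng = np.random.default_rng() if rng is None else rng
    n_same = batch_size // 2
    n_diff = batch_size - n_same

    # Image indices grouped by class (needs at least two distinct classes)
    by_class = {int(c): np.flatnonzero(labels == c) for c in np.unique(labels)}
    classes = np.array(list(by_class))

    first, second = [], []
    for _ in range(n_same):
        c = int(rng.choice(classes))
        i, j = rng.choice(by_class[c], size=2)  # the same image may repeat
        first.append(i)
        second.append(j)
    for _ in range(n_diff):
        c1, c2 = rng.choice(classes, size=2, replace=False)  # two distinct classes
        first.append(rng.choice(by_class[int(c1)]))
        second.append(rng.choice(by_class[int(c2)]))

    y = np.concatenate([np.zeros(n_same), np.ones(n_diff)]).astype(np.float32)
    order = rng.permutation(batch_size)  # mix same and different pairs
    first = np.array(first)[order]
    second = np.array(second)[order]
    return (images[first], images[second]), y[order]

# Synthetic stand-in for split #1: every pixel of image k equals its label
demo_labels = np.arange(100) % 10
demo_images = np.stack([np.full((28, 28), c, dtype=np.uint8) for c in demo_labels])

(batch_1, batch_2), batch_y = generate_batch(
    demo_images, demo_labels, batch_size=32, rng=np.random.default_rng(0))
print(batch_1.shape, batch_2.shape, batch_y.shape)
```

With the real data, the call would take `X_split_1` and `y_split_1` in place of the synthetic arrays, and a fresh batch could be drawn at every training step.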
        

<p class = 'note'>
Exercise: train the DNN on this training set. For each image pair, you can simultaneously feed the first image to DNN A and the second image to DNN B. The whole network will gradually learn to tell whether two images belong to the same class or not.
</p>

In [50]:
from keras.callbacks import ModelCheckpoint, EarlyStopping
checkpoint_cb = ModelCheckpoint('mnist_binary.h5', save_best_only=True)
early_cb = EarlyStopping(patience=10, restore_best_weights=True)

# Keyword arguments keep the call unambiguous across Keras versions
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

In [51]:
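# Both callbacks monitor val_loss by default, so hold out 10% of the pairs for validation
history = model.fit([X1, X2], y, validation_split=0.1, callbacks=[checkpoint_cb, early_cb], epochs=30)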

Epoch 1/30
...
Epoch 30/30


In [17]:
np.bincount(y)

array([5000, 5000], dtype=int64)