**TensorFlow Play**

Getting familiar with TensorFlow by building a simple classifier and then reusing its pretrained layers for a second task.

In [1]:
import tensorflow as tf
from tensorflow import keras

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import sys
sys.path.append("../src")
import utils as u

**Getting the Data**

In [2]:
# Load the Fashion MNIST splits through the project helper in ../src/utils.py
X_train, X_test, X_valid, y_train, y_test, y_valid = u.get_fashion_mnist_data()

In [3]:
# Boolean masks selecting the samples whose label is in 0-7 (task A)
booleans_train = np.isin(y_train, np.arange(8))
booleans_test = np.isin(y_test, np.arange(8))
booleans_valid = np.isin(y_valid, np.arange(8))

In [4]:
xtrain_a = X_train[booleans_train]
xtest_a = X_test[booleans_test]
xvalid_a = X_valid[booleans_valid]

ytrain_a = y_train[booleans_train]
ytest_a = y_test[booleans_test]
yvalid_a = y_valid[booleans_valid]
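
To make the filtering step concrete, here is a minimal sketch on made-up arrays. The toy shapes and labels are assumptions for illustration only, not Fashion MNIST. It shows that indexing the images and the labels with the same boolean mask keeps them aligned.

```python
import numpy as np

# Toy stand-ins for the real arrays: 6 "images" of shape 2x2 and their labels.
X = np.arange(6 * 2 * 2).reshape(6, 2, 2)
y = np.array([0, 8, 3, 9, 7, 8])

# Boolean mask that is True where the label belongs to task A (classes 0-7).
mask_a = np.isin(y, np.arange(8))

# Indexing both arrays with the same mask keeps images and labels aligned.
X_a, y_a = X[mask_a], y[mask_a]

print(mask_a.tolist())  # [True, False, True, False, True, False]
print(y_a.tolist())     # [0, 3, 7]
print(X_a.shape)        # (3, 2, 2)
```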

**Showcase Pretrained Models**

We need to train a model that distinguishes labels `8` and `9`, but we only have a small number of labelled examples (200). Someone else has already trained a model on a similar task, namely classifying labels `0-7`. Since some of its components may be reusable, we are going to build on that model.


**Model A**

First, we train an initial model that classifies only labels `0-7`. The idea is to reuse its layers afterwards to train a classifier for labels `8` and `9`.


In [6]:
# Start with a simple sequential model
model = keras.models.Sequential()

# Flatten the input
model.add(keras.layers.Flatten(input_shape=[28, 28]))

# Add dense layers
model.add(keras.layers.Dense(300, activation="relu"))

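# Add the final layer (10 output units, although task A only uses labels 0-7)
model.add(keras.layers.Dense(10, activation="softmax"))

# Add optimizer & compile model
# (`learning_rate` replaces the deprecated `lr` argument)
optimizer = keras.optimizers.SGD(learning_rate=0.3)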
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=optimizer,
              metrics=["accuracy"])

checkpoint_cb = keras.callbacks.ModelCheckpoint("../models/classification_reusing_pretrained_layers_a.h5",
                                                save_best_only=True)

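# Note: this monitors training accuracy; monitor='val_accuracy' would stop
# on validation performance instead
early_stopping_cb = keras.callbacks.EarlyStopping(patience=10,
                                                  restore_best_weights=True,
                                                  monitor='accuracy')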

tensorboard_cb = keras.callbacks.TensorBoard(u.get_run_logdir())

In [7]:
%%time
model_train = model.fit(xtrain_a, ytrain_a, epochs=30, validation_data=(xvalid_a, yvalid_a),
                        callbacks=[checkpoint_cb,early_stopping_cb,tensorboard_cb])

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
Wall time: 1min 7s


**Model B - Try 1**

In [8]:
booleans_train_b_8 = [True if x == 8 else False for x in y_train]
booleans_test_b_8 = [True if x == 8 else False for x in y_test]
booleans_valid_b_8 = [True if x == 8 else False for x in y_valid]

booleans_train_b_9 = [True if x == 9 else False for x in y_train]
booleans_test_b_9 = [True if x == 9 else False for x in y_test]
booleans_valid_b_9 = [True if x == 9 else False for x in y_valid]

In [9]:
xtrain_b_8 = X_train[booleans_train_b_8]
xtest_b_8 = X_test[booleans_test_b_8]
xvalid_b_8 = X_valid[booleans_valid_b_8]

ytrain_b_8 = y_train[booleans_train_b_8]
ytest_b_8 = y_test[booleans_test_b_8]
yvalid_b_8 = y_valid[booleans_valid_b_8]

In [10]:
xtrain_b_9 = X_train[booleans_train_b_9]
xtest_b_9 = X_test[booleans_test_b_9]
xvalid_b_9 = X_valid[booleans_valid_b_9]

ytrain_b_9 = y_train[booleans_train_b_9]
ytest_b_9 = y_test[booleans_test_b_9]
yvalid_b_9 = y_valid[booleans_valid_b_9]

In [11]:
xtrain_b = np.concatenate([xtrain_b_8[0:100],xtrain_b_9[0:100]])
xtest_b = np.concatenate([xtest_b_8,xtest_b_9])
xvalid_b = np.concatenate([xvalid_b_8[0:100],xvalid_b_9[0:100]])

ytrain_b = np.concatenate([ytrain_b_8[0:100],ytrain_b_9[0:100]])
ytest_b = np.concatenate([ytest_b_8,ytest_b_9])
yvalid_b = np.concatenate([yvalid_b_8[0:100],yvalid_b_9[0:100]])

In [12]:
ytrain_b = np.array([1 if x == 8 else 0 for x in ytrain_b])
ytest_b = np.array([1 if x == 8 else 0 for x in ytest_b])
yvalid_b = np.array([1 if x == 8 else 0 for x in yvalid_b])
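
The cells above build a small, balanced task-B set step by step: select classes `8` and `9`, keep the first 100 samples of each, and relabel `8` as `1` and `9` as `0`. The same idea can be sketched as a single function. The helper name `make_binary_subset`, its parameters, and the toy data are hypothetical and only serve as an illustration.

```python
import numpy as np

def make_binary_subset(X, y, positive=8, negative=9, n_per_class=None):
    """Hypothetical helper: keep two classes and relabel them as 1 / 0."""
    parts_X, parts_y = [], []
    for label, target in ((positive, 1), (negative, 0)):
        X_cls = X[y == label]                 # all samples of this class
        if n_per_class is not None:
            X_cls = X_cls[:n_per_class]       # keep only the first n samples
        parts_X.append(X_cls)
        parts_y.append(np.full(len(X_cls), target, dtype=np.int64))
    return np.concatenate(parts_X), np.concatenate(parts_y)

# Toy data (an assumption): 5 random "images" for each of 10 classes.
rng = np.random.default_rng(0)
y_toy = np.repeat(np.arange(10), 5)
X_toy = rng.random((len(y_toy), 28, 28))

X_b, y_b = make_binary_subset(X_toy, y_toy, n_per_class=3)
print(X_b.shape)     # (6, 28, 28)
print(y_b.tolist())  # [1, 1, 1, 0, 0, 0]
```

As in the notebook cells, the samples of class `8` come first and are not shuffled here; `fit()` shuffles the training data by default.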

In [13]:
# Start with a simple sequential model
model = keras.models.Sequential()

# Flatten the input
model.add(keras.layers.Flatten(input_shape=[28, 28]))

# Add dense layers
model.add(keras.layers.Dense(300, activation="relu"))

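# Add the final layer (2 output units: label 8 vs. label 9)
model.add(keras.layers.Dense(2, activation="softmax"))

# Add optimizer & compile model
# (`learning_rate` replaces the deprecated `lr` argument)
optimizer = keras.optimizers.SGD(learning_rate=0.3)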
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=optimizer,
              metrics=["accuracy"])

checkpoint_cb = keras.callbacks.ModelCheckpoint("../models/classification_reusing_pretrained_layers_b_a.h5",
                                                save_best_only=True)

early_stopping_cb = keras.callbacks.EarlyStopping(patience=10,
                                                  restore_best_weights=True,
                                                  monitor='accuracy')

tensorboard_cb = keras.callbacks.TensorBoard(u.get_run_logdir())

In [14]:
%%time
model_train = model.fit(xtrain_b, ytrain_b, epochs=30, validation_data=(xvalid_b, yvalid_b),
                        callbacks=[checkpoint_cb,early_stopping_cb,tensorboard_cb])

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Wall time: 2.96 s


In [15]:
model.evaluate(xtest_b, ytest_b)



[3.5936391353607178, 0.5]

**Applying Transfer Learning**

When you train `model_B_on_A` (model B built on top of model A's layers), training also modifies `model_A`, because the two models share layers. To avoid that, clone `model_A` before you reuse its layers: clone its architecture with `clone_model()`, then copy its weights, since `clone_model()` does not copy them.

In [17]:
model_A = keras.models.load_model("../models/classification_reusing_pretrained_layers_a.h5")
model_A_clone = keras.models.clone_model(model_A)
model_A_clone.set_weights(model_A.get_weights())
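
The claim that `clone_model()` does not copy weights can be checked with a small, self-contained sketch. The tiny model below is an arbitrary stand-in for `model_A` (its layer sizes are assumptions), so that the check runs without the saved `.h5` file.

```python
import numpy as np
from tensorflow import keras

# Tiny stand-in for model A (layer sizes are arbitrary).
original = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(3, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])

def same_weights(m1, m2):
    """Return True if both models hold identical weight arrays."""
    return all(np.array_equal(a, b)
               for a, b in zip(m1.get_weights(), m2.get_weights()))

# clone_model() rebuilds the architecture with freshly initialised weights.
clone = keras.models.clone_model(original)
same_before = same_weights(original, clone)

# Copy the weights explicitly, as done for model_A_clone above.
clone.set_weights(original.get_weights())
same_after = same_weights(original, clone)

print(same_before, same_after)
```

With randomly initialised kernels, the first value is expected to be `False` and the second `True`.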

Now you could train `model_B_on_A` for task B, but since the new output layer is initialized randomly, it will make large errors (at least during the first few epochs), so there will be large error gradients that may wreck the reused weights. To avoid this, one approach is to freeze the reused layers during the first few epochs, giving the new layer some time to learn reasonable weights. To do this, set the `trainable` attribute of every reused layer to `False` and compile the model:

In [23]:
model_B_on_A = keras.models.Sequential(model_A_clone.layers[:-1])
model_B_on_A.add(keras.layers.Dense(2, activation="softmax"))

In [24]:
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

You must always compile your model after you freeze or unfreeze layers.

In [25]:
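# Use a fresh optimizer instance rather than the one attached to the earlier model
model_B_on_A.compile(loss="sparse_categorical_crossentropy",
                     optimizer=keras.optimizers.SGD(learning_rate=0.3),
                     metrics=["accuracy"])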
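
To check what freezing does in practice, the following minimal sketch trains a tiny stand-in model on random data and compares the weights before and after. The layer sizes, learning rate, and data are assumptions chosen only to keep the example fast; it is not the notebook's model.

```python
import numpy as np
from tensorflow import keras

# Tiny stand-in: one "reused" hidden layer plus a new output layer.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(3, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])

# Freeze everything except the output layer, then compile.
for layer in model.layers[:-1]:
    layer.trainable = False
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=0.1))

# Snapshot the weights before training.
hidden_before = [w.copy() for w in model.layers[0].get_weights()]
output_before = [w.copy() for w in model.layers[-1].get_weights()]

# Random toy data, only to trigger a few gradient updates.
rng = np.random.default_rng(0)
X = rng.random((32, 4)).astype("float32")
y = rng.integers(0, 2, size=32)
model.fit(X, y, epochs=1, verbose=0)

hidden_unchanged = all(np.array_equal(a, b) for a, b in
                       zip(hidden_before, model.layers[0].get_weights()))
output_changed = any(not np.array_equal(a, b) for a, b in
                     zip(output_before, model.layers[-1].get_weights()))

print(len(model.trainable_weights))  # kernel and bias of the output layer only
print(hidden_unchanged, output_changed)
```

The frozen hidden layer should come out unchanged while the output layer moves, which is the behaviour the freezing step relies on.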

Now you can train the model for a few epochs, then unfreeze the reused layers (which requires compiling the model again) and continue training to fine-tune the reused layers for task B.

In [26]:
history = model_B_on_A.fit(xtrain_b, ytrain_b, epochs=4,
                           validation_data=(xvalid_b, yvalid_b))

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


In [27]:
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = True

In [28]:
# Recompile after unfreezing, again with a fresh optimizer instance
model_B_on_A.compile(loss="sparse_categorical_crossentropy",
                     optimizer=keras.optimizers.SGD(learning_rate=0.3),
                     metrics=["accuracy"])

checkpoint_cb = keras.callbacks.ModelCheckpoint("../models/classification_reusing_pretrained_layers_b_b.h5",
                                                save_best_only=True)

history = model_B_on_A.fit(xtrain_b, ytrain_b, epochs=16,
                           validation_data=(xvalid_b, yvalid_b),
                           callbacks=[checkpoint_cb,early_stopping_cb,tensorboard_cb])

Epoch 1/16
Epoch 2/16
Epoch 3/16
Epoch 4/16
Epoch 5/16
Epoch 6/16
Epoch 7/16
Epoch 8/16
Epoch 9/16
Epoch 10/16
Epoch 11/16
Epoch 12/16
Epoch 13/16
Epoch 14/16
Epoch 15/16
Epoch 16/16


In [29]:
model_B_on_A.evaluate(xtest_b, ytest_b)



[0.020140551030635834, 0.9965000152587891]