What are we going to do?

We will train a neural network (say model A) on data related to 6 of the classes, and we will train another neural network (say model B) on the remaining 2 classes. Then, we would use the pre-trained weights of model A and tune the last layer so as to classify these 2 classes(this technique is called Transfer Learning), and compare the results of classification obtained using normal training and transfer learning. In this project, we would practically appreciate the use of Transfer Learning.

#### Importing the Modules

In [2]:
import numpy as np
import tensorflow as tf
from tensorflow import keras

#### Preparing the Dataset

In [3]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

In [4]:
X_train_full = X_train_full[:30000]
y_train_full = y_train_full[:30000]

In [5]:
X_test = X_test[:5000]
y_test = y_test[:5000]

In [6]:
X_train_full = X_train_full / 255.0
X_test = X_test / 255.0

In [7]:
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

#### Dividing the data sets

Let's split the fashion MNIST training set in two:

X_train_A: all images of all items except for sandals and shirts (classes 5 and 6).

X_train_B: a much smaller training set of just the first 200 images of sandals or shirts. The validation set and the test set are also split this way, but without restricting the number of images.

Why are we doing this?

We will train a model on set A (classification task with 8 classes), and try to reuse it to tackle set B (binary classification). We hope to transfer a little bit of knowledge from task A to task B, since classes in set A (sneakers, ankle boots, coats, t-shirts, etc.) are somewhat similar to classes in set B (sandals and shirts). However, since we are using Dense layers, only patterns that occur at the same location can be reused (in contrast, convolutional layers will transfer much better, since learned patterns can be detected anywhere on the image,

In [8]:
def split_dataset(X, y):
    y_5_or_6 = (y == 5) | (y == 6) # sandals or shirts
    y_A = y[~y_5_or_6]
    y_A[y_A > 6] -= 2 # class indices 7, 8, 9 should be moved to 5, 6, 7
    y_B = (y[y_5_or_6] == 6).astype(np.float32) # binary classification task: is it a shirt (class 6)?
    return ((X[~y_5_or_6], y_A), (X[y_5_or_6], y_B))

In [9]:
(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train, y_train)

In [10]:
(X_valid_A, y_valid_A), (X_valid_B, y_valid_B) = split_dataset(X_valid, y_valid)

In [11]:
(X_test_A, y_test_A), (X_test_B, y_test_B) = split_dataset(X_test, y_test)

In [12]:
tf.random.set_seed(42)
np.random.seed(42)

#### Build and Fit the Model A

In [13]:
model_A = keras.models.Sequential()
model_A.add(keras.layers.Flatten(input_shape=[28, 28]))
for n_hidden in (300, 100, 50, 50, 50):
    model_A.add(keras.layers.Dense(n_hidden, activation="selu"))
model_A.add(keras.layers.Dense(8, activation="softmax"))

In [14]:
model_A.compile(loss="sparse_categorical_crossentropy",
            optimizer=keras.optimizers.SGD(learning_rate=1e-3),
            metrics=["accuracy"])

In [15]:
history = model_A.fit(X_train_A, y_train_A, epochs=5,
                validation_data=(X_valid_A, y_valid_A))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [16]:
import os
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

In [17]:
model_A.save("my_model_A.h5")

#### Build and Fit the Model B

In [20]:
model_B = keras.models.Sequential()
model_B.add(keras.layers.Flatten(input_shape=[28, 28]))
for n_hidden in (300, 100, 50, 50, 50):
    model_B.add(keras.layers.Dense(n_hidden, activation="selu"))
model_B.add(keras.layers.Dense(1, activation="softmax"))

In [21]:
model_B.compile(loss= "binary_crossentropy",
    optimizer= keras.optimizers.SGD(learning_rate=1e-3),
    metrics=["accuracy"])

In [22]:
history = model_B.fit(X_train_B, y_train_B, epochs=5,
            validation_data=(X_valid_B, y_valid_B))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


#### Creating new model based on existing model A

In [24]:
model_B.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 300)               235500    
_________________________________________________________________
dense_7 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_8 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_9 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_10 (Dense)             (None, 50)                2550      
_________________________________________________________________
dense_11 (Dense)             (None, 1)                

In [25]:
model_A_clone = keras.models.clone_model(model_A)

In [26]:
model_A_clone.set_weights(model_A.get_weights())

In [27]:
model_B_on_A = keras.models.Sequential(model_A.layers[:-1])

In [28]:
model_B_on_A.add(keras.layers.Dense(1, activation="sigmoid"))

In [29]:
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

In [30]:
model_B_on_A.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_3 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_4 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_12 (Dense)             (None, 1)                

In [31]:
model_B_on_A.compile(loss="binary_crossentropy",
         optimizer=keras.optimizers.SGD(learning_rate=1e-3),
         metrics=["accuracy"])

In [32]:
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=5,
                   validation_data=(X_valid_B, y_valid_B))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


#### Evaluating the models

In [35]:
model_B.evaluate(X_test_B, y_test_B)



[0.03187720105051994, 0.49844881892204285]

In [36]:
model_B_on_A.evaluate(X_test_B, y_test_B)



[0.09890769422054291, 0.988624632358551]