# Importing the Modules

- Let us begin by importing the necessary modules.

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras

In [None]:
type(keras)

module

# Preparing the Dataset

- Let us load the dataset and trim it to a smaller subset, since training on the full dataset would take a long time.

**Note:**

- The Fashion MNIST data from `keras` is already split into train and test sets, so we receive both when loading the data. The pixel values are integers from 0 to 255, which we scale to the range 0 to 1 below.


In [None]:
# loading the Fashion-MNIST dataset
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

# trimming the data, since training on all of it takes a lot of time
X_train_full = X_train_full[:30000]
y_train_full = y_train_full[:30000]

X_test = X_test[:5000]
y_test = y_test[:5000]

# scaling the dataset
X_train_full = X_train_full / 255.0
X_test = X_test / 255.0

# dividing the dataset into training and validation sets
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

In [None]:
X_test.min()==0 and X_test.max()==1

True
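The scaling and slicing steps above can be tried on a small synthetic array first. The sketch below is illustrative only: `X_fake` and its size are made-up stand-ins, not the Fashion MNIST data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for image data: 10 "images" of 28x28 uint8 pixels.
X_fake = rng.integers(0, 256, size=(10, 28, 28), dtype=np.uint8)

# Dividing by 255.0 converts the integers to floats in the range [0, 1].
X_scaled = X_fake / 255.0

# The first rows form the validation set, the remaining rows the training set.
X_val, X_tr = X_scaled[:3], X_scaled[3:]

print(X_scaled.dtype)
print(X_scaled.min() >= 0.0, X_scaled.max() <= 1.0)
print(X_val.shape, X_tr.shape)
```

Slicing with `[:3]` and `[3:]` mirrors the `[:5000]` and `[5000:]` split used in the notebook, just at a smaller scale.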

# Dividing the data sets

Let's split the Fashion MNIST training set in two:

**X_train_A:** all images of all items except for sandals and shirts (classes 5 and 6).

**X_train_B:** the images of sandals or shirts. The lines that would restrict this set to the first 200 images are commented out in the code below, so all 5,125 such images are used here.
The validation set and the test set are split the same way, without restricting the number of images.

**Why are we doing this?**

We will train a model on set A (classification task with 8 classes), and try to reuse it to tackle set B (binary classification). 

We hope to transfer a little bit of knowledge from task A to task B, since classes in set A (sneakers, ankle boots, coats, t-shirts, etc.) are somewhat similar to classes in set B (sandals and shirts). 

However, since we are using Dense layers, only patterns that occur at the same location can be reused (in contrast, convolutional layers will transfer much better, since learned patterns can be detected anywhere on the image, as we will see in the CNN chapter).
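A rough one-dimensional sketch of this point, using plain NumPy rather than Keras layers. The names `pattern`, `w_dense` and the signal length are hypothetical choices for illustration: a dense unit whose weights match a pattern at one position stops responding when the pattern moves, while a sliding (convolution-style) filter still finds it.

```python
import numpy as np

pattern = np.array([1.0, -1.0, 1.0])

# Input signal with the pattern at positions 2..4, and a copy shifted right by 4.
x = np.zeros(10)
x[2:5] = pattern
x_shifted = np.roll(x, 4)

# A "dense" unit: one weight per input position, tuned to the pattern at positions 2..4.
w_dense = np.zeros(10)
w_dense[2:5] = pattern

dense_orig = x @ w_dense            # strong response
dense_shift = x_shifted @ w_dense   # no response: the pattern has moved

# A sliding filter applies the same 3 weights at every position.
conv_orig = np.correlate(x, pattern, mode="valid").max()
conv_shift = np.correlate(x_shifted, pattern, mode="valid").max()

print(dense_orig, dense_shift)  # 3.0 0.0
print(conv_orig, conv_shift)    # 3.0 3.0
```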

In [None]:
# splitting a dataset into task A (8 classes) and task B (binary)
def split_dataset(X, y):
    y_5_or_6 = (y == 5) | (y == 6) # sandals or shirts
    y_A = y[~y_5_or_6]
    y_A[y_A > 6] -= 2 # class indices 7, 8, 9 should be moved to 5, 6, 7
    y_B = (y[y_5_or_6] == 6).astype(np.float32) # binary classification task: is it a shirt (class 6)?
    return ((X[~y_5_or_6], y_A),
            (X[y_5_or_6], y_B))

(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train, y_train)
print((X_train_A.shape, y_train_A.shape), (X_train_B.shape, y_train_B.shape))
(X_valid_A, y_valid_A), (X_valid_B, y_valid_B) = split_dataset(X_valid, y_valid)
(X_test_A, y_test_A), (X_test_B, y_test_B) = split_dataset(X_test, y_test)
# X_train_B = X_train_B[:200]
# y_train_B = y_train_B[:200]

((19875, 28, 28), (19875,)) ((5125, 28, 28), (5125,))
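To see what the label handling in `split_dataset` does, here is a minimal sketch on a toy label array. `y_toy` is a made-up input containing each class index once; the indexing logic is the same as in the function above.

```python
import numpy as np

# Toy labels covering all ten Fashion MNIST class indices (0..9).
y_toy = np.arange(10, dtype=np.uint8)

is_5_or_6 = (y_toy == 5) | (y_toy == 6)   # mask for sandals and shirts

y_toy_A = y_toy[~is_5_or_6]               # boolean indexing returns a copy
y_toy_A[y_toy_A > 6] -= 2                 # 7, 8, 9 become 5, 6, 7

# Task B labels: 1.0 for shirts (class 6), 0.0 for sandals (class 5).
y_toy_B = (y_toy[is_5_or_6] == 6).astype(np.float32)

print(y_toy_A)  # [0 1 2 3 4 5 6 7]
print(y_toy_B)  # [0. 1.]
```

Because boolean indexing copies the data, the in-place subtraction does not modify the original label array.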


In [None]:
# checking shape of the test labels for set A
y_test_A.shape 

(4033,)

In [None]:
X_train_B.shape # checking shape of training set B

(5125, 28, 28)

In [None]:
y_train_A[:30] # checking first 30 y-labels for training set A

array([4, 0, 5, 7, 7, 7, 4, 4, 3, 4, 0, 1, 6, 3, 4, 3, 2, 6, 5, 3, 4, 5,
       1, 3, 4, 2, 0, 6, 7, 1], dtype=uint8)

In [None]:
y_train_B[:30] # checking first 30 y-labels for training set B

array([1., 1., 0., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 0., 0., 0., 0.,
       0., 0., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 1.], dtype=float32)

In [None]:
tf.random.set_seed(42)
np.random.seed(42)

# Build and Fit the Model A

Let us define the model for the classification of data set A that we have created previously. 

Later the trained weights of this model will be used for the classification task of data B.

- We create a keras neural network as follows:

 - Add `keras.layers.Flatten` to flatten the input image to the model.

 - Add 5 dense layers with 300, 100, 50, 50 and 50 neurons respectively and the `selu` activation function.

 - Add a final dense layer with 8 neurons and the `softmax` activation function (for classifying 8 classes of data).

            model_A = keras.models.Sequential()
            model_A.add(keras.layers.Flatten(input_shape=[28, 28]))
            for n_hidden in (300, 100, 50, 50, 50):
                model_A.add(keras.layers.Dense(n_hidden, activation="selu"))
            model_A.add(keras.layers.Dense(8, activation="softmax"))

In [None]:
# defining the model
model_A = keras.models.Sequential()
model_A.add(keras.layers.Flatten(input_shape=[28, 28]))
for n_hidden in (300, 100, 50, 50, 50):
    model_A.add(keras.layers.Dense(n_hidden, activation="selu"))
model_A.add(keras.layers.Dense(8, activation="softmax"))

In [None]:
# compiling the model
model_A.compile(loss="sparse_categorical_crossentropy",
                optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                metrics=["accuracy"])

In [None]:
# training the model
history = model_A.fit(X_train_A, y_train_A, epochs=5,
                    validation_data=(X_valid_A, y_valid_A))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model_A.save("my_model_A.h5") # saving the model we created

In [None]:
isinstance(model_A, keras.models.Sequential) # checking that model_A is a Sequential model

True

# Build and Fit the Model B

- Let us define the model for the classification of data set B that we created previously.

 Later, we will also classify set B using the trained weights of model A.

 - We create a keras neural network as follows:

 - Add `keras.layers.Flatten` to flatten the input image to the model.

 - Add 5 dense layers with 300, 100, 50, 50 and 50 neurons respectively and the `selu` activation function.

 - Add a final dense layer with 1 neuron and the `sigmoid` activation function (for classifying 2 classes of data).

            model_B = keras.models.Sequential()
            model_B.add(keras.layers.Flatten(input_shape=[28, 28]))
            for n_hidden in (300, 100, 50, 50, 50):
                model_B.add(keras.layers.Dense(n_hidden, activation="selu"))
            model_B.add(keras.layers.Dense(1, activation="sigmoid"))

In [None]:
model_B = keras.models.Sequential()
model_B.add(keras.layers.Flatten(input_shape=[28, 28]))
for n_hidden in (300, 100, 50, 50, 50):
    model_B.add(keras.layers.Dense(n_hidden, activation="selu"))
model_B.add(keras.layers.Dense(1, activation="sigmoid"))

We set `"binary_crossentropy"` as the loss, since this is a binary classification between sandals and shirts.
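A small NumPy sketch of why a single output neuron is paired with a sigmoid and binary cross-entropy. This is not the Keras implementation; the helper functions and the values in `z` and `y` are made up for illustration. It also shows that a softmax over a single value always returns 1, so it cannot separate two classes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Softmax over the last axis, shifted by the maximum for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def binary_crossentropy(y_true, p, eps=1e-7):
    # Mean binary cross-entropy on probabilities, clipped to avoid log(0).
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

z = np.array([[-2.0], [0.0], [3.0]])   # hypothetical outputs of one final neuron
y = np.array([[0.0], [1.0], [1.0]])    # hypothetical binary labels

p_sigmoid = sigmoid(z)
p_softmax = softmax(z)                 # a softmax over one value is always 1

print(p_sigmoid.ravel())
print(p_softmax.ravel())               # [1. 1. 1.]
print(binary_crossentropy(y, p_sigmoid))
print(binary_crossentropy(y, p_softmax))
```

With the sigmoid, each output is a usable probability of the positive class (shirt), and the loss falls as those probabilities approach the labels.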

In [None]:

# compiling the model with binary crossentropy
# the string "binary_crossentropy" expects probabilities by default
# (from_logits=False), which the final sigmoid layer provides
model_B.compile(loss="binary_crossentropy",
                optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                metrics=["accuracy"])

In [None]:
# training the model
history = model_B.fit(X_train_B, y_train_B, epochs=5,
                      validation_data=(X_valid_B, y_valid_B))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model_A.summary() # generating model summary

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_3 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_4 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_5 (Dense)              (None, 8)                 408

In [None]:
model_B.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 300)               235500    
_________________________________________________________________
dense_7 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_8 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_9 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_10 (Dense)             (None, 50)                2550      
_________________________________________________________________
dense_11 (Dense)             (None, 1)                 51

# Creating new model based on existing model A

- Let us first see how many trainable parameters `model_B`, which we trained previously, has.

- Then we shall create a new model `model_B_on_A`, which reuses the pre-trained layers of `model_A` but has a new final dense layer with only 1 neuron.

- Finally, we shall compare the performance of the two models, `model_B` and `model_B_on_A`.



In [None]:

# model_A = keras.models.load_model("my_model_A.h5") # loading our saved model
model_B_on_A = keras.models.Sequential(model_A.layers[:-1]) # creating new model based on existing layer
model_B_on_A.add(keras.layers.Dense(1, activation="sigmoid")) # adding new layer to new model


- `model_B_on_A` reuses the layer objects of `model_A`, so training `model_B_on_A` will also update the weights of `model_A`. To keep an unaffected copy of the trained `model_A`, we clone it and copy its trained weights into the clone.

 We can copy the `model_A` architecture using `keras.models.clone_model`, and then copy the weights with `set_weights`.

In [None]:

# model_A and model_B_on_A now share some layers, so training
# model_B_on_A will also update those layers in model_A. To keep an
# untouched copy of model_A, we clone its architecture with clone_model()
# and then copy its weights (since clone_model() does not copy the weights)
model_A_clone = keras.models.clone_model(model_A)
model_A_clone.set_weights(model_A.get_weights())
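The sharing behaviour described above comes from ordinary Python reference semantics. The sketch below illustrates it without Keras: `ToyLayer` is a hypothetical stand-in for a layer, and `copy.deepcopy` plays the role that `clone_model()` followed by `set_weights()` plays in the notebook.

```python
import copy
import numpy as np

class ToyLayer:
    """Hypothetical stand-in for a layer: it only holds a weight array."""
    def __init__(self, weights):
        self.weights = np.array(weights, dtype=np.float64)

# "Model A" represented as a plain list of layers.
model_a = [ToyLayer([1.0, 2.0]), ToyLayer([3.0])]

# Independent copy of the layers and their weights, taken before any reuse.
model_a_clone = copy.deepcopy(model_a)

# The new model reuses the same layer objects, plus one new output layer.
model_b_on_a = model_a[:-1] + [ToyLayer([0.0])]

# "Training" the new model updates the shared layer in place.
model_b_on_a[0].weights += 10.0

print(model_a[0].weights)        # [11. 12.]
print(model_a_clone[0].weights)  # [1. 2.]
```

The shared layer changes in both `model_a` and `model_b_on_a`, while the copy keeps the original weights.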

In [None]:
# freezing reused layers
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

# compiling the model
model_B_on_A.compile(loss="binary_crossentropy",
                     optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                     metrics=["accuracy"])

In [None]:
model_B_on_A.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_3 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_4 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_13 (Dense)             (None, 1)                 51

 We observe there are only 51 parameters to train in `model_B_on_A` while the reused layers are frozen, compared with 275,801 trainable parameters for `model_B`.
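These counts can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below recomputes them in plain Python; the helper `dense_params` and the `is_trainable` flags are illustrative names, not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# Flattened 28x28 input, five hidden layers, one output neuron.
sizes = [784, 300, 100, 50, 50, 50, 1]
per_layer = [dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:])]

# While the reused layers are frozen, only the new output layer is trainable.
is_trainable = [False, False, False, False, False, True]
trainable_when_frozen = sum(p for p, t in zip(per_layer, is_trainable) if t)

print(per_layer)              # [235500, 30100, 5050, 2550, 2550, 51]
print(trainable_when_frozen)  # 51
print(sum(per_layer))         # 275801
```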

In [None]:
# training the model
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=4,
                           validation_data=(X_valid_B, y_valid_B))

# unfreezing reused layers
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = True

# recompiling, which is required after changing `trainable`; the learning rate
# stays low (1e-3) to avoid damaging the reused weights
model_B_on_A.compile(loss="binary_crossentropy",
                     optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                     metrics=["accuracy"])

# training the model
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=5,
                           validation_data=(X_valid_B, y_valid_B))

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


# Evaluating the models

- Now that we have two models, `model_B` and `model_B_on_A`, for classifying the B dataset, let us compare their performance based on their accuracy on the test data of set B.

In [None]:
model_B.evaluate(X_test_B, y_test_B) # evaluating model_B



[0.03187720850110054, 0.9937952160835266]

In [None]:
model_B_on_A.evaluate(X_test_B, y_test_B) # evaluating model_B_on_A



[0.019171511754393578, 0.997931718826294]
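To put the two accuracies in perspective, they can be converted into approximate counts of misclassified test images. This sketch assumes the size of test set B is the 5,000 test images kept earlier minus the 4,033 assigned to set A; the accuracy values are copied from the outputs above.

```python
# Assumed size of test set B: images kept for testing minus those in set A.
n_test_B = 5000 - 4033

acc_B = 0.9937952160835266        # accuracy reported for model_B
acc_B_on_A = 0.997931718826294    # accuracy reported for model_B_on_A

# Misclassified images implied by each accuracy, rounded to whole images.
errors_B = round((1 - acc_B) * n_test_B)
errors_B_on_A = round((1 - acc_B_on_A) * n_test_B)

print(n_test_B, errors_B, errors_B_on_A)  # 967 6 2
```

With so few errors on either side, the gap between the two models amounts to only a handful of images.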

 We observe that the accuracies of both models are almost the same: about 99.4% for `model_B` and about 99.8% for `model_B_on_A`.

 Note that `model_B_on_A` started from the pre-trained layers of `model_A`: its new output layer was first trained with only 51 trainable parameters while the reused layers were frozen, and all layers were then fine-tuned for a few more epochs. `model_B`, in contrast, had to learn all 275,801 parameters from scratch.

 So, with very little additional training, `model_B_on_A` performs really well. Reusing pre-trained layers in this way can save time and resources in real-world scenarios. This method is known as transfer learning: transferring the knowledge obtained from solving one problem to solving another, similar problem.