# Training Deep Neural Networks with Pretrained Layers

_**Experiment on Fashion MNIST (or any other appropriate dataset) to check whether reusing pretrained layers through transfer learning makes training possible with less data and reduces training time. Also check whether model performance improves.**_

First, a model is trained on set A (an 8-class classification task covering trousers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots from the Fashion MNIST dataset). Its learned layers are then reused for a binary classification task on set B (the remaining two classes of the same dataset, T-shirts/tops and pullovers), since the classes in set A are somewhat similar to those in set B.

In [10]:
# Imports required packages

import pickle
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

## Retrieving & Analysing Dataset

In [14]:
# Load Fashion MNIST dataset from a pickle dump

with open("data/fashion_mnist.pickle", "rb") as f:
    fashion = pickle.load(f)

In [16]:
# The pickled dataset is a pair of (features, labels) tuples, so it can be unpacked as follows
(X_train_full, y_train_full), (X_test, y_test) = fashion

In [None]:
# Check the shape of the training and test dataset

print("Train dataset shape:", <code here>)

print("Test dataset shape:", <code here>)

In [25]:
# Each training and test example is assigned to one of the following labels.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal",
               "Shirt", "Sneaker", "Bag", "Ankle boot"]

## Preparing Datasets

In [27]:
# Normalizes the data between 0 and 1 for effective neural network model training
X_train_full, X_test = X_train_full / 255., X_test / 255.

In [30]:
# Split the training set further, holding out 5,000 instances as a validation set.
# Stratify the split so that class proportions are preserved in both parts.

X_train, X_val, y_train, y_val = <code here>
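
A minimal sketch of what a stratified hold-out split does, run on synthetic data rather than on Fashion MNIST. The array names `X_demo` and `y_demo`, the sizes, and the `random_state` value are assumptions made for illustration; on the real data the same call would hold out 5,000 instances, as the comment above asks.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (an assumption for illustration):
# 1,000 tiny "images" spread evenly over 10 classes.
rng = np.random.default_rng(42)
X_demo = rng.random((1000, 4, 4))
y_demo = np.repeat(np.arange(10), 100)

# An integer test_size is an absolute number of held-out instances;
# stratify=y_demo keeps the class proportions the same in both parts.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=100, stratify=y_demo, random_state=42
)

print("Train shape:", X_tr.shape, "Validation shape:", X_va.shape)
print("Validation instances per class:", np.bincount(y_va))
```

Counting instances per class with `np.bincount` is a quick way to confirm that stratification worked.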

**Split Fashion MNIST into tasks A and B, then train and save model A.**

In [32]:
# Store the class ids of the two task-B classes in variables.
# "Pullover" is treated as the positive class and "T-shirt/top" as the negative class,
# so pos_class_id and neg_class_id hold 2 and 0, respectively.

pos_class_id = class_names.index("Pullover")
neg_class_id = class_names.index("T-shirt/top")

In [34]:
def split_dataset(X, y):
    """
    Splits a dataset containing all 10 classes into a pair of (X, y) tuples: one for the
    8-class classification task (A) and one for the remaining 2-class task (B).
    """
    y_for_B = (y == pos_class_id) | (y == neg_class_id)  # boolean mask of task-B instances
    y_A = y[~y_for_B]  # boolean indexing returns a copy, so y itself is not modified
    y_B = (y[y_for_B] == pos_class_id).astype(np.float32)  # 1.0 = positive, 0.0 = negative
    # Sort explicitly: the in-place relabelling below relies on ascending order
    old_class_ids = sorted(set(range(10)) - {neg_class_id, pos_class_id})
    for old_class_id, new_class_id in zip(old_class_ids, range(8)):
        y_A[y_A == old_class_id] = new_class_id  # renumber task-A class ids to 0..7

    return ((X[~y_for_B], y_A), (X[y_for_B], y_B))

In [36]:
# Split the training, validation and test sets into separate datasets for tasks A and B

(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train, y_train)
(X_val_A, y_val_A), (X_val_B, y_val_B) = split_dataset(X_val, y_val)
(X_test_A, y_test_A), (X_test_B, y_test_B) = split_dataset(X_test, y_test)

# Keep only the first 200 instances as training data for task B
X_train_B = X_train_B[:200]
y_train_B = y_train_B[:200]
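
To see how the labels are transformed by the split, the sketch below runs the same logic on a six-element toy label array. The toy arrays `X_toy` and `y_toy` are assumptions for illustration; the function body restates the notebook's `split_dataset` so that the cell runs on its own.

```python
import numpy as np

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal",
               "Shirt", "Sneaker", "Bag", "Ankle boot"]
pos_class_id = class_names.index("Pullover")     # 2
neg_class_id = class_names.index("T-shirt/top")  # 0

def split_dataset(X, y):
    """Same logic as the notebook's function, restated to keep this cell standalone."""
    is_B = (y == pos_class_id) | (y == neg_class_id)
    y_A = y[~is_B]  # a copy, so the caller's labels stay untouched
    y_B = (y[is_B] == pos_class_id).astype(np.float32)
    old_class_ids = sorted(set(range(10)) - {neg_class_id, pos_class_id})
    for new_class_id, old_class_id in enumerate(old_class_ids):
        y_A[y_A == old_class_id] = new_class_id
    return (X[~is_B], y_A), (X[is_B], y_B)

# Toy data (hypothetical): six blank 28x28 "images" with hand-picked labels
y_toy = np.array([0, 2, 9, 1, 2, 5])
X_toy = np.zeros((6, 28, 28))

(X_A, y_A), (X_B, y_B) = split_dataset(X_toy, y_toy)

print("Task A labels:", y_A)  # old ids 9, 1, 5 renumbered within 0..7
print("Task B labels:", y_B)  # 1.0 for "Pullover", 0.0 for "T-shirt/top"
print("Original labels unchanged:", y_toy)
```

Task A labels are renumbered to the contiguous range 0..7 because sparse categorical crossentropy with an 8-unit output layer expects class ids in that range.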

## Modeling

In [None]:
# Create a dense neural network for classification task A

tf.random.set_seed(42)

model_A = tf.keras.Sequential([
    # Initialize a "Flatten" layer with input shape
    <code here>,
    
    # Initialize three "Dense" layers, each with 100 units,
    # "relu" as activation function and "he_normal" as kernel initializer
    <code here>,
    <code here>,
    <code here>,
    
    # Initialize a "Dense" output layer with 8 units (one per class) and an
    # activation function suited to the task
    <code here>
])

# Initialize "SGD" as model optimizer with 0.001 as learning rate
optimizer = <code here>

# Compile the model by specifying sparse categorical crossentropy as loss function,
# already initialized optimizer and "accuracy" as a metric
<code here>

In [None]:
# Fit the model by specifying training dataset, 20 epochs and validation data (tuple with features and labels)
history = <code here>

In [53]:
# Saves the model to be used later for transfer learning
# NOTE: Make sure the folder "models" exists under the current working directory

model_A.save("./models/my_fashion_mnist_model_A.keras")

**Now, to find out whether transfer learning helps, first train and evaluate model B without reusing model A.**

In [57]:
# Create a dense neural network for classification task B

tf.random.set_seed(42)

model_B = tf.keras.Sequential([
    # Initialize a "Flatten" layer with input shape
    <code here>,
    
    # Initialize three "Dense" layers, each with 100 units,
    # "relu" as activation function and "he_normal" as kernel initializer
    <code here>,
    <code here>,
    <code here>,
    
    # Initialize a "Dense" layer specifying output shape and activation function
    # appropriate for binary classification task
    <code here>
])

# Initialize "SGD" as model optimizer with 0.001 as learning rate
optimizer = <code here>

# Compile the model by specifying binary crossentropy as loss function,
# already initialized optimizer and "accuracy" as a metric
<code here>

In [None]:
# Fit the model by specifying training dataset, 20 epochs and validation data (tuple with features and labels)
history = model_B.fit(X_train_B, y_train_B, epochs=20, validation_data=(X_val_B, y_val_B))

In [None]:
# Evaluate the model on test dataset
model_B.evaluate(X_test_B, y_test_B)

**Note down Model B's accuracy on the test set.**

**Now let's try reusing the pretrained model A.**

In [77]:
# Load the model trained for classification task A
model_A = tf.keras.models.load_model("./models/my_fashion_mnist_model_A.keras")

In [65]:
tf.random.set_seed(42)

# Clone network architecture of model A into a new model
model_A_clone = tf.keras.models.clone_model(model_A)

# Copy model A's learned weights into the cloned model
model_A_clone.set_weights(model_A.get_weights())

In [67]:
# Build the target model B from all layers of the clone except its output layer
model_B_on_A = tf.keras.Sequential(model_A_clone.layers[:-1])

# Then add a new "Dense" output layer with a single unit and sigmoid activation
# to the target model B
model_B_on_A.add(tf.keras.layers.Dense(1, activation="sigmoid"))

**The target model is trained in two phases. In the first phase, only the new output layer is trained for a few epochs while all reused hidden layers stay frozen (non-trainable); this keeps the large initial gradients of the randomly initialized output layer from disrupting the pretrained weights. In the second phase, all layers are trained for a larger number of epochs.**

In [69]:
# Freeze all the hidden layers before training
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)

model_B_on_A.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])

In [None]:
# Train for a few epochs (here 4) so that only the output layer is updated
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=4, validation_data=(X_val_B, y_val_B))

In [73]:
# Unfreeze the hidden layers for the second training phase
# (the model must be recompiled after changing `trainable` for the change to take effect)
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = True

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
model_B_on_A.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])

In [None]:
# Train the full model with all layers trainable
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=16, validation_data=(X_val_B, y_val_B))

In [None]:
# Evaluate the transfer-learning model B on the same test set
model_B_on_A.evaluate(X_test_B, y_test_B)

**Note down the accuracy of the new model B, which was built on pretrained layers, and compare the accuracies of the two models.**
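
One way to make the comparison concrete is to express the change as a relative reduction in error rate. This is a sketch: the helper name `error_rate_reduction` is hypothetical and is not part of the notebook, and the commented usage lines assume the models were compiled with `metrics=["accuracy"]` as above, so that `evaluate` returns the loss followed by the accuracy.

```python
def error_rate_reduction(acc_baseline, acc_transfer):
    """Relative drop in error rate when moving from the baseline to the transfer model.

    A positive value means the transfer model makes fewer errors;
    a negative value means it makes more.
    """
    err_baseline = 1.0 - acc_baseline
    err_transfer = 1.0 - acc_transfer
    if err_baseline == 0.0:
        raise ValueError("Baseline makes no errors; relative reduction is undefined.")
    return (err_baseline - err_transfer) / err_baseline

# Intended usage with the two models from this notebook:
# _, acc_B = model_B.evaluate(X_test_B, y_test_B)
# _, acc_B_on_A = model_B_on_A.evaluate(X_test_B, y_test_B)
# print(f"Error rate reduction: {error_rate_reduction(acc_B, acc_B_on_A):.1%}")
```

Comparing error rates rather than raw accuracies makes small differences near the top of the accuracy range easier to interpret.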

**Observations:**

Note down all your observations in your green/blue book.