<a href="https://colab.research.google.com/github/FedericoSabbadini/DeepLearning/blob/main/KerasModelReuse.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

This project requires Python 3.7 or above:

In [54]:
import sys
assert sys.version_info >= (3, 7)

import numpy as np

And TensorFlow ≥ 2.8:

In [55]:
from packaging import version
import tensorflow as tf

assert version.parse(tf.__version__) >= version.parse("2.8.0")

As we did in previous chapters, let's define the default font sizes to make the figures prettier:

In [56]:
import matplotlib.pyplot as plt

plt.rc('font', size=14)
plt.rc('axes', labelsize=14, titlesize=14)
plt.rc('legend', fontsize=14)
plt.rc('xtick', labelsize=10)
plt.rc('ytick', labelsize=10)

And let's create the `images/deep` folder (if it doesn't already exist), and define the `save_fig()` function, which is used throughout this notebook to save the figures in high resolution for the book:

In [57]:
from pathlib import Path

IMAGES_PATH = Path() / "images" / "deep"
IMAGES_PATH.mkdir(parents=True, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = IMAGES_PATH / f"{fig_id}.{fig_extension}"
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

In [58]:
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

In [59]:
fashion_mnist = tf.keras.datasets.fashion_mnist.load_data()
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist
X_train, y_train = X_train_full[:-5000], y_train_full[:-5000]
X_valid, y_valid = X_train_full[-5000:], y_train_full[-5000:]
X_train, X_valid, X_test = X_train / 255, X_valid / 255, X_test / 255

In [60]:
X_train.shape
# X_test.shape

(55000, 28, 28)

# Reusing Pretrained Layers

### Reusing a Keras model

Let's split the Fashion MNIST training set in two:
* `X_train_A`: all images of all items except for T-shirts/tops and pullovers (classes 0 and 2).
* `X_train_B`: a much smaller training set of just the first 200 images of T-shirts/tops and pullovers.

The validation set and the test set are also split this way, but without restricting the number of images.

We will train a model on set A (classification task with 8 classes), and try to reuse it to tackle set B (binary classification). We hope to transfer a little bit of knowledge from task A to task B, since classes in set A (trousers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots) are somewhat similar to classes in set B (T-shirts/tops and pullovers). However, since we are using `Dense` layers, only patterns that occur at the same location can be reused. In contrast, convolutional layers transfer much better, since learned patterns can be detected anywhere on the image, as we will see in chapter 14.
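The point about location can be illustrated on a toy 1D signal. This is only a sketch under made-up assumptions: the 10-element signal, the 3-element pattern, and the hand-set weights are hypothetical and have nothing to do with the models trained below. A dense unit tuned to a pattern at one position stops responding when the pattern moves, while sliding the same pattern across the signal (what a convolutional filter does) still finds it.

```python
import numpy as np

pattern = np.array([1.0, -1.0, 1.0])   # hypothetical feature to detect

signal = np.zeros(10)
signal[2:5] = pattern                  # pattern at positions 2..4
shifted = np.roll(signal, 4)           # same pattern at positions 6..8

# Dense unit: one weight per input position, tuned to positions 2..4 only
w_dense = np.zeros(10)
w_dense[2:5] = pattern
dense_orig = float(signal @ w_dense)
dense_shift = float(shifted @ w_dense)

# Convolution-style detector: slide the pattern over the signal, keep the max
conv_orig = float(np.correlate(signal, pattern, mode="valid").max())
conv_shift = float(np.correlate(shifted, pattern, mode="valid").max())

print(dense_orig, dense_shift)  # 3.0 0.0
print(conv_orig, conv_shift)    # 3.0 3.0
```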

## Train Model A

In [61]:
# extra code – split Fashion MNIST into tasks A and B, then train and save
#              model A to "my_model_A".

pos_class_id = class_names.index("Pullover")
neg_class_id = class_names.index("T-shirt/top")

def split_dataset(X, y):
    y_for_B = (y == pos_class_id) | (y == neg_class_id)
    y_A = y[~y_for_B]
    y_B = (y[y_for_B] == pos_class_id).astype(np.float32)
    old_class_ids = list(set(range(10)) - set([neg_class_id, pos_class_id]))
    for old_class_id, new_class_id in zip(old_class_ids, range(8)):
        y_A[y_A == old_class_id] = new_class_id  # reorder class ids for A
    return ((X[~y_for_B], y_A), (X[y_for_B], y_B))

(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train, y_train)
(X_valid_A, y_valid_A), (X_valid_B, y_valid_B) = split_dataset(X_valid, y_valid)
(X_test_A, y_test_A), (X_test_B, y_test_B) = split_dataset(X_test, y_test)
X_train_B = X_train_B[:200]
y_train_B = y_train_B[:200]

In [62]:
X_train_A.shape

(44011, 28, 28)
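The label handling in `split_dataset()` can be checked on a handful of made-up labels. The sketch below is not the notebook's function: it uses a dictionary for the renumbering instead of the in-place loop, which should give the same result, and the label array `y` is invented for illustration.

```python
import numpy as np

pos_class_id, neg_class_id = 2, 0          # "Pullover" and "T-shirt/top"

y = np.array([0, 1, 2, 3, 9, 2, 0, 5])     # made-up labels
for_B = (y == pos_class_id) | (y == neg_class_id)

# Task B: 1.0 for pullovers, 0.0 for T-shirts/tops
y_B = (y[for_B] == pos_class_id).astype(np.float32)

# Task A: the 8 remaining ids (1, 3, 4, ..., 9) are renumbered 0..7
old_class_ids = sorted(set(range(10)) - {neg_class_id, pos_class_id})
remap = {old: new for new, old in enumerate(old_class_ids)}
y_A = np.array([remap[c] for c in y[~for_B]])

print(y_B.tolist())  # [0.0, 1.0, 1.0, 0.0]
print(y_A.tolist())  # [0, 1, 7, 3]
```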

In [63]:
tf.random.set_seed(43)

model_A = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(8, activation="softmax")  # 8 output classes: softmax yields a probability distribution
])

model_A.compile(loss="sparse_categorical_crossentropy",  # "sparse": labels are integer class ids, not one-hot vectors
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])  # fine here since the classes are balanced; otherwise prefer another metric
history = model_A.fit(X_train_A, y_train_A, epochs=20,
                      validation_data=(X_valid_A, y_valid_A))

Epoch 1/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.5158 - loss: 1.5686 - val_accuracy: 0.7471 - val_loss: 0.7286
Epoch 2/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7578 - loss: 0.6768 - val_accuracy: 0.8135 - val_loss: 0.5526
Epoch 3/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8209 - loss: 0.5360 - val_accuracy: 0.8483 - val_loss: 0.4732
Epoch 4/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8512 - loss: 0.4628 - val_accuracy: 0.8589 - val_loss: 0.4236
Epoch 5/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.8654 - loss: 0.4156 - val_accuracy: 0.8701 - val_loss: 0.3909
Epoch 6/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8736 - loss: 0.3835 - val_accuracy: 0.8764 - val_loss: 0.3680
Epoch 7/20
...

(1376 is the number of batches processed in sequence during each epoch, using the default batch size of 32. Increasing the batch size reduces the number of iterations per epoch, but each iteration takes longer because it processes more data.)
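The batch counts shown in the progress bars follow from the dataset sizes. A minimal sketch, assuming the last partial batch is still counted as a step (`steps_per_epoch` is a hypothetical helper name, not a Keras function):

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Number of batches needed to go through n_samples once."""
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(44011))                 # 1376 (training set A)
print(steps_per_epoch(200))                   # 7 (training set B)
print(steps_per_epoch(44011, batch_size=64))  # 688
```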

In [64]:
model_A.evaluate(X_test_A, y_test_A)

250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8971 - loss: 0.2886


[0.2866709530353546, 0.8987500071525574]

Model A reaches about 89.9% accuracy on its test set.
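With `metrics=["accuracy"]`, the list returned by `evaluate()` holds the loss first and the accuracy second. A small sketch of turning it into a readable summary, using the two values printed above (the variable names are illustrative):

```python
# Values copied from the evaluate() output above: [loss, accuracy]
loss, accuracy = [0.2866709530353546, 0.8987500071525574]

summary = f"loss={loss:.4f}, accuracy={accuracy:.1%}"
print(summary)  # loss=0.2867, accuracy=89.9%
```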

In [65]:
model_A.save("my_model_A.keras")

## Train Model B

In [66]:
# extra code – train and evaluate model B, without reusing model A

tf.random.set_seed(43)
model_B = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(1, activation="sigmoid")  # single output unit, so sigmoid instead of softmax
])

model_B.compile(loss="binary_crossentropy",  # only two classes now
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])
history = model_B.fit(X_train_B, y_train_B, epochs=20,
                      validation_data=(X_valid_B, y_valid_B))

Epoch 1/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 2s 174ms/step - accuracy: 0.7103 - loss: 0.6362 - val_accuracy: 0.7290 - val_loss: 0.6337
Epoch 2/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.8239 - loss: 0.6076 - val_accuracy: 0.8012 - val_loss: 0.6089
Epoch 3/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - accuracy: 0.8718 - loss: 0.5831 - val_accuracy: 0.8497 - val_loss: 0.5879
Epoch 4/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.8907 - loss: 0.5620 - val_accuracy: 0.8724 - val_loss: 0.5688
Epoch 5/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.9116 - loss: 0.5431 - val_accuracy: 0.8912 - val_loss: 0.5520
Epoch 6/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - accuracy: 0.9165 - loss: 0.5258 - val_accuracy: 0.8961 - val_loss: 0.5365
Epoch 7/20
...

In [67]:
model_B.evaluate(X_test_B, y_test_B)

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9347 - loss: 0.4056


[0.40689003467559814, 0.9350000023841858]

Model B reaches 93.5% accuracy on the test set. Now let's try reusing the pretrained model A.

## Reuse Model A

In [68]:
model_A = tf.keras.models.load_model("my_model_A.keras")


If we built `modelAB` directly on the layers of `model_A`, the two models would share those layers, so training one would also update the other. To avoid that, we build `modelAB` on top of a *clone* of `model_A`:
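The difference between sharing and cloning can be checked on a tiny stand-in model. This is a minimal sketch: `base`, `shared`, and `clone` are hypothetical names, and the 4-feature input and layer sizes are arbitrary rather than taken from `model_A`.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in for model_A (hypothetical sizes, not the notebook's model)
base = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Reusing the layer objects directly: both models hold the very same layer
shared = tf.keras.Sequential(base.layers[:-1])
shares_layers = shared.layers[0] is base.layers[0]

# clone_model() creates new layers with freshly initialized weights,
# so the existing weights have to be copied over explicitly
clone = tf.keras.models.clone_model(base)
clone.set_weights(base.get_weights())
clone_is_independent = clone.layers[0] is not base.layers[0]
same_weights = all(np.array_equal(a, b)
                   for a, b in zip(base.get_weights(), clone.get_weights()))

print(shares_layers, clone_is_independent, same_weights)
```

If all three flags come out true, the clone starts from the same weights as `base` but can be trained without touching it.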

In [69]:
tf.random.set_seed(43)  # extra code – ensure reproducibility

In [70]:
model_A_clone = tf.keras.models.clone_model(model_A)
model_A_clone.set_weights(model_A.get_weights())

In [71]:
# build modelAB from the clone's layers (all but the output layer), so that model_A is left untouched
modelAB = tf.keras.Sequential(model_A_clone.layers[:-1])

In [72]:
modelAB.add(tf.keras.layers.Dense(1, activation="sigmoid"))

In [73]:
# Freeze the reused layers so that only the new output layer is trained at first
for layer in modelAB.layers[:-1]:
    layer.trainable = False

In [74]:
# Compile after changing `trainable` so the change is taken into account
modelAB.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])

In [75]:
modelAB.fit(X_train_B, y_train_B, epochs=5,
            validation_data=(X_valid_B, y_valid_B))

Epoch 1/5
7/7 ━━━━━━━━━━━━━━━━━━━━ 3s 296ms/step - accuracy: 0.5698 - loss: 3.0383 - val_accuracy: 0.5153 - val_loss: 2.3501
Epoch 2/5
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - accuracy: 0.5698 - loss: 1.9763 - val_accuracy: 0.5153 - val_loss: 1.3027
Epoch 3/5
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step - accuracy: 0.5717 - loss: 1.0755 - val_accuracy: 0.5559 - val_loss: 0.7098
Epoch 4/5
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.6146 - loss: 0.6555 - val_accuracy: 0.6785 - val_loss: 0.6019
Epoch 5/5
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.6719 - loss: 0.5958 - val_accuracy: 0.7418 - val_loss: 0.5887


<keras.src.callbacks.history.History at 0x7a5653be3500>

In [76]:
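# Unfreeze the reused layers, then recompile so the change takes effect
for layer in modelAB.layers[:-1]:
    layer.trainable = True
modelAB.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])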

In [77]:
modelAB.fit(X_train_B, y_train_B, epochs=20,
            validation_data=(X_valid_B, y_valid_B))

Epoch 1/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step - accuracy: 0.7024 - loss: 0.5880 - val_accuracy: 0.7606 - val_loss: 0.5820
Epoch 2/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - accuracy: 0.7317 - loss: 0.5810 - val_accuracy: 0.7755 - val_loss: 0.5753
Epoch 3/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.7317 - loss: 0.5732 - val_accuracy: 0.7883 - val_loss: 0.5687
Epoch 4/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.7380 - loss: 0.5653 - val_accuracy: 0.7943 - val_loss: 0.5623
Epoch 5/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.7555 - loss: 0.5575 - val_accuracy: 0.8032 - val_loss: 0.5560
Epoch 6/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.7636 - loss: 0.5498 - val_accuracy: 0.8061 - val_loss: 0.5499
Epoch 7/20
...

<keras.src.callbacks.history.History at 0x7a5653a67980>

So, what's the final verdict?

In [78]:
# Evaluate on test set B
modelAB.evaluate(X_test_B, y_test_B)

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8755 - loss: 0.4775


[0.4864097237586975, 0.8644999861717224]

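The verdict for this run is negative: `modelAB` reaches about 86.45% accuracy on the test set, roughly 7 percentage points below the 93.5% of model B trained from scratch, so the error rate roughly doubled instead of shrinking. With only 200 training images, results like this can vary considerably with the random seed and the training schedule. For comparison, when transfer does help, say with accuracy rising from 89.7% to 92.25%, the error rate drops by almost 25%: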

In [79]:
1 - (100 - 92.25) / (100 -89.7)

0.24757281553398036
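The same calculation can be wrapped in a small helper and applied both to the illustrative figures and to the accuracies reported by `evaluate()` in this run (93.5% for model B, 86.45% for `modelAB`). This is a sketch; `relative_error_reduction` is a hypothetical name, and a negative value means the error rate grew.

```python
def relative_error_reduction(acc_before, acc_after):
    """Fraction by which the error rate (100 - accuracy, in %) shrinks."""
    error_before = 100 - acc_before
    error_after = 100 - acc_after
    return 1 - error_after / error_before

illustrative = relative_error_reduction(89.7, 92.25)
this_run = relative_error_reduction(93.5, 86.45)

print(f"{illustrative:.3f}")  # 0.248
print(f"{this_run:.3f}")      # -1.085
```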