<a href="https://colab.research.google.com/github/FedericoSabbadini/DeepLearning/blob/main/KerasModelReuse.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

This project requires Python 3.7 or above:

In [1]:
import sys
assert sys.version_info >= (3, 7)

import numpy as np

And TensorFlow ≥ 2.8:

In [2]:
from packaging import version
import tensorflow as tf

assert version.parse(tf.__version__) >= version.parse("2.8.0")

As we did in previous chapters, let's define the default font sizes to make the figures prettier:

In [3]:
import matplotlib.pyplot as plt

plt.rc('font', size=14)
plt.rc('axes', labelsize=14, titlesize=14)
plt.rc('legend', fontsize=14)
plt.rc('xtick', labelsize=10)
plt.rc('ytick', labelsize=10)

And let's create the `images/deep` folder (if it doesn't already exist), and define the `save_fig()` function, which is used throughout this notebook to save the figures in high resolution for the book:

In [4]:
from pathlib import Path

IMAGES_PATH = Path() / "images" / "deep"
IMAGES_PATH.mkdir(parents=True, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = IMAGES_PATH / f"{fig_id}.{fig_extension}"
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

In [5]:
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

In [6]:
fashion_mnist = tf.keras.datasets.fashion_mnist.load_data()
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist
X_train, y_train = X_train_full[:-5000], y_train_full[:-5000]
X_valid, y_valid = X_train_full[-5000:], y_train_full[-5000:]
X_train, X_valid, X_test = X_train / 255, X_valid / 255, X_test / 255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [7]:
X_train.shape
# X_test.shape

(55000, 28, 28)

# Reusing Pretrained Layers

### Reusing a Keras model

Let's split the Fashion MNIST training set in two:
* `X_train_A`: all images of all items except for T-shirts/tops and pullovers (classes 0 and 2).
* `X_train_B`: a much smaller training set of just the first 200 images of T-shirts/tops and pullovers.

The validation set and the test set are also split this way, but without restricting the number of images.

We will train a model on set A (a classification task with 8 classes), and try to reuse it to tackle set B (binary classification). We hope to transfer a little bit of knowledge from task A to task B, since the classes in set A (trousers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots) are somewhat similar to the classes in set B (T-shirts/tops and pullovers). However, since we are using `Dense` layers, only patterns that occur at the same location can be reused (in contrast, convolutional layers will transfer much better, since learned patterns can be detected anywhere on the image, as we will see in Chapter 14).
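
To make the location argument concrete, here is a toy NumPy sketch. It is not part of the notebook's pipeline: the 8-pixel 1D "image", the dense weights, and the two-pixel kernel are all made up for illustration. A dense unit has one weight per input position, so it stops responding when the pattern moves, whereas a sliding kernel finds the pattern wherever it is.

```python
import numpy as np

# Toy 1D "image": the pattern [1, 1] sits at positions 2-3
pattern_at_2 = np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=np.float32)
# Same pattern shifted three pixels to the right (positions 5-6)
pattern_at_5 = np.roll(pattern_at_2, 3)

# A dense unit tuned to the pattern at positions 2-3 (one weight per pixel)
dense_weights = pattern_at_2.copy()
dense_same = float(dense_weights @ pattern_at_2)     # strong response
dense_shifted = float(dense_weights @ pattern_at_5)  # no response

# A convolution-style unit slides one small kernel over every position
kernel = np.array([1, 1], dtype=np.float32)
conv_same = float(np.correlate(pattern_at_2, kernel, mode="valid").max())
conv_shifted = float(np.correlate(pattern_at_5, kernel, mode="valid").max())

print(dense_same, dense_shifted)  # 2.0 0.0
print(conv_same, conv_shifted)    # 2.0 2.0
```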

## Train Model A

In [8]:
# extra code – split Fashion MNIST into tasks A and B, then train and save
#              model A to "my_model_A.keras".

pos_class_id = class_names.index("Pullover")
neg_class_id = class_names.index("T-shirt/top")

def split_dataset(X, y):
    y_for_B = (y == pos_class_id) | (y == neg_class_id)
    y_A = y[~y_for_B]
    y_B = (y[y_for_B] == pos_class_id).astype(np.float32)
    old_class_ids = list(set(range(10)) - set([neg_class_id, pos_class_id]))
    for old_class_id, new_class_id in zip(old_class_ids, range(8)):
        y_A[y_A == old_class_id] = new_class_id  # reorder class ids for A
    return ((X[~y_for_B], y_A), (X[y_for_B], y_B))

(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train, y_train)
(X_valid_A, y_valid_A), (X_valid_B, y_valid_B) = split_dataset(X_valid, y_valid)
(X_test_A, y_test_A), (X_test_B, y_test_B) = split_dataset(X_test, y_test)
X_train_B = X_train_B[:200]
y_train_B = y_train_B[:200]

In [9]:
X_train_A.shape

(44011, 28, 28)
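
The label remapping inside `split_dataset()` can be checked on a small example. The sketch below repeats the same logic on a made-up label array (the array `y` is hypothetical, and the class ids 2 and 0 are hard-coded to match "Pullover" and "T-shirt/top" in `class_names`):

```python
import numpy as np

pos_class_id, neg_class_id = 2, 0  # "Pullover" and "T-shirt/top"

# Made-up labels covering both tasks
y = np.array([0, 1, 2, 3, 9, 2, 5])

y_for_B = (y == pos_class_id) | (y == neg_class_id)
y_A = y[~y_for_B]  # boolean indexing returns a copy: [1, 3, 9, 5]
y_B = (y[y_for_B] == pos_class_id).astype(np.float32)  # 1.0 = pullover

# Map the 8 remaining class ids (1, 3, 4, ..., 9) onto 0..7, in ascending order
old_class_ids = sorted(set(range(10)) - {neg_class_id, pos_class_id})
for old_class_id, new_class_id in zip(old_class_ids, range(8)):
    y_A[y_A == old_class_id] = new_class_id

print(y_A)  # [0 1 7 3]
print(y_B)  # [0. 1. 1.]
```

Because the old ids are processed in ascending order and each new id is smaller than the old id it replaces, an already remapped label is never remapped a second time.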

In [10]:
tf.random.set_seed(43)

model_A = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(8, activation="softmax")  # 8 output classes; softmax yields a probability distribution
])

model_A.compile(loss="sparse_categorical_crossentropy",  # "sparse": labels are integer class ids, not one-hot vectors
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])  # another metric would be better for imbalanced classes; here they are balanced
history = model_A.fit(X_train_A, y_train_A, epochs=20,
                      validation_data=(X_valid_A, y_valid_A))

Epoch 1/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.5148 - loss: 1.5510 - val_accuracy: 0.7724 - val_loss: 0.7260
Epoch 2/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7811 - loss: 0.6746 - val_accuracy: 0.8433 - val_loss: 0.5298
Epoch 3/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8369 - loss: 0.5210 - val_accuracy: 0.8641 - val_loss: 0.4500
Epoch 4/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8589 - loss: 0.4486 - val_accuracy: 0.8706 - val_loss: 0.4059
Epoch 5/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.8681 - loss: 0.4052 - val_accuracy: 0.8742 - val_loss: 0.3777
Epoch 6/20
1376/1376 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8758 - loss: 0.3758 - val_accuracy: 0.8794 - val_loss: 0.3576
Epoch 7/20
...

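(1376 is the number of batches processed in sequence per epoch: the 44,011 training images divided by the default batch size of 32, rounded up. Increasing the batch size reduces the number of iterations per epoch, but each iteration processes more data and therefore takes longer.)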
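
The step counts shown in the Keras progress bars follow from the dataset size and the batch size. Here is a minimal sketch, assuming the default `batch_size=32` of `fit()`; the helper name `steps_per_epoch` is hypothetical, and the sample counts are the sizes of `X_train_A` (44,011) and `X_train_B` (200):

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(44011))                 # 1376 (task A training set)
print(steps_per_epoch(200))                   # 7 (task B training set)
print(steps_per_epoch(44011, batch_size=64))  # 688: fewer, larger steps
```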

In [11]:
model_A.evaluate(X_test_A, y_test_A)

250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9002 - loss: 0.2909


[0.28933990001678467, 0.8997499942779541]

Model A reaches about 90.0% accuracy on its test set.

In [12]:
model_A.save("my_model_A.keras")

## Train Model B

In [13]:
# extra code – train and evaluate model B, without reusing model A

tf.random.set_seed(43)
model_B = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(1, activation="sigmoid")  # a single output unit; no softmax needed
])

model_B.compile(loss="binary_crossentropy",  # there are only two classes now
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])
history = model_B.fit(X_train_B, y_train_B, epochs=20,
                      validation_data=(X_valid_B, y_valid_B))

Epoch 1/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 2s 224ms/step - accuracy: 0.4460 - loss: 0.6784 - val_accuracy: 0.5163 - val_loss: 0.6402
Epoch 2/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - accuracy: 0.4977 - loss: 0.6383 - val_accuracy: 0.5816 - val_loss: 0.6142
Epoch 3/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.6449 - loss: 0.6091 - val_accuracy: 0.6855 - val_loss: 0.5935
Epoch 4/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - accuracy: 0.7585 - loss: 0.5854 - val_accuracy: 0.7606 - val_loss: 0.5753
Epoch 5/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.8319 - loss: 0.5643 - val_accuracy: 0.8051 - val_loss: 0.5592
Epoch 6/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - accuracy: 0.8769 - loss: 0.5454 - val_accuracy: 0.8398 - val_loss: 0.5444
Epoch 7/20
...

In [14]:
model_B.evaluate(X_test_B, y_test_B)

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9294 - loss: 0.4085


[0.4120299220085144, 0.9225000143051147]

Model B reaches 92.25% accuracy on the test set. Now let's try reusing the pretrained model A.

## Reuse Model A

In [27]:
tf.random.set_seed(43)  # extra code – ensure reproducibility

model_A = tf.keras.models.load_model("my_model_A.keras")

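Note that if we build the new model directly from `model_A`'s layers, the two models share those layers, so training one updates both. If we want to avoid that, we need to build `modelAB` on top of a *clone* of `model_A`. Since `clone_model()` copies only the architecture, the weights must be copied separately: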

In [28]:
model_A_clone = tf.keras.models.clone_model(model_A)
model_A_clone.set_weights(model_A.get_weights())
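
The difference between sharing layers and cloning them can be illustrated with a plain NumPy analogy. This is a sketch, not Keras code: the lists stand in for layer weights, and the in-place subtraction stands in for a training step.

```python
import numpy as np

# Stand-in for model A's weights: one hidden "layer" and one output "layer"
weights_A = [np.ones((2, 2)), np.zeros(2)]

# Reusing the objects themselves (analogous to reusing model A's layers)
shared = weights_A[:-1]
# Copying the values into new objects (analogous to clone_model + set_weights)
cloned = [w.copy() for w in weights_A[:-1]]

# An in-place update through the shared reference, like a training step
shared[0] -= 0.1

print(shared[0] is weights_A[0])         # True: same object, so A changed too
print(np.allclose(weights_A[0], 0.9))    # True
print(np.allclose(cloned[0], 1.0))       # True: the copy is unaffected
```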

In [29]:
# build modelAB on the clone's layers (all but the output layer), so model_A is left untouched
modelAB = tf.keras.Sequential(model_A_clone.layers[:-1])

In [30]:
modelAB.add(tf.keras.layers.Dense(1, activation="sigmoid"))

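In [31]:
# Freeze the reused layers so that only the new output layer is trained at first
for layer in modelAB.layers[:-1]:
    layer.trainable = False

In [32]:
# Compile after freezing: Keras applies changes to `trainable` when the model is compiled
modelAB.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])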

In [33]:
modelAB.fit(X_train_B, y_train_B, epochs=5,
            validation_data=(X_valid_B, y_valid_B))

Epoch 1/5
7/7 ━━━━━━━━━━━━━━━━━━━━ 2s 204ms/step - accuracy: 0.9191 - loss: 0.3413 - val_accuracy: 0.9179 - val_loss: 0.3682
Epoch 2/5
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - accuracy: 0.9563 - loss: 0.3241 - val_accuracy: 0.9169 - val_loss: 0.3630
Epoch 3/5
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step - accuracy: 0.9639 - loss: 0.3212 - val_accuracy: 0.9100 - val_loss: 0.3609
Epoch 4/5
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - accuracy: 0.9639 - loss: 0.3194 - val_accuracy: 0.9080 - val_loss: 0.3589
Epoch 5/5
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - accuracy: 0.9639 - loss: 0.3173 - val_accuracy: 0.9080 - val_loss: 0.3568


<keras.src.callbacks.history.History at 0x7abd941e1be0>

In [34]:
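# Unfreeze the reused layers, then recompile so that the change takes effect
for layer in modelAB.layers[:-1]:
    layer.trainable = True
modelAB.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])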

In [35]:
modelAB.fit(X_train_B, y_train_B, epochs=20,
            validation_data=(X_valid_B, y_valid_B))

Epoch 1/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step - accuracy: 0.9639 - loss: 0.3151 - val_accuracy: 0.9080 - val_loss: 0.3547
Epoch 2/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.9639 - loss: 0.3127 - val_accuracy: 0.9080 - val_loss: 0.3526
Epoch 3/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.9639 - loss: 0.3104 - val_accuracy: 0.9080 - val_loss: 0.3505
Epoch 4/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - accuracy: 0.9639 - loss: 0.3080 - val_accuracy: 0.9090 - val_loss: 0.3485
Epoch 5/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.9639 - loss: 0.3058 - val_accuracy: 0.9090 - val_loss: 0.3465
Epoch 6/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.9639 - loss: 0.3036 - val_accuracy: 0.9090 - val_loss: 0.3445
Epoch 7/20
...

<keras.src.callbacks.history.History at 0x7abd9751a840>

So, what's the final verdict?

In [36]:
# Evaluate on task B's test set
modelAB.evaluate(X_test_B, y_test_B)

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9139 - loss: 0.3196


[0.3236272931098938, 0.9089999794960022]

In this run, there was no positive transfer: the accuracy went down by about 1.35 percentage points, from 92.25% for model B trained from scratch to 90.9% for the model built on model A's layers. This means the error rate rose by about 17%:

In [37]:
round((100 - 90.9) / (100 - 92.25) - 1, 3)

0.174