### Simple example of using a convolutional base for feature extraction

This is useful when you have trained your own large model on a sizeable dataset and want to reuse the base weights to train a new model, saving computation. Similar methods work with pretrained models: use the convolutional base with frozen weights, or use it for feature extraction. See the `pretrained_models` notebook and the readings.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# As an example, one-hot encode the labels and use categorical_crossentropy as the loss.
# This is equivalent to keeping the integer labels and using sparse_categorical_crossentropy.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Define the model architecture using the Functional API
inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)

model = keras.Model(inputs=inputs, outputs=outputs)

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, batch_size=128, epochs=5, validation_split=0.1)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step
Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 7s 15ms/step - accuracy: 0.9337 - loss: 0.2181 - val_accuracy: 0.9823 - val_loss: 0.0620
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 7s 15ms/step - accuracy: 0.9803 - loss: 0.0635 - val_accuracy: 0.9877 - val_loss: 0.0470
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 7s 16ms/step - accuracy: 0.9869 - loss: 0.0431 - val_accuracy: 0.9882 - val_loss: 0.0446
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 7s 16ms/step - accuracy: 0.9896 - loss: 0.0324 - val_accuracy: 0.9887 - val_loss: 0.0393
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 7s 17ms/step - accuracy: 0.9922 - loss: 0.0253 - val_accuracy: 0.9893 - val_loss: 0.0325


<keras.src.callbacks.history.History at 0x313568a10>
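
The comment in the training cell says that one-hot labels with `categorical_crossentropy` should match integer labels with `sparse_categorical_crossentropy`. The following is a small NumPy check of that claim, offered as a sketch: the logits and labels are made up for illustration, and both loss expressions are written out by hand rather than taken from Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch: 4 samples, 10 classes (values are illustrative only).
logits = rng.normal(size=(4, 10))
int_labels = np.array([3, 0, 7, 7])

# Softmax, shifted by the row maximum for numerical stability.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Categorical cross-entropy: one-hot targets, summed over the class axis.
one_hot = np.eye(10)[int_labels]
cce = -(one_hot * np.log(probs)).sum(axis=1)

# Sparse categorical cross-entropy: index the true-class probability directly.
scce = -np.log(probs[np.arange(len(int_labels)), int_labels])

print(np.allclose(cce, scce))
```

Every term of the one-hot sum except the true-class term is multiplied by zero, so both expressions reduce to the negative log-probability of the true class. The sparse form simply skips building the one-hot matrix.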

In [5]:
model.save("mnist_model.keras")

In [8]:
model = keras.models.load_model("mnist_model.keras")


In [9]:
model.summary()

#### Convolutional base output

In [12]:
print(f"Layer 4 ({model.layers[4].name}): {model.layers[4].output.shape}")

print(f"Layer 4 type: {type(model.layers[4]).__name__}")

Layer 4 (max_pooling2d_1): (None, 5, 5, 64)
Layer 4 type: MaxPooling2D
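
The `(None, 5, 5, 64)` shape follows from the layer arithmetic, assuming the Keras defaults used in the model definition (`padding="valid"` and stride 1 for `Conv2D`, and a pooling stride equal to the pool size). The helper names below are hypothetical, written only to trace the spatial size through the four layers.

```python
def conv_out(size, kernel, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool


trace = [28]                          # MNIST images are 28 x 28
trace.append(conv_out(trace[-1], 3))  # Conv2D(32, 3x3)
trace.append(pool_out(trace[-1], 2))  # MaxPooling2D(2x2)
trace.append(conv_out(trace[-1], 3))  # Conv2D(64, 3x3)
trace.append(pool_out(trace[-1], 2))  # MaxPooling2D(2x2)

size = trace[-1]
flat = size * size * 64               # values per image after Flatten
print(trace, flat)
```

The channel dimension (64) comes from the filter count of the second `Conv2D`, and `flat` is the number of values `Flatten` produces per image from this output.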


In [13]:
inputs = model.input
print(inputs)

<KerasTensor shape=(None, 28, 28, 1), dtype=float32, sparse=False, ragged=False, name=input_layer>


In [14]:
conv_base = keras.Model(inputs=model.input, outputs=model.layers[4].output)
conv_base.summary()

#### Get the features, use as inputs to a new model

In [15]:
features = conv_base.predict(x_train)
features.shape

1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step


(60000, 5, 5, 64)

In [16]:
top_inputs = keras.Input(shape=(5, 5, 64))
x = layers.Flatten()(top_inputs)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation='softmax')(x)

model_recon = keras.Model(inputs=top_inputs, outputs=outputs)

# Compile the model
model_recon.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model_recon.fit(features, y_train, batch_size=128, epochs=5, validation_split=0.1)

Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9628 - loss: 0.1203 - val_accuracy: 0.9868 - val_loss: 0.0466
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9805 - loss: 0.0624 - val_accuracy: 0.9895 - val_loss: 0.0399
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9826 - loss: 0.0584 - val_accuracy: 0.9877 - val_loss: 0.0476
Epoch 4/5
422/422 ━━━

<keras.src.callbacks.history.History at 0x3137c83e0>

To evaluate, you also need the test features, obtained by running `conv_base.predict` on `x_test`. The `get_features_and_labels` function in the `pretrained_models` notebook (reproduced from Chollet listing 8.20) does all this concisely. Another option is to stack your top layers on the base, freeze the base weights, and train only the top layers; or better yet, use the entire pretrained model including the top and freeze the convolution layers, perhaps leaving the last one or two convolution layers unfrozen. Then you don't have to call `predict` to get the features and feed them into a separate prediction model. The drawback is that every observation passes through the frozen layers on every epoch instead of just once, which is more computationally expensive.
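
The two routes described above can be sketched side by side. This is a minimal illustration under stated assumptions, not the notebook's actual run: it uses random synthetic arrays instead of MNIST so that it runs without a download, and `demo_base`, `head`, and `stacked` are hypothetical names for an untrained stand-in base, a new top, and the combined model. With the notebook's objects you would use `conv_base`, `x_test`, and `y_test` instead.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in data (random pixels and labels, for illustration only).
rng = np.random.default_rng(0)
x_tr = rng.random((64, 28, 28, 1)).astype("float32")
y_tr = keras.utils.to_categorical(rng.integers(0, 10, size=64), 10)
x_te = rng.random((16, 28, 28, 1)).astype("float32")
y_te = keras.utils.to_categorical(rng.integers(0, 10, size=16), 10)

# Untrained stand-in for conv_base, with the same layer layout as the notebook.
base_in = keras.Input(shape=(28, 28, 1))
h = layers.Conv2D(32, (3, 3), activation="relu")(base_in)
h = layers.MaxPooling2D((2, 2))(h)
h = layers.Conv2D(64, (3, 3), activation="relu")(h)
h = layers.MaxPooling2D((2, 2))(h)
demo_base = keras.Model(base_in, h)
demo_base.trainable = False  # freeze every layer in the base

# Route 1: extract features once, then train and evaluate a head on them.
train_feats = demo_base.predict(x_tr, verbose=0)
test_feats = demo_base.predict(x_te, verbose=0)

head_in = keras.Input(shape=(5, 5, 64))
z = layers.Flatten()(head_in)
z = layers.Dense(256)(z)
z = layers.Dropout(0.5)(z)
head_out = layers.Dense(10, activation="softmax")(z)
head = keras.Model(head_in, head_out)
head.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
head.fit(train_feats, y_tr, batch_size=16, epochs=1, verbose=0)
feat_loss, feat_acc = head.evaluate(test_feats, y_te, verbose=0)

# Route 2: stack the same head on the frozen base; raw images go in directly.
img_in = keras.Input(shape=(28, 28, 1))
stacked = keras.Model(img_in, head(demo_base(img_in, training=False)))
same_preds = np.allclose(
    stacked.predict(x_te, verbose=0), head.predict(test_feats, verbose=0), atol=1e-4
)

# Further training of the stacked model updates only the head's weights.
base_before = [w.copy() for w in demo_base.get_weights()]
stacked.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
stacked.fit(x_tr, y_tr, batch_size=16, epochs=1, verbose=0)
base_unchanged = all(
    np.array_equal(b, a) for b, a in zip(base_before, demo_base.get_weights())
)
print(same_preds, base_unchanged, len(stacked.trainable_weights))
```

Route 1 runs the base once per image and reuses the cached features on every epoch. Route 2 recomputes the frozen layers on every batch, but it accepts raw images, so it is the natural starting point if you later unfreeze the last convolution layers for fine-tuning.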