Transfer learning consists of taking features learned on one problem, and leveraging them on a new, similar problem. For instance, features from a model that has learned to identify raccoons may be useful to kick-start a model meant to identify tanukis.
Transfer learning is usually done for tasks where your dataset has too little data to train a full-scale model from scratch.
The most common incarnation of transfer learning in the context of deep learning is the following workflow:

- Take layers from a previously trained model.
- Freeze them, so as to avoid destroying any of the information they contain during future training rounds.
- Add some new, trainable layers on top of the frozen layers. They will learn to turn the old features into predictions on a new dataset.
- Train the new layers on your dataset.

A last, optional step is fine-tuning, which consists of unfreezing the entire model you obtained above (or part of it), and re-training it on the new data with a very low learning rate.

**Workflow**

- Instantiate a base model and load pre-trained weights into it.
- Freeze all layers in the base model by setting trainable = False.
- Create a new model on top of the output of one (or several) layers from the base model.
- Train your new model on your new dataset.
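The four workflow steps above can be sketched in Keras roughly as follows. This is a minimal sketch, not the notebook's final model. It assumes `weights=None` only so that it runs offline (in practice you would pass `weights="imagenet"` to load the pre-trained weights), a 32 x 32 input, a single `Dense` head, and random stand-in data instead of a real dataset.

```python
import numpy as np
import tensorflow as tf

# 1. Instantiate a base model. weights=None keeps this sketch offline;
#    in practice pass weights="imagenet" to load pre-trained weights.
base_model = tf.keras.applications.VGG16(
    include_top=False, weights=None, input_shape=(32, 32, 3)
)

# 2. Freeze every layer of the base model.
base_model.trainable = False

# 3. Build a new model on top of the base model's output.
inputs = tf.keras.Input(shape=(32, 32, 3))
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# 4. Train only the new layers (random stand-in data here).
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
x_dummy = np.random.rand(16, 32, 32, 3).astype("float32")
y_dummy = np.random.randint(0, 10, size=(16,))
model.fit(x_dummy, y_dummy, epochs=1, batch_size=8, verbose=0)

# Only the kernel and bias of the new Dense layer are trainable
print(len(model.trainable_weights))
```

Because the base is frozen, `fit()` only updates the new head; the base model contributes features but no trainable weights.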

Optionally, follow this with fine-tuning.


**Fine-tuning**

Once your model has converged on the new data, you can try to unfreeze all or part of the base model and retrain the whole model end-to-end with a very low learning rate.

It is critical to only do this step after the model with frozen layers has been trained to convergence. If you mix randomly-initialized trainable layers with trainable layers that hold pre-trained features, the randomly-initialized layers will cause very large gradient updates during training, which will destroy your pre-trained features.
It's also critical to use a very low learning rate at this stage, because you are training a much larger model than in the first round of training, on a dataset that is typically very small. As a result, you are at risk of overfitting very quickly if you apply large weight updates. Here, you only want to readapt the pretrained weights in an incremental way.
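In Keras, fine-tuning amounts to unfreezing the base model, recompiling with a very low learning rate, and calling `fit()` again. The sketch below assumes the head has already been trained to convergence (that step is skipped here). It again uses `weights=None` and random stand-in data only to stay offline and self-contained, and the learning rate `1e-5` is an illustrative choice, not a prescribed value.

```python
import numpy as np
import tensorflow as tf

# Frozen base + new head, as after the feature-extraction stage.
# weights=None keeps the sketch offline; use weights="imagenet" in practice.
base_model = tf.keras.applications.VGG16(
    include_top=False, weights=None, input_shape=(32, 32, 3)
)
base_model.trainable = False
inputs = tf.keras.Input(shape=(32, 32, 3))
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
n_head_weights = len(model.trainable_weights)  # only the new Dense layer

# (The head would be trained to convergence here; skipped in this sketch.)

# Fine-tuning: unfreeze the whole base model ...
base_model.trainable = True
# ... and recompile so the change takes effect, with a very low learning rate
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
n_finetune_weights = len(model.trainable_weights)  # 13 conv layers + head

x_dummy = np.random.rand(16, 32, 32, 3).astype("float32")
y_dummy = np.random.randint(0, 10, size=(16,))
model.fit(x_dummy, y_dummy, epochs=1, batch_size=8, verbose=0)

print(n_head_weights, n_finetune_weights)
```

The jump in trainable weights shows how much larger the model being trained becomes, which is why the learning rate has to be kept small.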


**About BatchNormalization**

Many image models contain BatchNormalization layers. That layer is a special case on every imaginable count. Here are a few things to keep in mind:

- BatchNormalization contains 2 non-trainable weights that get updated during training. These are the variables tracking the mean and variance of the inputs.
- When you set `bn_layer.trainable = False`, the BatchNormalization layer will run in inference mode and will not update its mean and variance statistics. This is not the case for other layers in general, as weight trainability and inference/training modes are two orthogonal concepts. But the two are tied in the case of the BatchNormalization layer.
- When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing `training=False` when calling the base model. Otherwise the updates applied to the non-trainable weights will suddenly destroy what the model has learned.
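These BatchNormalization rules can be checked directly. This sketch uses a standalone `BatchNormalization` layer and a tiny hypothetical base model (`Dense` + `BatchNormalization`) on random data in place of a real image model; the shapes and the learning rate are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(64, 4)).astype("float32")

# 1) A trainable BN layer called in training mode updates its moving statistics.
bn_trainable = tf.keras.layers.BatchNormalization()
bn_trainable(x, training=True)

# 2) With trainable = False, BN runs in inference mode even when training=True
#    is passed, so its moving mean keeps its initial value (zeros).
bn_frozen = tf.keras.layers.BatchNormalization()
bn_frozen.trainable = False
bn_frozen(x, training=True)

print(bn_trainable.moving_mean.numpy())  # moved slightly towards the batch mean
print(bn_frozen.moving_mean.numpy())     # still zeros

# 3) Fine-tuning pattern: call the base model with training=False so that its
#    BN layers stay in inference mode even after the base is unfrozen.
base = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8),
    tf.keras.layers.BatchNormalization(),
])
base.trainable = False
inputs = tf.keras.Input(shape=(4,))
h = base(inputs, training=False)
outputs = tf.keras.layers.Dense(1)(h)
model = tf.keras.Model(inputs, outputs)

base.trainable = True  # unfreeze for fine-tuning
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), loss="mse")

bn = base.layers[-1]
mean_before = bn.moving_mean.numpy().copy()
y = rng.normal(size=(64, 1)).astype("float32")
model.fit(x, y, epochs=1, batch_size=16, verbose=0)
mean_after = bn.moving_mean.numpy()
print(np.allclose(mean_before, mean_after))  # moving statistics were not updated
```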



In [1]:
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical



In [2]:
# Load CIFAR-10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

Our base model will be VGG16, so we need to apply the appropriate preprocessing.
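For VGG16, `preprocess_input` uses the "caffe" convention: it converts RGB images to BGR and subtracts the ImageNet per-channel means, without rescaling to [0, 1]. The sketch below checks this against a manual computation on a small random batch (the batch and its shape are arbitrary). The function may modify float NumPy arrays in place, which is why a copy is passed.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import preprocess_input

rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(2, 4, 4, 3)).astype("float32")  # fake RGB batch

# Pass a copy: preprocess_input may modify float NumPy arrays in place
out = preprocess_input(images.copy())

# Manual equivalent: flip RGB -> BGR, then subtract the ImageNet means (BGR order)
imagenet_mean_bgr = np.array([103.939, 116.779, 123.68], dtype="float32")
manual = images[..., ::-1] - imagenet_mean_bgr

print(np.allclose(out, manual, atol=1e-3))  # True
```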

In [3]:
# Resize CIFAR-10 images from 32x32 to a larger size (e.g. 48x48 or 64x64) for VGG16
IMG_SIZE = 48

def resize_images(images):
    return tf.image.resize(images, (IMG_SIZE, IMG_SIZE))

In [4]:
x_train_resized = resize_images(x_train)
x_test_resized = resize_images(x_test)

# Preprocess input for VGG16 (RGB -> BGR conversion and ImageNet mean subtraction; no rescaling)
x_train_preprocessed = preprocess_input(x_train_resized)
x_test_preprocessed = preprocess_input(x_test_resized)

# One-hot encode labels
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)



In [6]:
import tensorflow as tf

def feature_extractor(inputs):
    """Extract features with VGG16 pre-trained on ImageNet (input size 224 x 224)."""
    vgg = tf.keras.applications.VGG16(input_shape=(224, 224, 3),
                                      include_top=False,
                                      weights='imagenet')
    vgg.trainable = False  # Freeze the feature extractor
    return vgg(inputs)


def classifier(inputs):
    """Add the final dense layers and the softmax output for the 10 CIFAR-10 classes."""
    x = tf.keras.layers.GlobalAveragePooling2D()(inputs)
    x = tf.keras.layers.Flatten()(x)  # no-op: GlobalAveragePooling2D already outputs (None, 512)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    x = tf.keras.layers.Dense(512, activation="relu")(x)
    x = tf.keras.layers.Dense(10, activation="softmax", name="classification")(x)
    return x


def final_model(inputs):
    """Upsample the 32 x 32 CIFAR-10 images to 224 x 224, then connect
    the feature extractor and the classifier to build the final model."""
    resize = tf.keras.layers.UpSampling2D(size=(7, 7))(inputs)  # 32 x 7 = 224
    vgg_features = feature_extractor(resize)
    classification_output = classifier(vgg_features)
    return classification_output


def define_compile_model():
    """Define the model and compile it with SGD and categorical cross-entropy."""
    inputs = tf.keras.layers.Input(shape=(32, 32, 3))
    output = final_model(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=output)

    model.compile(optimizer='SGD',
                  loss='categorical_crossentropy',  # labels are one-hot encoded (y_train_cat)
                  metrics=['accuracy'])
    return model


# Instantiate the model
model = define_compile_model()


model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 up_sampling2d (UpSampling2  (None, 224, 224, 3)       0         
 D)                                                              
                                                                 
 vgg16 (Functional)          (None, 7, 7, 512)         14714688  
                                                                 
 global_average_pooling2d (  (None, 512)               0         
 GlobalAveragePooling2D)                                         
                                                                 
 flatten (Flatten)           (None, 512)              
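The model above feeds the raw 32 x 32 images through `UpSampling2D(size=(7, 7))`, which enlarges them by nearest-neighbour repetition (every input pixel becomes a 7 x 7 block of identical values), unlike the default bilinear interpolation of `tf.image.resize`. A small check on a synthetic image (an arbitrary `arange` ramp, chosen so every position has a distinct value):

```python
import numpy as np
import tensorflow as tf

# Synthetic 32x32 RGB image with a distinct value at every position
image = np.arange(32 * 32 * 3, dtype="float32").reshape(1, 32, 32, 3)
upsampled = np.asarray(tf.keras.layers.UpSampling2D(size=(7, 7))(image))

print(upsampled.shape)  # (1, 224, 224, 3)

# Input pixel (row 0, col 0) fills output rows 0-6 and cols 0-6
print(np.all(upsampled[0, 0:7, 0:7, :] == image[0, 0, 0, :]))  # True
# Input pixel (row 1, col 0) fills output rows 7-13 and cols 0-6
print(np.all(upsampled[0, 7:14, 0:7, :] == image[0, 1, 0, :]))  # True
```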

In [7]:
# Train
print("Stage 1: Training classifier only (feature extraction)")
model.fit(x_train, y_train_cat,
          validation_data=(x_test, y_test_cat),
          epochs=2,
          batch_size=64)

Epoch 1/2




  5/782 [..............................] - ETA: 1:32:22 - loss: 4.3607 - accuracy: 0.1344

KeyboardInterrupt: 

Calling `model.compile()` again does NOT reset or forget the learned weights:

- When you call `model.fit()`, training updates the model's weights.
- Changing `layer.trainable` flags changes which weights will be updated in subsequent training; call `model.compile()` again afterwards so that the change takes effect.
- Calling `model.compile()` again only updates the training configuration, e.g. the optimizer, loss, metrics and learning rate.
- The model's weights stay intact across recompiles, so previously learned information is preserved.

This is why the fine-tuning workflow works:

- First, train the model (usually with some layers frozen).
- Then unfreeze some layers (e.g., the `block5` layers).
- Recompile the model with a lower-learning-rate optimizer.
- Finally, call `fit()` again to continue training, now including the unfrozen layers.

This fine-tunes those layers without losing the previous training progress.
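A small check of this, using a hypothetical two-layer model on random data: the weights are identical before and after freezing a layer and recompiling, and only the set of trainable weights changes.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4)).astype("float32")
y = rng.normal(size=(32, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, name="first"),
    tf.keras.layers.Dense(1, name="second"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")
model.fit(x, y, epochs=2, verbose=0)
weights_after_training = [w.copy() for w in model.get_weights()]

# Freeze the first layer, then recompile with a lower learning rate
model.get_layer("first").trainable = False
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4), loss="mse")
weights_after_recompile = model.get_weights()

unchanged = all(
    np.array_equal(a, b)
    for a, b in zip(weights_after_training, weights_after_recompile)
)
print(unchanged)                     # True: recompiling kept the learned weights
print(len(model.trainable_weights))  # 2: only the second layer's kernel and bias
```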

-------------
What does `compile()` do?

`compile()` defines the loss function, the optimizer and the metrics. That's all.

It does not touch the weights, so you can compile a model as many times as you want without harming the pretrained weights.

**So far: feature extraction**

- You're using a pretrained model (VGG16 with ImageNet weights).

- You freeze its layers with `vgg.trainable = False`, so their weights are not updated during training (you can confirm this by comparing the layer weights before and after training).

- You're only training the new classifier head on your dataset (CIFAR-10).

- You're not fine-tuning the base model's convolutional layers.
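One way to confirm that frozen layers are not updated is to snapshot their weights before `fit()` and compare afterwards. The sketch uses a small hypothetical base model (`base`, randomly initialized) in place of VGG16 and random stand-in data; with the notebook's model, the same comparison could be done on `model.layers[2].get_weights()`.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a pre-trained base model
base = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
], name="base")
base.trainable = False

inputs = tf.keras.Input(shape=(8,))
h = base(inputs, training=False)
outputs = tf.keras.layers.Dense(3, activation="softmax", name="head")(h)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Snapshot the weights before training
base_before = [w.copy() for w in base.get_weights()]
head_before = [w.copy() for w in model.get_layer("head").get_weights()]

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8)).astype("float32")
y = rng.integers(0, 3, size=(64,))
model.fit(x, y, epochs=3, batch_size=16, verbose=0)

base_unchanged = all(
    np.array_equal(a, b) for a, b in zip(base_before, base.get_weights())
)
head_changed = any(
    not np.array_equal(a, b)
    for a, b in zip(head_before, model.get_layer("head").get_weights())
)
print(base_unchanged, head_changed)  # True True
```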


### Fine-tuning

Here's what we'll do:

- Train only the classifier first (feature extraction).
- Then unfreeze the top layers of VGG16, here the last convolutional block (`block5`).
- Recompile and continue training with a lower learning rate.
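When only part of a frozen base model should be unfrozen, the order of operations matters: in Keras, a model whose own `trainable` attribute is `False` reports no trainable weights, even if some of its inner layers are set back to `trainable = True`. The sketch below shows this on VGG16 (with `weights=None` purely to stay offline; the layer structure is the same). The working pattern is to unfreeze the parent first and then re-freeze every layer outside `block5`.

```python
import tensorflow as tf

# weights=None keeps the sketch offline; layer names match the ImageNet version
base = tf.keras.applications.VGG16(
    include_top=False, weights=None, input_shape=(32, 32, 3)
)
base.trainable = False  # frozen, as in the feature-extraction stage

# Pitfall: re-enabling only the block5 layers while the parent stays frozen
for layer in base.layers:
    if "block5" in layer.name:
        layer.trainable = True
n_pitfall = len(base.trainable_weights)
print(n_pitfall)  # 0: a frozen parent exposes no trainable weights

# Working pattern: unfreeze the parent, then re-freeze every layer outside block5
base.trainable = True
for layer in base.layers:
    layer.trainable = "block5" in layer.name
n_block5 = len(base.trainable_weights)
print(n_block5)  # 6: kernel and bias of block5_conv1, block5_conv2, block5_conv3
print([layer.name for layer in base.layers if layer.trainable])
```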

In [8]:
# Access the base model: VGG16 is the 3rd layer of the full model (the first two are the Input and UpSampling2D layers)
model.layers[2].summary()

Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0     

In [None]:
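# === Stage 2: Fine-tune top VGG layers ===
# Unfreeze only the last convolutional block (block5: block5_conv1-3 and block5_pool).
# vgg.trainable = False froze the whole VGG16 sub-model, and a frozen parent exposes
# no trainable weights, so unfreeze it first and then re-freeze every layer outside block5.
base_model = model.layers[2]
base_model.trainable = True
for layer in base_model.layers:
    layer.trainable = 'block5' in layer.name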

# Compile with lower learning rate
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print("Stage 2: Fine-tuning VGG16 top layers")
model.fit(x_train, y_train_cat,
          validation_data=(x_test, y_test_cat),
          epochs=5,
          batch_size=64)


# Calling model.compile() again does NOT reset or forget the learned weights.

In [None]:
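# Run both stages and store their histories so they can be plotted together
import matplotlib.pyplot as plt

# === Stage 1: feature extraction ===
# Rebuild the model so that this stage starts from a frozen VGG16 and a fresh
# classifier, not from the state left behind by the cells above
model = define_compile_model()
history1 = model.fit(x_train, y_train_cat,
                     validation_data=(x_test, y_test_cat),
                     epochs=5,
                     batch_size=64)

# === Stage 2: fine-tuning, unfreeze block5 only ===
base_model = model.layers[2]
base_model.trainable = True  # re-enable the frozen VGG16 sub-model first
for layer in base_model.layers:
    layer.trainable = 'block5' in layer.name

# Recompile with a lower learning rate (needed for the trainable changes to take effect)
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Continue training from the stage-1 weights
history2 = model.fit(x_train, y_train_cat,
                     validation_data=(x_test, y_test_cat),
                     epochs=5,
                     batch_size=64)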

# === Combine and plot training history ===
acc = history1.history['accuracy'] + history2.history['accuracy']
val_acc = history1.history['val_accuracy'] + history2.history['val_accuracy']
epochs = list(range(1, len(acc) + 1))

plt.figure(figsize=(8, 5))
plt.plot(epochs, acc, label='Training Accuracy')
plt.plot(epochs, val_acc, label='Validation Accuracy')
plt.axvline(x=len(history1.history['accuracy']), color='red', linestyle='--', label='Start Fine-Tuning')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
