# Transfer Learning

_Transfer learning_ consists of reusing a model that was previously trained on a large,
generic dataset (for example, ImageNet, with millions of images) as a starting point for
solving a specific task. Instead of training a network from scratch, the representations
already learned by the model are leveraged. These typically capture basic patterns such
as edges, textures, and shapes, as well as more complex compositions.

This approach notably reduces the amount of data required, accelerates training, and
typically provides better performance when working with small or medium-sized datasets.
The network already "knows" general visual features, and it is only necessary to
specialize it for the new task.

## Basic Transfer Learning Strategies

In practice, pretrained models are used following one of three main strategies, which
differ in which parts of the model are updated during training.

### Feature Extraction

In the _feature extraction_ strategy, all model parameters are frozen except the last
classification layer. In this way, the pretrained model acts as a fixed feature extractor
and only a lightweight classifier is trained on top.

In [None]:
# 3pps
import torch
import torch.nn as nn
import torchvision.models as models


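# Load pretrained model (the `weights` argument requires torchvision >= 0.13;
# older versions use pretrained=True instead)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)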

# Freeze ALL layers
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer (classifier)
num_classes = 2  # Example: dogs vs cats
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only model.fc parameters are trained

This option is especially suitable when the dataset is very small (fewer than about 1,000
images) and the images are relatively similar to those in ImageNet (natural scenes,
everyday objects). Training is fast and the risk of overfitting is reduced, since most
weights remain fixed.

### Partial Fine-Tuning (Fine-Tuning Upper Layers)

In _partial fine-tuning_, the earliest layers (those closest to the input) are frozen and
the last convolutional layers are unfrozen together with the classifier. The idea is to
preserve the most generic features (edges, textures) and adapt high-level representations
to the new task.

In [None]:
# 3pps
import torch
import torch.nn as nn
import torchvision.models as models


# Load model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze first layers (for example, all except layer4 and fc)
for name, param in model.named_parameters():
    if "layer4" not in name and "fc" not in name:
        param.requires_grad = False

# Replace classifier
num_classes = 2
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Optimizer with different learning rates
optimizer = torch.optim.Adam(
    [
        {
            "params": model.layer4.parameters(),
            "lr": 1e-4,
        },  # Low LR for pretrained layers
        {"params": model.fc.parameters(), "lr": 1e-3},  # Higher LR for new layer
    ]
)

This approach is appropriate for medium-sized datasets (on the order of 1,000 to 10,000
images) and when the domain is moderately different from ImageNet (for example, medical
images that still share some visual patterns with natural photographs).

### Full Fine-Tuning

In _full fine-tuning_, all parameters of the pretrained model are retrained, albeit with
a relatively low learning rate to avoid abruptly destroying prior knowledge.

In [None]:
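# 3pps
import torch
import torch.nn as nn
import torchvision.models as models


# Load model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)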

# Replace classifier
num_classes = 2
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Entire model is trainable (low LR)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

This strategy is recommended when a large dataset is available (more than 10,000 images)
or when the domain is very different from the original training domain (for example, very
specific medical images, satellite images, or industrial data with unusual textures). In
exchange for a higher computational cost and a greater risk of overfitting on small
datasets, it usually offers the highest attainable accuracy of the three strategies.
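
The rules of thumb given for the three strategies can be condensed into a small helper. This is only a sketch of the heuristics stated in this section: the function name `suggest_strategy` and the `domain_shift` labels are hypothetical, and the thresholds are approximate guidelines rather than hard limits.

```python
def suggest_strategy(num_images: int, domain_shift: str) -> str:
    """Map the rules of thumb from this section to a strategy name.

    domain_shift is a hypothetical label: "low", "moderate", or "high",
    describing how different the target images are from ImageNet.
    """
    if domain_shift not in {"low", "moderate", "high"}:
        raise ValueError(f"unknown domain_shift: {domain_shift!r}")
    # Large dataset or very different domain: retrain everything.
    if num_images > 10_000 or domain_shift == "high":
        return "full fine-tuning"
    # Medium dataset or moderately different domain: unfreeze the top layers.
    if num_images >= 1_000 or domain_shift == "moderate":
        return "partial fine-tuning"
    # Small dataset, similar domain: train only the classifier.
    return "feature extraction"


print(suggest_strategy(500, "low"))         # feature extraction
print(suggest_strategy(5_000, "moderate"))  # partial fine-tuning
print(suggest_strategy(50_000, "low"))      # full fine-tuning
```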

## Complete Example: Dogs vs Cats Classification

This section presents a simplified workflow based on _feature extraction_ for a binary
classification problem, for example dogs versus cats.

### Data Preparation

The necessary transformations are defined, including normalization with ImageNet means
and standard deviations, a key requirement to correctly reuse pretrained models.

In [None]:
# 3pps
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


# Transformations (ImageNet normalization)
transform = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],  # ImageNet mean
            std=[0.229, 0.224, 0.225],  # ImageNet standard deviation
        ),
    ]
)

# Load dataset organized in folders by class
train_dataset = datasets.ImageFolder("data/train", transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
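
To make the effect of the normalization step concrete, the per-channel operation `(x - mean) / std` can be reproduced by hand. The sketch below uses NumPy on single RGB pixels already scaled to [0, 1] (as `ToTensor` does); the helper `normalize_pixel` is a hypothetical name introduced only for illustration.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])  # ImageNet channel means
std = np.array([0.229, 0.224, 0.225])   # ImageNet channel standard deviations


def normalize_pixel(rgb):
    """Apply (x - mean) / std per channel to an RGB pixel scaled to [0, 1]."""
    return (np.asarray(rgb, dtype=np.float64) - mean) / std


print(normalize_pixel(mean))             # [0. 0. 0.]
print(normalize_pixel([1.0, 1.0, 1.0]))  # pure white
print(normalize_pixel([0.0, 0.0, 0.0]))  # pure black
```

The average ImageNet color maps to zero, while pure white and pure black land at roughly +2.2 to +2.6 and -2.1 to -1.8 respectively, which is the input range the pretrained weights were fitted to.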

### Model Loading and Training

ResNet-18 is used as a feature extractor and only the last layer is trained.

In [None]:
# 3pps
import torch
import torch.nn as nn
import torchvision.models as models


# Load pretrained model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Feature extraction: freeze everything except the last layer
for param in model.parameters():
    param.requires_grad = False

model.fc = nn.Linear(512, 2)  # 2 classes: dogs and cats

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

for epoch in range(5):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

After this training, the model is specialized in distinguishing between dogs and cats
based on the general representations it had already learned on ImageNet.
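
How well the model actually separates the two classes can be checked with a simple accuracy measurement on held-out data. The following is a minimal sketch: the function name `evaluate_accuracy` is hypothetical, and in practice it would be called with a validation loader built like `train_loader` (such a loader is not part of the original workflow). Here it is exercised on a tiny stand-in model so that the example runs on its own.

```python
import torch
import torch.nn as nn


def evaluate_accuracy(model, loader):
    """Return the fraction of correctly classified samples in `loader`."""
    model.eval()  # use BatchNorm running statistics, disable dropout
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Toy check with a stand-in model and one hand-made batch (not real images).
toy_model = nn.Linear(2, 2)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(2))  # logits equal the inputs
    toy_model.bias.zero_()
toy_inputs = torch.tensor([[2.0, 0.0], [0.0, 3.0], [1.0, 0.0]])
toy_labels = torch.tensor([0, 1, 1])
accuracy = evaluate_accuracy(toy_model, [(toy_inputs, toy_labels)])
print(f"Accuracy: {accuracy:.3f}")  # Accuracy: 0.667
```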

## Embedding Extraction

_Embeddings_ are high-dimensional numerical vectors that compactly represent the content
of an image. To obtain them, the pretrained model is reused by removing the last
classification layer and using only the part that acts as a feature extractor.

In [None]:
# 3pps
import torch
import torch.nn as nn


class FeatureExtractor:
    def __init__(self, model):
        # Remove the last layer (classifier) and keep the convolutional trunk
        self.features = nn.Sequential(*list(model.children())[:-1])
        self.features.eval()

    def extract(self, image):
        """Extracts the embedding of one or more images."""
        with torch.no_grad():
            embedding = self.features(image)  # (B, C, 1, 1) in ResNet
            embedding = embedding.view(embedding.size(0), -1)  # Flatten to (B, C)
        return embedding.cpu().numpy()  # Move to CPU before converting to NumPy


# Usage
extractor = FeatureExtractor(model)
image = torch.randn(1, 3, 224, 224)  # Example image
embedding = extractor.extract(image)
print(f"Embedding shape: {embedding.shape}")  # (1, 512) in ResNet18

These embeddings can be used for additional tasks, such as clustering, visualization,
similarity search, or as input to other models.

## Embedding Visualization with PCA and t-SNE

When embeddings of many images are available (for example, those in the training set),
they can be projected to two dimensions to visualize how different classes cluster in the
feature space.

### Visualization with PCA

Principal Component Analysis (PCA) is a fast linear technique that finds directions of
maximum variance.

In [None]:
# 3pps
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA


# Assume embeddings has shape (N, 512) and labels has shape (N,)
embeddings = ...  # Embedding matrix
labels = ...  # Corresponding labels

pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(embeddings)

plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], c=labels, cmap="tab10")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Embeddings in 2D (PCA)")
plt.show()
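
What PCA computes can be reproduced in a few lines with a singular value decomposition. This is a sketch on synthetic data (a stand-in for real embeddings, generated with a fixed seed), intended to show the mechanics rather than to replace `sklearn.decomposition.PCA`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for real embeddings: 200 points in 8 dimensions,
# stretched so that the first two axes carry most of the variance.
scales = np.array([5.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
X = rng.normal(size=(200, 8)) * scales

# PCA operates on centered data.
X_centered = X - X.mean(axis=0)
# Rows of Vt are the principal directions, ordered by decreasing singular value.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_2d = X_centered @ Vt[:2].T  # project onto the first two directions

# Fraction of the total variance captured by each component.
explained = S**2 / np.sum(S**2)
print(X_2d.shape)  # (200, 2)
print(f"Variance kept by 2 components: {explained[:2].sum():.1%}")
```

With scikit-learn, the same fractions are available after fitting through the `explained_variance_ratio_` attribute of the `PCA` object, which helps judge how much structure a 2D plot can show.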

### Visualization with t-SNE

t-SNE (t-distributed Stochastic Neighbor Embedding) is a nonlinear technique that
typically provides clearer visualizations of clusters, although it is more
computationally expensive.

In [None]:
# 3pps
from sklearn.manifold import TSNE


tsne = TSNE(n_components=2, perplexity=30)
embeddings_2d = tsne.fit_transform(embeddings)

plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], c=labels, cmap="tab10")
plt.title("Embeddings in 2D (t-SNE)")
plt.show()

PCA is deterministic and fast; t-SNE better captures local structures and groups, at the
cost of greater computation time and some variability between executions.
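
The variability between t-SNE runs can be reduced by fixing the random seed. The sketch below assumes synthetic data as a stand-in for real embeddings; `random_state` and `init="pca"` are existing `TSNE` parameters, and the specific values chosen here are illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in for embeddings: two well-separated groups of 30 points.
fake_embeddings = np.vstack(
    [rng.normal(0.0, 1.0, size=(30, 16)), rng.normal(8.0, 1.0, size=(30, 16))]
)

# perplexity must be smaller than the number of samples (60 here).
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
points_2d = tsne.fit_transform(fake_embeddings)
print(points_2d.shape)  # (60, 2)
```

Even with a fixed seed, distances between clusters in a t-SNE plot should be interpreted with care, since the method mainly preserves local neighborhoods.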

## Semantic Search via Cosine Similarity

A direct application of embeddings is semantic search: given a query embedding, the most
similar images are retrieved by comparing their representations using a measure such as
cosine similarity.

### Cosine Similarity

Cosine similarity between two vectors $v_1$ and $v_2$ is defined as:

$$ \text{sim}(v_1, v_2) = \frac{v_1 \cdot v_2}{\|v_1\| \; \|v_2\|}, $$

and takes values between $-1$ and $1$, where $1$ indicates vectors pointing in the same
direction, $0$ indicates orthogonal vectors (no directional relationship), and $-1$
indicates vectors pointing in opposite directions.

In [None]:
# 3pps
import numpy as np


def cosine_similarity(vec1, vec2):
    """
    Calculates cosine similarity between two vectors.
    Result between -1 and 1.
    """
    dot = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    return dot / (norm1 * norm2)
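
A few hand-checkable cases illustrate the range of values. The sketch below redefines a compact helper (named `cosine` here, a hypothetical name) so that it runs on its own; the vectors are arbitrary illustrative values.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


print(cosine([1, 0], [3, 0]))            # 1.0: same direction, different length
print(cosine([1, 0], [0, 2]))            # 0.0: orthogonal
print(cosine([1, 0], [-4, 0]))           # -1.0: opposite direction
print(round(cosine([1, 1], [1, 0]), 4))  # 0.7071: 45 degrees apart
```

The first case shows that vector length does not matter: only the direction is compared.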

### Simple Semantic Searcher

A searcher can be built that, given a query embedding, returns the $k$ most similar
images in a reference set.

In [None]:
class SemanticSearch:
    def __init__(self, embeddings_db, labels_db):
        """
        embeddings_db: Matrix (N, D) with database embeddings.
        labels_db: Vector (N,) with labels or identifiers.
        """
        norms = np.linalg.norm(embeddings_db, axis=1, keepdims=True)
        self.embeddings_db = embeddings_db / norms  # Normalize for cosine
        self.labels_db = labels_db

    def search(self, query_embedding, top_k=5):
        """Returns the top_k entries most similar to the query."""
        query_norm = query_embedding / np.linalg.norm(query_embedding)
        similarities = np.dot(self.embeddings_db, query_norm)
        top_indices = np.argsort(similarities)[::-1][:top_k]
        top_similarities = similarities[top_indices]
        return top_indices, top_similarities
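
The ranking logic can be verified on a toy database small enough to check by hand. This sketch repeats the same steps (normalize, dot product, sort) on four hypothetical 2-D vectors instead of real embeddings.

```python
import numpy as np

# Toy database of four 2-D "embeddings" (hypothetical values).
db = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
db_unit = db / np.linalg.norm(db, axis=1, keepdims=True)

query = np.array([2.0, 1.0])
query_unit = query / np.linalg.norm(query)

similarities = db_unit @ query_unit       # one cosine similarity per row
ranking = np.argsort(similarities)[::-1]  # indices from most to least similar
print(ranking[:2])  # [2 0]
```

The diagonal vector (index 2) is the closest in direction to the query, followed by the horizontal one (index 0), while the vector pointing the opposite way (index 3) ranks last.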

To create the search database, embeddings are calculated for all images in the set:

In [None]:
all_embeddings = []
all_labels = []

for images, labels in train_loader:
    embs = extractor.extract(images)
    all_embeddings.append(embs)
    all_labels.extend(labels.numpy())

all_embeddings = np.vstack(all_embeddings)
all_labels = np.array(all_labels)

searcher = SemanticSearch(all_embeddings, all_labels)

# Query with a new image
query_image = torch.randn(1, 3, 224, 224)  # Example query
query_emb = extractor.extract(query_image).squeeze()

indices, sims = searcher.search(query_emb, top_k=5)
for i, (idx, sim) in enumerate(zip(indices, sims), 1):
    print(f"#{i}: Index {idx}, Similarity {sim:.3f}, Class {all_labels[idx]}")

This mechanism constitutes the basis of visual recommendation systems, similarity-based
image search engines, and visual database exploration tools.

## Complete Simplified Transfer Learning Pipeline

The typical workflow with transfer learning for images can be summarized in the following
steps:

1. Load a pretrained model.
2. Choose the strategy (feature extraction, partial or full fine-tuning).
3. Train according to the selected strategy.
4. Build an embedding extractor from the trained model.
5. Obtain embeddings from the dataset.
6. Visualize the structure of the feature space (PCA, t-SNE).
7. Build a semantic search system on those embeddings.

In code, a simplified pipeline could take this form:

In [None]:
# STEP 1: Load pretrained model and prepare for feature extraction
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(512, 2)

# STEP 2: Train only the classifier
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# STEP 3: Create embedding extractor
extractor = FeatureExtractor(model)

# STEP 4: Calculate dataset embeddings
embeddings = []
labels_list = []

for images, labels in train_loader:
    emb = extractor.extract(images)
    embeddings.append(emb)
    labels_list.extend(labels.numpy())

embeddings = np.vstack(embeddings)
labels_array = np.array(labels_list)

# STEP 5: Visualize (for example, with PCA)
pca = PCA(n_components=2)
emb_2d = pca.fit_transform(embeddings)
plt.scatter(emb_2d[:, 0], emb_2d[:, 1], c=labels_array, cmap="tab10")
plt.show()

# STEP 6: Create semantic searcher
searcher = SemanticSearch(embeddings, labels_array)

# STEP 7: Search for images similar to a query image
query_img = next(iter(train_loader))[0][0:1]
query_emb = extractor.extract(query_img).squeeze()
indices, sims = searcher.search(query_emb, top_k=5)