# Transfer Learning in Computer Vision

## Popular Pretrained Models for Vision Tasks

### VGG
- **VGG16 / VGG19**: Deep convolutional neural networks with 16 or 19 weight layers (convolutional plus fully connected).
- **Architecture**: Simple and uniform: stacks of 3x3 convolutional layers separated by max pooling, followed by fully connected layers.
- **Strengths**: Easy to understand and modify; widely used for feature extraction.
- **Applications**: General-purpose image classification, feature extraction, transfer learning baselines.
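
One reason for the 3x3 design can be checked with simple arithmetic: a stack of 3x3 convolutions covers the same receptive field as a single larger kernel while using fewer weights. The sketch below is illustrative only; the helper names and the channel width of 64 are assumptions, and biases are ignored.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stride-1 convolutions applied in sequence."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels):
    """Weight count of one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel_size * kernel_size * channels * channels


channels = 64  # assumed channel width, chosen only for illustration

# Three stacked 3x3 layers see a 7x7 window, like a single 7x7 layer.
rf_stack = stacked_receptive_field(3)
weights_stack = 3 * conv_weights(3, channels)
weights_single = conv_weights(7, channels)

print(f"receptive field of three 3x3 layers: {rf_stack}")
print(f"weights, three 3x3 layers: {weights_stack}")
print(f"weights, one 7x7 layer:    {weights_single}")
```

The stack also inserts a nonlinearity between each pair of layers, which a single large kernel does not.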

### ResNet
- **Residual Networks**: Introduced residual (skip) connections to address vanishing gradients in deep networks.
- **Popular Variants**: ResNet18, ResNet34, ResNet50, ResNet101, ResNet152.
- **Strengths**: Enables training of very deep networks; robust performance on large-scale datasets.
- **Applications**: Image classification, object detection, semantic segmentation.
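
To make the skip connection concrete, here is a minimal sketch of a basic residual block in PyTorch. It is a simplified illustration (same channel count in and out, no downsampling), not the exact torchvision implementation; the class name `BasicResidualBlock` is an assumption.

```python
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added back to the block input."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: the input bypasses the convolutions and is added back,
        # giving gradients a direct path to earlier layers.
        return self.relu(out + x)


block = BasicResidualBlock(channels=16)
x = torch.randn(2, 16, 8, 8)
y = block(x)
print(y.shape)  # same shape as the input
```

Because the block only has to learn a correction to its input, it can fall back to something close to the identity when extra depth is not useful.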

### Inception
- **InceptionV3**: Uses inception modules that extract features at multiple scales within a single module.
- **Architecture**: Combines convolutions of different sizes in parallel, followed by concatenation.
- **Strengths**: Efficient use of computation; good balance between accuracy and speed.
- **Applications**: Scene recognition, fine-grained image classification, image retrieval.
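
The parallel-branch idea can be sketched as a small PyTorch module. This is a deliberately simplified block in the spirit of an inception module, not the actual InceptionV3 layout; the class name and all branch widths are assumptions.

```python
import torch
import torch.nn as nn


class MiniInceptionModule(nn.Module):
    """Runs several kernel sizes in parallel and concatenates the results."""

    def __init__(self, in_channels):
        super().__init__()
        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=1),  # 1x1 reduces channels first
            nn.Conv2d(8, 24, kernel_size=3, padding=1),
        )
        self.branch5x5 = nn.Sequential(
            nn.Conv2d(in_channels, 4, kernel_size=1),
            nn.Conv2d(4, 8, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, 8, kernel_size=1),
        )

    def forward(self, x):
        branches = [
            self.branch1x1(x),
            self.branch3x3(x),
            self.branch5x5(x),
            self.branch_pool(x),
        ]
        # Padding keeps the spatial size equal, so branches stack along channels.
        return torch.cat(branches, dim=1)


module = MiniInceptionModule(in_channels=32)
out = module(torch.randn(1, 32, 28, 28))
print(out.shape)  # channels: 16 + 24 + 8 + 8 = 56
```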

### EfficientNet
- **Family of Models**: EfficientNet-B0 through B7; scales network depth, width, and input resolution together using a compound scaling method.
- **Strengths**: Achieved state-of-the-art ImageNet accuracy at release with far fewer parameters and lower computational cost than comparable models.
- **Applications**: Resource-constrained environments, mobile and edge devices, high-accuracy tasks.
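
Compound scaling ties the three dimensions to one coefficient, usually written φ: depth scales as α^φ, width as β^φ, and resolution as γ^φ. The sketch below uses the base coefficients reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15); the function name is an assumption, and the released B1 to B7 models use their own rounded settings, so treat this as the idea rather than the exact configurations.

```python
# Base coefficients from the EfficientNet paper: depth, width, resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return (depth, width, resolution, approximate FLOPs) multipliers for `phi`."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # Convolution cost grows linearly with depth and quadratically with
    # width and resolution.
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops


for phi in range(4):
    depth, width, resolution, flops = compound_scale(phi)
    print(
        f"phi={phi}: depth x{depth:.2f}, width x{width:.2f}, "
        f"resolution x{resolution:.2f}, FLOPs x{flops:.2f}"
    )
```

The coefficients are chosen so that α·β²·γ² is roughly 2, meaning each increment of φ approximately doubles the compute budget.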

---

## Freezing and Unfreezing Layers for Fine-Tuning

### Why Freeze Layers?
- Early layers in pretrained models capture generic features (edges, textures, shapes) that are useful across many tasks.
- Freezing these layers:
    - Reduces training time.
    - Reduces the risk of overfitting, especially on small datasets.
    - Preserves learned representations from large datasets (e.g., ImageNet).
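
The effect of freezing is easy to verify by counting trainable parameters. The sketch below uses a tiny stand-in network with random weights rather than a real pretrained model; the names `backbone`, `head`, and `count_trainable` are assumptions.

```python
import torch.nn as nn

# Toy stand-in for a pretrained feature extractor (hypothetical, random weights).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 5)  # new classifier for a 5-class task
model = nn.Sequential(backbone, head)


def count_trainable(module):
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


before = count_trainable(model)

# Freeze the backbone: its weights stay fixed and no gradients are stored for them.
for param in backbone.parameters():
    param.requires_grad = False

after = count_trainable(model)
print(f"trainable before freezing: {before}, after: {after}")
```

Only the head's weights remain trainable, which is why each training step becomes cheaper and the pretrained features are left intact.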

### Why Unfreeze Layers?
- Later layers learn more task-specific features.
- Unfreezing allows the model to adapt to the new dataset and task.
- Gradual unfreezing can help achieve better performance by fine-tuning higher-level representations.

### Typical Approach
1. **Initial Training**: Freeze most layers; only train the last few (classifier) layers.
2. **Fine-Tuning**: Gradually unfreeze additional layers and continue training with a lower learning rate.
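
The two steps above can be sketched as a simple unfreezing schedule. The stage layout, helper names, and schedule are assumptions for illustration; the training step itself is left as a comment.

```python
import torch.nn as nn

# Hypothetical three-stage backbone plus a classifier head.
stages = nn.ModuleList([nn.Linear(16, 16) for _ in range(3)])
head = nn.Linear(16, 5)


def set_trainable(module, flag):
    for param in module.parameters():
        param.requires_grad = flag


def unfreeze_last(stages, n):
    """Freeze every stage, then unfreeze only the last `n` stages."""
    for stage in stages:
        set_trainable(stage, False)
    if n > 0:
        for stage in stages[-n:]:
            set_trainable(stage, True)


schedule = [0, 1, 2]  # phase 0 trains only the head, later phases go deeper
trainable_per_phase = []
for n in schedule:
    unfreeze_last(stages, n)
    # ... train for a few epochs here, using a lower learning rate when n > 0 ...
    flags = [all(p.requires_grad for p in stage.parameters()) for stage in stages]
    trainable_per_phase.append(flags)

print(trainable_per_phase)
```

The head stays trainable throughout; only the backbone stages change status from phase to phase.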

---

## Using Transfer Learning for Image Classification Tasks

1. **Load a Pretrained Model**: Choose a model (e.g., ResNet, VGG) pretrained on a large dataset like ImageNet.
2. **Replace the Classifier**: Modify the final layer(s) to match the number of classes in your target task.
3. **Freeze Base Layers**: Optionally freeze early layers to retain general features.
4. **Train the Classifier**: Train only the new classifier layers initially.
5. **Fine-Tune**: Unfreeze some or all of the base layers and continue training with a reduced learning rate.
6. **Evaluate and Iterate**: Monitor performance and adjust which layers are trainable as needed.

> **Tip:** Always start with a lower learning rate when fine-tuning to avoid destroying the pretrained weights.
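
One way to apply this tip, sketched below in PyTorch, is to give the pretrained layers a smaller learning rate than the new classifier by using optimizer parameter groups. The two `nn.Linear` modules are stand-ins, and the learning-rate values are assumptions rather than recommended settings.

```python
import torch.nn as nn
import torch.optim as optim

backbone = nn.Linear(16, 16)  # stands in for unfrozen pretrained layers
head = nn.Linear(16, 5)       # newly added classifier

optimizer = optim.Adam(
    [
        # Small steps for pretrained weights so they are adjusted, not overwritten.
        {"params": backbone.parameters(), "lr": 1e-5},
        # Larger steps for the randomly initialized head.
        {"params": head.parameters(), "lr": 1e-3},
    ]
)

lrs = [group["lr"] for group in optimizer.param_groups]
print(lrs)
```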

In [None]:
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
# load pretrained resnet50
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# freeze base model
for layer in base_model.layers:
    layer.trainable = False

# add custom classification head
x = Flatten()(base_model.output)
x = Dense(256, activation='relu')(x)
output = Dense(5, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=output)

# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()

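# data preparation: apply the preprocessing the ImageNet ResNet50 weights expect
# (plain rescaling to [0, 1] does not match how the base model was trained)
datagen = ImageDataGenerator(
    preprocessing_function=tf.keras.applications.resnet50.preprocess_input,
    validation_split=0.2
)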

train_data = datagen.flow_from_directory(
    "PATH_TO_TRAIN_DATA",
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    subset='training'
)

val_data = datagen.flow_from_directory(
    "PATH_TO_TRAIN_DATA",
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    subset='validation'
)

# train the model
history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=10,
    steps_per_epoch=len(train_data),
    validation_steps=len(val_data)
)

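# fine-tuning: unfreeze the last five layers of the base model
for layer in base_model.layers[-5:]:
    layer.trainable = True

# recompile with a lower learning rate so the change in trainable layers takes effect
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])

# continue training with the unfrozen layers (the epoch count is an assumption)
history_fine = model.fit(train_data, validation_data=val_data, epochs=5)

# evaluate on the validation set
val_loss, val_accuracy = model.evaluate(val_data)

print(f"Validation loss: {val_loss:.4f}, Validation accuracy: {val_accuracy:.4f}")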


The same workflow in PyTorch:

In [None]:
import torch
import torchvision.models as models
import torch.nn as nn
from torchvision import datasets, transforms
import torch.optim as optim

In [None]:
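# load pretrained resnet50
# (`pretrained=True` is deprecated in torchvision >= 0.13 in favor of `weights=`)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)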

# freeze base model
for param in model.parameters():
    param.requires_grad = False

# replace the last layer for a new task
# (no Softmax here: nn.CrossEntropyLoss expects raw logits and applies log-softmax itself)
num_features = model.fc.in_features
model.fc = nn.Sequential(
    nn.Linear(num_features, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 5)
)

print(model)

# data preparation
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

train_data = datasets.ImageFolder("PATH_TO_TRAIN_DATA", transform=transform)
val_data = datasets.ImageFolder("PATH_TO_VAL_DATA", transform=transform)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=32, shuffle=False)

# define loss and optimizer (only the new head is trainable at this stage)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)

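# train the model
for epoch in range(10):
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    # report the mean batch loss once per epoch instead of once per batch
    print(f"Epoch {epoch+1}, loss: {running_loss / len(train_loader):.4f}")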

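# fine-tuning: unfreeze the last residual stage
for name, param in model.named_parameters():
    if name.startswith("layer4"):
        param.requires_grad = True

# rebuild the optimizer over all trainable parameters with a lower learning rate,
# then repeat the training loop above for a few more epochs before evaluating
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-5)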

# evaluate on the validation set
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in val_loader:
        outputs = model(inputs)
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Accuracy: {100 * correct / total:.2f}%")



ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 