### Fine-Tuning Techniques in Computer Vision

### Choosing Layers to Fine-Tune and Understanding the Feature Extraction Process

#### Feature Extraction in Pre-trained Models

- **Early layers:** Capture low-level features such as edges, textures, and simple patterns.
- **Middle layers:** Capture mid-level features like shapes, motifs, and parts of objects.
- **Late layers:** Capture high-level, task-specific features such as object categories or semantic patterns.

#### Choosing Layers to Fine-Tune

- **Freeze early layers:** Retain general features learned during pre-training, as these are often transferable across tasks.
- **Unfreeze late layers:** Allow the model to adapt high-level features to the specifics of the new task by updating their weights.
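
The freeze/unfreeze split above can be sketched in PyTorch as follows. This is a minimal illustration, not a prescribed recipe: the tiny `nn.Sequential` network is a hypothetical stand-in for a pre-trained backbone, and `freeze_all_but_last` is a helper name invented for this example.

```python
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network (a real one would load weights).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),   # "early" layer
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # "middle" layer
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 5),                            # "late", task-specific layer
)


def freeze_all_but_last(model: nn.Sequential, n_trainable: int) -> None:
    """Freeze every layer that has parameters except the last n_trainable ones."""
    layers_with_params = [m for m in model if any(True for _ in m.parameters())]
    cutoff = len(layers_with_params) - n_trainable
    for i, layer in enumerate(layers_with_params):
        for p in layer.parameters():
            # Only layers at or beyond the cutoff receive gradient updates.
            p.requires_grad = i >= cutoff


freeze_all_but_last(model, n_trainable=1)
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Raising `n_trainable` unfreezes progressively earlier layers, which is one way to move from pure feature extraction toward deeper fine-tuning.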

#### Best Practices

- **For small datasets:** Fine-tune only the last few layers to avoid overfitting and leverage the generalization of pre-trained features.
- **For large datasets:** Unfreeze more layers and fine-tune them with a smaller learning rate than you would use when training from scratch, so the model can learn more task-specific features without erasing what it learned during pre-training.
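
One common way to give unfrozen pre-trained layers a smaller learning rate than a freshly initialized head is to use optimizer parameter groups. The sketch below assumes PyTorch. The `backbone` and `head` modules are hypothetical placeholders and the learning-rate values are illustrative only.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical placeholders: `backbone` stands for unfrozen pre-trained layers,
# `head` for the newly added classification layer.
backbone = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
head = nn.Linear(8, 5)

optimizer = optim.SGD(
    [
        # Pre-trained weights: small steps to preserve learned features.
        {"params": backbone.parameters(), "lr": 1e-4},
        # New head: falls back to the default lr given below.
        {"params": head.parameters()},
    ],
    lr=1e-2,
    momentum=0.9,
)

lrs = [group["lr"] for group in optimizer.param_groups]
print(lrs)
```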

---

### Data Augmentation for Improving Generalization

**What is augmentation?**
- Artificially increases the diversity of training data by applying transformations such as rotation, flipping, scaling/zooming, cropping, and color jittering.

**Why use data augmentation?**
- Reduces overfitting by introducing variability in the training data.
- Improves the model's ability to generalize to unseen data.

**Examples of augmentation:**
- **Rotation:** Randomly rotate images by a certain degree range.
- **Flip:** Apply horizontal and/or vertical flips.
- **Zoom:** Randomly zoom in or out on images.
- **Cropping:** Randomly crop parts of the image.
- **Color jittering:** Randomly change brightness, contrast, saturation, or hue.
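
To make these transformations concrete, here is a minimal NumPy sketch of a random flip, random crop, and brightness jitter. It is illustrative only: the `augment` function, the crop size, and the jitter range are assumptions for this example, and real pipelines would normally use a library's built-in transforms.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator, crop: int = 24) -> np.ndarray:
    """Randomly flip, crop, and brightness-jitter an HxWxC float image in [0, 1]."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]                 # horizontal flip
    h, w, _ = image.shape
    top = rng.integers(0, h - crop + 1)           # random crop origin
    left = rng.integers(0, w - crop + 1)
    image = image[top:top + crop, left:left + crop, :]
    factor = rng.uniform(0.8, 1.2)                # brightness jitter
    return np.clip(image * factor, 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                   # synthetic stand-in for a real image
out = augment(image, rng)
print(out.shape)  # (24, 24, 3)
```

Because each call draws new random parameters, the model sees a slightly different version of every image in every epoch.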

---

### Hyperparameter Tuning for Transfer Learning

**Key hyperparameters:**

- **Learning rate:**  
    - A smaller learning rate is recommended for fine-tuning pre-trained models to avoid drastic updates to learned weights.
    - Too large: May overshoot the optimal solution.
    - Too small: Leads to slow convergence.

- **Batch size:**  
    - Larger batches stabilize training but require more memory.
    - Smaller batches may lead to noisier updates but can help with limited resources.

- **Optimizer:**  
    - **SGD (Stochastic Gradient Descent):** Works well with transfer learning, especially when paired with momentum.
    - **Adam:** Offers faster convergence but may require careful tuning for stability.
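
The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², which is an assumption chosen purely for illustration. The three learning rates are arbitrary examples of "too large", "reasonable", and "too small".

```python
def gradient_descent(lr: float, steps: int = 20, w0: float = 1.0) -> float:
    """Minimize f(w) = w**2 (gradient 2*w) and return the final |w|."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return abs(w)


for lr in (1.1, 0.1, 0.001):
    print(f"lr={lr}: |w| after 20 steps = {gradient_descent(lr):.4f}")
```

With the largest rate the iterate overshoots the minimum on every step and grows in magnitude, with the middle rate it approaches zero quickly, and with the smallest rate it has barely moved after 20 steps.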

**Tuning Process:**
- Start with the framework's default settings to establish a baseline.
- Experiment with one hyperparameter at a time to isolate its effect.
- Monitor validation performance to guide adjustments.
- Consider using learning rate schedulers or early stopping to optimize training.
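
The last two points can be sketched in plain Python. This is a simplified illustration of reducing the learning rate on a plateau and stopping early. The function name `run_schedule`, the patience values, and the validation losses are all made up for the example, and the logic does not reproduce any specific framework's callbacks.

```python
from typing import List, Tuple


def run_schedule(
    val_losses: List[float],
    lr: float = 1e-3,
    lr_patience: int = 2,
    stop_patience: int = 4,
    factor: float = 0.1,
) -> Tuple[int, float]:
    """Return (zero-based epoch at which training stops, final learning rate)."""
    best = float("inf")
    stale = 0  # epochs since the validation loss last improved
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
            continue
        stale += 1
        if stale >= stop_patience:
            return epoch, lr          # early stopping
        if stale % lr_patience == 0:
            lr *= factor              # reduce the learning rate on a plateau
    return len(val_losses) - 1, lr


# Made-up validation losses: improvement stalls after the third epoch.
stopped_epoch, final_lr = run_schedule([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75])
print(stopped_epoch, final_lr)
```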

---


In [None]:
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
# load pretrained mobilenetv2
base_model = MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)

# freeze base model
for layer in base_model.layers:
    layer.trainable = False

# add classification head
x = GlobalAveragePooling2D()(base_model.output)
output = Dense(5, activation="softmax")(x)  # 5 output classes
model = Model(inputs=base_model.input, outputs=output)

# define data augmentation
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2,
)

# training and validation subsets are split from the same directory
# via validation_split; valid subset names are "training" and "validation"
train_data = datagen.flow_from_directory(
    "PATH_TO_DATASET",
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
    subset="training",
)

# note: this generator also applies the random augmentations to the validation subset
val_data = datagen.flow_from_directory(
    "PATH_TO_DATASET",
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
    subset="validation",
)

# compile the model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# train the model
history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=10,
    steps_per_epoch=len(train_data),
    validation_steps=len(val_data)
)

PyTorch

In [None]:
import torch
import torchvision.models as models
import torch.nn as nn
from torchvision import datasets, transforms
import torch.optim as optim

In [None]:
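# load pretrained mobilenetv2
# (newer torchvision releases prefer weights=models.MobileNet_V2_Weights.DEFAULT)
model = models.mobilenet_v2(pretrained=True)

# freeze all pretrained parameters
for param in model.parameters():
    param.requires_grad = False

# replace the classification head; the new layer is trainable by default
model.classifier[1] = nn.Linear(model.last_channel, 5)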

train_transform = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

val_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # fixed size so images can be batched
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

train_data = datasets.ImageFolder("TRAINING_IMAGE_FOLDER", transform=train_transform)
val_data = datasets.ImageFolder("VALIDATION_IMAGE_FOLDER", transform=val_transforms)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=32, shuffle=False)

criterion = nn.CrossEntropyLoss()
# only the new classification head has trainable parameters
optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

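for epoch in range(10):
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
    # average loss over the epoch rather than the last batch only
    epoch_loss = running_loss / len(train_data)

    # evaluate on the validation set without tracking gradients
    model.eval()
    correct = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
    val_acc = correct / len(val_data)
    print(f"Epoch {epoch + 1}: Loss: {epoch_loss:.4f}, Val accuracy: {val_acc:.4f}")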
