# Implicit regularization in Transfer Learning

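Transfer learning reuses a model trained on one task as the starting point for another. This notebook sets up a comparison between a ResNet-50 initialized from scratch and an ImageNet-pretrained ResNet-50, both adapted to the same small image classification dataset.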

**The problem**: We aim to approximate a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ using a model $h : \mathbb{R}^d \rightarrow \mathbb{R}$, where $h$ is a neural network with parameters $\theta$. The challenge is that the data-generating distribution $\mathcal{D}$ is difficult to sample from, so only a small dataset $D = \{(x_i, y_i)\}_{i=1}^n$ is available.
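
One common way to make this precise is empirical risk minimization over $D$. The per-sample loss $\ell$ is an assumption introduced here for illustration, and $h_\theta$ denotes the network $h$ with parameters $\theta$:

```latex
\hat{\theta} \in \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(h_\theta(x_i),\, y_i\big)
```

When $n$ is small relative to the number of parameters, many values of $\theta$ can reach low empirical risk, so the solution the optimizer finds can depend strongly on where it starts.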

**The idea behind Transfer Learning**:
The core intuition lies in leveraging knowledge gained in one context (source domain or task) to improve learning in a different, but related context (target domain or task).
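
As a minimal sketch of this idea, assuming a toy network rather than the ResNet-50 used below, one simple form of transfer keeps the layers learned on the source task fixed and trains only a new output layer for the target task. The names `backbone` and `head` and all layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical "source" feature extractor; in practice it would carry
# weights learned on the source task.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())

# Freeze the source-task knowledge so the optimizer cannot change it.
for p in backbone.parameters():
    p.requires_grad = False

# New output layer for a hypothetical 3-class target task.
head = nn.Linear(16, 3)
model = nn.Sequential(backbone, head)

# Only parameters that still require gradients go to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

# One training step on random stand-in data.
X, y = torch.randn(4, 8), torch.randint(0, 3, (4,))
backbone_before = backbone[0].weight.clone()
head_before = head.weight.clone()
loss = nn.CrossEntropyLoss()(model(X), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(backbone[0].weight, backbone_before)
head_changed = not torch.equal(head.weight, head_before)
n_trainable = sum(p.numel() for p in trainable)
print(backbone_unchanged, head_changed, n_trainable)
```

The notebook below takes the other common route: it replaces the output layer and then updates every layer of the network.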


In [372]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.model_selection import train_test_split
from tqdm.notebook import tqdm
from sklearn.decomposition import PCA
from copy import deepcopy
from utils import *

SEED = 26
np.random.seed(SEED)
torch.manual_seed(SEED)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device: ", device)

Using device:  cuda


## Loading the models

In [373]:
from torchvision.models import resnet50, ResNet50_Weights

model_new        = resnet50(weights=None)
model_pretrained = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

## Loading the dataset

In [374]:
# Oxford 102 Flowers dataset
from torchvision.datasets import Flowers102
import torchvision.transforms as transforms

# Load the dataset
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])
train_dataset = Flowers102(
    root = 'data',
    split = 'train',
    transform=transform,
    download=True
)
test_dataset = Flowers102(
    root = 'data',
    split = 'test',
    transform=transform,
    download=True
)

print("Number of training samples: ", len(train_dataset))
print("Number of test samples: ", len(test_dataset))

Number of training samples:  1020
Number of test samples:  6149


In [375]:
BATCH_SIZE = 32
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

In [376]:
# Adapt the last layer of the model from 1000 classes to 102 classes
model_new.fc = nn.Linear(model_new.fc.in_features, 102)
model_pretrained.fc = nn.Linear(model_pretrained.fc.in_features, 102)

## Fine tuning

In [377]:
n_epochs = 10
criterion = nn.CrossEntropyLoss()
lr = 1e-3
optimizer_new = optim.Adam(model_new.parameters(), lr=lr)
optimizer_pretrained = optim.Adam(model_pretrained.parameters(), lr=lr)

model_new.to(device)
model_pretrained.to(device)

print(f"Number of parameters in the model: {sum(p.numel() for p in model_pretrained.parameters()):E}")

Number of parameters in the model: 2.371703E+07
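
The printed count can be reproduced by hand, which is a quick check that the head was replaced as intended. The sketch below assumes the standard torchvision ResNet-50 total of 25,557,032 parameters (with its 1000-class head) and 2048 input features to `fc`; `linear_params` is a hypothetical helper.

```python
# Parameter bookkeeping for swapping the ResNet-50 classification head.
IN_FEATURES = 2048            # inputs to the final fully connected layer
RESNET50_TOTAL = 25_557_032   # torchvision ResNet-50 with the 1000-class head

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

backbone = RESNET50_TOTAL - linear_params(IN_FEATURES, 1000)
new_head = linear_params(IN_FEATURES, 102)
total = backbone + new_head

print(f"backbone: {backbone:,}")
print(f"new head: {new_head:,}")
print(f"total:    {total:E}")
```

The new head accounts for less than 1% of the parameters. Everything else starts either from random values (`model_new`) or from ImageNet weights (`model_pretrained`).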


In [378]:
progress_bar = tqdm(range(n_epochs), desc="Epochs")
train_losses_new, test_losses_new = [], []
for epoch in progress_bar:
    model_new.train()

    train_loss = 0
    for X, y in train_loader:
        X, y = X.to(device), y.to(device)
        optimizer_new.zero_grad()
        y_hat = model_new(X)
        loss = criterion(y_hat, y)
        loss.backward()
        optimizer_new.step()
        train_loss += loss.item()
    train_loss /= len(train_loader)
    train_losses_new.append(train_loss)

    model_new.eval()
    test_loss = 0
    # Gradients are not needed for evaluation; skipping them saves memory and time
    with torch.no_grad():
        for X, y in test_loader:
            X, y = X.to(device), y.to(device)
            y_hat = model_new(X)
            loss = criterion(y_hat, y)
            test_loss += loss.item()
    test_loss /= len(test_loader)
    test_losses_new.append(test_loss)

    progress_bar.set_postfix({
        "train_loss": train_loss,
        "test_loss": test_loss
    })

Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

KeyboardInterrupt: 
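
The loop above trains only `model_new`, and the run shown was interrupted before the first epoch finished. To train both models with identical code, the epoch logic could be factored into a helper. The following is a sketch, not the notebook's code: `run_epoch` is a hypothetical name, and the demonstration uses a tiny linear model on random data so that it runs in seconds on CPU.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def run_epoch(model, loader, criterion, device, optimizer=None):
    """Run one pass over loader; train if an optimizer is given, else evaluate.

    Returns the mean per-batch loss, the same averaging as the loop above.
    """
    training = optimizer is not None
    model.train(training)
    total = 0.0
    # Gradients are tracked only while training
    with torch.set_grad_enabled(training):
        for X, y in loader:
            X, y = X.to(device), y.to(device)
            loss = criterion(model(X), y)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item()
    return total / len(loader)

# Stand-in data and model (assumptions for the demonstration only)
torch.manual_seed(0)
device = torch.device("cpu")
data = TensorDataset(torch.randn(64, 10), torch.randint(0, 3, (64,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
toy_model = nn.Linear(10, 3).to(device)
toy_optimizer = optim.Adam(toy_model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

train_losses = [run_epoch(toy_model, loader, criterion, device, toy_optimizer)
                for _ in range(5)]
eval_loss = run_epoch(toy_model, loader, criterion, device)
```

With the notebook's objects, the same helper would be called as `run_epoch(model_pretrained, train_loader, criterion, device, optimizer_pretrained)` for training, and without the optimizer argument for evaluation on `test_loader`.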