# 01 Baseline from scratch

Train the target classifier from random initialization, with no transferred weights.

This run is the denominator for every transfer claim in the later notebooks: each reported gain is measured relative to it.

## Step 1: Imports and setup

Use direct training calls so the workflow is visible end-to-end.

In [1]:
from pathlib import Path
import sys
import torch
import pandas as pd
import matplotlib.pyplot as plt

ROOT = Path.cwd().resolve()
while ROOT != ROOT.parent and not (ROOT / 'src').is_dir():
    ROOT = ROOT.parent
sys.path.insert(0, str(ROOT / 'src'))

from utils.seed import set_seed
from data.cifar10_transfer import get_cifar10_transfer
from models.transfer_resnet import TransferResNet18
from methods.transfer_learning import run_target_adaptation

FIGS = ROOT / 'outputs' / 'figures'
FIGS.mkdir(parents=True, exist_ok=True)

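The `while` loop in the setup cell stops at the filesystem root when no `src` directory exists above the working directory, and the later imports then fail with a less direct error. A minimal sketch of a stricter variant follows; the helper name `find_repo_root` and its `marker` parameter are hypothetical, not part of the project.

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found.

    Hypothetical helper: raises instead of silently stopping at the
    filesystem root when the marker directory is missing.
    """
    current = start.resolve()
    # Check the starting directory first, then each ancestor in turn.
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no '{marker}' directory found at or above {current}")
```

In the notebook this would be used as `ROOT = find_repo_root(Path.cwd())`, followed by the same `sys.path.insert` call.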

## Step 2: Build loaders and train scratch baseline

Use explicit constants and run the baseline directly.

In [None]:
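SEED = 0
set_seed(SEED)  # seed RNGs before building the loaders and the model (assumes set_seed takes an integer seed)
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')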

loaders = get_cifar10_transfer(
    data_dir='./data',
    source_classes=[2, 3, 4, 5, 6, 7],
    target_classes=[3, 5, 7],
    source_train_per_class=1000,
    source_test_per_class=300,
    target_train_per_class=80,
    target_test_per_class=300,
    probe_per_class=120,
    batch_size=128,
    num_workers=2,
    seed=SEED,
)

model = TransferResNet18(num_classes=loaders.target_num_classes)
result = run_target_adaptation(
    model=model,
    target_train=loaders.target_train,
    target_test=loaders.target_test,
    target_probe=loaders.target_probe,
    source_test=None,
    source_head=None,
    device=DEVICE,
    strategy='scratch',
    epochs=10,
    lr=0.01,
    weight_decay=5e-4,
    momentum=0.9,
    gradual_schedule={
        2: ['backbone.layer4'],
        5: ['backbone.layer3', 'backbone.layer2'],
        7: ['backbone.layer1', 'backbone.bn1', 'backbone.conv1'],
    },
    use_progress=True,
)

scratch = pd.DataFrame(result.history)
scratch.head()
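To make "low-label" concrete, the constants above can be turned into dataset and step counts. This is a back-of-the-envelope sketch: it assumes the loader returns exactly the requested number of images per class and keeps the final partial batch, neither of which is confirmed by this notebook.

```python
import math

# Values copied from the get_cifar10_transfer / run_target_adaptation calls.
target_classes = [3, 5, 7]
target_train_per_class = 80
target_test_per_class = 300
probe_per_class = 120
batch_size = 128
epochs = 10

n_train = len(target_classes) * target_train_per_class  # labeled training images
n_test = len(target_classes) * target_test_per_class    # held-out test images
n_probe = len(target_classes) * probe_per_class         # probe images

# Assumption: the last partial batch is kept (no drop_last).
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs

# Chance accuracy for a class-balanced test set.
chance_acc = 1 / len(target_classes)

print(f"train={n_train} test={n_test} probe={n_probe}")
print(f"steps/epoch={steps_per_epoch} total steps={total_steps}")
print(f"chance accuracy={chance_acc:.3f}")
```

Under these assumptions the scratch model sees 240 training images and takes 20 optimizer steps in total, which is why the curve in the next step is expected to stay modest.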

## Step 3: Plot target accuracy curve

This is the primary baseline curve that transfer methods must beat.

In [None]:
plt.figure(figsize=(5.0, 3.2))
plt.plot(scratch['epoch'], scratch['target_test_acc'], marker='o')
plt.title('Scratch baseline target accuracy')
plt.xlabel('epoch')
plt.ylabel('target_test_acc')
plt.grid(alpha=0.25)
plt.savefig(FIGS / '01_scratch_target_acc.png', dpi=150, bbox_inches='tight')

## Step 4: Inspect class confusion on the target test set

This shows where the low-label baseline still confuses target classes.

In [None]:
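model.eval()  # put BatchNorm and dropout layers in inference mode
num_classes = loaders.target_num_classes
# Rows index the true class, columns the predicted class.
cm = torch.zeros((num_classes, num_classes), dtype=torch.int64)

with torch.no_grad():  # gradients are not needed for evaluation
    for images, labels in loaders.target_test:
        images, labels = images.to(DEVICE), labels.to(DEVICE)
        preds = model(images).argmax(dim=1)
        # Convert to plain Python ints before indexing the matrix.
        for t, p in zip(labels.tolist(), preds.tolist()):
            cm[t, p] += 1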

plt.figure(figsize=(4.8, 4.0))
plt.imshow(cm.numpy(), cmap='Blues')
plt.title('Scratch confusion matrix (target task)')
plt.xlabel('predicted class (remapped)')
plt.ylabel('true class (remapped)')
plt.colorbar()
plt.savefig(FIGS / '01_scratch_confusion_matrix.png', dpi=150, bbox_inches='tight')
cm
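The heatmap is easier to act on once it is reduced to per-class recall and the largest off-diagonal cells. The sketch below works on a plain nested list; the function names are hypothetical and the example counts are illustrative only, not results from this notebook.

```python
def per_class_recall(cm):
    """Fraction of each true class (row) that was predicted correctly."""
    recalls = []
    for i, row in enumerate(cm):
        total = sum(row)
        recalls.append(row[i] / total if total else float("nan"))
    return recalls


def top_confusions(cm, k=3):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    pairs = [
        (t, p, count)
        for t, row in enumerate(cm)
        for p, count in enumerate(row)
        if t != p and count > 0
    ]
    return sorted(pairs, key=lambda item: item[2], reverse=True)[:k]


# Illustrative counts only (rows = true class, columns = predicted class).
example_cm = [
    [250, 40, 10],
    [60, 220, 20],
    [15, 25, 260],
]
print(per_class_recall(example_cm))
print(top_confusions(example_cm, k=2))
```

With the notebook's tensor, the input would be `cm.tolist()`. If the remapped indices follow the order of `target_classes=[3, 5, 7]` (an assumption about the loader), indices 0, 1, 2 correspond to cat, dog, and horse in the standard CIFAR-10 labeling.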

## Step 5: Plot stability-side diagnostics

Track representation movement and gradient behavior during adaptation.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10.0, 3.2))
axes[0].plot(scratch['epoch'], scratch['feature_drift'], marker='o', color='#E45756')
axes[0].set_title('Feature drift')
axes[0].set_xlabel('epoch')
axes[0].set_ylabel('feature_drift')
axes[0].grid(alpha=0.25)

axes[1].plot(scratch['epoch'], scratch['grad_norm'], marker='o', color='#4C78A8')
axes[1].set_title('Gradient norm')
axes[1].set_xlabel('epoch')
axes[1].set_ylabel('grad_norm')
axes[1].grid(alpha=0.25)

fig.savefig(FIGS / '01_scratch_stability.png', dpi=150, bbox_inches='tight')

summary = {
    'initial_target_acc': float(scratch['target_test_acc'].iloc[0]),
    'final_target_acc': float(scratch['target_test_acc'].iloc[-1]),
    'acc_gain': float(scratch['target_test_acc'].iloc[-1] - scratch['target_test_acc'].iloc[0]),
    'final_feature_drift': float(scratch['feature_drift'].iloc[-1]),
    'final_grad_norm': float(scratch['grad_norm'].iloc[-1]),
}
summary
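This notebook does not show how `run_target_adaptation` computes `feature_drift` and `grad_norm`. The sketch below gives one common pair of definitions as an assumption: drift as the mean Euclidean distance between probe-set features at a reference snapshot and at the current epoch, and gradient norm as the L2 norm over all parameter gradients. Both function names are hypothetical, and plain lists stand in for tensors.

```python
import math


def mean_feature_drift(reference, current):
    """Mean Euclidean distance between matching feature vectors.

    `reference` and `current` are equal-length lists of feature vectors
    for the same probe examples (assumed definition, not the project's).
    """
    distances = [math.dist(ref, cur) for ref, cur in zip(reference, current)]
    return sum(distances) / len(distances)


def global_grad_norm(grads):
    """L2 norm over all gradient entries, treating every parameter's
    gradient as part of one long vector."""
    return math.sqrt(sum(g * g for grad in grads for g in grad))


# Tiny illustrative inputs, not values from this run.
ref_feats = [[0.0, 0.0], [1.0, 1.0]]
cur_feats = [[3.0, 4.0], [1.0, 1.0]]
print(mean_feature_drift(ref_feats, cur_feats))
print(global_grad_norm([[3.0], [4.0]]))
```

For a scratch run the reference snapshot would be the randomly initialized network, so drift here measures movement away from random features rather than away from a pretrained representation.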

### Expected Outcome

Scratch accuracy should improve over epochs, but with only 80 labeled training images per target class (240 in total) it should remain a limited baseline.

Later transfer methods should beat this curve in final accuracy, convergence speed, or both.
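Convergence speed can be made measurable as the first epoch at which a run reaches a fixed accuracy threshold, for example the scratch run's final accuracy. The sketch below assumes the history is a list of records with `epoch` and `target_test_acc` keys, matching the columns plotted above. The helper name is hypothetical and the numbers are illustrative placeholders, not measured results.

```python
def epochs_to_reach(history, threshold, key="target_test_acc"):
    """Return the first epoch whose metric is >= threshold, or None if never reached."""
    for row in history:
        if row[key] >= threshold:
            return row["epoch"]
    return None


# Placeholder histories for illustration only.
baseline_history = [
    {"epoch": 1, "target_test_acc": 0.40},
    {"epoch": 2, "target_test_acc": 0.50},
    {"epoch": 3, "target_test_acc": 0.55},
]
candidate_history = [
    {"epoch": 1, "target_test_acc": 0.52},
    {"epoch": 2, "target_test_acc": 0.60},
    {"epoch": 3, "target_test_acc": 0.62},
]

# Use the baseline's final accuracy as the bar both runs must clear.
bar = baseline_history[-1]["target_test_acc"]
print(epochs_to_reach(baseline_history, bar))
print(epochs_to_reach(candidate_history, bar))
```

In this notebook the records could be obtained with `scratch.to_dict('records')`.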

## Observations -> Why -> Transfer opportunity

**What you should see**

- Accuracy rises, then starts to flatten.
- Confusions remain on some target classes.
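- The feature-drift and gradient-norm curves show how far the representation moves and how large the updates are, which gives stability context for the later comparisons.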

**Why this matters**

- This baseline anchors the comparison.
- If transfer does not improve over this denominator, it is not earning its complexity.
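Read literally, "denominator" suggests reporting each transfer result as a gain relative to the scratch accuracy. A minimal sketch of that arithmetic follows; `transfer_gain` is a hypothetical helper, and the accuracies passed to it are placeholders rather than results.

```python
def transfer_gain(baseline_acc, transfer_acc):
    """Return (absolute gain, gain relative to the baseline accuracy)."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    absolute = transfer_acc - baseline_acc
    return absolute, absolute / baseline_acc


# Placeholder accuracies for illustration only.
absolute, relative = transfer_gain(baseline_acc=0.50, transfer_acc=0.60)
print(f"absolute gain: {absolute:+.3f}, relative gain: {relative:+.1%}")
```

A gain at or below zero on this measure means the transfer method has not beaten the scratch baseline.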