## 任务分解

1. **数据准备**  
   1.1. 划分教师子集  
   - 决定 \(K\) 个教师各自使用的训练子集（例如按类别分、或不同随机划分）。  
   - 生成对应的 PyTorch `Subset` 或自定义 sampler。  
   1.2. 标准 CIFAR‑10（或 MNIST）预处理  
   - 随机裁剪、水平翻转、归一化；  
   - 构建训练/验证 `DataLoader`（batch size、shuffle、num_workers）。

2. **模型定义**  
   2.1. 教师们  
   - 实例化同一架构（如 ResNet18）或多种架构（ResNet18/34/50…）的 \(K\) 个模型；  
   2.2. 学生  
   - 实例化学生网络（如 ResNet18）——单一、小容量；  
   2.3. 参数统计  
   - 用 `torchinfo` 或 `flops-counter` 输出每个模型的参数量和 FLOPs。

3. **教师训练**  
   - 并行／顺序地在各自子集上训练 \(K\) 个教师；  
   - 保存各自最佳 checkpoint；  
   - 记录各自的验证准确率。

4. **单教师蒸馏（Forward KD）——复用 Part 4 代码**  
   - **已完成**：  
     - 生成单一教师温度化软标签；  
     - 定义并训练学生的 KD 损失（KL+CE）；  
     - 评估并记录学生蒸馏后的性能。  

5. **多教师蒸馏**  
   5.1. **软标签融合**  
   - 每个 mini‑batch 对 \(K\) 位教师分别做 forward，收集它们的 \(\tau\)‑softmax 输出；  
   - 逐元素平均得到 \(\,p_{\rm avg}^{(\tau)}\)；  
   5.2. **学生训练**  
   - 损失同 Forward KD，只用 \(p_{\rm avg}^{(\tau)}\) 代替单个教师；  
   - 学生前向／反向、保存 checkpoint；  
   5.3. **评估**  
   - 在同一验证集上测试学生，多教师 vs. 单教师 vs. 硬标签基线的准确率对比；  

6. **超参数扫描与消融**  
   - 遍历 \(\tau\in\{1,5,10,20\}\)、\(\alpha\in\{0.1,0.5,0.9\}\)、教师数 \(K\in\{2,4,8\}\)；  
   - 记录每组下学生的最终准确率；  
   - 可选：试验“弱教师”检查点（训练不足的教师）对学生性能的影响。  

7. **结果可视化 & 分析**  
   - **定量：**  
     - 各方案准确率表格（硬标签／单教师／多教师）；  
     - Δ‑Accuracy 随 \(\tau\)、\(\alpha\)、\(K\) 的变化。  
   - **可视化：**  
     - 准确率 vs. 温度曲线；  
     - K 值 vs. Δ‑Accuracy 柱状图；  
   - **讨论：**  
     - 多教师如何提供多样化“暗知识”增强正则化；  
     - 与“标签平滑＝正则化”视角的联系。  

8. **复现 & 代码组织**  
   - 新建 `distill_multi.py`，沿用 `distill_single.py` 框架，增加教师列表与融合逻辑；  
   - 在 `scripts/` 下补充命令行示例；  
   - 确保随机种子、日志记录、参数统计一并复现。  



# Part8

In [None]:
#!/usr/bin/env python3
"""
distill_multi.py

Multi‑teacher knowledge distillation for CIFAR-10.
"""

import os
import argparse
import logging
import random
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models
from torchinfo import summary

def parse_args():
    p = argparse.ArgumentParser(description="Multi‑Teacher Distillation CIFAR-10")
    p.add_argument("--data_dir",      type=str,   default="data/",             help="dataset root")
    p.add_argument("--teacher_ckpts", nargs="+",  required=True,               help="paths to teacher .pth files")
    p.add_argument("--teacher_arch",  type=str,   default="resnet50",         help="teacher architecture")
    p.add_argument("--student_arch",  type=str,   default="resnet34",         help="student architecture")
    p.add_argument("--batch_size",    type=int,   default=128,                 help="batch size")
    p.add_argument("--epochs",        type=int,   default=200,                 help="training epochs")
    p.add_argument("--tau",           type=float, default=5.0,                 help="distillation temperature")
    p.add_argument("--alpha",         type=float, default=0.7,                 help="KD loss weight")
    p.add_argument("--lr",            type=float, default=0.1,                 help="initial lr")
    p.add_argument("--milestones",    nargs="+",  type=int, default=[100,150], help="LR scheduler milestones")
    p.add_argument("--seed",          type=int,   default=42,                  help="random seed")
    p.add_argument("--log_file",      type=str,   default="distill_multi.log",help="log file")
    p.add_argument("--out_dir",       type=str,   default="out/multi",        help="output directory")
    return p.parse_args()

def set_seeds(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    cudnn.deterministic = True
    cudnn.benchmark     = False

def configure_logging(log_file):
    os.makedirs(os.path.dirname(log_file), exist_ok=True)
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[
            logging.FileHandler(log_file),
            logging.StreamHandler()
        ]
    )
    return logging.getLogger()

def build_dataloaders(data_dir, batch_size):
    mean = (0.4914, 0.4822, 0.4465)
    std  = (0.2470, 0.2435, 0.2616)

    train_tf = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean, std),
    ])
    test_tf = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean, std),
    ])

    train_ds = datasets.CIFAR10(root=data_dir, train=True,  download=True, transform=train_tf)
    test_ds  = datasets.CIFAR10(root=data_dir, train=False, download=True, transform=test_tf)

    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True,  num_workers=4, pin_memory=True)
    test_loader  = DataLoader(test_ds,  batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True)
    return train_loader, test_loader

def load_teachers(ckpt_paths, architecture, device):
    teachers = []
    for path in ckpt_paths:
        model = getattr(models, architecture)(num_classes=10)
        sd = torch.load(path, map_location=device)['model_state']
        model.load_state_dict(sd)
        model.to(device).eval()
        for p in model.parameters():
            p.requires_grad = False
        teachers.append(model)
    return teachers

def distillation_loss(student_logits, teacher_avg, targets, tau, alpha):
    s_logprob = F.log_softmax(student_logits / tau, dim=1)
    loss_kl    = F.kl_div(s_logprob, teacher_avg, reduction='batchmean') * (tau*tau)
    loss_ce    = F.cross_entropy(student_logits, targets)
    return alpha * loss_kl + (1 - alpha) * loss_ce

def main():
    args = parse_args()
    set_seeds(args.seed)
    log = configure_logging(args.log_file)
    log.info(f"Arguments: {args}")

    os.makedirs(args.out_dir, exist_ok=True)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Data
    train_loader, test_loader = build_dataloaders(args.data_dir, args.batch_size)

    # Teachers
    teachers = load_teachers(args.teacher_ckpts, args.teacher_arch, device)
    log.info(f"Loaded {len(teachers)} teachers [{args.teacher_arch}]")

    # Student
    student = getattr(models, args.student_arch)(num_classes=10).to(device)
    log.info(f"Student arch: {args.student_arch}")
    log.info(summary(student, input_size=(1,3,32,32)))

    # Optimizer & scheduler
    optimizer = optim.SGD(
        student.parameters(),
        lr=args.lr,
        momentum=0.9,
        weight_decay=5e-4
    )
    scheduler = optim.lr_scheduler.MultiStepLR(
        optimizer,
        milestones=args.milestones,
        gamma=0.1
    )

    best_acc = 0.0

    # Training loop
    for epoch in range(1, args.epochs + 1):
        student.train()
        running_loss = 0.0

        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()

            # Soft‑label fusion
            with torch.no_grad():
                sum_soft = torch.zeros(inputs.size(0), 10, device=device)
                for tm in teachers:
                    logits_t = tm(inputs)
                    sum_soft += F.softmax(logits_t / args.tau, dim=1)
                teacher_avg = sum_soft / len(teachers)

            # Student forward + loss
            logits_s = student(inputs)
            loss     = distillation_loss(logits_s, teacher_avg, targets, args.tau, args.alpha)

            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)

        scheduler.step()
        avg_train_loss = running_loss / len(train_loader.dataset)
        log.info(f"Epoch {epoch:03d}/{args.epochs} train_loss={avg_train_loss:.4f}")

        # Validation
        student.eval()
        correct = total = 0
        with torch.no_grad():
            for inputs, targets in test_loader:
                inputs, targets = inputs.to(device), targets.to(device)
                preds = student(inputs).argmax(dim=1)
                correct += (preds == targets).sum().item()
                total   += targets.size(0)
        test_acc = 100. * correct / total
        log.info(f"          test_acc={test_acc:.2f}%")

        # Checkpoint
        if test_acc > best_acc:
            best_acc = test_acc
            ckpt_path = os.path.join(args.out_dir, f"student_multi_best.pth")
            torch.save({
                'epoch':        epoch,
                'model_state':  student.state_dict(),
                'opt_state':    optimizer.state_dict(),
                'test_acc':     test_acc,
            }, ckpt_path)
            log.info(f"  → New best saved to {ckpt_path}")

    log.info(f"Finished. Best multi-teacher student acc: {best_acc:.2f}%")

if __name__ == "__main__":
    main()


In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
%cd "/content/drive/MyDrive/661project"
!pwd

/content/drive/MyDrive/661project
/content/drive/MyDrive/661project


# Part1

In [6]:
from torchvision import datasets

full_train = datasets.CIFAR10(
    root='data/',
    train=True,
    download=True,
    transform=None  # we'll apply transforms later
)


In [7]:
# Suppose K = 5 and 10 classes => 2 classes per teacher
K = 5
classes_per_teacher = 10 // K  # = 2
teacher_splits = []
targets = full_train.targets  # list of integer labels

for t in range(K):
    cls_start = t * classes_per_teacher
    cls_end   = cls_start + classes_per_teacher
    # find indices whose label ∈ [cls_start, cls_end)
    idxs = [i for i, lab in enumerate(targets)
            if cls_start <= lab < cls_end]
    teacher_splits.append(idxs)


In [8]:
import torch
num_samples = len(full_train)
perm = torch.randperm(num_samples).tolist()
teacher_splits = []
split_size = num_samples // K

for t in range(K):
    start = t * split_size
    end   = start + split_size if t < K-1 else num_samples
    teacher_splits.append(perm[start:end])


In [9]:
from torch.utils.data import Subset

teacher_datasets = [
    Subset(full_train, idxs)
    for idxs in teacher_splits
]
# Now teacher_datasets[i] is the CIFAR-10 subset for teacher #i


In [10]:
from torchvision import transforms

# CIFAR-10 normalization constants
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD  = (0.2470, 0.2435, 0.2616)

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])


In [11]:
# We need to override the transform of the underlying dataset
# for each Subset. One simple approach is to wrap with a lambda:
class TransformSubset(Subset):
    def __init__(self, subset, transform):
        super().__init__(subset.dataset, subset.indices)
        self.transform = transform
    def __getitem__(self, idx):
        img, label = super().__getitem__(idx)
        return self.transform(img), label

teacher_load_datasets = [
    TransformSubset(ds, train_transform)
    for ds in teacher_datasets
]


In [12]:
from torch.utils.data import DataLoader

batch_size  = 128
num_workers = 4

teacher_loaders = [
    DataLoader(
        ds,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True
    )
    for ds in teacher_load_datasets
]

# And for your student’s full training set (hard‑label baseline):
full_train.transform = train_transform
student_train_loader = DataLoader(
    full_train,
    batch_size=batch_size,
    shuffle=True,
    num_workers=num_workers,
    pin_memory=True
)

# CIFAR-10 test set:
test_set = datasets.CIFAR10(
    root='data/',
    train=False,
    download=False,
    transform=test_transform
)
test_loader = DataLoader(
    test_set,
    batch_size=batch_size,
    shuffle=False,
    num_workers=num_workers,
    pin_memory=True
)


# Part2

In [13]:
pip install torchinfo

Collecting torchinfo
  Downloading torchinfo-1.8.0-py3-none-any.whl.metadata (21 kB)
Downloading torchinfo-1.8.0-py3-none-any.whl (23 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.8.0


In [14]:
import torch
from torchvision.models import resnet18, resnet34, resnet50
from torchinfo import summary

# Device setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 2.1 Instantiate K teacher models (here: ResNet18, ResNet34, ResNet50)
teacher_archs = ['resnet18', 'resnet34', 'resnet50']
teachers = []

for arch in teacher_archs:
    if arch == 'resnet18':
        model = resnet18(num_classes=10)
    elif arch == 'resnet34':
        model = resnet34(num_classes=10)
    elif arch == 'resnet50':
        model = resnet50(num_classes=10)
    else:
        raise ValueError(f"Unsupported architecture: {arch}")
    model.to(device)
    teachers.append((arch, model))

# 2.2 Instantiate the student model (smaller-capacity): ResNet18
student = resnet18(num_classes=10)
student.to(device)

# 2.3 Parameter & FLOPs statistics using torchinfo.summary
# Assume CIFAR-10 input size: (batch_size=1, channels=3, height=32, width=32)
input_size = (1, 3, 32, 32)

print("\n=== Teacher Models ===")
for name, model in teachers:
    print(f"\n-- {name.upper()} --")
    summary(model, input_size=input_size, col_names=("output_size", "num_params", "mult_adds"))

print("\n=== Student Model ===")
print("-- RESNET18 STUDENT --")
summary(student, input_size=input_size, col_names=("output_size", "num_params", "mult_adds"))



=== Teacher Models ===

-- RESNET18 --

-- RESNET34 --

-- RESNET50 --

=== Student Model ===
-- RESNET18 STUDENT --


Layer (type:depth-idx)                   Output Shape              Param #                   Mult-Adds
ResNet                                   [1, 10]                   --                        --
├─Conv2d: 1-1                            [1, 64, 16, 16]           9,408                     2,408,448
├─BatchNorm2d: 1-2                       [1, 64, 16, 16]           128                       128
├─ReLU: 1-3                              [1, 64, 16, 16]           --                        --
├─MaxPool2d: 1-4                         [1, 64, 8, 8]             --                        --
├─Sequential: 1-5                        [1, 64, 8, 8]             --                        --
│    └─BasicBlock: 2-1                   [1, 64, 8, 8]             --                        --
│    │    └─Conv2d: 3-1                  [1, 64, 8, 8]             36,864                    2,359,296
│    │    └─BatchNorm2d: 3-2             [1, 64, 8, 8]             128                       128
│    │    └─ReLU:

# Part3

3. Teacher Training
Train your
𝐾
K teachers—each on its own subset—saving the best checkpoint and logging validation accuracy.
3.2 Notes on Parallel vs. Sequential
Sequential: the loop above trains each teacher one after another in the same process.

Parallel: if you have multiple GPUs or machines, you can spawn
𝐾
K separate Python processes—each binding one teacher’s subset and one GPU—and run the same code concurrently. Tools like torch.multiprocessing.spawn or simple shell scripts with CUDA_VISIBLE_DEVICES can help.

3.3 Logging
We print per‑epoch train loss and validation accuracy.

After all
𝐾
K finishes, you’ll have a best_ckpt_path for each teacher and a Python list best_val_accs of length
𝐾
K.

In [15]:
import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm
from torchvision.models import resnet34, resnet50  # import the two architectures

# Assume:
#   teacher_loaders = [DataLoader(subset_i, …) for i in range(K)]
#   test_loader     = DataLoader(full_CIFAR10_test, …)
#   device          = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
#   num_epochs      = 200

best_val_accs = []

# Cycle through the two imported constructors
archs = [resnet34, resnet50]

for i, train_loader in enumerate(teacher_loaders):
    print(f"\n=== Training teacher #{i} ===")

    # Select architecture: resnet34 for even i, resnet50 for odd i, etc.
    Arch = archs[i % len(archs)]
    teacher_model = Arch(num_classes=10)  # override final fc to 10 classes
    teacher_model = teacher_model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(
        teacher_model.parameters(),
        lr=0.1,
        momentum=0.9,
        weight_decay=5e-4
    )
    scheduler = optim.lr_scheduler.MultiStepLR(
        optimizer,
        milestones=[100, 150],
        gamma=0.1
    )

    best_acc       = 0.0
    best_ckpt_path = f"teacher_{i}_{Arch.__name__}_best.pth"

    for epoch in range(1, num_epochs + 1):
        teacher_model.train()
        running_loss = 0.0

        for inputs, targets in tqdm(
                train_loader,
                desc=f"Teacher {i} ({Arch.__name__}) Epoch {epoch}",
                leave=False):
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = teacher_model(inputs)
            loss    = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)

        scheduler.step()

        # Validation
        teacher_model.eval()
        correct = total = 0
        with torch.no_grad():
            for inputs, targets in test_loader:
                inputs, targets = inputs.to(device), targets.to(device)
                outputs  = teacher_model(inputs)
                _, preds = outputs.max(1)
                correct += preds.eq(targets).sum().item()
                total   += targets.size(0)

        val_acc  = 100. * correct / total
        avg_loss = running_loss / len(train_loader.dataset)
        print(f"Epoch {epoch:03d} | train_loss={avg_loss:.4f} | val_acc={val_acc:.2f}%")

        if val_acc > best_acc:
            best_acc = val_acc
            torch.save({
                'epoch': epoch,
                'arch': Arch.__name__,
                'model_state': teacher_model.state_dict(),
                'optimizer_state': optimizer.state_dict(),
                'val_acc': val_acc,
            }, best_ckpt_path)
            print(f"  → New best! Saved to {best_ckpt_path}")

    best_val_accs.append(best_acc)
    print(f"Teacher #{i} ({Arch.__name__}) best validation accuracy: {best_acc:.2f}%")



=== Training teacher #0 ===




Epoch 001 | train_loss=3.5669 | val_acc=9.67%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 002 | train_loss=2.6240 | val_acc=17.21%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 003 | train_loss=2.2381 | val_acc=18.44%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 004 | train_loss=2.1175 | val_acc=22.87%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 005 | train_loss=2.0353 | val_acc=22.60%




Epoch 006 | train_loss=1.9970 | val_acc=24.62%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 007 | train_loss=1.9373 | val_acc=26.72%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 008 | train_loss=1.9026 | val_acc=30.03%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 009 | train_loss=1.8379 | val_acc=33.67%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 010 | train_loss=1.7704 | val_acc=34.83%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 011 | train_loss=1.7391 | val_acc=37.98%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 012 | train_loss=1.7043 | val_acc=37.92%




Epoch 013 | train_loss=1.7033 | val_acc=38.10%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 014 | train_loss=1.6495 | val_acc=38.49%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 015 | train_loss=1.6144 | val_acc=41.45%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 016 | train_loss=1.5806 | val_acc=39.89%




Epoch 017 | train_loss=1.5760 | val_acc=41.79%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 018 | train_loss=1.5327 | val_acc=43.35%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 019 | train_loss=1.5061 | val_acc=45.86%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 020 | train_loss=1.4621 | val_acc=44.92%




Epoch 021 | train_loss=1.4462 | val_acc=48.39%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 022 | train_loss=1.4179 | val_acc=50.61%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 023 | train_loss=1.3730 | val_acc=50.80%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 024 | train_loss=1.3587 | val_acc=50.34%




Epoch 025 | train_loss=1.3286 | val_acc=52.68%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 026 | train_loss=1.3367 | val_acc=51.86%




Epoch 027 | train_loss=1.2992 | val_acc=52.36%




Epoch 028 | train_loss=1.2704 | val_acc=54.43%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 029 | train_loss=1.2525 | val_acc=53.77%




Epoch 030 | train_loss=1.2252 | val_acc=53.54%




Epoch 031 | train_loss=1.1934 | val_acc=53.22%




Epoch 032 | train_loss=1.1935 | val_acc=56.77%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 033 | train_loss=1.1714 | val_acc=57.59%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 034 | train_loss=1.1309 | val_acc=55.75%




Epoch 035 | train_loss=1.1329 | val_acc=56.25%




Epoch 036 | train_loss=1.1015 | val_acc=57.85%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 037 | train_loss=1.1122 | val_acc=59.62%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 038 | train_loss=1.0922 | val_acc=56.24%




Epoch 039 | train_loss=1.0805 | val_acc=61.16%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 040 | train_loss=1.0845 | val_acc=57.98%




Epoch 041 | train_loss=1.0378 | val_acc=59.61%




Epoch 042 | train_loss=1.0197 | val_acc=60.87%




Epoch 043 | train_loss=1.0051 | val_acc=62.79%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 044 | train_loss=0.9975 | val_acc=59.91%




Epoch 045 | train_loss=0.9989 | val_acc=62.57%




Epoch 046 | train_loss=1.0106 | val_acc=62.28%




Epoch 047 | train_loss=0.9667 | val_acc=61.46%




Epoch 048 | train_loss=0.9620 | val_acc=63.94%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 049 | train_loss=0.9723 | val_acc=59.60%




Epoch 050 | train_loss=0.9264 | val_acc=63.20%




Epoch 051 | train_loss=0.9180 | val_acc=62.32%




Epoch 052 | train_loss=0.9523 | val_acc=62.08%




Epoch 053 | train_loss=0.8809 | val_acc=62.89%




Epoch 054 | train_loss=0.9008 | val_acc=61.53%




Epoch 055 | train_loss=0.8395 | val_acc=62.92%




Epoch 056 | train_loss=0.8766 | val_acc=61.14%




Epoch 057 | train_loss=0.8360 | val_acc=64.06%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 058 | train_loss=0.8360 | val_acc=63.40%




Epoch 059 | train_loss=0.8308 | val_acc=66.05%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 060 | train_loss=0.8410 | val_acc=67.55%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 061 | train_loss=0.7989 | val_acc=62.18%




Epoch 062 | train_loss=0.8382 | val_acc=65.83%




Epoch 063 | train_loss=0.7985 | val_acc=67.82%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 064 | train_loss=0.7928 | val_acc=67.54%




Epoch 065 | train_loss=0.7644 | val_acc=62.36%




Epoch 066 | train_loss=0.7540 | val_acc=66.28%




Epoch 067 | train_loss=0.7444 | val_acc=67.37%




Epoch 068 | train_loss=0.7463 | val_acc=66.95%




Epoch 069 | train_loss=0.7789 | val_acc=67.50%




Epoch 070 | train_loss=0.7364 | val_acc=63.37%




Epoch 071 | train_loss=0.7373 | val_acc=65.96%




Epoch 072 | train_loss=0.7301 | val_acc=66.70%




Epoch 073 | train_loss=0.7251 | val_acc=62.87%




Epoch 074 | train_loss=0.7426 | val_acc=67.05%




Epoch 075 | train_loss=0.7112 | val_acc=67.05%




Epoch 076 | train_loss=0.7022 | val_acc=67.07%




Epoch 077 | train_loss=0.7210 | val_acc=65.47%




Epoch 078 | train_loss=0.7040 | val_acc=65.22%




Epoch 079 | train_loss=0.7100 | val_acc=67.65%




Epoch 080 | train_loss=0.6789 | val_acc=67.49%




Epoch 081 | train_loss=0.7094 | val_acc=67.20%




Epoch 082 | train_loss=0.6652 | val_acc=66.77%




Epoch 083 | train_loss=0.6771 | val_acc=69.06%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 084 | train_loss=0.6856 | val_acc=63.13%




Epoch 085 | train_loss=0.6999 | val_acc=66.96%




Epoch 086 | train_loss=0.6594 | val_acc=68.98%




Epoch 087 | train_loss=0.6345 | val_acc=65.30%




Epoch 088 | train_loss=0.6656 | val_acc=66.72%




Epoch 089 | train_loss=0.6749 | val_acc=66.49%




Epoch 090 | train_loss=0.6463 | val_acc=66.80%




Epoch 091 | train_loss=0.6296 | val_acc=66.81%




Epoch 092 | train_loss=0.6220 | val_acc=65.66%




Epoch 093 | train_loss=0.6404 | val_acc=66.96%




Epoch 094 | train_loss=0.6510 | val_acc=66.93%




Epoch 095 | train_loss=0.6372 | val_acc=68.68%




Epoch 096 | train_loss=0.6640 | val_acc=68.36%




Epoch 097 | train_loss=0.6281 | val_acc=67.94%




Epoch 098 | train_loss=0.6342 | val_acc=67.11%




Epoch 099 | train_loss=0.6093 | val_acc=66.65%




Epoch 100 | train_loss=0.6167 | val_acc=63.76%




Epoch 101 | train_loss=0.4475 | val_acc=74.79%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 102 | train_loss=0.3457 | val_acc=74.76%




Epoch 103 | train_loss=0.3272 | val_acc=75.49%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 104 | train_loss=0.3022 | val_acc=75.25%




Epoch 105 | train_loss=0.2816 | val_acc=75.39%




Epoch 106 | train_loss=0.2692 | val_acc=75.44%




Epoch 107 | train_loss=0.2690 | val_acc=75.80%
  → New best! Saved to teacher_0_resnet34_best.pth




Epoch 108 | train_loss=0.2463 | val_acc=75.38%




Epoch 109 | train_loss=0.2323 | val_acc=75.30%




Epoch 110 | train_loss=0.2223 | val_acc=75.28%




Epoch 111 | train_loss=0.2172 | val_acc=75.71%




Epoch 112 | train_loss=0.2074 | val_acc=75.18%




Epoch 113 | train_loss=0.2119 | val_acc=74.84%




Epoch 114 | train_loss=0.1913 | val_acc=74.97%




Epoch 115 | train_loss=0.1892 | val_acc=75.16%




Epoch 116 | train_loss=0.1867 | val_acc=74.76%




Epoch 117 | train_loss=0.1716 | val_acc=74.86%




Epoch 118 | train_loss=0.1692 | val_acc=74.61%




Epoch 119 | train_loss=0.1703 | val_acc=75.25%




Epoch 120 | train_loss=0.1484 | val_acc=74.76%




Epoch 121 | train_loss=0.1546 | val_acc=75.04%




Epoch 122 | train_loss=0.1499 | val_acc=74.63%




Epoch 123 | train_loss=0.1453 | val_acc=75.09%




Epoch 124 | train_loss=0.1321 | val_acc=75.01%




Epoch 125 | train_loss=0.1362 | val_acc=74.40%




Epoch 126 | train_loss=0.1307 | val_acc=74.71%




Epoch 127 | train_loss=0.1177 | val_acc=74.77%




Epoch 128 | train_loss=0.1309 | val_acc=75.14%




Epoch 129 | train_loss=0.1233 | val_acc=74.66%




Epoch 130 | train_loss=0.1204 | val_acc=74.47%




Epoch 131 | train_loss=0.1215 | val_acc=74.76%




Epoch 132 | train_loss=0.1198 | val_acc=74.58%




Epoch 133 | train_loss=0.1071 | val_acc=74.61%




Epoch 134 | train_loss=0.1131 | val_acc=74.66%




Epoch 135 | train_loss=0.1021 | val_acc=74.95%




Epoch 136 | train_loss=0.1029 | val_acc=74.80%




Epoch 137 | train_loss=0.1119 | val_acc=75.12%




Epoch 138 | train_loss=0.1234 | val_acc=74.71%




Epoch 139 | train_loss=0.1072 | val_acc=74.93%




Epoch 140 | train_loss=0.0945 | val_acc=75.00%




Epoch 141 | train_loss=0.1090 | val_acc=75.01%




Epoch 142 | train_loss=0.1262 | val_acc=74.69%




Epoch 143 | train_loss=0.1352 | val_acc=75.28%




Epoch 144 | train_loss=0.0899 | val_acc=75.06%




Epoch 145 | train_loss=0.0840 | val_acc=74.68%




Epoch 146 | train_loss=0.0894 | val_acc=75.28%




Epoch 147 | train_loss=0.0958 | val_acc=74.17%




Epoch 148 | train_loss=0.1259 | val_acc=74.27%




Epoch 149 | train_loss=0.1109 | val_acc=74.50%




Epoch 150 | train_loss=0.0928 | val_acc=74.31%




Epoch 151 | train_loss=0.0716 | val_acc=75.07%




Epoch 152 | train_loss=0.0637 | val_acc=75.36%




Epoch 153 | train_loss=0.0623 | val_acc=75.18%




Epoch 154 | train_loss=0.0614 | val_acc=75.33%




Epoch 155 | train_loss=0.0516 | val_acc=75.33%




Epoch 156 | train_loss=0.0569 | val_acc=75.19%




Epoch 157 | train_loss=0.0497 | val_acc=75.33%




Epoch 158 | train_loss=0.0455 | val_acc=75.33%




Epoch 159 | train_loss=0.0497 | val_acc=75.27%




Epoch 160 | train_loss=0.0521 | val_acc=75.40%




Epoch 161 | train_loss=0.0512 | val_acc=75.34%




Epoch 162 | train_loss=0.0484 | val_acc=75.43%




Epoch 163 | train_loss=0.0448 | val_acc=75.57%




Epoch 164 | train_loss=0.0430 | val_acc=75.47%




Epoch 165 | train_loss=0.0388 | val_acc=75.53%




Epoch 166 | train_loss=0.0435 | val_acc=75.18%




Epoch 167 | train_loss=0.0411 | val_acc=75.37%




Epoch 168 | train_loss=0.0405 | val_acc=75.42%




Epoch 169 | train_loss=0.0391 | val_acc=75.18%




Epoch 170 | train_loss=0.0387 | val_acc=75.31%




Epoch 171 | train_loss=0.0367 | val_acc=75.18%




Epoch 172 | train_loss=0.0367 | val_acc=75.42%




Epoch 173 | train_loss=0.0370 | val_acc=75.24%




Epoch 174 | train_loss=0.0406 | val_acc=75.22%




Epoch 175 | train_loss=0.0364 | val_acc=75.29%




Epoch 176 | train_loss=0.0372 | val_acc=75.28%




Epoch 177 | train_loss=0.0361 | val_acc=75.17%




Epoch 178 | train_loss=0.0403 | val_acc=75.46%




Epoch 179 | train_loss=0.0363 | val_acc=75.35%




Epoch 180 | train_loss=0.0336 | val_acc=75.36%




Epoch 181 | train_loss=0.0364 | val_acc=75.27%




Epoch 182 | train_loss=0.0328 | val_acc=75.40%




Epoch 183 | train_loss=0.0333 | val_acc=75.45%




Epoch 184 | train_loss=0.0304 | val_acc=75.60%




Epoch 185 | train_loss=0.0330 | val_acc=75.35%




Epoch 186 | train_loss=0.0282 | val_acc=75.61%




Epoch 187 | train_loss=0.0338 | val_acc=75.59%




Epoch 188 | train_loss=0.0303 | val_acc=75.37%




Epoch 189 | train_loss=0.0307 | val_acc=75.37%




Epoch 190 | train_loss=0.0335 | val_acc=75.40%




Epoch 191 | train_loss=0.0325 | val_acc=75.36%




Epoch 192 | train_loss=0.0317 | val_acc=75.48%




Epoch 193 | train_loss=0.0318 | val_acc=75.56%




Epoch 194 | train_loss=0.0273 | val_acc=75.38%




Epoch 195 | train_loss=0.0309 | val_acc=75.26%




Epoch 196 | train_loss=0.0301 | val_acc=75.60%




Epoch 197 | train_loss=0.0289 | val_acc=75.32%




Epoch 198 | train_loss=0.0306 | val_acc=75.37%




Epoch 199 | train_loss=0.0292 | val_acc=75.55%




Epoch 200 | train_loss=0.0281 | val_acc=75.22%
Teacher #0 (resnet34) best validation accuracy: 75.80%

=== Training teacher #1 ===




Epoch 001 | train_loss=11.9468 | val_acc=10.00%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 002 | train_loss=3.0727 | val_acc=13.32%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 003 | train_loss=2.4110 | val_acc=18.14%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 004 | train_loss=2.2429 | val_acc=17.61%




Epoch 005 | train_loss=2.1734 | val_acc=18.95%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 006 | train_loss=2.1162 | val_acc=18.50%




Epoch 007 | train_loss=2.0879 | val_acc=18.86%




Epoch 008 | train_loss=2.0535 | val_acc=22.63%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 009 | train_loss=2.0022 | val_acc=17.51%




Epoch 010 | train_loss=1.9800 | val_acc=22.18%




Epoch 011 | train_loss=1.9605 | val_acc=26.02%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 012 | train_loss=1.9281 | val_acc=26.13%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 013 | train_loss=1.8920 | val_acc=29.54%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 014 | train_loss=1.8644 | val_acc=28.53%




Epoch 015 | train_loss=1.8066 | val_acc=33.17%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 016 | train_loss=1.7661 | val_acc=34.20%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 017 | train_loss=1.7600 | val_acc=35.21%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 018 | train_loss=1.7518 | val_acc=36.65%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 019 | train_loss=1.7183 | val_acc=37.00%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 020 | train_loss=1.7093 | val_acc=39.32%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 021 | train_loss=1.6925 | val_acc=39.32%




Epoch 022 | train_loss=1.6597 | val_acc=39.03%




Epoch 023 | train_loss=1.6599 | val_acc=36.85%




Epoch 024 | train_loss=1.6515 | val_acc=38.83%




Epoch 025 | train_loss=1.6345 | val_acc=42.89%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 026 | train_loss=1.6032 | val_acc=43.44%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 027 | train_loss=1.5865 | val_acc=42.66%




Epoch 028 | train_loss=1.5965 | val_acc=42.97%




Epoch 029 | train_loss=1.5661 | val_acc=44.45%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 030 | train_loss=1.5804 | val_acc=42.25%




Epoch 031 | train_loss=1.5345 | val_acc=45.04%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 032 | train_loss=1.5018 | val_acc=44.85%




Epoch 033 | train_loss=1.5109 | val_acc=46.05%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 034 | train_loss=1.4873 | val_acc=42.76%




Epoch 035 | train_loss=1.5517 | val_acc=45.42%




Epoch 036 | train_loss=1.4998 | val_acc=45.78%




Epoch 037 | train_loss=1.4581 | val_acc=48.32%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 038 | train_loss=1.4486 | val_acc=48.65%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 039 | train_loss=1.4241 | val_acc=48.83%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 040 | train_loss=1.4082 | val_acc=47.93%




Epoch 041 | train_loss=1.4013 | val_acc=46.89%




Epoch 042 | train_loss=1.3883 | val_acc=45.46%




Epoch 043 | train_loss=1.3751 | val_acc=51.03%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 044 | train_loss=1.3377 | val_acc=46.75%




Epoch 045 | train_loss=1.3414 | val_acc=51.72%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 046 | train_loss=1.3436 | val_acc=46.02%




Epoch 047 | train_loss=1.3259 | val_acc=50.82%




Epoch 048 | train_loss=1.3166 | val_acc=48.16%




Epoch 049 | train_loss=1.2848 | val_acc=53.66%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 050 | train_loss=1.2632 | val_acc=53.19%




Epoch 051 | train_loss=1.2562 | val_acc=54.25%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 052 | train_loss=1.2442 | val_acc=50.98%




Epoch 053 | train_loss=1.2743 | val_acc=53.41%




Epoch 054 | train_loss=1.2596 | val_acc=47.94%




Epoch 055 | train_loss=1.2271 | val_acc=51.46%




Epoch 056 | train_loss=1.2725 | val_acc=54.02%




Epoch 057 | train_loss=1.2503 | val_acc=52.76%




Epoch 058 | train_loss=1.1856 | val_acc=50.85%




Epoch 059 | train_loss=1.2218 | val_acc=51.77%




Epoch 060 | train_loss=1.1739 | val_acc=55.23%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 061 | train_loss=1.1443 | val_acc=50.04%




Epoch 062 | train_loss=1.1483 | val_acc=55.94%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 063 | train_loss=1.1373 | val_acc=57.05%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 064 | train_loss=1.1292 | val_acc=58.41%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 065 | train_loss=1.1398 | val_acc=57.86%




Epoch 066 | train_loss=1.1103 | val_acc=59.26%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 067 | train_loss=1.0875 | val_acc=56.45%




Epoch 068 | train_loss=1.0965 | val_acc=59.26%




Epoch 069 | train_loss=1.0833 | val_acc=52.94%




Epoch 070 | train_loss=1.0800 | val_acc=55.68%




Epoch 071 | train_loss=1.0539 | val_acc=59.13%




Epoch 072 | train_loss=1.0536 | val_acc=57.70%




Epoch 073 | train_loss=1.0553 | val_acc=56.98%




Epoch 074 | train_loss=1.0230 | val_acc=59.89%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 075 | train_loss=1.0044 | val_acc=57.65%




Epoch 076 | train_loss=1.0151 | val_acc=59.97%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 077 | train_loss=1.0171 | val_acc=61.11%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 078 | train_loss=0.9916 | val_acc=59.07%




Epoch 079 | train_loss=0.9755 | val_acc=61.16%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 080 | train_loss=0.9823 | val_acc=62.42%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 081 | train_loss=0.9760 | val_acc=57.81%




Epoch 082 | train_loss=0.9351 | val_acc=64.53%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 083 | train_loss=0.9371 | val_acc=63.32%




Epoch 084 | train_loss=0.9449 | val_acc=61.97%




Epoch 085 | train_loss=0.9469 | val_acc=60.75%




Epoch 086 | train_loss=0.9567 | val_acc=59.92%




Epoch 087 | train_loss=0.9202 | val_acc=61.17%




Epoch 088 | train_loss=0.9374 | val_acc=64.14%




Epoch 089 | train_loss=0.9387 | val_acc=60.92%




Epoch 090 | train_loss=0.9359 | val_acc=65.18%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 091 | train_loss=0.9053 | val_acc=62.39%




Epoch 092 | train_loss=0.8885 | val_acc=64.91%




Epoch 093 | train_loss=0.8819 | val_acc=64.45%




Epoch 094 | train_loss=0.8486 | val_acc=63.57%




Epoch 095 | train_loss=0.8710 | val_acc=64.26%




Epoch 096 | train_loss=0.8481 | val_acc=65.05%




Epoch 097 | train_loss=0.8742 | val_acc=66.82%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 098 | train_loss=0.8354 | val_acc=66.37%




Epoch 099 | train_loss=0.8671 | val_acc=62.26%




Epoch 100 | train_loss=0.8522 | val_acc=62.70%




Epoch 101 | train_loss=0.6765 | val_acc=73.42%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 102 | train_loss=0.5937 | val_acc=74.14%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 103 | train_loss=0.5588 | val_acc=74.25%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 104 | train_loss=0.5388 | val_acc=74.58%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 105 | train_loss=0.5249 | val_acc=74.27%




Epoch 106 | train_loss=0.5114 | val_acc=74.41%




Epoch 107 | train_loss=0.5017 | val_acc=74.70%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 108 | train_loss=0.4801 | val_acc=74.91%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 109 | train_loss=0.4690 | val_acc=75.10%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 110 | train_loss=0.4599 | val_acc=74.96%




Epoch 111 | train_loss=0.4604 | val_acc=74.72%




Epoch 112 | train_loss=0.4395 | val_acc=73.82%




Epoch 113 | train_loss=0.4323 | val_acc=75.11%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 114 | train_loss=0.4322 | val_acc=74.85%




Epoch 115 | train_loss=0.4096 | val_acc=75.10%




Epoch 116 | train_loss=0.4136 | val_acc=75.34%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 117 | train_loss=0.4099 | val_acc=75.29%




Epoch 118 | train_loss=0.3839 | val_acc=75.14%




Epoch 119 | train_loss=0.3823 | val_acc=74.46%




Epoch 120 | train_loss=0.3757 | val_acc=74.19%




Epoch 121 | train_loss=0.3900 | val_acc=75.30%




Epoch 122 | train_loss=0.3753 | val_acc=74.59%




Epoch 123 | train_loss=0.3671 | val_acc=74.65%




Epoch 124 | train_loss=0.3592 | val_acc=75.12%




Epoch 125 | train_loss=0.3383 | val_acc=74.89%




Epoch 126 | train_loss=0.3571 | val_acc=75.27%




Epoch 127 | train_loss=0.3579 | val_acc=75.11%




Epoch 128 | train_loss=0.3394 | val_acc=75.43%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 129 | train_loss=0.3208 | val_acc=74.91%




Epoch 130 | train_loss=0.3147 | val_acc=75.09%




Epoch 131 | train_loss=0.3042 | val_acc=75.09%




Epoch 132 | train_loss=0.3129 | val_acc=74.32%




Epoch 133 | train_loss=0.3098 | val_acc=75.28%




Epoch 134 | train_loss=0.3302 | val_acc=74.63%




Epoch 135 | train_loss=0.2955 | val_acc=75.03%




Epoch 136 | train_loss=0.2872 | val_acc=75.14%




Epoch 137 | train_loss=0.2879 | val_acc=74.02%




Epoch 138 | train_loss=0.2894 | val_acc=74.33%




Epoch 139 | train_loss=0.2827 | val_acc=74.09%




Epoch 140 | train_loss=0.2805 | val_acc=74.39%




Epoch 141 | train_loss=0.2925 | val_acc=74.74%




Epoch 142 | train_loss=0.2691 | val_acc=74.49%




Epoch 143 | train_loss=0.2741 | val_acc=74.83%




Epoch 144 | train_loss=0.2636 | val_acc=74.26%




Epoch 145 | train_loss=0.2633 | val_acc=74.51%




Epoch 146 | train_loss=0.2818 | val_acc=74.85%




Epoch 147 | train_loss=0.2550 | val_acc=74.42%




Epoch 148 | train_loss=0.2698 | val_acc=73.40%




Epoch 149 | train_loss=0.2618 | val_acc=74.62%




Epoch 150 | train_loss=0.2337 | val_acc=74.52%




Epoch 151 | train_loss=0.1910 | val_acc=75.54%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 152 | train_loss=0.1663 | val_acc=75.77%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 153 | train_loss=0.1681 | val_acc=76.04%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 154 | train_loss=0.1540 | val_acc=75.80%




Epoch 155 | train_loss=0.1465 | val_acc=76.06%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 156 | train_loss=0.1418 | val_acc=75.90%




Epoch 157 | train_loss=0.1417 | val_acc=76.29%
  → New best! Saved to teacher_1_resnet50_best.pth




Epoch 158 | train_loss=0.1413 | val_acc=76.03%




Epoch 159 | train_loss=0.1283 | val_acc=75.96%




Epoch 160 | train_loss=0.1310 | val_acc=75.85%




Epoch 161 | train_loss=0.1307 | val_acc=76.17%




Epoch 162 | train_loss=0.1248 | val_acc=76.02%




Epoch 163 | train_loss=0.1226 | val_acc=75.97%




Epoch 164 | train_loss=0.1186 | val_acc=76.17%




Epoch 165 | train_loss=0.1186 | val_acc=76.21%




Epoch 166 | train_loss=0.1184 | val_acc=75.95%




Epoch 167 | train_loss=0.1271 | val_acc=76.22%




Epoch 168 | train_loss=0.1179 | val_acc=75.77%




Epoch 169 | train_loss=0.1145 | val_acc=75.96%




Epoch 170 | train_loss=0.1085 | val_acc=76.18%




Epoch 171 | train_loss=0.1039 | val_acc=75.90%




Epoch 172 | train_loss=0.1089 | val_acc=75.94%




Epoch 173 | train_loss=0.1115 | val_acc=76.22%




Epoch 174 | train_loss=0.1046 | val_acc=76.07%




Epoch 175 | train_loss=0.1019 | val_acc=75.77%




Epoch 176 | train_loss=0.1066 | val_acc=75.80%




Epoch 177 | train_loss=0.1053 | val_acc=75.89%




Epoch 178 | train_loss=0.1064 | val_acc=75.91%




Epoch 179 | train_loss=0.0971 | val_acc=75.93%




Epoch 180 | train_loss=0.0924 | val_acc=75.93%




Epoch 181 | train_loss=0.0957 | val_acc=75.97%




Epoch 182 | train_loss=0.0988 | val_acc=75.90%




Epoch 183 | train_loss=0.0942 | val_acc=76.00%




Epoch 184 | train_loss=0.0977 | val_acc=75.97%




Epoch 185 | train_loss=0.0972 | val_acc=75.58%




Epoch 186 | train_loss=0.1008 | val_acc=75.79%




Epoch 187 | train_loss=0.0893 | val_acc=75.75%




Epoch 188 | train_loss=0.0867 | val_acc=75.83%




Epoch 189 | train_loss=0.0852 | val_acc=76.10%




Epoch 190 | train_loss=0.0887 | val_acc=75.91%




Epoch 191 | train_loss=0.0895 | val_acc=75.80%




Epoch 192 | train_loss=0.0868 | val_acc=75.85%




Epoch 193 | train_loss=0.0802 | val_acc=75.91%




Epoch 194 | train_loss=0.0849 | val_acc=75.78%




Epoch 195 | train_loss=0.0871 | val_acc=75.81%




Epoch 196 | train_loss=0.0790 | val_acc=76.16%




Epoch 197 | train_loss=0.0800 | val_acc=75.78%




Epoch 198 | train_loss=0.0779 | val_acc=76.07%




Epoch 199 | train_loss=0.0818 | val_acc=75.69%




Epoch 200 | train_loss=0.0797 | val_acc=76.08%
Teacher #1 (resnet50) best validation accuracy: 76.29%

=== Training teacher #2 ===




Epoch 001 | train_loss=3.7970 | val_acc=15.27%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 002 | train_loss=2.5824 | val_acc=14.57%




Epoch 003 | train_loss=2.3322 | val_acc=14.25%




Epoch 004 | train_loss=2.1368 | val_acc=18.37%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 005 | train_loss=2.0377 | val_acc=22.39%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 006 | train_loss=1.9814 | val_acc=23.91%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 007 | train_loss=1.9262 | val_acc=27.46%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 008 | train_loss=1.8949 | val_acc=28.31%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 009 | train_loss=1.8422 | val_acc=28.66%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 010 | train_loss=1.8174 | val_acc=28.10%




Epoch 011 | train_loss=1.8087 | val_acc=29.91%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 012 | train_loss=1.7545 | val_acc=34.74%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 013 | train_loss=1.7381 | val_acc=34.13%




Epoch 014 | train_loss=1.7360 | val_acc=31.83%




Epoch 015 | train_loss=1.7927 | val_acc=33.96%




Epoch 016 | train_loss=1.6874 | val_acc=28.66%




Epoch 017 | train_loss=1.7690 | val_acc=35.05%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 018 | train_loss=1.6621 | val_acc=38.26%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 019 | train_loss=1.6104 | val_acc=39.05%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 020 | train_loss=1.5854 | val_acc=39.29%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 021 | train_loss=1.5601 | val_acc=39.60%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 022 | train_loss=1.5313 | val_acc=43.97%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 023 | train_loss=1.5171 | val_acc=44.26%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 024 | train_loss=1.4852 | val_acc=46.91%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 025 | train_loss=1.4355 | val_acc=47.88%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 026 | train_loss=1.4114 | val_acc=46.97%




Epoch 027 | train_loss=1.3970 | val_acc=48.11%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 028 | train_loss=1.3396 | val_acc=51.33%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 029 | train_loss=1.3144 | val_acc=54.51%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 030 | train_loss=1.2962 | val_acc=50.31%




Epoch 031 | train_loss=1.2573 | val_acc=54.46%




Epoch 032 | train_loss=1.2456 | val_acc=56.38%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 033 | train_loss=1.1908 | val_acc=56.34%




Epoch 034 | train_loss=1.2087 | val_acc=54.99%




Epoch 035 | train_loss=1.1701 | val_acc=55.04%




Epoch 036 | train_loss=1.1419 | val_acc=56.48%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 037 | train_loss=1.1291 | val_acc=55.95%




Epoch 038 | train_loss=1.1170 | val_acc=53.28%




Epoch 039 | train_loss=1.1123 | val_acc=55.38%




Epoch 040 | train_loss=1.0930 | val_acc=55.45%




Epoch 041 | train_loss=1.1906 | val_acc=55.92%




Epoch 042 | train_loss=1.1147 | val_acc=59.81%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 043 | train_loss=1.0789 | val_acc=60.89%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 044 | train_loss=1.0458 | val_acc=60.33%




Epoch 045 | train_loss=1.0334 | val_acc=56.16%




Epoch 046 | train_loss=1.0558 | val_acc=57.73%




Epoch 047 | train_loss=1.0008 | val_acc=58.39%




Epoch 048 | train_loss=0.9913 | val_acc=64.17%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 049 | train_loss=1.0634 | val_acc=60.44%




Epoch 050 | train_loss=0.9849 | val_acc=55.94%




Epoch 051 | train_loss=0.9740 | val_acc=59.72%




Epoch 052 | train_loss=0.9508 | val_acc=63.79%




Epoch 053 | train_loss=0.9301 | val_acc=63.68%




Epoch 054 | train_loss=0.8888 | val_acc=65.48%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 055 | train_loss=0.9252 | val_acc=64.76%




Epoch 056 | train_loss=0.9113 | val_acc=62.34%




Epoch 057 | train_loss=0.8837 | val_acc=65.51%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 058 | train_loss=0.8877 | val_acc=63.58%




Epoch 059 | train_loss=0.8594 | val_acc=58.85%




Epoch 060 | train_loss=0.8498 | val_acc=64.73%




Epoch 061 | train_loss=0.8358 | val_acc=65.51%




Epoch 062 | train_loss=0.8320 | val_acc=65.55%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 063 | train_loss=0.8245 | val_acc=64.89%




Epoch 064 | train_loss=0.8160 | val_acc=64.61%




Epoch 065 | train_loss=0.8186 | val_acc=63.69%




Epoch 066 | train_loss=0.8068 | val_acc=64.59%




Epoch 067 | train_loss=0.8484 | val_acc=62.91%




Epoch 068 | train_loss=0.8319 | val_acc=68.37%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 069 | train_loss=0.7746 | val_acc=68.57%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 070 | train_loss=0.8049 | val_acc=66.50%




Epoch 071 | train_loss=0.7824 | val_acc=67.23%




Epoch 072 | train_loss=0.7512 | val_acc=67.63%




Epoch 073 | train_loss=0.7204 | val_acc=61.80%




Epoch 074 | train_loss=0.7440 | val_acc=65.58%




Epoch 075 | train_loss=0.7239 | val_acc=69.69%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 076 | train_loss=0.7964 | val_acc=67.95%




Epoch 077 | train_loss=0.7601 | val_acc=63.74%




Epoch 078 | train_loss=0.7169 | val_acc=68.66%




Epoch 079 | train_loss=0.6963 | val_acc=65.56%




Epoch 080 | train_loss=0.6904 | val_acc=67.71%




Epoch 081 | train_loss=0.7142 | val_acc=66.22%




Epoch 082 | train_loss=0.7433 | val_acc=65.78%




Epoch 083 | train_loss=0.7024 | val_acc=65.56%




Epoch 084 | train_loss=0.6734 | val_acc=67.98%




Epoch 085 | train_loss=0.7200 | val_acc=66.42%




Epoch 086 | train_loss=0.7219 | val_acc=68.37%




Epoch 087 | train_loss=0.6584 | val_acc=68.88%




Epoch 088 | train_loss=0.6603 | val_acc=63.15%




Epoch 089 | train_loss=0.6855 | val_acc=67.67%




Epoch 090 | train_loss=0.6541 | val_acc=70.93%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 091 | train_loss=0.6482 | val_acc=69.08%




Epoch 092 | train_loss=0.6294 | val_acc=69.96%




Epoch 093 | train_loss=0.6464 | val_acc=68.01%




Epoch 094 | train_loss=0.6546 | val_acc=64.93%




Epoch 095 | train_loss=0.6736 | val_acc=67.65%




Epoch 096 | train_loss=0.6572 | val_acc=65.78%




Epoch 097 | train_loss=0.6793 | val_acc=70.07%




Epoch 098 | train_loss=0.6395 | val_acc=69.52%




Epoch 099 | train_loss=0.6169 | val_acc=69.46%




Epoch 100 | train_loss=0.6089 | val_acc=68.83%




Epoch 101 | train_loss=0.4495 | val_acc=75.47%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 102 | train_loss=0.3589 | val_acc=76.52%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 103 | train_loss=0.3313 | val_acc=76.06%




Epoch 104 | train_loss=0.3095 | val_acc=76.08%




Epoch 105 | train_loss=0.2840 | val_acc=76.41%




Epoch 106 | train_loss=0.2924 | val_acc=76.35%




Epoch 107 | train_loss=0.2647 | val_acc=76.46%




Epoch 108 | train_loss=0.2618 | val_acc=76.60%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 109 | train_loss=0.2471 | val_acc=76.46%




Epoch 110 | train_loss=0.2318 | val_acc=76.50%




Epoch 111 | train_loss=0.2364 | val_acc=76.67%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 112 | train_loss=0.2242 | val_acc=76.75%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 113 | train_loss=0.2105 | val_acc=76.61%




Epoch 114 | train_loss=0.2012 | val_acc=76.42%




Epoch 115 | train_loss=0.1970 | val_acc=76.40%




Epoch 116 | train_loss=0.1954 | val_acc=76.16%




Epoch 117 | train_loss=0.1928 | val_acc=76.24%




Epoch 118 | train_loss=0.1825 | val_acc=76.20%




Epoch 119 | train_loss=0.1643 | val_acc=76.64%




Epoch 120 | train_loss=0.1698 | val_acc=75.87%




Epoch 121 | train_loss=0.1852 | val_acc=76.07%




Epoch 122 | train_loss=0.1544 | val_acc=76.04%




Epoch 123 | train_loss=0.1547 | val_acc=76.27%




Epoch 124 | train_loss=0.1531 | val_acc=76.03%




Epoch 125 | train_loss=0.1561 | val_acc=75.65%




Epoch 126 | train_loss=0.1378 | val_acc=75.35%




Epoch 127 | train_loss=0.1334 | val_acc=75.85%




Epoch 128 | train_loss=0.1466 | val_acc=75.58%




Epoch 129 | train_loss=0.1266 | val_acc=76.04%




Epoch 130 | train_loss=0.1262 | val_acc=75.90%




Epoch 131 | train_loss=0.1269 | val_acc=75.56%




Epoch 132 | train_loss=0.1178 | val_acc=75.96%




Epoch 133 | train_loss=0.1155 | val_acc=76.24%




Epoch 134 | train_loss=0.1187 | val_acc=76.26%




Epoch 135 | train_loss=0.1272 | val_acc=75.72%




Epoch 136 | train_loss=0.1076 | val_acc=76.09%




Epoch 137 | train_loss=0.1052 | val_acc=75.77%




Epoch 138 | train_loss=0.1041 | val_acc=75.67%




Epoch 139 | train_loss=0.1028 | val_acc=75.70%




Epoch 140 | train_loss=0.1004 | val_acc=75.30%




Epoch 141 | train_loss=0.1083 | val_acc=74.77%




Epoch 142 | train_loss=0.1080 | val_acc=75.71%




Epoch 143 | train_loss=0.0913 | val_acc=75.55%




Epoch 144 | train_loss=0.1131 | val_acc=75.08%




Epoch 145 | train_loss=0.0903 | val_acc=75.77%




Epoch 146 | train_loss=0.0960 | val_acc=75.51%




Epoch 147 | train_loss=0.1193 | val_acc=76.05%




Epoch 148 | train_loss=0.1347 | val_acc=76.17%




Epoch 149 | train_loss=0.0984 | val_acc=75.81%




Epoch 150 | train_loss=0.0883 | val_acc=75.41%




Epoch 151 | train_loss=0.0789 | val_acc=76.21%




Epoch 152 | train_loss=0.0670 | val_acc=76.34%




Epoch 153 | train_loss=0.0604 | val_acc=76.28%




Epoch 154 | train_loss=0.0590 | val_acc=76.07%




Epoch 155 | train_loss=0.0526 | val_acc=76.41%




Epoch 156 | train_loss=0.0575 | val_acc=76.34%




Epoch 157 | train_loss=0.0533 | val_acc=76.41%




Epoch 158 | train_loss=0.0516 | val_acc=76.40%




Epoch 159 | train_loss=0.0508 | val_acc=76.37%




Epoch 160 | train_loss=0.0496 | val_acc=76.42%




Epoch 161 | train_loss=0.0478 | val_acc=76.46%




Epoch 162 | train_loss=0.0458 | val_acc=76.65%




Epoch 163 | train_loss=0.0457 | val_acc=76.55%




Epoch 164 | train_loss=0.0484 | val_acc=76.72%




Epoch 165 | train_loss=0.0474 | val_acc=76.38%




Epoch 166 | train_loss=0.0440 | val_acc=76.57%




Epoch 167 | train_loss=0.0420 | val_acc=76.40%




Epoch 168 | train_loss=0.0400 | val_acc=76.59%




Epoch 169 | train_loss=0.0435 | val_acc=76.65%




Epoch 170 | train_loss=0.0414 | val_acc=76.28%




Epoch 171 | train_loss=0.0417 | val_acc=76.51%




Epoch 172 | train_loss=0.0442 | val_acc=76.59%




Epoch 173 | train_loss=0.0383 | val_acc=76.77%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 174 | train_loss=0.0382 | val_acc=76.43%




Epoch 175 | train_loss=0.0374 | val_acc=76.60%




Epoch 176 | train_loss=0.0368 | val_acc=76.49%




Epoch 177 | train_loss=0.0393 | val_acc=76.50%




Epoch 178 | train_loss=0.0388 | val_acc=76.50%




Epoch 179 | train_loss=0.0356 | val_acc=76.62%




Epoch 180 | train_loss=0.0402 | val_acc=76.57%




Epoch 181 | train_loss=0.0359 | val_acc=76.78%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 182 | train_loss=0.0330 | val_acc=76.65%




Epoch 183 | train_loss=0.0357 | val_acc=76.64%




Epoch 184 | train_loss=0.0329 | val_acc=76.68%




Epoch 185 | train_loss=0.0335 | val_acc=76.90%
  → New best! Saved to teacher_2_resnet34_best.pth




Epoch 186 | train_loss=0.0329 | val_acc=76.48%




Epoch 187 | train_loss=0.0333 | val_acc=76.64%




Epoch 188 | train_loss=0.0322 | val_acc=76.56%




Epoch 189 | train_loss=0.0318 | val_acc=76.60%




Epoch 190 | train_loss=0.0335 | val_acc=76.64%




Epoch 191 | train_loss=0.0329 | val_acc=76.71%




Epoch 192 | train_loss=0.0329 | val_acc=76.59%




Epoch 193 | train_loss=0.0339 | val_acc=76.40%




Epoch 194 | train_loss=0.0328 | val_acc=76.36%




Epoch 195 | train_loss=0.0299 | val_acc=76.35%




Epoch 196 | train_loss=0.0317 | val_acc=76.41%




Epoch 197 | train_loss=0.0304 | val_acc=76.42%




Epoch 198 | train_loss=0.0300 | val_acc=76.51%




Epoch 199 | train_loss=0.0288 | val_acc=76.36%




Epoch 200 | train_loss=0.0305 | val_acc=76.41%
Teacher #2 (resnet34) best validation accuracy: 76.90%

=== Training teacher #3 ===




Epoch 001 | train_loss=12.9814 | val_acc=13.14%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 002 | train_loss=4.1827 | val_acc=10.00%




Epoch 003 | train_loss=3.7260 | val_acc=18.88%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 004 | train_loss=2.2162 | val_acc=19.36%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 005 | train_loss=2.2481 | val_acc=16.57%




Epoch 006 | train_loss=2.1873 | val_acc=16.94%




Epoch 007 | train_loss=2.1527 | val_acc=17.34%




Epoch 008 | train_loss=2.1262 | val_acc=20.24%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 009 | train_loss=2.1089 | val_acc=20.46%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 010 | train_loss=2.0929 | val_acc=23.07%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 011 | train_loss=2.0820 | val_acc=22.26%




Epoch 012 | train_loss=2.0693 | val_acc=23.99%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 013 | train_loss=2.0467 | val_acc=21.82%




Epoch 014 | train_loss=2.0377 | val_acc=19.75%




Epoch 015 | train_loss=2.1576 | val_acc=21.35%




Epoch 016 | train_loss=2.0813 | val_acc=23.96%




Epoch 017 | train_loss=2.0537 | val_acc=24.51%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 018 | train_loss=2.0308 | val_acc=25.62%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 019 | train_loss=2.0420 | val_acc=25.93%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 020 | train_loss=2.0016 | val_acc=26.69%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 021 | train_loss=1.9852 | val_acc=26.59%




Epoch 022 | train_loss=1.9572 | val_acc=28.00%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 023 | train_loss=1.9439 | val_acc=28.86%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 024 | train_loss=1.9194 | val_acc=26.77%




Epoch 025 | train_loss=1.9023 | val_acc=30.43%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 026 | train_loss=1.8775 | val_acc=31.24%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 027 | train_loss=1.8638 | val_acc=29.31%




Epoch 028 | train_loss=1.8466 | val_acc=32.16%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 029 | train_loss=1.8261 | val_acc=31.04%




Epoch 030 | train_loss=1.8037 | val_acc=33.30%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 031 | train_loss=1.7919 | val_acc=33.15%




Epoch 032 | train_loss=1.7917 | val_acc=34.55%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 033 | train_loss=1.7455 | val_acc=31.05%




Epoch 034 | train_loss=1.7413 | val_acc=34.45%




Epoch 035 | train_loss=1.7374 | val_acc=25.64%




Epoch 036 | train_loss=1.8837 | val_acc=27.44%




Epoch 037 | train_loss=1.8019 | val_acc=33.71%




Epoch 038 | train_loss=1.7441 | val_acc=29.12%




Epoch 039 | train_loss=1.7334 | val_acc=33.91%




Epoch 040 | train_loss=1.6946 | val_acc=36.90%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 041 | train_loss=1.6644 | val_acc=36.71%




Epoch 042 | train_loss=1.6488 | val_acc=40.54%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 043 | train_loss=1.6348 | val_acc=39.09%




Epoch 044 | train_loss=1.6007 | val_acc=32.62%




Epoch 045 | train_loss=1.5948 | val_acc=43.90%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 046 | train_loss=1.5812 | val_acc=35.73%




Epoch 047 | train_loss=1.5590 | val_acc=42.33%




Epoch 048 | train_loss=1.5652 | val_acc=43.55%




Epoch 049 | train_loss=1.5366 | val_acc=42.46%




Epoch 050 | train_loss=1.5254 | val_acc=45.45%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 051 | train_loss=1.5296 | val_acc=36.77%




Epoch 052 | train_loss=1.5071 | val_acc=41.54%




Epoch 053 | train_loss=1.4982 | val_acc=44.46%




Epoch 054 | train_loss=1.4762 | val_acc=45.46%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 055 | train_loss=1.4753 | val_acc=42.68%




Epoch 056 | train_loss=1.4729 | val_acc=44.06%




Epoch 057 | train_loss=1.4550 | val_acc=45.11%




Epoch 058 | train_loss=1.4627 | val_acc=41.93%




Epoch 059 | train_loss=1.4425 | val_acc=46.72%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 060 | train_loss=1.4454 | val_acc=41.57%




Epoch 061 | train_loss=1.4247 | val_acc=47.04%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 062 | train_loss=1.4140 | val_acc=47.61%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 063 | train_loss=1.4200 | val_acc=46.30%




Epoch 064 | train_loss=1.3970 | val_acc=47.16%




Epoch 065 | train_loss=1.3699 | val_acc=49.00%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 066 | train_loss=1.3684 | val_acc=48.15%




Epoch 067 | train_loss=1.3660 | val_acc=49.92%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 068 | train_loss=1.3558 | val_acc=48.79%




Epoch 069 | train_loss=1.3482 | val_acc=46.02%




Epoch 070 | train_loss=1.3386 | val_acc=49.03%




Epoch 071 | train_loss=1.3345 | val_acc=48.66%




Epoch 072 | train_loss=1.3002 | val_acc=50.12%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 073 | train_loss=1.3053 | val_acc=51.99%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 074 | train_loss=1.3178 | val_acc=49.31%




Epoch 075 | train_loss=1.2886 | val_acc=48.99%




Epoch 076 | train_loss=1.2843 | val_acc=53.98%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 077 | train_loss=1.2791 | val_acc=51.16%




Epoch 078 | train_loss=1.2491 | val_acc=54.00%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 079 | train_loss=1.2349 | val_acc=48.65%




Epoch 080 | train_loss=1.2183 | val_acc=53.44%




Epoch 081 | train_loss=1.2340 | val_acc=53.59%




Epoch 082 | train_loss=1.2176 | val_acc=52.34%




Epoch 083 | train_loss=1.1958 | val_acc=56.50%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 084 | train_loss=1.1805 | val_acc=52.00%




Epoch 085 | train_loss=1.1903 | val_acc=53.43%




Epoch 086 | train_loss=1.1720 | val_acc=56.08%




Epoch 087 | train_loss=1.1598 | val_acc=56.40%




Epoch 088 | train_loss=1.1323 | val_acc=54.70%




Epoch 089 | train_loss=1.1406 | val_acc=49.62%




Epoch 090 | train_loss=1.1291 | val_acc=55.69%




Epoch 091 | train_loss=1.1301 | val_acc=52.63%




Epoch 092 | train_loss=1.1082 | val_acc=58.72%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 093 | train_loss=1.0931 | val_acc=59.08%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 094 | train_loss=1.0601 | val_acc=58.80%




Epoch 095 | train_loss=1.0714 | val_acc=61.62%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 096 | train_loss=1.0683 | val_acc=52.98%




Epoch 097 | train_loss=1.0272 | val_acc=58.19%




Epoch 098 | train_loss=1.0289 | val_acc=61.44%




Epoch 099 | train_loss=1.0153 | val_acc=59.97%




Epoch 100 | train_loss=1.0354 | val_acc=61.32%




Epoch 101 | train_loss=0.8601 | val_acc=68.91%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 102 | train_loss=0.7926 | val_acc=69.50%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 103 | train_loss=0.7633 | val_acc=70.14%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 104 | train_loss=0.7448 | val_acc=69.36%




Epoch 105 | train_loss=0.7444 | val_acc=70.38%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 106 | train_loss=0.7168 | val_acc=70.29%




Epoch 107 | train_loss=0.7112 | val_acc=70.65%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 108 | train_loss=0.6977 | val_acc=70.01%




Epoch 109 | train_loss=0.6847 | val_acc=70.33%




Epoch 110 | train_loss=0.6842 | val_acc=70.39%




Epoch 111 | train_loss=0.6751 | val_acc=70.50%




Epoch 112 | train_loss=0.6764 | val_acc=71.01%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 113 | train_loss=0.6581 | val_acc=70.62%




Epoch 114 | train_loss=0.6491 | val_acc=71.40%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 115 | train_loss=0.6368 | val_acc=70.70%




Epoch 116 | train_loss=0.6440 | val_acc=70.94%




Epoch 117 | train_loss=0.6398 | val_acc=71.20%




Epoch 118 | train_loss=0.6212 | val_acc=71.09%




Epoch 119 | train_loss=0.6248 | val_acc=71.51%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 120 | train_loss=0.6148 | val_acc=71.07%




Epoch 121 | train_loss=0.6114 | val_acc=71.17%




Epoch 122 | train_loss=0.5944 | val_acc=71.21%




Epoch 123 | train_loss=0.5985 | val_acc=71.23%




Epoch 124 | train_loss=0.5943 | val_acc=71.50%




Epoch 125 | train_loss=0.5834 | val_acc=70.65%




Epoch 126 | train_loss=0.5784 | val_acc=71.37%




Epoch 127 | train_loss=0.5755 | val_acc=71.88%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 128 | train_loss=0.5846 | val_acc=71.24%




Epoch 129 | train_loss=0.5556 | val_acc=71.92%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 130 | train_loss=0.5745 | val_acc=71.46%




Epoch 131 | train_loss=0.5468 | val_acc=71.98%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 132 | train_loss=0.5365 | val_acc=71.70%




Epoch 133 | train_loss=0.5350 | val_acc=71.55%




Epoch 134 | train_loss=0.5599 | val_acc=71.88%




Epoch 135 | train_loss=0.5484 | val_acc=71.50%




Epoch 136 | train_loss=0.5485 | val_acc=71.48%




Epoch 137 | train_loss=0.5350 | val_acc=71.31%




Epoch 138 | train_loss=0.5314 | val_acc=70.81%




Epoch 139 | train_loss=0.5034 | val_acc=71.20%




Epoch 140 | train_loss=0.5032 | val_acc=71.85%




Epoch 141 | train_loss=0.4955 | val_acc=71.24%




Epoch 142 | train_loss=0.4956 | val_acc=71.57%




Epoch 143 | train_loss=0.4938 | val_acc=70.64%




Epoch 144 | train_loss=0.4913 | val_acc=70.74%




Epoch 145 | train_loss=0.4987 | val_acc=71.31%




Epoch 146 | train_loss=0.4825 | val_acc=71.46%




Epoch 147 | train_loss=0.4896 | val_acc=70.90%




Epoch 148 | train_loss=0.4802 | val_acc=71.47%




Epoch 149 | train_loss=0.4704 | val_acc=70.67%




Epoch 150 | train_loss=0.4844 | val_acc=71.76%




Epoch 151 | train_loss=0.4072 | val_acc=72.88%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 152 | train_loss=0.3821 | val_acc=72.71%




Epoch 153 | train_loss=0.3706 | val_acc=73.02%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 154 | train_loss=0.3698 | val_acc=73.22%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 155 | train_loss=0.3635 | val_acc=73.27%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 156 | train_loss=0.3560 | val_acc=72.97%




Epoch 157 | train_loss=0.3508 | val_acc=73.24%




Epoch 158 | train_loss=0.3441 | val_acc=73.25%




Epoch 159 | train_loss=0.3462 | val_acc=73.17%




Epoch 160 | train_loss=0.3395 | val_acc=72.96%




Epoch 161 | train_loss=0.3426 | val_acc=73.29%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 162 | train_loss=0.3311 | val_acc=73.07%




Epoch 163 | train_loss=0.3280 | val_acc=73.16%




Epoch 164 | train_loss=0.3290 | val_acc=73.19%




Epoch 165 | train_loss=0.3288 | val_acc=73.26%




Epoch 166 | train_loss=0.3140 | val_acc=73.29%




Epoch 167 | train_loss=0.3128 | val_acc=73.14%




Epoch 168 | train_loss=0.3195 | val_acc=73.17%




Epoch 169 | train_loss=0.3094 | val_acc=73.19%




Epoch 170 | train_loss=0.3211 | val_acc=73.23%




Epoch 171 | train_loss=0.3129 | val_acc=73.09%




Epoch 172 | train_loss=0.3170 | val_acc=73.05%




Epoch 173 | train_loss=0.3062 | val_acc=73.04%




Epoch 174 | train_loss=0.3007 | val_acc=72.98%




Epoch 175 | train_loss=0.3058 | val_acc=72.97%




Epoch 176 | train_loss=0.3028 | val_acc=73.00%




Epoch 177 | train_loss=0.2985 | val_acc=72.58%




Epoch 178 | train_loss=0.2995 | val_acc=73.12%




Epoch 179 | train_loss=0.2890 | val_acc=72.84%




Epoch 180 | train_loss=0.2909 | val_acc=72.84%




Epoch 181 | train_loss=0.2901 | val_acc=72.70%




Epoch 182 | train_loss=0.2863 | val_acc=72.95%




Epoch 183 | train_loss=0.2881 | val_acc=73.07%




Epoch 184 | train_loss=0.2845 | val_acc=72.99%




Epoch 185 | train_loss=0.2841 | val_acc=73.04%




Epoch 186 | train_loss=0.2869 | val_acc=73.32%
  → New best! Saved to teacher_3_resnet50_best.pth




Epoch 187 | train_loss=0.2723 | val_acc=73.02%




Epoch 188 | train_loss=0.2711 | val_acc=72.82%




Epoch 189 | train_loss=0.2682 | val_acc=72.78%




Epoch 190 | train_loss=0.2758 | val_acc=73.04%




Epoch 191 | train_loss=0.2670 | val_acc=73.11%




Epoch 192 | train_loss=0.2593 | val_acc=73.03%




Epoch 193 | train_loss=0.2680 | val_acc=72.69%




Epoch 194 | train_loss=0.2684 | val_acc=73.20%




Epoch 195 | train_loss=0.2606 | val_acc=72.91%




Epoch 196 | train_loss=0.2677 | val_acc=72.73%




Epoch 197 | train_loss=0.2577 | val_acc=72.84%




Epoch 198 | train_loss=0.2563 | val_acc=72.99%




Epoch 199 | train_loss=0.2601 | val_acc=73.25%




Epoch 200 | train_loss=0.2515 | val_acc=72.95%
Teacher #3 (resnet50) best validation accuracy: 73.32%

=== Training teacher #4 ===




Epoch 001 | train_loss=4.1889 | val_acc=10.75%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 002 | train_loss=2.4121 | val_acc=13.92%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 003 | train_loss=2.2292 | val_acc=20.84%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 004 | train_loss=2.0737 | val_acc=22.96%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 005 | train_loss=1.9388 | val_acc=27.79%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 006 | train_loss=1.8981 | val_acc=32.22%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 007 | train_loss=1.8200 | val_acc=32.85%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 008 | train_loss=1.7703 | val_acc=35.01%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 009 | train_loss=1.7266 | val_acc=38.84%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 010 | train_loss=1.6781 | val_acc=39.80%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 011 | train_loss=1.6356 | val_acc=38.57%




Epoch 012 | train_loss=1.6173 | val_acc=42.38%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 013 | train_loss=1.5686 | val_acc=40.09%




Epoch 014 | train_loss=1.5424 | val_acc=33.85%




Epoch 015 | train_loss=1.7330 | val_acc=37.51%




Epoch 016 | train_loss=1.5854 | val_acc=43.06%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 017 | train_loss=1.5375 | val_acc=44.60%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 018 | train_loss=1.4955 | val_acc=45.88%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 019 | train_loss=1.4453 | val_acc=48.00%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 020 | train_loss=1.4030 | val_acc=46.93%




Epoch 021 | train_loss=1.3862 | val_acc=50.23%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 022 | train_loss=1.3709 | val_acc=48.40%




Epoch 023 | train_loss=1.3221 | val_acc=50.33%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 024 | train_loss=1.2714 | val_acc=54.75%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 025 | train_loss=1.2610 | val_acc=55.90%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 026 | train_loss=1.2351 | val_acc=53.74%




Epoch 027 | train_loss=1.2135 | val_acc=53.47%




Epoch 028 | train_loss=1.1885 | val_acc=55.74%




Epoch 029 | train_loss=1.1481 | val_acc=57.06%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 030 | train_loss=1.1189 | val_acc=56.84%




Epoch 031 | train_loss=1.1198 | val_acc=55.79%




Epoch 032 | train_loss=1.1187 | val_acc=55.97%




Epoch 033 | train_loss=1.0821 | val_acc=56.79%




Epoch 034 | train_loss=1.0695 | val_acc=57.37%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 035 | train_loss=1.0872 | val_acc=55.89%




Epoch 036 | train_loss=1.1799 | val_acc=56.36%




Epoch 037 | train_loss=1.0712 | val_acc=58.74%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 038 | train_loss=1.0585 | val_acc=61.76%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 039 | train_loss=1.0223 | val_acc=62.27%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 040 | train_loss=0.9847 | val_acc=62.49%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 041 | train_loss=0.9766 | val_acc=62.63%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 042 | train_loss=0.9404 | val_acc=63.98%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 043 | train_loss=0.9558 | val_acc=62.92%




Epoch 044 | train_loss=0.9483 | val_acc=61.16%




Epoch 045 | train_loss=0.9067 | val_acc=64.17%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 046 | train_loss=0.9710 | val_acc=62.60%




Epoch 047 | train_loss=0.8766 | val_acc=65.63%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 048 | train_loss=0.8595 | val_acc=63.42%




Epoch 049 | train_loss=0.8486 | val_acc=62.30%




Epoch 050 | train_loss=0.8630 | val_acc=63.41%




Epoch 051 | train_loss=0.8449 | val_acc=61.07%




Epoch 052 | train_loss=0.8245 | val_acc=66.08%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 053 | train_loss=0.7981 | val_acc=63.27%




Epoch 054 | train_loss=0.8502 | val_acc=63.38%




Epoch 055 | train_loss=0.8066 | val_acc=67.29%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 056 | train_loss=0.8523 | val_acc=64.09%




Epoch 057 | train_loss=0.7856 | val_acc=62.09%




Epoch 058 | train_loss=0.8259 | val_acc=62.81%




Epoch 059 | train_loss=0.7846 | val_acc=59.54%




Epoch 060 | train_loss=0.7320 | val_acc=67.60%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 061 | train_loss=0.7596 | val_acc=68.90%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 062 | train_loss=0.7376 | val_acc=67.23%




Epoch 063 | train_loss=0.7522 | val_acc=64.68%




Epoch 064 | train_loss=0.7443 | val_acc=65.88%




Epoch 065 | train_loss=0.7036 | val_acc=69.60%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 066 | train_loss=0.6829 | val_acc=69.18%




Epoch 067 | train_loss=0.7002 | val_acc=67.64%




Epoch 068 | train_loss=0.7229 | val_acc=65.61%




Epoch 069 | train_loss=0.7117 | val_acc=67.22%




Epoch 070 | train_loss=0.7076 | val_acc=67.02%




Epoch 071 | train_loss=0.7155 | val_acc=69.31%




Epoch 072 | train_loss=0.6753 | val_acc=68.53%




Epoch 073 | train_loss=0.6895 | val_acc=67.54%




Epoch 074 | train_loss=0.6707 | val_acc=69.33%




Epoch 075 | train_loss=0.6927 | val_acc=68.02%




Epoch 076 | train_loss=0.6719 | val_acc=69.84%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 077 | train_loss=0.6834 | val_acc=68.51%




Epoch 078 | train_loss=0.6912 | val_acc=70.14%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 079 | train_loss=0.6548 | val_acc=68.73%




Epoch 080 | train_loss=0.6484 | val_acc=68.08%




Epoch 081 | train_loss=0.6539 | val_acc=70.76%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 082 | train_loss=0.6486 | val_acc=69.89%




Epoch 083 | train_loss=0.6190 | val_acc=67.74%




Epoch 084 | train_loss=0.6228 | val_acc=69.14%




Epoch 085 | train_loss=0.6121 | val_acc=65.65%




Epoch 086 | train_loss=0.6153 | val_acc=69.89%




Epoch 087 | train_loss=0.6268 | val_acc=65.29%




Epoch 088 | train_loss=0.6313 | val_acc=64.01%




Epoch 089 | train_loss=0.6104 | val_acc=69.23%




Epoch 090 | train_loss=0.6292 | val_acc=68.85%




Epoch 091 | train_loss=0.6264 | val_acc=66.42%




Epoch 092 | train_loss=0.6307 | val_acc=65.38%




Epoch 093 | train_loss=0.6204 | val_acc=68.05%




Epoch 094 | train_loss=0.6022 | val_acc=69.30%




Epoch 095 | train_loss=0.6120 | val_acc=69.04%




Epoch 096 | train_loss=0.6210 | val_acc=69.78%




Epoch 097 | train_loss=0.5689 | val_acc=71.26%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 098 | train_loss=0.5670 | val_acc=66.12%




Epoch 099 | train_loss=0.6031 | val_acc=71.36%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 100 | train_loss=0.5785 | val_acc=68.97%




Epoch 101 | train_loss=0.4102 | val_acc=75.90%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 102 | train_loss=0.3265 | val_acc=76.36%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 103 | train_loss=0.2937 | val_acc=77.43%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 104 | train_loss=0.2747 | val_acc=76.80%




Epoch 105 | train_loss=0.2619 | val_acc=76.54%




Epoch 106 | train_loss=0.2468 | val_acc=77.15%




Epoch 107 | train_loss=0.2272 | val_acc=76.75%




Epoch 108 | train_loss=0.2178 | val_acc=77.24%




Epoch 109 | train_loss=0.2148 | val_acc=77.02%




Epoch 110 | train_loss=0.2088 | val_acc=76.72%




Epoch 111 | train_loss=0.2070 | val_acc=77.50%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 112 | train_loss=0.1858 | val_acc=77.00%




Epoch 113 | train_loss=0.1801 | val_acc=77.12%




Epoch 114 | train_loss=0.1708 | val_acc=76.67%




Epoch 115 | train_loss=0.1662 | val_acc=77.31%




Epoch 116 | train_loss=0.1605 | val_acc=76.81%




Epoch 117 | train_loss=0.1573 | val_acc=76.89%




Epoch 118 | train_loss=0.1525 | val_acc=76.63%




Epoch 119 | train_loss=0.1411 | val_acc=76.84%




Epoch 120 | train_loss=0.1460 | val_acc=76.90%




Epoch 121 | train_loss=0.1322 | val_acc=76.43%




Epoch 122 | train_loss=0.1513 | val_acc=76.60%




Epoch 123 | train_loss=0.1265 | val_acc=76.79%




Epoch 124 | train_loss=0.1255 | val_acc=76.67%




Epoch 125 | train_loss=0.1208 | val_acc=76.38%




Epoch 126 | train_loss=0.1140 | val_acc=76.82%




Epoch 127 | train_loss=0.1289 | val_acc=76.48%




Epoch 128 | train_loss=0.1186 | val_acc=76.89%




Epoch 129 | train_loss=0.1091 | val_acc=76.37%




Epoch 130 | train_loss=0.1111 | val_acc=76.36%




Epoch 131 | train_loss=0.1156 | val_acc=75.93%




Epoch 132 | train_loss=0.0972 | val_acc=76.40%




Epoch 133 | train_loss=0.1022 | val_acc=76.36%




Epoch 134 | train_loss=0.1017 | val_acc=76.85%




Epoch 135 | train_loss=0.1008 | val_acc=76.68%




Epoch 136 | train_loss=0.0897 | val_acc=76.09%




Epoch 137 | train_loss=0.0958 | val_acc=76.92%




Epoch 138 | train_loss=0.1069 | val_acc=76.12%




Epoch 139 | train_loss=0.0940 | val_acc=76.55%




Epoch 140 | train_loss=0.0863 | val_acc=76.25%




Epoch 141 | train_loss=0.0885 | val_acc=76.83%




Epoch 142 | train_loss=0.0830 | val_acc=76.26%




Epoch 143 | train_loss=0.0869 | val_acc=76.82%




Epoch 144 | train_loss=0.0768 | val_acc=76.51%




Epoch 145 | train_loss=0.0820 | val_acc=76.66%




Epoch 146 | train_loss=0.0807 | val_acc=76.60%




Epoch 147 | train_loss=0.0965 | val_acc=76.54%




Epoch 148 | train_loss=0.0906 | val_acc=75.71%




Epoch 149 | train_loss=0.0867 | val_acc=76.38%




Epoch 150 | train_loss=0.1052 | val_acc=76.64%




Epoch 151 | train_loss=0.0700 | val_acc=76.90%




Epoch 152 | train_loss=0.0600 | val_acc=76.98%




Epoch 153 | train_loss=0.0573 | val_acc=77.32%




Epoch 154 | train_loss=0.0493 | val_acc=77.46%




Epoch 155 | train_loss=0.0520 | val_acc=77.24%




Epoch 156 | train_loss=0.0463 | val_acc=77.45%




Epoch 157 | train_loss=0.0478 | val_acc=77.67%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 158 | train_loss=0.0452 | val_acc=77.58%




Epoch 159 | train_loss=0.0404 | val_acc=77.44%




Epoch 160 | train_loss=0.0435 | val_acc=77.63%




Epoch 161 | train_loss=0.0420 | val_acc=77.65%




Epoch 162 | train_loss=0.0401 | val_acc=77.57%




Epoch 163 | train_loss=0.0412 | val_acc=77.62%




Epoch 164 | train_loss=0.0443 | val_acc=77.52%




Epoch 165 | train_loss=0.0380 | val_acc=77.39%




Epoch 166 | train_loss=0.0402 | val_acc=77.33%




Epoch 167 | train_loss=0.0354 | val_acc=77.36%




Epoch 168 | train_loss=0.0371 | val_acc=77.73%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 169 | train_loss=0.0411 | val_acc=77.30%




Epoch 170 | train_loss=0.0295 | val_acc=77.59%




Epoch 171 | train_loss=0.0339 | val_acc=77.48%




Epoch 172 | train_loss=0.0357 | val_acc=77.38%




Epoch 173 | train_loss=0.0330 | val_acc=77.60%




Epoch 174 | train_loss=0.0349 | val_acc=77.60%




Epoch 175 | train_loss=0.0284 | val_acc=77.64%




Epoch 176 | train_loss=0.0344 | val_acc=77.60%




Epoch 177 | train_loss=0.0319 | val_acc=77.82%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 178 | train_loss=0.0338 | val_acc=77.74%




Epoch 179 | train_loss=0.0353 | val_acc=77.65%




Epoch 180 | train_loss=0.0318 | val_acc=77.75%




Epoch 181 | train_loss=0.0313 | val_acc=77.77%




Epoch 182 | train_loss=0.0312 | val_acc=77.57%




Epoch 183 | train_loss=0.0309 | val_acc=77.76%




Epoch 184 | train_loss=0.0293 | val_acc=77.84%
  → New best! Saved to teacher_4_resnet34_best.pth




Epoch 185 | train_loss=0.0305 | val_acc=77.82%




Epoch 186 | train_loss=0.0298 | val_acc=77.74%




Epoch 187 | train_loss=0.0311 | val_acc=77.59%




Epoch 188 | train_loss=0.0265 | val_acc=77.46%




Epoch 189 | train_loss=0.0276 | val_acc=77.58%




Epoch 190 | train_loss=0.0251 | val_acc=77.51%




Epoch 191 | train_loss=0.0282 | val_acc=77.63%




Epoch 192 | train_loss=0.0278 | val_acc=77.75%




Epoch 193 | train_loss=0.0265 | val_acc=77.56%




Epoch 194 | train_loss=0.0255 | val_acc=77.67%




Epoch 195 | train_loss=0.0247 | val_acc=77.76%




Epoch 196 | train_loss=0.0251 | val_acc=77.38%




Epoch 197 | train_loss=0.0243 | val_acc=77.70%




Epoch 198 | train_loss=0.0257 | val_acc=77.56%




Epoch 199 | train_loss=0.0237 | val_acc=77.67%




Epoch 200 | train_loss=0.0249 | val_acc=77.72%
Teacher #4 (resnet34) best validation accuracy: 77.84%


# 改进版Part3
下面的代码是一份 改进版的 Teacher 训练脚本，把 每个 teacher 都改成在 全量 CIFAR‑10 上训练，同时加入更强的数据增强（随机裁剪 + 翻转 + Cutout）、Label Smoothing、Mixup，以及更合理的学习率调度，通常可以让单个 ResNet‑50 在 CIFAR‑10 上达到 90% 以上的准确率。

要点总结

全量数据：不再把数据切分成多个 teacher_loader，而是都用同一个完整的 train_loader。

强增强：RandomCrop(32,4) / RandomHorizontalFlip / Cutout。

Mixup：在 batch 级别做 Mixup，增加泛化。

Label Smoothing：CrossEntropy 损失里做平滑，抑制过度自信。

CosineAnnealingLR：比 MultiStepLR 更平滑地衰减 lr。

更长 schedule：200 → 300 epochs，常在 0–1 之间 Cosine 衰减。

为什么会有效？
全量数据 + 强增强：随机裁剪、翻转、Cutout（自行接入）让模型见到更多变形。

Mixup：在样本层面插值，能显著提升模型泛化。

Label Smoothing：抑制模型过度自信，提升 test accuracy。

Cosine LR：平滑衰减学习率，比阶梯式更好地利用后期微调。

足够多的 Epochs (300)：给 ResNet‑50 充分的训练时间。

在 CIFAR‑10 上，这套配置下 ResNet‑50 通常能稳稳突破 90%，甚至接近 93% ~ 95%。如果你还有更强的硬件，也可以把 Mixup 改成 CutMix、再加上 AutoAugment 做更激进的增强，accuracy 会更高。









In [20]:
import os, glob
import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm
from torchvision import datasets, transforms
from torchvision.models import resnet50  # 统一用 ResNet‑50
from torch.utils.data import DataLoader
import numpy as np

# ───── 0️⃣ CIFAR-10 全量训练/测试 DataLoader ─────
mean = (0.4914, 0.4822, 0.4465)
std  = (0.2470, 0.2435, 0.2616)

# 强化数据增强
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

batch_size = 128
train_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=True,  download=True, transform=train_transform),
    batch_size=batch_size, shuffle=True,  num_workers=4, pin_memory=True
)
test_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=False, download=True, transform=test_transform),
    batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# ───── 1️⃣ Mixup 辅助函数 ─────
def mixup_data(x, y, alpha=1.0):
    if alpha > 0:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1.0
    batch_size = x.size()[0]
    index = torch.randperm(batch_size).to(x.device)
    mixed_x = lam * x + (1 - lam) * x[index, :]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam

# ───── 2️⃣ Label Smoothing CrossEntropy ─────
class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, eps:float=0.1, reduction='mean'):
        super().__init__()
        self.eps = eps
        self.reduction = reduction
    def forward(self, outputs, targets):
        n = outputs.size(-1)
        log_preds = torch.log_softmax(outputs, dim=-1)
        # one-hot
        with torch.no_grad():
            true_dist = torch.zeros_like(log_preds)
            true_dist.fill_(self.eps / (n-1))
            true_dist.scatter_(1, targets.data.unsqueeze(1), 1-self.eps)
        loss = torch.sum(- true_dist * log_preds, dim=1)
        if self.reduction=='mean':
            return loss.mean()
        else:
            return loss.sum()

# ───── 3️⃣ 训练单个 teacher ─────
num_epochs = 300
criterion = LabelSmoothingCrossEntropy(eps=0.1)
mixup_alpha = 0.2

teacher_model = resnet50(num_classes=10).to(device)
optimizer = optim.SGD(
    teacher_model.parameters(),
    lr=0.1, momentum=0.9, weight_decay=1e-4
)
# Cosine 衰减到 0
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

best_acc = 0.0
best_ckpt = "teacher_resnet50_best.pth"

for epoch in range(1, num_epochs+1):
    teacher_model.train()
    running_loss = 0.0

    for xb, yb in tqdm(train_loader, desc=f"Epoch {epoch}/{num_epochs}", leave=False):
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()

        # 1) Mixup
        xb_mix, y_a, y_b, lam = mixup_data(xb, yb, alpha=mixup_alpha)

        # 2) forward + Label Smoothing
        outputs = teacher_model(xb_mix)
        loss = lam * criterion(outputs, y_a) + (1-lam) * criterion(outputs, y_b)

        loss.backward()
        optimizer.step()
        running_loss += loss.item() * xb.size(0)

    scheduler.step()
    avg_loss = running_loss / len(train_loader.dataset)

    # Validation
    teacher_model.eval()
    correct = total = 0
    with torch.no_grad():
        for xb, yb in test_loader:
            xb, yb = xb.to(device), yb.to(device)
            preds = teacher_model(xb).argmax(dim=1)
            correct += (preds==yb).sum().item()
            total   += yb.size(0)
    val_acc = 100. * correct / total

    print(f"Epoch {epoch:03d} | train_loss={avg_loss:.4f} | val_acc={val_acc:.2f}%")

    if val_acc > best_acc:
        best_acc = val_acc
        torch.save({
            'epoch': epoch,
            'arch':  'resnet50',
            'model_state': teacher_model.state_dict(),
            'optimizer_state': optimizer.state_dict(),
            'val_acc': best_acc,
        }, best_ckpt)
        print(f"  → New best! Saved to {best_ckpt}")

print(f"\n✅ Training complete. Best teacher val_acc = {best_acc:.2f}%")




Epoch 001 | train_loss=5.1129 | val_acc=15.98%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 002 | train_loss=2.2776 | val_acc=21.40%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 003 | train_loss=2.1576 | val_acc=25.41%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 004 | train_loss=2.0767 | val_acc=31.08%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 005 | train_loss=2.0175 | val_acc=35.10%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 006 | train_loss=1.9649 | val_acc=37.04%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 007 | train_loss=1.9508 | val_acc=38.88%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 008 | train_loss=1.9222 | val_acc=40.86%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 009 | train_loss=1.8901 | val_acc=42.69%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 010 | train_loss=1.8634 | val_acc=44.10%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 011 | train_loss=1.8412 | val_acc=45.20%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 012 | train_loss=1.8184 | val_acc=47.32%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 013 | train_loss=1.7891 | val_acc=49.04%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 014 | train_loss=1.7654 | val_acc=49.81%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 015 | train_loss=1.7477 | val_acc=48.71%




Epoch 016 | train_loss=1.7266 | val_acc=50.52%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 017 | train_loss=1.7103 | val_acc=53.89%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 018 | train_loss=1.6827 | val_acc=55.21%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 019 | train_loss=1.6676 | val_acc=55.43%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 020 | train_loss=1.6460 | val_acc=58.12%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 021 | train_loss=1.6063 | val_acc=58.94%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 022 | train_loss=1.5990 | val_acc=60.67%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 023 | train_loss=1.5746 | val_acc=62.60%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 024 | train_loss=1.5559 | val_acc=63.24%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 025 | train_loss=1.5134 | val_acc=64.80%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 026 | train_loss=1.5064 | val_acc=64.16%




Epoch 027 | train_loss=1.5060 | val_acc=64.26%




Epoch 028 | train_loss=1.5089 | val_acc=67.15%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 029 | train_loss=1.4937 | val_acc=62.84%




Epoch 030 | train_loss=1.5079 | val_acc=64.89%




Epoch 031 | train_loss=1.4676 | val_acc=67.47%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 032 | train_loss=1.4847 | val_acc=62.97%




Epoch 033 | train_loss=1.4590 | val_acc=70.45%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 034 | train_loss=1.4051 | val_acc=69.90%




Epoch 035 | train_loss=1.4070 | val_acc=71.40%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 036 | train_loss=1.3915 | val_acc=72.37%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 037 | train_loss=1.3621 | val_acc=73.65%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 038 | train_loss=1.3442 | val_acc=71.60%




Epoch 039 | train_loss=1.3815 | val_acc=72.61%




Epoch 040 | train_loss=1.3257 | val_acc=73.42%




Epoch 041 | train_loss=1.3153 | val_acc=72.60%




Epoch 042 | train_loss=1.2985 | val_acc=75.65%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 043 | train_loss=1.3482 | val_acc=73.86%




Epoch 044 | train_loss=1.3162 | val_acc=74.90%




Epoch 045 | train_loss=1.3077 | val_acc=74.54%




Epoch 046 | train_loss=1.2971 | val_acc=76.08%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 047 | train_loss=1.2836 | val_acc=75.13%




Epoch 048 | train_loss=1.2941 | val_acc=75.40%




Epoch 049 | train_loss=1.2600 | val_acc=75.54%




Epoch 050 | train_loss=1.2647 | val_acc=76.86%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 051 | train_loss=1.2571 | val_acc=76.85%




Epoch 052 | train_loss=1.2547 | val_acc=76.46%




Epoch 053 | train_loss=1.2478 | val_acc=78.23%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 054 | train_loss=1.2406 | val_acc=77.28%




Epoch 055 | train_loss=1.2603 | val_acc=76.49%




Epoch 056 | train_loss=1.2401 | val_acc=77.91%




Epoch 057 | train_loss=1.2244 | val_acc=79.06%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 058 | train_loss=1.2228 | val_acc=76.90%




Epoch 059 | train_loss=1.2048 | val_acc=78.23%




Epoch 060 | train_loss=1.1999 | val_acc=76.40%




Epoch 061 | train_loss=1.2211 | val_acc=77.10%




Epoch 062 | train_loss=1.2098 | val_acc=78.31%




Epoch 063 | train_loss=1.1915 | val_acc=79.67%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 064 | train_loss=1.2298 | val_acc=78.88%




Epoch 065 | train_loss=1.1770 | val_acc=78.48%




Epoch 066 | train_loss=1.1672 | val_acc=78.22%




Epoch 067 | train_loss=1.1893 | val_acc=78.37%




Epoch 068 | train_loss=1.2044 | val_acc=79.99%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 069 | train_loss=1.1936 | val_acc=76.21%




Epoch 070 | train_loss=1.1894 | val_acc=80.20%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 071 | train_loss=1.1800 | val_acc=79.57%




Epoch 072 | train_loss=1.1956 | val_acc=79.39%




Epoch 073 | train_loss=1.1528 | val_acc=81.92%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 074 | train_loss=1.1679 | val_acc=79.15%




Epoch 075 | train_loss=1.1615 | val_acc=79.63%




Epoch 076 | train_loss=1.1631 | val_acc=80.41%




Epoch 077 | train_loss=1.1655 | val_acc=81.57%




Epoch 078 | train_loss=1.1584 | val_acc=80.58%




Epoch 079 | train_loss=1.1531 | val_acc=80.27%




Epoch 080 | train_loss=1.1412 | val_acc=80.50%




Epoch 081 | train_loss=1.1739 | val_acc=80.70%




Epoch 082 | train_loss=1.1472 | val_acc=80.61%




Epoch 083 | train_loss=1.1375 | val_acc=81.12%




Epoch 084 | train_loss=1.1321 | val_acc=79.05%




Epoch 085 | train_loss=1.1304 | val_acc=82.25%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 086 | train_loss=1.1275 | val_acc=78.85%




Epoch 087 | train_loss=1.1248 | val_acc=80.21%




Epoch 088 | train_loss=1.1215 | val_acc=82.01%




Epoch 089 | train_loss=1.1120 | val_acc=82.02%




Epoch 090 | train_loss=1.1403 | val_acc=81.52%




Epoch 091 | train_loss=1.1538 | val_acc=82.86%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 092 | train_loss=1.1133 | val_acc=81.89%




Epoch 093 | train_loss=1.0959 | val_acc=82.08%




Epoch 094 | train_loss=1.1089 | val_acc=82.77%




Epoch 095 | train_loss=1.1233 | val_acc=81.06%




Epoch 096 | train_loss=1.0921 | val_acc=82.12%




Epoch 097 | train_loss=1.1102 | val_acc=82.83%




Epoch 098 | train_loss=1.1096 | val_acc=81.15%




Epoch 099 | train_loss=1.1143 | val_acc=82.77%




Epoch 100 | train_loss=1.0723 | val_acc=83.34%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 101 | train_loss=1.0844 | val_acc=81.18%




Epoch 102 | train_loss=1.0916 | val_acc=82.84%




Epoch 103 | train_loss=1.0978 | val_acc=82.19%




Epoch 104 | train_loss=1.0750 | val_acc=83.38%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 105 | train_loss=1.0729 | val_acc=81.77%




Epoch 106 | train_loss=1.0973 | val_acc=82.12%




Epoch 107 | train_loss=1.0749 | val_acc=82.58%




Epoch 108 | train_loss=1.0730 | val_acc=82.51%




Epoch 109 | train_loss=1.1064 | val_acc=83.70%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 110 | train_loss=1.0951 | val_acc=81.84%




Epoch 111 | train_loss=1.0881 | val_acc=82.70%




Epoch 112 | train_loss=1.1093 | val_acc=83.16%




Epoch 113 | train_loss=1.0519 | val_acc=83.49%




Epoch 114 | train_loss=1.0535 | val_acc=82.63%




Epoch 115 | train_loss=1.0739 | val_acc=82.75%




Epoch 116 | train_loss=1.0701 | val_acc=82.77%




Epoch 117 | train_loss=1.0753 | val_acc=83.44%




Epoch 118 | train_loss=1.0606 | val_acc=84.02%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 119 | train_loss=1.0640 | val_acc=83.24%




Epoch 120 | train_loss=1.0983 | val_acc=83.32%




Epoch 121 | train_loss=1.0665 | val_acc=83.82%




Epoch 122 | train_loss=1.0559 | val_acc=82.58%




Epoch 123 | train_loss=1.0443 | val_acc=82.11%




Epoch 124 | train_loss=1.0634 | val_acc=83.27%




Epoch 125 | train_loss=1.0606 | val_acc=83.37%




Epoch 126 | train_loss=1.0588 | val_acc=84.39%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 127 | train_loss=1.0561 | val_acc=84.04%




Epoch 128 | train_loss=1.0835 | val_acc=82.89%




Epoch 129 | train_loss=1.0771 | val_acc=83.66%




Epoch 130 | train_loss=1.0715 | val_acc=83.72%




Epoch 131 | train_loss=1.0144 | val_acc=81.65%




Epoch 132 | train_loss=1.0415 | val_acc=83.51%




Epoch 133 | train_loss=1.0359 | val_acc=84.55%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 134 | train_loss=1.0311 | val_acc=84.23%




Epoch 135 | train_loss=1.0445 | val_acc=83.77%




Epoch 136 | train_loss=1.0268 | val_acc=84.17%




Epoch 137 | train_loss=1.0305 | val_acc=84.06%




Epoch 138 | train_loss=1.0426 | val_acc=83.95%




Epoch 139 | train_loss=1.0369 | val_acc=83.67%




Epoch 140 | train_loss=1.0352 | val_acc=83.81%




Epoch 141 | train_loss=1.0069 | val_acc=84.48%




Epoch 142 | train_loss=1.0119 | val_acc=84.78%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 143 | train_loss=1.0315 | val_acc=84.47%




Epoch 144 | train_loss=1.0370 | val_acc=84.20%




Epoch 145 | train_loss=1.0149 | val_acc=85.12%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 146 | train_loss=1.0142 | val_acc=83.97%




Epoch 147 | train_loss=1.0119 | val_acc=84.27%




Epoch 148 | train_loss=0.9927 | val_acc=85.17%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 149 | train_loss=0.9864 | val_acc=84.35%




Epoch 150 | train_loss=0.9830 | val_acc=84.25%




Epoch 151 | train_loss=1.0184 | val_acc=85.26%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 152 | train_loss=1.0110 | val_acc=82.80%




Epoch 153 | train_loss=0.9951 | val_acc=84.04%




Epoch 154 | train_loss=1.0067 | val_acc=83.36%




Epoch 155 | train_loss=1.0076 | val_acc=84.86%




Epoch 156 | train_loss=1.0397 | val_acc=85.27%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 157 | train_loss=1.0086 | val_acc=84.07%




Epoch 158 | train_loss=1.0073 | val_acc=84.45%




Epoch 159 | train_loss=0.9890 | val_acc=85.14%




Epoch 160 | train_loss=0.9720 | val_acc=84.44%




Epoch 161 | train_loss=0.9794 | val_acc=85.24%




Epoch 162 | train_loss=0.9752 | val_acc=84.33%




Epoch 163 | train_loss=0.9828 | val_acc=85.09%




Epoch 164 | train_loss=0.9711 | val_acc=84.80%




Epoch 165 | train_loss=0.9616 | val_acc=85.20%




Epoch 166 | train_loss=0.9978 | val_acc=84.93%




Epoch 167 | train_loss=1.0028 | val_acc=84.70%




Epoch 168 | train_loss=0.9627 | val_acc=84.39%




Epoch 169 | train_loss=0.9629 | val_acc=85.54%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 170 | train_loss=0.9857 | val_acc=85.57%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 171 | train_loss=0.9931 | val_acc=85.10%




Epoch 172 | train_loss=0.9812 | val_acc=85.01%




Epoch 173 | train_loss=1.0022 | val_acc=85.71%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 174 | train_loss=0.9326 | val_acc=84.61%




Epoch 175 | train_loss=0.9606 | val_acc=85.25%




Epoch 176 | train_loss=0.9497 | val_acc=85.30%




Epoch 177 | train_loss=0.9785 | val_acc=85.26%




Epoch 178 | train_loss=0.9525 | val_acc=85.57%




Epoch 179 | train_loss=0.9527 | val_acc=86.44%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 180 | train_loss=0.9725 | val_acc=85.45%




Epoch 181 | train_loss=0.9428 | val_acc=86.24%




Epoch 182 | train_loss=0.9516 | val_acc=85.86%




Epoch 183 | train_loss=0.9652 | val_acc=86.15%




Epoch 184 | train_loss=0.9560 | val_acc=86.11%




Epoch 185 | train_loss=0.9533 | val_acc=85.47%




Epoch 186 | train_loss=0.9449 | val_acc=85.19%




Epoch 187 | train_loss=0.9608 | val_acc=85.26%




Epoch 188 | train_loss=0.9583 | val_acc=86.32%




Epoch 189 | train_loss=0.9757 | val_acc=86.25%




Epoch 190 | train_loss=0.9761 | val_acc=85.98%




Epoch 191 | train_loss=0.9402 | val_acc=85.94%




Epoch 192 | train_loss=0.9226 | val_acc=85.92%




Epoch 193 | train_loss=0.9670 | val_acc=85.72%




Epoch 194 | train_loss=0.9736 | val_acc=85.73%




Epoch 195 | train_loss=0.9119 | val_acc=85.75%




Epoch 196 | train_loss=0.9575 | val_acc=86.05%




Epoch 197 | train_loss=0.9322 | val_acc=86.21%




Epoch 198 | train_loss=0.9118 | val_acc=85.60%




Epoch 199 | train_loss=0.9578 | val_acc=86.14%




Epoch 200 | train_loss=0.9536 | val_acc=86.98%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 201 | train_loss=0.9389 | val_acc=86.75%




Epoch 202 | train_loss=0.9172 | val_acc=87.05%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 203 | train_loss=0.9370 | val_acc=86.38%




Epoch 204 | train_loss=0.8980 | val_acc=86.96%




Epoch 205 | train_loss=0.9042 | val_acc=86.34%




Epoch 206 | train_loss=0.9195 | val_acc=86.26%




Epoch 207 | train_loss=0.9290 | val_acc=86.33%




Epoch 208 | train_loss=0.9345 | val_acc=86.54%




Epoch 209 | train_loss=0.8774 | val_acc=86.54%




Epoch 210 | train_loss=0.9221 | val_acc=86.43%




Epoch 211 | train_loss=0.9097 | val_acc=85.92%




Epoch 212 | train_loss=0.9072 | val_acc=87.27%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 213 | train_loss=0.8947 | val_acc=87.04%




Epoch 214 | train_loss=0.9056 | val_acc=86.73%




Epoch 215 | train_loss=0.8921 | val_acc=87.31%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 216 | train_loss=0.9066 | val_acc=87.41%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 217 | train_loss=0.8792 | val_acc=87.10%




Epoch 218 | train_loss=0.8736 | val_acc=86.54%




Epoch 219 | train_loss=0.8963 | val_acc=87.20%




Epoch 220 | train_loss=0.8573 | val_acc=86.25%




Epoch 221 | train_loss=0.8680 | val_acc=87.26%




Epoch 222 | train_loss=0.8823 | val_acc=86.86%




Epoch 223 | train_loss=0.8891 | val_acc=86.81%




Epoch 224 | train_loss=0.8646 | val_acc=86.85%




Epoch 225 | train_loss=0.8819 | val_acc=87.13%




Epoch 226 | train_loss=0.9055 | val_acc=87.11%




Epoch 227 | train_loss=0.9110 | val_acc=87.51%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 228 | train_loss=0.8599 | val_acc=87.38%




Epoch 229 | train_loss=0.8889 | val_acc=87.60%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 230 | train_loss=0.9021 | val_acc=88.09%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 231 | train_loss=0.8592 | val_acc=87.22%




Epoch 232 | train_loss=0.9372 | val_acc=87.48%




Epoch 233 | train_loss=0.8952 | val_acc=87.95%




Epoch 234 | train_loss=0.8646 | val_acc=88.10%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 235 | train_loss=0.8038 | val_acc=87.86%




Epoch 236 | train_loss=0.8544 | val_acc=87.61%




Epoch 237 | train_loss=0.8748 | val_acc=87.53%




Epoch 238 | train_loss=0.8663 | val_acc=87.62%




Epoch 239 | train_loss=0.8587 | val_acc=88.09%




Epoch 240 | train_loss=0.8738 | val_acc=87.69%




Epoch 241 | train_loss=0.8268 | val_acc=87.38%




Epoch 242 | train_loss=0.8742 | val_acc=88.04%




Epoch 243 | train_loss=0.8499 | val_acc=87.98%




Epoch 244 | train_loss=0.8652 | val_acc=87.65%




Epoch 245 | train_loss=0.8517 | val_acc=88.27%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 246 | train_loss=0.8558 | val_acc=87.80%




Epoch 247 | train_loss=0.8930 | val_acc=88.07%




Epoch 248 | train_loss=0.8704 | val_acc=88.36%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 249 | train_loss=0.8160 | val_acc=88.20%




Epoch 250 | train_loss=0.8469 | val_acc=87.82%




Epoch 251 | train_loss=0.8445 | val_acc=87.99%




Epoch 252 | train_loss=0.8466 | val_acc=88.46%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 253 | train_loss=0.8634 | val_acc=88.58%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 254 | train_loss=0.8495 | val_acc=88.57%




Epoch 255 | train_loss=0.8323 | val_acc=88.40%




Epoch 256 | train_loss=0.8765 | val_acc=88.46%




Epoch 257 | train_loss=0.8273 | val_acc=88.37%




Epoch 258 | train_loss=0.8071 | val_acc=88.14%




Epoch 259 | train_loss=0.8028 | val_acc=88.47%




Epoch 260 | train_loss=0.8486 | val_acc=88.32%




Epoch 261 | train_loss=0.8709 | val_acc=88.31%




Epoch 262 | train_loss=0.8499 | val_acc=88.29%




Epoch 263 | train_loss=0.8210 | val_acc=88.54%




Epoch 264 | train_loss=0.8374 | val_acc=88.21%




Epoch 265 | train_loss=0.8301 | val_acc=88.73%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 266 | train_loss=0.8936 | val_acc=88.59%




Epoch 267 | train_loss=0.8164 | val_acc=88.72%




Epoch 268 | train_loss=0.8313 | val_acc=88.94%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 269 | train_loss=0.8628 | val_acc=89.01%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 270 | train_loss=0.8424 | val_acc=89.18%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 271 | train_loss=0.8225 | val_acc=89.03%




Epoch 272 | train_loss=0.8570 | val_acc=88.81%




Epoch 273 | train_loss=0.8707 | val_acc=89.10%




Epoch 274 | train_loss=0.8086 | val_acc=88.98%




Epoch 275 | train_loss=0.8409 | val_acc=88.81%




Epoch 276 | train_loss=0.8077 | val_acc=88.94%




Epoch 277 | train_loss=0.8128 | val_acc=88.92%




Epoch 278 | train_loss=0.8529 | val_acc=88.91%




Epoch 279 | train_loss=0.8441 | val_acc=88.94%




Epoch 280 | train_loss=0.8535 | val_acc=88.87%




Epoch 281 | train_loss=0.8340 | val_acc=88.92%




Epoch 282 | train_loss=0.8333 | val_acc=88.75%




Epoch 283 | train_loss=0.8519 | val_acc=88.81%




Epoch 284 | train_loss=0.8298 | val_acc=89.01%




Epoch 285 | train_loss=0.8038 | val_acc=89.15%




Epoch 286 | train_loss=0.8537 | val_acc=88.95%




Epoch 287 | train_loss=0.8314 | val_acc=88.89%




Epoch 288 | train_loss=0.8552 | val_acc=88.93%




Epoch 289 | train_loss=0.8372 | val_acc=88.80%




Epoch 290 | train_loss=0.8442 | val_acc=88.88%




Epoch 291 | train_loss=0.8310 | val_acc=89.04%




Epoch 292 | train_loss=0.8446 | val_acc=89.13%




Epoch 293 | train_loss=0.8289 | val_acc=88.90%




Epoch 294 | train_loss=0.8367 | val_acc=89.10%




Epoch 295 | train_loss=0.8369 | val_acc=88.92%




Epoch 296 | train_loss=0.8154 | val_acc=88.98%




Epoch 297 | train_loss=0.8547 | val_acc=89.21%
  → New best! Saved to teacher_resnet50_best.pth




Epoch 298 | train_loss=0.8349 | val_acc=89.10%




Epoch 299 | train_loss=0.8583 | val_acc=89.03%




Epoch 300 | train_loss=0.8451 | val_acc=89.09%

✅ Training complete. Best teacher val_acc = 89.21%


#Part4

Loaded your single teacher.

Defined the student, optimizer & KD loss.

Trained the student for 100 epochs with KL+CE.

Evaluated and saved its best test accuracy.

In [19]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
from torchvision.models import resnet50, resnet34
from torch.utils.data import DataLoader
from tqdm import tqdm

# 1. Hyperparameters and device
device     = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch_size = 128
num_epochs = 200
tau        = 5      # distillation temperature
alpha      = 0.7    # weight on KL term
milestones = [60, 120, 160]
lr         = 0.1

# 2. Data preparation (same as in code 1)
mean = (0.4914, 0.4822, 0.4465)
std  = (0.2470, 0.2435, 0.2616)

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

train_set = datasets.CIFAR10(root='data/', train=True,  download=True, transform=train_transform)
test_set  = datasets.CIFAR10(root='data/', train=False, download=True, transform=test_transform)

train_loader = DataLoader(train_set,  batch_size=batch_size, shuffle=True,  num_workers=4, pin_memory=True)
test_loader  = DataLoader(test_set,   batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True)

# 3. Load pretrained teacher (ResNet-50) and freeze it
teacher = resnet50(num_classes=10).to(device)
# ← here we load the ResNet-50 checkpoint (teacher_1), not the ResNet-34 one
ckpt    = torch.load("teacher_1_resnet50_best.pth", map_location=device)
teacher.load_state_dict(ckpt['model_state'])
teacher.eval()
for p in teacher.parameters():
    p.requires_grad = False

# 4. Instantiate student (ResNet-34)
student = resnet34(num_classes=10).to(device)

# 5. Optimizer & scheduler
optimizer = optim.SGD(
    student.parameters(),
    lr=lr,
    momentum=0.9,
    weight_decay=5e-4
)
scheduler = optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=milestones,
    gamma=0.2
)

# 6. Distillation loss
def kd_loss(student_logits, teacher_logits, targets):
    s = F.log_softmax(student_logits / tau, dim=1)
    t = F.softmax(teacher_logits / tau,   dim=1)
    loss_kl = F.kl_div(s, t, reduction='batchmean') * (tau * tau)
    loss_ce = F.cross_entropy(student_logits, targets)
    return alpha * loss_kl + (1 - alpha) * loss_ce

# 7. Training loop
best_acc = 0.0

for epoch in range(1, num_epochs + 1):
    student.train()
    running_loss = 0.0

    for inputs, targets in tqdm(train_loader, desc=f"Epoch {epoch}", leave=False):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()

        with torch.no_grad():
            teacher_logits = teacher(inputs)

        student_logits = student(inputs)
        loss = kd_loss(student_logits, teacher_logits, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)

    scheduler.step()
    avg_loss = running_loss / len(train_loader.dataset)
    print(f"Epoch {epoch:03d} | train_loss={avg_loss:.4f}")

    # 8. Evaluation
    student.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = student(inputs)
            preds   = outputs.argmax(dim=1)
            correct += (preds == targets).sum().item()
            total   += targets.size(0)

    test_acc = 100. * correct / total
    print(f"          test_acc={test_acc:.2f}%")

    # 9. Save best
    if test_acc > best_acc:
        best_acc = test_acc
        torch.save({
            'epoch':           epoch,
            'model_state':     student.state_dict(),
            'optimizer_state': optimizer.state_dict(),
            'test_acc':        test_acc,
        }, "student_distilled_best.pth")
        print("  → New best saved.")

print(f"Finished. Best distilled student accuracy: {best_acc:.2f}%")


                                                          

Epoch 001 | train_loss=6.1549




          test_acc=26.44%
  → New best saved.


                                                          

Epoch 002 | train_loss=4.4776




          test_acc=34.61%
  → New best saved.


                                                          

Epoch 003 | train_loss=3.8858




          test_acc=40.24%
  → New best saved.


                                                          

Epoch 004 | train_loss=3.2646




          test_acc=49.80%
  → New best saved.


                                                          

Epoch 005 | train_loss=2.6387




          test_acc=55.17%
  → New best saved.


                                                          

Epoch 006 | train_loss=2.2478




          test_acc=60.07%
  → New best saved.


                                                          

Epoch 007 | train_loss=1.9292




          test_acc=56.21%


                                                          

Epoch 008 | train_loss=1.7324




          test_acc=64.70%
  → New best saved.


                                                          

Epoch 009 | train_loss=1.5815




          test_acc=68.23%
  → New best saved.




Epoch 010 | train_loss=1.4600
          test_acc=69.88%
  → New best saved.


                                                           

Epoch 011 | train_loss=1.4174




          test_acc=65.35%


                                                           

Epoch 012 | train_loss=1.3674




          test_acc=68.58%


                                                           

Epoch 013 | train_loss=1.3133




          test_acc=69.35%


                                                           

Epoch 014 | train_loss=1.2746




          test_acc=68.88%


                                                           

Epoch 015 | train_loss=1.2346




          test_acc=71.04%
  → New best saved.


                                                           

Epoch 016 | train_loss=1.2485




          test_acc=73.21%
  → New best saved.


                                                           

Epoch 017 | train_loss=1.1926




          test_acc=73.63%
  → New best saved.


                                                           

Epoch 018 | train_loss=1.1820




          test_acc=68.26%


                                                           

Epoch 019 | train_loss=1.1685




          test_acc=71.20%


                                                           

Epoch 020 | train_loss=1.1411




          test_acc=70.66%


                                                           

Epoch 021 | train_loss=1.1394




          test_acc=72.60%


                                                           

Epoch 022 | train_loss=1.1314




          test_acc=71.44%


                                                           

Epoch 023 | train_loss=1.1218




          test_acc=71.75%


                                                           

Epoch 024 | train_loss=1.1122




          test_acc=70.50%


                                                           

Epoch 025 | train_loss=1.0998




          test_acc=73.32%


                                                           

Epoch 026 | train_loss=1.0960




          test_acc=72.22%


                                                           

Epoch 027 | train_loss=1.0901




          test_acc=71.63%


                                                           

Epoch 028 | train_loss=1.0703




          test_acc=74.30%
  → New best saved.


                                                           

Epoch 029 | train_loss=1.0699




          test_acc=72.24%


                                                           

Epoch 030 | train_loss=1.0871




          test_acc=71.95%


                                                           

Epoch 031 | train_loss=1.0383




          test_acc=74.11%


                                                           

Epoch 032 | train_loss=1.0494




          test_acc=74.31%
  → New best saved.


                                                           

Epoch 033 | train_loss=1.0401




          test_acc=71.02%


                                                           

Epoch 034 | train_loss=1.0600




          test_acc=75.05%
  → New best saved.


                                                           

Epoch 035 | train_loss=1.0404




          test_acc=72.74%


                                                           

Epoch 036 | train_loss=1.0625




          test_acc=73.97%


                                                           

Epoch 037 | train_loss=1.0260




          test_acc=74.12%


                                                           

Epoch 038 | train_loss=1.0367




          test_acc=73.11%


                                                           

Epoch 039 | train_loss=1.0311




          test_acc=73.62%


                                                           

Epoch 040 | train_loss=1.0455




          test_acc=73.30%


                                                           

Epoch 041 | train_loss=1.0304




          test_acc=70.89%


                                                           

Epoch 042 | train_loss=1.0084




          test_acc=74.93%


                                                           

Epoch 043 | train_loss=1.0214




          test_acc=70.47%


                                                           

Epoch 044 | train_loss=1.0195




          test_acc=71.52%


                                                           

Epoch 045 | train_loss=1.0188




          test_acc=74.75%


                                                           

Epoch 046 | train_loss=1.0162




          test_acc=72.99%


                                                           

Epoch 047 | train_loss=1.0220




          test_acc=74.65%


                                                           

Epoch 048 | train_loss=1.0072




          test_acc=72.99%


                                                           

Epoch 049 | train_loss=1.0094




          test_acc=71.99%


                                                           

Epoch 050 | train_loss=0.9934




          test_acc=73.93%


                                                           

Epoch 051 | train_loss=1.0125




          test_acc=72.69%


                                                           

Epoch 052 | train_loss=1.0071




          test_acc=74.48%


                                                           

Epoch 053 | train_loss=1.0032




          test_acc=74.45%


                                                           

Epoch 054 | train_loss=0.9955




          test_acc=74.55%


                                                           

Epoch 055 | train_loss=1.0154




          test_acc=73.87%


                                                           

Epoch 056 | train_loss=0.9942




          test_acc=73.61%


                                                           

Epoch 057 | train_loss=1.0079




          test_acc=73.89%


                                                           

Epoch 058 | train_loss=0.9920




          test_acc=71.59%


                                                           

Epoch 059 | train_loss=0.9859




          test_acc=71.85%


                                                           

Epoch 060 | train_loss=0.9972




          test_acc=73.76%


                                                           

Epoch 061 | train_loss=0.6946




          test_acc=79.10%
  → New best saved.


                                                           

Epoch 062 | train_loss=0.6299




          test_acc=79.48%
  → New best saved.


                                                           

Epoch 063 | train_loss=0.6122




          test_acc=79.70%
  → New best saved.


                                                           

Epoch 064 | train_loss=0.5950




          test_acc=79.71%
  → New best saved.


                                                           

Epoch 065 | train_loss=0.5854




          test_acc=79.76%
  → New best saved.




Epoch 066 | train_loss=0.5789
          test_acc=79.67%


                                                           

Epoch 067 | train_loss=0.5722




          test_acc=79.66%


                                                           

Epoch 068 | train_loss=0.5669




          test_acc=79.60%


                                                           

Epoch 069 | train_loss=0.5691




          test_acc=79.91%
  → New best saved.


                                                           

Epoch 070 | train_loss=0.5710




          test_acc=79.83%


                                                           

Epoch 071 | train_loss=0.5705




          test_acc=79.33%


                                                           

Epoch 072 | train_loss=0.5679




          test_acc=79.25%


                                                           

Epoch 073 | train_loss=0.5726




          test_acc=79.28%


                                                           

Epoch 074 | train_loss=0.5764




          test_acc=78.82%


                                                           

Epoch 075 | train_loss=0.5749




          test_acc=78.85%


                                                           

Epoch 076 | train_loss=0.5774




          test_acc=79.60%


                                                           

Epoch 077 | train_loss=0.5833




          test_acc=79.46%


                                                           

Epoch 078 | train_loss=0.5797




          test_acc=79.63%


                                                           

Epoch 079 | train_loss=0.5793




          test_acc=79.04%


                                                           

Epoch 080 | train_loss=0.5875




          test_acc=78.97%


                                                           

Epoch 081 | train_loss=0.5908




          test_acc=78.55%


                                                           

Epoch 082 | train_loss=0.5968




          test_acc=78.94%


                                                           

Epoch 083 | train_loss=0.5847




          test_acc=78.75%


                                                           

Epoch 084 | train_loss=0.5949




          test_acc=79.11%


                                                           

Epoch 085 | train_loss=0.5903




          test_acc=78.50%


                                                           

Epoch 086 | train_loss=0.5809




          test_acc=79.62%


                                                           

Epoch 087 | train_loss=0.5932




          test_acc=78.99%


                                                           

Epoch 088 | train_loss=0.5973




          test_acc=79.05%


                                                           

Epoch 089 | train_loss=0.5891




          test_acc=78.30%


                                                           

Epoch 090 | train_loss=0.5857




          test_acc=79.09%


                                                           

Epoch 091 | train_loss=0.5861




          test_acc=77.87%


                                                           

Epoch 092 | train_loss=0.5870




          test_acc=78.64%


                                                           

Epoch 093 | train_loss=0.5864




          test_acc=78.75%


                                                           

Epoch 094 | train_loss=0.5858




          test_acc=78.25%


                                                           

Epoch 095 | train_loss=0.5895




          test_acc=78.61%


                                                           

Epoch 096 | train_loss=0.5921




          test_acc=78.17%


                                                           

Epoch 097 | train_loss=0.5874




          test_acc=78.74%


                                                           

Epoch 098 | train_loss=0.5897




          test_acc=78.50%


                                                           

Epoch 099 | train_loss=0.5802




          test_acc=79.24%


                                                            

Epoch 100 | train_loss=0.5862




          test_acc=78.88%


                                                            

Epoch 101 | train_loss=0.5842




          test_acc=79.58%


                                                            

Epoch 102 | train_loss=0.5784




          test_acc=78.67%


                                                            

Epoch 103 | train_loss=0.5770




          test_acc=78.43%


                                                            

Epoch 104 | train_loss=0.5828




          test_acc=78.42%


                                                            

Epoch 105 | train_loss=0.5829




          test_acc=79.08%


                                                            

Epoch 106 | train_loss=0.5783




          test_acc=78.05%


                                                            

Epoch 107 | train_loss=0.5743




          test_acc=79.13%


                                                            

Epoch 108 | train_loss=0.5741




          test_acc=79.45%


                                                            

Epoch 109 | train_loss=0.5757




          test_acc=77.92%


                                                            

Epoch 110 | train_loss=0.5829




          test_acc=78.85%


                                                            

Epoch 111 | train_loss=0.5809




          test_acc=78.45%


                                                            

Epoch 112 | train_loss=0.5808




          test_acc=78.91%


                                                            

Epoch 113 | train_loss=0.5721




          test_acc=78.17%


                                                            

Epoch 114 | train_loss=0.5818




          test_acc=78.03%


                                                            

Epoch 115 | train_loss=0.5710




          test_acc=78.56%


                                                            

Epoch 116 | train_loss=0.5730




          test_acc=78.42%


                                                            

Epoch 117 | train_loss=0.5717




          test_acc=77.96%


                                                            

Epoch 118 | train_loss=0.5691




          test_acc=77.47%


                                                            

Epoch 119 | train_loss=0.5693




          test_acc=77.93%


                                                            

Epoch 120 | train_loss=0.5687




          test_acc=77.91%


                                                            

Epoch 121 | train_loss=0.4701




          test_acc=80.43%
  → New best saved.


                                                            

Epoch 122 | train_loss=0.4391




          test_acc=80.39%


                                                            

Epoch 123 | train_loss=0.4344




          test_acc=80.49%
  → New best saved.


                                                            

Epoch 124 | train_loss=0.4244




          test_acc=80.49%


                                                            

Epoch 125 | train_loss=0.4210




          test_acc=80.61%
  → New best saved.


                                                            

Epoch 126 | train_loss=0.4183




          test_acc=80.43%


                                                            

Epoch 127 | train_loss=0.4150




          test_acc=80.75%
  → New best saved.


                                                            

Epoch 128 | train_loss=0.4123




          test_acc=80.56%


                                                            

Epoch 129 | train_loss=0.4080




          test_acc=80.42%


                                                            

Epoch 130 | train_loss=0.4043




          test_acc=80.73%


                                                            

Epoch 131 | train_loss=0.4044




          test_acc=80.55%


                                                            

Epoch 132 | train_loss=0.4023




          test_acc=80.72%


                                                            

Epoch 133 | train_loss=0.4002




          test_acc=80.47%


                                                            

Epoch 134 | train_loss=0.3985




          test_acc=80.31%


                                                            

Epoch 135 | train_loss=0.3977




          test_acc=80.65%


                                                            

Epoch 136 | train_loss=0.3969




          test_acc=80.53%


                                                            

Epoch 137 | train_loss=0.3968




          test_acc=80.68%


                                                            

Epoch 138 | train_loss=0.3922




          test_acc=80.40%


                                                            

Epoch 139 | train_loss=0.3952




          test_acc=80.26%


                                                            

Epoch 140 | train_loss=0.3907




          test_acc=80.51%


                                                            

Epoch 141 | train_loss=0.3927




          test_acc=80.57%


                                                            

Epoch 142 | train_loss=0.3901




          test_acc=80.41%


                                                            

Epoch 143 | train_loss=0.3885




          test_acc=80.54%


                                                            

Epoch 144 | train_loss=0.3878




          test_acc=80.52%


                                                            

Epoch 145 | train_loss=0.3877




          test_acc=80.41%


                                                            

Epoch 146 | train_loss=0.3858




          test_acc=80.30%


                                                            

Epoch 147 | train_loss=0.3865




          test_acc=80.48%


                                                            

Epoch 148 | train_loss=0.3864




          test_acc=80.52%


                                                            

Epoch 149 | train_loss=0.3831




          test_acc=80.48%


                                                            

Epoch 150 | train_loss=0.3830




          test_acc=80.34%


                                                            

Epoch 151 | train_loss=0.3841




          test_acc=80.35%


                                                            

Epoch 152 | train_loss=0.3820




          test_acc=80.30%


                                                            

Epoch 153 | train_loss=0.3825




          test_acc=80.28%


                                                            

Epoch 154 | train_loss=0.3849




          test_acc=80.47%


                                                            

Epoch 155 | train_loss=0.3794




          test_acc=80.83%
  → New best saved.


                                                            

Epoch 156 | train_loss=0.3818




          test_acc=80.55%


                                                            

Epoch 157 | train_loss=0.3811




          test_acc=80.47%


                                                            

Epoch 158 | train_loss=0.3816




          test_acc=80.34%


                                                            

Epoch 159 | train_loss=0.3802




          test_acc=80.06%


                                                            

Epoch 160 | train_loss=0.3797




          test_acc=80.31%


                                                            

Epoch 161 | train_loss=0.3588




          test_acc=80.58%


                                                            

Epoch 162 | train_loss=0.3545




          test_acc=80.35%


                                                            

Epoch 163 | train_loss=0.3523




          test_acc=80.57%


                                                            

Epoch 164 | train_loss=0.3530




          test_acc=80.75%


                                                            

Epoch 165 | train_loss=0.3502




          test_acc=80.49%


                                                            

Epoch 166 | train_loss=0.3475




          test_acc=80.64%


                                                            

Epoch 167 | train_loss=0.3508




          test_acc=80.56%


                                                            

Epoch 168 | train_loss=0.3449




          test_acc=80.78%


                                                            

Epoch 169 | train_loss=0.3448




          test_acc=80.65%


                                                            

Epoch 170 | train_loss=0.3461




          test_acc=80.73%


                                                            

Epoch 171 | train_loss=0.3467




          test_acc=80.56%


                                                            

Epoch 172 | train_loss=0.3429




          test_acc=80.75%


                                                            

Epoch 173 | train_loss=0.3461




          test_acc=80.63%


                                                            

Epoch 174 | train_loss=0.3442




          test_acc=80.51%


                                                            

Epoch 175 | train_loss=0.3427




          test_acc=80.68%


                                                            

Epoch 176 | train_loss=0.3439




          test_acc=80.67%


                                                            

Epoch 177 | train_loss=0.3427




          test_acc=80.72%


                                                            

Epoch 178 | train_loss=0.3411




          test_acc=80.71%


                                                            

Epoch 179 | train_loss=0.3402




          test_acc=80.69%


                                                            

Epoch 180 | train_loss=0.3418




          test_acc=80.49%


                                                            

Epoch 181 | train_loss=0.3423




          test_acc=80.66%


                                                            

Epoch 182 | train_loss=0.3390




          test_acc=80.53%


                                                            

Epoch 183 | train_loss=0.3400




          test_acc=80.57%


                                                            

Epoch 184 | train_loss=0.3385




          test_acc=80.47%


                                                            

Epoch 185 | train_loss=0.3391




          test_acc=80.61%


                                                            

Epoch 186 | train_loss=0.3377




          test_acc=80.33%


                                                            

Epoch 187 | train_loss=0.3385




          test_acc=80.45%


                                                            

Epoch 188 | train_loss=0.3376




          test_acc=80.62%


                                                            

Epoch 189 | train_loss=0.3365




          test_acc=80.70%


                                                            

Epoch 190 | train_loss=0.3392




          test_acc=80.84%
  → New best saved.


                                                            

Epoch 191 | train_loss=0.3371




          test_acc=80.75%


                                                            

Epoch 192 | train_loss=0.3375




          test_acc=80.76%


                                                            

Epoch 193 | train_loss=0.3351




          test_acc=80.63%


                                                            

Epoch 194 | train_loss=0.3352




          test_acc=80.44%


                                                            

Epoch 195 | train_loss=0.3357




          test_acc=80.63%


                                                            

Epoch 196 | train_loss=0.3345




          test_acc=80.45%


                                                            

Epoch 197 | train_loss=0.3362




          test_acc=80.55%


                                                            

Epoch 198 | train_loss=0.3342




          test_acc=80.54%


                                                            

Epoch 199 | train_loss=0.3336




          test_acc=80.73%


                                                            

Epoch 200 | train_loss=0.3339




          test_acc=80.93%
  → New best saved.
Finished. Best distilled student accuracy: 80.93%


# Part5
Explanation of steps:

Soft‑label fusion (5.1):

We iterate over each of the
𝐾
K teachers in teacher_models, do a forward pass on the same inputs, take each teacher’s
𝜏
τ-softmax, and accumulate.

Dividing by
𝐾
K yields the averaged soft‑label
𝑝
a
v
g
(
𝜏
)
p
avg
(τ)
​
 .

Student training (5.2):

Compute the KL divergence between the student’s log-​softmax and
𝑝
a
v
g
(
𝜏
)
p
avg
(τ)
​
 , scaled by
𝜏
2
τ
2
 .

Add the usual cross‑entropy on the hard labels.

Backpropagate and optimizer-​step.

Evaluation & checkpointing (5.3):

After each epoch, switch student.eval(), measure test accuracy against the full test set, and if it improves, save "student_multi_teacher_best.pth".

Explanation of steps:

Soft‑label fusion (5.1):

We iterate over each of the
𝐾
K teachers in teacher_models, do a forward pass on the same inputs, take each teacher’s
𝜏
τ-softmax, and accumulate.

Dividing by
𝐾
K yields the averaged soft‑label
𝑝
a
v
g
(
𝜏
)
p
avg
(τ)
​
 .

Student training (5.2):

Compute the KL divergence between the student’s log-​softmax and
𝑝
a
v
g
(
𝜏
)
p
avg
(τ)
​
 , scaled by
𝜏
2
τ
2
 .

Add the usual cross‑entropy on the hard labels.

Backpropagate and optimizer-​step.

Evaluation & checkpointing (5.3):

After each epoch, switch student.eval(), measure test accuracy against the full test set, and if it improves, save "student_multi_teacher_best.pth".

In [25]:
import os
import glob
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, resnet34, resnet50

# ───────────────────────────────────────────────────────────────
# 0️⃣ Basic setup (you already have these)
device       = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_loader = train_loader    # your CIFAR‑10 train DataLoader
test_loader  = test_loader     # your CIFAR‑10 test  DataLoader
student      = student.to(device)                # ResNet‑34 student from code4
optimizer    = optimizer                       # your SGD optimizer on `student`
scheduler    = scheduler                       # your MultiStepLR on `student`
tau, alpha   = 5, 0.7                          # distillation hyperparams
best_acc     = 0.0
num_epochs   = 200
# ───────────────────────────────────────────────────────────────

# 1️⃣ Discover the teacher checkpoint files created by code3
ckpt_paths = sorted(glob.glob("teacher_*_*_best.pth"))
if not ckpt_paths:
    raise FileNotFoundError("No teacher checkpoints found matching teacher_*_*_best.pth")

# 2️⃣ Load & freeze each teacher
teacher_models = []
for ckpt_path in ckpt_paths:
    fname = os.path.basename(ckpt_path)           # e.g. "teacher_0_resnet34_best.pth"
    parts = fname.split("_")
    arch  = parts[2]                              # "resnet34"
    # pick the correct constructor
    if   arch == "resnet18": ctor = resnet18
    elif arch == "resnet34": ctor = resnet34
    elif arch == "resnet50": ctor = resnet50
    else: raise ValueError(f"Unexpected arch in {fname}")

    # instantiate, load weights, move to device
    model = ctor(num_classes=10).to(device)
    data  = torch.load(ckpt_path, map_location=device)
    model.load_state_dict(data['model_state'])
    model.eval()                                  # inference only
    for p in model.parameters():
        p.requires_grad = False                   # no grads
    teacher_models.append(model)

K = len(teacher_models)
print(f"Loaded {K} teacher models from {len(ckpt_paths)} checkpoints")

# 3️⃣ Multi‑teacher distillation loop
for epoch in range(1, num_epochs+1):
    student.train()
    running_loss = 0.0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()

        # — fuse K teachers’ τ‑softmax outputs —
        with torch.no_grad():
            # dynamic class count C
            C = teacher_models[0](inputs).size(1)
            sum_soft = torch.zeros(inputs.size(0), C, device=device)

            for t_model in teacher_models:
                logits_t = t_model(inputs)
                sum_soft += F.softmax(logits_t / tau, dim=1)

            p_avg = sum_soft / float(K)

        # — student forward + KD + CE loss —
        logits_s = student(inputs)
        s_soft   = F.log_softmax(logits_s / tau, dim=1)
        loss_kl  = F.kl_div(s_soft, p_avg, reduction='batchmean') * (tau * tau)
        loss_ce  = F.cross_entropy(logits_s, targets)
        loss     = alpha * loss_kl + (1 - alpha) * loss_ce

        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)

    scheduler.step()
    train_loss = running_loss / len(train_loader.dataset)
    print(f"[Epoch {epoch:03d}] train_loss={train_loss:.4f}")

    # — evaluation —
    student.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            preds = student(inputs).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total   += targets.size(0)

    test_acc = 100. * correct / total
    print(f"           test_acc={test_acc:.2f}%")

    # — save best —
    if test_acc > best_acc:
        best_acc = test_acc
        torch.save({
            'epoch':          epoch,
            'student_state':  student.state_dict(),
            'optimizer':      optimizer.state_dict(),
            'test_acc':       test_acc,
        }, "student_multi_teacher_best.pth")
        print("  → New best multi‑teacher student saved.")

print(f"\n✅ Multi‑teacher distillation complete. Best acc: {best_acc:.2f}%")


Loaded 5 teacher models from 5 checkpoints
[Epoch 001] train_loss=0.4575
           test_acc=82.09%
  → New best multi‑teacher student saved.
[Epoch 002] train_loss=0.3838
           test_acc=82.56%
  → New best multi‑teacher student saved.
[Epoch 003] train_loss=0.3579
           test_acc=82.78%
  → New best multi‑teacher student saved.
[Epoch 004] train_loss=0.3404
           test_acc=83.14%
  → New best multi‑teacher student saved.
[Epoch 005] train_loss=0.3266
           test_acc=83.29%
  → New best multi‑teacher student saved.
[Epoch 006] train_loss=0.3167
           test_acc=83.27%
[Epoch 007] train_loss=0.3107
           test_acc=83.29%
[Epoch 008] train_loss=0.3016
           test_acc=83.16%
[Epoch 009] train_loss=0.2982
           test_acc=83.42%
  → New best multi‑teacher student saved.
[Epoch 010] train_loss=0.2912
           test_acc=83.29%
[Epoch 011] train_loss=0.2859
           test_acc=83.45%
  → New best multi‑teacher student saved.
[Epoch 012] train_loss=0.2829
      

In [26]:
# —— 单教师蒸馏 —— 取 teacher_models[0] 作为唯一教师
import torch
import torch.nn.functional as F

student = student.to(device)
teacher = teacher_models[0].to(device)
teacher.eval()
for p in teacher.parameters(): p.requires_grad = False

best_acc = 0.0
num_epochs = 200

for epoch in range(1, num_epochs+1):
    student.train()
    running_loss = 0.0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()

        # 1️⃣ 单教师 forward
        with torch.no_grad():
            logits_t = teacher(inputs)                   # [B, C]
            p_soft   = F.softmax(logits_t / tau, dim=1)  # [B, C]

        # 2️⃣ 学生 forward + KD 损失
        logits_s = student(inputs)                      # [B, C]
        s_soft   = F.log_softmax(logits_s / tau, dim=1)
        loss_kl  = F.kl_div(s_soft, p_soft, reduction='batchmean') * (tau*tau)
        loss_ce  = F.cross_entropy(logits_s, targets)
        loss     = alpha * loss_kl + (1 - alpha) * loss_ce

        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)

    scheduler.step()
    train_loss = running_loss / len(train_loader.dataset)
    print(f"[Single] Epoch {epoch:03d}/{num_epochs}   train_loss={train_loss:.4f}")

    # 3️⃣ 验证
    student.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            preds = student(inputs).argmax(dim=1)
            correct += (preds==targets).sum().item()
            total   += targets.size(0)
    acc = 100.*correct/total
    print(f"          test_acc={acc:.2f}%")

    if acc > best_acc:
        best_acc = acc
        torch.save({
            'epoch':         epoch,
            'student_state': student.state_dict(),
            'optimizer':     optimizer.state_dict(),
            'test_acc':      acc,
        }, "student_single_teacher_best.pth")
        print("  → New best single‑teacher student saved.")


[Single] Epoch 001/200   train_loss=0.5630
          test_acc=82.86%
  → New best single‑teacher student saved.
[Single] Epoch 002/200   train_loss=0.4938
          test_acc=82.48%
[Single] Epoch 003/200   train_loss=0.4694
          test_acc=82.11%
[Single] Epoch 004/200   train_loss=0.4551
          test_acc=82.09%
[Single] Epoch 005/200   train_loss=0.4441
          test_acc=81.96%
[Single] Epoch 006/200   train_loss=0.4323
          test_acc=81.86%
[Single] Epoch 007/200   train_loss=0.4279
          test_acc=81.93%
[Single] Epoch 008/200   train_loss=0.4206
          test_acc=82.08%
[Single] Epoch 009/200   train_loss=0.4127
          test_acc=81.86%
[Single] Epoch 010/200   train_loss=0.4095
          test_acc=81.77%
[Single] Epoch 011/200   train_loss=0.4059
          test_acc=81.87%
[Single] Epoch 012/200   train_loss=0.4009
          test_acc=81.84%
[Single] Epoch 013/200   train_loss=0.4003
          test_acc=81.59%
[Single] Epoch 014/200   train_loss=0.3978
          test_ac

# 下面给出修改后的「单教师蒸馏」那一段，把之前的 teacher_models[0] 全部替换成直接加载 teacher_resnet50_best.pth：

In [21]:
import os, torch
import torch.nn.functional as F
from torchvision.models import resnet50

# ───── 0️⃣ Setup ─────
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
save_dir = "checkpoints/single_teacher"
os.makedirs(save_dir, exist_ok=True)

# 假设 student, train_loader, test_loader, optimizer, scheduler, tau, alpha 已提前定义好
student = student.to(device)

# ───── 1️⃣ 把 teacher_resnet50_best.pth 当作唯一教师 ─────
teacher_ckpt = "teacher_resnet50_best.pth"
teacher = resnet50(num_classes=10).to(device)
ck = torch.load(teacher_ckpt, map_location=device)
teacher.load_state_dict(ck['model_state'])
teacher.eval()
for p in teacher.parameters():
    p.requires_grad = False

best_acc = 0.0
num_epochs = 200

# ───── 2️⃣ 蒸馏训练 ─────
for epoch in range(1, num_epochs+1):
    student.train()
    running_loss = 0.0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()

        # —— 教师 forward 取软标签 ——
        with torch.no_grad():
            logits_t = teacher(inputs)                    # [B, C]
            p_soft   = F.softmax(logits_t / tau, dim=1)   # [B, C]

        # —— 学生 forward + KD 损失 ——
        logits_s = student(inputs)                       # [B, C]
        s_soft   = F.log_softmax(logits_s / tau, dim=1)
        loss_kl  = F.kl_div(s_soft, p_soft, reduction='batchmean') * (tau*tau)
        loss_ce  = F.cross_entropy(logits_s, targets)
        loss     = alpha * loss_kl + (1 - alpha) * loss_ce

        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)

    scheduler.step()
    train_loss = running_loss / len(train_loader.dataset)
    print(f"[Single] Epoch {epoch:03d}/{num_epochs}  train_loss={train_loss:.4f}")

    # —— 验证 ——
    student.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            preds = student(inputs).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total   += targets.size(0)
    acc = 100. * correct / total
    print(f"          test_acc={acc:.2f}%")

    # —— 保存最优 ——
    if acc > best_acc:
        best_acc = acc
        ckpt_name = f"student_single_tau{tau}_alpha{alpha}_best.pth"
        ckpt_path = os.path.join(save_dir, ckpt_name)
        torch.save({
            'epoch':         epoch,
            'student_state': student.state_dict(),
            'optimizer':     optimizer.state_dict(),
            'test_acc':      acc,
            'tau':           tau,
            'alpha':         alpha,
        }, ckpt_path)
        print(f"  → New best single‑teacher student saved to {ckpt_path}")

print(f"\n✅ Single‑teacher distillation complete. Best acc = {best_acc:.2f}%")


[Single] Epoch 001/200  train_loss=3.6901
          test_acc=83.78%
  → New best single‑teacher student saved to checkpoints/single_teacher/student_single_tau5_alpha0.9_best.pth
[Single] Epoch 002/200  train_loss=3.6928
          test_acc=83.87%
  → New best single‑teacher student saved to checkpoints/single_teacher/student_single_tau5_alpha0.9_best.pth
[Single] Epoch 003/200  train_loss=3.6938
          test_acc=83.86%
[Single] Epoch 004/200  train_loss=3.6896
          test_acc=83.84%
[Single] Epoch 005/200  train_loss=3.6929
          test_acc=83.90%
  → New best single‑teacher student saved to checkpoints/single_teacher/student_single_tau5_alpha0.9_best.pth
[Single] Epoch 006/200  train_loss=3.6928
          test_acc=83.93%
  → New best single‑teacher student saved to checkpoints/single_teacher/student_single_tau5_alpha0.9_best.pth
[Single] Epoch 007/200  train_loss=3.6913
          test_acc=83.85%
[Single] Epoch 008/200  train_loss=3.6925
          test_acc=83.78%
[Single] Epoch 0

In [27]:
# —— 硬标签基线 —— 只用 CrossEntropy
import torch

student = student.to(device)
best_acc = 0.0
num_epochs = 200

for epoch in range(1, num_epochs+1):
    student.train()
    running_loss = 0.0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()

        # 1️⃣ 仅学生前向
        logits_s = student(inputs)                   # [B, C]
        loss     = torch.nn.functional.cross_entropy(logits_s, targets)

        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)

    scheduler.step()
    train_loss = running_loss / len(train_loader.dataset)
    print(f"[Hard ] Epoch {epoch:03d}/{num_epochs}   train_loss={train_loss:.4f}")

    # 2️⃣ 验证
    student.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            preds = student(inputs).argmax(dim=1)
            correct += (preds==targets).sum().item()
            total   += targets.size(0)
    acc = 100.*correct/total
    print(f"          test_acc={acc:.2f}%")

    if acc > best_acc:
        best_acc = acc
        torch.save({
            'epoch':         epoch,
            'student_state': student.state_dict(),
            'optimizer':     optimizer.state_dict(),
            'test_acc':      acc,
        }, "student_hard_label_best.pth")
        print("  → New best hard‑label student saved.")


[Hard ] Epoch 001/200   train_loss=0.3215
          test_acc=81.20%
  → New best hard‑label student saved.
[Hard ] Epoch 002/200   train_loss=0.2625
          test_acc=82.40%
  → New best hard‑label student saved.
[Hard ] Epoch 003/200   train_loss=0.2328
          test_acc=82.57%
  → New best hard‑label student saved.
[Hard ] Epoch 004/200   train_loss=0.2144
          test_acc=82.46%
[Hard ] Epoch 005/200   train_loss=0.1957
          test_acc=82.57%
[Hard ] Epoch 006/200   train_loss=0.1895
          test_acc=83.24%
  → New best hard‑label student saved.
[Hard ] Epoch 007/200   train_loss=0.1810
          test_acc=83.00%
[Hard ] Epoch 008/200   train_loss=0.1696
          test_acc=83.18%
[Hard ] Epoch 009/200   train_loss=0.1654
          test_acc=83.07%
[Hard ] Epoch 010/200   train_loss=0.1545
          test_acc=83.73%
  → New best hard‑label student saved.
[Hard ] Epoch 011/200   train_loss=0.1456
          test_acc=83.44%
[Hard ] Epoch 012/200   train_loss=0.1472
          test_

summary:
Multi‑teacher distillation complete. Best acc: 84.44%
—— 单教师蒸馏 —— 取 teacher_models[0] 作为唯一教师 complete. Best acc: 82.86%
Hard label ‑teacher distillation complete. Best acc:85.51%


# Part6

Explanation:

We loop over each desired teacher‑count
𝐾
K.

For each
(
𝜏
,
𝛼
)
(τ,α) pair, we:

Instantiate a fresh ResNet34 student.

Fuse the first
𝐾
K teachers’
𝜏
τ‑softmax outputs each batch.

Train 200 epochs with KL+CE loss.

Evaluate on the test set and print/store the final accuracy.

All results accumulate in the results list for later comparison or tabulation.

In [17]:
# 只跑之前（Part4）跑过最好的{'τ': 5,  'α': 0.9}✖️「2,4,6」共三个组合
import os, glob
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from torchvision.models import resnet34, resnet18, resnet50
from torch.utils.data import DataLoader
import pandas as pd

# ───── 0️⃣ Setup device & data loaders ─────
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
mean = (0.4914, 0.4822, 0.4465)
std  = (0.2470, 0.2435, 0.2616)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
train_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=True,  download=True, transform=train_transform),
    batch_size=128, shuffle=True,  num_workers=4, pin_memory=True
)
test_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=False, download=True, transform=test_transform),
    batch_size=128, shuffle=False, num_workers=4, pin_memory=True
)

# ───── 1️⃣ Load your pretrained teachers ─────
ckpt_paths = sorted(glob.glob("teacher_*_*_best.pth"))
if not ckpt_paths:
    raise FileNotFoundError("No teacher checkpoints found")
all_teacher_models = []
for path in ckpt_paths:
    arch = os.path.basename(path).split("_")[2]
    ctor = {"resnet18":resnet18,"resnet34":resnet34,"resnet50":resnet50}[arch]
    m = ctor(num_classes=10).to(device)
    ck = torch.load(path, map_location=device)
    m.load_state_dict(ck['model_state'])
    m.eval()
    for p in m.parameters(): p.requires_grad=False
    all_teacher_models.append(m)
print(f"🔍 Loaded {len(all_teacher_models)} teachers")

# ───── 2️⃣ 拿到你之前跑好的 (τ,α) 组合 ─────
rev_results = [
     {'τ': 5,  'α': 0.9},

]
df_rev = pd.DataFrame(rev_results)
combos = df_rev[['τ','α']].to_dict('records')  # list of dicts [{'τ':1,'α':0.1}, ...]

# ───── 3️⃣ 只对 K=2,4,6 做叠加 ─────
Ks = [2,4,6]
num_epochs = 200
final_results = []

for K in Ks:
    teachers = all_teacher_models[:K]
    for combo in combos:
        tau, alpha = combo['τ'], combo['α']

        # fresh student & optimizer/scheduler
        student   = resnet34(num_classes=10).to(device)
        optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
        scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100,150], gamma=0.1)

        # training
        for epoch in range(1, num_epochs+1):
            student.train()
            run_loss = 0.0
            for xb, yb in train_loader:
                xb, yb = xb.to(device), yb.to(device)
                optimizer.zero_grad()
                # fuse K teachers
                with torch.no_grad():
                    C = teachers[0](xb).size(1)
                    sum_soft = torch.zeros(xb.size(0), C, device=device)
                    for tm in teachers:
                        sum_soft += F.softmax(tm(xb)/tau, dim=1)
                    p_avg = sum_soft / K
                # distill+CE
                s_logits = student(xb)
                s_soft   = F.log_softmax(s_logits/tau, dim=1)
                l_kl     = F.kl_div(s_soft, p_avg, reduction='batchmean') * (tau*tau)
                l_ce     = F.cross_entropy(s_logits, yb)
                loss     = alpha*l_kl + (1-alpha)*l_ce
                loss.backward()
                optimizer.step()
                run_loss += loss.item()*xb.size(0)
            scheduler.step()
            # 每 50 epochs 打印一次
            if epoch in (1,50,100,150,num_epochs):
                print(f"[K={K} τ={tau} α={alpha}] Epoch {epoch}/{num_epochs} loss={run_loss/len(train_loader.dataset):.4f}")

        # evaluation
        student.eval()
        correct=total=0
        with torch.no_grad():
            for xb,yb in test_loader:
                xb,yb = xb.to(device), yb.to(device)
                preds = student(xb).argmax(1)
                correct += (preds==yb).sum().item()
                total   += yb.size(0)
        acc = 100.*correct/total

        final_results.append({'K':K,'τ':tau,'α':alpha,'test_acc':acc})
        print(f"→ [K={K} τ={tau} α={alpha}] test_acc={acc:.2f}%\n")

# ───── 4️⃣ 保存结果 ─────
df_out = pd.DataFrame(final_results)
df_out.to_csv("multi_teacher_246_sweep.csv", index=False)
print("✅ Done! ")



🔍 Loaded 5 teachers
[K=2 τ=5 α=0.9] Epoch 1/200 loss=7.0123
[K=2 τ=5 α=0.9] Epoch 50/200 loss=0.7304
[K=2 τ=5 α=0.9] Epoch 100/200 loss=0.6725
[K=2 τ=5 α=0.9] Epoch 150/200 loss=0.3162
[K=2 τ=5 α=0.9] Epoch 200/200 loss=0.2322
→ [K=2 τ=5 α=0.9] test_acc=81.18%

[K=4 τ=5 α=0.9] Epoch 1/200 loss=6.6786


KeyboardInterrupt: 

In [18]:
# 只跑之前（Part4）跑过最好的{'τ': 5,  'α': 0.9}✖️「2,4,6」共三个组合
import os, glob
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from torchvision.models import resnet34, resnet18, resnet50
from torch.utils.data import DataLoader
import pandas as pd

# ───── 0️⃣ Setup device & data loaders ─────
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
mean = (0.4914, 0.4822, 0.4465)
std  = (0.2470, 0.2435, 0.2616)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
train_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=True,  download=True, transform=train_transform),
    batch_size=128, shuffle=True,  num_workers=4, pin_memory=True
)
test_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=False, download=True, transform=test_transform),
    batch_size=128, shuffle=False, num_workers=4, pin_memory=True
)

# ───── 1️⃣ Load your pretrained teachers ─────
ckpt_paths = sorted(glob.glob("teacher_*_*_best.pth"))
if not ckpt_paths:
    raise FileNotFoundError("No teacher checkpoints found")
all_teacher_models = []
for path in ckpt_paths:
    arch = os.path.basename(path).split("_")[2]
    ctor = {"resnet18":resnet18,"resnet34":resnet34,"resnet50":resnet50}[arch]
    m = ctor(num_classes=10).to(device)
    ck = torch.load(path, map_location=device)
    m.load_state_dict(ck['model_state'])
    m.eval()
    for p in m.parameters(): p.requires_grad=False
    all_teacher_models.append(m)
print(f"🔍 Loaded {len(all_teacher_models)} teachers")

# ───── 2️⃣ 拿到你之前跑好的 (τ,α) 组合 ─────
rev_results = [
     {'τ': 5,  'α': 0.9},

]
df_rev = pd.DataFrame(rev_results)
combos = df_rev[['τ','α']].to_dict('records')  # list of dicts [{'τ':1,'α':0.1}, ...]

# ───── 3️⃣ 只对 K=2,4,6 做叠加 ─────
Ks = [4,6]
num_epochs = 200
final_results = []

for K in Ks:
    teachers = all_teacher_models[:K]
    for combo in combos:
        tau, alpha = combo['τ'], combo['α']

        # fresh student & optimizer/scheduler
        student   = resnet34(num_classes=10).to(device)
        optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
        scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100,150], gamma=0.1)

        # training
        for epoch in range(1, num_epochs+1):
            student.train()
            run_loss = 0.0
            for xb, yb in train_loader:
                xb, yb = xb.to(device), yb.to(device)
                optimizer.zero_grad()
                # fuse K teachers
                with torch.no_grad():
                    C = teachers[0](xb).size(1)
                    sum_soft = torch.zeros(xb.size(0), C, device=device)
                    for tm in teachers:
                        sum_soft += F.softmax(tm(xb)/tau, dim=1)
                    p_avg = sum_soft / K
                # distill+CE
                s_logits = student(xb)
                s_soft   = F.log_softmax(s_logits/tau, dim=1)
                l_kl     = F.kl_div(s_soft, p_avg, reduction='batchmean') * (tau*tau)
                l_ce     = F.cross_entropy(s_logits, yb)
                loss     = alpha*l_kl + (1-alpha)*l_ce
                loss.backward()
                optimizer.step()
                run_loss += loss.item()*xb.size(0)
            scheduler.step()
            # 每 50 epochs 打印一次
            if epoch in (1,50,100,150,num_epochs):
                print(f"[K={K} τ={tau} α={alpha}] Epoch {epoch}/{num_epochs} loss={run_loss/len(train_loader.dataset):.4f}")

        # evaluation
        student.eval()
        correct=total=0
        with torch.no_grad():
            for xb,yb in test_loader:
                xb,yb = xb.to(device), yb.to(device)
                preds = student(xb).argmax(1)
                correct += (preds==yb).sum().item()
                total   += yb.size(0)
        acc = 100.*correct/total

        final_results.append({'K':K,'τ':tau,'α':alpha,'test_acc':acc})
        print(f"→ [K={K} τ={tau} α={alpha}] test_acc={acc:.2f}%\n")

# ───── 4️⃣ 保存结果 ─────
df_out = pd.DataFrame(final_results)
df_out.to_csv("multi_teacher_246_sweep.csv", index=False)
print("✅ Done! ")



🔍 Loaded 5 teachers
[K=4 τ=5 α=0.9] Epoch 1/200 loss=6.1676
[K=4 τ=5 α=0.9] Epoch 50/200 loss=0.5977
[K=4 τ=5 α=0.9] Epoch 100/200 loss=0.5673
[K=4 τ=5 α=0.9] Epoch 150/200 loss=0.2341
[K=4 τ=5 α=0.9] Epoch 200/200 loss=0.1658
→ [K=4 τ=5 α=0.9] test_acc=82.98%

[K=6 τ=5 α=0.9] Epoch 1/200 loss=2.5948
[K=6 τ=5 α=0.9] Epoch 50/200 loss=-2.9111
[K=6 τ=5 α=0.9] Epoch 100/200 loss=-2.9126
[K=6 τ=5 α=0.9] Epoch 150/200 loss=-3.2254
[K=6 τ=5 α=0.9] Epoch 200/200 loss=-3.2822
→ [K=6 τ=5 α=0.9] test_acc=84.07%

✅ Done! 


# 因为上面k=6的时候loss< 0（代码出错，因为只有五个checkpoint，之前却写了六个）所以运行以下代码（k=5）来测试一下。

In [19]:
# 只跑之前（Part4）跑过最好的{'τ': 5,  'α': 0.9}✖️「2,4,6」共三个组合
import os, glob
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from torchvision.models import resnet34, resnet18, resnet50
from torch.utils.data import DataLoader
import pandas as pd

# ───── 0️⃣ Setup device & data loaders ─────
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
mean = (0.4914, 0.4822, 0.4465)
std  = (0.2470, 0.2435, 0.2616)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
train_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=True,  download=True, transform=train_transform),
    batch_size=128, shuffle=True,  num_workers=4, pin_memory=True
)
test_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=False, download=True, transform=test_transform),
    batch_size=128, shuffle=False, num_workers=4, pin_memory=True
)

# ───── 1️⃣ Load your pretrained teachers ─────
ckpt_paths = sorted(glob.glob("teacher_*_*_best.pth"))
if not ckpt_paths:
    raise FileNotFoundError("No teacher checkpoints found")
all_teacher_models = []
for path in ckpt_paths:
    arch = os.path.basename(path).split("_")[2]
    ctor = {"resnet18":resnet18,"resnet34":resnet34,"resnet50":resnet50}[arch]
    m = ctor(num_classes=10).to(device)
    ck = torch.load(path, map_location=device)
    m.load_state_dict(ck['model_state'])
    m.eval()
    for p in m.parameters(): p.requires_grad=False
    all_teacher_models.append(m)
print(f"🔍 Loaded {len(all_teacher_models)} teachers")

# ───── 2️⃣ 拿到你之前跑好的 (τ,α) 组合 ─────
rev_results = [
     {'τ': 5,  'α': 0.9},

]
df_rev = pd.DataFrame(rev_results)
combos = df_rev[['τ','α']].to_dict('records')  # list of dicts [{'τ':1,'α':0.1}, ...]

# ───── 3️⃣ 只对 K=5 做叠加 ─────
Ks = [5]
num_epochs = 200
final_results = []

for K in Ks:
    teachers = all_teacher_models[:K]
    for combo in combos:
        tau, alpha = combo['τ'], combo['α']

        # fresh student & optimizer/scheduler
        student   = resnet34(num_classes=10).to(device)
        optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
        scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100,150], gamma=0.1)

        # training
        for epoch in range(1, num_epochs+1):
            student.train()
            run_loss = 0.0
            for xb, yb in train_loader:
                xb, yb = xb.to(device), yb.to(device)
                optimizer.zero_grad()
                # fuse K teachers
                with torch.no_grad():
                    C = teachers[0](xb).size(1)
                    sum_soft = torch.zeros(xb.size(0), C, device=device)
                    for tm in teachers:
                        sum_soft += F.softmax(tm(xb)/tau, dim=1)
                    p_avg = sum_soft / K
                # distill+CE
                s_logits = student(xb)
                s_soft   = F.log_softmax(s_logits/tau, dim=1)
                l_kl     = F.kl_div(s_soft, p_avg, reduction='batchmean') * (tau*tau)
                l_ce     = F.cross_entropy(s_logits, yb)
                loss     = alpha*l_kl + (1-alpha)*l_ce
                loss.backward()
                optimizer.step()
                run_loss += loss.item()*xb.size(0)
            scheduler.step()
            # 每 50 epochs 打印一次
            if epoch in (1,50,100,150,num_epochs):
                print(f"[K={K} τ={tau} α={alpha}] Epoch {epoch}/{num_epochs} loss={run_loss/len(train_loader.dataset):.4f}")

        # evaluation
        student.eval()
        correct=total=0
        with torch.no_grad():
            for xb,yb in test_loader:
                xb,yb = xb.to(device), yb.to(device)
                preds = student(xb).argmax(1)
                correct += (preds==yb).sum().item()
                total   += yb.size(0)
        acc = 100.*correct/total

        final_results.append({'K':K,'τ':tau,'α':alpha,'test_acc':acc})
        print(f"→ [K={K} τ={tau} α={alpha}] test_acc={acc:.2f}%\n")

# ───── 4️⃣ 保存结果 ─────
df_out = pd.DataFrame(final_results)
df_out.to_csv("multi_teacher_246_sweep.csv", index=False)
print("✅ Done! ")



🔍 Loaded 5 teachers
[K=5 τ=5 α=0.9] Epoch 1/200 loss=7.7488
[K=5 τ=5 α=0.9] Epoch 50/200 loss=0.6674
[K=5 τ=5 α=0.9] Epoch 100/200 loss=0.6132
[K=5 τ=5 α=0.9] Epoch 150/200 loss=0.2405
[K=5 τ=5 α=0.9] Epoch 200/200 loss=0.1672
→ [K=5 τ=5 α=0.9] test_acc=83.78%

✅ Done! 


In [None]:
# Part 6 遍历所有36个超参组合
import os, glob
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from torchvision.models import resnet34, resnet18, resnet50
from torch.utils.data import DataLoader
import pandas as pd

# ───── 0️⃣ Setup device & data loaders ─────
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
mean = (0.4914, 0.4822, 0.4465)
std  = (0.2470, 0.2435, 0.2616)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
train_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=True,  download=True, transform=train_transform),
    batch_size=128, shuffle=True,  num_workers=4, pin_memory=True
)
test_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=False, download=True, transform=test_transform),
    batch_size=128, shuffle=False, num_workers=4, pin_memory=True
)

# ───── 1️⃣ Load your pretrained teachers ─────
ckpt_paths = sorted(glob.glob("teacher_*_*_best.pth"))
if not ckpt_paths:
    raise FileNotFoundError("No teacher checkpoints found")
all_teacher_models = []
for path in ckpt_paths:
    arch = os.path.basename(path).split("_")[2]
    ctor = {"resnet18":resnet18,"resnet34":resnet34,"resnet50":resnet50}[arch]
    m = ctor(num_classes=10).to(device)
    ck = torch.load(path, map_location=device)
    m.load_state_dict(ck['model_state'])
    m.eval()
    for p in m.parameters(): p.requires_grad=False
    all_teacher_models.append(m)
print(f"🔍 Loaded {len(all_teacher_models)} teachers")

# ───── 2️⃣ 拿到你之前跑好的 (τ,α) 组合 ─────
rev_results = [
    {'τ': 1,  'α': 0.1}, {'τ': 1,  'α': 0.5}, {'τ': 1,  'α': 0.9},
    {'τ': 5,  'α': 0.1}, {'τ': 5,  'α': 0.5}, {'τ': 5,  'α': 0.9},
    {'τ': 10, 'α': 0.1}, {'τ': 10, 'α': 0.5}, {'τ': 10, 'α': 0.9},
    {'τ': 20, 'α': 0.1}, {'τ': 20, 'α': 0.5}, {'τ': 20, 'α': 0.9},
]
df_rev = pd.DataFrame(rev_results)
combos = df_rev[['τ','α']].to_dict('records')  # list of dicts [{'τ':1,'α':0.1}, ...]

# ───── 3️⃣ 只对 K=2,4,6 做叠加 ─────
Ks = [2,4,6]
num_epochs = 200
final_results = []

for K in Ks:
    teachers = all_teacher_models[:K]
    for combo in combos:
        tau, alpha = combo['τ'], combo['α']

        # fresh student & optimizer/scheduler
        student   = resnet34(num_classes=10).to(device)
        optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
        scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100,150], gamma=0.1)

        # training
        for epoch in range(1, num_epochs+1):
            student.train()
            run_loss = 0.0
            for xb, yb in train_loader:
                xb, yb = xb.to(device), yb.to(device)
                optimizer.zero_grad()
                # fuse K teachers
                with torch.no_grad():
                    C = teachers[0](xb).size(1)
                    sum_soft = torch.zeros(xb.size(0), C, device=device)
                    for tm in teachers:
                        sum_soft += F.softmax(tm(xb)/tau, dim=1)
                    p_avg = sum_soft / K
                # distill+CE
                s_logits = student(xb)
                s_soft   = F.log_softmax(s_logits/tau, dim=1)
                l_kl     = F.kl_div(s_soft, p_avg, reduction='batchmean') * (tau*tau)
                l_ce     = F.cross_entropy(s_logits, yb)
                loss     = alpha*l_kl + (1-alpha)*l_ce
                loss.backward()
                optimizer.step()
                run_loss += loss.item()*xb.size(0)
            scheduler.step()
            # 每 50 epochs 打印一次
            if epoch in (1,50,100,150,num_epochs):
                print(f"[K={K} τ={tau} α={alpha}] Epoch {epoch}/{num_epochs} loss={run_loss/len(train_loader.dataset):.4f}")

        # evaluation
        student.eval()
        correct=total=0
        with torch.no_grad():
            for xb,yb in test_loader:
                xb,yb = xb.to(device), yb.to(device)
                preds = student(xb).argmax(1)
                correct += (preds==yb).sum().item()
                total   += yb.size(0)
        acc = 100.*correct/total

        final_results.append({'K':K,'τ':tau,'α':alpha,'test_acc':acc})
        print(f"→ [K={K} τ={tau} α={alpha}] test_acc={acc:.2f}%\n")

# ───── 4️⃣ 保存结果 ─────
df_out = pd.DataFrame(final_results)
df_out.to_csv("multi_teacher_246_sweep.csv", index=False)
print("✅ Done! ")



In [None]:
# Part 6 遍历所有36个超参组合
import os, glob
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from torchvision.models import resnet18, resnet34, resnet50
from torch.utils.data import DataLoader
from itertools import product
import pandas as pd

# ───── 0️⃣ Setup device & data loaders ─────
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# CIFAR‑10 transforms (same as in code 4)
mean = (0.4914, 0.4822, 0.4465)
std  = (0.2470, 0.2435, 0.2616)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

batch_size = 128
train_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=True,  download=True, transform=train_transform),
    batch_size=batch_size, shuffle=True,  num_workers=4, pin_memory=True
)
test_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=False, download=True, transform=test_transform),
    batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True
)

# ───── 1️⃣ Build all_teacher_models from your code 3 checkpoints ─────
ckpt_paths = sorted(glob.glob("teacher_*_*_best.pth"))
if not ckpt_paths:
    raise FileNotFoundError("No teacher checkpoints found matching teacher_*_*_best.pth")

all_teacher_models = []
for path in ckpt_paths:
    arch = os.path.basename(path).split("_")[2]  # e.g. 'resnet34'
    if   arch=="resnet18": ctor = resnet18
    elif arch=="resnet34": ctor = resnet34
    elif arch=="resnet50": ctor = resnet50
    else: raise ValueError(f"Unexpected arch {arch} in {path}")
    m = ctor(num_classes=10).to(device)
    ck = torch.load(path, map_location=device)
    m.load_state_dict(ck['model_state'])
    m.eval()
    for p in m.parameters(): p.requires_grad = False
    all_teacher_models.append(m)

print(f"🔍  Loaded {len(all_teacher_models)} teachers from disk")

# ───── 2️⃣ Hyperparameters & sweep setup ─────
taus       = [1, 5, 10, 20]
alphas     = [0.1, 0.5, 0.9]
Ks         = [2, 4, 8]
num_epochs = 200

results = []

# ───── 3️⃣ Multi‑teacher KD sweep ─────
for K in Ks:
    teachers = all_teacher_models[:K]
    for tau, alpha in product(taus, alphas):
        # fresh student + optimizer/scheduler
        student   = resnet34(num_classes=10).to(device)
        optimizer = torch.optim.SGD(
            student.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4
        )
        scheduler = torch.optim.lr_scheduler.MultiStepLR(
            optimizer, milestones=[100,150], gamma=0.1
        )

        # training loop
        for epoch in range(1, num_epochs+1):
            student.train()
            run_loss = 0.0

            for xb, yb in train_loader:
                xb, yb = xb.to(device), yb.to(device)
                optimizer.zero_grad()
                # fuse K teachers
                with torch.no_grad():
                    C = teachers[0](xb).size(1)
                    sum_soft = torch.zeros(xb.size(0), C, device=device)
                    for tm in teachers:
                        sum_soft += F.softmax(tm(xb)/tau, dim=1)
                    p_avg = sum_soft / K
                # student forward + loss
                s_logits = student(xb)
                s_soft   = F.log_softmax(s_logits/tau, dim=1)
                l_kl     = F.kl_div(s_soft, p_avg, reduction='batchmean') * (tau*tau)
                l_ce     = F.cross_entropy(s_logits, yb)
                loss     = alpha*l_kl + (1-alpha)*l_ce
                loss.backward()
                optimizer.step()
                run_loss += loss.item()*xb.size(0)

            scheduler.step()

            # ── print periodic epoch info ──
            if epoch == 1 or epoch == num_epochs or epoch % 50 == 0:
                avg_loss = run_loss / len(train_loader.dataset)
                print(f"[K={K}, τ={tau}, α={alpha}] Epoch {epoch:03d}/{num_epochs} ― train_loss={avg_loss:.4f}")

        # evaluation
        student.eval()
        correct = total = 0
        with torch.no_grad():
            for xb,yb in test_loader:
                xb,yb = xb.to(device), yb.to(device)
                preds = student(xb).argmax(1)
                correct += (preds==yb).sum().item()
                total   += yb.size(0)
        acc = 100.*correct/total

        results.append({'K':K, 'tau':tau, 'alpha':alpha, 'test_acc':acc})
        print(f"→ K={K}, τ={tau}, α={alpha}  →  test_acc={acc:.2f}%\n")

# ───── Done ─────
df_results = pd.DataFrame(results)
df_results.to_csv("multi_teacher_sweep.csv", index=False)
print("✅ Sweep complete, saved to multi_teacher_sweep.csv")


# 新的 code 6
新的 code 6 正是在新的 code 3（把强化训练后的 teacher checkpoints 存到 `checkpoints/single_teacher/`）的基础上，来检验用「真正变强」的 teacher 模型去蒸馏对 student 的效果有多大提升。换句话说：

1. **新的 code 3** 让你的 ResNet‑50 teacher 真正跑到 ∼90%+ 的水平，并把 checkpoint 全都存到 `checkpoints/single_teacher/` 目录下，和旧的 `teacher_*_*_best.pth` 不会互相覆盖。  
2. **rerun code 6** 就是在这批真实好用的 teacher 上，用先前在 Part 4 找到的最佳 \{τ=5, α=0.9\}，分别测试 K=2,4,6 三种老师数量对 student 最终精度的影响。  
3. 这样做的意义就是：  
   - **不用再跑 36 个** (K×τ×α) 组合，直接锁定最有可能打赢硬标签的「黄金超参」；  
   - 确保蒸馏所用的 teacher 真的是你刚刚 mixup+label‐smoothing 训练出来、在验证集跑到 ∼89–90%+ 的那几位，而不是旧 teacher；  
   - 快速对比：K=2、4、6 这三个点，student 能提升到什么水平，从而判断要不要大规模扩充 teacher 数量。  

如果你不把 code 6 重新跑一遍，它只会加载旧有的、只有 ∼76% 验证精度的 teachers──那当然拿不到好的 student 蒸馏效果。  

所以 **下一步** 就是用新的、90%+ 的 teacher checkpoints，跑一遍 code 6，看看在 \{τ=5,α=0.9\} 下 K=2、4、6 时，student 最终能到几几％，再决定要不要把 K 再往上拉，或者混入不同超参做更细致的 sweep。

In [22]:
import os, glob
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from torchvision.models import resnet34, resnet18, resnet50
from torch.utils.data import DataLoader
import pandas as pd

# ───── 0️⃣ Setup device & data loaders ─────
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
mean = (0.4914, 0.4822, 0.4465)
std  = (0.2470, 0.2435, 0.2616)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
train_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=True, download=True, transform=train_transform),
    batch_size=128, shuffle=True, num_workers=4, pin_memory=True
)
test_loader = DataLoader(
    datasets.CIFAR10(root='data/', train=False, download=True, transform=test_transform),
    batch_size=128, shuffle=False, num_workers=4, pin_memory=True
)

# ───── 1️⃣ Load pretrained teachers ─────
ckpt_paths = sorted(glob.glob("teacher_*_*_best.pth"))
if not ckpt_paths:
    raise FileNotFoundError("No teacher checkpoints found")
all_teacher_models = []
for path in ckpt_paths:
    arch = os.path.basename(path).split("_")[2]  # 'resnet34' 等
    ctor = {"resnet18": resnet18, "resnet34": resnet34, "resnet50": resnet50}[arch]
    m = ctor(num_classes=10).to(device)
    ck = torch.load(path, map_location=device)
    m.load_state_dict(ck['model_state'])
    m.eval()
    for p in m.parameters(): p.requires_grad = False
    all_teacher_models.append(m)
print(f"🔍 Loaded {len(all_teacher_models)} teachers")

# ───── 2️⃣ 固定 (τ,α) 仅这一组 ─────
tau, alpha = 5, 0.9

# ───── 3️⃣ 只对 K = 2,4,6 做多教师蒸馏 ─────
Ks = [2, 4, 6]
num_epochs = 200
final_results = []

for K in Ks:
    teachers = all_teacher_models[:K]
    print(f"\n=== Multi‑teacher KD with K={K}, τ={tau}, α={alpha} ===")
    # 注意：在每个 K 下都要 fresh 一个 student
    student = None
    for epoch in [1, 50, 100, 150, num_epochs]:
        # 第一次循环前才 new model
        if student is None:
            student = resnet34(num_classes=10).to(device)
            optimizer = torch.optim.SGD(
                student.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4
            )
            scheduler = torch.optim.lr_scheduler.MultiStepLR(
                optimizer, milestones=[100,150], gamma=0.1
            )

        # 从上次的 epoch+1 跑到当前 epoch
        start = 1 if epoch==1 else prev_epoch+1
        run_loss = 0.0
        student.train()
        for e in range(start, epoch+1):
            for xb, yb in train_loader:
                xb, yb = xb.to(device), yb.to(device)
                optimizer.zero_grad()
                # fuse K teachers
                with torch.no_grad():
                    C = teachers[0](xb).size(1)
                    sum_soft = torch.zeros(xb.size(0), C, device=device)
                    for tm in teachers:
                        sum_soft += F.softmax(tm(xb)/tau, dim=1)
                    p_avg = sum_soft / K
                # distill + CE
                logits_s = student(xb)
                s_soft   = F.log_softmax(logits_s/tau, dim=1)
                l_kl     = F.kl_div(s_soft, p_avg, reduction='batchmean') * (tau*tau)
                l_ce     = F.cross_entropy(logits_s, yb)
                loss     = alpha*l_kl + (1-alpha)*l_ce
                loss.backward()
                optimizer.step()
                run_loss += loss.item()*xb.size(0)
            scheduler.step()

        prev_epoch = epoch
        avg_loss = run_loss / len(train_loader.dataset)
        print(f"[K={K} τ={tau} α={alpha}] Epoch {epoch}/{num_epochs}  train_loss={avg_loss:.4f}")

    # ── 最终评估 ──
    student.eval()
    correct = total = 0
    with torch.no_grad():
        for xb, yb in test_loader:
            xb, yb = xb.to(device), yb.to(device)
            preds = student(xb).argmax(1)
            correct += (preds==yb).sum().item()
            total   += yb.size(0)
    acc = 100.*correct/total

    final_results.append({'K':K, 'τ':tau, 'α':alpha, 'test_acc':acc})
    print(f"→ [K={K} τ={tau} α={alpha}] test_acc={acc:.2f}%\n")

# ───── 4️⃣ 保存结果 ─────
df_out = pd.DataFrame(final_results)
df_out.to_csv("multi_teacher_246_only.csv", index=False)
print("✅ Done! Results saved to multi_teacher_246_only.csv")


🔍 Loaded 5 teachers

=== Multi‑teacher KD with K=2, τ=5, α=0.9 ===
[K=2 τ=5 α=0.9] Epoch 1/200  train_loss=5.9960
[K=2 τ=5 α=0.9] Epoch 50/200  train_loss=44.7222
[K=2 τ=5 α=0.9] Epoch 100/200  train_loss=31.8692
[K=2 τ=5 α=0.9] Epoch 150/200  train_loss=14.3353
[K=2 τ=5 α=0.9] Epoch 200/200  train_loss=10.4413
→ [K=2 τ=5 α=0.9] test_acc=81.04%


=== Multi‑teacher KD with K=4, τ=5, α=0.9 ===
[K=4 τ=5 α=0.9] Epoch 1/200  train_loss=6.2748
[K=4 τ=5 α=0.9] Epoch 50/200  train_loss=46.6638
[K=4 τ=5 α=0.9] Epoch 100/200  train_loss=29.6004
[K=4 τ=5 α=0.9] Epoch 150/200  train_loss=12.4816
[K=4 τ=5 α=0.9] Epoch 200/200  train_loss=8.9397
→ [K=4 τ=5 α=0.9] test_acc=83.03%


=== Multi‑teacher KD with K=6, τ=5, α=0.9 ===
[K=6 τ=5 α=0.9] Epoch 1/200  train_loss=2.3555
[K=6 τ=5 α=0.9] Epoch 50/200  train_loss=-128.3479
[K=6 τ=5 α=0.9] Epoch 100/200  train_loss=-145.7046
[K=6 τ=5 α=0.9] Epoch 150/200  train_loss=-160.6823
[K=6 τ=5 α=0.9] Epoch 200/200  train_loss=-163.7050
→ [K=6 τ=5 α=0.9] test_a

# part7：Results Visualization & Analysis

In [None]:
import os
import pandas as pd
import matplotlib.pyplot as plt

# 1. Load the three CSVs into DataFrames
for fn in ("baseline.csv","single_teacher.csv","multi_teacher.csv"):
    if not os.path.isfile(fn):
        raise FileNotFoundError(f"Required file not found: {fn}")

df_base   = pd.read_csv("baseline.csv"       ).rename(columns={'test_acc':'acc_base'})
df_single = pd.read_csv("single_teacher.csv" ).rename(columns={'test_acc':'acc_single'})
df_multi  = pd.read_csv("multi_teacher.csv"  ).rename(columns={'test_acc':'acc_multi'})

# 2. Merge on the grid indices (tau, alpha, K)
df = df_base.merge(df_single, on=['tau','alpha','K']) \
            .merge(df_multi,  on=['tau','alpha','K'])

# 3. Compute Δ‑columns
df['Δ_single'] = df['acc_single'] - df['acc_base']
df['Δ_multi']  = df['acc_multi']  - df['acc_base']

# 4. Quantitative tables
print("\n=== Full Accuracy Comparison ===")
print(df[['tau','alpha','K','acc_base','acc_single','acc_multi']]
      .sort_values(['K','alpha','tau'])
      .to_string(index=False))

print("\n=== Δ-Single‑Teacher Accuracy Pivot ===")
pivot_single = df.pivot_table(index=['tau','alpha'], columns='K', values='Δ_single')
print(pivot_single)

print("\n=== Δ-Multi‑Teacher Accuracy Pivot ===")
pivot_multi  = df.pivot_table(index=['tau','alpha'], columns='K', values='Δ_multi')
print(pivot_multi)

# 5. Save combined table (optional)
df.to_csv("combined_results.csv", index=False)


# 6. Visualization: Accuracy vs Temperature for each (α, K)
for alpha in sorted(df['alpha'].unique()):
    for K in sorted(df['K'].unique()):
        sub = df[(df.alpha==alpha)&(df.K==K)].sort_values('tau')
        plt.figure()
        plt.plot(sub.tau, sub.acc_base,   'k--',   label='Hard‑label baseline')
        plt.plot(sub.tau, sub.acc_single, 'b-o',   label='Single‑teacher KD')
        plt.plot(sub.tau, sub.acc_multi,  'r-s',   label=f'Multi‑teacher KD (K={K})')
        plt.title(f'Accuracy vs τ  (α={alpha}, K={K})')
        plt.xlabel('Temperature τ')
        plt.ylabel('Test Accuracy (%)')
        plt.legend()
        plt.grid(True)
        plt.savefig(f"acc_vs_tau_alpha{alpha}_K{K}.png")
        plt.close()

# 7. Visualization: Δ-Multi vs K at best (τ, α)
best_idx    = df['acc_multi'].idxmax()
best_tau    = df.at[best_idx, 'tau']
best_alpha  = df.at[best_idx, 'alpha']
best_slice  = df[(df.tau==best_tau)&(df.alpha==best_alpha)].sort_values('K')

plt.figure()
plt.bar(best_slice.K.astype(str), best_slice['Δ_multi'], color='skyblue')
plt.title(f'Δ‑Multi‑Teacher Accuracy vs K  (τ={best_tau}, α={best_alpha})')
plt.xlabel('Number of Teachers K')
plt.ylabel('Δ‑Accuracy (%)')
plt.grid(axis='y')
plt.savefig("delta_vs_K.png")
plt.close()

print(f"\nPlotted:\n - Accuracy vs τ curves for each (α,K)\n - Δ‑Accuracy vs K at best τ={best_tau}, α={best_alpha}\n")
