# Lab Work No. 6 – Experiments with Classification Models

---

## 1. Choosing the initial setup

We use EuroSAT RGB, a dataset of Sentinel-2 satellite images covering 10 land-cover classes (forest, water, residential buildings, etc.): 27,000 images of 64×64 pixels in total.

Practical task: automated land-use monitoring from space for ecology, planning and agriculture.

Dataset: <https://github.com/phelber/EuroSAT>


### Metrics

* Accuracy: easy to interpret and appropriate when all classes have roughly the same number of examples.
* Macro F1 score: the unweighted mean of per-class F1 scores; sensitive to errors on minority classes.
* Cohen's κ: corrects the observed agreement for agreement expected by chance; useful when classes are imbalanced.
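To make the three metrics concrete, here is a small sketch on hypothetical toy labels (not EuroSAT predictions). It uses the same scikit-learn functions as `evaluate` below and checks Cohen's κ against its definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected by chance from the label frequencies.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, cohen_kappa_score

# Hypothetical toy labels for a 3-class problem (illustration only)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 0])

def manual_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa from its definition: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = np.mean(y_true == y_pred)                          # observed agreement
    true_freq = np.bincount(y_true, minlength=n_classes) / n
    pred_freq = np.bincount(y_pred, minlength=n_classes) / n
    p_e = np.sum(true_freq * pred_freq)                      # chance agreement
    return (p_o - p_e) / (1 - p_e)

acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average='macro')
kappa = cohen_kappa_score(y_true, y_pred)
print(f'accuracy={acc:.4f}, macro F1={macro_f1:.4f}, kappa={kappa:.4f}')
# → accuracy=0.7000, macro F1=0.6944, kappa=0.5455
print('manual kappa:', round(manual_kappa(y_true, y_pred, 3), 4))
```

Here p_e = 0.4·0.4 + 0.3·0.3 + 0.3·0.3 = 0.34, so κ = 0.36 / 0.66 ≈ 0.545: noticeably lower than the 0.70 accuracy, because part of that agreement would be reached by guessing with the same class frequencies.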



In [1]:
!pip -q install torch torchvision timm tqdm scikit-learn pandas

In [2]:
import torch, torchvision
from torchvision import transforms
from torchvision.datasets import EuroSAT
from torch.utils.data import DataLoader, random_split
import timm, torch.nn as nn, torch.optim as optim
from sklearn.metrics import accuracy_score, f1_score, cohen_kappa_score
from tqdm.auto import tqdm
device = torch.device('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
print('Using device:', device)

Using device: cuda


### Loading and splitting the dataset

In [3]:
BATCH_SIZE = 64
IMG_SIZE = 224

base_transform = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

dataset = EuroSAT(root='data', download=True, transform=base_transform)

train_len = int(0.7 * len(dataset))
val_len   = int(0.15 * len(dataset))
test_len  = len(dataset) - train_len - val_len
train_ds, val_ds, test_ds = random_split(dataset, [train_len, val_len, test_len],
                                         generator=torch.Generator().manual_seed(42))

PIN_MEMORY = device.type == 'cuda'

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=PIN_MEMORY)
val_loader   = DataLoader(val_ds,   batch_size=BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=PIN_MEMORY)
test_loader  = DataLoader(test_ds,  batch_size=BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=PIN_MEMORY)

NUM_CLASSES = len(dataset.classes)
NUM_CLASSES

100%|██████████| 94.3M/94.3M [00:00<00:00, 210MB/s]


10
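The 70/15/15 split depends only on `len(dataset)` and the seeded generator, which is why Section 3 can rebuild the identical training subset from a second `EuroSAT` object that uses a different transform. A minimal check of that property, assuming a `range` as a hypothetical stand-in for the 27,000 EuroSAT images:

```python
import torch
from torch.utils.data import random_split

# Stand-in for EuroSAT: random_split only needs len() and indexing
n = 27000
train_len, val_len = int(0.7 * n), int(0.15 * n)
test_len = n - train_len - val_len

def split_indices(seed):
    """Return the index lists of the train/val/test subsets for a given seed."""
    parts = random_split(range(n), [train_len, val_len, test_len],
                         generator=torch.Generator().manual_seed(seed))
    return [p.indices for p in parts]

a = split_indices(42)
b = split_indices(42)
print(a == b, [len(p) for p in a])   # same seed -> identical subsets
```

With seed 42 the two calls yield identical train/val/test indices (18,900 / 4,050 / 4,050 images); a different seed gives a different partition.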

### Training and evaluation helpers

In [4]:
def train_one_epoch(model, loader, criterion, optimizer):
    model.train()
    running_loss = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        preds = model(x)
        loss = criterion(preds, y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * x.size(0)
    return running_loss / len(loader.dataset)

@torch.no_grad()
def evaluate(model, loader):
    model.eval()
    y_true, y_pred = [], []
    for x, y in loader:
        x = x.to(device)
        logits = model(x)
        y_hat = logits.argmax(1).cpu()
        y_true.extend(y.numpy())
        y_pred.extend(y_hat.numpy())
    acc = accuracy_score(y_true, y_pred)
    f1  = f1_score(y_true, y_pred, average='macro')
    k   = cohen_kappa_score(y_true, y_pred)
    return acc, f1, k

def fit(model, epochs=3, lr=1e-3):
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.AdamW(model.parameters(), lr=lr)
    best_acc = 0
    for epoch in range(epochs):
        loss = train_one_epoch(model, train_loader, criterion, optimizer)
        acc, f1, k = evaluate(model, val_loader)
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), 'best.pt')
        print(f'Epoch {epoch+1}/{epochs}: loss={loss:.4f}, val_acc={acc:.4f}, val_f1={f1:.4f}')
    model.load_state_dict(torch.load('best.pt'))
    return model
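`fit` (and `fit_onecycle` in Section 3) checkpoints the best epoch to the same file `best.pt`. A minimal in-memory alternative, sketched with a hypothetical helper class `BestStateTracker` (not part of PyTorch). Note the `copy.deepcopy`: `state_dict()` returns tensors that share storage with the live parameters, so without a copy the "best" snapshot would keep changing as the optimizer updates the weights in place.

```python
import copy
import torch
import torch.nn as nn

class BestStateTracker:
    """Hypothetical helper: keeps the best-scoring weights in RAM."""

    def __init__(self):
        self.best_score = float('-inf')
        self.best_state = None

    def update(self, model: nn.Module, score: float) -> bool:
        """Snapshot the weights if `score` improves on the best so far."""
        if score > self.best_score:
            self.best_score = score
            # deepcopy: state_dict() tensors share storage with the live parameters
            self.best_state = copy.deepcopy(model.state_dict())
            return True
        return False

    def restore(self, model: nn.Module) -> None:
        """Load the best snapshot back into the model."""
        model.load_state_dict(self.best_state)

# Usage: snapshot, then change the weights in place (as an optimizer step would)
model = nn.Linear(2, 1)
tracker = BestStateTracker()
tracker.update(model, score=0.90)
w_best = model.weight.detach().clone()
with torch.no_grad():
    model.weight.add_(1.0)
improved = tracker.update(model, score=0.85)   # worse score: no new snapshot
tracker.restore(model)
print(improved, torch.equal(model.weight, w_best))   # → False True
```

Inside `fit`, `tracker.update(model, acc)` would take the place of the `torch.save` call and `tracker.restore(model)` that of the final `torch.load`.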

## 2. Baseline: pretrained models from torchvision and timm

In [5]:
baseline_results = {}

resnet = torchvision.models.resnet18(weights='IMAGENET1K_V1')
resnet.fc = nn.Linear(resnet.fc.in_features, NUM_CLASSES)
resnet = fit(resnet, epochs=3, lr=3e-4)
baseline_results['ResNet18'] = evaluate(resnet, test_loader)
print('ResNet18 test:', baseline_results['ResNet18'])

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 143MB/s]


Epoch 1/3: loss=0.2128, val_acc=0.9027, val_f1=0.9024
Epoch 2/3: loss=0.0986, val_acc=0.9378, val_f1=0.9352
Epoch 3/3: loss=0.0807, val_acc=0.9642, val_f1=0.9633
ResNet18 test: (0.9674074074074074, 0.9661639249991417, np.float64(0.9637262787436672))


In [8]:
vit = timm.create_model('vit_small_patch16_224', pretrained=True, num_classes=NUM_CLASSES)
# vit_small_patch16_224 is ViT-Small with 16x16 patches; the 'ViT-B16' label below is a misnomer
vit = fit(vit, epochs=3, lr=5e-5)
baseline_results['ViT-B16'] = evaluate(vit, test_loader)
print('ViT-B16 test:', baseline_results['ViT-B16'])

Epoch 1/3: loss=0.1695, val_acc=0.9810, val_f1=0.9806
Epoch 2/3: loss=0.0327, val_acc=0.9800, val_f1=0.9791
Epoch 3/3: loss=0.0216, val_acc=0.9837, val_f1=0.9826
ViT-B16 test: (0.9864197530864197, 0.9856256925808424, np.float64(0.9848843515478424))


### Baseline summary
`baseline_results` holds a tuple (accuracy, macro F1, Cohen's κ) for each model.

## 3. Improving the baseline

Hypotheses:
1. Strong augmentations (ColorJitter, RandomRotation, RandomHorizontalFlip) will improve generalization.
2. More modern models (EfficientNetV2, DeiT-Small) will give a better bias-variance trade-off.
3. OneCycleLR will speed up and stabilize training.
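Hypothesis 3 relies on the shape of the OneCycleLR schedule. A minimal sketch that only records the learning rate of a dummy model; `max_lr=1e-3` and 5 epochs × 20 steps are illustrative values, not the notebook's settings. With PyTorch's defaults the LR starts at `max_lr / div_factor` (`div_factor=25`), rises to `max_lr` over the first 30 % of steps (`pct_start=0.3`) and then anneals with a cosine down to `initial_lr / final_div_factor` (`final_div_factor=1e4`).

```python
import torch.nn as nn
import torch.optim as optim

# Dummy model: only the learning-rate schedule is of interest here
model = nn.Linear(1, 1)
max_lr, epochs, steps_per_epoch = 1e-3, 5, 20      # illustrative values
optimizer = optim.AdamW(model.parameters(), lr=max_lr)
scheduler = optim.lr_scheduler.OneCycleLR(optimizer, max_lr=max_lr,
                                          epochs=epochs,
                                          steps_per_epoch=steps_per_epoch)

lrs = [optimizer.param_groups[0]['lr']]
for _ in range(epochs * steps_per_epoch - 1):
    optimizer.step()      # no gradients, so the weights stay unchanged
    scheduler.step()
    lrs.append(optimizer.param_groups[0]['lr'])

peak = max(range(len(lrs)), key=lrs.__getitem__)
print(f'start={lrs[0]:.1e}, peak={lrs[peak]:.1e} at step {peak}, end={lrs[-1]:.1e}')
```

Because AdamW has `betas`, the default `cycle_momentum=True` also cycles β1 inversely to the LR (between 0.85 and 0.95), so the scheduler changes two hyperparameters at once.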

In [11]:
aug_transform = transforms.Compose([
    transforms.RandomResizedCrop(IMG_SIZE, scale=(0.8, 1.0)),  # scale is a fraction of the image area, so it cannot exceed 1.0
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(20),
    transforms.ColorJitter(0.2, 0.2, 0.2, 0.1),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

aug_dataset = EuroSAT(root='data', download=False, transform=aug_transform)
train_ds_aug, _, _ = random_split(aug_dataset, [train_len, val_len, test_len],
                                  generator=torch.Generator().manual_seed(42))
train_loader_aug = DataLoader(train_ds_aug, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=PIN_MEMORY)

def fit_onecycle(model, epochs=5, max_lr=1e-3):
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.AdamW(model.parameters(), lr=max_lr)
    scheduler = optim.lr_scheduler.OneCycleLR(optimizer, max_lr=max_lr,
                                              steps_per_epoch=len(train_loader_aug),
                                              epochs=epochs)
    best_acc = 0
    for epoch in range(epochs):
        model.train()
        for x, y in tqdm(train_loader_aug, leave=False):
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            scheduler.step()
        acc, f1, k = evaluate(model, val_loader)
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), 'best.pt')
        print(f'Epoch {epoch+1}: val_acc={acc:.4f}')
    model.load_state_dict(torch.load('best.pt'))
    return model

In [12]:
improved_results = {}

effnet = timm.create_model('efficientnetv2_rw_s', pretrained=True, num_classes=NUM_CLASSES)
effnet = fit_onecycle(effnet, epochs=5, max_lr=1e-4)
improved_results['EffNetV2-S'] = evaluate(effnet, test_loader)
print('Improved EffNetV2-S:', improved_results['EffNetV2-S'])

Epoch 1: val_acc=0.8504
Epoch 2: val_acc=0.9442
Epoch 3: val_acc=0.9593
Epoch 4: val_acc=0.9679
Epoch 5: val_acc=0.9679
Improved EffNetV2-S: (0.96, 0.9595400854322612, np.float64(0.9554848699104521))


In [13]:
deit = timm.create_model('deit_small_patch16_224', pretrained=True, num_classes=NUM_CLASSES)
deit = fit_onecycle(deit, epochs=5, max_lr=5e-5)
improved_results['DeiT-S'] = evaluate(deit, test_loader)
print('Improved DeiT-S:', improved_results['DeiT-S'])

Epoch 1: val_acc=0.9373
Epoch 2: val_acc=0.9348
Epoch 3: val_acc=0.9820
Epoch 4: val_acc=0.9827
Epoch 5: val_acc=0.9852
Improved DeiT-S: (0.9839506172839506, 0.983661636245506, np.float64(0.9821395338991646))


The baseline and the improved models are compared in Section 5.

## 4. Custom model implementations

### 4.a SimpleCNN convolutional network

In [14]:
class SimpleCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
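A shape and parameter-count check for SimpleCNN. To stay self-contained, the sketch rebuilds the same convolutional stack with a loop (hypothetical helper `simple_cnn_features`) and adds the 256 × 10 linear head by hand. Three 2× max-pools take a 224×224 input down to 28×28 before global average pooling, and the whole model has about 0.39 M parameters, against roughly 11.7 M for ResNet-18 with its ImageNet head.

```python
import torch
import torch.nn as nn

def simple_cnn_features():
    """Same layer stack as SimpleCNN.features, built with a loop."""
    layers, in_ch = [], 3
    for i, out_ch in enumerate([32, 64, 128, 256]):
        layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU()]
        # max-pool after the first three blocks, global average pool after the last
        layers.append(nn.MaxPool2d(2) if i < 3 else nn.AdaptiveAvgPool2d(1))
        in_ch = out_ch
    return nn.Sequential(*layers)

features = simple_cnn_features().eval()
x = torch.zeros(1, 3, 224, 224)
shapes = []
with torch.no_grad():
    for layer in features:
        x = layer(x)
        if isinstance(layer, (nn.MaxPool2d, nn.AdaptiveAvgPool2d)):
            shapes.append(tuple(x.shape))
print(shapes)

n_params = sum(p.numel() for p in features.parameters()) + (256 * 10 + 10)
print('parameters incl. 10-class head:', n_params)   # → 391946
```

The small parameter budget and the lack of pretraining help explain why SimpleCNN stays well below the fine-tuned backbones.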

### 4.b TinyViT: a lightweight Vision Transformer

In [15]:
class PatchEmbed(nn.Module):
    def __init__(self, img_size=IMG_SIZE, patch=8, in_ch=3, embed_dim=128):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.randn(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.randn(1, (img_size//patch)**2 + 1, embed_dim))

    def forward(self, x):
        B = x.size(0)
        x = self.proj(x).flatten(2).transpose(1, 2)
        cls = self.cls_token.expand(B, -1, -1)
        x = torch.cat((cls, x), 1) + self.pos_embed
        return x

class TransformerEncoder(nn.Module):
    def __init__(self, dim, heads=4, mlp_ratio=2.):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, int(dim*mlp_ratio)),
                                 nn.GELU(),
                                 nn.Linear(int(dim*mlp_ratio), dim))

    def forward(self, x):
        h = self.norm1(x)                                    # pre-norm, computed once for q, k and v
        x = x + self.attn(h, h, h, need_weights=False)[0]    # residual self-attention
        x = x + self.mlp(self.norm2(x))                      # residual MLP
        return x

class TinyViT(nn.Module):
    def __init__(self, num_classes, depth=4, dim=128):
        super().__init__()
        self.embed = PatchEmbed(embed_dim=dim)
        self.blocks = nn.Sequential(*[TransformerEncoder(dim) for _ in range(depth)])
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.embed(x)
        x = self.blocks(x)
        x = self.norm(x)
        return self.head(x[:,0])
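With TinyViT's defaults (`patch=8`, 4 heads, depth 4) and `IMG_SIZE=224`, each image becomes a long token sequence, and self-attention cost grows with its square. A small arithmetic sketch, with a hypothetical helper `vit_token_stats`, compares patch 8 with patch 16 (the patch size of the timm ViTs) and checks the `PatchEmbed` projection shape:

```python
import torch
import torch.nn as nn

def vit_token_stats(img_size, patch, heads, depth):
    """Sequence length and attention-matrix entries per image for a plain ViT."""
    n_tokens = (img_size // patch) ** 2 + 1          # patches + [CLS] token
    attn_entries = depth * heads * n_tokens ** 2     # one N x N map per head and layer
    return n_tokens, attn_entries

for patch in (8, 16):
    n, a = vit_token_stats(224, patch, heads=4, depth=4)
    print(f'patch={patch}: tokens={n}, attention entries per image={a:,}')

# Shape check of the patch projection used in PatchEmbed (patch=8, embed_dim=128)
proj = nn.Conv2d(3, 128, kernel_size=8, stride=8)
with torch.no_grad():
    tokens = proj(torch.zeros(1, 3, 224, 224)).flatten(2).transpose(1, 2)
print(tokens.shape)
```

Patch 8 gives 785 tokens and about 9.9 M attention entries per image, versus 197 tokens and about 0.6 M with patch 16, a 16× difference in attention cost. Note that the 64×64 source images are upsampled to 224×224, so most of these tokens cover interpolated pixels.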

### 4.c Training the custom models

In [16]:
custom_results = {}

simplecnn = SimpleCNN(NUM_CLASSES)
simplecnn = fit_onecycle(simplecnn, epochs=10, max_lr=1e-3)
custom_results['SimpleCNN'] = evaluate(simplecnn, test_loader)
print('SimpleCNN test:', custom_results['SimpleCNN'])

Epoch 1: val_acc=0.5477
Epoch 2: val_acc=0.6291
Epoch 3: val_acc=0.6452
Epoch 4: val_acc=0.6304
Epoch 5: val_acc=0.7617
Epoch 6: val_acc=0.7938
Epoch 7: val_acc=0.8356
Epoch 8: val_acc=0.8400
Epoch 9: val_acc=0.8642
Epoch 10: val_acc=0.8617
SimpleCNN test: (0.8597530864197531, 0.8539296056075015, np.float64(0.8439594945872178))


In [17]:
tinyvit = TinyViT(NUM_CLASSES)
tinyvit = fit_onecycle(tinyvit, epochs=10, max_lr=2e-4)
custom_results['TinyViT'] = evaluate(tinyvit, test_loader)
print('TinyViT test:', custom_results['TinyViT'])

Epoch 1: val_acc=0.4244
Epoch 2: val_acc=0.6067
Epoch 3: val_acc=0.6595
Epoch 4: val_acc=0.6770
Epoch 5: val_acc=0.7131
Epoch 6: val_acc=0.7346
Epoch 7: val_acc=0.7481
Epoch 8: val_acc=0.7699
Epoch 9: val_acc=0.7765
Epoch 10: val_acc=0.7795
TinyViT test: (0.7770370370370371, 0.7701192358801896, np.float64(0.7519638946948494))


## 5. Comparison of results

In [18]:
import pandas as pd
df = pd.DataFrame.from_dict(
    {**baseline_results, **improved_results, **custom_results},
    orient='index', columns=['Accuracy', 'Macro F1', 'Cohen κ'])
df

| Model | Accuracy | Macro F1 | Cohen's κ |
|---|---|---|---|
| ResNet18 | 0.967407 | 0.966164 | 0.963726 |
| ViT-B16 | 0.98642 | 0.985626 | 0.984884 |
| EffNetV2-S | 0.96 | 0.95954 | 0.955485 |
| DeiT-S | 0.983951 | 0.983662 | 0.98214 |
| SimpleCNN | 0.859753 | 0.85393 | 0.843959 |
| TinyViT | 0.777037 | 0.770119 | 0.751964 |
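Accuracies on a 4,050-image test set are easier to compare as counts of misclassified images. A small sketch using the rounded accuracies from the table above; the test size follows from the 70/15/15 split in Section 1:

```python
import pandas as pd

# Test accuracies copied from the comparison table (rounded to 6 digits)
acc = pd.Series({'ResNet18': 0.967407, 'ViT-B16': 0.98642, 'EffNetV2-S': 0.96,
                 'DeiT-S': 0.983951, 'SimpleCNN': 0.859753, 'TinyViT': 0.777037})
TEST_LEN = 27000 - int(0.7 * 27000) - int(0.15 * 27000)   # size of test_ds

summary = pd.DataFrame({
    'Accuracy': acc,
    'Test errors': ((1 - acc) * TEST_LEN).round().astype(int),
    'Δ vs ResNet18': (acc - acc['ResNet18']).round(4),
})
print(summary.sort_values('Accuracy', ascending=False))
```

ViT-B16 and DeiT-S differ by only ten test images (55 vs 65 errors), while TinyViT misclassifies 903, almost seven times as many as ResNet-18 (132).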


## 6. Conclusions
ViT-B/16 (in fact `vit_small_patch16_224`, i.e. ViT-S/16) achieved the best scores (Accuracy ≈ 0.986, Macro F1 ≈ 0.986). DeiT-S was comparable (Accuracy ≈ 0.984), slightly behind the leader and ahead of ResNet-18 (0.967).

EffNetV2-S did not improve on the baseline: its accuracy (0.960) is below that of ResNet-18 (0.967).

The hand-written SimpleCNN and TinyViT, trained from scratch without pretrained weights, fall far behind all pretrained architectures, reaching 0.860 and 0.777 accuracy respectively. Consequently, SimpleCNN does not outperform ResNet-18, and TinyViT does not reach comparable quality.

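These experiments do not show that OneCycleLR and stronger augmentations guarantee better quality: the best results were obtained without them. Moreover, each "improved" run changed the architecture, the augmentations and the scheduler at the same time, so the effect of each factor cannot be isolated.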