# Lab 7

## 1. Choosing the initial setup

### a. Dataset

The chosen dataset is [leaf flower fruit annotation](https://www.kaggle.com/datasets/ar5entum/leaf-flower-fruit-annotation) (segmentation of plants in photographs). The task has many practical applications. For example, such a system could help an autonomous vehicle analyze its surroundings.

In [None]:
!pip install kaggle
!pip install pandas



In [1]:
!kaggle datasets download -d ar5entum/leaf-flower-fruit-annotation -p data7 --unzip

Dataset URL: https://www.kaggle.com/datasets/ar5entum/leaf-flower-fruit-annotation
License(s): unknown


In [1]:
!pip install pycocotools

Collecting pycocotools
  Using cached pycocotools-2.0.8.tar.gz (24 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: pycocotools
  Building wheel for pycocotools (pyproject.toml): started
  Building wheel for pycocotools (pyproject.toml): finished with status 'done'
  Created wheel for pycocotools: filename=pycocotools-2.0.8-cp313-cp313-win_amd64.whl size=82529 sha256=88790f67b49a374aee3675aaadffb5d580f39252cdc37e1a4634e68b8bd70acd
  Stored in directory: c:\users\corsider\appdata\local\pip\cache\wheels\a3\c8\17\9a271afbebc0abbc30d6d0da53284602f92208c8437b11cf32
Successfully built pycocotools
Installing collected packages: pycocotools
Successfully installed pycocotools

In [4]:
!pip install segmentation_models_pytorch

Collecting segmentation_models_pytorch
  Downloading segmentation_models_pytorch-0.5.0-py3-none-any.whl.metadata (17 kB)
Collecting huggingface-hub>=0.24 (from segmentation_models_pytorch)
  Downloading huggingface_hub-0.30.2-py3-none-any.whl.metadata (13 kB)
Collecting safetensors>=0.3.1 (from segmentation_models_pytorch)
  Downloading safetensors-0.5.3-cp38-abi3-win_amd64.whl.metadata (3.9 kB)
Collecting timm>=0.9 (from segmentation_models_pytorch)
  Downloading timm-1.0.15-py3-none-any.whl.metadata (52 kB)
Collecting pyyaml>=5.1 (from huggingface-hub>=0.24->segmentation_models_pytorch)
  Using cached PyYAML-6.0.2-cp313-cp313-win_amd64.whl.metadata (2.1 kB)
Downloading segmentation_models_pytorch-0.5.0-py3-none-any.whl (154 kB)
Downloading huggingface_hub-0.30.2-py3-none-any.whl (481 kB)
Downloading safetensors-0.5.3-cp38-abi3-win_amd64.whl (308 kB)
Downloading timm-1.0.15-py3-none-any.whl (2.4 MB)

Load the dataset in COCO format and define a class that builds segmentation masks from the annotations.

In [1]:
import os
import cv2
import torch
import numpy as np
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader
import albumentations as A
from albumentations.pytorch import ToTensorV2
import segmentation_models_pytorch as smp
from pycocotools.coco import COCO

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)




Device: cuda


In [None]:
class toCOCO(Dataset):
    """Semantic-segmentation dataset backed by a COCO annotation file."""

    def __init__(self, images_dir, annotation_file, transform=None):
        self.images_dir = images_dir
        self.coco = COCO(annotation_file)
        self.ids = list(self.coco.imgs.keys())
        cats = self.coco.loadCats(self.coco.getCatIds())
        # Map COCO category ids to contiguous labels 1..K; 0 is reserved for background.
        self.cat2label = {cat['id']: i + 1 for i, cat in enumerate(cats)}
        self.transform = transform

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        img_id = self.ids[idx]
        ann_ids = self.coco.getAnnIds(imgIds=img_id)
        anns = self.coco.loadAnns(ann_ids)

        img_info = self.coco.imgs[img_id]
        path = os.path.join(self.images_dir, img_info['file_name'])
        image = cv2.imread(path)
        if image is None:
            raise FileNotFoundError(f"Cannot read image: {path}")
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        h, w = img_info['height'], img_info['width']

        # Rasterize all annotations into one label map; where objects overlap,
        # later annotations overwrite earlier ones.
        mask = np.zeros((h, w), dtype=np.uint8)
        for ann in anns:
            label = self.cat2label[ann['category_id']]
            m = self.coco.annToMask(ann)
            mask[m == 1] = label

        if self.transform:
            aug = self.transform(image=image, mask=mask)
            image, mask = aug['image'], aug['mask']
        # as_tensor also handles the case without a transform (mask is still a NumPy array).
        return image, torch.as_tensor(mask).long()
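
The mask construction in `__getitem__` can be illustrated on a toy example without `pycocotools`. The sketch below is only an illustration: the category ids (7 and 12), the rectangular regions and the helper name `build_label_mask` are made up, and plain binary arrays stand in for the output of `annToMask`.

```python
import numpy as np

def build_label_mask(height, width, annotations, cat2label):
    """Merge per-object binary masks into one label map (0 = background)."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for category_id, binary_mask in annotations:
        # Later annotations overwrite earlier ones where they overlap.
        mask[binary_mask == 1] = cat2label[category_id]
    return mask

# Hypothetical COCO category ids mapped to contiguous labels 1..K.
category_ids = [7, 12]
cat2label = {cat_id: i + 1 for i, cat_id in enumerate(category_ids)}

# Two overlapping rectangular "objects" on a 4x6 image.
leaf = np.zeros((4, 6), dtype=np.uint8)
leaf[0:3, 0:4] = 1
flower = np.zeros((4, 6), dtype=np.uint8)
flower[1:4, 2:6] = 1

label_mask = build_label_mask(4, 6, [(7, leaf), (12, flower)], cat2label)
num_classes = len(cat2label) + 1  # +1 for the background label 0
print(label_mask)
print("num_classes:", num_classes)
```

Pixels covered by no annotation keep the label 0, which is why the number of model outputs is the number of categories plus one.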


### b. Quality metrics

We use the F1 score as the main quality metric and IoU as an additional one. Both are standard metrics for segmentation tasks.
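
As a small worked example, the sketch below computes micro-averaged IoU and F1 from two toy label maps with NumPy. The arrays and the helper names (`micro_stats`, `micro_iou`, `micro_f1`) are made up for illustration; the notebook itself relies on `segmentation_models_pytorch` for these metrics.

```python
import numpy as np

def micro_stats(pred, target, num_classes):
    """Sum true positives, false positives and false negatives over all classes."""
    tp = fp = fn = 0
    for c in range(num_classes):
        tp += int(np.sum((pred == c) & (target == c)))
        fp += int(np.sum((pred == c) & (target != c)))
        fn += int(np.sum((pred != c) & (target == c)))
    return tp, fp, fn

def micro_iou(tp, fp, fn):
    return tp / (tp + fp + fn)

def micro_f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

target = np.array([[0, 0, 1, 1],
                   [0, 2, 2, 1],
                   [0, 2, 2, 0]])
pred = np.array([[0, 0, 1, 0],
                 [0, 2, 1, 1],
                 [0, 2, 2, 2]])

tp, fp, fn = micro_stats(pred, target, num_classes=3)
iou = micro_iou(tp, fp, fn)
f1 = micro_f1(tp, fp, fn)
print(f"tp={tp} fp={fp} fn={fn} iou={iou:.4f} f1={f1:.4f}")
```

With micro averaging the two metrics are tied by F1 = 2·IoU / (1 + IoU), so they always rank models in the same order. In single-label multiclass segmentation every misclassified pixel counts once as a false positive and once as a false negative, so micro F1 also equals pixel accuracy.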

## 2. Building a baseline and evaluating quality


In [24]:
!pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118

Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch
  Using cached https://download.pytorch.org/whl/cu118/torch-2.6.0%2Bcu118-cp313-cp313-win_amd64.whl.metadata (28 kB)
Collecting torchvision
  Using cached https://download.pytorch.org/whl/cu118/torchvision-0.21.0%2Bcu118-cp313-cp313-win_amd64.whl.metadata (6.3 kB)
Downloading https://download.pytorch.org/whl/cu118/torch-2.6.0%2Bcu118-cp313-cp313-win_amd64.whl (2728.9 MB)

Load the dataset and create the corresponding data loaders. The number of classes is increased by one because an extra class (label 0) is used for the background.

In [5]:
# basic transforms for the training and validation data:
train_transform = A.Compose([
    A.Resize(256, 256),
    A.HorizontalFlip(p=0.5),
    A.RandomCrop(224, 224),
    A.Normalize(),
    ToTensorV2()
])

val_transform = A.Compose([
    A.Resize(224, 224),
    A.Normalize(),
    ToTensorV2()
])

In [None]:
train_ds = toCOCO(
    images_dir="data7/semantic-segmentation-of-plants.v2i.coco-segmentation/train",
    annotation_file="data7/semantic-segmentation-of-plants.v2i.coco-segmentation/train.json",
    transform=train_transform
)
val_ds = toCOCO(
    images_dir="data7/semantic-segmentation-of-plants.v2i.coco-segmentation/valid",
    annotation_file="data7/semantic-segmentation-of-plants.v2i.coco-segmentation/valid.json",
    transform=val_transform
)

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=0)
val_loader   = DataLoader(val_ds,   batch_size=32, shuffle=False, num_workers=0)
num_classes = len(train_ds.cat2label) + 1


loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!


Add a validate() function that computes the loss and metrics on the validation set.

In [None]:
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.metrics.functional import get_stats, iou_score, f1_score

def validate(model, loader):
    model.eval()
    dice_loss = smp.losses.DiceLoss(mode='multiclass')
    ce_loss   = nn.CrossEntropyLoss()

    total_loss = 0.0
    total_iou  = 0.0
    total_f1   = 0.0
    with torch.no_grad():
        for imgs, masks in loader:
            imgs, masks = imgs.to(device), masks.to(device)
            logits = model(imgs)
            # combined loss: Dice + cross-entropy
            loss = dice_loss(logits, masks) + ce_loss(logits, masks)
            total_loss += loss.item()

            preds = logits.argmax(dim=1)
            tp, fp, fn, tn = get_stats(
                preds, masks,
                mode='multiclass',
                num_classes=num_classes
            )
            total_iou += iou_score(tp, fp, fn, tn, reduction='micro').item()
            total_f1  += f1_score(tp, fp, fn, tn, reduction='micro').item()

    n = len(loader)
    return {'loss':total_loss/n,'iou':total_iou/n,'f1':total_f1/n}
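
One property of `validate` is worth keeping in mind: it averages per-batch scores, so every batch gets the same weight even if the last one is smaller. The sketch below uses made-up (tp, fp, fn) counts, not numbers from this dataset, to show how that average can differ from a score computed over pooled counts.

```python
def micro_f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical (tp, fp, fn) pixel counts for three validation batches;
# the last batch is smaller, as happens when the dataset size is not a
# multiple of the batch size.
batch_stats = [(900, 100, 100), (800, 200, 200), (50, 50, 50)]

# Variant 1: mean of per-batch scores (the approach taken in validate()).
mean_of_batches = sum(micro_f1(*s) for s in batch_stats) / len(batch_stats)

# Variant 2: pool the counts first, then compute a single score.
tp = sum(s[0] for s in batch_stats)
fp = sum(s[1] for s in batch_stats)
fn = sum(s[2] for s in batch_stats)
pooled = micro_f1(tp, fp, fn)

print(f"mean of per-batch F1: {mean_of_batches:.4f}")
print(f"pooled F1:            {pooled:.4f}")
```

For comparing models trained in the same notebook with the same loader the difference does not matter much, since all runs are scored the same way.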


For the convolutional model we use U-Net with a resnet34 encoder. For the transformer model we use SegFormer with a mit_b0 encoder. Both models are moved to the selected device (CUDA here) with to(device).

In [7]:
model_unet = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    classes=num_classes,
    activation=None
).to(device)

model_segformer = smp.Segformer(
    encoder_name="mit_b0",
    encoder_weights="imagenet",
    in_channels=3,
    classes=num_classes,
    activation=None
).to(device)


Add a training function fit that prints the results for every epoch:

In [14]:
from tqdm import tqdm

def fit(model, train_loader, val_loader, epochs, lr=1e-3):
    optimizer = optim.Adam(model.parameters(), lr=lr)

    for epoch in range(1, epochs+1):
        model.train()
        train_loss = 0.0
        for imgs, masks in train_loader:
            imgs, masks = imgs.to(device), masks.to(device)
            optimizer.zero_grad()
            logits = model(imgs)
            loss = smp.losses.DiceLoss(mode='multiclass')(logits, masks) + nn.CrossEntropyLoss()(logits, masks)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        val_res = validate(model, val_loader)

        print(
            f"epoch {epoch:02d} | "
            f"train loss: {train_loss/len(train_loader):.4f} | "
            f"val loss:   {val_res['loss']:.4f} | "
            f"iou:        {val_res['iou']:.4f} | "
            f"f1:         {val_res['f1']:.4f}"
        )
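
To make the training objective concrete, here is a simplified NumPy sketch of the combined Dice + cross-entropy loss for a single image. The function names are illustrative, and the Dice part is a plain soft Dice averaged over classes; the `smp.losses.DiceLoss` implementation may differ in details such as smoothing and the handling of classes absent from a batch.

```python
import numpy as np

def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(probs, target):
    """Mean negative log-probability of the true class; probs has shape (C, H, W)."""
    rows, cols = np.indices(target.shape)
    true_class_probs = probs[target, rows, cols]
    return float(-np.log(true_class_probs + 1e-12).mean())

def dice_loss(probs, target, eps=1e-7):
    """1 - mean soft Dice over classes (simplified illustration)."""
    num_classes = probs.shape[0]
    one_hot = np.eye(num_classes)[target].transpose(2, 0, 1)  # (C, H, W)
    intersection = (probs * one_hot).sum(axis=(1, 2))
    cardinality = probs.sum(axis=(1, 2)) + one_hot.sum(axis=(1, 2))
    dice = (2 * intersection + eps) / (cardinality + eps)
    return float(1 - dice.mean())

target = np.array([[0, 1],
                   [2, 1]])
one_hot = np.eye(3)[target].transpose(2, 0, 1)

confident_probs = softmax(10.0 * one_hot)      # logits strongly favor the true class
uniform_probs = softmax(np.zeros((3, 2, 2)))   # logits carry no information

confident_loss = dice_loss(confident_probs, target) + cross_entropy(confident_probs, target)
uniform_loss = dice_loss(uniform_probs, target) + cross_entropy(uniform_probs, target)
print(f"confident: {confident_loss:.4f}, uniform: {uniform_loss:.4f}")
```

The cross-entropy term rewards per-pixel confidence, while the Dice term measures region overlap per class, which makes the sum less sensitive to class imbalance than cross-entropy alone.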


Now run the training:

In [15]:
fit(model_unet, train_loader, val_loader, epochs=30, lr=1e-3)


epoch 01 | train loss: 0.7485 | val loss:   2.0331 | iou:        0.3834 | f1:         0.5543
epoch 02 | train loss: 0.6813 | val loss:   1.7218 | iou:        0.4714 | f1:         0.6408
epoch 03 | train loss: 0.6871 | val loss:   4.5111 | iou:        0.3692 | f1:         0.5392
epoch 04 | train loss: 0.6138 | val loss:   1.3719 | iou:        0.6170 | f1:         0.7632
epoch 05 | train loss: 0.4925 | val loss:   1.6242 | iou:        0.5494 | f1:         0.7091
epoch 06 | train loss: 0.4674 | val loss:   0.8863 | iou:        0.6954 | f1:         0.8203
epoch 07 | train loss: 0.4303 | val loss:   1.0679 | iou:        0.6925 | f1:         0.8183
epoch 08 | train loss: 0.4010 | val loss:   1.1084 | iou:        0.6965 | f1:         0.8211
epoch 09 | train loss: 0.3982 | val loss:   1.1562 | iou:        0.6856 | f1:         0.8135
epoch 10 | train loss: 0.3613 | val loss:   1.1654 | iou:        0.6821 | f1:         0.8110
epoch 11 | train loss: 0.4430 | val loss:   1.0979 | iou:        0.716

The U-Net baseline reaches F1 = 0.8061, which is a very good result.

In [16]:
fit(model_segformer, train_loader, val_loader, epochs=30, lr=1e-3)

epoch 01 | train loss: 1.6040 | val loss:   1.0779 | iou:        0.6446 | f1:         0.7839
epoch 02 | train loss: 1.0121 | val loss:   1.3077 | iou:        0.5892 | f1:         0.7415
epoch 03 | train loss: 0.7509 | val loss:   0.9538 | iou:        0.7103 | f1:         0.8306
epoch 04 | train loss: 0.5791 | val loss:   1.0355 | iou:        0.6731 | f1:         0.8046
epoch 05 | train loss: 0.5196 | val loss:   0.9659 | iou:        0.7088 | f1:         0.8296
epoch 06 | train loss: 0.4624 | val loss:   1.0016 | iou:        0.6773 | f1:         0.8076
epoch 07 | train loss: 0.4369 | val loss:   0.9907 | iou:        0.7161 | f1:         0.8346
epoch 08 | train loss: 0.3858 | val loss:   1.1799 | iou:        0.6593 | f1:         0.7947
epoch 09 | train loss: 0.3459 | val loss:   1.0917 | iou:        0.6895 | f1:         0.8162
epoch 10 | train loss: 0.3552 | val loss:   1.0182 | iou:        0.7040 | f1:         0.8263
epoch 11 | train loss: 0.3333 | val loss:   0.9461 | iou:        0.693

SegFormer reaches F1 = 0.823, which is also a very good result and exceeds that of the CNN model.

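## 3. Improving the baseline

### Hypotheses

We add more aggressive augmentations (color distortions, rotations), switch the optimizer to AdamW, and add a learning-rate scheduler.

We also reduce the batch size from 32 to 16 and lower the learning rate from 1e-3 to 1e-4. Note that in the code that follows, the improved U-Net additionally uses a resnet50 encoder instead of resnet34.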

In [17]:
improv_transform = A.Compose([
    A.Resize(256, 256),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.ColorJitter(p=0.5),
    A.RandomCrop(224, 224),
    A.Normalize(),
    ToTensorV2()
])

train_ds.transform = improv_transform
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True, num_workers=0)

model_unet_improv = smp.Unet(
    encoder_name="resnet50",
    encoder_weights="imagenet",
    classes=num_classes,
    activation=None
).to(device)

# optimizer and scheduler
optimizer = optim.AdamW(model_unet_improv.parameters(), lr=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
dice_loss = smp.losses.DiceLoss(mode='multiclass')
ce_loss   = nn.CrossEntropyLoss()

# new training function
def fit_improved(model, train_loader, val_loader, epochs):
    for epoch in range(1, epochs+1):
        model.train()
        train_loss = 0.0

        for imgs, masks in train_loader:
            imgs, masks = imgs.to(device), masks.to(device)
            optimizer.zero_grad()

            logits = model(imgs)
            loss = dice_loss(logits, masks) + ce_loss(logits, masks)

            loss.backward()
            optimizer.step()

            train_loss += loss.item()

        scheduler.step()  # advance the scheduler once per epoch
        val_res = validate(model, val_loader)
        print(
            f"epoch {epoch:02d} | "
            f"train loss: {train_loss/len(train_loader):.4f} | "
            f"val loss:   {val_res['loss']:.4f} | "
            f"iou:        {val_res['iou']:.4f} | "
            f"f1:         {val_res['f1']:.4f}"
        )
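
The effect of `CosineAnnealingLR` with `T_max=10` over 30 epochs can be sketched with the closed-form cosine formula. This is a standalone approximation, not a call into PyTorch: the function name is made up, and it assumes the default `eta_min=0`. To our understanding, stepping the PyTorch scheduler past `T_max` follows the same cosine back up instead of holding the minimum.

```python
import math

def cosine_annealing_lr(epoch, base_lr=1e-4, t_max=10, eta_min=0.0):
    """Closed-form cosine annealing: learning rate after `epoch` scheduler steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

schedule = [cosine_annealing_lr(e) for e in range(31)]
for epoch in (0, 5, 10, 15, 20, 30):
    print(f"after {epoch:2d} steps: lr = {schedule[epoch]:.2e}")
```

Under these assumptions the rate is close to zero around epoch 10, returns to 1e-4 around epoch 20, and ends near zero at epoch 30. Setting `T_max` equal to the number of epochs would give a single monotonic decay instead.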




Train the CNN model with the proposed improvements applied:

In [18]:
fit_improved(model_unet_improv, train_loader, val_loader, epochs=30)

epoch 01 | train loss: 2.3907 | val loss:   2.5415 | iou:        0.1152 | f1:         0.2066
epoch 02 | train loss: 2.0892 | val loss:   2.0869 | iou:        0.2111 | f1:         0.3486
epoch 03 | train loss: 1.8888 | val loss:   1.9070 | iou:        0.3959 | f1:         0.5672
epoch 04 | train loss: 1.7550 | val loss:   1.7736 | iou:        0.5483 | f1:         0.7082
epoch 05 | train loss: 1.6937 | val loss:   1.6789 | iou:        0.6249 | f1:         0.7691
epoch 06 | train loss: 1.5642 | val loss:   1.6345 | iou:        0.6525 | f1:         0.7897
epoch 07 | train loss: 1.5402 | val loss:   1.6080 | iou:        0.6588 | f1:         0.7943
epoch 08 | train loss: 1.4956 | val loss:   1.5944 | iou:        0.6531 | f1:         0.7901
epoch 09 | train loss: 1.4618 | val loss:   1.5831 | iou:        0.6555 | f1:         0.7919
epoch 10 | train loss: 1.4756 | val loss:   1.5884 | iou:        0.6445 | f1:         0.7838
epoch 11 | train loss: 1.4651 | val loss:   1.5855 | iou:        0.644

The hypotheses helped: the already good result improved to F1 = 0.8326.

Repeat the same for the transformer model:

In [23]:
import segmentation_models_pytorch as smp
from torch import optim, nn

model_segformer_improv = smp.Segformer(
    encoder_name="mit_b0",
    encoder_weights="imagenet",
    in_channels=3,
    classes=num_classes,
    activation=None
).to(device)

optimizer = optim.AdamW(model_segformer_improv.parameters(), lr=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
dice_loss = smp.losses.DiceLoss(mode='multiclass')
ce_loss   = nn.CrossEntropyLoss()

def fit_improved_trans(model, train_loader, val_loader, epochs):
    for epoch in range(1, epochs+1):
        model.train()
        train_loss = 0.0
        for imgs, masks in train_loader:
            imgs, masks = imgs.to(device), masks.to(device)
            optimizer.zero_grad()

            logits = model(imgs)
            loss = dice_loss(logits, masks) + ce_loss(logits, masks)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        scheduler.step()
        val_res = validate(model, val_loader)

        print(
            f"epoch {epoch:02d} | "
            f"train loss: {train_loss/len(train_loader):.4f} | "
            f"val loss:   {val_res['loss']:.4f} | "
            f"iou:        {val_res['iou']:.4f} | "
            f"f1:         {val_res['f1']:.4f}"
        )




Train the model:

In [24]:
fit_improved_trans(model_segformer_improv, train_loader, val_loader, epochs=30)

epoch 01 | train loss: 1.8881 | val loss:   1.3488 | iou:        0.5234 | f1:         0.6871
epoch 02 | train loss: 1.2422 | val loss:   1.1255 | iou:        0.6181 | f1:         0.7639
epoch 03 | train loss: 0.9305 | val loss:   1.0054 | iou:        0.6676 | f1:         0.8007
epoch 04 | train loss: 0.8034 | val loss:   1.0521 | iou:        0.6572 | f1:         0.7931
epoch 05 | train loss: 0.7378 | val loss:   1.0719 | iou:        0.6494 | f1:         0.7875
epoch 06 | train loss: 0.6757 | val loss:   1.0029 | iou:        0.6686 | f1:         0.8014
epoch 07 | train loss: 0.6503 | val loss:   1.0071 | iou:        0.6574 | f1:         0.7933
epoch 08 | train loss: 0.6127 | val loss:   1.0301 | iou:        0.6517 | f1:         0.7891
epoch 09 | train loss: 0.6334 | val loss:   1.0155 | iou:        0.6590 | f1:         0.7945
epoch 10 | train loss: 0.6498 | val loss:   1.0001 | iou:        0.6615 | f1:         0.7963
epoch 11 | train loss: 0.5962 | val loss:   0.9990 | iou:        0.662

We obtain F1 = 0.8189, so the hypotheses did not improve the baseline of the transformer model (F1 = 0.823).

## 4. Implementing the machine learning algorithm


In this step we implement the convolutional and transformer models ourselves.

In [25]:
import torch
from torch import nn, optim

class DoubleConv(nn.Module):
    def __init__(self, in_c, out_c):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_c, out_c, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_c, out_c, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class MyCnn(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.down1 = DoubleConv(3, 64)
        self.pool = nn.MaxPool2d(2)
        self.down2 = DoubleConv(64, 128)
        self.down3 = DoubleConv(128, 256)
        self.bottleneck = DoubleConv(256, 512)
        self.up3 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
        self.upconv3 = DoubleConv(512, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.upconv2 = DoubleConv(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.upconv1 = DoubleConv(128, 64)
        self.final = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        d1 = self.down1(x)
        d2 = self.down2(self.pool(d1))
        d3 = self.down3(self.pool(d2))
        bn = self.bottleneck(self.pool(d3))
        u3 = self.upconv3(torch.cat([self.up3(bn), d3], dim=1))
        u2 = self.upconv2(torch.cat([self.up2(u3), d2], dim=1))
        u1 = self.upconv1(torch.cat([self.up1(u2), d1], dim=1))
        return self.final(u1)
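
To see why the skip connections in `MyCnn` line up, the following sketch traces channel counts and spatial sizes through the network in plain Python. The helper `trace_mycnn_shapes` is hypothetical and only mirrors the layer definitions above; it does not run the model.

```python
def trace_mycnn_shapes(size=224):
    """Return (stage, channels, spatial size) for each stage of MyCnn."""
    if size % 8 != 0:
        raise ValueError("input size must be divisible by 8 (three 2x poolings)")
    stages = []
    s = size
    stages.append(("down1", 64, s))
    s //= 2
    stages.append(("down2", 128, s))
    s //= 2
    stages.append(("down3", 256, s))
    s //= 2
    stages.append(("bottleneck", 512, s))
    # Each decoder step: the transposed conv doubles the size and halves the
    # channels, then the result is concatenated with the matching encoder output.
    for name, skip_ch in (("upconv3", 256), ("upconv2", 128), ("upconv1", 64)):
        s *= 2
        concat_ch = skip_ch + skip_ch  # upsampled channels + skip channels
        stages.append((name + " input (after concat)", concat_ch, s))
        stages.append((name, skip_ch, s))
    return stages

stages = trace_mycnn_shapes(224)
for name, channels, side in stages:
    print(f"{name:30s} {channels:4d} x {side} x {side}")
```

The concatenated inputs have 512, 256 and 128 channels, which matches the input sizes of `upconv3`, `upconv2` and `upconv1`. The 224×224 crops used here satisfy the divisibility requirement.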


Our own implementation of the transformer:

In [None]:
class MyTransformer(nn.Module):
    def __init__(self,
                 img_size=224, patch_size=32, in_ch=3,
                 embed_dim=128, num_heads=4, depth=2,
                 num_classes=num_classes):
        super().__init__()
        assert img_size % patch_size == 0
        num_patches = (img_size // patch_size) **2
        self.patch_embed = nn.Conv2d(in_ch, embed_dim,kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=num_heads,
            dim_feedforward=embed_dim*2,
            dropout=0.1,
            activation='gelu'
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(embed_dim, embed_dim,kernel_size=patch_size,stride=patch_size),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, embed_dim//2,3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim//2, num_classes, 1)
        )

    def forward(self, x):
        x = self.patch_embed(x)
        B, C, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)+self.pos_embed
        x = x.permute(1,0,2)
        x = self.transformer(x)
        x = x.permute(1,0,2)
        x = x.transpose(1, 2).view(B, C,H,W)
        return self.decoder(x)
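
The tokenization step of `MyTransformer` can be illustrated with NumPy. The sketch below only shows how a 224×224 image splits into non-overlapping 32×32 patches; the strided convolution `patch_embed` amounts to applying one shared linear projection to each such flattened patch. The dummy image is made up for illustration.

```python
import numpy as np

img_size, patch_size, in_ch = 224, 32, 3
grid = img_size // patch_size          # 7 patches per side
num_patches = grid ** 2                # 49 tokens per image

image = np.arange(in_ch * img_size * img_size, dtype=np.float32)
image = image.reshape(in_ch, img_size, img_size)

# Cut the image into non-overlapping patches and flatten each one.
patches = image.reshape(in_ch, grid, patch_size, grid, patch_size)
patches = patches.transpose(1, 3, 0, 2, 4)              # (grid, grid, C, p, p)
tokens = patches.reshape(num_patches, in_ch * patch_size * patch_size)

print("tokens:", tokens.shape)
```

Each of the 49 tokens summarizes a 32×32 pixel block, and the decoder has to recover a 224×224 mask from a 7×7 grid with a single transposed convolution. This coarse resolution is possibly one reason why the custom transformer scores lower than SegFormer.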


Validation function for the custom models:

In [28]:
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.metrics.functional import get_stats, iou_score, f1_score

def validate(model, loader):
    model.eval()
    dice_loss = smp.losses.DiceLoss(mode='multiclass')
    ce_loss   = nn.CrossEntropyLoss()

    total_loss = total_iou = total_f1 = 0.0
    with torch.no_grad():
        for imgs, masks in loader:
            imgs, masks = imgs.to(device), masks.to(device)
            logits = model(imgs)
            loss = dice_loss(logits, masks) + ce_loss(logits, masks)
            total_loss += loss.item()

            preds = logits.argmax(dim=1)
            tp, fp, fn, tn = get_stats(preds, masks,mode='multiclass',num_classes=num_classes)
            total_iou += iou_score(tp, fp, fn, tn, reduction='micro').item()
            total_f1  += f1_score(tp, fp, fn, tn, reduction='micro').item()

    n = len(loader)
    return {'loss': total_loss/n,'iou':total_iou/n,'f1':total_f1/n}


Training function for the custom models:

In [29]:
from torch import optim

def fit_cust(model, train_loader, val_loader, epochs, lr=1e-3):
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for epoch in range(1, epochs+1):
        model.train()
        train_loss = 0.0
        for imgs, masks in train_loader:
            imgs, masks = imgs.to(device), masks.to(device)
            optimizer.zero_grad()
            logits = model(imgs)
            loss = smp.losses.DiceLoss(mode='multiclass')(logits, masks) + \
                   nn.CrossEntropyLoss()(logits, masks)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        val_res = validate(model, val_loader)
        print(f"epoch {epoch:02d}| "
              f"training loss: {train_loss/len(train_loader):.4f}  | "
              f"val loss: {val_res['loss']:.4f} | "
              f"iou: {val_res['iou']:.4f} | "
              f"f1: {val_res['f1']:.4f}")


In [None]:
model_cnn_my = MyCnn(num_classes).to(device) # instance of the custom CNN


In [33]:
fit_cust(model_cnn_my, train_loader, val_loader, epochs=30, lr=1e-3)

epoch 01| training loss: 2.0628  | val loss: 1.8179 | iou: 0.5320 | f1: 0.6945
epoch 02| training loss: 1.7665  | val loss: 1.9267 | iou: 0.4437 | f1: 0.6147
epoch 03| training loss: 1.7211  | val loss: 1.7088 | iou: 0.5317 | f1: 0.6942
epoch 04| training loss: 1.6611  | val loss: 1.6852 | iou: 0.4909 | f1: 0.6586
epoch 05| training loss: 1.6423  | val loss: 1.6912 | iou: 0.4683 | f1: 0.6378
epoch 06| training loss: 1.6417  | val loss: 1.6640 | iou: 0.4591 | f1: 0.6293
epoch 07| training loss: 1.5881  | val loss: 1.6285 | iou: 0.4985 | f1: 0.6654
epoch 08| training loss: 1.5915  | val loss: 1.6381 | iou: 0.4981 | f1: 0.6649
epoch 09| training loss: 1.5628  | val loss: 1.6453 | iou: 0.4622 | f1: 0.6322
epoch 10| training loss: 1.5760  | val loss: 1.6208 | iou: 0.4812 | f1: 0.6497
epoch 11| training loss: 1.5284  | val loss: 1.5706 | iou: 0.5264 | f1: 0.6897
epoch 12| training loss: 1.5851  | val loss: 1.7538 | iou: 0.4138 | f1: 0.5854
epoch 13| training loss: 1.5942  | val loss: 1.5687 | iou: 0.5618

The result is lower: F1 = 0.6465, which is below that of the library model. We repeat the same steps for the transformer model.

In [34]:
model_transf = MyTransformer(
    img_size=224, patch_size=32,
    in_ch=3, embed_dim=128,
    num_heads=4, depth=2,
    num_classes=num_classes
).to(device)



Train the transformer:

In [35]:
fit_cust(model_transf, train_loader, val_loader, epochs=30, lr=1e-3)

epoch 01| training loss: 2.0618  | val loss: 1.7457 | iou: 0.5294 | f1: 0.6923
epoch 02| training loss: 1.7684  | val loss: 1.7108 | iou: 0.5320 | f1: 0.6945
epoch 03| training loss: 1.6742  | val loss: 1.7155 | iou: 0.5320 | f1: 0.6945
epoch 04| training loss: 1.6854  | val loss: 1.7120 | iou: 0.4254 | f1: 0.5969
epoch 05| training loss: 1.6812  | val loss: 1.6333 | iou: 0.4508 | f1: 0.6215
epoch 06| training loss: 1.6517  | val loss: 1.7101 | iou: 0.4544 | f1: 0.6248
epoch 07| training loss: 1.5941  | val loss: 1.7215 | iou: 0.4551 | f1: 0.6255
epoch 08| training loss: 1.5703  | val loss: 1.6425 | iou: 0.5034 | f1: 0.6697
epoch 09| training loss: 1.5482  | val loss: 1.5803 | iou: 0.5454 | f1: 0.7058
epoch 10| training loss: 1.5940  | val loss: 1.7372 | iou: 0.4530 | f1: 0.6235
epoch 11| training loss: 1.4506  | val loss: 1.6632 | iou: 0.3975 | f1: 0.5689
epoch 12| training loss: 1.5116  | val loss: 1.6712 | iou: 0.3864 | f1: 0.5574
epoch 13| training loss: 1.4403  | val loss: 1.6898 | iou: 0.3649

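The custom transformer reaches F1 = 0.6042. Next, we apply the baseline-improvement techniques. Note that the optimizers below are attached to the already trained model_cnn_my and model_transf, so these runs continue training rather than start from fresh weights: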

In [None]:
optimizer_s = optim.AdamW(model_cnn_my.parameters(), lr=1e-4)
scheduler_s = optim.lr_scheduler.CosineAnnealingLR(optimizer_s, T_max=10)
optimizer_t = optim.AdamW(model_transf.parameters(), lr=1e-4)
scheduler_t = optim.lr_scheduler.CosineAnnealingLR(optimizer_t, T_max=10)

dice_loss = smp.losses.DiceLoss(mode='multiclass')
ce_loss   = nn.CrossEntropyLoss()
def fit_cust_imrpv(model, optimizer, scheduler, train_loader, val_loader, epochs=30):
    for epoch in range(1, epochs+1):
        model.train()
        tl = 0.0
        for imgs, masks in train_loader:
            imgs, masks = imgs.to(device), masks.to(device)
            optimizer.zero_grad()
            logits = model(imgs)
            loss = dice_loss(logits, masks) + ce_loss(logits, masks)
            loss.backward()
            optimizer.step()
            tl += loss.item()
        scheduler.step()

        vr = validate(model, val_loader)
        print(f"epoch {epoch:02d}| "
              f"tr loss: {tl/len(train_loader):.4f}  | "
              f"val loss: {vr['loss']:.4f} | "
              f"iou: {vr['iou']:.4f} | "
              f"f1: {vr['f1']:.4f}")

Train the CNN model:

In [37]:
fit_cust_imrpv(model_cnn_my, optimizer_s, scheduler_s, train_loader, val_loader, epochs=30)

epoch 01| tr loss: 1.4548  | val loss: 1.5101 | iou: 0.4953 | f1: 0.6624
epoch 02| tr loss: 1.4346  | val loss: 1.5052 | iou: 0.4936 | f1: 0.6610
epoch 03| tr loss: 1.4826  | val loss: 1.5004 | iou: 0.4955 | f1: 0.6627
epoch 04| tr loss: 1.4041  | val loss: 1.4565 | iou: 0.5158 | f1: 0.6806
epoch 05| tr loss: 1.3488  | val loss: 1.4569 | iou: 0.5127 | f1: 0.6779
epoch 06| tr loss: 1.4004  | val loss: 1.4462 | iou: 0.5158 | f1: 0.6805
epoch 07| tr loss: 1.3679  | val loss: 1.4690 | iou: 0.5048 | f1: 0.6709
epoch 08| tr loss: 1.3713  | val loss: 1.4474 | iou: 0.5142 | f1: 0.6792
epoch 09| tr loss: 1.4190  | val loss: 1.4502 | iou: 0.5127 | f1: 0.6778
epoch 10| tr loss: 1.3776  | val loss: 1.4489 | iou: 0.5134 | f1: 0.6784
epoch 11| tr loss: 1.3883  | val loss: 1.4489 | iou: 0.5134 | f1: 0.6784
epoch 12| tr loss: 1.3809  | val loss: 1.4499 | iou: 0.5128 | f1: 0.6780
epoch 13| tr loss: 1.3508  | val loss: 1.4490 | iou: 0.5133 | f1: 0.6784
epoch 14| tr loss: 1.3883  | val loss: 1.4353 | iou

We obtain F1 = 0.6840, slightly above the baseline value (0.6465), so the improvement techniques did help the custom CNN model. Next, train the transformer model:

In [38]:
fit_cust_imrpv(model_transf, optimizer_t, scheduler_t, train_loader, val_loader, epochs=30)

epoch 01| tr loss: 1.3957  | val loss: 1.6563 | iou: 0.4427 | f1: 0.6137
epoch 02| tr loss: 1.3283  | val loss: 1.6944 | iou: 0.4507 | f1: 0.6214
epoch 03| tr loss: 1.3418  | val loss: 1.7260 | iou: 0.4318 | f1: 0.6031
epoch 04| tr loss: 1.3740  | val loss: 1.7304 | iou: 0.4195 | f1: 0.5911
epoch 05| tr loss: 1.3369  | val loss: 1.7340 | iou: 0.4189 | f1: 0.5905
epoch 06| tr loss: 1.3631  | val loss: 1.7433 | iou: 0.4123 | f1: 0.5838
epoch 07| tr loss: 1.3035  | val loss: 1.7373 | iou: 0.4126 | f1: 0.5842
epoch 08| tr loss: 1.3074  | val loss: 1.7298 | iou: 0.4113 | f1: 0.5829
epoch 09| tr loss: 1.3028  | val loss: 1.7272 | iou: 0.4112 | f1: 0.5828
epoch 10| tr loss: 1.3571  | val loss: 1.7273 | iou: 0.4111 | f1: 0.5827
epoch 11| tr loss: 1.3928  | val loss: 1.7273 | iou: 0.4111 | f1: 0.5827
epoch 12| tr loss: 1.3419  | val loss: 1.7274 | iou: 0.4113 | f1: 0.5829
epoch 13| tr loss: 1.2620  | val loss: 1.7248 | iou: 0.4119 | f1: 0.5835
epoch 14| tr loss: 1.3306  | val loss: 1.7225 | iou

We obtain F1 = 0.5867, so the hypotheses did not improve the baseline of the custom transformer model (F1 = 0.6042).