# Discriminator
For the discriminator, we use an architecture similar to [30] but utilize all fully-convolutional layers to retain the spatial information. The network consists of 5 convolution layers with kernel 4 × 4 and stride of 2, where the channel number is {64, 128, 256, 512, 1}, respectively. Except for the last layer, each convolution layer is followed by a leaky ReLU [27] parameterized by 0.2. An up-sampling layer is added to the last convolution layer for re-scaling the output to the size of the input. We do not use any batch-normalization layers [16] as we jointly train the discriminator with the segmentation network using a small batch size.
# Segmentation Network
It is essential to build upon a good baseline model to achieve high-quality segmentation results
[2, 38, 40]. We adopt the DeepLab-v2 [2] framework with ResNet-101 [11] model pre-trained on ImageNet [6] as our segmentation baseline network. However, we do not use the multi-scale fusion strategy [2] due to the memory issue.
Similar to the recent work on semantic segmentation [2, 38], we remove the last classification layer and modify the stride
of the last two convolution layers from 2 to 1, making the resolution of the output feature maps effectively 1/8 times the input image size. To enlarge the receptive field, we apply dilated convolution layers [38] in conv4 and conv5 layers with a stride of 2 and 4, respectively. After the last layer, we use the Atrous Spatial Pyramid Pooling (ASPP) [2] as the final classifier. Finally, we apply an up-sampling layer along with the softmax output to match the size of the input image. Based on this architecture, our segmentation model achieves 65.1% mean intersection-over-union (IoU) whentrained on the Cityscapes [4] training set and tested on the Cityscapes validation set.
### Multi-level Adaptation Model
We construct the abovementioned discriminator and segmentation network as our ingle-level adaptation model. For the multi-level structure, we extract feature maps from the conv4 layer and add an ASPP module as the auxiliary classifier. Similarly, a discriminator with the same architecture is added for adversarial learning. Figure 2 shows the proposed multi-level adaptation model. In this paper, we use two levels due to the balance of its efficiency and accuracy.
# Network Training. 
To train the proposed single/multi-level adaptation model, we find that jointly training the segmentation network and discriminators in one stage is effective.
In each training batch, we first forward the source image Is to optimize the segmentation network for Lseg in (3) and generate the output Ps. For the target image It, we obtain the segmentation output Pt, and pass it along with Ps to the discriminator for optimizing Ld in (2). In addition, we compute the adversarial loss Ladv in (4) for the target prediction
Pt. For the multi-level training objective in (5), we simply repeat the same procedure for each adaptation module.
To train the segmentation network, we use the Stochastic Gradient Descent (SGD) optimizer with Nesterov acceleration where
the momentum is 0.9 and the weight decay is 10−4. The initial learning rate is set as 2.5 × 10−4 and is decreased using the polynomial decay with power of 0.9 as mentioned in [2]. For training the discriminator, we use the Adam optimizer [18] with the learning rate as 10−4 and the same polynomial decay as the segmentation network. The momentum is set as 0.9 and 0.99.


# REPO GITHUB
[https://github.com/wasidennis/AdaptSegNet/tree/master](http://)

https://github.com/hfslyc/AdvSemiSeg/tree/master

In [1]:
!pip install -U fvcore

Collecting fvcore
  Downloading fvcore-0.1.5.post20221221.tar.gz (50 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m50.2/50.2 kB[0m [31m795.6 kB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25h  Preparing metadata (setup.py) ... [?25ldone
Collecting yacs>=0.1.6 (from fvcore)
  Downloading yacs-0.1.8-py3-none-any.whl.metadata (639 bytes)
Collecting iopath>=0.1.7 (from fvcore)
  Downloading iopath-0.1.10.tar.gz (42 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.2/42.2 kB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25ldone
Collecting portalocker (from iopath>=0.1.7->fvcore)
  Downloading portalocker-2.10.0-py3-none-any.whl.metadata (8.5 kB)
Downloading yacs-0.1.8-py3-none-any.whl (14 kB)
Downloading portalocker-2.10.0-py3-none-any.whl (18 kB)
Building wheels for collected packages: fvcore, iopath
  Building wheel for fvcore (setup.py) ... [?25ldone
[?25h  Created wheel for fvcore: filename=fvco

In [5]:
# If you run the model for the first time remove all the previus checkpoints
! rm -r checkpoints/

rm: cannot remove 'checkpoints/': No such file or directory


# IMPORT

In [2]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms
import torchvision.models as models
import torch.optim as optim
import torch.nn.functional as F

import os
import zipfile
import numpy as np
import time
from PIL import Image
import albumentations as A

from fvcore.nn import FlopCountAnalysis, flop_count_table

import warnings
warnings.filterwarnings(action='ignore')

# MODEL PIPELINE

In [3]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
def model_pipeline(config=None):

    # make the model, data, and optimization problem
    model, source_loader, target_loader, val_loader, criterion, optimizer, start_epoch = make(config)
    
    model = torch.nn.DataParallel(model).cuda()
    
    # and use them to train the model
    train(model, source_loader, target_loader, criterion, optimizer, config, start_epoch)

    # and test its final performance
    val(model, val_loader)

    return model

# DATASET

In [4]:
def make(config):
    # Make the data
    (source, target) , test = get_data(train=True), get_data(train=False)
    source_loader = make_loader(source, batch_size=config["batch_size"],train=True)
    target_loader = make_loader(target, batch_size=config["batch_size"],train=True)
    test_loader = make_loader(test, batch_size=config["batch_size"],train=False)

    # Make the model (BiSeNet with ResNet-18 backbone)
    model = build_model(model_type='BiSeNet').cuda()

    # Make the loss and optimizer
    optimizer = optim.SGD(model.parameters(), 
                          lr=config["learning_rate"], 
                          momentum=config["momentum"], 
                          weight_decay=config["weight_decay"])
    
    criterion = torch.nn.CrossEntropyLoss(ignore_index=255)
    
    # Load the last checkpoint
    start_epoch = load_checkpoint(config, model, optimizer)
    
    return model, source_loader, target_loader, test_loader, criterion, optimizer, start_epoch

In [5]:
# Define transforms for preprocessing
transform_cityscapes = A.Compose([
    A.Resize(height=512, width=1024),
])
transform_gta5 = A.Compose([
    A.Resize(height=720, width=1280)
])

# GTA5 for train and CityScapes for test
citiyscapes_dir ='/kaggle/input/cityscapes/Cityscapes/Cityspaces'
#gta_dir = '/kaggle/input/gta5-dataset/GTA5'
gta_dir = '/kaggle/input/gta5-with-mask/GTA5_with_mask/'

def get_data(train=True):
    if train == True:
        # train dataset
        source_dataset = GTA5(root_dir=gta_dir, transform=transform_gta5)
        target_dataset = CityScapes(root_dir=citiyscapes_dir, split='train',transform=transform_cityscapes)
        return source_dataset, target_dataset
    else:
        # test dataset
        dataset = CityScapes(root_dir=citiyscapes_dir, split='val', transform=transform_cityscapes)
        return dataset


def make_loader(dataset, batch_size = 8, train=True):
    if train == True:
        # train dataloader
        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True,drop_last=True)
    else:
        # test dataloader
        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False,drop_last=True)
    
    return dataloader

In [6]:
# TO IMPLEMENT
def build_model(model_type):
    if model_type == 'BiSeNet':
        return BiSeNet(num_classes=19, context_path="resnet18")
    elif model_type == 'DeepLabV2':
        pretrain_model_path = '/kaggle/input/model-weight/deeplab_resnet_pretrained_imagenet.pth'
        return get_deeplab_v2(num_classes=19, pretrain=True, pretrain_model_path=pretrain_model_path)

In [7]:
from torch.utils.data import Dataset
import torch
from PIL import Image
import numpy as np
import os
from typing import Optional, Tuple
from albumentations import Compose

class CityScapes(Dataset):
    
    """
    _summary_
    """
    def __init__(self, 
                 root_dir:str, 
                 split:str = 'train', 
                 transform: Optional[Compose] = None):
        super(CityScapes, self).__init__()
        
        """
        
        _summary
        
        Args:
            root_dir (string): Directory with all the images and annotations.
            split (string): 'train' or 'val'.
            image_transform (callable, optional): Optional transform to be applied on a sample image.
            label_transform (callable, optional): Optional transform to be applied on a sample label.
        """
        self.root_dir = root_dir
        self.split = split
        self.transform = transform
        
        # Load the data
        self.data = []
        path = os.path.join(self.root_dir, 'images', split)
        for city in os.listdir(path):
            images = os.path.join(path, city)
            for image in os.listdir(images):
                image = os.path.join(images, image)
                label = image.replace('images', 'gtFine').replace('_leftImg8bit','_gtFine_labelTrainIds')
                self.data.append((image, label))

    def __len__(self)->int:
        
        """
        
        _summary
        
        Returns:
            int: _description_
        """
        
        return len(self.data)

    def __getitem__(self, idx:int)-> Tuple[torch.Tensor, torch.Tensor]:
        
        """
        
        _summary
        
        Args:
            idx (int): _description_

        Returns:
            tuple[torch.Tensor, torch.Tensor]: _description_
        """
        
        image_path, label_path = self.data[idx]

        # Load image and label
        image = Image.open(image_path).convert('RGB')
        label = Image.open(label_path).convert('L')
        image, label = np.array(image), np.array(label)
        
        if self.transform:
            transformed = self.transform(image=image, mask=label)
            image, label = transformed['image'], transformed['mask']

        image = torch.from_numpy(image).permute(2, 0, 1).float()/255
        label = torch.from_numpy(label).long()
        
        return image, label

In [8]:
def get_color_to_id() -> dict:
    """
    Returns a dictionary mapping RGB color tuples to their corresponding class IDs.

    Returns:
        dict: A dictionary where keys are RGB color tuples and values are class IDs.
    """
    id_to_color = get_id_to_color()
    color_to_id = {color: id for id, color in id_to_color.items()}
    return color_to_id

def get_id_to_color() -> dict:
    """
    Returns a dictionary mapping class IDs to their corresponding colors.

    Returns:
        dict: A dictionary where keys are class IDs and values are RGB color tuples.
    """
    return {
        0: (128, 64, 128),    # road
        1: (244, 35, 232),    # sidewalk
        2: (70, 70, 70),      # building
        3: (102, 102, 156),   # wall
        4: (190, 153, 153),   # fence
        5: (153, 153, 153),   # pole
        6: (250, 170, 30),    # light
        7: (220, 220, 0),     # sign
        8: (107, 142, 35),    # vegetation
        9: (152, 251, 152),   # terrain
        10: (70, 130, 180),   # sky
        11: (220, 20, 60),    # person
        12: (255, 0, 0),      # rider
        13: (0, 0, 142),      # car
        14: (0, 0, 70),       # truck
        15: (0, 60, 100),     # bus
        16: (0, 80, 100),     # train
        17: (0, 0, 230),      # motorcycle
        18: (119, 11, 32),    # bicycle
    }

In [9]:
from torch.utils.data import Dataset
import torch
from PIL import Image
import numpy as np
import os
from typing import Optional, Tuple
# from utils import get_color_to_id


class GTA5(Dataset):
    
    """
    _summary_    
    """
    
    def __init__(self, 
                 root_dir:str,
                 compute_mask:bool=False,
                 transform: Optional[Compose] = None):
        super(GTA5, self).__init__()
        
        """
        Args:
            root_dir (string): Directory with all the images and annotations.
            transform (callable, optional): Optional transform to be applied on a sample.
        """

        self.root_dir = root_dir
        self.compute_mask = compute_mask
        self.transform = transform
        if self.compute_mask:
            self.color_to_id = get_color_to_id()
        
        # Load the data
        self.data = []
        image_dir = os.path.join(self.root_dir, 'images')
        
        if self.compute_mask:
            label_dir = os.path.join(self.root_dir, 'labels')
        else:
            label_dir = os.path.join(self.root_dir, 'masks')
            
        for filename in os.listdir(image_dir):
            image = os.path.join(image_dir, filename)
            label = os.path.join(label_dir, filename)
            self.data.append((image, label))
        
    def __len__(self)->int:
        
        """_summary_

        Returns:
            int: _description_
        """
        
        return len(self.data)

    def __getitem__(self, idx:int)-> Tuple[torch.Tensor,torch.Tensor]:
        
        """_summary_

        Args:
            idx (int): _description_

        Returns:
            Tuple[torch.Tensor,torch.Tensor]: _description_
        """
        
        image_path, label_path = self.data[idx]

        # Load images and labels or masks
        image = Image.open(image_path).convert('RGB')
        
        if self.compute_mask:
            label = self._rgb_to_label(Image.open(label_path).convert('RGB'))
        else:
            label = Image.open(label_path).convert('L')
            
        image, label = np.array(image), np.array(label)
        
        if self.transform:
            transformed = self.transform(image=image, mask=label)
            image, label = transformed['image'], transformed['mask']

        image = torch.from_numpy(image).permute(2, 0, 1).float()/255
        label = torch.from_numpy(label).long()
        return image, label
    
    def _rgb_to_label(self, image:Image.Image)->np.ndarray:
        """_summary_

        Args:
            image (Image.Image): _description_

        Returns:
            np.ndarray: _description_
        """
        
        gray_image = Image.new('L', image.size)
        rgb_pixels = image.load()
        gray_pixels = gray_image.load()
        
        for i in range(image.width):
            for j in range(image.height):
                rgb = rgb_pixels[i,j]
                gray_pixels[i,j] = self.color_to_id.get(rgb,255)
                
        return gray_image

# BISENET

In [10]:
class ConvBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=2, padding=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size,
                               stride=stride, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, input):
        x = self.conv1(input)
        return self.relu(self.bn(x))


class Spatial_path(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.convblock1 = ConvBlock(in_channels=3, out_channels=64)
        self.convblock2 = ConvBlock(in_channels=64, out_channels=128)
        self.convblock3 = ConvBlock(in_channels=128, out_channels=256)

    def forward(self, input):
        x = self.convblock1(input)
        x = self.convblock2(x)
        x = self.convblock3(x)
        return x


class AttentionRefinementModule(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_channels)
        self.sigmoid = nn.Sigmoid()
        self.in_channels = in_channels
        self.avgpool = nn.AdaptiveAvgPool2d(output_size=(1, 1))

    def forward(self, input):
        # global average pooling
        x = self.avgpool(input)
        assert self.in_channels == x.size(1), 'in_channels and out_channels should all be {}'.format(x.size(1))
        x = self.conv(x)
        x = self.sigmoid(self.bn(x))
        # x = self.sigmoid(x)
        # channels of input and x should be same
        x = torch.mul(input, x)
        return x


class FeatureFusionModule(torch.nn.Module):
    def __init__(self, num_classes, in_channels):
        super().__init__()
        # self.in_channels = input_1.channels + input_2.channels
        # resnet101 3328 = 256(from spatial path) + 1024(from context path) + 2048(from context path)
        # resnet18  1024 = 256(from spatial path) + 256(from context path) + 512(from context path)
        self.in_channels = in_channels

        self.convblock = ConvBlock(in_channels=self.in_channels, out_channels=num_classes, stride=1)
        self.conv1 = nn.Conv2d(num_classes, num_classes, kernel_size=1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(num_classes, num_classes, kernel_size=1)
        self.sigmoid = nn.Sigmoid()
        self.avgpool = nn.AdaptiveAvgPool2d(output_size=(1, 1))

    def forward(self, input_1, input_2):
        x = torch.cat((input_1, input_2), dim=1)
        assert self.in_channels == x.size(1), 'in_channels of ConvBlock should be {}'.format(x.size(1))
        feature = self.convblock(x)
        x = self.avgpool(feature)

        x = self.relu(self.conv1(x))
        x = self.sigmoid(self.conv2(x))
        x = torch.mul(feature, x)
        x = torch.add(x, feature)
        return x


class BiSeNet(torch.nn.Module):
    def __init__(self, num_classes, context_path):
        super().__init__()
        # build spatial path
        self.saptial_path = Spatial_path()

        # build context path
        self.context_path = build_contextpath(name=context_path)

        # build attention refinement module  for resnet 101
        if context_path == 'resnet101':
            self.attention_refinement_module1 = AttentionRefinementModule(1024, 1024)
            self.attention_refinement_module2 = AttentionRefinementModule(2048, 2048)
            # supervision block
            self.supervision1 = nn.Conv2d(in_channels=1024, out_channels=num_classes, kernel_size=1)
            self.supervision2 = nn.Conv2d(in_channels=2048, out_channels=num_classes, kernel_size=1)
            # build feature fusion module
            self.feature_fusion_module = FeatureFusionModule(num_classes, 3328)

        elif context_path == 'resnet18':
            # build attention refinement module  for resnet 18
            self.attention_refinement_module1 = AttentionRefinementModule(256, 256)
            self.attention_refinement_module2 = AttentionRefinementModule(512, 512)
            # supervision block
            self.supervision1 = nn.Conv2d(in_channels=256, out_channels=num_classes, kernel_size=1)
            self.supervision2 = nn.Conv2d(in_channels=512, out_channels=num_classes, kernel_size=1)
            # build feature fusion module
            self.feature_fusion_module = FeatureFusionModule(num_classes, 1024)
        else:
            print('Error: unspport context_path network \n')

        # build final convolution
        self.conv = nn.Conv2d(in_channels=num_classes, out_channels=num_classes, kernel_size=1)

        self.init_weight()

        self.mul_lr = []
        self.mul_lr.append(self.saptial_path)
        self.mul_lr.append(self.attention_refinement_module1)
        self.mul_lr.append(self.attention_refinement_module2)
        self.mul_lr.append(self.supervision1)
        self.mul_lr.append(self.supervision2)
        self.mul_lr.append(self.feature_fusion_module)
        self.mul_lr.append(self.conv)

    def init_weight(self):
        for name, m in self.named_modules():
            if 'context_path' not in name:
                if isinstance(m, nn.Conv2d):
                    nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='relu')
                elif isinstance(m, nn.BatchNorm2d):
                    m.eps = 1e-5
                    m.momentum = 0.1
                    nn.init.constant_(m.weight, 1)
                    nn.init.constant_(m.bias, 0)

    def forward(self, input):
        # output of spatial path
        sx = self.saptial_path(input)

        # output of context path
        cx1, cx2, tail = self.context_path(input)
        cx1 = self.attention_refinement_module1(cx1)
        cx2 = self.attention_refinement_module2(cx2)
        cx2 = torch.mul(cx2, tail)
        # upsampling
        cx1 = torch.nn.functional.interpolate(cx1, size=sx.size()[-2:], mode='bilinear')
        cx2 = torch.nn.functional.interpolate(cx2, size=sx.size()[-2:], mode='bilinear')
        cx = torch.cat((cx1, cx2), dim=1)

        if self.training == True:
            cx1_sup = self.supervision1(cx1)
            cx2_sup = self.supervision2(cx2)
            cx1_sup = torch.nn.functional.interpolate(cx1_sup, size=input.size()[-2:], mode='bilinear')
            cx2_sup = torch.nn.functional.interpolate(cx2_sup, size=input.size()[-2:], mode='bilinear')

        # output of feature fusion module
        result = self.feature_fusion_module(sx, cx)

        # upsampling
        result = torch.nn.functional.interpolate(result, scale_factor=8, mode='bilinear')
        result = self.conv(result)

        if self.training == True:
            return result, cx1_sup, cx2_sup

        return result

In [11]:
class resnet18(torch.nn.Module):
    def __init__(self, pretrained=True):
        super().__init__()
        self.features = models.resnet18(pretrained=pretrained)
        self.conv1 = self.features.conv1
        self.bn1 = self.features.bn1
        self.relu = self.features.relu
        self.maxpool1 = self.features.maxpool
        self.layer1 = self.features.layer1
        self.layer2 = self.features.layer2
        self.layer3 = self.features.layer3
        self.layer4 = self.features.layer4

    def forward(self, input):
        x = self.conv1(input)
        x = self.relu(self.bn1(x))
        x = self.maxpool1(x)
        feature1 = self.layer1(x)  # 1 / 4
        feature2 = self.layer2(feature1)  # 1 / 8
        feature3 = self.layer3(feature2)  # 1 / 16
        feature4 = self.layer4(feature3)  # 1 / 32
        # global average pooling to build tail
        tail = torch.mean(feature4, 3, keepdim=True)
        tail = torch.mean(tail, 2, keepdim=True)
        return feature3, feature4, tail


class resnet101(torch.nn.Module):
    def __init__(self, pretrained=True):
        super().__init__()
        self.features = models.resnet101(pretrained=pretrained)
        self.conv1 = self.features.conv1
        self.bn1 = self.features.bn1
        self.relu = self.features.relu
        self.maxpool1 = self.features.maxpool
        self.layer1 = self.features.layer1
        self.layer2 = self.features.layer2
        self.layer3 = self.features.layer3
        self.layer4 = self.features.layer4

    def forward(self, input):
        x = self.conv1(input)
        x = self.relu(self.bn1(x))
        x = self.maxpool1(x)
        feature1 = self.layer1(x)  # 1 / 4
        feature2 = self.layer2(feature1)  # 1 / 8
        feature3 = self.layer3(feature2)  # 1 / 16
        feature4 = self.layer4(feature3)  # 1 / 32
        # global average pooling to build tail
        tail = torch.mean(feature4, 3, keepdim=True)
        tail = torch.mean(tail, 2, keepdim=True)
        return feature3, feature4, tail


def build_contextpath(name):
    model = {
        'resnet18': resnet18(pretrained=True),
        'resnet101': resnet101(pretrained=True)
    }
    return model[name]

 # DISCRIMINATOR
 easy copiato da github dagli autori del paper

In [12]:
import torch.nn as nn
import torch.nn.functional as F


class FCDiscriminator(nn.Module):

	def __init__(self, num_classes, ndf = 64):
		super(FCDiscriminator, self).__init__()

		self.conv1 = nn.Conv2d(num_classes, ndf, kernel_size=4, stride=2, padding=1)
		self.conv2 = nn.Conv2d(ndf, ndf*2, kernel_size=4, stride=2, padding=1)
		self.conv3 = nn.Conv2d(ndf*2, ndf*4, kernel_size=4, stride=2, padding=1)
		self.conv4 = nn.Conv2d(ndf*4, ndf*8, kernel_size=4, stride=2, padding=1)
		self.classifier = nn.Conv2d(ndf*8, 1, kernel_size=4, stride=2, padding=1)

		self.leaky_relu = nn.LeakyReLU(negative_slope=0.2, inplace=True)
		#self.up_sample = nn.Upsample(scale_factor=32, mode='bilinear')
		#self.sigmoid = nn.Sigmoid()


	def forward(self, x):
		x = self.conv1(x)
		x = self.leaky_relu(x)
		x = self.conv2(x)
		x = self.leaky_relu(x)
		x = self.conv3(x)
		x = self.leaky_relu(x)
		x = self.conv4(x)
		x = self.leaky_relu(x)
		x = self.classifier(x)
		#x = self.up_sample(x)
		#x = self.sigmoid(x) 

		return x

# TRAINING

In [13]:
class CrossEntropy2d(nn.Module):

    def __init__(self, size_average=True, ignore_label=255):
        super(CrossEntropy2d, self).__init__()
        self.size_average = size_average
        self.ignore_label = ignore_label

    def forward(self, predict, target, weight=None):
        """
            Args:
                predict:(n, c, h, w)
                target:(n, h, w)
                weight (Tensor, optional): a manual rescaling weight given to each class.
                                           If given, has to be a Tensor of size "nclasses"
        """
        assert not target.requires_grad
        assert predict.dim() == 4
        assert target.dim() == 3
        assert predict.size(0) == target.size(0), "{0} vs {1} ".format(predict.size(0), target.size(0))
        assert predict.size(2) == target.size(1), "{0} vs {1} ".format(predict.size(2), target.size(1))
        assert predict.size(3) == target.size(2), "{0} vs {1} ".format(predict.size(3), target.size(3))
        n, c, h, w = predict.size()
        target_mask = (target >= 0) * (target != self.ignore_label)
        target = target[target_mask]
        if not target.data.dim():
            return Variable(torch.zeros(1))
        predict = predict.transpose(1, 2).transpose(2, 3).contiguous()
        predict = predict[target_mask.view(n, h, w, 1).repeat(1, 1, 1, c)].view(-1, c)
        loss = F.cross_entropy(predict, target, weight=weight, size_average=self.size_average)
        return loss

In [14]:
def save_checkpoint(config, model, optimizer, train_loss, mIOU, epoch):
    checkpoint_path = os.path.join(config["checkpoint_dir"], "checkpoint.pth")
    torch.save({
        'epoch': epoch + 1,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': train_loss,
        'mIOU': mIOU
    }, checkpoint_path)
    print(f"Checkpoint saved in {checkpoint_path} | Epoch: {epoch}")
    
    
def load_checkpoint(config, model, optimizer):
    if os.path.exists(config["checkpoint_dir"]):
        checkpoint = torch.load(config["checkpoint_dir"] + "/checkpoint.pth")
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        start_epoch = checkpoint['epoch']
        print(f"Checkpoint found. Resuming from epoch {start_epoch}.")
        return start_epoch
    else:
        os.mkdir(config["checkpoint_dir"]) # divide the directory wrt the model (eg. checkpoints/DeepLabV2, checkpoints/BiSeNet)
        print("No checkpoint found. Starting from scratch.")
        return 0

In [15]:
input_size = [1280, 720]
input_size_target = [1024, 512]

In [16]:
def loss_calc(pred, label):
    """
    This function returns cross entropy loss for semantic segmentation
    """
    # out shape batch_size x channels x h x w -> batch_size x channels x h x w
    # label shape h x w x 1 x batch_size  -> batch_size x 1 x h x w
    label = label.long().cuda()
    criterion = CrossEntropy2d().cuda()

    return criterion(pred, label)

def lr_poly(base_lr, iter, max_iter, power):
    return base_lr * ((1 - float(iter) / max_iter) ** (power))

def adjust_learning_rate(optimizer, iter, max_iter, config):
    lr = lr_poly(config["learning_rate"], iter, max_iter=1000, power=0.9)
    optimizer.param_groups[0]['lr'] = lr
    if len(optimizer.param_groups) > 1:
        optimizer.param_groups[1]['lr'] = lr * 10

In [17]:
def id_processing(targets):
    targets = targets.cuda()
    
    # Define valid indices
    valid_indices = torch.tensor(list(range(19)) + [255]).to(targets.device)

    # Replace all IDs not in valid_indices with 255
    processed_targets = torch.where(torch.isin(targets, valid_indices), targets, torch.tensor(255, device=targets.device))

    return processed_targets.long()

In [18]:
def poly_lr_scheduler(optimizer, init_lr, iter, max_iter, lr_decay_iter=1, power=0.9):
    """Polynomial decay of learning rate
            :param init_lr is base learning rate
            :param iter is a current iteration
            :param lr_decay_iter how frequently decay occurs, default is 1
            :param max_iter is number of maximum iterations
            :param power is a polymomial power

    """
    #if iter % lr_decay_iter or iter > max_iter:
        #return optimizer

    lr = init_lr*(1 - iter/max_iter)**power
    optimizer.param_groups[0]['lr'] = lr
    return lr


In [27]:
from itertools import cycle
from tqdm import tqdm

def train(model, source_loader, target_loader, criterion, optimizer, config, start_epoch):
    lambda_adv = 0.001
    model_D = FCDiscriminator(num_classes=19)
    model.cuda()
    model_D.cuda() 
    
    optimizer = optim.SGD(model.parameters(), lr=config["learning_rate"], momentum=config["momentum"], weight_decay=config["weight_decay"])
    optimizer_D = optim.Adam(model_D.parameters(), lr=config["learning_rate_D"], betas=(0.9, 0.99))
    
    g_initial_lr = optimizer.param_groups[0]['lr']
    d_initial_lr = optimizer_D.param_groups[0]['lr']

    bce_loss = torch.nn.BCEWithLogitsLoss()
    criterion = torch.nn.CrossEntropyLoss(ignore_index=255)
    
    interp = nn.Upsample(size=(input_size[1], input_size[0]), mode='bilinear')
    interp_target = nn.Upsample(size=(input_size_target[1], input_size_target[0]), mode='bilinear')

    source_label = 0
    target_label = 1
    
    for epoch in range(config["epochs"]):
        print("Epoche {}/{}".format(epoch + 1, config["epochs"]))
        model.train()
        model_D.train() 
        
        target_loader_cycle = cycle(target_loader)
        train_loop = tqdm(zip(source_loader, target_loader_cycle), total=len(source_loader), leave=False)
        train_loop.set_description(f'Epoch {epoch+1}/{config["epochs"]} (Train)')
        loss_seg_value = 0
        loss_adv_target_value = 0
        loss_d_value = 0
        for (source_data, source_labels), (target_data, _) in train_loop:
            source_data, source_labels = source_data.to(device), source_labels.to(device)
            target_data = target_data.to(device)
            
            optimizer.zero_grad()
            optimizer_D.zero_grad()

            #TRAIN G
            
            #Train with source
            for param in model_D.parameters():
                param.requires_grad = False
                
            pred, _, _ = model(source_data)
            pred = interp(pred)

            loss_seg = criterion(pred, source_labels)
            loss_seg.backward()
            loss_seg_value = loss_seg.data.cpu().numpy()
            train_loop.set_postfix(loss=loss_seg.item())
    
            #Train with target
            pred_target, _, _ = model(target_data)
            pred_target = interp_target(pred_target)
            
            d_out = model_D(F.softmax(pred_target))

            loss_adv_target = bce_loss(d_out, torch.FloatTensor(d_out.data.size()).fill_(source_label).cuda())
            loss_d = lambda_adv * loss_adv_target
            loss_d.backward()
            loss_adv_target_value = loss_adv_target.data.cpu().numpy()
            train_loop.set_postfix(loss=loss_d.item())
            #TRAIN D
            
            #Train with source
            for param in model_D.parameters():
                param.requires_grad = True
                
            pred = pred.detach()
            d_out = model_D(F.softmax(pred))
            
            loss_d = bce_loss(d_out, torch.FloatTensor(d_out.data.size()).fill_(source_label).cuda())
            loss_d.backward()
            train_loop.set_postfix(loss=loss_d.item())

            #Train with target
            pred_target = pred_target.detach()
            
            d_out = model_D(F.softmax(pred_target))
            
            loss_d = bce_loss(d_out, torch.full_like(d_out, target_label))
            loss_d.backward()
            loss_d_value = loss_d.data.cpu().numpy()
            train_loop.set_postfix(loss_seg=loss_seg.item(), loss_adv=loss_adv_target.item(), loss_d=loss_d.item())
            
            poly_lr_scheduler(optimizer, init_lr=g_initial_lr, iter=epoch, max_iter=config["epochs"])
            poly_lr_scheduler(optimizer_D, init_lr=d_initial_lr, iter=epoch, max_iter=config["epochs"])
            
            optimizer.step()
            optimizer_D.step()
        
        print(optimizer.param_groups[0]['lr'])
        print('loss_seg = {0:.3f} loss_adv = {1:.3f}, loss_d = {2:.3f}'.format(
        loss_seg_value, loss_adv_target_value, loss_d_value))


# TESTING
Su Cityscapes, è regolare

In [28]:
def mean_iou(num_classes, pred, target):
    mIOU = 0
    for i in range(len(pred)):
        hist = fast_hist(target[i].cpu().numpy(), pred[i].cpu().numpy(), num_classes)
        IOU = per_class_iou(hist)
        mIOU = mIOU + sum(IOU)/num_classes
    return mIOU

def fast_hist(a, b, n):
    """
    a and b are predict and mask respectively
    n is the number of classes
    """
    k = (a >= 0) & (a < n) #assign True if the value is in the range between 0 and 18 (class labels)
    return np.bincount(n * a[k].astype(int) + b[k], minlength=n ** 2).reshape((n, n))

def per_class_iou(hist):
    epsilon = 1e-5
    return (np.diag(hist)) / (hist.sum(1) + hist.sum(0) - np.diag(hist) + epsilon)

In [29]:
def val(model, val_loader):
    
    model.eval()
    model.cuda()
    
    interp = nn.Upsample(size=(512, 1024), mode='bilinear')
    total_mIOU = 0
    total_images = 0

    with torch.no_grad():
        for _, (inputs, targets) in enumerate(val_loader):
            image, label = inputs.cuda(), id_processing(targets).cuda()

            output = model(image)
            _, predicted = output.max(1)
            running_mIOU = mean_iou(output.size()[1], predicted, targets)
            total_mIOU += running_mIOU.sum().item()
            total_images += len(predicted)
        
    mIOU = total_mIOU/total_images
    
    print(f'\n\nmIoU: {(mIOU*100):.3f}%')

In [None]:
! rm -r checkpoints/
# best configuration (TO CONFIGURE)
config = dict(
    epochs=20,
    batch_size=4,
    learning_rate=1e-4,
    learning_rate_D=1e-2,
    momentum=0.9,
    weight_decay=1e-4,
    architecture="BiSeNet",
    checkpoint_dir="/kaggle/working/checkpoints" )
if torch.cuda.is_available():
    print("Building the model with the best configuration")
    # Build, train and analyze the model with the pipeline
    model = model_pipeline(config)
else:
    print("CUDA is Not available")

Building the model with the best configuration
No checkpoint found. Starting from scratch.
Epoche 1/20


Epoch 1/20 (Train):   1%|          | 6/625 [00:17<21:53,  2.12s/it, loss=0]                                        

In [None]:
! rm -r checkpoints/

In [29]:
# best configuration (TO CONFIGURE)
config = dict(
    epochs=5,
    batch_size=6,
    learning_rate=2.5e-4,
    learning_rate_D=1e-4,
    momentum=0.9,
    weight_decay=1e-4,
    architecture="BiSeNet",
    checkpoint_dir="/kaggle/working/checkpoints" )
if torch.cuda.is_available():
    print("Building the model with the best configuration")
    # Build, train and analyze the model with the pipeline
    model = model_pipeline(config)
else:
    print("CUDA is Not available")

Building the model with the best configuration


Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 144MB/s] 
Downloading: "https://download.pytorch.org/models/resnet101-63fe2227.pth" to /root/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth
100%|██████████| 171M/171M [00:01<00:00, 156MB/s]  


No checkpoint found. Starting from scratch.
Epoche 1/5


                                                                                                                 

0.00025
loss_seg = 0.748 loss_adv = 2.362, loss_d = 0.322
Epoche 2/5


                                                                                                                 

0.00020451303651271462
loss_seg = 0.527 loss_adv = 2.305, loss_d = 0.418
Epoche 3/5


                                                                                                                 

0.00015786146687233883
loss_seg = 0.559 loss_adv = 2.373, loss_d = 0.394
Epoche 4/5


                                                                                                                 

0.00010959582263852174
loss_seg = 0.638 loss_adv = 2.560, loss_d = 0.277
Epoche 5/5


                                                                                                                 

5.873094715440094e-05
loss_seg = 0.448 loss_adv = 2.644, loss_d = 0.369


mIoU: 14.595%
