Several neural network architectures are commonly used for segmentation problems:
1. Fully Convolutional Network
2. U-Net
3. SegNet
4. PSPNet: Pyramid Scene Parsing Network
5. DeepLab

FCN works as an encoder and a decoder. Its first version had only one layer in the decoder.
The U-Net architecture builds on the Fully Convolutional Network. Compared to FCN-8, the two main differences are that U-Net is symmetric and that the skip connections between the downsampling path and the upsampling path apply a concatenation operator instead of a sum. As a consequence, the number of parameters of the model is reduced and it can be trained with a small labelled dataset (using appropriate data augmentation).
SegNet does not differ strongly from U-Net. The SegNet paper says:
«As compared to SegNet, U-Net does not reuse pooling indices but instead transfers the entire feature map (at the cost of more memory) to the corresponding decoders and concatenates them to upsampled (via deconvolution) decoder feature maps. There is no conv5 and max-pool 5 block in U-Net as in the VGG net architecture. SegNet, on the other hand, uses all of the pre-trained convolutional layer weights from VGG net as pre-trained weights». [1]
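
The two skip-connection styles mentioned above (summation in FCN, concatenation in U-Net) differ in what happens to the channel count. The following framework-free sketch illustrates this; the list-of-channels representation and the function names are assumptions made for illustration, not code from `lib`.

```python
# A feature map is modelled as a list of channels; each channel is a flat list of pixels.

def sum_skip(encoder_map, decoder_map):
    """FCN-style skip: element-wise sum, so the channel count is unchanged."""
    if len(encoder_map) != len(decoder_map):
        raise ValueError("summation needs the same number of channels")
    return [
        [e + d for e, d in zip(enc_ch, dec_ch)]
        for enc_ch, dec_ch in zip(encoder_map, decoder_map)
    ]


def concat_skip(encoder_map, decoder_map):
    """U-Net-style skip: channels are stacked, so the channel counts add up."""
    return encoder_map + decoder_map


encoder = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # 2 channels, 4 pixels each
decoder = [[0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0]]  # 2 channels, 4 pixels each

summed = sum_skip(encoder, decoder)
stacked = concat_skip(encoder, decoder)
print(len(summed), len(stacked))  # channel counts after each kind of skip
```

With concatenation the decoder sees the encoder features unchanged, at the cost of a wider input to the next convolution.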

PSPNet was developed to better learn the global context representation of a scene. Feature maps are pooled at four different scales, each corresponding to a pyramid level, and processed by a 1x1 convolutional layer to reduce their dimensions. This way, each pyramid level analyses sub-regions of the image at different locations.
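
The pooling step can be sketched without any framework. The example below average-pools a single-channel 6x6 map into 1x1, 2x2, 3x3 and 6x6 grids, the bin sizes used in the PSPNet paper. It is a simplified illustration: the 1x1 convolution and the upsampling back to the input size are omitted, and the helper name `avg_pool_to_bins` is hypothetical.

```python
def avg_pool_to_bins(feature_map, bins):
    """Average-pool a square 2D map into a bins x bins grid of sub-regions."""
    size = len(feature_map)
    if size % bins != 0:
        raise ValueError("this sketch assumes the map size is divisible by bins")
    step = size // bins
    pooled = []
    for by in range(bins):
        row = []
        for bx in range(bins):
            # Collect every value that falls into sub-region (by, bx).
            cells = [
                feature_map[y][x]
                for y in range(by * step, (by + 1) * step)
                for x in range(bx * step, (bx + 1) * step)
            ]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


# Toy single-channel feature map with values 0..35.
feature_map = [[float(y * 6 + x) for x in range(6)] for y in range(6)]

# One pooled grid per pyramid level.
pyramid = {bins: avg_pool_to_bins(feature_map, bins) for bins in (1, 2, 3, 6)}
for bins, pooled in pyramid.items():
    print(bins, len(pooled), len(pooled[0]))
```

The coarsest level summarises the whole map in one value, while finer levels keep progressively more spatial detail.
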
DeepLab combines atrous convolution, spatial pyramid pooling and fully connected CRFs.
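
Atrous (dilated) convolution spaces the kernel taps `rate` samples apart, which widens the receptive field without adding weights. A minimal 1D sketch follows; the function name is hypothetical, and, as in deep learning frameworks, the operation is implemented as cross-correlation.

```python
def atrous_conv1d(signal, kernel, rate):
    """1D cross-correlation whose kernel taps are `rate` samples apart (no padding)."""
    span = (len(kernel) - 1) * rate  # distance between the first and the last tap
    return [
        sum(w * signal[i + k * rate] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1]

dense = atrous_conv1d(signal, kernel, rate=1)    # each output covers 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # each output covers 5 samples
print(dense)
print(dilated)
```

Both calls use the same three weights; only the spacing between the taps changes.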

U-Net was chosen for this task as one of the most popular architectures with good benchmark results.

The loss for the neural network is a sum of Dice loss and binary cross-entropy, chosen for its better differentiability properties and for dealing with imbalanced classes.
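
A framework-free sketch of such a combined loss is given below. The actual implementation lives in `lib` and is not shown here, so the function names, the smoothing constant and the equal weighting are assumptions. The values logged further down in this notebook (bce 0.004201, dice 0.056958, loss 0.030580) are consistent with an equal-weight average of the two terms.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over pixels; probs are predicted foreground probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """1 - soft Dice coefficient; `smooth` avoids division by zero on empty masks."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    return 1.0 - dice


def combined_loss(probs, targets, bce_weight=0.5):
    """Weighted sum of BCE and Dice loss (equal weights assumed by default)."""
    return bce_weight * bce_loss(probs, targets) + (1.0 - bce_weight) * dice_loss(probs, targets)


# Imbalanced toy mask: 99 background pixels and 1 foreground pixel.
targets = [0] * 99 + [1]
all_background = [0.01] * 100        # misses the object entirely
good = [0.01] * 99 + [0.99]          # finds the object

print(bce_loss(all_background, targets), dice_loss(all_background, targets))
print(combined_loss(all_background, targets), combined_loss(good, targets))
```

On the imbalanced mask, predicting only background keeps BCE small while the Dice term stays large, which is why the combination helps with rare foreground classes.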

Adam was chosen as the optimizer for its convergence speed.
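
One property that often helps Adam converge quickly is that its step size is normalised per parameter. The scalar sketch below applies a single Adam update using the default hyperparameters of `torch.optim.Adam` and the learning rate from this notebook; the function `adam_step` itself is a hypothetical illustration, not the library code.

```python
import math


def adam_step(param, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return the new value."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad         # first-moment estimate
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = state["m"] / (1 - beta1 ** state["t"])                # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Two parameters with gradients that differ by six orders of magnitude.
small = adam_step(1.0, 0.001, {"t": 0, "m": 0.0, "v": 0.0})
large = adam_step(1.0, 1000.0, {"t": 0, "m": 0.0, "v": 0.0})
print(1.0 - small, 1.0 - large)  # both steps are close to the learning rate
```

On the first update both parameters move by roughly `lr`, regardless of the gradient scale.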

[1] https://arxiv.org/pdf/1511.00561.pdf

In [None]:
import os
os.chdir("path")

from torch.utils.data import DataLoader
import torch
import torch.optim as optim
import albumentations as A
import pandas as pd  # used below to build the prediction CSV
from torchvision import transforms

from lib import *

%matplotlib inline

In [None]:
batch_size = 1
data_train_path = "data/train"
data_val_path = "data/val"
jpg_format = "jpg"
png_format = "png"
class_number = 1
learning_rate = 1e-4
num_epochs = 20


In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)


In [None]:
trans = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

augmentation_pipeline = A.Compose(
    [
        A.GaussNoise(var_limit=(10, 100), p=1),
        A.Blur(10, p=1),
        A.RGBShift(p=1),
        A.RandomRotate90(always_apply=True),
        A.OneOf(
            [
                # with probability 0.5, apply exactly one of the transforms below
                A.RandomGamma(),  # random gamma correction
                A.RandomBrightness(0.1),  # random brightness change
            ],
            p=0.5
        )
    ],
    p=1
)


In [None]:
basic_train_dataset = BasicDataset(data_train_path, jpg_format, trans)
basic_train_aug_dataset = BasicDataset(data_train_path, jpg_format, trans, augmentation_pipeline)
basic_val_dataset = BasicDataset(data_val_path, png_format, trans)


In [None]:
dataloaders = {
    'train': DataLoader(basic_train_dataset, batch_size=batch_size, shuffle=True, num_workers=0),
    'val': DataLoader(basic_val_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
}

model = UNet(class_number)

optimizer_ft = optim.Adam(model.parameters(), lr=learning_rate)
model, metrics = train_model(model, optimizer_ft, dataloaders, device, batch_size, num_epochs=num_epochs)


The first version of the neural network was trained without data augmentation, but it had some problems on the validation dataset. Here is an example:
![%D0%A1%D0%BD%D0%B8%D0%BC%D0%BE%D0%BA%20%D1%8D%D0%BA%D1%80%D0%B0%D0%BD%D0%B0%202020-02-18%20%D0%B2%2018.52.56.png](attachment:%D0%A1%D0%BD%D0%B8%D0%BC%D0%BE%D0%BA%20%D1%8D%D0%BA%D1%80%D0%B0%D0%BD%D0%B0%202020-02-18%20%D0%B2%2018.52.56.png)
You can see that the subject blends into surroundings of a similar color. Data augmentation was therefore designed to address this problem: blur was applied, and rotation was added for better learning.

In [None]:
dataloaders = {
    'train': DataLoader(basic_train_dataset + basic_train_aug_dataset, batch_size=batch_size, shuffle=True,
                        num_workers=0),
    'val': DataLoader(basic_val_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
}

model = UNet(class_number)

optimizer_ft = optim.Adam(model.parameters(), lr=learning_rate)
model, metrics = train_model(model, optimizer_ft, dataloaders, device, batch_size, num_epochs=num_epochs)


The best result for UNet with augmentation:

Epoch 10/19
_______
LR 1e-05
train: bce: 0.004201, dice: 0.056958, loss: 0.030580

val: bce: 0.010855, dice: 0.134940, loss: 0.072898

saving best model

12m 32s

In [None]:
model = UNet(class_number)
# Best model, trained with data augmentation on a GPU in Colab;
# map_location lets the checkpoint load on CPU-only machines as well.
model.load_state_dict(torch.load("pretrained_model/model.pth", map_location=device))
model.eval()

model = model.to(device)


In [None]:
test_data_loader = DataLoader(TestDataset("data/real_test", "JPG", 4, trans), batch_size=batch_size, shuffle=False,
                              num_workers=0)

pred_masks_dict = get_pred_masks(model, test_data_loader, device)
paths = ["data/real_test/000%s.JPG" % x.numpy()[0] for x in list(pred_masks_dict.keys())]
pred_masks = list(pred_masks_dict.values())
_ = get_html(paths, pred_masks, path_to_save="results/data")


In [None]:
test_data_loader = DataLoader(TestDataset("/content/drive/My Drive/data/val/images", "png", 8, trans),
                              batch_size=batch_size, shuffle=False, num_workers=0)
pred_masks_dict = get_pred_masks(model, test_data_loader, device)
pred_masks_dict = {k.numpy()[0]: encode_rle(v) for k, v in pred_masks_dict.items()}
df = pd.DataFrame.from_dict(pred_masks_dict, orient='index', columns=["rle_mask"])
df["img_id"] = df.index
df = df[['img_id', 'rle_mask']]
df.to_csv("pred_val_template.csv")
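
The `encode_rle` helper used above comes from `lib` and is not shown in this notebook. As a rough illustration of run-length encoding for binary masks, the sketch below emits "start length" pairs; the 1-indexed starts and row-major pixel order are assumptions and may differ from what `encode_rle` actually does.

```python
def rle_encode_sketch(mask):
    """Encode a binary 2D mask as space-separated 'start length' pairs.

    Assumes 1-indexed start positions and row-major flattening.
    """
    flat = [pixel for row in mask for pixel in row]
    runs = []
    run_start = None
    for index, pixel in enumerate(flat):
        if pixel and run_start is None:
            run_start = index  # a new run of foreground pixels begins
        elif not pixel and run_start is not None:
            runs.extend([run_start + 1, index - run_start])
            run_start = None
    if run_start is not None:  # the last run reaches the end of the mask
        runs.extend([run_start + 1, len(flat) - run_start])
    return " ".join(str(value) for value in runs)


mask = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
print(rle_encode_sketch(mask))  # prints "2 2 7 3"
```

Storing only the runs keeps the CSV small, because masks usually consist of a few large connected regions.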


After reviewing the results on real_test, we can see some problems with the current model. One problem is with colors. It could also be addressed with data augmentation, but training the neural network takes a lot of time and resources.
We can also try a different learning rate and optimizer.