# Semantic Segmentation

*General structure following [fast.ai notebook on camvid](https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-camvid.ipynb)*

In [None]:
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

In [None]:
from pathlib import Path
import PIL

from fastai.vision import *
from fastai.callbacks.hooks import *
from fastai.utils.mem import *

# Load Data

### We use the [Berkeley DeepDrive dataset](https://bdd-data.berkeley.edu/), which contains richly labeled data for image segmentation in diverse conditions (weather, city, reference car…).

In [None]:
path_data = Path('../data/bdd100k/seg')
path_lbl = path_data/'labels'
path_img = path_data/'images'

### Images and labels filenames

In [None]:
fnames = get_image_files(path_img, recurse = True)
fnames[:3]

In [None]:
lbl_names = get_image_files(path_lbl, recurse = True)
lbl_names[:3]

### Take a look at the image data we have

In [None]:
img_f = fnames[10]
img = open_image(img_f)
img.show(figsize=(10,10))

### Now we need to create a function that maps from the path of an image to the path of its segmentation.

In [None]:
get_y_fn = lambda x: path_lbl/x.parts[-2]/f'{x.stem}_train_id.png'

img_f, get_y_fn(img_f)
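
The mapping can be checked without the dataset on disk. The following is a minimal sketch using only `pathlib`; the file name `0001.jpg` and the helper name `label_path_for` are hypothetical, chosen for illustration only.

```python
from pathlib import Path

# Hypothetical example paths that mirror the folder layout used above.
path_lbl = Path('../data/bdd100k/seg/labels')
img_path = Path('../data/bdd100k/seg/images/train/0001.jpg')

def label_path_for(x: Path) -> Path:
    # x.parts[-2] is the split folder ('train' or 'val');
    # x.stem is the file name without its extension.
    return path_lbl / x.parts[-2] / f'{x.stem}_train_id.png'

label = label_path_for(img_path)
print(label)
```

The label keeps the same split folder as the image and only swaps the root folder and the file suffix.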

### We can now use the obtained label path to open a segmentation image.

In [None]:
mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(10,10), alpha=1)

In [None]:
src_size = np.array(mask.shape[1:])
src_size, mask.data

# Datasets

### Now that we know what our data looks like, we can create our dataset using the SegmentationItemList class provided by fastai.

In [None]:
size = (180, 320)
bs = 16

In [None]:
# Classes extracted from dataset source code
# -> https://github.com/ucbdrive/bdd-data/blob/master/bdd_data/label.py

segmentation_classes = [
    'road', 'sidewalk', 'building', 'wall', 'fence', 'pole', 'traffic light',
    'traffic sign', 'vegetation', 'terrain', 'sky', 'person', 'rider', 'car',
    'truck', 'bus', 'train', 'motorcycle', 'bicycle', 'void'
]
void_code = 19  # index of 'void'; used in the accuracy metric to ignore unlabeled pixels
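
As a sanity check, the void index can be derived from the class list instead of being hard-coded. This is a small sketch; `class_to_id` is a hypothetical helper name, not part of fastai.

```python
segmentation_classes = [
    'road', 'sidewalk', 'building', 'wall', 'fence', 'pole', 'traffic light',
    'traffic sign', 'vegetation', 'terrain', 'sky', 'person', 'rider', 'car',
    'truck', 'bus', 'train', 'motorcycle', 'bicycle', 'void'
]

# Map each class name to its position in the list (hypothetical helper).
class_to_id = {name: i for i, name in enumerate(segmentation_classes)}

# The void code is simply the index of the 'void' entry.
void_code = class_to_id['void']
print(len(segmentation_classes), void_code)
```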

In [None]:
src = (SegmentationItemList.from_folder(path_img) # Load in x data from folder
       .split_by_folder(train='train', valid='val') # Split data into training and validation set 
       .label_from_func(get_y_fn, classes = segmentation_classes)) # Label data using the get_y_fn function

In [None]:
src

In [None]:
src.train.y.loss_func

In [None]:
data = (src.transform(get_transforms(), size=size, tfm_y=True) # Default augmentations (flip, rotate, zoom, lighting, warp), applied to images and masks
        .databunch(bs=bs) # Create a databunch
        .normalize(imagenet_stats)) # Normalize for resnet
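
The normalization step subtracts a per-channel mean and divides by a per-channel standard deviation. The sketch below reproduces that arithmetic in NumPy with the usual ImageNet statistics; the helper names `normalize` and `denormalize` and the random test image are assumptions for illustration, not the fastai implementation.

```python
import numpy as np

# Per-channel ImageNet statistics (RGB order).
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """img: float array of shape (3, H, W) with values in [0, 1]."""
    return (img - mean[:, None, None]) / std[:, None, None]

def denormalize(img):
    """Inverse of normalize, useful before displaying an image."""
    return img * std[:, None, None] + mean[:, None, None]

rng = np.random.default_rng(0)
img = rng.random((3, 4, 6))   # hypothetical tiny RGB image
normed = normalize(img)
restored = denormalize(normed)
```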

### We can show a few examples using the show_batch method, which is available for all kinds of databunches.

In [None]:
# data.show_batch(2, figsize=(10,7))
data.show_batch(2, figsize=(10,7), ds_type=DatasetType.Valid)

It is also possible to create annotated segmentation data from scratch yourself, using tools such as:
https://github.com/abreheret/PixelAnnotationTool

# Model creation and training

A function that measures the pixel accuracy of the model, ignoring pixels labeled as void

In [None]:
def acc(input, target):
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask]==target[mask]).float().mean()
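
To see what the metric computes, here is the same logic in plain NumPy on a toy example. It is a sketch only: `masked_accuracy` is a hypothetical name, and the 2x2 "image" is made up so the result can be checked by hand.

```python
import numpy as np

VOID_CODE = 19

def masked_accuracy(logits, target, void_code=VOID_CODE):
    """logits: (N, C, H, W) class scores; target: (N, 1, H, W) integer class ids."""
    target = target.squeeze(1)          # drop the channel axis -> (N, H, W)
    keep = target != void_code          # True for labeled pixels
    pred = logits.argmax(axis=1)        # predicted class per pixel
    return float((pred[keep] == target[keep]).mean())

# One image, 20 classes, 2x2 pixels.
target = np.array([[[[0, 13], [19, 2]]]])
logits = np.zeros((1, 20, 2, 2))
logits[0, 0, 0, 0] = 1.0    # predicts road, target road: correct
logits[0, 13, 0, 1] = 1.0   # predicts car, target car: correct
logits[0, 5, 1, 0] = 1.0    # target is void: pixel is ignored
logits[0, 8, 1, 1] = 1.0    # predicts vegetation, target building: wrong

score = masked_accuracy(logits, target)
```

Two of the three labeled pixels are correct, so the score is 2/3; the void pixel does not count either way.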

In [None]:
metrics=acc

In [None]:
wd=1e-5 # weight decay

To create a U-Net in fastai, the unet_learner function can be used. We pass it not only our data but also an encoder network (ResNet-34 in our case), our accuracy function, and a weight decay.

In [None]:
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd)

With our model ready to go, we can now search for a suitable learning rate and then start training our model.

In [None]:
lr_find(learn)
learn.recorder.plot()

To read about picking a learning rate, go to:
https://towardsdatascience.com/fastai-image-classification-32d626da20
Here we are looking for the point where the loss falls most steeply while the learning rate is still as high as possible, well before the loss starts to rise again.
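
The "steepest slope" rule can be made concrete on a synthetic curve. The sketch below does not use fastai: the loss values are invented to resemble a typical learning-rate finder plot (flat, then a steep drop, then divergence), and the suggested rate is taken at the most negative slope of loss against log10(lr).

```python
import numpy as np

# Synthetic stand-in for the learning-rate finder curve (assumed shape):
# a smooth drop centered at lr = 1e-3 plus a blow-up at high learning rates.
log_lrs = np.linspace(-6, 0, 61)                      # log10 of the learning rate
lrs = 10.0 ** log_lrs
drop = 1.0 / (1.0 + np.exp(-3.0 * (log_lrs + 3.0)))   # rises from 0 to 1 around -3
blow_up = np.exp(4.0 * (log_lrs + 0.5))               # negligible until lr nears 1
losses = 2.0 - drop + blow_up

# Slope of the loss with respect to log10(lr); the most negative value
# marks the steepest descent.
slopes = np.gradient(losses, log_lrs)
idx = int(np.argmin(slopes))
suggested_lr = lrs[idx]
print(f'steepest descent at lr = {suggested_lr:.1e}')
```

On a real, noisy curve the losses would usually be smoothed first, and the chosen value is still a judgment call rather than an exact optimum.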

In [None]:
lr=3e-3 # pick a learning rate

In [None]:
learn.fit_one_cycle(40, slice(lr), pct_start=0.9) # train model

By default the pretrained encoder is frozen and only the decoder is trained, so the encoder has not been updated yet. We will now show some results and then train the whole model.

In [None]:
learn.save('berkeley-stage-1') # save model
learn.show_results(rows=3, figsize=(20,10))

### Perform fine-tuning of all layers

In [None]:
learn.unfreeze() # unfreeze all layers

In [None]:
# find and plot lr again (lr_find must be re-run, otherwise the plot shows the last training run)
learn.lr_find()
learn.recorder.plot()

In [None]:
lrs = slice(lr/400, lr/4)
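
Passing a `slice(start, stop)` gives the earliest layers the smallest learning rate and the last layers the largest. To my understanding, fastai v1 spaces the rates geometrically across the learner's layer groups; the sketch below illustrates that spacing in NumPy, assuming three layer groups. Both `spread_lrs` and the group count are assumptions for illustration.

```python
import numpy as np

def spread_lrs(start, stop, n_groups):
    """Geometrically spaced learning rates, one per layer group (hypothetical helper)."""
    return np.geomspace(start, stop, n_groups)

lr = 3e-3
# Assumed: three layer groups (early encoder, late encoder, decoder/head).
group_lrs = spread_lrs(lr / 400, lr / 4, 3)
for i, g in enumerate(group_lrs):
    print(f'layer group {i}: lr = {g:.1e}')
```

With these values each group trains ten times faster than the one before it, so the pretrained early layers change the least.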

In [None]:
# train the model
learn.fit_one_cycle(20, lrs, pct_start=0.5)

In [None]:
learn.show_results(rows=3, figsize=(20,10))

In [None]:
learn.save('berkeley-stage-2')

In [None]:
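# Note: this restores the stage-1 weights and discards the fine-tuning above;
# load 'berkeley-stage-2' instead to keep the fine-tuned model.
learn.load('berkeley-stage-1')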

In [None]:
print(learn.summary())

# Making predictions

In [None]:
test_img = open_image('/home/ruslan/Desktop/example_images/example_04.png')

In [None]:
import time
start = time.time()
output = learn.predict(test_img)  # returns (prediction, class indices, raw outputs)
print('Inference took %.4f [sec]' % (time.time() - start))

In [None]:
test_img.shape

In [None]:
test_resized = test_img.apply_tfms(None, size=output[0].shape)
test_resized.shape

In [None]:
test_resized.show(figsize=(10,10), y=learn.predict(test_resized)[0])