# Semantic Segmentation with OpenVINO™ using Segmenter

Semantic segmentation is a difficult computer vision problem with applications in areas such as autonomous driving, robotics, and augmented reality.
Its goal is to assign a label to each pixel according to the object it belongs to, producing a so-called segmentation mask.
To assign these labels correctly, the model needs to consider both the local and the global context of the image.
This is where transformers have an advantage, as they are good at capturing global context.

Segmenter uses a Vision Transformer as the encoder and a Mask Transformer as the decoder.
With this configuration, it achieves good results on datasets such as ADE20K, Pascal Context, and Cityscapes.
As shown in the diagram below, it takes an image, splits it into patches, and encodes these patches.
The Mask Transformer then combines the encoded patches with class masks and decodes them into a segmentation map, in which each pixel has a label assigned to it.


<div style="text-align:center">
    <img src="https://user-images.githubusercontent.com/24582831/148507554-87eb80bd-02c7-4c31-b102-c6141e231ec8.png" width="50%"/>
</div>

More details about the model can be found in the paper [Segmenter: Transformer for Semantic Segmentation](https://arxiv.org/abs/2105.05633) or in the [repository](https://github.com/rstrudel/segmenter).
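
The patch-splitting step described above can be illustrated with a small sketch. It uses plain Python on a toy 4×4 single-channel image with a patch size of 2, chosen only for readability. The function name `split_into_patches` is hypothetical and is not part of the Segmenter repository; the model used in this notebook works on 16×16 patches.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into non-overlapping square patches.

    Hypothetical helper for illustration only. Patches are returned in
    row-major order: left to right, then top to bottom.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Take patch_size rows, and patch_size columns from each row
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# Toy 4x4 image whose pixel values are 0..15 in row-major order
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, patch_size=2)

print(len(patches))  # 4 patches, arranged as a 2x2 grid
print(patches[0])    # [[0, 1], [4, 5]]
```

In a transformer encoder each such patch is then flattened and projected to an embedding vector, so the image becomes a sequence of patch tokens.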

To demonstrate how to convert and use Segmenter in OpenVINO, this notebook consists of the following steps:

* Preparing PyTorch Segmenter model
* Preparing preprocessing and visualization functions
* Validating inference of original model
* Converting PyTorch model to ONNX
* Converting ONNX to OpenVINO IR
* Validating inference of converted model
* Validating converted model on a subset of ADE20K

# Get and prepare PyTorch model

First, we need to clone the [repository](https://github.com/rstrudel/segmenter) containing the model and helper functions. We will use the Tiny model with a mask transformer decoder, Seg-T-Mask/16. Better but much larger models are also available in the linked repository. This model is pretrained on the [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/) dataset, which is used for segmentation.

PyTorch models are usually instances of the [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class, initialized from a state dictionary containing the model weights.
The typical steps to get a model are therefore:

1. Create an instance of the model class.
2. Load the checkpoint state dict, which contains the pretrained model weights.
3. Switch the model to evaluation mode, so that operations such as dropout behave as they should at inference time.
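
The three steps above can be sketched on a toy network. This is only an illustration under stated assumptions: `TinyNet` is a hypothetical stand-in for a real model class, and an in-memory buffer stands in for a checkpoint file on disk. It is not how the Segmenter repository loads its model.

```python
import io

import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical toy model, used only to illustrate the loading steps."""

    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(self.drop(x))


# Stand-in for a checkpoint on disk: save a state dict to an in-memory buffer
buffer = io.BytesIO()
torch.save(TinyNet().state_dict(), buffer)
buffer.seek(0)

# 1. Create an instance of the model class
model = TinyNet()

# 2. Load the state dict with the (here: randomly initialized) weights
state_dict = torch.load(buffer, map_location="cpu")
model.load_state_dict(state_dict)

# 3. Switch to evaluation mode, which turns dropout into a no-op
model.eval()

with torch.no_grad():
    x = torch.ones(1, 4)
    out_a = model(x)
    out_b = model(x)

# In eval mode two passes over the same input give identical outputs
print(torch.equal(out_a, out_b))
```

Without the call to `eval()`, dropout would stay active and repeated passes over the same input could give different outputs.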

In our case, the code from the repository already contains functions that create the model and load the weights, but we still need to download the config and the trained weights (checkpoint) file.

## Prerequisites

In [1]:
import sys
from pathlib import Path

sys.path.append("../utils")
from notebook_utils import download_file

In [2]:
# timm, mmsegmentation, and einops are needed to use the Segmenter repo
!pip install timm mmsegmentation einops -q

First, we will clone the Segmenter repo and then download the weights and config for our model.

In [3]:
# clone Segmenter repo
if not Path("segmenter").exists():
    !git clone https://github.com/rstrudel/segmenter
else:
    print("Segmenter repo already cloned")
%cd segmenter

Segmenter repo already cloned
C:\Users\blazr\Documents\code\gsoc\openvino_notebooks\notebooks\236-segmenter-semantic-segmentation\segmenter


In [4]:
# download the config and the pretrained model weights
# we use the tiny model here; better but larger models are also available in the repository
WEIGHTS_LINK = "https://www.rocq.inria.fr/cluster-willow/rstrudel/segmenter/checkpoints/ade20k/seg_tiny_mask/checkpoint.pth"
CONFIG_LINK = "https://www.rocq.inria.fr/cluster-willow/rstrudel/segmenter/checkpoints/ade20k/seg_tiny_mask/variant.yml"

MODEL_DIR = Path("model/")
DATA_DIR = Path("data/")
MODEL_DIR.mkdir(exist_ok=True)
DATA_DIR.mkdir(exist_ok=True)

download_file(WEIGHTS_LINK, directory=MODEL_DIR, show_progress=True)
download_file(CONFIG_LINK, directory=MODEL_DIR, show_progress=True)

WEIGHT_PATH = MODEL_DIR / "checkpoint.pth"
# the downloaded config file is named "variant.yml" (see CONFIG_LINK)
CONFIG_PATH = MODEL_DIR / "variant.yml"

'model\checkpoint.pth' already exists.
'model\variant.yml' already exists.
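
The skip-if-present behaviour visible in the output above can be sketched with the standard library alone. This is a minimal illustration, not the implementation of `download_file` from `notebook_utils`: the name `fetch_if_missing` and its parameters are hypothetical, and the demonstration uses a local `file://` URL in a temporary directory instead of the real checkpoint links.

```python
import tempfile
import urllib.request
from pathlib import Path


def fetch_if_missing(url, directory, filename=None):
    """Hypothetical helper: download url into directory unless the file exists."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Default to the last component of the URL as the file name
    target = directory / (filename or url.rsplit("/", 1)[-1])
    if target.exists():
        print(f"'{target}' already exists.")
        return target
    urllib.request.urlretrieve(url, target)
    return target


# Demonstration with a local file:// URL, so no network access is needed
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "variant.yml"
    source.write_text("demo: true\n")

    first = fetch_if_missing(source.as_uri(), Path(tmp) / "model")
    second = fetch_if_missing(source.as_uri(), Path(tmp) / "model")  # skipped

    downloaded_text = first.read_text()
    same_target = first == second
```

Skipping files that are already present keeps repeated notebook runs from downloading the checkpoint again.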


## Loading PyTorch model

We will now use the helper functions provided in the repository to initialize the model.

In [5]:
from segm.model.factory import load_model

pytorch_model, config = load_model(WEIGHT_PATH)
# switch the model to evaluation mode for inference
pytorch_model.eval()

Segmenter(
  (encoder): VisionTransformer(
    (patch_embed): PatchEmbedding(
      (proj): Conv2d(3, 192, kernel_size=(16, 16), stride=(16, 16))
    )
    (dropout): Dropout(p=0.0, inplace=False)
    (blocks): ModuleList(
      (0): Block(
        (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=192, out_features=576, bias=True)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=192, out_features=192, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
        )
        (mlp): FeedForward(
          (fc1): Linear(in_features=192, out_features=768, bias=True)
          (act): GELU(approximate='none')
          (fc2): Linear(in_features=768, out_features=192, bias=True)
          (drop): Dropout(p=0.0, inplace=False)
        )
        (drop_path): Identity()
      )
      (1): Block(
       