Tutorial: Semantic Segmentation

This guide compares a fine-tuned SegFormer with a fine-tuned variant of the model from the paper 'The devil is in the labels' by Yin et al. on a semantic segmentation task. The dataset used is the CMP Facade Database.

The main framework we use is Hugging Face 🤗 Transformers, an open-source library that offers easy-to-use implementations of state-of-the-art models. Baseline models, including the SegFormer baseline used here, are publicly available on the Hugging Face Hub, a large open-source catalog of models and datasets.

Semantic segmentation is the task of assigning each pixel in an image to a category. It has a wide range of use cases in fields such as medical imaging, autonomous driving, and robotics. For the facade dataset, we want to label each pixel of a front view of a building with one of 12 classes: facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, and background.
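To make the per-pixel classification concrete, here is a minimal sketch in plain Python. The class names come from the list above; the integer ids assigned to them, the helper names, and the toy score grid are assumptions for illustration and may not match the dataset's actual encoding.

```python
# Class names as listed above; the id order is an assumption for illustration.
CLASS_NAMES = [
    "facade", "molding", "cornice", "pillar", "window", "door",
    "sill", "blind", "balcony", "shop", "deco", "background",
]
id2label = dict(enumerate(CLASS_NAMES))
label2id = {name: idx for idx, name in id2label.items()}


def predict_label_map(scores):
    """Turn per-pixel class scores into a per-pixel class-id map.

    `scores[y][x]` is a list with one score per class; the predicted
    class of a pixel is the index of its highest score (argmax).
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def one_hot(class_name):
    """Hypothetical helper: a score vector that is 1.0 for `class_name`."""
    return [1.0 if name == class_name else 0.0 for name in CLASS_NAMES]


# A toy 2x2 "image" of class scores.
toy_scores = [
    [one_hot("facade"), one_hot("window")],
    [one_hot("door"), one_hot("background")],
]
label_map = predict_label_map(toy_scores)
named = [[id2label[i] for i in row] for row in label_map]
print(named)  # [['facade', 'window'], ['door', 'background']]
```

A real model produces one score per class for every pixel of the input image; the argmax step stays the same.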

In 2014, Long et al. published a seminal paper that used convolutional neural networks for semantic segmentation. More recently, Transformers have been applied successfully in computer vision, first to image classification (e.g. ViT) and then to semantic segmentation, where they demonstrate excellent performance on several widely used datasets. We specifically look at SegFormer, a state-of-the-art model architecture for semantic segmentation introduced in 2021. It has a hierarchical Transformer encoder that does not rely on positional encodings and a simple multi-layer perceptron decoder. Here, we use SegFormer to segment street-view images of buildings.
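A minimal sketch of how a SegFormer checkpoint could be prepared for the 12 facade classes with 🤗 Transformers. The checkpoint name `nvidia/mit-b0`, the id order of the classes, and the helper names `label_maps` and `load_segformer` are assumptions; the notebook may use a different baseline and label encoding.

```python
# Class names from the facade dataset; the id order is an assumption.
CLASS_NAMES = [
    "facade", "molding", "cornice", "pillar", "window", "door",
    "sill", "blind", "balcony", "shop", "deco", "background",
]


def label_maps(class_names):
    """Build the id<->name mappings that the model config stores."""
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_segformer(checkpoint="nvidia/mit-b0"):
    """Load a pretrained SegFormer encoder with a new 12-class head.

    Requires `transformers` and `torch`, and downloads the checkpoint
    from the Hub. The import is lazy so the label mappings can be
    built without those packages installed.
    """
    from transformers import SegformerForSemanticSegmentation

    id2label, label2id = label_maps(CLASS_NAMES)
    return SegformerForSemanticSegmentation.from_pretrained(
        checkpoint,
        id2label=id2label,
        label2id=label2id,
    )


id2label, label2id = label_maps(CLASS_NAMES)
print(id2label[4], label2id["background"])  # window 11
```

Calling `load_segformer()` returns a model whose decoder head is freshly initialized for the 12 classes, so it still has to be fine-tuned before its predictions are meaningful.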

More recently, Yin et al. introduced a novel approach to semantic segmentation that achieves state-of-the-art performance in a zero-shot setting. This means the model achieves results comparable to supervised methods on various semantic segmentation datasets without being trained on them. The approach replaces class labels with vector-valued embeddings of short paragraphs that describe each class. This allows multiple datasets with different class labels and semantics to be merged, resulting in a combined dataset of over 2 million images. The resulting model matches state-of-the-art supervised methods on 7 benchmark datasets and outperforms them when fine-tuned on standard semantic segmentation datasets.
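The idea of classifying pixels against description embeddings can be sketched as follows. This is a toy illustration in plain Python, not the authors' implementation: it assumes the model outputs one embedding per pixel and picks the class whose description embedding is closest by cosine similarity. The 3-dimensional vectors stand in for real sentence embeddings, and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_pixel(pixel_embedding, class_embeddings):
    """Return the class whose description embedding is most similar."""
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(pixel_embedding, class_embeddings[name]),
    )


# Toy stand-ins for embeddings of class descriptions (assumed values).
class_embeddings = {
    "window": [1.0, 0.0, 0.0],
    "door": [0.0, 1.0, 0.0],
    "balcony": [0.0, 0.0, 1.0],
}

# A per-pixel embedding as a model might predict it (assumed values).
first = classify_pixel([0.9, 0.2, 0.1], class_embeddings)
print(first)  # window

# Classes are just vectors, so a label from another dataset can be added
# without changing the size of the model's output.
class_embeddings["gate"] = [0.1, 0.9, 0.3]
second = classify_pixel([0.1, 0.8, 0.4], class_embeddings)
print(second)  # gate
```

Because the set of classes lives outside the network in this sketch, datasets with different label sets can share one model, which is what makes merging them possible.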

Installing requirements

Create a conda environment:

conda env create -f environment.yml

Activate the environment:

conda activate cmp

Fine-Tune a Model

Run the train.sh script:

./train.sh

References

This repository builds on the source code released by Yin et al. with their publication (Wei Yin, Yifan Liu, Chunhua Shen, Anton van den Hengel, Baichuan Sun, "The devil is in the labels: Semantic segmentation from sentences") and on the Hugging Face SegFormer fine-tuning tutorial.

To obtain the weights go to:

Embedding: https://cloudstor.aarnet.edu.au/plus/s/gXaGsZyvoUwu97t

CKPT: https://cloudstor.aarnet.edu.au/plus/s/AtYYaVSVVAlEwve

To see the full tutorial, open Fine_Tune_SS_Model.ipynb.

Have fun! :)
