
Loci-Segmented: Improving Scene Segmentation Learning

TL;DR: Introducing Loci-Segmented, an extension of Loci with a dynamic background module. It demonstrates a relative IoU improvement of more than 32% over SOTA on the MOVi dataset.

Demo video: loci-seg-03.mp4

Requirements

A suitable conda environment named loci-s can be created and activated with:

conda env create -f environment.yml
conda activate loci-s
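
To verify that the environment works (assuming it ships with PyTorch, which Loci-Segmented builds on), you can run:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"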

Dataset and trained models

Preprocessed datasets together with model checkpoints can be found here

Reproducing the results from the paper

Make sure you have downloaded all necessary datasets and model checkpoints. To reproduce the MOVi results, run:

run-movi-evalulation.sh
python eval-movi.py

To reproduce the evaluation on the datasets presented in the review paper "Compositional scene representation learning via reconstruction: A survey", run:

run-review.sh
process-review.sh
python eval-review.py

Use your own data

We provide an example dataset-creation script that you can adapt to your needs.
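
If you prefer to write a compatible file directly, the following minimal h5py sketch illustrates the general idea. The dataset names, shapes, and dtypes below ("rgb", "depth") are illustrative assumptions, not the repository's actual schema; consult the provided example script and data/plot_hdf5.py for the expected layout.

# make_dataset.py -- minimal h5py writing sketch (illustrative schema only)
import h5py
import numpy as np

T, H, W = 24, 128, 128  # frames, height, width -- placeholder sizes

with h5py.File("my-dataset.hdf5", "w") as f:
    # one uint8 RGB video clip, stored frame-major (assumed layout)
    f.create_dataset("rgb", data=np.zeros((T, H, W, 3), dtype=np.uint8),
                     compression="gzip")
    # optional per-frame depth maps (assumed name and dtype)
    f.create_dataset("depth", data=np.zeros((T, H, W), dtype=np.float32),
                     compression="gzip")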

You can also inspect any compatible dataset using our Dataset Viewer:

data/plot_hdf5.py <dataset>.hdf5
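
For a quick structural check without plotting, a few lines of h5py are enough to list every group and dataset in a file (a generic sketch, independent of the viewer above):

# list_hdf5.py -- print each HDF5 entry and, for datasets, its shape
import sys
import h5py

with h5py.File(sys.argv[1], "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))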

Training Guide

Our training pipeline employs multi-GPU configurations and extensive pretraining to accelerate model convergence. Specifically, we use a single node with 8× GTX 1080 GPUs for the pretraining phase, and a single node with 8× A100 GPUs for the final Loci-s training. Below are the details for each stage of the training pipeline.

Note: The following examples use a single GPU setup, which is suboptimal for performance. Multi-GPU configurations are highly recommended.

Pretraining Phases

  1. Decoder Pretraining

    Pretrain individual decoders for mask, depth, and RGB using the following commands:

    python -m model.main -cfg configs/pretrain-mask-decoder.json --pretrain-objects --single-gpu
    python -m model.main -cfg configs/pretrain-depth-decoder.json --pretrain-objects --single-gpu
    python -m model.main -cfg configs/pretrain-rgb-decoder.json --pretrain-objects --single-gpu
  2. Encoder-Decoder Pretraining

    Pretrain the Loci encoder with already pretrained mask, depth, and RGB decoders:

    python -m model.main -cfg configs/pretrain-encoder-decoder-stage1.json --pretrain-objects --single-gpu --load-mask <mask-decoder>.ckpt --load-depth <depth-decoder>.ckpt --load-rgb <rgb-decoder>.ckpt

    For a version that uses depth as an input feature, append -depth to the config name.

  3. Hyper-Network Pretraining

    Execute three passes through the encoder-decoder architecture to train the internal hyper-networks:

    python -m model.main -cfg configs/pretrain-encoder-decoder-stage2.json --pretrain-objects --single-gpu --load-stage1 <encoder-decoder>.ckpt
  4. Background Module Pretraining

    Train the background module:

    python -m model.main -cfg configs/pretrain-background.json --pretrain-bg --single-gpu

Final Training: Loci-s

Execute full-scale training for Loci-s:

python -m model.main -cfg configs/loci-s.json --train --single-gpu --load-objects <encoder-decoder>.ckpt --load-bg <background>.ckpt
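
For convenience, the stages above can be chained in a single driver script. The sketch below mirrors the commands from this guide on a single GPU; the checkpoint filenames are placeholders that you should replace with the files your runs actually produce:

# run_pipeline.py -- sketch of the full single-GPU training pipeline.
# Checkpoint paths below are placeholders, not the repository's output names.
import subprocess

def run(*args):
    """Run one training stage and abort the pipeline if it fails."""
    subprocess.run(["python", "-m", "model.main", *args], check=True)

# 1. Decoder pretraining (mask, depth, RGB)
for cfg in ("pretrain-mask-decoder", "pretrain-depth-decoder", "pretrain-rgb-decoder"):
    run("-cfg", f"configs/{cfg}.json", "--pretrain-objects", "--single-gpu")

# 2. Encoder-decoder pretraining, loading the pretrained decoders
run("-cfg", "configs/pretrain-encoder-decoder-stage1.json", "--pretrain-objects",
    "--single-gpu", "--load-mask", "mask.ckpt", "--load-depth", "depth.ckpt",
    "--load-rgb", "rgb.ckpt")

# 3. Hyper-network pretraining on top of stage 1
run("-cfg", "configs/pretrain-encoder-decoder-stage2.json", "--pretrain-objects",
    "--single-gpu", "--load-stage1", "encoder-decoder.ckpt")

# 4. Background module pretraining
run("-cfg", "configs/pretrain-background.json", "--pretrain-bg", "--single-gpu")

# Final Loci-s training with the pretrained object and background modules
run("-cfg", "configs/loci-s.json", "--train", "--single-gpu",
    "--load-objects", "encoder-decoder.ckpt", "--load-bg", "background.ckpt")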

Visualization Guide

Generate visualizations to inspect the model at various stages of pretraining and during the final phase.

Pretraining Visualizations

To visualize individual components like mask, depth, RGB, objects, or background during pretraining:

python -m model.main -cfg <config> --save-<mask|depth|rgb|objects|bg> --single-gpu --add-text --load <checkpoint>.ckpt

Final Model Visualizations

For visualizing the fully trained Loci-s model:

python -m model.main -cfg <config> --save --single-gpu --add-text --load <checkpoint>.ckpt

Note: To visualize using the segmentation pretraining network, append the --load-proposal flag followed by the corresponding checkpoint:

--load-proposal <proposal>.ckpt
