
Sequence-to-Sequence Video Object Segmentation

An end-to-end trainable model for video object segmentation using a convolutional LSTM and the VGG-16 architecture.
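
For orientation, below is a minimal sketch of this kind of architecture using the tf.keras API available in TensorFlow 1.13. The function name, sequence length, and 256x448 input resolution are illustrative assumptions; the repository builds its own graph rather than using this exact code:

import tensorflow as tf

def build_seq2seq_vos(seq_len=5, h=256, w=448):
    # Per-frame encoder: VGG-16 convolutional backbone, applied to
    # every frame of the input sequence via TimeDistributed.
    vgg = tf.keras.applications.VGG16(
        include_top=False, weights='imagenet', input_shape=(h, w, 3))
    frames = tf.keras.layers.Input(shape=(seq_len, h, w, 3))
    feats = tf.keras.layers.TimeDistributed(vgg)(frames)  # (seq, h/32, w/32, 512)

    # Temporal module: a ConvLSTM carries segmentation state across frames.
    x = tf.keras.layers.ConvLSTM2D(
        512, 3, padding='same', return_sequences=True)(feats)

    # Decoder: five upsample/conv stages recover the input resolution.
    for filters in (256, 128, 64, 32, 16):
        x = tf.keras.layers.TimeDistributed(
            tf.keras.layers.UpSampling2D(2))(x)
        x = tf.keras.layers.TimeDistributed(
            tf.keras.layers.Conv2D(filters, 3, padding='same',
                                   activation='relu'))(x)

    # Per-pixel sigmoid gives one foreground mask per frame.
    masks = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv2D(1, 1, activation='sigmoid'))(x)
    return tf.keras.Model(frames, masks)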

YouTube-VOS dataset:

YouTube-VOS is the first large-scale benchmark that supports multiple video object segmentation tasks, with 4000+ high-resolution YouTube videos. It can be downloaded from:

https://youtube-vos.org/

Our results have been submitted to the leaderboard of the 2018 challenge:

https://competitions.codalab.org/competitions/19544#results

Sample Results:

Input frame:

[input frame image]

Network Output:

[network output image]

Run Instructions:

System Requirements:

The scripts are written in Python 3.5.2 with TensorFlow 1.13.1 for GPU; CPU operation might be possible but has not been tested. Required third-party packages are:

tensorflow
pyyaml (imported as yaml)
opencv-python (imported as cv2)
numpy
Pillow (imported as PIL)

The remaining imports (json, time, math, shutil, random, os) are part of the Python standard library.

Configurations:

Please open the config.yaml file in the root directory to set up the configuration:

$nano ./config.yaml

The file contains the following configurable variables:

configs:
    path: ../dataset/ # Path to the YouTube-VOS dataset downloaded from https://youtube-vos.org/
    checkpoints_path: "./checkpoints/model-1" # Path to pre-saved checkpoints if you would like to fine-tune or evaluate.
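
For reference, these values can be read in Python with PyYAML; this is a minimal sketch, and the actual loading code in VOS.py may differ:

import yaml

# Load the nested 'configs' block from config.yaml.
with open('./config.yaml') as f:
    configs = yaml.safe_load(f)['configs']

dataset_path = configs['path']                  # e.g. ../dataset/
checkpoints_path = configs['checkpoints_path']  # e.g. ./checkpoints/model-1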

Training phase:

To initiate the training phase, please run the following commands:

$cd ./ECCV_Youtube_VOS
$python3 VOS.py --n_epochs=<number of epochs of training> --batch_Size=<size of the mini batch> --lr=<learning rate>

The program will first generate a pre-processed copy of the YouTube-VOS dataset for training. This copy can be deleted from your drive after training is done.
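
As an illustration, a pre-processing pass of this kind might look like the sketch below; the fixed 448x256 frame size, directory layout, and function name are assumptions, not the repository's exact logic:

import os
import cv2

def preprocess_split(src_dir, dst_dir, size=(448, 256)):
    # Resize every frame to a fixed network input size while
    # mirroring the per-video directory structure of YouTube-VOS.
    for video in os.listdir(src_dir):
        os.makedirs(os.path.join(dst_dir, video), exist_ok=True)
        for name in os.listdir(os.path.join(src_dir, video)):
            frame = cv2.imread(os.path.join(src_dir, video, name))
            frame = cv2.resize(frame, size)  # size is (width, height)
            cv2.imwrite(os.path.join(dst_dir, video, name), frame)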

Evaluation phase:

After training your model, you can create object masks for the images in the YouTube-VOS validation set:

$cd ./ECCV_Youtube_VOS
$python3 VOS_evaluate.py --batch_size=<size of the mini batch> --scenario-name=<the name of the directory to save created object masks>
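
A minimal sketch of how a predicted mask could be binarized and saved with PIL; the threshold, array shape, and function name are illustrative assumptions rather than the repository's exact logic:

import numpy as np
from PIL import Image

def save_mask(pred, out_path, threshold=0.5):
    # pred: 2-D float array of per-pixel foreground probabilities.
    mask = (pred > threshold).astype(np.uint8) * 255
    Image.fromarray(mask).save(out_path)  # grayscale PNG, 0 or 255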

Notes:

The model typically converges after 80 epochs with a learning rate of 1e-5; the loss history typically looks like the plot below:

[loss history plot]
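
The loss being plotted is a per-pixel segmentation loss. A sketch of a standard pixel-wise cross-entropy in TensorFlow 1.x (the repository's exact loss formulation is not shown in this README and may differ):

import tensorflow as tf

def segmentation_loss(logits, labels):
    # logits: [batch, H, W, 1] raw network outputs.
    # labels: [batch, H, W, 1] float ground-truth masks in {0., 1.}.
    per_pixel = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=labels, logits=logits)
    return tf.reduce_mean(per_pixel)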

Reference:

This is an implementation of the Conv-LSTM encoder-decoder method proposed in:

https://arxiv.org/abs/1809.00461
