Compositional Video Prediction

Yufei Ye, Maneesh Singh, Abhinav Gupta*, and Shubham Tulsiani*

Given an initial frame, the task is to predict the next few frames in pixel level. The key insight is that a scene is comprised of distinct entities that undergo joint motions. To operationalize this idea, we propose Compositional Video Prediction (CVP), which consists of three main modules:

Entity Predictor: predicts per-entity representation;
Frame Decoder: generate pixels given entity-level representation;
Encoder: generate latent variables to account for multi-modality.

They jointly give us highly encouraging results compared to baseline methods as shown above.

This code repo is a re-implementation of the ICCV19 paper Compositional Video Prediciton. The code is developed based on Pytorch framework. It also integrates LPIPS for quantitative evaluation.

Citation

If you find this work useful, please use the following BibTeX entry.

@article{ye2019cvp,
  title={Compositional Video Prediction},
  author={Ye, Yufei and Singh, Maneesh  and Gupta, Abhinav and Tulsiani, Shubham},
  year={2019},
  booktitle={International Conference on Computer Vision (ICCV)}
}

Setup Repo

The code was developed by Python 3.6 and PyTorch 0.4.

git clone git@github.com:JudyYe/CVP.git

Demo: Predict video with pretrained model

mkdir -p models/ && wget -O models/ours.pth -L https://www.dropbox.com/s/p8y4p8xngoh467y/ours.pth?dl=0 
python demo.py --checkpoint models/ours.pth

The command above downloads our pretrained model. Then it hallucinates several videos (due to uncertainty) for each image under examples/. It should generates results similar to one column of the one in our website. Each row corresponds to one possible future. Please note:

You can download other pretrain-models including baselines from here.
Feel free to add flag --test_mod multi_${N} to generate N number of diverse futures.

python demo.py --checkpoint ${MODEL_PATH} --test_mod multi_2

Set up Dataset

Before training models on your own or evaluating them quantitatively, you need to set up dataset first. In the paper, results on two datasets are provided: the synthetic dataset Shapestacks and PennAction.

For a quick setup of ready-to-go data for Shapestacks, download and link to data/shapestacks/

 cd ${FOLDER_TO_SAVE_DATA}
 wget -O ss3456_render.tar.gz -L https://www.dropbox.com/s/6jllu13yqwrnql8/ss3456_render.tar.gz?dl=0 && tar xzf ss3456_render.tar.gz 
 ln -s ${FOLDER_TO_SAVE_DATA}/shapestacks data/shapestacks

Please read Dataset.md for further explanation about data format together with how to generate and preprocess the data.

Quantitative Evaluation

The best scores among K (K=100) samples are recorded. (See paper for further explanation.) The quality of frame is evaluated based on code repo LPIPS.

python test.py --checkpoint ${PATH_TO_MODEL} --test_mod best_100 --dataset ss3

The models are trained with 3 blocks in Shapestacks. Substitute ss3 with ss4 (or ss5, ss6) to evaluate how model generalizes to more blocks:

python test.py --checkpoint ${PATH_TO_MODEL} --test_mod best_100 --dataset ss4

Train your own model

The model and logs will be saved to output/. To train our model, simply run

python train.py --gpu ${GPU_ID}

We have provided code to reimplement baselines to ablate predictor, decoder, and encoder correspondingly. Please see Baseline.md for further details.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
PerceptualSimilarity		PerceptualSimilarity
cfgs		cfgs
cvp		cvp
data		data
docs		docs
examples		examples
utils		utils
.gitignore		.gitignore
README.md		README.md
_init_path.py		_init_path.py
demo.py		demo.py
test.py		test.py
train.py		train.py

JudyYe/CVP

Folders and files

Latest commit

History

Repository files navigation

Compositional Video Prediction

Citation

Setup Repo

Demo: Predict video with pretrained model

Set up Dataset

Quantitative Evaluation

Train your own model

About

Resources

Stars

Watchers

Forks

Languages