FlowFormer: A Transformer Architecture for Optical Flow
Zhaoyang Huang*, Xiaoyu Shi*, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, Hongsheng Li
ECCV 2022
Our FlowFormer++ and VideoFlow have been accepted to CVPR and ICCV, respectively, and rank 2nd and 1st on the Sintel benchmark! Please also refer to FlowFormer++ and VideoFlow.
- Code release (2022-8-1)
- Models release (2022-8-1)
Similar to RAFT, to evaluate/train FlowFormer, you will need to download the required datasets.
- FlyingChairs
- FlyingThings3D
- Sintel
- KITTI
- HD1K (optional)
By default, datasets.py will search for the datasets in the locations shown below. You can create symbolic links in the datasets folder that point to wherever the datasets were downloaded.
├── datasets
    ├── Sintel
        ├── test
        ├── training
    ├── KITTI
        ├── testing
        ├── training
        ├── devkit
    ├── FlyingChairs_release
        ├── data
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── optical_flow
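The symbolic links can also be created from Python. The following is a minimal sketch and not part of the repository: `link_datasets` and the download paths passed to it are hypothetical, and only the top-level folder names are taken from the layout above (HD1K is optional and is left out).

```python
import os
from pathlib import Path

# Top-level folder names taken from the dataset layout above.
EXPECTED = ["Sintel", "KITTI", "FlyingChairs_release", "FlyingThings3D"]


def link_datasets(downloads, root="datasets"):
    """Symlink downloaded datasets into `root`.

    `downloads` maps a folder name from EXPECTED to the directory where
    that dataset was actually downloaded (hypothetical paths).
    Returns the expected datasets that are still missing afterwards.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name, src in downloads.items():
        if name not in EXPECTED:
            raise ValueError(f"unexpected dataset folder name: {name}")
        link = root / name
        # Leave existing folders or links untouched.
        if not link.exists() and not link.is_symlink():
            os.symlink(Path(src).resolve(), link, target_is_directory=True)
    return [name for name in EXPECTED if not (root / name).exists()]
```

For example, `link_datasets({"Sintel": "/data/MPI-Sintel"})` (an assumed download location) would create `datasets/Sintel` and return the names of the datasets that still need to be linked.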
conda create --name flowformer
conda activate flowformer
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy opencv -c pytorch
pip install yacs loguru einops timm==0.4.12 imageio
The script will load the config according to the training stage. The trained model will be saved in a directory in logs and checkpoints. For example, the following script will load the config configs/default.py. The trained model will be saved as logs/xxxx/final and checkpoints/chairs.pth.
python -u train_FlowFormer.py --name chairs --stage chairs --validation chairs
To finish the entire training schedule, you can run:
./run_train.sh
We provide models trained in the four stages. The default path of the models for evaluation is:
├── checkpoints
    ├── chairs.pth
    ├── things.pth
    ├── sintel.pth
    ├── kitti.pth
    ├── flowformer-small.pth
    ├── things_kitti.pth
flowformer-small.pth is a small version of FlowFormer. things_kitti.pth is the FlowFormer# model introduced in our supplementary material; it is used for evaluation on the KITTI training set.
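Before running an evaluation it can help to confirm that the expected files are in place. The snippet below is a small sketch under the assumption that the checkpoints live in a local `checkpoints` folder as listed above; `missing_checkpoints` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path

# File names taken from the checkpoint layout above.
CHECKPOINTS = [
    "chairs.pth",
    "things.pth",
    "sintel.pth",
    "kitti.pth",
    "flowformer-small.pth",
    "things_kitti.pth",
]


def missing_checkpoints(root="checkpoints"):
    """Return the checkpoint file names that are not present in `root`."""
    root = Path(root)
    return [name for name in CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("all checkpoints found" if not missing else f"missing: {missing}")
```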
The model to be evaluated is assigned by the _CN.model in the config file.
Evaluating the model on the Sintel training set and the KITTI training set. The corresponding config file is configs/things_eval.py.
# with tiling technique
python evaluate_FlowFormer_tile.py --eval sintel_validation
python evaluate_FlowFormer_tile.py --eval kitti_validation --model checkpoints/things_kitti.pth
# without tiling technique
python evaluate_FlowFormer.py --dataset sintel
| Sintel pass | with tile | w/o tile |
|---|---|---|
| clean | 0.94 | 1.01 |
| final | 2.33 | 2.40 |
Evaluating the small version model. The corresponding config file is configs/small_things_eval.py.
# with tiling technique
python evaluate_FlowFormer_tile.py --eval sintel_validation --small
# without tiling technique
python evaluate_FlowFormer.py --dataset sintel --small
| Sintel pass | with tile | w/o tile |
|---|---|---|
| clean | 1.21 | 1.32 |
| final | 2.61 | 2.68 |
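To compare the two tables, the relative effect of tiling can be computed directly from the reported numbers. This is a small arithmetic sketch: the values are copied from the tables above, it assumes they are errors where lower is better, and `relative_gain` is a hypothetical helper name.

```python
# Numbers copied from the two tables above: (with tile, w/o tile).
RESULTS = {
    ("FlowFormer", "clean"): (0.94, 1.01),
    ("FlowFormer", "final"): (2.33, 2.40),
    ("FlowFormer-small", "clean"): (1.21, 1.32),
    ("FlowFormer-small", "final"): (2.61, 2.68),
}


def relative_gain(with_tile, without_tile):
    """Relative reduction, in percent, obtained by tiling."""
    return 100.0 * (without_tile - with_tile) / without_tile


for (model, sintel_pass), (tiled, plain) in RESULTS.items():
    print(f"{model:17s} {sintel_pass:5s} {relative_gain(tiled, plain):.1f}%")
```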
Generating the submission for the Sintel and KITTI benchmarks. The corresponding config file is configs/submission.py.
python evaluate_FlowFormer_tile.py --eval sintel_submission
python evaluate_FlowFormer_tile.py --eval kitti_submission
Visualizing the Sintel dataset:
python visualize_flow.py --eval_type sintel --keep_size
Visualizing an image sequence extracted from a video:
python visualize_flow.py --eval_type seq
The default image sequence format is:
├── demo_data
    ├── mihoyo
        ├── 000001.png
        ├── 000002.png
        ├── 000003.png
            .
            .
            .
        ├── 001000.png
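Optical flow is estimated between pairs of frames, so a sequence in this format is naturally consumed as consecutive pairs. The sketch below illustrates that idea only; how visualize_flow.py actually iterates over the frames is not shown here, and `consecutive_pairs` is a hypothetical helper.

```python
from pathlib import Path


def consecutive_pairs(seq_dir):
    """Return (frame_t, frame_t+1) path pairs for a folder of PNG frames.

    Relies on the zero-padded names (000001.png, 000002.png, ...), for
    which lexicographic order equals temporal order.
    """
    frames = sorted(Path(seq_dir).glob("*.png"))
    return list(zip(frames[:-1], frames[1:]))


if __name__ == "__main__":
    for first, second in consecutive_pairs("demo_data/mihoyo"):
        print(first.name, "->", second.name)
```

A sequence of N frames yields N - 1 pairs; a folder with fewer than two frames yields an empty list.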
FlowFormer is released under the Apache License.
@article{huang2022flowformer,
title={{FlowFormer}: A Transformer Architecture for Optical Flow},
author={Huang, Zhaoyang and Shi, Xiaoyu and Zhang, Chao and Wang, Qiang and Cheung, Ka Chun and Qin, Hongwei and Dai, Jifeng and Li, Hongsheng},
journal={{ECCV}},
year={2022}
}
@inproceedings{shi2023flowformer++,
title={Flowformer++: Masked cost volume autoencoding for pretraining optical flow estimation},
author={Shi, Xiaoyu and Huang, Zhaoyang and Li, Dasong and Zhang, Manyuan and Cheung, Ka Chun and See, Simon and Qin, Hongwei and Dai, Jifeng and Li, Hongsheng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={1599--1610},
year={2023}
}
@article{huang2023flowformer,
title={FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow},
author={Huang, Zhaoyang and Shi, Xiaoyu and Zhang, Chao and Wang, Qiang and Li, Yijin and Qin, Hongwei and Dai, Jifeng and Wang, Xiaogang and Li, Hongsheng},
journal={arXiv preprint arXiv:2306.05442},
year={2023}
}
In this project, we use parts of the code from: