RGBD-pretraining in DFormer

Authors: Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou*

This repository provides the RGBD pretraining code of '[ICLR 2024] DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation'. Our implementation is modified from the timm repository. If you have any questions, please let me know by raising an issue or via e-mail (bowenyin@mail.nankai.edu.cn).

1. 🚀 Get Started

1.1. Install

Environment requirements: PyTorch & timm

If you have already installed the DFormer environment from our main repository, you only need to additionally install timm.

conda create -n RGBD_Pretrain python=3.10 -y
conda activate RGBD_Pretrain
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install timm fvcore
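
As an optional sanity check, you can confirm that PyTorch and timm import correctly and that CUDA is visible:

python -c "import torch, timm; print('torch', torch.__version__, '| cuda', torch.cuda.is_available(), '| timm', timm.__version__)"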

If the above pipeline does not work, you can also install the environment by following the instructions in the timm repository.

1.2. Prepare Datasets

First, you need to prepare the ImageNet-1K dataset. We share the depth maps for ImageNet-1K (20.4 GB) via the following links:

Baidu Netdisk | OneDrive

If there is any problem with the share links, please let me know (bowenyin@mail.nankai.edu.cn). Then, create the soft links:

ln -s path_to_imagenet datasets/ImageNet
ln -s path_to_imagenet_depth_maps datasets/Depth_ImageNet
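
As an optional sanity check, you can verify that each RGB image has a matching depth map. The sketch below is only an illustration: it assumes the depth directory mirrors ImageNet's class-folder layout and that depth maps are stored as .png files, both of which may differ from the actual release.

import os

rgb_root = "datasets/ImageNet/train"          # standard ImageNet layout: train/<class>/<image>.JPEG
depth_root = "datasets/Depth_ImageNet/train"  # assumed to mirror the RGB layout

missing = 0
for cls in sorted(os.listdir(rgb_root))[:5]:  # spot-check the first few classes
    for name in os.listdir(os.path.join(rgb_root, cls)):
        stem = os.path.splitext(name)[0]
        # the .png extension for the depth maps is an assumption
        if not os.path.exists(os.path.join(depth_root, cls, stem + ".png")):
            missing += 1
print("missing depth maps in spot check:", missing)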

2. 🚀 Train

bash train.sh

After training, the checkpoints will be saved in `outputs/XXX`, where XXX depends on the training config.

The pretrained checkpoint can then encode RGBD representations and be applied to various RGBD tasks.
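
For example, to reuse the pretrained weights in a downstream model, a minimal sketch (the checkpoint filename, the "model" wrapping key, and the placeholder backbone below are assumptions; substitute the actual DFormer backbone and the path produced by your training run):

import torch
import torch.nn as nn

# Placeholder module; replace with the actual DFormer backbone for your task.
backbone = nn.Sequential(nn.Conv2d(4, 64, kernel_size=3, padding=1), nn.ReLU())

ckpt = torch.load("outputs/XXX/checkpoint.pth", map_location="cpu")  # XXX as described above
state_dict = ckpt.get("model", ckpt)  # unwrap if the weights are nested under a "model" key
msg = backbone.load_state_dict(state_dict, strict=False)  # strict=False skips task-specific keys
print("missing keys:", len(msg.missing_keys), "unexpected keys:", len(msg.unexpected_keys))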

We invite everyone to contribute to making this project and RGBD representation learning more accessible and useful. If you have any questions or suggestions about our work, feel free to contact me via e-mail (bowenyin@mail.nankai.edu.cn) or raise an issue.

Reference

@article{yin2023dformer,
  title={DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation},
  author={Yin, Bowen and Zhang, Xuying and Li, Zhongyu and Liu, Li and Cheng, Ming-Ming and Hou, Qibin},
  journal={arXiv preprint arXiv:2309.09668},
  year={2023}
}

Acknowledgment

Our implementation is mainly based on timm. The depth maps are generated by Omnidata. Thanks to their authors.

License

Code in this repo is for non-commercial use only.
