[paper] [model card]
- Use Anaconda to create a Python 3.8 environment:

```shell
conda create -n py38 python=3.8
conda activate py38
```

- Install the requirements:

```shell
pip install -r requirements.txt
```
A unified and universal RGB-D database for depth representation pre-training.
The script for unifying the various RGB-D sources into UniRGBD is `scripts/rgbd_data.ipynb`. You can download our pre-processed version (split into several parts due to its large size): [HM3D] [SceneNet] [SUN3D] [TUM, DIODE, NYUv2] [Evaluation data (with ScanNet)] [Outdoor data (from RGBD1K and DIML)]. The access code is `tacp`.
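As a rough illustration of what such a unification step involves (this is a sketch, not the contents of `scripts/rgbd_data.ipynb`): each RGB-D pair is brought to a common resolution and depth is stored in a uniform unit. The target size, raw depth scale, and clipping range below are illustrative assumptions:

```python
import numpy as np

def unify_rgbd(rgb, depth, size=(224, 224), depth_scale=1000.0, max_depth=10.0):
    """Resize an RGB-D pair to a common resolution and normalize depth to meters.

    `size`, `depth_scale` (raw units per meter) and `max_depth` are illustrative
    assumptions, not the values used by the actual preprocessing notebook.
    """
    h, w = rgb.shape[:2]
    # Nearest-neighbor resize via index sampling (avoids an OpenCV dependency).
    ys = np.linspace(0, h - 1, size[0]).astype(int)
    xs = np.linspace(0, w - 1, size[1]).astype(int)
    rgb_r = rgb[ys][:, xs]
    depth_r = depth[ys][:, xs].astype(np.float32) / depth_scale  # raw -> meters
    depth_r = np.clip(depth_r, 0.0, max_depth)
    return rgb_r, depth_r

# Example: a fake 480x640 frame with 16-bit depth in millimeters.
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.full((480, 640), 2500, dtype=np.uint16)  # 2.5 m everywhere
rgb_r, depth_r = unify_rgbd(rgb, depth)
print(rgb_r.shape, depth_r.shape, float(depth_r.max()))  # (224, 224, 3) (224, 224) 2.5
```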
Important: HM3D is free for academic, non-commercial research, but requires access from Matterport. After obtaining access and the 3D scenes, you can run `scripts/hm3d_data.mp.py` to generate RGB-D frames, or download the pre-processed version.
After decompression, the folder structure will look like this (there may be a few redundant folders):

```
data/rgbd_data/
├── diode_clean_resize
│   └── train
│       ├── indoors
│       └── outdoor
├── hm3d_rgbd
│   └── train
│       ├── 0
│       ├── 1
│       └── ...
├── nyuv2_resize
│   ├── all
│   ├── train
│   └── val
├── pretrain_val
│   ├── diode_val
│   ├── hm3d_val
│   ├── nyuv2_val
│   ├── scannet_val
│   ├── scenenet_val500
│   ├── sun3d_val
│   └── tum_val
├── scenenet_resize
│   └── train
│       ├── 0
│       ├── 1
│       └── ...
├── sun3d
│   └── train
└── tumrgbd_clean_resize
    └── train
```
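A quick way to sanity-check the decompressed layout is to count frames per top-level dataset folder. The file-extension filter below is an assumption about how frames are stored; adjust it to match the actual format:

```python
import os

def count_files(root, exts=(".png", ".jpg", ".npy")):
    """Count files per top-level dataset folder under data/rgbd_data/.

    The extension tuple is an assumption; adapt it to the real frame format.
    """
    counts = {}
    for name in sorted(os.listdir(root)):
        sub = os.path.join(root, name)
        if not os.path.isdir(sub):
            continue
        n = 0
        for _, _, files in os.walk(sub):
            n += sum(f.lower().endswith(exts) for f in files)
        counts[name] = n
    return counts

# Demo on a throwaway directory tree (stands in for data/rgbd_data/).
import tempfile
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "nyuv2_resize", "train"))
open(os.path.join(root, "nyuv2_resize", "train", "0.png"), "w").close()
print(count_files(root))  # {'nyuv2_resize': 1}
```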
Note that all path variables in scripts are absolute, so remember to change them as needed.
You can add arbitrary new data by appending the new folder to `_C.DATA.RGBD.data_path` in `config/default.py`.
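Assuming `data_path` holds a list of dataset folders in the YACS-style config (the exact schema is defined in `config/default.py`), adding new data amounts to appending a path in an experiment config. The entries and the new folder name below are illustrative:

```yaml
# Hypothetical config override; existing entries and the new folder
# name are examples only -- check config/default.py for the real schema.
DATA:
  RGBD:
    data_path: ["data/rgbd_data/hm3d_rgbd", "data/rgbd_data/my_new_dataset"]
```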
Original data source links: HM3D, SceneNet, SUN3D, TUM, DIODE, NYUv2, ScanNet, RGBD1K and DIML.
`train.sh` is used for training on a single GPU; `multi_proc.sh` is used for training on multiple GPUs. The pre-trained weights will be stored in `data/checkpoints`. All configuration files are in the `config` folder.
`eval.sh` supplies the standard evaluation procedure, covering the non-shuffle, block-shuffle, shuffle, and out-of-domain settings. Metric calculation can be found in `trainers/dist_trainer.py`. The evaluation results will be stored in `data/checkpoints/{}/evals`.
For a fair comparison, we supply the standard evaluation order files here. Run `generate_eval_order.sh` to check whether your evaluation orders match ours.
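To illustrate how the shuffle settings differ, here is a sketch (not the repository's actual implementation; the block size and seeding are assumptions): non-shuffle keeps temporal order, shuffle permutes all frames, and block-shuffle permutes fixed-size blocks of consecutive frames while preserving order within each block:

```python
import numpy as np

def make_order(n, mode, block=8, seed=0):
    """Return an evaluation order over n frames.

    'non-shuffle' keeps temporal order, 'shuffle' permutes all frames, and
    'block-shuffle' permutes fixed-size blocks but keeps intra-block order.
    The block size and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    if mode == "non-shuffle":
        return idx
    if mode == "shuffle":
        return rng.permutation(idx)
    if mode == "block-shuffle":
        blocks = [idx[i:i + block] for i in range(0, n, block)]
        perm = rng.permutation(len(blocks))
        return np.concatenate([blocks[i] for i in perm])
    raise ValueError(mode)

order = make_order(32, "block-shuffle", block=8, seed=0)
print(order[:8])  # prints one intact block of eight consecutive indices
```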
| | Shuffle Top-1 | Block-shuffle Near-1 | Non-shuffle Near-1 | Out-domain Top-1 |
|---|---|---|---|---|
| TAC | 0.974 | 0.642 | 0.603 | 0.850 |
`scripts/demo.ipynb` gives a simple demonstration of encoding a depth image. You can also separate the depth encoder from the whole model as needed.
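Independent of the notebook, the typical pipeline is: load the raw depth map, normalize it, and lift it to a channel-first array before passing it to the depth encoder. A minimal sketch, where the millimeter unit, the max-depth clip, and the output layout are assumptions rather than TAC's actual preprocessing:

```python
import numpy as np

def preprocess_depth(depth_raw, max_depth_m=10.0):
    """Convert a raw 16-bit depth map (assumed to be in millimeters) into a
    normalized 1xHxW float array ready for a depth encoder. The unit and
    the max-depth clip value are illustrative assumptions."""
    d = depth_raw.astype(np.float32) / 1000.0       # mm -> meters (assumed unit)
    d = np.clip(d, 0.0, max_depth_m) / max_depth_m  # -> [0, 1]
    return d[None]                                  # add channel dim: (1, H, W)

depth = np.full((224, 224), 5000, dtype=np.uint16)  # fake frame, 5 m everywhere
x = preprocess_depth(depth)
print(x.shape, float(x.max()))  # (1, 224, 224) 0.5
```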
- `scripts/uncertainty.ipynb`: conduct the MC Dropout uncertainty analysis.
- `scripts/zero_shot.ipynb`: conduct zero-shot room classification from depth images.
- `scripts/mae` and `config/v2/v2_mae.yaml`: train the cross-modal masked autoencoder model.
- `config/v2/v2_edge.yaml`: RGB-D alignment by Canny edge detection.
- `config/v2/v2_tac_outdoortune.yaml`: fine-tune the model with a few outdoor frames.
Experiment code is stored here:

- PointNav
- VLN
- EQA
- Rearrange
```bibtex
@ARTICLE{10288539,
  author={He, Zongtao and Wang, Liuyi and Dang, Ronghao and Li, Shu and Yan, Qingqing and Liu, Chengju and Chen, Qijun},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  title={Learning Depth Representation from RGB-D Videos by Time-Aware Contrastive Pre-training},
  year={2023},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TCSVT.2023.3326373}}
```