
RavenKiller/TAC


Learning Depth Representation from RGB-D Videos by Time-Aware Contrastive Pre-training

[paper] [model card]

Setup environments

  1. Use Anaconda to create a Python 3.8 environment:
conda create -n py38 python=3.8
conda activate py38
  2. Install the requirements:
pip install -r requirements.txt

UniRGBD dataset

A unified and universal RGB-D database for depth representation pre-training.


The script for unifying various RGB-D frames to generate UniRGBD is scripts/rgbd_data.ipynb. You can download our pre-processed version (split into several parts due to its large size): [HM3D][SceneNet][SUN3D][TUM, DIODE, NYUv2][Evaluation data (with ScanNet)][Outdoor data (from RGBD1K and DIML)]. The access code is tacp.
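For reference, below is a minimal sketch of the kind of per-frame conversion such a unification step performs: resizing an RGB-D pair to a common resolution and writing it into the unified folder layout. The target size, paths, and function name are illustrative assumptions; the authoritative code is scripts/rgbd_data.ipynb.

# Illustrative sketch only; see scripts/rgbd_data.ipynb for the actual conversion.
# TARGET_SIZE, the paths, and convert_pair are assumed names, not taken from the repo.
from PIL import Image

TARGET_SIZE = (256, 256)  # assumed unified resolution

def convert_pair(rgb_path, depth_path, out_rgb_path, out_depth_path):
    rgb = Image.open(rgb_path).convert("RGB").resize(TARGET_SIZE, Image.BILINEAR)
    # Depth stays single-channel; nearest-neighbor resizing avoids blending depth values.
    depth = Image.open(depth_path).resize(TARGET_SIZE, Image.NEAREST)
    rgb.save(out_rgb_path)
    depth.save(out_depth_path)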

Important: HM3D is free for academic, non-commercial research, but requires access from Matterport. After obtaining access and the 3D scenes, you can run scripts/hm3d_data.mp.py to generate RGB-D frames, or download the pre-processed version.

After decompression, the folder structure will look like this (there may be a few redundant folders):

data/rgbd_data/
├── diode_clean_resize
│   └── train
│       ├── indoors
│       └── outdoor
├── hm3d_rgbd
│   └── train
│       ├── 0
│       ├── 1
│       └── ...
├── nyuv2_resize
│   ├── all
│   ├── train
│   └── val
├── pretrain_val
│   ├── diode_val
│   ├── hm3d_val
│   ├── nyuv2_val
│   ├── scannet_val
│   ├── scenenet_val500
│   ├── sun3d_val
│   └── tum_val
├── scenenet_resize
│   └── train
│       ├── 0
│       ├── 1
│       └── ...
├── sun3d
│   └── train
└── tumrgbd_clean_resize
    └── train

Note that all path variables in the scripts are absolute, so remember to change them as needed. You can add new datasets by appending the new folder to _C.DATA.RGBD.data_path in config/default.py, as sketched below.
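A rough sketch of that last step follows. The list contents here are illustrative placeholders; keep whatever entries config/default.py already defines and append your own absolute path.

# Sketch: register an extra dataset folder for pre-training (paths are placeholders).
_C.DATA.RGBD.data_path = [
    "/abs/path/to/data/rgbd_data/hm3d_rgbd/train",
    "/abs/path/to/data/rgbd_data/scenenet_resize/train",
    # ... other existing folders ...
    "/abs/path/to/data/rgbd_data/my_new_dataset/train",  # newly added data
]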

Original data source links: HM3D, SceneNet, SUN3D, TUM, DIODE, NYUv2, ScanNet, RGBD1K and DIML.

Run pre-training

train

train.sh is used for training on a single GPU; multi_proc.sh is used for training on multiple GPUs. The pre-trained weights will be stored in data/checkpoints. All configuration files are in the config folder.

evaluate

eval.sh supplies the standard evaluation procedure, including the non-shuffle, block-shuffle, shuffle, and out-of-domain settings. The metric calculation can be found in trainers/dist_trainer.py. The evaluation results will be stored in data/checkpoints/{}/evals.
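As an illustration of what a retrieval-style metric measures, here is a minimal Top-1 accuracy sketch over paired RGB and depth embeddings. It is not the repo's metric code (that lives in trainers/dist_trainer.py); the function and tensor names are assumptions.

# Illustrative Top-1 cross-modal retrieval accuracy; not the repo's implementation.
import torch

def top1_accuracy(rgb_emb, depth_emb):
    # rgb_emb, depth_emb: [N, D] L2-normalized embeddings of paired frames.
    sim = rgb_emb @ depth_emb.t()        # [N, N] cosine similarity matrix
    pred = sim.argmax(dim=1)             # nearest depth embedding for each RGB frame
    target = torch.arange(sim.size(0))   # the paired index is the ground truth
    return (pred == target).float().mean().item()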

check evaluation order

For a fair comparison, we supply the standard evaluation order files here. Run generate_eval_order.sh to check whether your evaluation orders are the same as ours.

Evaluation performance

       Shuffle Top-1    Block-shuffle Near-1    Non-shuffle Near-1    Out-domain Top-1
TAC    0.974            0.642                   0.603                 0.850

Customize usage

scripts/demo.ipynb gives a simple demonstration of encoding a depth image. You can also separate the depth encoder from the whole model as needed.
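Below is an illustrative sketch of the call pattern for a separated depth encoder. The encoder here is a torchvision ResNet-50 stand-in with a single-channel first layer, not the repo's actual encoder class; see scripts/demo.ipynb and the checkpoint below for the real loading code.

# Illustrative stand-in only: a 1-channel ResNet-50 plays the role of the separated
# depth encoder. The real encoder class and checkpoint loading are in scripts/demo.ipynb.
import torch
import torch.nn as nn
from torchvision.models import resnet50

depth_encoder = resnet50()
depth_encoder.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)  # accept single-channel depth
depth_encoder.eval()

depth = torch.rand(1, 1, 224, 224)       # stand-in for a normalized depth map [B, 1, H, W]
with torch.no_grad():
    feature = depth_encoder(depth)       # depth-only representation
print(feature.shape)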

Pre-trained weights

[Checkpoint]

Extended experiments

  1. scripts/uncertainty.ipynb: Conduct the MC Dropout uncertainty analysis (a generic sketch follows this list).
  2. scripts/zero_shot.ipynb: Conduct zero-shot room classification by depth images.
  3. scripts/mae and config/v2/v2_mae.yaml: Train cross-modal masked autoencoder model.
  4. config/v2/v2_edge.yaml: RGBD alignment by Canny edge detection.
  5. config/v2/v2_tac_outdoortune.yaml: Fine-tune model with a few outdoor frames.
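For item 1, the sketch below shows the generic MC Dropout pattern: keep dropout layers active at test time and treat the spread of repeated forward passes as an uncertainty estimate. This is a generic illustration, not the notebook's code; mc_dropout_predict and n_samples are assumed names.

# Generic MC Dropout sketch; the actual analysis is in scripts/uncertainty.ipynb.
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=20):
    model.eval()
    # Re-enable only the dropout layers so BatchNorm and friends stay in eval mode.
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)  # mean prediction and per-output uncertainty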

Embodied experiments

Experiment code is stored here.

Visualization

PointNav


VLN


EQA


Rearrange


Citation

@ARTICLE{10288539,
  author={He, Zongtao and Wang, Liuyi and Dang, Ronghao and Li, Shu and Yan, Qingqing and Liu, Chengju and Chen, Qijun},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Learning Depth Representation from RGB-D Videos by Time-Aware Contrastive Pre-training}, 
  year={2023},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TCSVT.2023.3326373}}
