Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects

Qirui Wu, Daniel Ritchie, Manolis Savva, Angel Xuan Chang

[arXiv, Project Page, Dataset]

Official repository of the paper Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects. We systematically study the generalization of single-view 3D shape retrieval along three different axes: the presence of object occlusions and truncations, generalization to unseen 3D shape data, and generalization to unseen objects in the input images.

Setup

The environment is tested with Python 3.8, PyTorch 2.0, CUDA 11.7, PyTorch3D 0.7.3, and Lightning 2.0.1.

conda create -n gcmic python=3.8
conda activate gcmic
pip3 install torch torchvision
pip install -r requirements.txt
conda install -c fvcore -c iopath -c bottler -c conda-forge fvcore iopath nvidiacub
pip install "git+https://github.com/facebookresearch/pytorch3d.git@v0.7.3"

Data

MOOS

Multi-Object Occlusion Scenes (MOOS) is generated with a heuristic algorithm that iteratively places newly sampled 3D shapes from 3D-FUTURE into the existing layout (a toy sketch of this placement loop follows the download commands below). Download the MOOS raw and preprocessed data with the following commands and extract them to ./data/moos.

cd data/moos 
wget https://aspis.cmpt.sfu.ca/projects/gcmic/data/moos/scenes.zip && unzip scenes.zip
wget https://aspis.cmpt.sfu.ca/projects/gcmic/data/moos/moos_annotations.zip && unzip moos_annotations.zip
wget https://aspis.cmpt.sfu.ca/projects/gcmic/data/moos/moos_h5.tar && tar -xvf moos_h5.tar
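
The snippet below is a toy illustration of the iterative placement idea, not the actual MOOS generator (see ./preprocess/moos/render_scenes.py for that): shapes are reduced to hypothetical 2D footprints on a square floor, and a candidate placement is accepted only if it does not physically overlap an already-placed object.

import random

def overlaps(a, b):
    # Axis-aligned rectangle intersection test on (x, y, width, depth) footprints.
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    return ax < bx + bw and bx < ax + aw and ay < by + bd and by < ay + ad

def place_objects(footprints, floor=10.0, max_tries=50):
    layout = []
    for w, d in footprints:  # iterate over newly sampled shapes
        for _ in range(max_tries):
            cand = (random.uniform(0, floor - w), random.uniform(0, floor - d), w, d)
            if not any(overlaps(cand, placed) for placed in layout):
                layout.append(cand)  # accept the first collision-free placement
                break
    return layout

print(place_objects([(1, 2), (2, 2), (1, 1)]))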

The data files should be organized as follows:

gcmic
├── data
│   ├── moos
│   │   ├── scenes # raw image data
│   │   │   ├── <scene_name>
│   │   │   │   ├── rgb
│   │   │   │   │   ├── rgb_<view_id>.rgb.png
│   │   │   │   ├── instances
│   │   │   │   │   ├── instances_<view_id>.rgb.png
│   │   │   │   ├── objects
│   │   │   │   │   ├── <obj_id>_<view_id>.rgb.png
│   │   │   │   │   ├── <obj_id>_<view_id>.mask.png
│   │   │   │   ├── depth
│   │   │   │   ├── normal
│   │   │   │   ├── layout2d.png # top-down view
│   │   │   │   ├── scene.json # scene metadata
│   │   ├── moos_annotation.txt
│   │   ├── moos_annotation_all.txt
│   │   ├── moos_annotation_no_occ.txt # annotation file containing object queries w/o occlusions
│   │   ├── moos_annotation_occ.txt # annotation file containing object queries w/ occlusions
│   │   ├── moos_1k.h5 # image queries
│   │   ├── moos_mv.h5 # multiviews for each shape
│   │   ├── moos_obj.h5 # pointcloud for each shape
│   │   ├── lfd_200.h5 # 200-view LFD for each shape
│   │   ├── moos_pose.json # object pose info for rendering
│   │   ├── ...
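
Given the layout above, an object query crop and its mask can be loaded like this (a minimal sketch; it simply grabs the first scene and crop it finds, so substitute specific scene/obj/view ids as needed):

from pathlib import Path
import numpy as np
from PIL import Image

# Pick an arbitrary scene and object crop from the extracted data.
scene_dir = sorted(Path("data/moos/scenes").iterdir())[0]
crop_path = sorted((scene_dir / "objects").glob("*.rgb.png"))[0]  # <obj_id>_<view_id>.rgb.png
mask_path = crop_path.with_name(crop_path.name.replace(".rgb.png", ".mask.png"))

rgb = np.array(Image.open(crop_path).convert("RGB"))
mask = np.array(Image.open(mask_path).convert("L")) > 0

rgb[~mask] = 0  # zero out background pixels to obtain the object-only query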

Please refer to ./preprocess/moos/gen_dataset_hdf5.py, ./preprocess/3dfuture/get_all_lfd.py, and ./preprocess/moos/extract_pose_json.py for how the preprocessed data are prepared. Please refer to 3D-FUTURE for downloading the 3D shapes if you want to render your own shape multiviews and LFDs, and put the 3D-FUTURE data under ./data/3dfuture.
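
The preprocessed HDF5 files can be inspected with h5py before use (a minimal sketch; the key names inside each file are not documented here, so list them rather than assuming them):

import h5py

# Print every dataset stored in the image-query file together with its shape.
with h5py.File("data/moos/moos_1k.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "(group)")))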

We generate 10K scenes with the script ./preprocess/moos/render_scenes.py. Note that each scene can be reconstructed by reading the metadata in scene.json (run ./preprocess/moos/reconstruct_scenes.py). More demos of random scene generation can be found in ./notebook.
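
For example, the per-scene metadata can be read directly (a minimal sketch that only lists the top-level keys, since the scene.json schema is not spelled out here; reconstruct_scenes.py shows how the fields are actually used):

import json
from pathlib import Path

# Peek at the metadata of the first few scenes.
for scene_json in sorted(Path("data/moos/scenes").glob("*/scene.json"))[:3]:
    meta = json.loads(scene_json.read_text())
    print(scene_json.parent.name, list(meta))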

Pix3D

Download the Pix3D raw data here, and the preprocessed data with the following commands; extract them to ./data/pix3d.

cd data/pix3d
wget https://aspis.cmpt.sfu.ca/projects/gcmic/data/pix3d/mask2former.zip && unzip mask2former.zip
wget https://aspis.cmpt.sfu.ca/projects/gcmic/data/pix3d/pix3d_annotations.zip && unzip pix3d_annotations.zip
wget https://aspis.cmpt.sfu.ca/projects/gcmic/data/pix3d/pix3d_h5.tar && tar -xvf pix3d_h5.tar

Please refer to details for the Pix3D data structure.

Scan2CAD

Download the ScanNet25K images and CAD annotations from the ROCA data, and the preprocessed data with the following commands; extract them to ./data/scan2cad.

cd data/scan2cad
wget https://aspis.cmpt.sfu.ca/projects/gcmic/data/scan2cad/mask.zip && unzip mask.zip
wget https://aspis.cmpt.sfu.ca/projects/gcmic/data/scan2cad/scan2cad_annotations.zip && unzip scan2cad_annotations.zip
wget https://aspis.cmpt.sfu.ca/projects/gcmic/data/scan2cad/scan2cad_h5.tar && tar -xvf scan2cad_h5.tar

Please refer to details for the Scan2CAD data structure. Download the ShapeNet 3D shapes here if you want to render your own shape multiviews and LFDs.

Train

Train a CMIC model on the ALL set of MOOS.

python train.py -t train -e cmic_moos --data_conf conf/dataset/moos.yaml --model_conf conf/model/cmic.yaml --epochs 50 --batch_size 64 --num_views 12 --verbose False --annotation_file moos_annotation_all.txt --use_crop --use_1k_img

Train a CMIC model on the ALL set of Pix3D using Mask2Former predicted object masks.

python train.py -t train -e cmic_pix3d --data_conf conf/dataset/pix3d.yaml --model_conf conf/model/cmic.yaml --epochs 500 --batch_size 64 --num_views 12 --verbose False --annotation_file pix3d_annotation_all.txt --mask_source m2f_mask --val_check_interval 1 --use_crop

Train a CMIC model on Scan2CAD.

python train.py -t train -e cmic_scan2cad --data_conf conf/dataset/scan2cad.yaml --model_conf conf/model/cmic.yaml --epochs 500 --batch_size 64 --num_views 12 --verbose False --annotation_file scan2cad_annotation.txt --val_check_interval 1 --num_sanity_val_steps 100 --use_crop --use_480p_img --center_in_image

Fine-tune

Fine-tune cmic_moos on Pix3D.

python train.py -t finetune -e cmic_moos_ft_pix3d --data_conf conf/dataset/pix3d.yaml --model_conf conf/model/cmic.yaml --epochs 5 --batch_size 64 --num_views 12 --verbose False --annotation_file pix3d_annotation_all.txt --mask_source m2f_mask --ckpt_path ./output/moos/cmic/cmic_moos/train/model.ckpt --val_check_interval 1 --use_crop

Fine-tune cmic_moos on Scan2CAD.

python train.py -t finetune -e cmic_moos_ft_scan2cad --data_conf conf/dataset/scan2cad.yaml --model_conf conf/model/cmic.yaml --epochs 5 --batch_size 64 --num_views 12 --verbose False --annotation_file scan2cad_annotation.txt --ckpt_path ./output/moos/cmic/cmic_moos/train/model.ckpt --val_check_interval 1 --use_crop --num_sanity_val_steps 400 --use_480p_img --center_in_image

Evaluation

We first embed all shape multiviews from different datasets (MOOS, Pix3D, and Scan2CAD) using the specified pretrained shape encoder.

python test.py -t embed_shape -e <model_name> --data_conf conf/dataset/<dataset>.yaml --model_conf conf/model/cmic.yaml --batch_size 48 --num_views 12 --verbose False --ckpt model.ckpt 

Evaluate on all|seen|unseen objects of the different MOOS sets all|no_occ|occ (a convenience sweep over all nine combinations follows the command below).

python test.py -t test -e cmic_moos --data_conf conf/dataset/moos.yaml --model_conf conf/model/cmic.yaml --verbose False --batch_size 48 --ckpt model.ckpt --annotation_file moos_annotation_<all|no_occ|occ>.txt --offline_evaluation --test_objects <all|seen|unseen> --use_crop --use_1k_img
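
A hypothetical convenience loop over all nine (set, object split) combinations, shelling out to test.py with exactly the flags shown above:

import itertools
import subprocess

for ann, objs in itertools.product(["all", "no_occ", "occ"], ["all", "seen", "unseen"]):
    subprocess.run([
        "python", "test.py", "-t", "test", "-e", "cmic_moos",
        "--data_conf", "conf/dataset/moos.yaml",
        "--model_conf", "conf/model/cmic.yaml",
        "--verbose", "False", "--batch_size", "48", "--ckpt", "model.ckpt",
        "--annotation_file", f"moos_annotation_{ann}.txt",
        "--offline_evaluation", "--test_objects", objs,
        "--use_crop", "--use_1k_img",
    ], check=True)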

Evaluate on all|seen|unseen objects of the different Pix3D sets all|easy|hard.

python test.py -t test -e cmic_pix3d --data_conf conf/dataset/pix3d.yaml --model_conf conf/model/cmic.yaml --verbose False --batch_size 48 --mask_source m2f_mask --ckpt model.ckpt --annotation_file pix3d_annotation_<all|easy|hard>.txt --offline_evaluation --test_objects <all|seen|unseen> --use_crop

Evaluate on the Scan2CAD dataset.

python test.py -t test -e cmic_scan2cad --data_conf conf/dataset/scan2cad.yaml --model_conf conf/model/cmic.yaml --verbose False --batch_size 48 --ckpt model.ckpt --annotation_file scan2cad_annotation.txt --offline_evaluation --use_crop --use_480p_img

Note

  • Add flags --not_eval_acc --shape_feats_source <dataset> to test on unseen 3D shapes.
  • Add flag --save_eval_vis to save retrieved 3D shape renderings and visualizations.

Acknowledgement

We thank Lewis Lin for helping develop the metric code in the early stages of the project.

Bibtex

@article{wu2023generalizing,
    author  = {Wu, Qirui and Ritchie, Daniel and Savva, Manolis and Chang, Angel X.},
    title   = {{Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects}},
    year    = {2023},
    eprint  = {2401.00405},
    archivePrefix   = {arXiv},
    primaryClass    = {cs.CV}
}

About

Official Implementation of the paper "Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects", 3DV 2024
