Code for the paper "Think-Program-reCtify: 3D Situated Reasoning with Large Language Models"
[Project Page] [Paper]
## Installation

```shell
conda create -n llm-tpc python=3.9 -y
conda activate llm-tpc
pip install openai==0.28 numpy scikit-learn matplotlib omegaconf torch torch_redstone einops tqdm open_clip_torch trimesh plyfile shapely
pip install dgl-cu113 -f https://data.dgl.ai/wheels/repo.html
```
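To confirm the environment is complete before running anything, a small sanity check like the following can be used (a sketch, not part of the repo; the list of import names is inferred from the pip commands above):

```python
import importlib.util

# Import names for the packages installed above (pip names can differ,
# e.g. scikit-learn -> sklearn, open_clip_torch -> open_clip).
REQUIRED = ["openai", "numpy", "sklearn", "matplotlib", "omegaconf",
            "torch", "einops", "tqdm", "open_clip", "trimesh",
            "plyfile", "shapely", "dgl"]

def missing_packages(names=REQUIRED):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found." if not missing
          else "Missing: " + ", ".join(missing))
```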
## Data Preparation

Organize the data under `data/` as follows:

```
data
├── openshape
│   ├── model.pt
│   └── open_clip_pytorch_model.bin
├── qa
│   └── SQA_test.json
├── scans
│   ├── scene0000_00
│   │   ├── scene0000_00_vh_clean_2.0.010000.segs.json
│   │   ├── scene0000_00_vh_clean_2.labels.ply
│   │   ├── scene0000_00_vh_clean_2.ply
│   │   ├── scene0000_00.aggregation.json
│   │   └── scene0000_00.txt
│   └── ...
└── scannetv2-labels.combined.tsv
```
To acquire access to the ScanNet dataset, please refer to ScanNet and follow the instructions there. You will receive a `download-scannet.py` script once your request is approved. Use the commands below to download the portion of ScanNet required by LLM-TPC:

```shell
python download-scannet.py -o data --type _vh_clean_2.0.010000.segs.json
python download-scannet.py -o data --type _vh_clean_2.labels.ply
python download-scannet.py -o data --type _vh_clean_2.ply
python download-scannet.py -o data --type .aggregation.json
python download-scannet.py -o data --type .txt
```
Download the question-answer pairs from SQA3D and put `SQA_test.json` under `data/qa`.
We use the pointbert-vitg14-rgb and OpenCLIP ViT-bigG-14 checkpoints from OpenShape. Download `model.pt` from here and `open_clip_pytorch_model.bin` from here, then put them under `data/openshape`.
## Running LLM-TPC

```shell
cd scripts
# Input your OPENAI_API_KEY in 'llm-tpc/config.json'
python example.py --agent llm-tpc/config.json
```
## Evaluation

```shell
cd scripts
python eval.py --log_dir ../logs/test/llm-tpc
```
## Visualization

```shell
cd src/dataset
python visualize_bbox.py
```
## Acknowledgements

- Agents: the codebase we built upon.
- ReferIt3D: we design APIs for spatial relation recognition based on ReferIt3D.
- OpenShape: we design APIs for open-vocabulary object attribute classification based on OpenShape.
- ScanRefer: code for visualization.
## Citation

```bibtex
@article{qingrong2024llm-tpc,
  title={Think-Program-reCtify: 3D Situated Reasoning with Large Language Models},
  author={Qingrong He and Kejun Lin and Shizhe Chen and Anwen Hu and Qin Jin},
  journal={arXiv preprint arXiv:2404.14705},
  year={2024}
}
```