Chaoqi Chen*
Qile Xu*
Wenjun Zhou
Hui Huang†
Visual Computing Research Center (VCC), Shenzhen University
*Equal contribution †Corresponding author
This is the official repository for the ACM SIGGRAPH 2026 paper "PointLLM-R: Enhancing 3D Point Cloud Reasoning via Chain-of-Thought".
Existing 3D multimodal large language models (MLLMs) map questions directly to answers without explicit intermediate reasoning. This makes their outputs uninterpretable and brittle on complex queries that involve multi-hop inference, functional attribute judgment, or commonsense integration.
We address this with a two-stage data generation framework that automatically produces high-quality Chain-of-Thought (CoT) annotations for 3D point cloud QA. The first stage refines raw QA pairs through multi-dimensional quality assessment. The second stage uses HiLPO (Human-in-the-Loop Prompt Optimization), an iterative prompt-refinement mechanism that guides an LLM to generate structured, step-by-step rationales grounded in 3D geometry. Using this pipeline, we construct PoCoTI, the first large-scale 3D point cloud dataset with CoT annotations (~55K samples). Fine-tuning PointLLM on PoCoTI yields PointLLM-R-7B, a model that generates verifiable reasoning paths before answering and outperforms all baselines (including 13B-parameter models) on generative 3D classification and captioning benchmarks.
This repo is built on top of PointLLM. Installation, environment setup, Objaverse data preparation, inference, and standard evaluation all follow PointLLM's instructions. This README focuses on what is new: the PoCoTI dataset, the fine-tuning procedure, and differences in evaluation.
Follow the PointLLM installation guide. Requirements are identical.
```bash
pip install -e .
```

Follow PointLLM's data preparation instructions to download the Objaverse colored point cloud files. The default expected path is `./data/objaverse_data`.
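Because every sample is keyed by an Objaverse object ID, it can help to confirm that the corresponding point clouds are actually on disk before training or evaluation. The sketch below assumes PointLLM's file naming, `<object_id>_8192.npy` (an 8192 × 6 array: XYZ followed by RGB); the helper names `point_cloud_path` and `missing_point_clouds` are hypothetical, so adjust the pattern if your extracted files are named differently.

```python
from collections.abc import Iterable
from pathlib import Path

DEFAULT_DATA_DIR = "./data/objaverse_data"


def point_cloud_path(object_id: str, data_dir: str = DEFAULT_DATA_DIR, num_points: int = 8192) -> Path:
    """Return the expected file path for one object's point cloud.

    Hypothetical helper: assumes PointLLM's naming '<object_id>_<num_points>.npy'.
    """
    return Path(data_dir) / f"{object_id}_{num_points}.npy"


def missing_point_clouds(object_ids: Iterable[str], data_dir: str = DEFAULT_DATA_DIR) -> list[str]:
    """List the object IDs whose point cloud file is not present under data_dir."""
    return [oid for oid in object_ids if not point_cloud_path(oid, data_dir).is_file()]
```

For example, `missing_point_clouds(["<objaverse_object_id>"])` returns an empty list when the file exists; the array itself can then be read with `numpy.load(point_cloud_path(object_id))`.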
PoCoTI contains ~55K point cloud QA pairs, each with a structured 5-step CoT annotation. Download from HuggingFace:
```bash
huggingface-cli download QileXu/PoCoTI-55K \
  --repo-type dataset \
  --local-dir ./data/anno_data
```

Make sure the annotation file ends up at `./data/anno_data/PoCoTI_55k.json`. Each entry has the following structure:

```json
{
  "object_id": "<objaverse_object_id>",
  "conversation_type": "single_round",
  "conversations": [
    {
      "from": "human",
      "value": "<point>\n<question>"
    },
    {
      "from": "gpt",
      "value": "<REASONING>\nStep 1: ...\nStep 2: ...\nStep 3: ...\nStep 4: ...\nStep 5: ...\n</REASONING>\n<ANSWER> ... </ANSWER>"
    }
  ]
}
```

Each `object_id` refers to an Objaverse point cloud file in `./data/objaverse_data`.
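For sanity checks or analysis, the tagged response format can be split programmatically. The following is a minimal sketch assuming that the file is a single JSON array of entries shaped like the example above, that every `gpt` turn wraps its rationale in `<REASONING>` and its final answer in `<ANSWER>`, and that each step starts a line with `Step k:`. The function names `load_pocoti`, `split_cot`, and `check_entry` are hypothetical and not part of this repository.

```python
import json
import re

# Tag names follow the sample entry; the "Step k:" line prefix is an assumption.
REASONING_RE = re.compile(r"<REASONING>\s*(.*?)\s*</REASONING>", re.DOTALL)
ANSWER_RE = re.compile(r"<ANSWER>\s*(.*?)\s*</ANSWER>", re.DOTALL)
# Matches a "Step k:" marker at the start of any line, so a step may span several lines.
STEP_MARKER_RE = re.compile(r"^\s*Step \d+:\s*", re.MULTILINE)


def load_pocoti(path: str = "./data/anno_data/PoCoTI_55k.json") -> list[dict]:
    """Load the annotation file, assumed to be one JSON array of entries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_cot(response: str) -> tuple[list[str], str]:
    """Return (reasoning steps, final answer) from a tagged gpt response."""
    reasoning = REASONING_RE.search(response)
    answer = ANSWER_RE.search(response)
    if reasoning is None or answer is None:
        raise ValueError("response is missing <REASONING> or <ANSWER> tags")
    # Element 0 holds any text before the first "Step k:" marker and is discarded.
    parts = STEP_MARKER_RE.split(reasoning.group(1))
    steps = [part.strip() for part in parts[1:]]
    return steps, answer.group(1)


def check_entry(entry: dict, expected_steps: int = 5) -> None:
    """Raise ValueError if a single-round entry deviates from the documented layout."""
    oid = entry.get("object_id", "<unknown>")
    turns = entry["conversations"]
    if [turn["from"] for turn in turns] != ["human", "gpt"]:
        raise ValueError(f"{oid}: expected one human turn followed by one gpt turn")
    if not turns[0]["value"].startswith("<point>\n"):
        raise ValueError(f"{oid}: human turn should start with the <point> token")
    steps, _ = split_cot(turns[1]["value"])
    if len(steps) != expected_steps:
        raise ValueError(f"{oid}: found {len(steps)} steps, expected {expected_steps}")
```

A typical pass is `for entry in load_pocoti(): check_entry(entry)`, after which `split_cot(entry["conversations"][1]["value"])` yields the steps and the answer separately.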
Fine-tune PointLLM on PoCoTI with:

```bash
bash scripts/finetune_CoT.sh
```

To skip fine-tuning, download PointLLM-R-7B directly:

```bash
huggingface-cli download QileXu/PointLLM-R-7B
```

Evaluation scripts and metrics follow PointLLM. Three benchmarks are supported:
| Benchmark | Script | Split | GT Annotations |
|---|---|---|---|
| Objaverse (classification + captioning) | `scripts/eval/objaverse.sh` | 3,000 val objects | Provided by PointLLM |
| ModelNet40 (zero-shot classification) | `scripts/eval/modelnet40_cls.sh` | 2,468 test objects | Provided by PointLLM |
| OmniObject3D (zero-shot classification) | `scripts/eval/omniobject3d.sh` | 5,989 val objects | `QileXu/OmniObject3D_brief_description_val_GT` |
Download the OmniObject3D GT annotations:
```bash
huggingface-cli download QileXu/OmniObject3D_brief_description_val_GT \
  --repo-type dataset \
  --local-dir ./data/anno_data
```

The GPT model used as the evaluation judge is set via `--gpt_type` in the eval scripts.
To evaluate on Objaverse captioning:
```bash
# In scripts/eval/objaverse.sh, set:
#   MODEL_VERSION=QileXu/PointLLM-R-7B
#   PROMPT_INDEX=2   (2 = captioning, 0/1 = classification)
bash scripts/eval/objaverse.sh
```

To chat with PointLLM-R-7B interactively, run:

```bash
python pointllm/eval/PointLLM_chat.py \
  --model_path QileXu/PointLLM-R-7B
```

After startup, the script prompts you for an object ID (from `./data/objaverse_data`) and then accepts your questions interactively. Enter `q` to quit, or `exit` to end the current object's conversation and switch to a new one.
