This is a curated list of papers on 3D-related tasks empowered by Large Language Models (LLMs), covering 3D understanding, reasoning, generation, and embodied agents. We also include works built on other foundation models (e.g., CLIP, SAM) to give a fuller picture of the area.
This is an actively maintained repository; you can watch it to follow the latest advances. If you find it useful, please kindly star ⭐ this repo and cite the paper.
- [2024-05-16] 📢 Check out the first survey paper in the 3D-LLM domain: When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
- [2024-01-06] Runsen Xu added chronological information and Xianzheng Ma reorganized the list from newest to oldest so the latest advances are easier to follow.
- [2023-12-16] Xianzheng Ma and Yash Bhalgat curated this list and published the first version.
The following papers focus on 3D reasoning.

Date | Keywords | Institute (first author) | Paper | Publication | Others |
---|---|---|---|---|---|
2023-05-20 | 3D-CLR | UCLA | 3D Concept Learning and Reasoning from Multi-View Images | CVPR '23 | github |
- | Transcribe3D | TTI, Chicago | Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning | CoRL '23 | github |
The following papers focus on 3D generation.

Date | Keywords | Institute (first author) | Paper | Publication | Others |
---|---|---|---|---|---|
2023-11-29 | ShapeGPT | Fudan University | ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model | arXiv | github |
2023-11-27 | MeshGPT | TUM | MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | arXiv | project |
2023-10-19 | 3D-GPT | ANU | 3D-GPT: Procedural 3D Modeling with Large Language Models | arXiv | github |
2023-09-21 | LLMR | MIT | LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | arXiv | - |
2023-09-20 | DreamLLM | MEGVII | DreamLLM: Synergistic Multimodal Comprehension and Creation | arXiv | github |
2023-04-01 | ChatAvatar | Deemos Tech | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | ACM TOG | website |
Your contributions are always welcome! When I'm not sure whether a pull request is a good fit for 3D LLMs, I will keep it open; you can vote for it by adding a 👍 reaction. New entries should follow the table format shown in the template below.
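A minimal template for a new entry, mirroring the columns of the existing tables; every value below is a placeholder rather than a real paper:

```
Date | Keywords | Institute (first author) | Paper | Publication | Others |
---|---|---|---|---|---|
YYYY-MM-DD | Method name | First author's institution | Full paper title | Venue or arXiv | github / project / website |
```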
If you have any questions about this opinionated list, please get in touch at xianzheng@robots.ox.ac.uk or via WeChat (ID: mxz1997112).
If you find this repository useful, please consider citing this paper:
@article{ma2024llmsstep3dworld,
  title={When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models},
  author={Xianzheng Ma and Yash Bhalgat and Brandon Smart and Shuai Chen and Xinghui Li and Jian Ding and Jindong Gu and Dave Zhenyu Chen and Songyou Peng and Jia-Wang Bian and Philip H. Torr and Marc Pollefeys and Matthias Nießner and Ian D. Reid and Angel X. Chang and Iro Laina and Victor Adrian Prisacariu},
  journal={arXiv preprint arXiv:2405.10255},
  year={2024},
}
This repo is inspired by Awesome-LLM.