We maintain a curated list of Awesome Embodied AI works. Currently, we include simulators, tasks, and datasets in the Embodied AI field.
- Simulators render images and simulate the behavior of agents as if they were situated in a real-world environment.
- Datasets provide training data (e.g. navigation instructions) and ground truths (e.g. navigation trajectories).
(Some simulators come with a dataset of the same name, so the same name may appear in different sections.)
Please feel free to open a pull request or an issue to add papers.
Platforms that simulate real-world environments.
- Habitat-Simulator
- Venue/Year: ICCV 2019 | [paper] [code] [homepage]
- Visual Content: Matterport3D, House3D, AI2-THOR, etc. (partially realistic)
- Action Space: continuous
- AI2-THOR
- Venue/Year: arXiv 2019 | [paper] [code] [homepage]
- Visual Content: AI2-THOR
- Action Space: continuous
- Interactive: Yes
- CHALET
- Matterport3D
- Venue/Year: 3DV 2017 | [paper] [code] [homepage]
- Visual Content: Matterport3D (realistic)
- Action Space: graph based
- MINOS
- Venue/Year: arXiv 2017 | [paper] [code] [homepage]
- Visual Content: SUNCG+Matterport3D (partially realistic)
- Action Space: continuous
- Gibson
- Venue/Year: CVPR 2018 | [paper] [code] [homepage]
- Visual Content: Gibson+2D3DS+Matterport3D (realistic)
- Action Space: continuous
- Interactive: Yes
- House3D
- SUNCG
- Venue/Year: CVPR 2017 | [paper]
- Visual Content: SUNCG
- HoME
- VirtualHome
- SceneNet RGB-D
- Venue/Year: ICCV 2017 | [paper] [code] [homepage]
- Visual Content: SceneNet RGB-D
- Action Space: continuous
- Interactive: Yes
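The action-space labels above can be made concrete: a continuous simulator advances the agent by arbitrary translations along its heading, while a graph-based one (as in Matterport3D-style navigation graphs) only permits hops between adjacent panoramic viewpoints. A minimal sketch, using a made-up three-node graph and hypothetical function names (not any simulator's actual API):

```python
import math

# Hypothetical graph-based action space: viewpoints are nodes,
# and a legal action is a hop to an adjacent viewpoint.
NAV_GRAPH = {
    "vp0": ["vp1"],
    "vp1": ["vp0", "vp2"],
    "vp2": ["vp1"],
}

def graph_step(current_vp, target_vp):
    """Move only along edges of the navigation graph."""
    if target_vp in NAV_GRAPH[current_vp]:
        return target_vp
    return current_vp  # illegal hop: the agent stays in place

# Hypothetical continuous action space: move forward by an arbitrary
# distance along the current heading (in radians).
def continuous_step(x, y, heading, distance):
    return x + distance * math.cos(heading), y + distance * math.sin(heading)

print(graph_step("vp0", "vp1"))              # vp1
print(graph_step("vp0", "vp2"))              # vp0 (not adjacent)
print(continuous_step(0.0, 0.0, 0.0, 0.25))  # (0.25, 0.0)
```

The distinction matters for agent design: a graph-based space reduces navigation to discrete choices among neighbors, while a continuous space requires low-level motion control.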
Embodied task definitions.
- REVERIE - requires an intelligent agent to correctly localize a remote target object (one that cannot be observed from the starting location) specified by a concise, high-level natural language instruction.
- VLN (Vision-and-Language Navigation) - requires an embodied agent to follow natural language instructions to navigate from a starting pose to a goal location.
- VNLA (Vision-based Navigation with Language-based Assistance) - a grounded vision-language task in which an agent with visual perception is guided via language to find objects in photorealistic indoor environments.
- EQA (Embodied Question Answering) - an agent is spawned at a random location in a 3D environment and asked a question. The agent must first navigate to explore the environment, gather the necessary visual information through first-person (egocentric) vision, and then answer the question.
- IQA (Interactive Question Answering) - requires an agent to navigate the scene, acquire a visual understanding of scene elements, interact with objects (e.g. open refrigerators), and plan a series of actions conditioned on the question.
- TOUCHDOWN - requires an agent to first follow navigation instructions in a real-life visual urban environment, and then resolve a location described in natural language to find a hidden object at the goal position.
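Navigation-centric tasks such as VLN are commonly evaluated with success rate and SPL (Success weighted by Path Length), which discounts each success by how much longer the agent's path was than the shortest path. A self-contained sketch of the metric; the episode tuple format here is an assumption for illustration, not any dataset's released format:

```python
def spl(episodes):
    """Success weighted by Path Length over a list of episodes.

    Each episode is (success, shortest, taken), where `shortest` is the
    shortest-path distance from start to goal and `taken` is the length
    of the path the agent actually travelled.
    """
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # A success scores shortest / max(taken, shortest), so a
            # perfectly efficient success scores 1.0; failures score 0.
            total += shortest / max(taken, shortest)
    return total / len(episodes)

# One perfectly efficient success and one failure average to 0.5.
print(spl([(True, 10.0, 10.0), (False, 5.0, 20.0)]))  # 0.5
```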
Embodied datasets built upon simulators.
- REVERIE | CVPR 2020 | based on Matterport3D | [paper] [code]
- language content: navigation instructions
- applicable tasks: REVERIE, VLN, referring expression
- R2R | CVPR 2018 | based on Matterport3D | [paper] [homepage]
- language content: navigation instructions
- applicable tasks: VLN
- VNLA | CVPR 2019 | based on Matterport3D | [paper] [code]
- language content: navigation instructions and assistance
- applicable tasks: VNLA, VLN, referring expression
- HANNA | EMNLP 2019 | based on Matterport3D | [paper] [code]
- language content: navigation instructions and assistance
- applicable tasks: VNLA, VLN, referring expression
- CVDN | CoRL 2019 | based on Matterport3D | [paper] [code] [homepage]
- language content: dialogues
- applicable tasks: VNLA, VLN
- EQA | CVPR 2018 | based on House3D | [paper] [code] [homepage]
- language content: question-answer pairs
- applicable tasks: EQA, VLN
- IQUADv1 | CVPR 2018 | based on AI2-THOR | [paper] [code]
- language content: question-answer pairs
- applicable tasks: IQA, EQA, VLN
- TOUCHDOWN | CVPR 2019 | based on Google Street View | [paper] [code]
- language content: navigation instructions
- applicable tasks: TOUCHDOWN, VLN, referring expression
- Talk the Walk | 2018 | [paper] [code]
- visual content: manually captured neighborhoods of New York City
- language content: navigation dialogues
- applicable tasks: VNLA, VLN
- LANI & CHAI | 2019 | based on CHALET | [paper] [code]
- language content: navigation instructions
- applicable tasks: VLN
- Activity & ActivityPrograms | CVPR 2018 | [paper] [code] [homepage]
- language content: task descriptions
- applicable tasks: VLN
- Habitat | ICCV 2019 | [paper] [code] [homepage]
- language content: navigation instructions, task descriptions, etc.
- applicable tasks: IQA, EQA, VLN, language grounding, etc.
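Most of the Matterport3D-based instruction datasets above (R2R and its derivatives) release episodes as JSON records pairing a viewpoint path with several natural language instructions. The sketch below iterates such records; the field names follow R2R's released format, but the sample record itself is invented for illustration, so check the files of the dataset you actually download:

```python
import json

# An illustrative R2R-style record (abbreviated and invented; real
# entries also carry fields such as heading and distance).
SAMPLE = json.loads("""[
  {"scan": "scene_0001", "path_id": 1,
   "path": ["vp_a", "vp_b", "vp_c"],
   "instructions": ["Walk down the hall and stop at the door.",
                    "Go straight, then wait by the doorway."]}
]""")

def iter_episodes(records):
    """Yield one (path, instruction) training pair per instruction."""
    for rec in records:
        for instr in rec["instructions"]:
            yield rec["path"], instr

pairs = list(iter_episodes(SAMPLE))
print(len(pairs))        # 2
print(pairs[0][0][-1])   # vp_c (the goal viewpoint)
```

Because each path ships with multiple instructions, flattening to one pair per instruction is the usual way to build a VLN training set.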