# InfoNav: A Unified Value Framework Integrating Semantic Value and Information Gain for Zero-Shot Navigation
Paper: Coming soon
Watch the Demo Video - See InfoNav in action!
## Overview

InfoNav is an intelligent navigation system for embodied agents in indoor environments. It combines Vision-Language Models (VLMs) and Large Language Models (LLMs) for semantic-aware object navigation, built around a unified value framework that integrates semantic value and information gain for zero-shot navigation.
## Architecture

InfoNav is built on top of Habitat-Sim and uses ROS for inter-module communication. The system integrates:
- VLM-based Object Detection: GroundingDINO, BLIP2 for open-vocabulary detection and image-text matching
- LLM-based Semantic Reasoning: Room-object association and hypothesis generation
- Unified Value Framework: Combining semantic value with information gain
- Hybrid Exploration Strategy: Frontier-based exploration with semantic value maps
- Robust Navigation Planning: A* path planning with TSP optimization for multi-target scenarios
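To make the unified value idea concrete, here is a minimal sketch of frontier scoring that combines semantic value with information gain, discounted by travel distance. The `Frontier` fields, the weights `w_sem`/`w_info`, and the distance decay are illustrative assumptions, not the repository's actual implementation (which lives in `src/planner/`):

```python
import math
from dataclasses import dataclass

@dataclass
class Frontier:
    """A candidate exploration frontier (hypothetical representation)."""
    position: tuple          # (x, y) in map coordinates
    semantic_value: float    # VLM/LLM-derived relevance to the target, in [0, 1]
    unknown_cells: int       # unobserved map cells expected to be revealed

def information_gain(frontier: Frontier, max_cells: int = 500) -> float:
    """Normalize expected newly observed area into [0, 1]."""
    return min(frontier.unknown_cells / max_cells, 1.0)

def unified_value(frontier: Frontier, agent_pos: tuple,
                  w_sem: float = 0.6, w_info: float = 0.4,
                  dist_decay: float = 0.05) -> float:
    """Combine semantic value and information gain, discounted by distance."""
    dist = math.dist(agent_pos, frontier.position)
    score = w_sem * frontier.semantic_value + w_info * information_gain(frontier)
    return score * math.exp(-dist_decay * dist)

def select_frontier(frontiers, agent_pos):
    """Pick the frontier with the highest unified value."""
    return max(frontiers, key=lambda f: unified_value(f, agent_pos))
```

With this weighting, a semantically promising but farther frontier can still beat a nearby one that only offers raw coverage; the real trade-off parameters are set in the planner's configuration.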
## Requirements

- OS: Ubuntu 20.04
- GPU: NVIDIA GPU with CUDA support and 8GB+ VRAM (tested on RTX 4070 Ti)
- Python: 3.9+
- ROS: ROS Noetic
## Installation

### System Dependencies

```bash
sudo apt update
sudo apt-get install libarmadillo-dev libompl-dev
```

### Third-Party Repositories

```bash
git clone git@github.com:WongKinYiu/yolov7.git                # yolov7
git clone https://github.com/IDEA-Research/GroundingDINO.git  # GroundingDINO
```

### Model Weights

Download the following model weights and place them in the `data/` directory:

- `mobile_sam.pt`: https://github.com/ChaoningZhang/MobileSAM/tree/master/weights/mobile_sam.pt
- `groundingdino_swint_ogc.pth`:

  ```bash
  wget -O data/groundingdino_swint_ogc.pth https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
  ```

- `yolov7-e6e.pt`:

  ```bash
  wget -O data/yolov7-e6e.pt https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e.pt
  ```
### Conda Environment

```bash
git clone git@github.com:Pandakingxbc/InfoNav.git
cd InfoNav
conda env create -f infonav_environment.yaml -y
conda activate infonav
```

### PyTorch

Install the PyTorch build that matches your CUDA version (check it with `nvcc --version`):

```bash
# CUDA 11.8
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu121
# CUDA 12.4
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124
```

### Habitat

We recommend using habitat-lab v0.3.1:
```bash
# habitat-lab v0.3.1
git clone https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab && git checkout tags/v0.3.1
pip install -e habitat-lab
# habitat-baselines v0.3.1
pip install -e habitat-baselines
```

Note: Any numpy-related errors will not affect subsequent operations, as long as numpy==1.23.5 and numba==0.60.0 are correctly installed.
### InfoNav Package

```bash
pip install salesforce-lavis==1.0.2
cd ..  # Return to the InfoNav directory
pip install -e .
```
### DeepSeek API Key

Set up the DeepSeek API key for LLM hypothesis analysis:

```bash
export DEEPSEEK_API_KEY="your_api_key_here"
```

To make it persistent, add it to your `~/.bashrc`:

```bash
echo 'export DEEPSEEK_API_KEY="your_api_key_here"' >> ~/.bashrc
source ~/.bashrc
```

Alternatively, you can skip this step and use our pre-generated LLM output results in `llm/answers/` instead.
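For reference, the exported key can be consumed like this. This is a minimal sketch assuming DeepSeek's OpenAI-compatible chat-completions endpoint and the `deepseek-chat` model; it is not code from this repository, and the prompt wording is made up:

```python
import json
import os
import urllib.request

API_URL = "https://api.deepseek.com/chat/completions"  # OpenAI-compatible endpoint

def build_request(prompt: str, model: str = "deepseek-chat") -> dict:
    """Assemble a chat-completion payload for an object-location query."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You reason about which rooms likely contain a target object."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.0,
    }

def query_llm(prompt: str) -> str:
    """Send one prompt to DeepSeek, reading the key set in the step above."""
    api_key = os.environ["DEEPSEEK_API_KEY"]
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

If you use the pre-generated answers in `llm/answers/`, none of this is needed at runtime.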
### ROS Workspace

```bash
# Install ROS Noetic first: http://wiki.ros.org/noetic/Installation/Ubuntu
catkin init
catkin config --extend /opt/ros/noetic
catkin build
source devel/setup.bash
```

## Datasets

Official reference: https://github.com/facebookresearch/habitat-lab/blob/main/DATASETS.md
Note: Both HM3D and MP3D scene datasets require applying for official permission first.
### HM3D Scenes

- Apply for permission at https://matterport.com/habitat-matterport-3d-research-dataset.
- Download https://api.matterport.com/resources/habitat/hm3d-val-habitat-v0.2.tar.
- Save `hm3d-val-habitat-v0.2.tar` to the `InfoNav/` directory, then run:

```bash
mkdir -p data/scene_datasets/hm3d/val
mv hm3d-val-habitat-v0.2.tar data/scene_datasets/hm3d/val/
cd data/scene_datasets/hm3d/val
tar -xvf hm3d-val-habitat-v0.2.tar
rm hm3d-val-habitat-v0.2.tar
cd ../..
ln -s hm3d hm3d_v0.2  # Create a symbolic link for hm3d_v0.2
```

### MP3D Scenes

- Apply for download access at https://niessner.github.io/Matterport/.
- Once your application is approved, you will receive a `download_mp.py` script; run it with Python 2.7 to download the dataset.
- After downloading, place the files in `InfoNav/data/scene_datasets`.
### ObjectNav Episode Datasets

```bash
# Create the necessary directory structure
mkdir -p data/datasets/objectnav/hm3d
mkdir -p data/datasets/objectnav/mp3d

# HM3D-v0.1
wget -O data/datasets/objectnav/hm3d/v1.zip https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/hm3d/v1/objectnav_hm3d_v1.zip
unzip data/datasets/objectnav/hm3d/v1.zip -d data/datasets/objectnav/hm3d && mv data/datasets/objectnav/hm3d/objectnav_hm3d_v1 data/datasets/objectnav/hm3d/v1 && rm data/datasets/objectnav/hm3d/v1.zip

# HM3D-v0.2
wget -O data/datasets/objectnav/hm3d/v2.zip https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/hm3d/v2/objectnav_hm3d_v2.zip
unzip data/datasets/objectnav/hm3d/v2.zip -d data/datasets/objectnav/hm3d && mv data/datasets/objectnav/hm3d/objectnav_hm3d_v2 data/datasets/objectnav/hm3d/v2 && rm data/datasets/objectnav/hm3d/v2.zip

# MP3D
wget -O data/datasets/objectnav/mp3d/v1.zip https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/mp3d/v1/objectnav_mp3d_v1.zip
unzip data/datasets/objectnav/mp3d/v1.zip -d data/datasets/objectnav/mp3d/v1 && rm data/datasets/objectnav/mp3d/v1.zip
```

Make sure that the `data` folder has the following structure:
```
data
├── datasets
│   └── objectnav
│       ├── hm3d
│       │   ├── v1
│       │   │   ├── train
│       │   │   ├── val
│       │   │   └── val_mini
│       │   └── v2
│       │       ├── train
│       │       ├── val
│       │       └── val_mini
│       └── mp3d
│           └── v1
│               ├── train
│               ├── val
│               └── val_mini
├── scene_datasets
│   ├── hm3d
│   │   └── val
│   │       ├── 00800-TEEsavR23oF
│   │       ├── 00801-HaxA7YrQdEC
│   │       └── ...
│   ├── hm3d_v0.2 -> hm3d
│   └── mp3d
│       ├── 17DRP5sb8fy
│       ├── 1LXtFkjw3qL
│       └── ...
├── groundingdino_swint_ogc.pth
├── mobile_sam.pt
└── yolov7-e6e.pt
```
Note that `train` and `val_mini` are not required; you can delete them.
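Before launching an evaluation, a quick sanity check of this layout can save a failed run. The helper below is not part of the repository; its required-path list simply mirrors the tree above (with the optional `train`/`val_mini` omitted):

```python
from pathlib import Path

# Paths the evaluation expects under data/, per the tree above.
REQUIRED = [
    "datasets/objectnav/hm3d/v1/val",
    "datasets/objectnav/hm3d/v2/val",
    "datasets/objectnav/mp3d/v1/val",
    "scene_datasets/hm3d/val",
    "scene_datasets/mp3d",
    "groundingdino_swint_ogc.pth",
    "mobile_sam.pt",
    "yolov7-e6e.pt",
]

def missing_paths(data_root: str = "data") -> list:
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [p for p in REQUIRED if not (root / p).exists()]

if __name__ == "__main__":
    for p in missing_paths():
        print(f"missing: data/{p}")
```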
## Usage

### 1. Start the VLM Servers

The system relies on several VLM servers (GroundingDINO, BLIP2-ITM, SAM, YOLOv7, D-FINE). Start them all at once using the provided script:

```bash
# Requires tmux: sudo apt-get install tmux
chmod +x scripts/start_vlm_servers.sh
./scripts/start_vlm_servers.sh start infonav
```

Wait until all servers finish loading (monitor with `tmux attach -t vlm_servers`). Server URLs:

- GroundingDINO: http://localhost:12181
- BLIP2-ITM: http://localhost:12182
- SAM: http://localhost:12183
- YOLOv7: http://localhost:12184
- D-FINE: http://localhost:12185

To stop the servers: `./scripts/start_vlm_servers.sh stop`
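As a sketch of how a client might talk to one of these servers, the snippet below sends an image and a text prompt to the GroundingDINO port. The JSON payload shape (`image` as base64, `caption`) is an assumption for illustration only; check `vlm/` for the actual request format used by the system:

```python
import base64
import json
import urllib.request

def build_payload(image_bytes: bytes, caption: str) -> bytes:
    """Encode an image and text prompt as a JSON request body (assumed schema)."""
    return json.dumps({
        "image": base64.b64encode(image_bytes).decode(),
        "caption": caption,
    }).encode()

def detect_objects(image_path: str, caption: str,
                   url: str = "http://localhost:12181") -> dict:
    """POST an image to the (assumed) GroundingDINO server and return its JSON reply."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), caption)
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```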
### 2. Start the ROS Core

```bash
roscore
```

### 3. Launch the Exploration Node

```bash
source devel/setup.bash
roslaunch exploration_manager exploration.launch
```

### 4. Run the Evaluation

```bash
python habitat_evaluation.py --config config/habitat_eval_hm3dv2.yaml
```

Main configuration files:
- `config/habitat_eval_hm3dv1.yaml` - HM3D-v1 dataset config
- `config/habitat_eval_hm3dv2.yaml` - HM3D-v2 dataset config
- `config/habitat_eval_mp3d.yaml` - MP3D dataset config
- `src/planner/exploration_manager/config/algorithm.xml` - Algorithm parameters
For testing and debugging:

```bash
python habitat_manual_control.py
```

## Project Structure

```
InfoNav/
├── habitat_evaluation.py     # Main evaluation loop
├── params.py                 # Global constants
├── src/planner/              # C++ ROS planning modules
│   ├── exploration_manager/  # FSM and exploration logic
│   ├── plan_env/             # Map representations
│   └── path_searching/       # A* path planning
├── vlm/                      # Vision-Language Models
├── llm/                      # Large Language Models
├── basic_utils/              # Utility functions
├── habitat2ros/              # Habitat-ROS bridge
└── config/                   # Configuration files
```
## Citation

If you find this work useful, please cite:

```bibtex
@article{infonav2025,
  title={InfoNav: A Unified Value Framework Integrating Semantic Value and Information Gain for Zero-Shot Navigation},
  author={},
  journal={},
  year={2025}
}
```

## License

This project is released under the MIT License.
## Acknowledgments

This project builds upon several excellent open-source projects.

## Contact

For questions or issues, please open an issue on GitHub.

## Roadmap

- Release real-world deployment code
- Release ROS 2 support
