
CS7389K/Group-Project


TurtleBot3 VLM Perception System


A hybrid vision-language perception system for TurtleBot3 mobile manipulation, combining YOLO11 object detection with Moondream2 VLM reasoning on Jetson Xavier NX.

Overview

An intelligent perception pipeline for TurtleBot3 that combines YOLO11 object detection (30+ FPS) with Moondream2 VLM reasoning (2-3 Hz) to support autonomous manipulation decisions. For each detected object, the system analyzes material properties, physical attributes, and graspability, then chooses a robot action (GRASP, PUSH, or AVOID).
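YOLO11 runs at 30+ FPS, but Moondream2 only manages about 2-3 Hz, so most frames have to be dropped before they reach the VLM. The sketch below shows one way to do that with a simple time-based gate. The `RateGate` class and the 0.45 s period are hypothetical and not taken from this repository. In a ROS2 node, `now` would come from the node's clock.

```python
from typing import Optional


class RateGate:
    """Hypothetical helper: passes at most one event per `period` seconds."""

    def __init__(self, period: float) -> None:
        self.period = period
        self._last_sent: Optional[float] = None

    def allow(self, now: float) -> bool:
        # Always forward the first frame; afterwards, only frames that arrive
        # at least `period` seconds after the last forwarded one.
        if self._last_sent is None or now - self._last_sent >= self.period:
            self._last_sent = now
            return True
        return False


# Simulate one second of 30 FPS camera frames with an assumed ~2.2 Hz VLM budget.
gate = RateGate(period=0.45)
frame_times = [i / 30.0 for i in range(30)]
forwarded = [t for t in frame_times if gate.allow(t)]
print(len(forwarded))  # → 3
```

A time-based gate keeps the VLM's input rate bounded no matter how fast the camera publishes. A common alternative is to skip frames while an inference request is still in flight.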


Features

  • Modular ROS2 Architecture: Separate packages for VLM inference and perception
  • Multiple Deployment Options: On-device, remote server, or hybrid configurations
  • Jetson-Optimized: 8-bit/4-bit quantization, memory-efficient CUDA allocation
  • Easy Setup: Automated installation script that also configures the workspace
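Quantization matters on this hardware because the Xavier NX's 8 GB of memory is shared between the CPU and GPU. The following back-of-the-envelope sketch estimates weight memory as bytes = parameters × bits / 8. It assumes roughly 1.86 billion parameters for Moondream2, which is the commonly quoted figure; check the model card for the revision you download. The estimate ignores activations, caches, and CUDA context overhead.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumed parameter count for Moondream2; verify against the model card.
MOONDREAM2_PARAMS = 1_860_000_000

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(MOONDREAM2_PARAMS, bits):.2f} GB")
# → 16-bit: 3.72 GB
# → 8-bit: 1.86 GB
# → 4-bit: 0.93 GB
```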

Prerequisites

  • Hardware:
    • Remote machine capable of running the VLM server
    • TurtleBot3 (Burger/Waffle/Waffle Pi) equipped with an NVIDIA Jetson Xavier NX 8GB
      • USB Camera or Raspberry Pi Camera Module
      • OpenMANIPULATOR-X (optional)
  • Operating System:
    • TurtleBot3: Ubuntu 20.04
    • VLM Server: Ubuntu 22.04 LTS
  • ROS2: Foxy
  • Python: 3.8+

Usage

# On both the remote machine and the TurtleBot3 Jetson:
git clone https://github.com/CS7389K/Group-Project.git ~/turtlebot3_vlm
cd ~/turtlebot3_vlm

# On the Remote Machine:
# First, start the Flask VLM server (required for vlm_bridge)
python3 scripts/vlm_ros_server.py

# In a separate terminal on the same Remote Machine:
colcon build --symlink-install --packages-select vlm_bridge
source ./install/setup.bash
ros2 launch vlm_bridge vlm_bridge.launch.py

# On the TurtleBot3 Jetson:
colcon build --symlink-install --packages-select turtlebot3_vlm_perception
source ./install/setup.bash
ros2 launch turtlebot3_vlm_perception vlm_with_yolo.launch.py
# Or, without YOLO:
# ros2 launch turtlebot3_vlm_perception vlm.launch.py

Important: The Flask server (vlm_ros_server.py) must be running on the same machine as vlm_bridge to handle VLM inference requests.
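vlm_bridge forwards inference requests to vlm_ros_server.py over HTTP. This README does not document the route or the JSON fields, so the sketch below assumes a hypothetical `POST /infer` endpoint on port 5000 (Flask's default) that accepts a base64-encoded image and a text prompt. It only builds the request with the Python standard library and does not send it.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: adjust to the route vlm_ros_server.py actually exposes.
VLM_URL = "http://localhost:5000/infer"


def build_vlm_request(jpeg_bytes: bytes, prompt: str, url: str = VLM_URL) -> urllib.request.Request:
    """Package an image and a question as a JSON POST request (not sent here)."""
    body = json.dumps({
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),  # hypothetical field name
        "prompt": prompt,                                        # hypothetical field name
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


req = build_vlm_request(b"\xff\xd8fake-jpeg", "Is this object fragile? Answer yes or no.")
print(req.get_method(), req.full_url)  # → POST http://localhost:5000/infer
# To actually send it: urllib.request.urlopen(req, timeout=5).read()
```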

Download VLM Models

cd scripts
python3 download_vlm_models.py --output-dir ./vlm_models

Project Structure

├── src/
│   ├── turtlebot3_vlm_perception/    # TurtleBot3 ROS2 client
│   │   └── launch/                   # Launch files for VLM perception stack
│   └── vlm_bridge/                   # VLM inference server
│       └── launch/                   # Launch files for VLM server & bridges
├── tools/
│   ├── git_fetch_and_build.sh        # Git update and build helper
│   └── restart_camera.sh             # Camera troubleshooting utility
├── scripts/
│   ├── download_vlm_models.py        # Downloads VLM models
│   └── vlm_ros_server.py             # Flask VLM inference server (used by vlm_bridge)
├── pyproject.toml                    # Python project configuration
└── docker-compose.yml                # Docker deployment configuration

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                    TurtleBot3 + Jetson Xavier NX            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────┐         ┌──────────────────────────┐   │
│  │ TurtleBot3      │         │ VLM Reasoner             │   │
│  │ Camera          │ Image   │                          │   │
│  │ Publisher       │────────>│                          │   │
│  │ (30 FPS)        │         │ ┌────────────────────┐   │   │
│  └─────────────────┘         │ │ YOLO11 Detection   │   │   │
│         │                    │ │ (30+ FPS)          │   │   │
│    RPi Camera v2             │ └─────────┬──────────┘   │   │
│    (GStreamer HW Accel)      │           │              │   │
│                              │           v              │   │
│                              │ ┌────────────────────┐   │   │
│                              │ │ Moondream2 VLM     │   │   │
│                              │ │ (Physics Analysis) │   │   │
│                              │ │ (2-3 Hz)           │   │   │
│                              │ └─────────┬──────────┘   │   │
│                              │           │              │   │
│                              │           v              │   │
│                              │ ┌────────────────────┐   │   │
│                              │ │ Decision Fusion    │   │   │
│                              │ │ GRASP/PUSH/AVOID   │   │   │
│                              │ └────────────────────┘   │   │
│                              └──────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
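The Decision Fusion stage in the diagram above combines a YOLO detection with the VLM's physical-property assessment and produces one of GRASP, PUSH, or AVOID. This README does not describe the repository's actual rules, so the following is a hypothetical rule-based sketch. The `VlmAttributes` fields, the 0.5 confidence cutoff, and the 200-pixel width limit are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    GRASP = "GRASP"
    PUSH = "PUSH"
    AVOID = "AVOID"


@dataclass
class Detection:
    label: str          # YOLO class name
    confidence: float   # YOLO score in [0, 1]
    width_px: int       # bounding-box width in pixels


@dataclass
class VlmAttributes:
    # Hypothetical yes/no properties parsed from Moondream2 answers.
    fragile: bool
    graspable: bool
    heavy: bool


def fuse(det: Detection, attrs: VlmAttributes,
         min_conf: float = 0.5, max_grasp_width_px: int = 200) -> Action:
    """Hypothetical fusion rules that fall back to AVOID when in doubt."""
    if det.confidence < min_conf or attrs.fragile or attrs.heavy:
        return Action.AVOID  # unreliable detection, breakable, or too heavy
    if attrs.graspable and det.width_px <= max_grasp_width_px:
        return Action.GRASP  # graspable and narrow enough for the gripper
    return Action.PUSH       # sturdy but not graspable: nudge it aside


cup = Detection("cup", 0.91, 120)
print(fuse(cup, VlmAttributes(fragile=False, graspable=True, heavy=False)).value)  # → GRASP
```

Checking the AVOID conditions first keeps the robot conservative: if either perception stage is unsure, or the object is unsafe to touch, it will not act on the object.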

ROS2 Topics

  • /camera/image_raw (sensor_msgs/Image) - Camera feed from TurtleBot3
  • /yolo/detections (std_msgs/String) - YOLO object detection results with bounding boxes
  • /vlm/response (std_msgs/String) - VLM inference responses (from client nodes)
  • /vlm/inference_result (std_msgs/String) - Network bridge VLM results
  • /vlm/analysis_result (std_msgs/String) - Integrated YOLO+VLM analysis
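All of the result topics above carry structured data as std_msgs/String. The exact payload format is not documented here. The following minimal sketch assumes JSON-encoded detections and shows how a publisher could serialize /yolo/detections and how a subscriber could parse it. The field names `label`, `confidence`, and `bbox` and the (x1, y1, x2, y2) box layout are hypothetical.

```python
import json
from typing import List, Tuple

# Hypothetical input tuple: (label, confidence, (x1, y1, x2, y2)) in pixels.
RawDetection = Tuple[str, float, Tuple[int, int, int, int]]


def encode_detections(dets: List[RawDetection]) -> str:
    """Serialize YOLO detections into a string for std_msgs/String.data."""
    return json.dumps([
        {"label": label, "confidence": round(conf, 3), "bbox": list(bbox)}
        for label, conf, bbox in dets
    ])


def decode_detections(data: str, min_conf: float = 0.0) -> List[dict]:
    """Parse the string back, dropping detections below `min_conf`."""
    return [d for d in json.loads(data) if d["confidence"] >= min_conf]


payload = encode_detections([
    ("cup", 0.912, (40, 60, 160, 200)),
    ("book", 0.41, (200, 80, 320, 150)),
])
print(decode_detections(payload, min_conf=0.5))
# → [{'label': 'cup', 'confidence': 0.912, 'bbox': [40, 60, 160, 200]}]
```

In an rclpy node, the returned string would be assigned to the `data` field of a `std_msgs.msg.String` before publishing. On the receiving side, `decode_detections(msg.data)` would be called inside the subscription callback.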

Troubleshooting

If the camera fails to open:

# Release camera resources
./tools/restart_camera.sh


License

This project is licensed under the MIT License - see the LICENSE file for details.


Built with ❤️ using ROS2 Foxy and Jetson Xavier NX
