spot_yolo_perception_ros2

A ROS2-based multi-camera perception system with YOLOv5 and RGB-D 3D localization for autonomous exploration and grasping on Boston Dynamics Spot.


🚀 Overview

This repository implements the perception module of an autonomous robotic system for exploration and object retrieval on a Boston Dynamics Spot platform.

Built on ROS2, this package integrates real-time object detection, RGB-D based 3D localization, and perception-driven interaction.

The system forms part of a closed-loop pipeline combining perception, navigation, and manipulation.


🔥 Highlights

  • Real-time object detection using YOLOv5 (mAP@0.5 = 0.962)
  • Multi-camera RGB-D perception (5 cameras on Spot)
  • 2D → 3D projection using depth + camera intrinsics
  • TF-based transformation into global map frame (spot/map)
  • Multi-view fusion with spatial clustering
  • Temporal filtering for robust detection
  • Automatic navigation goal generation
  • Pixel-level detection for grasping (hand camera)
  • Dataset pipeline from ROS bag → training images

βš™οΈ Dependencies

πŸ“¦ System Requirements

  • Ubuntu 22.04
  • ROS2 (Humble recommended)
  • Python β‰₯ 3.8

πŸ“₯ Install ROS2 Dependencies

sudo apt update
sudo apt install python3-pip ros-$ROS_DISTRO-vision-msgs

📦 Install Python Dependencies

pip3 install yolov5
pip3 install opencv-python numpy

πŸ› οΈ Installation

1️⃣ Create Workspace

mkdir -p ~/yolov5_ws/src
cd ~/yolov5_ws/src

2️⃣ Clone Repository

git clone https://github.com/RobbinSeason/spot_yolo_perception_ros2.git

3️⃣ Build

cd ~/yolov5_ws
colcon build
source install/setup.bash

🧠 System Architecture

1️⃣ Global Detection (Exploration Stage)

Processes multi-camera RGB-D data to detect and localize objects in 3D.

Pipeline:

  1. YOLOv5 detection
  2. Bounding box center extraction
  3. Depth lookup (median filtering)
  4. 2D → 3D projection
  5. TF transform → spot/map
  6. Multi-camera clustering
  7. Navigation goal generation
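Steps 3–4 of the pipeline follow standard pinhole back-projection from a median-filtered depth patch. A minimal sketch, assuming intrinsics fx, fy, cx, cy taken from the camera's CameraInfo message; the function and parameter names are illustrative, not the package's actual API:

```python
import numpy as np

def pixel_to_3d(u, v, depth_image, fx, fy, cx, cy, patch=5):
    """Project pixel (u, v) into the camera frame using a median-filtered
    depth patch and the pinhole intrinsics (fx, fy, cx, cy)."""
    half = patch // 2
    window = depth_image[max(v - half, 0):v + half + 1,
                         max(u - half, 0):u + half + 1]
    valid = window[np.isfinite(window) & (window > 0)]
    if valid.size == 0:
        return None                  # no usable depth at this pixel
    z = float(np.median(valid))      # median is robust against depth noise
    x = (u - cx) * z / fx            # back-project with the pinhole model
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```

The resulting point is expressed in the camera's optical frame; step 5 then transforms it into spot/map via TF.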

Outputs:

  • object_position_3d (vision_msgs/Detection3DArray)
  • /goal_pose (geometry_msgs/PoseStamped)
  • object_markers (RViz visualization)
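Multi-camera clustering (step 6) merges detections of the same object seen from several cameras into one instance. A greedy radius-based sketch, assuming the instance_radius parameter plays this role; the package's actual clustering may differ:

```python
import numpy as np

def cluster_detections(points, radius=2.0):
    """Greedily assign each 3D detection (in spot/map) to the first cluster
    whose centroid lies within `radius` metres; otherwise start a new cluster."""
    clusters = []
    for p in points:
        p = np.asarray(p, dtype=float)
        for c in clusters:
            if np.linalg.norm(p - c["centroid"]) <= radius:
                c["members"].append(p)
                c["centroid"] = np.mean(c["members"], axis=0)  # refine centroid
                break
        else:
            clusters.append({"centroid": p, "members": [p]})
    return [c["centroid"] for c in clusters]
```

Each fused centroid can then seed one navigation goal, offset toward the robot by work_offset_distance.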

2️⃣ Hand Camera Detection (Grasping Stage)

Provides precise pixel-level localization for grasping.

Pipeline:

  1. Trigger detection via service
  2. YOLO inference per frame
  3. Select highest-confidence detection
  4. Reject unstable detections
  5. Temporal fusion (median filtering)
  6. Output stable pixel coordinates
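Steps 4–6 reject jumpy detections and fuse the survivors. A minimal sketch built around the stable_frames and max_jump_px parameters; the class and method names are illustrative, not the node's actual implementation:

```python
import numpy as np

class PixelStabilizer:
    """Accept a grasp pixel only after `stable_frames` consecutive detections
    that each move less than `max_jump_px`; output the per-axis median."""

    def __init__(self, stable_frames=3, max_jump_px=25.0):
        self.stable_frames = stable_frames
        self.max_jump_px = max_jump_px
        self.buffer = []

    def update(self, u, v):
        p = np.array([u, v], dtype=float)
        if self.buffer and np.linalg.norm(p - self.buffer[-1]) > self.max_jump_px:
            self.buffer = []          # detection jumped: restart the window
        self.buffer.append(p)
        if len(self.buffer) < self.stable_frames:
            return None               # not stable yet
        return np.median(self.buffer[-self.stable_frames:], axis=0)
```

Only once update() returns a point would the node publish hand_object_pixel_center.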

Outputs:

  • hand_object_pixel_center (geometry_msgs/PointStamped)

📷 Multi-Camera Setup

The system uses 5 RGB-D cameras:

  • /spot/camera/frontleft/image/compressed
  • /spot/camera/frontright/image/compressed
  • /spot/camera/left/image/compressed
  • /spot/camera/right/image/compressed
  • /spot/camera/back/image/compressed

Each camera is paired with:

  • Depth image
  • Camera info
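Each RGB frame must be matched to a depth frame taken at nearly the same time; the sync_mode and max_wait_s parameters suggest an approximate-time policy. A simplified timestamp-matching sketch, not the package's actual synchronizer:

```python
def match_nearest(rgb_stamps, depth_stamps, max_wait_s=0.05):
    """Pair each RGB timestamp with the closest depth timestamp,
    dropping pairs that differ by more than max_wait_s seconds."""
    pairs = []
    for t in rgb_stamps:
        best = min(depth_stamps, key=lambda d: abs(d - t), default=None)
        if best is not None and abs(best - t) <= max_wait_s:
            pairs.append((t, best))
    return pairs
```

In ROS2 this role is typically filled by a message_filters approximate-time synchronizer over the image, depth, and camera-info subscriptions.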

🚀 Usage

1️⃣ Run Global Detection

ros2 launch yolov5_ros2 yolov5_ros2_launch.py

2️⃣ Run Hand Detection

ros2 launch yolov5_ros2 hand_yolo_launch.py

3️⃣ Trigger Hand Detection

ros2 service call /hand_pixel/trigger std_srvs/srv/Trigger

📡 ROS Topics

Subscribed

| Topic | Type |
| --- | --- |
| RGB images | sensor_msgs/CompressedImage |
| Depth images | sensor_msgs/Image |
| Camera info | sensor_msgs/CameraInfo |
| Hand camera | sensor_msgs/Image |

Published

| Topic | Type | Description |
| --- | --- | --- |
| object_position_3d | Detection3DArray | 3D detections |
| /goal_pose | PoseStamped | Navigation target |
| object_markers | Marker | RViz visualization |
| hand_object_pixel_center | PointStamped | Pixel grasp point |

βš™οΈ Parameters

Global Detection

device: "cpu"
model: "best"

work_offset_distance: 0.4
instance_radius: 2.0
buffer_size: 3
publish_score_thresh: 0.6

sync_mode: "auto"
max_wait_s: 0.05

Hand Detection

score_thresh: 0.5
stable_frames: 3
max_jump_px: 25.0
timeout_sec: 50.0

🧪 Dataset Collection (ROS Bag → Images)

▶️ Run Export Script

python3 bag_to_yolo_images.py

💡 Workflow

ros2 bag play your_data.bag
python3 bag_to_yolo_images.py

πŸ“ Output Structure

dataset_raw/
β”œβ”€β”€ frontleft/images/
β”œβ”€β”€ frontright/images/
β”œβ”€β”€ left/images/
β”œβ”€β”€ right/images/
└── back/images/
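Since CompressedImage payloads already carry JPEG-encoded bytes, the export step can write them straight into the per-camera layout above. A minimal sketch with an assumed filename pattern; the actual script's naming may differ:

```python
import os

def save_compressed_frame(camera, data, frame_idx, out_root="dataset_raw"):
    """Write one CompressedImage payload (JPEG bytes) into the
    dataset_raw/<camera>/images/ folder used by the dataset pipeline."""
    out_dir = os.path.join(out_root, camera, "images")
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{camera}_{frame_idx:06d}.jpg")
    with open(path, "wb") as f:
        f.write(bytes(data))         # payload is already JPEG-encoded
    return path
```

A node would call this from the callback of each /spot/camera/*/image/compressed subscription while the bag plays.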

πŸ“ Project Structure

yolov5_ros2/
β”œβ”€β”€ README.md
β”œβ”€β”€ package.xml
β”œβ”€β”€ setup.py

β”œβ”€β”€ config/
β”‚   β”œβ”€β”€ best.pt
β”‚   β”œβ”€β”€ yolov5_params.yaml
β”‚   └── hand_yolo_params.yaml

β”œβ”€β”€ launch/
β”‚   β”œβ”€β”€ yolov5_ros2_launch.py
β”‚   └── hand_yolo_launch.py

β”œβ”€β”€ yolov5_ros2/
β”‚   β”œβ”€β”€ yolo_detect_2d.py
β”‚   └── hand_yolo_pixel_trigger_node.py

β”œβ”€β”€ bag_to_yolo_images.py

⚠️ Notes

  • YOLO weights must be placed in:
    config/best.pt
    
  • TF must provide the transform chain:
    camera_frame → spot/map
    
  • RGB and depth images must be spatially aligned (registered pixel-to-pixel)

🧪 Known Limitations

  • Sensitive to depth noise
  • False positives may affect grasping
  • TF synchronization may impact accuracy

👤 Author

Haoyu Gong
Karlsruhe Institute of Technology (KIT)


📄 License

MIT License
