28 changes: 2 additions & 26 deletions docs/extensions/perception.md
@@ -1,29 +1,5 @@
--8<-- "src/rai_extensions/rai_perception/README.md:sec1"
Agents create two ROS 2 Nodes: `grounding_dino` and `grounded_sam` using [ROS2Connector](../API_documentation/connectors/ROS_2_Connectors.md).
These agents can be triggered by ROS2 services:

- `grounding_dino_classify`: `rai_interfaces/srv/RAIGroundingDino`
- `grounded_sam_segment`: `rai_interfaces/srv/RAIGroundedSam`

> [!TIP]
>
> If you wish to integrate open-set detection into your ros2 launch file, a premade launch
> file can be found in `rai/src/rai_bringup/launch/openset.launch.py`

> [!NOTE]
> The weights will be downloaded to `~/.cache/rai` directory.

## RAI Tools

`rai_perception` package contains tools that can be used by [RAI LLM agents](../tutorials/walkthrough.md)
enhance their perception capabilities. For more information on RAI Tools see
[Tool use and development](../tutorials/tools.md) tutorial.

--8<-- "src/rai_extensions/rai_perception/README.md:sec3"

> [!TIP]
>
> you can try example below with [rosbotxl demo](../demos/rosbot_xl.md) binary.
> The binary exposes `/camera/camera/color/image_raw` and `/camera/camera/depth/image_raw` topics.

--8<-- "src/rai_extensions/rai_perception/README.md:sec4"

--8<-- "src/rai_extensions/rai_perception/README.md:sec5"
188 changes: 89 additions & 99 deletions src/rai_extensions/rai_perception/README.md
@@ -2,125 +2,139 @@

# RAI Perception

This package provides ROS2 integration with [Idea-Research GroundingDINO Model](https://github.com/IDEA-Research/GroundingDINO) and [Grounded-SAM-2, RobotecAI fork](https://github.com/RobotecAI/Grounded-SAM-2) for object detection, segmentation, and gripping point calculation. The `GroundedSamAgent` and `GroundingDinoAgent` are ROS2 service nodes that can be readily added to ROS2 applications. It also provides tools that can be used with [RAI LLM agents](../tutorials/walkthrough.md) to construct conversational scenarios.
RAI Perception brings powerful computer vision capabilities to your ROS2 applications. It integrates [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) and [Grounded-SAM-2](https://github.com/RobotecAI/Grounded-SAM-2) to detect objects, create segmentation masks, and calculate gripping points.

In addition to these building blocks, this package includes utilities to facilitate development, such as a ROS2 client that demonstrates interactions with agent nodes.
The package includes two ready-to-use ROS2 service nodes (`GroundedSamAgent` and `GroundingDinoAgent`) that you can easily add to your applications. It also provides tools that work seamlessly with [RAI LLM agents](../tutorials/walkthrough.md) to build conversational robot scenarios.

## Installation
## Prerequisites

Before installing `rai-perception`, ensure you have:

While installing `rai_perception` via Pip is being actively worked on, to incorporate it into your application, you will need to set up a ROS2 workspace.
1. **ROS2 installed** (Jazzy recommended, or Humble). If you don't have ROS2 yet, follow the official installation guide for [Jazzy](https://docs.ros.org/en/jazzy/Installation.html) or [Humble](https://docs.ros.org/en/humble/Installation.html).
2. **Python 3.8+** and `pip` installed (usually pre-installed on Ubuntu).
3. **NVIDIA GPU** with CUDA support (required for practical inference performance).
4. **wget** installed (required for downloading model weights):
```bash
sudo apt install wget
```

### ROS2 Workspace Setup
## Installation

Create a ROS2 workspace and copy this package:
**Step 1:** Source ROS2 in your terminal:

```bash
mkdir -p ~/rai_perception_ws/src
cd ~/rai_perception_ws/src

# only checkout rai_perception package
git clone --depth 1 --branch main https://github.com/RobotecAI/rai.git temp
cd temp
git archive --format=tar --prefix=rai_perception/ HEAD:src/rai_extensions/rai_perception | tar -xf -
mv rai_perception ../rai_perception
cd ..
rm -rf temp
# For ROS2 Jazzy (recommended)
source /opt/ros/jazzy/setup.bash

# For ROS2 Humble
source /opt/ros/humble/setup.bash
```

### ROS2 Dependencies
**Step 2:** Install ROS2 dependencies. `rai-perception` depends on ROS2 interface packages that need to be installed separately:

```bash
# Update package lists first
sudo apt update

# Install rai_interfaces as a debian package
sudo apt install ros-jazzy-rai-interfaces # or ros-humble-rai-interfaces for Humble
```
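
If you want a quick sanity check that the interfaces are visible to Python, the snippet below simply imports the two service types named later in this README; it is an optional check, not part of the installation.

```python
# Optional sanity check: these imports fail if ros-<distro>-rai-interfaces
# is missing or if ROS2 was not sourced in this terminal.
from rai_interfaces.srv import RAIGroundedSam, RAIGroundingDino

print(RAIGroundingDino.Request())  # prints an empty request if all is well
```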

Add required ROS dependencies. From the workspace root, run
**Step 3:** Install `rai-perception` via pip:

```bash
rosdep install --from-paths src --ignore-src -r
pip install rai-perception
```

### Build and Run
> [!TIP]
> It's recommended to install `rai-perception` in a virtual environment to avoid conflicts with other Python packages.

Source ROS2 and build:
> [!TIP]
> To avoid sourcing ROS2 in every new terminal, add the source command to your `~/.bashrc` file:
>
> ```bash
> echo "source /opt/ros/jazzy/setup.bash" >> ~/.bashrc # or humble
> ```

```bash
# Source ROS2 (humble or jazzy)
source /opt/ros/${ROS_DISTRO}/setup.bash
<!--- --8<-- [end:sec1] -->

# Build workspace
cd ~/rai_perception_ws
colcon build --symlink-install
<!--- --8<-- [start:sec4] -->

# Source ROS2 packages
source install/setup.bash
```
## Getting Started

### Python Dependencies
This section provides a step-by-step guide to get you up and running with RAI Perception.

`rai_perception` depends on `rai-core` and `sam2`. There are many ways to set up a virtual environment and install these dependencies. Below, we provide an example using Poetry.
### Quick Start

**Step 1:** Copy the following template to `pyproject.toml` in your workspace root, updating it according to your directory setup:
After installing `rai-perception`, launch the perception agents:

```toml
# rai_perception_project pyproject template
[tool.poetry]
name = "rai_perception_ws"
version = "0.1.0"
description = "ROS2 workspace for RAI perception"
package-mode = false
**Step 1:** Open a terminal and source ROS2:

[tool.poetry.dependencies]
python = "^3.10, <3.13"
rai-core = ">=2.5.4"
rai-perception = {path = "src/rai_perception", develop = true}
```bash
source /opt/ros/jazzy/setup.bash # or humble
```

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
**Step 2:** Launch the perception agents:

```bash
python -m rai_perception.scripts.run_perception_agents
```

**Step 2:** Install dependencies:
> [!NOTE]
> The weights will be downloaded to the `~/.cache/rai` directory on first use.

The agents create two ROS 2 nodes, `grounding_dino` and `grounded_sam`, using [ROS2Connector](../API_documentation/connectors/ROS_2_Connectors.md).
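
For orientation, the launcher boils down to very little code. The sketch below is a reconstruction based on the imports visible in this PR (`rclpy`, `wait_for_shutdown`, and the two agent classes); the constructor arguments and the exact `run()`/`wait_for_shutdown` signatures are assumptions, so treat it as illustrative rather than a copy of the real script.

```python
# Minimal sketch of a perception-agent launcher (assumed API, see note above).
import rclpy
from rai.agents import wait_for_shutdown
from rai_perception.agents import GroundedSamAgent, GroundingDinoAgent

rclpy.init()
agents = [GroundingDinoAgent(), GroundedSamAgent()]
for agent in agents:
    agent.run()  # assumption: each agent starts its ROS2 service node
wait_for_shutdown(agents)  # assumption: blocks until Ctrl+C, then cleans up
rclpy.shutdown()
```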

First, we create Virtual Environment with Poetry:
### Testing with Example Client

The `rai_perception/talker.py` example demonstrates how to use the perception services for object detection and segmentation. It shows the complete pipeline: GroundingDINO for object detection followed by GroundedSAM for instance segmentation, with visualization output.

**Step 1:** Open a terminal and source ROS2:

```bash
cd ~/rai_perception_ws
poetry lock
poetry install
source /opt/ros/jazzy/setup.bash # or humble
```

Now, we are ready to launch perception agents:
**Step 2:** Launch the perception agents:

```bash
# Activate virtual environment
source "$(poetry env info --path)"/bin/activate
export PYTHONPATH
PYTHONPATH="$(dirname "$(dirname "$(poetry run which python)")")/lib/python$(poetry run python --version | awk '{print $2}' | cut -d. -f1,2)/site-packages:$PYTHONPATH"
python -m rai_perception.scripts.run_perception_agents
```

# run agents
python src/rai_perception/scripts/run_perception_agents.py
**Step 3:** In a different terminal (remember to source ROS2 first), run the example client:

```bash
source /opt/ros/jazzy/setup.bash # or humble
python -m rai_perception.examples.talker --ros-args -p image_path:="<path-to-image>"
```

You can use any image containing objects like dragons, lizards, or dinosaurs. For example, use the `sample.jpg` from the package's `images` folder. The client will detect these objects and save a visualization with bounding boxes and masks to `masks.png` in the current directory.

> [!TIP]
> To manage ROS 2 + Poetry environment with less friction: Keep build tools (colcon) at system level, use Poetry only for runtime dependencies of your packages.
>
> If you wish to integrate open-set vision into your ros2 launch file, a premade launch
> file can be found in `rai/src/rai_bringup/launch/openset.launch.py`

<!--- --8<-- [end:sec1] -->
### ROS2 Service Interface

`rai-perception` agents create two ROS 2 nodes: `grounding_dino` and `grounded_sam` using [ROS2Connector](../../../docs/API_documentation/connectors/ROS_2_Connectors.md).
These agents can be triggered by ROS2 services:
The agents can be triggered by ROS2 services:

- `grounding_dino_classify`: `rai_interfaces/srv/RAIGroundingDino`
- `grounded_sam_segment`: `rai_interfaces/srv/RAIGroundedSam`
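
To illustrate the call pattern, here is a minimal hand-rolled client for `grounding_dino_classify`. The request field names (`classes`, `box_threshold`, `text_threshold`, `source_img`) are assumptions inferred from the tool code in this PR; check `rai_interfaces/srv/RAIGroundingDino` for the authoritative definition.

```python
# Hypothetical minimal client; request field names are assumptions,
# consult rai_interfaces/srv/RAIGroundingDino for the real definition.
import rclpy
from rclpy.node import Node
from rai_interfaces.srv import RAIGroundingDino

rclpy.init()
node = Node("gdino_client_example")
client = node.create_client(RAIGroundingDino, "grounding_dino_classify")
while not client.wait_for_service(timeout_sec=1.0):
    node.get_logger().info("grounding_dino_classify not available, waiting...")

request = RAIGroundingDino.Request()
request.classes = "dragon, lizard, dinosaur"  # comma-separated prompt
request.box_threshold = 0.35  # defaults taken from GroundingDinoBaseTool
request.text_threshold = 0.45
# request.source_img would carry a sensor_msgs/msg/Image from your camera

future = client.call_async(request)
rclpy.spin_until_future_complete(node, future)
node.get_logger().info(f"Response: {future.result()}")
node.destroy_node()
rclpy.shutdown()
```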

> [!TIP]
>
> If you wish to integrate open-set vision into your ros2 launch file, a premade launch
> file can be found in `rai/src/rai_bringup/launch/openset.launch.py`
<!--- --8<-- [end:sec4] -->

> [!NOTE]
> The weights will be downloaded to `~/.cache/rai` directory.
<!--- --8<-- [start:sec5] -->

## Dive Deeper: Tools and Integration

## RAI Tools
This section provides information for developers looking to integrate RAI Perception tools into their applications.

`rai_perception` package contains tools that can be used by [RAI LLM agents](../../../docs/tutorials/walkthrough.md)
### RAI Tools

The `rai_perception` package contains tools that can be used by [RAI LLM agents](../tutorials/walkthrough.md)
to enhance their perception capabilities. For more information on RAI Tools see
[Tool use and development](../../../docs/tutorials/tools.md) tutorial.
the [Tool use and development](../tutorials/tools.md) tutorial.

<!--- --8<-- [start:sec2] -->

@@ -132,7 +146,7 @@ This tool calls the GroundingDINO service to detect objects from a comma-separated prompt.

> [!TIP]
>
> you can try example below with [rosbotxl demo](../../../docs/demos/rosbot_xl.md) binary.
> You can try the example below with the [rosbotxl demo](../demos/rosbot_xl.md) binary.
> The binary exposes `/camera/camera/color/image_raw` and `/camera/camera/depth/image_rect_raw` topics.

<!--- --8<-- [start:sec3] -->
@@ -198,30 +212,6 @@ with ROS2Context():
I have detected the following items in the picture desk: 2.43m away
```

## Simple ROS2 Client Node Example

The `rai_perception/talker.py` example demonstrates how to use the perception services for object detection and segmentation. It shows the complete pipeline: GroundingDINO for object detection followed by GroundedSAM for instance segmentation, with visualization output.

This example is useful for:

- Testing perception services integration
- Understanding the ROS2 service call patterns
- Seeing detection and segmentation results with bounding boxes and masks

Run the example:

```bash
cd ~/rai_perception_ws
python src/rai_perception/scripts/run_perception_agents.py
```

In a different window, run

```bash
cd ~/rai_perception_ws
ros2 run rai_perception talker --ros-args -p image_path:=src/rai_perception/images/sample.jpg
```

The example will detect objects (dragon, lizard, dinosaur) and save a visualization with bounding boxes and masks to `masks.png`.

<!--- --8<-- [end:sec3] -->

<!--- --8<-- [end:sec5] -->
5 changes: 3 additions & 2 deletions src/rai_extensions/rai_perception/pyproject.toml
@@ -1,6 +1,7 @@
[tool.poetry]
name = "rai_perception"
version = "0.1.2"
name = "rai-perception"
# TODO: update the version once it is published to PyPI
version = "0.1.5"
description = "Package for object detection, segmentation and gripping point detection."
authors = ["Kajetan Rachwał <kajetan.rachwal@robotec.ai>"]
readme = "README.md"
@@ -67,19 +67,46 @@ def _load_model_with_error_handling(self, model_class):
raise e

def _download_weights(self):
self.logger.info(
f"Downloading weights from {self.WEIGHTS_URL} to {self.weights_path}"
)
try:
subprocess.run(
[
"wget",
self.WEIGHTS_URL,
"-O",
self.weights_path,
str(self.weights_path),
"--progress=dot:giga",
]
],
check=True,
capture_output=True,
text=True,
)
# Verify file exists and has reasonable size (> 1MB)
if not os.path.exists(self.weights_path):
raise Exception(f"Downloaded file not found at {self.weights_path}")
file_size = os.path.getsize(self.weights_path)
if file_size < 1024 * 1024:
raise Exception(
f"Downloaded file is too small ({file_size} bytes), expected > 1MB"
)
self.logger.info(
f"Successfully downloaded weights ({file_size / (1024 * 1024):.2f} MB)"
)
except Exception:
self.logger.error("Could not download weights")
raise Exception("Could not download weights")
except subprocess.CalledProcessError as e:
error_msg = e.stderr if e.stderr else e.stdout if e.stdout else str(e)
self.logger.error(f"wget failed: {error_msg}")
# Clean up partial download
if os.path.exists(self.weights_path):
os.remove(self.weights_path)
raise Exception(f"Could not download weights: {error_msg}")
except Exception as e:
self.logger.error(f"Could not download weights: {e}")
# Clean up partial download
if os.path.exists(self.weights_path):
os.remove(self.weights_path)
raise

def _remove_weights(self):
os.remove(self.weights_path)
@@ -0,0 +1,13 @@
# Copyright (C) 2025 Robotec.AI
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -15,6 +15,7 @@

import rclpy
from rai.agents import wait_for_shutdown

from rai_perception.agents import GroundedSamAgent, GroundingDinoAgent


@@ -17,7 +17,7 @@
import numpy as np
import sensor_msgs.msg
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field
from pydantic import BaseModel, ConfigDict, Field
from rai.communication.ros2 import ROS2Connector
from rai.communication.ros2.api import convert_ros_img_to_ndarray
from rai.communication.ros2.ros_async import get_future_result
@@ -84,12 +84,18 @@ class GroundingDinoBaseTool(BaseTool):
box_threshold: float = Field(default=0.35, description="Box threshold for GDINO")
text_threshold: float = Field(default=0.45, description="Text threshold for GDINO")

model_config = ConfigDict(arbitrary_types_allowed=True)

def _run(self, *args, **kwargs):
"""Abstract method - must be implemented by subclasses."""
raise NotImplementedError("Subclasses must implement _run method")

def _call_gdino_node(
self, camera_img_message: sensor_msgs.msg.Image, object_names: list[str]
) -> Future:
cli = self.connector.node.create_client(RAIGroundingDino, GDINO_SERVICE_NAME)
while not cli.wait_for_service(timeout_sec=1.0):
self.node.get_logger().info(
self.connector.node.get_logger().info(
f"service {GDINO_SERVICE_NAME} not available, waiting again..."
)
req = RAIGroundingDino.Request()