28 changes: 2 additions & 26 deletions docs/extensions/perception.md
@@ -1,29 +1,5 @@
--8<-- "src/rai_extensions/rai_perception/README.md:sec1"
Agents create two ROS 2 Nodes: `grounding_dino` and `grounded_sam` using [ROS2Connector](../API_documentation/connectors/ROS_2_Connectors.md).
These agents can be triggered by ROS2 services:

- `grounding_dino_classify`: `rai_interfaces/srv/RAIGroundingDino`
- `grounded_sam_segment`: `rai_interfaces/srv/RAIGroundedSam`

> [!TIP]
>
> If you wish to integrate open-set detection into your ros2 launch file, a premade launch
> file can be found in `rai/src/rai_bringup/launch/openset.launch.py`

> [!NOTE]
> The weights will be downloaded to `~/.cache/rai` directory.

## RAI Tools

`rai_perception` package contains tools that can be used by [RAI LLM agents](../tutorials/walkthrough.md)
enhance their perception capabilities. For more information on RAI Tools see
[Tool use and development](../tutorials/tools.md) tutorial.

--8<-- "src/rai_extensions/rai_perception/README.md:sec3"

> [!TIP]
>
> you can try example below with [rosbotxl demo](../demos/rosbot_xl.md) binary.
> The binary exposes `/camera/camera/color/image_raw` and `/camera/camera/depth/image_raw` topics.

--8<-- "src/rai_extensions/rai_perception/README.md:sec4"

--8<-- "src/rai_extensions/rai_perception/README.md:sec5"
188 changes: 89 additions & 99 deletions src/rai_extensions/rai_perception/README.md
@@ -2,125 +2,139 @@

# RAI Perception

This package provides ROS2 integration with [Idea-Research GroundingDINO Model](https://github.com/IDEA-Research/GroundingDINO) and [Grounded-SAM-2, RobotecAI fork](https://github.com/RobotecAI/Grounded-SAM-2) for object detection, segmentation, and gripping point calculation. The `GroundedSamAgent` and `GroundingDinoAgent` are ROS2 service nodes that can be readily added to ROS2 applications. It also provides tools that can be used with [RAI LLM agents](../tutorials/walkthrough.md) to construct conversational scenarios.
RAI Perception brings powerful computer vision capabilities to your ROS2 applications. It integrates [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO) and [Grounded-SAM-2](https://github.com/RobotecAI/Grounded-SAM-2) to detect objects, create segmentation masks, and calculate gripping points.

In addition to these building blocks, this package includes utilities to facilitate development, such as a ROS2 client that demonstrates interactions with agent nodes.
The package includes two ready-to-use ROS2 service nodes (`GroundedSamAgent` and `GroundingDinoAgent`) that you can easily add to your applications. It also provides tools that work seamlessly with [RAI LLM agents](../tutorials/walkthrough.md) to build conversational robot scenarios.

## Installation
## Prerequisites

Before installing `rai-perception`, ensure you have:

While installing `rai_perception` via Pip is being actively worked on, to incorporate it into your application, you will need to set up a ROS2 workspace.
1. **ROS2 installed** (Jazzy recommended, or Humble). If you don't have ROS2 yet, follow the official installation guide for [Jazzy](https://docs.ros.org/en/jazzy/Installation.html) or [Humble](https://docs.ros.org/en/humble/Installation.html).
2. **Python 3.8+** and `pip` installed (usually pre-installed on Ubuntu).
3. **NVIDIA GPU** with CUDA support (required for practical inference performance).
4. **wget** installed (required for downloading model weights):
```bash
sudo apt install wget
```

### ROS2 Workspace Setup
## Installation

Create a ROS2 workspace and copy this package:
**Step 1:** Source ROS2 in your terminal:

```bash
mkdir -p ~/rai_perception_ws/src
cd ~/rai_perception_ws/src

# only checkout rai_perception package
git clone --depth 1 --branch main https://github.com/RobotecAI/rai.git temp
cd temp
git archive --format=tar --prefix=rai_perception/ HEAD:src/rai_extensions/rai_perception | tar -xf -
mv rai_perception ../rai_perception
cd ..
rm -rf temp
# For ROS2 Jazzy (recommended)
source /opt/ros/jazzy/setup.bash

# For ROS2 Humble
source /opt/ros/humble/setup.bash
```

### ROS2 Dependencies
**Step 2:** Install ROS2 dependencies. `rai-perception` depends on ROS2 interface packages that need to be installed separately:

```bash
# Update package lists first
sudo apt update

# Install rai_interfaces as a debian package
sudo apt install ros-jazzy-rai-interfaces # or ros-humble-rai-interfaces for Humble
```
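
If you want a quick sanity check that the interfaces are visible to Python, the snippet below simply imports the two service types named later in this README; it is an optional check, not part of the installation.

```python
# Optional sanity check: these imports fail if ros-<distro>-rai-interfaces
# is missing or if ROS2 was not sourced in this terminal.
from rai_interfaces.srv import RAIGroundedSam, RAIGroundingDino

print(RAIGroundingDino.Request())  # prints an empty request if all is well
```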

Add required ROS dependencies. From the workspace root, run
**Step 3:** Install `rai-perception` via pip:

```bash
rosdep install --from-paths src --ignore-src -r
pip install rai-perception
```

### Build and Run
> [!TIP]
> It's recommended to install `rai-perception` in a virtual environment to avoid conflicts with other Python packages.

Source ROS2 and build:
> [!TIP]
> To avoid sourcing ROS2 in every new terminal, add the source command to your `~/.bashrc` file:
>
> ```bash
> echo "source /opt/ros/jazzy/setup.bash" >> ~/.bashrc # or humble
> ```

```bash
# Source ROS2 (humble or jazzy)
source /opt/ros/${ROS_DISTRO}/setup.bash
<!--- --8<-- [end:sec1] -->

# Build workspace
cd ~/rai_perception_ws
colcon build --symlink-install
<!--- --8<-- [start:sec4] -->

# Source ROS2 packages
source install/setup.bash
```
## Getting Started

### Python Dependencies
This section provides a step-by-step guide to get you up and running with RAI Perception.

`rai_perception` depends on `rai-core` and `sam2`. There are many ways to set up a virtual environment and install these dependencies. Below, we provide an example using Poetry.
### Quick Start

**Step 1:** Copy the following template to `pyproject.toml` in your workspace root, updating it according to your directory setup:
After installing `rai-perception`, launch the perception agents:

```toml
# rai_perception_project pyproject template
[tool.poetry]
name = "rai_perception_ws"
version = "0.1.0"
description = "ROS2 workspace for RAI perception"
package-mode = false
**Step 1:** Open a terminal and source ROS2:

[tool.poetry.dependencies]
python = "^3.10, <3.13"
rai-core = ">=2.5.4"
rai-perception = {path = "src/rai_perception", develop = true}
```bash
source /opt/ros/jazzy/setup.bash # or humble
```

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
**Step 2:** Launch the perception agents:

```bash
python -m rai_perception.scripts.run_perception_agents
```

**Step 2:** Install dependencies:
> [!NOTE]
> The weights will be downloaded to the `~/.cache/rai` directory on first use.

The agents create two ROS 2 nodes, `grounding_dino` and `grounded_sam`, using [ROS2Connector](../API_documentation/connectors/ROS_2_Connectors.md).
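
For orientation, the launcher boils down to very little code. The sketch below is a reconstruction based on the imports visible in this PR (`rclpy`, `wait_for_shutdown`, and the two agent classes); the constructor arguments and the exact `run()`/`wait_for_shutdown` signatures are assumptions, so treat it as illustrative rather than a copy of the real script.

```python
# Minimal sketch of a perception-agent launcher (assumed API, see note above).
import rclpy
from rai.agents import wait_for_shutdown
from rai_perception.agents import GroundedSamAgent, GroundingDinoAgent

rclpy.init()
agents = [GroundingDinoAgent(), GroundedSamAgent()]
for agent in agents:
    agent.run()  # assumption: each agent starts its ROS2 service node
wait_for_shutdown(agents)  # assumption: blocks until Ctrl+C, then cleans up
rclpy.shutdown()
```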

First, we create Virtual Environment with Poetry:
### Testing with Example Client

The `rai_perception/talker.py` example demonstrates how to use the perception services for object detection and segmentation. It shows the complete pipeline: GroundingDINO for object detection followed by GroundedSAM for instance segmentation, with visualization output.

**Step 1:** Open a terminal and source ROS2:

```bash
cd ~/rai_perception_ws
poetry lock
poetry install
source /opt/ros/jazzy/setup.bash # or humble
```

Now, we are ready to launch perception agents:
**Step 2:** Launch the perception agents:

```bash
# Activate virtual environment
source "$(poetry env info --path)"/bin/activate
export PYTHONPATH
PYTHONPATH="$(dirname "$(dirname "$(poetry run which python)")")/lib/python$(poetry run python --version | awk '{print $2}' | cut -d. -f1,2)/site-packages:$PYTHONPATH"
python -m rai_perception.scripts.run_perception_agents
```

# run agents
python src/rai_perception/scripts/run_perception_agents.py
**Step 3:** In a different terminal (remember to source ROS2 first), run the example client:

```bash
source /opt/ros/jazzy/setup.bash # or humble
python -m rai_perception.examples.talker --ros-args -p image_path:="<path-to-image>"
```

You can use any image containing objects like dragons, lizards, or dinosaurs. For example, use the `sample.jpg` from the package's `images` folder. The client will detect these objects and save a visualization with bounding boxes and masks to `masks.png` in the current directory.

> [!TIP]
> To manage ROS 2 + Poetry environment with less friction: Keep build tools (colcon) at system level, use Poetry only for runtime dependencies of your packages.
>
> If you wish to integrate open-set vision into your ros2 launch file, a premade launch
> file can be found in `rai/src/rai_bringup/launch/openset.launch.py`

<!--- --8<-- [end:sec1] -->
### ROS2 Service Interface

`rai-perception` agents create two ROS 2 nodes: `grounding_dino` and `grounded_sam` using [ROS2Connector](../../../docs/API_documentation/connectors/ROS_2_Connectors.md).
These agents can be triggered by ROS2 services:
The agents can be triggered by ROS2 services:

- `grounding_dino_classify`: `rai_interfaces/srv/RAIGroundingDino`
- `grounded_sam_segment`: `rai_interfaces/srv/RAIGroundedSam`
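
To illustrate the call pattern, here is a minimal hand-rolled client for `grounding_dino_classify`. The request field names (`classes`, `box_threshold`, `text_threshold`, `source_img`) are assumptions inferred from the tool code in this PR; check `rai_interfaces/srv/RAIGroundingDino` for the authoritative definition.

```python
# Hypothetical minimal client; request field names are assumptions,
# consult rai_interfaces/srv/RAIGroundingDino for the real definition.
import rclpy
from rclpy.node import Node
from rai_interfaces.srv import RAIGroundingDino

rclpy.init()
node = Node("gdino_client_example")
client = node.create_client(RAIGroundingDino, "grounding_dino_classify")
while not client.wait_for_service(timeout_sec=1.0):
    node.get_logger().info("grounding_dino_classify not available, waiting...")

request = RAIGroundingDino.Request()
request.classes = "dragon, lizard, dinosaur"  # comma-separated prompt
request.box_threshold = 0.35  # defaults taken from GroundingDinoBaseTool
request.text_threshold = 0.45
# request.source_img would carry a sensor_msgs/msg/Image from your camera

future = client.call_async(request)
rclpy.spin_until_future_complete(node, future)
node.get_logger().info(f"Response: {future.result()}")
node.destroy_node()
rclpy.shutdown()
```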

> [!TIP]
>
> If you wish to integrate open-set vision into your ros2 launch file, a premade launch
> file can be found in `rai/src/rai_bringup/launch/openset.launch.py`
<!--- --8<-- [end:sec4] -->

> [!NOTE]
> The weights will be downloaded to `~/.cache/rai` directory.
<!--- --8<-- [start:sec5] -->

## Dive Deeper: Tools and Integration

## RAI Tools
This section provides information for developers looking to integrate RAI Perception tools into their applications.

`rai_perception` package contains tools that can be used by [RAI LLM agents](../../../docs/tutorials/walkthrough.md)
### RAI Tools

The `rai_perception` package contains tools that can be used by [RAI LLM agents](../tutorials/walkthrough.md)
to enhance their perception capabilities. For more information on RAI Tools see
[Tool use and development](../../../docs/tutorials/tools.md) tutorial.
the [Tool use and development](../tutorials/tools.md) tutorial.

<!--- --8<-- [start:sec2] -->

@@ -132,7 +146,7 @@ This tool calls the GroundingDINO service to detect objects from a comma-separated prompt.

> [!TIP]
>
> you can try example below with [rosbotxl demo](../../../docs/demos/rosbot_xl.md) binary.
> You can try the example below with the [rosbotxl demo](../demos/rosbot_xl.md) binary.
> The binary exposes `/camera/camera/color/image_raw` and `/camera/camera/depth/image_rect_raw` topics.

<!--- --8<-- [start:sec3] -->
@@ -198,30 +212,6 @@ with ROS2Context():
I have detected the following items in the picture desk: 2.43m away
```

## Simple ROS2 Client Node Example

The `rai_perception/talker.py` example demonstrates how to use the perception services for object detection and segmentation. It shows the complete pipeline: GroundingDINO for object detection followed by GroundedSAM for instance segmentation, with visualization output.

This example is useful for:

- Testing perception services integration
- Understanding the ROS2 service call patterns
- Seeing detection and segmentation results with bounding boxes and masks

Run the example:

```bash
cd ~/rai_perception_ws
python src/rai_perception/scripts/run_perception_agents.py
```

In a different window, run

```bash
cd ~/rai_perception_ws
ros2 run rai_perception talker --ros-args -p image_path:=src/rai_perception/images/sample.jpg
```

The example will detect objects (dragon, lizard, dinosaur) and save a visualization with bounding boxes and masks to `masks.png`.

<!--- --8<-- [end:sec3] -->

<!--- --8<-- [end:sec5] -->
5 changes: 3 additions & 2 deletions src/rai_extensions/rai_perception/pyproject.toml
@@ -1,6 +1,7 @@
[tool.poetry]
name = "rai_perception"
version = "0.1.2"
name = "rai-perception"
# TODO: update the version once it is published to PyPI
version = "0.1.5"
description = "Package for object detection, segmentation and gripping point detection."
authors = ["Kajetan Rachwał <kajetan.rachwal@robotec.ai>"]
readme = "README.md"
@@ -67,19 +67,46 @@ def _load_model_with_error_handling(self, model_class):
raise e

def _download_weights(self):
self.logger.info(
f"Downloading weights from {self.WEIGHTS_URL} to {self.weights_path}"
)
try:
subprocess.run(
[
"wget",
self.WEIGHTS_URL,
"-O",
self.weights_path,
str(self.weights_path),
"--progress=dot:giga",
]
],
check=True,
capture_output=True,
text=True,
)
# Verify file exists and has reasonable size (> 1MB)
if not os.path.exists(self.weights_path):
raise Exception(f"Downloaded file not found at {self.weights_path}")
file_size = os.path.getsize(self.weights_path)
if file_size < 1024 * 1024:
raise Exception(
f"Downloaded file is too small ({file_size} bytes), expected > 1MB"
)
self.logger.info(
f"Successfully downloaded weights ({file_size / (1024 * 1024):.2f} MB)"
)
except Exception:
self.logger.error("Could not download weights")
raise Exception("Could not download weights")
except subprocess.CalledProcessError as e:
error_msg = e.stderr if e.stderr else e.stdout if e.stdout else str(e)
self.logger.error(f"wget failed: {error_msg}")
# Clean up partial download
if os.path.exists(self.weights_path):
os.remove(self.weights_path)
raise Exception(f"Could not download weights: {error_msg}")
except Exception as e:
self.logger.error(f"Could not download weights: {e}")
# Clean up partial download
if os.path.exists(self.weights_path):
os.remove(self.weights_path)
raise

def _remove_weights(self):
os.remove(self.weights_path)
@@ -0,0 +1,13 @@
# Copyright (C) 2025 Robotec.AI
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -15,6 +15,7 @@

import rclpy
from rai.agents import wait_for_shutdown

from rai_perception.agents import GroundedSamAgent, GroundingDinoAgent


@@ -17,7 +17,7 @@
import numpy as np
import sensor_msgs.msg
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field
from pydantic import BaseModel, ConfigDict, Field
from rai.communication.ros2 import ROS2Connector
from rai.communication.ros2.api import convert_ros_img_to_ndarray
from rai.communication.ros2.ros_async import get_future_result
@@ -84,12 +84,18 @@ class GroundingDinoBaseTool(BaseTool):
box_threshold: float = Field(default=0.35, description="Box threshold for GDINO")
text_threshold: float = Field(default=0.45, description="Text threshold for GDINO")

model_config = ConfigDict(arbitrary_types_allowed=True)

def _run(self, *args, **kwargs):
"""Abstract method - must be implemented by subclasses."""
raise NotImplementedError("Subclasses must implement _run method")

def _call_gdino_node(
self, camera_img_message: sensor_msgs.msg.Image, object_names: list[str]
) -> Future:
cli = self.connector.node.create_client(RAIGroundingDino, GDINO_SERVICE_NAME)
while not cli.wait_for_service(timeout_sec=1.0):
self.node.get_logger().info(
self.connector.node.get_logger().info(
f"service {GDINO_SERVICE_NAME} not available, waiting again..."
)
req = RAIGroundingDino.Request()